[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103652
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 19/May/18 05:42
Start Date: 19/May/18 05:42
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #5422: [BEAM-4347] 
Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#issuecomment-390381393
 
 
   Thanks for the review @rangadi 
   
   If you're OK with the proposal that I create a Jira for a review of Future 
handling in `KafkaIO`, I don't think there are further changes needed here. 
Alternatively, we can do that exploration now; if we find changes are needed, 
they may be more invasive and warrant a dedicated Jira anyway.
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103652)
Time Spent: 2h  (was: 1h 50m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean up ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com
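
The kind of fix these steps produce can be sketched as follows. This is an illustrative example only (the class name `CharsetFix` is hypothetical and not from the Beam codebase): ErrorProne's `DefaultCharset` check flags `new String(bytes)`, which silently depends on the platform default charset, and the fix simply names the charset explicitly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example, not Beam code: ErrorProne's DefaultCharset check
// flags `new String(bytes)`, which depends on the platform default charset;
// the fix makes the charset explicit.
public class CharsetFix {
  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8); // was: new String(bytes)
  }

  public static void main(String[] args) {
    System.out.println(decode("errorprone".getBytes(StandardCharsets.UTF_8)));
  }
}
```

Most warnings in this category are mechanical one-line changes like this, which is why the issue is labeled a starter task.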



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103651
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 19/May/18 05:41
Start Date: 19/May/18 05:41
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on a change in pull request 
#5422: [BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189426046
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaExactlyOnceSink.java
 ##
 @@ -433,6 +433,7 @@ void beginTxn() {
 ProducerSpEL.beginTransaction(producer);
   }
 
+  @SuppressWarnings("FutureReturnValueIgnored")
 
 Review comment:
   In general I think leaving `TODO`s in code is a poor practice. 
   How about I open a Jira task to review Future handling in `KafkaIO`, noting 
that we've suppressed the warning (which doesn't change current behaviour) but 
would like to review whether we may drop messages in exceptional cases? I'll be 
happy to dig deeper with you on that.
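
The risk being discussed can be shown with a short sketch. This is hypothetical code (the class and method names are invented, not Beam's): ErrorProne's `FutureReturnValueIgnored` check exists because dropping the `Future` returned by `submit()` silently swallows any failure; keeping the `Future` lets the caller surface it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Beam code: why FutureReturnValueIgnored matters.
// If the Future returned by submit() is dropped, the failure below is
// silently swallowed; retaining it lets the caller observe the exception.
public class FutureHandling {
  static String runAndReport(ExecutorService pool) {
    Runnable task = () -> { throw new IllegalStateException("send failed"); };
    Future<?> f = pool.submit(task); // dropping `f` would hide the failure
    try {
      f.get(); // surfaces the exception thrown by the task
      return "ok";
    } catch (ExecutionException e) {
      return e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    System.out.println(runAndReport(pool)); // prints "send failed"
    pool.shutdown();
  }
}
```

Suppressing the warning leaves the dropped-Future paths in place, which is exactly why the follow-up review of whether messages could be lost in exceptional cases is proposed.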




Issue Time Tracking
---

Worklog Id: (was: 103651)
Time Spent: 1h 50m  (was: 1h 40m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103650
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 19/May/18 05:37
Start Date: 19/May/18 05:37
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on a change in pull request 
#5422: [BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189425976
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaExactlyOnceSink.java
 ##
 @@ -563,6 +564,7 @@ void commitTxn(long lastRecordId, Counter numTransactions) 
throws IOException {
  * closed if it is stays in cache for more than 1 minute, i.e. not used 
inside
  * KafkaExactlyOnceSink DoFn for a minute.
  */
+@SuppressWarnings("FutureReturnValueIgnored")
 
 Review comment:
   FYI: it comes from the `SCHEDULED_CLEAN_UP_THREAD.scheduleAtFixedRate(...)` call.
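
A minimal sketch of why that particular call triggers the warning (hypothetical code, not the Beam sink itself): `scheduleAtFixedRate` returns a `ScheduledFuture`, and if the periodic task ever throws, subsequent runs are silently cancelled; the error is only observable through that returned future.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Beam code: dropping the ScheduledFuture returned
// by scheduleAtFixedRate hides failures. When a periodic task throws, later
// runs are suppressed and the error surfaces only via the future.
public class CleanupScheduling {
  static String awaitFailure() {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    ScheduledFuture<?> cleanup = scheduler.scheduleAtFixedRate(
        () -> { throw new RuntimeException("clean-up failed"); },
        0, 10, TimeUnit.MILLISECONDS);
    try {
      cleanup.get(); // blocks until the periodic task fails or is cancelled
      return "unreachable: a periodic task's future never completes normally";
    } catch (ExecutionException e) {
      return e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } finally {
      scheduler.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(awaitFailure()); // prints "clean-up failed"
  }
}
```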




Issue Time Tracking
---

Worklog Id: (was: 103650)
Time Spent: 1h 40m  (was: 1.5h)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103649
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 19/May/18 05:03
Start Date: 19/May/18 05:03
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on a change in pull request 
#5422: [BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189425329
 
 

 ##
 File path: 
sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java
 ##
 @@ -529,7 +529,7 @@ public void testUnboundedSourceCustomTimestamps() {
   (tp, prevWatermark) -> new 
CustomTimestampPolicyWithLimitedDelay(
 (record -> new 
Instant(TimeUnit.SECONDS.toMillis(record.getKV().getValue())
  + customTimestampStartMillis)),
-   Duration.millis(0),
+   Duration.ZERO,
 
 Review comment:
   Yes.
   It offers some nice readability suggestions, I've found (e.g. 
`seconds(180)` -> `minutes(3)`).
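
The suggestions above can be demonstrated directly. The sketch uses `java.time` so it runs standalone (Beam itself uses Joda-Time, whose `Duration` offers the analogous `ZERO` constant and `standardSeconds`/`standardMinutes` factories); the equalities show the rewrites are behavior-preserving.

```java
import java.time.Duration;

// Sketch of the readability rewrites discussed in the review, shown with
// java.time rather than Joda-Time so it is self-contained.
public class DurationStyle {
  public static void main(String[] args) {
    // Prefer the named constant over a zero-valued factory call ...
    System.out.println(Duration.ofMillis(0).equals(Duration.ZERO));            // true
    // ... and the larger unit when the value divides evenly.
    System.out.println(Duration.ofSeconds(180).equals(Duration.ofMinutes(3))); // true
  }
}
```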




Issue Time Tracking
---

Worklog Id: (was: 103649)
Time Spent: 1.5h  (was: 1h 20m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103648
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 19/May/18 05:02
Start Date: 19/May/18 05:02
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on a change in pull request 
#5422: [BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189425305
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java
 ##
 @@ -614,7 +619,7 @@ private void nextBatch() {
 partitionStates.forEach(p -> p.recordIter = 
records.records(p.topicPartition).iterator());
 
 // cycle through the partitions in order to interleave records from each.
-curBatch = Iterators.cycle(new LinkedList<>(partitionStates));
+curBatch = Iterators.cycle(new ArrayList<>(partitionStates));
 
 Review comment:
   Yes. 
   
   > `LinkedList` almost never out-performs `ArrayList` or `ArrayDeque`. If you 
are using `LinkedList` as a list, prefer `ArrayList`. If you are using 
`LinkedList` as a stack or queue/deque, prefer `ArrayDeque`.
   
   Because `ArrayDeque` rejects nulls, I erred on the side of caution and chose `ArrayList`.
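
The trade-off can be verified with a short sketch (hypothetical code, not Beam's; `ListChoice` and its method are invented names): `ArrayList` tolerates null elements while `ArrayDeque` throws on them, which makes `ArrayList` the conservative drop-in replacement for a `LinkedList` whose contents might include null.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Beam code: ArrayList accepts nulls, ArrayDeque
// does not, so ArrayList is the safer LinkedList replacement when elements
// may be null.
public class ListChoice {
  static boolean arrayDequeRejectsNull() {
    try {
      new ArrayDeque<>(Arrays.asList("a", null, "b"));
      return false;
    } catch (NullPointerException expected) {
      return true; // ArrayDeque.add throws on null elements
    }
  }

  public static void main(String[] args) {
    List<String> withNull = new ArrayList<>(Arrays.asList("a", null, "b"));
    System.out.println(withNull.size());         // 3: ArrayList tolerates null
    System.out.println(arrayDequeRejectsNull()); // true
  }
}
```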




Issue Time Tracking
---

Worklog Id: (was: 103648)
Time Spent: 1h 20m  (was: 1h 10m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle #298

2018-05-18 Thread Apache Jenkins Server




[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103627
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 19/May/18 01:45
Start Date: 19/May/18 01:45
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189421412
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map> 
forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws 
IOException {
+ImmutableTable.Builder idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed 
so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and 
values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, 
bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, 
lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
+  bundleDescriptorBuilder
+  .getPcollectionsMap()
+  .get(pCollectionId)
+  .toBuilder()
+  .setCoderId(lengthPrefixedSideInputCoderId)
+  .build());
+
+  FullWindowedValueCoder coder =
+  (FullWindowedValueCoder) WireCoders.instantiateRunnerWireCoder(
 
 Review comment:
   Serialization will also be needed for push back with side inputs (at least 
for streaming).
   
   In the old, non-portable Flink runner's ParDo execution, the operator also 
requires access to the key to manage state and timers. Will the key extraction 
be in the SDK (and a parameter to the runner)?




Issue Time Tracking
---

Worklog Id: (was: 103627)
Time Spent: 3.5h  (was: 3h 20m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #478

2018-05-18 Thread Apache Jenkins Server


--
[...truncated 19.95 MB...]
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:28.159Z: Autoscaling was automatically enabled for 
job 2018-05-18_17_47_28-5990285780540745716.
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:31.574Z: Checking required Cloud APIs are enabled.
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:32.141Z: Checking permissions granted to controller 
Service Account.
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:35.890Z: Worker configuration: n1-standard-1 in 
us-central1-b.
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:36.413Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:36.668Z: Expanding GroupByKey operations into 
optimizable parts.
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:36.706Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:36.973Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.021Z: Elided trivial flatten 
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.057Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.088Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.125Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.172Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.213Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.257Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 19, 2018 12:47:39 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-19T00:47:37.301Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103616
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 19/May/18 00:49
Start Date: 19/May/18 00:49
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189419398
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map> 
forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws 
IOException {
+ImmutableTable.Builder idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed 
so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and 
values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, 
bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, 
lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
 
 Review comment:
   It would be helpful to just put that as a comment in the code. I actually 
had the same doubt.




Issue Time Tracking
---

Worklog Id: (was: 103616)
Time Spent: 3h 20m  (was: 3h 10m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by 
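
The length-prefix framing this issue revolves around can be illustrated with a minimal sketch. This is NOT Beam's `LengthPrefixUnknownCoders` (which rewrites coder protos); it only shows the underlying idea: prefixing an opaque payload with its length lets a runner re-frame or skip values whose element coder it cannot interpret, which is why materialized PCollections stop using their original element coders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Minimal sketch of length-prefix framing only -- not Beam's implementation.
public class LengthPrefixSketch {
  static byte[] encode(byte[] payload) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      out.writeInt(payload.length); // 4-byte length prefix
      out.write(payload);           // opaque element bytes
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static byte[] decode(byte[] framed) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
      byte[] payload = new byte[in.readInt()];
      in.readFully(payload);
      return payload;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] original = "opaque-element".getBytes(StandardCharsets.UTF_8);
    System.out.println(new String(decode(encode(original)), StandardCharsets.UTF_8));
  }
}
```

The runner can decode the length and skip the payload without understanding its contents, mirroring why the runner-side and SDK-side coders must agree on this framing for multimap side inputs.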

[jira] [Work logged] (BEAM-4297) Flink portable runner executable stage operator for streaming

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4297?focusedWorklogId=103615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103615
 ]

ASF GitHub Bot logged work on BEAM-4297:


Author: ASF GitHub Bot
Created on: 19/May/18 00:49
Start Date: 19/May/18 00:49
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5407: 
[BEAM-4297] Streaming executable stage translation and operator for portable 
Flink runner.
URL: https://github.com/apache/beam/pull/5407#discussion_r189330462
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchPortablePipelineTranslator.java
 ##
 @@ -616,7 +616,7 @@ private static void pruneOutput(
   }
 
   /**  Creates a mapping from PCollection id to output tag integer. */
-  private static BiMap<String, Integer> createOutputMap(Iterable<String> localOutputs) {
+  static BiMap<String, Integer> createOutputMap(Iterable<String> localOutputs) {
 
 Review comment:
   Please move this into a shared utility class.
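   For context, the method under review assigns an integer output tag to each local output id (the real Beam code returns a Guava `BiMap`). A minimal standalone sketch of that mapping, using only the JDK and with illustrative names (this is not the actual shared utility being requested), could look like:

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   /** Illustrative sketch: assigns a stable integer tag to each local output id. */
   public class OutputMaps {
     public static Map<String, Integer> createOutputMap(Iterable<String> localOutputs) {
       Map<String, Integer> outputMap = new LinkedHashMap<>();
       int tag = 0;
       for (String collectionId : localOutputs) {
         // Each distinct PCollection id gets the next unused tag integer,
         // in first-seen order; duplicates keep their original tag.
         if (!outputMap.containsKey(collectionId)) {
           outputMap.put(collectionId, tag++);
         }
       }
       return outputMap;
     }
   }
   ```

   Moving such a method into a shared utility class lets both the batch and streaming translators reuse one tag-assignment scheme.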


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103615)
Time Spent: 0.5h  (was: 20m)

> Flink portable runner executable stage operator for streaming
> -
>
> Key: BEAM-4297
> URL: https://issues.apache.org/jira/browse/BEAM-4297
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103612
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 19/May/18 00:30
Start Date: 19/May/18 00:30
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189418022
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map> 
forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws 
IOException {
+ImmutableTable.Builder idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed 
so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and 
values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, 
bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, 
lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
+  bundleDescriptorBuilder
+  .getPcollectionsMap()
+  .get(pCollectionId)
+  .toBuilder()
+  .setCoderId(lengthPrefixedSideInputCoderId)
+  .build());
+
+  FullWindowedValueCoder coder =
+  (FullWindowedValueCoder) WireCoders.instantiateRunnerWireCoder(
 
 Review comment:
   I would like that in practice, although currently we require WireCoders on 
the runner side (at least in Flink) in order to give collections an associated 
serialization mechanism. This may change down the line when everything is "just 
bytes", but that will require some additional machinery. For example, to 
extract keys and values for grouping operations.
   
   The good news is that runner-side coders need not be kept in sync with the 
wire coders here. Anyway, this seems fine for now and we can address that when 
we get around to it.




Issue Time Tracking
---

Worklog Id: (was: 103612)
Time Spent: 3h 10m  (was: 3h)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput 

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103610
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 19/May/18 00:30
Start Date: 19/May/18 00:30
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189417781
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map> 
forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws 
IOException {
+ImmutableTable.Builder idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed 
so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and 
values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, 
bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, 
lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
 
 Review comment:
   Ah, interesting. I didn't realize those required separate materializations. 
Thanks for the explanation.




Issue Time Tracking
---

Worklog Id: (was: 103610)
Time Spent: 3h  (was: 2h 50m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by 

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103611
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 19/May/18 00:30
Start Date: 19/May/18 00:30
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189418051
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java
 ##
 @@ -39,15 +40,31 @@ public static GrpcStateService create() {
 return new GrpcStateService();
   }
 
+  private final ConcurrentLinkedQueue clients;
 
 Review comment:
   Ah, this makes much more sense now.  
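   The field under discussion lets the service track connected state clients so that shutdown can complete each outstanding stream. A hedged, self-contained sketch of that pattern (class and method names here are illustrative; this is not `GrpcStateService`'s actual API):

   ```java
   import java.util.Queue;
   import java.util.concurrent.ConcurrentLinkedQueue;

   /** Illustrative sketch: track live client completion callbacks for shutdown. */
   public class ClientTracker {
     private final Queue<Runnable> clients = new ConcurrentLinkedQueue<>();

     /** Register a completion callback for a newly connected client. */
     public void register(Runnable onClose) {
       clients.add(onClose);
     }

     /** Drain the queue and complete every tracked client exactly once. */
     public int closeAll() {
       int closed = 0;
       Runnable client;
       // poll() is thread-safe, so concurrent registrations and a single
       // shutdown can proceed without additional locking.
       while ((client = clients.poll()) != null) {
         client.run();
         closed++;
       }
       return closed;
     }
   }
   ```

   Draining with `poll()` rather than iterating guarantees each callback runs once even if `closeAll()` races with late registrations.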




Issue Time Tracking
---

Worklog Id: (was: 103611)
Time Spent: 3h  (was: 2h 50m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by the runner match the coders used by SDKs. 
> Furthermore, it allows the _runners_ to specify coders. This involves changes 
> to the proto models and all SDKs.
>  * Annotate side input state requests with both key and value coders. This 
> inverts the expected responsibility and has the SDK determine runner coders. 
> Additionally, because runners do not understand all SDK types, additional 
> coder substitution will need to be done at request handling time to make sure 
> that the requested coder can be instantiated and will remain consistent with 
> the SDK coder. This requires only small changes to SDKs because they may opt 
> to use their default PCollection coders.
> All of these approaches have their own downsides. Explicit side input 
> coders is probably the right thing to do long-term, but the simplest change 
> for now is to modify PCollection coders to match exactly how they're 
> materialized.
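The materialization behavior described above can be illustrated with a small sketch: once every value is framed by an explicit byte length, a runner can concatenate values for a key and later split them apart without understanding their element coders. This is an illustrative model of the idea only, not Beam's actual `LengthPrefixCoder` implementation:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of length prefixing: opaque values remain splittable. */
public class LengthPrefix {
  /** Concatenate values, prefixing each with its byte length. */
  public static byte[] encode(List<byte[]> values) {
    int size = 0;
    for (byte[] value : values) {
      size += 4 + value.length; // 4-byte int prefix per value
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] value : values) {
      buf.putInt(value.length);
      buf.put(value);
    }
    return buf.array();
  }

  /** Recover the value boundaries without knowing how to decode the bytes. */
  public static List<byte[]> decode(byte[] encoded) {
    ByteBuffer buf = ByteBuffer.wrap(encoded);
    List<byte[]> values = new ArrayList<>();
    while (buf.hasRemaining()) {
      byte[] value = new byte[buf.getInt()];
      buf.get(value);
      values.add(value);
    }
    return values;
  }
}
```

This is why the length-prefixed coder and the original element coder produce different byte streams, and why the SDK and runner must agree on which one a materialized PCollection uses.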



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Compressed_TextIOIT_HDFS #186

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-2937] Add new Combine URNs.

[tgroh] Update worker_id Documentation

[Pablo] Increasing the concurrent test execution count (#5408)

--
[...truncated 471.40 KB...]
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy65.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy66.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.match(HadoopFileSystem.java:81)
at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:123)
at 
org.apache.beam.sdk.io.common.FileBasedIOITHelper$DeleteFileFn.processElement(FileBasedIOITHelper.java:89)
at 
org.apache.beam.sdk.io.common.FileBasedIOITHelper$DeleteFileFn$DoFnInvoker.invokeProcessElement(Unknown
 Source)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:138)
at 
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
at 
com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
at 
com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
at 
com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:200)
at 
com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
at 
com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:383)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:355)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:286)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Workflow failed. Causes: S30:Delete test files failed., A work item was 
attempted 4 times without success. Each 

Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #187

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-2937] Add new Combine URNs.

[tgroh] Update worker_id Documentation

[Pablo] Increasing the concurrent test execution count (#5408)

--
[...truncated 231.24 KB...]
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:219)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(MongoDbIO.java:652)
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting 
for a server that matches WritableServerSelector. Client view of cluster state 
is {type=UNKNOWN, servers=[{address=35.188.199.191:27017, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception 
opening socket}, caused by {java.net.SocketTimeoutException: connect timed 
out}}]
at 
com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:75)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:219)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(MongoDbIO.java:652)
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting 
for a server that matches WritableServerSelector. Client view of cluster state 
is {type=UNKNOWN, servers=[{address=35.188.199.191:27017, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception 
opening socket}, caused by {java.net.SocketTimeoutException: connect timed 
out}}]
at 
com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:75)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:219)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:667)
at 
org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.processElement(MongoDbIO.java:652)
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting 
for a server that matches WritableServerSelector. Client view of cluster state 
is {type=UNKNOWN, servers=[{address=35.188.199.191:27017, type=UNKNOWN, 
state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception 
opening socket}, caused by {java.net.SocketTimeoutException: connect timed 
out}}]
at 

Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #279

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-2937] Add new Combine URNs.

[tgroh] Update worker_id Documentation

[Pablo] Increasing the concurrent test execution count (#5408)

--
[...truncated 103.92 KB...]

> Task :beam-sdks-java-io-hadoop-input-format:classes UP-TO-DATE
Skipping task ':beam-sdks-java-io-hadoop-input-format:classes' as it has no 
actions.
:beam-sdks-java-io-hadoop-input-format:classes (Thread[Task worker for ':' 
Thread 14,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-io-google-cloud-platform:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) started.

> Task :beam-sdks-java-io-google-cloud-platform:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-google-cloud-platform:compileTestJava' is 
941cd7c4ef68840f00c70769017ca6d0
Skipping task ':beam-sdks-java-io-google-cloud-platform:compileTestJava' as it 
is up-to-date.
:beam-sdks-java-io-google-cloud-platform:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) completed. Took 0.067 secs.
:beam-sdks-java-io-google-cloud-platform:testClasses (Thread[Task worker for 
':' Thread 10,5,main]) started.

> Task :beam-sdks-java-io-google-cloud-platform:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-io-google-cloud-platform:testClasses' as it has 
no actions.
:beam-sdks-java-io-google-cloud-platform:testClasses (Thread[Task worker for 
':' Thread 10,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-io-google-cloud-platform:shadowTestJar (Thread[Task worker for 
':' Thread 10,5,main]) started.

> Task :beam-sdks-java-io-google-cloud-platform:shadowTestJar UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-google-cloud-platform:shadowTestJar' is 
423a3734912c4c3e0b79631dc6d9d77b
Caching disabled for task 
':beam-sdks-java-io-google-cloud-platform:shadowTestJar': Caching has not been 
enabled for the task
Skipping task ':beam-sdks-java-io-google-cloud-platform:shadowTestJar' as it is 
up-to-date.
:beam-sdks-java-io-google-cloud-platform:shadowTestJar (Thread[Task worker for 
':' Thread 10,5,main]) completed. Took 0.037 secs.
:beam-runners-google-cloud-dataflow-java:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) started.

> Task :beam-runners-google-cloud-dataflow-java:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-runners-google-cloud-dataflow-java:compileTestJava' is 
9391da5273a709845c0d7c9465b53822
Skipping task ':beam-runners-google-cloud-dataflow-java:compileTestJava' as it 
is up-to-date.
:beam-runners-google-cloud-dataflow-java:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) completed. Took 0.062 secs.
:beam-runners-google-cloud-dataflow-java:testClasses (Thread[Task worker for 
':' Thread 2,5,main]) started.

> Task :beam-runners-google-cloud-dataflow-java:testClasses UP-TO-DATE
Skipping task ':beam-runners-google-cloud-dataflow-java:testClasses' as it has 
no actions.
:beam-runners-google-cloud-dataflow-java:testClasses (Thread[Task worker for 
':' Thread 2,5,main]) completed. Took 0.0 secs.
:beam-runners-google-cloud-dataflow-java:shadowTestJar (Thread[Task worker for 
':' Thread 2,5,main]) started.

> Task :beam-runners-google-cloud-dataflow-java:shadowTestJar UP-TO-DATE
Build cache key for task 
':beam-runners-google-cloud-dataflow-java:shadowTestJar' is 
1f8c5783bf2c77bc2f29229d5f736363
Caching disabled for task 
':beam-runners-google-cloud-dataflow-java:shadowTestJar': Caching has not been 
enabled for the task
Skipping task ':beam-runners-google-cloud-dataflow-java:shadowTestJar' as it is 
up-to-date.
:beam-runners-google-cloud-dataflow-java:shadowTestJar (Thread[Task worker for 
':' Thread 2,5,main]) completed. Took 0.045 secs.
:beam-sdks-java-io-hadoop-input-format:compileTestJava (Thread[Task worker for 
':' Thread 2,5,main]) started.

> Task :beam-sdks-java-io-hadoop-input-format:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-hadoop-input-format:compileTestJava' is 
39692de6510ef8de2df7d84fd30ecb42
Skipping task ':beam-sdks-java-io-hadoop-input-format:compileTestJava' as it is 
up-to-date.
:beam-sdks-java-io-hadoop-input-format:compileTestJava (Thread[Task worker for 
':' Thread 2,5,main]) completed. Took 0.494 secs.
:beam-sdks-java-io-hadoop-input-format:testClasses (Thread[Task worker for ':' 
Thread 2,5,main]) started.

> Task :beam-sdks-java-io-hadoop-input-format:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-io-hadoop-input-format:testClasses' as it has no 
actions.
:beam-sdks-java-io-hadoop-input-format:testClasses (Thread[Task worker for ':' 
Thread 2,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-io-hadoop-input-format:integrationTest (Thread[Task worker for 
':' Thread 2,5,main]) started.
Gradle Test Executor 1 started executing tests.

> Task :beam-sdks-java-io-hadoop-input-format:integrationTest
Build cache key for task 

Jenkins build is back to normal : beam_PerformanceTests_XmlIOIT_HDFS #185

2018-05-18 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_ParquetIOIT #9

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-2937] Add new Combine URNs.

[tgroh] Update worker_id Documentation

[Pablo] Increasing the concurrent test execution count (#5408)

--
[...truncated 93.38 KB...]
Skipping task ':beam-runners-google-cloud-dataflow-java:shadowJar' as it is 
up-to-date.
:beam-runners-google-cloud-dataflow-java:shadowJar (Thread[Task worker for 
':',5,main]) completed. Took 0.01 secs.

> Task :beam-sdks-java-core:compileTestJava UP-TO-DATE
Build cache key for task ':beam-sdks-java-core:compileTestJava' is 
a45bc5e16b9165cfc9a3c3e27a02a941
Skipping task ':beam-sdks-java-core:compileTestJava' as it is up-to-date.
:beam-sdks-java-core:compileTestJava (Thread[Task worker for ':' Thread 
11,5,main]) completed. Took 0.289 secs.
:beam-sdks-java-core:processTestResources (Thread[Task worker for ':' Thread 
11,5,main]) started.

> Task :beam-sdks-java-core:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':beam-sdks-java-core:processTestResources' as it has no source 
files and no previous output files.
:beam-sdks-java-core:processTestResources (Thread[Task worker for ':' Thread 
11,5,main]) completed. Took 0.001 secs.
:beam-sdks-java-core:testClasses (Thread[Task worker for ':' Thread 11,5,main]) 
started.

> Task :beam-sdks-java-core:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-core:testClasses' as it has no actions.
:beam-sdks-java-core:testClasses (Thread[Task worker for ':' Thread 11,5,main]) 
completed. Took 0.0 secs.
:beam-sdks-java-core:shadowTestJar (Thread[Task worker for ':' Thread 
11,5,main]) started.

> Task :beam-sdks-java-core:shadowTestJar UP-TO-DATE
Build cache key for task ':beam-sdks-java-core:shadowTestJar' is 
4dec0993ec9cbaaa1e190d8ce076fdbd
Caching disabled for task ':beam-sdks-java-core:shadowTestJar': Caching has not 
been enabled for the task
Skipping task ':beam-sdks-java-core:shadowTestJar' as it is up-to-date.
:beam-sdks-java-core:shadowTestJar (Thread[Task worker for ':' Thread 
11,5,main]) completed. Took 0.024 secs.
:beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava 
(Thread[Task worker for ':' Thread 11,5,main]) started.
:beam-sdks-java-core:jar (Thread[Task worker for ':' Thread 15,5,main]) started.

> Task :beam-sdks-java-core:jar UP-TO-DATE
Build cache key for task ':beam-sdks-java-core:jar' is 
c144fa1c75859647a891c171b117dc46
Caching disabled for task ':beam-sdks-java-core:jar': Caching has not been 
enabled for the task
Skipping task ':beam-sdks-java-core:jar' as it is up-to-date.
:beam-sdks-java-core:jar (Thread[Task worker for ':' Thread 15,5,main]) 
completed. Took 0.012 secs.

> Task :beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava 
> UP-TO-DATE
Build cache key for task 
':beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava' is 
50a11c952ea1fbc12e939121938f029d
Skipping task 
':beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava' as it 
is up-to-date.
:beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava 
(Thread[Task worker for ':' Thread 11,5,main]) completed. Took 0.032 secs.
:beam-sdks-java-extensions-google-cloud-platform-core:testClasses (Thread[Task 
worker for ':' Thread 11,5,main]) started.

> Task :beam-sdks-java-extensions-google-cloud-platform-core:testClasses 
> UP-TO-DATE
Skipping task 
':beam-sdks-java-extensions-google-cloud-platform-core:testClasses' as it has 
no actions.
:beam-sdks-java-extensions-google-cloud-platform-core:testClasses (Thread[Task 
worker for ':' Thread 11,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar 
(Thread[Task worker for ':' Thread 11,5,main]) started.

> Task :beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar 
> UP-TO-DATE
Build cache key for task 
':beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar' is 
f59afe23febaed688a527813e91201a9
Caching disabled for task 
':beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar': Caching 
has not been enabled for the task
Skipping task 
':beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar' as it is 
up-to-date.
:beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar 
(Thread[Task worker for ':' Thread 11,5,main]) completed. Took 0.015 secs.
:beam-sdks-java-io-google-cloud-platform:compileTestJava (Thread[Task worker 
for ':' Thread 11,5,main]) started.

> Task :beam-sdks-java-io-google-cloud-platform:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-google-cloud-platform:compileTestJava' is 
941cd7c4ef68840f00c70769017ca6d0
Skipping task ':beam-sdks-java-io-google-cloud-platform:compileTestJava' as it 
is up-to-date.

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103602
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:58
Start Date: 18/May/18 23:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189416235
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+      ExecutableStage stage,
+      Components components,
+      ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+    ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+        ImmutableTable.builder();
+    for (SideInputReference sideInputReference : stage.getSideInputs()) {
+      // Update the coder specification for side inputs to be length prefixed so that the
+      // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+      // side inputs.
+      String pCollectionId = sideInputReference.collection().getId();
+      RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+          LengthPrefixUnknownCoders.forCoder(
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+              components,
+              false);
+      String wireCoderId =
+          addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+      String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+          String.format(
+              "fn/side_input/%s",
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+          bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+      bundleDescriptorBuilder.putCoders(
+          lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+      bundleDescriptorBuilder.putAllCoders(
+          lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+      bundleDescriptorBuilder.putPcollections(
+          pCollectionId,
 
 Review comment:
  A PCollection which is consumed as a side input within an executable stage can 
only be consumed as a side input. You will never have an executable stage where 
the same PCollection is consumed as both a main input and a side input.
   
   Side input materialization and access requires a fusion break.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103602)
Time Spent: 2h 50m  (was: 2h 40m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".

[jira] [Work logged] (BEAM-3827) Add Go SDK integration tests

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3827?focusedWorklogId=103600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103600
 ]

ASF GitHub Bot logged work on BEAM-3827:


Author: ASF GitHub Bot
Created on: 18/May/18 23:50
Start Date: 18/May/18 23:50
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #5409: [BEAM-3827] Invoke Go 
Dataflow integration tests from post-commit
URL: https://github.com/apache/beam/pull/5409#issuecomment-390360023
 
 
   R: @lostluck @kennknowles 




Issue Time Tracking
---

Worklog Id: (was: 103600)
Time Spent: 1.5h  (was: 1h 20m)

> Add Go SDK integration tests
> 
>
> Key: BEAM-3827
> URL: https://issues.apache.org/jira/browse/BEAM-3827
> Project: Beam
>  Issue Type: Task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should add post-commit testing similar to ValidateRunner tests to ensure 
> that the model is implemented correctly.
> Proposal: 
> https://docs.google.com/document/d/1jy6EE7D4RjgfNV0FhD3rMsT1YKhnUfcHRZMAlC6ygXw/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103599
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:48
Start Date: 18/May/18 23:48
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5374: [BEAM-4271] Support 
side inputs for ExecutableStage and provide runner side utilities for handling 
multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#issuecomment-390359774
 
 
   Ben, PTAL




Issue Time Tracking
---

Worklog Id: (was: 103599)
Time Spent: 2h 40m  (was: 2.5h)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by the runner match the coders used by SDKs. 
> Furthermore, it allows the _runners_ to specify coders. This involves changes 
> to the proto models and all SDKs.
>  * Annotate side input state requests with both key and value coders. This 
> inverts the expected responsibility and has the SDK determine runner coders. 
> Additionally, because runners do not understand all SDK types, additional 
> coder substitution will need to be done at request handling time to make sure 
> that the requested coder can be instantiated and will remain consistent with 
> the SDK coder. This requires only small changes to SDKs because they may opt 
> to use their default PCollection coders.
> All of these approaches have their own downsides. Explicit side input 
> coders is probably the right thing to do long-term, but the simplest change 
> for now is to modify PCollection coders to match exactly how they're 
> materialized.
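The description above leans on one property of length-prefixing: a runner that cannot decode a payload can still delimit (and skip or concatenate) it using the length alone. The following is a minimal illustrative sketch of that idea, assuming a 4-byte big-endian length header; it is not Beam's actual `LengthPrefixUnknownCoders` implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch of length-prefix encoding: the payload is preceded by its
 * byte length, so a reader that cannot decode the payload itself can still
 * find where the element ends.
 */
public class LengthPrefixSketch {
  public static byte[] lengthPrefix(byte[] payload) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeInt(payload.length); // 4-byte big-endian length header
    out.write(payload);           // opaque payload bytes
    out.flush();
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] encoded = lengthPrefix("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(encoded.length); // 4-byte prefix + 5 payload bytes = 9
  }
}
```

Because the wrapped coder's bytes stay opaque, a runner can materialize and later serve concatenated values for a key without ever instantiating the SDK-side value coder.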





[jira] [Resolved] (BEAM-4141) Data channel deadlocks when user function fails

2018-05-18 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-4141.
-
Resolution: Fixed

> Data channel deadlocks when user function fails
> ---
>
> Key: BEAM-4141
> URL: https://issues.apache.org/jira/browse/BEAM-4141
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> There is a deadlock condition in the data channel code that occurs when a 
> user function fails while processing an element. The producer for the data 
> channel is continuing to send information across a channel, but the intended 
> consumer has stopped listening. Unfortunately, this channel blocks the entire 
> data channel, blocking data for any other DoFn that might be running, causing 
> the whole worker to deadlock.
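The failure mode described above (a producer blocked on a channel whose consumer has died) can be sketched in Java with an unbuffered handoff queue. This is a hypothetical analogue of the Go SDK bug, not the actual data channel code: once the consumer stops reading, the producer's next blocking send can never complete, which in the real bug stalls the shared data channel for every DoFn on the worker.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the deadlock pattern: the consumer takes one element, then
 * "fails" and stops reading; the producer's next send on the unbuffered
 * channel can never be delivered.
 */
public class DataChannelDeadlockSketch {
  public static boolean trySecondSend() throws InterruptedException {
    SynchronousQueue<byte[]> channel = new SynchronousQueue<>();

    Thread consumer = new Thread(() -> {
      try {
        channel.take(); // consume one element, then stop reading (user fn failed)
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();
    channel.put(new byte[]{1}); // first element is handed off successfully
    consumer.join();            // consumer thread is now gone

    // A plain put() here would block forever; probe with a timeout instead.
    return channel.offer(new byte[]{2}, 100, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(trySecondSend()); // false: nobody is reading anymore
  }
}
```

The fix direction implied by the issue is to unblock or fail the producer when the consumer terminates abnormally, rather than letting the send block indefinitely.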





[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103598
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:45
Start Date: 18/May/18 23:45
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189415143
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+      ExecutableStage stage,
+      Components components,
+      ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+    ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+        ImmutableTable.builder();
+    for (SideInputReference sideInputReference : stage.getSideInputs()) {
+      // Update the coder specification for side inputs to be length prefixed so that the
+      // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+      // side inputs.
+      String pCollectionId = sideInputReference.collection().getId();
+      RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+          LengthPrefixUnknownCoders.forCoder(
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+              components,
+              false);
+      String wireCoderId =
+          addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+      String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+          String.format(
+              "fn/side_input/%s",
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+          bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+      bundleDescriptorBuilder.putCoders(
+          lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+      bundleDescriptorBuilder.putAllCoders(
+          lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+      bundleDescriptorBuilder.putPcollections(
+          pCollectionId,
+          bundleDescriptorBuilder
+              .getPcollectionsMap()
+              .get(pCollectionId)
+              .toBuilder()
+              .setCoderId(lengthPrefixedSideInputCoderId)
+              .build());
+
+      FullWindowedValueCoder<KV<?, ?>> coder =
+          (FullWindowedValueCoder) WireCoders.instantiateRunnerWireCoder(
 
 Review comment:
   I would rather have WireCoders not be exposed, and instead hidden away as an 
implementation detail of how this specific executable process bundle descriptor 
handles side input materialization and side input access. Other usages should be 
migrated to either know what they are doing (via LengthPrefixUnknownCoders) or 
be modeled as an ExecutableStage, and hence use the ExecutableStage library 
code.




Issue Time Tracking
---

Worklog Id: (was: 103598)
Time Spent: 2.5h  (was: 2h 20m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated.

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103596
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:37
Start Date: 18/May/18 23:37
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189414455
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java
 ##
 @@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.state;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.protobuf.ByteString;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.TypeCase;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.MultimapSideInputSpec;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.stream.DataStreams;
+import org.apache.beam.sdk.fn.stream.DataStreams.ElementDelimitedOutputStream;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.common.Reiterable;
+
+/**
+ * A set of utility methods which construct {@link StateRequestHandler}s.
+ *
+ * TODO: Add a variant which works on {@link ByteString}s to remove 
encoding/decoding overhead.
+ */
+public class StateRequestHandlers {
+
+  /**
+   * A handler for multimap side inputs.
+   */
+  public interface MultimapSideInputHandler<K, V, W extends BoundedWindow> {
+    /**
+     * Returns an {@link Iterable} of values representing the side input for the given key and
+     * window.
+     *
+     * TODO: Add support for side input chunking and caching if a {@link Reiterable} is returned.
+     */
+    Iterable<V> get(K key, W window);
+  }
+
+  /**
+   * A factory which constructs {@link MultimapSideInputHandler}s.
+   */
+  public interface MultimapSideInputHandlerFactory {
+
+    /**
+     * Returns a {@link MultimapSideInputHandler} for the given {@code pTransformId} and
+     * {@code sideInputId}. The supplied {@code keyCoder}, {@code valueCoder}, and
+     * {@code windowCoder} should be used to encode/decode their respective values.
+     */
+    <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+        String pTransformId,
+        String sideInputId,
+        Coder<K> keyCoder,
+        Coder<V> valueCoder,
+        Coder<W> windowCoder);
+
+    /**
+     * Throws an {@link UnsupportedOperationException} on the first access.
+     */
+    static MultimapSideInputHandlerFactory unsupported() {
+      return new MultimapSideInputHandlerFactory() {
+        @Override
+        public <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+            String pTransformId, String sideInputId, Coder<K> keyCoder, Coder<V> valueCoder,
+            Coder<W> windowCoder) {
+          throw new UnsupportedOperationException(String.format(
+              "The %s does not support handling side inputs for PTransform %s with side "
+                  + "input id %s.",
+              MultimapSideInputHandler.class.getSimpleName(),
+              pTransformId,
+              sideInputId));
+        }
+      };
+    }
+  }
+
+  /**
+   * 
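To make the handler contract quoted above concrete, here is a hypothetical in-memory implementation (simplified: plain generics and a String stand-in for the window type, with `forData` and the `"key@window"` keying scheme invented for illustration), showing how a runner-side handler answers key+window lookups for a multimap side input.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory multimap side input handler: values are stored
 * under a composite "key@window" string, mirroring the (key, window)
 * lookup contract of the interface above.
 */
public class InMemoryMultimapHandlerSketch {
  interface MultimapSideInputHandler<K, V, W> {
    Iterable<V> get(K key, W window);
  }

  public static MultimapSideInputHandler<String, Integer, String> forData(
      Map<String, Iterable<Integer>> materialized) {
    return (key, window) ->
        materialized.getOrDefault(key + "@" + window, Collections.emptyList());
  }

  public static void main(String[] args) {
    Map<String, Iterable<Integer>> data = new HashMap<>();
    data.put("k1@w1", Arrays.asList(1, 2, 3));
    MultimapSideInputHandler<String, Integer, String> handler = forData(data);
    System.out.println(handler.get("k1", "w1")); // [1, 2, 3]
    System.out.println(handler.get("k2", "w1")); // []
  }
}
```

In the real StateRequestHandlers, the supplied key, value, and window coders would be used to decode the incoming StateKey and encode the concatenated response values.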

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103597
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:37
Start Date: 18/May/18 23:37
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189413794
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+      ExecutableStage stage,
+      Components components,
+      ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+    ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+        ImmutableTable.builder();
+    for (SideInputReference sideInputReference : stage.getSideInputs()) {
+      // Update the coder specification for side inputs to be length prefixed so that the
+      // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+      // side inputs.
+      String pCollectionId = sideInputReference.collection().getId();
+      RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+          LengthPrefixUnknownCoders.forCoder(
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+              components,
+              false);
+      String wireCoderId =
+          addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+      String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+          String.format(
+              "fn/side_input/%s",
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+          bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+      bundleDescriptorBuilder.putCoders(
+          lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+      bundleDescriptorBuilder.putAllCoders(
+          lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+      bundleDescriptorBuilder.putPcollections(
+          pCollectionId,
 
 Review comment:
  Sorry, I should have made that clearer. The same PCollection may be 
_consumed_ multiple times in the same executable stage: both as a side input 
and as a main-input PCollection. Again, we maintain types here, so it should be 
"correct". It's just different from what we did before (where PCollection nodes 
that were actually processed _within_ an executable stage, as opposed to on the 
read nodes, used their original coders).




Issue Time Tracking
---

Worklog Id: (was: 103597)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements.

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103595
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:34
Start Date: 18/May/18 23:34
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189414162
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+      ExecutableStage stage,
+      Components components,
+      ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+    ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+        ImmutableTable.builder();
+    for (SideInputReference sideInputReference : stage.getSideInputs()) {
+      // Update the coder specification for side inputs to be length prefixed so that the
+      // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+      // side inputs.
+      String pCollectionId = sideInputReference.collection().getId();
+      RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+          LengthPrefixUnknownCoders.forCoder(
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+              components,
+              false);
+      String wireCoderId =
+          addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+      String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+          String.format(
+              "fn/side_input/%s",
+              components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+          bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+      bundleDescriptorBuilder.putCoders(
+          lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+      bundleDescriptorBuilder.putAllCoders(
+          lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+      bundleDescriptorBuilder.putPcollections(
+          pCollectionId,
+          bundleDescriptorBuilder
+              .getPcollectionsMap()
+              .get(pCollectionId)
+              .toBuilder()
+              .setCoderId(lengthPrefixedSideInputCoderId)
+              .build());
+
+      FullWindowedValueCoder<KV<?, ?>> coder =
+          (FullWindowedValueCoder) WireCoders.instantiateRunnerWireCoder(
+              sideInputReference.collection(), components);
+      idsToSpec.put(
+          sideInputReference.transform().getId(),
+          sideInputReference.localName(),
+          MultimapSideInputSpec.of(
+              sideInputReference.transform().getId(),
+              sideInputReference.localName(),
+              ((KvCoder) coder.getValueCoder()).getKeyCoder(),
+              ((KvCoder) coder.getValueCoder()).getValueCoder(),
+              coder.getWindowCoder()));
+    }
+    return idsToSpec.build().rowMap();
+  }
+
   @AutoValue
   abstract static class TargetEncoding {
 abstract BeamFnApi.Target getTarget();
 
 abstract Coder getCoder();
   }
 
+  /**
+   * A container type storing references to the key, value, and window coder 
used when
 
 Review comment:
   Added {@link Coder} to the appropriate instance type.
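As an aside, the `SyntheticComponents.uniqueId(...)` call in the diff above follows a common id-uniquifying pattern: probe candidate names against a predicate over already-taken ids until a free one is found. A plain-Java sketch of that pattern (a hypothetical helper, not Beam's actual implementation):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Sketch of the id-uniquifying pattern: if the base name is taken,
 * append an increasing counter until the candidate is free.
 */
public class UniqueIdSketch {
  public static String uniqueId(String base, Predicate<String> isTaken) {
    if (!isTaken.test(base)) {
      return base;
    }
    int i = 1;
    while (isTaken.test(base + "_" + i)) {
      i++;
    }
    return base + "_" + i;
  }

  public static void main(String[] args) {
    Set<String> taken = new HashSet<>();
    taken.add("fn/side_input/coder");
    // A method reference makes any Set usable as the "taken" predicate.
    System.out.println(uniqueId("fn/side_input/coder", taken::contains));
  }
}
```

Passing `bundleDescriptorBuilder.getCodersMap().keySet()::contains` as the predicate, as the diff does, guarantees the synthesized coder id cannot collide with one already registered in the descriptor.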




Issue Time Tracking
---

Worklog Id: (was: 103595)
Time Spent: 2h 10m  (was: 2h)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do.

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103594
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:33
Start Date: 18/May/18 23:33
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189414036
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -179,9 +298,13 @@ private static String addWireCoder(
     public static ExecutableProcessBundleDescriptor of(
         ProcessBundleDescriptor descriptor,
         RemoteInputDestination<WindowedValue<?>> inputDestination,
-        Map<BeamFnApi.Target, Coder<WindowedValue<?>>> outputTargetCoders) {
+        Map<BeamFnApi.Target, Coder<WindowedValue<?>>> outputTargetCoders,
+        Map<String, Map<String, MultimapSideInputSpec>> multimapSideInputSpecs) {
       return new AutoValue_ProcessBundleDescriptors_ExecutableProcessBundleDescriptor(
-          descriptor, inputDestination, Collections.unmodifiableMap(outputTargetCoders));
+          descriptor,
+          inputDestination,
+          Collections.unmodifiableMap(outputTargetCoders),
+          multimapSideInputSpecs);
 
 Review comment:
   done




Issue Time Tracking
---

Worklog Id: (was: 103594)
Time Spent: 2h  (was: 1h 50m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by the runner match the coders used by SDKs. 
> Furthermore, it allows the _runners_ to specify coders. This involves changes 
> to the proto models and all SDKs.
>  * Annotate side input state requests with both key and value coders. This 
> inverts the expected responsibility and has the SDK determine runner coders. 
> Additionally, because runners do not understand all SDK types, additional 
> coder substitution will need to be done at request handling time to make sure 
> that the requested coder can be instantiated and will remain consistent with 
> the SDK coder. This requires only small changes to SDKs because they may opt 
> to use their default PCollection coders.
> All of these approaches have their own downsides. Explicit side input 
> coders is probably the right thing to do long-term, but the simplest change 
> for now is to modify PCollection coders to match exactly how they're 
> materialized.
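
The length-prefixing the description refers to can be illustrated with a plain-Java sketch (illustrative only — Beam's actual `LengthPrefixCoder` wraps a `Coder` and uses a varint length; the class and method names below are hypothetical): the writer frames each encoded element with its byte length, so a reader that cannot instantiate the original element coder can still recover or skip the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixSketch {

  // Encode one element as [4-byte length][payload]. Beam's real coder uses a
  // varint length; a fixed-width int keeps the sketch short.
  static void encode(byte[] payload, DataOutputStream out) throws IOException {
    out.writeInt(payload.length);
    out.write(payload);
  }

  // A reader that knows only the framing can recover (or skip) each element
  // without being able to instantiate the element's original coder.
  static byte[] decode(DataInputStream in) throws IOException {
    byte[] buf = new byte[in.readInt()];
    in.readFully(buf);
    return buf;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    encode("k1".getBytes(StandardCharsets.UTF_8), out);
    encode("value-1".getBytes(StandardCharsets.UTF_8), out);

    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(new String(decode(in), StandardCharsets.UTF_8)); // k1
    System.out.println(new String(decode(in), StandardCharsets.UTF_8)); // value-1
  }
}
```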



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4167) Implement UNNEST

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4167?focusedWorklogId=103593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103593
 ]

ASF GitHub Bot logged work on BEAM-4167:


Author: ASF GitHub Bot
Created on: 18/May/18 23:31
Start Date: 18/May/18 23:31
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5428: [BEAM-4167] 
Implement UNNEST
URL: https://github.com/apache/beam/pull/5428#issuecomment-390357844
 
 
   run java precommit




Issue Time Tracking
---

Worklog Id: (was: 103593)
Time Spent: 0.5h  (was: 20m)

> Implement UNNEST
> 
>
> Key: BEAM-4167
> URL: https://issues.apache.org/jira/browse/BEAM-4167
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need to be able to convert collections to relations in the query to 
> perform any meaningful operations on them. 
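
For context, UNNEST converts a collection-valued column into a relation by emitting one row per element, e.g. `SELECT id, v FROM t, UNNEST(t.values) AS v`. A minimal plain-Java sketch of that relational semantics (illustrative only; not the Beam SQL implementation — the `Row` type and `unnest` helper are hypothetical):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class UnnestSketch {

  // A parent row with a scalar column and a collection-valued column.
  static class Row {
    final String id;
    final List<Integer> values;
    Row(String id, List<Integer> values) { this.id = id; this.values = values; }
  }

  // UNNEST semantics: emit one output row per element of the collection,
  // paired with the columns of its parent row.
  static List<Map.Entry<String, Integer>> unnest(List<Row> rows) {
    List<Map.Entry<String, Integer>> out = new ArrayList<>();
    for (Row row : rows) {
      for (Integer v : row.values) {
        out.add(new SimpleEntry<>(row.id, v));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> table = Arrays.asList(
        new Row("a", Arrays.asList(1, 2)),
        new Row("b", Arrays.asList(3)));
    System.out.println(unnest(table)); // [a=1, a=2, b=3]
  }
}
```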





[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103592
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:23
Start Date: 18/May/18 23:23
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189413133
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java
 ##
 @@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.state;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.protobuf.ByteString;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.TypeCase;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.MultimapSideInputSpec;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.stream.DataStreams;
+import org.apache.beam.sdk.fn.stream.DataStreams.ElementDelimitedOutputStream;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.common.Reiterable;
+
+/**
+ * A set of utility methods which construct {@link StateRequestHandler}s.
+ *
+ * TODO: Add a variant which works on {@link ByteString}s to remove 
encoding/decoding overhead.
+ */
+public class StateRequestHandlers {
+
+  /**
+   * A handler for multimap side inputs.
+   */
+  public interface MultimapSideInputHandler<K, V, W extends BoundedWindow> {
+/**
+ * Returns an {@link Iterable} of values representing the side input for 
the given key and
+ * window.
+ *
+ * TODO: Add support for side input chunking and caching if a {@link 
Reiterable} is returned.
+ */
+Iterable<V> get(K key, W window);
+  }
+
+  /**
+   * A factory which constructs {@link MultimapSideInputHandler}s.
+   */
+  public interface MultimapSideInputHandlerFactory {
+
+/**
+ * Returns a {@link MultimapSideInputHandler} for the given {@code 
pTransformId} and
+ * {@code sideInputId}. The supplied {@code keyCoder}, {@code valueCoder}, 
and
+ * {@code windowCoder} should be used to encode/decode their respective 
values.
+ */
+<K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+String pTransformId,
+String sideInputId,
+Coder<K> keyCoder,
+Coder<V> valueCoder,
+Coder<W> windowCoder);
+
+/**
+ * Throws an {@link UnsupportedOperationException} on the first access.
+ */
+static MultimapSideInputHandlerFactory unsupported() {
+  return new MultimapSideInputHandlerFactory() {
+@Override
+public <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+String pTransformId, String sideInputId, Coder<K> keyCoder,
+Coder<V> valueCoder,
+Coder<W> windowCoder) {
+  throw new UnsupportedOperationException(String.format(
"The %s does not support handling side inputs for PTransform %s 
with side "
+  + "input id %s.",
+  MultimapSideInputHandler.class.getSimpleName(),
+  pTransformId,
+  sideInputId));
+}
+  };
+}
+  }
+
+  /**
+   * 

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103591
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:21
Start Date: 18/May/18 23:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189412942
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java
 ##
 @@ -39,15 +40,31 @@ public static GrpcStateService create() {
 return new GrpcStateService();
   }
 
+  private final ConcurrentLinkedQueue clients;
 
 Review comment:
   Thanks, this hint actually fixed the TODO so that the state server now shuts 
its clients down.
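
The shutdown pattern being referenced — the server keeps its connected clients in a `ConcurrentLinkedQueue` and drains it on close — can be sketched as follows (hypothetical `AutoCloseable` clients standing in for the actual gRPC stream observers; this is not the `GrpcStateService` code itself):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ClientTrackingService implements AutoCloseable {

  // Clients register here as they connect; ConcurrentLinkedQueue is safe for
  // concurrent adds from handler threads while close() drains it.
  private final Queue<AutoCloseable> clients = new ConcurrentLinkedQueue<>();

  public void register(AutoCloseable client) {
    clients.add(client);
  }

  @Override
  public void close() throws Exception {
    // poll() hands each client to exactly one closer, even under contention,
    // so every registered client is shut down exactly once.
    AutoCloseable client;
    while ((client = clients.poll()) != null) {
      client.close();
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger closed = new AtomicInteger();
    ClientTrackingService service = new ClientTrackingService();
    service.register(closed::incrementAndGet);
    service.register(closed::incrementAndGet);
    service.close();
    System.out.println(closed.get()); // 2
  }
}
```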




Issue Time Tracking
---

Worklog Id: (was: 103591)
Time Spent: 1h 40m  (was: 1.5h)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by the runner match the coders used by SDKs. 
> Furthermore, it allows the _runners_ to specify coders. This involves changes 
> to the proto models and all SDKs.
>  * Annotate side input state requests with both key and value coders. This 
> inverts the expected responsibility and has the SDK determine runner coders. 
> Additionally, because runners do not understand all SDK types, additional 
> coder substitution will need to be done at request handling time to make sure 
> that the requested coder can be instantiated and will remain consistent with 
> the SDK coder. This requires only small changes to SDKs because they may opt 
> to use their default PCollection coders.
> All of these approaches have their own downsides. Explicit side input 
> coders is probably the right thing to do long-term, but the simplest change 
> for now is to modify PCollection coders to match exactly how they're 
> materialized.





[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103590
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:14
Start Date: 18/May/18 23:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189412271
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java
 ##
 @@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.state;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.protobuf.ByteString;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.TypeCase;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.MultimapSideInputSpec;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.stream.DataStreams;
+import org.apache.beam.sdk.fn.stream.DataStreams.ElementDelimitedOutputStream;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.common.Reiterable;
+
+/**
+ * A set of utility methods which construct {@link StateRequestHandler}s.
+ *
+ * TODO: Add a variant which works on {@link ByteString}s to remove 
encoding/decoding overhead.
+ */
+public class StateRequestHandlers {
+
+  /**
+   * A handler for multimap side inputs.
+   */
+  public interface MultimapSideInputHandler<K, V, W extends BoundedWindow> {
+/**
+ * Returns an {@link Iterable} of values representing the side input for 
the given key and
+ * window.
+ *
+ * TODO: Add support for side input chunking and caching if a {@link 
Reiterable} is returned.
+ */
+Iterable<V> get(K key, W window);
+  }
+
+  /**
+   * A factory which constructs {@link MultimapSideInputHandler}s.
+   */
+  public interface MultimapSideInputHandlerFactory {
+
+/**
+ * Returns a {@link MultimapSideInputHandler} for the given {@code 
pTransformId} and
+ * {@code sideInputId}. The supplied {@code keyCoder}, {@code valueCoder}, 
and
+ * {@code windowCoder} should be used to encode/decode their respective 
values.
+ */
+<K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+String pTransformId,
+String sideInputId,
+Coder<K> keyCoder,
+Coder<V> valueCoder,
+Coder<W> windowCoder);
+
+/**
+ * Throws an {@link UnsupportedOperationException} on the first access.
+ */
+static MultimapSideInputHandlerFactory unsupported() {
+  return new MultimapSideInputHandlerFactory() {
+@Override
+public <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+String pTransformId, String sideInputId, Coder<K> keyCoder,
+Coder<V> valueCoder,
+Coder<W> windowCoder) {
+  throw new UnsupportedOperationException(String.format(
"The %s does not support handling side inputs for PTransform %s 
with side "
+  + "input id %s.",
+  MultimapSideInputHandler.class.getSimpleName(),
+  pTransformId,
+  sideInputId));
+}
+  };
+}
+  }
+
+  /**
+   * 

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103589=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103589
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 23:14
Start Date: 18/May/18 23:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189412218
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map> 
forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws 
IOException {
+ImmutableTable.Builder idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed 
so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and 
values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, 
bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, 
lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
 
 Review comment:
  The side input PCollection can never be part of the same executable stage as 
the one that is materializing it.
   
   As a side note, the way that the coder is being mutated preserves SDK type 
safety so it would only impact PCollection size estimation.




Issue Time Tracking
---

Worklog Id: (was: 103589)
Time Spent: 1h 20m  (was: 1h 10m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This 

[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103588=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103588
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 23:07
Start Date: 18/May/18 23:07
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #5348: [BEAM-3326] Add a 
Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#issuecomment-390354745
 
 
   Done to all




Issue Time Tracking
---

Worklog Id: (was: 103588)
Time Spent: 10h 50m  (was: 10h 40m)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-3256) Add archetype testing/generation to existing GradleBuild PreCommit

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3256?focusedWorklogId=103587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103587
 ]

ASF GitHub Bot logged work on BEAM-3256:


Author: ASF GitHub Bot
Created on: 18/May/18 23:06
Start Date: 18/May/18 23:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5014: 
BEAM-3256 Add archetype testing/generation to existing GradleBuild Pr…
URL: https://github.com/apache/beam/pull/5014#discussion_r189411417
 
 

 ##
 File path: sdks/java/maven-archetypes/examples/build.gradle
 ##
 @@ -55,6 +55,17 @@ task generateSources(type: Exec) {
   commandLine './generate-sources.sh'
 }
 
+// Add archetype testing/generation to existing GradleBuild PreCommit
+// https://issues.apache.org/jira/browse/BEAM-3256
+task archetypesTest(type: Exec) {
+  if (project.hasProperty("maven_home")) {
+commandLine "${maven_home}/bin/mvn", 'clean', 'install'
+environment "MAVEN_HOME", "${maven_home}"
+  } else {
+commandLine 'mvn', 'clean', 'install'
 
 Review comment:
   Not part of this change. This change uses whichever mvn repo the 
developer/integration framework has, which means it will attempt to use the 
nightly snapshot.




Issue Time Tracking
---

Worklog Id: (was: 103587)
Time Spent: 3h  (was: 2h 50m)

> Add archetype testing/generation to existing GradleBuild PreCommit
> --
>
> Key: BEAM-3256
> URL: https://issues.apache.org/jira/browse/BEAM-3256
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Luke Cwik
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This build currently is not exercising the archetype build and tests 
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PreCommit_Java_GradleBuild.groovy
> found here:
> https://github.com/apache/beam/tree/master/sdks/java/maven-archetypes





[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103585
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 22:56
Start Date: 18/May/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5348: 
[BEAM-3326] Add a Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#discussion_r189410028
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/IdGenerator.java
 ##
 @@ -0,0 +1,30 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
 
 Review comment:
   Messed up license header.




Issue Time Tracking
---

Worklog Id: (was: 103585)
Time Spent: 10h 40m  (was: 10.5h)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103580=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103580
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 22:56
Start Date: 18/May/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5348: 
[BEAM-3326] Add a Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#discussion_r189410092
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/package-info.java
 ##
 @@ -0,0 +1,23 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
 
 Review comment:
   license header




Issue Time Tracking
---

Worklog Id: (was: 103580)
Time Spent: 9h 50m  (was: 9h 40m)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103582=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103582
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 22:56
Start Date: 18/May/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5348: 
[BEAM-3326] Add a Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#discussion_r189410106
 
 

 ##
 File path: 
sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/IdGeneratorsTest.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
+ *  * or more contributor license agreements.  See the NOTICE file
+ *  * distributed with this work for additional information
+ *  * regarding copyright ownership.  The ASF licenses this file
+ *  * to you under the Apache License, Version 2.0 (the
+ *  * "License"); you may not use this file except in compliance
+ *  * with the License.  You may obtain a copy of the License at
+ *  *
 
 Review comment:
   license header.




Issue Time Tracking
---

Worklog Id: (was: 103582)
Time Spent: 10h 10m  (was: 10h)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103584
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 22:56
Start Date: 18/May/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5348: 
[BEAM-3326] Add a Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#discussion_r189409699
 
 

 ##
 File path: 
runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/DirectJobBundleFactoryTest.java
 ##
 @@ -0,0 +1,181 @@
+/*
+ *
+ *  * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   License header incorrectly formatted.




Issue Time Tracking
---

Worklog Id: (was: 103584)
Time Spent: 10.5h  (was: 10h 20m)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103583
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 22:56
Start Date: 18/May/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5348: 
[BEAM-3326] Add a Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#discussion_r189409672
 
 

 ##
 File path: 
runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/BundleFactoryOutputRecieverFactoryTest.java
 ##
 @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.direct.portable;
+
+import static com.google.common.collect.Iterables.getOnlyElement;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.equalTo;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.Iterables;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.MessageWithComponents;
+import org.apache.beam.runners.core.construction.CoderTranslation;
+import org.apache.beam.runners.core.construction.RehydratedComponents;
+import org.apache.beam.runners.core.construction.SdkComponents;
+import org.apache.beam.runners.core.construction.graph.PipelineNode;
+import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import org.apache.beam.runners.fnexecution.control.OutputReceiverFactory;
+import org.apache.beam.runners.fnexecution.wire.WireCoders;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
+import org.apache.beam.sdk.transforms.windowing.PaneInfo;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.util.CoderUtils;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.values.PCollection;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link BundleFactoryOutputRecieverFactory}. */
+@RunWith(JUnit4.class)
+public class BundleFactoryOutputRecieverFactoryTest {
+  private final BundleFactory bundleFactory = ImmutableListBundleFactory.create();
+  private PCollectionNode fooPC;
+  private PCollectionNode barPC;
+  private RunnerApi.Components components;
+
+  private OutputReceiverFactory factory;
+  private Collection<UncommittedBundle<?>> outputBundles;
+
+  @Before
+  public void setup() throws IOException {
+    Pipeline p = Pipeline.create();
+    PCollection<String> foo =
+        p.apply("createFoo", Create.of("1", "2", "3"))
+            .apply("windowFoo", Window.into(FixedWindows.of(Duration.standardMinutes(5L))));
+    PCollection<Integer> bar = p.apply("bar", Create.of(1, 2, 3));
+
+    SdkComponents sdkComponents = SdkComponents.create();
+    String fooId = sdkComponents.registerPCollection(foo);
+    String barId = sdkComponents.registerPCollection(bar);
+    components = sdkComponents.toComponents();
+
+    fooPC = PipelineNode.pCollection(fooId, components.getPcollectionsOrThrow(fooId));
+    barPC = PipelineNode.pCollection(barId, components.getPcollectionsOrThrow(barId));
+
+    outputBundles = new ArrayList<>();
+    factory =
+        BundleFactoryOutputRecieverFactory.create(bundleFactory, components, outputBundles::add);
+  }
+
+  @Test
+  public void addsBundlesToResult() {
+    factory.create(fooPC.getId());
+    factory.create(barPC.getId());
+
+    assertThat(Iterables.size(outputBundles), equalTo(2));
+
+    Collection<PCollectionNode> pcollections = new ArrayList<>();
+    for (UncommittedBundle<?> bundle : outputBundles) {
+

[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=103581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103581
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 18/May/18 22:56
Start Date: 18/May/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5348: 
[BEAM-3326] Add a Direct Job Bundle Factory
URL: https://github.com/apache/beam/pull/5348#discussion_r189408444
 
 

 ##
 File path: runners/direct-java/pom.xml
 ##
 @@ -204,14 +204,19 @@
       <artifactId>beam-runners-core-java</artifactId>
     </dependency>
 
+    <dependency>
+      <groupId>org.apache.beam</groupId>
+      <artifactId>beam-sdks-java-fn-execution</artifactId>
+    </dependency>
+
     <dependency>
       <groupId>org.apache.beam</groupId>
       <artifactId>beam-runners-java-fn-execution</artifactId>
     </dependency>
 
     <dependency>
       <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
+      <artifactId>guavatn</artifactId>
 
 Review comment:
   `tn` will break the Maven build




Issue Time Tracking
---

Worklog Id: (was: 103581)
Time Spent: 10h  (was: 9h 50m)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-4167) Implement UNNEST

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4167?focusedWorklogId=103579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103579
 ]

ASF GitHub Bot logged work on BEAM-4167:


Author: ASF GitHub Bot
Created on: 18/May/18 22:53
Start Date: 18/May/18 22:53
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5428: [BEAM-4167] 
Implement UNNEST
URL: https://github.com/apache/beam/pull/5428#issuecomment-390352654
 
 
   @apilloud @akedin finally




Issue Time Tracking
---

Worklog Id: (was: 103579)
Time Spent: 20m  (was: 10m)

> Implement UNNEST
> 
>
> Key: BEAM-4167
> URL: https://issues.apache.org/jira/browse/BEAM-4167
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We need to be able to convert collections to relations in the query to 
> perform any meaningful operations on them. 





[jira] [Work logged] (BEAM-4167) Implement UNNEST

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4167?focusedWorklogId=103578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103578
 ]

ASF GitHub Bot logged work on BEAM-4167:


Author: ASF GitHub Bot
Created on: 18/May/18 22:43
Start Date: 18/May/18 22:43
Worklog Time Spent: 10m 
  Work Description: kennknowles opened a new pull request #5428: 
[BEAM-4167] WIP Implement UNNEST
URL: https://github.com/apache/beam/pull/5428
 
 
   DO NOT REVIEW - opening for self-review
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   




Issue Time Tracking
---

Worklog Id: (was: 103578)
Time Spent: 10m
Remaining Estimate: 0h

> Implement UNNEST
> 
>
> Key: BEAM-4167
> URL: https://issues.apache.org/jira/browse/BEAM-4167
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to be able to convert collections to relations in the query to 
> perform any meaningful operations on them. 





[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103577
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 22:30
Start Date: 18/May/18 22:30
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5392: [BEAM-4267] 
JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#issuecomment-390349321
 
 
   Looks like the `dependencyUpdates` task is non-existent, so that can 
probably be ignored.




Issue Time Tracking
---

Worklog Id: (was: 103577)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement a reusable library that can run an ExecutableStage with a given 
> Environment
> -
>
> Key: BEAM-4267
> URL: https://issues.apache.org/jira/browse/BEAM-4267
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Axel Magnuson
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Build off of the interfaces introduced in 
> [BEAM-3327|https://github.com/apache/beam/pull/5152] to provide a reusable 
> execution library to runners.





[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103576
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 22:24
Start Date: 18/May/18 22:24
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5392: [BEAM-4267] 
JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#issuecomment-390348297
 
 
   Review comments addressed and rebased to fix merge conflict. PTAL




Issue Time Tracking
---

Worklog Id: (was: 103576)
Time Spent: 1h 10m  (was: 1h)

> Implement a reusable library that can run an ExecutableStage with a given 
> Environment
> -
>
> Key: BEAM-4267
> URL: https://issues.apache.org/jira/browse/BEAM-4267
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Axel Magnuson
>Assignee: Ben Sidhom
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Build off of the interfaces introduced in 
> [BEAM-3327|https://github.com/apache/beam/pull/5152] to provide a reusable 
> execution library to runners.





Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #477

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[Pablo] Increasing the concurrent test execution count (#5408)

--
[...truncated 19.77 MB...]
INFO: 2018-05-18T21:56:03.953Z: Checking required Cloud APIs are enabled.
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:04.307Z: Checking permissions granted to controller 
Service Account.
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:07.626Z: Worker configuration: n1-standard-1 in 
us-central1-b.
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:07.999Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.289Z: Expanding GroupByKey operations into 
optimizable parts.
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.338Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.596Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.628Z: Elided trivial flatten 
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.669Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.711Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.755Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.806Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.854Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.904Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.946Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
May 18, 2018 9:56:11 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T21:56:08.967Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 

[jira] [Resolved] (BEAM-4172) Make public FileSystem registry

2018-05-18 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-4172.
-
   Resolution: Fixed
Fix Version/s: 2.5.0

> Make public FileSystem registry
> ---
>
> Key: BEAM-4172
> URL: https://issues.apache.org/jira/browse/BEAM-4172
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Cody Schroeder
>Assignee: Cody Schroeder
>Priority: Major
>  Labels: io
> Fix For: 2.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/5743a37/sdks/go/pkg/beam/io/textio/filesystem.go]
> The current _beam/io/textio_ package includes a useful _FileSystem_ interface 
> and corresponding _RegisterFileSystem_ function.  The _textio_ package uses 
> this internally to expose a _Read(beam.Scope, string)_ function that will 
> work for any file path corresponding to a registered FileSystem.
> It would be extremely useful to expose the _FileSystem_ interface outside of 
> just the _textio_ package and add global analogs for each of the _FileSystem_ 
> interface functions using the registry.  This would allow for easier 
> implementation of other file reading sources.





[jira] [Resolved] (BEAM-4157) Add Go QuickStart

2018-05-18 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-4157.
-
   Resolution: Fixed
Fix Version/s: 2.5.0

> Add Go QuickStart
> -
>
> Key: BEAM-4157
> URL: https://issues.apache.org/jira/browse/BEAM-4157
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=103568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103568
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 18/May/18 21:36
Start Date: 18/May/18 21:36
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-390339021
 
 
   Run Python Dependency Check




Issue Time Tracking
---

Worklog Id: (was: 103568)
Time Spent: 0.5h  (was: 20m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare versions of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.





[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=103567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103567
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 18/May/18 21:36
Start Date: 18/May/18 21:36
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-390338998
 
 
   Run Java Dependency Check




Issue Time Tracking
---

Worklog Id: (was: 103567)
Time Spent: 20m  (was: 10m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare versions of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.





[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=103566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103566
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 18/May/18 21:28
Start Date: 18/May/18 21:28
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-390337180
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 103566)
Time Spent: 10m
Remaining Estimate: 0h

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare versions of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.





[beam] branch master updated: Increasing the concurrent test execution count (#5408)

2018-05-18 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 7038b56  Increasing the concurrent test execution count (#5408)
7038b56 is described below

commit 7038b5610eba02a6c9d0016ba98de06a9ee247b8
Author: Ankur 
AuthorDate: Fri May 18 14:15:20 2018 -0700

Increasing the concurrent test execution count (#5408)

* Tuning number of parallel executors and timeout for Python PostCommits.
---
 sdks/python/run_postcommit.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sdks/python/run_postcommit.sh b/sdks/python/run_postcommit.sh
index d26a1c9..afcdb3b 100755
--- a/sdks/python/run_postcommit.sh
+++ b/sdks/python/run_postcommit.sh
@@ -56,8 +56,8 @@ echo ">>> RUNNING TEST DATAFLOW RUNNER it tests"
 python setup.py nosetests \
   --attr $1 \
   --nocapture \
-  --processes=4 \
-  --process-timeout=1800 \
+  --processes=8 \
+  --process-timeout=2000 \
   --test-pipeline-options=" \
 --runner=TestDataflowRunner \
 --project=$PROJECT \

-- 
To stop receiving notification emails like this one, please contact
pabl...@apache.org.


[jira] [Work logged] (BEAM-1755) Python-SDK: Move build specific scripts to a dedicated folder

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1755?focusedWorklogId=103554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103554
 ]

ASF GitHub Bot logged work on BEAM-1755:


Author: ASF GitHub Bot
Created on: 18/May/18 21:01
Start Date: 18/May/18 21:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #5400: [BEAM-1755] Add a 
directory with build-specific scripts to clear up the Python SDK dir
URL: https://github.com/apache/beam/pull/5400#issuecomment-390331106
 
 
   And, thanks for the ping, I missed the previous update.




Issue Time Tracking
---

Worklog Id: (was: 103554)
Time Spent: 1.5h  (was: 1h 20m)

> Python-SDK: Move build specific scripts to a dedicated folder
> -
>
> Key: BEAM-1755
> URL: https://issues.apache.org/jira/browse/BEAM-1755
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Tibor Kiss
>Assignee: Pablo Estrada
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Numerous build-related files (run_*.sh, generate_pydoc.sh and, most 
> recently, findSupportedPython.groovy) are now located in the Python SDK's root.
> We should create a dedicated {{build_utils}} directory and relocate the 
> scripts.





[jira] [Work logged] (BEAM-1755) Python-SDK: Move build specific scripts to a dedicated folder

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1755?focusedWorklogId=103552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103552
 ]

ASF GitHub Bot logged work on BEAM-1755:


Author: ASF GitHub Bot
Created on: 18/May/18 20:57
Start Date: 18/May/18 20:57
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #5400: [BEAM-1755] Add a 
directory with build-specific scripts to clear up the Python SDK dir
URL: https://github.com/apache/beam/pull/5400#issuecomment-390330284
 
 
   Giving folders meaningful names could be more mnemonic as to the purpose 
of the content in the folder, but I don't have a strong opinion on this; in 
fact I have a scripts folder on my dev box that serves a similar purpose. Your 
call.




Issue Time Tracking
---

Worklog Id: (was: 103552)
Time Spent: 1h 20m  (was: 1h 10m)

> Python-SDK: Move build specific scripts to a dedicated folder
> -
>
> Key: BEAM-1755
> URL: https://issues.apache.org/jira/browse/BEAM-1755
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Tibor Kiss
>Assignee: Pablo Estrada
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Numerous build-related files (run_*.sh, generate_pydoc.sh and, most 
> recently, findSupportedPython.groovy) are now located in the Python SDK's root.
> We should create a dedicated {{build_utils}} directory and relocate the 
> scripts.





[jira] [Work logged] (BEAM-1755) Python-SDK: Move build specific scripts to a dedicated folder

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1755?focusedWorklogId=103549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103549
 ]

ASF GitHub Bot logged work on BEAM-1755:


Author: ASF GitHub Bot
Created on: 18/May/18 20:40
Start Date: 18/May/18 20:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5400: [BEAM-1755] Add a 
directory with build-specific scripts to clear up the Python SDK dir
URL: https://github.com/apache/beam/pull/5400#issuecomment-390326146
 
 
   @tvalentyn




Issue Time Tracking
---

Worklog Id: (was: 103549)
Time Spent: 1h 10m  (was: 1h)

> Python-SDK: Move build specific scripts to a dedicated folder
> -
>
> Key: BEAM-1755
> URL: https://issues.apache.org/jira/browse/BEAM-1755
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Tibor Kiss
>Assignee: Pablo Estrada
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Numerous build-related files (run_*.sh, generate_pydoc.sh and, most 
> recently, findSupportedPython.groovy) are currently located in the 
> Python SDK's root.
> We should create a dedicated {{build_utils}} directory and relocate the 
> scripts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #476

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[daniel.o.programmer] [BEAM-2937] Add new Combine URNs.

--
[...truncated 20.32 MB...]
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:26.711Z: Autoscaling is enabled for job 
2018-05-18_13_21_26-6105214082174582967. The number of workers will be between 
1 and 1000.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:26.743Z: Autoscaling was automatically enabled for 
job 2018-05-18_13_21_26-6105214082174582967.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:29.341Z: Checking required Cloud APIs are enabled.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:29.506Z: Checking permissions granted to controller 
Service Account.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:33.264Z: Worker configuration: n1-standard-1 in 
us-central1-b.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:33.729Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:33.883Z: Expanding GroupByKey operations into 
optimizable parts.
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:33.914Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.081Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.106Z: Elided trivial flatten 
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.135Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.155Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.183Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.210Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.235Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:21:34.266Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 18, 2018 8:21:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #475

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Update worker_id Documentation

--
[...truncated 20.06 MB...]
INFO: Running Dataflow job 2018-05-18_13_06_25-17215356982797383000 with 0 
expected assertions.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:25.998Z: Autoscaling is enabled for job 
2018-05-18_13_06_25-17215356982797383000. The number of workers will be between 
1 and 1000.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:26.036Z: Autoscaling was automatically enabled for 
job 2018-05-18_13_06_25-17215356982797383000.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:29.110Z: Checking required Cloud APIs are enabled.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:29.329Z: Checking permissions granted to controller 
Service Account.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:32.782Z: Worker configuration: n1-standard-1 in 
us-central1-b.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:33.249Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:33.553Z: Expanding GroupByKey operations into 
optimizable parts.
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:33.599Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:33.936Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:33.978Z: Elided trivial flatten 
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:34.034Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:34.079Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:34.133Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:34.180Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:34.229Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 18, 2018 8:06:39 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T20:06:34.283Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 18, 2018 8:06:39 PM 

[jira] [Work logged] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=103545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103545
 ]

ASF GitHub Bot logged work on BEAM-3042:


Author: ASF GitHub Bot
Created on: 18/May/18 19:59
Start Date: 18/May/18 19:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5309: [BEAM-3042] Adding 
time tracking of batch side inputs [low priority]
URL: https://github.com/apache/beam/pull/5309#issuecomment-390316406
 
 
   Flag pushed.




Issue Time Tracking
---

Worklog Id: (was: 103545)
Time Spent: 6.5h  (was: 6h 20m)

> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103541
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 19:56
Start Date: 18/May/18 19:56
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5392: 
[BEAM-4267] JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#discussion_r189370302
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.net.HostAndPort;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactSource;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.data.RemoteInputDestination;
+import org.apache.beam.runners.fnexecution.environment.DockerCommand;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for 
environment
+ * management. Note that returned {@link StageBundleFactory stage bundle 
factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public class DockerJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = 
LoggerFactory.getLogger(DockerJobBundleFactory.class);
+
+  // TODO: This host name seems to change with every other Docker release. Do 
we attempt to keep up
+  // or attempt to document the supported Docker version(s)?
+  private static final String DOCKER_FOR_MAC_HOST = "host.docker.internal";
+
+  private final IdGenerator stageIdGenerator;
+  private final GrpcFnServer controlServer;
+  private final GrpcFnServer loggingServer;
+  private final GrpcFnServer retrievalServer;
+  private 

[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103543
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 19:56
Start Date: 18/May/18 19:56
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5392: 
[BEAM-4267] JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#discussion_r189375290
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.net.HostAndPort;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactSource;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.data.RemoteInputDestination;
+import org.apache.beam.runners.fnexecution.environment.DockerCommand;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for 
environment
+ * management. Note that returned {@link StageBundleFactory stage bundle 
factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public class DockerJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = 
LoggerFactory.getLogger(DockerJobBundleFactory.class);
+
+  // TODO: This host name seems to change with every other Docker release. Do 
we attempt to keep up
+  // or attempt to document the supported Docker version(s)?
+  private static final String DOCKER_FOR_MAC_HOST = "host.docker.internal";
+
+  private final IdGenerator stageIdGenerator;
+  private final GrpcFnServer controlServer;
+  private final GrpcFnServer loggingServer;
+  private final GrpcFnServer retrievalServer;
+  private 

[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103544
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 19:56
Start Date: 18/May/18 19:56
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5392: 
[BEAM-4267] JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#discussion_r189372624
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.net.HostAndPort;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactSource;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.data.RemoteInputDestination;
+import org.apache.beam.runners.fnexecution.environment.DockerCommand;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for 
environment
+ * management. Note that returned {@link StageBundleFactory stage bundle 
factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public class DockerJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = 
LoggerFactory.getLogger(DockerJobBundleFactory.class);
+
+  // TODO: This host name seems to change with every other Docker release. Do 
we attempt to keep up
+  // or attempt to document the supported Docker version(s)?
+  private static final String DOCKER_FOR_MAC_HOST = "host.docker.internal";
+
+  private final IdGenerator stageIdGenerator;
+  private final GrpcFnServer controlServer;
+  private final GrpcFnServer loggingServer;
+  private final GrpcFnServer retrievalServer;
+  private 

[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103542
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 19:56
Start Date: 18/May/18 19:56
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5392: 
[BEAM-4267] JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#discussion_r189378039
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.net.HostAndPort;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactSource;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.data.RemoteInputDestination;
+import org.apache.beam.runners.fnexecution.environment.DockerCommand;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public class DockerJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = LoggerFactory.getLogger(DockerJobBundleFactory.class);
+
+  // TODO: This host name seems to change with every other Docker release. Do we attempt to keep up
+  // or attempt to document the supported Docker version(s)?
+  private static final String DOCKER_FOR_MAC_HOST = "host.docker.internal";
+
+  private final IdGenerator stageIdGenerator;
+  private final GrpcFnServer controlServer;
+  private final GrpcFnServer loggingServer;
+  private final GrpcFnServer retrievalServer;
+  private 

[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103540
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 19:56
Start Date: 18/May/18 19:56
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5392: 
[BEAM-4267] JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#discussion_r189374884
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.net.HostAndPort;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactSource;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.data.RemoteInputDestination;
+import org.apache.beam.runners.fnexecution.environment.DockerCommand;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public class DockerJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = LoggerFactory.getLogger(DockerJobBundleFactory.class);
+
+  // TODO: This host name seems to change with every other Docker release. Do we attempt to keep up
+  // or attempt to document the supported Docker version(s)?
+  private static final String DOCKER_FOR_MAC_HOST = "host.docker.internal";
+
+  private final IdGenerator stageIdGenerator;
+  private final GrpcFnServer controlServer;
+  private final GrpcFnServer loggingServer;
+  private final GrpcFnServer retrievalServer;
+  private 

[jira] [Work logged] (BEAM-4267) Implement a reusable library that can run an ExecutableStage with a given Environment

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4267?focusedWorklogId=103539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103539
 ]

ASF GitHub Bot logged work on BEAM-4267:


Author: ASF GitHub Bot
Created on: 18/May/18 19:56
Start Date: 18/May/18 19:56
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5392: 
[BEAM-4267] JobBundleFactory that uses Docker-backed environments
URL: https://github.com/apache/beam/pull/5392#discussion_r189370218
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import com.google.common.net.HostAndPort;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.fnexecution.v1.ProvisionApi.ProvisionInfo;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactSource;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.data.RemoteInputDestination;
+import org.apache.beam.runners.fnexecution.environment.DockerCommand;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public class DockerJobBundleFactory implements JobBundleFactory {
+  private static final Logger LOG = LoggerFactory.getLogger(DockerJobBundleFactory.class);
+
+  // TODO: This host name seems to change with every other Docker release. Do we attempt to keep up
+  // or attempt to document the supported Docker version(s)?
+  private static final String DOCKER_FOR_MAC_HOST = "host.docker.internal";
+
+  private final IdGenerator stageIdGenerator;
+  private final GrpcFnServer controlServer;
+  private final GrpcFnServer loggingServer;
+  private final GrpcFnServer retrievalServer;
+  private 

Jenkins build is back to normal : beam_PostCommit_Py_VR_Dataflow #46

2018-05-18 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #499

2018-05-18 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #479

2018-05-18 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-2937) Fn API combiner support w/ lifting to PGBK

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2937?focusedWorklogId=103536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103536
 ]

ASF GitHub Bot logged work on BEAM-2937:


Author: ASF GitHub Bot
Created on: 18/May/18 19:40
Start Date: 18/May/18 19:40
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #5128: [BEAM-2937] Update 
Portable Combine URNs to new URNs.
URL: https://github.com/apache/beam/pull/5128
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto 
b/model/pipeline/src/main/proto/beam_runner_api.proto
index 11646239b93..44e3f427887 100644
--- a/model/pipeline/src/main/proto/beam_runner_api.proto
+++ b/model/pipeline/src/main/proto/beam_runner_api.proto
@@ -245,6 +245,21 @@ message StandardPTransforms {
 COMBINE_PGBKCV = 0 [(beam_urn) = "beam:transform:combine_pgbkcv:v1"];
COMBINE_MERGE_ACCUMULATORS = 1 [(beam_urn) = "beam:transform:combine_merge_accumulators:v1"];
COMBINE_EXTRACT_OUTPUTS = 2 [(beam_urn) = "beam:transform:combine_extract_outputs:v1"];
+
+// Represents the Pre-Combine part of a lifted Combine Per Key, as described
+// in the following document:
+// https://s.apache.org/beam-runner-api-combine-model#heading=h.ta0g6ase8z07
+COMBINE_PER_KEY_PRECOMBINE = 3 [(beam_urn) = "beam:transform:combine_per_key_precombine:v1"];
+
+// Represents the Merge Accumulators part of a lifted Combine Per Key, as
+// described in the following document:
+// https://s.apache.org/beam-runner-api-combine-model#heading=h.jco9rvatld5m
+COMBINE_PER_KEY_MERGE_ACCUMULATORS = 4 [(beam_urn) = "beam:transform:combine_per_key_merge_accumulators:v1"];
+
+// Represents the Extract Outputs part of a lifted Combine Per Key, as
+// described in the following document:
+// https://s.apache.org/beam-runner-api-combine-model#heading=h.i9i6p8gtl6ku
+COMBINE_PER_KEY_EXTRACT_OUTPUTS = 5 [(beam_urn) = "beam:transform:combine_per_key_extract_outputs:v1"];
   }
   // Payload for all of these: ParDoPayload containing the user's SDF
   enum SplittableParDoComponents {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103536)
Time Spent: 1.5h  (was: 1h 20m)

> Fn API combiner support w/ lifting to PGBK
> --
>
> Key: BEAM-2937
> URL: https://issues.apache.org/jira/browse/BEAM-2937
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The FnAPI should support this optimization. Detailed design: 
> https://s.apache.org/beam-runner-api-combine-model
> Once design is ready, expand subtasks similarly to BEAM-2822.
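The three COMBINE_PER_KEY_* URNs added in the diff above name the phases of a lifted Combine Per Key. As a rough illustration of how those phases compose, here is a plain-Python sketch (not Beam's implementation; the function names and the mean combiner are invented for the example):

```python
# Illustrative sketch of combiner lifting: Precombine runs per bundle and
# produces partial accumulators per key, Merge Accumulators combines the
# partials across bundles, and Extract Outputs converts the final accumulator
# into an output value. A mean combiner uses (sum, count) accumulators.

def precombine(bundle):
    """Models beam:transform:combine_per_key_precombine:v1 for one bundle."""
    accs = {}
    for key, value in bundle:
        s, c = accs.get(key, (0, 0))
        accs[key] = (s + value, c + 1)
    return accs

def merge_accumulators(accs_per_bundle):
    """Models beam:transform:combine_per_key_merge_accumulators:v1."""
    merged = {}
    for accs in accs_per_bundle:
        for key, (s, c) in accs.items():
            ms, mc = merged.get(key, (0, 0))
            merged[key] = (ms + s, mc + c)
    return merged

def extract_outputs(merged):
    """Models beam:transform:combine_per_key_extract_outputs:v1."""
    return {key: s / c for key, (s, c) in merged.items()}

bundle1 = [("k", 1), ("k", 3)]
bundle2 = [("k", 5)]
result = extract_outputs(merge_accumulators([precombine(bundle1), precombine(bundle2)]))
# result == {"k": 3.0}
```

The point of the split is that the precombine phase can be fused upstream of the GroupByKey, so only small partial accumulators are shuffled rather than all raw values.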



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4041) Performance tests fail due to kubernetes load balancer problems

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4041?focusedWorklogId=103535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103535
 ]

ASF GitHub Bot logged work on BEAM-4041:


Author: ASF GitHub Bot
Created on: 18/May/18 19:39
Start Date: 18/May/18 19:39
Worklog Time Spent: 10m 
  Work Description: DariuszAniszewski commented on a change in pull request 
#5425: [BEAM-4041] Increase timeout for getting K8s LoadBalancer external IP
URL: https://github.com/apache/beam/pull/5425#discussion_r189373886
 
 

 ##
 File path: .test-infra/jenkins/common_job_properties.groovy
 ##
 @@ -260,6 +260,8 @@ class common_job_properties {
   dpb_log_level: 'INFO',
   maven_binary: '/home/jenkins/tools/maven/latest/bin/mvn',
   bigquery_table: 'beam_performance.pkb_results',
+  k8s_get_retry_count: 36, // wait up to 6 minutes for K8s LoadBalancer
 
 Review comment:
   ah, sorry. fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103535)
Time Spent: 40m  (was: 0.5h)

> Performance tests fail due to kubernetes load balancer problems
> ---
>
> Key: BEAM-4041
> URL: https://issues.apache.org/jira/browse/BEAM-4041
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Łukasz Gajowy
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Recently, as we added more IOITs to be run on jenkins using kubernetes, some 
> of them started to fail randomly, because they couldn't retrieve LoadBalancer 
> address. Normally obtaining the address took about one minute. Perfkit waits 
> for the address (actively checking for it) for 3 minutes. This should be 
> enough for getting the address, yet it recently started to exceed the 3 
> minutes limit. I also noticed that this error didn't happen when there were 
> fewer tests.
> Example logs:
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_Compressed_TextIOIT_HDFS/31/console
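As a sanity check on the new timeout in the PR above (a sketch with an assumed 10-second poll interval; the actual interval is configured in Perfkit and is not shown in this thread):

```python
# Hypothetical arithmetic for the retry count chosen in the PR. The 10 s
# poll interval is an assumption, not a value taken from the Beam/PKB config.
k8s_get_retry_count = 36
poll_interval_seconds = 10
total_wait_seconds = k8s_get_retry_count * poll_interval_seconds
assert total_wait_seconds == 360  # 6 minutes, double the previous 3-minute wait
```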



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (c21ce55 -> d645a64)

2018-05-18 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from c21ce55  Merge pull request #5426: Update worker_id Documentation
 add 030749f  [BEAM-2937] Add new Combine URNs.
 new d645a64  [BEAM-2937] Update Portable Combine URNs to new URNs.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 model/pipeline/src/main/proto/beam_runner_api.proto | 15 +++
 1 file changed, 15 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
lc...@apache.org.


[beam] 01/01: [BEAM-2937] Update Portable Combine URNs to new URNs.

2018-05-18 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit d645a6423a3108900a5d885d921aa85f464dfbec
Merge: c21ce55 030749f
Author: Lukasz Cwik 
AuthorDate: Fri May 18 12:40:21 2018 -0700

[BEAM-2937] Update Portable Combine URNs to new URNs.

 model/pipeline/src/main/proto/beam_runner_api.proto | 15 +++
 1 file changed, 15 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
lc...@apache.org.


[beam] branch master updated (30d8176 -> c21ce55)

2018-05-18 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 30d8176  Merge pull request #5386: Add a new DockerEnvironmentFactory 
Constructor
 add 5fe2eb5  Update worker_id Documentation
 new c21ce55  Merge pull request #5426: Update worker_id Documentation

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 model/fn-execution/src/main/proto/beam_provision_api.proto | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[beam] 01/01: Merge pull request #5426: Update worker_id Documentation

2018-05-18 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit c21ce55cd3ca6983edc660e84e424db253bb0c33
Merge: 30d8176 5fe2eb5
Author: Thomas Groh 
AuthorDate: Fri May 18 12:25:25 2018 -0700

Merge pull request #5426: Update worker_id Documentation

 model/fn-execution/src/main/proto/beam_provision_api.proto | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


Build failed in Jenkins: beam_PostCommit_Py_VR_Dataflow #45

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[herohde] Add Go integration tests

[timrobertson100] Fix error-prone warnings for io/hadoop-common

[timrobertson100] Fix error-prone and some static analysis warnings in

[tgroh] Add a new DockerEnvironmentFactory Constructor

[samuelw] [BEAM-3776] Fix issue with merging late windows where a watermark hold

[kedin] Fix Maven build

[rober] Update generated protos in Go SDK

[herohde] Add the Go SDK to the README

[timrobertson100] [BEAM-4340] Enforce ErrorProne analysis in file-based-io-tests

[timrobertson100] [BEAM-4335] Enforce ErrorProne analysis in 
amazon-web-services IO

[timrobertson100] [BEAM-4339] Enforce ErrorProne analysis in elasticsearch IO

[timrobertson100] [BEAM-4355] Enforce ErrorProne analysis in XML IO

[timrobertson100] [BEAM-4338] Enforce ErrorProne analysis in common IO

[timrobertson100] [BEAM-4337] Enforce ErrorProne analysis in cassandra IO

[timrobertson100] [BEAM-4355] Reduces scope of findbugs annotations to build 
time only

[timrobertson100] [BEAM-4353] Enforce ErrorProne analysis in solr IO

[timrobertson100] [BEAM-4345] Enforce ErrorProne analysis in JDBC IO

[timrobertson100] [BEAM-4336] Enforce ErrorProne analysis in AMQP IO

[timrobertson100] [BEAM-4346] Enforce ErrorProne analysis in JMS IO

[timrobertson100] [BEAM-4352] Enforce ErrorProne analysis in Redis IO

--
[...truncated 158.13 KB...]
TimedOutException: 'test_multi_valued_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest)'

==
ERROR: test_multiple_empty_outputs 
(apache_beam.transforms.ptransform_test.PTransformTest)
--
Traceback (most recent call last):
  File 
"
 line 812, in run
test(orig)
  File 
"
 line 45, in __call__
return self.run(*arg, **kwarg)
  File 
"
 line 133, in run
self.runTest(result)
  File 
"
 line 151, in runTest
test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
testMethod()
  File 
"
 line 271, in test_multiple_empty_outputs
pipeline.run()
  File 
"
 line 102, in run
result = super(TestPipeline, self).run(test_runner_api)
  File 
"
 line 389, in run
self.to_runner_api(), self.runner, self._options).run(False)
  File 
"
 line 402, in run
return self.runner.run_pipeline(self)
  File 
"
 line 67, in run_pipeline
self.wait_until_in_state(PipelineState.CANCELLED, timeout=300)
  File 
"
 line 87, in wait_until_in_state
job_state = self.result.state
  File 
"
 line 1033, in state
self._update_job()
  File 
"
 line 1011, in _update_job
self._job = self._runner.dataflow_client.get_job(self.job_id())
  File 
"
 line 180, in wrapper
return fun(*args, **kwargs)
  File 
"
 line 624, in get_job
response = self._client.projects_locations_jobs.Get(request)
  File 

[jira] [Work logged] (BEAM-3377) assert_that not working for streaming

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3377?focusedWorklogId=103533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103533
 ]

ASF GitHub Bot logged work on BEAM-3377:


Author: ASF GitHub Bot
Created on: 18/May/18 19:09
Start Date: 18/May/18 19:09
Worklog Time Spent: 10m 
  Work Description: mariapython commented on issue #5384: [BEAM-3377] Add 
validation for streaming wordcount with assert_that
URL: https://github.com/apache/beam/pull/5384#issuecomment-390304466
 
 
   Just saw this, will do after lunch
   
   ---
   María from the phone
   
   > On May 18, 2018, at 9:40 AM, Ahmet Altay  wrote:
   > 
   > LGTM. Could you squash your changes? I can also do this if you prefer that.
   > 
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub, or mute the thread.
   > 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103533)
Time Spent: 5h 40m  (was: 5.5h)

> assert_that not working for streaming
> -
>
> Key: BEAM-3377
> URL: https://issues.apache.org/jira/browse/BEAM-3377
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: María GH
>Priority: Major
>  Labels: starter
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> assert_that does not work for AfterWatermark timers.
> Easy way to reproduce: modify test_gbk_execution [1] in this form:
>  
> {code:java}
>  def test_this(self):
> test_stream = (TestStream()
>.add_elements(['a', 'b', 'c'])
>.advance_watermark_to(20))
> def fnc(x):
>   print 'fired_elem:', x
>   return x
> options = PipelineOptions()
> options.view_as(StandardOptions).streaming = True
> p = TestPipeline(options=options)
> records = (p
>| test_stream
>| beam.WindowInto(
>FixedWindows(15),
>
> trigger=trigger.AfterWatermark(early=trigger.AfterCount(2)),
>accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
>| beam.Map(lambda x: ('k', x))
>| beam.GroupByKey())
> assert_that(records, equal_to([
> ('k', ['a', 'b', 'c'])]))
> p.run()
> {code}
> This test will pass, but if the .advance_watermark_to(20) is removed, the 
> test will fail. However, both cases fire the same elements:
>   fired_elem: ('k', ['a', 'b', 'c'])
>   fired_elem: ('k', ['a', 'b', 'c'])
> In the passing case, they correspond to the sorted_actual inside the 
> assert_that. In the failing case:
>   sorted_actual: [('k', ['a', 'b', 'c']), ('k', ['a', 'b', 'c'])]
>   sorted_actual: []
> [1] 
> https://github.com/mariapython/incubator-beam/blob/direct-timers-show/sdks/python/apache_beam/testing/test_stream_test.py#L120
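A toy model of the ACCUMULATING behavior described above may help (an assumption-laden sketch, not Beam internals): with accumulation_mode=ACCUMULATING, every firing of a window re-emits the full contents accumulated so far, which is why both runs print the same two fired_elem lines even though assert_that observes different sorted_actual values.

```python
# Toy model of trigger.AccumulationMode.ACCUMULATING (not Beam internals):
# every firing of a window emits the full contents accumulated so far.
def accumulating_firings(elements, n_firings):
    window = list(elements)  # all elements fall into one fixed window
    return [("k", list(window)) for _ in range(n_firings)]

# An early firing (AfterCount(2) satisfied) plus an on-time firing both emit
# the same accumulated pane, matching the duplicated fired_elem output above.
panes = accumulating_firings(["a", "b", "c"], n_firings=2)
assert panes == [("k", ["a", "b", "c"]), ("k", ["a", "b", "c"])]
```

The bug is therefore not in what fires, but in which panes assert_that collects: without the watermark advance, the on-time pane never materializes before the pipeline is checked.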



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4318) Enforce ErrorProne analysis in Spark runner project

2018-05-18 Thread Scott Wegner (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481061#comment-16481061
 ] 

Scott Wegner commented on BEAM-4318:


For the {{TypeParameterUnusedInFormals}} warning, is it [this code here|https://github.com/apache/beam/blob/30d8176a4190c6401cfbc2b43e8c9d930f99ee99/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L215]? Note that fixing this would be an API-breaking change, so I'd say the correct thing to do is suppress it. Too bad we didn't have such analysis when we were designing the API.

And the {{WordCount.java}} validation is 
[here|https://github.com/apache/beam/blob/30d8176a4190c6401cfbc2b43e8c9d930f99ee99/runners/spark/src/main/java/org/apache/beam/runners/spark/examples/WordCount.java#L57].
 I saw many instances of this when updating sdks-java-core, and I took the 
suggested fix.

I'm not sure about the {{DefaultAnnotation}} warning. I remember seeing before 
but I can't remember the fix. Does this block enabling {{-Werror}} for the 
project?

Thanks for your help so far! Feel free to open a PR with your progress so far; 
it might be easier to move the conversation closer to the code.

> Enforce ErrorProne analysis in Spark runner project
> ---
>
> Key: BEAM-4318
> URL: https://issues.apache.org/jira/browse/BEAM-4318
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Scott Wegner
>Priority: Minor
>  Labels: errorprone, starter
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-runners-spark}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-runners-spark:assemble}}
> # Fix each ErrorProne warning from the {{runners/spark}} project.
> # In {{runners/spark/build.gradle}}, add {{failOnWarning: true}} to the call 
> to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3999) Futurize and fix python 2 compatibility for internal subpackage

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3999?focusedWorklogId=103532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103532
 ]

ASF GitHub Bot logged work on BEAM-3999:


Author: ASF GitHub Bot
Created on: 18/May/18 18:59
Start Date: 18/May/18 18:59
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#5334: [BEAM-3999] Futurize internal subpackage
URL: https://github.com/apache/beam/pull/5334#discussion_r189364662
 
 

 ##
 File path: sdks/python/apache_beam/internal/util.py
 ##
 @@ -20,9 +20,13 @@
 For internal use only. No backwards compatibility guarantees.
 """
 
+from __future__ import absolute_import
+
 import logging
 import threading
 import weakref
+from builtins import next
 
 Review comment:
   I checked again, and it seems like the linter checks for `.next()` calls on 
objects. So we should be safe.
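The distinction the linter is checking can be shown in a few lines: the builtin next() function works on both Python 2 (with `from builtins import next` supplied by futurize) and Python 3, while the `.next()` method exists only on Python 2 iterators.

```python
# Python 2/3-compatible iteration: call the builtin next() function rather
# than the Python-2-only .next() method.
it = iter([1, 2, 3])
first = next(it)   # portable across Python 2 (via builtins) and Python 3
second = next(it)
remaining = list(it)
assert (first, second, remaining) == (1, 2, [3])
# it.next() would raise AttributeError on Python 3; that method-call form
# is what the linter flags.
```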


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103532)
Time Spent: 2h 50m  (was: 2h 40m)

> Futurize and fix python 2 compatibility for internal subpackage
> ---
>
> Key: BEAM-3999
> URL: https://issues.apache.org/jira/browse/BEAM-3999
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #474

2018-05-18 Thread Apache Jenkins Server
See 


--
[...truncated 20.52 MB...]
INFO: Running Dataflow job 2018-05-18_11_51_45-872778116657510860 with 0 
expected assertions.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:45.522Z: Autoscaling is enabled for job 
2018-05-18_11_51_45-872778116657510860. The number of workers will be between 1 
and 1000.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:45.563Z: Autoscaling was automatically enabled for 
job 2018-05-18_11_51_45-872778116657510860.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:48.260Z: Checking required Cloud APIs are enabled.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:48.392Z: Checking permissions granted to controller 
Service Account.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:52.890Z: Worker configuration: n1-standard-1 in 
us-central1-f.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:53.315Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:53.576Z: Expanding GroupByKey operations into 
optimizable parts.
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:53.609Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:53.873Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:53.914Z: Elided trivial flatten 
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:53.961Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:54.008Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:54.055Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:54.099Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:54.142Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:51:54.185Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 18, 2018 6:51:56 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 

[jira] [Work logged] (BEAM-2937) Fn API combiner support w/ lifting to PGBK

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2937?focusedWorklogId=103529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103529
 ]

ASF GitHub Bot logged work on BEAM-2937:


Author: ASF GitHub Bot
Created on: 18/May/18 18:44
Start Date: 18/May/18 18:44
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5128: 
[BEAM-2937] Update Portable Combine URNs to new URNs.
URL: https://github.com/apache/beam/pull/5128#discussion_r189360229
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -245,6 +245,25 @@ message StandardPTransforms {
 COMBINE_PGBKCV = 0 [(beam_urn) = "beam:transform:combine_pgbkcv:v1"];
 COMBINE_MERGE_ACCUMULATORS = 1 [(beam_urn) = 
"beam:transform:combine_merge_accumulators:v1"];
 COMBINE_EXTRACT_OUTPUTS = 2 [(beam_urn) = 
"beam:transform:combine_extract_outputs:v1"];
+
+// Represents the lifted part of a Combine.perKey() operation. Caches and
 
 Review comment:
   Done
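For readers following the URN rename: the three combine URNs correspond to the standard CombineFn phases of a lifted Combine.perKey (pre-combine before the GroupByKey, merge accumulators after it, then extract outputs). A minimal Python sketch using mean as the combiner (illustrative only, not Beam SDK code):

```python
# CombineFn phases for a mean combiner, as plain functions.
def create_accumulator():
    return (0.0, 0)                      # (running sum, count)

def add_input(acc, x):                   # the lifted pre-combine stage
    s, c = acc
    return (s + x, c + 1)

def merge_accumulators(accs):            # after the GroupByKey
    return (sum(s for s, _ in accs), sum(c for _, c in accs))

def extract_output(acc):                 # final output extraction
    s, c = acc
    return s / c if c else float("nan")

# Two workers pre-combine locally, then the runner merges and extracts.
a1 = add_input(add_input(create_accumulator(), 1), 3)
a2 = add_input(create_accumulator(), 5)
assert extract_output(merge_accumulators([a1, a2])) == 3.0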




Issue Time Tracking
---

Worklog Id: (was: 103529)
Time Spent: 1h 20m  (was: 1h 10m)

> Fn API combiner support w/ lifting to PGBK
> --
>
> Key: BEAM-2937
> URL: https://issues.apache.org/jira/browse/BEAM-2937
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The FnAPI should support this optimization. Detailed design: 
> https://s.apache.org/beam-runner-api-combine-model
> Once design is ready, expand subtasks similarly to BEAM-2822.





[jira] [Work logged] (BEAM-3883) Python SDK stages artifacts when talking to job server

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3883?focusedWorklogId=103528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103528
 ]

ASF GitHub Bot logged work on BEAM-3883:


Author: ASF GitHub Bot
Created on: 18/May/18 18:40
Start Date: 18/May/18 18:40
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5251: [BEAM-3883] Refactor 
and clean dependency.py to make it reusable with artifact service
URL: https://github.com/apache/beam/pull/5251#issuecomment-390296544
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 103528)
Time Spent: 15h 40m  (was: 15.5h)

> Python SDK stages artifacts when talking to job server
> --
>
> Key: BEAM-3883
> URL: https://issues.apache.org/jira/browse/BEAM-3883
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> The Python SDK does not currently stage its user-defined functions or 
> dependencies when talking to the job API. Artifacts that need to be staged 
> include the user code itself, any SDK components not included in the 
> container image, and the list of Python packages that must be installed at 
> runtime.
>  
> Artifacts that are currently expected can be found in the harness boot code: 
> [https://github.com/apache/beam/blob/58e3b06bee7378d2d8db1c8dd534b415864f63e1/sdks/python/container/boot.go#L52.]
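The description above lists the artifact categories to stage (user code, SDK components not in the container image, runtime package dependencies). A hypothetical sketch of gathering such local artifacts; the function and file names are illustrative, not the Beam artifact-staging API:

```python
import os
import tempfile

def collect_artifacts(*candidates):
    """Return only the candidate files that exist locally and can be staged."""
    return [c for c in candidates if os.path.exists(c)]

# Pretend setup: a user pipeline file exists, an extra package does not.
staging_dir = tempfile.mkdtemp()
user_code = os.path.join(staging_dir, "my_pipeline.py")
open(user_code, "w").close()
missing = os.path.join(staging_dir, "extra_package.tar.gz")
to_stage = collect_artifacts(user_code, missing)
assert to_stage == [user_code]
```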





[jira] [Work logged] (BEAM-3999) Futurize and fix python 2 compatibility for internal subpackage

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3999?focusedWorklogId=103526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103526
 ]

ASF GitHub Bot logged work on BEAM-3999:


Author: ASF GitHub Bot
Created on: 18/May/18 18:36
Start Date: 18/May/18 18:36
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on a change in pull request 
#5334: [BEAM-3999] Futurize internal subpackage
URL: https://github.com/apache/beam/pull/5334#discussion_r189358357
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -17,7 +17,8 @@
 
 [tox]
 # new environments will be excluded by default unless explicitly added to 
envlist.
-envlist = py27,py27-{gcp,cython,lint,lint3},py3-lint,docs
+#envlist = py27,py27-{gcp,cython,lint,lint3},py3-lint,docs
 
 Review comment:
   My mistake.




Issue Time Tracking
---

Worklog Id: (was: 103526)
Time Spent: 2h 40m  (was: 2.5h)

> Futurize and fix python 2 compatibility for internal subpackage
> ---
>
> Key: BEAM-3999
> URL: https://issues.apache.org/jira/browse/BEAM-3999
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #473

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Add a new DockerEnvironmentFactory Constructor

--
[...truncated 19.77 MB...]
INFO: 2018-05-18T18:22:57.086Z: Autoscaling is enabled for job 
2018-05-18_11_22_57-15055022852153310261. The number of workers will be between 
1 and 1000.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:22:57.123Z: Autoscaling was automatically enabled for 
job 2018-05-18_11_22_57-15055022852153310261.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:22:59.677Z: Checking required Cloud APIs are enabled.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:22:59.794Z: Checking permissions granted to controller 
Service Account.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:03.309Z: Worker configuration: n1-standard-1 in 
us-central1-b.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:03.767Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.011Z: Expanding GroupByKey operations into 
optimizable parts.
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.050Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.288Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.324Z: Elided trivial flatten 
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.368Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.404Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.440Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.475Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.512Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.557Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 18, 2018 6:23:06 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:23:04.600Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #472

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[samuelw] [BEAM-3776] Fix issue with merging late windows where a watermark hold

--
[...truncated 20.04 MB...]
INFO: Running Dataflow job 2018-05-18_11_16_37-1126602433956561183 with 0 
expected assertions.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:37.589Z: Autoscaling is enabled for job 
2018-05-18_11_16_37-1126602433956561183. The number of workers will be between 
1 and 1000.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:37.624Z: Autoscaling was automatically enabled for 
job 2018-05-18_11_16_37-1126602433956561183.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:40.484Z: Checking required Cloud APIs are enabled.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:41.086Z: Checking permissions granted to controller 
Service Account.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:48.564Z: Worker configuration: n1-standard-1 in 
us-central1-b.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.043Z: Expanding CoGroupByKey operations into 
optimizable parts.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.475Z: Expanding GroupByKey operations into 
optimizable parts.
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.525Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.780Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.812Z: Elided trivial flatten 
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.859Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.900Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.952Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:49.991Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:50.035Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
May 18, 2018 6:16:58 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-05-18T18:16:50.075Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
May 18, 

[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103519
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 18/May/18 18:19
Start Date: 18/May/18 18:19
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5422: 
[BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189347253
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaExactlyOnceSink.java
 ##
 @@ -563,6 +564,7 @@ void commitTxn(long lastRecordId, Counter numTransactions) 
throws IOException {
 * closed if it stays in cache for more than 1 minute, i.e. not used 
inside
  * KafkaExactlyOnceSink DoFn for a minute.
  */
+@SuppressWarnings("FutureReturnValueIgnored")
 
 Review comment:
   I don't see where a Future is ignored. Please leave a comment:
  // TODO : rangadi : review if this is even required.




Issue Time Tracking
---

Worklog Id: (was: 103519)
Time Spent: 1h 10m  (was: 1h)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103515
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 18/May/18 18:18
Start Date: 18/May/18 18:18
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5422: 
[BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189345982
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaExactlyOnceSink.java
 ##
 @@ -433,6 +433,7 @@ void beginTxn() {
 ProducerSpEL.beginTransaction(producer);
   }
 
+  @SuppressWarnings("FutureReturnValueIgnored")
 
 Review comment:
   Please leave a TODO for me.
   // TODO : rangadi : add explanation for why this is ok.
   Can we make this send() return future and ignore it at the call site?
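For background, ErrorProne's FutureReturnValueIgnored check flags call sites that discard a returned Future, because any failure in the asynchronous work is then silently dropped. A minimal Python analogue of the fire-and-forget pattern under discussion (illustrative only; the Beam code in question is Java and wraps KafkaProducer.send):

```python
from concurrent.futures import ThreadPoolExecutor

def send(record):
    # Stand-in for an asynchronous write; in Kafka the returned Future
    # is what carries any broker-side error back to the caller.
    return record

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(send, "msg-1")   # a Future is returned here
    # Dropping `future` would be fire-and-forget: an exception raised in
    # send() would be lost, which is exactly what the check warns about.
    assert future.result() == "msg-1"
```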




Issue Time Tracking
---

Worklog Id: (was: 103515)
Time Spent: 50m  (was: 40m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103514
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 18/May/18 18:18
Start Date: 18/May/18 18:18
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5422: 
[BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189352223
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java
 ##
 @@ -614,7 +619,7 @@ private void nextBatch() {
 partitionStates.forEach(p -> p.recordIter = 
records.records(p.topicPartition).iterator());
 
 // cycle through the partitions in order to interleave records from each.
-curBatch = Iterators.cycle(new LinkedList<>(partitionStates));
+curBatch = Iterators.cycle(new ArrayList<>(partitionStates));
 
 Review comment:
   Was this also recommended by ErrorProne?
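For context on the change being asked about: swapping the backing collection of Guava's Iterators.cycle from LinkedList to ArrayList does not alter the round-robin interleaving (attributing it to an ErrorProne recommendation against LinkedList is an assumption here, since the PR diff does not name the check). The interleaving behavior, sketched with Python's itertools.cycle:

```python
from itertools import cycle, islice

# Round-robin over partitions to interleave records from each, analogous
# to curBatch = Iterators.cycle(new ArrayList<>(partitionStates)).
partitions = ["p0", "p1", "p2"]
interleaved = list(islice(cycle(partitions), 7))
assert interleaved == ["p0", "p1", "p2", "p0", "p1", "p2", "p0"]
```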




Issue Time Tracking
---

Worklog Id: (was: 103514)
Time Spent: 40m  (was: 0.5h)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103516
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 18/May/18 18:18
Start Date: 18/May/18 18:18
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5422: 
[BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189347253
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaExactlyOnceSink.java
 ##
 @@ -563,6 +564,7 @@ void commitTxn(long lastRecordId, Counter numTransactions) 
throws IOException {
 * closed if it stays in cache for more than 1 minute, i.e. not used 
inside
  * KafkaExactlyOnceSink DoFn for a minute.
  */
+@SuppressWarnings("FutureReturnValueIgnored")
 
 Review comment:
   I don't see where a Future is ignored. Please leave a comment:
  // TODO : rangadi : review if this is even required.




Issue Time Tracking
---

Worklog Id: (was: 103516)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103518
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 18/May/18 18:18
Start Date: 18/May/18 18:18
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5422: 
[BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189351687
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
 ##
 @@ -50,6 +50,7 @@ public void setup() {
   }
 
   @ProcessElement
+  @SuppressWarnings("FutureReturnValueIgnored")
 
 Review comment:
// TODO : rangadi : explain why this is ok.




Issue Time Tracking
---

Worklog Id: (was: 103518)
Time Spent: 1h  (was: 50m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com





[jira] [Work logged] (BEAM-4347) Enforce ErrorProne analysis in the kafka IO project

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4347?focusedWorklogId=103517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103517
 ]

ASF GitHub Bot logged work on BEAM-4347:


Author: ASF GitHub Bot
Created on: 18/May/18 18:18
Start Date: 18/May/18 18:18
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5422: 
[BEAM-4347] Enforce ErrorProne analysis in kafka IO
URL: https://github.com/apache/beam/pull/5422#discussion_r189352587
 
 

 ##
 File path: 
sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java
 ##
 @@ -529,7 +529,7 @@ public void testUnboundedSourceCustomTimestamps() {
  (tp, prevWatermark) -> new CustomTimestampPolicyWithLimitedDelay(
      (record -> new Instant(TimeUnit.SECONDS.toMillis(record.getKV().getValue())
          + customTimestampStartMillis)),
-   Duration.millis(0),
+   Duration.ZERO,
 
 Review comment:
   Just curious: ErrorProne suggested this?
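Whether or not ErrorProne itself proposed it, the two expressions in the diff are equivalent. The PR uses Joda-Time's `Duration`, which exposes the same `ZERO` constant; the sketch below uses `java.time.Duration` as a stand-in so it is self-contained:

```java
import java.time.Duration;

// Illustrates the Duration.millis(0) -> Duration.ZERO change using java.time
// as a stand-in for Joda-Time (both expose an equivalent ZERO constant).
public class DurationZeroDemo {
    public static void main(String[] args) {
        // Both denote the same zero-length duration; the named constant
        // reads more clearly and skips a factory call.
        System.out.println(Duration.ofMillis(0).equals(Duration.ZERO)); // prints true
    }
}
```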




Issue Time Tracking
---

Worklog Id: (was: 103517)
Time Spent: 1h  (was: 50m)

> Enforce ErrorProne analysis in the kafka IO project
> ---
>
> Key: BEAM-4347
> URL: https://issues.apache.org/jira/browse/BEAM-4347
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Scott Wegner
>Assignee: Tim Robertson
>Priority: Minor
>  Labels: errorprone, starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Java ErrorProne static analysis was [recently 
> enabled|https://github.com/apache/beam/pull/5161] in the Gradle build 
> process, but only as warnings. ErrorProne errors are generally useful and 
> easy to fix. Some work was done to [make sdks-java-core 
> ErrorProne-clean|https://github.com/apache/beam/pull/5319] and add 
> enforcement. This task is to clean ErrorProne warnings and add enforcement in 
> {{beam-sdks-java-io-kafka}}. Additional context discussed on the [dev 
> list|https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E].
> Fixing this issue will involve:
> # Follow instructions in the [Contribution 
> Guide|https://beam.apache.org/contribute/] to set up a {{beam}} development 
> environment.
> # Run the following command to compile and run ErrorProne analysis on the 
> project: {{./gradlew :beam-sdks-java-io-kafka:assemble}}
> # Fix each ErrorProne warning from the {{sdks/java/io/kafka}} project.
> # In {{sdks/java/io/kafka/build.gradle}}, add {{failOnWarning: true}} to the 
> call to {{applyJavaNature()}} 
> ([example|https://github.com/apache/beam/pull/5319/files#diff-9390c20635aed5f42f83b97506a87333R20]).
> This starter issue is sponsored by [~swegner]. Feel free to [reach 
> out|https://beam.apache.org/community/contact-us/] with questions or code 
> review:
> * JIRA: [~swegner]
> * GitHub: [@swegner|https://github.com/swegner]
> * Slack: [@Scott Wegner|https://s.apache.org/beam-slack-channel]
> * Email: swegner at google dot com





Build failed in Jenkins: beam_PerformanceTests_XmlIOIT_HDFS #184

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Add a new DockerEnvironmentFactory Constructor

[samuelw] [BEAM-3776] Fix issue with merging late windows where a watermark hold

[herohde] Add the Go SDK to the README

[timrobertson100] [BEAM-4340] Enforce ErrorProne analysis in file-based-io-tests

[timrobertson100] [BEAM-4335] Enforce ErrorProne analysis in 
amazon-web-services IO

[timrobertson100] [BEAM-4339] Enforce ErrorProne analysis in elasticsearch IO

[timrobertson100] [BEAM-4355] Enforce ErrorProne analysis in XML IO

[timrobertson100] [BEAM-4338] Enforce ErrorProne analysis in common IO

[timrobertson100] [BEAM-4337] Enforce ErrorProne analysis in cassandra IO

[timrobertson100] [BEAM-4355] Reduces scope of findbugs annotations to build 
time only

[timrobertson100] [BEAM-4353] Enforce ErrorProne analysis in solr IO

[timrobertson100] [BEAM-4345] Enforce ErrorProne analysis in JDBC IO

[timrobertson100] [BEAM-4336] Enforce ErrorProne analysis in AMQP IO

[timrobertson100] [BEAM-4346] Enforce ErrorProne analysis in JMS IO

[timrobertson100] [BEAM-4352] Enforce ErrorProne analysis in Redis IO

--
[...truncated 419.99 KB...]
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.net.ConnectException: Call From 
xmlioit0writethenreadall--05181107-6skv-harness-09q6.c.apache-beam-testing.internal/10.128.0.35
 to 149.153.193.35.bc.googleusercontent.com:9000 failed on connection 
exception: java.net.ConnectException: Connection refused; For more details see: 
 http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy65.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy66.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 

Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #186

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Add a new DockerEnvironmentFactory Constructor

[samuelw] [BEAM-3776] Fix issue with merging late windows where a watermark hold

[herohde] Add the Go SDK to the README

[timrobertson100] [BEAM-4340] Enforce ErrorProne analysis in file-based-io-tests

[timrobertson100] [BEAM-4335] Enforce ErrorProne analysis in 
amazon-web-services IO

[timrobertson100] [BEAM-4339] Enforce ErrorProne analysis in elasticsearch IO

[timrobertson100] [BEAM-4355] Enforce ErrorProne analysis in XML IO

[timrobertson100] [BEAM-4338] Enforce ErrorProne analysis in common IO

[timrobertson100] [BEAM-4337] Enforce ErrorProne analysis in cassandra IO

[timrobertson100] [BEAM-4355] Reduces scope of findbugs annotations to build 
time only

[timrobertson100] [BEAM-4353] Enforce ErrorProne analysis in solr IO

[timrobertson100] [BEAM-4345] Enforce ErrorProne analysis in JDBC IO

[timrobertson100] [BEAM-4336] Enforce ErrorProne analysis in AMQP IO

[timrobertson100] [BEAM-4346] Enforce ErrorProne analysis in JMS IO

[timrobertson100] [BEAM-4352] Enforce ErrorProne analysis in Redis IO

--
[...truncated 188.28 KB...]
INFO: No server chosen by WritableServerSelector from cluster description 
ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, 
all=[ServerDescription{address=35.202.235.114:27017, type=UNKNOWN, 
state=CONNECTING}]}. Waiting for 3 ms before timing out
May 18, 2018 6:16:16 PM com.mongodb.diagnostics.logging.SLF4JLogger info
INFO: Opened connection [connectionId{localValue:1, serverValue:1}] to 
35.202.235.114:27017
May 18, 2018 6:16:16 PM com.mongodb.diagnostics.logging.SLF4JLogger info
INFO: Monitor thread successfully connected to server with description 
ServerDescription{address=35.202.235.114:27017, type=STANDALONE, 
state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 6, 4]}, 
minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, 
roundTripTimeNanos=1529402}
May 18, 2018 6:16:16 PM com.mongodb.diagnostics.logging.SLF4JLogger info
INFO: Opened connection [connectionId{localValue:2, serverValue:2}] to 
35.202.235.114:27017

Gradle Test Executor 1 finished executing tests.

> Task :beam-sdks-java-io-mongodb:integrationTest FAILED

org.apache.beam.sdk.io.mongodb.MongoDBIOIT > testWriteAndRead FAILED
java.lang.RuntimeException: com.mongodb.MongoSocketReadException: 
Prematurely reached end of stream
at com.mongodb.connection.SocketStream.read(SocketStream.java:88)
at 
com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:491)
at 
com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221)
at 
com.mongodb.connection.UsageTrackingInternalConnection.receiveMessage(UsageTrackingInternalConnection.java:102)
at 
com.mongodb.connection.DefaultConnectionPool$PooledConnection.receiveMessage(DefaultConnectionPool.java:435)
at 
com.mongodb.connection.WriteCommandProtocol.receiveMessage(WriteCommandProtocol.java:234)
at 
com.mongodb.connection.WriteCommandProtocol.execute(WriteCommandProtocol.java:104)
at 
com.mongodb.connection.InsertCommandProtocol.execute(InsertCommandProtocol.java:67)
at 
com.mongodb.connection.InsertCommandProtocol.execute(InsertCommandProtocol.java:37)
at 
com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at 
com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286)
at 
com.mongodb.connection.DefaultServerConnection.insertCommand(DefaultServerConnection.java:115)
at 
com.mongodb.operation.MixedBulkWriteOperation$Run$2.executeWriteCommandProtocol(MixedBulkWriteOperation.java:455)
at 
com.mongodb.operation.MixedBulkWriteOperation$Run$RunExecutor.execute(MixedBulkWriteOperation.java:646)
at 
com.mongodb.operation.MixedBulkWriteOperation$Run.execute(MixedBulkWriteOperation.java:401)
at 
com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:179)
at 
com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:230)
at 
com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:221)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at 
com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at 
com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
at 

Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #278

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Add a new DockerEnvironmentFactory Constructor

[samuelw] [BEAM-3776] Fix issue with merging late windows where a watermark hold

[herohde] Add the Go SDK to the README

[timrobertson100] [BEAM-4340] Enforce ErrorProne analysis in file-based-io-tests

[timrobertson100] [BEAM-4335] Enforce ErrorProne analysis in 
amazon-web-services IO

[timrobertson100] [BEAM-4339] Enforce ErrorProne analysis in elasticsearch IO

[timrobertson100] [BEAM-4355] Enforce ErrorProne analysis in XML IO

[timrobertson100] [BEAM-4338] Enforce ErrorProne analysis in common IO

[timrobertson100] [BEAM-4337] Enforce ErrorProne analysis in cassandra IO

[timrobertson100] [BEAM-4355] Reduces scope of findbugs annotations to build 
time only

[timrobertson100] [BEAM-4353] Enforce ErrorProne analysis in solr IO

[timrobertson100] [BEAM-4345] Enforce ErrorProne analysis in JDBC IO

[timrobertson100] [BEAM-4336] Enforce ErrorProne analysis in AMQP IO

[timrobertson100] [BEAM-4346] Enforce ErrorProne analysis in JMS IO

[timrobertson100] [BEAM-4352] Enforce ErrorProne analysis in Redis IO

--
[...truncated 103.53 KB...]

> Task :beam-sdks-java-io-hadoop-input-format:classes UP-TO-DATE
Skipping task ':beam-sdks-java-io-hadoop-input-format:classes' as it has no 
actions.
:beam-sdks-java-io-hadoop-input-format:classes (Thread[Task worker for ':' 
Thread 6,5,main]) completed. Took 0.0 secs.

> Task :beam-sdks-java-io-google-cloud-platform:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-google-cloud-platform:compileTestJava' is 
26612998be703f318146f2150d1341a5
Skipping task ':beam-sdks-java-io-google-cloud-platform:compileTestJava' as it 
is up-to-date.
:beam-sdks-java-io-google-cloud-platform:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) completed. Took 0.336 secs.
:beam-sdks-java-io-google-cloud-platform:testClasses (Thread[Task worker for 
':',5,main]) started.

> Task :beam-sdks-java-io-google-cloud-platform:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-io-google-cloud-platform:testClasses' as it has 
no actions.
:beam-sdks-java-io-google-cloud-platform:testClasses (Thread[Task worker for 
':',5,main]) completed. Took 0.0 secs.
:beam-sdks-java-io-google-cloud-platform:shadowTestJar (Thread[Task worker for 
':',5,main]) started.

> Task :beam-sdks-java-io-google-cloud-platform:shadowTestJar UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-google-cloud-platform:shadowTestJar' is 
3190b11084f03dc95424cd6775d27ae6
Caching disabled for task 
':beam-sdks-java-io-google-cloud-platform:shadowTestJar': Caching has not been 
enabled for the task
Skipping task ':beam-sdks-java-io-google-cloud-platform:shadowTestJar' as it is 
up-to-date.
:beam-sdks-java-io-google-cloud-platform:shadowTestJar (Thread[Task worker for 
':',5,main]) completed. Took 0.196 secs.
:beam-runners-google-cloud-dataflow-java:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) started.

> Task :beam-runners-google-cloud-dataflow-java:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-runners-google-cloud-dataflow-java:compileTestJava' is 
7d91180e8e0df1ce1a6ce0f1ec011676
Skipping task ':beam-runners-google-cloud-dataflow-java:compileTestJava' as it 
is up-to-date.
:beam-runners-google-cloud-dataflow-java:compileTestJava (Thread[Task worker 
for ':' Thread 10,5,main]) completed. Took 0.267 secs.
:beam-runners-google-cloud-dataflow-java:testClasses (Thread[Task worker for 
':',5,main]) started.

> Task :beam-runners-google-cloud-dataflow-java:testClasses UP-TO-DATE
Skipping task ':beam-runners-google-cloud-dataflow-java:testClasses' as it has 
no actions.
:beam-runners-google-cloud-dataflow-java:testClasses (Thread[Task worker for 
':',5,main]) completed. Took 0.0 secs.
:beam-runners-google-cloud-dataflow-java:shadowTestJar (Thread[Task worker for 
':',5,main]) started.

> Task :beam-runners-google-cloud-dataflow-java:shadowTestJar UP-TO-DATE
Build cache key for task 
':beam-runners-google-cloud-dataflow-java:shadowTestJar' is 
9053fe8df3683ba3a91d6a670bc81ee6
Caching disabled for task 
':beam-runners-google-cloud-dataflow-java:shadowTestJar': Caching has not been 
enabled for the task
Skipping task ':beam-runners-google-cloud-dataflow-java:shadowTestJar' as it is 
up-to-date.
:beam-runners-google-cloud-dataflow-java:shadowTestJar (Thread[Task worker for 
':',5,main]) completed. Took 0.292 secs.
:beam-sdks-java-io-hadoop-input-format:compileTestJava (Thread[Task worker for 
':' Thread 10,5,main]) started.

> Task :beam-sdks-java-io-hadoop-input-format:compileTestJava UP-TO-DATE
Build cache key for task 
':beam-sdks-java-io-hadoop-input-format:compileTestJava' is 
355c9d82644e1c8268a1354ecd0ed515
Skipping task ':beam-sdks-java-io-hadoop-input-format:compileTestJava' as it is 
up-to-date.

Jenkins build is back to normal : beam_PerformanceTests_Compressed_TextIOIT_HDFS #185

2018-05-18 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103510
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 18:06
Start Date: 18/May/18 18:06
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189349159
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java
 ##
 @@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.state;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.protobuf.ByteString;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.TypeCase;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.MultimapSideInputSpec;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.stream.DataStreams;
+import org.apache.beam.sdk.fn.stream.DataStreams.ElementDelimitedOutputStream;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.common.Reiterable;
+
+/**
+ * A set of utility methods which construct {@link StateRequestHandler}s.
+ *
+ * TODO: Add a variant which works on {@link ByteString}s to remove 
encoding/decoding overhead.
+ */
+public class StateRequestHandlers {
+
+  /**
+   * A handler for multimap side inputs.
+   */
+  public interface MultimapSideInputHandler<K, V, W extends BoundedWindow> {
+/**
+ * Returns an {@link Iterable} of values representing the side input for 
the given key and
+ * window.
+ *
+ * TODO: Add support for side input chunking and caching if a {@link 
Reiterable} is returned.
+ */
+    Iterable<V> get(K key, W window);
+  }
+
+  /**
+   * A factory which constructs {@link MultimapSideInputHandler}s.
+   */
+  public interface MultimapSideInputHandlerFactory {
+
+/**
+ * Returns a {@link MultimapSideInputHandler} for the given {@code 
pTransformId} and
+ * {@code sideInputId}. The supplied {@code keyCoder}, {@code valueCoder}, 
and
+ * {@code windowCoder} should be used to encode/decode their respective 
values.
+ */
+    <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+        String pTransformId,
+        String sideInputId,
+        Coder<K> keyCoder,
+        Coder<V> valueCoder,
+        Coder<W> windowCoder);
+
+/**
+ * Throws a {@link UnsupportedOperationException} on the first access.
+ */
+static MultimapSideInputHandlerFactory unsupported() {
+  return new MultimapSideInputHandlerFactory() {
+@Override
+        public <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+            String pTransformId, String sideInputId, Coder<K> keyCoder, Coder<V> valueCoder,
+            Coder<W> windowCoder) {
+          throw new UnsupportedOperationException(String.format(
+              "The %s does not support handling side inputs for PTransform %s with side "
+                  + "input id %s.",
+              MultimapSideInputHandler.class.getSimpleName(),
+              pTransformId,
+              sideInputId));
+        }
+  };
+}
+  }
+
+  /**
+   * 
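The `MultimapSideInputHandler` interface quoted above looks up the values of a side input by key and window. A toy, non-Beam sketch of that contract, backed by an in-memory map (all names below are illustrative only):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the (key, window) -> Iterable<value> lookup that
// MultimapSideInputHandler describes; not Beam's implementation.
public class ToyMultimapHandler {
    // window -> (key -> values)
    private final Map<String, Map<String, List<Integer>>> data = new HashMap<>();

    void put(String window, String key, int value) {
        data.computeIfAbsent(window, w -> new HashMap<>())
            .computeIfAbsent(key, k -> new ArrayList<>())
            .add(value);
    }

    // Mirrors Iterable<V> get(K key, W window): an absent key or window
    // yields an empty iterable rather than null.
    Iterable<Integer> get(String key, String window) {
        return data.getOrDefault(window, Collections.emptyMap())
                   .getOrDefault(key, Collections.emptyList());
    }
}
```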

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103511
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 18:06
Start Date: 18/May/18 18:06
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189349690
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/StateRequestHandlers.java
 ##
 @@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.state;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.protobuf.ByteString;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateGetResponse;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey.TypeCase;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.MultimapSideInputSpec;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.stream.DataStreams;
+import org.apache.beam.sdk.fn.stream.DataStreams.ElementDelimitedOutputStream;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.common.Reiterable;
+
+/**
+ * A set of utility methods which construct {@link StateRequestHandler}s.
+ *
+ * TODO: Add a variant which works on {@link ByteString}s to remove 
encoding/decoding overhead.
+ */
+public class StateRequestHandlers {
+
+  /**
+   * A handler for multimap side inputs.
+   */
+  public interface MultimapSideInputHandler<K, V, W extends BoundedWindow> {
+/**
+ * Returns an {@link Iterable} of values representing the side input for 
the given key and
+ * window.
+ *
+ * TODO: Add support for side input chunking and caching if a {@link 
Reiterable} is returned.
+ */
+    Iterable<V> get(K key, W window);
+  }
+
+  /**
+   * A factory which constructs {@link MultimapSideInputHandler}s.
+   */
+  public interface MultimapSideInputHandlerFactory {
+
+/**
+ * Returns a {@link MultimapSideInputHandler} for the given {@code 
pTransformId} and
+ * {@code sideInputId}. The supplied {@code keyCoder}, {@code valueCoder}, 
and
+ * {@code windowCoder} should be used to encode/decode their respective 
values.
+ */
+    <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+        String pTransformId,
+        String sideInputId,
+        Coder<K> keyCoder,
+        Coder<V> valueCoder,
+        Coder<W> windowCoder);
+
+/**
+ * Throws a {@link UnsupportedOperationException} on the first access.
+ */
+static MultimapSideInputHandlerFactory unsupported() {
+  return new MultimapSideInputHandlerFactory() {
+@Override
+        public <K, V, W extends BoundedWindow> MultimapSideInputHandler<K, V, W> forSideInput(
+            String pTransformId, String sideInputId, Coder<K> keyCoder, Coder<V> valueCoder,
+            Coder<W> windowCoder) {
+          throw new UnsupportedOperationException(String.format(
+              "The %s does not support handling side inputs for PTransform %s with side "
+                  + "input id %s.",
+              MultimapSideInputHandler.class.getSimpleName(),
+              pTransformId,
+              sideInputId));
+        }
+  };
+}
+  }
+
+  /**
+   * 

Build failed in Jenkins: beam_PerformanceTests_ParquetIOIT #8

2018-05-18 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Add a new DockerEnvironmentFactory Constructor

[samuelw] [BEAM-3776] Fix issue with merging late windows where a watermark hold

[herohde] Add the Go SDK to the README

[timrobertson100] [BEAM-4340] Enforce ErrorProne analysis in file-based-io-tests

[timrobertson100] [BEAM-4335] Enforce ErrorProne analysis in 
amazon-web-services IO

[timrobertson100] [BEAM-4339] Enforce ErrorProne analysis in elasticsearch IO

[timrobertson100] [BEAM-4355] Enforce ErrorProne analysis in XML IO

[timrobertson100] [BEAM-4338] Enforce ErrorProne analysis in common IO

[timrobertson100] [BEAM-4337] Enforce ErrorProne analysis in cassandra IO

[timrobertson100] [BEAM-4355] Reduces scope of findbugs annotations to build 
time only

[timrobertson100] [BEAM-4353] Enforce ErrorProne analysis in solr IO

[timrobertson100] [BEAM-4345] Enforce ErrorProne analysis in JDBC IO

[timrobertson100] [BEAM-4336] Enforce ErrorProne analysis in AMQP IO

[timrobertson100] [BEAM-4346] Enforce ErrorProne analysis in JMS IO

[timrobertson100] [BEAM-4352] Enforce ErrorProne analysis in Redis IO

--
[...truncated 93.24 KB...]
Skipping task ':beam-runners-google-cloud-dataflow-java:shadowJar' as it is up-to-date.
:beam-runners-google-cloud-dataflow-java:shadowJar (Thread[Task worker for ':' Thread 7,5,main]) completed. Took 0.01 secs.

> Task :beam-sdks-java-core:compileTestJava UP-TO-DATE
Build cache key for task ':beam-sdks-java-core:compileTestJava' is aba26b5e7f8d3325c4117602fbe58fcc
Skipping task ':beam-sdks-java-core:compileTestJava' as it is up-to-date.
:beam-sdks-java-core:compileTestJava (Thread[Task worker for ':' Thread 8,5,main]) completed. Took 0.254 secs.
:beam-sdks-java-core:processTestResources (Thread[Task worker for ':' Thread 8,5,main]) started.

> Task :beam-sdks-java-core:processTestResources NO-SOURCE
file or directory ' not found
Skipping task ':beam-sdks-java-core:processTestResources' as it has no source files and no previous output files.
:beam-sdks-java-core:processTestResources (Thread[Task worker for ':' Thread 8,5,main]) completed. Took 0.001 secs.
:beam-sdks-java-core:testClasses (Thread[Task worker for ':' Thread 8,5,main]) started.

> Task :beam-sdks-java-core:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-core:testClasses' as it has no actions.
:beam-sdks-java-core:testClasses (Thread[Task worker for ':' Thread 8,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-core:shadowTestJar (Thread[Task worker for ':' Thread 8,5,main]) started.

> Task :beam-sdks-java-core:shadowTestJar UP-TO-DATE
Build cache key for task ':beam-sdks-java-core:shadowTestJar' is e0385a5411a1f336e952d42246dcec03
Caching disabled for task ':beam-sdks-java-core:shadowTestJar': Caching has not been enabled for the task
Skipping task ':beam-sdks-java-core:shadowTestJar' as it is up-to-date.
:beam-sdks-java-core:shadowTestJar (Thread[Task worker for ':' Thread 8,5,main]) completed. Took 0.022 secs.
:beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava (Thread[Task worker for ':' Thread 8,5,main]) started.
:beam-sdks-java-core:jar (Thread[Task worker for ':' Thread 3,5,main]) started.

> Task :beam-sdks-java-core:jar UP-TO-DATE
Build cache key for task ':beam-sdks-java-core:jar' is 96009b7e45414e759c09b34bb68900e9
Caching disabled for task ':beam-sdks-java-core:jar': Caching has not been enabled for the task
Skipping task ':beam-sdks-java-core:jar' as it is up-to-date.
:beam-sdks-java-core:jar (Thread[Task worker for ':' Thread 3,5,main]) completed. Took 0.01 secs.

> Task :beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava UP-TO-DATE
Build cache key for task ':beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava' is 829debe8c153d17a34810ecd31170893
Skipping task ':beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava' as it is up-to-date.
:beam-sdks-java-extensions-google-cloud-platform-core:compileTestJava (Thread[Task worker for ':' Thread 8,5,main]) completed. Took 0.039 secs.
:beam-sdks-java-extensions-google-cloud-platform-core:testClasses (Thread[Task worker for ':' Thread 8,5,main]) started.

> Task :beam-sdks-java-extensions-google-cloud-platform-core:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-extensions-google-cloud-platform-core:testClasses' as it has no actions.
:beam-sdks-java-extensions-google-cloud-platform-core:testClasses (Thread[Task worker for ':' Thread 8,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar (Thread[Task worker for ':' Thread 8,5,main]) started.

> Task :beam-sdks-java-extensions-google-cloud-platform-core:shadowTestJar UP-TO-DATE
Build cache key for task

[jira] [Work logged] (BEAM-2937) Fn API combiner support w/ lifting to PGBK

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2937?focusedWorklogId=103505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103505
 ]

ASF GitHub Bot logged work on BEAM-2937:


Author: ASF GitHub Bot
Created on: 18/May/18 17:50
Start Date: 18/May/18 17:50
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5128: 
[BEAM-2937] Update Portable Combine URNs to new URNs.
URL: https://github.com/apache/beam/pull/5128#discussion_r189345697
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -245,6 +245,25 @@ message StandardPTransforms {
 COMBINE_PGBKCV = 0 [(beam_urn) = "beam:transform:combine_pgbkcv:v1"];
 COMBINE_MERGE_ACCUMULATORS = 1 [(beam_urn) = "beam:transform:combine_merge_accumulators:v1"];
 COMBINE_EXTRACT_OUTPUTS = 2 [(beam_urn) = "beam:transform:combine_extract_outputs:v1"];
+
+// Represents the lifted part of a Combine.perKey() operation. Caches and
 
 Review comment:
   Better to refer to the name and point to the doc than the java code and a vague description.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 103505)
Time Spent: 1h 10m  (was: 1h)

> Fn API combiner support w/ lifting to PGBK
> --
>
> Key: BEAM-2937
> URL: https://issues.apache.org/jira/browse/BEAM-2937
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The FnAPI should support this optimization. Detailed design: 
> https://s.apache.org/beam-runner-api-combine-model
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103501
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 17:48
Start Date: 18/May/18 17:48
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189336837
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
+  bundleDescriptorBuilder
+  .getPcollectionsMap()
+  .get(pCollectionId)
+  .toBuilder()
+  .setCoderId(lengthPrefixedSideInputCoderId)
+  .build());
+
+  FullWindowedValueCoder coder =
+  (FullWindowedValueCoder) WireCoders.instantiateRunnerWireCoder(
+  sideInputReference.collection(), components);
+  idsToSpec.put(
+  sideInputReference.transform().getId(),
+  sideInputReference.localName(),
+  MultimapSideInputSpec.of(
+  sideInputReference.transform().getId(),
+  sideInputReference.localName(),
+  ((KvCoder) coder.getValueCoder()).getKeyCoder(),
+  ((KvCoder) coder.getValueCoder()).getValueCoder(),
+  coder.getWindowCoder()));
+}
+return idsToSpec.build().rowMap();
+  }
+
   @AutoValue
   abstract static class TargetEncoding {
 abstract BeamFnApi.Target getTarget();
 
 abstract Coder getCoder();
   }
 
+  /**
+   * A container type storing references to the key, value, and window coder used when
 
 Review comment:
   "coder" -> "java coders"




Issue Time Tracking
---

Worklog Id: (was: 103501)
Time Spent: 40m  (was: 0.5h)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the 

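The patch quoted above derives a fresh coder id with `SyntheticComponents.uniqueId(...)`, guarded by `keySet()::contains` so the synthetic id never collides with an existing coder id. A minimal sketch of that probe-until-free idiom follows; the `_N` suffix scheme is an assumption for illustration, not necessarily Beam's exact format:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public class UniqueIdDemo {
  /**
   * Probe "base", then "base_1", "base_2", ... until an id that is not
   * already taken is found. isTaken is typically a set-membership check.
   */
  static String uniqueId(String base, Predicate<String> isTaken) {
    if (!isTaken.test(base)) {
      return base;
    }
    int i = 1;
    while (isTaken.test(base + "_" + i)) {
      i++;
    }
    return base + "_" + i;
  }

  public static void main(String[] args) {
    Set<String> existingCoderIds = new HashSet<>();
    existingCoderIds.add("fn/side_input/c1");
    existingCoderIds.add("fn/side_input/c1_1");
    // Both the base id and its first suffix are taken, so probing continues.
    String id = uniqueId("fn/side_input/c1", existingCoderIds::contains);
    if (!id.equals("fn/side_input/c1_2")) {
      throw new AssertionError(id);
    }
    System.out.println(id); // fn/side_input/c1_2
  }
}
```

Passing a predicate rather than the map itself keeps the id generator decoupled from the builder's internal collections.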
[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103499
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 17:48
Start Date: 18/May/18 17:48
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189344951
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/GrpcStateService.java
 ##
 @@ -39,15 +40,31 @@ public static GrpcStateService create() {
 return new GrpcStateService();
   }
 
+  private final ConcurrentLinkedQueue clients;
 
 Review comment:
   I'm not sure what this is doing here since it never appears to be populated.




Issue Time Tracking
---

Worklog Id: (was: 103499)
Time Spent: 0.5h  (was: 20m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by the runner match the coders used by SDKs. 
> Furthermore, it allows the _runners_ to specify coders. This involves changes 
> to the proto models and all SDKs.
>  * Annotate side input state requests with both key and value coders. This 
> inverts the expected responsibility and has the SDK determine runner coders. 
> Additionally, because runners do not understand all SDK types, additional 
> coder substitution will need to be done at request handling time to make sure 
> that the requested coder can be instantiated and will remain consistent with 
> the SDK coder. This requires only small changes to SDKs because they may opt 
> to use their default PCollection coders.
> All of these approaches have their own downsides. Explicit side input 
> coders is probably the right thing to do long-term, but the simplest change 
> for now is to modify PCollection coders to match exactly how they're 
> materialized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

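The issue description above hinges on length-prefixing: coders unknown to the runner are wrapped so that element boundaries survive materialization even when the runner cannot interpret the payload. A minimal sketch of the idea, assuming a plain 4-byte length prefix (Beam's actual LengthPrefixCoder uses a varint, and this is not Beam's API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LengthPrefixDemo {
  /** Writes the payload length, then the opaque payload bytes. */
  static byte[] encodeLengthPrefixed(byte[] payload) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeInt(payload.length); // prefix: reader can skip without decoding
    out.write(payload);           // opaque bytes the runner need not understand
    out.flush();
    return bos.toByteArray();
  }

  /** Reads the length prefix, then exactly that many payload bytes. */
  static byte[] decodeLengthPrefixed(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    int len = din.readInt();
    byte[] payload = new byte[len];
    din.readFully(payload);
    return payload;
  }

  public static void main(String[] args) throws IOException {
    byte[] original = "side-input-value".getBytes(StandardCharsets.UTF_8);
    byte[] wire = encodeLengthPrefixed(original);
    byte[] roundTripped = decodeLengthPrefixed(new ByteArrayInputStream(wire));
    if (!Arrays.equals(original, roundTripped)) {
      throw new AssertionError("round trip failed");
    }
    System.out.println(new String(roundTripped, StandardCharsets.UTF_8));
  }
}
```

Because the runner only needs the prefix to find boundaries, SDK and runner can agree on framing without sharing the inner coder, which is exactly why the patch length-prefixes side input PCollection coders.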

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103498
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 17:48
Start Date: 18/May/18 17:48
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189338944
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
 
 Review comment:
+    I'm not sure this is what we want (though it should not change correctness). What if this PCollection is consumed as a normal input to some transform here? We're mutating the coder used by the internal SDK nodes during bundle processing. Prior to this change, PCollections internally used their original coders and our synthetic wire coders were only used for grpc reads and writes. Could you add a note (or todo or bug) about this?




Issue Time Tracking
---

Worklog Id: (was: 103498)
Time Spent: 20m  (was: 10m)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match 

[jira] [Work logged] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-05-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4271?focusedWorklogId=103500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103500
 ]

ASF GitHub Bot logged work on BEAM-4271:


Author: ASF GitHub Bot
Created on: 18/May/18 17:48
Start Date: 18/May/18 17:48
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5374: 
[BEAM-4271] Support side inputs for ExecutableStage and provide runner side 
utilities for handling multimap side inputs.
URL: https://github.com/apache/beam/pull/5374#discussion_r189340035
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
 ##
 @@ -146,13 +189,89 @@ private static TargetEncoding addStageOutput(
 wireCoder);
   }
 
+  private static Map<String, Map<String, MultimapSideInputSpec>> forMultimapSideInputs(
+  ExecutableStage stage,
+  Components components,
+  ProcessBundleDescriptor.Builder bundleDescriptorBuilder) throws IOException {
+ImmutableTable.Builder<String, String, MultimapSideInputSpec> idsToSpec =
+ImmutableTable.builder();
+for (SideInputReference sideInputReference : stage.getSideInputs()) {
+  // Update the coder specification for side inputs to be length prefixed so that the
+  // SDK and Runner agree on how to encode/decode the key, window, and values for multimap
+  // side inputs.
+  String pCollectionId = sideInputReference.collection().getId();
+  RunnerApi.MessageWithComponents lengthPrefixedSideInputCoder =
+  LengthPrefixUnknownCoders.forCoder(
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId(),
+  components,
+  false);
+  String wireCoderId =
+  addWireCoder(sideInputReference.collection(), components, bundleDescriptorBuilder);
+  String lengthPrefixedSideInputCoderId = SyntheticComponents.uniqueId(
+  String.format(
+  "fn/side_input/%s",
+  components.getPcollectionsOrThrow(pCollectionId).getCoderId()),
+  bundleDescriptorBuilder.getCodersMap().keySet()::contains);
+
+  bundleDescriptorBuilder.putCoders(
+  lengthPrefixedSideInputCoderId, lengthPrefixedSideInputCoder.getCoder());
+  bundleDescriptorBuilder.putAllCoders(
+  lengthPrefixedSideInputCoder.getComponents().getCodersMap());
+  bundleDescriptorBuilder.putPcollections(
+  pCollectionId,
+  bundleDescriptorBuilder
+  .getPcollectionsMap()
+  .get(pCollectionId)
+  .toBuilder()
+  .setCoderId(lengthPrefixedSideInputCoderId)
+  .build());
+
+  FullWindowedValueCoder coder =
+  (FullWindowedValueCoder) WireCoders.instantiateRunnerWireCoder(
 
 Review comment:
   The correctness of this change now depends on `instantiateRunnerWireCoder` having the same behavior as the hand coder instantiation above. Can we tie the implementations together somehow? For example, by moving side input SDK coder construction into `WireCoders`? It would also be good to add a comment mentioning that those constructions must be kept in sync.




Issue Time Tracking
---

Worklog Id: (was: 103500)

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner are length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these 
