Jenkins build is still unstable: beam_RunnableOnService_GoogleCloudDataflow » Apache Beam :: SDKs :: Java :: Core #14

2016-03-31 Thread Apache Jenkins Server
See 




[2/2] incubator-beam git commit: This closes #105

2016-03-31 Thread kenn
This closes #105


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/fba1259b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/fba1259b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/fba1259b

Branch: refs/heads/master
Commit: fba1259b402a462cd92d8ac62a4b872ecc3a24df
Parents: 164676b 27c6c79
Author: Kenneth Knowles 
Authored: Thu Mar 31 19:18:07 2016 -0700
Committer: Kenneth Knowles 
Committed: Thu Mar 31 19:18:07 2016 -0700

--
 .../cloud/dataflow/sdk/util/ReduceFnRunner.java | 29 ---
 .../dataflow/sdk/util/ReduceFnRunnerTest.java   | 38 
 2 files changed, 54 insertions(+), 13 deletions(-)
--




[GitHub] incubator-beam pull request: [BEAM-162] Check for closed windows e...

2016-03-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/105


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Drop elements in closed windows before mapping window

2016-03-31 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master 164676bc7 -> fba1259b4


Drop elements in closed windows before mapping window

Previously, the sequence was:

1. Map a window to a representative of its equivalence class
   according to merging.
2. Drop the element if that window was closed.

But this crashes if the original window was already closed.

The new sequence is reversed. This is safe, because it is not possible
to map to a representative which is closed, as it is no longer a
candidate for merges.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/27c6c795
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/27c6c795
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/27c6c795

Branch: refs/heads/master
Commit: 27c6c795271e7a927ed0d07679ce9d6de300c38f
Parents: 96e286f
Author: Mark Shields 
Authored: Thu Mar 31 10:57:55 2016 -0700
Committer: Kenneth Knowles 
Committed: Thu Mar 31 19:17:42 2016 -0700

--
 .../cloud/dataflow/sdk/util/ReduceFnRunner.java | 29 ---
 .../dataflow/sdk/util/ReduceFnRunnerTest.java   | 38 
 2 files changed, 54 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/27c6c795/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/ReduceFnRunner.java
--
diff --git 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/ReduceFnRunner.java
 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/ReduceFnRunner.java
index d62bcc9..2415dab 100644
--- 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/ReduceFnRunner.java
+++ 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/ReduceFnRunner.java
@@ -438,8 +438,22 @@ public class ReduceFnRunner {
 for (BoundedWindow untypedWindow : value.getWindows()) {
   @SuppressWarnings("unchecked")
   W window = (W) untypedWindow;
+
+  ReduceFn.Context directContext =
+  contextFactory.base(window, StateStyle.DIRECT);
+  if (triggerRunner.isClosed(directContext.state())) {
+// This window has already been closed.
+droppedDueToClosedWindow.addValue(1L);
+WindowTracing.debug(
+"ReduceFnRunner.processElement: Dropping element at {} for key:{}; 
window:{} "
++ "since window is no longer active at inputWatermark:{}; 
outputWatermark:{}",
+value.getTimestamp(), key, window, 
timerInternals.currentInputWatermarkTime(),
+timerInternals.currentOutputWatermarkTime());
+continue;
+  }
+
   W active = activeWindows.representative(window);
-  Preconditions.checkState(active != null, "Window %s should have been 
added", window);
+  Preconditions.checkState(active != null, "Window %s has no 
representative", window);
   windows.add(active);
 }
 
@@ -450,24 +464,13 @@ public class ReduceFnRunner {
   triggerRunner.prefetchForValue(window, directContext.state());
 }
 
-// Process the element for each (representative) window it belongs to.
+// Process the element for each (representative, not closed) window it 
belongs to.
 for (W window : windows) {
   ReduceFn.ProcessValueContext directContext = 
contextFactory.forValue(
   window, value.getValue(), value.getTimestamp(), StateStyle.DIRECT);
   ReduceFn.ProcessValueContext renamedContext = 
contextFactory.forValue(
   window, value.getValue(), value.getTimestamp(), StateStyle.RENAMED);
 
-  // Check to see if the triggerRunner thinks the window is closed. If so, 
drop that window.
-  if (triggerRunner.isClosed(directContext.state())) {
-droppedDueToClosedWindow.addValue(1L);
-WindowTracing.debug(
-"ReduceFnRunner.processElement: Dropping element at {} for key:{}; 
window:{} "
-+ "since window is no longer active at inputWatermark:{}; 
outputWatermark:{}",
-value.getTimestamp(), key, window, 
timerInternals.currentInputWatermarkTime(),
-timerInternals.currentOutputWatermarkTime());
-continue;
-  }
-
   nonEmptyPanes.recordContent(renamedContext.state());
 
   // Make sure we've scheduled the end-of-window or garbage collection 
timer for this window.

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/27c6c795/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/util/ReduceFnRunnerTest.java
--
diff --git 

[GitHub] incubator-beam pull request: Repeatedly#onFire should clear all fi...

2016-03-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/110


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: This closes #110

2016-03-31 Thread kenn
This closes #110


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/164676bc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/164676bc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/164676bc

Branch: refs/heads/master
Commit: 164676bc7877bb0bb4e85381639b921b7f902234
Parents: e5bca60 2ad027f
Author: Kenneth Knowles 
Authored: Thu Mar 31 19:11:35 2016 -0700
Committer: Kenneth Knowles 
Committed: Thu Mar 31 19:11:35 2016 -0700

--
 .../sdk/transforms/windowing/Repeatedly.java|  4 +-
 .../transforms/windowing/RepeatedlyTest.java| 83 
 2 files changed, 85 insertions(+), 2 deletions(-)
--




Jenkins build is still unstable: beam_RunnableOnService_GoogleCloudDataflow » Apache Beam :: SDKs :: Java :: Core #13

2016-03-31 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: DebuggingWordCount now takes filter a...

2016-03-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/108


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: DebuggingWordCount now takes filter as an option

2016-03-31 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master 061e6b5d8 -> e5bca60de


DebuggingWordCount now takes filter as an option

Previously it was hard-coded as "Flourish|stomach".
Now it is a PipelineOption with that as the default.

This allows "breaking" the pipeline by mis-specifying the pattern
without changing the code.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/abb24cff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/abb24cff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/abb24cff

Branch: refs/heads/master
Commit: abb24cff479b714fec8a61d85af18bdea0a6aa16
Parents: 061e6b5
Author: bchambers 
Authored: Thu Mar 31 15:18:09 2016 -0700
Committer: bchambers 
Committed: Thu Mar 31 17:55:07 2016 -0700

--
 .../dataflow/examples/DebuggingWordCount.java   | 20 +--
 .../src/main/java/DebuggingWordCount.java   | 21 ++--
 2 files changed, 37 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/abb24cff/examples/java/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java
--
diff --git 
a/examples/java/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java
 
b/examples/java/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java
index 1f76181..331d7c6 100644
--- 
a/examples/java/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java
+++ 
b/examples/java/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java
@@ -17,9 +17,10 @@
  */
 package com.google.cloud.dataflow.examples;
 
-import com.google.cloud.dataflow.examples.WordCount.WordCountOptions;
 import com.google.cloud.dataflow.sdk.Pipeline;
 import com.google.cloud.dataflow.sdk.io.TextIO;
+import com.google.cloud.dataflow.sdk.options.Default;
+import com.google.cloud.dataflow.sdk.options.Description;
 import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
 import com.google.cloud.dataflow.sdk.testing.DataflowAssert;
 import com.google.cloud.dataflow.sdk.transforms.Aggregator;
@@ -151,6 +152,21 @@ public class DebuggingWordCount {
 }
   }
 
+  /**
+   * Options supported by {@link DebuggingWordCount}.
+   *
+   * Inherits standard configuration options and all options defined in
+   * {@link WordCount.WordCountOptions}.
+   */
+  public static interface WordCountOptions extends WordCount.WordCountOptions {
+
+@Description("Regex filter pattern to use in DebuggingWordCount. "
++ "Only words matching this pattern will be counted.")
+@Default.String("Flourish|stomach")
+String getFilterPattern();
+void setFilterPattern(String value);
+  }
+
   public static void main(String[] args) {
 WordCountOptions options = 
PipelineOptionsFactory.fromArgs(args).withValidation()
   .as(WordCountOptions.class);
@@ -159,7 +175,7 @@ public class DebuggingWordCount {
 PCollection> filteredWords =
 p.apply(TextIO.Read.named("ReadLines").from(options.getInputFile()))
  .apply(new WordCount.CountWords())
- .apply(ParDo.of(new FilterTextFn("Flourish|stomach")));
+ .apply(ParDo.of(new FilterTextFn(options.getFilterPattern(;
 
 /**
  * Concept #4: DataflowAssert is a set of convenient PTransforms in the 
style of

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/abb24cff/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/src/main/java/DebuggingWordCount.java
--
diff --git 
a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/src/main/java/DebuggingWordCount.java
 
b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/src/main/java/DebuggingWordCount.java
index 905670f..32fca4e 100644
--- 
a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/src/main/java/DebuggingWordCount.java
+++ 
b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/src/main/java/DebuggingWordCount.java
@@ -17,9 +17,11 @@
  */
 package ${package};
 
-import ${package}.WordCount.WordCountOptions;
+import ${package}.WordCount;
 import com.google.cloud.dataflow.sdk.Pipeline;
 import com.google.cloud.dataflow.sdk.io.TextIO;
+import com.google.cloud.dataflow.sdk.options.Default;
+import com.google.cloud.dataflow.sdk.options.Description;
 import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
 import com.google.cloud.dataflow.sdk.testing.DataflowAssert;
 import com.google.cloud.dataflow.sdk.transforms.Aggregator;
@@ -150,6 +152,21 @@ public class 

[2/2] incubator-beam git commit: This closes #108

2016-03-31 Thread bchambers
This closes #108


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/e5bca60d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/e5bca60d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/e5bca60d

Branch: refs/heads/master
Commit: e5bca60de2e94c0ba33592533cea73605e39eaf3
Parents: 061e6b5 abb24cf
Author: bchambers 
Authored: Thu Mar 31 17:55:36 2016 -0700
Committer: bchambers 
Committed: Thu Mar 31 17:55:36 2016 -0700

--
 .../dataflow/examples/DebuggingWordCount.java   | 20 +--
 .../src/main/java/DebuggingWordCount.java   | 21 ++--
 2 files changed, 37 insertions(+), 4 deletions(-)
--




[GitHub] incubator-beam pull request: Minor tweaks to PULL_REQUEST_TEMPLATE...

2016-03-31 Thread bjchambers
GitHub user bjchambers opened a pull request:

https://github.com/apache/incubator-beam/pull/111

Minor tweaks to PULL_REQUEST_TEMPLATE.md

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable 
   Travis-CI for on your fork and ensure the whole test matrix passes).
 - [ ] Replace "" in the title with the actual Jira issue 
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
   
---

Fix a typo

Add a horizontal rule between the PR description contents and the
checkboxes/process content.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/incubator-beam tweak-template

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #111


commit 6f9c2c46a41e3521a58e4521da19025ab63be9f4
Author: bchambers 
Date:   2016-04-01T00:48:28Z

Update PULL_REQUEST_TEMPLATE

Fix a typo

Add a horizontal rule between the template content and the PR
description which is automatically filled in by Github. This
makes it easier to separate the two, and reduces the number
of edits to the template necessary to clean it up.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_RunnableOnService_GoogleCloudDataflow #12

2016-03-31 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_RunnableOnService_GoogleCloudDataflow » Apache Beam :: SDKs :: Java :: Core #12

2016-03-31 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: Replace unambiguous of `throw Throwab...

2016-03-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/70


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/3] incubator-beam git commit: Re-interrupt thread when handling InterruptedException

2016-03-31 Thread lcwik
Re-interrupt thread when handling InterruptedException


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/8b31c530
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/8b31c530
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/8b31c530

Branch: refs/heads/master
Commit: 8b31c5305b94b643af9c05be998c07f00819017e
Parents: 9f8712c
Author: Kenneth Knowles 
Authored: Thu Mar 31 09:41:51 2016 -0700
Committer: Luke Cwik 
Committed: Thu Mar 31 16:22:43 2016 -0700

--
 .../test/java/com/google/cloud/dataflow/sdk/util/GcsUtilTest.java   | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/8b31c530/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/util/GcsUtilTest.java
--
diff --git 
a/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/util/GcsUtilTest.java
 
b/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/util/GcsUtilTest.java
index 6017969..70fbe1a 100644
--- 
a/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/util/GcsUtilTest.java
+++ 
b/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/util/GcsUtilTest.java
@@ -152,6 +152,7 @@ public class GcsUtilTest {
   countDownLatches[currentLatch - 1].countDown();
 }
   } catch (InterruptedException e) {
+Thread.currentThread().interrupt();
 throw new RuntimeException(e);
   }
 }



[1/3] incubator-beam git commit: Replace unambiguous of `throw Throwables.propagate` with definition

2016-03-31 Thread lcwik
Repository: incubator-beam
Updated Branches:
  refs/heads/master cab0c57c0 -> 061e6b5d8


Replace unambiguous of `throw Throwables.propagate` with definition

In the SDK the path taken by Throwables.propagate is always statically
known, and the inlined logic is more explicit and readable:

 - If an exception e is already a checked exception, Throwables.propagate(e)
   is the same as `throw new RuntimeException(e)`.
 - If an exception e is already a RuntimeException or Error,
   Throwables.propagate(e) is the same as `throw e`.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/9f8712c3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/9f8712c3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/9f8712c3

Branch: refs/heads/master
Commit: 9f8712c325364d71d1e6cc38bc37d5dad996cb9b
Parents: cab0c57
Author: Kenneth Knowles 
Authored: Wed Mar 23 09:29:45 2016 -0700
Committer: Luke Cwik 
Committed: Thu Mar 31 16:22:42 2016 -0700

--
 .../sdk/options/PipelineOptionsFactory.java |  4 +--
 .../sdk/runners/inprocess/InProcessCreate.java  |  3 +--
 .../dataflow/sdk/util/MutationDetectors.java|  3 +--
 .../sdk/testing/RestoreSystemProperties.java|  4 +--
 .../cloud/dataflow/sdk/util/GcsUtilTest.java| 28 ++--
 5 files changed, 19 insertions(+), 23 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/9f8712c3/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/options/PipelineOptionsFactory.java
--
diff --git 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/options/PipelineOptionsFactory.java
 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/options/PipelineOptionsFactory.java
index fe53318..988d346 100644
--- 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/options/PipelineOptionsFactory.java
+++ 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/options/PipelineOptionsFactory.java
@@ -603,7 +603,7 @@ public class PipelineOptionsFactory {
 COMBINED_CACHE.put(combinedPipelineOptionsInterfaces,
 new Registration(allProxyClass, propertyDescriptors));
   } catch (IntrospectionException e) {
-throw Throwables.propagate(e);
+throw new RuntimeException(e);
   }
 }
 
@@ -618,7 +618,7 @@ public class PipelineOptionsFactory {
 INTERFACE_CACHE.put(iface,
 new Registration(proxyClass, propertyDescriptors));
   } catch (IntrospectionException e) {
-throw Throwables.propagate(e);
+throw new RuntimeException(e);
   }
 }
 @SuppressWarnings("unchecked")

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/9f8712c3/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/runners/inprocess/InProcessCreate.java
--
diff --git 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/runners/inprocess/InProcessCreate.java
 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/runners/inprocess/InProcessCreate.java
index c05c8c5..efda8fc 100644
--- 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/runners/inprocess/InProcessCreate.java
+++ 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/runners/inprocess/InProcessCreate.java
@@ -31,7 +31,6 @@ import com.google.cloud.dataflow.sdk.values.PCollection;
 import com.google.cloud.dataflow.sdk.values.PInput;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Optional;
-import com.google.common.base.Throwables;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Iterators;
 import com.google.common.collect.PeekingIterator;
@@ -77,7 +76,7 @@ class InProcessCreate extends ForwardingPTransform {
 try {
   source = new InMemorySource<>(original.getElements(), elementCoder);
 } catch (IOException e) {
-  throw Throwables.propagate(e);
+  throw new RuntimeException(e);
 }
 PCollection result = input.getPipeline().apply(Read.from(source));
 result.setCoder(elementCoder);

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/9f8712c3/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/MutationDetectors.java
--
diff --git 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/MutationDetectors.java
 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/MutationDetectors.java
index 5e4a9e9..e14c008 100644
--- 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/util/MutationDetectors.java
+++ 

[3/3] incubator-beam git commit: This closes #70

2016-03-31 Thread lcwik
This closes #70


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/061e6b5d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/061e6b5d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/061e6b5d

Branch: refs/heads/master
Commit: 061e6b5d8d3fe9e698e5eb59a1ddbc546cac3e0a
Parents: cab0c57 8b31c53
Author: Luke Cwik 
Authored: Thu Mar 31 16:23:23 2016 -0700
Committer: Luke Cwik 
Committed: Thu Mar 31 16:23:23 2016 -0700

--
 .../sdk/options/PipelineOptionsFactory.java |  4 +--
 .../sdk/runners/inprocess/InProcessCreate.java  |  3 +-
 .../dataflow/sdk/util/MutationDetectors.java|  3 +-
 .../sdk/testing/RestoreSystemProperties.java|  4 +--
 .../cloud/dataflow/sdk/util/GcsUtilTest.java| 29 ++--
 5 files changed, 20 insertions(+), 23 deletions(-)
--




[jira] [Commented] (BEAM-134) Investigate use of AutoValue

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220778#comment-15220778
 ] 

ASF GitHub Bot commented on BEAM-134:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/109

[BEAM-134] Example of AutoValue integration

This is a sample for [\[BEAM-134\] Investigate use of 
AutoValue](https://issues.apache.org/jira/browse/BEAM-134)

In the case of `IsmFormat`, there are four inner-classes used as immutable 
value types which can be simplified with AutoValue. Each of them demonstrate 
some interesting functionality:

* `KeyPrefix` is the simplest of the bunch, and it's implementation 
basically disappears after conversion.
* `IsmRecord` can act as two different types (value or metadata), and has 
validation logic in its getters they verify correct usage. In this case, we can 
keep the existing getter functionality and let AutoValue hook into separate 
package-private abstract properties. `IsmShard` is similar in this respect.
* `Footer` provides has custom logic in its `.toString()` to include a 
version string. For other value classes, AutoValue will generate a `toString()` 
implementation compatible with the equivalent `Objects.toStringHelper` version. 
In this case, we can keep the existing implementation and AutoValue won't 
override it.

As noted in the JIRA issue, I've identified 39 distinct classes which could 
be similarly converted. The benefits to converting are:
* Less boilerplate code to maintain.
* Eliminate a common source of bugs.
* Lower the barrier to writing immutable value types, which carry their own 
benefits.

I recommend we take this work.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam autovalue-example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #109


commit 880a0685e3ff8e235cb9f046c4097e3f797e9760
Author: Scott Wegner 
Date:   2016-03-31T22:04:26Z

Refactor IsmFormat value classes to use AutoValue




> Investigate use of AutoValue
> 
>
> Key: BEAM-134
> URL: https://issues.apache.org/jira/browse/BEAM-134
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Attachments: 
> 0001-Mark-classes-which-might-benefit-from-AutoValue.patch
>
>
> The initial PR for [BEAM-118] added a dependency on 
> [AutoValue|https://github.com/google/auto/tree/master/value#how-to-use-autovalue]
>  to auto-implement equality semantics for a new POJO. We decided to remove 
> the dependency because the cost of adding the dependency for this feature may 
> not be worth it for the value.
> However, we could use AutoValue for all of our POJO's, it might be worth it. 
> The proposal here is to follow-up with an investigation on whether we would 
> gain significant value to porting our code to use AutoValue instead of 
> hand-written POJO's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: DebuggingWordCount now takes filter a...

2016-03-31 Thread bjchambers
GitHub user bjchambers opened a pull request:

https://github.com/apache/incubator-beam/pull/108

DebuggingWordCount now takes filter as an option

Allow DebuggingWordCount to take the filter to be matched as an option.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable 
   Travis-CI for on your fork and ensure the whole test matrix passes).
 - [ ] Replace "" in the title with the actual Jira issue 
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
   

Previously it was hard-coded as "Flourish|stomach".
Now it is a PipelineOption with that as the default.

This allows "breaking" the pipeline by mis-specifying the pattern
without changing the code.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/incubator-beam debugging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #108


commit 374c6477f053b67e3edb4743168f82157628c558
Author: bchambers 
Date:   2016-03-31T22:18:09Z

DebuggingWordCount now takes filter as an option

Previously it was hard-coded as "Flourish|stomach".
Now it is a PipelineOption with that as the default.

This allows "breaking" the pipeline by mis-specifying the pattern
without changing the code.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-134) Investigate use of AutoValue

2016-03-31 Thread Scott Wegner (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner updated BEAM-134:
--
Description: 
The initial PR for [BEAM-118] added a dependency on 
[AutoValue|https://github.com/google/auto/tree/master/value#how-to-use-autovalue]
 to auto-implement equality semantics for a new POJO. We decided to remove the 
dependency because the cost of adding the dependency for this feature may not 
be worth it for the value.

However, we could use AutoValue for all of our POJO's, it might be worth it. 
The proposal here is to follow-up with an investigation on whether we would 
gain significant value to porting our code to use AutoValue instead of 
hand-written POJO's.

  was:
The initial PR for [BEAM-118] added a dependency on AutoValue to auto-implement 
equality semantics for a new POJO. We decided to remove the dependency because 
the cost of adding the dependency for this feature may not be worth it for the 
value.

However, we could use AutoValue for all of our POJO's, it might be worth it. 
The proposal here is to follow-up with an investigation on whether we would 
gain significant value to porting our code to use AutoValue instead of 
hand-written POJO's.


> Investigate use of AutoValue
> 
>
> Key: BEAM-134
> URL: https://issues.apache.org/jira/browse/BEAM-134
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
> Attachments: 
> 0001-Mark-classes-which-might-benefit-from-AutoValue.patch
>
>
> The initial PR for [BEAM-118] added a dependency on 
> [AutoValue|https://github.com/google/auto/tree/master/value#how-to-use-autovalue]
>  to auto-implement equality semantics for a new POJO. We decided to remove 
> the dependency because the cost of adding the dependency for this feature may 
> not be worth it for the value.
> However, we could use AutoValue for all of our POJO's, it might be worth it. 
> The proposal here is to follow-up with an investigation on whether we would 
> gain significant value to porting our code to use AutoValue instead of 
> hand-written POJO's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_RunnableOnService_GoogleCloudDataflow #11

2016-03-31 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_RunnableOnService_GoogleCloudDataflow » Apache Beam :: SDKs :: Java :: Core #11

2016-03-31 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: [BEAM-22] Clean up InProcess Read Eva...

2016-03-31 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/106

[BEAM-22] Clean up InProcess Read Evaluators

These are a couple of minor improvements to BoundedReadEvaluator
and UnboundedReadEvaluator that enables splitting a source at
evaluation time, as well as minor code cleanup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
ippr_cleaner_read_evaluators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #106


commit b83328c5b763c1974ad5a84d2870753e2078d1ee
Author: Thomas Groh 
Date:   2016-03-31T17:40:37Z

Explicitly track the Source a ReadEvaluator is using

This permits use of sources that are not the initial source used in the
transform. BoundedSource#splitIntoBundles and
UnboundedSource#generateInitialSplits generate multiple source objects
for the same transform in order to permit parallelism.

commit 2f756cc06967afa6d49aae54296568e70145551d
Author: Thomas Groh 
Date:   2016-03-31T17:43:56Z

Use proper scoping, interfaces in BoundedReadEvaluator

Use BoundedReader instead of Reader.

contentsRemaining should be method-scoped not instance-scoped.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request: Optimize the Count CombineFn

2016-03-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/88


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: This closes #88

2016-03-31 Thread kenn
This closes #88


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/cab0c57c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/cab0c57c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/cab0c57c

Branch: refs/heads/master
Commit: cab0c57c0b9d9b15d593b50f8590b0cb379eb665
Parents: 96e286f 1792965
Author: Kenneth Knowles 
Authored: Thu Mar 31 13:24:05 2016 -0700
Committer: Kenneth Knowles 
Committed: Thu Mar 31 13:24:05 2016 -0700

--
 .../cloud/dataflow/sdk/transforms/Count.java| 81 +---
 1 file changed, 69 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/cab0c57c/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/transforms/Count.java
--
diff --cc 
sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/transforms/Count.java
index d5350bc,5ce4d2e..e3e7947
--- 
a/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/transforms/Count.java
+++ 
b/sdks/java/core/src/main/java/com/google/cloud/dataflow/sdk/transforms/Count.java
@@@ -1,23 -1,27 +1,28 @@@
  /*
 - * Copyright (C) 2015 Google Inc.
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements.  See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership.  The ASF licenses this file
 + * to you under the Apache License, Version 2.0 (the
 + * "License"); you may not use this file except in compliance
 + * with the License.  You may obtain a copy of the License at
   *
 - * Licensed under the Apache License, Version 2.0 (the "License"); you may not
 - * use this file except in compliance with the License. You may obtain a copy 
of
 - * the License at
 - *
 - * http://www.apache.org/licenses/LICENSE-2.0
 + * http://www.apache.org/licenses/LICENSE-2.0
   *
   * Unless required by applicable law or agreed to in writing, software
 - * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 - * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 - * License for the specific language governing permissions and limitations 
under
 - * the License.
 + * distributed under the License is distributed on an "AS IS" BASIS,
 + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 + * See the License for the specific language governing permissions and
 + * limitations under the License.
   */
 -
  package com.google.cloud.dataflow.sdk.transforms;
  
+ import com.google.cloud.dataflow.sdk.coders.Coder;
+ import com.google.cloud.dataflow.sdk.coders.CoderException;
+ import com.google.cloud.dataflow.sdk.coders.CoderRegistry;
+ import com.google.cloud.dataflow.sdk.coders.CustomCoder;
  import com.google.cloud.dataflow.sdk.transforms.Combine.CombineFn;
+ import com.google.cloud.dataflow.sdk.util.VarInt;
  import com.google.cloud.dataflow.sdk.values.KV;
  import com.google.cloud.dataflow.sdk.values.PCollection;
  



[jira] [Comment Edited] (BEAM-162) assert fail using session windows

2016-03-31 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220413#comment-15220413
 ] 

Kenneth Knowles edited comment on BEAM-162 at 3/31/16 6:34 PM:
---

I agree with your PR, but I think the cause is different than described. If it 
can actually be caused by that, we have a second bug that is not as obvious.


was (Author: kenn):
I agree with your PR, but I think the cause is different than described.

> assert fail using session windows
> -
>
> Key: BEAM-162
> URL: https://issues.apache.org/jira/browse/BEAM-162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> java.lang.IllegalStateException: Window 
> [2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
> at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
> at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
> at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
> at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
> at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
> at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-162) assert fail using session windows

2016-03-31 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220413#comment-15220413
 ] 

Kenneth Knowles commented on BEAM-162:
--

I agree with your PR, but I think the cause is different than described.

> assert fail using session windows
> -
>
> Key: BEAM-162
> URL: https://issues.apache.org/jira/browse/BEAM-162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> java.lang.IllegalStateException: Window 
> [2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
> at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
> at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
> at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
> at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
> at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
> at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-162) assert fail using session windows

2016-03-31 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220403#comment-15220403
 ] 

Kenneth Knowles commented on BEAM-162:
--

I meant should in the normative sense, of course.

> assert fail using session windows
> -
>
> Key: BEAM-162
> URL: https://issues.apache.org/jira/browse/BEAM-162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> java.lang.IllegalStateException: Window 
> [2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
> at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
> at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
> at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
> at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
> at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
> at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-162) assert fail using session windows

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220383#comment-15220383
 ] 

ASF GitHub Bot commented on BEAM-162:
-

GitHub user mshields822 opened a pull request:

https://github.com/apache/incubator-beam/pull/105

[BEAM-162] Check for closed windows early

Rather than squash speculation on the bug I'll post this early.

Will do unit test and try to get more randomization into NexMark Q11 so we 
can tease out these rare failures before our customers do.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mshields822/incubator-beam beam-162

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #105


commit 11ab1c69d8c0cbe09e3efe62a76520fd5dc30550
Author: Mark Shields 
Date:   2016-03-31T17:57:55Z

Check for closed windows early




> assert fail using session windows
> -
>
> Key: BEAM-162
> URL: https://issues.apache.org/jira/browse/BEAM-162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> java.lang.IllegalStateException: Window 
> [2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
> at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
> at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
> at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
> at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
> at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
> at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-162) assert fail using session windows

2016-03-31 Thread Mark Shields (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220386#comment-15220386
 ] 

Mark Shields commented on BEAM-162:
---

https://github.com/apache/incubator-beam/pull/105

> assert fail using session windows
> -
>
> Key: BEAM-162
> URL: https://issues.apache.org/jira/browse/BEAM-162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> java.lang.IllegalStateException: Window 
> [2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
> at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
> at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
> at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
> at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
> at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
> at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-162] Check for closed windows e...

2016-03-31 Thread mshields822
GitHub user mshields822 opened a pull request:

https://github.com/apache/incubator-beam/pull/105

[BEAM-162] Check for closed windows early

Rather than squash speculation on the bug I'll post this early.

Will do unit test and try to get more randomization into NexMark Q11 so we 
can tease out these rare failures before our customers do.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mshields822/incubator-beam beam-162

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #105


commit 11ab1c69d8c0cbe09e3efe62a76520fd5dc30550
Author: Mark Shields 
Date:   2016-03-31T17:57:55Z

Check for closed windows early




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-162) assert fail using session windows

2016-03-31 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220369#comment-15220369
 ] 

Kenneth Knowles commented on BEAM-162:
--

Elaborating on Ben's comment, only the trigger's {{onFire()}} should be capable 
of closing a window.

> assert fail using session windows
> -
>
> Key: BEAM-162
> URL: https://issues.apache.org/jira/browse/BEAM-162
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> java.lang.IllegalStateException: Window 
> [2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
> at 
> com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
> at 
> com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
> at 
> com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
> at 
> com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
> at 
> com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
> at 
> com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
> at 
> com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
> at 
> com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-163) Warn the user when a PCollection (via Window.into) has smaller allowedLateness than upstream

2016-03-31 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-163:
-
Description: In this situation, data may be dropped downstream even though 
it was allowed upstream. It is probably quite rare that this is desirable.

> Warn the user when a PCollection (via Window.into) has smaller 
> allowedLateness than upstream
> 
>
> Key: BEAM-163
> URL: https://issues.apache.org/jira/browse/BEAM-163
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: windowing
>
> In this situation, data may be dropped downstream even though it was allowed 
> upstream. It is probably quite rare that this is desirable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-163) Warn the user when a PCollection (via Window.into) has smaller allowedLateness than upstream

2016-03-31 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-163:


 Summary: Warn the user when a PCollection (via Window.into) has 
smaller allowedLateness than upstream
 Key: BEAM-163
 URL: https://issues.apache.org/jira/browse/BEAM-163
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Kenneth Knowles
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-162) assert fail using session windows

2016-03-31 Thread Mark Shields (JIRA)
Mark Shields created BEAM-162:
-

 Summary: assert fail using session windows
 Key: BEAM-162
 URL: https://issues.apache.org/jira/browse/BEAM-162
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Mark Shields
Assignee: Frances Perry


java.lang.IllegalStateException: Window 
[2016-03-31T05:35:31.158Z..2016-03-31T06:05:31.158Z) should have been added
at 
com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:199)
at 
com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElement(ReduceFnRunner.java:440)
at 
com.google.cloud.dataflow.sdk.util.ReduceFnRunner.processElements(ReduceFnRunner.java:282)
at 
com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowViaWindowSetDoFn.processElement(GroupAlsoByWindowViaWindowSetDoFn.java:83)
at 
com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
at 
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
at 
com.google.cloud.dataflow.sdk.util.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:67)
at 
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:191)
at 
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
at 
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
at 
com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-68) Support for limiting parallelism of a step

2016-03-31 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220089#comment-15220089
 ] 

Eugene Kirpichov commented on BEAM-68:
--

I re-duped BEAM-169 against BEAM-92.
- I agree that it won't work with empty K, however that should be relatively 
unlikely if the user has enough data that sharding it makes sense, and if their 
hash function is good; and in some sharded sink scenarios it may be desirable 
to not write empty shards.
- I'm not sure what you mean by "won't scale": individual shards have to be 
written sequentially, but they can be written in parallel with each other in 
this proposed implementation: dynamic rebalancing will separate the shard keys 
from each other.

> Support for limiting parallelism of a step
> --
>
> Key: BEAM-68
> URL: https://issues.apache.org/jira/browse/BEAM-68
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Daniel Halperin
>
> Users may want to limit the parallelism of a step. Two classic uses cases are:
> - User wants to produce at most k files, so sets 
> TextIO.Write.withNumShards(k).
> - External API only supports k QPS, so user sets a limit of k/(expected 
> QPS/step) on the ParDo that makes the API call.
> Unfortunately, there is no way to do this effectively within the Beam model. 
> A GroupByKey with exactly k keys will guarantee that only k elements are 
> produced, but runners are free to break fusion in ways that each element may 
> be processed in parallel later.
> To implement this functionaltiy, I believe we need to add this support to the 
> Beam Model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (BEAM-159) Support fixed number of shards in custom sinks

2016-03-31 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reopened BEAM-159:
---

Reopening to dupe against BEAM-92 instead, per discussion in BEAM-68.

> Support fixed number of shards in custom sinks
> --
>
> Key: BEAM-159
> URL: https://issues.apache.org/jira/browse/BEAM-159
> Project: Beam
>  Issue Type: Improvement
>Reporter: Eugene Kirpichov
>
> TextIO supports .withNumShards, however custom sinks, in particular 
> FileBasedSinks, provide no support for controlling sharding. Some users want 
> this, e.g. 
> http://stackoverflow.com/questions/36316304/set-num-of-output-shard-in-write-tosink-in-dataflow



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-158] add support for bounded so...

2016-03-31 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/incubator-beam/pull/104

[BEAM-158] add support for bounded sources in streaming

Apart from a few improvements, this PR introduces bounded sources in 
streaming. The BoundedSource wrapper (`SourceInputFormat`) is the same as for 
the batch part of the runner. The translator assigns ingestion time watermarks 
and processing time timestamps upon reading from the source. We could make this 
more flexible in terms of watermark generation if we had an UnboundedSource 
wrapper for a BoundedSource.

Perhaps we could have common utility for runners to deal with serialization 
of PipelineOptions. At some point, they have to be shipped. I had to change the 
serialization code because I was experiencing a serialization bug which led to 
a serialization loop. Debugging this was almost impossible because the stack 
trace doesn't show all serialization calls due to some magic in the VM. I 
didn't find any cyclic references between the PipelineOptions and Flink 
components. I'm assuming this is a bug and the workaround using byte array 
serialization of the options is fair enough. See `SourceInputFormat`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/incubator-beam BEAM-158

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #104


commit f26674e7a7b30ee3f992edfc8e473df2a7ee3e80
Author: Maximilian Michels 
Date:   2016-03-30T14:43:04Z

[flink] improve InputFormat wrapper and ReadSourceITCase

commit 03404c7f4a5656bb5c5c0a2510f12e33292fef01
Author: Maximilian Michels 
Date:   2016-03-30T17:05:27Z

[flink] improvements to UnboundedSource translation

commit de574136a5b4ba9b75231b321d0190e23af3bac2
Author: Maximilian Michels 
Date:   2016-03-31T08:18:01Z

[BEAM-158] add support for bounded sources in streaming




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-158) Add support for bounded sources in streaming mode

2016-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219884#comment-15219884
 ] 

ASF GitHub Bot commented on BEAM-158:
-

GitHub user mxm opened a pull request:

https://github.com/apache/incubator-beam/pull/104

[BEAM-158] add support for bounded sources in streaming

Apart from a few improvements, this PR introduces bounded sources in 
streaming. The BoundedSource wrapper (`SourceInputFormat`) is the same as for 
the batch part of the runner. The translator assigns ingestion time watermarks 
and processing time timestamps upon reading from the source. We could make this 
more flexible in terms of watermark generation if we had an UnboundedSource 
wrapper for a BoundedSource.

Perhaps we could have common utility for runners to deal with serialization 
of PipelineOptions. At some point, they have to be shipped. I had to change the 
serialization code because I was experiencing a serialization bug which led to 
a serialization loop. Debugging this was almost impossible because the stack 
trace doesn't show all serialization calls due to some magic in the VM. I 
didn't find any cyclic references between the PipelineOptions and Flink 
components. I'm assuming this is a bug and the workaround using byte array 
serialization of the options is fair enough. See `SourceInputFormat`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/incubator-beam BEAM-158

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #104


commit f26674e7a7b30ee3f992edfc8e473df2a7ee3e80
Author: Maximilian Michels 
Date:   2016-03-30T14:43:04Z

[flink] improve InputFormat wrapper and ReadSourceITCase

commit 03404c7f4a5656bb5c5c0a2510f12e33292fef01
Author: Maximilian Michels 
Date:   2016-03-30T17:05:27Z

[flink] improvements to UnboundedSource translation

commit de574136a5b4ba9b75231b321d0190e23af3bac2
Author: Maximilian Michels 
Date:   2016-03-31T08:18:01Z

[BEAM-158] add support for bounded sources in streaming




> Add support for bounded sources in streaming mode
> -
>
> Key: BEAM-158
> URL: https://issues.apache.org/jira/browse/BEAM-158
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[3/3] incubator-beam git commit: This closes #94.

2016-03-31 Thread mxm
This closes #94.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/96e286fe
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/96e286fe
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/96e286fe

Branch: refs/heads/master
Commit: 96e286fec758bb451ff383e6e7c3f2b5bb0cb840
Parents: 0c47cad 63a7c3d
Author: Maximilian Michels 
Authored: Thu Mar 31 11:11:09 2016 +0200
Committer: Maximilian Michels 
Committed: Thu Mar 31 11:11:09 2016 +0200

--
 .../FlinkGroupAlsoByWindowWrapper.java  | 22 +---
 1 file changed, 14 insertions(+), 8 deletions(-)
--




[1/3] incubator-beam git commit: [flink] improve lifecycle handling of GroupAlsoByWindowWrapper

2016-03-31 Thread mxm
Repository: incubator-beam
Updated Branches:
  refs/heads/master 0c47cad48 -> 96e286fec


[flink] improve lifecycle handling of GroupAlsoByWindowWrapper


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/033b9240
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/033b9240
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/033b9240

Branch: refs/heads/master
Commit: 033b9240765543438068c1adea6d0cff34ddcd53
Parents: 17863c8
Author: Maximilian Michels 
Authored: Mon Mar 28 11:31:38 2016 +0200
Committer: Maximilian Michels 
Committed: Wed Mar 30 11:31:56 2016 +0200

--
 .../wrappers/streaming/FlinkGroupAlsoByWindowWrapper.java  | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/033b9240/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkGroupAlsoByWindowWrapper.java
--
diff --git 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkGroupAlsoByWindowWrapper.java
 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkGroupAlsoByWindowWrapper.java
index b413d7a..751d44c 100644
--- 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkGroupAlsoByWindowWrapper.java
+++ 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/FlinkGroupAlsoByWindowWrapper.java
@@ -220,6 +220,7 @@ public class FlinkGroupAlsoByWindowWrapper
   public void open() throws Exception {
 super.open();
 this.context = new ProcessContext(operator, new 
TimestampedCollector<>(output), this.timerInternals);
+operator.startBundle(context);
   }
 
   /**
@@ -252,11 +253,7 @@ public class FlinkGroupAlsoByWindowWrapper
 
   private void processKeyedWorkItem(KeyedWorkItem workItem) throws 
Exception {
 context.setElement(workItem, getStateInternalsForKey(workItem.key()));
-
-// TODO: Ideally startBundle/finishBundle would be called when the 
operator is first used / about to be discarded.
-operator.startBundle(context);
 operator.processElement(context);
-operator.finishBundle(context);
   }
 
   @Override
@@ -309,6 +306,7 @@ public class FlinkGroupAlsoByWindowWrapper
 
   @Override
   public void close() throws Exception {
+operator.finishBundle(context);
 super.close();
   }
 



[GitHub] incubator-beam pull request: Cassandra source and sink connector c...

2016-03-31 Thread TechM-Google
GitHub user TechM-Google opened a pull request:

https://github.com/apache/incubator-beam/pull/103

Cassandra source and sink connector classes

[BEAM-] Description of pull request

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable 
   Travis-CI for on your fork and ensure the whole test matrix passes).
 - [ ] Replace "" in the title with the actual Jira issue 
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
   



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TechM-Google/incubator-beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #103


commit 16e104cd0cc7859f346315f63061f3f5125a654c
Author: unknown 
Date:   2016-03-31T09:24:28Z

Cassandra source and sink connector classes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---