[GitHub] incubator-beam pull request #458: Rename DoFnTester#processBatch to processB...

2016-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/458


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request #467: [BEAM-243] Add peekOutputValuesInWindow to...

2016-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/467


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/3] incubator-beam git commit: Use TimestampedValue in DoFnTester

2016-06-14 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master 774944014 -> 2b269559f


Use TimestampedValue in DoFnTester

This removes the duplicate OutputElementWithTimestamp data structure.
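
For illustration, a minimal sketch (not part of this commit) of how the tester
API reads after this change. FormatFn, the element values, and the class name
are hypothetical; the DoFn override style is the one the SDK used at the time.

    import java.util.List;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFnTester;
    import org.apache.beam.sdk.values.TimestampedValue;

    public class DoFnTesterTimestampedValueSketch {
      // Hypothetical DoFn for illustration: emits its input as a String.
      static class FormatFn extends DoFn<Long, String> {
        @Override
        public void processElement(ProcessContext c) {
          c.output(Long.toString(c.element()));
        }
      }

      public static void main(String[] args) throws Exception {
        DoFnTester<Long, String> tester = DoFnTester.of(new FormatFn());
        tester.processElement(42L);

        // peekOutputElementsWithTimestamp() now returns the SDK's
        // TimestampedValue instead of the removed holder class.
        List<TimestampedValue<String>> outputs =
            tester.peekOutputElementsWithTimestamp();
        for (TimestampedValue<String> tv : outputs) {
          System.out.println(tv.getValue() + " @ " + tv.getTimestamp());
        }
      }
    }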


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/4da5ebfb
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/4da5ebfb
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/4da5ebfb

Branch: refs/heads/master
Commit: 4da5ebfbf021051288620634ec84cafa9208265c
Parents: 7749440
Author: Thomas Groh 
Authored: Tue Jun 14 13:39:59 2016 -0700
Committer: Thomas Groh 
Committed: Tue Jun 14 13:39:59 2016 -0700

--
 .../apache/beam/sdk/transforms/DoFnTester.java  | 56 
 .../beam/sdk/transforms/DoFnTesterTest.java | 22 
 2 files changed, 20 insertions(+), 58 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/4da5ebfb/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
index 332ea13..1df42e2 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
@@ -28,12 +28,12 @@ import org.apache.beam.sdk.util.SerializableUtils;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.util.WindowingInternals;
 import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.TimestampedValue;
 import org.apache.beam.sdk.values.TupleTag;
 import org.apache.beam.sdk.values.TupleTagList;
 
 import com.google.common.base.Function;
 import com.google.common.base.MoreObjects;
-import com.google.common.base.Objects;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
 
@@ -256,10 +256,10 @@ public class DoFnTester {
 // TODO: Should we return an unmodifiable list?
 return Lists.transform(
 peekOutputElementsWithTimestamp(),
-new Function<OutputElementWithTimestamp, OutputT>() {
+new Function<TimestampedValue<OutputT>, OutputT>() {
   @Override
   @SuppressWarnings("unchecked")
-  public OutputT apply(OutputElementWithTimestamp input) {
+  public OutputT apply(TimestampedValue<OutputT> input) {
 return input.getValue();
   }
 });
@@ -274,16 +274,14 @@ public class DoFnTester {
* @see #clearOutputElements
*/
   @Experimental
-  public List<OutputElementWithTimestamp> peekOutputElementsWithTimestamp() {
+  public List<TimestampedValue<OutputT>> peekOutputElementsWithTimestamp() {
 // TODO: Should we return an unmodifiable list?
 return Lists.transform(getOutput(mainOutputTag),
-new Function<Object, OutputElementWithTimestamp>() {
+new Function<WindowedValue<OutputT>, TimestampedValue<OutputT>>() {
   @Override
   @SuppressWarnings("unchecked")
-  public OutputElementWithTimestamp apply(Object input) {
-return new OutputElementWithTimestamp(
-((WindowedValue<OutputT>) input).getValue(),
-((WindowedValue<OutputT>) input).getTimestamp());
+  public TimestampedValue<OutputT> apply(WindowedValue<OutputT> input) {
+return TimestampedValue.of(input.getValue(), input.getTimestamp());
   }
 });
   }
@@ -318,8 +316,8 @@ public class DoFnTester {
* @see #clearOutputElements
*/
   @Experimental
-  public List<OutputElementWithTimestamp> takeOutputElementsWithTimestamp() {
-List<OutputElementWithTimestamp> resultElems =
+  public List<TimestampedValue<OutputT>> takeOutputElementsWithTimestamp() {
+List<TimestampedValue<OutputT>> resultElems =
 new ArrayList<>(peekOutputElementsWithTimestamp());
 clearOutputElements();
 return resultElems;
@@ -383,42 +381,6 @@ public class DoFnTester {
 return combiner.extractOutput(accumulator);
   }
 
-  /**
-   * Holder for an OutputElement along with its associated timestamp.
-   */
-  @Experimental
-  public static class OutputElementWithTimestamp<OutputT> {
-private final OutputT value;
-private final Instant timestamp;
-
-OutputElementWithTimestamp(OutputT value, Instant timestamp) {
-  this.value = value;
-  this.timestamp = timestamp;
-}
-
-OutputT getValue() {
-  return value;
-}
-
-Instant getTimestamp() {
-  return timestamp;
-}
-
-@Override
-public boolean equals(Object obj) {
-  if (!(obj instanceof OutputElementWithTimestamp)) {
-return false;

[GitHub] incubator-beam pull request #462: Use TimestampedValue in DoFnTester

2016-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/462




[3/3] incubator-beam git commit: This closes #467

2016-06-14 Thread kenn
This closes #467


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/2b269559
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/2b269559
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/2b269559

Branch: refs/heads/master
Commit: 2b269559f3afa11a1defb74de8d33075fdc90274
Parents: 7749440 c627fa8
Author: Kenn Knowles 
Authored: Tue Jun 14 20:18:19 2016 -0700
Committer: Kenn Knowles 
Committed: Tue Jun 14 20:18:19 2016 -0700

--
 .../apache/beam/sdk/transforms/DoFnTester.java  | 81 
 .../beam/sdk/transforms/DoFnTesterTest.java | 44 ---
 2 files changed, 67 insertions(+), 58 deletions(-)
--




[2/3] incubator-beam git commit: Add DoFnTester#peekOutputValuesInWindow

2016-06-14 Thread kenn
Add DoFnTester#peekOutputValuesInWindow

This permits DoFns that interact with windowing to test the windowed,
rather than overall output.
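
For illustration, a hedged sketch (not part of this commit) of what the new
per-window peek enables. StampFn and its timestamps are hypothetical; the
GlobalWindow/IntervalWindow behaviour mirrors the test added below.

    import java.util.List;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFnTester;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
    import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
    import org.apache.beam.sdk.values.TimestampedValue;
    import org.joda.time.Instant;

    public class PeekOutputElementsInWindowSketch {
      // Hypothetical DoFn: emits each input as a String, timestamped at
      // input * 1000 milliseconds.
      static class StampFn extends DoFn<Long, String> {
        @Override
        public void processElement(ProcessContext c) {
          c.outputWithTimestamp(
              Long.toString(c.element()), new Instant(c.element() * 1000));
        }
      }

      public static void main(String[] args) throws Exception {
        DoFnTester<Long, String> tester = DoFnTester.of(new StampFn());
        tester.processElement(1L);
        tester.processElement(2L);

        // Output that is not re-windowed lands in the global window...
        List<TimestampedValue<String>> global =
            tester.peekOutputElementsInWindow(GlobalWindow.INSTANCE);

        // ...while a window the output was never assigned to comes back empty.
        List<TimestampedValue<String>> empty =
            tester.peekOutputElementsInWindow(
                new IntervalWindow(new Instant(0L), new Instant(10L)));

        System.out.println(global);
        System.out.println(empty);
      }
    }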


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/c627fa84
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/c627fa84
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/c627fa84

Branch: refs/heads/master
Commit: c627fa847fe649b4308c2becbe101dcb0a945843
Parents: 4da5ebf
Author: Thomas Groh 
Authored: Tue Jun 14 13:18:41 2016 -0700
Committer: Thomas Groh 
Committed: Tue Jun 14 18:25:23 2016 -0700

--
 .../apache/beam/sdk/transforms/DoFnTester.java  | 25 
 .../beam/sdk/transforms/DoFnTesterTest.java | 22 +
 2 files changed, 47 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c627fa84/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
index 1df42e2..415af95 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
@@ -34,6 +34,7 @@ import org.apache.beam.sdk.values.TupleTagList;
 
 import com.google.common.base.Function;
 import com.google.common.base.MoreObjects;
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
 
@@ -287,6 +288,30 @@ public class DoFnTester {
   }
 
   /**
+   * Returns the elements output so far to the main output in the provided 
window with associated
+   * timestamps.
+   */
+  public List<TimestampedValue<OutputT>> peekOutputElementsInWindow(BoundedWindow window) {
+return peekOutputElementsInWindow(mainOutputTag, window);
+  }
+
+  /**
+   * Returns the elements output so far to the specified output in the 
provided window with
+   * associated timestamps.
+   */
+  public List<TimestampedValue<OutputT>> peekOutputElementsInWindow(
+  TupleTag<OutputT> tag,
+  BoundedWindow window) {
+ImmutableList.Builder<TimestampedValue<OutputT>> valuesBuilder = ImmutableList.builder();
+for (WindowedValue<OutputT> value : getOutput(tag)) {
+  if (value.getWindows().contains(window)) {
+valuesBuilder.add(TimestampedValue.of(value.getValue(), value.getTimestamp()));
+  }
+}
+return valuesBuilder.build();
+  }
+
+  /**
* Clears the record of the elements output so far to the main output.
*
* @see #peekOutputElements

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/c627fa84/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java
index 490ed7f..3261f85 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/DoFnTesterTest.java
@@ -17,14 +17,18 @@
  */
 package org.apache.beam.sdk.transforms;
 
+import static org.hamcrest.Matchers.containsInAnyOrder;
 import static org.hamcrest.Matchers.equalTo;
 import static org.hamcrest.Matchers.hasItems;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertThat;
 import static org.junit.Assert.assertTrue;
 
+import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
+import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
 import org.apache.beam.sdk.values.TimestampedValue;
 
+import org.hamcrest.Matchers;
 import org.joda.time.Instant;
 import org.junit.Test;
 import org.junit.runner.RunWith;
@@ -205,6 +209,24 @@ public class DoFnTesterTest {
 assertThat(aggValue, equalTo(1L + 2L));
   }
 
+  @Test
+  public void peekValuesInWindow() throws Exception {
+CounterDoFn fn = new CounterDoFn(1L, 2L);
+DoFnTester<Long, String> tester = DoFnTester.of(fn);
+
+tester.startBundle();
+tester.processElement(1L);
+tester.processElement(2L);
+tester.finishBundle();
+
+assertThat(tester.peekOutputElementsInWindow(GlobalWindow.INSTANCE),
+containsInAnyOrder(TimestampedValue.of("1", new Instant(1000L)),
+TimestampedValue.of("2", new Instant(2000L;
+assertThat(tester.peekOutputElementsInWindow(
+new IntervalWindow(new Instant(0L), new Instant(10L))),
+Matchers.emptyIterable());
+  }
+
   /**
   

[GitHub] incubator-beam pull request #467: Add peekOutputValuesInWindow to DoFnTester

2016-06-14 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/467

Add peekOutputValuesInWindow to DoFnTester

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This allows a test (especially GroupAlsoByWindowsProperties) to assert
the per-window contents of the DoFn's output.
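
As a rough illustration of the overload that takes a TupleTag, complementing
the single-argument sketch earlier in this thread: the per-window peek also
works for outputs other than the main one. The tag and DoFn here are
hypothetical, and it is assumed that DoFnTester records side outputs for such
tags the same way takeSideOutputElements does.

    import java.util.List;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFnTester;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
    import org.apache.beam.sdk.values.TimestampedValue;
    import org.apache.beam.sdk.values.TupleTag;

    public class PeekSideOutputInWindowSketch {
      // Hypothetical side-output tag and DoFn, for illustration only.
      static final TupleTag<String> MARKERS = new TupleTag<String>() {};

      static class TeeFn extends DoFn<Long, String> {
        @Override
        public void processElement(ProcessContext c) {
          c.output(Long.toString(c.element()));
          c.sideOutput(MARKERS, "saw " + c.element());
        }
      }

      public static void main(String[] args) throws Exception {
        DoFnTester<Long, String> tester = DoFnTester.of(new TeeFn());
        tester.processElement(3L);

        // Peek a single window of a specific (here: side) output.
        List<TimestampedValue<String>> markers =
            tester.peekOutputElementsInWindow(MARKERS, GlobalWindow.INSTANCE);
        System.out.println(markers);
      }
    }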

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam peek_windowed_values

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #467


commit 4da5ebfbf021051288620634ec84cafa9208265c
Author: Thomas Groh 
Date:   2016-06-14T20:39:59Z

Use TimestampedValue in DoFnTester

This removes the duplicate OutputElementWithTimestamp data structure.

commit 3ffc763a231e4b523f220aa6e05aad5c873cc4c7
Author: Thomas Groh 
Date:   2016-06-14T20:18:41Z

Add DoFnTester#peekOutputValuesInWindow

This permits DoFns that interact with windowing to test the windowed,
rather than overall output.






svn commit: r14006 - in /release/incubator/beam: ./ 0.1.0-incubating/

2016-06-14 Thread davor
Author: davor
Date: Wed Jun 15 00:09:45 2016
New Revision: 14006

Log:
Apache Beam, release 0.1.0-incubating.


Added:

release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip
   (with props)

release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc

release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.md5

release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.sha1

release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.md5

release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.sha1
release/incubator/beam/KEYS

Added: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip
==
Binary file - no diff available.

Propchange: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip
--
svn:mime-type = application/octet-stream

Added: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc
==
--- 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc
 (added)
+++ 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc
 Wed Jun 15 00:09:45 2016
@@ -0,0 +1,11 @@
+-BEGIN PGP SIGNATURE-
+Version: GnuPG v1
+
+iQEcBAABAgAGBQJXWH5tAAoJEMkEN+GPDTNPNwsIAI4RkZ5gnOw2uUV2DB9HIJFK
+FWNlQjh4ezXtRkV9dG3abp6z50QCR70zqOpFatIzI0kDJfIRH5UxVWAiqBC4JWVV
+IUPtGt7hsy1LkJbyY1HRsi9qXiOGYPePppNBJtmLMupO5yc0CuXNMOH6GrGEjsRm
+MQofrqvnpB0svAo1TmOIbqJYZUUQCT6Hrap2RQfTNO3jSekNpfkv/yWMy/o/h0hr
++c/dAFytoME6VMba3UcZfx0txt9HWRFls2qZ+CsAxhyqbX6U1HMOESZ1hJHTMnls
+2SypXxhGHj+t4QNs1eB0WRE4Tg8I3SO9Dw6X9sK/h76U+5tBlrTTsvFaghqENSQ=
+=iut1
+-END PGP SIGNATURE-

Added: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.md5
==
--- 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.md5
 (added)
+++ 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.md5
 Wed Jun 15 00:09:45 2016
@@ -0,0 +1 @@
+144b3cf880479395ac06395117f3a022
\ No newline at end of file

Added: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.sha1
==
--- 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.sha1
 (added)
+++ 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.asc.sha1
 Wed Jun 15 00:09:45 2016
@@ -0,0 +1 @@
+36f9838f5435434b488a39e7eb3f7c65f4284cf0
\ No newline at end of file

Added: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.md5
==
--- 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.md5
 (added)
+++ 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.md5
 Wed Jun 15 00:09:45 2016
@@ -0,0 +1 @@
+bdcc28e5adadd28a80e41985c1df2717
\ No newline at end of file

Added: 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.sha1
==
--- 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.sha1
 (added)
+++ 
release/incubator/beam/0.1.0-incubating/apache-beam-0.1.0-incubating-source-release.zip.sha1
 Wed Jun 15 00:09:45 2016
@@ -0,0 +1 @@
+40da1c1b3b529febfc0e5ebc5a20e84144b61651
\ No newline at end of file

Added: release/incubator/beam/KEYS
==
--- release/incubator/beam/KEYS (added)
+++ release/incubator/beam/KEYS Wed Jun 15 00:09:45 2016
@@ -0,0 +1,141 @@
+This file contains the PGP keys of various developers.
+
+Users: pgp < KEYS
+   gpg --import KEYS
+Developers:
+pgp -kxa <your name> and append it to this file.
+(pgpk -ll <your name> && pgpk -xa <your name>) >> this file.
+(gpg --list-sigs <your name>
+ && gpg --armor --export <your name>) >> this file.
+
+
+pub   4096R/C8282E76 2009-09-08
+uid  Jean-Baptiste Onofré 
+sig 3C8282E76 2009-09-08  Jean-Baptiste Onofré 
+sub   4096R/9F043BBC 2009-09-08
+sig  C8282E76 2009-09-08  Jean-Baptiste Onofré 
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Version: GnuPG v1
+

[GitHub] incubator-beam pull request #465: Fix type error in Eclipse

2016-06-14 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/465

Fix type error in Eclipse

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This type error occurs in my Eclipse installation. It apparently
does not bother the various JDKs we test with. But this is an
accurate typing, so it may help other Eclipse-using contributors,
too.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam type-nit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/465.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #465


commit df266a5e470e09774a52629c0433679d4d941c81
Author: Kenneth Knowles 
Date:   2016-06-14T23:12:11Z

Fix type error in Eclipse

This type error occurs in my Eclipse installation. It apparently
does not bother the various JDKs we test with. But this is an
accurate typing, so it may help other Eclipse-using contributors,
too.






[04/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/transforms/window.py
--
diff --git a/sdks/python/google/cloud/dataflow/transforms/window.py 
b/sdks/python/google/cloud/dataflow/transforms/window.py
deleted file mode 100644
index 6c0c2e8..000
--- a/sdks/python/google/cloud/dataflow/transforms/window.py
+++ /dev/null
@@ -1,383 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Windowing concepts.
-
-A WindowInto transform logically divides up or groups the elements of a
-PCollection into finite windows according to a windowing function (derived from
-WindowFn).
-
-The output of WindowInto contains the same elements as input, but they have 
been
-logically assigned to windows. The next GroupByKey(s) transforms, including one
-within a composite transform, will group by the combination of keys and 
windows.
-
-Windowing a PCollection allows chunks of it to be processed individually, 
before
-the entire PCollection is available.  This is especially important for
-PCollection(s) with unbounded size, since the full PCollection is never
-available at once, since more data is continually arriving. For PCollection(s)
-with a bounded size (aka. conventional batch mode), by default, all data is
-implicitly in a single window (see GlobalWindows), unless WindowInto is
-applied.
-
-For example, a simple form of windowing divides up the data into fixed-width
-time intervals, using FixedWindows.
-
-Seconds are used as the time unit for the built-in windowing primitives here.
-Integer or floating point seconds can be passed to these primitives.
-
-Internally, seconds, with microsecond granularity, are stored as
-timeutil.Timestamp and timeutil.Duration objects. This is done to avoid
-precision errors that would occur with floating point representations.
-
-Custom windowing function classes can be created, by subclassing from
-WindowFn.
-"""
-
-from __future__ import absolute_import
-
-from google.cloud.dataflow import coders
-from google.cloud.dataflow.transforms import timeutil
-from google.cloud.dataflow.transforms.timeutil import Duration
-from google.cloud.dataflow.transforms.timeutil import MAX_TIMESTAMP
-from google.cloud.dataflow.transforms.timeutil import MIN_TIMESTAMP
-from google.cloud.dataflow.transforms.timeutil import Timestamp
-
-
-# TODO(ccy): revisit naming and semantics once Java Apache Beam finalizes their
-# behavior.
-class OutputTimeFn(object):
-  """Determines how output timestamps of grouping operations are assigned."""
-
-  OUTPUT_AT_EOW = 'OUTPUT_AT_EOW'
-  OUTPUT_AT_EARLIEST = 'OUTPUT_AT_EARLIEST'
-  OUTPUT_AT_LATEST = 'OUTPUT_AT_LATEST'
-  OUTPUT_AT_EARLIEST_TRANSFORMED = 'OUTPUT_AT_EARLIEST_TRANSFORMED'
-
-  @staticmethod
-  def get_impl(output_time_fn, window_fn):
-if output_time_fn == OutputTimeFn.OUTPUT_AT_EOW:
-  return timeutil.OutputAtEndOfWindowImpl()
-elif output_time_fn == OutputTimeFn.OUTPUT_AT_EARLIEST:
-  return timeutil.OutputAtEarliestInputTimestampImpl()
-elif output_time_fn == OutputTimeFn.OUTPUT_AT_LATEST:
-  return timeutil.OutputAtLatestInputTimestampImpl()
-elif output_time_fn == OutputTimeFn.OUTPUT_AT_EARLIEST_TRANSFORMED:
-  return timeutil.OutputAtEarliestTransformedInputTimestampImpl(window_fn)
-else:
-  raise ValueError('Invalid OutputTimeFn: %s.' % output_time_fn)
-
-
-class WindowFn(object):
-  """An abstract windowing function defining a basic assign and merge."""
-
-  class AssignContext(object):
-"""Context passed to WindowFn.assign()."""
-
-def __init__(self, timestamp, element=None, existing_windows=None):
-  self.timestamp = Timestamp.of(timestamp)
-  self.element = element
-  self.existing_windows = existing_windows
-
-  def assign(self, assign_context):
-"""Associates a timestamp and set of windows to an element."""
-raise NotImplementedError
-
-  class MergeContext(object):
-"""Context passed to WindowFn.merge() to perform merging, if any."""
-
-def __init__(self, windows):
-  self.windows = list(windows)
-
-def merge(self, to_be_merged, merge_result):
-  raise NotImplementedError
-
-  def merge(self, merge_context):
-"""Returns a window that is the result of merging a set of windows."""
-raise NotImplementedError
-
-  def get_window_coder(self):
-return coders.PickleCoder()
-

[07/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/transforms/ptransform.py
--
diff --git a/sdks/python/google/cloud/dataflow/transforms/ptransform.py 
b/sdks/python/google/cloud/dataflow/transforms/ptransform.py
deleted file mode 100644
index 09f8015..000
--- a/sdks/python/google/cloud/dataflow/transforms/ptransform.py
+++ /dev/null
@@ -1,703 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""PTransform and descendants.
-
-A PTransform is an object describing (not executing) a computation. The actual
-execution semantics for a transform is captured by a runner object. A transform
-object always belongs to a pipeline object.
-
-A PTransform derived class needs to define the apply() method that describes
-how one or more PValues are created by the transform.
-
-The module defines a few standard transforms: FlatMap (parallel do),
-GroupByKey (group by key), etc. Note that the apply() methods for these
-classes contain code that will add nodes to the processing graph associated
-with a pipeline.
-
-As support for the FlatMap transform, the module also defines a DoFn
-class and wrapper class that allows lambda functions to be used as
-FlatMap processing functions.
-"""
-
-from __future__ import absolute_import
-
-import copy
-import inspect
-import operator
-import os
-import sys
-
-from google.cloud.dataflow import coders
-from google.cloud.dataflow import error
-from google.cloud.dataflow import pvalue
-from google.cloud.dataflow import typehints
-from google.cloud.dataflow.internal import pickler
-from google.cloud.dataflow.internal import util
-from google.cloud.dataflow.typehints import getcallargs_forhints
-from google.cloud.dataflow.typehints import TypeCheckError
-from google.cloud.dataflow.typehints import validate_composite_type_param
-from google.cloud.dataflow.typehints import WithTypeHints
-from google.cloud.dataflow.typehints.trivial_inference import instance_to_type
-
-
-class _PValueishTransform(object):
-  """Visitor for PValueish objects.
-
-  A PValueish is a PValue, or list, tuple, dict of PValueish objects.
-
-  This visits a PValueish, constructing a (possibly mutated) copy.
-  """
-  def visit(self, node, *args):
-return getattr(
-self,
-'visit_' + node.__class__.__name__,
-lambda x, *args: x)(node, *args)
-
-  def visit_list(self, node, *args):
-return [self.visit(x, *args) for x in node]
-
-  def visit_tuple(self, node, *args):
-return tuple(self.visit(x, *args) for x in node)
-
-  def visit_dict(self, node, *args):
-return {key: self.visit(value, *args) for (key, value) in node.items()}
-
-
-class _SetInputPValues(_PValueishTransform):
-  def visit(self, node, replacements):
-if id(node) in replacements:
-  return replacements[id(node)]
-else:
-  return super(_SetInputPValues, self).visit(node, replacements)
-
-
-class _MaterializedDoOutputsTuple(pvalue.DoOutputsTuple):
-  def __init__(self, deferred, pvalue_cache):
-super(_MaterializedDoOutputsTuple, self).__init__(
-None, None, deferred._tags, deferred._main_tag)
-self._deferred = deferred
-self._pvalue_cache = pvalue_cache
-
-  def __getitem__(self, tag):
-return self._pvalue_cache.get_unwindowed_pvalue(self._deferred[tag])
-
-
-class _MaterializePValues(_PValueishTransform):
-  def __init__(self, pvalue_cache):
-self._pvalue_cache = pvalue_cache
-
-  def visit(self, node):
-if isinstance(node, pvalue.PValue):
-  return self._pvalue_cache.get_unwindowed_pvalue(node)
-elif isinstance(node, pvalue.DoOutputsTuple):
-  return _MaterializedDoOutputsTuple(node, self._pvalue_cache)
-else:
-  return super(_MaterializePValues, self).visit(node)
-
-
-class GetPValues(_PValueishTransform):
-  def visit(self, node, pvalues=None):
-if pvalues is None:
-  pvalues = []
-  self.visit(node, pvalues)
-  return pvalues
-elif isinstance(node, (pvalue.PValue, pvalue.DoOutputsTuple)):
-  pvalues.append(node)
-else:
-  super(GetPValues, self).visit(node, pvalues)
-
-
-class ZipPValues(_PValueishTransform):
-  """Pairs each PValue in a pvalueish with a value in a parallel out sibling.
-
-  Sibling should have the same nested structure as pvalueish.  Leaves in
-  sibling are expanded across nested pvalueish lists, tuples, and dicts.

[jira] [Commented] (BEAM-341) ReduceFnRunner allows GC time overflow

2016-06-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330838#comment-15330838
 ] 

ASF GitHub Bot commented on BEAM-341:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/464

[BEAM-341] Fix ReduceFnRunner GC time overflow

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam gc-time

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #464


commit 26662378aaf7bb08dde12c6d0fa64f2b490a0d4d
Author: Kenneth Knowles 
Date:   2016-06-14T23:04:10Z

Add test for ReduceFnRunner GC time overflow




> ReduceFnRunner allows GC time overflow
> --
>
> Key: BEAM-341
> URL: https://issues.apache.org/jira/browse/BEAM-341
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> In {{ReduceFnRunner}}, any window ending after the global window has its GC 
> time capped to the end of the global window. But for windows ending before 
> the global window the allowed lateness can still be arbitrary, causing 
> overflow.
> http://stackoverflow.com/questions/37808159/why-am-i-getting-java-lang-illegalstateexception-on-google-dataflow
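
In other words (a hedged sketch, not the literal patch in #464): the
garbage-collection time is roughly the window's maximum timestamp plus the
allowed lateness, and for windows ending before the global window nothing
stops that sum from landing past the end of the global window, which the
timer machinery later rejects. Capping the sum avoids that. Method names
below are illustrative only.

    import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
    import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
    import org.joda.time.Duration;
    import org.joda.time.Instant;

    public class GcTimeOverflowSketch {
      // Uncapped version: maxTimestamp() + allowedLateness can pass the end
      // of the global window for a late-ending window with large lateness.
      static Instant uncappedGcTime(BoundedWindow window, Duration allowedLateness) {
        return window.maxTimestamp().plus(allowedLateness);
      }

      // Hedged sketch of the capped computation: never let the GC time pass
      // the end of the global window.
      static Instant cappedGcTime(BoundedWindow window, Duration allowedLateness) {
        Instant gcTime = window.maxTimestamp().plus(allowedLateness);
        Instant endOfTime = GlobalWindow.INSTANCE.maxTimestamp();
        return gcTime.isAfter(endOfTime) ? endOfTime : gcTime;
      }

      public static void main(String[] args) {
        IntervalWindow window = new IntervalWindow(
            new Instant(0L), GlobalWindow.INSTANCE.maxTimestamp());
        Duration lateness = Duration.standardDays(1000);
        System.out.println(uncappedGcTime(window, lateness)); // past end of time
        System.out.println(cappedGcTime(window, lateness));   // capped
      }
    }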



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[39/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/pickler.py
--
diff --git a/sdks/python/apache_beam/internal/pickler.py 
b/sdks/python/apache_beam/internal/pickler.py
new file mode 100644
index 000..00f7fc7
--- /dev/null
+++ b/sdks/python/apache_beam/internal/pickler.py
@@ -0,0 +1,205 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Pickler for values, functions, and classes.
+
+Pickles created by the pickling library contain non-ASCII characters, so
+we base64-encode the results so that we can put them in JSON objects.
+The pickler is used to embed FlatMap callable objects into the workflow JSON
+description.
+
+The pickler module should be used to pickle functions and modules; for values,
+the coders.*PickleCoder classes should be used instead.
+"""
+
+import base64
+import logging
+import sys
+import traceback
+import types
+
+import dill
+
+
+def is_nested_class(cls):
+  """Returns true if argument is a class object that appears to be nested."""
+  return (isinstance(cls, type)
+  and cls.__module__ != '__builtin__'
+  and cls.__name__ not in sys.modules[cls.__module__].__dict__)
+
+
+def find_containing_class(nested_class):
+  """Finds containing class of a nestec class passed as argument."""
+
+  def find_containing_class_inner(outer):
+for k, v in outer.__dict__.items():
+  if v is nested_class:
+return outer, k
+  elif isinstance(v, (type, types.ClassType)) and hasattr(v, '__dict__'):
+res = find_containing_class_inner(v)
+if res: return res
+
+  return find_containing_class_inner(sys.modules[nested_class.__module__])
+
+
+def _nested_type_wrapper(fun):
+  """A wrapper for the standard pickler handler for class objects.
+
+  Args:
+fun: Original pickler handler for type objects.
+
+  Returns:
+A wrapper for type objects that handles nested classes.
+
+  The wrapper detects if an object being pickled is a nested class object.
+  For nested class object only it will save the containing class object so
+  the nested structure is recreated during unpickle.
+  """
+
+  def wrapper(pickler, obj):
+# When the nested class is defined in the __main__ module we do not have to
+# do anything special because the pickler itself will save the constituent
+# parts of the type (i.e., name, base classes, dictionary) and then
+# recreate it during unpickling.
+if is_nested_class(obj) and obj.__module__ != '__main__':
+  containing_class_and_name = find_containing_class(obj)
+  if containing_class_and_name is not None:
+return pickler.save_reduce(
+getattr, containing_class_and_name, obj=obj)
+try:
+  return fun(pickler, obj)
+except dill.dill.PicklingError:
+  # pylint: disable=protected-access
+  return pickler.save_reduce(
+  dill.dill._create_type,
+  (type(obj), obj.__name__, obj.__bases__,
+   dill.dill._dict_from_dictproxy(obj.__dict__)),
+  obj=obj)
+  # pylint: enable=protected-access
+
+  return wrapper
+
+# Monkey patch the standard pickler dispatch table entry for type objects.
+# Dill, for certain types, defers to the standard pickler (including type
+# objects). We wrap the standard handler using type_wrapper() because
+# for nested class we want to pickle the actual enclosing class object so we
+# can recreate it during unpickling.
+# TODO(silviuc): Make sure we submit the fix upstream to GitHub dill project.
+dill.dill.Pickler.dispatch[type] = _nested_type_wrapper(
+dill.dill.Pickler.dispatch[type])
+
+
+# Dill pickles generators objects without complaint, but unpickling produces
+# TypeError: object.__new__(generator) is not safe, use generator.__new__()
+# on some versions of Python.
+def reject_generators(unused_pickler, unused_obj):
+  raise TypeError("can't (safely) pickle generator objects")
+dill.dill.Pickler.dispatch[types.GeneratorType] = reject_generators
+
+
+# This if guards against dill not being fully initialized when generating docs.
+if 'save_module' in dir(dill.dill):
+
+  # Always pickle non-main modules by name.
+  old_save_module = dill.dill.save_module
+
+  @dill.dill.register(dill.dill.ModuleType)
+  def save_module(pickler, obj):
+if dill.dill.is_dill(pickler) and obj is pickler._main:
+  return old_save_module(pickler, 

[11/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/io/iobase.py
--
diff --git a/sdks/python/google/cloud/dataflow/io/iobase.py 
b/sdks/python/google/cloud/dataflow/io/iobase.py
deleted file mode 100644
index 26ebeb5..000
--- a/sdks/python/google/cloud/dataflow/io/iobase.py
+++ /dev/null
@@ -1,1073 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Sources and sinks.
-
-A Source manages record-oriented data input from a particular kind of source
-(e.g. a set of files, a database table, etc.). The reader() method of a source
-returns a reader object supporting the iterator protocol; iteration yields
-raw records of unprocessed, serialized data.
-
-
-A Sink manages record-oriented data output to a particular kind of sink
-(e.g. a set of files, a database table, etc.). The writer() method of a sink
-returns a writer object supporting writing records of serialized data to
-the sink.
-"""
-
-from collections import namedtuple
-
-import logging
-import random
-import uuid
-
-from google.cloud.dataflow import pvalue
-from google.cloud.dataflow.coders import PickleCoder
-from google.cloud.dataflow.pvalue import AsIter
-from google.cloud.dataflow.pvalue import AsSingleton
-from google.cloud.dataflow.transforms import core
-from google.cloud.dataflow.transforms import ptransform
-from google.cloud.dataflow.transforms import window
-
-
-def _dict_printable_fields(dict_object, skip_fields):
-  """Returns a list of strings for the interesting fields of a dict."""
-  return ['%s=%r' % (name, value)
-  for name, value in dict_object.iteritems()
-  # want to output value 0 but not None nor []
-  if (value or value == 0)
-  and name not in skip_fields]
-
-_minor_fields = ['coder', 'key_coder', 'value_coder',
- 'config_bytes', 'elements',
- 'append_trailing_newlines', 'strip_trailing_newlines',
- 'compression_type']
-
-
-class NativeSource(object):
-  """A source implemented by Dataflow service.
-
-  This class is to be only inherited by sources natively implemented by Cloud
-  Dataflow service, hence should not be sub-classed by users.
-
-  This class is deprecated and should not be used to define new sources.
-  """
-
-  def reader(self):
-"""Returns a NativeSourceReader instance associated with this source."""
-raise NotImplementedError
-
-  def __repr__(self):
-return '<{name} {vals}>'.format(
-name=self.__class__.__name__,
-vals=', '.join(_dict_printable_fields(self.__dict__,
-  _minor_fields)))
-
-
-class NativeSourceReader(object):
-  """A reader for a source implemented by Dataflow service."""
-
-  def __enter__(self):
-"""Opens everything necessary for a reader to function properly."""
-raise NotImplementedError
-
-  def __exit__(self, exception_type, exception_value, traceback):
-"""Cleans up after a reader executed."""
-raise NotImplementedError
-
-  def __iter__(self):
-"""Returns an iterator over all the records of the source."""
-raise NotImplementedError
-
-  @property
-  def returns_windowed_values(self):
-"""Returns whether this reader returns windowed values."""
-return False
-
-  def get_progress(self):
-"""Returns a representation of how far the reader has read.
-
-Returns:
-  A SourceReaderProgress object that gives the current progress of the
-  reader.
-"""
-return
-
-  def request_dynamic_split(self, dynamic_split_request):
-"""Attempts to split the input in two parts.
-
-The two parts are named the "primary" part and the "residual" part. The
-current 'NativeSourceReader' keeps processing the primary part, while the
-residual part will be processed elsewhere (e.g. perhaps on a different
-worker).
-
-The primary and residual parts, if concatenated, must represent the
-same input as the current input of this 'NativeSourceReader' before this
-call.
-
-The boundary between the primary part and the residual part is
-specified in a framework-specific way using 'DynamicSplitRequest' e.g.,
-if the framework supports the notion of positions, it might be a
-position at which the input is asked to split itself (which is not
-necessarily the same position at which it *will* 

[GitHub] incubator-beam pull request #464: [BEAM-341] Fix ReduceFnRunner GC time over...

2016-06-14 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/464

[BEAM-341] Fix ReduceFnRunner GC time overflow

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam gc-time

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #464


commit 26662378aaf7bb08dde12c6d0fa64f2b490a0d4d
Author: Kenneth Knowles 
Date:   2016-06-14T23:04:10Z

Add test for ReduceFnRunner GC time overflow






[47/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py
--
diff --git 
a/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py 
b/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py
new file mode 100644
index 000..c601801
--- /dev/null
+++ b/sdks/python/apache_beam/examples/cookbook/bigquery_side_input_test.py
@@ -0,0 +1,59 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Test for the BigQuery side input example."""
+
+import logging
+import unittest
+
+import google.cloud.dataflow as df
+from google.cloud.dataflow.examples.cookbook import bigquery_side_input
+
+
+class BigQuerySideInputTest(unittest.TestCase):
+
+  def test_create_groups(self):
+p = df.Pipeline('DirectPipelineRunner')
+
+group_ids_pcoll = p | df.Create('create_group_ids', ['A', 'B', 'C'])
+corpus_pcoll = p | df.Create('create_corpus',
+ [{'f': 'corpus1'},
+  {'f': 'corpus2'},
+  {'f': 'corpus3'}])
+words_pcoll = p | df.Create('create_words', [{'f': 'word1'},
+ {'f': 'word2'},
+ {'f': 'word3'}])
+ignore_corpus_pcoll = p | df.Create('create_ignore_corpus', ['corpus1'])
+ignore_word_pcoll = p | df.Create('create_ignore_word', ['word1'])
+
+groups = bigquery_side_input.create_groups(group_ids_pcoll, corpus_pcoll,
+   words_pcoll, 
ignore_corpus_pcoll,
+   ignore_word_pcoll)
+
+def group_matcher(actual):
+  self.assertEqual(len(actual), 3)
+  for group in actual:
+self.assertEqual(len(group), 3)
+self.assertTrue(group[1].startswith('corpus'))
+self.assertNotEqual(group[1], 'corpus1')
+self.assertTrue(group[2].startswith('word'))
+self.assertNotEqual(group[2], 'word1')
+
+df.assert_that(groups, group_matcher)
+p.run()
+
+
+if __name__ == '__main__':
+  logging.getLogger().setLevel(logging.INFO)
+  unittest.main()

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py
--
diff --git a/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py 
b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py
new file mode 100644
index 000..ba3a41d
--- /dev/null
+++ b/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py
@@ -0,0 +1,96 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""A workflow using BigQuery sources and sinks.
+
+The workflow will read from a table that has the 'month' and 'tornado' fields 
as
+part of the table schema (other additional fields are ignored). The 'month'
+field is a number represented as a string (e.g., '23') and the 'tornado' field
+is a boolean field.
+
+The workflow will compute the number of tornadoes in each month and output
+the results to a table (created if needed) with the following schema:
+  - month: number
+  - tornado_count: number
+
+This example uses the default behavior for BigQuery source and sinks that
+represents table rows as plain Python dictionaries.
+"""
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+
+import google.cloud.dataflow as df
+
+
+def count_tornadoes(input_data):
+  """Workflow computing the number of tornadoes for each month that had one.
+
+  Args:
+input_data: a PCollection of dictionaries representing table rows. Each
+  dictionary will have a 'month' and a 'tornado' key as described in the
+  module comment.
+

[32/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/transforms/combiners_test.py
--
diff --git a/sdks/python/apache_beam/transforms/combiners_test.py 
b/sdks/python/apache_beam/transforms/combiners_test.py
new file mode 100644
index 000..b8142ea
--- /dev/null
+++ b/sdks/python/apache_beam/transforms/combiners_test.py
@@ -0,0 +1,225 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for our libraries of combine PTransforms."""
+
+import unittest
+
+import google.cloud.dataflow as df
+from google.cloud.dataflow.pipeline import Pipeline
+from google.cloud.dataflow.transforms import combiners
+import google.cloud.dataflow.transforms.combiners as combine
+from google.cloud.dataflow.transforms.core import CombineGlobally
+from google.cloud.dataflow.transforms.core import Create
+from google.cloud.dataflow.transforms.core import Map
+from google.cloud.dataflow.transforms.ptransform import PTransform
+from google.cloud.dataflow.transforms.util import assert_that, equal_to
+
+
+class CombineTest(unittest.TestCase):
+
+  def test_builtin_combines(self):
+pipeline = Pipeline('DirectPipelineRunner')
+
+vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6]
+mean = sum(vals) / float(len(vals))
+size = len(vals)
+
+# First for global combines.
+pcoll = pipeline | Create('start', vals)
+result_mean = pcoll | combine.Mean.Globally('mean')
+result_count = pcoll | combine.Count.Globally('count')
+assert_that(result_mean, equal_to([mean]), label='assert:mean')
+assert_that(result_count, equal_to([size]), label='assert:size')
+
+# Again for per-key combines.
+pcoll = pipeline | Create('start-perkey', [('a', x) for x in vals])
+result_key_mean = pcoll | combine.Mean.PerKey('mean-perkey')
+result_key_count = pcoll | combine.Count.PerKey('count-perkey')
+assert_that(result_key_mean, equal_to([('a', mean)]), label='key:mean')
+assert_that(result_key_count, equal_to([('a', size)]), label='key:size')
+pipeline.run()
+
+  def test_top(self):
+pipeline = Pipeline('DirectPipelineRunner')
+
+# A parameter we'll be sharing with a custom comparator.
+names = {0: 'zo',
+ 1: 'one',
+ 2: 'twoo',
+ 3: 'three',
+ 5: 'fiiive',
+ 6: 'six',
+ 9: 'nniiinne'}
+
+# First for global combines.
+pcoll = pipeline | Create('start', [6, 3, 1, 1, 9, 1, 5, 2, 0, 6])
+result_top = pcoll | combine.Top.Largest('top', 5)
+result_bot = pcoll | combine.Top.Smallest('bot', 4)
+result_cmp = pcoll | combine.Top.Of(
+'cmp',
+6,
+lambda a, b, names: len(names[a]) < len(names[b]),
+names)  # Note parameter passed to comparator.
+assert_that(result_top, equal_to([[9, 6, 6, 5, 3]]), label='assert:top')
+assert_that(result_bot, equal_to([[0, 1, 1, 1]]), label='assert:bot')
+assert_that(result_cmp, equal_to([[9, 6, 6, 5, 3, 2]]), label='assert:cmp')
+
+# Again for per-key combines.
+pcoll = pipeline | Create(
+'start-perkey', [('a', x) for x in [6, 3, 1, 1, 9, 1, 5, 2, 0, 6]])
+result_key_top = pcoll | combine.Top.LargestPerKey('top-perkey', 5)
+result_key_bot = pcoll | combine.Top.SmallestPerKey('bot-perkey', 4)
+result_key_cmp = pcoll | combine.Top.PerKey(
+'cmp-perkey',
+6,
+lambda a, b, names: len(names[a]) < len(names[b]),
+names)  # Note parameter passed to comparator.
+assert_that(result_key_top, equal_to([('a', [9, 6, 6, 5, 3])]),
+label='key:top')
+assert_that(result_key_bot, equal_to([('a', [0, 1, 1, 1])]),
+label='key:bot')
+assert_that(result_key_cmp, equal_to([('a', [9, 6, 6, 5, 3, 2])]),
+label='key:cmp')
+pipeline.run()
+
+  def test_top_shorthands(self):
+pipeline = Pipeline('DirectPipelineRunner')
+
+pcoll = pipeline | Create('start', [6, 3, 1, 1, 9, 1, 5, 2, 0, 6])
+result_top = pcoll | df.CombineGlobally('top', combiners.Largest(5))
+result_bot = pcoll | df.CombineGlobally('bot', combiners.Smallest(4))
+assert_that(result_top, equal_to([[9, 6, 6, 5, 3]]), label='assert:top')
+assert_that(result_bot, equal_to([[0, 1, 1, 1]]), label='assert:bot')
+
+pcoll = pipeline | Create(
+'start-perkey', [('a', x) for x in [6, 3, 1, 

[25/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/utils/processes_test.py
--
diff --git a/sdks/python/apache_beam/utils/processes_test.py 
b/sdks/python/apache_beam/utils/processes_test.py
new file mode 100644
index 000..eaaf06a
--- /dev/null
+++ b/sdks/python/apache_beam/utils/processes_test.py
@@ -0,0 +1,103 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Unit tests for the processes module."""
+
+import unittest
+
+
+import mock
+
+from google.cloud.dataflow.utils import processes
+
+
+class Exec(unittest.TestCase):
+
+  def setUp(self):
+pass
+
+  @mock.patch('google.cloud.dataflow.utils.processes.subprocess')
+  def test_method_forwarding_not_windows(self, *unused_mocks):
+# Test that the correct calls are being forwarded to the subprocess module
+# when we are not on Windows.
+processes.force_shell = False
+
+processes.call(['subprocess', 'call'], shell=False, other_arg=True)
+processes.subprocess.call.assert_called_once_with(
+['subprocess', 'call'],
+shell=False,
+other_arg=True)
+
+processes.check_call(
+['subprocess', 'check_call'],
+shell=False,
+other_arg=True)
+processes.subprocess.check_call.assert_called_once_with(
+['subprocess', 'check_call'],
+shell=False,
+other_arg=True)
+
+processes.check_output(
+['subprocess', 'check_output'],
+shell=False,
+other_arg=True)
+processes.subprocess.check_output.assert_called_once_with(
+['subprocess', 'check_output'],
+shell=False,
+other_arg=True)
+
+processes.Popen(['subprocess', 'Popen'], shell=False, other_arg=True)
+processes.subprocess.Popen.assert_called_once_with(
+['subprocess', 'Popen'],
+shell=False,
+other_arg=True)
+
+  @mock.patch('google.cloud.dataflow.utils.processes.subprocess')
+  def test_method_forwarding_windows(self, *unused_mocks):
+# Test that the correct calls are being forwarded to the subprocess module
+# and that the shell=True flag is added when we are on Windows.
+processes.force_shell = True
+
+processes.call(['subprocess', 'call'], shell=False, other_arg=True)
+processes.subprocess.call.assert_called_once_with(
+['subprocess', 'call'],
+shell=True,
+other_arg=True)
+
+processes.check_call(
+['subprocess', 'check_call'],
+shell=False,
+other_arg=True)
+processes.subprocess.check_call.assert_called_once_with(
+['subprocess', 'check_call'],
+shell=True,
+other_arg=True)
+
+processes.check_output(
+['subprocess', 'check_output'],
+shell=False,
+other_arg=True)
+processes.subprocess.check_output.assert_called_once_with(
+['subprocess', 'check_output'],
+shell=True,
+other_arg=True)
+
+processes.Popen(['subprocess', 'Popen'], shell=False, other_arg=True)
+processes.subprocess.Popen.assert_called_once_with(
+['subprocess', 'Popen'],
+shell=True,
+other_arg=True)
+
+
+if __name__ == '__main__':
+  unittest.main()

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/utils/profiler.py
--
diff --git a/sdks/python/apache_beam/utils/profiler.py 
b/sdks/python/apache_beam/utils/profiler.py
new file mode 100644
index 000..a210e8c
--- /dev/null
+++ b/sdks/python/apache_beam/utils/profiler.py
@@ -0,0 +1,66 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""A profiler context manager based on cProfile.Profile objects."""
+
+import cProfile
+import logging
+import os
+import pstats
+import StringIO
+import tempfile
+import time
+

[30/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/transforms/ptransform_test.py
--
diff --git a/sdks/python/apache_beam/transforms/ptransform_test.py 
b/sdks/python/apache_beam/transforms/ptransform_test.py
new file mode 100644
index 000..00b6c8d
--- /dev/null
+++ b/sdks/python/apache_beam/transforms/ptransform_test.py
@@ -0,0 +1,1814 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for the PTransform and descendants."""
+
+from __future__ import absolute_import
+
+import operator
+import re
+import unittest
+
+
+import google.cloud.dataflow as df
+from google.cloud.dataflow.pipeline import Pipeline
+import google.cloud.dataflow.pvalue as pvalue
+import google.cloud.dataflow.transforms.combiners as combine
+from google.cloud.dataflow.transforms.ptransform import PTransform
+from google.cloud.dataflow.transforms.util import assert_that, equal_to
+import google.cloud.dataflow.typehints as typehints
+from google.cloud.dataflow.typehints import with_input_types
+from google.cloud.dataflow.typehints import with_output_types
+from google.cloud.dataflow.typehints.typehints_test import TypeHintTestCase
+from google.cloud.dataflow.utils.options import PipelineOptions
+from google.cloud.dataflow.utils.options import TypeOptions
+
+
+# Disable frequent lint warning due to pipe operator for chaining transforms.
+# pylint: disable=expression-not-assigned
+
+
+class PTransformTest(unittest.TestCase):
+
+  def assertStartswith(self, msg, prefix):
+self.assertTrue(msg.startswith(prefix),
+'"%s" does not start with "%s"' % (msg, prefix))
+
+  def test_str(self):
+self.assertEqual('',
+ str(PTransform()))
+
+pa = Pipeline('DirectPipelineRunner')
+res = pa | df.Create('a_label', [1, 2])
+self.assertEqual('',
+ str(res.producer.transform))
+
+pc = Pipeline('DirectPipelineRunner')
+res = pc | df.Create('with_inputs', [1, 2])
+inputs_tr = res.producer.transform
+inputs_tr.inputs = ('ci',)
+self.assertEqual(
+"""""",
+str(inputs_tr))
+
+pd = Pipeline('DirectPipelineRunner')
+res = pd | df.Create('with_sidei', [1, 2])
+side_tr = res.producer.transform
+side_tr.side_inputs = (4,)
+self.assertEqual(
+'',
+str(side_tr))
+
+inputs_tr.side_inputs = ('cs',)
+self.assertEqual(
+"""""",
+str(inputs_tr))
+
+  def test_parse_label_and_arg(self):
+
+def fun(*args, **kwargs):
+  return PTransform().parse_label_and_arg(args, kwargs, 'name')
+
+self.assertEqual(('PTransform', 'value'), fun('value'))
+self.assertEqual(('PTransform', 'value'), fun(name='value'))
+self.assertEqual(('label', 'value'), fun('label', 'value'))
+self.assertEqual(('label', 'value'), fun('label', name='value'))
+self.assertEqual(('label', 'value'), fun('value', label='label'))
+self.assertEqual(('label', 'value'), fun(name='value', label='label'))
+
+self.assertRaises(ValueError, fun)
+self.assertRaises(ValueError, fun, 0, 'value')
+self.assertRaises(ValueError, fun, label=0, name='value')
+self.assertRaises(ValueError, fun, other='value')
+
+with self.assertRaises(ValueError) as cm:
+  fun(0, name='value')
+self.assertEqual(
+cm.exception.message,
+'PTransform expects a (label, name) or (name) argument list '
+'instead of args=(0,), kwargs={\'name\': \'value\'}')
+
+  def test_do_with_do_fn(self):
+class AddNDoFn(df.DoFn):
+
+  def process(self, context, addon):
+return [context.element + addon]
+
+pipeline = Pipeline('DirectPipelineRunner')
+pcoll = pipeline | df.Create('start', [1, 2, 3])
+result = pcoll | df.ParDo('do', AddNDoFn(), 10)
+assert_that(result, equal_to([11, 12, 13]))
+pipeline.run()
+
+  def test_do_with_unconstructed_do_fn(self):
+class MyDoFn(df.DoFn):
+
+  def process(self, context):
+pass
+
+pipeline = Pipeline('DirectPipelineRunner')
+pcoll = pipeline | df.Create('start', [1, 

[49/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
Move all files to apache_beam folder


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b14dfadd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b14dfadd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b14dfadd

Branch: refs/heads/python-sdk
Commit: b14dfadd1f414063eb0710eae8237eb2fa9c8a2f
Parents: e507928
Author: Silviu Calinoiu 
Authored: Tue Jun 14 08:49:04 2016 -0700
Committer: Silviu Calinoiu 
Committed: Tue Jun 14 12:07:07 2016 -0700

--
 sdks/python/apache_beam/__init__.py |   78 +
 sdks/python/apache_beam/coders/__init__.py  |   16 +
 sdks/python/apache_beam/coders/coder_impl.pxd   |  109 +
 sdks/python/apache_beam/coders/coder_impl.py|  316 ++
 sdks/python/apache_beam/coders/coders.py|  511 +++
 sdks/python/apache_beam/coders/coders_test.py   |   60 +
 .../apache_beam/coders/coders_test_common.py|  180 ++
 .../apache_beam/coders/fast_coders_test.py  |   34 +
 sdks/python/apache_beam/coders/observable.py|   33 +
 .../apache_beam/coders/observable_test.py   |   54 +
 .../apache_beam/coders/slow_coders_test.py  |   36 +
 sdks/python/apache_beam/coders/slow_stream.py   |  136 +
 sdks/python/apache_beam/coders/stream.pxd   |   58 +
 sdks/python/apache_beam/coders/stream.pyx   |  201 ++
 sdks/python/apache_beam/coders/stream_test.py   |  168 +
 sdks/python/apache_beam/coders/typecoders.py|  154 +
 .../apache_beam/coders/typecoders_test.py   |  114 +
 sdks/python/apache_beam/dataflow_test.py|  405 +++
 sdks/python/apache_beam/error.py|   39 +
 sdks/python/apache_beam/examples/__init__.py|0
 .../examples/complete/autocomplete.py   |   79 +
 .../examples/complete/autocomplete_test.py  |   78 +
 .../examples/complete/estimate_pi.py|  109 +
 .../examples/complete/estimate_pi_test.py   |   46 +
 .../complete/juliaset/juliaset/__init__.py  |0
 .../complete/juliaset/juliaset/juliaset.py  |  119 +
 .../complete/juliaset/juliaset/juliaset_test.py |   83 +
 .../examples/complete/juliaset/juliaset_main.py |   55 +
 .../examples/complete/juliaset/setup.py |  115 +
 .../apache_beam/examples/complete/tfidf.py  |  196 ++
 .../apache_beam/examples/complete/tfidf_test.py |   88 +
 .../examples/complete/top_wikipedia_sessions.py |  170 +
 .../complete/top_wikipedia_sessions_test.py |   58 +
 .../examples/cookbook/bigquery_schema.py|  127 +
 .../examples/cookbook/bigquery_side_input.py|  114 +
 .../cookbook/bigquery_side_input_test.py|   59 +
 .../examples/cookbook/bigquery_tornadoes.py |   96 +
 .../cookbook/bigquery_tornadoes_test.py |   41 +
 .../apache_beam/examples/cookbook/bigshuffle.py |   84 +
 .../examples/cookbook/bigshuffle_test.py|   61 +
 .../apache_beam/examples/cookbook/coders.py |   92 +
 .../examples/cookbook/coders_test.py|   56 +
 .../examples/cookbook/combiners_test.py |   73 +
 .../examples/cookbook/custom_ptransform.py  |  132 +
 .../examples/cookbook/custom_ptransform_test.py |   64 +
 .../apache_beam/examples/cookbook/filters.py|  104 +
 .../examples/cookbook/filters_test.py   |   65 +
 .../examples/cookbook/group_with_coder.py   |  111 +
 .../examples/cookbook/group_with_coder_test.py  |   87 +
 .../examples/cookbook/mergecontacts.py  |  126 +
 .../examples/cookbook/mergecontacts_test.py |  121 +
 .../examples/cookbook/multiple_output_pardo.py  |  171 +
 .../cookbook/multiple_output_pardo_test.py  |   69 +
 .../apache_beam/examples/snippets/snippets.py   |  872 +
 .../examples/snippets/snippets_test.py  |  560 
 .../apache_beam/examples/streaming_wordcap.py   |   61 +
 .../apache_beam/examples/streaming_wordcount.py |   71 +
 sdks/python/apache_beam/examples/wordcount.py   |   99 +
 .../apache_beam/examples/wordcount_debugging.py |  154 +
 .../examples/wordcount_debugging_test.py|   56 +
 .../apache_beam/examples/wordcount_minimal.py   |  111 +
 .../examples/wordcount_minimal_test.py  |   56 +
 .../apache_beam/examples/wordcount_test.py  |   55 +
 sdks/python/apache_beam/internal/__init__.py|0
 sdks/python/apache_beam/internal/apiclient.py   |  935 ++
 .../apache_beam/internal/apiclient_test.py  |  110 +
 sdks/python/apache_beam/internal/auth.py|  161 +
 .../apache_beam/internal/clients/__init__.py|0
 .../internal/clients/bigquery/__init__.py   |   10 +
 .../clients/bigquery/bigquery_v2_client.py  |  642 
 .../clients/bigquery/bigquery_v2_messages.py| 1893 +++
 .../internal/clients/dataflow/__init__.py   |   10 +
 .../clients/dataflow/dataflow_v1b3_client.py|  316 ++
 .../clients/dataflow/dataflow_v1b3_messages.py  | 3056 

[29/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/transforms/timeutil_test.py
--
diff --git a/sdks/python/apache_beam/transforms/timeutil_test.py 
b/sdks/python/apache_beam/transforms/timeutil_test.py
new file mode 100644
index 000..26ff3ae
--- /dev/null
+++ b/sdks/python/apache_beam/transforms/timeutil_test.py
@@ -0,0 +1,165 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for time utilities."""
+
+from __future__ import absolute_import
+
+import unittest
+
+from google.cloud.dataflow.transforms.timeutil import Duration
+from google.cloud.dataflow.transforms.timeutil import Timestamp
+
+
+class TimestampTest(unittest.TestCase):
+
+  def test_of(self):
+interval = Timestamp(123)
+self.assertEqual(id(interval), id(Timestamp.of(interval)))
+self.assertEqual(interval, Timestamp.of(123.0))
+with self.assertRaises(TypeError):
+  Timestamp.of(Duration(10))
+
+  def test_precision(self):
+self.assertEqual(Timestamp(1000) % 0.1, 0)
+self.assertEqual(Timestamp(1000) % 0.05, 0)
+self.assertEqual(Timestamp(1000) % 0.05, 0)
+self.assertEqual(Timestamp(1000) % Duration(0.1), 0)
+self.assertEqual(Timestamp(1000) % Duration(0.05), 0)
+self.assertEqual(Timestamp(1000) % Duration(0.05), 0)
+
+  def test_utc_timestamp(self):
+self.assertEqual(Timestamp(1000).isoformat(),
+ '1970-04-26T17:46:40Z')
+self.assertEqual(Timestamp(1000.01).isoformat(),
+ '1970-04-26T17:46:40.01Z')
+self.assertEqual(Timestamp(1458343379.123456).isoformat(),
+ '2016-03-18T23:22:59.123456Z')
+
+  def test_arithmetic(self):
+# Supported operations.
+self.assertEqual(Timestamp(123) + 456, 579)
+self.assertEqual(Timestamp(123) + Duration(456), 579)
+self.assertEqual(456 + Timestamp(123), 579)
+self.assertEqual(Duration(456) + Timestamp(123), 579)
+self.assertEqual(Timestamp(123) - 456, -333)
+self.assertEqual(Timestamp(123) - Duration(456), -333)
+self.assertEqual(Timestamp(1230) % 456, 318)
+self.assertEqual(Timestamp(1230) % Duration(456), 318)
+
+# Check that direct comparison of Timestamp and Duration is allowed.
+self.assertTrue(Duration(123) == Timestamp(123))
+self.assertTrue(Timestamp(123) == Duration(123))
+self.assertFalse(Duration(123) == Timestamp(1230))
+self.assertFalse(Timestamp(123) == Duration(1230))
+
+# Check return types.
+self.assertEqual((Timestamp(123) + 456).__class__, Timestamp)
+self.assertEqual((Timestamp(123) + Duration(456)).__class__, Timestamp)
+self.assertEqual((456 + Timestamp(123)).__class__, Timestamp)
+self.assertEqual((Duration(456) + Timestamp(123)).__class__, Timestamp)
+self.assertEqual((Timestamp(123) - 456).__class__, Timestamp)
+self.assertEqual((Timestamp(123) - Duration(456)).__class__, Timestamp)
+self.assertEqual((Timestamp(1230) % 456).__class__, Duration)
+self.assertEqual((Timestamp(1230) % Duration(456)).__class__, Duration)
+
+# Unsupported operations.
+with self.assertRaises(TypeError):
+  self.assertEqual(Timestamp(123) * 456, 56088)
+with self.assertRaises(TypeError):
+  self.assertEqual(Timestamp(123) * Duration(456), 56088)
+with self.assertRaises(TypeError):
+  self.assertEqual(456 * Timestamp(123), 56088)
+with self.assertRaises(TypeError):
+  self.assertEqual(Duration(456) * Timestamp(123), 56088)
+with self.assertRaises(TypeError):
+  self.assertEqual(456 - Timestamp(123), 333)
+with self.assertRaises(TypeError):
+  self.assertEqual(Duration(456) - Timestamp(123), 333)
+with self.assertRaises(TypeError):
+  self.assertEqual(-Timestamp(123), -123)
+with self.assertRaises(TypeError):
+  self.assertEqual(-Timestamp(123), -Duration(123))
+with self.assertRaises(TypeError):
+  self.assertEqual(1230 % Timestamp(456), 318)
+with self.assertRaises(TypeError):
+  self.assertEqual(Duration(1230) % Timestamp(456), 318)
+
+  def test_sort_order(self):
+self.assertEqual(
+[-63, Timestamp(-3), 2, 9, Timestamp(292.3), 500],
+sorted([9, 2, Timestamp(-3), Timestamp(292.3), -63, 500]))
+self.assertEqual(
+[4, 5, Timestamp(6), Timestamp(7), 8, 9],
+
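
Taken together, the assertions above pin down the arithmetic contract: adding or
subtracting a number or Duration to a Timestamp yields a Timestamp, a modulo yields
a Duration, equal-valued Timestamp and Duration objects compare equal, and the
remaining combinations raise TypeError. A condensed usage view of that contract
(not part of the patch):

  t = Timestamp(1230)
  assert isinstance(t + 456, Timestamp)           # shifting a point in time
  assert isinstance(t - Duration(456), Timestamp)
  assert isinstance(t % 456, Duration)            # a remainder is a span, not a point
  assert Timestamp(123) == Duration(123)          # equal seconds compare equal
  try:
    t * 2                                         # scaling a point in time is rejected
  except TypeError:
    pass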

[46/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/examples/snippets/snippets.py
--
diff --git a/sdks/python/apache_beam/examples/snippets/snippets.py 
b/sdks/python/apache_beam/examples/snippets/snippets.py
new file mode 100644
index 000..f6bb63a
--- /dev/null
+++ b/sdks/python/apache_beam/examples/snippets/snippets.py
@@ -0,0 +1,872 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Code snippets used in Cloud Dataflow webdocs.
+
+The examples here are written specifically to read well with the accompanying
+web docs from https://cloud.google.com/dataflow. Do not rewrite them until you
+make sure the webdocs still read well and the rewritten code supports the
+concept being described. For example, there are snippets that could be shorter
+but they are written like this to make a specific point in the docs.
+
+The code snippets are all organized as self contained functions. Parts of the
+function body delimited by [START tag] and [END tag] will be included
+automatically in the web docs. The naming convention for the tags is to use as
+a prefix the PATH_TO_HTML where they are included, followed by a descriptive
+string. For instance, a code snippet that will be used as a code example
+at https://cloud.google.com/dataflow/model/pipelines will have the tag
+model_pipelines_DESCRIPTION. The tags can contain only letters, digits and _.
+"""
+
+import google.cloud.dataflow as df
+
+# Quiet some pylint warnings that happen because of the somewhat special
+# format for the code snippets.
+# pylint:disable=invalid-name
+# pylint:disable=expression-not-assigned
+# pylint:disable=redefined-outer-name
+# pylint:disable=unused-variable
+# pylint:disable=g-doc-args
+# pylint:disable=g-import-not-at-top
+
+
+class SnippetUtils(object):
+  from google.cloud.dataflow.pipeline import PipelineVisitor
+
+  class RenameFiles(PipelineVisitor):
+"""RenameFiles will rewire source and sink for unit testing.
+
+RenameFiles will rewire the GCS files specified in the source and
+sink in the snippet pipeline to local files so the pipeline can be run as a
+unit test. This is as close as we can get to having code snippets that are
+executed and are also ready to be presented in webdocs.
+"""
+
+def __init__(self, renames):
+  self.renames = renames
+
+def visit_transform(self, transform_node):
+  if hasattr(transform_node.transform, 'source'):
+source = transform_node.transform.source
+source.file_path = self.renames['read']
+source.is_gcs_source = False
+  elif hasattr(transform_node.transform, 'sink'):
+sink = transform_node.transform.sink
+sink.file_path = self.renames['write']
+sink.is_gcs_sink = False
+
+
+def construct_pipeline(renames):
+  """A reverse words snippet as an example for constructing a pipeline.
+
+  URL: https://cloud.google.com/dataflow/pipelines/constructing-your-pipeline
+  """
+  import re
+
+  class ReverseWords(df.PTransform):
+"""A PTransform that reverses individual elements in a PCollection."""
+
+def apply(self, pcoll):
+  return pcoll | df.Map(lambda e: e[::-1])
+
+  def filter_words(unused_x):
+"""Pass through filter to select everything."""
+return True
+
+  # [START pipelines_constructing_creating]
+  from google.cloud.dataflow.utils.options import PipelineOptions
+
+  p = df.Pipeline(options=PipelineOptions())
+  # [END pipelines_constructing_creating]
+
+  # [START pipelines_constructing_reading]
+  lines = p | df.io.Read('ReadMyFile',
+  df.io.TextFileSource('gs://some/inputData.txt'))
+  # [END pipelines_constructing_reading]
+
+  # [START pipelines_constructing_applying]
+  words = lines | df.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
+  reversed_words = words | ReverseWords()
+  # [END pipelines_constructing_applying]
+
+  # [START pipelines_constructing_writing]
+  filtered_words = reversed_words | df.Filter('FilterWords', filter_words)
+  filtered_words | df.io.Write('WriteMyFile',
+   df.io.TextFileSink('gs://some/outputData.txt'))
+  # [END pipelines_constructing_writing]
+
+  p.visit(SnippetUtils.RenameFiles(renames))
+
+  # [START pipelines_constructing_running]
+  p.run()
+  # [END pipelines_constructing_running]
+
+
+def model_pipelines(argv):
+  """A wordcount 

[42/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_messages.py
--
diff --git 
a/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_messages.py 
b/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_messages.py
new file mode 100644
index 000..2e0273f
--- /dev/null
+++ 
b/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_messages.py
@@ -0,0 +1,3056 @@
+"""Generated message classes for dataflow version v1b3.
+
+Develops and executes data processing patterns like ETL, batch computation,
+and continuous computation.
+"""
+# NOTE: This file is autogenerated and should not be edited by hand.
+
+from apitools.base.protorpclite import messages as _messages
+from apitools.base.py import encoding
+
+
+package = 'dataflow'
+
+
+class ApproximateProgress(_messages.Message):
+  """Obsolete in favor of ApproximateReportedProgress and
+  ApproximateSplitRequest.
+
+  Fields:
+percentComplete: Obsolete.
+position: Obsolete.
+remainingTime: Obsolete.
+  """
+
+  percentComplete = _messages.FloatField(1, variant=_messages.Variant.FLOAT)
+  position = _messages.MessageField('Position', 2)
+  remainingTime = _messages.StringField(3)
+
+
+class ApproximateReportedProgress(_messages.Message):
+  """A progress measurement of a WorkItem by a worker.
+
+  Fields:
+consumedParallelism: Total amount of parallelism in the portion of input
+  of this work item that has already been consumed. In the first two
+  examples above (see remaining_parallelism), the value should be 30 or 3
+  respectively. The sum of remaining_parallelism and consumed_parallelism
+  should equal the total amount of parallelism in this work item. If
+  specified, must be finite.
+fractionConsumed: Completion as fraction of the input consumed, from 0.0
+  (beginning, nothing consumed), to 1.0 (end of the input, entire input
+  consumed).
+position: A Position within the work to represent a progress.
+remainingParallelism: Total amount of parallelism in the input of this
+  WorkItem that has not been consumed yet (i.e. can be delegated to a new
+  WorkItem via dynamic splitting). "Amount of parallelism" refers to how
+  many non-empty parts of the input can be read in parallel. This does not
+  necessarily equal number of records. An input that can be read in
+  parallel down to the individual records is called "perfectly
+  splittable". An example of non-perfectly parallelizable input is a
+  block-compressed file format where a block of records has to be read as
+  a whole, but different blocks can be read in parallel. Examples: * If we
+  have read 30 records out of 50 in a perfectly splittable 50-record
+  input, this value should be 20. * If we are reading through block 3 in a
+  block-compressed file consisting of 5 blocks, this value should be 2
+  (since blocks 4 and 5 can be processed in parallel by new work items via
+  dynamic splitting). * If we are reading through the last block in a
+  block-compressed file, or reading or processing the last record in a
+  perfectly splittable input, this value should be 0, because the
+  remainder of the work item cannot be further split.
+  """
+
+  consumedParallelism = _messages.MessageField('ReportedParallelism', 1)
+  fractionConsumed = _messages.FloatField(2)
+  position = _messages.MessageField('Position', 3)
+  remainingParallelism = _messages.MessageField('ReportedParallelism', 4)
+
+
+class ApproximateSplitRequest(_messages.Message):
+  """A suggestion by the service to the worker to dynamically split the
+  WorkItem.
+
+  Fields:
+fractionConsumed: A fraction at which to split the work item, from 0.0
+  (beginning of the input) to 1.0 (end of the input).
+position: A Position at which to split the work item.
+  """
+
+  fractionConsumed = _messages.FloatField(1)
+  position = _messages.MessageField('Position', 2)
+
+
+class AutoscalingSettings(_messages.Message):
+  """Settings for WorkerPool autoscaling.
+
+  Enums:
+AlgorithmValueValuesEnum: The algorithm to use for autoscaling.
+
+  Fields:
+algorithm: The algorithm to use for autoscaling.
+maxNumWorkers: The maximum number of workers to cap scaling at.
+  """
+
+  class AlgorithmValueValuesEnum(_messages.Enum):
+"""The algorithm to use for autoscaling.
+
+Values:
+  AUTOSCALING_ALGORITHM_UNKNOWN: 
+  AUTOSCALING_ALGORITHM_NONE: 
+  AUTOSCALING_ALGORITHM_BASIC: 
+"""
+AUTOSCALING_ALGORITHM_UNKNOWN = 0
+AUTOSCALING_ALGORITHM_NONE = 1
+AUTOSCALING_ALGORITHM_BASIC = 2
+
+  algorithm = _messages.EnumField('AlgorithmValueValuesEnum', 1)
+  maxNumWorkers = _messages.IntegerField(2, variant=_messages.Variant.INT32)
+
+
+class ComputationTopology(_messages.Message):
+  """All configuration 
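
To make the parallelism bookkeeping described above concrete, the first example in
the consumedParallelism description works out as follows (plain arithmetic with
illustrative variable names, not code from the patch):

  total_parallelism = 50       # perfectly splittable 50-record input
  consumed_parallelism = 30    # records already read
  remaining_parallelism = total_parallelism - consumed_parallelism     # 20
  fraction_consumed = float(consumed_parallelism) / total_parallelism  # 0.6
  # The two parallelism figures must add up to the total for the work item.
  assert consumed_parallelism + remaining_parallelism == total_parallelism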

[23/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/examples/cookbook/bigquery_schema.py
--
diff --git 
a/sdks/python/google/cloud/dataflow/examples/cookbook/bigquery_schema.py 
b/sdks/python/google/cloud/dataflow/examples/cookbook/bigquery_schema.py
deleted file mode 100644
index 67616ec..000
--- a/sdks/python/google/cloud/dataflow/examples/cookbook/bigquery_schema.py
+++ /dev/null
@@ -1,127 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""A workflow that writes to a BigQuery table with nested and repeated fields.
-
-Demonstrates how to build a bigquery.TableSchema object with nested and repeated
-fields. Also, shows how to generate data to be written to a BigQuery table with
-nested and repeated fields.
-"""
-
-from __future__ import absolute_import
-
-import argparse
-import logging
-
-import google.cloud.dataflow as df
-
-
-def run(argv=None):
-  """Run the workflow."""
-  parser = argparse.ArgumentParser()
-
-  parser.add_argument(
-  '--output',
-  required=True,
-  help=
-  ('Output BigQuery table for results specified as: PROJECT:DATASET.TABLE '
-   'or DATASET.TABLE.'))
-  known_args, pipeline_args = parser.parse_known_args(argv)
-
-  p = df.Pipeline(argv=pipeline_args)
-
-  from google.cloud.dataflow.internal.clients import bigquery  # pylint: disable=g-import-not-at-top
-
-  table_schema = bigquery.TableSchema()
-
-  # Fields that use standard types.
-  kind_schema = bigquery.TableFieldSchema()
-  kind_schema.name = 'kind'
-  kind_schema.type = 'string'
-  kind_schema.mode = 'nullable'
-  table_schema.fields.append(kind_schema)
-
-  full_name_schema = bigquery.TableFieldSchema()
-  full_name_schema.name = 'fullName'
-  full_name_schema.type = 'string'
-  full_name_schema.mode = 'required'
-  table_schema.fields.append(full_name_schema)
-
-  age_schema = bigquery.TableFieldSchema()
-  age_schema.name = 'age'
-  age_schema.type = 'integer'
-  age_schema.mode = 'nullable'
-  table_schema.fields.append(age_schema)
-
-  gender_schema = bigquery.TableFieldSchema()
-  gender_schema.name = 'gender'
-  gender_schema.type = 'string'
-  gender_schema.mode = 'nullable'
-  table_schema.fields.append(gender_schema)
-
-  # A nested field
-  phone_number_schema = bigquery.TableFieldSchema()
-  phone_number_schema.name = 'phoneNumber'
-  phone_number_schema.type = 'record'
-  phone_number_schema.mode = 'nullable'
-
-  area_code = bigquery.TableFieldSchema()
-  area_code.name = 'areaCode'
-  area_code.type = 'integer'
-  area_code.mode = 'nullable'
-  phone_number_schema.fields.append(area_code)
-
-  number = bigquery.TableFieldSchema()
-  number.name = 'number'
-  number.type = 'integer'
-  number.mode = 'nullable'
-  phone_number_schema.fields.append(number)
-  table_schema.fields.append(phone_number_schema)
-
-  # A repeated field.
-  children_schema = bigquery.TableFieldSchema()
-  children_schema.name = 'children'
-  children_schema.type = 'string'
-  children_schema.mode = 'repeated'
-  table_schema.fields.append(children_schema)
-
-  def create_random_record(record_id):
-return {'kind': 'kind' + record_id, 'fullName': 'fullName'+record_id,
-'age': int(record_id) * 10, 'gender': 'male',
-'phoneNumber': {
-'areaCode': int(record_id) * 100,
-'number': int(record_id) * 10},
-'children': ['child' + record_id + '1',
- 'child' + record_id + '2',
- 'child' + record_id + '3']
-   }
-
-  # pylint: disable=expression-not-assigned
-  record_ids = p | df.Create('CreateIDs', ['1', '2', '3', '4', '5'])
-  records = record_ids | df.Map('CreateRecords', create_random_record)
-  records | df.io.Write(
-  'write',
-  df.io.BigQuerySink(
-  known_args.output,
-  schema=table_schema,
-  create_disposition=df.io.BigQueryDisposition.CREATE_IF_NEEDED,
-  write_disposition=df.io.BigQueryDisposition.WRITE_TRUNCATE))
-
-  # Run the pipeline (all operations are deferred until run() is called).
-  p.run()
-
-
-if __name__ == '__main__':
-  logging.getLogger().setLevel(logging.INFO)
-  run()
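
For reference, the record generator in this file produces rows shaped to match the
schema built above; create_random_record('1') evaluates to:

  {'kind': 'kind1',
   'fullName': 'fullName1',
   'age': 10,
   'gender': 'male',
   'phoneNumber': {'areaCode': 100, 'number': 10},
   'children': ['child11', 'child12', 'child13']}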

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/examples/cookbook/bigquery_side_input.py

[36/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/io/fileio.py
--
diff --git a/sdks/python/apache_beam/io/fileio.py 
b/sdks/python/apache_beam/io/fileio.py
new file mode 100644
index 000..9a003f0
--- /dev/null
+++ b/sdks/python/apache_beam/io/fileio.py
@@ -0,0 +1,747 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""File-based sources and sinks."""
+
+from __future__ import absolute_import
+
+import glob
+import gzip
+import logging
+from multiprocessing.pool import ThreadPool
+import os
+import re
+import shutil
+import tempfile
+import time
+
+from google.cloud.dataflow import coders
+from google.cloud.dataflow.io import iobase
+from google.cloud.dataflow.io import range_trackers
+from google.cloud.dataflow.utils import processes
+from google.cloud.dataflow.utils import retry
+
+
+__all__ = ['TextFileSource', 'TextFileSink']
+
+DEFAULT_SHARD_NAME_TEMPLATE = '-S-of-N'
+
+
+# Retrying is needed because there are transient errors that can happen.
+@retry.with_exponential_backoff(num_retries=4, retry_filter=lambda _: True)
+def _gcs_file_copy(from_path, to_path, encoding=''):
+  """Copy a local file to a GCS location with retries for transient errors."""
+  if not encoding:
+command_args = ['gsutil', '-m', '-q', 'cp', from_path, to_path]
+  else:
+encoding = 'Content-Type:' + encoding
+command_args = ['gsutil', '-m', '-q', '-h', encoding, 'cp', from_path,
+to_path]
+  logging.info('Executing command: %s', command_args)
+  popen = processes.Popen(command_args, stdout=processes.PIPE,
+  stderr=processes.PIPE)
+  stdoutdata, stderrdata = popen.communicate()
+  if popen.returncode != 0:
+raise ValueError(
+'Failed to copy GCS file from %s to %s (stdout=%s, stderr=%s).' % (
+from_path, to_path, stdoutdata, stderrdata))
+
+
+# -
+# TextFileSource, TextFileSink.
+
+
+class TextFileSource(iobase.NativeSource):
+  """A source for a GCS or local text file.
+
+  Parses a text file as newline-delimited elements, by default assuming
+  UTF-8 encoding.
+  """
+
+  def __init__(self, file_path, start_offset=None, end_offset=None,
+   compression_type='AUTO', strip_trailing_newlines=True,
+   coder=coders.StrUtf8Coder()):
+"""Initialize a TextSource.
+
+Args:
+  file_path: The file path to read from as a local file path or a GCS
+gs:// path. The path can contain glob characters (*, ?, and [...]
+sets).
+  start_offset: The byte offset in the source text file that the reader
+should start reading. By default it is 0 (the beginning of the file).
+  end_offset: The byte offset in the file that the reader should stop
+reading. By default it is the end of the file.
+  compression_type: Used to handle compressed input files. Typical value
+  is 'AUTO'.
+  strip_trailing_newlines: Indicates whether this source should remove
+  the newline char in each line it reads before decoding that line.
+  coder: Coder used to decode each line.
+
+Raises:
+  TypeError: if file_path is not a string.
+
+If the file_path contains glob characters then the start_offset and
+end_offset must not be specified.
+
+The 'start_offset' and 'end_offset' pair provide a mechanism to divide the
+text file into multiple pieces for individual sources. Because the offset
+is measured in bytes, some complication arises when an offset falls in
+the middle of a text line. To avoid the scenario where two adjacent sources
+each get a fraction of a line, we adopt the following rules:
+
+If start_offset falls inside a line (any character except the first one)
+then the source will skip the line and start with the next one.
+
+If end_offset falls inside a line (any character except the first one) then
+the source will contain that entire line.
+"""
+if not isinstance(file_path, basestring):
+  raise TypeError(
+  '%s: file_path must be a string;  got %r instead' %
+  (self.__class__.__name__, file_path))
+
+self.file_path = file_path
+self.start_offset = start_offset
+self.end_offset = end_offset
+self.compression_type = compression_type
+

[GitHub] incubator-beam pull request #461: Initial Beam Python SDK

2016-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/461




[34/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/pipeline.py
--
diff --git a/sdks/python/apache_beam/pipeline.py 
b/sdks/python/apache_beam/pipeline.py
new file mode 100644
index 000..ec87f46
--- /dev/null
+++ b/sdks/python/apache_beam/pipeline.py
@@ -0,0 +1,435 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Pipeline, the top-level Dataflow object.
+
+A pipeline holds a DAG of data transforms. Conceptually the nodes of the DAG
+are transforms (PTransform objects) and the edges are values (mostly PCollection
+objects). The transforms take as inputs one or more PValues and output one or
+more PValues.
+
+The pipeline offers functionality to traverse the graph.  The actual operation
+to be executed for each node visited is specified through a runner object.
+
+Typical usage:
+
+  # Create a pipeline object using a local runner for execution.
+  pipeline = Pipeline(runner=DirectPipelineRunner())
+
+  # Add to the pipeline a "Create" transform. When executed this
+  # transform will produce a PCollection object with the specified values.
+  pcoll = pipeline.create('label', [1, 2, 3])
+
+  # run() will execute the DAG stored in the pipeline.  The execution of the
+  # nodes visited is done using the specified local runner.
+  pipeline.run()
+
+"""
+
+from __future__ import absolute_import
+
+import collections
+import logging
+import os
+import shutil
+import tempfile
+
+from google.cloud.dataflow import pvalue
+from google.cloud.dataflow import typehints
+from google.cloud.dataflow.internal import pickler
+from google.cloud.dataflow.runners import create_runner
+from google.cloud.dataflow.runners import PipelineRunner
+from google.cloud.dataflow.transforms import format_full_label
+from google.cloud.dataflow.transforms import ptransform
+from google.cloud.dataflow.typehints import TypeCheckError
+from google.cloud.dataflow.utils.options import PipelineOptions
+from google.cloud.dataflow.utils.options import SetupOptions
+from google.cloud.dataflow.utils.options import StandardOptions
+from google.cloud.dataflow.utils.options import TypeOptions
+from google.cloud.dataflow.utils.pipeline_options_validator import PipelineOptionsValidator
+
+
+class Pipeline(object):
+  """A pipeline object that manages a DAG of PValues and their PTransforms.
+
+  Conceptually the PValues are the DAG's nodes and the PTransforms computing
+  the PValues are the edges.
+
+  All the transforms applied to the pipeline must have distinct full labels.
+  If the same transform instance needs to be applied again, a clone should be created
+  with a new label (e.g., transform.clone('new label')).
+  """
+
+  def __init__(self, runner=None, options=None, argv=None):
+"""Initialize a pipeline object.
+
+Args:
+  runner: An object of type 'PipelineRunner' that will be used to execute
+the pipeline. For registered runners, the runner name can be specified,
+otherwise a runner object must be supplied.
+  options: A configured 'PipelineOptions' object containing arguments
+that should be used for running the Dataflow job.
+  argv: a list of arguments (such as sys.argv) to be used for building a
+'PipelineOptions' object. This will only be used if argument 'options'
+is None.
+
+Raises:
+  ValueError: if either the runner or options argument is not of the
+  expected type.
+"""
+
+if options is not None:
+  if isinstance(options, PipelineOptions):
+self.options = options
+  else:
+raise ValueError(
+'Parameter options, if specified, must be of type PipelineOptions. '
+'Received : %r', options)
+elif argv is not None:
+  if isinstance(argv, list):
+self.options = PipelineOptions(argv)
+  else:
+raise ValueError(
+'Parameter argv, if specified, must be a list. Received : %r', argv)
+else:
+  self.options = None
+
+if runner is None and self.options is not None:
+  runner = self.options.view_as(StandardOptions).runner
+  if runner is None:
+runner = StandardOptions.DEFAULT_RUNNER
+logging.info(('Missing pipeline option (runner). Executing pipeline '
+  'using the default runner: %s.'), runner)
+
+if isinstance(runner, str):
+  runner = 
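
The constructor above accepts its configuration in three equivalent forms; roughly
(a usage sketch based on the docstring, with sys.argv standing in for a real
argument list):

  import sys
  from google.cloud.dataflow.utils.options import PipelineOptions

  p1 = Pipeline('DirectPipelineRunner')             # registered runner named directly
  p2 = Pipeline(options=PipelineOptions(sys.argv))  # pre-built options object
  p3 = Pipeline(argv=sys.argv)                      # raw list, parsed into PipelineOptions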

[40/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/clients/storage/storage_v1_messages.py
--
diff --git 
a/sdks/python/apache_beam/internal/clients/storage/storage_v1_messages.py 
b/sdks/python/apache_beam/internal/clients/storage/storage_v1_messages.py
new file mode 100644
index 000..a565acf
--- /dev/null
+++ b/sdks/python/apache_beam/internal/clients/storage/storage_v1_messages.py
@@ -0,0 +1,1903 @@
+"""Generated message classes for storage version v1.
+
+Stores and retrieves potentially large, immutable data objects.
+"""
+# NOTE: This file is autogenerated and should not be edited by hand.
+
+from apitools.base.protorpclite import message_types as _message_types
+from apitools.base.protorpclite import messages as _messages
+from apitools.base.py import encoding
+from apitools.base.py import extra_types
+
+
+package = 'storage'
+
+
+class Bucket(_messages.Message):
+  """A bucket.
+
+  Messages:
+CorsValueListEntry: A CorsValueListEntry object.
+LifecycleValue: The bucket's lifecycle configuration. See lifecycle
+  management for more information.
+LoggingValue: The bucket's logging configuration, which defines the
+  destination bucket and optional name prefix for the current bucket's
+  logs.
+OwnerValue: The owner of the bucket. This is always the project team's
+  owner group.
+VersioningValue: The bucket's versioning configuration.
+WebsiteValue: The bucket's website configuration.
+
+  Fields:
+acl: Access controls on the bucket.
+cors: The bucket's Cross-Origin Resource Sharing (CORS) configuration.
+defaultObjectAcl: Default access controls to apply to new objects when no
+  ACL is provided.
+etag: HTTP 1.1 Entity tag for the bucket.
+id: The ID of the bucket.
+kind: The kind of item this is. For buckets, this is always
+  storage#bucket.
+lifecycle: The bucket's lifecycle configuration. See lifecycle management
+  for more information.
+location: The location of the bucket. Object data for objects in the
+  bucket resides in physical storage within this region. Defaults to US.
+  See the developer's guide for the authoritative list.
+logging: The bucket's logging configuration, which defines the destination
+  bucket and optional name prefix for the current bucket's logs.
+metageneration: The metadata generation of this bucket.
+name: The name of the bucket.
+owner: The owner of the bucket. This is always the project team's owner
+  group.
+projectNumber: The project number of the project the bucket belongs to.
+selfLink: The URI of this bucket.
+storageClass: The bucket's storage class. This defines how objects in the
+  bucket are stored and determines the SLA and the cost of storage. Values
+  include STANDARD, NEARLINE and DURABLE_REDUCED_AVAILABILITY. Defaults to
+  STANDARD. For more information, see storage classes.
+timeCreated: The creation time of the bucket in RFC 3339 format.
+updated: The modification time of the bucket in RFC 3339 format.
+versioning: The bucket's versioning configuration.
+website: The bucket's website configuration.
+  """
+
+  class CorsValueListEntry(_messages.Message):
+"""A CorsValueListEntry object.
+
+Fields:
+  maxAgeSeconds: The value, in seconds, to return in the  Access-Control-
+Max-Age header used in preflight responses.
+  method: The list of HTTP methods on which to include CORS response
+headers, (GET, OPTIONS, POST, etc) Note: "*" is permitted in the list
+of methods, and means "any method".
+  origin: The list of Origins eligible to receive CORS response headers.
+Note: "*" is permitted in the list of origins, and means "any Origin".
+  responseHeader: The list of HTTP headers other than the simple response
+headers to give permission for the user-agent to share across domains.
+"""
+
+maxAgeSeconds = _messages.IntegerField(1, variant=_messages.Variant.INT32)
+method = _messages.StringField(2, repeated=True)
+origin = _messages.StringField(3, repeated=True)
+responseHeader = _messages.StringField(4, repeated=True)
+
+  class LifecycleValue(_messages.Message):
+"""The bucket's lifecycle configuration. See lifecycle management for more
+information.
+
+Messages:
+  RuleValueListEntry: A RuleValueListEntry object.
+
+Fields:
+  rule: A lifecycle management rule, which is made of an action to take
+and the condition(s) under which the action will be taken.
+"""
+
+class RuleValueListEntry(_messages.Message):
+  """A RuleValueListEntry object.
+
+  Messages:
+ActionValue: The action to take.
+ConditionValue: The condition(s) under which the action will be taken.
+
+  Fields:
+action: The action to take.
+
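
As a small usage sketch of the CORS sub-message above (the field values are
illustrative, and keyword construction is the usual protorpclite message pattern
rather than anything specific to this patch):

  cors_entry = Bucket.CorsValueListEntry(
      maxAgeSeconds=3600,
      method=['GET', 'HEAD'],
      origin=['*'],                      # '*' means any Origin, per the docstring
      responseHeader=['Content-Type'])
  bucket = Bucket(name='my-bucket', location='US', cors=[cors_entry])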

[20/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/clients/bigquery/bigquery_v2_messages.py
--
diff --git 
a/sdks/python/google/cloud/dataflow/internal/clients/bigquery/bigquery_v2_messages.py
 
b/sdks/python/google/cloud/dataflow/internal/clients/bigquery/bigquery_v2_messages.py
deleted file mode 100644
index 36e16c0..000
--- 
a/sdks/python/google/cloud/dataflow/internal/clients/bigquery/bigquery_v2_messages.py
+++ /dev/null
@@ -1,1893 +0,0 @@
-"""Generated message classes for bigquery version v2.
-
-A data platform for customers to create, manage, share and query data.
-"""
-# NOTE: This file is autogenerated and should not be edited by hand.
-
-from apitools.base.protorpclite import messages as _messages
-from apitools.base.py import encoding
-from apitools.base.py import extra_types
-
-
-package = 'bigquery'
-
-
-class BigqueryDatasetsDeleteRequest(_messages.Message):
-  """A BigqueryDatasetsDeleteRequest object.
-
-  Fields:
-datasetId: Dataset ID of dataset being deleted
-deleteContents: If True, delete all the tables in the dataset. If False
-  and the dataset contains tables, the request will fail. Default is False
-projectId: Project ID of the dataset being deleted
-  """
-
-  datasetId = _messages.StringField(1, required=True)
-  deleteContents = _messages.BooleanField(2)
-  projectId = _messages.StringField(3, required=True)
-
-
-class BigqueryDatasetsDeleteResponse(_messages.Message):
-  """An empty BigqueryDatasetsDelete response."""
-
-
-class BigqueryDatasetsGetRequest(_messages.Message):
-  """A BigqueryDatasetsGetRequest object.
-
-  Fields:
-datasetId: Dataset ID of the requested dataset
-projectId: Project ID of the requested dataset
-  """
-
-  datasetId = _messages.StringField(1, required=True)
-  projectId = _messages.StringField(2, required=True)
-
-
-class BigqueryDatasetsInsertRequest(_messages.Message):
-  """A BigqueryDatasetsInsertRequest object.
-
-  Fields:
-dataset: A Dataset resource to be passed as the request body.
-projectId: Project ID of the new dataset
-  """
-
-  dataset = _messages.MessageField('Dataset', 1)
-  projectId = _messages.StringField(2, required=True)
-
-
-class BigqueryDatasetsListRequest(_messages.Message):
-  """A BigqueryDatasetsListRequest object.
-
-  Fields:
-all: Whether to list all datasets, including hidden ones
-maxResults: The maximum number of results to return
-pageToken: Page token, returned by a previous call, to request the next
-  page of results
-projectId: Project ID of the datasets to be listed
-  """
-
-  all = _messages.BooleanField(1)
-  maxResults = _messages.IntegerField(2, variant=_messages.Variant.UINT32)
-  pageToken = _messages.StringField(3)
-  projectId = _messages.StringField(4, required=True)
-
-
-class BigqueryDatasetsPatchRequest(_messages.Message):
-  """A BigqueryDatasetsPatchRequest object.
-
-  Fields:
-dataset: A Dataset resource to be passed as the request body.
-datasetId: Dataset ID of the dataset being updated
-projectId: Project ID of the dataset being updated
-  """
-
-  dataset = _messages.MessageField('Dataset', 1)
-  datasetId = _messages.StringField(2, required=True)
-  projectId = _messages.StringField(3, required=True)
-
-
-class BigqueryDatasetsUpdateRequest(_messages.Message):
-  """A BigqueryDatasetsUpdateRequest object.
-
-  Fields:
-dataset: A Dataset resource to be passed as the request body.
-datasetId: Dataset ID of the dataset being updated
-projectId: Project ID of the dataset being updated
-  """
-
-  dataset = _messages.MessageField('Dataset', 1)
-  datasetId = _messages.StringField(2, required=True)
-  projectId = _messages.StringField(3, required=True)
-
-
-class BigqueryJobsCancelRequest(_messages.Message):
-  """A BigqueryJobsCancelRequest object.
-
-  Fields:
-jobId: [Required] Job ID of the job to cancel
-projectId: [Required] Project ID of the job to cancel
-  """
-
-  jobId = _messages.StringField(1, required=True)
-  projectId = _messages.StringField(2, required=True)
-
-
-class BigqueryJobsGetQueryResultsRequest(_messages.Message):
-  """A BigqueryJobsGetQueryResultsRequest object.
-
-  Fields:
-jobId: [Required] Job ID of the query job
-maxResults: Maximum number of results to read
-pageToken: Page token, returned by a previous call, to request the next
-  page of results
-projectId: [Required] Project ID of the query job
-startIndex: Zero-based index of the starting row
-timeoutMs: How long to wait for the query to complete, in milliseconds,
-  before returning. Default is 10 seconds. If the timeout passes before
-  the job completes, the 'jobComplete' field in the response will be false
-  """
-
-  jobId = _messages.StringField(1, required=True)
-  maxResults = _messages.IntegerField(2, variant=_messages.Variant.UINT32)
-  

[02/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/utils/counters.pxd
--
diff --git a/sdks/python/google/cloud/dataflow/utils/counters.pxd 
b/sdks/python/google/cloud/dataflow/utils/counters.pxd
deleted file mode 100644
index 8c5f0b7..000
--- a/sdks/python/google/cloud/dataflow/utils/counters.pxd
+++ /dev/null
@@ -1,27 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the 'License');
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an 'AS IS' BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# cython: profile=True
-# cython: overflowcheck=True
-
-cdef class Counter(object):
-  cdef readonly object name
-  cdef readonly object combine_fn
-  cdef readonly object accumulator
-  cdef readonly object _add_input
-  cpdef bint update(self, value) except -1
-
-
-cdef class AccumulatorCombineFnCounter(Counter):
-  cdef readonly object _fast_add_input

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/utils/counters.py
--
diff --git a/sdks/python/google/cloud/dataflow/utils/counters.py 
b/sdks/python/google/cloud/dataflow/utils/counters.py
deleted file mode 100644
index 78c8598..000
--- a/sdks/python/google/cloud/dataflow/utils/counters.py
+++ /dev/null
@@ -1,180 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the 'License');
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an 'AS IS' BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# cython: profile=False
-# cython: overflowcheck=True
-
-"""Counters collect the progress of the Worker for reporting to the service."""
-
-import threading
-from google.cloud.dataflow.transforms import cy_combiners
-
-
-class Counter(object):
-  """A counter aggregates a series of values.
-
-  The aggregation kind of the Counter is specified when the Counter
-  is created.  The values aggregated must be of a type appropriate for the
-  aggregation used.  Aggregations supported are listed in the code.
-
-  (The aggregated value will be reported to the Dataflow service.)
-
-  Do not create directly; call CounterFactory.get_counter instead.
-
-  Attributes:
-name: the name of the counter, a string
-aggregation_kind: one of the aggregation kinds defined by this class.
-total: the total size of all the items passed to update()
-elements: the number of times update() was called
-  """
-
-  # Handy references to common counters.
-  SUM = cy_combiners.SumInt64Fn()
-  MEAN = cy_combiners.MeanInt64Fn()
-
-  def __init__(self, name, combine_fn):
-"""Creates a Counter object.
-
-Args:
-  name: the name of this counter.  Typically has three parts:
-"step-output-counter".
-  combine_fn: the CombineFn to use for aggregation
-"""
-self.name = name
-self.combine_fn = combine_fn
-self.accumulator = combine_fn.create_accumulator()
-self._add_input = self.combine_fn.add_input
-
-  def update(self, value):
-self.accumulator = self._add_input(self.accumulator, value)
-
-  def value(self):
-return self.combine_fn.extract_output(self.accumulator)
-
-  def __str__(self):
-return '<%s>' % self._str_internal()
-
-  def __repr__(self):
-return '<%s at %s>' % (self._str_internal(), hex(id(self)))
-
-  def _str_internal(self):
-return '%s %s %s' % (self.name, self.combine_fn.__class__.__name__,
- self.value())
-
-
-class AccumulatorCombineFnCounter(Counter):
-  """Counter optimized for a mutating accumulator that holds all the logic."""
-
-  def __init__(self, name, combine_fn):
-assert isinstance(combine_fn, cy_combiners.AccumulatorCombineFn)
-super(AccumulatorCombineFnCounter, self).__init__(name, combine_fn)
-self._fast_add_input = self.accumulator.add_input
-
-  def update(self, value):
-self._fast_add_input(value)
-
-
-# Counters that represent Accumulators have names starting with this
-USER_COUNTER_PREFIX = 'user-'
-
-
-class CounterFactory(object):
-  """Keeps track of unique 
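
In terms of the Counter API above, usage reduces to update()/value() around a
CombineFn. A minimal sketch (the counter name is illustrative, direct construction
is used only because the CounterFactory.get_counter signature is not shown above,
and Counter.SUM is assumed to sum the values passed to update()):

  counter = Counter('my-step-output-ElementCount', Counter.SUM)
  counter.update(3)
  counter.update(4)
  assert counter.value() == 7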

[24/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/coders/stream_test.py
--
diff --git a/sdks/python/google/cloud/dataflow/coders/stream_test.py 
b/sdks/python/google/cloud/dataflow/coders/stream_test.py
deleted file mode 100644
index 3002116..000
--- a/sdks/python/google/cloud/dataflow/coders/stream_test.py
+++ /dev/null
@@ -1,168 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Tests for the stream implementations."""
-
-import logging
-import math
-import unittest
-
-
-from google.cloud.dataflow.coders import slow_stream
-
-
-class StreamTest(unittest.TestCase):
-  # pylint: disable=invalid-name
-  InputStream = slow_stream.InputStream
-  OutputStream = slow_stream.OutputStream
-  ByteCountingOutputStream = slow_stream.ByteCountingOutputStream
-  # pylint: enable=invalid-name
-
-  def test_read_write(self):
-out_s = self.OutputStream()
-out_s.write('abc')
-out_s.write('\0\t\n')
-out_s.write('xyz', True)
-out_s.write('', True)
-in_s = self.InputStream(out_s.get())
-self.assertEquals('abc\0\t\n', in_s.read(6))
-self.assertEquals('xyz', in_s.read_all(True))
-self.assertEquals('', in_s.read_all(True))
-
-  def test_read_all(self):
-out_s = self.OutputStream()
-out_s.write('abc')
-in_s = self.InputStream(out_s.get())
-self.assertEquals('abc', in_s.read_all(False))
-
-  def test_read_write_byte(self):
-out_s = self.OutputStream()
-out_s.write_byte(1)
-out_s.write_byte(0)
-out_s.write_byte(0xFF)
-in_s = self.InputStream(out_s.get())
-self.assertEquals(1, in_s.read_byte())
-self.assertEquals(0, in_s.read_byte())
-self.assertEquals(0xFF, in_s.read_byte())
-
-  def test_read_write_large(self):
-values = range(4 * 1024)
-out_s = self.OutputStream()
-for v in values:
-  out_s.write_bigendian_int64(v)
-in_s = self.InputStream(out_s.get())
-for v in values:
-  self.assertEquals(v, in_s.read_bigendian_int64())
-
-  def run_read_write_var_int64(self, values):
-out_s = self.OutputStream()
-for v in values:
-  out_s.write_var_int64(v)
-in_s = self.InputStream(out_s.get())
-for v in values:
-  self.assertEquals(v, in_s.read_var_int64())
-
-  def test_small_var_int64(self):
-self.run_read_write_var_int64(range(-10, 30))
-
-  def test_medium_var_int64(self):
-base = -1.7
-self.run_read_write_var_int64(
-[int(base**pow)
-  for pow in range(1, int(63 * math.log(2) / math.log(-base)))])
-
-  def test_large_var_int64(self):
-self.run_read_write_var_int64([0, 2**63 - 1, -2**63, 2**63 - 3])
-
-  def test_read_write_double(self):
-values = 0, 1, -1, 1e100, 1.0/3, math.pi, float('inf')
-out_s = self.OutputStream()
-for v in values:
-  out_s.write_bigendian_double(v)
-in_s = self.InputStream(out_s.get())
-for v in values:
-  self.assertEquals(v, in_s.read_bigendian_double())
-
-  def test_read_write_bigendian_int64(self):
-values = 0, 1, -1, 2**63-1, -2**63, int(2**61 * math.pi)
-out_s = self.OutputStream()
-for v in values:
-  out_s.write_bigendian_int64(v)
-in_s = self.InputStream(out_s.get())
-for v in values:
-  self.assertEquals(v, in_s.read_bigendian_int64())
-
-  def test_read_write_bigendian_int32(self):
-values = 0, 1, -1, 2**31-1, -2**31, int(2**29 * math.pi)
-out_s = self.OutputStream()
-for v in values:
-  out_s.write_bigendian_int32(v)
-in_s = self.InputStream(out_s.get())
-for v in values:
-  self.assertEquals(v, in_s.read_bigendian_int32())
-
-  def test_byte_counting(self):
-bc_s = self.ByteCountingOutputStream()
-self.assertEquals(0, bc_s.get_count())
-bc_s.write('def')
-self.assertEquals(3, bc_s.get_count())
-bc_s.write('')
-self.assertEquals(3, bc_s.get_count())
-bc_s.write_byte(10)
-self.assertEquals(4, bc_s.get_count())
-# "nested" also writes the length of the string, which should
-# cause 1 extra byte to be counted.
-bc_s.write('2345', nested=True)
-self.assertEquals(9, bc_s.get_count())
-bc_s.write_var_int64(63)
-self.assertEquals(10, bc_s.get_count())
-bc_s.write_bigendian_int64(42)
-self.assertEquals(18, bc_s.get_count())
-bc_s.write_bigendian_int32(36)
-self.assertEquals(22, bc_s.get_count())
-
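
A quick editorial illustration of the stream API exercised above: a minimal
round-trip sketch, assuming only the calls that appear in these tests
(OutputStream.write / write_var_int64 / write_bigendian_int32 / get and the
matching InputStream readers):

  from google.cloud.dataflow.coders import slow_stream

  # Encode a few values with the pure-Python stream implementation.
  out = slow_stream.OutputStream()
  out.write_var_int64(300)          # variable-length signed int64
  out.write_bigendian_int32(7)      # fixed-width big-endian int32
  out.write('payload', True)        # nested write also records the length

  # Decode them back in the same order.
  inp = slow_stream.InputStream(out.get())
  assert inp.read_var_int64() == 300
  assert inp.read_bigendian_int32() == 7
  assert inp.read_all(True) == 'payload'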

[45/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/apiclient.py
--
diff --git a/sdks/python/apache_beam/internal/apiclient.py b/sdks/python/apache_beam/internal/apiclient.py
new file mode 100644
index 000..9fb060d
--- /dev/null
+++ b/sdks/python/apache_beam/internal/apiclient.py
@@ -0,0 +1,935 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Dataflow client utility functions."""
+
+import codecs
+import json
+import logging
+import os
+import re
+import time
+
+
+from google.cloud.dataflow import utils
+from google.cloud.dataflow import version
+from google.cloud.dataflow.internal import pickler
+from google.cloud.dataflow.internal.auth import get_service_credentials
+from google.cloud.dataflow.internal.json_value import to_json_value
+from google.cloud.dataflow.io import iobase
+from google.cloud.dataflow.transforms import cy_combiners
+from google.cloud.dataflow.utils import dependency
+from google.cloud.dataflow.utils import names
+from google.cloud.dataflow.utils import retry
+from google.cloud.dataflow.utils.names import PropertyNames
+from google.cloud.dataflow.utils.options import GoogleCloudOptions
+from google.cloud.dataflow.utils.options import StandardOptions
+from google.cloud.dataflow.utils.options import WorkerOptions
+
+from apitools.base.py import encoding
+from apitools.base.py import exceptions
+from google.cloud.dataflow.internal.clients import storage
+import google.cloud.dataflow.internal.clients.dataflow as dataflow
+
+
+BIGQUERY_API_SERVICE = 'bigquery.googleapis.com'
+COMPUTE_API_SERVICE = 'compute.googleapis.com'
+STORAGE_API_SERVICE = 'storage.googleapis.com'
+
+
+def append_counter(status_object, counter, tentative):
+  """Appends a counter to the status.
+
+  Args:
+status_object: a work_item_status to which to add this counter
+counter: a counters.Counter object to append
+tentative: whether the value should be reported as tentative
+  """
+  logging.debug('Appending counter%s %s',
+' (tentative)' if tentative else '',
+counter)
+  kind, setter = metric_translations[counter.combine_fn.__class__]
+  append_metric(
+  status_object, counter.name, kind, counter.accumulator,
+  setter, tentative=tentative)
+
+
+def append_metric(status_object, metric_name, kind, value, setter=None,
+  step=None, output_user_name=None, tentative=False,
+  worker_id=None, cumulative=True):
+  """Creates and adds a MetricUpdate field to the passed-in protobuf.
+
+  Args:
+status_object: a work_item_status to which to add this metric
+metric_name: a string naming this metric
+kind: dataflow counter kind (e.g. 'sum')
+value: accumulator value to encode
+setter: if not None, a lambda to use to update metric_update with value
+step: the name of the associated step
+output_user_name: the user-visible name to use
+tentative: whether this should be labeled as a tentative metric
+worker_id: the id of this worker.  Specifying a worker_id also
+  causes this to be encoded as a metric, not a counter.
+cumulative: Whether this metric is cumulative, default True.
+  Set to False for a delta value.
+  """
+  # Does this look like a counter or like a metric?
+  is_counter = not worker_id
+
+  metric_update = dataflow.MetricUpdate()
+  metric_update.name = dataflow.MetricStructuredName()
+  metric_update.name.name = metric_name
+  # Handle attributes stored in the name context
+  if step or output_user_name or tentative or worker_id:
+metric_update.name.context = dataflow.MetricStructuredName.ContextValue()
+
+def append_to_context(key, value):
+  metric_update.name.context.additionalProperties.append(
+  dataflow.MetricStructuredName.ContextValue.AdditionalProperty(
+  key=key, value=value))
+if step:
+  append_to_context('step', step)
+if output_user_name:
+  append_to_context('output_user_name', output_user_name)
+if tentative:
+  append_to_context('tentative', 'true')
+if worker_id:
+  append_to_context('workerId', worker_id)
+  if cumulative and is_counter:
+metric_update.cumulative = cumulative
+  if is_counter:
+# Counters are distinguished by having a kind; metrics do not.
+metric_update.kind = kind
+  if setter:
+
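
The fragment above is cut off, but the docstring captures the key rule:
passing a worker_id turns the update into a per-worker metric, otherwise it is
a counter identified by its kind. A hedged usage sketch follows; the
WorkItemStatus message name and all values here are assumptions for
illustration only:

  import google.cloud.dataflow.internal.clients.dataflow as dataflow

  status = dataflow.WorkItemStatus()  # assumed stand-in for a work_item_status

  # A cumulative 'sum' counter, reported tentatively for step 's1'.
  append_metric(status, 'ElementCount', 'sum', 42,
                step='s1', output_user_name='read/out', tentative=True)

  # A per-worker delta metric; because worker_id is set, no kind is attached.
  append_metric(status, 'cpu_seconds', None, 1.5,
                worker_id='worker-0', cumulative=False)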

[03/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/typehints/typed_pipeline_test.py
--
diff --git a/sdks/python/google/cloud/dataflow/typehints/typed_pipeline_test.py b/sdks/python/google/cloud/dataflow/typehints/typed_pipeline_test.py
deleted file mode 100644
index 67362dc..000
--- a/sdks/python/google/cloud/dataflow/typehints/typed_pipeline_test.py
+++ /dev/null
@@ -1,248 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Unit tests for the type-hint objects and decorators."""
-import inspect
-import unittest
-
-
-import google.cloud.dataflow as df
-from google.cloud.dataflow import pvalue
-from google.cloud.dataflow import typehints
-from google.cloud.dataflow.transforms.util import assert_that, equal_to
-from google.cloud.dataflow.typehints import WithTypeHints
-from google.cloud.dataflow.utils.options import OptionsContext
-from google.cloud.dataflow.utils.options import PipelineOptions
-
-# These tests often construct a pipeline as value | PTransform to test side
-# effects (e.g. errors).
-# pylint: disable=expression-not-assigned
-
-
-class MainInputTest(unittest.TestCase):
-
-  def test_bad_main_input(self):
-@typehints.with_input_types(str, int)
-def repeat(s, times):
-  return s * times
-with self.assertRaises(typehints.TypeCheckError):
-  [1, 2, 3] | df.Map(repeat, 3)
-
-  def test_non_function(self):
-result = ['a', 'bb', 'c'] | df.Map(str.upper)
-self.assertEqual(['A', 'BB', 'C'], sorted(result))
-
-result = ['xa', 'bbx', 'xcx'] | df.Map(str.strip, 'x')
-self.assertEqual(['a', 'bb', 'c'], sorted(result))
-
-result = ['1', '10', '100'] | df.Map(int)
-self.assertEqual([1, 10, 100], sorted(result))
-
-result = ['1', '10', '100'] | df.Map(int, 16)
-self.assertEqual([1, 16, 256], sorted(result))
-
-with self.assertRaises(typehints.TypeCheckError):
-  [1, 2, 3] | df.Map(str.upper)
-
-  def test_loose_bounds(self):
-@typehints.with_input_types(typehints.Union[int, float, long])
-@typehints.with_output_types(basestring)
-def format_number(x):
-  return '%g' % x
-result = [1, 2, 3] | df.Map(format_number)
-self.assertEqual(['1', '2', '3'], sorted(result))
-
-  def test_typed_dofn_class(self):
-@typehints.with_input_types(int)
-@typehints.with_output_types(str)
-class MyDoFn(df.DoFn):
-  def process(self, context):
-return [str(context.element)]
-
-result = [1, 2, 3] | df.ParDo(MyDoFn())
-self.assertEqual(['1', '2', '3'], sorted(result))
-
-with self.assertRaises(typehints.TypeCheckError):
-  ['a', 'b', 'c'] | df.ParDo(MyDoFn())
-
-with self.assertRaises(typehints.TypeCheckError):
-  [1, 2, 3] | (df.ParDo(MyDoFn()) | df.ParDo('again', MyDoFn()))
-
-  def test_typed_dofn_instance(self):
-class MyDoFn(df.DoFn):
-  def process(self, context):
-return [str(context.element)]
-my_do_fn = MyDoFn().with_input_types(int).with_output_types(str)
-
-result = [1, 2, 3] | df.ParDo(my_do_fn)
-self.assertEqual(['1', '2', '3'], sorted(result))
-
-with self.assertRaises(typehints.TypeCheckError):
-  ['a', 'b', 'c'] | df.ParDo(my_do_fn)
-
-with self.assertRaises(typehints.TypeCheckError):
-  [1, 2, 3] | (df.ParDo(my_do_fn) | df.ParDo('again', my_do_fn))
-
-
-class SideInputTest(unittest.TestCase):
-
-  def _run_repeat_test(self, repeat):
-self._run_repeat_test_good(repeat)
-self._run_repeat_test_bad(repeat)
-
-  @OptionsContext(pipeline_type_check=True)
-  def _run_repeat_test_good(self, repeat):
-# As a positional argument.
-result = ['a', 'bb', 'c'] | df.Map(repeat, 3)
-self.assertEqual(['aaa', 'bb', 'ccc'], sorted(result))
-
-# As a keyword argument.
-result = ['a', 'bb', 'c'] | df.Map(repeat, times=3)
-self.assertEqual(['aaa', 'bb', 'ccc'], sorted(result))
-
-  def _run_repeat_test_bad(self, repeat):
-# Various mismatches.
-with self.assertRaises(typehints.TypeCheckError):
-  ['a', 'bb', 'c'] | df.Map(repeat, 'z')
-with self.assertRaises(typehints.TypeCheckError):
-  ['a', 'bb', 'c'] | df.Map(repeat, times='z')
-with self.assertRaises(typehints.TypeCheckError):
-  ['a', 'bb', 'c'] | df.Map(repeat, 3, 4)
-if not inspect.getargspec(repeat).defaults:
-  with 
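
The tests above double as documentation for the type-hint decorators. A
compact sketch using the same google.cloud.dataflow API that the tests import
(the function and values are made up for illustration):

  import google.cloud.dataflow as df
  from google.cloud.dataflow import typehints

  @typehints.with_input_types(int)
  @typehints.with_output_types(str)
  def to_words(n):
    return 'x' * n

  result = [1, 2, 3] | df.Map(to_words)   # passes the type check
  # ['a', 'b'] | df.Map(to_words)         # would raise typehints.TypeCheckError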

[26/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/utils/dependency.py
--
diff --git a/sdks/python/apache_beam/utils/dependency.py b/sdks/python/apache_beam/utils/dependency.py
new file mode 100644
index 000..5a594f0
--- /dev/null
+++ b/sdks/python/apache_beam/utils/dependency.py
@@ -0,0 +1,439 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Support for installing custom code and required dependencies.
+
+Workflows, with the exception of very simple ones, are organized in multiple
+modules and packages. Typically, these modules and packages have
+dependencies on other standard libraries. Dataflow relies on the Python
+setuptools package to handle these scenarios. For further details please read:
+https://pythonhosted.org/setuptools/setuptools.html
+
+When a runner tries to run a pipeline it will check for a --requirements_file
+and a --setup_file option.
+
+If --setup_file is present then it is assumed that the folder containing the
+file specified by the option has the typical layout required by setuptools and
+it will run 'python setup.py sdist' to produce a source distribution. The
+resulting tarball (a file ending in .tar.gz) will be staged at the GCS staging
+location specified as a job option. When a worker starts, it will check for the
+presence of this file and will run 'easy_install tarball' to install the
+package in the worker.
+
+If --requirements_file is present then the file specified by the option will be
+staged in the GCS staging location. When a worker starts, it will check for the
+presence of this file and will run 'pip install -r requirements.txt'. A
+requirements file can be generated by running 'pip freeze >
+requirements.txt'. The reason a Dataflow runner does not do this automatically
+is that quite often only a small fraction of the dependencies present in a
+requirements.txt file are actually needed for remote execution and therefore a
+one-time manual trimming is desirable.
+
+TODO(silviuc): Staged files should have a job-specific prefix, to prevent
+several jobs in the same project from stomping on each other when they share a
+staging location.
+
+TODO(silviuc): Should we allow several setup packages?
+TODO(silviuc): We should allow customizing the exact command for setup build.
+"""
+
+import glob
+import logging
+import os
+import shutil
+import tempfile
+
+
+from google.cloud.dataflow import utils
+from google.cloud.dataflow.internal import pickler
+from google.cloud.dataflow.utils import names
+from google.cloud.dataflow.utils import processes
+from google.cloud.dataflow.utils.options import GoogleCloudOptions
+from google.cloud.dataflow.utils.options import SetupOptions
+from google.cloud.dataflow.version import __version__
+
+
+# Standard file names used for staging files.
+WORKFLOW_TARBALL_FILE = 'workflow.tar.gz'
+REQUIREMENTS_FILE = 'requirements.txt'
+EXTRA_PACKAGES_FILE = 'extra_packages.txt'
+
+PACKAGES_URL_PREFIX = (
+'https://github.com/GoogleCloudPlatform/DataflowPythonSDK/archive')
+
+
+def _dependency_file_copy(from_path, to_path):
+  """Copies a local file to a GCS file or vice versa."""
+  logging.info('file copy from %s to %s.', from_path, to_path)
+  if from_path.startswith('gs://') or to_path.startswith('gs://'):
+command_args = ['gsutil', '-m', '-q', 'cp', from_path, to_path]
+logging.info('Executing command: %s', command_args)
+result = processes.call(command_args)
+if result != 0:
+  raise ValueError(
+  'Failed to copy GCS file from %s to %s.' % (from_path, to_path))
+  else:
+# Branch used only for unit tests and integration tests.
+# In such environments GCS support is not available.
+if not os.path.isdir(os.path.dirname(to_path)):
+  logging.info('Creating folder (it does not exist yet; any errors '
+   'will follow): %s', os.path.dirname(to_path))
+  os.mkdir(os.path.dirname(to_path))
+shutil.copyfile(from_path, to_path)
+
+
+def _dependency_file_download(from_url, to_folder):
+  """Downloads a file from a URL and returns path to the local file."""
+  # TODO(silviuc): We should cache downloads so we do not do it for every job.
+  try:
+# We check if the file is actually there because wget returns a file
+# even for a 404 response (file will contain the contents of the 404
+# 

[08/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/transforms/combiners_test.py
--
diff --git a/sdks/python/google/cloud/dataflow/transforms/combiners_test.py b/sdks/python/google/cloud/dataflow/transforms/combiners_test.py
deleted file mode 100644
index b8142ea..000
--- a/sdks/python/google/cloud/dataflow/transforms/combiners_test.py
+++ /dev/null
@@ -1,225 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Unit tests for our libraries of combine PTransforms."""
-
-import unittest
-
-import google.cloud.dataflow as df
-from google.cloud.dataflow.pipeline import Pipeline
-from google.cloud.dataflow.transforms import combiners
-import google.cloud.dataflow.transforms.combiners as combine
-from google.cloud.dataflow.transforms.core import CombineGlobally
-from google.cloud.dataflow.transforms.core import Create
-from google.cloud.dataflow.transforms.core import Map
-from google.cloud.dataflow.transforms.ptransform import PTransform
-from google.cloud.dataflow.transforms.util import assert_that, equal_to
-
-
-class CombineTest(unittest.TestCase):
-
-  def test_builtin_combines(self):
-pipeline = Pipeline('DirectPipelineRunner')
-
-vals = [6, 3, 1, 1, 9, 1, 5, 2, 0, 6]
-mean = sum(vals) / float(len(vals))
-size = len(vals)
-
-# First for global combines.
-pcoll = pipeline | Create('start', vals)
-result_mean = pcoll | combine.Mean.Globally('mean')
-result_count = pcoll | combine.Count.Globally('count')
-assert_that(result_mean, equal_to([mean]), label='assert:mean')
-assert_that(result_count, equal_to([size]), label='assert:size')
-
-# Again for per-key combines.
-pcoll = pipeline | Create('start-perkey', [('a', x) for x in vals])
-result_key_mean = pcoll | combine.Mean.PerKey('mean-perkey')
-result_key_count = pcoll | combine.Count.PerKey('count-perkey')
-assert_that(result_key_mean, equal_to([('a', mean)]), label='key:mean')
-assert_that(result_key_count, equal_to([('a', size)]), label='key:size')
-pipeline.run()
-
-  def test_top(self):
-pipeline = Pipeline('DirectPipelineRunner')
-
-# A parameter we'll be sharing with a custom comparator.
-names = {0: 'zo',
- 1: 'one',
- 2: 'twoo',
- 3: 'three',
- 5: 'fiiive',
- 6: 'six',
- 9: 'nniiinne'}
-
-# First for global combines.
-pcoll = pipeline | Create('start', [6, 3, 1, 1, 9, 1, 5, 2, 0, 6])
-result_top = pcoll | combine.Top.Largest('top', 5)
-result_bot = pcoll | combine.Top.Smallest('bot', 4)
-result_cmp = pcoll | combine.Top.Of(
-'cmp',
-6,
-lambda a, b, names: len(names[a]) < len(names[b]),
-names)  # Note parameter passed to comparator.
-assert_that(result_top, equal_to([[9, 6, 6, 5, 3]]), label='assert:top')
-assert_that(result_bot, equal_to([[0, 1, 1, 1]]), label='assert:bot')
-assert_that(result_cmp, equal_to([[9, 6, 6, 5, 3, 2]]), label='assert:cmp')
-
-# Again for per-key combines.
-pcoll = pipeline | Create(
-'start-perkey', [('a', x) for x in [6, 3, 1, 1, 9, 1, 5, 2, 0, 6]])
-result_key_top = pcoll | combine.Top.LargestPerKey('top-perkey', 5)
-result_key_bot = pcoll | combine.Top.SmallestPerKey('bot-perkey', 4)
-result_key_cmp = pcoll | combine.Top.PerKey(
-'cmp-perkey',
-6,
-lambda a, b, names: len(names[a]) < len(names[b]),
-names)  # Note parameter passed to comparator.
-assert_that(result_key_top, equal_to([('a', [9, 6, 6, 5, 3])]),
-label='key:top')
-assert_that(result_key_bot, equal_to([('a', [0, 1, 1, 1])]),
-label='key:bot')
-assert_that(result_key_cmp, equal_to([('a', [9, 6, 6, 5, 3, 2])]),
-label='key:cmp')
-pipeline.run()
-
-  def test_top_shorthands(self):
-pipeline = Pipeline('DirectPipelineRunner')
-
-pcoll = pipeline | Create('start', [6, 3, 1, 1, 9, 1, 5, 2, 0, 6])
-result_top = pcoll | df.CombineGlobally('top', combiners.Largest(5))
-result_bot = pcoll | df.CombineGlobally('bot', combiners.Smallest(4))
-assert_that(result_top, equal_to([[9, 6, 6, 5, 3]]), label='assert:top')
-assert_that(result_bot, equal_to([[0, 1, 1, 1]]), label='assert:bot')
-
-pcoll = pipeline | Create(
-
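
As a minimal standalone counterpart to the tests above, the same combiners can
be exercised in a tiny pipeline; this sketch uses only the transforms this
test module already imports:

  from google.cloud.dataflow.pipeline import Pipeline
  from google.cloud.dataflow.transforms import combiners
  from google.cloud.dataflow.transforms.core import Create
  from google.cloud.dataflow.transforms.util import assert_that, equal_to

  p = Pipeline('DirectPipelineRunner')
  pcoll = p | Create('start', [3, 1, 2])
  assert_that(pcoll | combiners.Count.Globally('count'), equal_to([3]),
              label='assert:count')
  assert_that(pcoll | combiners.Top.Largest('top2', 2), equal_to([[3, 2]]),
              label='assert:top')
  p.run()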

[12/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/io/fileio.py
--
diff --git a/sdks/python/google/cloud/dataflow/io/fileio.py b/sdks/python/google/cloud/dataflow/io/fileio.py
deleted file mode 100644
index 9a003f0..000
--- a/sdks/python/google/cloud/dataflow/io/fileio.py
+++ /dev/null
@@ -1,747 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""File-based sources and sinks."""
-
-from __future__ import absolute_import
-
-import glob
-import gzip
-import logging
-from multiprocessing.pool import ThreadPool
-import os
-import re
-import shutil
-import tempfile
-import time
-
-from google.cloud.dataflow import coders
-from google.cloud.dataflow.io import iobase
-from google.cloud.dataflow.io import range_trackers
-from google.cloud.dataflow.utils import processes
-from google.cloud.dataflow.utils import retry
-
-
-__all__ = ['TextFileSource', 'TextFileSink']
-
-DEFAULT_SHARD_NAME_TEMPLATE = '-S-of-N'
-
-
-# Retrying is needed because there are transient errors that can happen.
-@retry.with_exponential_backoff(num_retries=4, retry_filter=lambda _: True)
-def _gcs_file_copy(from_path, to_path, encoding=''):
-  """Copy a local file to a GCS location with retries for transient errors."""
-  if not encoding:
-command_args = ['gsutil', '-m', '-q', 'cp', from_path, to_path]
-  else:
-encoding = 'Content-Type:' + encoding
-command_args = ['gsutil', '-m', '-q', '-h', encoding, 'cp', from_path,
-to_path]
-  logging.info('Executing command: %s', command_args)
-  popen = processes.Popen(command_args, stdout=processes.PIPE,
-  stderr=processes.PIPE)
-  stdoutdata, stderrdata = popen.communicate()
-  if popen.returncode != 0:
-raise ValueError(
-'Failed to copy GCS file from %s to %s (stdout=%s, stderr=%s).' % (
-from_path, to_path, stdoutdata, stderrdata))
-
-
-# -
-# TextFileSource, TextFileSink.
-
-
-class TextFileSource(iobase.NativeSource):
-  """A source for a GCS or local text file.
-
-  Parses a text file as newline-delimited elements, by default assuming
-  UTF-8 encoding.
-  """
-
-  def __init__(self, file_path, start_offset=None, end_offset=None,
-   compression_type='AUTO', strip_trailing_newlines=True,
-   coder=coders.StrUtf8Coder()):
-"""Initialize a TextSource.
-
-Args:
-  file_path: The file path to read from as a local file path or a GCS
-gs:// path. The path can contain glob characters (*, ?, and [...]
-sets).
-  start_offset: The byte offset in the source text file at which the reader
-should start reading. Defaults to 0 (the beginning of the file).
-  end_offset: The byte offset in the file at which the reader should stop
-reading. Defaults to the end of the file.
-  compression_type: Used to handle compressed input files. Typical value
-  is 'AUTO'.
-  strip_trailing_newlines: Indicates whether this source should remove
-  the newline char in each line it reads before decoding that line.
-  coder: Coder used to decode each line.
-
-Raises:
-  TypeError: if file_path is not a string.
-
-If the file_path contains glob characters then the start_offset and
-end_offset must not be specified.
-
-The 'start_offset' and 'end_offset' pair provide a mechanism to divide the
-text file into multiple pieces for individual sources. Because the offset
-is measured by bytes, some complication arises when the offset splits in
-the middle of a text line. To avoid the scenario where two adjacent sources
-each get a fraction of a line we adopt the following rules:
-
-If start_offset falls inside a line (any character except the first one)
-then the source will skip the line and start with the next one.
-
-If end_offset falls inside a line (any character except the first one) then
-the source will contain that entire line.
-"""
-if not isinstance(file_path, basestring):
-  raise TypeError(
-  '%s: file_path must be a string;  got %r instead' %
-  (self.__class__.__name__, file_path))
-
-self.file_path = file_path
-self.start_offset = start_offset
-self.end_offset = end_offset
-

[38/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/windmill_pb2.py
--
diff --git a/sdks/python/apache_beam/internal/windmill_pb2.py b/sdks/python/apache_beam/internal/windmill_pb2.py
new file mode 100644
index 000..549e54e
--- /dev/null
+++ b/sdks/python/apache_beam/internal/windmill_pb2.py
@@ -0,0 +1,2275 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Generated by the protocol buffer compiler.  DO NOT EDIT!
+# source: windmill.proto
+
+import sys
+_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
+from google.protobuf import descriptor as _descriptor
+from google.protobuf import message as _message
+from google.protobuf import reflection as _reflection
+from google.protobuf import symbol_database as _symbol_database
+from google.protobuf import descriptor_pb2
+# @@protoc_insertion_point(imports)
+
+_sym_db = _symbol_database.Default()
+
+
+
+
+DESCRIPTOR = _descriptor.FileDescriptor(
+  name='windmill.proto',
+  package='windmill',
+  syntax='proto2',
+  
serialized_pb=_b('\n\x0ewindmill.proto\x12\x08windmill\"R\n\x07Message\x12\'\n\ttimestamp\x18\x01
 \x02(\x03:\x14-9223372036854775808\x12\x0c\n\x04\x64\x61ta\x18\x02 
\x02(\x0c\x12\x10\n\x08metadata\x18\x03 
\x01(\x0c\"\xbf\x01\n\x05Timer\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12\'\n\ttimestamp\x18\x02 
\x01(\x03:\x14-9223372036854775808\x12-\n\x04type\x18\x03 
\x01(\x0e\x32\x14.windmill.Timer.Type:\tWATERMARK\x12\x14\n\x0cstate_family\x18\x04
 
\x01(\t\";\n\x04Type\x12\r\n\tWATERMARK\x10\x00\x12\x0c\n\x08REALTIME\x10\x01\x12\x16\n\x12\x44\x45PENDENT_REALTIME\x10\x02\"X\n\x12InputMessageBundle\x12\x1d\n\x15source_computation_id\x18\x01
 \x02(\t\x12#\n\x08messages\x18\x02 
\x03(\x0b\x32\x11.windmill.Message\"r\n\x12KeyedMessageBundle\x12\x0b\n\x03key\x18\x01
 \x02(\x0c\x12\x14\n\x0csharding_key\x18\x04 
\x01(\x06\x12#\n\x08messages\x18\x02 
\x03(\x0b\x32\x11.windmill.Message\x12\x14\n\x0cmessages_ids\x18\x03 
\x03(\x0c\"\x87\x01\n\x13OutputMessageBundle\x12\"\n\x1a\x64\x65stination_computation_id\
 x18\x01 \x01(\t\x12\x1d\n\x15\x64\x65stination_stream_id\x18\x03 
\x01(\t\x12-\n\x07\x62undles\x18\x02 
\x03(\x0b\x32\x1c.windmill.KeyedMessageBundle\"t\n\x13PubSubMessageBundle\x12\r\n\x05topic\x18\x01
 \x02(\t\x12#\n\x08messages\x18\x02 
\x03(\x0b\x32\x11.windmill.Message\x12\x17\n\x0ftimestamp_label\x18\x03 
\x01(\t\x12\x10\n\x08id_label\x18\x04 
\x01(\t\".\n\x0bTimerBundle\x12\x1f\n\x06timers\x18\x01 
\x03(\x0b\x32\x0f.windmill.Timer\">\n\x05Value\x12\'\n\ttimestamp\x18\x01 
\x02(\x03:\x14-9223372036854775808\x12\x0c\n\x04\x64\x61ta\x18\x02 
\x02(\x0c\"M\n\x08TagValue\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12\x1e\n\x05value\x18\x02 
\x01(\x0b\x32\x0f.windmill.Value\x12\x14\n\x0cstate_family\x18\x03 
\x01(\t\"\xdb\x01\n\x07TagList\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12+\n\rend_timestamp\x18\x02 
\x01(\x03:\x14-9223372036854775808\x12\x1f\n\x06values\x18\x03 
\x03(\x0b\x32\x0f.windmill.Value\x12\x14\n\x0cstate_family\x18\x04 
\x01(\t\x12\x15\n\rrequest_token\x18\x07 \x01(\x0c\x12\x1a\n\x12\x63onti
 nuation_token\x18\x05 \x01(\x0c\x12,\n\x0f\x66\x65tch_max_bytes\x18\x06 
\x01(\x03:\x13\x39\x32\x32\x33\x33\x37\x32\x30\x33\x36\x38\x35\x34\x37\x37\x35\x38\x30\x37\",\n\x0cGlobalDataId\x12\x0b\n\x03tag\x18\x01
 \x02(\t\x12\x0f\n\x07version\x18\x02 
\x02(\x0c\"k\n\nGlobalData\x12\'\n\x07\x64\x61ta_id\x18\x01 
\x02(\x0b\x32\x16.windmill.GlobalDataId\x12\x10\n\x08is_ready\x18\x02 
\x01(\x08\x12\x0c\n\x04\x64\x61ta\x18\x03 
\x01(\x0c\x12\x14\n\x0cstate_family\x18\x04 
\x01(\t\"I\n\x0bSourceState\x12\r\n\x05state\x18\x01 
\x01(\x0c\x12\x14\n\x0c\x66inalize_ids\x18\x02 
\x03(\x06\x12\x15\n\ronly_finalize\x18\x03 
\x01(\x08\"Y\n\rWatermarkHold\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12\x16\n\ntimestamps\x18\x02 
\x03(\x03\x42\x02\x10\x01\x12\r\n\x05reset\x18\x03 
\x01(\x08\x12\x14\n\x0cstate_family\x18\x04 
\x01(\t\"\xd4\x02\n\x08WorkItem\x12\x0b\n\x03key\x18\x01 
\x02(\x0c\x12\x12\n\nwork_token\x18\x02 
\x02(\x06\x12\x14\n\x0csharding_key\x18\t 
\x01(\x06\x12\x13\n\x0b\x63\x61\x63he_token\x18\x07 \x01(\x06\x
 12\x35\n\x0fmessage_bundles\x18\x03 
\x03(\x0b\x32\x1c.windmill.InputMessageBundle\x12%\n\x06timers\x18\x04 
\x01(\x0b\x32\x15.windmill.TimerBundle\x12<\n\x1cglobal_data_id_notifications\x18\x05
 \x03(\x0b\x32\x16.windmill.GlobalDataId\x12+\n\x0csource_state\x18\x06 

[28/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/transforms/window_test.py
--
diff --git a/sdks/python/apache_beam/transforms/window_test.py b/sdks/python/apache_beam/transforms/window_test.py
new file mode 100644
index 000..155239f
--- /dev/null
+++ b/sdks/python/apache_beam/transforms/window_test.py
@@ -0,0 +1,201 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for the windowing classes."""
+
+import unittest
+
+from google.cloud.dataflow.pipeline import Pipeline
+from google.cloud.dataflow.transforms import CombinePerKey
+from google.cloud.dataflow.transforms import combiners
+from google.cloud.dataflow.transforms import core
+from google.cloud.dataflow.transforms import Create
+from google.cloud.dataflow.transforms import GroupByKey
+from google.cloud.dataflow.transforms import Map
+from google.cloud.dataflow.transforms import window
+from google.cloud.dataflow.transforms import WindowInto
+from google.cloud.dataflow.transforms.util import assert_that, equal_to
+from google.cloud.dataflow.transforms.window import FixedWindows
+from google.cloud.dataflow.transforms.window import IntervalWindow
+from google.cloud.dataflow.transforms.window import Sessions
+from google.cloud.dataflow.transforms.window import SlidingWindows
+from google.cloud.dataflow.transforms.window import TimestampedValue
+from google.cloud.dataflow.transforms.window import WindowedValue
+from google.cloud.dataflow.transforms.window import WindowFn
+
+
+def context(element, timestamp, windows):
+  return WindowFn.AssignContext(timestamp, element, windows)
+
+
+sort_values = Map(lambda (k, vs): (k, sorted(vs)))
+
+
+class ReifyWindowsFn(core.DoFn):
+  def process(self, context):
+key, values = context.element
+for window in context.windows:
+  yield "%s @ %s" % (key, window), values
+reify_windows = core.ParDo(ReifyWindowsFn())
+
+class WindowTest(unittest.TestCase):
+
+  def test_fixed_windows(self):
+# Test windows with offset: 2, 7, 12, 17, ...
+windowfn = window.FixedWindows(size=5, offset=2)
+self.assertEqual([window.IntervalWindow(7, 12)],
+ windowfn.assign(context('v', 7, [])))
+self.assertEqual([window.IntervalWindow(7, 12)],
+ windowfn.assign(context('v', 11, [])))
+self.assertEqual([window.IntervalWindow(12, 17)],
+ windowfn.assign(context('v', 12, [])))
+
+# Test windows without offset: 0, 5, 10, 15, ...
+windowfn = window.FixedWindows(size=5)
+self.assertEqual([window.IntervalWindow(5, 10)],
+ windowfn.assign(context('v', 5, [])))
+self.assertEqual([window.IntervalWindow(5, 10)],
+ windowfn.assign(context('v', 9, [])))
+self.assertEqual([window.IntervalWindow(10, 15)],
+ windowfn.assign(context('v', 10, [])))
+
+# Test windows with offset out of range.
+windowfn = window.FixedWindows(size=5, offset=12)
+self.assertEqual([window.IntervalWindow(7, 12)],
+ windowfn.assign(context('v', 11, [])))
+
+  def test_sliding_windows_assignment(self):
+windowfn = SlidingWindows(size=15, period=5, offset=2)
+expected = [IntervalWindow(7, 22),
+IntervalWindow(2, 17),
+IntervalWindow(-3, 12)]
+self.assertEqual(expected, windowfn.assign(context('v', 7, [])))
+self.assertEqual(expected, windowfn.assign(context('v', 8, [])))
+self.assertEqual(expected, windowfn.assign(context('v', 11, [])))
+
+  def test_sessions_merging(self):
+windowfn = Sessions(10)
+
+def merge(*timestamps):
+  windows = [windowfn.assign(context(None, t, [])) for t in timestamps]
+  running = set()
+
+  class TestMergeContext(WindowFn.MergeContext):
+
+def __init__(self):
+  super(TestMergeContext, self).__init__(running)
+
+def merge(self, to_be_merged, merge_result):
+  for w in to_be_merged:
+if w in running:
+  running.remove(w)
+  running.add(merge_result)
+
+  for ws in windows:
+running.update(ws)
+windowfn.merge(TestMergeContext())
+  windowfn.merge(TestMergeContext())
+  return sorted(running)
+
+self.assertEqual([IntervalWindow(2, 12)], merge(2))
+self.assertEqual([IntervalWindow(2, 12), IntervalWindow(19, 
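
Window assignment can also be probed directly, outside a pipeline, using the
same AssignContext helper defined near the top of this test module (a sketch;
the timestamps are arbitrary):

  from google.cloud.dataflow.transforms import window
  from google.cloud.dataflow.transforms.window import WindowFn

  def context(element, timestamp, windows):
    return WindowFn.AssignContext(timestamp, element, windows)

  fixed = window.FixedWindows(size=60)
  assert fixed.assign(context('v', 130, [])) == [window.IntervalWindow(120, 180)]

  sliding = window.SlidingWindows(size=10, period=5)
  # Mirrors test_sliding_windows_assignment: latest window first, i.e.
  # [IntervalWindow(5, 15), IntervalWindow(0, 10)].
  assigned = sliding.assign(context('v', 7, []))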

[35/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/io/iobase.py
--
diff --git a/sdks/python/apache_beam/io/iobase.py b/sdks/python/apache_beam/io/iobase.py
new file mode 100644
index 000..26ebeb5
--- /dev/null
+++ b/sdks/python/apache_beam/io/iobase.py
@@ -0,0 +1,1073 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Sources and sinks.
+
+A Source manages record-oriented data input from a particular kind of source
+(e.g. a set of files, a database table, etc.). The reader() method of a source
+returns a reader object supporting the iterator protocol; iteration yields
+raw records of unprocessed, serialized data.
+
+
+A Sink manages record-oriented data output to a particular kind of sink
+(e.g. a set of files, a database table, etc.). The writer() method of a sink
+returns a writer object supporting writing records of serialized data to
+the sink.
+"""
+
+from collections import namedtuple
+
+import logging
+import random
+import uuid
+
+from google.cloud.dataflow import pvalue
+from google.cloud.dataflow.coders import PickleCoder
+from google.cloud.dataflow.pvalue import AsIter
+from google.cloud.dataflow.pvalue import AsSingleton
+from google.cloud.dataflow.transforms import core
+from google.cloud.dataflow.transforms import ptransform
+from google.cloud.dataflow.transforms import window
+
+
+def _dict_printable_fields(dict_object, skip_fields):
+  """Returns a list of strings for the interesting fields of a dict."""
+  return ['%s=%r' % (name, value)
+  for name, value in dict_object.iteritems()
+  # want to output value 0 but not None nor []
+  if (value or value == 0)
+  and name not in skip_fields]
+
+_minor_fields = ['coder', 'key_coder', 'value_coder',
+ 'config_bytes', 'elements',
+ 'append_trailing_newlines', 'strip_trailing_newlines',
+ 'compression_type']
+
+
+class NativeSource(object):
+  """A source implemented by Dataflow service.
+
+  This class is only to be inherited by sources natively implemented by the
+  Cloud Dataflow service, and hence should not be subclassed by users.
+
+  This class is deprecated and should not be used to define new sources.
+  """
+
+  def reader(self):
+"""Returns a NativeSourceReader instance associated with this source."""
+raise NotImplementedError
+
+  def __repr__(self):
+return '<{name} {vals}>'.format(
+name=self.__class__.__name__,
+vals=', '.join(_dict_printable_fields(self.__dict__,
+  _minor_fields)))
+
+
+class NativeSourceReader(object):
+  """A reader for a source implemented by Dataflow service."""
+
+  def __enter__(self):
+"""Opens everything necessary for a reader to function properly."""
+raise NotImplementedError
+
+  def __exit__(self, exception_type, exception_value, traceback):
+"""Cleans up after a reader executed."""
+raise NotImplementedError
+
+  def __iter__(self):
+"""Returns an iterator over all the records of the source."""
+raise NotImplementedError
+
+  @property
+  def returns_windowed_values(self):
+"""Returns whether this reader returns windowed values."""
+return False
+
+  def get_progress(self):
+"""Returns a representation of how far the reader has read.
+
+Returns:
+  A SourceReaderProgress object that gives the current progress of the
+  reader.
+"""
+return
+
+  def request_dynamic_split(self, dynamic_split_request):
+"""Attempts to split the input in two parts.
+
+The two parts are named the "primary" part and the "residual" part. The
+current 'NativeSourceReader' keeps processing the primary part, while the
+residual part will be processed elsewhere (e.g. perhaps on a different
+worker).
+
+The primary and residual parts, if concatenated, must represent the
+same input as the current input of this 'NativeSourceReader' before this
+call.
+
+The boundary between the primary part and the residual part is
+specified in a framework-specific way using 'DynamicSplitRequest' e.g.,
+if the framework supports the notion of positions, it might be a
+position at which the input is asked to split itself (which is not
+necessarily the same position at which it *will* split itself); it
+might be an 
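
The reader protocol described above (a context manager that is also an
iterator) is consumed roughly as follows; 'source' stands for any concrete
NativeSource and is assumed to exist:

  def read_all_records(source):
    """Drains a NativeSource via its context-managed, iterable reader."""
    records = []
    with source.reader() as reader:   # __enter__ opens the underlying resource
      for record in reader:           # __iter__ yields raw, serialized records
        records.append(record)
    return records                    # __exit__ has already cleaned up here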

[44/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/clients/bigquery/bigquery_v2_messages.py
--
diff --git a/sdks/python/apache_beam/internal/clients/bigquery/bigquery_v2_messages.py b/sdks/python/apache_beam/internal/clients/bigquery/bigquery_v2_messages.py
new file mode 100644
index 000..36e16c0
--- /dev/null
+++ b/sdks/python/apache_beam/internal/clients/bigquery/bigquery_v2_messages.py
@@ -0,0 +1,1893 @@
+"""Generated message classes for bigquery version v2.
+
+A data platform for customers to create, manage, share and query data.
+"""
+# NOTE: This file is autogenerated and should not be edited by hand.
+
+from apitools.base.protorpclite import messages as _messages
+from apitools.base.py import encoding
+from apitools.base.py import extra_types
+
+
+package = 'bigquery'
+
+
+class BigqueryDatasetsDeleteRequest(_messages.Message):
+  """A BigqueryDatasetsDeleteRequest object.
+
+  Fields:
+datasetId: Dataset ID of dataset being deleted
+deleteContents: If True, delete all the tables in the dataset. If False
+  and the dataset contains tables, the request will fail. Default is False
+projectId: Project ID of the dataset being deleted
+  """
+
+  datasetId = _messages.StringField(1, required=True)
+  deleteContents = _messages.BooleanField(2)
+  projectId = _messages.StringField(3, required=True)
+
+
+class BigqueryDatasetsDeleteResponse(_messages.Message):
+  """An empty BigqueryDatasetsDelete response."""
+
+
+class BigqueryDatasetsGetRequest(_messages.Message):
+  """A BigqueryDatasetsGetRequest object.
+
+  Fields:
+datasetId: Dataset ID of the requested dataset
+projectId: Project ID of the requested dataset
+  """
+
+  datasetId = _messages.StringField(1, required=True)
+  projectId = _messages.StringField(2, required=True)
+
+
+class BigqueryDatasetsInsertRequest(_messages.Message):
+  """A BigqueryDatasetsInsertRequest object.
+
+  Fields:
+dataset: A Dataset resource to be passed as the request body.
+projectId: Project ID of the new dataset
+  """
+
+  dataset = _messages.MessageField('Dataset', 1)
+  projectId = _messages.StringField(2, required=True)
+
+
+class BigqueryDatasetsListRequest(_messages.Message):
+  """A BigqueryDatasetsListRequest object.
+
+  Fields:
+all: Whether to list all datasets, including hidden ones
+maxResults: The maximum number of results to return
+pageToken: Page token, returned by a previous call, to request the next
+  page of results
+projectId: Project ID of the datasets to be listed
+  """
+
+  all = _messages.BooleanField(1)
+  maxResults = _messages.IntegerField(2, variant=_messages.Variant.UINT32)
+  pageToken = _messages.StringField(3)
+  projectId = _messages.StringField(4, required=True)
+
+
+class BigqueryDatasetsPatchRequest(_messages.Message):
+  """A BigqueryDatasetsPatchRequest object.
+
+  Fields:
+dataset: A Dataset resource to be passed as the request body.
+datasetId: Dataset ID of the dataset being updated
+projectId: Project ID of the dataset being updated
+  """
+
+  dataset = _messages.MessageField('Dataset', 1)
+  datasetId = _messages.StringField(2, required=True)
+  projectId = _messages.StringField(3, required=True)
+
+
+class BigqueryDatasetsUpdateRequest(_messages.Message):
+  """A BigqueryDatasetsUpdateRequest object.
+
+  Fields:
+dataset: A Dataset resource to be passed as the request body.
+datasetId: Dataset ID of the dataset being updated
+projectId: Project ID of the dataset being updated
+  """
+
+  dataset = _messages.MessageField('Dataset', 1)
+  datasetId = _messages.StringField(2, required=True)
+  projectId = _messages.StringField(3, required=True)
+
+
+class BigqueryJobsCancelRequest(_messages.Message):
+  """A BigqueryJobsCancelRequest object.
+
+  Fields:
+jobId: [Required] Job ID of the job to cancel
+projectId: [Required] Project ID of the job to cancel
+  """
+
+  jobId = _messages.StringField(1, required=True)
+  projectId = _messages.StringField(2, required=True)
+
+
+class BigqueryJobsGetQueryResultsRequest(_messages.Message):
+  """A BigqueryJobsGetQueryResultsRequest object.
+
+  Fields:
+jobId: [Required] Job ID of the query job
+maxResults: Maximum number of results to read
+pageToken: Page token, returned by a previous call, to request the next
+  page of results
+projectId: [Required] Project ID of the query job
+startIndex: Zero-based index of the starting row
+timeoutMs: How long to wait for the query to complete, in milliseconds,
+  before returning. Default is 10 seconds. If the timeout passes before
+  the job completes, the 'jobComplete' field in the response will be false
+  """
+
+  jobId = _messages.StringField(1, required=True)
+  maxResults = _messages.IntegerField(2, variant=_messages.Variant.UINT32)
+  pageToken = _messages.StringField(3)
+  projectId = 

[16/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_messages.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_messages.py b/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_messages.py
deleted file mode 100644
index a565acf..000
--- a/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_messages.py
+++ /dev/null
@@ -1,1903 +0,0 @@
-"""Generated message classes for storage version v1.
-
-Stores and retrieves potentially large, immutable data objects.
-"""
-# NOTE: This file is autogenerated and should not be edited by hand.
-
-from apitools.base.protorpclite import message_types as _message_types
-from apitools.base.protorpclite import messages as _messages
-from apitools.base.py import encoding
-from apitools.base.py import extra_types
-
-
-package = 'storage'
-
-
-class Bucket(_messages.Message):
-  """A bucket.
-
-  Messages:
-CorsValueListEntry: A CorsValueListEntry object.
-LifecycleValue: The bucket's lifecycle configuration. See lifecycle
-  management for more information.
-LoggingValue: The bucket's logging configuration, which defines the
-  destination bucket and optional name prefix for the current bucket's
-  logs.
-OwnerValue: The owner of the bucket. This is always the project team's
-  owner group.
-VersioningValue: The bucket's versioning configuration.
-WebsiteValue: The bucket's website configuration.
-
-  Fields:
-acl: Access controls on the bucket.
-cors: The bucket's Cross-Origin Resource Sharing (CORS) configuration.
-defaultObjectAcl: Default access controls to apply to new objects when no
-  ACL is provided.
-etag: HTTP 1.1 Entity tag for the bucket.
-id: The ID of the bucket.
-kind: The kind of item this is. For buckets, this is always
-  storage#bucket.
-lifecycle: The bucket's lifecycle configuration. See lifecycle management
-  for more information.
-location: The location of the bucket. Object data for objects in the
-  bucket resides in physical storage within this region. Defaults to US.
-  See the developer's guide for the authoritative list.
-logging: The bucket's logging configuration, which defines the destination
-  bucket and optional name prefix for the current bucket's logs.
-metageneration: The metadata generation of this bucket.
-name: The name of the bucket.
-owner: The owner of the bucket. This is always the project team's owner
-  group.
-projectNumber: The project number of the project the bucket belongs to.
-selfLink: The URI of this bucket.
-storageClass: The bucket's storage class. This defines how objects in the
-  bucket are stored and determines the SLA and the cost of storage. Values
-  include STANDARD, NEARLINE and DURABLE_REDUCED_AVAILABILITY. Defaults to
-  STANDARD. For more information, see storage classes.
-timeCreated: The creation time of the bucket in RFC 3339 format.
-updated: The modification time of the bucket in RFC 3339 format.
-versioning: The bucket's versioning configuration.
-website: The bucket's website configuration.
-  """
-
-  class CorsValueListEntry(_messages.Message):
-"""A CorsValueListEntry object.
-
-Fields:
-  maxAgeSeconds: The value, in seconds, to return in the  Access-Control-
-Max-Age header used in preflight responses.
-  method: The list of HTTP methods on which to include CORS response
-headers, (GET, OPTIONS, POST, etc) Note: "*" is permitted in the list
-of methods, and means "any method".
-  origin: The list of Origins eligible to receive CORS response headers.
-Note: "*" is permitted in the list of origins, and means "any Origin".
-  responseHeader: The list of HTTP headers other than the simple response
-headers to give permission for the user-agent to share across domains.
-"""
-
-maxAgeSeconds = _messages.IntegerField(1, variant=_messages.Variant.INT32)
-method = _messages.StringField(2, repeated=True)
-origin = _messages.StringField(3, repeated=True)
-responseHeader = _messages.StringField(4, repeated=True)
-
-  class LifecycleValue(_messages.Message):
-"""The bucket's lifecycle configuration. See lifecycle management for more
-information.
-
-Messages:
-  RuleValueListEntry: A RuleValueListEntry object.
-
-Fields:
-  rule: A lifecycle management rule, which is made of an action to take
-and the condition(s) under which the action will be taken.
-"""
-
-class RuleValueListEntry(_messages.Message):
-  """A RuleValueListEntry object.
-
-  Messages:
-ActionValue: The action to take.
-ConditionValue: The condition(s) under which the action will be taken.
-
-  Fields:
-   

[41/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/clients/storage/__init__.py
--
diff --git a/sdks/python/apache_beam/internal/clients/storage/__init__.py b/sdks/python/apache_beam/internal/clients/storage/__init__.py
new file mode 100644
index 000..15b1524
--- /dev/null
+++ b/sdks/python/apache_beam/internal/clients/storage/__init__.py
@@ -0,0 +1,10 @@
+"""Common imports for generated storage client library."""
+# pylint:disable=wildcard-import
+
+import pkgutil
+
+from apitools.base.py import *
+from google.cloud.dataflow.internal.clients.storage.storage_v1_client import *
+from google.cloud.dataflow.internal.clients.storage.storage_v1_messages import *
+
+__path__ = pkgutil.extend_path(__path__, __name__)

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/clients/storage/storage_v1_client.py
--
diff --git a/sdks/python/apache_beam/internal/clients/storage/storage_v1_client.py b/sdks/python/apache_beam/internal/clients/storage/storage_v1_client.py
new file mode 100644
index 000..c8255c2
--- /dev/null
+++ b/sdks/python/apache_beam/internal/clients/storage/storage_v1_client.py
@@ -0,0 +1,1021 @@
+"""Generated client library for storage version v1."""
+# NOTE: This file is autogenerated and should not be edited by hand.
+from apitools.base.py import base_api
+from google.cloud.dataflow.internal.clients.storage import storage_v1_messages as messages
+
+
+class StorageV1(base_api.BaseApiClient):
+  """Generated client library for service storage version v1."""
+
+  MESSAGES_MODULE = messages
+
+  _PACKAGE = u'storage'
+  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/cloud-platform.read-only', u'https://www.googleapis.com/auth/devstorage.full_control', u'https://www.googleapis.com/auth/devstorage.read_only', u'https://www.googleapis.com/auth/devstorage.read_write']
+  _VERSION = u'v1'
+  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
+  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _CLIENT_CLASS_NAME = u'StorageV1'
+  _URL_VERSION = u'v1'
+  _API_KEY = None
+
+  def __init__(self, url='', credentials=None,
+   get_credentials=True, http=None, model=None,
+   log_request=False, log_response=False,
+   credentials_args=None, default_global_params=None,
+   additional_http_headers=None):
+"""Create a new storage handle."""
+url = url or u'https://www.googleapis.com/storage/v1/'
+super(StorageV1, self).__init__(
+url, credentials=credentials,
+get_credentials=get_credentials, http=http, model=model,
+log_request=log_request, log_response=log_response,
+credentials_args=credentials_args,
+default_global_params=default_global_params,
+additional_http_headers=additional_http_headers)
+self.bucketAccessControls = self.BucketAccessControlsService(self)
+self.buckets = self.BucketsService(self)
+self.channels = self.ChannelsService(self)
+self.defaultObjectAccessControls = self.DefaultObjectAccessControlsService(self)
+self.objectAccessControls = self.ObjectAccessControlsService(self)
+self.objects = self.ObjectsService(self)
+
+  class BucketAccessControlsService(base_api.BaseApiService):
+"""Service class for the bucketAccessControls resource."""
+
+_NAME = u'bucketAccessControls'
+
+def __init__(self, client):
+  super(StorageV1.BucketAccessControlsService, self).__init__(client)
+  self._method_configs = {
+  'Delete': base_api.ApiMethodInfo(
+  http_method=u'DELETE',
+  method_id=u'storage.bucketAccessControls.delete',
+  ordered_params=[u'bucket', u'entity'],
+  path_params=[u'bucket', u'entity'],
+  query_params=[],
+  relative_path=u'b/{bucket}/acl/{entity}',
+  request_field='',
+  request_type_name=u'StorageBucketAccessControlsDeleteRequest',
+  response_type_name=u'StorageBucketAccessControlsDeleteResponse',
+  supports_download=False,
+  ),
+  'Get': base_api.ApiMethodInfo(
+  http_method=u'GET',
+  method_id=u'storage.bucketAccessControls.get',
+  ordered_params=[u'bucket', u'entity'],
+  path_params=[u'bucket', u'entity'],
+  query_params=[],
+  relative_path=u'b/{bucket}/acl/{entity}',
+  request_field='',
+  request_type_name=u'StorageBucketAccessControlsGetRequest',
+  response_type_name=u'BucketAccessControl',
+  supports_download=False,
+  ),
+  'Insert': base_api.ApiMethodInfo(
+  http_method=u'POST',
+ 
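
A hedged construction sketch for the generated client above; only the
constructor defaults shown in this fragment are relied upon, and no requests
are issued:

  from google.cloud.dataflow.internal.clients import storage

  client = storage.StorageV1()               # default endpoint and credentials lookup
  buckets_service = client.buckets           # BucketsService, per __init__ above
  acl_service = client.bucketAccessControls  # BucketAccessControlsService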

[48/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/coders/typecoders.py
--
diff --git a/sdks/python/apache_beam/coders/typecoders.py b/sdks/python/apache_beam/coders/typecoders.py
new file mode 100644
index 000..98cf2b5
--- /dev/null
+++ b/sdks/python/apache_beam/coders/typecoders.py
@@ -0,0 +1,154 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Type coders registration.
+
+This module contains functionality to define and use coders for custom classes.
+Let's say we have a class Xyz and we are processing a PCollection with elements
+of type Xyz. If we do not register a coder for Xyz, a default pickle-based
+fallback coder will be used. This can be undesirable for two reasons. First, we
+may want a faster coder or a more space efficient one. Second, the pickle-based
+coder is not deterministic in the sense that objects like dictionaries or sets
+are not guaranteed to be encoded in the same way every time (elements are not
+really ordered).
+
+Two (sometimes three) steps are needed to define and use a custom coder:
+  - define the coder class
+  - associate the code with the class (a.k.a. coder registration)
+  - typehint DoFns or transforms with the new class or composite types using
+the class.
+
+A coder class is defined by subclassing from CoderBase and defining the
+encode_to_bytes and decode_from_bytes methods. The framework uses duck-typing
+for coders so it is not strictly required to subclass from CoderBase as long as
+the encode/decode methods are defined.
+
+Registering a coder class is made with a register_coder() call::
+
+  from google.cloud.dataflow import coders
+  ...
+  coders.registry.register_coder(Xyz, XyzCoder)
+
+Additionally, DoFns and PTransforms may need type hints. This is not always
+necessary since there is functionality to infer the return types of DoFns by
+analyzing the code. For instance, for the function below the return type of
+'Xyz' will be inferred::
+
+  def MakeXyzs(v):
+return Xyz(v)
+
+If Xyz is inferred then its coder will be used whenever the framework needs to
+serialize data (e.g., writing to the shuffler subsystem responsible for group 
by
+key operations). If a typehint is needed it can be specified by decorating the
+DoFns or using with_input_types/with_output_types methods on PTransforms. For
+example, the above function can be decorated::
+
+  @with_output_types(Xyz)
+  def MakeXyzs(v):
+return complex_operation_returning_Xyz(v)
+
+See google.cloud.dataflow.typehints.decorators module for more details.
+"""
+
+import logging
+
+from google.cloud.dataflow.coders import coders
+from google.cloud.dataflow.typehints import typehints
+
+
+class CoderRegistry(object):
+  """A coder registry for typehint/coder associations."""
+
+  def __init__(self, fallback_coder=None):
+self._coders = {}
+self.custom_types = []
+self.register_standard_coders(fallback_coder)
+
+  def register_standard_coders(self, fallback_coder):
+"""Register coders for all basic and composite types."""
+self._register_coder_internal(int, coders.VarIntCoder)
+self._register_coder_internal(float, coders.FloatCoder)
+self._register_coder_internal(str, coders.BytesCoder)
+self._register_coder_internal(bytes, coders.BytesCoder)
+self._register_coder_internal(unicode, coders.StrUtf8Coder)
+self._register_coder_internal(typehints.TupleConstraint, coders.TupleCoder)
+self._register_coder_internal(typehints.AnyTypeConstraint,
+  coders.PickleCoder)
+self._fallback_coder = fallback_coder or coders.PickleCoder
+
+  def _register_coder_internal(self, typehint_type, typehint_coder_class):
+self._coders[typehint_type] = typehint_coder_class
+
+  def register_coder(self, typehint_type, typehint_coder_class):
+if not isinstance(typehint_coder_class, type):
+  raise TypeError('Coder registration requires a coder class object. '
+  'Received %r instead.' % typehint_coder_class)
+if typehint_type not in self.custom_types:
+  self.custom_types.append(typehint_type)
+self._register_coder_internal(typehint_type, typehint_coder_class)
+
+  def get_coder(self, typehint):
+coder = self._coders.get(
+typehint.__class__ if isinstance(typehint, typehints.TypeConstraint)
+else typehint, None)
+

[21/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/apiclient.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/apiclient.py b/sdks/python/google/cloud/dataflow/internal/apiclient.py
deleted file mode 100644
index 9fb060d..000
--- a/sdks/python/google/cloud/dataflow/internal/apiclient.py
+++ /dev/null
@@ -1,935 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Dataflow client utility functions."""
-
-import codecs
-import json
-import logging
-import os
-import re
-import time
-
-
-from google.cloud.dataflow import utils
-from google.cloud.dataflow import version
-from google.cloud.dataflow.internal import pickler
-from google.cloud.dataflow.internal.auth import get_service_credentials
-from google.cloud.dataflow.internal.json_value import to_json_value
-from google.cloud.dataflow.io import iobase
-from google.cloud.dataflow.transforms import cy_combiners
-from google.cloud.dataflow.utils import dependency
-from google.cloud.dataflow.utils import names
-from google.cloud.dataflow.utils import retry
-from google.cloud.dataflow.utils.names import PropertyNames
-from google.cloud.dataflow.utils.options import GoogleCloudOptions
-from google.cloud.dataflow.utils.options import StandardOptions
-from google.cloud.dataflow.utils.options import WorkerOptions
-
-from apitools.base.py import encoding
-from apitools.base.py import exceptions
-from google.cloud.dataflow.internal.clients import storage
-import google.cloud.dataflow.internal.clients.dataflow as dataflow
-
-
-BIGQUERY_API_SERVICE = 'bigquery.googleapis.com'
-COMPUTE_API_SERVICE = 'compute.googleapis.com'
-STORAGE_API_SERVICE = 'storage.googleapis.com'
-
-
-def append_counter(status_object, counter, tentative):
-  """Appends a counter to the status.
-
-  Args:
-status_object: a work_item_status to which to add this counter
-counter: a counters.Counter object to append
-tentative: whether the value should be reported as tentative
-  """
-  logging.debug('Appending counter%s %s',
-' (tentative)' if tentative else '',
-counter)
-  kind, setter = metric_translations[counter.combine_fn.__class__]
-  append_metric(
-  status_object, counter.name, kind, counter.accumulator,
-  setter, tentative=tentative)
-
-
-def append_metric(status_object, metric_name, kind, value, setter=None,
-  step=None, output_user_name=None, tentative=False,
-  worker_id=None, cumulative=True):
-  """Creates and adds a MetricUpdate field to the passed-in protobuf.
-
-  Args:
-status_object: a work_item_status to which to add this metric
-metric_name: a string naming this metric
-kind: dataflow counter kind (e.g. 'sum')
-value: accumulator value to encode
-setter: if not None, a lambda to use to update metric_update with value
-step: the name of the associated step
-output_user_name: the user-visible name to use
-tentative: whether this should be labeled as a tentative metric
-worker_id: the id of this worker.  Specifying a worker_id also
-  causes this to be encoded as a metric, not a counter.
-cumulative: Whether this metric is cumulative, default True.
-  Set to False for a delta value.
-  """
-  # Does this look like a counter or like a metric?
-  is_counter = not worker_id
-
-  metric_update = dataflow.MetricUpdate()
-  metric_update.name = dataflow.MetricStructuredName()
-  metric_update.name.name = metric_name
-  # Handle attributes stored in the name context
-  if step or output_user_name or tentative or worker_id:
-metric_update.name.context = dataflow.MetricStructuredName.ContextValue()
-
-def append_to_context(key, value):
-  metric_update.name.context.additionalProperties.append(
-  dataflow.MetricStructuredName.ContextValue.AdditionalProperty(
-  key=key, value=value))
-if step:
-  append_to_context('step', step)
-if output_user_name:
-  append_to_context('output_user_name', output_user_name)
-if tentative:
-  append_to_context('tentative', 'true')
-if worker_id:
-  append_to_context('workerId', worker_id)
-  if cumulative and is_counter:
-metric_update.cumulative = cumulative
-  if is_counter:
-# Counters are distinguished by having a kind; metrics do not.
-

[31/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/transforms/ptransform.py
--
diff --git a/sdks/python/apache_beam/transforms/ptransform.py b/sdks/python/apache_beam/transforms/ptransform.py
new file mode 100644
index 000..09f8015
--- /dev/null
+++ b/sdks/python/apache_beam/transforms/ptransform.py
@@ -0,0 +1,703 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""PTransform and descendants.
+
+A PTransform is an object describing (not executing) a computation. The actual
+execution semantics for a transform is captured by a runner object. A transform
+object always belongs to a pipeline object.
+
+A PTransform derived class needs to define the apply() method that describes
+how one or more PValues are created by the transform.
+
+The module defines a few standard transforms: FlatMap (parallel do),
+GroupByKey (group by key), etc. Note that the apply() methods for these
+classes contain code that will add nodes to the processing graph associated
+with a pipeline.
+
+As support for the FlatMap transform, the module also defines a DoFn
+class and wrapper class that allows lambda functions to be used as
+FlatMap processing functions.
+"""
+
+from __future__ import absolute_import
+
+import copy
+import inspect
+import operator
+import os
+import sys
+
+from google.cloud.dataflow import coders
+from google.cloud.dataflow import error
+from google.cloud.dataflow import pvalue
+from google.cloud.dataflow import typehints
+from google.cloud.dataflow.internal import pickler
+from google.cloud.dataflow.internal import util
+from google.cloud.dataflow.typehints import getcallargs_forhints
+from google.cloud.dataflow.typehints import TypeCheckError
+from google.cloud.dataflow.typehints import validate_composite_type_param
+from google.cloud.dataflow.typehints import WithTypeHints
+from google.cloud.dataflow.typehints.trivial_inference import instance_to_type
+
+
+class _PValueishTransform(object):
+  """Visitor for PValueish objects.
+
+  A PValueish is a PValue, or a list, tuple, or dict of PValueish objects.
+
+  This visits a PValueish, constructing a (possibly mutated) copy.
+  """
+  def visit(self, node, *args):
+return getattr(
+self,
+'visit_' + node.__class__.__name__,
+lambda x, *args: x)(node, *args)
+
+  def visit_list(self, node, *args):
+return [self.visit(x, *args) for x in node]
+
+  def visit_tuple(self, node, *args):
+return tuple(self.visit(x, *args) for x in node)
+
+  def visit_dict(self, node, *args):
+return {key: self.visit(value, *args) for (key, value) in node.items()}
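To illustrate the duck-typed dispatch above: a hypothetical visitor only needs to intercept the node kinds it cares about; anything without a matching visit_* method falls through to the identity default::

  class _CollectLeaves(_PValueishTransform):
    # Illustrative only: gathers every non-container leaf into 'leaves'.

    def visit(self, node, leaves):
      if isinstance(node, (list, tuple, dict)):
        super(_CollectLeaves, self).visit(node, leaves)
      else:
        leaves.append(node)
      return node

  leaves = []
  _CollectLeaves().visit([1, (2, 3), {'a': 4}], leaves)
  # leaves is now [1, 2, 3, 4]; dict keys are not visited, only values.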
+
+
+class _SetInputPValues(_PValueishTransform):
+  def visit(self, node, replacements):
+if id(node) in replacements:
+  return replacements[id(node)]
+else:
+  return super(_SetInputPValues, self).visit(node, replacements)
+
+
+class _MaterializedDoOutputsTuple(pvalue.DoOutputsTuple):
+  def __init__(self, deferred, pvalue_cache):
+super(_MaterializedDoOutputsTuple, self).__init__(
+None, None, deferred._tags, deferred._main_tag)
+self._deferred = deferred
+self._pvalue_cache = pvalue_cache
+
+  def __getitem__(self, tag):
+return self._pvalue_cache.get_unwindowed_pvalue(self._deferred[tag])
+
+
+class _MaterializePValues(_PValueishTransform):
+  def __init__(self, pvalue_cache):
+self._pvalue_cache = pvalue_cache
+
+  def visit(self, node):
+if isinstance(node, pvalue.PValue):
+  return self._pvalue_cache.get_unwindowed_pvalue(node)
+elif isinstance(node, pvalue.DoOutputsTuple):
+  return _MaterializedDoOutputsTuple(node, self._pvalue_cache)
+else:
+  return super(_MaterializePValues, self).visit(node)
+
+
+class GetPValues(_PValueishTransform):
+  def visit(self, node, pvalues=None):
+if pvalues is None:
+  pvalues = []
+  self.visit(node, pvalues)
+  return pvalues
+elif isinstance(node, (pvalue.PValue, pvalue.DoOutputsTuple)):
+  pvalues.append(node)
+else:
+  super(GetPValues, self).visit(node, pvalues)
+
+
+class ZipPValues(_PValueishTransform):
+  """Pairs each PValue in a pvalueish with a value in a parallel out sibling.
+
+  Sibling should have the same nested structure as pvalueish.  Leaves in
+  sibling are expanded across nested pvalueish lists, tuples, and dicts.
+  For example
+
+  

[27/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/typehints/typehints.py
--
diff --git a/sdks/python/apache_beam/typehints/typehints.py b/sdks/python/apache_beam/typehints/typehints.py
new file mode 100644
index 000..f1b3f53
--- /dev/null
+++ b/sdks/python/apache_beam/typehints/typehints.py
@@ -0,0 +1,1054 @@
+# Copyright 2016 Google Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Syntax and semantics for type-hinting custom-functions/PTransforms in the 
SDK.
+
+This module defines type-hinting objects and the corresponding syntax for
+type-hinting function arguments, function return types, or PTransform objects
+themselves. TypeHints defined in the module can be used to implement either
+static or run-time type-checking in regular Python code.
+
+Type-hints are defined by 'indexing' a type-parameter into a defined
+CompositeTypeHint instance:
+
+  * 'List[int]'.
+
+Valid type-hints are partitioned into two categories: simple, and composite.
+
+Simple type hints are type hints based on a subset of Python primitive types:
+int, bool, float, str, object, None, and bytes. No other primitive types are
+allowed.
+
+Composite type-hints are reserved for hinting the types of container-like
+Python objects such as 'list'. Composite type-hints can be parameterized by an
+inner simple or composite type-hint, using the 'indexing' syntax. In order to
+avoid conflicting with the namespace of the built-in container types, when
+specifying this category of type-hints, the first letter should be capitalized.
+The following composite type-hints are permitted. NOTE: 'T' can be any of the
+type-hints listed or a simple Python type:
+
+  * Any
+  * Union[T, T, T]
+  * Optional[T]
+  * Tuple[T, T]
+  * Tuple[T, ...]
+  * List[T]
+  * KV[T, T]
+  * Dict[T, T]
+  * Set[T]
+  * Iterable[T]
+  * Iterator[T]
+  * Generator[T]
+
+Type-hints can be nested, allowing one to define type-hints for complex types:
+
+  * 'List[Tuple[int, int, str]]'
+
+In addition, type-hints can be used to implement run-time type-checking via the
+'type_check' method on each TypeConstraint.
+
+"""
+
+import collections
+import copy
+import types
+
+
+# A set of the built-in Python types we don't support, guiding the users
+# to templated (upper-case) versions instead.
+DISALLOWED_PRIMITIVE_TYPES = (list, set, tuple, dict)
+
+
+class SimpleTypeHintError(TypeError):
+  pass
+
+
+class CompositeTypeHintError(TypeError):
+  pass
+
+
+class GetitemConstructor(type):
+  """A metaclass that makes Cls[arg] an alias for Cls(arg)."""
+  def __getitem__(cls, arg):
+return cls(arg)
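A minimal illustration of the metaclass (the Box class is hypothetical; Python 2 metaclass syntax, matching the rest of this module)::

  class Box(object):
    __metaclass__ = GetitemConstructor

    def __init__(self, arg):
      self.arg = arg

  assert Box[int].arg is Box(int).arg   # Box[int] is sugar for Box(int)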
+
+
+class TypeConstraint(object):
+
+  """The base-class for all created type-constraints defined below.
+
+  A TypeConstraint is the result of parameterizing a CompositeTypeHint with
+  with one of the allowed Python types or another CompositeTypeHint. It
+  binds and enforces a specific version of a generalized TypeHint.
+  """
+
+  def _consistent_with_check_(self, sub):
+"""Returns whether sub is consistent with self.
+
+Has the same relationship to is_consistent_with() as
+__subclasscheck__ does for issubclass().
+
+Not meant to be called directly; call is_consistent_with(sub, self)
+instead.
+
+Implementation may assume that maybe_sub_type is not Any
+and has been normalized.
+"""
+raise NotImplementedError
+
+  def type_check(self, instance):
+"""Determines if the type of 'instance' satisfies this type constraint.
+
+Args:
+  instance: An instance of a Python object.
+
+Raises:
+  TypeError: The passed 'instance' doesn't satisfy this TypeConstraint.
+Subclasses of TypeConstraint are free to raise any of the subclasses of
+TypeError defined above, depending on the manner of the type hint error.
+
+All TypeConstraint sub-classes must define this method in order for the
+class object to be created.
+"""
+raise NotImplementedError
+
+  def match_type_variables(self, unused_concrete_type):
+return {}
+
+  def bind_type_variables(self, unused_bindings):
+return self
+
+  def _inner_types(self):
+"""Iterates over the inner types of the composite type."""
+return []
+
+  def visit(self, visitor, visitor_arg):
+"""Visitor method to visit all inner types of a composite type.
+
+Args:
+  visitor: A callable invoked for all nodes in 

[09/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/runners/dataflow_runner.py
--
diff --git a/sdks/python/google/cloud/dataflow/runners/dataflow_runner.py b/sdks/python/google/cloud/dataflow/runners/dataflow_runner.py
deleted file mode 100644
index 1c0c589..000
--- a/sdks/python/google/cloud/dataflow/runners/dataflow_runner.py
+++ /dev/null
@@ -1,639 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""A runner implementation that submits a job for remote execution.
-
-The runner will create a JSON description of the job graph and then submit it
-to the Dataflow Service for remote execution by a worker.
-"""
-
-import base64
-import logging
-import threading
-import time
-
-
-from google.cloud.dataflow import coders
-from google.cloud.dataflow import pvalue
-from google.cloud.dataflow.internal import pickler
-from google.cloud.dataflow.io import iobase
-from google.cloud.dataflow.pvalue import PCollectionView
-from google.cloud.dataflow.runners.runner import PipelineResult
-from google.cloud.dataflow.runners.runner import PipelineRunner
-from google.cloud.dataflow.runners.runner import PipelineState
-from google.cloud.dataflow.runners.runner import PValueCache
-from google.cloud.dataflow.typehints import typehints
-from google.cloud.dataflow.utils import names
-from google.cloud.dataflow.utils.names import PropertyNames
-from google.cloud.dataflow.utils.names import TransformNames
-from google.cloud.dataflow.utils.options import StandardOptions
-from google.cloud.dataflow.internal.clients import dataflow as dataflow_api
-
-
-class DataflowPipelineRunner(PipelineRunner):
-  """A runner that creates job graphs and submits them for remote execution.
-
-  Every execution of the run() method will submit an independent job for
-  remote execution that consists of the nodes reachable from the passed in
-  node argument or entire graph if node is None. The run() method returns
-  after the service has created the job and will not wait for the job to finish
-  if blocking is set to False.
-  """
-
-  # Environment version information. It is passed to the service during a
-  # a job submission and is used by the service to establish what features
-  # are expected by the workers.
-  BATCH_ENVIRONMENT_MAJOR_VERSION = '4'
-  STREAMING_ENVIRONMENT_MAJOR_VERSION = '0'
-
-  def __init__(self, cache=None, blocking=False):
-# Cache of CloudWorkflowStep protos generated while the runner
-# "executes" a pipeline.
-self._cache = cache if cache is not None else PValueCache()
-self.blocking = blocking
-self.result = None
-self._unique_step_id = 0
-
-  def _get_unique_step_name(self):
-self._unique_step_id += 1
-return 's%s' % self._unique_step_id
-
-  @staticmethod
-  def poll_for_job_completion(runner, job_id):
-"""Polls for the specified job to finish running (successfully or not)."""
-last_message_time = None
-last_message_id = None
-
-last_error_rank = float('-inf')
-last_error_msg = None
-last_job_state = None
-# How long to wait after pipeline failure for the error
-# message to show up giving the reason for the failure.
-# It typically takes about 30 seconds.
-final_countdown_timer_secs = 50.0
-sleep_secs = 5.0
-# Try to prioritize the user-level traceback, if any.
-def rank_error(msg):
-  if 'work item was attempted' in msg:
-return -1
-  elif 'Traceback' in msg:
-return 1
-  else:
-return 0
-
-while True:
-  response = runner.dataflow_client.get_job(job_id)
-  # If get() is called very soon after Create() the response may not contain
-  # an initialized 'currentState' field.
-  if response.currentState is not None:
-if response.currentState != last_job_state:
-  logging.info('Job %s is in state %s', job_id, response.currentState)
-  last_job_state = response.currentState
-if str(response.currentState) != 'JOB_STATE_RUNNING':
-  # Stop checking for new messages on timeout, explanatory
-  # message received, success, or a terminal job state caused
-  # by the user that therefore doesn't require explanation.
-  if (final_countdown_timer_secs <= 0.0
-  or last_error_msg is not None
-  or str(response.currentState) == 

[18/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py b/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
deleted file mode 100644
index 2e0273f..000
--- a/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
+++ /dev/null
@@ -1,3056 +0,0 @@
-"""Generated message classes for dataflow version v1b3.
-
-Develops and executes data processing patterns like ETL, batch computation,
-and continuous computation.
-"""
-# NOTE: This file is autogenerated and should not be edited by hand.
-
-from apitools.base.protorpclite import messages as _messages
-from apitools.base.py import encoding
-
-
-package = 'dataflow'
-
-
-class ApproximateProgress(_messages.Message):
-  """Obsolete in favor of ApproximateReportedProgress and
-  ApproximateSplitRequest.
-
-  Fields:
-percentComplete: Obsolete.
-position: Obsolete.
-remainingTime: Obsolete.
-  """
-
-  percentComplete = _messages.FloatField(1, variant=_messages.Variant.FLOAT)
-  position = _messages.MessageField('Position', 2)
-  remainingTime = _messages.StringField(3)
-
-
-class ApproximateReportedProgress(_messages.Message):
-  """A progress measurement of a WorkItem by a worker.
-
-  Fields:
-consumedParallelism: Total amount of parallelism in the portion of input
-  of this work item that has already been consumed. In the first two
-  examples above (see remaining_parallelism), the value should be 30 or 3
-  respectively. The sum of remaining_parallelism and consumed_parallelism
-  should equal the total amount of parallelism in this work item. If
-  specified, must be finite.
-fractionConsumed: Completion as fraction of the input consumed, from 0.0
-  (beginning, nothing consumed), to 1.0 (end of the input, entire input
-  consumed).
-position: A Position within the work to represent a progress.
-remainingParallelism: Total amount of parallelism in the input of this
-  WorkItem that has not been consumed yet (i.e. can be delegated to a new
-  WorkItem via dynamic splitting). "Amount of parallelism" refers to how
-  many non-empty parts of the input can be read in parallel. This does not
-  necessarily equal number of records. An input that can be read in
-  parallel down to the individual records is called "perfectly
-  splittable". An example of non-perfectly parallelizable input is a
-  block-compressed file format where a block of records has to be read as
-  a whole, but different blocks can be read in parallel. Examples: * If we
-  have read 30 records out of 50 in a perfectly splittable 50-record
-  input, this value should be 20. * If we are reading through block 3 in a
-  block-compressed file consisting of 5 blocks, this value should be 2
-  (since blocks 4 and 5 can be processed in parallel by new work items via
-  dynamic splitting). * If we are reading through the last block in a
-  block-compressed file, or reading or processing the last record in a
-  perfectly splittable input, this value should be 0, because the
-  remainder of the work item cannot be further split.
-  """
-
-  consumedParallelism = _messages.MessageField('ReportedParallelism', 1)
-  fractionConsumed = _messages.FloatField(2)
-  position = _messages.MessageField('Position', 3)
-  remainingParallelism = _messages.MessageField('ReportedParallelism', 4)
-
-
-class ApproximateSplitRequest(_messages.Message):
-  """A suggestion by the service to the worker to dynamically split the
-  WorkItem.
-
-  Fields:
-fractionConsumed: A fraction at which to split the work item, from 0.0
-  (beginning of the input) to 1.0 (end of the input).
-position: A Position at which to split the work item.
-  """
-
-  fractionConsumed = _messages.FloatField(1)
-  position = _messages.MessageField('Position', 2)
-
-
-class AutoscalingSettings(_messages.Message):
-  """Settings for WorkerPool autoscaling.
-
-  Enums:
-AlgorithmValueValuesEnum: The algorithm to use for autoscaling.
-
-  Fields:
-algorithm: The algorithm to use for autoscaling.
-maxNumWorkers: The maximum number of workers to cap scaling at.
-  """
-
-  class AlgorithmValueValuesEnum(_messages.Enum):
-"""The algorithm to use for autoscaling.
-
-Values:
-  AUTOSCALING_ALGORITHM_UNKNOWN: 
-  AUTOSCALING_ALGORITHM_NONE: 
-  AUTOSCALING_ALGORITHM_BASIC: 
-"""
-AUTOSCALING_ALGORITHM_UNKNOWN = 0
-AUTOSCALING_ALGORITHM_NONE = 1
-AUTOSCALING_ALGORITHM_BASIC = 2
-
-  algorithm = _messages.EnumField('AlgorithmValueValuesEnum', 1)
-  maxNumWorkers = _messages.IntegerField(2, variant=_messages.Variant.INT32)
-
-
-class 

[13/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/windmill_service_pb2.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/windmill_service_pb2.py b/sdks/python/google/cloud/dataflow/internal/windmill_service_pb2.py
deleted file mode 100644
index e90d4f0..000
--- a/sdks/python/google/cloud/dataflow/internal/windmill_service_pb2.py
+++ /dev/null
@@ -1,161 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Generated by the protocol buffer compiler.  DO NOT EDIT!
-# source: windmill_service.proto
-
-import sys
-_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
-from google.protobuf import descriptor as _descriptor
-from google.protobuf import message as _message
-from google.protobuf import reflection as _reflection
-from google.protobuf import symbol_database as _symbol_database
-from google.protobuf import descriptor_pb2
-# @@protoc_insertion_point(imports)
-
-_sym_db = _symbol_database.Default()
-
-
-import windmill_pb2 as windmill__pb2
-
-
-DESCRIPTOR = _descriptor.FileDescriptor(
-  name='windmill_service.proto',
-  package='google.dataflow.windmillservice.v1alpha1',
-  syntax='proto2',
-  
serialized_pb=_b('\n\x16windmill_service.proto\x12(google.dataflow.windmillservice.v1alpha1\x1a\x0ewindmill.proto2\xf9\x02\n\x1c\x43loudWindmillServiceV1Alpha1\x12>\n\x07GetWork\x12\x18.windmill.GetWorkRequest\x1a\x19.windmill.GetWorkResponse\x12>\n\x07GetData\x12\x18.windmill.GetDataRequest\x1a\x19.windmill.GetDataResponse\x12G\n\nCommitWork\x12\x1b.windmill.CommitWorkRequest\x1a\x1c.windmill.CommitWorkResponse\x12\x44\n\tGetConfig\x12\x1a.windmill.GetConfigRequest\x1a\x1b.windmill.GetConfigResponse\x12J\n\x0bReportStats\x12\x1c.windmill.ReportStatsRequest\x1a\x1d.windmill.ReportStatsResponseB7\n5com.google.cloud.dataflow.sdk.runners.worker.windmill')
-  ,
-  dependencies=[windmill__pb2.DESCRIPTOR,])
-_sym_db.RegisterFileDescriptor(DESCRIPTOR)
-
-
-
-
-
-DESCRIPTOR.has_options = True
-DESCRIPTOR._options = _descriptor._ParseOptions(descriptor_pb2.FileOptions(), _b('\n5com.google.cloud.dataflow.sdk.runners.worker.windmill'))
-from grpc.beta import implementations as beta_implementations
-from grpc.beta import interfaces as beta_interfaces
-from grpc.framework.common import cardinality
-from grpc.framework.interfaces.face import utilities as face_utilities
-
-
-class BetaCloudWindmillServiceV1Alpha1Servicer(object):
-  """The Cloud Windmill Service API used by GCE to acquire and process 
streaming
-  Dataflow work.
-  """
-  def GetWork(self, request, context):
-"""Gets streaming Dataflow work.
-"""
-context.code(beta_interfaces.StatusCode.UNIMPLEMENTED)
-  def GetData(self, request, context):
-"""Gets data from Windmill.
-"""
-context.code(beta_interfaces.StatusCode.UNIMPLEMENTED)
-  def CommitWork(self, request, context):
-"""Commits previously acquired work.
-"""
-context.code(beta_interfaces.StatusCode.UNIMPLEMENTED)
-  def GetConfig(self, request, context):
-"""Gets dependant configuration from windmill.
-"""
-context.code(beta_interfaces.StatusCode.UNIMPLEMENTED)
-  def ReportStats(self, request, context):
-"""Reports stats to Windmill.
-"""
-context.code(beta_interfaces.StatusCode.UNIMPLEMENTED)
-
-
-class BetaCloudWindmillServiceV1Alpha1Stub(object):
-  """The Cloud Windmill Service API used by GCE to acquire and process 
streaming
-  Dataflow work.
-  """
-  def GetWork(self, request, timeout, metadata=None, with_call=False, protocol_options=None):
-"""Gets streaming Dataflow work.
-"""
-raise NotImplementedError()
-  GetWork.future = None
-  def GetData(self, request, timeout, metadata=None, with_call=False, protocol_options=None):
-"""Gets data from Windmill.
-"""
-raise NotImplementedError()
-  GetData.future = None
-  def CommitWork(self, request, timeout, metadata=None, with_call=False, protocol_options=None):
-"""Commits previously acquired work.
-"""
-raise NotImplementedError()
-  CommitWork.future = None
-  def GetConfig(self, request, timeout, metadata=None, with_call=False, protocol_options=None):
-"""Gets dependant configuration from windmill.
-"""
-raise NotImplementedError()
-  GetConfig.future = None
-  def ReportStats(self, request, timeout, metadata=None, 

[22/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/examples/snippets/snippets.py
--
diff --git a/sdks/python/google/cloud/dataflow/examples/snippets/snippets.py b/sdks/python/google/cloud/dataflow/examples/snippets/snippets.py
deleted file mode 100644
index f6bb63a..000
--- a/sdks/python/google/cloud/dataflow/examples/snippets/snippets.py
+++ /dev/null
@@ -1,872 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Code snippets used in Cloud Dataflow webdocs.
-
-The examples here are written specifically to read well with the accompanying
-web docs from https://cloud.google.com/dataflow. Do not rewrite them until you
-make sure the webdocs still read well and the rewritten code supports the
-concept being described. For example, there are snippets that could be shorter
-but they are written like this to make a specific point in the docs.
-
-The code snippets are all organized as self contained functions. Parts of the
-function body delimited by [START tag] and [END tag] will be included
-automatically in the web docs. The naming convention for the tags is to have as
-prefix the PATH_TO_HTML where they are included followed by a descriptive
-string. For instance a code snippet that will be used as a code example
-at https://cloud.google.com/dataflow/model/pipelines will have the tag
-model_pipelines_DESCRIPTION. The tags can contain only letters, digits and _.
-"""
-
-import google.cloud.dataflow as df
-
-# Quiet some pylint warnings that happen because of the somewhat special
-# format for the code snippets.
-# pylint:disable=invalid-name
-# pylint:disable=expression-not-assigned
-# pylint:disable=redefined-outer-name
-# pylint:disable=unused-variable
-# pylint:disable=g-doc-args
-# pylint:disable=g-import-not-at-top
-
-
-class SnippetUtils(object):
-  from google.cloud.dataflow.pipeline import PipelineVisitor
-
-  class RenameFiles(PipelineVisitor):
-"""RenameFiles will rewire source and sink for unit testing.
-
-RenameFiles will rewire the GCS files specified in the source and
-sink in the snippet pipeline to local files so the pipeline can be run as a
-unit test. This is as close as we can get to have code snippets that are
-executed and are also ready to be presented in webdocs.
-"""
-
-def __init__(self, renames):
-  self.renames = renames
-
-def visit_transform(self, transform_node):
-  if hasattr(transform_node.transform, 'source'):
-source = transform_node.transform.source
-source.file_path = self.renames['read']
-source.is_gcs_source = False
-  elif hasattr(transform_node.transform, 'sink'):
-sink = transform_node.transform.sink
-sink.file_path = self.renames['write']
-sink.is_gcs_sink = False
-
-
-def construct_pipeline(renames):
-  """A reverse words snippet as an example for constructing a pipeline.
-
-  URL: https://cloud.google.com/dataflow/pipelines/constructing-your-pipeline
-  """
-  import re
-
-  class ReverseWords(df.PTransform):
-"""A PTransform that reverses individual elements in a PCollection."""
-
-def apply(self, pcoll):
-  return pcoll | df.Map(lambda e: e[::-1])
-
-  def filter_words(unused_x):
-"""Pass through filter to select everything."""
-return True
-
-  # [START pipelines_constructing_creating]
-  from google.cloud.dataflow.utils.options import PipelineOptions
-
-  p = df.Pipeline(options=PipelineOptions())
-  # [END pipelines_constructing_creating]
-
-  # [START pipelines_constructing_reading]
-  lines = p | df.io.Read('ReadMyFile',
-  df.io.TextFileSource('gs://some/inputData.txt'))
-  # [END pipelines_constructing_reading]
-
-  # [START pipelines_constructing_applying]
-  words = lines | df.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
-  reversed_words = words | ReverseWords()
-  # [END pipelines_constructing_applying]
-
-  # [START pipelines_constructing_writing]
-  filtered_words = reversed_words | df.Filter('FilterWords', filter_words)
-  filtered_words | df.io.Write('WriteMyFile',
-   df.io.TextFileSink('gs://some/outputData.txt'))
-  # [END pipelines_constructing_writing]
-
-  p.visit(SnippetUtils.RenameFiles(renames))
-
-  # [START pipelines_constructing_running]
-  p.run()
-  # [END pipelines_constructing_running]
-
-

[05/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/transforms/timeutil.py
--
diff --git a/sdks/python/google/cloud/dataflow/transforms/timeutil.py b/sdks/python/google/cloud/dataflow/transforms/timeutil.py
deleted file mode 100644
index 7b750f9..000
--- a/sdks/python/google/cloud/dataflow/transforms/timeutil.py
+++ /dev/null
@@ -1,310 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Time and timer utilities."""
-
-from __future__ import absolute_import
-
-from abc import ABCMeta
-from abc import abstractmethod
-
-import datetime
-import sys
-
-
-class Timestamp(object):
-  """Represents a Unix second timestamp with microsecond granularity.
-
-  Can be treated in common timestamp arithmetic operations as a numeric type.
-
-  Internally stores a time interval as an int of microseconds. This strategy
-  is necessary since floating point values lose precision when storing values,
-  especially after arithmetic operations (for example, 1000 % 0.1 evaluates
-  to 0.04448885).
-  """
-
-  def __init__(self, seconds=0, micros=0):
-self.micros = int(seconds * 1000000) + int(micros)
-
-  @staticmethod
-  def of(seconds):
-"""Return the Timestamp for the given number of seconds.
-
-If the input is already a Timestamp, the input itself will be returned.
-
-Args:
-  seconds: Number of seconds as int, float or Timestamp.
-
-Returns:
-  Corresponding Timestamp object.
-"""
-
-if isinstance(seconds, Duration):
-  raise TypeError('Can\'t interpret %s as Timestamp.' % seconds)
-if isinstance(seconds, Timestamp):
-  return seconds
-return Timestamp(seconds)
-
-  def __repr__(self):
-micros = self.micros
-sign = ''
-if micros < 0:
-  sign = '-'
-  micros = -micros
-int_part = micros / 1000000
-frac_part = micros % 1000000
-if frac_part:
-  return 'Timestamp(%s%d.%06d)' % (sign, int_part, frac_part)
-else:
-  return 'Timestamp(%s%d)' % (sign, int_part)
-
-  def to_utc_datetime(self):
-epoch = datetime.datetime.utcfromtimestamp(0)
-# We can't easily construct a datetime object from microseconds, so we
-# create one at the epoch and add an appropriate timedelta interval.
-return epoch + datetime.timedelta(microseconds=self.micros)
-
-  def isoformat(self):
-# Append 'Z' for UTC timezone.
-return self.to_utc_datetime().isoformat() + 'Z'
-
-  def __float__(self):
-# Note that the returned value may have lost precision.
-return float(self.micros) / 1000000
-
-  def __int__(self):
-# Note that the returned value may have lost precision.
-return self.micros / 1000000
-
-  def __cmp__(self, other):
-# Allow comparisons between Duration and Timestamp values.
-if not isinstance(other, Duration):
-  other = Timestamp.of(other)
-return cmp(self.micros, other.micros)
-
-  def __hash__(self):
-return hash(self.micros)
-
-  def __add__(self, other):
-other = Duration.of(other)
-return Timestamp(micros=self.micros + other.micros)
-
-  def __radd__(self, other):
-return self + other
-
-  def __sub__(self, other):
-other = Duration.of(other)
-return Timestamp(micros=self.micros - other.micros)
-
-  def __mod__(self, other):
-other = Duration.of(other)
-return Duration(micros=self.micros % other.micros)
-
-
-MIN_TIMESTAMP = Timestamp(micros=-sys.maxint - 1)
-MAX_TIMESTAMP = Timestamp(micros=sys.maxint)
-
-
-class Duration(object):
-  """Represents a second duration with microsecond granularity.
-
-  Can be treated in common arithmetic operations as a numeric type.
-
-  Internally stores a time interval as an int of microseconds. This strategy
-  is necessary since floating point values lose precision when storing values,
-  especially after arithmetic operations (for example, 1000 % 0.1 evaluates
-  to 0.04448885).
-  """
-
-  def __init__(self, seconds=0, micros=0):
-self.micros = int(seconds * 1000000) + int(micros)
-
-  @staticmethod
-  def of(seconds):
-"""Return the Duration for the given number of seconds since Unix epoch.
-
-If the input is already a Duration, the input itself will be returned.
-
-Args:
-  seconds: Number of seconds as int, float or Duration.
-
-Returns:
-  Corresponding Duration object.
-"""
-
-

[14/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/windmill_pb2.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/windmill_pb2.py b/sdks/python/google/cloud/dataflow/internal/windmill_pb2.py
deleted file mode 100644
index 549e54e..000
--- a/sdks/python/google/cloud/dataflow/internal/windmill_pb2.py
+++ /dev/null
@@ -1,2275 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Generated by the protocol buffer compiler.  DO NOT EDIT!
-# source: windmill.proto
-
-import sys
-_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
-from google.protobuf import descriptor as _descriptor
-from google.protobuf import message as _message
-from google.protobuf import reflection as _reflection
-from google.protobuf import symbol_database as _symbol_database
-from google.protobuf import descriptor_pb2
-# @@protoc_insertion_point(imports)
-
-_sym_db = _symbol_database.Default()
-
-
-
-
-DESCRIPTOR = _descriptor.FileDescriptor(
-  name='windmill.proto',
-  package='windmill',
-  syntax='proto2',
-  
serialized_pb=_b('\n\x0ewindmill.proto\x12\x08windmill\"R\n\x07Message\x12\'\n\ttimestamp\x18\x01
 \x02(\x03:\x14-9223372036854775808\x12\x0c\n\x04\x64\x61ta\x18\x02 
\x02(\x0c\x12\x10\n\x08metadata\x18\x03 
\x01(\x0c\"\xbf\x01\n\x05Timer\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12\'\n\ttimestamp\x18\x02 
\x01(\x03:\x14-9223372036854775808\x12-\n\x04type\x18\x03 
\x01(\x0e\x32\x14.windmill.Timer.Type:\tWATERMARK\x12\x14\n\x0cstate_family\x18\x04
 
\x01(\t\";\n\x04Type\x12\r\n\tWATERMARK\x10\x00\x12\x0c\n\x08REALTIME\x10\x01\x12\x16\n\x12\x44\x45PENDENT_REALTIME\x10\x02\"X\n\x12InputMessageBundle\x12\x1d\n\x15source_computation_id\x18\x01
 \x02(\t\x12#\n\x08messages\x18\x02 
\x03(\x0b\x32\x11.windmill.Message\"r\n\x12KeyedMessageBundle\x12\x0b\n\x03key\x18\x01
 \x02(\x0c\x12\x14\n\x0csharding_key\x18\x04 
\x01(\x06\x12#\n\x08messages\x18\x02 
\x03(\x0b\x32\x11.windmill.Message\x12\x14\n\x0cmessages_ids\x18\x03 
\x03(\x0c\"\x87\x01\n\x13OutputMessageBundle\x12\"\n\x1a\x64\x65stination_computation_id\
 x18\x01 \x01(\t\x12\x1d\n\x15\x64\x65stination_stream_id\x18\x03 
\x01(\t\x12-\n\x07\x62undles\x18\x02 
\x03(\x0b\x32\x1c.windmill.KeyedMessageBundle\"t\n\x13PubSubMessageBundle\x12\r\n\x05topic\x18\x01
 \x02(\t\x12#\n\x08messages\x18\x02 
\x03(\x0b\x32\x11.windmill.Message\x12\x17\n\x0ftimestamp_label\x18\x03 
\x01(\t\x12\x10\n\x08id_label\x18\x04 
\x01(\t\".\n\x0bTimerBundle\x12\x1f\n\x06timers\x18\x01 
\x03(\x0b\x32\x0f.windmill.Timer\">\n\x05Value\x12\'\n\ttimestamp\x18\x01 
\x02(\x03:\x14-9223372036854775808\x12\x0c\n\x04\x64\x61ta\x18\x02 
\x02(\x0c\"M\n\x08TagValue\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12\x1e\n\x05value\x18\x02 
\x01(\x0b\x32\x0f.windmill.Value\x12\x14\n\x0cstate_family\x18\x03 
\x01(\t\"\xdb\x01\n\x07TagList\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12+\n\rend_timestamp\x18\x02 
\x01(\x03:\x14-9223372036854775808\x12\x1f\n\x06values\x18\x03 
\x03(\x0b\x32\x0f.windmill.Value\x12\x14\n\x0cstate_family\x18\x04 
\x01(\t\x12\x15\n\rrequest_token\x18\x07 \x01(\x0c\x12\x1a\n\x12\x63onti
 nuation_token\x18\x05 \x01(\x0c\x12,\n\x0f\x66\x65tch_max_bytes\x18\x06 
\x01(\x03:\x13\x39\x32\x32\x33\x33\x37\x32\x30\x33\x36\x38\x35\x34\x37\x37\x35\x38\x30\x37\",\n\x0cGlobalDataId\x12\x0b\n\x03tag\x18\x01
 \x02(\t\x12\x0f\n\x07version\x18\x02 
\x02(\x0c\"k\n\nGlobalData\x12\'\n\x07\x64\x61ta_id\x18\x01 
\x02(\x0b\x32\x16.windmill.GlobalDataId\x12\x10\n\x08is_ready\x18\x02 
\x01(\x08\x12\x0c\n\x04\x64\x61ta\x18\x03 
\x01(\x0c\x12\x14\n\x0cstate_family\x18\x04 
\x01(\t\"I\n\x0bSourceState\x12\r\n\x05state\x18\x01 
\x01(\x0c\x12\x14\n\x0c\x66inalize_ids\x18\x02 
\x03(\x06\x12\x15\n\ronly_finalize\x18\x03 
\x01(\x08\"Y\n\rWatermarkHold\x12\x0b\n\x03tag\x18\x01 
\x02(\x0c\x12\x16\n\ntimestamps\x18\x02 
\x03(\x03\x42\x02\x10\x01\x12\r\n\x05reset\x18\x03 
\x01(\x08\x12\x14\n\x0cstate_family\x18\x04 
\x01(\t\"\xd4\x02\n\x08WorkItem\x12\x0b\n\x03key\x18\x01 
\x02(\x0c\x12\x12\n\nwork_token\x18\x02 
\x02(\x06\x12\x14\n\x0csharding_key\x18\t 
\x01(\x06\x12\x13\n\x0b\x63\x61\x63he_token\x18\x07 \x01(\x06\x
 12\x35\n\x0fmessage_bundles\x18\x03 
\x03(\x0b\x32\x1c.windmill.InputMessageBundle\x12%\n\x06timers\x18\x04 
\x01(\x0b\x32\x15.windmill.TimerBundle\x12<\n\x1cglobal_data_id_notifications\x18\x05
 

[43/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py
--
diff --git a/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py b/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py
new file mode 100644
index 000..c2eaea1
--- /dev/null
+++ b/sdks/python/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py
@@ -0,0 +1,316 @@
+"""Generated client library for dataflow version v1b3."""
+# NOTE: This file is autogenerated and should not be edited by hand.
+from apitools.base.py import base_api
+from google.cloud.dataflow.internal.clients.dataflow import dataflow_v1b3_messages as messages
+
+
+class DataflowV1b3(base_api.BaseApiClient):
+  """Generated client library for service dataflow version v1b3."""
+
+  MESSAGES_MODULE = messages
+
+  _PACKAGE = u'dataflow'
+  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/userinfo.email']
+  _VERSION = u'v1b3'
+  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
+  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _CLIENT_CLASS_NAME = u'DataflowV1b3'
+  _URL_VERSION = u'v1b3'
+  _API_KEY = None
+
+  def __init__(self, url='', credentials=None,
+   get_credentials=True, http=None, model=None,
+   log_request=False, log_response=False,
+   credentials_args=None, default_global_params=None,
+   additional_http_headers=None):
+"""Create a new dataflow handle."""
+url = url or u'https://dataflow.googleapis.com/'
+super(DataflowV1b3, self).__init__(
+url, credentials=credentials,
+get_credentials=get_credentials, http=http, model=model,
+log_request=log_request, log_response=log_response,
+credentials_args=credentials_args,
+default_global_params=default_global_params,
+additional_http_headers=additional_http_headers)
+self.projects_jobs_messages = self.ProjectsJobsMessagesService(self)
+self.projects_jobs_workItems = self.ProjectsJobsWorkItemsService(self)
+self.projects_jobs = self.ProjectsJobsService(self)
+self.projects = self.ProjectsService(self)
+
+  class ProjectsJobsMessagesService(base_api.BaseApiService):
+"""Service class for the projects_jobs_messages resource."""
+
+_NAME = u'projects_jobs_messages'
+
+def __init__(self, client):
+  super(DataflowV1b3.ProjectsJobsMessagesService, self).__init__(client)
+  self._method_configs = {
+  'List': base_api.ApiMethodInfo(
+  http_method=u'GET',
+  method_id=u'dataflow.projects.jobs.messages.list',
+  ordered_params=[u'projectId', u'jobId'],
+  path_params=[u'jobId', u'projectId'],
+  query_params=[u'endTime', u'minimumImportance', u'pageSize', u'pageToken', u'startTime'],
+  relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/messages',
+  request_field='',
+  request_type_name=u'DataflowProjectsJobsMessagesListRequest',
+  response_type_name=u'ListJobMessagesResponse',
+  supports_download=False,
+  ),
+  }
+
+  self._upload_configs = {
+  }
+
+def List(self, request, global_params=None):
+  """Request the job status.
+
+  Args:
+request: (DataflowProjectsJobsMessagesListRequest) input message
+global_params: (StandardQueryParameters, default: None) global arguments
+  Returns:
+(ListJobMessagesResponse) The response message.
+  """
+  config = self.GetMethodConfig('List')
+  return self._RunMethod(
+  config, request, global_params=global_params)
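A hedged usage sketch of the generated client (the project and job ids are hypothetical; the request and response type names come from the method config above)::

  client = DataflowV1b3()
  request = messages.DataflowProjectsJobsMessagesListRequest(
      projectId='my-project', jobId='2016-06-14_00_00_00-1234567890')
  # Returns a ListJobMessagesResponse, per response_type_name above.
  response = client.projects_jobs_messages.List(request)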
+
+  class ProjectsJobsWorkItemsService(base_api.BaseApiService):
+"""Service class for the projects_jobs_workItems resource."""
+
+_NAME = u'projects_jobs_workItems'
+
+def __init__(self, client):
+  super(DataflowV1b3.ProjectsJobsWorkItemsService, self).__init__(client)
+  self._method_configs = {
+  'Lease': base_api.ApiMethodInfo(
+  http_method=u'POST',
+  method_id=u'dataflow.projects.jobs.workItems.lease',
+  ordered_params=[u'projectId', u'jobId'],
+  path_params=[u'jobId', u'projectId'],
+  query_params=[],
+  relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/workItems:lease',
+  request_field=u'leaseWorkItemRequest',
+  request_type_name=u'DataflowProjectsJobsWorkItemsLeaseRequest',
+  response_type_name=u'LeaseWorkItemResponse',
+  supports_download=False,
+  ),
+  'ReportStatus': base_api.ApiMethodInfo(
+  http_method=u'POST',
+  method_id=u'dataflow.projects.jobs.workItems.reportStatus',
+  

[17/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/clients/storage/__init__.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/clients/storage/__init__.py b/sdks/python/google/cloud/dataflow/internal/clients/storage/__init__.py
deleted file mode 100644
index 15b1524..000
--- a/sdks/python/google/cloud/dataflow/internal/clients/storage/__init__.py
+++ /dev/null
@@ -1,10 +0,0 @@
-"""Common imports for generated storage client library."""
-# pylint:disable=wildcard-import
-
-import pkgutil
-
-from apitools.base.py import *
-from google.cloud.dataflow.internal.clients.storage.storage_v1_client import *
-from google.cloud.dataflow.internal.clients.storage.storage_v1_messages import *
-
-__path__ = pkgutil.extend_path(__path__, __name__)

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_client.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_client.py b/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_client.py
deleted file mode 100644
index c8255c2..000
--- a/sdks/python/google/cloud/dataflow/internal/clients/storage/storage_v1_client.py
+++ /dev/null
@@ -1,1021 +0,0 @@
-"""Generated client library for storage version v1."""
-# NOTE: This file is autogenerated and should not be edited by hand.
-from apitools.base.py import base_api
-from google.cloud.dataflow.internal.clients.storage import storage_v1_messages as messages
-
-
-class StorageV1(base_api.BaseApiClient):
-  """Generated client library for service storage version v1."""
-
-  MESSAGES_MODULE = messages
-
-  _PACKAGE = u'storage'
-  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/cloud-platform.read-only', u'https://www.googleapis.com/auth/devstorage.full_control', u'https://www.googleapis.com/auth/devstorage.read_only', u'https://www.googleapis.com/auth/devstorage.read_write']
-  _VERSION = u'v1'
-  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
-  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _CLIENT_CLASS_NAME = u'StorageV1'
-  _URL_VERSION = u'v1'
-  _API_KEY = None
-
-  def __init__(self, url='', credentials=None,
-   get_credentials=True, http=None, model=None,
-   log_request=False, log_response=False,
-   credentials_args=None, default_global_params=None,
-   additional_http_headers=None):
-"""Create a new storage handle."""
-url = url or u'https://www.googleapis.com/storage/v1/'
-super(StorageV1, self).__init__(
-url, credentials=credentials,
-get_credentials=get_credentials, http=http, model=model,
-log_request=log_request, log_response=log_response,
-credentials_args=credentials_args,
-default_global_params=default_global_params,
-additional_http_headers=additional_http_headers)
-self.bucketAccessControls = self.BucketAccessControlsService(self)
-self.buckets = self.BucketsService(self)
-self.channels = self.ChannelsService(self)
-self.defaultObjectAccessControls = 
self.DefaultObjectAccessControlsService(self)
-self.objectAccessControls = self.ObjectAccessControlsService(self)
-self.objects = self.ObjectsService(self)
-
-  class BucketAccessControlsService(base_api.BaseApiService):
-"""Service class for the bucketAccessControls resource."""
-
-_NAME = u'bucketAccessControls'
-
-def __init__(self, client):
-  super(StorageV1.BucketAccessControlsService, self).__init__(client)
-  self._method_configs = {
-  'Delete': base_api.ApiMethodInfo(
-  http_method=u'DELETE',
-  method_id=u'storage.bucketAccessControls.delete',
-  ordered_params=[u'bucket', u'entity'],
-  path_params=[u'bucket', u'entity'],
-  query_params=[],
-  relative_path=u'b/{bucket}/acl/{entity}',
-  request_field='',
-  request_type_name=u'StorageBucketAccessControlsDeleteRequest',
-  response_type_name=u'StorageBucketAccessControlsDeleteResponse',
-  supports_download=False,
-  ),
-  'Get': base_api.ApiMethodInfo(
-  http_method=u'GET',
-  method_id=u'storage.bucketAccessControls.get',
-  ordered_params=[u'bucket', u'entity'],
-  path_params=[u'bucket', u'entity'],
-  query_params=[],
-  relative_path=u'b/{bucket}/acl/{entity}',
-  request_field='',
-  request_type_name=u'StorageBucketAccessControlsGetRequest',
-  response_type_name=u'BucketAccessControl',
-  supports_download=False,
-  ),

[06/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/transforms/ptransform_test.py
--
diff --git a/sdks/python/google/cloud/dataflow/transforms/ptransform_test.py b/sdks/python/google/cloud/dataflow/transforms/ptransform_test.py
deleted file mode 100644
index 00b6c8d..000
--- a/sdks/python/google/cloud/dataflow/transforms/ptransform_test.py
+++ /dev/null
@@ -1,1814 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Unit tests for the PTransform and descendants."""
-
-from __future__ import absolute_import
-
-import operator
-import re
-import unittest
-
-
-import google.cloud.dataflow as df
-from google.cloud.dataflow.pipeline import Pipeline
-import google.cloud.dataflow.pvalue as pvalue
-import google.cloud.dataflow.transforms.combiners as combine
-from google.cloud.dataflow.transforms.ptransform import PTransform
-from google.cloud.dataflow.transforms.util import assert_that, equal_to
-import google.cloud.dataflow.typehints as typehints
-from google.cloud.dataflow.typehints import with_input_types
-from google.cloud.dataflow.typehints import with_output_types
-from google.cloud.dataflow.typehints.typehints_test import TypeHintTestCase
-from google.cloud.dataflow.utils.options import PipelineOptions
-from google.cloud.dataflow.utils.options import TypeOptions
-
-
-# Disable frequent lint warning due to pipe operator for chaining transforms.
-# pylint: disable=expression-not-assigned
-
-
-class PTransformTest(unittest.TestCase):
-
-  def assertStartswith(self, msg, prefix):
-self.assertTrue(msg.startswith(prefix),
-'"%s" does not start with "%s"' % (msg, prefix))
-
-  def test_str(self):
-self.assertEqual('',
- str(PTransform()))
-
-pa = Pipeline('DirectPipelineRunner')
-res = pa | df.Create('a_label', [1, 2])
-self.assertEqual('',
- str(res.producer.transform))
-
-pc = Pipeline('DirectPipelineRunner')
-res = pc | df.Create('with_inputs', [1, 2])
-inputs_tr = res.producer.transform
-inputs_tr.inputs = ('ci',)
-self.assertEqual(
-"""""",
-str(inputs_tr))
-
-pd = Pipeline('DirectPipelineRunner')
-res = pd | df.Create('with_sidei', [1, 2])
-side_tr = res.producer.transform
-side_tr.side_inputs = (4,)
-self.assertEqual(
-'',
-str(side_tr))
-
-inputs_tr.side_inputs = ('cs',)
-self.assertEqual(
-"""""",
-str(inputs_tr))
-
-  def test_parse_label_and_arg(self):
-
-def fun(*args, **kwargs):
-  return PTransform().parse_label_and_arg(args, kwargs, 'name')
-
-self.assertEqual(('PTransform', 'value'), fun('value'))
-self.assertEqual(('PTransform', 'value'), fun(name='value'))
-self.assertEqual(('label', 'value'), fun('label', 'value'))
-self.assertEqual(('label', 'value'), fun('label', name='value'))
-self.assertEqual(('label', 'value'), fun('value', label='label'))
-self.assertEqual(('label', 'value'), fun(name='value', label='label'))
-
-self.assertRaises(ValueError, fun)
-self.assertRaises(ValueError, fun, 0, 'value')
-self.assertRaises(ValueError, fun, label=0, name='value')
-self.assertRaises(ValueError, fun, other='value')
-
-with self.assertRaises(ValueError) as cm:
-  fun(0, name='value')
-self.assertEqual(
-cm.exception.message,
-'PTransform expects a (label, name) or (name) argument list '
-'instead of args=(0,), kwargs={\'name\': \'value\'}')
-
-  def test_do_with_do_fn(self):
-class AddNDoFn(df.DoFn):
-
-  def process(self, context, addon):
-return [context.element + addon]
-
-pipeline = Pipeline('DirectPipelineRunner')
-pcoll = pipeline | df.Create('start', [1, 2, 3])
-result = pcoll | df.ParDo('do', AddNDoFn(), 10)
-assert_that(result, equal_to([11, 12, 13]))
-pipeline.run()
-
-  def test_do_with_unconstructed_do_fn(self):
-class MyDoFn(df.DoFn):
-
-  def process(self, context):
-pass
-
-pipeline = Pipeline('DirectPipelineRunner')
-   

[19/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
--
diff --git 
a/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
 
b/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
deleted file mode 100644
index c2eaea1..000
--- 
a/sdks/python/google/cloud/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
+++ /dev/null
@@ -1,316 +0,0 @@
-"""Generated client library for dataflow version v1b3."""
-# NOTE: This file is autogenerated and should not be edited by hand.
-from apitools.base.py import base_api
-from google.cloud.dataflow.internal.clients.dataflow import 
dataflow_v1b3_messages as messages
-
-
-class DataflowV1b3(base_api.BaseApiClient):
-  """Generated client library for service dataflow version v1b3."""
-
-  MESSAGES_MODULE = messages
-
-  _PACKAGE = u'dataflow'
-  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', 
u'https://www.googleapis.com/auth/userinfo.email']
-  _VERSION = u'v1b3'
-  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
-  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _CLIENT_CLASS_NAME = u'DataflowV1b3'
-  _URL_VERSION = u'v1b3'
-  _API_KEY = None
-
-  def __init__(self, url='', credentials=None,
-   get_credentials=True, http=None, model=None,
-   log_request=False, log_response=False,
-   credentials_args=None, default_global_params=None,
-   additional_http_headers=None):
-"""Create a new dataflow handle."""
-url = url or u'https://dataflow.googleapis.com/'
-super(DataflowV1b3, self).__init__(
-url, credentials=credentials,
-get_credentials=get_credentials, http=http, model=model,
-log_request=log_request, log_response=log_response,
-credentials_args=credentials_args,
-default_global_params=default_global_params,
-additional_http_headers=additional_http_headers)
-self.projects_jobs_messages = self.ProjectsJobsMessagesService(self)
-self.projects_jobs_workItems = self.ProjectsJobsWorkItemsService(self)
-self.projects_jobs = self.ProjectsJobsService(self)
-self.projects = self.ProjectsService(self)
-
-  class ProjectsJobsMessagesService(base_api.BaseApiService):
-"""Service class for the projects_jobs_messages resource."""
-
-_NAME = u'projects_jobs_messages'
-
-def __init__(self, client):
-  super(DataflowV1b3.ProjectsJobsMessagesService, self).__init__(client)
-  self._method_configs = {
-  'List': base_api.ApiMethodInfo(
-  http_method=u'GET',
-  method_id=u'dataflow.projects.jobs.messages.list',
-  ordered_params=[u'projectId', u'jobId'],
-  path_params=[u'jobId', u'projectId'],
-  query_params=[u'endTime', u'minimumImportance', u'pageSize', 
u'pageToken', u'startTime'],
-  relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/messages',
-  request_field='',
-  request_type_name=u'DataflowProjectsJobsMessagesListRequest',
-  response_type_name=u'ListJobMessagesResponse',
-  supports_download=False,
-  ),
-  }
-
-  self._upload_configs = {
-  }
-
-def List(self, request, global_params=None):
-  """Request the job status.
-
-  Args:
-request: (DataflowProjectsJobsMessagesListRequest) input message
-global_params: (StandardQueryParameters, default: None) global 
arguments
-  Returns:
-(ListJobMessagesResponse) The response message.
-  """
-  config = self.GetMethodConfig('List')
-  return self._RunMethod(
-  config, request, global_params=global_params)
-
-  class ProjectsJobsWorkItemsService(base_api.BaseApiService):
-"""Service class for the projects_jobs_workItems resource."""
-
-_NAME = u'projects_jobs_workItems'
-
-def __init__(self, client):
-  super(DataflowV1b3.ProjectsJobsWorkItemsService, self).__init__(client)
-  self._method_configs = {
-  'Lease': base_api.ApiMethodInfo(
-  http_method=u'POST',
-  method_id=u'dataflow.projects.jobs.workItems.lease',
-  ordered_params=[u'projectId', u'jobId'],
-  path_params=[u'jobId', u'projectId'],
-  query_params=[],
-  
relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/workItems:lease',
-  request_field=u'leaseWorkItemRequest',
-  request_type_name=u'DataflowProjectsJobsWorkItemsLeaseRequest',
-  response_type_name=u'LeaseWorkItemResponse',
-  supports_download=False,
-  ),
-  'ReportStatus': base_api.ApiMethodInfo(
-  http_method=u'POST',
-  

[15/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/internal/pickler.py
--
diff --git a/sdks/python/google/cloud/dataflow/internal/pickler.py 
b/sdks/python/google/cloud/dataflow/internal/pickler.py
deleted file mode 100644
index 00f7fc7..000
--- a/sdks/python/google/cloud/dataflow/internal/pickler.py
+++ /dev/null
@@ -1,205 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Pickler for values, functions, and classes.
-
-Pickles created by the pickling library contain non-ASCII characters, so
-we base64-encode the results so that we can put them in JSON objects.
-The pickler is used to embed FlatMap callable objects into the workflow JSON
-description.
-
-The pickler module should be used to pickle functions and modules; for values,
-the coders.*PickleCoder classes should be used instead.
-"""
-
-import base64
-import logging
-import sys
-import traceback
-import types
-
-import dill
-
-
-def is_nested_class(cls):
-  """Returns true if argument is a class object that appears to be nested."""
-  return (isinstance(cls, type)
-  and cls.__module__ != '__builtin__'
-  and cls.__name__ not in sys.modules[cls.__module__].__dict__)
-
-
-def find_containing_class(nested_class):
-  """Finds containing class of a nestec class passed as argument."""
-
-  def find_containing_class_inner(outer):
-for k, v in outer.__dict__.items():
-  if v is nested_class:
-return outer, k
-  elif isinstance(v, (type, types.ClassType)) and hasattr(v, '__dict__'):
-res = find_containing_class_inner(v)
-if res: return res
-
-  return find_containing_class_inner(sys.modules[nested_class.__module__])
-
-
-def _nested_type_wrapper(fun):
-  """A wrapper for the standard pickler handler for class objects.
-
-  Args:
-fun: Original pickler handler for type objects.
-
-  Returns:
-A wrapper for type objects that handles nested classes.
-
-  The wrapper detects if an object being pickled is a nested class object.
-  For nested class object only it will save the containing class object so
-  the nested structure is recreated during unpickle.
-  """
-
-  def wrapper(pickler, obj):
-# When the nested class is defined in the __main__ module we do not have to
-# do anything special because the pickler itself will save the constituent
-# parts of the type (i.e., name, base classes, dictionary) and then
-# recreate it during unpickling.
-if is_nested_class(obj) and obj.__module__ != '__main__':
-  containing_class_and_name = find_containing_class(obj)
-  if containing_class_and_name is not None:
-return pickler.save_reduce(
-getattr, containing_class_and_name, obj=obj)
-try:
-  return fun(pickler, obj)
-except dill.dill.PicklingError:
-  # pylint: disable=protected-access
-  return pickler.save_reduce(
-  dill.dill._create_type,
-  (type(obj), obj.__name__, obj.__bases__,
-   dill.dill._dict_from_dictproxy(obj.__dict__)),
-  obj=obj)
-  # pylint: enable=protected-access
-
-  return wrapper
-
-# Monkey patch the standard pickler dispatch table entry for type objects.
-# Dill, for certain types, defers to the standard pickler (including type
-# objects). We wrap the standard handler using type_wrapper() because
-# for nested class we want to pickle the actual enclosing class object so we
-# can recreate it during unpickling.
-# TODO(silviuc): Make sure we submit the fix upstream to GitHub dill project.
-dill.dill.Pickler.dispatch[type] = _nested_type_wrapper(
-dill.dill.Pickler.dispatch[type])
-
-
-# Dill pickles generator objects without complaint, but unpickling produces
-# TypeError: object.__new__(generator) is not safe, use generator.__new__()
-# on some versions of Python.
-def reject_generators(unused_pickler, unused_obj):
-  raise TypeError("can't (safely) pickle generator objects")
-dill.dill.Pickler.dispatch[types.GeneratorType] = reject_generators
-
-
-# This if guards against dill not being fully initialized when generating docs.
-if 'save_module' in dir(dill.dill):
-
-  # Always pickle non-main modules by name.
-  old_save_module = dill.dill.save_module
-
-  @dill.dill.register(dill.dill.ModuleType)
-  def save_module(pickler, obj):
-if dill.dill.is_dill(pickler) and obj is 

[01/50] [abbrv] incubator-beam git commit: Move all files to apache_beam folder

2016-06-14 Thread davor
Repository: incubator-beam
Updated Branches:
  refs/heads/python-sdk d53e96a0d -> cd0f50980


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b14dfadd/sdks/python/google/cloud/dataflow/utils/pipeline_options_validator_test.py
--
diff --git 
a/sdks/python/google/cloud/dataflow/utils/pipeline_options_validator_test.py 
b/sdks/python/google/cloud/dataflow/utils/pipeline_options_validator_test.py
deleted file mode 100644
index 84cdb93..000
--- a/sdks/python/google/cloud/dataflow/utils/pipeline_options_validator_test.py
+++ /dev/null
@@ -1,234 +0,0 @@
-# Copyright 2016 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#  http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Unit tests for the pipeline options validator module."""
-
-import logging
-import unittest
-
-from google.cloud.dataflow.utils.options import PipelineOptions
-from google.cloud.dataflow.utils.pipeline_options_validator import 
PipelineOptionsValidator
-
-
-# Mock runners to use for validations.
-class MockRunners(object):
-
-  class DataflowPipelineRunner(object):
-pass
-
-  class OtherRunner(object):
-pass
-
-
-class SetupTest(unittest.TestCase):
-
-  def check_errors_for_arguments(self, errors, args):
-"""Checks that there is exactly one error for each given argument."""
-missing = []
-remaining = list(errors)
-
-for arg in args:
-  found = False
-  for error in remaining:
-if arg in error:
-  remaining.remove(error)
-  found = True
-  break
-  if not found:
-missing.append('Missing error for: ' + arg)
-
-# Return missing and remaining (not matched) errors.
-return missing + remaining
-
-  def test_local_runner(self):
-runner = MockRunners.OtherRunner()
-options = PipelineOptions([])
-validator = PipelineOptionsValidator(options, runner)
-errors = validator.validate()
-self.assertEqual(len(errors), 0)
-
-  def test_missing_required_options(self):
-options = PipelineOptions([''])
-runner = MockRunners.DataflowPipelineRunner()
-validator = PipelineOptionsValidator(options, runner)
-errors = validator.validate()
-
-self.assertEqual(
-self.check_errors_for_arguments(
-errors,
-['project', 'job_name', 'staging_location', 'temp_location']),
-[])
-
-  def test_gcs_path(self):
-def get_validator(temp_location):
-  options = ['--project=example:example', '--job_name=job',
- '--staging_location=gs://foo/bar']
-
-  if temp_location is not None:
-options.append('--temp_location=' + temp_location)
-
-  pipeline_options = PipelineOptions(options)
-  runner = MockRunners.DataflowPipelineRunner()
-  validator = PipelineOptionsValidator(pipeline_options, runner)
-  return validator
-
-test_cases = [
-{'temp_location': None, 'errors': ['temp_location']},
-{'temp_location': 'gcs:/foo/bar', 'errors': ['temp_location']},
-{'temp_location': 'gs:/foo/bar', 'errors': ['temp_location']},
-{'temp_location': 'gs://ABC/bar', 'errors': ['temp_location']},
-{'temp_location': 'gs://ABC/bar', 'errors': ['temp_location']},
-{'temp_location': 'gs://foo', 'errors': ['temp_location']},
-{'temp_location': 'gs://foo/', 'errors': []},
-{'temp_location': 'gs://foo/bar', 'errors': []},
-]
-
-for case in test_cases:
-  errors = get_validator(case['temp_location']).validate()
-  self.assertEqual(
-  self.check_errors_for_arguments(errors, case['errors']), [])
-
-  def test_project(self):
-def get_validator(project):
-  options = ['--job_name=job', '--staging_location=gs://foo/bar',
- '--temp_location=gs://foo/bar']
-
-  if project is not None:
-options.append('--project=' + project)
-
-  pipeline_options = PipelineOptions(options)
-  runner = MockRunners.DataflowPipelineRunner()
-  validator = PipelineOptionsValidator(pipeline_options, runner)
-  return validator
-
-test_cases = [
-{'project': None, 'errors': ['project']},
-{'project': '12345', 'errors': ['project']},
-{'project': 'FOO', 'errors': ['project']},
-{'project': 'foo:BAR', 'errors': ['project']},
-{'project': 'fo', 'errors': ['project']},
-{'project': 'foo', 'errors': []},
-{'project': 'foo:bar', 'errors': []},
-]
-
-for case in 

[GitHub] incubator-beam pull request #463: Explicitly set the Runner in TestFlinkPipe...

2016-06-14 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/463

Explicitly set the Runner in TestFlinkPipelineRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
    `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This ensures that the created PipelineOptions are valid if the
DirectRunner is not on the classpath.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
test_flink_runner_explicit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #463


commit 13ede5437afb2fec9f0688e39af3301937edf7a5
Author: Thomas Groh 
Date:   2016-06-14T22:49:34Z

Explicitly set the Runner in TestFlinkPipelineRunner

This ensures that the created PipelineOptions are valid if the
DirectRunner is not on the classpath.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request #462: Use TimestampedValue in DoFnTester

2016-06-14 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/462

Use TimestampedValue in DoFnTester

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
    `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This removes the duplicate OutputElementWithTimestamp data structure.
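
As a rough usage sketch (not code from this PR; the identity DoFn and element values are made up for illustration), the tester's timestamped output is then read back as plain TimestampedValue objects:

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFnTester;
    import org.apache.beam.sdk.values.TimestampedValue;

    import java.util.List;

    public class DoFnTesterSketch {
      public static void main(String[] args) throws Exception {
        // A trivial identity DoFn, used only to exercise the tester API.
        DoFn<String, String> fn = new DoFn<String, String>() {
          @Override
          public void processElement(ProcessContext c) {
            c.output(c.element());
          }
        };
        DoFnTester<String, String> tester = DoFnTester.of(fn);
        tester.processElement("hello");
        // The peeked outputs are now TimestampedValue<String> rather than a
        // DoFnTester-specific OutputElementWithTimestamp wrapper.
        List<TimestampedValue<String>> outputs = tester.peekOutputElementsWithTimestamp();
        System.out.println(outputs);
      }
    }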

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
use_dofntester_timestamped_value

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/462.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #462


commit 4da5ebfbf021051288620634ec84cafa9208265c
Author: Thomas Groh 
Date:   2016-06-14T20:39:59Z

Use TimestampedValue in DoFnTester

This removes the duplicate OutputElementWithTimestamp data structure.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam-site git commit: This closes #20

2016-06-14 Thread davor
This closes #20


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/4f3d0f9f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/4f3d0f9f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/4f3d0f9f

Branch: refs/heads/asf-site
Commit: 4f3d0f9fccac1b325f4fb89d2eca79de68629ff0
Parents: 0c5c647 3e35c5f
Author: Davor Bonaci 
Authored: Tue Jun 14 13:34:27 2016 -0700
Committer: Davor Bonaci 
Committed: Tue Jun 14 13:34:27 2016 -0700

--
 _data/authors.yml   |  4 +++
 .../2016-06-15-flink-batch-runner-milestone.md  | 32 
 2 files changed, 36 insertions(+)
--




[GitHub] incubator-beam pull request #461: Initial Beam Python SDK

2016-06-14 Thread silviulica
GitHub user silviulica opened a pull request:

https://github.com/apache/incubator-beam/pull/461

Initial Beam Python SDK

Baby Beam Python SDK joins the Beam family.
Code moved from https://github.com/GoogleCloudPlatform/DataflowPythonSDK/

R: @davorbonaci, @francesperry, @robertwb 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/silviulica/incubator-beam beam_python

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #461


commit f18859267f4b1b44ac1bf175c763295d8f4fc6c3
Author: Silviu Calinoiu 
Date:   2016-02-25T16:10:33Z

first commit

commit 9d0079f4720757d6bc725c2b9bd7711a2ff22e3d
Author: Silviu Calinoiu 
Date:   2016-02-25T16:15:47Z

Initial push.

commit 5dbf438c501332506eda2c9278cf8340939ece35
Author: Silviu Calinoiu 
Date:   2016-02-25T21:42:41Z

Several refactorings in preparation for making the repo public.

commit 784a342660b8a875a2b3499ff8202fc50327640f
Author: Silviu Calinoiu 
Date:   2016-02-25T22:11:42Z

Small fixes in BigQuery snippets and wordcount example.

commit f5d1b52f158abf33ceb03bf29a0c44a126192f3b
Author: robertwb 
Date:   2016-02-25T22:12:47Z

Python Dataflow fit-n-finish.

* Updated duplicate label message to be more in line with java. (Also, the 
issue is more often than not a different transform of the same name, e.g. two 
Creates.)
* Actually call default_label. Eliminate messy traceback when 
PTransform.__init__ is not called.
* Add a DeterministicPickleCoder that pickles but fails at runtime if the 
encoding is not known to be deterministic.
* Get rid of some spurious warnings for incompletely inferred types.
* Remove obsolete TODOs.
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115602962

commit 914231c502a30e67b8e6b5a2e7723c797e650c38
Author: gildea 
Date:   2016-02-26T01:44:47Z

README: add explicit Table of Contents.

Add new script update-readme-toc.sh to update this section when the
document contents change.  Add anchors in the README.md that the
script can use to collect section names and point to sections.
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115624954

commit 40b39a4a630f29ded91896347512642d28abbbc7
Author: chamikara 
Date:   2016-02-26T23:32:28Z

Some more fixes related to argument passing.
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115716125

commit 553936c8775bc9ba4376b0a4cc220b28316e14ec
Author: gildea 
Date:   2016-02-26T23:41:32Z

Readme: add a missing section anchor, close all anchors.
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115717014

commit 56d13586ef255fd76e9d32b03e996c41e47decab
Author: gildea 
Date:   2016-02-27T00:52:22Z

"README" edit from Robert: []
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115723709

commit 347c509e9683ab526bf6c91c77525eb4ce362a3c
Author: chamikara 
Date:   2016-02-29T19:47:06Z

Code snippets for Web doc on PipelineOptions.
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=115880615

commit 62ac247028b6b47afdcd5281e4d765f87cf7e628
Author: gildea 
Date:   2016-03-01T16:28:55Z

Depend on google-apitools-dataflow-v1b3 >= 0.4.20160217

New version includes JobState enum value JOB_STATE_DRAINING.
Release Notes
[]
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=116002538

commit c25e77f1a160a0f2e84b9d181e50b9193f4b72f1
Author: chamikara 
Date:   2016-03-02T04:27:25Z

Performs several updates to doc snippents for PipelineOptions.

Release Notes
[]
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=116093263

commit 155bd6dd63f65405c7b9457013f845f3ad0a2c85
Author: altay 
Date:   2016-03-02T22:50:44Z

Validate pipeline options at the time of pipeline creation.

Release Notes
[]
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=116185013

commit 5be011f7c87ff69c18b16b138c5ebc622fbcbe18
Author: altay 
Date:   2016-03-02T23:02:00Z

Simplify whitelist warning to show warning before every run.

Release Notes
[]
-
Created by MOE: https://github.com/google/moe

[jira] [Created] (BEAM-341) ReduceFnRunner allows GC time overflow

2016-06-14 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-341:


 Summary: ReduceFnRunner allows GC time overflow
 Key: BEAM-341
 URL: https://issues.apache.org/jira/browse/BEAM-341
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


In {{ReduceFnRunner}}, any window ending after the global window has its GC 
time capped to the end of the global window. But for windows ending before the 
global window the allowed lateness can still be arbitrary, causing overflow.

http://stackoverflow.com/questions/37808159/why-am-i-getting-java-lang-illegalstateexception-on-google-dataflow
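
A tiny sketch of the arithmetic in question (illustrative only, not the ReduceFnRunner code; the one-hour and 365-day values are arbitrary):

    import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
    import org.joda.time.Duration;
    import org.joda.time.Instant;

    public class GcTimeSketch {
      public static void main(String[] args) {
        Instant endOfGlobal = GlobalWindow.INSTANCE.maxTimestamp();

        // A window ending just before the end of the global window...
        Instant windowEnd = endOfGlobal.minus(Duration.standardHours(1));
        // ...paired with an arbitrary, user-supplied allowed lateness.
        Duration allowedLateness = Duration.standardDays(365);

        // Uncapped, the garbage-collection time lands past the end of the global
        // window (and, for extreme lateness values, past the representable range),
        // which is the overflow described above.
        Instant uncappedGcTime = windowEnd.plus(allowedLateness);
        System.out.println(uncappedGcTime.isAfter(endOfGlobal)); // true

        // Capping keeps it no later than the end of the global window.
        Instant cappedGcTime =
            uncappedGcTime.isAfter(endOfGlobal) ? endOfGlobal : uncappedGcTime;
        System.out.println(cappedGcTime);
      }
    }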



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-340) Use Avro coder for KafkaIO

2016-06-14 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi resolved BEAM-340.
---
Resolution: Duplicate

> Use Avro coder for KafkaIO 
> ---
>
> Key: BEAM-340
> URL: https://issues.apache.org/jira/browse/BEAM-340
> Project: Beam
>  Issue Type: Improvement
>Reporter: Raghu Angadi
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-340) Use Avro coder for KafkaIO

2016-06-14 Thread Raghu Angadi (JIRA)
Raghu Angadi created BEAM-340:
-

 Summary: Use Avro coder for KafkaIO 
 Key: BEAM-340
 URL: https://issues.apache.org/jira/browse/BEAM-340
 Project: Beam
  Issue Type: Improvement
Reporter: Raghu Angadi






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #455: Set trimStackTrace to false for WordCountI...

2016-06-14 Thread peihe
Github user peihe closed the pull request at:

https://github.com/apache/incubator-beam/pull/455


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request #452: Fix PAssert with empty collections

2016-06-14 Thread kennknowles
Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/452


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-243) Remove DirectPipelineRunner and keep only the InProcessPipelineRunner

2016-06-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329912#comment-15329912
 ] 

ASF GitHub Bot commented on BEAM-243:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/446


> Remove DirectPipelineRunner and keep only the InProcessPipelineRunner
> -
>
> Key: BEAM-243
> URL: https://issues.apache.org/jira/browse/BEAM-243
> Project: Beam
>  Issue Type: Task
>  Components: runner-direct
>Reporter: Jean-Baptiste Onofré
>Assignee: Thomas Groh
>
> We have two runners for local JVM/process: the "old" DirectPipelineRunner and 
> the "new" InProcessPipelineRunner.
> They have different features (for instance, the DirectPipelineRunner doesn't
> support unbounded PCollections, whereas the InProcessPipelineRunner does).
> To avoid confusion, we could remove the "old" DirectPipelineRunner.
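
A minimal sketch of opting into the newer local runner explicitly, so that nothing depends on the old default (plain Beam Java API calls; not code from this issue):

    import org.apache.beam.runners.direct.InProcessPipelineRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class LocalRunnerSketch {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        // Pick the newer local runner explicitly rather than relying on the
        // soon-to-be-removed DirectPipelineRunner default.
        options.setRunner(InProcessPipelineRunner.class);
        Pipeline p = Pipeline.create(options);
        // ... build the pipeline and call p.run() as usual ...
      }
    }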



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[7/8] incubator-beam git commit: Update the Default Pipeline Runner

2016-06-14 Thread kenn
Update the Default Pipeline Runner

Select the InProcessRunner if it is on the classpath, and throw an
exception otherwise.
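
A minimal sketch of that classpath-based selection (the real change wires this through the PipelineOptions default machinery; the helper method and error message below are illustrative only):

    import org.apache.beam.sdk.runners.PipelineRunner;

    public class DefaultRunnerSketch {
      // Use the InProcessPipelineRunner if its class can be loaded, and fail
      // with a clear message otherwise.
      @SuppressWarnings("unchecked")
      static Class<? extends PipelineRunner<?>> resolveDefaultRunner() {
        try {
          return (Class<? extends PipelineRunner<?>>)
              Class.forName("org.apache.beam.runners.direct.InProcessPipelineRunner");
        } catch (ClassNotFoundException e) {
          throw new IllegalArgumentException(
              "No default runner available: add a runner module such as "
                  + "beam-runners-direct-java to the classpath, or set the runner explicitly.",
              e);
        }
      }

      public static void main(String[] args) {
        System.out.println(resolveDefaultRunner());
      }
    }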


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/a3ffd510
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/a3ffd510
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/a3ffd510

Branch: refs/heads/master
Commit: a3ffd510896626019723294931a4c3763faf43af
Parents: 816a3bf
Author: Thomas Groh 
Authored: Wed May 18 16:56:06 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 14 09:57:17 2016 -0700

--
 examples/java/pom.xml   |  7 ++
 runners/google-cloud-dataflow-java/pom.xml  | 11 +++
 sdks/java/core/pom.xml  |  3 +
 .../beam/sdk/options/PipelineOptions.java   | 31 +++-
 .../sdk/options/PipelineOptionsFactoryTest.java | 79 +++-
 .../beam/sdk/options/PipelineOptionsTest.java   |  8 --
 .../options/PipelineOptionsValidatorTest.java   | 15 
 .../sdk/runners/DirectPipelineRunnerTest.java   |  1 +
 .../beam/sdk/testing/TestPipelineTest.java  |  5 +-
 sdks/java/extensions/join-library/pom.xml   |  7 ++
 sdks/java/io/google-cloud-platform/pom.xml  |  7 ++
 sdks/java/io/hdfs/pom.xml   |  7 ++
 sdks/java/io/kafka/pom.xml  |  7 ++
 sdks/java/java8tests/pom.xml|  7 ++
 .../beam/sdk/transforms/WithKeysJava8Test.java  |  3 +-
 .../main/resources/archetype-resources/pom.xml  |  6 ++
 16 files changed, 170 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a3ffd510/examples/java/pom.xml
--
diff --git a/examples/java/pom.xml b/examples/java/pom.xml
index 3d81338..5211b80 100644
--- a/examples/java/pom.xml
+++ b/examples/java/pom.xml
@@ -49,6 +49,7 @@
 maven-surefire-plugin
 
   
+
 
 
   
@@ -213,6 +214,12 @@
 
 
   org.apache.beam
+  beam-runners-direct-java
+  ${project.version}
+
+
+
+  org.apache.beam
   beam-runners-google-cloud-dataflow-java
   ${project.version}
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a3ffd510/runners/google-cloud-dataflow-java/pom.xml
--
diff --git a/runners/google-cloud-dataflow-java/pom.xml 
b/runners/google-cloud-dataflow-java/pom.xml
index a6dfae3..6d8e94b 100644
--- a/runners/google-cloud-dataflow-java/pom.xml
+++ b/runners/google-cloud-dataflow-java/pom.xml
@@ -84,6 +84,17 @@
 
   
 org.apache.maven.plugins
+maven-surefire-plugin
+
+  
+
+true
+  
+
+  
+
+  
+org.apache.maven.plugins
 maven-dependency-plugin
 
   

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a3ffd510/sdks/java/core/pom.xml
--
diff --git a/sdks/java/core/pom.xml b/sdks/java/core/pom.xml
index 372a913..c559cff 100644
--- a/sdks/java/core/pom.xml
+++ b/sdks/java/core/pom.xml
@@ -129,6 +129,9 @@
   
 org.apache.beam.sdk.testing.NeedsRunner
   
+  
+true
+  
 
   
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a3ffd510/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
index a2f38ed..b1b5280 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
@@ -21,11 +21,11 @@ import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.options.GoogleApiDebugOptions.GoogleApiTracer;
 import org.apache.beam.sdk.options.ProxyInvocationHandler.Deserializer;
 import org.apache.beam.sdk.options.ProxyInvocationHandler.Serializer;
-import org.apache.beam.sdk.runners.DirectPipelineRunner;
 import org.apache.beam.sdk.runners.PipelineRunner;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.DoFn.Context;
 import org.apache.beam.sdk.transforms.display.HasDisplayData;
+
 import com.google.auto.service.AutoService;
 
 import com.fasterxml.jackson.annotation.JsonIgnore;
@@ -225,7 +225,7 @@ public interface PipelineOptions extends HasDisplayData {
   @Description("The pipeline runner that will 

[5/8] incubator-beam git commit: Increase Visibility of Flink Test PipelineOptions

2016-06-14 Thread kenn
Increase Visibility of Flink Test PipelineOptions

This fixes an issue where the package-private options interface would cause
an exception.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/816a3bf1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/816a3bf1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/816a3bf1

Branch: refs/heads/master
Commit: 816a3bf19f21f224fdfed2fb5bddb436293f655c
Parents: 4d1e68a
Author: Thomas Groh 
Authored: Fri Jun 10 14:47:53 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 14 09:57:17 2016 -0700

--
 .../apache/beam/runners/flink/PipelineOptionsTest.java   | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/816a3bf1/runners/flink/runner/src/test/java/org/apache/beam/runners/flink/PipelineOptionsTest.java
--
diff --git 
a/runners/flink/runner/src/test/java/org/apache/beam/runners/flink/PipelineOptionsTest.java
 
b/runners/flink/runner/src/test/java/org/apache/beam/runners/flink/PipelineOptionsTest.java
index d571f31..61e219c 100644
--- 
a/runners/flink/runner/src/test/java/org/apache/beam/runners/flink/PipelineOptionsTest.java
+++ 
b/runners/flink/runner/src/test/java/org/apache/beam/runners/flink/PipelineOptionsTest.java
@@ -17,6 +17,10 @@
  */
 package org.apache.beam.runners.flink;
 
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
 import 
org.apache.beam.runners.flink.translation.utils.SerializedPipelineOptions;
 import 
org.apache.beam.runners.flink.translation.wrappers.streaming.FlinkAbstractParDoWrapper;
 import org.apache.beam.sdk.options.Default;
@@ -30,6 +34,7 @@ import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.util.WindowingInternals;
 import org.apache.beam.sdk.util.WindowingStrategy;
 import org.apache.beam.sdk.values.TupleTag;
+
 import org.apache.commons.lang.SerializationUtils;
 import org.apache.flink.util.Collector;
 import org.joda.time.Instant;
@@ -38,16 +43,12 @@ import org.junit.BeforeClass;
 import org.junit.Test;
 import org.mockito.Mockito;
 
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertNotNull;
-import static org.junit.Assert.assertTrue;
-
 /**
  * Tests the serialization and deserialization of PipelineOptions.
  */
 public class PipelineOptionsTest {
 
-  private interface MyOptions extends FlinkPipelineOptions {
+  public interface MyOptions extends FlinkPipelineOptions {
 @Description("Bla bla bla")
 @Default.String("Hello")
 String getTestOption();



[8/8] incubator-beam git commit: This closes #446

2016-06-14 Thread kenn
This closes #446


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/77494401
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/77494401
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/77494401

Branch: refs/heads/master
Commit: 774944014af046f55b995a22258cbe7195b7b6f8
Parents: c8ad2e7 a3ffd51
Author: Kenneth Knowles 
Authored: Tue Jun 14 09:57:31 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 14 09:57:31 2016 -0700

--
 examples/java/pom.xml   |  7 ++
 runners/direct-java/pom.xml |  6 ++
 .../direct/AvroIOShardedWriteFactoryTest.java   | 12 ++-
 .../direct/InProcessPipelineRunnerTest.java | 11 ++-
 .../direct/KeyedPValueTrackingVisitorTest.java  |  7 +-
 .../direct/TextIOShardedWriteFactoryTest.java   | 12 ++-
 .../beam/runners/flink/PipelineOptionsTest.java | 11 +--
 runners/google-cloud-dataflow-java/pom.xml  | 11 +++
 .../BlockingDataflowPipelineRunnerTest.java |  2 +
 .../dataflow/DataflowPipelineRunnerTest.java|  2 +
 .../runners/dataflow/io/DataflowTextIOTest.java | 81 +---
 .../translation/WindowedWordCountTest.java  | 19 +++--
 sdks/java/core/pom.xml  |  3 +
 .../beam/sdk/options/PipelineOptions.java   | 31 +++-
 .../org/apache/beam/sdk/io/BigQueryIOTest.java  |  4 +-
 .../java/org/apache/beam/sdk/io/TextIOTest.java | 75 ++
 .../java/org/apache/beam/sdk/io/WriteTest.java  |  9 +--
 .../sdk/options/PipelineOptionsFactoryTest.java | 79 ++-
 .../beam/sdk/options/PipelineOptionsTest.java   |  8 --
 .../options/PipelineOptionsValidatorTest.java   | 15 
 .../sdk/runners/DirectPipelineRunnerTest.java   |  1 +
 .../beam/sdk/testing/TestPipelineTest.java  |  5 +-
 sdks/java/extensions/join-library/pom.xml   |  7 ++
 sdks/java/io/google-cloud-platform/pom.xml  |  7 ++
 sdks/java/io/hdfs/pom.xml   |  7 ++
 sdks/java/io/kafka/pom.xml  |  7 ++
 sdks/java/java8tests/pom.xml|  7 ++
 .../beam/sdk/transforms/WithKeysJava8Test.java  |  3 +-
 .../main/resources/archetype-resources/pom.xml  |  6 ++
 29 files changed, 314 insertions(+), 141 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/77494401/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
--



[1/8] incubator-beam git commit: Move GcsUtil TextIO Tests to TextIOTest

2016-06-14 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master c8ad2e7dd -> 774944014


Move GcsUtil TextIO Tests to TextIOTest

These tests are not a test of the DataflowRunner, nor of any
DataflowRunner-specific behavior, so they should be part of TextIOTest.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/f2fb59c6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/f2fb59c6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/f2fb59c6

Branch: refs/heads/master
Commit: f2fb59c65119d5da56df5dd4e64fa1873c6ccbbb
Parents: f73bd73
Author: Thomas Groh 
Authored: Fri Jun 10 14:43:10 2016 -0700
Committer: Thomas Groh 
Committed: Fri Jun 10 14:49:33 2016 -0700

--
 .../runners/dataflow/io/DataflowTextIOTest.java | 80 +---
 .../java/org/apache/beam/sdk/io/TextIOTest.java | 75 ++
 2 files changed, 77 insertions(+), 78 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f2fb59c6/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
index 0d7c1cb..ae711f0 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
@@ -27,30 +27,16 @@ import static org.junit.Assert.assertThat;
 import org.apache.beam.runners.dataflow.DataflowPipelineRunner;
 import org.apache.beam.runners.dataflow.testing.TestDataflowPipelineOptions;
 import 
org.apache.beam.runners.dataflow.transforms.DataflowDisplayDataEvaluator;
-import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.io.TextIO;
-import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.display.DisplayData;
 import org.apache.beam.sdk.transforms.display.DisplayDataEvaluator;
-import org.apache.beam.sdk.util.GcsUtil;
 import org.apache.beam.sdk.util.TestCredential;
-import org.apache.beam.sdk.util.gcsfs.GcsPath;
-
-import com.google.common.collect.ImmutableList;
 
 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.junit.runners.JUnit4;
-import org.mockito.Mockito;
-import org.mockito.invocation.InvocationOnMock;
-import org.mockito.stubbing.Answer;
 
-import java.io.IOException;
-import java.nio.channels.FileChannel;
-import java.nio.channels.SeekableByteChannel;
-import java.nio.file.Files;
-import java.nio.file.StandardOpenOption;
-import java.util.List;
 import java.util.Set;
 
 /**
@@ -60,73 +46,11 @@ import java.util.Set;
 public class DataflowTextIOTest {
   private TestDataflowPipelineOptions buildTestPipelineOptions() {
 TestDataflowPipelineOptions options =
-PipelineOptionsFactory.as(TestDataflowPipelineOptions.class);
+
TestPipeline.testingPipelineOptions().as(TestDataflowPipelineOptions.class);
 options.setGcpCredential(new TestCredential());
 return options;
   }
 
-  private GcsUtil buildMockGcsUtil() throws IOException {
-GcsUtil mockGcsUtil = Mockito.mock(GcsUtil.class);
-
-// Any request to open gets a new bogus channel
-Mockito
-.when(mockGcsUtil.open(Mockito.any(GcsPath.class)))
-.then(new Answer() {
-  @Override
-  public SeekableByteChannel answer(InvocationOnMock invocation) 
throws Throwable {
-return FileChannel.open(
-Files.createTempFile("channel-", ".tmp"),
-StandardOpenOption.CREATE, StandardOpenOption.DELETE_ON_CLOSE);
-  }
-});
-
-// Any request for expansion returns a list containing the original GcsPath
-// This is required to pass validation that occurs in TextIO during apply()
-Mockito
-.when(mockGcsUtil.expand(Mockito.any(GcsPath.class)))
-.then(new Answer() {
-  @Override
-  public List answer(InvocationOnMock invocation) throws 
Throwable {
-return ImmutableList.of((GcsPath) invocation.getArguments()[0]);
-  }
-});
-
-return mockGcsUtil;
-  }
-
-  /**
-   * This tests a few corner cases that should not crash.
-   */
-  @Test
-  public void testGoodWildcards() throws Exception {
-TestDataflowPipelineOptions options = buildTestPipelineOptions();
-options.setGcsUtil(buildMockGcsUtil());
-
-Pipeline pipeline = Pipeline.create(options);
-
-

[GitHub] incubator-beam pull request #446: [BEAM-243][BEAM-22] Change the Default Pip...

2016-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/446


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[6/8] incubator-beam git commit: Set Runner in DataflowRunner Tests

2016-06-14 Thread kenn
Set Runner in DataflowRunner Tests

Otherwise the Default Runner is used, which may be unavailable.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/4d1e68af
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/4d1e68af
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/4d1e68af

Branch: refs/heads/master
Commit: 4d1e68af96d7ef44403add666333fab29f849c69
Parents: f2fb59c
Author: Thomas Groh 
Authored: Fri Jun 10 14:45:58 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 14 09:57:17 2016 -0700

--
 .../beam/runners/dataflow/BlockingDataflowPipelineRunnerTest.java  | 2 ++
 .../apache/beam/runners/dataflow/DataflowPipelineRunnerTest.java   | 2 ++
 .../org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java| 1 -
 3 files changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/4d1e68af/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BlockingDataflowPipelineRunnerTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BlockingDataflowPipelineRunnerTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BlockingDataflowPipelineRunnerTest.java
index bc570e1..55b4027 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BlockingDataflowPipelineRunnerTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/BlockingDataflowPipelineRunnerTest.java
@@ -196,6 +196,7 @@ public class BlockingDataflowPipelineRunnerTest {
 DataflowPipelineRunner mockRunner = mock(DataflowPipelineRunner.class);
 TestDataflowPipelineOptions options =
 PipelineOptionsFactory.as(TestDataflowPipelineOptions.class);
+options.setRunner(BlockingDataflowPipelineRunner.class);
 options.setProject(job.getProjectId());
 
 when(mockRunner.run(isA(Pipeline.class))).thenReturn(job);
@@ -296,6 +297,7 @@ public class BlockingDataflowPipelineRunnerTest {
 options.setTempLocation("gs://test/temp/location");
 options.setGcpCredential(new TestCredential());
 options.setPathValidatorClass(NoopPathValidator.class);
+options.setRunner(BlockingDataflowPipelineRunner.class);
 assertEquals("BlockingDataflowPipelineRunner#testjobname",
 BlockingDataflowPipelineRunner.fromOptions(options).toString());
   }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/4d1e68af/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineRunnerTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineRunnerTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineRunnerTest.java
index aa65dd1..f7068b0 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineRunnerTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineRunnerTest.java
@@ -452,6 +452,7 @@ public class DataflowPipelineRunnerTest {
 options.setProject(PROJECT_ID);
 options.setGcpCredential(new TestCredential());
 options.setGcsUtil(buildMockGcsUtil(true /* bucket exists */));
+options.setRunner(DataflowPipelineRunner.class);
 
 DataflowPipelineRunner.fromOptions(options);
 
@@ -866,6 +867,7 @@ public class DataflowPipelineRunnerTest {
 options.setTempLocation("gs://test/temp/location");
 options.setGcpCredential(new TestCredential());
 options.setPathValidatorClass(NoopPathValidator.class);
+options.setRunner(DataflowPipelineRunner.class);
 assertEquals(
 "DataflowPipelineRunner#testjobname",
 DataflowPipelineRunner.fromOptions(options).toString());

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/4d1e68af/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
index ae711f0..0340435 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/io/DataflowTextIOTest.java
+++ 

[3/8] incubator-beam git commit: Update Direct Module tests to explicitly set Pipeline

2016-06-14 Thread kenn
Update Direct Module tests to explicitly set Pipeline


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/a8a33b19
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/a8a33b19
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/a8a33b19

Branch: refs/heads/master
Commit: a8a33b19933326c28522dee530974c96d4aef0cb
Parents: de49d03
Author: Thomas Groh 
Authored: Fri Jun 10 14:38:36 2016 -0700
Committer: Thomas Groh 
Committed: Fri Jun 10 14:49:33 2016 -0700

--
 runners/direct-java/pom.xml |  6 ++
 .../runners/direct/AvroIOShardedWriteFactoryTest.java   | 12 ++--
 .../runners/direct/InProcessPipelineRunnerTest.java | 11 ++-
 .../runners/direct/KeyedPValueTrackingVisitorTest.java  |  7 ++-
 .../runners/direct/TextIOShardedWriteFactoryTest.java   | 12 ++--
 5 files changed, 38 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a8a33b19/runners/direct-java/pom.xml
--
diff --git a/runners/direct-java/pom.xml b/runners/direct-java/pom.xml
index def7207..b2cb607 100644
--- a/runners/direct-java/pom.xml
+++ b/runners/direct-java/pom.xml
@@ -78,6 +78,12 @@
   
 org.apache.maven.plugins
 maven-surefire-plugin
+
+  
+
+true
+  
+
 
   
 runnable-on-service-tests

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a8a33b19/runners/direct-java/src/test/java/org/apache/beam/runners/direct/AvroIOShardedWriteFactoryTest.java
--
diff --git 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/AvroIOShardedWriteFactoryTest.java
 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/AvroIOShardedWriteFactoryTest.java
index d290a4b..c0c1361 100644
--- 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/AvroIOShardedWriteFactoryTest.java
+++ 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/AvroIOShardedWriteFactoryTest.java
@@ -21,8 +21,10 @@ import static org.hamcrest.Matchers.not;
 import static org.hamcrest.Matchers.theInstance;
 import static org.junit.Assert.assertThat;
 
+import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.io.AvroIO;
 import org.apache.beam.sdk.io.AvroIOTest;
+import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.transforms.PTransform;
@@ -82,7 +84,7 @@ public class AvroIOShardedWriteFactoryTest {
 
 assertThat(overridden, not(Matchers.>equalTo(original)));
 
-TestPipeline p = TestPipeline.create();
+Pipeline p = getPipeline();
 String[] elems = new String[] {"foo", "bar", "baz"};
 p.apply(Create.of(elems)).apply(overridden);
 
@@ -101,7 +103,7 @@ public class AvroIOShardedWriteFactoryTest {
 
 assertThat(overridden, not(Matchers.>equalTo(original)));
 
-TestPipeline p = TestPipeline.create();
+Pipeline p = getPipeline();
 String[] elems = new String[] {"foo", "bar", "baz", "spam", "ham", "eggs"};
 p.apply(Create.of(elems)).apply(overridden);
 
@@ -109,4 +111,10 @@ public class AvroIOShardedWriteFactoryTest {
 p.run();
 AvroIOTest.assertTestOutputs(elems, 3, file.getAbsolutePath(), 
original.getShardNameTemplate());
   }
+
+  private Pipeline getPipeline() {
+PipelineOptions options = TestPipeline.testingPipelineOptions();
+options.setRunner(InProcessPipelineRunner.class);
+return TestPipeline.fromOptions(options);
+  }
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a8a33b19/runners/direct-java/src/test/java/org/apache/beam/runners/direct/InProcessPipelineRunnerTest.java
--
diff --git 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/InProcessPipelineRunnerTest.java
 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/InProcessPipelineRunnerTest.java
index 5c26ac3..ab26c15 100644
--- 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/InProcessPipelineRunnerTest.java
+++ 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/InProcessPipelineRunnerTest.java
@@ -18,6 +18,7 @@
 package org.apache.beam.runners.direct;
 
 import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
 import static org.junit.Assert.fail;
 
 import 

[4/8] incubator-beam git commit: Update Pipeline Execution Style in WindowedWordCountTest

2016-06-14 Thread kenn
Update Pipeline Execution Style in WindowedWordCountTest

This sets the runner at Pipeline creation time rather than sending a
(potentially rewritten) pipeline to a new runner instance.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/de49d032
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/de49d032
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/de49d032

Branch: refs/heads/master
Commit: de49d032730dd21691e6e4358fdcfef249aef46f
Parents: 8291219
Author: Thomas Groh 
Authored: Fri Jun 10 14:36:42 2016 -0700
Committer: Thomas Groh 
Committed: Fri Jun 10 14:49:33 2016 -0700

--
 .../spark/translation/WindowedWordCountTest.java | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/de49d032/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/WindowedWordCountTest.java
--
diff --git 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/WindowedWordCountTest.java
 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/WindowedWordCountTest.java
index c6911e1..54af5e3 100644
--- 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/WindowedWordCountTest.java
+++ 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/WindowedWordCountTest.java
@@ -23,6 +23,7 @@ import org.apache.beam.runners.spark.SimpleWordCountTest;
 import org.apache.beam.runners.spark.SparkPipelineRunner;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.transforms.Create;
@@ -55,7 +56,9 @@ public class WindowedWordCountTest {
 
   @Test
   public void testFixed() throws Exception {
-Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
+PipelineOptions opts = PipelineOptionsFactory.create();
+opts.setRunner(SparkPipelineRunner.class);
+Pipeline p = Pipeline.create(opts);
 PCollection inputWords =
 p.apply(Create.timestamped(WORDS, 
TIMESTAMPS)).setCoder(StringUtf8Coder.of());
 PCollection windowedWords =
@@ -65,7 +68,7 @@ public class WindowedWordCountTest {
 
 PAssert.that(output).containsInAnyOrder(EXPECTED_FIXED_SEPARATE_COUNT_SET);
 
-EvaluationResult res = SparkPipelineRunner.create().run(p);
+EvaluationResult res = (EvaluationResult) p.run();
 res.close();
   }
 
@@ -74,7 +77,9 @@ public class WindowedWordCountTest {
 
   @Test
   public void testFixed2() throws Exception {
-Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
+PipelineOptions opts = PipelineOptionsFactory.create();
+opts.setRunner(SparkPipelineRunner.class);
+Pipeline p = Pipeline.create(opts);
 PCollection inputWords = p.apply(Create.timestamped(WORDS, 
TIMESTAMPS)
 .withCoder(StringUtf8Coder.of()));
 PCollection windowedWords = inputWords
@@ -84,7 +89,7 @@ public class WindowedWordCountTest {
 
 PAssert.that(output).containsInAnyOrder(EXPECTED_FIXED_SAME_COUNT_SET);
 
-EvaluationResult res = SparkPipelineRunner.create().run(p);
+EvaluationResult res = (EvaluationResult) p.run();
 res.close();
   }
 
@@ -94,7 +99,9 @@ public class WindowedWordCountTest {
 
   @Test
   public void testSliding() throws Exception {
-Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
+PipelineOptions opts = PipelineOptionsFactory.create();
+opts.setRunner(SparkPipelineRunner.class);
+Pipeline p = Pipeline.create(opts);
 PCollection inputWords = p.apply(Create.timestamped(WORDS, 
TIMESTAMPS)
 .withCoder(StringUtf8Coder.of()));
 PCollection windowedWords = inputWords
@@ -105,7 +112,7 @@ public class WindowedWordCountTest {
 
 PAssert.that(output).containsInAnyOrder(EXPECTED_SLIDING_COUNT_SET);
 
-EvaluationResult res = SparkPipelineRunner.create().run(p);
+EvaluationResult res = (EvaluationResult) p.run();
 res.close();
   }
 



[2/8] incubator-beam git commit: Use TestPipeline#testingPipelineOptions in IO Tests

2016-06-14 Thread kenn
Use TestPipeline#testingPipelineOptions in IO Tests


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/f73bd73c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/f73bd73c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/f73bd73c

Branch: refs/heads/master
Commit: f73bd73caa5e8222946cfc20491fd2806edd1d2b
Parents: a8a33b1
Author: Thomas Groh 
Authored: Fri Jun 10 14:41:06 2016 -0700
Committer: Thomas Groh 
Committed: Fri Jun 10 14:49:33 2016 -0700

--
 .../test/java/org/apache/beam/sdk/io/BigQueryIOTest.java| 4 ++--
 .../src/test/java/org/apache/beam/sdk/io/WriteTest.java | 9 -
 2 files changed, 6 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f73bd73c/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BigQueryIOTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BigQueryIOTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BigQueryIOTest.java
index 679ae27..2a135ec 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BigQueryIOTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/BigQueryIOTest.java
@@ -384,7 +384,7 @@ public class BigQueryIOTest implements Serializable {
 
   @Before
   public void setUp() throws IOException {
-bqOptions = PipelineOptionsFactory.as(BigQueryOptions.class);
+bqOptions = 
TestPipeline.testingPipelineOptions().as(BigQueryOptions.class);
 bqOptions.setProject("defaultProject");
 
bqOptions.setTempLocation(testFolder.newFolder("BigQueryIOTest").getAbsolutePath());
 
@@ -755,7 +755,7 @@ public class BigQueryIOTest implements Serializable {
 options.setProject("someproject");
 options.setStreaming(streaming);
 
-Pipeline p = Pipeline.create(options);
+Pipeline p = TestPipeline.create(options);
 
 TableReference tableRef = new TableReference();
 tableRef.setDatasetId("somedataset");

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f73bd73c/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteTest.java
--
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteTest.java
index 45a4374..abda3a5 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/WriteTest.java
@@ -19,7 +19,6 @@ package org.apache.beam.sdk.io;
 
 import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem;
 import static org.apache.beam.sdk.transforms.display.DisplayDataMatchers.includesDisplayDataFrom;
-
 import static org.hamcrest.Matchers.anyOf;
 import static org.hamcrest.Matchers.containsInAnyOrder;
 import static org.hamcrest.Matchers.equalTo;
@@ -35,9 +34,9 @@ import org.apache.beam.sdk.io.Sink.WriteOperation;
 import org.apache.beam.sdk.io.Sink.Writer;
 import org.apache.beam.sdk.options.Description;
 import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.options.PipelineOptionsFactoryTest.TestPipelineOptions;
 import org.apache.beam.sdk.testing.NeedsRunner;
+import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.GroupByKey;
@@ -190,9 +189,9 @@ public class WriteTest {
  private static void runWrite(
      List<String> inputs, PTransform<PCollection<String>, PDone> transform) {
 // Flag to validate that the pipeline options are passed to the Sink
-String[] args = {"--testFlag=test_value"};
-PipelineOptions options = PipelineOptionsFactory.fromArgs(args).as(WriteOptions.class);
-Pipeline p = Pipeline.create(options);
+WriteOptions options = TestPipeline.testingPipelineOptions().as(WriteOptions.class);
+options.setTestFlag("test_value");
+Pipeline p = TestPipeline.create(options);
 
 // Clear the sink's contents.
 sinkContents.clear();
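
The WriteTest hunk above replaces ad-hoc PipelineOptionsFactory.fromArgs parsing with options derived from TestPipeline#testingPipelineOptions, so runner selection and other test-wide settings carry over. A small sketch of the same pattern, mirroring the calls shown in the diff (MyTestOptions is a hypothetical stand-in for WriteTest's WriteOptions):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.testing.TestPipeline;

    public class TestingOptionsSketch {
      /** Hypothetical options interface standing in for WriteTest.WriteOptions. */
      public interface MyTestOptions extends PipelineOptions {
        @Description("Flag checked by the sink under test")
        @Default.String("unset")
        String getTestFlag();
        void setTestFlag(String value);
      }

      public static void main(String[] args) {
        // Derive test options from the shared testing options, then set the flag directly.
        MyTestOptions options = TestPipeline.testingPipelineOptions().as(MyTestOptions.class);
        options.setTestFlag("test_value");
        Pipeline p = TestPipeline.create(options);
        // ... apply transforms and run p as the test body requires ...
      }
    }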



[jira] [Commented] (BEAM-327) Dataflow runner should have configuration for System.out/err handling

2016-06-14 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329852#comment-15329852 ]

ASF GitHub Bot commented on BEAM-327:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/459

[BEAM-327] Update DataflowPipelineRunner worker container version

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam worker-container

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/459.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #459






> Dataflow runner should have configuration for System.out/err handling
> -
>
> Key: BEAM-327
> URL: https://issues.apache.org/jira/browse/BEAM-327
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>
> We would like to support the following scenarios:
> # Respect global logging filter configuration for System.out/System.err log messages.
> # Suppress all log messages for a given source, including System.out/err (Level.OFF).
> # Set the log level for messages emitted from System.out/err.
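
None of this exists in the runner yet; purely to illustrate scenario 3, the following java.util.logging sketch routes System.out through a logger whose level can then be tuned or set to Level.OFF (all names are hypothetical and unrelated to whatever API the Dataflow runner ends up exposing):

    import java.io.OutputStream;
    import java.io.PrintStream;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class StdOutLevelSketch {
      /** Returns a PrintStream whose println calls become ordinary log records. */
      static PrintStream loggingStream(final Logger logger, final Level level) {
        return new PrintStream(new OutputStream() {
          @Override public void write(int b) { /* raw byte writes ignored in this sketch */ }
        }) {
          @Override public void println(String line) {
            logger.log(level, line);
          }
        };
      }

      public static void main(String[] args) {
        Logger stdout = Logger.getLogger("System.out");
        System.setOut(loggingStream(stdout, Level.INFO)); // scenario 3: emit stdout at a chosen level
        stdout.setLevel(Level.OFF);                       // scenario 2: suppress it entirely
        System.out.println("filtered like any other log record");
      }
    }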



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #458: Rename DoFnTester#processBatch to processB...

2016-06-14 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/458

Rename DoFnTester#processBatch to processBundle

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

DoFns process elements in bundles, not batches.
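
With the rename in place, a DoFnTester-based unit test reads roughly like the sketch below (FormatFn and its types are illustrative, not part of this PR):

    import java.util.List;

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFnTester;

    public class ProcessBundleSketch {
      /** Hypothetical DoFn used only to exercise the tester. */
      static class FormatFn extends DoFn<Integer, String> {
        @Override
        public void processElement(ProcessContext c) {
          c.output("n=" + c.element());
        }
      }

      public static void main(String[] args) throws Exception {
        DoFnTester<Integer, String> tester = DoFnTester.of(new FormatFn());
        // processBundle (formerly processBatch) pushes one bundle of elements through the DoFn.
        List<String> outputs = tester.processBundle(1, 2, 3);
        System.out.println(outputs); // [n=1, n=2, n=3]
      }
    }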

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam fn_tester_process_bundle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/458.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #458


commit fdc580612363878dde99c7b41104d6e97f5e631b
Author: Thomas Groh 
Date:   2016-06-14T16:27:55Z

Rename DoFnTester#processBatch to processBundle

DoFns process elements in bundles, not batches.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


svn commit: r13998 - /release/incubator/beam/0.1.0-incubating/

2016-06-14 Thread davor
Author: davor
Date: Tue Jun 14 16:28:12 2016
New Revision: 13998

Log:
Add a directory for 0.1.0-incubating version.

The real purpose is to test permissions.


Added:
release/incubator/beam/0.1.0-incubating/



svn commit: r13997 - /dev/incubator/beam/KEYS

2016-06-14 Thread davor
Author: davor
Date: Tue Jun 14 16:25:07 2016
New Revision: 13997

Log:
Add KEYS for Apache Beam, dev portion.


Added:
dev/incubator/beam/KEYS

Added: dev/incubator/beam/KEYS
==
--- dev/incubator/beam/KEYS (added)
+++ dev/incubator/beam/KEYS Tue Jun 14 16:25:07 2016
@@ -0,0 +1,141 @@
+This file contains the PGP keys of various developers.
+
+Users: pgp < KEYS
+   gpg --import KEYS
+Developers: 
+pgp -kxa <your name> and append it to this file.
+(pgpk -ll <your name> && pgpk -xa <your name>) >> this file.
+(gpg --list-sigs <your name>
+ && gpg --armor --export <your name>) >> this file.
+ 
+
+pub   4096R/C8282E76 2009-09-08
+uid  Jean-Baptiste Onofré 
+sig 3C8282E76 2009-09-08  Jean-Baptiste Onofré 
+sub   4096R/9F043BBC 2009-09-08
+sig  C8282E76 2009-09-08  Jean-Baptiste Onofré 
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Version: GnuPG v1
+
+mQINBEqmJkEBEADAAMOjOidXzoyK4FK9WhhRg2EEGX1gm5lK8PpJtk68Fqmz6xvv
+N8VJXMIJUgeD7M35zZSQUWJY43xEU8Yfn6oLL0KR0dIqVOclxE+7G8vxXFcIbRE9
+ziZFp7Z5yzsdzjiIzXv5MVQMczcAAMev/i0BnjiRy5Cg+k6kHXVpu/Gsn05JKPaG
+s7ZcfSxpboyS99MVKQvoFLE5Z/Shh4gFJn2rFInqK5EgVpoZbVyysF52nx0dti/e
+O0NjraQkrEDBWvsPt3cYZA0oP1gWiZiRvOLfAFIarf3poMDyoWBIwnbqb3Msv09j
+yDAmcGq9wsD3alHFHcRIiJl5SzFUStml1d5x/BvUl/Xc5VfHPi2ObKF3xOPGkyTf
+aZ6mYFLaRCAJ0v2MPW+4/grDXKsP8n8xPbE2VQvHBpxaZklD7q4Omn2d+m2sUOLX
+NRUo4n29NyfowAffBYl7ZqrYBBodR9YngWC9LpgM+APHyiw3HzauZ94bGy5Of3+L
+Yu6/riDcP4OXF6r6IH6KIsVqIkv5xzq7OGxxXmlhWg8ifNPLq5yNRccS0nWXc5BD
+/9q06ta/ceQGNkXL327XPuZC+lstWGAa4dKEosRDgcO0Pv2j2a3h8W8oHyxF+gEe
+O+9s0mGdQFxNiEA+JyeKCg+jvfx9Hv/2Syrlert76NEkfbaTFA7BJ4c3EQARAQAB
+tCtKZWFuLUJhcHRpc3RlIE9ub2Zyw6kgPGpib25vZnJlQGFwYWNoZS5vcmc+iQI2
+BBMBAgAgBQJKpiZBAhsDBgsJCAcDAgQVAggDBBYCAwECHgECF4AACgkQv/LuQsgo
+LnboyRAAguqFIpiKkCCR6TR0Y5UQDFhgEMhBreQKCEW0czbGoFnxfULV9H1kJRSB
+Vt0knecGaYS340WEmz4B7BMpkBCgaszgn66+fhacZTBd+Aff1k2lbhdMgdBvlPcm
+q9vFGtbE515j9bPHzsPRJ2wFWd6ot9wXiLD3RJLV6c7L3Egstu3qTp0tEoFHrQps
+qskGBl+mahhMyz3BUDlusavB0Y0tb6hhXCR79ErhjQrTgU947isztYWpgJlA40lx
+DW0hskZWbuGNXjxUJvTT3pKiYUN32WG+2CDNYHceuhsfRLxO/Wb4BKwwDaHWAlH9
+d5F9/vhdPObSv5GQbuUtmCEzeqADUd65jLLM7WSlvRJ+i4m0/TTeP8y4NfxlVbBP
+WuYrQW4gPmDKEDNvEec6PH6hhBfMLJz3M6o4huwLp2kQrq6wSTMDGIoxOLP0ae3c
+BMIuFM5EavLDJmuATUIWWyZt/c7mmAOOh5TGcFWTugnJ6l4FllOrFPiWyFsjMn+U
+zzzaeSkYmq/xZYxjRTdWjK5Zb5rbVuCx/q5VF9Awdy4EM6UXhaqWo06VyjWNOJ86
+wgres4+bVldB7+TiVi9iO6n80WNlPgIaQJlLc+FRsld4Er21kdXreX5doxFD5Iue
+S4y/pLwftHfx1xxj+p2jPJ49Hb0ddNr+XrsrO5txing2pNJgfH65Ag0ESqYmQQEQ
+AKPoXgIIKnyJiPvks7xBV+FqJPecVAx3SSlLyTfsh/jBat9QLd4hsfiZcv1ANZHB
+n4qDeGlsmJ6uDGv8wnUZQ2Im8Heje1h7dKeLNpNnxfBS9gn6e2bXKhAsJGUE7gip
+qVfijFnEY0Vj6Tztzq+Wyqg2Gbz+bJZMo1JVQiaAYyQeQlrOcoZcQHsA/Ol+y48h
+Le36A1TSIPMOSI4ZAZXkqxXAumEaMaz82EvV8KDH7Ijr23Y0wZjEUJ+dJQM9ssuE
+f9GMLIuCbmM/CJ5MCCwepGJd52ymllvgJTHC7B+BY/jKNMWHwAsMJ1oWcPlLzFQI
+Bmyy5RjKoMifzaoSo/hTWkiwcL2Vc+qU3b3/2eUtnCnBB/nkrZkJNNc+OV5YGBSP
+vNPaN43Gvjbvborv4PBvt7QhVjZYQemtXO2sWx1XWSFsucD2K4kJ8ipNWxVgIqDu
+J8SJOnGigX9hMpsZ2HVAwOeKP/jI90J3voKrCPLaKcL1Ip+b28k0aj7kl44YJqw4
+5pbRSx/v73bH4uleQiXSW+JczA+KLw7hX3tOWJEnLS2+Ig9sNUKYGZOg0nw613bN
+fZy8Cbx/UkT10Lznx9FW6MedGyJPYT4MJMMh/PnnsWv50jFnfu2rtnRXEOUXwujL
+fwrmCYbXHgE3Ka+fmRz8HxsyTmtqIHtPixw8RoqfoFfxABEBAAGJAh8EGAECAAkF
+AkqmJkECGwwACgkQv/LuQsgoLnb8AQ/+POsLFdqNqSKfwBXp1YOIEjNdbVjysQc6
+zC6LlMJXNSxAmUmol2g9bJYh9LdpvOTU3gfFgIanaGytC75U7/NOl0zEsN4IU18j
+CLBNaD5/Or1ciQ3CVrID/lPO8s0Hm0/cUPreEjJPPrrPbXG+i9bweg3Dtfy3+WQl
+PhfpvgudwtUjB3st2gztYipkUhmrH+STbbJZVJN5ZNL8mOoM5M2wGS+9VweOWbKe
+z0QeZ9hIPyQNMzTn1xlvRUVNTu8fz2FGvumrd+zgzYcpTE5VpFkOxxUayr3aWXSf
+Cak+HH0WjUDWc9/lJR4dVpwdjLonJfiC70W07J4CnNodYwnPUaGKTVYq3pvQzAPw
+hjx4u6t5zTZy5CbCAEhZC/9GeQmtuM0rcQhz048don4s4baDrqUPKL+X3C3ev4/o
+00yLrQ5rLX8K4iE/Go5xUyhzT7gqFJUPWdo8neTXXwQGThqqhVQovnn3M6i55rCg
+EeOTd7uW+k3vt6kunWZFKPjzRBxMD4NYovIQXwhPxj0vq6DnE0RQa7Dfm6l3cAV7
+/l3kRQcT69AWXotUJQnpY4bemTuYlxAYWCkTGNLdNNiBhiaqlR7xgYMNXS4XqcgA
+6QtP8ulb2FPR0MWEtvGkbHgAAIayV+Jt1Ed2JkIsdJHGeSZO5WEiupySDQCGn6rZ
+DR2E2zua3tQ=
+=LCyH
+-END PGP PUBLIC KEY BLOCK-
+pub   2048R/8F0D334F 2014-10-31
+uid  Davor Bonaci 
+sig 38F0D334F 2014-10-31  Davor Bonaci 
+sub   2048R/D1B59DCE 2014-10-31
+sig  8F0D334F 2014-10-31  Davor Bonaci 
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Version: GnuPG v1
+
+mQENBFRUAG4BCADDIXl7xSyNCm/4PiaRWLQXB/fjiAYEDZvrodiA7sYh7ZBeuUzV
+R72RXdzqsL43ZKS0OVfgKIN/qSriVHzz6HfdBJNLbVdzHlIA9/zCTZ/rgPT72P+i
+flU/iKPVuYLd+xOb/iPG1Ha0zMobw3tWWD3ifXIe+N8DEnxo1ywiP7+Uf2MDmK70
+Is0v55/pF/fTridIK/a3Hmt5OtyA/quw0j7k2ga9bi/m+R9BO7W198EpSAn6rZRI
+Ipo+kjdEdF1QKR39dbGjLjzfAsT24Loq2H9WZnSjwaPaAIJWZR82AcFda04qEiJ/
+yfQLn+IXihNErX7IFUd+PcQQ9RjyABwL9LWXABEBAAG0H0Rhdm9yIEJvbmFjaSA8
+ZGF2b3JAZ29vZ2xlLmNvbT6JATgEEwECACIFAlRUAG4CGwMGCwkIBwMCBhUIAgkK
+CwQWAgMBAh4BAheAAAoJEMkEN+GPDTNPpVUIALNRhjG0gwdE3OuHTWo/YUNUJQIa

[GitHub] incubator-beam pull request #457: Revert GBK-based PAssert

2016-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/457


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Revert GBK-based PAssert

2016-06-14 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master f9a9214dd -> c8ad2e7dd


Revert GBK-based PAssert

This change neglected the use of counters by the Dataflow runner,
which are used to prevent tests from spuriously passing when
a PCollection is empty.

Obvious fixes for that revealed probable bugs in the in-process
and Spark runner, as well as tests that happen to work with
PAssert but are actually unsupported.

A proper long-term fix is underway to address all of the above.
In the meantime, this commit rolls back the changes to PAssert.
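
For context, the counter mechanism referred to above works roughly as sketched here: the assertion DoFns bump success/failure aggregators per element, and the test runner separately counts the PAssert transforms it saw at construction time, so an empty PCollection (zero successes) cannot pass silently. Names and details are illustrative; PAssert's real internals differ.

    import org.apache.beam.sdk.transforms.Aggregator;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.Sum;

    /** Illustrative assertion DoFn: one success/failure tick per element checked. */
    class CountingAssertFn<T> extends DoFn<T, Void> {
      private final Aggregator<Integer, Integer> successes =
          createAggregator("PAssertSuccess", new Sum.SumIntegerFn());
      private final Aggregator<Integer, Integer> failures =
          createAggregator("PAssertFailure", new Sum.SumIntegerFn());

      @Override
      public void processElement(ProcessContext c) {
        try {
          // ... run the actual assertion against c.element() here ...
          successes.addValue(1);
        } catch (AssertionError e) {
          failures.addValue(1);
          throw e;
        }
      }
    }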


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/045b568f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/045b568f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/045b568f

Branch: refs/heads/master
Commit: 045b568f6be4b7b010d4fd4cfdd1536db943ce54
Parents: f9a9214
Author: Kenneth Knowles 
Authored: Tue Jun 14 08:05:04 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 14 08:05:41 2016 -0700

--
 .../testing/TestDataflowPipelineRunner.java |   3 +-
 .../org/apache/beam/sdk/testing/PAssert.java| 779 +--
 .../apache/beam/sdk/testing/PAssertTest.java|  27 +
 3 files changed, 396 insertions(+), 413 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/045b568f/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowPipelineRunner.java
--
diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowPipelineRunner.java b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowPipelineRunner.java
index c940e9a..3e8d903 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowPipelineRunner.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/testing/TestDataflowPipelineRunner.java
@@ -166,8 +166,7 @@ public class TestDataflowPipelineRunner extends PipelineRunner<DataflowPipelineJob> {
   public <OutputT extends POutput, InputT extends PInput> OutputT apply(
       PTransform<InputT, OutputT> transform, InputT input) {
 if (transform instanceof PAssert.OneSideInputAssert
-|| transform instanceof PAssert.GroupThenAssert
-|| transform instanceof PAssert.GroupThenAssertForSingleton) {
+|| transform instanceof PAssert.TwoSideInputAssert) {
   expectedNumberOfAssertions += 1;
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/045b568f/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
index 62d3599..c2cd598 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java
@@ -34,14 +34,11 @@ import org.apache.beam.sdk.runners.PipelineRunner;
 import org.apache.beam.sdk.transforms.Aggregator;
 import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.transforms.DoFn;
-import org.apache.beam.sdk.transforms.GroupByKey;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.ParDo;
 import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.Sum;
-import org.apache.beam.sdk.transforms.Values;
 import org.apache.beam.sdk.transforms.View;
-import org.apache.beam.sdk.transforms.WithKeys;
 import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
 import org.apache.beam.sdk.transforms.windowing.Window;
 import org.apache.beam.sdk.util.CoderUtils;
@@ -51,27 +48,32 @@ import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionView;
 import org.apache.beam.sdk.values.PDone;
 
+import com.google.common.base.Optional;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import java.io.IOException;
 import java.io.Serializable;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.List;
 import java.util.Map;
 import java.util.NoSuchElementException;
 
 /**
- * An assertion on the contents of a {@link PCollection} incorporated into the pipeline. Such an
- * assertion can be checked no matter what kind of {@link PipelineRunner} is used.
+ * An assertion on the contents of a {@link PCollection}
+ * incorporated into the pipeline.  Such an assertion
+ * can 

[2/2] incubator-beam git commit: This closes #457

2016-06-14 Thread kenn
This closes #457


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/c8ad2e7d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/c8ad2e7d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/c8ad2e7d

Branch: refs/heads/master
Commit: c8ad2e7dd5443ba40d126dca4cd3cb29b33103cf
Parents: f9a9214 045b568
Author: Kenneth Knowles 
Authored: Tue Jun 14 08:52:03 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 14 08:52:03 2016 -0700

--
 .../testing/TestDataflowPipelineRunner.java |   3 +-
 .../org/apache/beam/sdk/testing/PAssert.java| 779 +--
 .../apache/beam/sdk/testing/PAssertTest.java|  27 +
 3 files changed, 396 insertions(+), 413 deletions(-)
--




[GitHub] incubator-beam pull request #457: Revert GBK-based PAssert

2016-06-14 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/457

Revert GBK-based PAssert

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

This change neglected the use of counters by the Dataflow runner, which are
used to prevent tests from spuriously passing when a `PCollection` is empty
(and which also provide diagnostic printouts).

Obvious fixes for that revealed probable bugs in the in-process and Spark 
runner, as well as tests that happen to work with `PAssert` but are actually 
unsupported.

A proper long-term fix is underway to address all of the above. In the 
meantime, this commit rolls back the changes to `PAssert`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam PAssert-rollback

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit 045b568f6be4b7b010d4fd4cfdd1536db943ce54
Author: Kenneth Knowles 
Date:   2016-06-14T15:05:04Z

Revert GBK-based PAssert

This change neglected the use of counters by the Dataflow runner,
which are used to prevent tests from spuriously passing when
a PCollection is empty.

Obvious fixes for that revealed probable bugs in the in-process
and Spark runner, as well as tests that happen to work with
PAssert but are actually unsupported.

A proper long-term fix is underway to address all of the above.
In the meantime, this commit rolls back the changes to PAssert.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #539

2016-06-14 Thread Apache Jenkins Server
See