[jira] [Commented] (BEAM-1397) Introduce IO metrics

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879859#comment-15879859
 ] 

ASF GitHub Bot commented on BEAM-1397:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2082

[BEAM-1397] [BEAM-1398] Introduce IO metrics. Add KafkaIO metrics.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam io-metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2082


commit 31d94d4eecf492357c41424f65e1037fc3976d09
Author: Aviem Zur 
Date:   2017-02-22T14:18:13Z

[BEAM-1397] Introduce IO metrics

commit 8192724e65e65092117fee0a78408b476adf0245
Author: Aviem Zur 
Date:   2017-02-22T21:26:45Z

[BEAM-1398] KafkaIO metrics

commit 62d0ac450ff4631ddfd057a5caa785dae305065b
Author: Aviem Zur 
Date:   2017-02-23T04:56:43Z

Test Spark runner streaming IO metrics




> Introduce IO metrics
> 
>
> Key: BEAM-1397
> URL: https://issues.apache.org/jira/browse/BEAM-1397
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Introduce usage of the metrics API in IOs.
> POC using {{CountingInput}}:
> * Add metrics to {{CountingInput}}
> * A {{RunnableOnService}} test that creates a pipeline which asserts these
> metrics.
> * Close any gaps in the Direct runner and Spark runner to support these
> metrics.
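
A minimal sketch of wiring the metrics API into a source in the Python SDK
(the source class and counter name below are illustrative assumptions, not
the actual counters added by the PR):

from apache_beam.metrics.metric import Metrics

class CountingSourceSketch(object):
    """Hypothetical source that reports how many records it has emitted."""
    def __init__(self, count):
        self._count = count
        # Namespaced counter; the runner aggregates it across workers.
        self._records_read = Metrics.counter(self.__class__, 'records_read')

    def read(self):
        for i in range(self._count):
            self._records_read.inc()  # one increment per emitted element
            yield i

A RunnableOnService-style test would then run a pipeline over such a source
and assert on the queried value of the counter.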



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)



[jira] [Commented] (BEAM-1539) Support unknown length iterables for IterableCoder in Python SDK

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879855#comment-15879855
 ] 

ASF GitHub Bot commented on BEAM-1539:
--

GitHub user vikkyrk opened a pull request:

https://github.com/apache/beam/pull/2081

[BEAM-1539]: Support unknown length iterables for IterableCoder in Python SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vikkyrk/incubator-beam it_cdr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2081


commit dcc4b6bd7970741b03766fb162f367d0c6343551
Author: Vikas Kedigehalli 
Date:   2017-02-23T03:36:33Z

Support unknown length iterables for IterableCoder in Python SDK




> Support unknown length iterables for IterableCoder in Python SDK
> 
>
> Key: BEAM-1539
> URL: https://issues.apache.org/jira/browse/BEAM-1539
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>
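
When the iterable's length is not known up front, the coder can write a -1
length marker and then emit elements in count-prefixed chunks, terminated by
a zero-count chunk. A rough sketch of that framing (the SDK's actual varint
encoding and chunk sizes may differ):

def encode_unknown_length(values, encode_elem, write_varint, out, chunk=100):
    write_varint(out, -1)        # -1 signals "length unknown, chunks follow"
    buf = []
    for v in values:
        buf.append(v)
        if len(buf) == chunk:    # flush a full chunk: count, then elements
            write_varint(out, len(buf))
            for e in buf:
                encode_elem(e, out)
            buf = []
    if buf:                      # final partial chunk
        write_varint(out, len(buf))
        for e in buf:
            encode_elem(e, out)
    write_varint(out, 0)         # zero-count chunk marks the end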







Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Dataflow #2380

2017-02-22 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-1538) Add a fast version of BufferedElementCountingOutputStream

2017-02-22 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1538:
---

 Summary: Add a fast version of BufferedElementCountingOutputStream
 Key: BEAM-1538
 URL: https://issues.apache.org/jira/browse/BEAM-1538
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay
Priority: Minor


We are currently using the pure Python implementation of this stream, which 
is slow. We need to implement a Cython version.
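
For context, a simplified pure-Python shape of such a stream (method names
are illustrative; a Cython version would keep the same interface while
avoiding per-element interpreter overhead):

class BufferedCountingStreamSketch(object):
    """Buffers element bytes and flushes them as count-prefixed blocks."""
    def __init__(self, out, write_varint):
        self._out = out
        self._write_varint = write_varint
        self._buffer = bytearray()
        self._count = 0

    def mark_element_start(self):
        self._count += 1              # called once before each element

    def write(self, data):
        self._buffer.extend(data)     # cheap append, no per-element I/O

    def flush(self):
        if self._count:
            self._write_varint(self._out, self._count)
            self._out.write(bytes(self._buffer))
            del self._buffer[:]
            self._count = 0

    def finish(self):
        self.flush()
        self._write_varint(self._out, 0)  # terminating zero-count block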





[2/2] beam git commit: This closes #2080

2017-02-22 Thread altay
This closes #2080


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/00b39588
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/00b39588
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/00b39588

Branch: refs/heads/master
Commit: 00b39588110c1b5fb7e4a34586957cc05fbcd03c
Parents: 8ec8909 821669d
Author: Ahmet Altay 
Authored: Wed Feb 22 19:37:20 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 19:37:20 2017 -0800

--
 .../runners/dataflow/internal/apiclient.py  |   6 +-
 .../runners/dataflow/internal/dependency.py | 508 +++
 .../dataflow/internal/dependency_test.py| 425 
 sdks/python/apache_beam/utils/dependency.py | 504 --
 .../python/apache_beam/utils/dependency_test.py | 425 
 sdks/python/apache_beam/utils/profiler.py   |  12 +-
 6 files changed, 942 insertions(+), 938 deletions(-)
--




[GitHub] beam pull request #2080: [BEAM-1218] Move dependency file to dataflow runner...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2080


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Move dependency file to dataflow runner directory

2017-02-22 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 8ec890909 -> 00b395881


Move dependency file to dataflow runner directory


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/821669da
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/821669da
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/821669da

Branch: refs/heads/master
Commit: 821669da21d2d785977428e1ecb41c641dda9852
Parents: 8ec8909
Author: Ahmet Altay 
Authored: Wed Feb 22 19:08:16 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 19:36:04 2017 -0800

--
 .../runners/dataflow/internal/apiclient.py  |   6 +-
 .../runners/dataflow/internal/dependency.py | 508 +++
 .../dataflow/internal/dependency_test.py| 425 
 sdks/python/apache_beam/utils/dependency.py | 504 --
 .../python/apache_beam/utils/dependency_test.py | 425 
 sdks/python/apache_beam/utils/profiler.py   |  12 +-
 6 files changed, 942 insertions(+), 938 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/821669da/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
index 0dab676..481ab70 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
@@ -34,13 +34,13 @@ from apache_beam import utils
 from apache_beam.internal.auth import get_service_credentials
 from apache_beam.internal.gcp.json_value import to_json_value
 from apache_beam.io.gcp.internal.clients import storage
+from apache_beam.runners.dataflow.internal import dependency
 from apache_beam.runners.dataflow.internal.clients import dataflow
+from apache_beam.runners.dataflow.internal.dependency import 
get_required_container_version
+from apache_beam.runners.dataflow.internal.dependency import 
get_sdk_name_and_version
 from apache_beam.transforms import cy_combiners
 from apache_beam.transforms.display import DisplayData
-from apache_beam.utils import dependency
 from apache_beam.utils import retry
-from apache_beam.utils.dependency import get_required_container_version
-from apache_beam.utils.dependency import get_sdk_name_and_version
 from apache_beam.utils.names import PropertyNames
 from apache_beam.utils.pipeline_options import DebugOptions
 from apache_beam.utils.pipeline_options import GoogleCloudOptions

http://git-wip-us.apache.org/repos/asf/beam/blob/821669da/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py 
b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
new file mode 100644
index 000..6024332
--- /dev/null
+++ b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
@@ -0,0 +1,508 @@
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Support for installing custom code and required dependencies.
+
+Workflows, with the exception of very simple ones, are organized in multiple
+modules and packages. Typically, these modules and packages have
+dependencies on other standard libraries. Dataflow relies on the Python
+setuptools package to handle these scenarios. For further details please read:
+https://pythonhosted.org/an_example_pypi_project/setuptools.html
+
+When a runner tries to run a pipeline it will check for a --requirements_file
+and a --setup_file option.
+
+If --setup_file is present then it is assumed that the folder containing the
+file specified by the option has the typical layout required by setuptools and
+it will run 'python setup.py sdist' to produce a source distribution. The
+resulting tarball (a .tar or .tar.gz file) will be staged at the GCS staging
+location specified as job option. When a worker starts 
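
As a hedged illustration of the staging flow this docstring describes (the
helper names below are invented for the sketch, not the dependency.py API):

import os
import subprocess

def stage_setup_file(setup_file, temp_dir, stage_to_gcs):
    # Run 'python setup.py sdist' in the folder containing setup.py.
    src_dir = os.path.dirname(os.path.abspath(setup_file))
    subprocess.check_call(
        ['python', 'setup.py', 'sdist', '--dist-dir', temp_dir], cwd=src_dir)
    # Stage the resulting tarball at the GCS staging location job option.
    tarballs = [f for f in os.listdir(temp_dir)
                if f.endswith('.tar') or f.endswith('.tar.gz')]
    return stage_to_gcs(os.path.join(temp_dir, tarballs[0]))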

[2/3] beam-site git commit: Regenerate website

2017-02-22 Thread altay
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7612eb22
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7612eb22
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7612eb22

Branch: refs/heads/asf-site
Commit: 7612eb22994999ac7367d530af313b7a92e63d83
Parents: f5d8735
Author: Ahmet Altay 
Authored: Wed Feb 22 19:37:58 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 19:37:58 2017 -0800

--
 content/documentation/programming-guide/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/7612eb22/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html 
b/content/documentation/programming-guide/index.html
index 89f18e5..75a04e5 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -1332,8 +1332,8 @@ tree, [2]
   
   
   
-<a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py">Google BigQuery</a>
-<a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/google_cloud_platform/datastore">Google Cloud Datastore</a>
+<a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py">Google BigQuery</a>
+<a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore">Google Cloud Datastore</a>
   
 
 



[GitHub] beam-site pull request #161: Fix broken links due to directory renames in Be...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/161




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2379

2017-02-22 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1218) De-Googlify Python SDK

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879650#comment-15879650
 ] 

ASF GitHub Bot commented on BEAM-1218:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2079


> De-Googlify Python SDK
> --
>
> Key: BEAM-1218
> URL: https://issues.apache.org/jira/browse/BEAM-1218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Mark Liu
>Assignee: Ahmet Altay
>






[20/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/gcp/bigquery_test.py
--
diff --git a/sdks/python/apache_beam/io/gcp/bigquery_test.py 
b/sdks/python/apache_beam/io/gcp/bigquery_test.py
new file mode 100644
index 000..fbf073c
--- /dev/null
+++ b/sdks/python/apache_beam/io/gcp/bigquery_test.py
@@ -0,0 +1,828 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for BigQuery sources and sinks."""
+
+import datetime
+import json
+import logging
+import time
+import unittest
+
+import hamcrest as hc
+import mock
+
+import apache_beam as beam
+from apache_beam.io.gcp.bigquery import RowAsDictJsonCoder
+from apache_beam.io.gcp.bigquery import TableRowJsonCoder
+from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
+from apache_beam.io.gcp.internal.clients import bigquery
+from apache_beam.internal.gcp.json_value import to_json_value
+from apache_beam.transforms.display import DisplayData
+from apache_beam.transforms.display_test import DisplayDataItemMatcher
+from apache_beam.utils.pipeline_options import PipelineOptions
+
+# Protect against environments where bigquery library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from apitools.base.py.exceptions import HttpError
+except ImportError:
+  HttpError = None
+# pylint: enable=wrong-import-order, wrong-import-position
+
+
+@unittest.skipIf(HttpError is None, 'GCP dependencies are not installed')
+class TestRowAsDictJsonCoder(unittest.TestCase):
+
+  def test_row_as_dict(self):
+coder = RowAsDictJsonCoder()
+test_value = {'s': 'abc', 'i': 123, 'f': 123.456, 'b': True}
+self.assertEqual(test_value, coder.decode(coder.encode(test_value)))
+
+  def json_compliance_exception(self, value):
+with self.assertRaises(ValueError) as exn:
+  coder = RowAsDictJsonCoder()
+  test_value = {'s': value}
+  self.assertEqual(test_value, coder.decode(coder.encode(test_value)))
+  self.assertTrue(bigquery.JSON_COMPLIANCE_ERROR in exn.exception.message)
+
+  def test_invalid_json_nan(self):
+self.json_compliance_exception(float('nan'))
+
+  def test_invalid_json_inf(self):
+self.json_compliance_exception(float('inf'))
+
+  def test_invalid_json_neg_inf(self):
+self.json_compliance_exception(float('-inf'))
+
+
+@unittest.skipIf(HttpError is None, 'GCP dependencies are not installed')
+class TestTableRowJsonCoder(unittest.TestCase):
+
+  def test_row_as_table_row(self):
+schema_definition = [
+('s', 'STRING'),
+('i', 'INTEGER'),
+('f', 'FLOAT'),
+('b', 'BOOLEAN'),
+('r', 'RECORD')]
+data_definition = [
+'abc',
+123,
+123.456,
+True,
+{'a': 'b'}]
+str_def = '{"s": "abc", "i": 123, "f": 123.456, "b": true, "r": {"a": "b"}}'
+schema = bigquery.TableSchema(
+fields=[bigquery.TableFieldSchema(name=k, type=v)
+for k, v in schema_definition])
+coder = TableRowJsonCoder(table_schema=schema)
+test_row = bigquery.TableRow(
+f=[bigquery.TableCell(v=to_json_value(e)) for e in data_definition])
+
+self.assertEqual(str_def, coder.encode(test_row))
+self.assertEqual(test_row, coder.decode(coder.encode(test_row)))
+# A coder without schema can still decode.
+self.assertEqual(
+test_row, TableRowJsonCoder().decode(coder.encode(test_row)))
+
+  def test_row_and_no_schema(self):
+coder = TableRowJsonCoder()
+test_row = bigquery.TableRow(
+f=[bigquery.TableCell(v=to_json_value(e))
+   for e in ['abc', 123, 123.456, True]])
+with self.assertRaises(AttributeError) as ctx:
+  coder.encode(test_row)
+self.assertTrue(
+ctx.exception.message.startswith('The TableRowJsonCoder requires'))
+
+  def json_compliance_exception(self, value):
+with self.assertRaises(ValueError) as exn:
+  schema_definition = [('f', 'FLOAT')]
+  schema = bigquery.TableSchema(
+  fields=[bigquery.TableFieldSchema(name=k, type=v)
+  for k, v in schema_definition])
+  coder = TableRowJsonCoder(table_schema=schema)
+  test_row = 

[22/22] beam git commit: This closes #2079

2017-02-22 Thread altay
This closes #2079


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8ec89090
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8ec89090
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8ec89090

Branch: refs/heads/master
Commit: 8ec890909a49648bffd082603901e4d718ae331f
Parents: cad84c8 59ad58a
Author: Ahmet Altay 
Authored: Wed Feb 22 16:30:25 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 16:30:25 2017 -0800

--
 .../examples/cookbook/bigquery_schema.py|2 +-
 .../examples/cookbook/datastore_wordcount.py|4 +-
 .../apache_beam/examples/snippets/snippets.py   |4 +-
 .../python/apache_beam/internal/gcp/__init__.py |   16 +
 .../apache_beam/internal/gcp/json_value.py  |  147 +
 .../apache_beam/internal/gcp/json_value_test.py |   93 +
 .../internal/google_cloud_platform/__init__.py  |   16 -
 .../google_cloud_platform/json_value.py |  147 -
 .../google_cloud_platform/json_value_test.py|   93 -
 sdks/python/apache_beam/io/__init__.py  |4 +-
 sdks/python/apache_beam/io/fileio.py|2 +-
 sdks/python/apache_beam/io/gcp/__init__.py  |   16 +
 sdks/python/apache_beam/io/gcp/bigquery.py  | 1081 +
 sdks/python/apache_beam/io/gcp/bigquery_test.py |  828 
 .../apache_beam/io/gcp/datastore/__init__.py|   16 +
 .../apache_beam/io/gcp/datastore/v1/__init__.py |   16 +
 .../io/gcp/datastore/v1/datastoreio.py  |  397 ++
 .../io/gcp/datastore/v1/datastoreio_test.py |  245 +
 .../io/gcp/datastore/v1/fake_datastore.py   |   98 +
 .../apache_beam/io/gcp/datastore/v1/helper.py   |  274 ++
 .../io/gcp/datastore/v1/helper_test.py  |  265 ++
 .../io/gcp/datastore/v1/query_splitter.py   |  275 ++
 .../io/gcp/datastore/v1/query_splitter_test.py  |  208 +
 sdks/python/apache_beam/io/gcp/gcsio.py |  871 
 sdks/python/apache_beam/io/gcp/gcsio_test.py|  796 
 .../apache_beam/io/gcp/internal/__init__.py |   16 +
 .../io/gcp/internal/clients/__init__.py |   16 +
 .../gcp/internal/clients/bigquery/__init__.py   |   33 +
 .../clients/bigquery/bigquery_v2_client.py  |  660 +++
 .../clients/bigquery/bigquery_v2_messages.py| 1910 
 .../io/gcp/internal/clients/storage/__init__.py |   33 +
 .../clients/storage/storage_v1_client.py| 1039 +
 .../clients/storage/storage_v1_messages.py  | 1920 
 sdks/python/apache_beam/io/gcp/pubsub.py|   91 +
 sdks/python/apache_beam/io/gcp/pubsub_test.py   |   63 +
 .../io/google_cloud_platform/__init__.py|   16 -
 .../io/google_cloud_platform/bigquery.py| 1081 -
 .../io/google_cloud_platform/bigquery_test.py   |  828 
 .../google_cloud_platform/datastore/__init__.py |   16 -
 .../datastore/v1/__init__.py|   16 -
 .../datastore/v1/datastoreio.py |  397 --
 .../datastore/v1/datastoreio_test.py|  245 -
 .../datastore/v1/fake_datastore.py  |   98 -
 .../datastore/v1/helper.py  |  274 --
 .../datastore/v1/helper_test.py |  265 --
 .../datastore/v1/query_splitter.py  |  275 --
 .../datastore/v1/query_splitter_test.py |  208 -
 .../io/google_cloud_platform/gcsio.py   |  871 
 .../io/google_cloud_platform/gcsio_test.py  |  796 
 .../google_cloud_platform/internal/__init__.py  |   16 -
 .../internal/clients/__init__.py|   16 -
 .../internal/clients/bigquery/__init__.py   |   33 -
 .../clients/bigquery/bigquery_v2_client.py  |  660 ---
 .../clients/bigquery/bigquery_v2_messages.py| 1910 
 .../internal/clients/storage/__init__.py|   33 -
 .../clients/storage/storage_v1_client.py| 1039 -
 .../clients/storage/storage_v1_messages.py  | 1920 
 .../io/google_cloud_platform/pubsub.py  |   91 -
 .../io/google_cloud_platform/pubsub_test.py |   63 -
 sdks/python/apache_beam/io/iobase.py|4 +-
 sdks/python/apache_beam/pipeline_test.py|2 +-
 sdks/python/apache_beam/runners/__init__.py |4 +-
 .../apache_beam/runners/dataflow/__init__.py|   16 +
 .../runners/dataflow/dataflow_metrics.py|   33 +
 .../runners/dataflow/dataflow_metrics_test.py   |   20 +
 .../runners/dataflow/dataflow_runner.py |  724 +++
 .../runners/dataflow/dataflow_runner_test.py|   78 +
 .../runners/dataflow/internal/__init__.py   |   16 +
 .../runners/dataflow/internal/apiclient.py  |  726 +++
 .../runners/dataflow/internal/apiclient_test.py |   96 +
 .../dataflow/internal/clients/__init__.py   |   16 +
 .../internal/clients/dataflow/__init__.py   |   33 +
 .../clients/dataflow/dataflow_v1b3_client.py|  684 +++
 .../clients/dataflow/dataflow_v1b3_messages.py  | 4173 

[18/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py
--
diff --git 
a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py
 
b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py
new file mode 100644
index 000..201a183
--- /dev/null
+++ 
b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py
@@ -0,0 +1,660 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Generated client library for bigquery version v2."""
+# NOTE: This file is autogenerated and should not be edited by hand.
+from apitools.base.py import base_api
+
+from apache_beam.io.gcp.internal.clients.bigquery import bigquery_v2_messages 
as messages
+
+
+class BigqueryV2(base_api.BaseApiClient):
+  """Generated client library for service bigquery version v2."""
+
+  MESSAGES_MODULE = messages
+
+  _PACKAGE = u'bigquery'
+  _SCOPES = [u'https://www.googleapis.com/auth/bigquery', 
u'https://www.googleapis.com/auth/bigquery.insertdata', 
u'https://www.googleapis.com/auth/cloud-platform', 
u'https://www.googleapis.com/auth/cloud-platform.read-only', 
u'https://www.googleapis.com/auth/devstorage.full_control', 
u'https://www.googleapis.com/auth/devstorage.read_only', 
u'https://www.googleapis.com/auth/devstorage.read_write']
+  _VERSION = u'v2'
+  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
+  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _CLIENT_CLASS_NAME = u'BigqueryV2'
+  _URL_VERSION = u'v2'
+  _API_KEY = None
+
+  def __init__(self, url='', credentials=None,
+   get_credentials=True, http=None, model=None,
+   log_request=False, log_response=False,
+   credentials_args=None, default_global_params=None,
+   additional_http_headers=None):
+"""Create a new bigquery handle."""
+url = url or u'https://www.googleapis.com/bigquery/v2/'
+super(BigqueryV2, self).__init__(
+url, credentials=credentials,
+get_credentials=get_credentials, http=http, model=model,
+log_request=log_request, log_response=log_response,
+credentials_args=credentials_args,
+default_global_params=default_global_params,
+additional_http_headers=additional_http_headers)
+self.datasets = self.DatasetsService(self)
+self.jobs = self.JobsService(self)
+self.projects = self.ProjectsService(self)
+self.tabledata = self.TabledataService(self)
+self.tables = self.TablesService(self)
+
+  class DatasetsService(base_api.BaseApiService):
+"""Service class for the datasets resource."""
+
+_NAME = u'datasets'
+
+def __init__(self, client):
+  super(BigqueryV2.DatasetsService, self).__init__(client)
+  self._method_configs = {
+  'Delete': base_api.ApiMethodInfo(
+  http_method=u'DELETE',
+  method_id=u'bigquery.datasets.delete',
+  ordered_params=[u'projectId', u'datasetId'],
+  path_params=[u'datasetId', u'projectId'],
+  query_params=[u'deleteContents'],
+  relative_path=u'projects/{projectId}/datasets/{datasetId}',
+  request_field='',
+  request_type_name=u'BigqueryDatasetsDeleteRequest',
+  response_type_name=u'BigqueryDatasetsDeleteResponse',
+  supports_download=False,
+  ),
+  'Get': base_api.ApiMethodInfo(
+  http_method=u'GET',
+  method_id=u'bigquery.datasets.get',
+  ordered_params=[u'projectId', u'datasetId'],
+  path_params=[u'datasetId', u'projectId'],
+  query_params=[],
+  relative_path=u'projects/{projectId}/datasets/{datasetId}',
+  request_field='',
+  request_type_name=u'BigqueryDatasetsGetRequest',
+  response_type_name=u'Dataset',
+  supports_download=False,
+  ),
+  'Insert': base_api.ApiMethodInfo(
+  http_method=u'POST',
+  method_id=u'bigquery.datasets.insert',
+  

[GitHub] beam pull request #2079: [BEAM-1218] Rename google_cloud_dataflow and google...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2079




[08/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_messages.py
--
diff --git 
a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_messages.py
 
b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_messages.py
deleted file mode 100644
index dc9e5e6..000
--- 
a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_messages.py
+++ /dev/null
@@ -1,1920 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""Generated message classes for storage version v1.
-
-Stores and retrieves potentially large, immutable data objects.
-"""
-# NOTE: This file is autogenerated and should not be edited by hand.
-
-from apitools.base.protorpclite import message_types as _message_types
-from apitools.base.protorpclite import messages as _messages
-from apitools.base.py import encoding
-from apitools.base.py import extra_types
-
-
-package = 'storage'
-
-
-class Bucket(_messages.Message):
-  """A bucket.
-
-  Messages:
-CorsValueListEntry: A CorsValueListEntry object.
-LifecycleValue: The bucket's lifecycle configuration. See lifecycle
-  management for more information.
-LoggingValue: The bucket's logging configuration, which defines the
-  destination bucket and optional name prefix for the current bucket's
-  logs.
-OwnerValue: The owner of the bucket. This is always the project team's
-  owner group.
-VersioningValue: The bucket's versioning configuration.
-WebsiteValue: The bucket's website configuration.
-
-  Fields:
-acl: Access controls on the bucket.
-cors: The bucket's Cross-Origin Resource Sharing (CORS) configuration.
-defaultObjectAcl: Default access controls to apply to new objects when no
-  ACL is provided.
-etag: HTTP 1.1 Entity tag for the bucket.
-id: The ID of the bucket.
-kind: The kind of item this is. For buckets, this is always
-  storage#bucket.
-lifecycle: The bucket's lifecycle configuration. See lifecycle management
-  for more information.
-location: The location of the bucket. Object data for objects in the
-  bucket resides in physical storage within this region. Defaults to US.
-  See the developer's guide for the authoritative list.
-logging: The bucket's logging configuration, which defines the destination
-  bucket and optional name prefix for the current bucket's logs.
-metageneration: The metadata generation of this bucket.
-name: The name of the bucket.
-owner: The owner of the bucket. This is always the project team's owner
-  group.
-projectNumber: The project number of the project the bucket belongs to.
-selfLink: The URI of this bucket.
-storageClass: The bucket's storage class. This defines how objects in the
-  bucket are stored and determines the SLA and the cost of storage. Values
-  include STANDARD, NEARLINE and DURABLE_REDUCED_AVAILABILITY. Defaults to
-  STANDARD. For more information, see storage classes.
-timeCreated: The creation time of the bucket in RFC 3339 format.
-updated: The modification time of the bucket in RFC 3339 format.
-versioning: The bucket's versioning configuration.
-website: The bucket's website configuration.
-  """
-
-  class CorsValueListEntry(_messages.Message):
-"""A CorsValueListEntry object.
-
-Fields:
-  maxAgeSeconds: The value, in seconds, to return in the  Access-Control-
-Max-Age header used in preflight responses.
-  method: The list of HTTP methods on which to include CORS response
-headers, (GET, OPTIONS, POST, etc) Note: "*" is permitted in the list
-of methods, and means "any method".
-  origin: The list of Origins eligible to receive CORS response headers.
-Note: "*" is permitted in the list of origins, and means "any Origin".
-  responseHeader: The list of HTTP headers other than the simple response
-headers to give permission for the user-agent to share across domains.
-"""
-
-maxAgeSeconds = 

[03/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py
--
diff --git 
a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py 
b/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py
deleted file mode 100644
index 98473ca..000
--- 
a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/apiclient.py
+++ /dev/null
@@ -1,726 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""Dataflow client utility functions."""
-
-import codecs
-import getpass
-import json
-import logging
-import os
-import re
-import time
-from StringIO import StringIO
-from datetime import datetime
-
-from apitools.base.py import encoding
-from apitools.base.py import exceptions
-
-from apache_beam import utils
-from apache_beam.internal.auth import get_service_credentials
-from apache_beam.internal.google_cloud_platform.json_value import to_json_value
-from apache_beam.io.google_cloud_platform.internal.clients import storage
-from apache_beam.runners.google_cloud_dataflow.internal.clients import dataflow
-from apache_beam.transforms import cy_combiners
-from apache_beam.transforms.display import DisplayData
-from apache_beam.utils import dependency
-from apache_beam.utils import retry
-from apache_beam.utils.dependency import get_required_container_version
-from apache_beam.utils.dependency import get_sdk_name_and_version
-from apache_beam.utils.names import PropertyNames
-from apache_beam.utils.pipeline_options import DebugOptions
-from apache_beam.utils.pipeline_options import GoogleCloudOptions
-from apache_beam.utils.pipeline_options import StandardOptions
-from apache_beam.utils.pipeline_options import WorkerOptions
-
-
-class Step(object):
-  """Wrapper for a dataflow Step protobuf."""
-
-  def __init__(self, step_kind, step_name, additional_properties=None):
-self.step_kind = step_kind
-self.step_name = step_name
-self.proto = dataflow.Step(kind=step_kind, name=step_name)
-self.proto.properties = {}
-self._additional_properties = []
-
-if additional_properties is not None:
-  for (n, v, t) in additional_properties:
-self.add_property(n, v, t)
-
-  def add_property(self, name, value, with_type=False):
-self._additional_properties.append((name, value, with_type))
-self.proto.properties.additionalProperties.append(
-dataflow.Step.PropertiesValue.AdditionalProperty(
-key=name, value=to_json_value(value, with_type=with_type)))
-
-  def _get_outputs(self):
-"""Returns a list of all output labels for a step."""
-outputs = []
-for p in self.proto.properties.additionalProperties:
-  if p.key == PropertyNames.OUTPUT_INFO:
-for entry in p.value.array_value.entries:
-  for entry_prop in entry.object_value.properties:
-if entry_prop.key == PropertyNames.OUTPUT_NAME:
-  outputs.append(entry_prop.value.string_value)
-return outputs
-
-  def __reduce__(self):
-"""Reduce hook for pickling the Step class more easily."""
-return (Step, (self.step_kind, self.step_name, 
self._additional_properties))
-
-  def get_output(self, tag=None):
-"""Returns name if it is one of the outputs or first output if name is 
None.
-
-Args:
-  tag: tag of the output as a string or None if we want to get the
-name of the first output.
-
-Returns:
-  The name of the output associated with the tag or the first output
-  if tag was None.
-
-Raises:
-  ValueError: if the tag does not exist within outputs.
-"""
-outputs = self._get_outputs()
-if tag is None:
-  return outputs[0]
-else:
-  name = '%s_%s' % (PropertyNames.OUT, tag)
-  if name not in outputs:
-raise ValueError(
-'Cannot find named output: %s in %s.' % (name, outputs))
-  return name
-
-
-class Environment(object):
-  """Wrapper for a dataflow Environment protobuf."""
-
-  def __init__(self, packages, options, environment_version):
-self.standard_options = options.view_as(StandardOptions)
-self.google_cloud_options = 

[07/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/__init__.py
--
diff --git a/sdks/python/apache_beam/runners/__init__.py 
b/sdks/python/apache_beam/runners/__init__.py
index a77c928..2b93c30 100644
--- a/sdks/python/apache_beam/runners/__init__.py
+++ b/sdks/python/apache_beam/runners/__init__.py
@@ -26,5 +26,5 @@ from apache_beam.runners.runner import PipelineRunner
 from apache_beam.runners.runner import PipelineState
 from apache_beam.runners.runner import create_runner
 
-from apache_beam.runners.google_cloud_dataflow.dataflow_runner import 
DataflowRunner
-from apache_beam.runners.google_cloud_dataflow.test_dataflow_runner import 
TestDataflowRunner
+from apache_beam.runners.dataflow.dataflow_runner import DataflowRunner
+from apache_beam.runners.dataflow.test_dataflow_runner import 
TestDataflowRunner

http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/dataflow/__init__.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/__init__.py 
b/sdks/python/apache_beam/runners/dataflow/__init__.py
new file mode 100644
index 000..cce3aca
--- /dev/null
+++ b/sdks/python/apache_beam/runners/dataflow/__init__.py
@@ -0,0 +1,16 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#

http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
new file mode 100644
index 000..1d86f2f
--- /dev/null
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+DataflowRunner implementation of MetricResults. It is in charge of
+responding to queries of current metrics by going to the dataflow
+service.
+"""
+
+from apache_beam.metrics.metric import MetricResults
+
+
+# TODO(pabloem)(JIRA-1381) Implement this once metrics are queriable from
+# dataflow service
+class DataflowMetrics(MetricResults):
+
+  def query(self, filter=None):
+return {'counters': [],
+'distributions': []}
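
The stub above answers every query with empty lists until metrics become
queryable from the service. For orientation, a sketch of how a caller
consumes a MetricResults implementation (assuming the pipeline result
exposes one via metrics()):

def print_counters(pipeline_result):
    # query() returns a dict with 'counters' and 'distributions' lists.
    results = pipeline_result.metrics().query()
    for counter in results['counters']:
        print(counter)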

http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py
new file mode 100644
index 000..5475ac7
--- /dev/null
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics_test.py
@@ -0,0 +1,20 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# 

[04/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py
--
diff --git 
a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py
 
b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py
new file mode 100644
index 000..4dda47a
--- /dev/null
+++ 
b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/message_matchers.py
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from hamcrest.core.base_matcher import BaseMatcher
+
+
+IGNORED = object()
+
+
+class MetricStructuredNameMatcher(BaseMatcher):
+  """Matches a MetricStructuredName."""
+  def __init__(self,
+   name=IGNORED,
+   origin=IGNORED,
+   context=IGNORED):
+"""Creates a MetricsStructuredNameMatcher.
+
+Any property not passed in to the constructor will be ignored when 
matching.
+
+Args:
+  name: A string with the metric name.
+  origin: A string with the metric namespace.
+  context: A key:value dictionary that will be matched to the
+structured name.
+"""
+if context != IGNORED and not isinstance(context, dict):
+  raise ValueError('context must be a Python dictionary.')
+
+self.name = name
+self.origin = origin
+self.context = context
+
+  def _matches(self, item):
+if self.name != IGNORED and item.name != self.name:
+  return False
+if self.origin != IGNORED and item.origin != self.origin:
+  return False
+if self.context != IGNORED:
+  for key, name in self.context.iteritems():
+if key not in item.context:
+  return False
+if name != IGNORED and item.context[key] != name:
+  return False
+return True
+
+  def describe_to(self, description):
+descriptors = []
+if self.name != IGNORED:
+  descriptors.append('name is {}'.format(self.name))
+if self.origin != IGNORED:
+  descriptors.append('origin is {}'.format(self.origin))
+if self.context != IGNORED:
+  descriptors.append('context is ({})'.format(str(self.context)))
+
+item_description = ' and '.join(descriptors)
+description.append(item_description)
+
+
+class MetricUpdateMatcher(BaseMatcher):
+  """Matches a metrics update protocol buffer."""
+  def __init__(self,
+   cumulative=IGNORED,
+   name=IGNORED,
+   scalar=IGNORED,
+   kind=IGNORED):
+"""Creates a MetricUpdateMatcher.
+
+Any property not passed in to the constructor will be ignored when 
matching.
+
+Args:
+  cumulative: A boolean.
+  name: A MetricStructuredNameMatcher object that matches the name.
+  scalar: An integer with the metric update.
+  kind: A string defining the kind of counter.
+"""
+if name != IGNORED and not isinstance(name, MetricStructuredNameMatcher):
+  raise ValueError('name must be a MetricStructuredNameMatcher.')
+
+self.cumulative = cumulative
+self.name = name
+self.scalar = scalar
+self.kind = kind
+
+  def _matches(self, item):
+if self.cumulative != IGNORED and item.cumulative != self.cumulative:
+  return False
+if self.name != IGNORED and not self.name._matches(item.name):
+  return False
+if self.kind != IGNORED and item.kind != self.kind:
+  return False
+if self.scalar != IGNORED:
+  value_property = [p
+for p in item.scalar.object_value.properties
+if p.key == 'value']
+  int_value = value_property[0].value.integer_value
+  if self.scalar != int_value:
+return False
+return True
+
+  def describe_to(self, description):
+descriptors = []
+if self.cumulative != IGNORED:
+  descriptors.append('cumulative is {}'.format(self.cumulative))
+if self.name != IGNORED:
+  descriptors.append('name is {}'.format(self.name))
+if self.scalar != IGNORED:
+  descriptors.append('scalar is ({})'.format(str(self.scalar)))
+
+item_description = ' and '.join(descriptors)
+description.append(item_description)
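
A hedged example of how these matchers compose in a test (the metric update
object and the expected values are illustrative):

import hamcrest as hc

def assert_user_counter(metric_update):
    name_matcher = MetricStructuredNameMatcher(name='elements', origin='user')
    hc.assert_that(metric_update,
                   MetricUpdateMatcher(name=name_matcher, scalar=5))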


[17/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py
--
diff --git 
a/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py
 
b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py
new file mode 100644
index 000..4045428
--- /dev/null
+++ 
b/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py
@@ -0,0 +1,1910 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Generated message classes for bigquery version v2.
+
+A data platform for customers to create, manage, share and query data.
+"""
+# NOTE: This file is autogenerated and should not be edited by hand.
+
+from apitools.base.protorpclite import messages as _messages
+from apitools.base.py import encoding
+from apitools.base.py import extra_types
+
+
+package = 'bigquery'
+
+
+class BigqueryDatasetsDeleteRequest(_messages.Message):
+  """A BigqueryDatasetsDeleteRequest object.
+
+  Fields:
+datasetId: Dataset ID of dataset being deleted
+deleteContents: If True, delete all the tables in the dataset. If False
+  and the dataset contains tables, the request will fail. Default is False
+projectId: Project ID of the dataset being deleted
+  """
+
+  datasetId = _messages.StringField(1, required=True)
+  deleteContents = _messages.BooleanField(2)
+  projectId = _messages.StringField(3, required=True)
+
+
+class BigqueryDatasetsDeleteResponse(_messages.Message):
+  """An empty BigqueryDatasetsDelete response."""
+
+
+class BigqueryDatasetsGetRequest(_messages.Message):
+  """A BigqueryDatasetsGetRequest object.
+
+  Fields:
+datasetId: Dataset ID of the requested dataset
+projectId: Project ID of the requested dataset
+  """
+
+  datasetId = _messages.StringField(1, required=True)
+  projectId = _messages.StringField(2, required=True)
+
+
+class BigqueryDatasetsInsertRequest(_messages.Message):
+  """A BigqueryDatasetsInsertRequest object.
+
+  Fields:
+dataset: A Dataset resource to be passed as the request body.
+projectId: Project ID of the new dataset
+  """
+
+  dataset = _messages.MessageField('Dataset', 1)
+  projectId = _messages.StringField(2, required=True)
+
+
+class BigqueryDatasetsListRequest(_messages.Message):
+  """A BigqueryDatasetsListRequest object.
+
+  Fields:
+all: Whether to list all datasets, including hidden ones
+maxResults: The maximum number of results to return
+pageToken: Page token, returned by a previous call, to request the next
+  page of results
+projectId: Project ID of the datasets to be listed
+  """
+
+  all = _messages.BooleanField(1)
+  maxResults = _messages.IntegerField(2, variant=_messages.Variant.UINT32)
+  pageToken = _messages.StringField(3)
+  projectId = _messages.StringField(4, required=True)
+
+
+class BigqueryDatasetsPatchRequest(_messages.Message):
+  """A BigqueryDatasetsPatchRequest object.
+
+  Fields:
+dataset: A Dataset resource to be passed as the request body.
+datasetId: Dataset ID of the dataset being updated
+projectId: Project ID of the dataset being updated
+  """
+
+  dataset = _messages.MessageField('Dataset', 1)
+  datasetId = _messages.StringField(2, required=True)
+  projectId = _messages.StringField(3, required=True)
+
+
+class BigqueryDatasetsUpdateRequest(_messages.Message):
+  """A BigqueryDatasetsUpdateRequest object.
+
+  Fields:
+dataset: A Dataset resource to be passed as the request body.
+datasetId: Dataset ID of the dataset being updated
+projectId: Project ID of the dataset being updated
+  """
+
+  dataset = _messages.MessageField('Dataset', 1)
+  datasetId = _messages.StringField(2, required=True)
+  projectId = _messages.StringField(3, required=True)
+
+
+class BigqueryJobsCancelRequest(_messages.Message):
+  """A BigqueryJobsCancelRequest object.
+
+  Fields:
+jobId: [Required] Job ID of the job to cancel
+projectId: [Required] Project ID of the job to cancel
+  """
+
+  jobId = _messages.StringField(1, required=True)
+  projectId = _messages.StringField(2, required=True)
+
+
+class 

[06/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
--
diff --git 
a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
 
b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
new file mode 100644
index 000..725d496
--- /dev/null
+++ 
b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py
@@ -0,0 +1,684 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Generated client library for dataflow version v1b3."""
+# NOTE: This file is autogenerated and should not be edited by hand.
+from apitools.base.py import base_api
+
+from apache_beam.runners.dataflow.internal.clients.dataflow import dataflow_v1b3_messages as messages
+
+
+class DataflowV1b3(base_api.BaseApiClient):
+  """Generated client library for service dataflow version v1b3."""
+
+  MESSAGES_MODULE = messages
+  BASE_URL = u'https://dataflow.googleapis.com/'
+
+  _PACKAGE = u'dataflow'
+  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/userinfo.email']
+  _VERSION = u'v1b3'
+  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
+  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _CLIENT_CLASS_NAME = u'DataflowV1b3'
+  _URL_VERSION = u'v1b3'
+  _API_KEY = None
+
+  def __init__(self, url='', credentials=None,
+   get_credentials=True, http=None, model=None,
+   log_request=False, log_response=False,
+   credentials_args=None, default_global_params=None,
+   additional_http_headers=None):
+"""Create a new dataflow handle."""
+url = url or self.BASE_URL
+super(DataflowV1b3, self).__init__(
+url, credentials=credentials,
+get_credentials=get_credentials, http=http, model=model,
+log_request=log_request, log_response=log_response,
+credentials_args=credentials_args,
+default_global_params=default_global_params,
+additional_http_headers=additional_http_headers)
+self.projects_jobs_debug = self.ProjectsJobsDebugService(self)
+self.projects_jobs_messages = self.ProjectsJobsMessagesService(self)
+self.projects_jobs_workItems = self.ProjectsJobsWorkItemsService(self)
+self.projects_jobs = self.ProjectsJobsService(self)
+    self.projects_locations_jobs_messages = self.ProjectsLocationsJobsMessagesService(self)
+    self.projects_locations_jobs_workItems = self.ProjectsLocationsJobsWorkItemsService(self)
+self.projects_locations_jobs = self.ProjectsLocationsJobsService(self)
+self.projects_locations = self.ProjectsLocationsService(self)
+self.projects_templates = self.ProjectsTemplatesService(self)
+self.projects = self.ProjectsService(self)
+
+  class ProjectsJobsDebugService(base_api.BaseApiService):
+"""Service class for the projects_jobs_debug resource."""
+
+_NAME = u'projects_jobs_debug'
+
+def __init__(self, client):
+  super(DataflowV1b3.ProjectsJobsDebugService, self).__init__(client)
+  self._upload_configs = {
+  }
+
+def GetConfig(self, request, global_params=None):
+  """Get encoded debug configuration for component. Not cacheable.
+
+  Args:
+request: (DataflowProjectsJobsDebugGetConfigRequest) input message
+global_params: (StandardQueryParameters, default: None) global arguments
+  Returns:
+(GetDebugConfigResponse) The response message.
+  """
+  config = self.GetMethodConfig('GetConfig')
+  return self._RunMethod(
+  config, request, global_params=global_params)
+
+GetConfig.method_config = lambda: base_api.ApiMethodInfo(
+http_method=u'POST',
+method_id=u'dataflow.projects.jobs.debug.getConfig',
+ordered_params=[u'projectId', u'jobId'],
+path_params=[u'jobId', u'projectId'],
+query_params=[],
+relative_path=u'v1b3/projects/{projectId}/jobs/{jobId}/debug/getConfig',
+request_field=u'getDebugConfigRequest',
+
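A hedged usage sketch for the service above; that the request message class
from dataflow_v1b3_messages is re-exported by the package __init__ is an
assumption, and the job ID is illustrative:

  from apache_beam.runners.dataflow.internal.clients import dataflow

  client = dataflow.DataflowV1b3()  # assumes application-default credentials
  request = dataflow.DataflowProjectsJobsDebugGetConfigRequest(
      projectId='my-project', jobId='some-job-id')
  # Returns a GetDebugConfigResponse, per the docstring above.
  response = client.projects_jobs_debug.GetConfig(request)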

[12/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py b/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py
deleted file mode 100644
index 195fafc..000
--- a/sdks/python/apache_beam/io/google_cloud_platform/gcsio.py
+++ /dev/null
@@ -1,871 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-"""Google Cloud Storage client.
-
-This library evolved from the Google App Engine GCS client available at
-https://github.com/GoogleCloudPlatform/appengine-gcs-client.
-"""
-
-import cStringIO
-import errno
-import fnmatch
-import logging
-import multiprocessing
-import os
-import Queue
-import re
-import threading
-import traceback
-
-import apitools.base.py.transfer as transfer
-from apitools.base.py.batch import BatchApiRequest
-from apitools.base.py.exceptions import HttpError
-
-from apache_beam.internal import auth
-from apache_beam.utils import retry
-
-# Issue a friendlier error message if the storage library is not available.
-# TODO(silviuc): Remove this guard when storage is available everywhere.
-try:
-  # pylint: disable=wrong-import-order, wrong-import-position
-  from apache_beam.io.google_cloud_platform.internal.clients import storage
-except ImportError:
-  raise RuntimeError(
-  'Google Cloud Storage I/O not supported for this execution environment '
-  '(could not import storage API client).')
-
-# This is the size of each partial-file read operation from GCS.  This
-# parameter was chosen to give good throughput while keeping memory usage at
-# a reasonable level; the following table shows throughput reached when
-# reading files of a given size with a chosen buffer size and informed the
-# choice of the value, as of 11/2016:
-#
-# +---++-+-+-+
-# |   | 50 MB file | 100 MB file | 200 MB file | 400 MB file |
-# +---++-+-+-+
-# | 8 MB buffer   | 17.12 MB/s | 22.67 MB/s  | 23.81 MB/s  | 26.05 MB/s  |
-# | 16 MB buffer  | 24.21 MB/s | 42.70 MB/s  | 42.89 MB/s  | 46.92 MB/s  |
-# | 32 MB buffer  | 28.53 MB/s | 48.08 MB/s  | 54.30 MB/s  | 54.65 MB/s  |
-# | 400 MB buffer | 34.72 MB/s | 71.13 MB/s  | 79.13 MB/s  | 85.39 MB/s  |
-# +---++-+-+-+
-DEFAULT_READ_BUFFER_SIZE = 16 * 1024 * 1024
-
-# This is the number of seconds the library will wait for a partial-file read
-# operation from GCS to complete before retrying.
-DEFAULT_READ_SEGMENT_TIMEOUT_SECONDS = 60
-
-# This is the size of chunks used when writing to GCS.
-WRITE_CHUNK_SIZE = 8 * 1024 * 1024
-
-
-# Maximum number of operations permitted in GcsIO.copy_batch() and
-# GcsIO.delete_batch().
-MAX_BATCH_OPERATION_SIZE = 100
-
-
-def parse_gcs_path(gcs_path):
-  """Return the bucket and object names of the given gs:// path."""
-  match = re.match('^gs://([^/]+)/(.+)$', gcs_path)
-  if match is None:
-raise ValueError('GCS path must be in the form gs:///.')
-  return match.group(1), match.group(2)
-
-
-class GcsIOError(IOError, retry.PermanentException):
-  """GCS IO error that should not be retried."""
-  pass
-
-
-class GcsIO(object):
-  """Google Cloud Storage I/O client."""
-
-  def __new__(cls, storage_client=None):
-if storage_client:
-  return super(GcsIO, cls).__new__(cls, storage_client)
-else:
-  # Create a single storage client for each thread.  We would like to avoid
-  # creating more than one storage client for each thread, since each
-  # initialization requires the relatively expensive step of initializing
-  # credentials.
-  local_state = threading.local()
-  if getattr(local_state, 'gcsio_instance', None) is None:
-credentials = auth.get_service_credentials()
-storage_client = storage.StorageV1(credentials=credentials)
-local_state.gcsio_instance = (
-super(GcsIO, cls).__new__(cls, storage_client))
-local_state.gcsio_instance.client = storage_client
-  return local_state.gcsio_instance
-
-  def 
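The helpers above are straightforward to exercise; parse_gcs_path is shown in
full, while the open() signature here is an assumption based on the
buffer-size discussion earlier in the file:

  from apache_beam.io.gcp import gcsio  # path after this rename

  bucket, name = gcsio.parse_gcs_path('gs://my-bucket/path/to/file.txt')
  # bucket == 'my-bucket', name == 'path/to/file.txt'

  # Thanks to the thread-local cache in __new__, constructing GcsIO()
  # repeatedly on one thread reuses a single storage client.
  f = gcsio.GcsIO().open('gs://my-bucket/path/to/file.txt', 'r')
  data = f.read(gcsio.DEFAULT_READ_BUFFER_SIZE)  # one buffered read
  f.close()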

[09/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_client.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_client.py b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_client.py
deleted file mode 100644
index 6fed50d..000
--- a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/storage/storage_v1_client.py
+++ /dev/null
@@ -1,1039 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""Generated client library for storage version v1."""
-# NOTE: This file is autogenerated and should not be edited by hand.
-from apitools.base.py import base_api
-
-from apache_beam.io.google_cloud_platform.internal.clients.storage import storage_v1_messages as messages
-
-
-class StorageV1(base_api.BaseApiClient):
-  """Generated client library for service storage version v1."""
-
-  MESSAGES_MODULE = messages
-
-  _PACKAGE = u'storage'
-  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/cloud-platform.read-only', u'https://www.googleapis.com/auth/devstorage.full_control', u'https://www.googleapis.com/auth/devstorage.read_only', u'https://www.googleapis.com/auth/devstorage.read_write']
-  _VERSION = u'v1'
-  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
-  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _CLIENT_CLASS_NAME = u'StorageV1'
-  _URL_VERSION = u'v1'
-  _API_KEY = None
-
-  def __init__(self, url='', credentials=None,
-   get_credentials=True, http=None, model=None,
-   log_request=False, log_response=False,
-   credentials_args=None, default_global_params=None,
-   additional_http_headers=None):
-"""Create a new storage handle."""
-url = url or u'https://www.googleapis.com/storage/v1/'
-super(StorageV1, self).__init__(
-url, credentials=credentials,
-get_credentials=get_credentials, http=http, model=model,
-log_request=log_request, log_response=log_response,
-credentials_args=credentials_args,
-default_global_params=default_global_params,
-additional_http_headers=additional_http_headers)
-self.bucketAccessControls = self.BucketAccessControlsService(self)
-self.buckets = self.BucketsService(self)
-self.channels = self.ChannelsService(self)
-    self.defaultObjectAccessControls = self.DefaultObjectAccessControlsService(self)
-self.objectAccessControls = self.ObjectAccessControlsService(self)
-self.objects = self.ObjectsService(self)
-
-  class BucketAccessControlsService(base_api.BaseApiService):
-"""Service class for the bucketAccessControls resource."""
-
-_NAME = u'bucketAccessControls'
-
-def __init__(self, client):
-  super(StorageV1.BucketAccessControlsService, self).__init__(client)
-  self._method_configs = {
-  'Delete': base_api.ApiMethodInfo(
-  http_method=u'DELETE',
-  method_id=u'storage.bucketAccessControls.delete',
-  ordered_params=[u'bucket', u'entity'],
-  path_params=[u'bucket', u'entity'],
-  query_params=[],
-  relative_path=u'b/{bucket}/acl/{entity}',
-  request_field='',
-  request_type_name=u'StorageBucketAccessControlsDeleteRequest',
-  response_type_name=u'StorageBucketAccessControlsDeleteResponse',
-  supports_download=False,
-  ),
-  'Get': base_api.ApiMethodInfo(
-  http_method=u'GET',
-  method_id=u'storage.bucketAccessControls.get',
-  ordered_params=[u'bucket', u'entity'],
-  path_params=[u'bucket', u'entity'],
-  query_params=[],
-  relative_path=u'b/{bucket}/acl/{entity}',
-  request_field='',
-  request_type_name=u'StorageBucketAccessControlsGetRequest',
-  response_type_name=u'BucketAccessControl',
-  supports_download=False,
-  ),
-  

[01/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master cad84c880 -> 8ec890909


http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/message_matchers.py
--
diff --git a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/message_matchers.py b/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/message_matchers.py
deleted file mode 100644
index 4dda47a..000
--- a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/message_matchers.py
+++ /dev/null
@@ -1,124 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-from hamcrest.core.base_matcher import BaseMatcher
-
-
-IGNORED = object()
-
-
-class MetricStructuredNameMatcher(BaseMatcher):
-  """Matches a MetricStructuredName."""
-  def __init__(self,
-   name=IGNORED,
-   origin=IGNORED,
-   context=IGNORED):
-"""Creates a MetricsStructuredNameMatcher.
-
-Any property not passed in to the constructor will be ignored when matching.
-
-Args:
-  name: A string with the metric name.
-  origin: A string with the metric namespace.
-  context: A key:value dictionary that will be matched to the
-structured name.
-"""
-if context != IGNORED and not isinstance(context, dict):
-  raise ValueError('context must be a Python dictionary.')
-
-self.name = name
-self.origin = origin
-self.context = context
-
-  def _matches(self, item):
-if self.name != IGNORED and item.name != self.name:
-  return False
-if self.origin != IGNORED and item.origin != self.origin:
-  return False
-if self.context != IGNORED:
-  for key, name in self.context.iteritems():
-if key not in item.context:
-  return False
-if name != IGNORED and item.context[key] != name:
-  return False
-return True
-
-  def describe_to(self, description):
-descriptors = []
-if self.name != IGNORED:
-  descriptors.append('name is {}'.format(self.name))
-if self.origin != IGNORED:
-  descriptors.append('origin is {}'.format(self.origin))
-if self.context != IGNORED:
-  descriptors.append('context is ({})'.format(str(self.context)))
-
-item_description = ' and '.join(descriptors)
-description.append(item_description)
-
-
-class MetricUpdateMatcher(BaseMatcher):
-  """Matches a metrics update protocol buffer."""
-  def __init__(self,
-   cumulative=IGNORED,
-   name=IGNORED,
-   scalar=IGNORED,
-   kind=IGNORED):
-"""Creates a MetricUpdateMatcher.
-
-Any property not passed in to the constructor will be ignored when matching.
-
-Args:
-  cumulative: A boolean.
-  name: A MetricStructuredNameMatcher object that matches the name.
-  scalar: An integer with the metric update.
-  kind: A string defining the kind of counter.
-"""
-if name != IGNORED and not isinstance(name, MetricStructuredNameMatcher):
-  raise ValueError('name must be a MetricStructuredNameMatcher.')
-
-self.cumulative = cumulative
-self.name = name
-self.scalar = scalar
-self.kind = kind
-
-  def _matches(self, item):
-if self.cumulative != IGNORED and item.cumulative != self.cumulative:
-  return False
-if self.name != IGNORED and not self.name._matches(item.name):
-  return False
-if self.kind != IGNORED and item.kind != self.kind:
-  return False
-if self.scalar != IGNORED:
-  value_property = [p
-for p in item.scalar.object_value.properties
-if p.key == 'value']
-  int_value = value_property[0].value.integer_value
-  if self.scalar != int_value:
-return False
-return True
-
-  def describe_to(self, description):
-descriptors = []
-if self.cumulative != IGNORED:
-  descriptors.append('cumulative is {}'.format(self.cumulative))
-if self.name != IGNORED:
-  descriptors.append('name is {}'.format(self.name))
-if self.scalar != IGNORED:
-  
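These matchers plug into PyHamcrest's assert_that; a short sketch, where the
metric_update value is assumed to be a MetricUpdate message obtained from the
Dataflow job API:

  from hamcrest import assert_that

  name_matcher = MetricStructuredNameMatcher(
      name='ElementCount', origin='dataflow/v1b3')
  update_matcher = MetricUpdateMatcher(
      name=name_matcher, scalar=42, cumulative=True)
  # Raises an AssertionError describing the mismatch, via describe_to().
  assert_that(metric_update, update_matcher)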

[11/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_client.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_client.py b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_client.py
deleted file mode 100644
index 833d375..000
--- a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_client.py
+++ /dev/null
@@ -1,660 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""Generated client library for bigquery version v2."""
-# NOTE: This file is autogenerated and should not be edited by hand.
-from apitools.base.py import base_api
-
-from apache_beam.io.google_cloud_platform.internal.clients.bigquery import bigquery_v2_messages as messages
-
-
-class BigqueryV2(base_api.BaseApiClient):
-  """Generated client library for service bigquery version v2."""
-
-  MESSAGES_MODULE = messages
-
-  _PACKAGE = u'bigquery'
-  _SCOPES = [u'https://www.googleapis.com/auth/bigquery', u'https://www.googleapis.com/auth/bigquery.insertdata', u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/cloud-platform.read-only', u'https://www.googleapis.com/auth/devstorage.full_control', u'https://www.googleapis.com/auth/devstorage.read_only', u'https://www.googleapis.com/auth/devstorage.read_write']
-  _VERSION = u'v2'
-  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
-  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
-  _CLIENT_CLASS_NAME = u'BigqueryV2'
-  _URL_VERSION = u'v2'
-  _API_KEY = None
-
-  def __init__(self, url='', credentials=None,
-   get_credentials=True, http=None, model=None,
-   log_request=False, log_response=False,
-   credentials_args=None, default_global_params=None,
-   additional_http_headers=None):
-"""Create a new bigquery handle."""
-url = url or u'https://www.googleapis.com/bigquery/v2/'
-super(BigqueryV2, self).__init__(
-url, credentials=credentials,
-get_credentials=get_credentials, http=http, model=model,
-log_request=log_request, log_response=log_response,
-credentials_args=credentials_args,
-default_global_params=default_global_params,
-additional_http_headers=additional_http_headers)
-self.datasets = self.DatasetsService(self)
-self.jobs = self.JobsService(self)
-self.projects = self.ProjectsService(self)
-self.tabledata = self.TabledataService(self)
-self.tables = self.TablesService(self)
-
-  class DatasetsService(base_api.BaseApiService):
-"""Service class for the datasets resource."""
-
-_NAME = u'datasets'
-
-def __init__(self, client):
-  super(BigqueryV2.DatasetsService, self).__init__(client)
-  self._method_configs = {
-  'Delete': base_api.ApiMethodInfo(
-  http_method=u'DELETE',
-  method_id=u'bigquery.datasets.delete',
-  ordered_params=[u'projectId', u'datasetId'],
-  path_params=[u'datasetId', u'projectId'],
-  query_params=[u'deleteContents'],
-  relative_path=u'projects/{projectId}/datasets/{datasetId}',
-  request_field='',
-  request_type_name=u'BigqueryDatasetsDeleteRequest',
-  response_type_name=u'BigqueryDatasetsDeleteResponse',
-  supports_download=False,
-  ),
-  'Get': base_api.ApiMethodInfo(
-  http_method=u'GET',
-  method_id=u'bigquery.datasets.get',
-  ordered_params=[u'projectId', u'datasetId'],
-  path_params=[u'datasetId', u'projectId'],
-  query_params=[],
-  relative_path=u'projects/{projectId}/datasets/{datasetId}',
-  request_field='',
-  request_type_name=u'BigqueryDatasetsGetRequest',
-  response_type_name=u'Dataset',
-  supports_download=False,
-  ),
-  'Insert': base_api.ApiMethodInfo(
-  
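Each ApiMethodInfo above fixes how a request maps onto HTTP: path_params are
substituted into relative_path and query_params become the query string. An
illustrative sketch, assuming a client instance and the messages module are
in scope:

  # Delete resolves to roughly:
  #   DELETE https://www.googleapis.com/bigquery/v2/projects/my-project/
  #          datasets/my_dataset?deleteContents=true
  request = messages.BigqueryDatasetsDeleteRequest(
      projectId='my-project', datasetId='my_dataset', deleteContents=True)
  client.datasets.Delete(request)  # returns BigqueryDatasetsDeleteResponse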

[13/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py b/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py
deleted file mode 100644
index 335c532..000
--- a/sdks/python/apache_beam/io/google_cloud_platform/datastore/v1/datastoreio.py
+++ /dev/null
@@ -1,397 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""A connector for reading from and writing to Google Cloud Datastore"""
-
-import logging
-
-# Protect against environments where datastore library is not available.
-# pylint: disable=wrong-import-order, wrong-import-position
-try:
-  from google.cloud.proto.datastore.v1 import datastore_pb2
-  from googledatastore import helper as datastore_helper
-except ImportError:
-  pass
-# pylint: enable=wrong-import-order, wrong-import-position
-
-from apache_beam.io.google_cloud_platform.datastore.v1 import helper
-from apache_beam.io.google_cloud_platform.datastore.v1 import query_splitter
-from apache_beam.transforms import Create
-from apache_beam.transforms import DoFn
-from apache_beam.transforms import FlatMap
-from apache_beam.transforms import GroupByKey
-from apache_beam.transforms import Map
-from apache_beam.transforms import PTransform
-from apache_beam.transforms import ParDo
-from apache_beam.transforms.util import Values
-
-__all__ = ['ReadFromDatastore', 'WriteToDatastore', 'DeleteFromDatastore']
-
-
-class ReadFromDatastore(PTransform):
-  """A ``PTransform`` for reading from Google Cloud Datastore.
-
-  To read a ``PCollection[Entity]`` from a Cloud Datastore ``Query``, use
-  ``ReadFromDatastore`` transform by providing a `project` id and a `query` to
-  read from. You can optionally provide a `namespace` and/or specify how many
-  splits you want for the query through `num_splits` option.
-
-  Note: Normally, a runner will read from Cloud Datastore in parallel across
-  many workers. However, when the `query` is configured with a `limit` or if the
-  query contains inequality filters like `GREATER_THAN, LESS_THAN` etc., then
-  all the returned results will be read by a single worker in order to ensure
-  correct data. Since data is read from a single worker, this could have
-  significant impact on the performance of the job.
-
-  The semantics for the query splitting is defined below:
-1. If `num_splits` is equal to 0, then the number of splits will be chosen
-dynamically at runtime based on the query data size.
-
-2. Any value of `num_splits` greater than
-`ReadFromDatastore._NUM_QUERY_SPLITS_MAX` will be capped at that value.
-
-3. If the `query` has a user limit set, or contains inequality filters, then
-`num_splits` will be ignored and no split will be performed.
-
-4. Under certain cases Cloud Datastore is unable to split query to the
-requested number of splits. In such cases we just use whatever the Cloud
-Datastore returns.
-
-  See https://developers.google.com/datastore/ for more details on Google Cloud
-  Datastore.
-  """
-
-  # An upper bound on the number of splits for a query.
-  _NUM_QUERY_SPLITS_MAX = 50000
-  # A lower bound on the number of splits for a query. This is to ensure that
-  # we parallelize the query even when Datastore statistics are not available.
-  _NUM_QUERY_SPLITS_MIN = 12
-  # Default bundle size of 64MB.
-  _DEFAULT_BUNDLE_SIZE_BYTES = 64 * 1024 * 1024
-
-  def __init__(self, project, query, namespace=None, num_splits=0):
-"""Initialize the ReadFromDatastore transform.
-
-Args:
-  project: The Project ID
-  query: Cloud Datastore query to be read from.
-  namespace: An optional namespace.
-  num_splits: Number of splits for the query.
-"""
-logging.warning('datastoreio read transform is experimental.')
-super(ReadFromDatastore, self).__init__()
-
-if not project:
-  raise ValueError("Project cannot be empty")
-if not query:
-  raise ValueError("Query cannot be empty")
-if num_splits < 0:
-  raise ValueError("num_splits must be greater than or equal to 0")
-
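A hedged pipeline sketch for the transform above; constructing the query via
the googledatastore helpers is assumed and not shown in this hunk:

  import apache_beam as beam

  p = beam.Pipeline()
  # query: a datastore v1 query_pb2.Query with exactly one kind and no
  # sort orders, limit, or offset, so it stays splittable.
  entities = p | 'read' >> ReadFromDatastore(
      project='my-project', query=query, num_splits=0)
  p.run()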

[05/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
new file mode 100644
index 000..a42154e
--- /dev/null
+++ b/sdks/python/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
@@ -0,0 +1,4173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Generated message classes for dataflow version v1b3.
+
+Develops and executes data processing patterns like ETL, batch computation,
+and continuous computation.
+"""
+# NOTE: This file is autogenerated and should not be edited by hand.
+
+from apitools.base.protorpclite import messages as _messages
+from apitools.base.py import encoding
+from apitools.base.py import extra_types
+
+
+package = 'dataflow'
+
+
+class ApproximateProgress(_messages.Message):
+  """Obsolete in favor of ApproximateReportedProgress and
+  ApproximateSplitRequest.
+
+  Fields:
+percentComplete: Obsolete.
+position: Obsolete.
+remainingTime: Obsolete.
+  """
+
+  percentComplete = _messages.FloatField(1, variant=_messages.Variant.FLOAT)
+  position = _messages.MessageField('Position', 2)
+  remainingTime = _messages.StringField(3)
+
+
+class ApproximateReportedProgress(_messages.Message):
+  """A progress measurement of a WorkItem by a worker.
+
+  Fields:
+consumedParallelism: Total amount of parallelism in the portion of input
+  of this task that has already been consumed and is no longer active. In
+  the first two examples above (see remaining_parallelism), the value
+  should be 29 or 2 respectively.  The sum of remaining_parallelism and
+  consumed_parallelism should equal the total amount of parallelism in
+  this work item.  If specified, must be finite.
+fractionConsumed: Completion as fraction of the input consumed, from 0.0
+  (beginning, nothing consumed), to 1.0 (end of the input, entire input
+  consumed).
+position: A Position within the work to represent a progress.
+remainingParallelism: Total amount of parallelism in the input of this
+  task that remains, (i.e. can be delegated to this task and any new tasks
+  via dynamic splitting). Always at least 1 for non-finished work items
+  and 0 for finished.  "Amount of parallelism" refers to how many non-
+  empty parts of the input can be read in parallel. This does not
+  necessarily equal number of records. An input that can be read in
+  parallel down to the individual records is called "perfectly
+  splittable". An example of non-perfectly parallelizable input is a
+  block-compressed file format where a block of records has to be read as
+  a whole, but different blocks can be read in parallel.  Examples: * If
+  we are processing record #30 (starting at 1) out of 50 in a perfectly
+  splittable 50-record input, this value should be 21 (20 remaining + 1
+  current). * If we are reading through block 3 in a block-compressed file
+  consisting   of 5 blocks, this value should be 3 (since blocks 4 and 5
+  can be   processed in parallel by new tasks via dynamic splitting and
+  the current   task remains processing block 3). * If we are reading
+  through the last block in a block-compressed file,   or reading or
+  processing the last record in a perfectly splittable   input, this value
+  should be 1, because apart from the current task, no   additional
+  remainder can be split off.
+  """
+
+  consumedParallelism = _messages.MessageField('ReportedParallelism', 1)
+  fractionConsumed = _messages.FloatField(2)
+  position = _messages.MessageField('Position', 3)
+  remainingParallelism = _messages.MessageField('ReportedParallelism', 4)
+
+
+class ApproximateSplitRequest(_messages.Message):
+  """A suggestion by the service to the worker to dynamically split the
+  WorkItem.
+
+  Fields:
+fractionConsumed: A fraction at which to split the work item, 
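A worked instance of the docstring's first example (record 30 of 50 in a
perfectly splittable input); ReportedParallelism is defined later in this
file, and that it exposes a float-valued value field is an assumption here:

  progress = ApproximateReportedProgress(
      fractionConsumed=30.0 / 50.0,                          # 0.6
      consumedParallelism=ReportedParallelism(value=29.0),   # records 1..29
      remainingParallelism=ReportedParallelism(value=21.0))  # 20 left + 1 current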

[02/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
--
diff --git a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py b/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
deleted file mode 100644
index a42154e..000
--- a/sdks/python/apache_beam/runners/google_cloud_dataflow/internal/clients/dataflow/dataflow_v1b3_messages.py
+++ /dev/null
@@ -1,4173 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""Generated message classes for dataflow version v1b3.
-
-Develops and executes data processing patterns like ETL, batch computation,
-and continuous computation.
-"""
-# NOTE: This file is autogenerated and should not be edited by hand.
-
-from apitools.base.protorpclite import messages as _messages
-from apitools.base.py import encoding
-from apitools.base.py import extra_types
-
-
-package = 'dataflow'
-
-
-class ApproximateProgress(_messages.Message):
-  """Obsolete in favor of ApproximateReportedProgress and
-  ApproximateSplitRequest.
-
-  Fields:
-percentComplete: Obsolete.
-position: Obsolete.
-remainingTime: Obsolete.
-  """
-
-  percentComplete = _messages.FloatField(1, variant=_messages.Variant.FLOAT)
-  position = _messages.MessageField('Position', 2)
-  remainingTime = _messages.StringField(3)
-
-
-class ApproximateReportedProgress(_messages.Message):
-  """A progress measurement of a WorkItem by a worker.
-
-  Fields:
-consumedParallelism: Total amount of parallelism in the portion of input
-  of this task that has already been consumed and is no longer active. In
-  the first two examples above (see remaining_parallelism), the value
-  should be 29 or 2 respectively.  The sum of remaining_parallelism and
-  consumed_parallelism should equal the total amount of parallelism in
-  this work item.  If specified, must be finite.
-fractionConsumed: Completion as fraction of the input consumed, from 0.0
-  (beginning, nothing consumed), to 1.0 (end of the input, entire input
-  consumed).
-position: A Position within the work to represent a progress.
-remainingParallelism: Total amount of parallelism in the input of this
-  task that remains, (i.e. can be delegated to this task and any new tasks
-  via dynamic splitting). Always at least 1 for non-finished work items
-  and 0 for finished.  "Amount of parallelism" refers to how many non-
-  empty parts of the input can be read in parallel. This does not
-  necessarily equal number of records. An input that can be read in
-  parallel down to the individual records is called "perfectly
-  splittable". An example of non-perfectly parallelizable input is a
-  block-compressed file format where a block of records has to be read as
-  a whole, but different blocks can be read in parallel.  Examples: * If
-  we are processing record #30 (starting at 1) out of 50 in a perfectly
-  splittable 50-record input, this value should be 21 (20 remaining + 1
-  current). * If we are reading through block 3 in a block-compressed file
-  consisting   of 5 blocks, this value should be 3 (since blocks 4 and 5
-  can be   processed in parallel by new tasks via dynamic splitting and
-  the current   task remains processing block 3). * If we are reading
-  through the last block in a block-compressed file,   or reading or
-  processing the last record in a perfectly splittable   input, this value
-  should be 1, because apart from the current task, no   additional
-  remainder can be split off.
-  """
-
-  consumedParallelism = _messages.MessageField('ReportedParallelism', 1)
-  fractionConsumed = _messages.FloatField(2)
-  position = _messages.MessageField('Position', 3)
-  remainingParallelism = _messages.MessageField('ReportedParallelism', 4)
-
-
-class ApproximateSplitRequest(_messages.Message):
-  """A suggestion by the service to the worker to dynamically split the
-  WorkItem.
-
-  Fields:
-

[16/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py
--
diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py
new file mode 100644
index 000..1b46d91
--- /dev/null
+++ b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py
@@ -0,0 +1,1039 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Generated client library for storage version v1."""
+# NOTE: This file is autogenerated and should not be edited by hand.
+from apitools.base.py import base_api
+
+from apache_beam.io.gcp.internal.clients.storage import storage_v1_messages as messages
+
+
+class StorageV1(base_api.BaseApiClient):
+  """Generated client library for service storage version v1."""
+
+  MESSAGES_MODULE = messages
+
+  _PACKAGE = u'storage'
+  _SCOPES = [u'https://www.googleapis.com/auth/cloud-platform', u'https://www.googleapis.com/auth/cloud-platform.read-only', u'https://www.googleapis.com/auth/devstorage.full_control', u'https://www.googleapis.com/auth/devstorage.read_only', u'https://www.googleapis.com/auth/devstorage.read_write']
+  _VERSION = u'v1'
+  _CLIENT_ID = '1042881264118.apps.googleusercontent.com'
+  _CLIENT_SECRET = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _USER_AGENT = 'x_Tw5K8nnjoRAqULM9PFAC2b'
+  _CLIENT_CLASS_NAME = u'StorageV1'
+  _URL_VERSION = u'v1'
+  _API_KEY = None
+
+  def __init__(self, url='', credentials=None,
+   get_credentials=True, http=None, model=None,
+   log_request=False, log_response=False,
+   credentials_args=None, default_global_params=None,
+   additional_http_headers=None):
+"""Create a new storage handle."""
+url = url or u'https://www.googleapis.com/storage/v1/'
+super(StorageV1, self).__init__(
+url, credentials=credentials,
+get_credentials=get_credentials, http=http, model=model,
+log_request=log_request, log_response=log_response,
+credentials_args=credentials_args,
+default_global_params=default_global_params,
+additional_http_headers=additional_http_headers)
+self.bucketAccessControls = self.BucketAccessControlsService(self)
+self.buckets = self.BucketsService(self)
+self.channels = self.ChannelsService(self)
+    self.defaultObjectAccessControls = self.DefaultObjectAccessControlsService(self)
+self.objectAccessControls = self.ObjectAccessControlsService(self)
+self.objects = self.ObjectsService(self)
+
+  class BucketAccessControlsService(base_api.BaseApiService):
+"""Service class for the bucketAccessControls resource."""
+
+_NAME = u'bucketAccessControls'
+
+def __init__(self, client):
+  super(StorageV1.BucketAccessControlsService, self).__init__(client)
+  self._method_configs = {
+  'Delete': base_api.ApiMethodInfo(
+  http_method=u'DELETE',
+  method_id=u'storage.bucketAccessControls.delete',
+  ordered_params=[u'bucket', u'entity'],
+  path_params=[u'bucket', u'entity'],
+  query_params=[],
+  relative_path=u'b/{bucket}/acl/{entity}',
+  request_field='',
+  request_type_name=u'StorageBucketAccessControlsDeleteRequest',
+  response_type_name=u'StorageBucketAccessControlsDeleteResponse',
+  supports_download=False,
+  ),
+  'Get': base_api.ApiMethodInfo(
+  http_method=u'GET',
+  method_id=u'storage.bucketAccessControls.get',
+  ordered_params=[u'bucket', u'entity'],
+  path_params=[u'bucket', u'entity'],
+  query_params=[],
+  relative_path=u'b/{bucket}/acl/{entity}',
+  request_field='',
+  request_type_name=u'StorageBucketAccessControlsGetRequest',
+  response_type_name=u'BucketAccessControl',
+  supports_download=False,
+  ),
+  'Insert': base_api.ApiMethodInfo(
+  http_method=u'POST',
+  
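A hedged sketch against the service above; the request and response message
classes come from storage_v1_messages, and their re-export through the
package __init__ is an assumption:

  from apache_beam.io.gcp.internal.clients import storage

  client = storage.StorageV1()  # assumes application-default credentials
  request = storage.StorageBucketAccessControlsGetRequest(
      bucket='my-bucket', entity='allUsers')
  # Per the 'Get' ApiMethodInfo above, returns a BucketAccessControl.
  acl = client.bucketAccessControls.Get(request)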

[15/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py
--
diff --git a/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py
new file mode 100644
index 000..dc9e5e6
--- /dev/null
+++ b/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py
@@ -0,0 +1,1920 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Generated message classes for storage version v1.
+
+Stores and retrieves potentially large, immutable data objects.
+"""
+# NOTE: This file is autogenerated and should not be edited by hand.
+
+from apitools.base.protorpclite import message_types as _message_types
+from apitools.base.protorpclite import messages as _messages
+from apitools.base.py import encoding
+from apitools.base.py import extra_types
+
+
+package = 'storage'
+
+
+class Bucket(_messages.Message):
+  """A bucket.
+
+  Messages:
+CorsValueListEntry: A CorsValueListEntry object.
+LifecycleValue: The bucket's lifecycle configuration. See lifecycle
+  management for more information.
+LoggingValue: The bucket's logging configuration, which defines the
+  destination bucket and optional name prefix for the current bucket's
+  logs.
+OwnerValue: The owner of the bucket. This is always the project team's
+  owner group.
+VersioningValue: The bucket's versioning configuration.
+WebsiteValue: The bucket's website configuration.
+
+  Fields:
+acl: Access controls on the bucket.
+cors: The bucket's Cross-Origin Resource Sharing (CORS) configuration.
+defaultObjectAcl: Default access controls to apply to new objects when no
+  ACL is provided.
+etag: HTTP 1.1 Entity tag for the bucket.
+id: The ID of the bucket.
+kind: The kind of item this is. For buckets, this is always
+  storage#bucket.
+lifecycle: The bucket's lifecycle configuration. See lifecycle management
+  for more information.
+location: The location of the bucket. Object data for objects in the
+  bucket resides in physical storage within this region. Defaults to US.
+  See the developer's guide for the authoritative list.
+logging: The bucket's logging configuration, which defines the destination
+  bucket and optional name prefix for the current bucket's logs.
+metageneration: The metadata generation of this bucket.
+name: The name of the bucket.
+owner: The owner of the bucket. This is always the project team's owner
+  group.
+projectNumber: The project number of the project the bucket belongs to.
+selfLink: The URI of this bucket.
+storageClass: The bucket's storage class. This defines how objects in the
+  bucket are stored and determines the SLA and the cost of storage. Values
+  include STANDARD, NEARLINE and DURABLE_REDUCED_AVAILABILITY. Defaults to
+  STANDARD. For more information, see storage classes.
+timeCreated: The creation time of the bucket in RFC 3339 format.
+updated: The modification time of the bucket in RFC 3339 format.
+versioning: The bucket's versioning configuration.
+website: The bucket's website configuration.
+  """
+
+  class CorsValueListEntry(_messages.Message):
+"""A CorsValueListEntry object.
+
+Fields:
+  maxAgeSeconds: The value, in seconds, to return in the  Access-Control-
+Max-Age header used in preflight responses.
+  method: The list of HTTP methods on which to include CORS response
+headers, (GET, OPTIONS, POST, etc) Note: "*" is permitted in the list
+of methods, and means "any method".
+  origin: The list of Origins eligible to receive CORS response headers.
+Note: "*" is permitted in the list of origins, and means "any Origin".
+  responseHeader: The list of HTTP headers other than the simple response
+headers to give permission for the user-agent to share across domains.
+"""
+
+maxAgeSeconds = _messages.IntegerField(1, variant=_messages.Variant.INT32)
+method = 

[21/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
Rename google_cloud_dataflow and google_cloud_platform


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/59ad58ac
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/59ad58ac
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/59ad58ac

Branch: refs/heads/master
Commit: 59ad58ac530aac9f0d4b81ac08c8ca2e3be4f1dd
Parents: cad84c8
Author: Sourabh Bajaj 
Authored: Wed Feb 22 15:28:38 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 16:30:19 2017 -0800

--
 .../examples/cookbook/bigquery_schema.py|2 +-
 .../examples/cookbook/datastore_wordcount.py|4 +-
 .../apache_beam/examples/snippets/snippets.py   |4 +-
 .../python/apache_beam/internal/gcp/__init__.py |   16 +
 .../apache_beam/internal/gcp/json_value.py  |  147 +
 .../apache_beam/internal/gcp/json_value_test.py |   93 +
 .../internal/google_cloud_platform/__init__.py  |   16 -
 .../google_cloud_platform/json_value.py |  147 -
 .../google_cloud_platform/json_value_test.py|   93 -
 sdks/python/apache_beam/io/__init__.py  |4 +-
 sdks/python/apache_beam/io/fileio.py|2 +-
 sdks/python/apache_beam/io/gcp/__init__.py  |   16 +
 sdks/python/apache_beam/io/gcp/bigquery.py  | 1081 +
 sdks/python/apache_beam/io/gcp/bigquery_test.py |  828 
 .../apache_beam/io/gcp/datastore/__init__.py|   16 +
 .../apache_beam/io/gcp/datastore/v1/__init__.py |   16 +
 .../io/gcp/datastore/v1/datastoreio.py  |  397 ++
 .../io/gcp/datastore/v1/datastoreio_test.py |  245 +
 .../io/gcp/datastore/v1/fake_datastore.py   |   98 +
 .../apache_beam/io/gcp/datastore/v1/helper.py   |  274 ++
 .../io/gcp/datastore/v1/helper_test.py  |  265 ++
 .../io/gcp/datastore/v1/query_splitter.py   |  275 ++
 .../io/gcp/datastore/v1/query_splitter_test.py  |  208 +
 sdks/python/apache_beam/io/gcp/gcsio.py |  871 
 sdks/python/apache_beam/io/gcp/gcsio_test.py|  796 
 .../apache_beam/io/gcp/internal/__init__.py |   16 +
 .../io/gcp/internal/clients/__init__.py |   16 +
 .../gcp/internal/clients/bigquery/__init__.py   |   33 +
 .../clients/bigquery/bigquery_v2_client.py  |  660 +++
 .../clients/bigquery/bigquery_v2_messages.py| 1910 
 .../io/gcp/internal/clients/storage/__init__.py |   33 +
 .../clients/storage/storage_v1_client.py| 1039 +
 .../clients/storage/storage_v1_messages.py  | 1920 
 sdks/python/apache_beam/io/gcp/pubsub.py|   91 +
 sdks/python/apache_beam/io/gcp/pubsub_test.py   |   63 +
 .../io/google_cloud_platform/__init__.py|   16 -
 .../io/google_cloud_platform/bigquery.py| 1081 -
 .../io/google_cloud_platform/bigquery_test.py   |  828 
 .../google_cloud_platform/datastore/__init__.py |   16 -
 .../datastore/v1/__init__.py|   16 -
 .../datastore/v1/datastoreio.py |  397 --
 .../datastore/v1/datastoreio_test.py|  245 -
 .../datastore/v1/fake_datastore.py  |   98 -
 .../datastore/v1/helper.py  |  274 --
 .../datastore/v1/helper_test.py |  265 --
 .../datastore/v1/query_splitter.py  |  275 --
 .../datastore/v1/query_splitter_test.py |  208 -
 .../io/google_cloud_platform/gcsio.py   |  871 
 .../io/google_cloud_platform/gcsio_test.py  |  796 
 .../google_cloud_platform/internal/__init__.py  |   16 -
 .../internal/clients/__init__.py|   16 -
 .../internal/clients/bigquery/__init__.py   |   33 -
 .../clients/bigquery/bigquery_v2_client.py  |  660 ---
 .../clients/bigquery/bigquery_v2_messages.py| 1910 
 .../internal/clients/storage/__init__.py|   33 -
 .../clients/storage/storage_v1_client.py| 1039 -
 .../clients/storage/storage_v1_messages.py  | 1920 
 .../io/google_cloud_platform/pubsub.py  |   91 -
 .../io/google_cloud_platform/pubsub_test.py |   63 -
 sdks/python/apache_beam/io/iobase.py|4 +-
 sdks/python/apache_beam/pipeline_test.py|2 +-
 sdks/python/apache_beam/runners/__init__.py |4 +-
 .../apache_beam/runners/dataflow/__init__.py|   16 +
 .../runners/dataflow/dataflow_metrics.py|   33 +
 .../runners/dataflow/dataflow_metrics_test.py   |   20 +
 .../runners/dataflow/dataflow_runner.py |  724 +++
 .../runners/dataflow/dataflow_runner_test.py|   78 +
 .../runners/dataflow/internal/__init__.py   |   16 +
 .../runners/dataflow/internal/apiclient.py  |  726 +++
 .../runners/dataflow/internal/apiclient_test.py |   96 +
 .../dataflow/internal/clients/__init__.py   |   16 +
 .../internal/clients/dataflow/__init__.py   |   33 +
 .../clients/dataflow/dataflow_v1b3_client.py|  684 +++
 
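For user code, the practical effect of the diffstat above is an import-path
change, for example:

  # before this commit
  from apache_beam.io.google_cloud_platform import gcsio

  # after this commit
  from apache_beam.io.gcp import gcsio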

[10/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_messages.py
--
diff --git a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_messages.py b/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_messages.py
deleted file mode 100644
index 4045428..000
--- a/sdks/python/apache_beam/io/google_cloud_platform/internal/clients/bigquery/bigquery_v2_messages.py
+++ /dev/null
@@ -1,1910 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-"""Generated message classes for bigquery version v2.
-
-A data platform for customers to create, manage, share and query data.
-"""
-# NOTE: This file is autogenerated and should not be edited by hand.
-
-from apitools.base.protorpclite import messages as _messages
-from apitools.base.py import encoding
-from apitools.base.py import extra_types
-
-
-package = 'bigquery'
-
-
-class BigqueryDatasetsDeleteRequest(_messages.Message):
-  """A BigqueryDatasetsDeleteRequest object.
-
-  Fields:
-datasetId: Dataset ID of dataset being deleted
-deleteContents: If True, delete all the tables in the dataset. If False
-  and the dataset contains tables, the request will fail. Default is False
-projectId: Project ID of the dataset being deleted
-  """
-
-  datasetId = _messages.StringField(1, required=True)
-  deleteContents = _messages.BooleanField(2)
-  projectId = _messages.StringField(3, required=True)
-
-
-class BigqueryDatasetsDeleteResponse(_messages.Message):
-  """An empty BigqueryDatasetsDelete response."""
-
-
-class BigqueryDatasetsGetRequest(_messages.Message):
-  """A BigqueryDatasetsGetRequest object.
-
-  Fields:
-datasetId: Dataset ID of the requested dataset
-projectId: Project ID of the requested dataset
-  """
-
-  datasetId = _messages.StringField(1, required=True)
-  projectId = _messages.StringField(2, required=True)
-
-
-class BigqueryDatasetsInsertRequest(_messages.Message):
-  """A BigqueryDatasetsInsertRequest object.
-
-  Fields:
-dataset: A Dataset resource to be passed as the request body.
-projectId: Project ID of the new dataset
-  """
-
-  dataset = _messages.MessageField('Dataset', 1)
-  projectId = _messages.StringField(2, required=True)
-
-
-class BigqueryDatasetsListRequest(_messages.Message):
-  """A BigqueryDatasetsListRequest object.
-
-  Fields:
-all: Whether to list all datasets, including hidden ones
-maxResults: The maximum number of results to return
-pageToken: Page token, returned by a previous call, to request the next
-  page of results
-projectId: Project ID of the datasets to be listed
-  """
-
-  all = _messages.BooleanField(1)
-  maxResults = _messages.IntegerField(2, variant=_messages.Variant.UINT32)
-  pageToken = _messages.StringField(3)
-  projectId = _messages.StringField(4, required=True)
-
-
-class BigqueryDatasetsPatchRequest(_messages.Message):
-  """A BigqueryDatasetsPatchRequest object.
-
-  Fields:
-dataset: A Dataset resource to be passed as the request body.
-datasetId: Dataset ID of the dataset being updated
-projectId: Project ID of the dataset being updated
-  """
-
-  dataset = _messages.MessageField('Dataset', 1)
-  datasetId = _messages.StringField(2, required=True)
-  projectId = _messages.StringField(3, required=True)
-
-
-class BigqueryDatasetsUpdateRequest(_messages.Message):
-  """A BigqueryDatasetsUpdateRequest object.
-
-  Fields:
-dataset: A Dataset resource to be passed as the request body.
-datasetId: Dataset ID of the dataset being updated
-projectId: Project ID of the dataset being updated
-  """
-
-  dataset = _messages.MessageField('Dataset', 1)
-  datasetId = _messages.StringField(2, required=True)
-  projectId = _messages.StringField(3, required=True)
-
-
-class BigqueryJobsCancelRequest(_messages.Message):
-  """A BigqueryJobsCancelRequest object.
-
-  Fields:
-jobId: [Required] Job ID of the job to cancel
-projectId: [Required] Project ID of the job to cancel
-  """
-
-  jobId = _messages.StringField(1, required=True)
-  projectId = 

[19/22] beam git commit: Rename google_cloud_dataflow and google_cloud_platform

2017-02-22 Thread altay
http://git-wip-us.apache.org/repos/asf/beam/blob/59ad58ac/sdks/python/apache_beam/io/gcp/datastore/v1/query_splitter.py
--
diff --git a/sdks/python/apache_beam/io/gcp/datastore/v1/query_splitter.py b/sdks/python/apache_beam/io/gcp/datastore/v1/query_splitter.py
new file mode 100644
index 000..8ced170
--- /dev/null
+++ b/sdks/python/apache_beam/io/gcp/datastore/v1/query_splitter.py
@@ -0,0 +1,275 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Implements a Cloud Datastore query splitter."""
+
+from apache_beam.io.gcp.datastore.v1 import helper
+
+# Protect against environments where datastore library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud.proto.datastore.v1 import datastore_pb2
+  from google.cloud.proto.datastore.v1 import query_pb2
+  from google.cloud.proto.datastore.v1.query_pb2 import PropertyFilter
+  from google.cloud.proto.datastore.v1.query_pb2 import CompositeFilter
+  from googledatastore import helper as datastore_helper
+  UNSUPPORTED_OPERATORS = [PropertyFilter.LESS_THAN,
+   PropertyFilter.LESS_THAN_OR_EQUAL,
+   PropertyFilter.GREATER_THAN,
+   PropertyFilter.GREATER_THAN_OR_EQUAL]
+except ImportError:
+  UNSUPPORTED_OPERATORS = None
+# pylint: enable=wrong-import-order, wrong-import-position
+
+
+__all__ = [
+'get_splits',
+]
+
+SCATTER_PROPERTY_NAME = '__scatter__'
+KEY_PROPERTY_NAME = '__key__'
+# The number of keys to sample for each split.
+KEYS_PER_SPLIT = 32
+
+
+def get_splits(datastore, query, num_splits, partition=None):
+  """Returns a list of sharded queries for the given Cloud Datastore query.
+
+  This will create up to the desired number of splits; however, it may return
+  fewer splits if the desired number is unavailable. This will happen
+  if the number of split points provided by the underlying Datastore is less
+  than the desired number, which will occur if the number of results for the
+  query is too small.
+
+  This implementation of the QuerySplitter uses the __scatter__ property to
+  gather random split points for a query.
+
+  Note: This implementation is derived from the java query splitter in
+  
https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/java/datastore/src/main/java/com/google/datastore/v1/client/QuerySplitterImpl.java
+
+  Args:
+datastore: the datastore client.
+query: the query to split.
+num_splits: the desired number of splits.
+partition: the partition the query is running in.
+
+  Returns:
+A list of split queries, of a max length of `num_splits`
+  """
+
+  # Validate that the number of splits is not out of bounds.
+  if num_splits < 1:
+raise ValueError('The number of splits must be greater than 0.')
+
+  if num_splits == 1:
+return [query]
+
+  _validate_query(query)
+
+  splits = []
+  scatter_keys = _get_scatter_keys(datastore, query, num_splits, partition)
+  last_key = None
+  for next_key in _get_split_key(scatter_keys, num_splits):
+splits.append(_create_split(last_key, next_key, query))
+last_key = next_key
+
+  splits.append(_create_split(last_key, None, query))
+  return splits
+
+
+def _validate_query(query):
+  """ Verifies that the given query can be properly scattered."""
+
+  if len(query.kind) != 1:
+raise ValueError('Query must have exactly one kind.')
+
+  if len(query.order) != 0:
+raise ValueError('Query cannot have any sort orders.')
+
+  if query.HasField('limit'):
+raise ValueError('Query cannot have a limit set.')
+
+  if query.offset > 0:
+raise ValueError('Query cannot have an offset set.')
+
+  _validate_filter(query.filter)
+
+
+def _validate_filter(filter):
+  """Validates that we only have allowable filters.
+
+  Note that equality and ancestor filters are allowed, however they may result
+  in inefficient sharding.
+  """
+
+  if filter.HasField('composite_filter'):
+for sub_filter in filter.composite_filter.filters:
+  _validate_filter(sub_filter)
+  elif filter.HasField('property_filter'):
+if filter.property_filter.op in UNSUPPORTED_OPERATORS:
+  raise 

[jira] [Resolved] (BEAM-1407) Support multiple Kafka clients in KafkaIO

2017-02-22 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin resolved BEAM-1407.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

> Support multiple Kafka clients in KafkaIO
> ---
>
> Key: BEAM-1407
> URL: https://issues.apache.org/jira/browse/BEAM-1407
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
> Fix For: 0.6.0
>
>
> Enhance KafkaIO to work with Kafka clients 0.9 and 0.10 (and maybe 0.8 as 
> well):
> 1) to fully leverage new features in each version, like external 
> authentication and message timestamps in 0.10;
> 2) to hide Kafka API changes, supporting a seamless switch between different 
> cluster versions.
> Scope of change:
> 1) add an abstract API for the Kafka consumer in the existing KafkaIO; by 
> default it is the Kafka 0.9 client, to stay compatible, with an option for 
> extension (a sketch of one possible shape follows below);
> 2) a wrapped API for Kafka 0.10, with Kafka message timestamp support;
> 3) a wrapped API for Kafka 0.8 (TBD).
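
A minimal sketch of what such an abstraction could look like (editorial
illustration only; the ConsumerFactory name and its method are hypothetical,
not the API that was ultimately committed):

    import java.io.Serializable;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Hypothetical factory that hides the concrete Kafka client version
    // behind one interface, so KafkaIO can default to the 0.9 client while
    // letting users plug in a 0.10 (or 0.8) implementation.
    interface ConsumerFactory<K, V> extends Serializable {
      Consumer<K, V> createConsumer(Map<String, Object> config);
    }

    // Default implementation pinned to the 0.9-compatible client, matching
    // the issue's "keep compatible by default" requirement.
    class DefaultConsumerFactory<K, V> implements ConsumerFactory<K, V> {
      @Override
      public Consumer<K, V> createConsumer(Map<String, Object> config) {
        return new KafkaConsumer<>(config);
      }
    }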



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (BEAM-1514) change default timestamp in KafkaIO

2017-02-22 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-1514:
-
Comment: was deleted

(was: [~davor], can I have your help on how to set up the multiple-version 
test? I'm not quite familiar with Beam's test settings.
Now I can pass the JUnit tests locally with 'clean verify 
-Dkafka.clients.version=0.10.1.1' and 'clean verify 
-Dkafka.clients.version=0.9.0.1'. Do I need to change .travis.yml as well? )

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When a user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.
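
A hedged sketch of the proposed behavior (the getTimestamp() accessor on
KafkaRecord is an assumption of this sketch; it is what this issue would
effectively require, not an accessor that existed at filing time):

    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.joda.time.Instant;

    // Sketch only: with a 0.10 client, KafkaIO could default its event-time
    // function to the broker-provided record timestamp.
    // KafkaRecord.getTimestamp() is assumed here.
    SerializableFunction<KafkaRecord<byte[], byte[]>, Instant> defaultTimestampFn =
        new SerializableFunction<KafkaRecord<byte[], byte[]>, Instant>() {
          @Override
          public Instant apply(KafkaRecord<byte[], byte[]> record) {
            return new Instant(record.getTimestamp());
          }
        };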



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1537) Report more accurate size estimation for Iterables of unknown length

2017-02-22 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1537:
---

 Summary: Report more accurate size estimation for Iterables of 
unknown length
 Key: BEAM-1537
 URL: https://issues.apache.org/jira/browse/BEAM-1537
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core, sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2031: [BEAM-1497] Refactor Hadoop/HDFS IO

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2031


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/4] beam git commit: Add SerializableConfiguration

2017-02-22 Thread dhalperi
Add SerializableConfiguration


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2b1c084e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2b1c084e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2b1c084e

Branch: refs/heads/master
Commit: 2b1c084efeff863e2dee56fc7243b68b38c43d08
Parents: ff208cc
Author: Rafal Wojdyla 
Authored: Thu Feb 16 00:22:41 2017 -0500
Committer: Dan Halperin 
Committed: Wed Feb 22 16:20:14 2017 -0800

--
 .../sdk/io/hdfs/SerializableConfiguration.java  | 67 
 1 file changed, 67 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/2b1c084e/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/SerializableConfiguration.java
--
diff --git 
a/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/SerializableConfiguration.java
 
b/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/SerializableConfiguration.java
new file mode 100644
index 000..f7b4bff
--- /dev/null
+++ 
b/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/SerializableConfiguration.java
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.hdfs;
+
+import java.io.Externalizable;
+import java.io.IOException;
+import java.io.ObjectInput;
+import java.io.ObjectOutput;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.Job;
+
+/**
+ * A wrapper to allow Hadoop {@link Configuration}s to be serialized using 
Java's standard
+ * serialization mechanisms.
+ */
+public class SerializableConfiguration implements Externalizable {
+  private static final long serialVersionUID = 0L;
+
+  private Configuration conf;
+
+  public SerializableConfiguration() {
+  }
+
+  public SerializableConfiguration(Configuration conf) {
+this.conf = conf;
+  }
+
+  public Configuration get() {
+return conf;
+  }
+
+  @Override
+  public void writeExternal(ObjectOutput out) throws IOException {
+out.writeInt(conf.size());
+for (Map.Entry<String, String> entry : conf) {
+  out.writeUTF(entry.getKey());
+  out.writeUTF(entry.getValue());
+}
+  }
+
+  @Override
+  public void readExternal(ObjectInput in) throws IOException, 
ClassNotFoundException {
+conf = new Configuration();
+int size = in.readInt();
+for (int i = 0; i < size; i++) {
+  conf.set(in.readUTF(), in.readUTF());
+}
+  }
+
+}
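
A brief usage sketch (editorial note, not part of the commit): because the
wrapper is Externalizable, a Hadoop Configuration can travel inside any
Java-serialized Beam object, for example a DoFn. The DoFn below is
hypothetical:

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.hadoop.conf.Configuration;

    // Keep a Configuration in a DoFn field via the wrapper so that Java
    // serialization of the DoFn carries the Hadoop settings to the workers.
    class ReadWithConfFn extends DoFn<String, String> {
      private final SerializableConfiguration serializableConf;

      ReadWithConfFn(Configuration conf) {
        this.serializableConf = new SerializableConfiguration(conf);
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        // get() returns the Configuration rehydrated by readExternal().
        Configuration conf = serializableConf.get();
        // ... use conf (e.g. to open a FileSystem), then emit results ...
        c.output(c.element());
      }
    }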



[3/4] beam git commit: Remove obsolete classes

2017-02-22 Thread dhalperi
Remove obsolete classes


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ff208ccd
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ff208ccd
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ff208ccd

Branch: refs/heads/master
Commit: ff208ccd3e4493ce1eb0690f95c27ad270fe191a
Parents: f303b98
Author: Rafal Wojdyla 
Authored: Thu Feb 16 00:22:28 2017 -0500
Committer: Dan Halperin 
Committed: Wed Feb 22 16:20:14 2017 -0800

--
 .../beam/sdk/io/hdfs/AvroHDFSFileSource.java| 142 ---
 .../beam/sdk/io/hdfs/AvroWrapperCoder.java  | 114 ---
 .../SimpleAuthAvroHDFSFileSource.java   |  82 ---
 .../hdfs/simpleauth/SimpleAuthHDFSFileSink.java | 131 -
 .../simpleauth/SimpleAuthHDFSFileSource.java| 117 ---
 .../sdk/io/hdfs/simpleauth/package-info.java|  22 ---
 .../beam/sdk/io/hdfs/AvroWrapperCoderTest.java  |  50 ---
 7 files changed, 658 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ff208ccd/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/AvroHDFSFileSource.java
--
diff --git 
a/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/AvroHDFSFileSource.java
 
b/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/AvroHDFSFileSource.java
deleted file mode 100644
index 92fe5a6..000
--- 
a/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/AvroHDFSFileSource.java
+++ /dev/null
@@ -1,142 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io.hdfs;
-
-import com.google.common.base.Function;
-import com.google.common.collect.ImmutableList;
-import com.google.common.collect.Lists;
-import java.io.IOException;
-import java.util.List;
-import javax.annotation.Nullable;
-import org.apache.avro.Schema;
-import org.apache.avro.mapred.AvroKey;
-import org.apache.avro.mapreduce.AvroJob;
-import org.apache.avro.mapreduce.AvroKeyInputFormat;
-import org.apache.beam.sdk.coders.AvroCoder;
-import org.apache.beam.sdk.coders.Coder;
-import org.apache.beam.sdk.coders.KvCoder;
-import org.apache.beam.sdk.io.BoundedSource;
-import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.util.CoderUtils;
-import org.apache.beam.sdk.values.KV;
-import org.apache.hadoop.io.NullWritable;
-import org.apache.hadoop.mapreduce.InputSplit;
-import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
-
-/**
- * A {@code BoundedSource} for reading Avro files resident in a Hadoop 
filesystem.
- *
- * @param <T> The type of the Avro records to be read from the source.
- */
-public class AvroHDFSFileSource<T> extends HDFSFileSource<AvroKey<T>, 
NullWritable> {
-  private static final long serialVersionUID = 0L;
-
-  protected final AvroCoder<T> avroCoder;
-  private final String schemaStr;
-
-  public AvroHDFSFileSource(String filepattern, AvroCoder<T> avroCoder) {
-this(filepattern, avroCoder, null);
-  }
-
-  public AvroHDFSFileSource(String filepattern,
-AvroCoder<T> avroCoder,
-HDFSFileSource.SerializableSplit 
serializableSplit) {
-super(filepattern,
-ClassUtil.castClass(AvroKeyInputFormat.class),
-ClassUtil.castClass(AvroKey.class),
-NullWritable.class, serializableSplit);
-this.avroCoder = avroCoder;
-this.schemaStr = avroCoder.getSchema().toString();
-  }
-
-  @Override
-  public Coder<KV<AvroKey<T>, NullWritable>> getDefaultOutputCoder() {
-AvroWrapperCoder<AvroKey<T>, T> keyCoder = 
AvroWrapperCoder.of(this.getKeyClass(), avroCoder);
-return KvCoder.of(keyCoder, WritableCoder.of(NullWritable.class));
-  }
-
-  @Override
-  public List<? extends BoundedSource<KV<AvroKey<T>, NullWritable>>> 
splitIntoBundles(
-  long desiredBundleSizeBytes, PipelineOptions options) throws Exception {
-if (serializableSplit == null) {
-  return Lists.transform(computeSplits(desiredBundleSizeBytes),
-  new 

[4/4] beam git commit: This closes #2031

2017-02-22 Thread dhalperi
This closes #2031


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/cad84c88
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/cad84c88
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/cad84c88

Branch: refs/heads/master
Commit: cad84c8802068c699ea88b2094514410484576df
Parents: f303b98 da6d7e1
Author: Dan Halperin 
Authored: Wed Feb 22 16:20:17 2017 -0800
Committer: Dan Halperin 
Committed: Wed Feb 22 16:20:17 2017 -0800

--
 sdks/java/io/hdfs/pom.xml   |  11 +
 .../beam/sdk/io/hdfs/AvroHDFSFileSource.java| 142 -
 .../beam/sdk/io/hdfs/AvroWrapperCoder.java  | 114 
 .../apache/beam/sdk/io/hdfs/HDFSFileSink.java   | 300 ---
 .../apache/beam/sdk/io/hdfs/HDFSFileSource.java | 531 ---
 .../sdk/io/hdfs/SerializableConfiguration.java  |  93 
 .../org/apache/beam/sdk/io/hdfs/UGIHelper.java  |  38 ++
 .../apache/beam/sdk/io/hdfs/WritableCoder.java  |   2 +-
 .../SimpleAuthAvroHDFSFileSource.java   |  82 ---
 .../hdfs/simpleauth/SimpleAuthHDFSFileSink.java | 131 -
 .../simpleauth/SimpleAuthHDFSFileSource.java| 117 
 .../sdk/io/hdfs/simpleauth/package-info.java|  22 -
 .../beam/sdk/io/hdfs/AvroWrapperCoderTest.java  |  50 --
 .../beam/sdk/io/hdfs/HDFSFileSinkTest.java  | 173 ++
 .../beam/sdk/io/hdfs/HDFSFileSourceTest.java|  39 +-
 15 files changed, 905 insertions(+), 940 deletions(-)
--




[1/4] beam git commit: Refactor Hadoop/HDFS IO

2017-02-22 Thread dhalperi
Repository: beam
Updated Branches:
  refs/heads/master f303b9899 -> cad84c880


Refactor Hadoop/HDFS IO


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/da6d7e16
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/da6d7e16
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/da6d7e16

Branch: refs/heads/master
Commit: da6d7e16464cf8ddb1a798872e79f3ff55580c9c
Parents: 2b1c084
Author: Rafal Wojdyla 
Authored: Thu Feb 16 00:56:51 2017 -0500
Committer: Dan Halperin 
Committed: Wed Feb 22 16:20:14 2017 -0800

--
 sdks/java/io/hdfs/pom.xml   |  11 +
 .../apache/beam/sdk/io/hdfs/HDFSFileSink.java   | 300 ---
 .../apache/beam/sdk/io/hdfs/HDFSFileSource.java | 531 ---
 .../sdk/io/hdfs/SerializableConfiguration.java  |  28 +-
 .../org/apache/beam/sdk/io/hdfs/UGIHelper.java  |  38 ++
 .../apache/beam/sdk/io/hdfs/WritableCoder.java  |   2 +-
 .../beam/sdk/io/hdfs/HDFSFileSinkTest.java  | 173 ++
 .../beam/sdk/io/hdfs/HDFSFileSourceTest.java|  39 +-
 8 files changed, 839 insertions(+), 283 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/da6d7e16/sdks/java/io/hdfs/pom.xml
--
diff --git a/sdks/java/io/hdfs/pom.xml b/sdks/java/io/hdfs/pom.xml
index cd6cf4c..f857a22 100644
--- a/sdks/java/io/hdfs/pom.xml
+++ b/sdks/java/io/hdfs/pom.xml
@@ -105,6 +105,17 @@
     </dependency>
 
     <dependency>
+      <groupId>com.google.auto.value</groupId>
+      <artifactId>auto-value</artifactId>
+      <scope>provided</scope>
+    </dependency>
+
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+    </dependency>
+
+    <dependency>
      <groupId>com.google.code.findbugs</groupId>
      <artifactId>jsr305</artifactId>
    </dependency>

http://git-wip-us.apache.org/repos/asf/beam/blob/da6d7e16/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSink.java
--
diff --git 
a/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSink.java 
b/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSink.java
index 6d30307..168bac7 100644
--- 
a/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSink.java
+++ 
b/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSink.java
@@ -17,25 +17,36 @@
  */
 package org.apache.beam.sdk.io.hdfs;
 
+import static com.google.common.base.Preconditions.checkNotNull;
 import static com.google.common.base.Preconditions.checkState;
 
+import com.google.auto.value.AutoValue;
 import com.google.common.collect.Lists;
-import com.google.common.collect.Maps;
 import com.google.common.collect.Sets;
 import java.io.IOException;
-import java.util.Map;
+import java.net.URI;
+import java.security.PrivilegedExceptionAction;
 import java.util.Random;
 import java.util.Set;
+import javax.annotation.Nullable;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.mapred.AvroKey;
+import org.apache.avro.mapreduce.AvroKeyOutputFormat;
+import org.apache.beam.sdk.coders.AvroCoder;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.StringUtf8Coder;
 import org.apache.beam.sdk.io.Sink;
 import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.values.KV;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.JobContext;
 import org.apache.hadoop.mapreduce.JobID;
@@ -46,67 +57,203 @@ import org.apache.hadoop.mapreduce.TaskID;
 import org.apache.hadoop.mapreduce.TaskType;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.hadoop.mapreduce.task.JobContextImpl;
 import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
 
 /**
- * A {@code Sink} for writing records to a Hadoop filesystem using a Hadoop 
file-based output
+ * A {@link Sink} for writing records to a Hadoop filesystem using a Hadoop 
file-based output
  * format.
  *
- * @param <K> The type of keys to be written to the sink.
- * @param <V> The type of values to be written to the sink.
+ * <p>To write a {@link org.apache.beam.sdk.values.PCollection} of elements of 
type T to Hadoop
+ * filesystem use {@link HDFSFileSink#to}, specify the path (this can be any 
Hadoop supported
+ * filesystem: HDFS, S3, GCS etc), the Hadoop {@link FileOutputFormat}, the 
key class K and the
+ * value class V and finally the {@link 

[jira] [Updated] (BEAM-1205) Auto set "enableAbandonedNodeEnforcement" in TestPipeline

2017-02-22 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1205:
--
Labels: backward-incompatible  (was: )

> Auto set "enableAbandonedNodeEnforcement" in TestPipeline
> -
>
> Key: BEAM-1205
> URL: https://issues.apache.org/jira/browse/BEAM-1205
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Stas Levin
>Assignee: Stas Levin
>  Labels: backward-incompatible
> Fix For: 0.6.0
>
>
> At the moment one has to manually set 
> {{enableAbandonedNodeEnforcement(false)}} in tests that do not run the 
> TestPipeline; otherwise one gets an {{AbandonedNodeException}} on account of 
> having nodes that were not run.
> This could probably be auto-detected using the {{RunnableOnService}} and 
> {{NeedsRunner}} annotations, the presence of which indicates that a given 
> test does indeed use a runner.
> Essentially we need to check whether {{RunnableOnService}} / {{NeedsRunner}} 
> are present on a given test and, if so, set 
> {{enableAbandonedNodeEnforcement(true)}}; otherwise set 
> {{enableAbandonedNodeEnforcement(false)}}.
> [~tgroh], [~kenn]
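
For reference, a minimal sketch of the manual opt-out described above, using
the TestPipeline API that already exists:

    import org.apache.beam.sdk.testing.TestPipeline;
    import org.junit.Rule;

    // A test that never calls pipeline.run() must currently disable the
    // enforcement by hand, or it fails with an AbandonedNodeException.
    @Rule
    public final transient TestPipeline pipeline =
        TestPipeline.create().enableAbandonedNodeEnforcement(false);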



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2076: [BEAM-1320] Fix warnings in the beam docstrings

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2076


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1320] Fix warnings in the beam docstrings

2017-02-22 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 17ab371f1 -> f303b9899


[BEAM-1320] Fix warnings in the beam docstrings


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d1ac3055
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d1ac3055
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d1ac3055

Branch: refs/heads/master
Commit: d1ac30552aebb596ebc5e981976991d26851f3ea
Parents: 17ab371
Author: Sourabh Bajaj 
Authored: Wed Feb 22 14:25:26 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 14:59:21 2017 -0800

--
 .../examples/complete/autocomplete.py   |  3 ++-
 .../complete/juliaset/juliaset/juliaset.py  |  3 ++-
 sdks/python/apache_beam/io/textio.py| 23 ++--
 sdks/python/generate_pydoc.sh   | 12 ++
 4 files changed, 23 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/d1ac3055/sdks/python/apache_beam/examples/complete/autocomplete.py
--
diff --git a/sdks/python/apache_beam/examples/complete/autocomplete.py 
b/sdks/python/apache_beam/examples/complete/autocomplete.py
index 3f2a7ae..f954ec1 100644
--- a/sdks/python/apache_beam/examples/complete/autocomplete.py
+++ b/sdks/python/apache_beam/examples/complete/autocomplete.py
@@ -78,7 +78,8 @@ class TopPerPrefix(beam.PTransform):
 | beam.combiners.Top.LargestPerKey(self._count))
 
 
-def extract_prefixes((word, count)):
+def extract_prefixes(element):
+  word, count = element
   for k in range(1, len(word) + 1):
 prefix = word[:k]
 yield prefix, (count, word)

http://git-wip-us.apache.org/repos/asf/beam/blob/d1ac3055/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
--
diff --git 
a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py 
b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
index 8e5d5b3..5ff2b78 100644
--- a/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
+++ b/sdks/python/apache_beam/examples/complete/juliaset/juliaset/juliaset.py
@@ -33,8 +33,9 @@ def from_pixel(x, y, n):
   return complex(2.0 * x / n - 1.0, 2.0 * y / n - 1.0)
 
 
-def get_julia_set_point_color((x, y), c, n, max_iterations):
+def get_julia_set_point_color(element, c, n, max_iterations):
   """Given an pixel, convert it into a point in our julia set."""
+  x, y = element
   z = from_pixel(x, y, n)
   for i in xrange(max_iterations):
 if z.real * z.real + z.imag * z.imag > 2.0:

http://git-wip-us.apache.org/repos/asf/beam/blob/d1ac3055/sdks/python/apache_beam/io/textio.py
--
diff --git a/sdks/python/apache_beam/io/textio.py 
b/sdks/python/apache_beam/io/textio.py
index 2ddaf02..19980cb 100644
--- a/sdks/python/apache_beam/io/textio.py
+++ b/sdks/python/apache_beam/io/textio.py
@@ -333,7 +333,7 @@ class ReadFromText(PTransform):
   """A PTransform for reading text files.
 
   Parses a text file as newline-delimited elements, by default assuming
-  UTF-8 encoding. Supports newline delimiters '\n' and '\r\n'.
+  UTF-8 encoding. Supports newline delimiters '\\n' and '\\r\\n'.
 
   This implementation only supports reading text encoded using UTF-8 or ASCII.
   This does not support other encodings such as UTF-16 or UTF-32.
@@ -352,22 +352,21 @@ class ReadFromText(PTransform):
 
 Args:
   file_pattern: The file path to read from as a local file path or a GCS
-gs:// path. The path can contain glob characters (*, ?, and [...]
-sets).
+``gs://`` path. The path can contain glob characters
+``(*, ?, and [...] sets)``.
   min_bundle_size: Minimum size of bundles that should be generated when
-   splitting this source into bundles. See
-   ``FileBasedSource`` for more details.
+splitting this source into bundles. See ``FileBasedSource`` for more
+details.
   compression_type: Used to handle compressed input files. Typical value
-  is CompressionTypes.AUTO, in which case the underlying file_path's
-  extension will be used to detect the compression.
+is CompressionTypes.AUTO, in which case the underlying file_path's
+extension will be used to detect the compression.
   strip_trailing_newlines: Indicates whether this source should remove
-   the newline char in each line it reads before
-   decoding that line.
+the newline char in each line it reads before decoding that line.
   validate: flag to verify that the 

[2/2] beam git commit: This closes #2076

2017-02-22 Thread altay
This closes #2076


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f303b989
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f303b989
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f303b989

Branch: refs/heads/master
Commit: f303b98994482393f1dbfc3a42d6b1b98c59fde6
Parents: 17ab371 d1ac305
Author: Ahmet Altay 
Authored: Wed Feb 22 14:59:32 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 14:59:32 2017 -0800

--
 .../examples/complete/autocomplete.py   |  3 ++-
 .../complete/juliaset/juliaset/juliaset.py  |  3 ++-
 sdks/python/apache_beam/io/textio.py| 23 ++--
 sdks/python/generate_pydoc.sh   | 12 ++
 4 files changed, 23 insertions(+), 18 deletions(-)
--




[jira] [Assigned] (BEAM-1535) Build a kubernetes environment for running examples on (direct) Java runner

2017-02-22 Thread Nitin Lamba (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nitin Lamba reassigned BEAM-1535:
-

Assignee: Nitin Lamba  (was: Frances Perry)

> Build a kubernetes environment for running examples on (direct) Java runner 
> 
>
> Key: BEAM-1535
> URL: https://issues.apache.org/jira/browse/BEAM-1535
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-java
>Reporter: Nitin Lamba
>Assignee: Nitin Lamba
>
> This will help get examples set up and deployed quickly in a multi-node 
> cluster.
> This tutorial can be used as a guideline:
> https://tensorflow.github.io/serving/serving_inception



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1535) Build a kubernetes environment for running examples on (direct) Java runner

2017-02-22 Thread Nitin Lamba (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879493#comment-15879493
 ] 

Nitin Lamba commented on BEAM-1535:
---

I can take this up; planning to add docs around it as well. Thanks

> Build a kubernetes environment for running examples on (direct) Java runner 
> 
>
> Key: BEAM-1535
> URL: https://issues.apache.org/jira/browse/BEAM-1535
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-java
>Reporter: Nitin Lamba
>Assignee: Frances Perry
>
> This will help get examples set up and deployed quickly in a multi-node 
> cluster.
> This tutorial can be used as a guideline:
> https://tensorflow.github.io/serving/serving_inception



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam-site pull request #161: Fix broken links due to directory renames in Be...

2017-02-22 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam-site/pull/161

Fix broken links due to directory renames in Beam

Ref: https://github.com/apache/beam/pull/2079

R: @aaltay PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam-site sb_fix_links

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/161.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #161


commit f5d8735bdeb30c3f317fc5851d7425de2c5c4bcb
Author: Sourabh Bajaj 
Date:   2017-02-22T23:34:30Z

Fix broken links




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1218) De-Googlify Python SDK

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879465#comment-15879465
 ] 

ASF GitHub Bot commented on BEAM-1218:
--

GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2079

[BEAM-1218] Rename google_cloud_dataflow and google_cloud_platform

R: @aaltay PTAL

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-1218-rename-directories

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2079


commit 17040b671d3e435bf4c1033d672972c133e9dc68
Author: Sourabh Bajaj 
Date:   2017-02-22T23:28:38Z

Rename google_cloud_dataflow and google_cloud_platform




> De-Googlify Python SDK
> --
>
> Key: BEAM-1218
> URL: https://issues.apache.org/jira/browse/BEAM-1218
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Mark Liu
>Assignee: Ahmet Altay
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1530) BigQueryIO should support value-dependent windows

2017-02-22 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1530:
--

Assignee: Reuven Lax  (was: Davor Bonaci)

> BigQueryIO should support value-dependent windows
> -
>
> Key: BEAM-1530
> URL: https://issues.apache.org/jira/browse/BEAM-1530
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2079: [BEAM-1218] Rename google_cloud_dataflow and google...

2017-02-22 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2079

[BEAM-1218] Rename google_cloud_dataflow and google_cloud_platform

R: @aaltay PTAL

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-1218-rename-directories

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2079


commit 17040b671d3e435bf4c1033d672972c133e9dc68
Author: Sourabh Bajaj 
Date:   2017-02-22T23:28:38Z

Rename google_cloud_dataflow and google_cloud_platform




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #2078: [BEAM-646] Specialize ISM View Translations

2017-02-22 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2078

[BEAM-646] Specialize ISM View Translations

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
These translations have a different coder than the original view, which
is required on the worker side to be transmitted as part of the
Singleton Output, and are generally not compatible with the replacements
performed during Pipeline Surgery. Specializing this translation allows
the updated coders to be set on the output view based on the overriding
transform, rather than the output node, as the view needs to remain
consistent.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam ism_coder_specialized_override

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2078


commit 519e8f2495f9463119f190a29e53c8398cb7ec7e
Author: Thomas Groh 
Date:   2017-02-22T22:59:08Z

Specialize ISM View Translations

These translations have a different coder than the original view, which
is required on the worker side to be transmitted as part of the
Singleton Output, and are generally not compatible with the replacements
performed during Pipeline Surgery. Specializing this translation allows
the updated coders to be set on the output view based on the overriding
transform, rather than the output node, as the view needs to remain
consistent.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-646) Get runners out of the apply()

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879423#comment-15879423
 ] 

ASF GitHub Bot commented on BEAM-646:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2078

[BEAM-646] Specialize ISM View Translations

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
These translations have a different coder than the original view, which
is required on the worker side to be transmitted as part of the
Singleton Output, and are generally not compatible with the replacements
performed during Pipeline Surgery. Specializing this translation allows
the updated coders to be set on the output view based on the overriding
transform, rather than the output node, as the view needs to remain
consistent.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam ism_coder_specialized_override

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2078


commit 519e8f2495f9463119f190a29e53c8398cb7ec7e
Author: Thomas Groh 
Date:   2017-02-22T22:59:08Z

Specialize ISM View Translations

These translations have a different coder than the original view, which
is required on the worker side to be transmitted as part of the
Singleton Output, and are generally not compatible with the replacements
performed during Pipeline Surgery. Specializing this translation allows
the updated coders to be set on the output view based on the overriding
transform, rather than the output node, as the view needs to remain
consistent.




> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>  Labels: backwards-incompatible
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1320) Add sphinx or pydocs documentation for python-sdk

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879411#comment-15879411
 ] 

ASF GitHub Bot commented on BEAM-1320:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/158


> Add sphinx or pydocs documentation for python-sdk
> -
>
> Key: BEAM-1320
> URL: https://issues.apache.org/jira/browse/BEAM-1320
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[3/4] beam-site git commit: Regenerate website

2017-02-22 Thread davor
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/af2f9e19
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/af2f9e19
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/af2f9e19

Branch: refs/heads/asf-site
Commit: af2f9e19da69427ab2e4d5bba862154c6527a726
Parents: 6686137
Author: Davor Bonaci 
Authored: Wed Feb 22 14:53:02 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 14:53:02 2017 -0800

--
 .../2016/03/17/capability-matrix.html   |   6 +-
 .../2016/04/03/presentation-materials.html  |   6 +-
 .../sdk/2016/02/25/python-sdk-now-public.html   |   6 +-
 .../beam/release/2016/06/15/first-release.html  |   6 +-
 .../10/11/strata-hadoop-world-and-beam.html |   6 +-
 .../website/2016/02/22/beam-has-a-logo.html |   6 +-
 .../blog/2016/05/18/splitAtFraction-method.html |   6 +-
 .../05/27/where-is-my-pcollection-dot-map.html  |   6 +-
 .../06/13/flink-batch-runner-milestone.html |   6 +-
 content/blog/2016/08/03/six-months.html |   6 +-
 content/blog/2016/10/20/test-stream.html|   6 +-
 content/blog/2017/01/09/added-apex-runner.html  |   6 +-
 content/blog/2017/01/10/beam-graduates.html |   6 +-
 .../blog/2017/02/01/graduation-media-recap.html |   6 +-
 .../blog/2017/02/13/stateful-processing.html|   6 +-
 content/blog/index.html |   6 +-
 content/coming-soon.html|   6 +-
 .../contribute/contribution-guide/index.html|   6 +-
 content/contribute/design-principles/index.html |   6 +-
 content/contribute/index.html   |   6 +-
 content/contribute/logos/index.html |   6 +-
 content/contribute/maturity-model/index.html|   6 +-
 .../presentation-materials/index.html   |   6 +-
 .../ptransform-style-guide/index.html   |   6 +-
 content/contribute/release-guide/index.html |  34 +++-
 content/contribute/source-repository/index.html |   6 +-
 content/contribute/team/index.html  |   6 +-
 content/contribute/testing/index.html   |   6 +-
 content/contribute/work-in-progress/index.html  |   6 +-
 content/documentation/index.html|   6 +-
 .../pipelines/create-your-pipeline/index.html   |   6 +-
 .../pipelines/design-your-pipeline/index.html   |   6 +-
 .../pipelines/test-your-pipeline/index.html |   6 +-
 .../documentation/programming-guide/index.html  |   6 +-
 content/documentation/resources/index.html  |   6 +-
 content/documentation/runners/apex/index.html   |   6 +-
 .../runners/capability-matrix/index.html|   6 +-
 .../documentation/runners/dataflow/index.html   |   6 +-
 content/documentation/runners/direct/index.html |   6 +-
 content/documentation/runners/flink/index.html  |   6 +-
 content/documentation/runners/spark/index.html  |   6 +-
 content/documentation/sdks/java/index.html  |   8 +-
 .../documentation/sdks/pydoc/current/index.html | 183 +++
 content/documentation/sdks/pydoc/index.html | 183 +++
 .../sdks/python-custom-io/index.html|   6 +-
 .../python-pipeline-dependencies/index.html |   6 +-
 .../sdks/python-type-safety/index.html  |   6 +-
 content/documentation/sdks/python/index.html|  12 +-
 content/get-started/beam-overview/index.html|   6 +-
 content/get-started/downloads/index.html|   6 +-
 content/get-started/index.html  |   6 +-
 .../mobile-gaming-example/index.html|   6 +-
 content/get-started/quickstart-java/index.html  |   6 +-
 content/get-started/quickstart-py/index.html|   6 +-
 content/get-started/support/index.html  |   6 +-
 .../get-started/wordcount-example/index.html|   6 +-
 content/index.html  |   6 +-
 content/privacy_policy/index.html   |   6 +-
 58 files changed, 674 insertions(+), 64 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/af2f9e19/content/beam/capability/2016/03/17/capability-matrix.html
--
diff --git a/content/beam/capability/2016/03/17/capability-matrix.html 
b/content/beam/capability/2016/03/17/capability-matrix.html
index 0cbe40d..9ec9145 100644
--- a/content/beam/capability/2016/03/17/capability-matrix.html
+++ b/content/beam/capability/2016/03/17/capability-matrix.html
@@ -84,6 +84,10 @@
  alt="External link.">
 
 Python SDK
+
  
  Runners
  Capability Matrix
@@ -108,7 +112,7 @@
 Technical References
 Design Principles
  Ongoing 
Projects
-Source 
Repository  
+

[jira] [Commented] (BEAM-1047) DataflowRunner: support regionalization.

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879404#comment-15879404
 ] 

ASF GitHub Bot commented on BEAM-1047:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2074


> DataflowRunner: support regionalization.
> 
>
> Key: BEAM-1047
> URL: https://issues.apache.org/jira/browse/BEAM-1047
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Pei He
>Assignee: Daniel Halperin
>
> Tracking bug.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/4] beam-site git commit: [BEAM-1320] Update release guide to create pydocs

2017-02-22 Thread davor
Repository: beam-site
Updated Branches:
  refs/heads/asf-site 7937818f5 -> b3d322402


[BEAM-1320] Update release guide to create pydocs


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6686137c
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6686137c
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6686137c

Branch: refs/heads/asf-site
Commit: 6686137c3096004dc77b2078327c78a2723fcf57
Parents: 7937818
Author: Sourabh Bajaj 
Authored: Wed Feb 22 14:15:28 2017 -0800
Committer: Sourabh Bajaj 
Committed: Wed Feb 22 14:15:28 2017 -0800

--
 src/_includes/header.html   |  6 +-
 src/contribute/release-guide.md | 20 +---
 src/documentation/sdks/java.md  |  2 +-
 src/documentation/sdks/pydoc/current.md |  9 +
 src/documentation/sdks/pydoc/index.md   |  9 +
 src/documentation/sdks/python.md|  6 +++---
 6 files changed, 44 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/6686137c/src/_includes/header.html
--
diff --git a/src/_includes/header.html b/src/_includes/header.html
index d44e0dc..a3a54e0 100644
--- a/src/_includes/header.html
+++ b/src/_includes/header.html
@@ -50,6 +50,10 @@
  alt="External link.">
 
 Python 
SDK
+
  
  Runners
  Capability Matrix
@@ -74,7 +78,7 @@
 Technical References
 Design 
Principles
  Ongoing Projects
-Source 
Repository  
+Source 
Repository
 
  Promotion
 Presentation Materials

http://git-wip-us.apache.org/repos/asf/beam-site/blob/6686137c/src/contribute/release-guide.md
--
diff --git a/src/contribute/release-guide.md b/src/contribute/release-guide.md
index 7bba5ee..db0b558 100644
--- a/src/contribute/release-guide.md
+++ b/src/contribute/release-guide.md
@@ -278,6 +278,14 @@ Copy the source release to the dev repository of 
`dist.apache.org`.
 
 1. Verify that files are 
[present](https://dist.apache.org/repos/dist/dev/beam).
 
+### Build the Pydoc API reference
+
+Create the Python SDK documentation using sphinx by running a helper script.
+```
+cd sdks/python && ./generate_pydoc.sh
+```
+By default the Pydoc is generated in `sdks/python/target/docs/_build`. Let 
`${PYDOC_ROOT}` be the absolute path to `_build`.
+
 ### Propose a pull request for website updates
 
 The final step of building the candidate is to propose a website pull request.
@@ -297,13 +305,19 @@ Add the new Javadoc to [SDK API Reference page]({{ 
site.baseurl }}/documentation
 * Set up the necessary git commands to account for the new and deleted files 
from the javadoc.
 * Update the Javadoc link on this page to point to the new version (in 
`src/documentation/sdks/javadoc/current.md`).
 
+ Create Pydoc
+Add the new Pydoc to [SDK API Reference page]({{ site.baseurl 
}}/documentation/sdks/pydoc/) page, as follows:
+
+* Copy the generated Pydoc into the website repository: `cp -r ${PYDOC_ROOT} 
documentation/sdks/pydoc/${VERSION}`.
+* Update the Pydoc link on this page to point to the new version (in 
`src/documentation/sdks/pydoc/current.md`).
+
 Finally, propose a pull request with these changes. (Don’t merge before 
finalizing the release.)
 
 ### Checklist to proceed to the next step
 
 1. Maven artifacts deployed to the staging repository of 
[repository.apache.org](https://repository.apache.org/content/repositories/)
 1. Source distribution deployed to the dev repository of 
[dist.apache.org](https://dist.apache.org/repos/dist/dev/beam/)
-1. Website pull request proposed to list the [release]({{ site.baseurl 
}}/use/releases/) and publish the [API reference manual]({{ site.baseurl 
}}/learn/sdks/javadoc/)
+1. Website pull request proposed to list the [release]({{ site.baseurl 
}}/use/releases/), publish the [Java API reference manual]({{ site.baseurl 
}}/documentation/sdks/javadoc/), and publish the [Python API reference 
manual]({{ site.baseurl }}/documentation/sdks/pydoc/).
 
 **
 
@@ -402,7 +416,7 @@ Create a new Git tag for the released version by copying 
the tag for the final r
 
 ### Merge website pull request
 
-Merge the website pull request to [list the release]({{ site.baseurl 
}}/use/releases/) and publish the [API reference manual]({{ site.baseurl 
}}/learn/sdks/javadoc/) created earlier.
+Merge the website pull request to [list the release]({{ site.baseurl 
}}/use/releases/), publish the [Python API reference manual]({{ site.baseurl 

[GitHub] beam-site pull request #158: [BEAM-1320] Update release guide to create pydo...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/158


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/4] beam-site git commit: Regenerate website

2017-02-22 Thread davor
http://git-wip-us.apache.org/repos/asf/beam-site/blob/af2f9e19/content/documentation/sdks/pydoc/index.html
--
diff --git a/content/documentation/sdks/pydoc/index.html 
b/content/documentation/sdks/pydoc/index.html
new file mode 100644
index 000..a52e0fb
--- /dev/null
+++ b/content/documentation/sdks/pydoc/index.html
@@ -0,0 +1,183 @@
+
+
+
+  
+  
+  
+  
+
+  Apache Beam Pydoc
+  
+
+  
+  
+  https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";>
+  
+  
+  https://beam.apache.org/documentation/sdks/pydoc/; data-proofer-ignore>
+  https://beam.apache.org/feed.xml;>
+  
+
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ga('create', 'UA-73650088-1', 'auto');
+ga('send', 'pageview');
+
+  
+  
+
+
+
+  
+
+
+  
+
+  
+
+  
+  
+Toggle navigation
+
+
+
+  
+
+
+  
+
+ Get Started 
+ 
+ Beam 
Overview
+Quickstart - Java
+Quickstart - Python
+ 
+ Example Walkthroughs
+ WordCount
+ Mobile Gaming
+  
+  Resources
+  Downloads
+  Support
+ 
+   
+
+ Documentation 
+ 
+ Using the 
Documentation
+ 
+ Beam Concepts
+ Programming Guide
+ Additional 
Resources
+ 
+  Pipeline Fundamentals
+  Design Your 
Pipeline
+  Create Your 
Pipeline
+  Test 
Your Pipeline
+  
+ SDKs
+ Java 
SDK
+ Java SDK API Reference 
+
+Python SDK
+
+ 
+ Runners
+ Capability Matrix
+ Direct 
Runner
+ Apache 
Apex Runner
+ Apache 
Flink Runner
+ Apache 
Spark Runner
+ Cloud 
Dataflow Runner
+ 
+   
+
+ Contribute 
+ 
+ Get Started 
Contributing
+
+Guides
+ Contribution Guide
+Testing Guide
+Release Guide
+PTransform Style 
Guide
+
+Technical References
+Design Principles
+ Ongoing 
Projects
+Source Repository
+
+ Promotion
+Presentation 
Materials
+Logos and Design
+
+Maturity Model
+Team
+ 
+   
+
+Blog
+  
+  
+
+  https://www.apache.org/foundation/press/kit/feather_small.png; alt="Apache 
Logo" style="height:24px;">Apache Software Foundation
+  
+http://www.apache.org/;>ASF Homepage
+http://www.apache.org/licenses/;>License
+http://www.apache.org/security/;>Security
+http://www.apache.org/foundation/thanks.html;>Thanks
+http://www.apache.org/foundation/sponsorship.html;>Sponsorship
+https://www.apache.org/foundation/policies/conduct;>Code of 
Conduct
+  
+
+  
+
+  
+
+
+
+
+
+
+
+
+  
+
+
+
+Please watch this space for the upcoming Python SDK release.
+
+  
+
+
+
+  
+  
+  
+  
+ Copyright
+http://www.apache.org;>The Apache Software 
Foundation,
+2017. All Rights Reserved.
+  
+  
+Privacy Policy |
+RSS Feed
+  
+  
+  
+  
+  
+
+
+
+  
+
+

http://git-wip-us.apache.org/repos/asf/beam-site/blob/af2f9e19/content/documentation/sdks/python-custom-io/index.html
--
diff --git a/content/documentation/sdks/python-custom-io/index.html 
b/content/documentation/sdks/python-custom-io/index.html
index 7592a4a..ba448f0 100644
--- a/content/documentation/sdks/python-custom-io/index.html
+++ b/content/documentation/sdks/python-custom-io/index.html
@@ -85,6 +85,10 @@
  alt="External link.">
 
 Python SDK
+
  
  Runners
  Capability Matrix
@@ -109,7 +113,7 @@
 

[4/4] beam-site git commit: This closes #158

2017-02-22 Thread davor
This closes #158


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b3d32240
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b3d32240
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b3d32240

Branch: refs/heads/asf-site
Commit: b3d32240246c845d3ed44150ea0dee061bbfb08d
Parents: 7937818 af2f9e1
Author: Davor Bonaci 
Authored: Wed Feb 22 14:53:02 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 14:53:02 2017 -0800

--
 .../2016/03/17/capability-matrix.html   |   6 +-
 .../2016/04/03/presentation-materials.html  |   6 +-
 .../sdk/2016/02/25/python-sdk-now-public.html   |   6 +-
 .../beam/release/2016/06/15/first-release.html  |   6 +-
 .../10/11/strata-hadoop-world-and-beam.html |   6 +-
 .../website/2016/02/22/beam-has-a-logo.html |   6 +-
 .../blog/2016/05/18/splitAtFraction-method.html |   6 +-
 .../05/27/where-is-my-pcollection-dot-map.html  |   6 +-
 .../06/13/flink-batch-runner-milestone.html |   6 +-
 content/blog/2016/08/03/six-months.html |   6 +-
 content/blog/2016/10/20/test-stream.html|   6 +-
 content/blog/2017/01/09/added-apex-runner.html  |   6 +-
 content/blog/2017/01/10/beam-graduates.html |   6 +-
 .../blog/2017/02/01/graduation-media-recap.html |   6 +-
 .../blog/2017/02/13/stateful-processing.html|   6 +-
 content/blog/index.html |   6 +-
 content/coming-soon.html|   6 +-
 .../contribute/contribution-guide/index.html|   6 +-
 content/contribute/design-principles/index.html |   6 +-
 content/contribute/index.html   |   6 +-
 content/contribute/logos/index.html |   6 +-
 content/contribute/maturity-model/index.html|   6 +-
 .../presentation-materials/index.html   |   6 +-
 .../ptransform-style-guide/index.html   |   6 +-
 content/contribute/release-guide/index.html |  34 +++-
 content/contribute/source-repository/index.html |   6 +-
 content/contribute/team/index.html  |   6 +-
 content/contribute/testing/index.html   |   6 +-
 content/contribute/work-in-progress/index.html  |   6 +-
 content/documentation/index.html|   6 +-
 .../pipelines/create-your-pipeline/index.html   |   6 +-
 .../pipelines/design-your-pipeline/index.html   |   6 +-
 .../pipelines/test-your-pipeline/index.html |   6 +-
 .../documentation/programming-guide/index.html  |   6 +-
 content/documentation/resources/index.html  |   6 +-
 content/documentation/runners/apex/index.html   |   6 +-
 .../runners/capability-matrix/index.html|   6 +-
 .../documentation/runners/dataflow/index.html   |   6 +-
 content/documentation/runners/direct/index.html |   6 +-
 content/documentation/runners/flink/index.html  |   6 +-
 content/documentation/runners/spark/index.html  |   6 +-
 content/documentation/sdks/java/index.html  |   8 +-
 .../documentation/sdks/pydoc/current/index.html | 183 +++
 content/documentation/sdks/pydoc/index.html | 183 +++
 .../sdks/python-custom-io/index.html|   6 +-
 .../python-pipeline-dependencies/index.html |   6 +-
 .../sdks/python-type-safety/index.html  |   6 +-
 content/documentation/sdks/python/index.html|  12 +-
 content/get-started/beam-overview/index.html|   6 +-
 content/get-started/downloads/index.html|   6 +-
 content/get-started/index.html  |   6 +-
 .../mobile-gaming-example/index.html|   6 +-
 content/get-started/quickstart-java/index.html  |   6 +-
 content/get-started/quickstart-py/index.html|   6 +-
 content/get-started/support/index.html  |   6 +-
 .../get-started/wordcount-example/index.html|   6 +-
 content/index.html  |   6 +-
 content/privacy_policy/index.html   |   6 +-
 src/_includes/header.html   |   6 +-
 src/contribute/release-guide.md |  20 +-
 src/documentation/sdks/java.md  |   2 +-
 src/documentation/sdks/pydoc/current.md |   9 +
 src/documentation/sdks/pydoc/index.md   |   9 +
 src/documentation/sdks/python.md|   6 +-
 64 files changed, 718 insertions(+), 72 deletions(-)
--




Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Dataflow #2377

2017-02-22 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2074: [BEAM-1047] Add getRegion to DataflowPipelineOption...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2074


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Add getRegion to the DataflowPipelineOptions

2017-02-22 Thread dhalperi
Repository: beam
Updated Branches:
  refs/heads/master 0f9fefe74 -> 17ab371f1


Add getRegion to the DataflowPipelineOptions


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/26c42b98
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/26c42b98
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/26c42b98

Branch: refs/heads/master
Commit: 26c42b9802ac88d2710071ff30fae2575b84c0ed
Parents: 0f9fefe
Author: Dan Halperin 
Authored: Wed Feb 22 10:00:11 2017 -0800
Committer: Dan Halperin 
Committed: Wed Feb 22 14:47:44 2017 -0800

--
 .../beam/runners/dataflow/DataflowClient.java   |  44 +++---
 .../runners/dataflow/DataflowPipelineJob.java   |   2 +-
 .../options/DataflowPipelineOptions.java|  17 +++
 .../dataflow/DataflowPipelineJobTest.java   | 153 +++
 .../runners/dataflow/DataflowRunnerTest.java|  64 
 5 files changed, 168 insertions(+), 112 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/26c42b98/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java
index 3536d72..dfd1c2b 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java
@@ -20,7 +20,7 @@ package org.apache.beam.runners.dataflow;
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import com.google.api.services.dataflow.Dataflow;
-import com.google.api.services.dataflow.Dataflow.Projects.Jobs;
+import com.google.api.services.dataflow.Dataflow.Projects.Locations.Jobs;
 import com.google.api.services.dataflow.model.Job;
 import com.google.api.services.dataflow.model.JobMetrics;
 import com.google.api.services.dataflow.model.LeaseWorkItemRequest;
@@ -40,15 +40,15 @@ import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
 public class DataflowClient {
 
   public static DataflowClient create(DataflowPipelineOptions options) {
-    return new DataflowClient(options.getDataflowClient(), options.getProject());
+    return new DataflowClient(options.getDataflowClient(), options);
   }
 
   private final Dataflow dataflow;
-  private final String projectId;
+  private final DataflowPipelineOptions options;
 
-  private DataflowClient(Dataflow dataflow, String projectId) {
+  private DataflowClient(Dataflow dataflow, DataflowPipelineOptions options) {
     this.dataflow = checkNotNull(dataflow, "dataflow");
-    this.projectId = checkNotNull(projectId, "options");
+    this.options = checkNotNull(options, "options");
   }
 
   /**
@@ -56,7 +56,8 @@ public class DataflowClient {
    */
   public Job createJob(@Nonnull Job job) throws IOException {
     checkNotNull(job, "job");
-    Jobs.Create jobsCreate = dataflow.projects().jobs().create(projectId, job);
+    Jobs.Create jobsCreate = dataflow.projects().locations().jobs()
+        .create(options.getProject(), options.getRegion(), job);
     return jobsCreate.execute();
   }
 
@@ -65,8 +66,8 @@
    * the {@link DataflowPipelineOptions}.
    */
   public ListJobsResponse listJobs(@Nullable String pageToken) throws IOException {
-    Jobs.List jobsList = dataflow.projects().jobs()
-        .list(projectId)
+    Jobs.List jobsList = dataflow.projects().locations().jobs()
+        .list(options.getProject(), options.getRegion())
         .setPageToken(pageToken);
     return jobsList.execute();
   }
@@ -77,8 +78,8 @@
   public Job updateJob(@Nonnull String jobId, @Nonnull Job content) throws IOException {
     checkNotNull(jobId, "jobId");
     checkNotNull(content, "content");
-    Jobs.Update jobsUpdate = dataflow.projects().jobs()
-        .update(projectId, jobId, content);
+    Jobs.Update jobsUpdate = dataflow.projects().locations().jobs()
+        .update(options.getProject(), options.getRegion(), jobId, content);
     return jobsUpdate.execute();
   }
 
@@ -87,8 +88,8 @@
   */
   public Job getJob(@Nonnull String jobId) throws IOException {
     checkNotNull(jobId, "jobId");
-    Jobs.Get jobsGet = dataflow.projects().jobs()
-        .get(projectId, jobId);
+    Jobs.Get jobsGet = dataflow.projects().locations().jobs()
+        .get(options.getProject(), options.getRegion(), jobId);
     return jobsGet.execute();
   }
 
@@ -97,8 +98,8 @@ public class DataflowClient {
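
As a usage sketch (not part of the commit): with the regional endpoint wired through the options, a pipeline author would select it roughly as below. setRegion is assumed here from Beam's usual getX/setX options pairing alongside the new getRegion, and the region value is purely illustrative.

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RegionOptionSketch {
  public static void main(String[] args) {
    // Parse command-line flags into Dataflow pipeline options.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);

    // Choose the regional endpoint; "us-central1" is only an example value.
    options.setRegion("us-central1");

    // DataflowClient.create(options) now issues create/list/update/get calls
    // via projects().locations().jobs() with this project/region pair.
    System.out.println(options.getRegion());
  }
}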

[2/2] beam git commit: This closes #2074

2017-02-22 Thread dhalperi
This closes #2074


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/17ab371f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/17ab371f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/17ab371f

Branch: refs/heads/master
Commit: 17ab371f13a151d03315889e79e69c06ac77ac1d
Parents: 0f9fefe 26c42b9
Author: Dan Halperin 
Authored: Wed Feb 22 14:47:52 2017 -0800
Committer: Dan Halperin 
Committed: Wed Feb 22 14:47:52 2017 -0800

--
 .../beam/runners/dataflow/DataflowClient.java   |  44 +++---
 .../runners/dataflow/DataflowPipelineJob.java   |   2 +-
 .../options/DataflowPipelineOptions.java|  17 +++
 .../dataflow/DataflowPipelineJobTest.java   | 153 +++
 .../runners/dataflow/DataflowRunnerTest.java|  64 
 5 files changed, 168 insertions(+), 112 deletions(-)
--




[GitHub] beam pull request #2068: Respect type hints for IterableCoder

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2068


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Respect type hints for IterableCoder

2017-02-22 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master e38a4803d -> 0f9fefe74


Respect type hints for IterableCoder


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/37ff8835
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/37ff8835
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/37ff8835

Branch: refs/heads/master
Commit: 37ff8835ef7bc0e988b1c3bbb5d71d7b025f63b2
Parents: e38a480
Author: Vikas Kedigehalli 
Authored: Tue Feb 21 21:50:48 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 14:19:21 2017 -0800

--
 sdks/python/apache_beam/coders/typecoders.py  | 6 +-
 sdks/python/apache_beam/coders/typecoders_test.py | 7 +++
 2 files changed, 8 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/37ff8835/sdks/python/apache_beam/coders/typecoders.py
--
diff --git a/sdks/python/apache_beam/coders/typecoders.py 
b/sdks/python/apache_beam/coders/typecoders.py
index 9bbfedf..767d791 100644
--- a/sdks/python/apache_beam/coders/typecoders.py
+++ b/sdks/python/apache_beam/coders/typecoders.py
@@ -117,11 +117,7 @@ class CoderRegistry(object):
           'Coder registry has no fallback coder. This can happen if the '
           'fast_coders module could not be imported.')
     if isinstance(typehint, typehints.IterableTypeConstraint):
-      # In this case, we suppress the warning message for using the fallback
-      # coder, since Iterable is hinted as the output of a GroupByKey
-      # operation and that direct output will not be coded.
-      # TODO(ccy): refine this behavior.
-      pass
+      return coders.IterableCoder.from_type_hint(typehint, self)
     elif typehint is None:
       # In some old code, None is used for Any.
       # TODO(robertwb): Clean this up.

http://git-wip-us.apache.org/repos/asf/beam/blob/37ff8835/sdks/python/apache_beam/coders/typecoders_test.py
--
diff --git a/sdks/python/apache_beam/coders/typecoders_test.py 
b/sdks/python/apache_beam/coders/typecoders_test.py
index 28c77af..2b6aa7a 100644
--- a/sdks/python/apache_beam/coders/typecoders_test.py
+++ b/sdks/python/apache_beam/coders/typecoders_test.py
@@ -112,6 +112,13 @@ class TypeCodersTest(unittest.TestCase):
         real_coder.encode('abc'), expected_coder.encode('abc'))
     self.assertEqual('abc', real_coder.decode(real_coder.encode('abc')))
 
+  def test_iterable_coder(self):
+    real_coder = typecoders.registry.get_coder(typehints.Iterable[str])
+    expected_coder = coders.IterableCoder(coders.BytesCoder())
+    values = ['abc', 'xyz']
+    self.assertEqual(expected_coder, real_coder)
+    self.assertEqual(real_coder.encode(values), expected_coder.encode(values))
+
 
 if __name__ == '__main__':
   unittest.main()



[2/2] beam git commit: This closes #2075

2017-02-22 Thread altay
This closes #2075


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e38a4803
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e38a4803
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e38a4803

Branch: refs/heads/master
Commit: e38a4803db8058bf230253f983c40615839522e6
Parents: d44f5b8 5c21d08
Author: Ahmet Altay 
Authored: Wed Feb 22 13:56:25 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 22 13:56:25 2017 -0800

--
 sdks/python/apache_beam/coders/standard_coders_test.py | 11 +++
 sdks/python/setup.py   |  1 -
 2 files changed, 7 insertions(+), 5 deletions(-)
--




[jira] [Commented] (BEAM-1517) Garbage collect user state in Flink Runner

2017-02-22 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879359#comment-15879359
 ] 

Kenneth Knowles commented on BEAM-1517:
---

[~aljoscha] that makes sense to me. In both the direct runner and the Dataflow 
runner it was easy to implement "under the hood" while keeping the system timers 
totally separate from the user timers, which is why I didn't produce a shared 
utility like I usually try to do. It shouldn't be much work to build such a 
`StatefulDoFnRunner`, and it will be useful in general.
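
A minimal sketch of the garbage-collection horizon such a shared `StatefulDoFnRunner` utility would compute; `GcHorizon` is a hypothetical helper, not an existing Beam class, and assumes the usual rule that state for a window becomes collectable once the watermark passes the end of the window plus the allowed lateness.

import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Hypothetical helper, not part of the Beam API: the instant at which a
// runner could fire a cleanup timer for a window's user state.
public final class GcHorizon {
  private GcHorizon() {}

  public static Instant of(BoundedWindow window, Duration allowedLateness) {
    // State is collectable once the watermark passes the window's maximum
    // timestamp plus the pipeline's allowed lateness.
    return window.maxTimestamp().plus(allowedLateness);
  }
}

A runner would set one such timer per key/window combination and clear all user state for that key and window namespace when it fires.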

> Garbage collect user state in Flink Runner
> --
>
> Key: BEAM-1517
> URL: https://issues.apache.org/jira/browse/BEAM-1517
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> User facing state/timers in Beam are bound to the key/window of the data. 
> Right now, the Flink Runner does not clean up user state when the watermark 
> passes the GC horizon for the state associated with a given window.
> Neither {{StateInternals}} nor the Flink state API support discarding state 
> for a whole namespace (which is the window in this case) so we might have to 
> manually set a GC timer for each window/key combination, as is done in the 
> {{ReduceFnRunner}}. For this we have to know all states a user can possibly 
> use, which we can get from the {{DoFn}} signature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1320) Add sphinx or pydocs documentation for python-sdk

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879345#comment-15879345
 ] 

ASF GitHub Bot commented on BEAM-1320:
--

GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2076

[BEAM-1320] Fix warnings in the beam docstrings

R: @aaltay PTAL

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam 
BEAM-1320-fix-warnings-docstrings-in-sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2076


commit 95ee9b6d191074e58f7f954838f86fbfe29f95b9
Author: Sourabh Bajaj 
Date:   2017-02-22T22:06:37Z

[BEAM-1320] Fix warnings in the beam docstrings




> Add sphinx or pydocs documentation for python-sdk
> -
>
> Key: BEAM-1320
> URL: https://issues.apache.org/jira/browse/BEAM-1320
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2076: [BEAM-1320] Fix warnings in the beam docstrings

2017-02-22 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2076

[BEAM-1320] Fix warnings in the beam docstrings

R: @aaltay PTAL

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam 
BEAM-1320-fix-warnings-docstrings-in-sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2076


commit 95ee9b6d191074e58f7f954838f86fbfe29f95b9
Author: Sourabh Bajaj 
Date:   2017-02-22T22:06:37Z

[BEAM-1320] Fix warnings in the beam docstrings




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (BEAM-1442) Performance improvement of the Python DirectRunner

2017-02-22 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879034#comment-15879034
 ] 

Pablo Estrada edited comment on BEAM-1442 at 2/22/17 9:43 PM:
--

Hi Haoxiang,
It's great that you find the project interesting. It is a challenging (and 
exciting) project. We want to have a detailed proposal, because as you may 
guess, the project is not easy and we want to help you (or any student) 
understand the DirectRunner well before you are selected.

With this in mind, we suggest you include the following items in the proposal:
(1) Introduction - Introduce the project
(2) Goals, 
(3) Implementation - of a benchmark and the runner improvements.  Be as 
specific and detailed as possible.
(4) Timeline,
(5) Self-introduction - Introduce yourself too.

Feel free to ask questions, or share your train of thought here, and we can 
help you polish the proposal to make it robust - and help you familiarize 
yourself with the DirectRunner.


was (Author: pabloem):
Hi Haoxiang,
It's great that you find the project interesting. It is a challenging -and 
exciting- project. We want to have a detailed proposal, because as you may 
guess, the project is not easy and we want to help you (or any student) 
understand the DirectRunner well before you are selected.

With this in mind, we suggest you include the following items in the proposal:
(1) Introduction - Introduce the project
(2) Goals, 
(3) Implementation - of a benchmark and the runner improvements.  Be as 
specific and detailed as possible.
(4) Timeline,
(5) Self-introduction - Introduce yourself too.

Feel free to ask questions, or share your train of thought here, and we can 
help you polish the proposal to make it robust - and help you familiarize 
yourself with the DirectRunner.

> Performance improvement of the Python DirectRunner
> --
>
> Key: BEAM-1442
> URL: https://issues.apache.org/jira/browse/BEAM-1442
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Ahmet Altay
>  Labels: gsoc2017, mentor, python
>
> The DirectRunner for Python and Java are intended to act as policy enforcers, 
> and correctness checkers for Beam pipelines; but there are users that run 
> data processing tasks in them.
> Currently, the Python Direct Runner has less-than-great performance, although 
> some work has gone into improving it. There are more opportunities for 
> improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline, 
> and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions 
> directly on JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1442) Performance improvement of the Python DirectRunner

2017-02-22 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879034#comment-15879034
 ] 

Pablo Estrada edited comment on BEAM-1442 at 2/22/17 9:43 PM:
--

Hi Haoxiang,
It's great that you find the project interesting. It is a challenging -and 
exciting- project. We want to have a detailed proposal, because as you may 
guess, the project is not easy and we want to help you (or any student) 
understand the DirectRunner well before you are selected.

With this in mind, we suggest you include the following items in the proposal:
(1) Introduction - Introduce the project
(2) Goals, 
(3) Implementation - of a benchmark and the runner improvements.  Be as 
specific and detailed as possible.
(4) Timeline,
(5) Self-introduction - Introduce yourself too.

Feel free to ask questions, or share your train of thought here, and we can 
help you polish the proposal to make it robust - and help you familiarize 
yourself with the DirectRunner.


was (Author: pabloem):
Before we can accept your proposal we have to be confident that you understand 
the existing code and you can expand it safely.
For a proposal you should include:
(1) Introduction - Introduce the project
(2) Goals, 
(3) Implementation - of a benchmark and the runner improvements.  Be as 
specific and detailed as possible. This project is not easy and we need to see 
that you have a good grasp of the different components.
(4) Timeline,
(5) Self-introduction - Introduce yourself too.

> Performance improvement of the Python DirectRunner
> --
>
> Key: BEAM-1442
> URL: https://issues.apache.org/jira/browse/BEAM-1442
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Ahmet Altay
>  Labels: gsoc2017, mentor, python
>
> The DirectRunner for Python and Java are intended to act as policy enforcers, 
> and correctness checkers for Beam pipelines; but there are users that run 
> data processing tasks in them.
> Currently, the Python Direct Runner has less-than-great performance, although 
> some work has gone into improving it. There are more opportunities for 
> improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline, 
> and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions 
> directly on JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam-site pull request #159: Add Apache in front of initial Beam reference

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/159


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam-site git commit: This closes #159

2017-02-22 Thread davor
This closes #159


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7937818f
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7937818f
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7937818f

Branch: refs/heads/asf-site
Commit: 7937818f53d84436d159ac6eb5cd6b4fd58a1979
Parents: eb1974f 77d285f
Author: Davor Bonaci 
Authored: Wed Feb 22 13:33:59 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 13:33:59 2017 -0800

--
 .../sdks/python-custom-io/index.html| 48 ++--
 content/documentation/sdks/python/index.html|  4 +-
 src/documentation/sdks/python-custom-io.md  | 48 ++--
 src/documentation/sdks/python.md|  4 +-
 4 files changed, 52 insertions(+), 52 deletions(-)
--




[1/3] beam-site git commit: Add Apache in front of initial Beam reference

2017-02-22 Thread davor
Repository: beam-site
Updated Branches:
  refs/heads/asf-site eb1974f31 -> 7937818f5


Add Apache in front of initial Beam reference


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/dca566f8
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/dca566f8
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/dca566f8

Branch: refs/heads/asf-site
Commit: dca566f829498d3adfef27ab18a6b978101c
Parents: eb1974f
Author: melissa 
Authored: Fri Feb 17 13:39:45 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 13:33:36 2017 -0800

--
 src/documentation/sdks/python-custom-io.md | 48 -
 src/documentation/sdks/python.md   |  4 +--
 2 files changed, 26 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/dca566f8/src/documentation/sdks/python-custom-io.md
--
diff --git a/src/documentation/sdks/python-custom-io.md 
b/src/documentation/sdks/python-custom-io.md
index b97f01d..ee87e4e 100644
--- a/src/documentation/sdks/python-custom-io.md
+++ b/src/documentation/sdks/python-custom-io.md
@@ -1,26 +1,26 @@
 ---
 layout: default
-title: "Beam Custom Sources and Sinks for Python"
+title: "Apache Beam: Creating New Sources and Sinks with the Python SDK"
 permalink: /documentation/sdks/python-custom-io/
 ---
-# Beam Custom Sources and Sinks for Python
+# Creating New Sources and Sinks with the Python SDK
 
-The Beam SDK for Python provides an extensible API that you can use to create 
custom data sources and sinks. This tutorial shows how to create custom sources 
and sinks using [Beam's Source and Sink 
API](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py).
+The Apache Beam SDK for Python provides an extensible API that you can use to 
create new data sources and sinks. This tutorial shows how to create new 
sources and sinks using [Beam's Source and Sink 
API](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py).
 
-* Create a custom source by extending the `BoundedSource` and `RangeTracker` 
interfaces.
-* Create a custom sink by implementing the `Sink` and `Writer` classes.
+* Create a new source by extending the `BoundedSource` and `RangeTracker` 
interfaces.
+* Create a new sink by implementing the `Sink` and `Writer` classes.
 
 
-## Why Create a Custom Source or Sink
+## Why Create a New Source or Sink
 
-You'll need to create a custom source or sink if you want your pipeline to 
read data from (or write data to) a storage system for which the Beam SDK for 
Python does not provide [native support]({{ site.baseurl 
}}/documentation/programming-guide/#io).
+You'll need to create a new source or sink if you want your pipeline to read 
data from (or write data to) a storage system for which the Beam SDK for Python 
does not provide [native support]({{ site.baseurl 
}}/documentation/programming-guide/#io).
 
-In simple cases, you may not need to create a custom source or sink. For 
example, if you need to read data from an SQL database using an arbitrary 
query, none of the advanced Source API features would benefit you. Likewise, if 
you'd like to write data to a third-party API via a protocol that lacks 
deduplication support, the Sink API wouldn't benefit you. In such cases it 
makes more sense to use a `ParDo`.
+In simple cases, you may not need to create a new source or sink. For example, 
if you need to read data from an SQL database using an arbitrary query, none of 
the advanced Source API features would benefit you. Likewise, if you'd like to 
write data to a third-party API via a protocol that lacks deduplication 
support, the Sink API wouldn't benefit you. In such cases it makes more sense 
to use a `ParDo`.
 
-However, if you'd like to use advanced features such as dynamic splitting and 
size estimation, you should use Beam's APIs and create a custom source or sink.
+However, if you'd like to use advanced features such as dynamic splitting and 
size estimation, you should use Beam's APIs and create a new source or sink.
 
 
-## Basic Code Requirements for Custom Sources 
and Sinks
+## Basic Code Requirements for New Sources and 
Sinks
 
 Services use the classes you provide to read and/or write data using multiple 
worker instances in parallel. As such, the code you provide for `Source` and 
`Sink` subclasses must meet some basic requirements:
 
@@ -43,9 +43,9 @@ It is critical to exhaustively unit-test all of your `Source` 
and `Sink` subclas
 You can use test harnesses and utility methods available in the 
[source_test_utils 
module](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/source_test_utils.py)
 to develop tests 

[2/3] beam-site git commit: Regenerate website

2017-02-22 Thread davor
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/77d285ff
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/77d285ff
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/77d285ff

Branch: refs/heads/asf-site
Commit: 77d285ff8445e6a9902741f21c7a63bcd07ff47e
Parents: dca566f
Author: Davor Bonaci 
Authored: Wed Feb 22 13:33:58 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 13:33:58 2017 -0800

--
 .../sdks/python-custom-io/index.html| 48 ++--
 content/documentation/sdks/python/index.html|  4 +-
 2 files changed, 26 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/77d285ff/content/documentation/sdks/python-custom-io/index.html
--
diff --git a/content/documentation/sdks/python-custom-io/index.html 
b/content/documentation/sdks/python-custom-io/index.html
index c43f606..7592a4a 100644
--- a/content/documentation/sdks/python-custom-io/index.html
+++ b/content/documentation/sdks/python-custom-io/index.html
@@ -6,7 +6,7 @@
   
   
 
-  Beam Custom Sources and Sinks for Python
+  Apache Beam: Creating New Sources and Sinks with the Python 
SDK
   
 
@@ -146,24 +146,24 @@
 
 
   
-Beam Custom Sources 
and Sinks for Python
+Creating 
New Sources and Sinks with the Python SDK
 
-The Beam SDK for Python provides an extensible API that you can use to 
create custom data sources and sinks. This tutorial shows how to create custom 
sources and sinks using https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py;>Beam’s
 Source and Sink API.
+The Apache Beam SDK for Python provides an extensible API that you can use 
to create new data sources and sinks. This tutorial shows how to create new 
sources and sinks using https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py;>Beam’s
 Source and Sink API.
 
 
-  Create a custom source by extending the BoundedSource and RangeTracker interfaces.
-  Create a custom sink by implementing the Sink and Writer classes.
+  Create a new source by extending the BoundedSource and RangeTracker interfaces.
+  Create a new sink by implementing the Sink and Writer classes.
 
 
-Why Create a Custom Source or 
Sink
+Why Create a New Source or Sink
 
-You’ll need to create a custom source or sink if you want your pipeline 
to read data from (or write data to) a storage system for which the Beam SDK 
for Python does not provide native support.
+You’ll need to create a new source or sink if you want your pipeline to 
read data from (or write data to) a storage system for which the Beam SDK for 
Python does not provide native 
support.
 
-In simple cases, you may not need to create a custom source or sink. For 
example, if you need to read data from an SQL database using an arbitrary 
query, none of the advanced Source API features would benefit you. Likewise, if 
you’d like to write data to a third-party API via a protocol that lacks 
deduplication support, the Sink API wouldn’t benefit you. In such cases it 
makes more sense to use a ParDo.
+In simple cases, you may not need to create a new source or sink. For 
example, if you need to read data from an SQL database using an arbitrary 
query, none of the advanced Source API features would benefit you. Likewise, if 
you’d like to write data to a third-party API via a protocol that lacks 
deduplication support, the Sink API wouldn’t benefit you. In such cases it 
makes more sense to use a ParDo.
 
-However, if you’d like to use advanced features such as dynamic splitting 
and size estimation, you should use Beam’s APIs and create a custom source or 
sink.
+However, if you’d like to use advanced features such as dynamic splitting 
and size estimation, you should use Beam’s APIs and create a new source or 
sink.
 
-Basic Code Requirements for Custom Sources and 
Sinks
+Basic Code Requirements for New Sources and 
Sinks
 
 Services use the classes you provide to read and/or write data using 
multiple worker instances in parallel. As such, the code you provide for Source and Sink subclasses must meet some basic 
requirements:
 
@@ -185,9 +185,9 @@
 
 You can use test harnesses and utility methods available in the https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/source_test_utils.py;>source_test_utils
 module to develop tests for your source.
 
-Creating a Custom Source
+Creating a New Source
 
-You should create a custom source if you’d like to use the advanced 
features that the Source API provides:
+You should create a new source if you’d like to use the advanced features 
that the Source API provides:
 

[1/3] beam-site git commit: Fix broken links due to code path changes

2017-02-22 Thread davor
Repository: beam-site
Updated Branches:
  refs/heads/asf-site 845f589d3 -> eb1974f31


Fix broken links due to code path changes


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/46ca2427
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/46ca2427
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/46ca2427

Branch: refs/heads/asf-site
Commit: 46ca2427a59e17edd170b6d1b11b747d73ec008e
Parents: 845f589
Author: melissa 
Authored: Wed Feb 22 13:10:09 2017 -0800
Committer: melissa 
Committed: Wed Feb 22 13:10:09 2017 -0800

--
 src/documentation/programming-guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/46ca2427/src/documentation/programming-guide.md
--
diff --git a/src/documentation/programming-guide.md 
b/src/documentation/programming-guide.md
index 641ad2d..a40ade3 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -1030,8 +1030,8 @@ See the language specific source code directories for the 
Beam supported I/O API
   
   
   
-https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py;>Google
 BigQuery
-https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/datastore;>Google
 Cloud Datastore
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py;>Google
 BigQuery
+https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/google_cloud_platform/datastore;>Google
 Cloud Datastore
   
 
 



[GitHub] beam-site pull request #160: Fix broken links due to code path changes

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/160


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/3] beam-site git commit: Regenerate website

2017-02-22 Thread davor
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f0a4fde4
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f0a4fde4
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f0a4fde4

Branch: refs/heads/asf-site
Commit: f0a4fde4d376b18ca138767b6e67af149c8715d2
Parents: 46ca242
Author: Davor Bonaci 
Authored: Wed Feb 22 13:28:49 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 13:28:49 2017 -0800

--
 content/documentation/programming-guide/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/f0a4fde4/content/documentation/programming-guide/index.html
--
diff --git a/content/documentation/programming-guide/index.html 
b/content/documentation/programming-guide/index.html
index 0aa0575..20cf8c5 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -1328,8 +1328,8 @@ tree, [2]
   
   
   
-https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py;>Google
 BigQuery
-https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/datastore;>Google
 Cloud Datastore
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/google_cloud_platform/bigquery.py;>Google
 BigQuery
+https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/google_cloud_platform/datastore;>Google
 Cloud Datastore
   
 
 



[3/3] beam-site git commit: This closes #160

2017-02-22 Thread davor
This closes #160


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/eb1974f3
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/eb1974f3
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/eb1974f3

Branch: refs/heads/asf-site
Commit: eb1974f319d2672b1f5983b0bc70cb5e8b2dd0fd
Parents: 845f589 f0a4fde
Author: Davor Bonaci 
Authored: Wed Feb 22 13:28:49 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 22 13:28:49 2017 -0800

--
 content/documentation/programming-guide/index.html | 4 ++--
 src/documentation/programming-guide.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
--




[jira] [Commented] (BEAM-1504) Python standard coders test doesn't run with setup test

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879272#comment-15879272
 ] 

ASF GitHub Bot commented on BEAM-1504:
--

GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/2075

[BEAM-1504] Remove the nose-parameterized dependency

R: @sb2nov 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam dep

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2075


commit 5c21d08d5343c9d0dac0bc672351bd2e82e22c50
Author: Ahmet Altay 
Date:   2017-02-22T21:19:57Z

Remove the nose-parameterized dependency




> Python standard coders test doesn't run with setup test
> ---
>
> Key: BEAM-1504
> URL: https://issues.apache.org/jira/browse/BEAM-1504
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>Priority: Minor
>
> _python setup.py test_ does not run _standard_coders_test_. This is because 
> dynamically generated tests are not visible to the _nose.collect_ plugin. 
> Fix: use _nose-parameterized_ to create parameterized tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2075: [BEAM-1504] Remove the nose-parameterized dependenc...

2017-02-22 Thread aaltay
GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/2075

[BEAM-1504] Remove the nose-parameterized dependency

R: @sb2nov 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam dep

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2075


commit 5c21d08d5343c9d0dac0bc672351bd2e82e22c50
Author: Ahmet Altay 
Date:   2017-02-22T21:19:57Z

Remove the nose-parameterized dependency




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #2071: Remove unneeded plugins sections from the IO (inher...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2071


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2071

2017-02-22 Thread jbonofre
This closes #2071


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d44f5b84
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d44f5b84
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d44f5b84

Branch: refs/heads/master
Commit: d44f5b84fbd7eefbbb96c115517d8050b593ced0
Parents: b65db42 3a0f418
Author: Jean-Baptiste Onofré 
Authored: Wed Feb 22 22:16:15 2017 +0100
Committer: Jean-Baptiste Onofré 
Committed: Wed Feb 22 22:16:15 2017 +0100

--
 sdks/java/io/elasticsearch/pom.xml | 17 -
 sdks/java/io/google-cloud-platform/pom.xml |  8 
 sdks/java/io/hbase/pom.xml | 12 
 sdks/java/io/hdfs/pom.xml  |  8 
 sdks/java/io/jdbc/pom.xml  | 17 -
 sdks/java/io/jms/pom.xml   | 13 -
 sdks/java/io/kafka/pom.xml |  8 
 sdks/java/io/kinesis/pom.xml   |  9 +
 sdks/java/io/mongodb/pom.xml   | 17 -
 sdks/java/io/mqtt/pom.xml  | 17 -
 10 files changed, 1 insertion(+), 125 deletions(-)
--




[1/2] beam git commit: Remove unneeded plugins sections from the IO (inherited from parent)

2017-02-22 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/master b65db428a -> d44f5b84f


Remove unneeded plugins sections from the IO (inherited from parent)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3a0f418b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3a0f418b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3a0f418b

Branch: refs/heads/master
Commit: 3a0f418b2f5ae815dc40542e3f0115a253b5fcc3
Parents: b65db42
Author: Ismaël Mejía 
Authored: Wed Feb 22 10:46:27 2017 +0100
Committer: Jean-Baptiste Onofré 
Committed: Wed Feb 22 22:15:48 2017 +0100

--
 sdks/java/io/elasticsearch/pom.xml | 17 -
 sdks/java/io/google-cloud-platform/pom.xml |  8 
 sdks/java/io/hbase/pom.xml | 12 
 sdks/java/io/hdfs/pom.xml  |  8 
 sdks/java/io/jdbc/pom.xml  | 17 -
 sdks/java/io/jms/pom.xml   | 13 -
 sdks/java/io/kafka/pom.xml |  8 
 sdks/java/io/kinesis/pom.xml   |  9 +
 sdks/java/io/mongodb/pom.xml   | 17 -
 sdks/java/io/mqtt/pom.xml  | 17 -
 10 files changed, 1 insertion(+), 125 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3a0f418b/sdks/java/io/elasticsearch/pom.xml
--
diff --git a/sdks/java/io/elasticsearch/pom.xml 
b/sdks/java/io/elasticsearch/pom.xml
index da52fdd..3279dfd 100644
--- a/sdks/java/io/elasticsearch/pom.xml
+++ b/sdks/java/io/elasticsearch/pom.xml
@@ -30,23 +30,6 @@
   <name>Apache Beam :: SDKs :: Java :: IO :: Elasticsearch</name>
   <description>IO to read and write on Elasticsearch.</description>
 
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-compiler-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-surefire-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-jar-plugin</artifactId>
-      </plugin>
-    </plugins>
-  </build>
-
   
 
   org.apache.beam

http://git-wip-us.apache.org/repos/asf/beam/blob/3a0f418b/sdks/java/io/google-cloud-platform/pom.xml
--
diff --git a/sdks/java/io/google-cloud-platform/pom.xml 
b/sdks/java/io/google-cloud-platform/pom.xml
index 95a524f..66a4207 100644
--- a/sdks/java/io/google-cloud-platform/pom.xml
+++ b/sdks/java/io/google-cloud-platform/pom.xml
@@ -39,10 +39,6 @@
 
   
 org.apache.maven.plugins
-maven-compiler-plugin
-  
-  
-org.apache.maven.plugins
 maven-surefire-plugin
 
   
@@ -50,10 +46,6 @@
   
 
   
-  
-org.apache.maven.plugins
-maven-jar-plugin
-  
 
   
   

http://git-wip-us.apache.org/repos/asf/beam/blob/3a0f418b/sdks/java/io/hbase/pom.xml
--
diff --git a/sdks/java/io/hbase/pom.xml b/sdks/java/io/hbase/pom.xml
index 23582d2..3570316 100644
--- a/sdks/java/io/hbase/pom.xml
+++ b/sdks/java/io/hbase/pom.xml
@@ -82,18 +82,6 @@
 
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-compiler-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-surefire-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-jar-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-shade-plugin</artifactId>
   
 

http://git-wip-us.apache.org/repos/asf/beam/blob/3a0f418b/sdks/java/io/hdfs/pom.xml
--
diff --git a/sdks/java/io/hdfs/pom.xml b/sdks/java/io/hdfs/pom.xml
index bd62bf6..cd6cf4c 100644
--- a/sdks/java/io/hdfs/pom.xml
+++ b/sdks/java/io/hdfs/pom.xml
@@ -68,10 +68,6 @@
 
   
 org.apache.maven.plugins
-maven-compiler-plugin
-  
-  
-org.apache.maven.plugins
 maven-surefire-plugin
 
   
@@ -81,10 +77,6 @@
   
   
 org.apache.maven.plugins
-maven-jar-plugin
-  
-  
-org.apache.maven.plugins
 maven-shade-plugin
   
 

http://git-wip-us.apache.org/repos/asf/beam/blob/3a0f418b/sdks/java/io/jdbc/pom.xml
--
diff --git a/sdks/java/io/jdbc/pom.xml b/sdks/java/io/jdbc/pom.xml
index 23feab6..afe428a 100644
--- a/sdks/java/io/jdbc/pom.xml
+++ b/sdks/java/io/jdbc/pom.xml
@@ -30,23 +30,6 @@
   <name>Apache Beam :: SDKs :: Java :: IO :: JDBC</name>
   <description>IO to read and write on JDBC datasource.</description>
 
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-compiler-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-surefire-plugin</artifactId>
-      </plugin>
-      <plugin>

[GitHub] beam-site pull request #160: Fix broken links due to code path changes

2017-02-22 Thread melap
GitHub user melap opened a pull request:

https://github.com/apache/beam-site/pull/160

Fix broken links due to code path changes



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/melap/beam-site linkfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/160.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #160


commit 46ca2427a59e17edd170b6d1b11b747d73ec008e
Author: melissa 
Date:   2017-02-22T21:10:09Z

Fix broken links due to code path changes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2376

2017-02-22 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-646) Get runners out of the apply()

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879192#comment-15879192
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2061


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>  Labels: backwards-incompatible
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2061: [BEAM-646] Deprecate, Document View.CreatePCollecti...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2061


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2061

2017-02-22 Thread tgroh
This closes #2061


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b65db428
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b65db428
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b65db428

Branch: refs/heads/master
Commit: b65db428aa30140fdbc7dc84a9c51bef85d1b60f
Parents: 41de930 a8b46cf
Author: Thomas Groh 
Authored: Wed Feb 22 12:42:23 2017 -0800
Committer: Thomas Groh 
Committed: Wed Feb 22 12:42:23 2017 -0800

--
 .../java/org/apache/beam/runners/apex/ApexRunner.java |  4 
 .../runners/apex/translation/ApexPipelineTranslator.java  |  8 
 .../apache/beam/runners/direct/ViewEvaluatorFactory.java  |  1 +
 .../main/java/org/apache/beam/sdk/transforms/View.java| 10 ++
 4 files changed, 15 insertions(+), 8 deletions(-)
--




[1/2] beam git commit: Remove uses of CreatePCollectionView#getView

2017-02-22 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 41de9301a -> b65db428a


Remove uses of CreatePCollectionView#getView

Views output by this transform should be obtained by inspecting the
graph node, not by interrogating the PTransform. Doing otherwise may use
incorrect views after Graph Surgery has been performed.

The result of getView can be used to, for example, return the same type
of view. The view returned by this method should be interpreted as a
PCollectionView spec rather than a PValue, as the graph containing the
PTransform and PCollectionView can change independently.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a8b46cf9
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a8b46cf9
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a8b46cf9

Branch: refs/heads/master
Commit: a8b46cf984181b6ec0dfc8a59c5e1ed180e2a29e
Parents: 41de930
Author: Thomas Groh 
Authored: Tue Feb 21 17:10:13 2017 -0800
Committer: Thomas Groh 
Committed: Wed Feb 22 12:42:22 2017 -0800

--
 .../java/org/apache/beam/runners/apex/ApexRunner.java |  4 
 .../runners/apex/translation/ApexPipelineTranslator.java  |  8 
 .../apache/beam/runners/direct/ViewEvaluatorFactory.java  |  1 +
 .../main/java/org/apache/beam/sdk/transforms/View.java| 10 ++
 4 files changed, 15 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a8b46cf9/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
index e220e6c..1eb5e72 100644
--- a/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
+++ b/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
@@ -213,10 +213,6 @@ public class ApexRunner extends 
PipelineRunner {
   return new CreateApexPCollectionView<>(view);
 }
 
-public PCollectionView getView() {
-  return view;
-}
-
 @Override
 public PCollectionView expand(PCollection input) {
   return view;

http://git-wip-us.apache.org/repos/asf/beam/blob/a8b46cf9/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
index c8e0290..0818c36 100644
--- 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
@@ -157,7 +157,7 @@ public class ApexPipelineTranslator implements 
Pipeline.PipelineVisitor {
 @Override
 public void translate(CreateApexPCollectionView transform,
 TranslationContext context) {
-  PCollectionView view = transform.getView();
+  PCollectionView view = (PCollectionView) 
context.getOutput();
   context.addView(view);
   LOG.debug("view {}", view.getName());
 }
@@ -168,9 +168,9 @@ public class ApexPipelineTranslator implements 
Pipeline.PipelineVisitor {
 private static final long serialVersionUID = 1L;
 
 @Override
-public void translate(CreatePCollectionView transform,
-TranslationContext context) {
-  PCollectionView view = transform.getView();
+public void translate(
+CreatePCollectionView transform, TranslationContext 
context) {
+  PCollectionView view = (PCollectionView) 
context.getOutput();
   context.addView(view);
   LOG.debug("view {}", view.getName());
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/a8b46cf9/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ViewEvaluatorFactory.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ViewEvaluatorFactory.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ViewEvaluatorFactory.java
index 0fa6254..1548772 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ViewEvaluatorFactory.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ViewEvaluatorFactory.java
@@ -148,6 +148,7 @@ class ViewEvaluatorFactory implements 
TransformEvaluatorFactory {
 }
 
 @Override
+@SuppressWarnings("deprecation")
 public PCollectionView expand(PCollection input) {
  

[jira] [Resolved] (BEAM-1465) No natural place to flush resources in FileBasedWriter

2017-02-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1465.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> No natural place to flush resources in FileBasedWriter
> --
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> {{FileBasedWriter}} API does not have a natural place to flush resources 
> opened by the writer.
> For example, if you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}} to match the existing 
> {{prepareWrite(WritableByteChannel channel)}} can work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1465) No natural place to flush resources in FileBasedWriter

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879178#comment-15879178
 ] 

ASF GitHub Bot commented on BEAM-1465:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2067




[GitHub] beam pull request #2067: [BEAM-1465] Add finishWrite method to FileBasedWrit...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2067




[1/2] beam git commit: [BEAM-1465] No natural place to flush/close resources in FileBasedWriter

2017-02-22 Thread dhalperi
Repository: beam
Updated Branches:
  refs/heads/master ede77c1b5 -> 41de9301a


[BEAM-1465] No natural place to flush/close resources in FileBasedWriter


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2b6a08f6
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2b6a08f6
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2b6a08f6

Branch: refs/heads/master
Commit: 2b6a08f6cb37a9c51d9ae5a63e6b69a310bb3aae
Parents: ede77c1
Author: Aviem Zur 
Authored: Wed Feb 22 06:26:38 2017 +0200
Committer: Dan Halperin 
Committed: Wed Feb 22 12:37:30 2017 -0800

--
 .../src/main/java/org/apache/beam/sdk/io/AvroIO.java |  2 +-
 .../main/java/org/apache/beam/sdk/io/FileBasedSink.java  | 11 +++
 .../src/main/java/org/apache/beam/sdk/io/TextIO.java | 11 +++
 3 files changed, 19 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/2b6a08f6/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
index 01a4cba..388d9f0 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
@@ -1032,7 +1032,7 @@ public class AvroIO {
   }
 
   @Override
-  protected void writeFooter() throws Exception {
+  protected void finishWrite() throws Exception {
 dataFileWriter.flush();
   }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/2b6a08f6/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
index 32b8b4f..e14ba59 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
@@ -590,6 +590,12 @@ public abstract class FileBasedSink extends Sink {
 protected void writeFooter() throws Exception {}
 
 /**
+ * Called after all calls to {@link #writeHeader}, {@link #write} and {@link #writeFooter}.
+ * If any resources opened in the write process need to be flushed, flush them here.
+ */
+protected void finishWrite() throws Exception {}
+
+/**
  * Opens the channel.
  */
 @Override
@@ -630,6 +636,11 @@ public abstract class FileBasedSink extends Sink {
   try (WritableByteChannel theChannel = channel) {
 LOG.debug("Writing footer to {}.", filename);
 writeFooter();
+LOG.debug("Finishing write to {}.", filename);
+finishWrite();
+if (!channel.isOpen()) {
+  throw new IllegalStateException("Channel should only be closed by its owner: " + channel);
+}
   }
   FileResult result = new FileResult(filename);
   LOG.debug("Result for bundle {}: {}", this.id, filename);

http://git-wip-us.apache.org/repos/asf/beam/blob/2b6a08f6/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
index 726411c..86e6989 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
@@ -1101,15 +1101,18 @@ public class TextIO {
   }
 
   @Override
+  public void write(String value) throws Exception {
+writeLine(value);
+  }
+
+  @Override
   protected void writeFooter() throws Exception {
 writeIfNotNull(footer);
-// Flush here because there is currently no other natural place to do this. [BEAM-1465]
-out.flush();
   }
 
   @Override
-  public void write(String value) throws Exception {
-writeLine(value);
+  protected void finishWrite() throws Exception {
+out.flush();
   }
 }
   }
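
Taken together, the hunks above give the writer the following close sequence.
This is a simplified sketch of the {{FileBasedSink.FileBasedWriter}} close
path as of this commit, not the literal source; the copy to the final
destination and error handling are omitted.

// Simplified sketch of the writer's close path after this commit.
public final FileResult close() throws Exception {
  try (WritableByteChannel theChannel = channel) {
    writeFooter();   // subclass hook, e.g. TextIO writes its footer line
    finishWrite();   // new hook: flush any streams buffered over the channel
    // finishWrite() must flush, not close: the channel is owned by the
    // base class and is closed by the try-with-resources block above.
    if (!channel.isOpen()) {
      throw new IllegalStateException(
          "Channel should only be closed by its owner: " + channel);
    }
  }
  return new FileResult(filename);
}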



[2/2] beam git commit: This closes #2067

2017-02-22 Thread dhalperi
This closes #2067


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/41de9301
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/41de9301
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/41de9301

Branch: refs/heads/master
Commit: 41de9301af367a2dff19255d2d2fe563368af55c
Parents: ede77c1 2b6a08f
Author: Dan Halperin 
Authored: Wed Feb 22 12:37:36 2017 -0800
Committer: Dan Halperin 
Committed: Wed Feb 22 12:37:36 2017 -0800

--
 .../src/main/java/org/apache/beam/sdk/io/AvroIO.java |  2 +-
 .../main/java/org/apache/beam/sdk/io/FileBasedSink.java  | 11 +++
 .../src/main/java/org/apache/beam/sdk/io/TextIO.java | 11 +++
 3 files changed, 19 insertions(+), 5 deletions(-)
--




[jira] [Commented] (BEAM-475) High-quality javadoc for Beam

2017-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879161#comment-15879161
 ] 

ASF GitHub Bot commented on BEAM-475:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/139


> High-quality javadoc for Beam
> -
>
> Key: BEAM-475
> URL: https://issues.apache.org/jira/browse/BEAM-475
> Project: Beam
>  Issue Type: Improvement
>  Components: project-management
>Reporter: Daniel Halperin
>Assignee: Benson Margulies
> Fix For: Not applicable
>
>
> We should have good Javadoc for Beam!
> Current snapshot: http://beam.incubator.apache.org/javadoc/0.1.0-incubating/





[2/4] beam-site git commit: Closes #139

2017-02-22 Thread dhalperi
Closes #139


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/2d661f9f
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/2d661f9f
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/2d661f9f

Branch: refs/heads/asf-site
Commit: 2d661f9f942128d3bc06d080477a6bbc4e064d1e
Parents: 940eb06 bdb0ab4
Author: Dan Halperin 
Authored: Wed Feb 22 12:10:55 2017 -0800
Committer: Dan Halperin 
Committed: Wed Feb 22 12:10:55 2017 -0800

--
 src/contribute/release-guide.md | 56 ++--
 1 file changed, 35 insertions(+), 21 deletions(-)
--



