Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2707

2017-07-25 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2706

2017-07-25 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3642: Sets desired bundle size on AvroIO.readAll

2017-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3642


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Sets desired bundle size on AvroIO.readAll

2017-07-25 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master a9fdc3bc4 -> bc72c9405


Sets desired bundle size on AvroIO.readAll


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/56a1bcea
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/56a1bcea
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/56a1bcea

Branch: refs/heads/master
Commit: 56a1bceaad8aeec4705c562db4dab08a4d4f36b3
Parents: a9fdc3b
Author: Eugene Kirpichov 
Authored: Tue Jul 25 20:00:41 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jul 25 23:22:17 2017 -0700

--
 sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/56a1bcea/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
index f201114..bc7fecb 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
@@ -178,6 +178,7 @@ public class AvroIO {
     return new AutoValue_AvroIO_ReadAll.Builder()
         .setRecordClass(GenericRecord.class)
         .setSchema(schema)
+        .setDesiredBundleSizeBytes(64 * 1024 * 1024L)
         .build();
   }
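
For intuition, the 64 MB default above controls how many bundles each file is split into; e.g. a 1 GiB file yields about 16 bundles. A hypothetical sketch of that arithmetic (illustrative helper, not part of the Beam API):

```python
def estimate_splits(file_size_bytes, desired_bundle_size_bytes=64 * 1024 * 1024):
    """Estimate how many bundles a file of the given size is split into.

    Hypothetical helper illustrating the effect of the 64 MB default the
    patch above sets; not part of the Beam API.
    """
    if file_size_bytes <= 0:
        return 0
    # Ceiling division: every started bundle counts.
    return -(-file_size_bytes // desired_bundle_size_bytes)

# A 1 GiB file with the 64 MiB default yields 16 bundles.
print(estimate_splits(1024 * 1024 * 1024))  # 16
```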
 



[2/2] beam git commit: This closes #3642: Sets desired bundle size on AvroIO.readAll

2017-07-25 Thread jkff
This closes #3642: Sets desired bundle size on AvroIO.readAll


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/bc72c940
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/bc72c940
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/bc72c940

Branch: refs/heads/master
Commit: bc72c940537dbfcfa982cc1867af087487ed7441
Parents: a9fdc3b 56a1bce
Author: Eugene Kirpichov 
Authored: Tue Jul 25 23:22:38 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jul 25 23:22:38 2017 -0700

--
 sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java | 1 +
 1 file changed, 1 insertion(+)
--




[2/2] beam git commit: This closes #3630

2017-07-25 Thread chamikara
This closes #3630


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a9fdc3bc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a9fdc3bc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a9fdc3bc

Branch: refs/heads/master
Commit: a9fdc3bc4edf7d083efc163d4aff3f68d59c89b1
Parents: d919394 483abc0
Author: chamik...@google.com 
Authored: Tue Jul 25 22:58:55 2017 -0700
Committer: chamik...@google.com 
Committed: Tue Jul 25 22:58:55 2017 -0700

--
 sdks/python/apache_beam/io/gcp/bigquery.py  | 19 +++
 sdks/python/apache_beam/io/gcp/bigquery_test.py |  3 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)
--




[GitHub] beam pull request #3630: We shouldn't write to re-created tables for 2 mins

2017-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3630




[1/2] beam git commit: We shouldn't write to re-created tables for 2 mins

2017-07-25 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master d919394c7 -> a9fdc3bc4


We shouldn't write to re-created tables for 2 mins


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/483abc09
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/483abc09
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/483abc09

Branch: refs/heads/master
Commit: 483abc0941f0fb42c506565f6912153296fd94b5
Parents: d919394
Author: Sourabh Bajaj 
Authored: Mon Jul 24 15:54:02 2017 -0700
Committer: chamik...@google.com 
Committed: Tue Jul 25 22:58:23 2017 -0700

--
 sdks/python/apache_beam/io/gcp/bigquery.py  | 19 +++
 sdks/python/apache_beam/io/gcp/bigquery_test.py |  3 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/483abc09/sdks/python/apache_beam/io/gcp/bigquery.py
--
diff --git a/sdks/python/apache_beam/io/gcp/bigquery.py 
b/sdks/python/apache_beam/io/gcp/bigquery.py
index 23fd310..db6715a 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery.py
@@ -1002,12 +1002,23 @@ class BigQueryWrapper(object):
     if found_table and write_disposition != BigQueryDisposition.WRITE_TRUNCATE:
       return found_table
     else:
+      created_table = self._create_table(project_id=project_id,
+                                         dataset_id=dataset_id,
+                                         table_id=table_id,
+                                         schema=schema or found_table.schema)
       # if write_disposition == BigQueryDisposition.WRITE_TRUNCATE we delete
       # the table before this point.
-      return self._create_table(project_id=project_id,
-                                dataset_id=dataset_id,
-                                table_id=table_id,
-                                schema=schema or found_table.schema)
+      if write_disposition == BigQueryDisposition.WRITE_TRUNCATE:
+        # BigQuery can route data to the old table for 2 mins max so wait
+        # that much time before creating the table and writing it
+        logging.warning('Sleeping for 150 seconds before the write as ' +
+                        'BigQuery inserts can be routed to deleted table ' +
+                        'for 2 mins after the delete and create.')
+        # TODO(BEAM-2673): Remove this sleep by migrating to load api
+        time.sleep(150)
+        return created_table
+      else:
+        return created_table
 
   def run_query(self, project_id, query, use_legacy_sql, flatten_results,
                 dry_run=False):

http://git-wip-us.apache.org/repos/asf/beam/blob/483abc09/sdks/python/apache_beam/io/gcp/bigquery_test.py
--
diff --git a/sdks/python/apache_beam/io/gcp/bigquery_test.py 
b/sdks/python/apache_beam/io/gcp/bigquery_test.py
index 14247ba..bfd06ac 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery_test.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery_test.py
@@ -650,7 +650,8 @@ class TestBigQueryWriter(unittest.TestCase):
     self.assertFalse(client.tables.Delete.called)
     self.assertFalse(client.tables.Insert.called)
 
-  def test_table_with_write_disposition_truncate(self):
+  @mock.patch('time.sleep', return_value=None)
+  def test_table_with_write_disposition_truncate(self, _patched_sleep):
     client = mock.Mock()
     table = bigquery.Table(
         tableReference=bigquery.TableReference(
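
The test change above patches `time.sleep` so the 150-second wait does not slow the suite down. A minimal standalone sketch of the same pattern (the toy function name is illustrative, not the Beam API):

```python
import time
from unittest import mock

def create_table_with_truncate_wait():
    """Toy stand-in for the BigQuery wrapper's WRITE_TRUNCATE path: it
    sleeps 150 s before returning, mirroring the patch above. Names are
    illustrative, not the real Beam API."""
    time.sleep(150)
    return 'created'

# Patching time.sleep keeps the test instant while still letting us assert
# that the sleep was requested with the expected duration.
with mock.patch('time.sleep', return_value=None) as patched_sleep:
    result = create_table_with_truncate_wait()

assert result == 'created'
patched_sleep.assert_called_once_with(150)
```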



[jira] [Updated] (BEAM-2683) Update GCSFileSystem to support glob patterns of the form {x,y,z}

2017-07-25 Thread Chamikara Jayalath (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath updated BEAM-2683:
-
Component/s: (was: sdk-java-core)
 sdk-java-extensions

> Update GCSFileSystem to support glob patterns of the form {x,y,z} 
> --
>
> Key: BEAM-2683
> URL: https://issues.apache.org/jira/browse/BEAM-2683
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Chamikara Jayalath
>
> Currently glob patterns of the form {x,y,z} (where any one of the listed
> values is valid) are not supported by GCSFileSystem. See
> http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob for the
> definition of Java glob patterns.
> We should also update documentation to clarify that support for glob patterns
> is file-system dependent.
> Also applies to Python SDK.
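
For reference, {x,y,z} alternation can be reduced to ordinary glob matching by expanding the braces first. A hedged Python sketch of those semantics (illustrative helper names, not the GCSFileSystem implementation):

```python
import fnmatch
import re

def expand_braces(pattern):
    """Expand the first {x,y,z} group into a list of patterns, recursively.

    A sketch of the brace-alternation semantics from the Java glob spec
    linked above; not the GCSFileSystem implementation.
    """
    m = re.search(r'\{([^{}]*)\}', pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alt in m.group(1).split(','):
        expanded.extend(expand_braces(head + alt + tail))
    return expanded

def brace_glob_match(name, pattern):
    # A name matches if it matches any of the brace-expanded alternatives.
    return any(fnmatch.fnmatch(name, p) for p in expand_braces(pattern))

print(expand_braces('data.{csv,json}'))  # ['data.csv', 'data.json']
print(brace_glob_match('logs/a.avro', 'logs/*.{avro,txt}'))  # True
```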



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2683) Update GCSFileSystem to support glob patterns of the form {x,y,z}

2017-07-25 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-2683:


 Summary: Update GCSFileSystem to support glob patterns of the form 
{x,y,z} 
 Key: BEAM-2683
 URL: https://issues.apache.org/jira/browse/BEAM-2683
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Chamikara Jayalath


Currently glob patterns of the form {x,y,z} (where any one of the listed
values is valid) are not supported by GCSFileSystem. See
http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob for the
definition of Java glob patterns.

We should also update documentation to clarify that support for glob patterns
is file-system dependent.

Also applies to Python SDK.





[jira] [Commented] (BEAM-2482) CodedValueMutationDetector should use the coders structural value

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101174#comment-16101174
 ] 

ASF GitHub Bot commented on BEAM-2482:
--

Github user evindj closed the pull request at:

https://github.com/apache/beam/pull/3406


> CodedValueMutationDetector should use the coders structural value
> -
>
> Key: BEAM-2482
> URL: https://issues.apache.org/jira/browse/BEAM-2482
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Innocent
>Priority: Minor
>  Labels: newbie, starter
>
> Currently the CodedValueMutationDetector[1] checks to see if the objects are
> Java equals and then compares whether the bytes are equal. Instead of relying
> on the bytes, we should rely on the coder's structural value explicitly[2].
> This allows a coder to provide a fast-path equality check, with the
> default still comparing byte-array representations.
> 1: 
> https://github.com/apache/beam/blob/eae0d05bd7c088accd927dcfe3e511efbb11c9fd/sdks/java/core/src/main/java/org/apache/beam/sdk/util/MutationDetectors.java
> 2: 
> https://github.com/apache/beam/blob/01b3f87f977d44eac23eb5488074bbc638858a9d/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L252
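
A rough Python sketch of the proposed check, with `structural_value` standing in for `Coder#structuralValue` and `pickle` standing in for the coder's encode step (illustrative only, not the Beam implementation):

```python
import pickle

def detect_mutation(value, snapshot_bytes, structural_value=None):
    """Return True if `value` differs from the snapshot taken at encode time.

    Sketch of the idea above: prefer a coder-supplied structural value for
    the equality check, falling back to comparing encoded bytes. `pickle`
    stands in for the coder's encode/decode; not the Beam implementation.
    """
    if structural_value is not None:
        # Fast path: compare coder-defined structural values.
        return structural_value(value) != structural_value(pickle.loads(snapshot_bytes))
    # Default path: compare byte representations.
    return pickle.dumps(value) != snapshot_bytes

original = [1, 2, 3]
snapshot = pickle.dumps(original)
original.append(4)  # mutate after "encoding"
assert detect_mutation(original, snapshot)                          # byte path
assert detect_mutation(original, snapshot, structural_value=tuple)  # fast path

fresh = [1, 2, 3]
assert not detect_mutation(fresh, pickle.dumps(fresh))
```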





[jira] [Commented] (BEAM-2556) Client-side throttling for Datastore connector

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101173#comment-16101173
 ] 

ASF GitHub Bot commented on BEAM-2556:
--

GitHub user cph6 opened a pull request:

https://github.com/apache/beam/pull/3644

[BEAM-2556] Add client-side throttling.

The approach used is as described in
https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg.
By backing off individual workers in response to high error rates, we relieve
pressure on the Datastore service, increasing the chance that the workload can
complete successfully. This matches the implementation in the Java SDK.

Also fix the timekeeping of the dynamic batching (seconds vs milliseconds).

R: @vikkyrk

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cph6/beam datastore_adaptive_py

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3644


commit 0b0d90cb45d0dd94e9bdab5f2ae41f5458019883
Author: Colin Phipps 
Date:   2017-07-24T21:01:09Z

Add client-side throttling.

The approach used is as described in
https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg.
By backing off individual workers in response to high error rates, we relieve
pressure on the Datastore service, increasing the chance that the workload can
complete successfully. This matches the implementation in the Java SDK.




> Client-side throttling for Datastore connector
> --
>
> Key: BEAM-2556
> URL: https://issues.apache.org/jira/browse/BEAM-2556
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Colin Phipps
>Assignee: Colin Phipps
>Priority: Minor
>  Labels: datastore
>
> The Datastore connector currently has exponential backoff on errors, which is
> good. But it does not do any other throttling of its write load in response
> to errors; once a request succeeds, it resumes writing as quickly as it can.
> Write loads will be more stable and more likely to complete if the client
> throttles itself in the event that it receives high rates of errors from the
> Datastore service; specifically
> https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg
> is a technique that Google has had success with on other services.
> We (Datastore) have a patch in progress to add this behaviour to the
> connector.
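
For context, the adaptive throttling technique from the linked SRE book chapter rejects requests locally with probability max(0, (requests - K * accepts) / (requests + 1)). A sketch of that formula (not the connector's actual code):

```python
import random

def reject_probability(requests, accepts, multiplier=2.0):
    """Client-side rejection probability from the SRE book chapter linked
    above: max(0, (requests - multiplier * accepts) / (requests + 1)).

    A sketch of the technique, not the Beam Datastore connector's code.
    """
    return max(0.0, (requests - multiplier * accepts) / (requests + 1))

def should_attempt(requests, accepts, rng=random.random):
    # Probabilistically drop requests locally when the backend is rejecting
    # a large fraction of recent traffic.
    return rng() >= reject_probability(requests, accepts)

# Healthy backend: every request accepted, so nothing is throttled locally.
assert reject_probability(100, 100) == 0.0
# Overloaded backend: only 10 of 100 recent requests were accepted, so most
# new requests are dropped client-side.
print(round(reject_probability(100, 10), 3))  # 0.792
```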





[GitHub] beam pull request #3406: [BEAM-2482] CodedValueMutationDetector should use t...

2017-07-25 Thread evindj
Github user evindj closed the pull request at:

https://github.com/apache/beam/pull/3406




[GitHub] beam pull request #3644: [BEAM-2556] Add client-side throttling.

2017-07-25 Thread cph6
GitHub user cph6 opened a pull request:

https://github.com/apache/beam/pull/3644

[BEAM-2556] Add client-side throttling.

The approach used is as described in
https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg.
By backing off individual workers in response to high error rates, we relieve
pressure on the Datastore service, increasing the chance that the workload can
complete successfully. This matches the implementation in the Java SDK.

Also fix the timekeeping of the dynamic batching (seconds vs milliseconds).

R: @vikkyrk

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cph6/beam datastore_adaptive_py

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3644


commit 0b0d90cb45d0dd94e9bdab5f2ae41f5458019883
Author: Colin Phipps 
Date:   2017-07-24T21:01:09Z

Add client-side throttling.

The approach used is as described in
https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg.
By backing off individual workers in response to high error rates, we relieve
pressure on the Datastore service, increasing the chance that the workload can
complete successfully. This matches the implementation in the Java SDK.






[GitHub] beam pull request #3643: [Beam-2482] - CodedValueMutationDetector should use...

2017-07-25 Thread evindj
GitHub user evindj opened a pull request:

https://github.com/apache/beam/pull/3643

[Beam-2482] - CodedValueMutationDetector should use the coders structural 
value

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/evindj/beam BEAM-2482

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3643


commit 3aa0fc37d96b6e76dbc540fc0fb538a438cbf6e9
Author: Innocent Djiofack 
Date:   2017-07-26T03:41:02Z

Changed the mutation detector to be based on structural value only

commit 36c7a14c69ba628f477798f63d4e721dc86e2be7
Author: Innocent Djiofack 
Date:   2017-07-26T04:51:21Z

fixed compilation errors and a bug in ByteArraycoder






[jira] [Resolved] (BEAM-2648) beam_PerformanceTests_Python failing since 2017-07-17

2017-07-25 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-2648.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

> beam_PerformanceTests_Python failing since 2017-07-17
> -
>
> Key: BEAM-2648
> URL: https://issues.apache.org/jira/browse/BEAM-2648
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Kenneth Knowles
>Assignee: Mark Liu
> Fix For: 2.2.0
>
>
> See 
> https://builds.apache.org/blue/organizations/jenkins/beam_PerformanceTests_Python/activity





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3650

2017-07-25 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3642: Sets desired bundle size on AvroIO.readAll

2017-07-25 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3642

Sets desired bundle size on AvroIO.readAll

This was overlooked because unit tests set it explicitly to a small value in
order to provide more coverage.

R: @mairbek 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam avroio-bundle-size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3642


commit b6d2f89575b6ae892d3ee96a9a5edd79ad1d96d3
Author: Eugene Kirpichov 
Date:   2017-07-26T03:00:41Z

Sets desired bundle size on AvroIO.readAll






[jira] [Created] (BEAM-2682) Merge AvroIOTest and AvroIOTransformTest

2017-07-25 Thread Eugene Kirpichov (JIRA)
Eugene Kirpichov created BEAM-2682:
--

 Summary: Merge AvroIOTest and AvroIOTransformTest
 Key: BEAM-2682
 URL: https://issues.apache.org/jira/browse/BEAM-2682
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov


These two tests seem to have exactly the same purpose. They should be merged 
into AvroIOTest.





[jira] [Created] (BEAM-2681) TransformHierarchy.Node getInputs() TupleTags doesn't match previous ParDo.MultiOutput's getOutputs().

2017-07-25 Thread Pei He (JIRA)
Pei He created BEAM-2681:


 Summary: TransformHierarchy.Node getInputs() TupleTags doesn't 
match previous ParDo.MultiOutput's getOutputs().
 Key: BEAM-2681
 URL: https://issues.apache.org/jira/browse/BEAM-2681
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Pei He
Assignee: Thomas Groh
Priority: Minor


For example,
ParDo.MultiOutput -> PCollectionTuple [tag1 -> pc1 (tag_pc1), tag2 -> pc2 (tag_pc2)]
pc1 (tag_pc1) -> transform_a
pc2 (tag_pc2) -> transform_b

During translation, calling getOutputs() for Node ParDo.MultiOutput will
return a map keyed by tag1 and tag2, and calling getInputs() for Node
transform_a and transform_b will return maps keyed by tag_pc1 and tag_pc2
respectively.

Every runner will need to add a special case for ParDo translation in order to
handle this.
Is this a bug or intended? (If it is intended, should we document it somewhere
for runner implementors?)





[jira] [Commented] (BEAM-79) Gearpump runner

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101071#comment-16101071
 ] 

ASF GitHub Bot commented on BEAM-79:


Github user manuzhang closed the pull request at:

https://github.com/apache/beam/pull/3611


> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-gearpump
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.





[GitHub] beam pull request #3611: [BEAM-79] merge gearpump-runner into master

2017-07-25 Thread manuzhang
Github user manuzhang closed the pull request at:

https://github.com/apache/beam/pull/3611




[jira] [Created] (BEAM-2680) Improve scalability of the Watch transform

2017-07-25 Thread Eugene Kirpichov (JIRA)
Eugene Kirpichov created BEAM-2680:
--

 Summary: Improve scalability of the Watch transform
 Key: BEAM-2680
 URL: https://issues.apache.org/jira/browse/BEAM-2680
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov


https://github.com/apache/beam/pull/3565 introduces the Watch transform 
http://s.apache.org/beam-watch-transform.

The implementation leaves several scalability-related TODOs:
1) The state stores hashes and timestamps of outputs that have already been 
output and should be omitted from future polls. We could garbage-collect this 
state, e.g. dropping elements from "completed" and from addNewAsPending() if 
their timestamp is more than X behind the watermark.
2) When a poll returns a huge number of elements, we don't necessarily have to 
add all of them into state.pending - instead we could add only N elements and 
ignore others, relying on future poll rounds to provide them, in order to avoid 
blowing up the state. Combined with garbage collection of 
GrowthState.completed, this would make the transform scalable to very large 
poll results.
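
Idea (1) above, dropping completed entries more than X behind the watermark, can be sketched as follows (hypothetical names; these do not match the Watch transform's actual GrowthState fields):

```python
def gc_completed(completed, watermark, max_lag):
    """Drop entries from the `completed` map whose timestamp is more than
    `max_lag` behind the watermark, as proposed in TODO (1) above.

    `completed` maps output hashes to the timestamp at which they were
    produced. Hypothetical sketch, not the actual Watch implementation.
    """
    cutoff = watermark - max_lag
    return {h: ts for h, ts in completed.items() if ts >= cutoff}

state = {'hash_a': 100, 'hash_b': 950, 'hash_c': 990}
# With the watermark at 1000 and a lag bound of 100, only entries with
# timestamps >= 900 are retained.
print(sorted(gc_completed(state, watermark=1000, max_lag=100)))  # ['hash_b', 'hash_c']
```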





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3649

2017-07-25 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2705

2017-07-25 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2648) beam_PerformanceTests_Python failing since 2017-07-17

2017-07-25 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101035#comment-16101035
 ] 

Mark Liu commented on BEAM-2648:


PR is merged. Recent Jenkins run passed.
https://builds.apache.org/view/Beam/job/beam_PerformanceTests_Python/142/

> beam_PerformanceTests_Python failing since 2017-07-17
> -
>
> Key: BEAM-2648
> URL: https://issues.apache.org/jira/browse/BEAM-2648
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Kenneth Knowles
>Assignee: Mark Liu
>
> See 
> https://builds.apache.org/blue/organizations/jenkins/beam_PerformanceTests_Python/activity





[GitHub] beam pull request #3594: Do not submit: scratch work

2017-07-25 Thread charlesccychen
Github user charlesccychen closed the pull request at:

https://github.com/apache/beam/pull/3594




[jira] [Closed] (BEAM-2656) Introduce AvroIO.readAll()

2017-07-25 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2656.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> Introduce AvroIO.readAll()
> --
>
> Key: BEAM-2656
> URL: https://issues.apache.org/jira/browse/BEAM-2656
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 2.2.0
>
>
> TextIO.readAll() is nifty and performant when reading a large number of files.
> We should similarly have AvroIO.readAll(). Maybe other connectors too in the 
> future.





[jira] [Commented] (BEAM-2656) Introduce AvroIO.readAll()

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101013#comment-16101013
 ] 

ASF GitHub Bot commented on BEAM-2656:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3615


> Introduce AvroIO.readAll()
> --
>
> Key: BEAM-2656
> URL: https://issues.apache.org/jira/browse/BEAM-2656
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> TextIO.readAll() is nifty and performant when reading a large number of files.
> We should similarly have AvroIO.readAll(). Maybe other connectors too in the 
> future.





[GitHub] beam pull request #3615: [BEAM-2656] Introduces AvroIO.readAll()

2017-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3615




[2/4] beam git commit: Extracts common logic from TextIO.ReadAll into a utility transform

2017-07-25 Thread jkff
Extracts common logic from TextIO.ReadAll into a utility transform


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/eaf0b363
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/eaf0b363
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/eaf0b363

Branch: refs/heads/master
Commit: eaf0b36313fcd59963b2efbf16f50dd913da7de2
Parents: e80c83b
Author: Eugene Kirpichov 
Authored: Fri Jul 21 14:09:13 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jul 25 17:36:49 2017 -0700

--
 .../beam/sdk/io/ReadAllViaFileBasedSource.java  | 152 +++
 .../java/org/apache/beam/sdk/io/TextIO.java | 135 
 2 files changed, 179 insertions(+), 108 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/eaf0b363/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java
new file mode 100644
index 000..66aa41e
--- /dev/null
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/ReadAllViaFileBasedSource.java
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import java.io.IOException;
+import java.util.concurrent.ThreadLocalRandom;
+import org.apache.beam.sdk.io.fs.MatchResult;
+import org.apache.beam.sdk.io.range.OffsetRange;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.Reshuffle;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.Values;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+
+/**
+ * Reads each filepattern in the input {@link PCollection} using given 
parameters for splitting
+ * files into offset ranges and for creating a {@link FileBasedSource} for a 
file.
+ */
+class ReadAllViaFileBasedSource<T> extends PTransform<PCollection<String>, PCollection<T>> {
+  private final SerializableFunction<String, Boolean> isSplittable;
+  private final long desiredBundleSizeBytes;
+  private final SerializableFunction<String, FileBasedSource<T>> createSource;
+
+  public ReadAllViaFileBasedSource(
+      SerializableFunction<String, Boolean> isSplittable,
+      long desiredBundleSizeBytes,
+      SerializableFunction<String, FileBasedSource<T>> createSource) {
+    this.isSplittable = isSplittable;
+    this.desiredBundleSizeBytes = desiredBundleSizeBytes;
+    this.createSource = createSource;
+  }
+
+  @Override
+  public PCollection<T> expand(PCollection<String> input) {
+    return input
+        .apply("Expand glob", ParDo.of(new ExpandGlobFn()))
+        .apply(
+            "Split into ranges",
+            ParDo.of(new SplitIntoRangesFn(isSplittable, desiredBundleSizeBytes)))
+        .apply("Reshuffle", new ReshuffleWithUniqueKey<KV<MatchResult.Metadata, OffsetRange>>())
+        .apply("Read ranges", ParDo.of(new ReadFileRangesFn<T>(createSource)));
+  }
+
+  private static class ReshuffleWithUniqueKey<T>
+      extends PTransform<PCollection<T>, PCollection<T>> {
+    @Override
+    public PCollection<T> expand(PCollection<T> input) {
+      return input
+          .apply("Unique key", ParDo.of(new AssignUniqueKeyFn<T>()))
+          .apply("Reshuffle", Reshuffle.<Integer, T>of())
+          .apply("Values", Values.<T>create());
+    }
+  }
+
+  private static class AssignUniqueKeyFn<T> extends DoFn<T, KV<Integer, T>> {
+    private int index;
+
+    @Setup
+    public void setup() {
+      this.index = ThreadLocalRandom.current().nextInt();
+    }
+
+    @ProcessElement
+    public void process(ProcessContext c) {
+      c.output(KV.of(++index, c.element()));
+    }
+  }
+
+  private static class ExpandGlobFn extends DoFn<String, MatchResult.Metadata> {
+    @ProcessElement
+    public void process(ProcessContext c) throws Exception {
+      MatchResult match = FileSystems.match(c.element());
+      checkArgument(
+  match.sta
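The AssignUniqueKeyFn in the diff above seeds a per-instance counter with a random integer and increments it once per element, so that the following Reshuffle spreads the file ranges evenly across workers instead of clumping them. A Beam-free sketch of that keying scheme (the class and method names here are illustrative, not Beam's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the unique-key assignment done before a reshuffle:
// keys are consecutive ints starting at a random per-instance offset, so
// elements produced by one bundle scatter across downstream workers.
public class UniqueKeyAssigner {
    private int index = ThreadLocalRandom.current().nextInt();

    public int nextKey() {
        // Overflow wrap-around is harmless; the keys only need to differ.
        return ++index;
    }

    // Assigns keys to a batch of elements and returns the keys in order.
    public static List<Integer> keysFor(int numElements) {
        UniqueKeyAssigner assigner = new UniqueKeyAssigner();
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < numElements; i++) {
            keys.add(assigner.nextKey());
        }
        return keys;
    }
}
```

The random starting offset matters: if every worker started at 0, all first elements would collide on the same key after the shuffle.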

[4/4] beam git commit: This closes #3615: [BEAM-2656] Introduces AvroIO.readAll()

2017-07-25 Thread jkff
This closes #3615: [BEAM-2656] Introduces AvroIO.readAll()


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d919394c
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d919394c
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d919394c

Branch: refs/heads/master
Commit: d919394c7fb4080a2069d24da5a09a410e215e9f
Parents: 71196ec ee1bcba
Author: Eugene Kirpichov 
Authored: Tue Jul 25 17:37:06 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jul 25 17:37:06 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroIO.java | 165 +++
 .../java/org/apache/beam/sdk/io/AvroSource.java |  24 ++-
 .../java/org/apache/beam/sdk/io/AvroUtils.java  |  40 +
 .../apache/beam/sdk/io/BlockBasedSource.java|   6 +
 .../beam/sdk/io/ReadAllViaFileBasedSource.java  | 152 +
 .../java/org/apache/beam/sdk/io/TextIO.java | 135 +++
 .../java/org/apache/beam/sdk/io/AvroIOTest.java |  65 +++-
 7 files changed, 436 insertions(+), 151 deletions(-)
--




[3/4] beam git commit: Adds ValueProvider support to AvroIO.Read

2017-07-25 Thread jkff
Adds ValueProvider support to AvroIO.Read


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e80c83b2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e80c83b2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e80c83b2

Branch: refs/heads/master
Commit: e80c83b2a0a1cf55aa8a452a02a76c9dc13697cc
Parents: 71196ec
Author: Eugene Kirpichov 
Authored: Fri Jul 21 12:38:17 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jul 25 17:36:49 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroIO.java | 49 
 .../java/org/apache/beam/sdk/io/AvroSource.java | 24 +++---
 .../apache/beam/sdk/io/BlockBasedSource.java|  6 +++
 3 files changed, 42 insertions(+), 37 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/e80c83b2/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
index 89cadbd..d308c85 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
@@ -18,6 +18,7 @@
 package org.apache.beam.sdk.io;
 
 import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
 
 import com.google.auto.value.AutoValue;
 import com.google.common.collect.ImmutableMap;
@@ -36,7 +37,6 @@ import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.VoidCoder;
 import org.apache.beam.sdk.io.FileBasedSink.DynamicDestinations;
 import org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy;
-import org.apache.beam.sdk.io.Read.Bounded;
 import org.apache.beam.sdk.io.fs.ResourceId;
 import org.apache.beam.sdk.options.ValueProvider;
 import org.apache.beam.sdk.options.ValueProvider.NestedValueProvider;
@@ -185,10 +185,10 @@ public class AvroIO {
 .setWindowedWrites(false);
   }
 
-  /** Implementation of {@link #read}. */
+  /** Implementation of {@link #read} and {@link #readGenericRecords}. */
   @AutoValue
   public abstract static class Read<T> extends PTransform<PBegin, PCollection<T>> {
-    @Nullable abstract String getFilepattern();
+    @Nullable abstract ValueProvider<String> getFilepattern();
     @Nullable abstract Class<T> getRecordClass();
     @Nullable abstract Schema getSchema();
 
@@ -196,7 +196,7 @@ public class AvroIO {
 
 @AutoValue.Builder
     abstract static class Builder<T> {
-      abstract Builder<T> setFilepattern(String filepattern);
+      abstract Builder<T> setFilepattern(ValueProvider<String> filepattern);
       abstract Builder<T> setRecordClass(Class<T> recordClass);
       abstract Builder<T> setSchema(Schema schema);
 
@@ -204,45 +204,34 @@ public class AvroIO {
 }
 
 /** Reads from the given filename or filepattern. */
-    public Read<T> from(String filepattern) {
+    public Read<T> from(ValueProvider<String> filepattern) {
       return toBuilder().setFilepattern(filepattern).build();
     }
 
+    /** Like {@link #from(ValueProvider)}. */
+    public Read<T> from(String filepattern) {
+      return from(StaticValueProvider.of(filepattern));
+    }
+
 @Override
     public PCollection<T> expand(PBegin input) {
-      if (getFilepattern() == null) {
-        throw new IllegalStateException(
-            "need to set the filepattern of an AvroIO.Read transform");
-      }
-      if (getSchema() == null) {
-        throw new IllegalStateException("need to set the schema of an AvroIO.Read transform");
-      }
+      checkNotNull(getFilepattern(), "filepattern");
+      checkNotNull(getSchema(), "schema");
 
       @SuppressWarnings("unchecked")
-      Bounded<T> read =
+      AvroSource<T> source =
           getRecordClass() == GenericRecord.class
-              ? (Bounded<T>) org.apache.beam.sdk.io.Read.from(
-                  AvroSource.from(getFilepattern()).withSchema(getSchema()))
-              : org.apache.beam.sdk.io.Read.from(
-                  AvroSource.from(getFilepattern()).withSchema(getRecordClass()));
-
-      PCollection<T> pcol = input.getPipeline().apply("Read", read);
-      // Honor the default output coder that would have been used by this PTransform.
-      pcol.setCoder(getDefaultOutputCoder());
-      return pcol;
+              ? (AvroSource<T>) AvroSource.from(getFilepattern()).withSchema(getSchema())
+              : AvroSource.from(getFilepattern()).withSchema(getRecordClass());
+
+      return input.getPipeline().apply("Read", org.apache.beam.sdk.io.Read.from(source));
     }
 
 @Override
 public void populateDisplayData(DisplayData.Builder builder) {
   super.populateDisplayData(builder);
-  builder
-.addIfNotNull(DisplayData.item("filePattern", getFilepattern())
- 
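The change above swaps a construction-time String for a ValueProvider, whose value may only become available when the pipeline actually runs. A toy model of that deferred-value contract (this is not Beam's actual ValueProvider API; all names here are illustrative):

```java
// Toy model of a deferred configuration value: construction-time code can
// hold and pass the wrapper around, but get() is only legal once the
// runtime value has been bound.
public class DeferredValue<T> {
    private T value;
    private boolean bound;

    // Mirrors the idea of a statically known value: bound immediately.
    public static <T> DeferredValue<T> ofStatic(T value) {
        DeferredValue<T> v = new DeferredValue<>();
        v.bind(value);
        return v;
    }

    public void bind(T value) {
        this.value = value;
        this.bound = true;
    }

    public boolean isAccessible() {
        return bound;
    }

    public T get() {
        if (!bound) {
            throw new IllegalStateException("value not bound yet");
        }
        return value;
    }
}
```

The point of the pattern: transforms built against the wrapper compile and validate at graph-construction time even though the concrete filepattern arrives later.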

[1/4] beam git commit: Introduces AvroIO.readAll() and readAllGenericRecords()

2017-07-25 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master 71196ec9c -> d919394c7


Introduces AvroIO.readAll() and readAllGenericRecords()


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ee1bcbae
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ee1bcbae
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ee1bcbae

Branch: refs/heads/master
Commit: ee1bcbae08fb221e392175fbd0387594653d4a86
Parents: eaf0b36
Author: Eugene Kirpichov 
Authored: Fri Jul 21 14:09:35 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jul 25 17:36:49 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroIO.java | 132 +--
 .../java/org/apache/beam/sdk/io/AvroUtils.java  |  40 ++
 .../java/org/apache/beam/sdk/io/AvroIOTest.java |  65 -
 3 files changed, 223 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ee1bcbae/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
index d308c85..f201114 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
@@ -21,6 +21,8 @@ import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
 import com.google.auto.value.AutoValue;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Supplier;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Maps;
 import com.google.common.io.BaseEncoding;
@@ -54,14 +56,17 @@ import org.apache.beam.sdk.values.PDone;
  * {@link PTransform}s for reading and writing Avro files.
  *
  * <p>To read a {@link PCollection} from one or more Avro files, use {@code AvroIO.read()}, using
- * {@link AvroIO.Read#from} to specify the filename or filepattern to read from. See {@link
- * FileSystems} for information on supported file systems and filepatterns.
+ * {@link AvroIO.Read#from} to specify the filename or filepattern to read from. Alternatively, if
+ * the filepatterns to be read are themselves in a {@link PCollection}, apply {@link #readAll}.
+ *
+ * <p>See {@link FileSystems} for information on supported file systems and filepatterns.
  *
  * <p>To read specific records, such as Avro-generated classes, use {@link #read(Class)}. To read
  * {@link GenericRecord GenericRecords}, use {@link #readGenericRecords(Schema)} which takes a
  * {@link Schema} object, or {@link #readGenericRecords(String)} which takes an Avro schema in a
  * JSON-encoded string form. An exception will be thrown if a record doesn't match the specified
- * schema.
+ * schema. Likewise, to read a {@link PCollection} of filepatterns, apply {@link
+ * #readAllGenericRecords}.
  *
  * For example:
  *
@@ -79,6 +84,18 @@ import org.apache.beam.sdk.values.PDone;
  *        .from("gs://my_bucket/path/to/records-*.avro"));
  * }</pre>
  *
+ * <p>Reading from a {@link PCollection} of filepatterns:
+ *
+ * <pre>{@code
+ * Pipeline p = ...;
+ *
+ * PCollection<String> filepatterns = p.apply(...);
+ * PCollection<AvroAutoGenClass> records =
+ *     filepatterns.apply(AvroIO.readAll(AvroAutoGenClass.class));
+ * PCollection<GenericRecord> genericRecords =
+ *     filepatterns.apply(AvroIO.readAllGenericRecords(schema));
+ * }</pre>
+ *
  * <p>To write a {@link PCollection} to one or more Avro files, use {@link AvroIO.Write}, using
  * {@code AvroIO.write().to(String)} to specify the output filename prefix. The default {@link
  * DefaultFilenamePolicy} will use this prefix, in conjunction with a {@link ShardNameTemplate} (set
@@ -133,6 +150,18 @@ public class AvroIO {
 .build();
   }
 
+  /** Like {@link #read}, but reads each filepattern in the input {@link PCollection}. */
+  public static <T> ReadAll<T> readAll(Class<T> recordClass) {
+    return new AutoValue_AvroIO_ReadAll.Builder<T>()
+        .setRecordClass(recordClass)
+        .setSchema(ReflectData.get().getSchema(recordClass))
+        // 64MB is a reasonable value that allows to amortize the cost of opening files,
+        // but is not so large as to exhaust a typical runner's maximum amount of output per
+        // ProcessElement call.
+        .setDesiredBundleSizeBytes(64 * 1024 * 1024L)
+        .build();
+  }
+
   /** Reads Avro file(s) containing records of the specified schema. */
   public static Read<GenericRecord> readGenericRecords(Schema schema) {
     return new AutoValue_AvroIO_Read.Builder<GenericRecord>()
@@ -142,6 +171,17 @@ public class AvroIO {
   }
 
   /**
+   * Like {@link #readGenericRecords(Schema)}, but reads each filepattern in the input {@link
+   * PCollection}.
+   */
+  public static R
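The readAll() factory above sets a 64MB desired bundle size; downstream, ReadAllViaFileBasedSource splits each matched file into offset ranges of roughly that size. A Beam-free sketch of the splitting arithmetic (class and method names here are illustrative, not Beam's):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split a file of `fileSize` bytes into half-open
// byte ranges of at most `desiredBundleSize` bytes, as a readAll-style
// transform might before handing each range to a reader.
public class OffsetRangeSplitter {
    // A half-open byte range [from, to) within a file.
    public record Range(long from, long to) {}

    public static List<Range> split(long fileSize, long desiredBundleSize) {
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += desiredBundleSize) {
            // The last range is clipped to the end of the file.
            ranges.add(new Range(start, Math.min(start + desiredBundleSize, fileSize)));
        }
        return ranges;
    }
}
```

With a 64MB desired size, a 200MB file yields four ranges (three full, one 8MB tail), which is the trade-off the commit comment describes: bundles large enough to amortize file-open cost, small enough to bound per-element output.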

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2704

2017-07-25 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PerformanceTests_Python #142

2017-07-25 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3641: Add logging when Staging takes a long time

2017-07-25 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3641

Add logging when Staging takes a long time

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam log_during_staging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3641.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3641


commit c43b2ca8006f7528b572752d3062233f63af1df4
Author: Thomas Groh 
Date:   2017-07-25T23:54:48Z

Add logging when Staging takes a long time




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3640: Translate combining operations through the Runner A...

2017-07-25 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3640

Translate combining operations through the Runner API.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-api-combine

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3640.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3640


commit 9d34084d0d6a4887f04aefad7d2316d192d778d8
Author: Robert Bradshaw 
Date:   2017-07-25T23:12:16Z

Translate combining operations through the Runner API.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2566) Java SDK harness should not depend on any runner

2017-07-25 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100948#comment-16100948
 ] 

Kenneth Knowles commented on BEAM-2566:
---

[~tgroh] if I recall, I think you even did this once before...

> Java SDK harness should not depend on any runner
> 
>
> Key: BEAM-2566
> URL: https://issues.apache.org/jira/browse/BEAM-2566
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now there is a dependency on the Dataflow runner. I believe this is 
> legacy due to using {{CloudObject}} temporarily but I do not claim to 
> understand the full nature of the dependency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2566) Java SDK harness should not depend on any runner

2017-07-25 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-2566:
-

Assignee: Thomas Groh  (was: Luke Cwik)

> Java SDK harness should not depend on any runner
> 
>
> Key: BEAM-2566
> URL: https://issues.apache.org/jira/browse/BEAM-2566
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now there is a dependency on the Dataflow runner. I believe this is 
> legacy due to using {{CloudObject}} temporarily but I do not claim to 
> understand the full nature of the dependency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3648

2017-07-25 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2703

2017-07-25 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1234) Consider a hint ParDo.withHighFanout()

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100703#comment-16100703
 ] 

ASF GitHub Bot commented on BEAM-1234:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3580


> Consider a hint ParDo.withHighFanout()
> --
>
> Key: BEAM-1234
> URL: https://issues.apache.org/jira/browse/BEAM-1234
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Priority: Minor
>
> I'm finding myself again and again suggesting users on StackOverflow to 
> insert fusion breaks after high-fanout ParDo's.
> I think we should just implement this as a hint on ParDo and MapElements 
> transforms, like we have on GroupByKey.fewKeys() or 
> Combine.withHotKeyFanout().
> E.g.: c.apply(ParDo.of(some high-fanout DoFn).withHighFanout()), and a runner 
> that implements fusion could decide to insert a runner-specific fusion break. 
> This somewhat sidesteps the issues in 
> https://issues.apache.org/jira/browse/BEAM-730 and 
> https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E
>  because every runner can decide how to do the right thing, or is free to 
> ignore the hint.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3580: Let IsBounded take True value.

2017-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3580


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: Let IsBounded take True value.

2017-07-25 Thread robertwb
Let IsBounded take True value.

This is useful for languages like Python that may use this in a conditional 
statement (or allow assignment from True->1/False->0).


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2e8ed5a9
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2e8ed5a9
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2e8ed5a9

Branch: refs/heads/master
Commit: 2e8ed5a903dbc681b03b75ed1d55ebf2dac6fab6
Parents: 73da9cc
Author: Robert Bradshaw 
Authored: Mon Jul 17 16:01:18 2017 -0700
Committer: Robert Bradshaw 
Committed: Tue Jul 25 13:18:38 2017 -0700

--
 sdks/common/runner-api/src/main/proto/beam_runner_api.proto | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/2e8ed5a9/sdks/common/runner-api/src/main/proto/beam_runner_api.proto
--
diff --git a/sdks/common/runner-api/src/main/proto/beam_runner_api.proto 
b/sdks/common/runner-api/src/main/proto/beam_runner_api.proto
index 0c433fa..42e2601 100644
--- a/sdks/common/runner-api/src/main/proto/beam_runner_api.proto
+++ b/sdks/common/runner-api/src/main/proto/beam_runner_api.proto
@@ -288,8 +288,8 @@ message TimerSpec {
 }
 
 enum IsBounded {
-  BOUNDED = 0;
-  UNBOUNDED = 1;
+  UNBOUNDED = 0;
+  BOUNDED = 1;
 }
 
 // The payload for the primitive Read transform.



[1/2] beam git commit: Closes #3580

2017-07-25 Thread robertwb
Repository: beam
Updated Branches:
  refs/heads/master 73da9cc40 -> 71196ec9c


Closes #3580


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/71196ec9
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/71196ec9
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/71196ec9

Branch: refs/heads/master
Commit: 71196ec9cab9f393a73d068dd3ecbc255352dd0e
Parents: 73da9cc 2e8ed5a
Author: Robert Bradshaw 
Authored: Tue Jul 25 13:18:38 2017 -0700
Committer: Robert Bradshaw 
Committed: Tue Jul 25 13:18:38 2017 -0700

--
 sdks/common/runner-api/src/main/proto/beam_runner_api.proto | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--




[jira] [Commented] (BEAM-2640) Introduce Create.ofProvider(ValueProvider)

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100547#comment-16100547
 ] 

ASF GitHub Bot commented on BEAM-2640:
--

GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3639

[BEAM-2640, BEAM-2641] Introduces TextIO.read().withHintMatchesManyFiles()

Reincarnation of https://github.com/apache/beam/pull/3598 which got closed 
by accident

In that case it expands to TextIO.readAll(). Implementing this when the 
filepattern is a ValueProvider nudged me to also implement BEAM-2640 - 
Create.ofProvider(ValueProvider).

Links:
https://issues.apache.org/jira/browse/BEAM-2640
https://issues.apache.org/jira/browse/BEAM-2641

R: @reuvenlax
CC: @sammcveety


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam textio-readall

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3639


commit 8f594b2726bd85b60df5d8e93a2055586137cd60
Author: Eugene Kirpichov 
Date:   2017-07-19T18:50:58Z

[BEAM-2640] Introduces Create.ofProvider(ValueProvider)

I also converted DatastoreV1 to use this overload, and, as an
exercise, added a withQuery(ValueProvider) overload to JdbcIO.

commit a3a4e861db8936bf0ba169b6b089a4a1fcfd6557
Author: Eugene Kirpichov 
Date:   2017-07-19T18:51:33Z

[BEAM-2641] Introduces TextIO.read().withHintMatchesManyFiles()

In that case it expands to TextIO.readAll().




> Introduce Create.ofProvider(ValueProvider)
> --
>
> Key: BEAM-2640
> URL: https://issues.apache.org/jira/browse/BEAM-2640
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> When you have a ValueProvider that may or may not be accessible at 
> construction time, a common task is to wrap it into a single-element 
> PCollection. This is especially common when migrating an IO connector that 
> used something like Create.of(query) followed by a ParDo, to having query be 
> a ValueProvider.
> Currently this is done in an icky way (e.g. 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java#L615)
> We should have a convenience helper for that.
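The convenience helper requested in this issue turns a late-bound value into a single-element collection at run time. A Beam-free sketch of that shape (the real Create.ofProvider operates on a Beam Pipeline and ValueProvider; the names here are illustrative):

```java
import java.util.List;
import java.util.function.Supplier;

// Beam-free sketch: hold the provider at construction time, but read it
// only when expand() runs, then emit its value as a one-element collection.
// This mirrors how an IO connector migrating from Create.of(query) + ParDo
// would wrap a deferred query.
public class SingleElementSource<T> {
    private final Supplier<T> provider;

    public SingleElementSource(Supplier<T> provider) {
        this.provider = provider;
    }

    // Deferred: the provider is consulted here, not in the constructor.
    public List<T> expand() {
        return List.of(provider.get());
    }
}
```

Usage: `new SingleElementSource<>(() -> runtimeQuery).expand()` yields a one-element list, after which per-element processing can fan out as a ParDo would.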



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3639: [BEAM-2640, BEAM-2641] Introduces TextIO.read().wit...

2017-07-25 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3639

[BEAM-2640, BEAM-2641] Introduces TextIO.read().withHintMatchesManyFiles()

Reincarnation of https://github.com/apache/beam/pull/3598 which got closed 
by accident

In that case it expands to TextIO.readAll(). Implementing this when the 
filepattern is a ValueProvider nudged me to also implement BEAM-2640 - 
Create.ofProvider(ValueProvider).

Links:
https://issues.apache.org/jira/browse/BEAM-2640
https://issues.apache.org/jira/browse/BEAM-2641

R: @reuvenlax
CC: @sammcveety


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam textio-readall

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3639


commit 8f594b2726bd85b60df5d8e93a2055586137cd60
Author: Eugene Kirpichov 
Date:   2017-07-19T18:50:58Z

[BEAM-2640] Introduces Create.ofProvider(ValueProvider)

I also converted DatastoreV1 to use this overload, and, as an
exercise, added a withQuery(ValueProvider) overload to JdbcIO.

commit a3a4e861db8936bf0ba169b6b089a4a1fcfd6557
Author: Eugene Kirpichov 
Date:   2017-07-19T18:51:33Z

[BEAM-2641] Introduces TextIO.read().withHintMatchesManyFiles()

In that case it expands to TextIO.readAll().




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-07-25 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100536#comment-16100536
 ] 

Chamikara Jayalath commented on BEAM-2490:
--

I suspect this was due to https://issues.apache.org/jira/browse/BEAM-2497. I 
think we can close this if this issue could not be reproduced with the fix 
(https://github.com/apache/beam/pull/3428) applied.

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well but there is only a 288.62 MB file in output of this pipeline 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates me that the estimated size of the output after the 
> ReadFromText step is 602.29 MB only, which not correspond to any unique input 
> file size nor the overall file size matching with the pattern.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2679) SQL DSL API Review

2017-07-25 Thread Tyler Akidau (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100529#comment-16100529
 ] 

Tyler Akidau commented on BEAM-2679:


Yes, taken.

> SQL DSL API Review
> --
>
> Key: BEAM-2679
> URL: https://issues.apache.org/jira/browse/BEAM-2679
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Tyler Akidau
>  Labels: dsl_sql_merge
>
> 7. API Review: This will be a large addition to the API. We will need to have 
> a holistic API review of the proposed additions before merge. Note: this may 
> result in large changes to the API surface. Committers involved will do their 
> best to prevent this as things evolve, but it is always a possibility.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2679) SQL DSL API Review

2017-07-25 Thread Tyler Akidau (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Akidau reassigned BEAM-2679:
--

Assignee: Tyler Akidau

> SQL DSL API Review
> --
>
> Key: BEAM-2679
> URL: https://issues.apache.org/jira/browse/BEAM-2679
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Tyler Akidau
>  Labels: dsl_sql_merge
>
> 7. API Review: This will be a large addition to the API. We will need to have 
> a holistic API review of the proposed additions before merge. Note: this may 
> result in large changes to the API surface. Committers involved will do their 
> best to prevent this as things evolve, but it is always a possibility.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2640) Introduce Create.ofProvider(ValueProvider)

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100482#comment-16100482
 ] 

ASF GitHub Bot commented on BEAM-2640:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3598


> Introduce Create.ofProvider(ValueProvider)
> --
>
> Key: BEAM-2640
> URL: https://issues.apache.org/jira/browse/BEAM-2640
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> When you have a ValueProvider that may or may not be accessible at 
> construction time, a common task is to wrap it into a single-element 
> PCollection. This is especially common when migrating an IO connector that 
> used something like Create.of(query) followed by a ParDo, to having query be 
> a ValueProvider.
> Currently this is done in an icky way (e.g. 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java#L615)
> We should have a convenience helper for that.
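For illustration only, a toy Python sketch (hypothetical names, not the Beam API) of why the wrapper matters: consuming the provider's value at construction time bakes in a stale value, while deferring the read until run time sees the real one.

```python
# Minimal stand-in for a deferred value (hypothetical, not Beam's ValueProvider).
class ValueProvider:
    def __init__(self, get):
        self._get = get  # callable evaluated only when .get() is invoked

    def get(self):
        return self._get()


late = {"query": None}  # the real value is unknown at "construction time"
vp = ValueProvider(lambda: late["query"])

# Reading eagerly at construction time captures the placeholder...
constructed = [vp.get()]

# ...while reading at "run time", after the value is bound, sees it.
late["query"] = "SELECT 1"
deferred = [vp.get()]

assert constructed == [None]
assert deferred == ["SELECT 1"]
```

The proposed helper would hide the deferred read inside a single-element PCollection, so downstream transforms never touch the provider at construction time.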



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3598: [BEAM-2640, BEAM-2641] Introduces TextIO.read().wit...

2017-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3598


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3599: ReduceFnRunner: test when watermark leapfrogs EOW a...

2017-07-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3599


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3598: ReduceFnRunner: test when watermark leapfrogs EOW and GC

2017-07-25 Thread kenn
This closes #3598: ReduceFnRunner: test when watermark leapfrogs EOW and GC


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/73da9cc4
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/73da9cc4
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/73da9cc4

Branch: refs/heads/master
Commit: 73da9cc403eb8baf91b79fb45605d564da26266e
Parents: 01408c8 22b8296
Author: Kenneth Knowles 
Authored: Tue Jul 25 10:53:30 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jul 25 10:53:30 2017 -0700

--
 .../beam/runners/core/ReduceFnRunnerTest.java   | 83 
 1 file changed, 83 insertions(+)
--




[1/2] beam git commit: ReduceFnRunner: test when watermark leapfrogs EOW and GC

2017-07-25 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 01408c864 -> 73da9cc40


ReduceFnRunner: test when watermark leapfrogs EOW and GC

This is known to fail in older versions; forward porting regression test.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/22b82969
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/22b82969
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/22b82969

Branch: refs/heads/master
Commit: 22b82969828508cfbd45e4f90fe74dfed7914b88
Parents: 01408c8
Author: Kenneth Knowles 
Authored: Wed Jul 19 15:27:20 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jul 25 10:52:55 2017 -0700

--
 .../beam/runners/core/ReduceFnRunnerTest.java   | 83 
 1 file changed, 83 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/22b82969/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
--
diff --git 
a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
 
b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
index 4f13af1..2341502 100644
--- 
a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
+++ 
b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
@@ -39,6 +39,7 @@ import static org.mockito.Mockito.withSettings;
 import com.google.common.collect.Iterables;
 import java.util.List;
 import org.apache.beam.runners.core.metrics.MetricsContainerImpl;
+import org.apache.beam.runners.core.triggers.DefaultTriggerStateMachine;
 import org.apache.beam.runners.core.triggers.TriggerStateMachine;
 import org.apache.beam.sdk.coders.VarIntCoder;
 import org.apache.beam.sdk.metrics.MetricName;
@@ -247,6 +248,88 @@ public class ReduceFnRunnerTest {
   }
 
   /**
+   * When the watermark passes the end-of-window and window expiration time
+   * in a single update, this tests that it does not crash.
+   */
+  @Test
+  public void testSessionEowAndGcTogether() throws Exception {
+    ReduceFnTester<Integer, Iterable<Integer>, IntervalWindow> tester =
+        ReduceFnTester.nonCombining(
+            Sessions.withGapDuration(Duration.millis(10)),
+            DefaultTriggerStateMachine.of(),
+            AccumulationMode.ACCUMULATING_FIRED_PANES,
+            Duration.millis(50),
+            ClosingBehavior.FIRE_ALWAYS);
+
+    tester.setAutoAdvanceOutputWatermark(true);
+
+    tester.advanceInputWatermark(new Instant(0));
+    injectElement(tester, 1);
+    tester.advanceInputWatermark(new Instant(100));
+
+    assertThat(
+        tester.extractOutput(),
+        contains(
+            isSingleWindowedValue(
+                contains(1), 1, 1, 11, PaneInfo.createPane(true, true, Timing.ON_TIME))));
+  }
+
+  /**
+   * When the watermark passes the end-of-window and window expiration time
+   * in a single update, this tests that it does not crash.
+   */
+  @Test
+  public void testFixedWindowsEowAndGcTogether() throws Exception {
+    ReduceFnTester<Integer, Iterable<Integer>, IntervalWindow> tester =
+        ReduceFnTester.nonCombining(
+            FixedWindows.of(Duration.millis(10)),
+            DefaultTriggerStateMachine.of(),
+            AccumulationMode.ACCUMULATING_FIRED_PANES,
+            Duration.millis(50),
+            ClosingBehavior.FIRE_ALWAYS);
+
+    tester.setAutoAdvanceOutputWatermark(true);
+
+    tester.advanceInputWatermark(new Instant(0));
+    injectElement(tester, 1);
+    tester.advanceInputWatermark(new Instant(100));
+
+    assertThat(
+        tester.extractOutput(),
+        contains(
+            isSingleWindowedValue(
+                contains(1), 1, 0, 10, PaneInfo.createPane(true, true, Timing.ON_TIME))));
+  }
+
+  /**
+   * When the watermark passes the end-of-window and window expiration time
+   * in a single update, this tests that it does not crash.
+   */
+  @Test
+  public void testFixedWindowsEowAndGcTogetherFireIfNonEmpty() throws Exception {
+    ReduceFnTester<Integer, Iterable<Integer>, IntervalWindow> tester =
+        ReduceFnTester.nonCombining(
+            FixedWindows.of(Duration.millis(10)),
+            DefaultTriggerStateMachine.of(),
+            AccumulationMode.ACCUMULATING_FIRED_PANES,
+            Duration.millis(50),
+            ClosingBehavior.FIRE_IF_NON_EMPTY);
+
+    tester.setAutoAdvanceOutputWatermark(true);
+
+    tester.advanceInputWatermark(new Instant(0));
+    injectElement(tester, 1);
+    tester.advanceInputWatermark(new Instant(100));
+
+    List<WindowedValue<Iterable<Integer>>> output = tester.extractOutput();
+    assertThat(
+        output,
+        contains(
+            isSingleWindowedValue(
+                contains(1), 1, 0, 10, PaneInfo.createPane(true, true, Timing.ON_TIME))));
+  }
+
+  /**
* Tests that with the default trigger we will not

[jira] [Commented] (BEAM-2679) SQL DSL API Review

2017-07-25 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100474#comment-16100474
 ] 

Xu Mingmin commented on BEAM-2679:
--

[~takidau] could you take the task to drive an API review?

> SQL DSL API Review
> --
>
> Key: BEAM-2679
> URL: https://issues.apache.org/jira/browse/BEAM-2679
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>  Labels: dsl_sql_merge
>
> 7. API Review: This will be a large addition to the API. We will need to have 
> a holistic API review of the proposed additions before merge. Note: this may 
> result in large changes to the API surface. Committers involved will do their 
> best to prevent this as things evolve, but it is always a possibility.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2679) SQL DSL API Review

2017-07-25 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2679:


 Summary: SQL DSL API Review
 Key: BEAM-2679
 URL: https://issues.apache.org/jira/browse/BEAM-2679
 Project: Beam
  Issue Type: Task
  Components: dsl-sql
Reporter: Xu Mingmin


7. API Review: This will be a large addition to the API. We will need to have a 
holistic API review of the proposed additions before merge. Note: this may 
result in large changes to the API surface. Committers involved will do their 
best to prevent this as things evolve, but it is always a possibility.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-07-25 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100426#comment-16100426
 ] 

Ahmet Altay commented on BEAM-2490:
---

What are the next steps on this issue? Can we close it?

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write it to Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well, but the output of this pipeline is only a 288.62 MB file 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates that the estimated size of the output after the 
> ReadFromText step is only 602.29 MB, which corresponds neither to any single 
> input file size nor to the total size of the files matching the pattern.
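As a quick sanity check that the wildcard itself is not at fault, the pattern can be tested against the reported file names with Python's stdlib `fnmatch` (a standalone sketch; this is not how Beam or GCS performs matching):

```python
import fnmatch

# Distinct file names from the report above.
filenames = [
    "my_files_2016090116_20160902_060051_xx.csv.gz",
    "my_files_2016090117_20160902_060051_xx.csv.gz",
    "my_files_2016090118_20160902_060051_xx.csv.gz",
    "my_files_2016090119_20160902_060051_xx.csv.gz",
    "my_files_2016090120_20160902_060051_xx.csv.gz",
    "my_files_2016090121_20160902_060051_xx.csv.gz",
    "my_files_2016090122_20160902_060051_xx.csv.gz",
]

# The glob used in the pipeline selects every one of them,
# so the missing data is not explained by the pattern.
matched = fnmatch.filter(filenames, "my_files_20160901*.csv.gz")
assert matched == filenames
```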



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-07-25 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062089#comment-16062089
 ] 

Ahmet Altay edited comment on BEAM-2490 at 7/25/17 5:38 PM:


gzip issue could be related to https://issues.apache.org/jira/browse/BEAM-2497 
[~wileeam], are you running against head with the fix 
(https://github.com/apache/beam/pull/3428) ?


was (Author: altay):
gzip issue could be related to https://issues.apache.org/jira/browse/BEAM-2490 
[~wileeam], are you running against head with the fix 
(https://github.com/apache/beam/pull/3428) ?
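The partial-output symptom is consistent with the concatenated-gzip pitfall that BEAM-2497 addresses: a decoder that stops at the first gzip member silently drops everything after it. A standalone stdlib illustration (not Beam's actual code):

```python
import gzip
import zlib

# A single .gz file may contain several concatenated gzip members.
blob = gzip.compress(b"alpha\n") + gzip.compress(b"beta\n")

# gzip.decompress reads every member...
assert gzip.decompress(blob) == b"alpha\nbeta\n"

# ...but a naive single-stream decoder stops at the first member,
# leaving the rest unconsumed, so data is silently truncated.
d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
first = d.decompress(blob)
assert first == b"alpha\n"
assert d.unused_data  # the second member was never decoded
```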

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write it to Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well, but the output of this pipeline is only a 288.62 MB file 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates that the estimated size of the output after the 
> ReadFromText step is only 602.29 MB, which corresponds neither to any single 
> input file size nor to the total size of the files matching the pattern.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100330#comment-16100330
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3638

[BEAM-1347] Add utility to be able to model inbound reading as a single 
input stream

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
This is towards having both Beam Fn State and Beam Fn Data APIs share code 
for consuming from a logical input stream.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3638


commit 1709961e6cd8755ad8ee13d93ba3dc4a6c56d3d7
Author: Luke Cwik 
Date:   2017-07-25T16:02:41Z

[BEAM-1347] Add utility to be able to model inbound reading as a single 
input stream




> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3638: [BEAM-1347] Add utility to be able to model inbound...

2017-07-25 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3638

[BEAM-1347] Add utility to be able to model inbound reading as a single 
input stream

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
This is towards having both Beam Fn State and Beam Fn Data APIs share code 
for consuming from a logical input stream.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3638


commit 1709961e6cd8755ad8ee13d93ba3dc4a6c56d3d7
Author: Luke Cwik 
Date:   2017-07-25T16:02:41Z

[BEAM-1347] Add utility to be able to model inbound reading as a single 
input stream




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2676) move BeamSqlRow and BeamSqlRowType to sdk/java/core

2017-07-25 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099989#comment-16099989
 ] 

Luke Cwik commented on BEAM-2676:
-

I would suggest moving this to *org.apache.beam.sdk.values* instead of 
*org.apache.beam.sdk.sd* since that is where KV is.

[~robertwb] What do you think?

> move BeamSqlRow and BeamSqlRowType to sdk/java/core
> ---
>
> Key: BEAM-2676
> URL: https://issues.apache.org/jira/browse/BEAM-2676
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>
> BeamSqlRow/BeamSqlRowType are fundamental to structured data processing in 
> Beam (joins, simple projections/expansions), so it makes sense to move 
> them to sdk-java-core for greater visibility.
> It contains two parts:
> 1). remove SQL word in the name,
> BeamSqlRow --> BeamRow
> BeamSqlRowType --> BeamRowType
> 2). move from package {{org.apache.beam.dsls.sql.schema}} to 
> {{org.apache.beam.sdk.sd}} (sd stands for structure data), in module 
> {{beam-sdks-java-core}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2670) ParDoTest.testPipelineOptionsParameter* validatesRunner tests fail on Spark runner

2017-07-25 Thread Stas Levin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stas Levin reassigned BEAM-2670:


Assignee: Stas Levin  (was: Amit Sela)

> ParDoTest.testPipelineOptionsParameter* validatesRunner tests fail on Spark 
> runner
> --
>
> Key: BEAM-2670
> URL: https://issues.apache.org/jira/browse/BEAM-2670
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Stas Levin
>Priority: Minor
> Fix For: 2.1.0, 2.2.0
>
>
> Tests succeed when run alone but fail when the whole ParDoTest is run. This 
> may be related to PipelineOptions being reused / not cleaned up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2669) Kryo serialization exception when DStreams containing non-Kryo-serializable data are cached

2017-07-25 Thread Kobi Salant (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kobi Salant reassigned BEAM-2669:
-

Assignee: Kobi Salant  (was: Amit Sela)

> Kryo serialization exception when DStreams containing non-Kryo-serializable 
> data are cached
> ---
>
> Key: BEAM-2669
> URL: https://issues.apache.org/jira/browse/BEAM-2669
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.4.0, 0.5.0, 0.6.0, 2.0.0
>Reporter: Aviem Zur
>Assignee: Kobi Salant
>
> Today, when we detect re-use of a dataset in a pipeline in Spark runner we 
> eagerly cache it to avoid calculating the same data multiple times.
> ([EvaluationContext.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L148])
> When the dataset is bounded, which in Spark is represented by an {{RDD}}, we 
> call {{RDD#persist}} and use storage level provided by the user via 
> {{SparkPipelineOptions}}. 
> ([BoundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java#L103-L103])
> When the dataset is unbounded, which in Spark is represented by a {{DStream}} 
> we call {{DStream.cache()}} which defaults to persist the {{DStream}} using 
> storage level {{MEMORY_ONLY_SER}} 
> ([UnboundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java#L61])
>  
> ([DStream.scala|https://github.com/apache/spark/blob/v1.6.3/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L169])
> Storage level {{MEMORY_ONLY_SER}} means Spark will serialize the data using 
> its configured serializer. Since we configure this to be Kryo in a hard coded 
> fashion, this means the data will be serialized using Kryo. 
> ([SparkContextFactory.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java#L99-L99])
> Due to this, if your {{DStream}} contains non-Kryo-serializable data you will 
> encounter Kryo serialization exceptions and your task will fail.
> Possible actions we should consider:
> # Remove the hard-coded Spark serializer configuration; this should be taken 
> from the user's Spark configuration, as there is no real reason for us to 
> interfere with it.
> # Use the user's configured storage level from 
> {{SparkPipelineOptions}} when caching unbounded datasets ({{DStream}}s), same 
> as we do for bounded datasets.
> # Make caching of re-used datasets configurable in {{SparkPipelineOptions}} 
> (enable/disable), although overloading our configuration with more options is 
> never something to be taken lightly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-1773) Consider allowing Source#validate() to throw exception

2017-07-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved BEAM-1773.
--
   Resolution: Later
Fix Version/s: Not applicable

> Consider allowing Source#validate() to throw exception
> --
>
> Key: BEAM-1773
> URL: https://issues.apache.org/jira/browse/BEAM-1773
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: backward-incompatible
> Fix For: Not applicable
>
> Attachments: 1773.v4.patch, beam-1773.v1.patch, beam-1773.v2.patch
>
>
> In HDFSFileSource.java :
> {code}
>   @Override
>   public void validate() {
> ...
>   } catch (IOException | InterruptedException e) {
> throw new RuntimeException(e);
>   }
> {code}
> Source#validate() should be allowed to throw an exception so that we don't 
> resort to using RuntimeException.
> Here is a related thread on the mailing list:
> http://search-hadoop.com/m/Beam/gfKHFOwE0uETxae?subj=Re+why+Source+validate+is+not+declared+to+throw+any+exception



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-1962) Connection should be closed in case start() throws exception

2017-07-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated BEAM-1962:
-
Description: 
In JmsIO#start() :
{code}
  try {
Connection connection;
if (spec.getUsername() != null) {
  connection =
  connectionFactory.createConnection(spec.getUsername(), 
spec.getPassword());
} else {
  connection = connectionFactory.createConnection();
}
connection.start();
this.connection = connection;
  } catch (Exception e) {
throw new IOException("Error connecting to JMS", e);
  }
{code}
If start() throws an exception, the connection should be closed.

  was:
In JmsIO#start() :

{code}
  try {
Connection connection;
if (spec.getUsername() != null) {
  connection =
  connectionFactory.createConnection(spec.getUsername(), 
spec.getPassword());
} else {
  connection = connectionFactory.createConnection();
}
connection.start();
this.connection = connection;
  } catch (Exception e) {
throw new IOException("Error connecting to JMS", e);
  }
{code}
If start() throws an exception, the connection should be closed.


> Connection should be closed in case start() throws exception
> 
>
> Key: BEAM-1962
> URL: https://issues.apache.org/jira/browse/BEAM-1962
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Ted Yu
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> In JmsIO#start() :
> {code}
>   try {
> Connection connection;
> if (spec.getUsername() != null) {
>   connection =
>   connectionFactory.createConnection(spec.getUsername(), 
> spec.getPassword());
> } else {
>   connection = connectionFactory.createConnection();
> }
> connection.start();
> this.connection = connection;
>   } catch (Exception e) {
> throw new IOException("Error connecting to JMS", e);
>   }
> {code}
> If start() throws an exception, the connection should be closed.
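The suggested fix boils down to closing the half-opened resource before rethrowing. A language-agnostic sketch in Python (hypothetical names; the real fix would live in JmsIO's Java code):

```python
class FakeConnection:
    """Stand-in for a JMS connection (hypothetical)."""
    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def start(self):
        if self.fail:
            raise RuntimeError("broker unavailable")

    def close(self):
        self.closed = True


def open_connection(factory):
    conn = factory()
    try:
        conn.start()
        return conn
    except Exception as e:
        conn.close()  # release the resource before rethrowing
        raise IOError("Error connecting to JMS") from e


bad = FakeConnection(fail=True)
try:
    open_connection(lambda: bad)
except IOError:
    pass
assert bad.closed  # the failed connection was cleaned up
```

In the Java original, the same shape works: keep a local `connection` variable, and in the catch block call `connection.close()` (guarding against null) before throwing the IOException.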



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2335) Document various maven commands for running tests

2017-07-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated BEAM-2335:
-
Description: 
In this discussion thread, various maven commands for running / not running 
selected tests were mentioned:

http://search-hadoop.com/m/Beam/gfKHFd9bPDh5WJr1?subj=Re+How+can+I+disable+running+Python+SDK+tests+when+testing+my+Java+change+

We should document these commands under 
https://beam.apache.org/contribute/testing/ 


Borisa raised the following questions:

How do I execute only one test marked as @NeedsRunner?
How do I execute one specific test in Java IO?
How do I execute one specific test in any of the runners?
How do I use beamTestPipelineOptions (with a few JSON examples)?
Will mvn clean verify execute ALL tests against all runners?


For #1 above, we can create a profile that runs tests in the NeedsRunner 
category.
See the following:
http://stackoverflow.com/questions/3100924/how-to-run-junit-tests-by-category-in-maven

  was:
In this discussion thread, various maven commands for running / not running 
selected tests were mentioned:

http://search-hadoop.com/m/Beam/gfKHFd9bPDh5WJr1?subj=Re+How+can+I+disable+running+Python+SDK+tests+when+testing+my+Java+change+

We should document these commands under 
https://beam.apache.org/contribute/testing/ 

Borisa raised the following questions:

How do I execute only one test marked as @NeedsRunner?
How do I execute one specific test in Java IO?
How do I execute one specific test in any of the runners?
How do I use beamTestPipelineOptions (with a few JSON examples)?
Will mvn clean verify execute ALL tests against all runners?


For #1 above, we can create a profile that runs tests in the NeedsRunner 
category.
See the following:
http://stackoverflow.com/questions/3100924/how-to-run-junit-tests-by-category-in-maven


> Document various maven commands for running tests
> -
>
> Key: BEAM-2335
> URL: https://issues.apache.org/jira/browse/BEAM-2335
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ted Yu
>  Labels: document
>
> In this discussion thread, various maven commands for running / not running 
> selected tests were mentioned:
> http://search-hadoop.com/m/Beam/gfKHFd9bPDh5WJr1?subj=Re+How+can+I+disable+running+Python+SDK+tests+when+testing+my+Java+change+
> We should document these commands under 
> https://beam.apache.org/contribute/testing/ 
> Borisa raised the following questions:
> How do I execute only one test marked as @NeedsRunner?
> How do I execute one specific test in Java IO?
> How do I execute one specific test in any of the runners?
> How do I use beamTestPipelineOptions (with a few JSON examples)?
> Will mvn clean verify execute ALL tests against all runners?
> For #1 above, we can create a profile that runs tests in the 
> NeedsRunner category.
> See the following:
> http://stackoverflow.com/questions/3100924/how-to-run-junit-tests-by-category-in-maven



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3647

2017-07-25 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #4450

2017-07-25 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2702

2017-07-25 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Python #141

2017-07-25 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam7 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 01408c864e9d844f4ffb74cc3f18276ff6a5c447 (origin/master)
Commit message: "This closes #3334: [BEAM-2333] Go to proto and back before 
running a pipeline in Java DirectRunner"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 01408c864e9d844f4ffb74cc3f18276ff6a5c447
 > git rev-list 01408c864e9d844f4ffb74cc3f18276ff6a5c447 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins8090089138042491092.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins2422507024687697766.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6425068728222051930.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.1.1 
in /home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in 
/usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt 
(line 16))
Requirement already satisfied (use --upgrade to upgrade): 
colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3201858934638189151.sh
+ pip install --user -e 'sdks/python/[gcp,test]'
Obtaining 
file://
  Running setup.py 
(path:
 egg_info for package from 
file://

:66:
 UserWarning: You are using version 1.5.4 of pip. However, version 7.0.0 is 
recommended.
  _PIP_VERSION, REQUIRED_PIP_VERSION
no previously-included directories found matching 'doc/.build'

Installed 


warning: no files found matching 'README.md'
warning: no files fou