[beam-site] 01/02: parallelise test

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 52561dbe6137d6c1d26ebab69dbefc25986f4bde
Author: Taro Murao 
AuthorDate: Wed Feb 28 15:16:38 2018 +0100

parallelise test
---
 README.md | 2 ++
 Rakefile  | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a61dc2b..b0e00a9 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,8 @@ This repository contains:
 
 ### Setup
 
+You need Ruby version >= 2.2.0 to build the project.
+
 Install [Ruby Gems](https://rubygems.org/pages/download), a package management 
framework for Ruby.
 
 Install [Bundler](http://bundler.io/v1.3/rationale.html), which  we use to 
specify dependencies and ensure
diff --git a/Rakefile b/Rakefile
index 1c858cc..605ed5b 100644
--- a/Rakefile
+++ b/Rakefile
@@ -1,5 +1,6 @@
 require 'fileutils'
 require 'html-proofer'
+require 'etc'
 
 task :test do
   FileUtils.rm_rf('./.testcontent')
@@ -10,6 +11,7 @@ task :test do
   :connecttimeout => 40 },
 :allow_hash_href => true,
 :check_html => true,
-:file_ignore => [/javadoc/, /v2/, /pydoc/]
+:file_ignore => [/javadoc/, /v2/, /pydoc/],
+:parallel => { :in_processes => Etc.nprocessors },
 }).run
 end
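For context, the effect of the hunks above can be sketched as the following Ruby fragment. This is a reading of the patched task, not the verbatim Rakefile: the `:typhoeus` nesting is inferred from the truncated hunk, and only the last three option keys are fully visible in the diff.

```ruby
require 'etc'

# Options hash assembled by the patched Rakefile :test task.
# The new :parallel entry makes html-proofer fork one checker
# process per available CPU core (Etc.nprocessors, Ruby >= 2.2).
def proofer_options
  {
    :typhoeus => { :connecttimeout => 40 },       # assumed nesting; hunk is truncated
    :allow_hash_href => true,
    :check_html => true,
    :file_ignore => [/javadoc/, /v2/, /pydoc/],   # now ends with a comma so more keys can follow
    :parallel => { :in_processes => Etc.nprocessors },
  }
end

# In the Rakefile these options are passed to HTMLProofer, roughly:
#   HTMLProofer.check_directory('./.testcontent', proofer_options).run
```

The trailing-comma change on `:file_ignore` exists only to let the new `:parallel` key be appended, which is why the diff touches that line at all.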

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 02/02: This closes #395

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit fedc74b0b5c6d101dee9da155ebe218f06aa5e63
Merge: 789d0e2 52561db
Author: Mergebot 
AuthorDate: Tue Mar 20 22:59:23 2018 -0700

This closes #395

 README.md | 2 ++
 Rakefile  | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


Build failed in Jenkins: beam_PostRelease_NightlySnapshot #144

2018-03-20 Thread Apache Jenkins Server
See 


--
[...truncated 3.10 KB...]
Applying build_rules.gradle to core-construction-java
applyJavaNature with [artifactId:beam-runners-core-construction-java] for 
project core-construction-java
Generating :runQuickstartJavaDirect
Generating :runMobileGamingJavaDirect
Applying build_rules.gradle to google-cloud-dataflow-java
applyJavaNature with [enableFindbugs:false, 
artifactId:beam-runners-google-cloud-dataflow-java] for project 
google-cloud-dataflow-java
Applying build_rules.gradle to google-cloud-platform
applyJavaNature with [artifactId:beam-sdks-java-io-google-cloud-platform, 
enableFindbugs:false] for project google-cloud-platform
Applying build_rules.gradle to google-cloud-platform-core
applyJavaNature with 
[artifactId:beam-sdks-java-extensions-google-cloud-platform-core] for project 
google-cloud-platform-core
Generating :runQuickstartJavaDataflow
Generating :runMobileGamingJavaDataflow
Applying build_rules.gradle to apex
applyJavaNature with [artifactId:beam-runners-apex] for project apex
Generating :runQuickstartJavaApex
Applying build_rules.gradle to spark
applyJavaNature with [artifactId:beam-runners-spark] for project spark
Generating :runQuickstartJavaSpark
:release:compileJava NO-SOURCE
:release:compileGroovy
:release:processResources NO-SOURCE
:release:classes
:runners:apex:runQuickstartJavaApex
:runners:direct-java:runMobileGamingJavaDirect
:runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow
:runners:flink:runQuickstartJavaFlinkLocal
Repository URL: https://repository.apache.org/content/repositories/snapshots
Version: 2.5.0-SNAPSHOT
GCS Project: apache-beam-testing
GCS Storage bucket: 
temp-storage-for-release-validation-tests/nightly-snapshot-validation
BigQuery Dataset: beam_postrelease_mobile_gaming
PubSub Topic: java_mobile_gaming_topic
Repository URL: https://repository.apache.org/content/repositories/snapshots
Version: 2.5.0-SNAPSHOT
**
* Scenario: Run Apache Beam Java SDK Mobile Gaming Examples - Direct
**
mvn archetype:generate   -DarchetypeGroupId=org.apache.beam   
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples   
-DarchetypeVersion= 2.5.0-SNAPSHOT  -DgroupId=org.example   
-DartifactId=word-count-beam   -Dversion="0.1"   
-Dpackage=org.apache.beam.examples   -DinteractiveMode=false
**
* Scenario: Run Apache Beam Java SDK Quickstart - Apex
**

**
* Test: Gets the WordCount Example Code
**

mvn archetype:generate   -DarchetypeGroupId=org.apache.beam   
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples   
-DarchetypeVersion= 2.5.0-SNAPSHOT  -DgroupId=org.example   
-DartifactId=word-count-beam   -Dversion="0.1"   
-Dpackage=org.apache.beam.examples   -DinteractiveMode=false
Using maven /home/jenkins/tools/maven/apache-maven-3.5.2
Using maven /home/jenkins/tools/maven/apache-maven-3.5.2
Repository URL: https://repository.apache.org/content/repositories/snapshots
Version: 2.5.0-SNAPSHOT
Repository URL: https://repository.apache.org/content/repositories/snapshots
Version: 2.5.0-SNAPSHOT
GCS Project: apache-beam-testing
GCS Storage bucket: 
temp-storage-for-release-validation-tests/nightly-snapshot-validation**
* Scenario: Run Apache Beam Java SDK Quickstart - Flink Local
**

BigQuery Dataset: beam_postrelease_mobile_gaming
PubSub Topic: java_mobile_gaming_topic

**
* Test: Gets the WordCount Example Code
**

mvn archetype:generate   -DarchetypeGroupId=org.apache.beam   
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples   
-DarchetypeVersion= 2.5.0-SNAPSHOT  -DgroupId=org.example   
-DartifactId=word-count-beam   -Dversion="0.1"   
-Dpackage=org.apache.beam.examples   -DinteractiveMode=false
**
* Scenario: Run Apache Beam Java SDK Mobile Gaming Examples - Dataflow
**
Using maven /home/jenkins/tools/maven/apache-maven-3.5.2
mvn archetype:generate   -DarchetypeGroupId=org.apache.beam   
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples   
-DarchetypeVersion= 2.5.0-SNAPSHOT  -DgroupId=org.example   
-DartifactId=word-count-beam   -Dversion="0.1"   
-Dpackage=org.apache.beam.examples   -DinteractiveMode=false
Using maven /home/jenkins/tools/maven/apache-maven-3.5.2
[INFO] Scanning for projects...
[INFO] Scanning for projects...
[INFO] Scanning for projects...
[INFO] 
[INFO] 

[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=82629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82629
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 21/Mar/18 05:46
Start Date: 21/Mar/18 05:46
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-374839138
 
 
   Run Dataflow PostRelease


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82629)
Time Spent: 71h  (was: 70h 50m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 71h
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam-site] 02/02: This closes #395

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 9ca810ea44180bf8ba37687c52743d51234d0360
Merge: 789d0e2 09616ca
Author: Mergebot 
AuthorDate: Tue Mar 20 22:41:07 2018 -0700

This closes #395

 README.md | 2 ++
 Rakefile  | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 01/02: parallelise test

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 09616ca8096c0d57c2588d942486de5838be1b15
Author: Taro Murao 
AuthorDate: Wed Feb 28 15:16:38 2018 +0100

parallelise test
---
 README.md | 2 ++
 Rakefile  | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a61dc2b..b0e00a9 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,8 @@ This repository contains:
 
 ### Setup
 
+You need Ruby version >= 2.2.0 to build the project.
+
 Install [Ruby Gems](https://rubygems.org/pages/download), a package management 
framework for Ruby.
 
 Install [Bundler](http://bundler.io/v1.3/rationale.html), which  we use to 
specify dependencies and ensure
diff --git a/Rakefile b/Rakefile
index 1c858cc..605ed5b 100644
--- a/Rakefile
+++ b/Rakefile
@@ -1,5 +1,6 @@
 require 'fileutils'
 require 'html-proofer'
+require 'etc'
 
 task :test do
   FileUtils.rm_rf('./.testcontent')
@@ -10,6 +11,7 @@ task :test do
   :connecttimeout => 40 },
 :allow_hash_href => true,
 :check_html => true,
-:file_ignore => [/javadoc/, /v2/, /pydoc/]
+:file_ignore => [/javadoc/, /v2/, /pydoc/],
+:parallel => { :in_processes => Etc.nprocessors },
 }).run
 end

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] branch mergebot updated (5e2de4d -> 9ca810e)

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from 5e2de4d  This closes #397
 add 789d0e2  Prepare repository for deployment.
 new 09616ca  parallelise test
 new 9ca810e  This closes #395

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 README.md |   2 +
 Rakefile  |   4 +-
 content/documentation/dsls/sql/index.html | 462 +++---
 3 files changed, 293 insertions(+), 175 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 01/01: Prepare repository for deployment.

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 789d0e25de0cbfeec1414b201fc5821ed90b94fb
Author: Mergebot 
AuthorDate: Tue Mar 20 22:40:24 2018 -0700

Prepare repository for deployment.
---
 content/documentation/dsls/sql/index.html | 462 +++---
 1 file changed, 288 insertions(+), 174 deletions(-)

diff --git a/content/documentation/dsls/sql/index.html 
b/content/documentation/dsls/sql/index.html
index 8a002d5..cb2b147 100644
--- a/content/documentation/dsls/sql/index.html
+++ b/content/documentation/dsls/sql/index.html
@@ -132,7 +132,7 @@
   1. Overview
   2. Usage of DSL APIs
 
-  BeamRecord
+  Row
   BeamSql
 
   
@@ -152,128 +152,159 @@
[The remainder of this generated-HTML diff is unreadable extraction residue.
The recoverable changes, matching the source commit "Update SQL doc to match
new APIs", are: the duplicated inline table of contents is removed; the
overview paragraph is reworded; "BeamRecord" is renamed to "Row" throughout,
with BeamRecordType/BeamRecordSqlType replaced by RowType and
RowSqlType.builder(); and the in-memory example is rewritten to build a
RowType via RowSqlType.builder(), a Row via
Row.withRowType(appType).addValues(...).build(), and a PCollection<Row> via
Create.of(row).withCoder(...).]

[beam-site] branch asf-site updated (7e6ecc9 -> 789d0e2)

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from 7e6ecc9  Prepare repository for deployment.
 add 7bc10c7  Update SQL doc to match new APIs
 add 5e2de4d  This closes #397
 new 789d0e2  Prepare repository for deployment.

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/documentation/dsls/sql/index.html | 462 +++---
 src/documentation/dsls/sql.md | 337 +++---
 2 files changed, 514 insertions(+), 285 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=82627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82627
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 21/Mar/18 05:38
Start Date: 21/Mar/18 05:38
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-374838339
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82627)
Time Spent: 70h 50m  (was: 70h 40m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 70h 50m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam-site] branch mergebot updated (a57c91a -> 5e2de4d)

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


 discard a57c91a  This closes #395
 discard 2ceb407  parallelise test
 new 7bc10c7  Update SQL doc to match new APIs
 new 5e2de4d  This closes #397

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (a57c91a)
\
 N -- N -- N   refs/heads/mergebot (5e2de4d)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 README.md |   2 -
 Rakefile  |   4 +-
 src/documentation/dsls/sql.md | 337 --
 3 files changed, 227 insertions(+), 116 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 02/02: This closes #397

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 5e2de4dbd9f623f0e572d4ec9c525d7cfa041110
Merge: 7e6ecc9 7bc10c7
Author: Mergebot 
AuthorDate: Tue Mar 20 22:35:35 2018 -0700

This closes #397

 src/documentation/dsls/sql.md | 337 --
 1 file changed, 226 insertions(+), 111 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 01/02: Update SQL doc to match new APIs

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 7bc10c7e44469d97bd0b1df25ff851882df03c77
Author: akedin 
AuthorDate: Fri Mar 2 11:16:18 2018 -0800

Update SQL doc to match new APIs

BeamSql.querySimple() and queryMulti() were combined into query().
BeamRecord was renamed to Row. Factory methods and builders were added to 
it.
---
 src/documentation/dsls/sql.md | 337 --
 1 file changed, 226 insertions(+), 111 deletions(-)

diff --git a/src/documentation/dsls/sql.md b/src/documentation/dsls/sql.md
index 2f6fafc..a6289dc 100644
--- a/src/documentation/dsls/sql.md
+++ b/src/documentation/dsls/sql.md
@@ -7,104 +7,145 @@ permalink: /documentation/dsls/sql/
 
 # Beam SQL
 
-* TOC
-{:toc}
-
 This page describes the implementation of Beam SQL, and how to simplify a Beam 
pipeline with DSL APIs.
 
 ## 1. Overview {#overview}
 
-SQL is a well-adopted standard to process data with concise syntax. With DSL 
APIs (currently available only in Java), now `PCollection`s can be queried with 
standard SQL statements, like a regular table. The DSL APIs leverage [Apache 
Calcite](http://calcite.apache.org/) to parse and optimize SQL queries, then 
translate into a composite Beam `PTransform`. In this way, both SQL and normal 
Beam `PTransform`s can be mixed in the same pipeline.
+SQL is a well-adopted standard to process data with concise syntax. With DSL 
APIs (currently available only in Java), now `PCollections` can be queried with 
standard SQL statements, like a regular table. The DSL APIs leverage [Apache 
Calcite](http://calcite.apache.org/) to parse and optimize SQL queries, then 
translate into a composite Beam `PTransform`. In this way, both SQL and normal 
Beam `PTransforms` can be mixed in the same pipeline.
 
 There are two main pieces to the SQL DSL API:
 
-* [BeamRecord]({{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest }}/index.html?org/apache/beam/sdk/values/BeamRecord.html): 
a new data type used to define composite records (i.e., rows) that consist of 
multiple, named columns of primitive data types. All SQL DSL queries must be 
made against collections of type `PCollection<BeamRecord>`. Note that 
`BeamRecord` itself is not SQL-specific, however, and may also be used in 
pipelines that do not utilize SQL.
-* [BeamSql]({{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest 
}}/index.html?org/apache/beam/sdk/extensions/sql/BeamSql.html): the interface 
for creating `PTransforms` from SQL queries.
+* [BeamSql]({{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest 
}}/index.html?org/apache/beam/sdk/extensions/sql/BeamSql.html): the interface 
for creating `PTransforms` from SQL queries;
+* [Row]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/values/Row.html) contains named columns with 
corresponding data types. Beam SQL queries can be made only against collections 
of type `PCollection<Row>`;
 
 We'll look at each of these below.
 
 ## 2. Usage of DSL APIs {#usage}
 
-### BeamRecord
+### Row
 
-Before applying a SQL query to a `PCollection`, the data in the collection 
must be in `BeamRecord` format. A `BeamRecord` represents a single, immutable 
row in a Beam SQL `PCollection`. The names and types of the fields/columns in 
the record are defined by its associated [BeamRecordType]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/values/BeamRecordType.html); for SQL queries, 
you should use the [BeamRecordSqlType]({{ site.baseurl [...]
+Before applying a SQL query to a `PCollection`, the data in the collection 
must be in `Row` format. A `Row` represents a single, immutable record in a 
Beam SQL `PCollection`. The names and types of the fields/columns in the row 
are defined by its associated [RowType]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/values/RowType.html).
+For SQL queries, you should use the [RowSqlType.builder()]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/extensions/sql/RowSqlType.html) to create 
`RowTypes`, it allows creating schemas with all supported SQL types (see [Data 
Types](#data-types) for more details on supported primitive data types).
 
 
-A `PCollection<BeamRecord>` can be created explicitly or implicitly:
+A `PCollection<Row>` can be obtained multiple ways, for example:
 
-Explicitly:
-  * **From in-memory data** (typically for unit testing). In this case, the 
record type and coder must be specified explicitly:
-```
+  * **From in-memory data** (typically for unit testing).
+
+**Note:** you have to explicitly specify the `Row` coder. In this example 
we're doing it by calling `Create.of(..).withCoder()`:
+
+

[beam-site] 01/02: parallelise test

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 2ceb4072123e28ea7cb053dea48b37829e4c4b1b
Author: Taro Murao 
AuthorDate: Wed Feb 28 15:16:38 2018 +0100

parallelise test
---
 README.md | 2 ++
 Rakefile  | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a61dc2b..b0e00a9 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,8 @@ This repository contains:
 
 ### Setup
 
+You need Ruby version >= 2.2.0 to build the project.
+
 Install [Ruby Gems](https://rubygems.org/pages/download), a package management 
framework for Ruby.
 
 Install [Bundler](http://bundler.io/v1.3/rationale.html), which  we use to 
specify dependencies and ensure
diff --git a/Rakefile b/Rakefile
index 1c858cc..605ed5b 100644
--- a/Rakefile
+++ b/Rakefile
@@ -1,5 +1,6 @@
 require 'fileutils'
 require 'html-proofer'
+require 'etc'
 
 task :test do
   FileUtils.rm_rf('./.testcontent')
@@ -10,6 +11,7 @@ task :test do
   :connecttimeout => 40 },
 :allow_hash_href => true,
 :check_html => true,
-:file_ignore => [/javadoc/, /v2/, /pydoc/]
+:file_ignore => [/javadoc/, /v2/, /pydoc/],
+:parallel => { :in_processes => Etc.nprocessors },
 }).run
 end

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] branch mergebot updated (b706197 -> a57c91a)

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from b706197  This closes #399
 add 7e6ecc9  Prepare repository for deployment.
 new 2ceb407  parallelise test
 new a57c91a  This closes #395

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 README.md  |  2 ++
 Rakefile   |  4 ++-
 content/contribute/release-guide/index.html|  6 ++--
 .../documentation/sdks/python-custom-io/index.html |  5 ++-
 .../get-started/mobile-gaming-example/index.html   | 39 ++
 content/get-started/support/index.html |  3 +-
 6 files changed, 24 insertions(+), 35 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 02/02: This closes #395

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit a57c91a274074549ba5fc98864081a6d2717b44c
Merge: 7e6ecc9 2ceb407
Author: Mergebot 
AuthorDate: Tue Mar 20 22:32:29 2018 -0700

This closes #395

 README.md | 2 ++
 Rakefile  | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=82626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82626
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 21/Mar/18 05:31
Start Date: 21/Mar/18 05:31
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on a change in pull request #4788: 
[BEAM-3339] Mobile gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#discussion_r175987434
 
 

 ##
 File path: release/src/main/groovy/MoblieGamingJavaUtils.groovy
 ##
 @@ -0,0 +1,182 @@
+#!groovy
+import java.text.SimpleDateFormat
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+class MobileGamingJavaUtils {
+
+public static final RUNNERS = [DirectRunner: "direct-runner",
 
 Review comment:
   done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82626)
Time Spent: 70h 40m  (was: 70.5h)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 70h 40m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam-site] branch asf-site updated (98818ae -> 7e6ecc9)

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from 98818ae  Update instructions on cryptographic hashes.
 add f0d486e  Increase link checking timeouts
 add b706197  This closes #399
 new 7e6ecc9  Prepare repository for deployment.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 Rakefile   |  3 ++
 content/contribute/release-guide/index.html|  6 ++--
 .../documentation/sdks/python-custom-io/index.html |  5 ++-
 .../get-started/mobile-gaming-example/index.html   | 39 ++
 content/get-started/support/index.html |  3 +-
 5 files changed, 22 insertions(+), 34 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 01/01: Prepare repository for deployment.

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 7e6ecc92a8546747781575c925bd799bb7f5928c
Author: Mergebot 
AuthorDate: Tue Mar 20 22:30:35 2018 -0700

Prepare repository for deployment.
---
 content/contribute/release-guide/index.html|  6 ++--
 .../documentation/sdks/python-custom-io/index.html |  5 ++-
 .../get-started/mobile-gaming-example/index.html   | 39 ++
 content/get-started/support/index.html |  3 +-
 4 files changed, 19 insertions(+), 34 deletions(-)

diff --git a/content/contribute/release-guide/index.html 
b/content/contribute/release-guide/index.html
index 8ba599b..4af9749 100644
--- a/content/contribute/release-guide/index.html
+++ b/content/contribute/release-guide/index.html
@@ -633,11 +633,9 @@ RC_TAG="v${VERSION}-RC${RC_NUM}"
   
 Create hashes for source files and sign the python source file file
 
- sha1sum apache-beam-${VERSION}-source-release.zip > apache-beam-${VERSION}-source-release.zip.sha1
- md5sum apache-beam-${VERSION}-source-release.zip > apache-beam-${VERSION}-source-release.zip.md5
+ sha512sum apache-beam-${VERSION}-source-release.zip > apache-beam-${VERSION}-source-release.zip.sha512
 gpg --armor --detach-sig apache-beam-${VERSION}-python.zip
- sha1sum apache-beam-${VERSION}-python.zip > apache-beam-${VERSION}-python.zip.sha1
- md5sum apache-beam-${VERSION}-python.zip > apache-beam-${VERSION}-python.zip.md5
+ sha512sum apache-beam-${VERSION}-python.zip > apache-beam-${VERSION}-python.zip.sha512
 
 
   
diff --git a/content/documentation/sdks/python-custom-io/index.html 
b/content/documentation/sdks/python-custom-io/index.html
index 123373d..2aa1cd2 100644
--- a/content/documentation/sdks/python-custom-io/index.html
+++ b/content/documentation/sdks/python-custom-io/index.html
@@ -510,7 +510,10 @@
 table_name = 'table' + uid
 return SimpleKVWriter(self._simplekv, access_token, table_name)
 
-  def finalize_write(self, access_token, table_names):
+  def pre_finalize(self, init_result, writer_results):
+pass
+
+  def finalize_write(self, access_token, table_names, pre_finalize_result):
 for i, table_name in enumerate(table_names):
   self._simplekv.rename_table(
   access_token, table_name, self._final_table_name + str(i))
diff --git a/content/get-started/mobile-gaming-example/index.html 
b/content/get-started/mobile-gaming-example/index.html
index 8a63425..cc1ac57 100644
--- a/content/get-started/mobile-gaming-example/index.html
+++ b/content/get-started/mobile-gaming-example/index.html
@@ -626,8 +626,7 @@ logical windows based on when those scores occurred in event time.
   options = PipelineOptions(pipeline_args)
 
   # We also require the --project option to access --dataset
-  project = options.view_as(GoogleCloudOptions).project
-  if project is None:
+  if options.view_as(GoogleCloudOptions).project is None:
     parser.print_usage()
     print(sys.argv[0] + ': error: argument --project is required')
     sys.exit(1)
@@ -636,23 +635,18 @@ logical windows based on when those scores occurred in event time.
   # workflow rely on global context (e.g., a module imported at module level).
   options.view_as(SetupOptions).save_main_session = True
 
-  table_spec = '{}:{}.{}'.format(project, args.dataset, args.table_name)
-  table_schema = (
-      'team:STRING, '
-      'total_score:INTEGER, '
-      'window_start:STRING')
-
   with beam.Pipeline(options=options) as p:
     (p  # pylint: disable=expression-not-assigned
      | 'ReadInputText' >> beam.io.ReadFromText(args.input)
      | 'HourlyTeamScore' >> HourlyTeamScore(
          args.start_min, args.stop_min, args.window_duration)
      | 'TeamScoresDict' >> beam.ParDo(TeamScoresDict())
-     | 'WriteTeamScoreSums' >> beam.io.WriteToBigQuery(
-         table_spec,
-         schema=table_schema,
-         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
-         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
+     | 'WriteTeamScoreSums' >> WriteToBigQuery(
+         args.table_name, args.dataset, {
+             'team': 'STRING',
+             'total_score': 'INTEGER',
+             'window_start': 'STRING',
+         }))
 
 
 
@@ -997,13 +991,6 @@ late results.
     # updates for late data. Uses the side input derived above --the set of
     # suspected robots-- to filter out scores from those users from the sum.
     # Write the results to BigQuery.
-    team_table_spec = table_spec_prefix + '_teams'
-    team_table_schema = (
-        'team:STRING, '
-        'total_score:INTEGER, '
-        'window_start:STRING, '
-        'processing_time: STRING')
-
     (raw_events  # pylint: disable=expression-not-assigned
      | 'WindowIntoFixedWindows' >> beam.WindowInto(
          beam.window.FixedWindows(fixed_window_duration))
@@ -1054,9 +1041,6 @@ between instances are.
     # from further 

[beam-site] 01/02: Increase link checking timeouts

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit f0d486e1b90301ac0619d099e39d0bc44e7df466
Author: melissa 
AuthorDate: Wed Mar 7 15:36:20 2018 -0800

Increase link checking timeouts
---
 Rakefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Rakefile b/Rakefile
index b92022d..1c858cc 100644
--- a/Rakefile
+++ b/Rakefile
@@ -5,6 +5,9 @@ task :test do
   FileUtils.rm_rf('./.testcontent')
   sh "bundle exec jekyll build --config _config.yml,_config_test.yml"
   HTMLProofer.check_directory("./.testcontent", {
+:typhoeus => {
+  :timeout => 60,
+  :connecttimeout => 40 },
 :allow_hash_href => true,
 :check_html => true,
 :file_ignore => [/javadoc/, /v2/, /pydoc/]
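
The hunk above passes per-request timeout options through to Typhoeus, the HTTP client HTMLProofer uses for external-link checks. As a minimal plain-Ruby sketch of how such an options hash is composed (the `HTMLProofer.check_directory` call itself is shown in the diff; the parallel-workers entry mirrors the follow-up change that uses `Etc.nprocessors`):

```ruby
require 'etc'

# Options already used by the Rakefile's :test task.
base_options = {
  :allow_hash_href => true,
  :check_html => true,
  :file_ignore => [/javadoc/, /v2/, /pydoc/],
}

# Extra knobs: generous network timeouts for slow links, plus one
# link-checker process per available core.
extra_options = {
  :typhoeus => { :timeout => 60, :connecttimeout => 40 },
  :parallel => { :in_processes => Etc.nprocessors },
}

options = base_options.merge(extra_options)
puts options[:typhoeus][:timeout]  # prints 60
```

Merging keeps the two concerns separate, so the timeout and parallelism settings can be tuned without touching the link-ignore rules.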



[beam-site] 02/02: This closes #399

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit b706197891d5dd003568325d34e7c9259aa5fd9b
Merge: 98818ae f0d486e
Author: Mergebot 
AuthorDate: Tue Mar 20 22:25:34 2018 -0700

This closes #399

 Rakefile | 3 +++
 1 file changed, 3 insertions(+)



[beam-site] branch mergebot updated (db02a1a -> b706197)

2018-03-20 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


 discard db02a1a  This closes #399
 discard 99ed544  Increase link checking timeouts
 new f0d486e  Increase link checking timeouts
 new b706197  This closes #399

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (db02a1a)
\
 N -- N -- N   refs/heads/mergebot (b706197)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:



Build failed in Jenkins: beam_PostCommit_Python_ValidatesContainer_Dataflow #75

2018-03-20 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-3817] Switch BQ write to not use side input

[herohde] Add TODO to revert Go IO to use side input

[axelmagn] Fix StateRequestHandler interface to be idiomatic

[herohde] Add Go support for universal runners, incl Flink

[herohde] CR: Fixed comments for job service helper functions

[iemejia] Add missing ASF license to ExecutableStageTranslation file

[yifanzou] [BEAM-3840] Get python mobile-gaming automating on core runners

[sidhom] [BEAM-3565] Clean up ExecutableStage

[wcn] Fix incorrect read of atomic counter.

[herohde] [BEAM-3893] Add fallback to unauthenticated access for GCS IO

[robertwb] [BEAM-3865] Fix watermark hold handling bug.

[robertwb] [BEAM-2927] Python support for dataflow portable side inputs over Fn 
API

[herohde] CR: fix typo

[aaltay] [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end

--
[...truncated 152.72 KB...]
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "pair_with_one.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s2"
}, 
"serialized_fn": "ref_AppliedPTransform_pair_with_one_5", 
"user_name": "pair_with_one"
  }
}, 
{
  "kind": "GroupByKey", 
  "name": "s4", 
  "properties": {
"display_data": [], 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:pair", 
  "component_encodings": [
{
  "@type": 
"StrUtf8Coder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlzBJUWhJWkWziAeVyGDZmMhY20hU5IeAAajEkY=",
 
  "component_encodings": []
}, 
{
  "@type": "kind:stream", 
  "component_encodings": [
{
  "@type": 
"VarIntCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxhiUWeeSXOIA5XIYNmYyFjbSFTkh4A89cR+g==",
 
  "component_encodings": []
}
  ], 
  "is_stream_like": true
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "group_and_sum/GroupByKey.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s3"
}, 
"serialized_fn": 
"%0AD%22B%0A%1Dref_Coder_GlobalWindowCoder_1%12%21%0A%1F%0A%1D%0A%1Bbeam%3Acoder%3Aglobal_window%3Av1jT%0A%25%0A%23%0A%21beam%3Awindowfn%3Aglobal_windows%3Av0.1%10%01%1A%1Dref_Coder_GlobalWindowCoder_1%22%02%3A%00%28%010%018%01H%01",
 
"user_name": "group_and_sum/GroupByKey"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s5", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CombineValuesDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CombineValuesDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  

[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=82624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82624
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 21/Mar/18 05:12
Start Date: 21/Mar/18 05:12
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on a change in pull request #4788: 
[BEAM-3339] Mobile gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#discussion_r175986054
 
 

 ##
 File path: release/src/main/groovy/MoblieGamingJavaUtils.groovy
 ##
 @@ -0,0 +1,182 @@
+#!groovy
+import java.text.SimpleDateFormat
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+class MobileGamingJavaUtils {
+
+public static final RUNNERS = [DirectRunner: "direct-runner",
+   DataflowRunner: "dataflow-runner",
+   SparkRunner: "spark-runner",
+   ApexRunner: "apex-runner",
+   FlinkRunner: "flink-runner"]
+
+public static final EXECUTION_TIMEOUT = 15
+
+// Lists used to verify team names generated in the LeaderBoard example
+public static final COLORS = new ArrayList<>(Arrays.asList(
+"Magenta",
+"AliceBlue",
+"Almond",
+"Amaranth",
+"Amber",
+"Amethyst",
+"AndroidGreen",
+"AntiqueBrass",
+"Fuchsia",
+"Ruby",
+"AppleGreen",
+"Apricot",
+"Aqua",
+"ArmyGreen",
+"Asparagus",
+"Auburn",
+"Azure",
+"Banana",
+"Beige",
+"Bisque",
+"BarnRed",
+"BattleshipGrey"))
+
+private static final USERSCORE_OUTPUT_PREFIX = "java-userscore-result-"
+
+private static final HOURLYTEAMSCORE_OUTPUT_PREFIX = "java-hourlyteamscore-result-"
+
+public static String generateCommand(String exampleName, String runner, TestScripts t){
+String commonArgs = "--tempLocation=gs://${t.gcsBucket()}/tmp --runner=${runner} "
+
+if(exampleName.equals("UserScore")){
+return generateUserScoreCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("HourlyTeamScore")){
+return generateHourlyTeamScoreCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("LeaderBoard")){
+return generateLeaderBoardCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("GameStats")){
+return generateGameStatsCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("Injector")){
+return generateInjectorCommand(t)
+}
+return "ERROR: Example ${exampleName} not found."
+}
+
+public static String getUserScoreOutputName(String runner){
+return "${USERSCORE_OUTPUT_PREFIX}${RUNNERS[runner]}.txt"
+}
+
+public static String getHourlyTeamScoreOutputName(String runner){
+return "${HOURLYTEAMSCORE_OUTPUT_PREFIX}${RUNNERS[runner]}.txt"
+}
+
+
+private static String generateUserScoreCommand(String runner, TestScripts t, String commonArgs){
+StringBuilder cmd = new StringBuilder()
+StringBuilder exeArgs = new StringBuilder(commonArgs)
+
+exeArgs.append("--input=gs://${t.gcsBucket()}/5000_gaming_data.csv ")
+if(runner == "DataflowRunner"){
+exeArgs.append("--project=${t.gcpProject()} ")
+.append("--output=gs://${t.gcsBucket()}/${getUserScoreOutputName(runner)} ")
+}
+else{
+exeArgs.append("--output=${getUserScoreOutputName(runner)} ")
+}
+
+cmd.append("mvn compile exec:java -q ")
+.append("-Dexec.mainClass=org.apache.beam.examples.complete.game.UserScore ")
+.append("-Dexec.args=\"${exeArgs.toString()}\" ")
 
 Review comment:
   You're right. toString() removed.



[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=82623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82623
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 21/Mar/18 05:07
Start Date: 21/Mar/18 05:07
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on a change in pull request #4788: 
[BEAM-3339] Mobile gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#discussion_r175985686
 
 

 ##
 File path: release/src/main/groovy/MoblieGamingJavaUtils.groovy
 ##
 @@ -0,0 +1,182 @@
+#!groovy
+import java.text.SimpleDateFormat
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+class MobileGamingJavaUtils {
+
+public static final RUNNERS = [DirectRunner: "direct-runner",
+   DataflowRunner: "dataflow-runner",
+   SparkRunner: "spark-runner",
+   ApexRunner: "apex-runner",
+   FlinkRunner: "flink-runner"]
+
+public static final EXECUTION_TIMEOUT = 15
+
+// Lists used to verify team names generated in the LeaderBoard example
+public static final COLORS = new ArrayList<>(Arrays.asList(
+"Magenta",
+"AliceBlue",
+"Almond",
+"Amaranth",
+"Amber",
+"Amethyst",
+"AndroidGreen",
+"AntiqueBrass",
+"Fuchsia",
+"Ruby",
+"AppleGreen",
+"Apricot",
+"Aqua",
+"ArmyGreen",
+"Asparagus",
+"Auburn",
+"Azure",
+"Banana",
+"Beige",
+"Bisque",
+"BarnRed",
+"BattleshipGrey"))
+
+private static final USERSCORE_OUTPUT_PREFIX = "java-userscore-result-"
+
+private static final HOURLYTEAMSCORE_OUTPUT_PREFIX = "java-hourlyteamscore-result-"
+
+public static String generateCommand(String exampleName, String runner, TestScripts t){
+String commonArgs = "--tempLocation=gs://${t.gcsBucket()}/tmp --runner=${runner} "
+
+if(exampleName.equals("UserScore")){
+return generateUserScoreCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("HourlyTeamScore")){
+return generateHourlyTeamScoreCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("LeaderBoard")){
+return generateLeaderBoardCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("GameStats")){
+return generateGameStatsCommand(runner, t, commonArgs)
+}
+if(exampleName.equals("Injector")){
+return generateInjectorCommand(t)
+}
+return "ERROR: Example ${exampleName} not found."
+}
+
+public static String getUserScoreOutputName(String runner){
+return "${USERSCORE_OUTPUT_PREFIX}${RUNNERS[runner]}.txt"
+}
+
+public static String getHourlyTeamScoreOutputName(String runner){
+return "${HOURLYTEAMSCORE_OUTPUT_PREFIX}${RUNNERS[runner]}.txt"
+}
+
+
+private static String generateUserScoreCommand(String runner, TestScripts t, String commonArgs){
+StringBuilder cmd = new StringBuilder()
+StringBuilder exeArgs = new StringBuilder(commonArgs)
+
+exeArgs.append("--input=gs://${t.gcsBucket()}/5000_gaming_data.csv ")
+if(runner == "DataflowRunner"){
+exeArgs.append("--project=${t.gcpProject()} ")
+.append("--output=gs://${t.gcsBucket()}/${getUserScoreOutputName(runner)} ")
+}
 
 Review comment:
   done




Issue Time Tracking
---

Worklog Id: (was: 82623)
Time Spent: 

Build failed in Jenkins: beam_PostCommit_Python_ValidatesRunner_Dataflow #1153

2018-03-20 Thread Apache Jenkins Server
See 


Changes:

[herohde] Add Go support for universal runners, incl Flink

[herohde] CR: Fixed comments for job service helper functions

[herohde] [BEAM-3893] Add fallback to unauthenticated access for GCS IO

[robertwb] [BEAM-2927] Python support for dataflow portable side inputs over Fn 
API

[herohde] CR: fix typo

[aaltay] [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end

--
[...truncated 774.29 KB...]
"serialized_fn": "", 
"user_name": "assert_that/Match"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s16", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": ""
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:pair", 
  "component_encodings": [
{
  "@type": "kind:bytes"
}, 
{
  "@type": 
"VarIntCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxhiUWeeSXOIA5XIYNmYyFjbSFTkh4A89cR+g==",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "compute/MapToVoidKey0.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s2"
}, 
"serialized_fn": "", 
"user_name": "compute/MapToVoidKey0"
  }
}
  ], 
  "type": "JOB_TYPE_BATCH"
}
root: INFO: Create job: 
root: INFO: Created job with id: [2018-03-20_21_20_28-237846840698724677]
root: INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-20_21_20_28-237846840698724677?project=apache-beam-testing
root: INFO: Job 2018-03-20_21_20_28-237846840698724677 is in state 
JOB_STATE_PENDING
root: INFO: 2018-03-21T04:20:29.040Z: JOB_MESSAGE_WARNING: Job 
2018-03-20_21_20_28-237846840698724677 might autoscale up to 250 workers.
root: INFO: 2018-03-21T04:20:29.049Z: JOB_MESSAGE_DETAILED: Autoscaling is 
enabled for job 2018-03-20_21_20_28-237846840698724677. The number of workers 
will be between 1 and 250.
root: INFO: 2018-03-21T04:20:29.066Z: JOB_MESSAGE_DETAILED: Autoscaling was 
automatically enabled for job 2018-03-20_21_20_28-237846840698724677.
root: INFO: 2018-03-21T04:20:31.699Z: JOB_MESSAGE_DETAILED: Checking required 
Cloud APIs are enabled.
root: INFO: 2018-03-21T04:20:31.863Z: JOB_MESSAGE_DETAILED: Checking 
permissions granted to controller Service Account.
root: INFO: 2018-03-21T04:20:32.570Z: JOB_MESSAGE_DETAILED: Expanding 
CoGroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T04:20:32.602Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step assert_that/Group/GroupByKey: GroupByKey not followed by a 
combiner.
root: INFO: 2018-03-21T04:20:32.623Z: JOB_MESSAGE_DETAILED: Expanding 
GroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T04:20:32.646Z: JOB_MESSAGE_DETAILED: Lifting 
ValueCombiningMappingFns into MergeBucketsMappingFns
root: INFO: 2018-03-21T04:20:32.679Z: JOB_MESSAGE_DEBUG: Annotating graph with 
Autotuner information.
root: INFO: 2018-03-21T04:20:32.711Z: JOB_MESSAGE_DETAILED: Fusing adjacent 
ParDo, Read, Write, and Flatten operations
root: INFO: 2018-03-21T04:20:32.736Z: JOB_MESSAGE_DETAILED: Unzipping flatten 
s11 for input s10.out
root: INFO: 2018-03-21T04:20:32.759Z: JOB_MESSAGE_DETAILED: Fusing unzipped 
copy of assert_that/Group/GroupByKey/Reify, through flatten 
assert_that/Group/Flatten, into producer assert_that/Group/pair_with_1
root: INFO: 2018-03-21T04:20:32.774Z: JOB_MESSAGE_DETAILED: Fusing consumer 
assert_that/Group/GroupByKey/GroupByWindow into 
assert_that/Group/GroupByKey/Read
root: INFO: 2018-03-21T04:20:32.797Z: JOB_MESSAGE_DETAILED: Fusing consumer 
assert_that/Unkey into assert_that/Group/Map(_merge_tagged_vals_under_key)
root: 

Build failed in Jenkins: beam_PostCommit_Python_Verify #4472

2018-03-20 Thread Apache Jenkins Server
See 


Changes:

[herohde] Add Go support for universal runners, incl Flink

[herohde] CR: Fixed comments for job service helper functions

[herohde] [BEAM-3893] Add fallback to unauthenticated access for GCS IO

[robertwb] [BEAM-2927] Python support for dataflow portable side inputs over Fn 
API

[herohde] CR: fix typo

[aaltay] [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end

--
[...truncated 1.12 MB...]
 steps: []
 tempFiles: []
 type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)>
root: INFO: Created job with id: [2018-03-20_20_50_21-2321371576352290335]
root: INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-03-20_20_50_21-2321371576352290335?project=apache-beam-testing
root: INFO: Job 2018-03-20_20_50_21-2321371576352290335 is in state 
JOB_STATE_PENDING
root: INFO: 2018-03-21T03:50:21.226Z: JOB_MESSAGE_WARNING: Job 
2018-03-20_20_50_21-2321371576352290335 might autoscale up to 250 workers.
root: INFO: 2018-03-21T03:50:21.255Z: JOB_MESSAGE_DETAILED: Autoscaling is 
enabled for job 2018-03-20_20_50_21-2321371576352290335. The number of workers 
will be between 1 and 250.
root: INFO: 2018-03-21T03:50:21.282Z: JOB_MESSAGE_DETAILED: Autoscaling was 
automatically enabled for job 2018-03-20_20_50_21-2321371576352290335.
root: INFO: 2018-03-21T03:50:25.136Z: JOB_MESSAGE_DETAILED: Checking required 
Cloud APIs are enabled.
root: INFO: 2018-03-21T03:50:25.433Z: JOB_MESSAGE_DETAILED: Checking 
permissions granted to controller Service Account.
root: INFO: 2018-03-21T03:50:26.351Z: JOB_MESSAGE_DETAILED: Expanding 
CoGroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T03:50:26.369Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step write/Write/WriteImpl/GroupByKey: GroupByKey not followed by a 
combiner.
root: INFO: 2018-03-21T03:50:26.400Z: JOB_MESSAGE_DEBUG: Combiner lifting 
skipped for step group: GroupByKey not followed by a combiner.
root: INFO: 2018-03-21T03:50:26.429Z: JOB_MESSAGE_DETAILED: Expanding 
GroupByKey operations into optimizable parts.
root: INFO: 2018-03-21T03:50:26.452Z: JOB_MESSAGE_DETAILED: Lifting 
ValueCombiningMappingFns into MergeBucketsMappingFns
root: INFO: 2018-03-21T03:50:26.477Z: JOB_MESSAGE_DEBUG: Annotating graph with 
Autotuner information.
root: INFO: 2018-03-21T03:50:26.518Z: JOB_MESSAGE_DETAILED: Fusing adjacent 
ParDo, Read, Write, and Flatten operations
root: INFO: 2018-03-21T03:50:26.545Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/PreFinalize/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T03:50:26.566Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T03:50:26.588Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/PreFinalize/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T03:50:26.615Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1 into 
write/Write/WriteImpl/Extract
root: INFO: 2018-03-21T03:50:26.641Z: JOB_MESSAGE_DETAILED: Fusing consumer 
pair_with_one into split
root: INFO: 2018-03-21T03:50:26.664Z: JOB_MESSAGE_DETAILED: Fusing consumer 
group/Reify into pair_with_one
root: INFO: 2018-03-21T03:50:26.691Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/GroupByKey/Reify into 
write/Write/WriteImpl/WindowInto(WindowIntoFn)
root: INFO: 2018-03-21T03:50:26.722Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/WindowInto(WindowIntoFn) into write/Write/WriteImpl/Pair
root: INFO: 2018-03-21T03:50:26.749Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/WriteBundles/WriteBundles into format
root: INFO: 2018-03-21T03:50:26.781Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/Pair into write/Write/WriteImpl/WriteBundles/WriteBundles
root: INFO: 2018-03-21T03:50:26.811Z: JOB_MESSAGE_DETAILED: Fusing consumer 
split into read/Read
root: INFO: 2018-03-21T03:50:26.838Z: JOB_MESSAGE_DETAILED: Fusing consumer 
count into group/GroupByWindow
root: INFO: 2018-03-21T03:50:26.863Z: JOB_MESSAGE_DETAILED: Fusing consumer 
format into count
root: INFO: 2018-03-21T03:50:26.890Z: JOB_MESSAGE_DETAILED: Fusing consumer 
group/Write into group/Reify
root: INFO: 2018-03-21T03:50:26.918Z: JOB_MESSAGE_DETAILED: Fusing consumer 
group/GroupByWindow into group/Read
root: INFO: 2018-03-21T03:50:26.940Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/Extract into 
write/Write/WriteImpl/GroupByKey/GroupByWindow
root: INFO: 2018-03-21T03:50:26.965Z: JOB_MESSAGE_DETAILED: Fusing consumer 
write/Write/WriteImpl/GroupByKey/Write into 
write/Write/WriteImpl/GroupByKey/Reify
root: INFO: 2018-03-21T03:50:26.996Z: JOB_MESSAGE_DETAILED: Fusing consumer 

[jira] [Commented] (BEAM-3032) Add RedshiftIO

2018-03-20 Thread Jacob Marble (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407415#comment-16407415
 ] 

Jacob Marble commented on BEAM-3032:


[~jbonofre] thanks for your offer, I would love your input on this one!

The current branch is working, here's an example of usage:
[https://github.com/jacobmarble/beam-redshift-example]

The unit tests aren't even close; can we discuss the interface first? Some 
notes:
 * DataSourceConfiguration exists because Redshift's DataSource is not 
Serializable
 * Redshift.Write wraps Copy; Copy can also be used directly
 * Redshift.Read wraps Unload; Unload can also be used directly
 * COPY (SQL command) can ingest from S3 in any region; UNLOAD only writes to 
S3 in the same region; this implementation does not support the extra feature 
in COPY
 * RedshiftMarshaller is just a way to convert query results into objects, 
happy to improve this
 * StringToListAccumulator makes FileSystems.delete() more efficient by helping 
the FileSystem implementation to batch delete operations
 * Redshift.longestCommonPrefix() is probably over-the-top, but would like a 
second opinion before I refactor
 * TextIO.Read is instantiated with TextIO.read(); is this pattern preferred 
here?
 * Same question for .Write
 * Copy and Unload lack several available features, but this implementation 
probably satisfies 90%+ of potential use

[https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html]
[https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html]
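
One of the notes above — DataSourceConfiguration exists because Redshift's DataSource is not Serializable — follows a common Beam IO pattern: hold only plain serializable connection fields, and construct the non-serializable driver object lazily on the worker. A minimal sketch of that pattern (class and method names here are illustrative, not the actual RedshiftIO API under review):

```java
import java.io.Serializable;

// Illustrative stand-in for a RedshiftIO-style DataSourceConfiguration:
// the class itself is Serializable so it can travel inside a DoFn,
// while the real DataSource would be built from these fields at runtime.
class DataSourceConfig implements Serializable {
    private final String endpoint;   // e.g. "host:5439"
    private final String database;

    private DataSourceConfig(String endpoint, String database) {
        this.endpoint = endpoint;
        this.database = database;
    }

    static DataSourceConfig create(String endpoint, String database) {
        return new DataSourceConfig(endpoint, database);
    }

    // Called on the worker; a real implementation would hand this URL
    // (plus credentials) to the Redshift JDBC driver.
    String jdbcUrl() {
        return "jdbc:redshift://" + endpoint + "/" + database;
    }
}
```

The same idea appears in Beam's JdbcIO, which is why reviewers tend to ask for a value-type configuration class rather than a captured DataSource.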

> Add RedshiftIO
> --
>
> Key: BEAM-3032
> URL: https://issues.apache.org/jira/browse/BEAM-3032
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
> Environment: AWS Redshift
>Reporter: Jacob Marble
>Assignee: Jacob Marble
>Priority: Minor
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> I would like to add a RedshiftIO Java extension to perform bulk read/write 
> to/from AWS Redshift via the UNLOAD and COPY Redshift SQL commands. This 
> requires S3, which is the subject of BEAM-2500.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Spark #4461

2018-03-20 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Spark #1491

2018-03-20 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-3817] Switch BQ write to not use side input

[herohde] Add TODO to revert Go IO to use side input

[herohde] Add Go support for universal runners, incl Flink

[herohde] CR: Fixed comments for job service helper functions

[sidhom] [BEAM-3565] Clean up ExecutableStage

[wcn] Fix incorrect read of atomic counter.

[herohde] [BEAM-3893] Add fallback to unauthenticated access for GCS IO

[robertwb] [BEAM-3865] Fix watermark hold handling bug.

[robertwb] [BEAM-2927] Python support for dataflow portable side inputs over Fn 
API

[herohde] CR: fix typo

[aaltay] [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end

--
[...truncated 70.26 KB...]
2018-03-21 02:57:22,159 dc4143cc MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-21 02:57:46,585 dc4143cc MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-21 02:57:49,846 dc4143cc MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r2d692159f78a04ad_0162467db80b_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_r2d692159f78a04ad_0162467db80b_1 
... (0s) Current status: RUNNING
  Waiting on 
bqjob_r2d692159f78a04ad_0162467db80b_1 ... (0s) Current status: DONE   
2018-03-21 02:57:49,846 dc4143cc MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-21 02:58:12,769 dc4143cc MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-21 02:58:16,152 dc4143cc MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_rf9ae3f42024a524_0162467e1e63_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_rf9ae3f42024a524_0162467e1e63_1 
... (0s) Current status: RUNNING
 Waiting on 
bqjob_rf9ae3f42024a524_0162467e1e63_1 ... (0s) Current status: DONE   
2018-03-21 02:58:16,152 dc4143cc MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-21 02:58:45,805 dc4143cc MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-21 02:58:49,040 dc4143cc MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r15d8253c14c88ab6_0162467e9f89_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_r15d8253c14c88ab6_0162467e9f89_1 
... (0s) Current status: RUNNING
  Waiting on 
bqjob_r15d8253c14c88ab6_0162467e9f89_1 ... (0s) Current status: DONE   
2018-03-21 02:58:49,040 dc4143cc MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-21 02:59:04,070 dc4143cc MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-21 02:59:07,505 dc4143cc MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  

Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #6254

2018-03-20 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PerformanceTests_TextIOIT #292

2018-03-20 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Python #1047

2018-03-20 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-3817] Switch BQ write to not use side input

[herohde] Add TODO to revert Go IO to use side input

[herohde] Add Go support for universal runners, incl Flink

[herohde] CR: Fixed comments for job service helper functions

[sidhom] [BEAM-3565] Clean up ExecutableStage

[wcn] Fix incorrect read of atomic counter.

[herohde] [BEAM-3893] Add fallback to unauthenticated access for GCS IO

[robertwb] [BEAM-3865] Fix watermark hold handling bug.

[robertwb] [BEAM-2927] Python support for dataflow portable side inputs over Fn 
API

[herohde] CR: fix typo

[aaltay] [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end

--
[...truncated 2.88 KB...]
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins225377120102786920.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3576933539590339664.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Collecting absl-py (from -r PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Collecting colorlog[windows]==2.6.0 (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
  Using cached colorlog-2.6.0-py2.py3-none-any.whl
Collecting blinker>=1.3 (from -r PerfKitBenchmarker/requirements.txt (line 18))
Collecting futures>=3.0.3 (from -r PerfKitBenchmarker/requirements.txt (line 
19))
  Using cached futures-3.2.0-py2-none-any.whl
Requirement already satisfied: PyYAML==3.12 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Collecting pint>=0.7 (from -r PerfKitBenchmarker/requirements.txt (line 21))
Collecting numpy==1.13.3 (from -r PerfKitBenchmarker/requirements.txt (line 22))
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: functools32 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Collecting contextlib2>=0.5.1 (from -r PerfKitBenchmarker/requirements.txt 
(line 24))
  Using cached contextlib2-0.5.5-py2.py3-none-any.whl
Collecting pywinrm (from -r PerfKitBenchmarker/requirements.txt (line 25))
  Using cached pywinrm-0.3.0-py2.py3-none-any.whl
Requirement already satisfied: six in /usr/local/lib/python2.7/dist-packages 
(from absl-py->-r PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe>=0.23 in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: colorama; extra == "windows" in 
/usr/lib/python2.7/dist-packages (from colorlog[windows]==2.6.0->-r 
PerfKitBenchmarker/requirements.txt (line 17))
Collecting xmltodict (from pywinrm->-r PerfKitBenchmarker/requirements.txt 
(line 25))
  Using cached xmltodict-0.11.0-py2.py3-none-any.whl
Collecting requests-ntlm>=0.3.0 (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
  Using cached requests_ntlm-1.1.0-py2.py3-none-any.whl
Requirement already satisfied: requests>=2.9.1 in 
/usr/local/lib/python2.7/dist-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Collecting ntlm-auth>=1.0.2 (from requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
  Using cached ntlm_auth-1.1.0-py2.py3-none-any.whl
Collecting cryptography>=1.3 (from requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
  Using cached cryptography-2.2.1-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: urllib3<1.23,>=1.21.1 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: idna<2.7,>=2.5 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: certifi>=2017.4.17 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from 
cryptography>=1.3->requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
  Using cached cffi-1.11.5-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: enum34; python_version < "3" in 
/usr/local/lib/python2.7/dist-packages (from 

Build failed in Jenkins: beam_PerformanceTests_JDBC #354

2018-03-20 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-3817] Switch BQ write to not use side input

[herohde] Add TODO to revert Go IO to use side input

[herohde] Add Go support for universal runners, incl Flink

[herohde] CR: Fixed comments for job service helper functions

[sidhom] [BEAM-3565] Clean up ExecutableStage

[wcn] Fix incorrect read of atomic counter.

[herohde] [BEAM-3893] Add fallback to unauthenticated access for GCS IO

[robertwb] [BEAM-3865] Fix watermark hold handling bug.

[robertwb] [BEAM-2927] Python support for dataflow portable side inputs over Fn 
API

[herohde] CR: fix typo

[aaltay] [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end

--
[...truncated 45.84 KB...]
[INFO] Excluding com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 from the 
shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-protobuf:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-auth:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-netty:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http2:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-buffer:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-common:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-transport:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-resolver:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-all:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-okhttp:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.squareup.okhttp:okhttp:jar:2.5.0 from the shaded jar.
[INFO] Excluding com.squareup.okio:okio:jar:1.6.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-lite:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-nano:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 
from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from 
the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the 
shaded jar.
[INFO] Excluding io.opencensus:opencensus-api:jar:0.7.0 from the shaded jar.
[INFO] Excluding io.dropwizard.metrics:metrics-core:jar:3.1.2 from the shaded 
jar.
[INFO] Excluding com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded 
jar.
[INFO] Excluding io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client:jar:1.22.0 from the 
shaded jar.
[INFO] Excluding 

Jenkins build is back to normal : beam_PerformanceTests_Compressed_TextIOIT #276

2018-03-20 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PerformanceTests_AvroIOIT #278

2018-03-20 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3861) Build test infra for end-to-end streaming test in Python SDK

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3861?focusedWorklogId=82606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82606
 ]

ASF GitHub Bot logged work on BEAM-3861:


Author: ASF GitHub Bot
Created on: 21/Mar/18 01:58
Start Date: 21/Mar/18 01:58
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #4874: [BEAM-3861] Improve 
test infra in Python SDK for streaming end-to-end test
URL: https://github.com/apache/beam/pull/4874
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/examples/streaming_wordcount.py 
b/sdks/python/apache_beam/examples/streaming_wordcount.py
index 12f73510873..7ef95d85f1a 100644
--- a/sdks/python/apache_beam/examples/streaming_wordcount.py
+++ b/sdks/python/apache_beam/examples/streaming_wordcount.py
@@ -36,7 +36,7 @@
 
 def split_fn(lines):
   import re
-  return re.findall(r'[A-Za-z\']+', lines)
+  return re.findall(r'[A-Za-z0-9\']+', lines)
 
 
 def run(argv=None):
diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py 
b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
new file mode 100644
index 000..a95e5fa8f53
--- /dev/null
+++ b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
@@ -0,0 +1,102 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""End-to-end test for the streaming wordcount example.
+
+Important: End-to-end test infrastructure for streaming pipeline in Python SDK
+is in development and is not yet available for use.
+
+Currently, this test blocks until the job is manually terminated.
+"""
+
+import logging
+import unittest
+
+from hamcrest.core.core.allof import all_of
+from nose.plugins.attrib import attr
+
+from apache_beam.examples import streaming_wordcount
+from apache_beam.runners.runner import PipelineState
+from apache_beam.testing import test_utils
+from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher
+from apache_beam.testing.test_pipeline import TestPipeline
+
+INPUT_TOPIC = 'wc_topic_input'
+OUTPUT_TOPIC = 'wc_topic_output'
+INPUT_SUB = 'wc_subscription_input'
+OUTPUT_SUB = 'wc_subscription_output'
+
+DEFAULT_INPUT_NUMBERS = 500
+
+
+class StreamingWordCountIT(unittest.TestCase):
+
+  def setUp(self):
+    self.test_pipeline = TestPipeline(is_integration_test=True)
+
+    # Set up PubSub environment.
+    from google.cloud import pubsub
+    self.pubsub_client = pubsub.Client(
+        project=self.test_pipeline.get_option('project'))
+    self.input_topic = self.pubsub_client.topic(INPUT_TOPIC)
+    self.output_topic = self.pubsub_client.topic(OUTPUT_TOPIC)
+    self.input_sub = self.input_topic.subscription(INPUT_SUB)
+    self.output_sub = self.output_topic.subscription(OUTPUT_SUB)
+
+    self._cleanup_pubsub()
+
+    self.input_topic.create()
+    self.output_topic.create()
+    test_utils.wait_for_topics_created([self.input_topic, self.output_topic])
+    self.input_sub.create()
+    self.output_sub.create()
+
+  def _inject_numbers(self, topic, num_messages):
+    """Inject numbers as test data to PubSub."""
+    logging.debug('Injecting %d numbers to topic %s',
+                  num_messages, topic.full_name)
+    for n in range(num_messages):
+      topic.publish(str(n))
+
+  def _cleanup_pubsub(self):
+    test_utils.cleanup_subscriptions([self.input_sub, self.output_sub])
+    test_utils.cleanup_topics([self.input_topic, self.output_topic])
+
+  def tearDown(self):
+    self._cleanup_pubsub()
+
+  @attr('developing_test')
+  def test_streaming_wordcount_it(self):
+    # Set extra options to the pipeline for test purpose
+    pipeline_verifiers = [PipelineStateMatcher(PipelineState.RUNNING)]
+    extra_opts = {'input_sub': self.input_sub.full_name,
+                  'output_topic': self.output_topic.full_name,
+                  'on_success_matcher': all_of(*pipeline_verifiers)}
+
+    # Generate input data and inject to 

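The `split_fn` change in the diff above widens the character class from 
`[A-Za-z\']+` to `[A-Za-z0-9\']+`, so the tokenizer keeps digits — which 
matters because the new integration test publishes numbers as messages. A 
quick check of the two patterns:

```python
import re

line = "count 42 words don't stop"

# Old pattern: purely numeric tokens are dropped entirely.
old = re.findall(r"[A-Za-z\']+", line)
# New pattern: digits are kept, so '42' survives as a token.
new = re.findall(r"[A-Za-z0-9\']+", line)

print(old)  # ['count', 'words', "don't", 'stop']
print(new)  # ['count', '42', 'words', "don't", 'stop']
```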
[beam] branch master updated: [BEAM-3861] Improve test infra in Python SDK for streaming end-to-end test (#4874)

2018-03-20 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 776fd5a  [BEAM-3861] Improve test infra in Python SDK for streaming 
end-to-end test (#4874)
776fd5a is described below

commit 776fd5a6ae21352a20c388ecf23822e6def13854
Author: Mark Liu 
AuthorDate: Tue Mar 20 18:58:52 2018 -0700

[BEAM-3861] Improve test infra in Python SDK for streaming end-to-end test 
(#4874)

Create Python End-to-end Test for Streaming WordCount
---
 .../apache_beam/examples/streaming_wordcount.py|   2 +-
 .../examples/streaming_wordcount_it_test.py| 102 +
 .../runners/dataflow/test_dataflow_runner.py   |  37 +++-
 sdks/python/apache_beam/testing/test_pipeline.py   |   4 +-
 sdks/python/apache_beam/testing/test_utils.py  |  48 ++
 sdks/python/apache_beam/testing/test_utils_test.py |  55 ++-
 6 files changed, 242 insertions(+), 6 deletions(-)

diff --git a/sdks/python/apache_beam/examples/streaming_wordcount.py 
b/sdks/python/apache_beam/examples/streaming_wordcount.py
index 12f7351..7ef95d8 100644
--- a/sdks/python/apache_beam/examples/streaming_wordcount.py
+++ b/sdks/python/apache_beam/examples/streaming_wordcount.py
@@ -36,7 +36,7 @@ from apache_beam.options.pipeline_options import StandardOptions
 
 def split_fn(lines):
   import re
-  return re.findall(r'[A-Za-z\']+', lines)
+  return re.findall(r'[A-Za-z0-9\']+', lines)
 
 
 def run(argv=None):
diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py 
b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
new file mode 100644
index 000..a95e5fa
--- /dev/null
+++ b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
@@ -0,0 +1,102 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""End-to-end test for the streaming wordcount example.
+
+Important: End-to-end test infrastructure for streaming pipeline in Python SDK
+is in development and is not yet available for use.
+
+Currently, this test blocks until the job is manually terminated.
+"""
+
+import logging
+import unittest
+
+from hamcrest.core.core.allof import all_of
+from nose.plugins.attrib import attr
+
+from apache_beam.examples import streaming_wordcount
+from apache_beam.runners.runner import PipelineState
+from apache_beam.testing import test_utils
+from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher
+from apache_beam.testing.test_pipeline import TestPipeline
+
+INPUT_TOPIC = 'wc_topic_input'
+OUTPUT_TOPIC = 'wc_topic_output'
+INPUT_SUB = 'wc_subscription_input'
+OUTPUT_SUB = 'wc_subscription_output'
+
+DEFAULT_INPUT_NUMBERS = 500
+
+
+class StreamingWordCountIT(unittest.TestCase):
+
+  def setUp(self):
+    self.test_pipeline = TestPipeline(is_integration_test=True)
+
+    # Set up PubSub environment.
+    from google.cloud import pubsub
+    self.pubsub_client = pubsub.Client(
+        project=self.test_pipeline.get_option('project'))
+    self.input_topic = self.pubsub_client.topic(INPUT_TOPIC)
+    self.output_topic = self.pubsub_client.topic(OUTPUT_TOPIC)
+    self.input_sub = self.input_topic.subscription(INPUT_SUB)
+    self.output_sub = self.output_topic.subscription(OUTPUT_SUB)
+
+    self._cleanup_pubsub()
+
+    self.input_topic.create()
+    self.output_topic.create()
+    test_utils.wait_for_topics_created([self.input_topic, self.output_topic])
+    self.input_sub.create()
+    self.output_sub.create()
+
+  def _inject_numbers(self, topic, num_messages):
+    """Inject numbers as test data to PubSub."""
+    logging.debug('Injecting %d numbers to topic %s',
+                  num_messages, topic.full_name)
+    for n in range(num_messages):
+      topic.publish(str(n))
+
+  def _cleanup_pubsub(self):
+    test_utils.cleanup_subscriptions([self.input_sub, self.output_sub])
+    test_utils.cleanup_topics([self.input_topic, self.output_topic])
+
+  def tearDown(self):
+    self._cleanup_pubsub()
+
+  @attr('developing_test')
+  def test_streaming_wordcount_it(self):
+    # Set extra options to the 

[jira] [Resolved] (BEAM-3893) Go SDK GCS I/O shouldn't require credentials to read from public buckets

2018-03-20 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-3893.
-
   Resolution: Fixed
Fix Version/s: 2.5.0

> Go SDK GCS I/O shouldn't require credentials to read from public buckets
> 
>
> Key: BEAM-3893
> URL: https://issues.apache.org/jira/browse/BEAM-3893
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should handle the case where application default credentials are not 
> available for public reads. Wordcount running a container fails currently:
> 2018/03/20 20:03:00 Failed to execute job: panic: panic: failed to create GCE 
> client: google: could not find default credentials. See 
> https://developers.google.com/accounts/docs/application-default-credentials 
> for more information. goroutine 1 [running]:
> runtime/debug.Stack(0xc4201728a0, 0xc39460, 0xc4201c0120)
>   /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
> github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx.CallNoPanic.func1(0xc420172f80)
>   
> /foo/src/github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx/call.go:98
>  +0x6e
> panic(0xc39460, 0xc4201c0120)
>   /usr/local/go/src/runtime/panic.go:491 +0x283
> github.com/apache/beam/sdks/go/pkg/beam/io/textio/gcs.New(0x12f3500, 
> 0xc420014060, 0xc420094c40, 0x2)
>   
> /foo/src/github.com/apache/beam/sdks/go/pkg/beam/io/textio/gcs/gcs.go:44 
> +0x16e
> github.com/apache/beam/sdks/go/pkg/beam/io/textio.newFileSystem(0x12f3500, 
> 0xc420014060, 0xc420094c40, 0x31, 0xc420172b50, 0x4bf4ef, 0xc37e60, 0xc37e60)
>   /foo/src/github.com/apache/beam/sdks/go/pkg/beam/io/textio/textio.go:68 
> +0xac



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
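The fix described above (BEAM-3893) makes the Go SDK fall back to an 
unauthenticated client when application default credentials are unavailable, 
so reads from public GCS buckets keep working. The general pattern can be 
sketched in Python — the factory arguments here are hypothetical, standing in 
for whatever concrete client constructors the SDK uses:

```python
def new_storage_client(make_authenticated, make_anonymous):
    """Try credentialed access first; fall back to anonymous access.

    Both arguments are zero-arg factories supplied by the caller;
    anonymous clients can still read publicly readable buckets.
    """
    try:
        return make_authenticated()
    except Exception:
        # e.g. "could not find default credentials" on a bare container
        return make_anonymous()
```

With this shape, wordcount in a container without credentials no longer 
panics at client construction; it only fails later if a private object is 
actually requested.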


[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82568
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 21/Mar/18 00:35
Start Date: 21/Mar/18 00:35
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4910: 
[BEAM-3895] Add Side Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#discussion_r175962764
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -139,14 +151,27 @@ default PTransform toPTransform() {
*/
   static ExecutableStage fromPayload(ExecutableStagePayload payload, Components components) {
     Environment environment = payload.getEnvironment();
-    PCollectionNode input = PipelineNode.pCollection(payload.getInput(),
-        components.getPcollectionsOrThrow(payload.getInput()));
-    List<PTransformNode> transforms = payload.getTransformsList().stream()
-        .map(id -> PipelineNode.pTransform(id, components.getTransformsOrThrow(id)))
-        .collect(Collectors.toList());
-    List<PCollectionNode> outputs = payload.getOutputsList().stream()
-        .map(id -> PipelineNode.pCollection(id, components.getPcollectionsOrThrow(id)))
-        .collect(Collectors.toList());
-    return ImmutableExecutableStage.of(environment, input, transforms, outputs);
+    PCollectionNode input =
+        PipelineNode.pCollection(
+            payload.getInput(), components.getPcollectionsOrThrow(payload.getInput()));
 
 Review comment:
   https://issues.apache.org/jira/browse/BEAM-3899, leaving for follow-ups


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82568)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82567
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 21/Mar/18 00:35
Start Date: 21/Mar/18 00:35
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4910: 
[BEAM-3895] Add Side Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#discussion_r175962950
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -214,11 +214,13 @@ message ExecutableStagePayload {
   // Input PCollection id.
   string input = 2;
 
+  repeated string side_inputs = 3;
 
 Review comment:
   Added comments like `input` and `outputs`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82567)
Time Spent: 1h 20m  (was: 1h 10m)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82566
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 21/Mar/18 00:35
Start Date: 21/Mar/18 00:35
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4910: 
[BEAM-3895] Add Side Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#discussion_r175961702
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuser.java
 ##
 @@ -125,6 +128,7 @@ public static ExecutableStage forGrpcPortRead(
 return ImmutableExecutableStage.of(
 environment,
 inputPCollection,
+ImmutableSet.copyOf(sideInputs),
 
 Review comment:
   Point. The deep immutability of `ImmutableExecutableStage`, or at least its 
collections, is a responsibility of that class, so it's really being defensive 
against abstraction failure.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82566)
Time Spent: 1h 10m  (was: 1h)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.





[jira] [Created] (BEAM-3899) Calls to getXOrThrow could be wrapped with a more useful error message

2018-03-20 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-3899:
-

 Summary: Calls to getXOrThrow could be wrapped with a more useful 
error message
 Key: BEAM-3899
 URL: https://issues.apache.org/jira/browse/BEAM-3899
 Project: Beam
  Issue Type: Improvement
  Components: runner-core
Reporter: Thomas Groh


The exception thrown out of the generated code provides no information about 
the thing that was missing.
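The improvement BEAM-3899 suggests can be sketched as follows (hypothetical names, not the generated code's actual API): wrap the bare "get or throw" lookup so the error message identifies what was missing and what was available.

```python
def get_or_throw(mapping, key):
    """Lookup that fails with a descriptive message instead of a bare error."""
    try:
        return mapping[key]
    except KeyError:
        # The wrapped message names the missing key and the known keys,
        # unlike the uninformative exception described above.
        raise KeyError(
            "No entry found for %r; known keys: %s" % (key, sorted(mapping)))
```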





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5190

2018-03-20 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-3287) Go SDK support for portable pipelines

2018-03-20 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-3287.
-
   Resolution: Fixed
Fix Version/s: 2.5.0

> Go SDK support for portable pipelines
> -
>
> Key: BEAM-3287
> URL: https://issues.apache.org/jira/browse/BEAM-3287
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
> Fix For: 2.5.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Go SDK should participate in the portability framework, incl. job 
> submission w/ a docker container image.





[jira] [Work logged] (BEAM-3287) Go SDK support for portable pipelines

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3287?focusedWorklogId=82561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82561
 ]

ASF GitHub Bot logged work on BEAM-3287:


Author: ASF GitHub Bot
Created on: 21/Mar/18 00:27
Start Date: 21/Mar/18 00:27
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #4888: [BEAM-3287] Add Go 
support for universal runners, incl Flink
URL: https://github.com/apache/beam/pull/4888
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/container/boot.go b/sdks/go/container/boot.go
index 5259246a276..a7357a2999b 100644
--- a/sdks/go/container/boot.go
+++ b/sdks/go/container/boot.go
@@ -77,24 +77,41 @@ func main() {
// (2) Retrieve the staged files.
//
// The Go SDK harness downloads the worker binary and invokes
-   // it. For now, we assume that the first (and only) package
-   // is the binary.
+   // it. The binary is required to be keyed as "worker" if there
+   // is more than one artifact.
 
dir := filepath.Join(*semiPersistDir, "staged")
artifacts, err := artifact.Materialize(ctx, *artifactEndpoint, dir)
if err != nil {
log.Fatalf("Failed to retrieve staged files: %v", err)
}
-   if len(artifacts) == 0 {
-   log.Fatal("No binaries staged")
+
+   const worker = "worker"
+   name := worker
+
+   switch len(artifacts) {
+   case 0:
+   log.Fatal("No artifacts staged")
+   case 1:
+   name = artifacts[0].Name
+   default:
+   found := false
+   for _, a := range artifacts {
+   if a.Name == worker {
+   found = true
+   break
+   }
+   }
+   if !found {
+   log.Fatalf("No artifact named '%v' found", worker)
+   }
}
 
// (3) The persist dir may be on a noexec volume, so we must
// copy the binary to a different location to execute.
-
-   prog := filepath.Join("/bin", artifacts[0].Name)
-   if err := copyExe(filepath.Join(dir, artifacts[0].Name), prog); err != 
nil {
-   log.Fatalf("Failed to copy binary: %v", err)
+   const prog = "/bin/worker"
+   if err := copyExe(filepath.Join(dir, name), prog); err != nil {
+   log.Fatalf("Failed to copy worker binary: %v", err)
}
 
args := []string{
diff --git a/sdks/go/pkg/beam/core/runtime/graphx/translate.go 
b/sdks/go/pkg/beam/core/runtime/graphx/translate.go
index 7c3519ada83..616a7f44d0f 100644
--- a/sdks/go/pkg/beam/core/runtime/graphx/translate.go
+++ b/sdks/go/pkg/beam/core/runtime/graphx/translate.go
@@ -28,12 +28,12 @@ import (
 const (
// Model constants
 
-   URNImpulse = "urn:beam:transform:impulse:v1"
+   URNImpulse = "beam:transform:impulse:v1"
URNParDo   = "urn:beam:transform:pardo:v1"
-   URNFlatten = "urn:beam:transform:flatten:v1"
-   URNGBK = "urn:beam:transform:groupbykey:v1"
-   URNCombine = "urn:beam:transform:combine:v1"
-   URNWindow  = "urn:beam:transform:window:v1"
+   URNFlatten = "beam:transform:flatten:v1"
+   URNGBK = "beam:transform:group_by_key:v1"
+   URNCombine = "beam:transform:combine:v1"
+   URNWindow  = "beam:transform:window:v1"
 
URNGlobalWindowsWindowFn = "beam:windowfn:global_windows:v0.1"
 
@@ -42,7 +42,7 @@ const (
// TODO: remove URNJavaDoFN when the Dataflow runner
// uses the model pipeline and no longer falls back to Java.
URNJavaDoFn = "urn:beam:dofn:javasdk:0.1"
-   URNDoFn = "urn:beam:go:transform:dofn:v1"
+   URNDoFn = "beam:go:transform:dofn:v1"
 )
 
 // TODO(herohde) 11/6/2017: move some of the configuration into the graph 
during construction.
@@ -372,6 +372,7 @@ func (m *marshaller) addWindowingStrategy(w *window.Window) 
string {
OutputTime:  pb.OutputTime_END_OF_WINDOW,
ClosingBehavior: pb.ClosingBehavior_EMIT_IF_NONEMPTY,
AllowedLateness: 0,
+   OnTimeBehavior:  pb.OnTimeBehavior_FIRE_ALWAYS,
}
m.windowing[id] = ws
}
diff --git a/sdks/go/pkg/beam/core/runtime/harness/init/init.go 
b/sdks/go/pkg/beam/core/runtime/harness/init/init.go
index c99601da989..9b37f0dac98 100644
--- a/sdks/go/pkg/beam/core/runtime/harness/init/init.go
+++ b/sdks/go/pkg/beam/core/runtime/harness/init/init.go
@@ -49,28 +49,6 @@ func init() {
runtime.RegisterInit(hook)
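The worker-binary selection rule in the boot.go diff above can be sketched language-neutrally; this Python version (illustrative names, not Beam's API) mirrors the switch: fail on zero artifacts, take a single artifact regardless of name, and otherwise require one keyed "worker".

```python
def select_worker_artifact(names, worker="worker"):
    # Mirrors the switch in boot.go above: the staged worker binary is
    # the sole artifact if there is exactly one, otherwise the one
    # named "worker"; zero artifacts, or no "worker" key among several,
    # is fatal.
    if not names:
        raise RuntimeError("No artifacts staged")
    if len(names) == 1:
        return names[0]
    if worker in names:
        return worker
    raise RuntimeError("No artifact named %r found" % worker)
```

This is why the diff also hard-codes the copy destination as `/bin/worker`: once selected, the binary's staged name no longer matters.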
 

Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #6253

2018-03-20 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-2927) Python SDK support for portable side input

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2927?focusedWorklogId=82553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82553
 ]

ASF GitHub Bot logged work on BEAM-2927:


Author: ASF GitHub Bot
Created on: 21/Mar/18 00:05
Start Date: 21/Mar/18 00:05
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #4781: [BEAM-2927] Python 
support for portable side inputs over Fn API
URL: https://github.com/apache/beam/pull/4781
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 7a5b884b1af..82130d6e2b7 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -211,36 +211,8 @@ def visit_transform(self, transform_node):
 from apache_beam.transforms.core import GroupByKey, _GroupByKeyOnly
 if isinstance(transform_node.transform, (GroupByKey, _GroupByKeyOnly)):
   pcoll = transform_node.inputs[0]
-  input_type = pcoll.element_type
-  # If input_type is not specified, then treat it as `Any`.
-  if not input_type:
-input_type = typehints.Any
-
-  def coerce_to_kv_type(element_type):
-if isinstance(element_type, typehints.TupleHint.TupleConstraint):
-  if len(element_type.tuple_types) == 2:
-return element_type
-  else:
-raise ValueError(
-"Tuple input to GroupByKey must be have two components. "
-"Found %s for %s" % (element_type, pcoll))
-elif isinstance(input_type, typehints.AnyTypeConstraint):
-  # `Any` type needs to be replaced with a KV[Any, Any] to
-  # force a KV coder as the main output coder for the pcollection
-  # preceding a GroupByKey.
-  return typehints.KV[typehints.Any, typehints.Any]
-elif isinstance(element_type, typehints.UnionConstraint):
-  union_types = [
-  coerce_to_kv_type(t) for t in element_type.union_types]
-  return typehints.KV[
-  typehints.Union[tuple(t.tuple_types[0] for t in 
union_types)],
-  typehints.Union[tuple(t.tuple_types[1] for t in 
union_types)]]
-else:
-  # TODO: Possibly handle other valid types.
-  raise ValueError(
-  "Input to GroupByKey must be of Tuple or Any type. "
-  "Found %s for %s" % (element_type, pcoll))
-  pcoll.element_type = coerce_to_kv_type(input_type)
+  pcoll.element_type = typehints.coerce_to_kv_type(
+  pcoll.element_type, transform_node.full_label)
   key_type, value_type = pcoll.element_type.tuple_types
   if transform_node.outputs:
 transform_node.outputs[None].element_type = typehints.KV[
@@ -248,6 +220,59 @@ def coerce_to_kv_type(element_type):
 
 return GroupByKeyInputVisitor()
 
+  @staticmethod
+  def side_input_visitor():
+# Imported here to avoid circular dependencies.
+# pylint: disable=wrong-import-order, wrong-import-position
+from apache_beam.pipeline import PipelineVisitor
+from apache_beam.transforms.core import ParDo
+
+class SideInputVisitor(PipelineVisitor):
+  """Ensures input `PCollection` used as a side inputs has a `KV` type.
+
+  TODO(BEAM-115): Once Python SDK is compatible with the new Runner API,
+  we could directly replace the coder instead of mutating the element type.
+  """
+  def visit_transform(self, transform_node):
+if isinstance(transform_node.transform, ParDo):
+  new_side_inputs = []
+  for ix, side_input in enumerate(transform_node.side_inputs):
+access_pattern = side_input._side_input_data().access_pattern
+if access_pattern == common_urns.ITERABLE_SIDE_INPUT:
+  # Add a map to ('', value) as Dataflow currently only handles
+  # keyed side inputs.
+  pipeline = side_input.pvalue.pipeline
+  new_side_input = _DataflowIterableSideInput(side_input)
+  new_side_input.pvalue = beam.pvalue.PCollection(
+  pipeline,
+  element_type=typehints.KV[
+  str, side_input.pvalue.element_type])
+  parent = transform_node.parent or pipeline._root_transform()
+  map_to_void_key = beam.pipeline.AppliedPTransform(
+  pipeline,
+  beam.Map(lambda x: ('', x)),
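The KV-coercion rule that this diff factors out into `typehints.coerce_to_kv_type` can be paraphrased with plain tuples (a deliberate simplification of Beam's typehint constraint objects, for illustration only): an unknown or `Any` element type feeding a GroupByKey is forced to `KV[Any, Any]` so a KV coder is chosen, a two-component tuple passes through, and anything else is rejected.

```python
ANY = object()  # stand-in for typehints.Any in this simplified sketch


def coerce_to_kv(element_type):
    # Unknown/Any element types are forced to KV[Any, Any] so that a
    # KV coder is used for the PCollection preceding a GroupByKey.
    if element_type is None or element_type is ANY:
        return (ANY, ANY)
    # A two-component tuple is already a key/value pair.
    if isinstance(element_type, tuple) and len(element_type) == 2:
        return element_type
    raise ValueError(
        "Input to GroupByKey must be of Tuple or Any type, got %r"
        % (element_type,))
```

The real implementation additionally distributes the coercion over `Union` types, as shown in the removed `dataflow_runner.py` code above.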

[beam] 01/01: Merge pull request #4781 [BEAM-2927] Python support for dataflow portable side inputs over Fn API

2018-03-20 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit b51ed61a89adea91d0138424e2928373e1507ef4
Merge: 605781c 99f1ebf
Author: Robert Bradshaw 
AuthorDate: Tue Mar 20 17:05:13 2018 -0700

Merge pull request #4781 [BEAM-2927] Python support for dataflow portable 
side inputs over Fn API

 .../runners/dataflow/dataflow_runner.py| 158 -
 .../runners/dataflow/dataflow_runner_test.py   |  23 ++-
 .../runners/portability/fn_api_runner.py   |  13 +-
 .../apache_beam/runners/worker/bundle_processor.py |   9 +-
 .../apache_beam/runners/worker/sdk_worker.py   |  32 +++--
 sdks/python/apache_beam/typehints/typehints.py |  33 +
 6 files changed, 214 insertions(+), 54 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
rober...@apache.org.


[beam] branch master updated (605781c -> b51ed61)

2018-03-20 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 605781c  [BEAM-3893] Add fallback to unauthenticated access for Go GCS 
IO
 add 99f1ebf  [BEAM-2927] Python support for dataflow portable side inputs 
over Fn API
 new b51ed61  Merge pull request #4781 [BEAM-2927] Python support for 
dataflow portable side inputs over Fn API

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../runners/dataflow/dataflow_runner.py| 158 -
 .../runners/dataflow/dataflow_runner_test.py   |  23 ++-
 .../runners/portability/fn_api_runner.py   |  13 +-
 .../apache_beam/runners/worker/bundle_processor.py |   9 +-
 .../apache_beam/runners/worker/sdk_worker.py   |  32 +++--
 sdks/python/apache_beam/typehints/typehints.py |  33 +
 6 files changed, 214 insertions(+), 54 deletions(-)



[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82551
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 21/Mar/18 00:01
Start Date: 21/Mar/18 00:01
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4777: [BEAM-3565] Add 
FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#issuecomment-374797898
 
 
   PTAL




Issue Time Tracking
---

Worklog Id: (was: 82551)
Time Spent: 15h 40m  (was: 15.5h)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Commented] (BEAM-1442) Performance improvement of the Python DirectRunner

2018-03-20 Thread Robert Bradshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407244#comment-16407244
 ] 

Robert Bradshaw commented on BEAM-1442:
---

As of today, you can get this from pip. Note for avro in particular that 
https://issues.apache.org/jira/browse/BEAM-2810 is still open. 

> Performance improvement of the Python DirectRunner
> --
>
> Key: BEAM-1442
> URL: https://issues.apache.org/jira/browse/BEAM-1442
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Charles Chen
>Priority: Major
>  Labels: gsoc2017, mentor, python
> Fix For: 2.4.0
>
>
> The DirectRunner for Python and Java are intended to act as policy enforcers, 
> and correctness checkers for Beam pipelines; but there are users that run 
> data processing tasks in them.
> Currently, the Python Direct Runner has less-than-great performance, although 
> some work has gone into improving it. There are more opportunities for 
> improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline, 
> and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions 
> directly on JIRA.





[jira] [Commented] (BEAM-1442) Performance improvement of the Python DirectRunner

2018-03-20 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407241#comment-16407241
 ] 

Debasish Das commented on BEAM-1442:


Hi, I am pushing 10 MB Avro files locally; the idea is to push a sizable amount 
of data in local mode for pipeline validation. Can I get this fix from pip to 
test it out on local files?

> Performance improvement of the Python DirectRunner
> --
>
> Key: BEAM-1442
> URL: https://issues.apache.org/jira/browse/BEAM-1442
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Charles Chen
>Priority: Major
>  Labels: gsoc2017, mentor, python
> Fix For: 2.4.0
>
>
> The DirectRunner for Python and Java are intended to act as policy enforcers, 
> and correctness checkers for Beam pipelines; but there are users that run 
> data processing tasks in them.
> Currently, the Python Direct Runner has less-than-great performance, although 
> some work has gone into improving it. There are more opportunities for 
> improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline, 
> and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions 
> directly on JIRA.





[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82549
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 20/Mar/18 23:29
Start Date: 20/Mar/18 23:29
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r175953760
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -118,97 +132,78 @@ class ReadMessagesFromPubSub(PTransform):
   Outputs elements of type :class:`~PubsubMessage`.
   """
 
-  def __init__(self, topic=None, subscription=None, id_label=None):
+  def __init__(self, topic=None, subscription=None, id_label=None,
+   timestamp_attribute=None):
 """Initializes ``ReadMessagesFromPubSub``.
 
 Args:
-  topic: Cloud Pub/Sub topic in the form "projects/<project>/topics/
-<topic>". If provided, subscription must be None.
-  subscription: Existing Cloud Pub/Sub subscription to use in the
-form "projects/<project>/subscriptions/<subscription>". If not
-specified, a temporary subscription will be created from the specified
-topic. If provided, topic must be None.
-  id_label: The attribute on incoming Pub/Sub messages to use as a unique
-record identifier.  When specified, the value of this attribute (which
-can be any string that uniquely identifies the record) will be used for
-deduplication of messages.  If not provided, we cannot guarantee
-that no duplicate data will be delivered on the Pub/Sub stream. In this
-case, deduplication of the stream will be strictly best effort.
 """
 super(ReadMessagesFromPubSub, self).__init__()
 self.topic = topic
 self.subscription = subscription
 self.id_label = id_label
+self.timestamp_attribute = timestamp_attribute
 
   def get_windowing(self, unused_inputs):
 return core.Windowing(window.GlobalWindows())
 
   def expand(self, pcoll):
 p = (pcoll.pipeline
  | _ReadFromPubSub(self.topic, self.subscription, self.id_label,
-   with_attributes=True))
+   with_attributes=True,
+   timestamp_attribute=self.timestamp_attribute))
 return p
 
 
-class ReadStringsFromPubSub(PTransform):
-  """A ``PTransform`` for reading utf-8 string payloads from Cloud Pub/Sub.
+class ReadPayloadsFromPubSub(PTransform):
+  """A ``PTransform`` for reading raw payloads from Cloud Pub/Sub.
 
-  Outputs elements of type ``unicode``, decoded from UTF-8.
+  Outputs elements of type ``str``.
   """
 
-  def __init__(self, topic=None, subscription=None, id_label=None):
-"""Initializes ``ReadStringsFromPubSub``.
-
-Args:
-  topic: Cloud Pub/Sub topic in the form "projects/<project>/topics/
-<topic>". If provided, subscription must be None.
-  subscription: Existing Cloud Pub/Sub subscription to use in the
-form "projects/<project>/subscriptions/<subscription>". If not
-specified, a temporary subscription will be created from the specified
-topic. If provided, topic must be None.
-  id_label: The attribute on incoming Pub/Sub messages to use as a unique
-record identifier.  When specified, the value of this attribute (which
-can be any string that uniquely identifies the record) will be used for
-deduplication of messages.  If not provided, we cannot guarantee
-that no duplicate data will be delivered on the Pub/Sub stream. In this
-case, deduplication of the stream will be strictly best effort.
-"""
-super(ReadStringsFromPubSub, self).__init__()
+  def __init__(self, topic=None, subscription=None, id_label=None,
+   timestamp_attribute=None):
+super(ReadPayloadsFromPubSub, self).__init__()
 self.topic = topic
 self.subscription = subscription
 self.id_label = id_label
+self.timestamp_attribute = timestamp_attribute
 
   def get_windowing(self, unused_inputs):
 return core.Windowing(window.GlobalWindows())
 
   def expand(self, pcoll):
 p = (pcoll.pipeline
  | _ReadFromPubSub(self.topic, self.subscription, self.id_label,
-   with_attributes=False)
+   with_attributes=False,
+   timestamp_attribute=self.timestamp_attribute))
+return p
+
+
+class ReadStringsFromPubSub(ReadPayloadsFromPubSub):
 
 Review comment:
   Please see my comment above.



[jira] [Work logged] (BEAM-3744) Support full PubsubMessages

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3744?focusedWorklogId=82548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82548
 ]

ASF GitHub Bot logged work on BEAM-3744:


Author: ASF GitHub Bot
Created on: 20/Mar/18 23:28
Start Date: 20/Mar/18 23:28
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #4901: 
[BEAM-3744] Expand Pubsub read API for Python.
URL: https://github.com/apache/beam/pull/4901#discussion_r175953648
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -44,8 +68,8 @@
   pubsub_pb2 = None
 
 
-__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadStringsFromPubSub',
-   'WriteStringsToPubSub']
+__all__ = ['PubsubMessage', 'ReadMessagesFromPubSub', 'ReadPayloadsFromPubSub',
 
 Review comment:
   There was a previous discussions about this here: 
https://github.com/apache/beam/pull/3223 .
   Java has 
[PubsubIO.readStrings()](https://github.com/apache/beam/blob/29859eb54d05b96a9db477e7bb04537510273bd2/examples/java/src/main/java/org/apache/beam/examples/complete/game/StatefulTeamScore.java#L126),
 
[readMessages()](https://github.com/apache/beam/blob/29859eb54d05b96a9db477e7bb04537510273bd2/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L447),
 readMessagesWithAttributes(), etc.




Issue Time Tracking
---

Worklog Id: (was: 82548)
Time Spent: 4h 20m  (was: 4h 10m)

> Support full PubsubMessages
> ---
>
> Key: BEAM-3744
> URL: https://issues.apache.org/jira/browse/BEAM-3744
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Critical
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Tracking changes to Pubsub support in Python SDK.





[jira] [Assigned] (BEAM-2517) Document how to build Python SDK from BEAM head in contribution guide.

2018-03-20 Thread Udi Meiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-2517:
---

Assignee: Udi Meiri

> Document how to build Python SDK from BEAM head in contribution guide.
> --
>
> Key: BEAM-2517
> URL: https://issues.apache.org/jira/browse/BEAM-2517
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Minor
>  Labels: starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should add instructions how to build Python SDK from BEAM head to BEAM 
> contributor guide[1] .
> The commands can be as follows:
> cd ./beam/sdks/python
> python setup.py sdist
> SDK tarball will appear in ./dist/
> [1]: https://beam.apache.org/contribute/contribution-guide





[jira] [Work logged] (BEAM-3898) PTransformOverride does not replace side input usage of replaced transform

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3898?focusedWorklogId=82541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82541
 ]

ASF GitHub Bot logged work on BEAM-3898:


Author: ASF GitHub Bot
Created on: 20/Mar/18 23:18
Start Date: 20/Mar/18 23:18
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #4914: [BEAM-3898] 
Replace side inputs when applying PTransformOverrides
URL: https://github.com/apache/beam/pull/4914#issuecomment-374790035
 
 
   R: @chamikaramj 




Issue Time Tracking
---

Worklog Id: (was: 82541)
Time Spent: 20m  (was: 10m)

> PTransformOverride does not replace side input usage of replaced transform
> --
>
> Key: BEAM-3898
> URL: https://issues.apache.org/jira/browse/BEAM-3898
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the PTransformOverride mechanism allows specification of a 
> replacement procedure where transform A is replaced with another transform B. 
>  However, the current mechanism does not replace usages where the output of A 
> is being read as a side input.  We should fix this behavior.





[jira] [Work logged] (BEAM-3898) PTransformOverride does not replace side input usage of replaced transform

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3898?focusedWorklogId=82540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82540
 ]

ASF GitHub Bot logged work on BEAM-3898:


Author: ASF GitHub Bot
Created on: 20/Mar/18 23:17
Start Date: 20/Mar/18 23:17
Worklog Time Spent: 10m 
  Work Description: charlesccychen opened a new pull request #4914: 
[BEAM-3898] Replace side inputs when applying PTransformOverrides
URL: https://github.com/apache/beam/pull/4914
 
 
   Currently, the PTransformOverride mechanism allows specification of a 
replacement procedure where transform A is replaced with another transform B.  
However, the current mechanism does not replace usages where the output of A is 
being read as a side input.  This change fixes this behavior.




Issue Time Tracking
---

Worklog Id: (was: 82540)
Time Spent: 10m
Remaining Estimate: 0h

> PTransformOverride does not replace side input usage of replaced transform
> --
>
> Key: BEAM-3898
> URL: https://issues.apache.org/jira/browse/BEAM-3898
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Charles Chen
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the PTransformOverride mechanism allows specification of a 
> replacement procedure where transform A is replaced with another transform B. 
>  However, the current mechanism does not replace usages where the output of A 
> is being read as a side input.  We should fix this behavior.





[jira] [Created] (BEAM-3898) PTransformOverride does not replace side input usage of replaced transform

2018-03-20 Thread Charles Chen (JIRA)
Charles Chen created BEAM-3898:
--

 Summary: PTransformOverride does not replace side input usage of 
replaced transform
 Key: BEAM-3898
 URL: https://issues.apache.org/jira/browse/BEAM-3898
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Charles Chen
Assignee: Ahmet Altay


Currently, the PTransformOverride mechanism allows specification of a 
replacement procedure where transform A is replaced with another transform B.  
However, the current mechanism does not replace usages where the output of A is 
being read as a side input.  We should fix this behavior.
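The gap described in BEAM-3898 can be shown with a toy replacement pass (plain dictionaries standing in for Beam's pipeline nodes; not the real PTransformOverride API): when transform A's output is substituted by B's, consumers must be rewired in both places a PCollection can be referenced.

```python
def replace_pcollection(consumers, old, new):
    # Substitute `new` for `old` everywhere a consumer can reference a
    # PCollection. The bug described above corresponds to omitting the
    # side_inputs line: main inputs get rewired, side-input references
    # keep pointing at the replaced transform's output.
    for node in consumers:
        node["inputs"] = [new if p is old else p for p in node["inputs"]]
        node["side_inputs"] = [
            new if p is old else p for p in node["side_inputs"]]
```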





Jenkins build is back to normal : beam_PostCommit_Python_ValidatesRunner_Dataflow #1152

2018-03-20 Thread Apache Jenkins Server
See 




[beam] branch master updated (f8ccbe5 -> 605781c)

2018-03-20 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from f8ccbe5  Merge pull request #4840 [BEAM-3817] Switch Go SDK BQ write 
to not use side input
 add 02fe391  [BEAM-3893] Add fallback to unauthenticated access for GCS IO
 new 605781c  [BEAM-3893] Add fallback to unauthenticated access for Go GCS 
IO

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/go/pkg/beam/io/textio/gcs/gcs.go  | 11 +--
 sdks/go/pkg/beam/provision/provision.go|  2 +-
 sdks/go/pkg/beam/runners/dataflow/translate.go |  2 +-
 sdks/go/pkg/beam/util/gcsx/gcs.go  | 11 +++
 4 files changed, 22 insertions(+), 4 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
lc...@apache.org.


[jira] [Work logged] (BEAM-3893) Go SDK GCS I/O shouldn't require credentials to read from public buckets

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3893?focusedWorklogId=82537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82537
 ]

ASF GitHub Bot logged work on BEAM-3893:


Author: ASF GitHub Bot
Created on: 20/Mar/18 23:13
Start Date: 20/Mar/18 23:13
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #4911: [BEAM-3893] Add 
fallback to unauthenticated access for Go GCS IO
URL: https://github.com/apache/beam/pull/4911
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/pkg/beam/io/textio/gcs/gcs.go 
b/sdks/go/pkg/beam/io/textio/gcs/gcs.go
index 4a005dd7520..a5a6edfd5cd 100644
--- a/sdks/go/pkg/beam/io/textio/gcs/gcs.go
+++ b/sdks/go/pkg/beam/io/textio/gcs/gcs.go
@@ -24,6 +24,7 @@ import (
"strings"
 
"github.com/apache/beam/sdks/go/pkg/beam/io/textio"
+   "github.com/apache/beam/sdks/go/pkg/beam/log"
"github.com/apache/beam/sdks/go/pkg/beam/util/gcsx"
"google.golang.org/api/storage/v1"
 )
@@ -37,11 +38,17 @@ type fs struct {
 }
 
 // New creates a new Google Cloud Storage filesystem using application
-// default credentials.
+// default credentials. If it fails, it falls back to unauthenticated
+// access.
 func New(ctx context.Context) textio.FileSystem {
client, err := gcsx.NewClient(ctx, storage.DevstorageReadWriteScope)
if err != nil {
-   panic(fmt.Sprintf("failed to create GCE client: %v", err))
+   log.Warnf(ctx, "Warning: falling back to unauthenticated GCS 
access: %v", err)
+
+   client, err = gcsx.NewUnauthenticatedClient(ctx)
+   if err != nil {
+   panic(fmt.Sprintf("failed to create GCE client: %v", 
err))
+   }
}
	return &fs{client: client}
 }
diff --git a/sdks/go/pkg/beam/provision/provision.go 
b/sdks/go/pkg/beam/provision/provision.go
index 656b3f27539..7349ef2a343 100644
--- a/sdks/go/pkg/beam/provision/provision.go
+++ b/sdks/go/pkg/beam/provision/provision.go
@@ -45,7 +45,7 @@ func Info(ctx context.Context, endpoint string) 
(*pb.ProvisionInfo, error) {
return nil, fmt.Errorf("failed to get manifest: %v", err)
}
if resp.GetInfo() == nil {
-   return nil, fmt.Errorf("empty manifest",)
+   return nil, fmt.Errorf("empty manifest")
}
return resp.GetInfo(), nil
 }
diff --git a/sdks/go/pkg/beam/runners/dataflow/translate.go 
b/sdks/go/pkg/beam/runners/dataflow/translate.go
index c2d4f88a11c..440dcd7a9ab 100644
--- a/sdks/go/pkg/beam/runners/dataflow/translate.go
+++ b/sdks/go/pkg/beam/runners/dataflow/translate.go
@@ -24,6 +24,7 @@ import (
 
"github.com/apache/beam/sdks/go/pkg/beam"
"github.com/apache/beam/sdks/go/pkg/beam/core/graph"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
"github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec"
"github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx"
@@ -32,7 +33,6 @@ import (
rnapi_pb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1"
"github.com/golang/protobuf/proto"
df "google.golang.org/api/dataflow/v1b3"
-   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder"
 )
 
 const (
diff --git a/sdks/go/pkg/beam/util/gcsx/gcs.go 
b/sdks/go/pkg/beam/util/gcsx/gcs.go
index b39bd21514a..666a837fb0f 100644
--- a/sdks/go/pkg/beam/util/gcsx/gcs.go
+++ b/sdks/go/pkg/beam/util/gcsx/gcs.go
@@ -27,7 +27,9 @@ import (
 
"golang.org/x/oauth2/google"
"google.golang.org/api/googleapi"
+   "google.golang.org/api/option"
"google.golang.org/api/storage/v1"
+   ghttp "google.golang.org/api/transport/http"
 )
 
 // NewClient creates a new GCS client with default application credentials.
@@ -39,6 +41,15 @@ func NewClient(ctx context.Context, scope string) 
(*storage.Service, error) {
return storage.New(cl)
 }
 
+// NewUnauthenticatedClient creates a new GCS client without authentication.
+func NewUnauthenticatedClient(ctx context.Context) (*storage.Service, error) {
+   cl, _, err := ghttp.NewClient(ctx, option.WithoutAuthentication())
+   if err != nil {
+   return nil, fmt.Errorf("dialing: %v", err)
+   }
+   return storage.New(cl)
+}
+
 // Upload writes the given content to GCS. If the specified bucket does not
 // exist, it is created first. Returns the full path of the object.
 func Upload(client *storage.Service, project, bucket, object string, r 
io.Reader) (string, error) {
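
The fallback added in `New()` above is a try-then-degrade pattern: attempt an authenticated client first, log a warning on failure, and retry without credentials, panicking only if even that fails. A minimal, self-contained sketch of that control flow follows; the constructors below are simulations of the real `gcsx.NewClient`/`gcsx.NewUnauthenticatedClient` calls (which dial GCS), not Beam APIs:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// client is a stand-in for *storage.Service in the real diff.
type client struct{ authenticated bool }

// newAuthenticatedClient simulates gcsx.NewClient, which fails when no
// application default credentials are available.
func newAuthenticatedClient(haveCreds bool) (*client, error) {
	if !haveCreds {
		return nil, errors.New("could not find default credentials")
	}
	return &client{authenticated: true}, nil
}

// newUnauthenticatedClient simulates gcsx.NewUnauthenticatedClient, which
// always succeeds but can only read public buckets.
func newUnauthenticatedClient() (*client, error) {
	return &client{authenticated: false}, nil
}

// newClientWithFallback mirrors the control flow added in New(): try
// authenticated access first, warn and fall back on failure.
func newClientWithFallback(haveCreds bool) *client {
	c, err := newAuthenticatedClient(haveCreds)
	if err != nil {
		log.Printf("Warning: falling back to unauthenticated GCS access: %v", err)
		c, err = newUnauthenticatedClient()
		if err != nil {
			panic(fmt.Sprintf("failed to create GCS client: %v", err))
		}
	}
	return c
}

func main() {
	fmt.Println(newClientWithFallback(true).authenticated)  // true
	fmt.Println(newClientWithFallback(false).authenticated) // false
}
```

In the real SDK the warning goes through beam/log so it carries the pipeline context; the panic is kept only as the last resort when even unauthenticated dialing fails.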


 


[beam] 01/01: [BEAM-3893] Add fallback to unauthenticated access for Go GCS IO

2018-03-20 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 605781cd9fa6f6c8276033247e5c527fac9656aa
Merge: f8ccbe5 02fe391
Author: Lukasz Cwik 
AuthorDate: Tue Mar 20 16:13:17 2018 -0700

[BEAM-3893] Add fallback to unauthenticated access for Go GCS IO

 sdks/go/pkg/beam/io/textio/gcs/gcs.go  | 11 +--
 sdks/go/pkg/beam/provision/provision.go|  2 +-
 sdks/go/pkg/beam/runners/dataflow/translate.go |  2 +-
 sdks/go/pkg/beam/util/gcsx/gcs.go  | 11 +++
 4 files changed, 22 insertions(+), 4 deletions(-)



[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82532
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:59
Start Date: 20/Mar/18 22:59
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #4910: 
[BEAM-3895] Add Side Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#discussion_r175947778
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -139,14 +151,27 @@ default PTransform toPTransform() {
*/
   static ExecutableStage fromPayload(ExecutableStagePayload payload, 
Components components) {
 Environment environment = payload.getEnvironment();
-PCollectionNode input = PipelineNode.pCollection(payload.getInput(),
-components.getPcollectionsOrThrow(payload.getInput()));
-List transforms = payload.getTransformsList().stream()
-.map(id -> PipelineNode.pTransform(id, 
components.getTransformsOrThrow(id)))
-.collect(Collectors.toList());
-List outputs = payload.getOutputsList().stream()
-.map(id -> PipelineNode.pCollection(id, 
components.getPcollectionsOrThrow(id)))
-.collect(Collectors.toList());
-return ImmutableExecutableStage.of(environment, input, transforms, 
outputs);
+PCollectionNode input =
+PipelineNode.pCollection(
+payload.getInput(), 
components.getPcollectionsOrThrow(payload.getInput()));
 
 Review comment:
   At this point, it may be worth wrapping the getXOrThrow calls to actually 
return a legible error message since the generated code does not print the 
input ids.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82532)
Time Spent: 50m  (was: 40m)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82533
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:59
Start Date: 20/Mar/18 22:59
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #4910: 
[BEAM-3895] Add Side Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#discussion_r175948457
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuser.java
 ##
 @@ -125,6 +128,7 @@ public static ExecutableStage forGrpcPortRead(
 return ImmutableExecutableStage.of(
 environment,
 inputPCollection,
+ImmutableSet.copyOf(sideInputs),
 
 Review comment:
   It's a bit weird to pessimistically copy fields here when we know that 
ImmutableExecutableStage.of() already does this.




Issue Time Tracking
---

Worklog Id: (was: 82533)
Time Spent: 1h  (was: 50m)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.





[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82531
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:59
Start Date: 20/Mar/18 22:59
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #4910: 
[BEAM-3895] Add Side Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#discussion_r175944762
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -214,11 +214,13 @@ message ExecutableStagePayload {
   // Input PCollection id.
   string input = 2;
 
+  repeated string side_inputs = 3;
 
 Review comment:
   Please add a comment indicating that these are global ids (as opposed to 
local names).




Issue Time Tracking
---

Worklog Id: (was: 82531)
Time Spent: 40m  (was: 0.5h)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.





[jira] [Work logged] (BEAM-3873) Current version of commons-compress is DOS vulnerable CVE-2018-1324

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3873?focusedWorklogId=82528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82528
 ]

ASF GitHub Bot logged work on BEAM-3873:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:47
Start Date: 20/Mar/18 22:47
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4889: 
[BEAM-3873] Current version of commons-compress is DOS vulnerable CVE-2018-1324
URL: https://github.com/apache/beam/pull/4889#discussion_r175946499
 
 

 ##
 File path: pom.xml
 ##
 @@ -1446,6 +1446,7 @@
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
         <version>${junit.version}</version>
+        <type>test-jar</type>
 
 Review comment:
   Same as above.




Issue Time Tracking
---

Worklog Id: (was: 82528)
Time Spent: 2.5h  (was: 2h 20m)

> Current version of commons-compress is DOS vulnerable CVE-2018-1324
> ---
>
> Key: BEAM-3873
> URL: https://issues.apache.org/jira/browse/BEAM-3873
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-java-core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The commons-compress version of the library used by Beam has a security 
> vulnerability. For more details see 
> [CVE-2018-1324|https://www.cvedetails.com/cve/CVE-2018-1324/]





[jira] [Work logged] (BEAM-3873) Current version of commons-compress is DOS vulnerable CVE-2018-1324

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3873?focusedWorklogId=82527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82527
 ]

ASF GitHub Bot logged work on BEAM-3873:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:47
Start Date: 20/Mar/18 22:47
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4889: 
[BEAM-3873] Current version of commons-compress is DOS vulnerable CVE-2018-1324
URL: https://github.com/apache/beam/pull/4889#discussion_r175946473
 
 

 ##
 File path: pom.xml
 ##
 @@ -1460,6 +1461,7 @@
         <groupId>com.google.guava</groupId>
         <artifactId>guava-testlib</artifactId>
         <version>${guava.version}</version>
+        <type>test-jar</type>
         <scope>test</scope>
 
 Review comment:
   You are right, I thought the warning was related to the type and not the 
duplicated dependency. Removing it should fix it.




Issue Time Tracking
---

Worklog Id: (was: 82527)
Time Spent: 2h 20m  (was: 2h 10m)

> Current version of commons-compress is DOS vulnerable CVE-2018-1324
> ---
>
> Key: BEAM-3873
> URL: https://issues.apache.org/jira/browse/BEAM-3873
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-java-core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The commons-compress version of the library used by Beam has a security 
> vulnerability. For more details see 
> [CVE-2018-1324|https://www.cvedetails.com/cve/CVE-2018-1324/]





[jira] [Updated] (BEAM-3878) Improve error reporting in calls.go

2018-03-20 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-3878:

Priority: Minor  (was: Major)

> Improve error reporting in calls.go
> ---
>
> Key: BEAM-3878
> URL: https://issues.apache.org/jira/browse/BEAM-3878
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Priority: Minor
>
> The error messages generated in calls.go are not as helpful as they could be.
> Instead of simply reporting "incompatible func type" it would be great if 
> they reported the topology of the actual function supplied versus what is 
> expected. That would make debugging a lot easier.



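
The improvement the ticket asks for — reporting the expected versus actual function shape instead of a bare "incompatible func type" — can be prototyped with the reflect package. This is an illustrative sketch only, not the actual calls.go code; `describe` and `checkFunc` are hypothetical names:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// describe renders a function signature such as "func(int, string) bool",
// using reflection over the supplied value.
func describe(fn interface{}) string {
	t := reflect.TypeOf(fn)
	in := make([]string, t.NumIn())
	for i := range in {
		in[i] = t.In(i).String()
	}
	out := make([]string, t.NumOut())
	for i := range out {
		out[i] = t.Out(i).String()
	}
	s := "func(" + strings.Join(in, ", ") + ")"
	switch len(out) {
	case 0:
	case 1:
		s += " " + out[0]
	default:
		s += " (" + strings.Join(out, ", ") + ")"
	}
	return s
}

// checkFunc compares two signatures and, on mismatch, names both sides
// instead of only saying "incompatible func type".
func checkFunc(got, want interface{}) error {
	g, w := describe(got), describe(want)
	if g != w {
		return fmt.Errorf("incompatible func type: have %s, want %s", g, w)
	}
	return nil
}

func main() {
	have := func(x int) string { return "" }
	want := func(x, y int) string { return "" }
	fmt.Println(checkFunc(have, want))
}
```

An error built this way tells the user exactly which parameter or return position diverges, which is the debugging aid the issue describes.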


[jira] [Assigned] (BEAM-3878) Improve error reporting in calls.go

2018-03-20 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-3878:
---

Assignee: (was: Henning Rohde)

> Improve error reporting in calls.go
> ---
>
> Key: BEAM-3878
> URL: https://issues.apache.org/jira/browse/BEAM-3878
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Priority: Major
>
> The error messages generated in calls.go are not as helpful as they could be.
> Instead of simply reporting "incompatible func type" it would be great if 
> they reported the topology of the actual function supplied versus what is 
> expected. That would make debugging a lot easier.





[jira] [Resolved] (BEAM-3817) Incompatible input encoding running Tornadoes example on dataflow

2018-03-20 Thread Henning Rohde (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde resolved BEAM-3817.
-
   Resolution: Fixed
Fix Version/s: 2.5.0

> Incompatible input encoding running Tornadoes example on dataflow
> -
>
> Key: BEAM-3817
> URL: https://issues.apache.org/jira/browse/BEAM-3817
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Braden Bassingthwaite
>Assignee: Henning Rohde
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Trying to run:
> go run tornadoes.go --output=:bbass.tornadoes --project  
> --runner dataflow --staging_location=gs://bbass/tornadoes 
> --worker_harness_container_image=gcr.io//beam/go
> Found here:
> [https://github.com/apache/beam/blob/master/sdks/go/examples/cookbook/tornadoes/tornadoes.go]
> I can run it locally but I get the error on Dataflow:
> (8fa522c2bb03a769): Workflow failed. Causes: (8fa522c2bb03ab04): Incompatible 
> input encoding. 
>  
> I built the worker_harness_container_image using:
> mvn clean install -DskipTests -Pbuild-containers 
> -Ddocker-repository-root=gcr.io//beam
>  
> Thanks!
>  
> Very excited to start using the golang beam sdk! great work!
>  





[jira] [Work logged] (BEAM-3418) Python Fnapi - Support Multiple SDK workers on a single VM

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?focusedWorklogId=82513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82513
 ]

ASF GitHub Bot logged work on BEAM-3418:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #4587: 
[BEAM-3418] Send worker_id in all grpc channels to runner harness
URL: https://github.com/apache/beam/pull/4587#discussion_r175920015
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -177,6 +177,14 @@ def __init__(self, packages, options, 
environment_version, pipeline_url):
 if self.debug_options.experiments:
   for experiment in self.debug_options.experiments:
 self.proto.experiments.append(experiment)
+# Add use_multiple_sdk_containers flag if its not already present. Do not
+# add the flag if 'no_use_multiple_sdk_containers' is present.
+# TODO: Cleanup use_multiple_sdk_containers once we deprecate Python SDK
+# till version 2.4.
+if (job_type.startswith('FNAPI_') and
+'use_multiple_sdk_containers' not in self.proto.experiments and
+'no_use_multiple_sdk_containers' not in self.proto.experiments):
+  self.proto.experiments.append('use_multiple_sdk_containers')
 
 Review comment:
   Makes Sense!




Issue Time Tracking
---

Worklog Id: (was: 82513)
Time Spent: 2h 40m  (was: 2.5h)

> Python Fnapi - Support Multiple SDK workers on a single VM
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: performance, portability
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Support multiple Python SDK processes on a VM to fully utilize a machine.
> Each SDK process will work in isolation and interact with the Runner Harness
> independently.





[jira] [Work logged] (BEAM-3418) Python Fnapi - Support Multiple SDK workers on a single VM

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?focusedWorklogId=82511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82511
 ]

ASF GitHub Bot logged work on BEAM-3418:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #4587: 
[BEAM-3418] Send worker_id in all grpc channels to runner harness
URL: https://github.com/apache/beam/pull/4587#discussion_r175917370
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/worker_id_interceptor.py
 ##
 @@ -0,0 +1,54 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""Client Interceptor to inject worker_id"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+import os
+import uuid
+
+import grpc
+
+
+class _ClientCallDetails(
+collections.namedtuple('_ClientCallDetails',
+   ('method', 'timeout', 'metadata', 'credentials')),
+grpc.ClientCallDetails):
+  pass
+
+
+class WorkerIdInterceptor(grpc.StreamStreamClientInterceptor):
+
+  # Unique worker Id for this worker.
+  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
 
 Review comment:
   For backward compatibility of containers, I would like to assign a UUID if 
worker_id is not provided.




Issue Time Tracking
---

Worklog Id: (was: 82511)
Time Spent: 2.5h  (was: 2h 20m)

> Python Fnapi - Support Multiple SDK workers on a single VM
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: performance, portability
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Support multiple Python SDK processes on a VM to fully utilize a machine.
> Each SDK process will work in isolation and interact with the Runner Harness
> independently.





[jira] [Work logged] (BEAM-3418) Python Fnapi - Support Multiple SDK workers on a single VM

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?focusedWorklogId=82510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82510
 ]

ASF GitHub Bot logged work on BEAM-3418:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #4587: 
[BEAM-3418] Send worker_id in all grpc channels to runner harness
URL: https://github.com/apache/beam/pull/4587#discussion_r175917786
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/worker_id_interceptor.py
 ##
 @@ -0,0 +1,54 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""Client Interceptor to inject worker_id"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+import os
+import uuid
+
+import grpc
+
+
+class _ClientCallDetails(
+collections.namedtuple('_ClientCallDetails',
+   ('method', 'timeout', 'metadata', 'credentials')),
+grpc.ClientCallDetails):
+  pass
+
+
+class WorkerIdInterceptor(grpc.StreamStreamClientInterceptor):
+
+  # Unique worker Id for this worker.
+  _worker_id = os.environ['WORKER_ID'] if os.environ.has_key(
+  'WORKER_ID') else str(uuid.uuid4())
+
+  def __init__(self):
+pass
+
+  def intercept_stream_stream(self, continuation, client_call_details,
+  request_iterator):
+metadata = []
+if client_call_details.metadata is not None:
+  metadata = list(client_call_details.metadata)
+metadata.append(('worker_id', self._worker_id))
 
 Review comment:
   It should be an error.




Issue Time Tracking
---

Worklog Id: (was: 82510)
Time Spent: 2h 20m  (was: 2h 10m)

> Python Fnapi - Support Multiple SDK workers on a single VM
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: performance, portability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Support multiple Python SDK processes on a VM to fully utilize a machine.
> Each SDK process will work in isolation and interact with the Runner Harness
> independently.





[jira] [Work logged] (BEAM-3418) Python Fnapi - Support Multiple SDK workers on a single VM

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?focusedWorklogId=82514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82514
 ]

ASF GitHub Bot logged work on BEAM-3418:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #4587: 
[BEAM-3418] Send worker_id in all grpc channels to runner harness
URL: https://github.com/apache/beam/pull/4587#discussion_r175919043
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/data_plane.py
 ##
 @@ -311,6 +312,9 @@ def create_data_channel(self, remote_grpc_port):
   # controlled in a layer above.
   options=[("grpc.max_receive_message_length", -1),
("grpc.max_send_message_length", -1)])
+  # Add workerId to the grpc channel
+  grpc_channel = grpc.intercept_channel(grpc_channel,
 
 Review comment:
   Not simplifying, to keep readability.




Issue Time Tracking
---

Worklog Id: (was: 82514)
Time Spent: 2h 50m  (was: 2h 40m)

> Python Fnapi - Support Multiple SDK workers on a single VM
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: performance, portability
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Support multiple Python SDK processes on a VM to fully utilize a machine.
> Each SDK process will work in isolation and interact with the Runner Harness
> independently.





[jira] [Work logged] (BEAM-3418) Python Fnapi - Support Multiple SDK workers on a single VM

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?focusedWorklogId=82515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82515
 ]

ASF GitHub Bot logged work on BEAM-3418:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #4587: 
[BEAM-3418] Send worker_id in all grpc channels to runner harness
URL: https://github.com/apache/beam/pull/4587#discussion_r175918327
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -43,6 +44,8 @@ def __init__(self, control_address, worker_count):
 self._worker_count = worker_count
 self._worker_index = 0
 self._control_channel = grpc.insecure_channel(control_address)
+self._control_channel = grpc.intercept_channel(self._control_channel,
 
 Review comment:
   Sure!




Issue Time Tracking
---

Worklog Id: (was: 82515)
Time Spent: 3h  (was: 2h 50m)

> Python Fnapi - Support Multiple SDK workers on a single VM
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: performance, portability
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Support multiple Python SDK processes on a VM to fully utilize a machine.
> Each SDK process will work in isolation and interact with the Runner Harness
> independently.





[jira] [Work logged] (BEAM-3817) Incompatible input encoding running Tornadoes example on dataflow

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3817?focusedWorklogId=82516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82516
 ]

ASF GitHub Bot logged work on BEAM-3817:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #4840: [BEAM-3817] Switch Go 
SDK BQ write to not use side input
URL: https://github.com/apache/beam/pull/4840#issuecomment-374776224
 
 
   Thanks!




Issue Time Tracking
---

Worklog Id: (was: 82516)
Time Spent: 1.5h  (was: 1h 20m)

> Incompatible input encoding running Tornadoes example on dataflow
> -
>
> Key: BEAM-3817
> URL: https://issues.apache.org/jira/browse/BEAM-3817
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Braden Bassingthwaite
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Trying to run:
> go run tornadoes.go --output=:bbass.tornadoes --project  
> --runner dataflow --staging_location=gs://bbass/tornadoes 
> --worker_harness_container_image=gcr.io//beam/go
> Found here:
> [https://github.com/apache/beam/blob/master/sdks/go/examples/cookbook/tornadoes/tornadoes.go]
> I can run it locally but I get the error on Dataflow:
> (8fa522c2bb03a769): Workflow failed. Causes: (8fa522c2bb03ab04): Incompatible 
> input encoding. 
>  
> I built the worker_harness_container_image using:
> mvn clean install -DskipTests -Pbuild-containers 
> -Ddocker-repository-root=gcr.io//beam
>  
> Thanks!
>  
> Very excited to start using the golang beam sdk! great work!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3418) Python Fnapi - Support Multiple SDK workers on a single VM

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3418?focusedWorklogId=82512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82512
 ]

ASF GitHub Bot logged work on BEAM-3418:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:17
Start Date: 20/Mar/18 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #4587: 
[BEAM-3418] Send worker_id in all grpc channels to runner harness
URL: https://github.com/apache/beam/pull/4587#discussion_r175918509
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/log_handler.py
 ##
 @@ -49,6 +50,8 @@ class FnApiLogRecordHandler(logging.Handler):
   def __init__(self, log_service_descriptor):
 super(FnApiLogRecordHandler, self).__init__()
 self._log_channel = grpc.insecure_channel(log_service_descriptor.url)
+self._log_channel = grpc.intercept_channel(self._log_channel,
 
 Review comment:
   Sure!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82512)
Time Spent: 2.5h  (was: 2h 20m)

> Python Fnapi - Support Multiple SDK workers on a single VM
> --
>
> Key: BEAM-3418
> URL: https://issues.apache.org/jira/browse/BEAM-3418
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Labels: performance, portability
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Support multiple Python SDK processes on a VM to fully utilize the machine.
> Each SDK process will work in isolation and interact with the Runner Harness
> independently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3897) Flink runners fails on multioutput portable pipeline

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3897?focusedWorklogId=82508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82508
 ]

ASF GitHub Bot logged work on BEAM-3897:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:11
Start Date: 20/Mar/18 22:11
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #4913: [BEAM-3897] Add Go 
wordcount example with multi output DoFns
URL: https://github.com/apache/beam/pull/4913#issuecomment-374774551
 
 
   R: @bsidhom @aljoscha 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82508)
Time Spent: 20m  (was: 10m)

> Flink runners fails on multioutput portable pipeline
> 
>
> Key: BEAM-3897
> URL: https://issues.apache.org/jira/browse/BEAM-3897
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When trying a Go pipeline with a 2-output DoFn on the hacking branch, I see:
> [flink-runner-job-server] ERROR 
> org.apache.beam.runners.flink.FlinkJobInvocation - Error during job 
> invocation go-job-1521582585657843000_-2121541089_763230090.
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:134)
>   at 
> org.apache.beam.runners.flink.FlinkJobInvocation.lambda$start$0(FlinkJobInvocation.java:61)
>   at 
> org.apache.beam.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
>   at 
> org.apache.beam.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
>   at 
> org.apache.beam.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:897)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: 
> expected one element but was: <0, 1>
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.lambda$mapPartition$0(FlinkExecutableStageFunction.java:176)
>   at java.util.HashMap$Values.forEach(HashMap.java:980)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:172)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: expected one element but was: <0, 1>
>   at 
> org.apache.beam.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
>   at 
> 

[jira] [Work logged] (BEAM-3897) Flink runners fails on multioutput portable pipeline

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3897?focusedWorklogId=82507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82507
 ]

ASF GitHub Bot logged work on BEAM-3897:


Author: ASF GitHub Bot
Created on: 20/Mar/18 22:09
Start Date: 20/Mar/18 22:09
Worklog Time Spent: 10m 
  Work Description: herohde opened a new pull request #4913: [BEAM-3897] 
Add Go wordcount example with multi output DoFns
URL: https://github.com/apache/beam/pull/4913
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82507)
Time Spent: 10m
Remaining Estimate: 0h

> Flink runners fails on multioutput portable pipeline
> 
>
> Key: BEAM-3897
> URL: https://issues.apache.org/jira/browse/BEAM-3897
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When trying a Go pipeline with a 2-output DoFn on the hacking branch, I see:
> [flink-runner-job-server] ERROR 
> org.apache.beam.runners.flink.FlinkJobInvocation - Error during job 
> invocation go-job-1521582585657843000_-2121541089_763230090.
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:134)
>   at 
> org.apache.beam.runners.flink.FlinkJobInvocation.lambda$start$0(FlinkJobInvocation.java:61)
>   at 
> org.apache.beam.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
>   at 
> org.apache.beam.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
>   at 
> org.apache.beam.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:897)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: 
> expected one element but was: <0, 1>
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.lambda$mapPartition$0(FlinkExecutableStageFunction.java:176)
>   at java.util.HashMap$Values.forEach(HashMap.java:980)
>   at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:172)
>   at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalArgumentException: expected one element but was: <0, 1>
>   at 
> org.apache.beam.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
>   at 
> 

[jira] [Created] (BEAM-3897) Flink runners fails on multioutput portable pipeline

2018-03-20 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-3897:
---

 Summary: Flink runners fails on multioutput portable pipeline
 Key: BEAM-3897
 URL: https://issues.apache.org/jira/browse/BEAM-3897
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Henning Rohde
Assignee: Ben Sidhom


When trying a Go pipeline with a 2-output DoFn on the hacking branch, I see:

[flink-runner-job-server] ERROR 
org.apache.beam.runners.flink.FlinkJobInvocation - Error during job invocation 
go-job-1521582585657843000_-2121541089_763230090.
java.lang.RuntimeException: Pipeline execution failed
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:134)
at 
org.apache.beam.runners.flink.FlinkJobInvocation.lambda$start$0(FlinkJobInvocation.java:61)
at 
org.apache.beam.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
at 
org.apache.beam.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
at 
org.apache.beam.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:897)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: expected one element but was: <0, 1>
at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.lambda$mapPartition$0(FlinkExecutableStageFunction.java:176)
at java.util.HashMap$Values.forEach(HashMap.java:980)
at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:172)
at 
org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490)
at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: expected one element but was: <0, 1>
at 
org.apache.beam.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
at 
org.apache.beam.util.concurrent.AbstractFuture.get(AbstractFuture.java:479)
at 
org.apache.beam.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
at 
org.apache.beam.sdk.fn.data.SettableFutureInboundDataClient.awaitCompletion(SettableFutureInboundDataClient.java:41)
at 
org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.awaitCompletion(BeamFnDataInboundObserver.java:89)
at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.lambda$mapPartition$0(FlinkExecutableStageFunction.java:174)
... 7 more
Caused by: java.lang.IllegalArgumentException: expected one element but was: 
<0, 1>
at org.apache.beam.collect.Iterators.getOnlyElement(Iterators.java:322)
at org.apache.beam.collect.Iterables.getOnlyElement(Iterables.java:294)
at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction$1$1.accept(FlinkExecutableStageFunction.java:149)
at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction$1$1.accept(FlinkExecutableStageFunction.java:135)
at 

[jira] [Work logged] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=82502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82502
 ]

ASF GitHub Bot logged work on BEAM-3042:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:50
Start Date: 20/Mar/18 21:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4912: [BEAM-3042] Fixing 
check for sideinput_io_metrics experiment flag.
URL: https://github.com/apache/beam/pull/4912#issuecomment-374769245
 
 
   r: @chamikaramj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82502)
Time Spent: 20m  (was: 10m)

> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3817) Incompatible input encoding running Tornadoes example on dataflow

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3817?focusedWorklogId=82499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82499
 ]

ASF GitHub Bot logged work on BEAM-3817:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:41
Start Date: 20/Mar/18 21:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #4840: [BEAM-3817] Switch 
Go SDK BQ write to not use side input
URL: https://github.com/apache/beam/pull/4840#issuecomment-374766650
 
 
   Looks good, thanks. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82499)
Time Spent: 1h 10m  (was: 1h)

> Incompatible input encoding running Tornadoes example on dataflow
> -
>
> Key: BEAM-3817
> URL: https://issues.apache.org/jira/browse/BEAM-3817
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Braden Bassingthwaite
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Trying to run:
> go run tornadoes.go --output=:bbass.tornadoes --project  
> --runner dataflow --staging_location=gs://bbass/tornadoes 
> --worker_harness_container_image=gcr.io//beam/go
> Found here:
> [https://github.com/apache/beam/blob/master/sdks/go/examples/cookbook/tornadoes/tornadoes.go]
> I can run it locally but I get the error on Dataflow:
> (8fa522c2bb03a769): Workflow failed. Causes: (8fa522c2bb03ab04): Incompatible 
> input encoding. 
>  
> I built the worker_harness_container_image using:
> mvn clean install -DskipTests -Pbuild-containers 
> -Ddocker-repository-root=gcr.io//beam
>  
> Thanks!
>  
> Very excited to start using the golang beam sdk! great work!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3817) Incompatible input encoding running Tornadoes example on dataflow

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3817?focusedWorklogId=82500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82500
 ]

ASF GitHub Bot logged work on BEAM-3817:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:41
Start Date: 20/Mar/18 21:41
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #4840: [BEAM-3817] Switch 
Go SDK BQ write to not use side input
URL: https://github.com/apache/beam/pull/4840
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/pkg/beam/combine.go b/sdks/go/pkg/beam/combine.go
index c0b70fe2623..a9bc098a493 100644
--- a/sdks/go/pkg/beam/combine.go
+++ b/sdks/go/pkg/beam/combine.go
@@ -33,16 +33,11 @@ func CombinePerKey(s Scope, combinefn interface{}, col 
PCollection) PCollection
return Must(TryCombinePerKey(s, combinefn, col))
 }
 
-// addFixedKeyFn forces all elements to a single key.
-func addFixedKeyFn(elm T) (int, T) {
-   return 0, elm
-}
-
 // TryCombine attempts to insert a global Combine transform into the pipeline. 
It may fail
 // for multiple reasons, notably that the combinefn is not valid or cannot be 
bound
 // -- due to type mismatch, say -- to the incoming PCollections.
 func TryCombine(s Scope, combinefn interface{}, col PCollection) (PCollection, 
error) {
-   pre := ParDo(s, addFixedKeyFn, col)
+   pre := AddFixedKey(s, col)
post, err := TryCombinePerKey(s, combinefn, pre)
if err != nil {
return PCollection{}, err
diff --git a/sdks/go/pkg/beam/io/bigqueryio/bigquery.go 
b/sdks/go/pkg/beam/io/bigqueryio/bigquery.go
index 07e315a8b64..22b5f3c3fcf 100644
--- a/sdks/go/pkg/beam/io/bigqueryio/bigquery.go
+++ b/sdks/go/pkg/beam/io/bigqueryio/bigquery.go
@@ -169,8 +169,11 @@ func Write(s beam.Scope, project, table string, col 
beam.PCollection) {
 
s = s.Scope("bigquery.Write")
 
-   imp := beam.Impulse(s)
-   beam.ParDo0(s, &writeFn{Project: project, Table: qn, Type: 
beam.EncodedType{T: t}}, imp, beam.SideInput{Input: col})
+   // TODO(BEAM-3860) 3/15/2018: use side input instead of GBK.
+
+   pre := beam.AddFixedKey(s, col)
+   post := beam.GroupByKey(s, pre)
+   beam.ParDo0(s, &writeFn{Project: project, Table: qn, Type: 
beam.EncodedType{T: t}}, post)
 }
 
 type writeFn struct {
@@ -182,7 +185,7 @@ type writeFn struct {
Type beam.EncodedType `json:"type"`
 }
 
-func (f *writeFn) ProcessElement(ctx context.Context, _ []byte, iter 
func(*beam.X) bool) error {
+func (f *writeFn) ProcessElement(ctx context.Context, _ int, iter 
func(*beam.X) bool) error {
client, err := bigquery.NewClient(ctx, f.Project)
if err != nil {
return err
diff --git a/sdks/go/pkg/beam/io/textio/textio.go 
b/sdks/go/pkg/beam/io/textio/textio.go
index b33a7d71ade..926251fb6bc 100644
--- a/sdks/go/pkg/beam/io/textio/textio.go
+++ b/sdks/go/pkg/beam/io/textio/textio.go
@@ -139,15 +139,13 @@ func Write(s beam.Scope, filename string, col 
beam.PCollection) {
// FinishBundle doesn't have the right granularity. We therefore
// perform a GBK with a fixed key to get all values in a single 
invocation.
 
-   pre := beam.ParDo(s, addFixedKey, col)
+   // TODO(BEAM-3860) 3/15/2018: use side input instead of GBK.
+
+   pre := beam.AddFixedKey(s, col)
post := beam.GroupByKey(s, pre)
beam.ParDo0(s, &writeFileFn{Filename: filename}, post)
 }
 
-func addFixedKey(elm beam.T) (int, beam.T) {
-   return 0, elm
-}
-
 type writeFileFn struct {
Filename string `json:"filename"`
 }
diff --git a/sdks/go/pkg/beam/util.go b/sdks/go/pkg/beam/util.go
index f385e708ec1..e730765c61d 100644
--- a/sdks/go/pkg/beam/util.go
+++ b/sdks/go/pkg/beam/util.go
@@ -41,6 +41,15 @@ func Seq(s Scope, col PCollection, dofns ...interface{}) 
PCollection {
return cur
 }
 
+// AddFixedKey adds a fixed key (0) to every element.
+func AddFixedKey(s Scope, col PCollection) PCollection {
+   return ParDo(s, addFixedKeyFn, col)
+}
+
+func addFixedKeyFn(elm T) (int, T) {
+   return 0, elm
+}
+
 // DropKey drops the key for an input PCollection<KV<A,B>>. It returns
 // a PCollection<B>.
 func DropKey(s Scope, col PCollection) PCollection {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82500)
Time Spent: 1h 20m  (was: 1h 10m)

> Incompatible input encoding 
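The merged diff above extracts a shared AddFixedKey transform: every element is paired with the constant key 0 so that a following GroupByKey delivers all elements to a single invocation. A minimal standalone Go sketch of that pattern (illustrative only; it uses plain maps rather than the Beam SDK's PCollection and Scope types):

```go
package main

import "fmt"

// addFixedKey mirrors the AddFixedKey helper from the diff above: it pairs
// every element with the constant key 0, so a subsequent group-by-key step
// sees exactly one group containing all elements.
func addFixedKey(elems []string) map[int][]string {
	grouped := make(map[int][]string)
	for _, e := range elems {
		grouped[0] = append(grouped[0], e) // fixed key: 0
	}
	return grouped
}

func main() {
	grouped := addFixedKey([]string{"a", "b", "c"})
	// One group (key 0) holding all three elements.
	fmt.Println(len(grouped), len(grouped[0]))
}
```

In Beam itself this matters because a GroupByKey over the fixed key gives a writer DoFn (like writeFn for BigQuery or writeFileFn for text files) all values in one ProcessElement call, which FinishBundle alone does not guarantee.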

[beam] branch master updated (e499c78 -> f8ccbe5)

2018-03-20 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from e499c78  Merge pull request #4898: Clean up ExecutableStage
 add 978c2cb  [BEAM-3817] Switch BQ write to not use side input
 add f6631dc  Add TODO to revert Go IO to use side input
 new f8ccbe5  Merge pull request #4840 [BEAM-3817] Switch Go SDK BQ write 
to not use side input

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/go/pkg/beam/combine.go| 7 +--
 sdks/go/pkg/beam/io/bigqueryio/bigquery.go | 9 ++---
 sdks/go/pkg/beam/io/textio/textio.go   | 8 +++-
 sdks/go/pkg/beam/util.go   | 9 +
 4 files changed, 19 insertions(+), 14 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
rober...@apache.org.


[beam] 01/01: Merge pull request #4840 [BEAM-3817] Switch Go SDK BQ write to not use side input

2018-03-20 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit f8ccbe5cfa64459b39cccf58dc14b1bd7d0f99f6
Merge: e499c78 f6631dc
Author: Robert Bradshaw 
AuthorDate: Tue Mar 20 14:41:36 2018 -0700

Merge pull request #4840 [BEAM-3817] Switch Go SDK BQ write to not use side 
input

 sdks/go/pkg/beam/combine.go| 7 +--
 sdks/go/pkg/beam/io/bigqueryio/bigquery.go | 9 ++---
 sdks/go/pkg/beam/io/textio/textio.go   | 8 +++-
 sdks/go/pkg/beam/util.go   | 9 +
 4 files changed, 19 insertions(+), 14 deletions(-)


-- 
To stop receiving notification emails like this one, please contact
rober...@apache.org.


[jira] [Work logged] (BEAM-3879) Automate preparation/checking of releases

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3879?focusedWorklogId=82496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82496
 ]

ASF GitHub Bot logged work on BEAM-3879:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:39
Start Date: 20/Mar/18 21:39
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #4896: [BEAM-3879] Automate 
checking of release steps from the release guide
URL: https://github.com/apache/beam/pull/4896#issuecomment-374766162
 
 
   Yes, though I have to admit I get wary of bash scripts once they cross the 
100 line barrier. 
   
   I'm assuming this would work in conjunction with the artifacts themselves 
being produced (and staged) by a jenkins job? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 82496)
Time Spent: 0.5h  (was: 20m)

> Automate preparation/checking of releases
> -
>
> Key: BEAM-3879
> URL: https://issues.apache.org/jira/browse/BEAM-3879
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Automate checking/preparing releases based on instructions in 
> https://beam.apache.org/contribute/release-guide/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3879) Automate preparation/checking of releases

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3879?focusedWorklogId=82497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82497
 ]

ASF GitHub Bot logged work on BEAM-3879:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:39
Start Date: 20/Mar/18 21:39
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4896: 
[BEAM-3879] Automate checking of release steps from the release guide
URL: https://github.com/apache/beam/pull/4896#discussion_r175930874
 
 

 ##
 File path: release/check_release.sh
 ##
 @@ -0,0 +1,91 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+# This script checks that the release instructions in 
https://beam.apache.org/contribute/release-guide
+# have been followed.
+
+echo 'Check preparation of a new release for Apache Beam.'
+echo ' '
+
+# Load the functions in release_helper.sh.
+. $(dirname "${BASH_SOURCE[0]}")/release_helper.sh 
+
+# Load previous answers, to avoid re-prompting.
+ANSWER_FILE=~/.prepare_beam_release_answers.txt
+load_previous_answers
+
+# Check the needed software is installed.
+check_software gpg
+check_software git
+check_software mvn
+check_software svn
+check_software gpg-agent
+check_software python
+
+check_gpg_key
+check_access_to_nexus
+
+ensure_yes release_proposed "Has a release been proposed to the @dev list" "
+  Deciding to release and selecting a Release Manager is the first step of
+  the release process. This is a consensus-based decision of the entire
+  community. Anybody can propose a release on the dev@ mailing list, giving
+  a solid argument and nominating a committer as the Release Manager
+  (including themselves). There’s no formal process, no vote requirements,
+  and no timing requirements. Any objections should be resolved by consensus
+  before starting the release."
+
+ensure_set beam_version "What version number will be this release"
+check_beam_version_in_jira "${beam_version}" "current"
+beam_version_id=$found_beam_version_id
+
+ensure_yes website_setup "Have you set up access to the beam website?" "
+  You need to prepare access to the beam website to push changes there"
+
+set_next_version
+ensure_yes next_version_looks_ok "Will the next version (after ${beam_version}) be version ${next_version}" "
+  When contributors resolve an issue in JIRA, they are tagging it with a
+  release that will contain their changes. With the release currently underway,
+  new issues should be resolved against a subsequent future release."
+
+check_beam_version_in_jira "${next_version}" "next"
+
+check_no_unresolved_issues
+
+release_page="https://issues.apache.org/jira/projects/BEAM/versions/${beam_version_id}"
+ensure_yes release_notes_reviewed "Have you reviewed and edited the release notes?" "
+  JIRA automatically generates Release Notes based on the Fix Version field applied to issues.
+  Release Notes are intended for Beam users (not Beam committers/contributors).
+  You should ensure that Release Notes are informative and useful.
+  The release notes are linked from ${release_page}"
+
+ensure_yes release_build_works "Have you run a release build with mvn -Prelease clean install?" "
+  Before creating a release branch, ensure that the release build works and javadoc in sdks/java/javadoc
+  looks ok"
+
+check_release_branch_created
+
+check_python_version master "${next_version}.dev"
+check_python_version "release-${beam_version}" "${beam_version}"
+
+check_java_version master beam-master-MMDD
+check_java_version "release-${beam_version}" "beam-${beam_version}"
+
+cleanup
+
+echo ""
+echo "Script complete, but there are more steps at https://beam.apache.org/contribute/release-guide"
+echo "To start with a clean state, rm ${ANSWER_FILE} before re-running"
 
 Review comment:
   Very nice idea for resume. 
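The resume feature the reviewer praises — persisting answers to a file so a re-run skips questions already answered — can be sketched in Python (the script itself is bash; the file name and helper names below mirror the script but are illustrative, not its actual implementation):

```python
import json
import os

# Illustrative answer file; the bash script uses a plain-text file under ~.
ANSWER_FILE = os.path.expanduser("~/.prepare_beam_release_answers.json")

def load_previous_answers(path=ANSWER_FILE):
    """Return previously recorded answers, or an empty dict on first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def ensure_yes(answers, key, question, path=ANSWER_FILE):
    """Ask a yes/no question once; a remembered 'y' is not re-prompted."""
    if answers.get(key) == "y":
        return True  # resume: this step was already confirmed on a prior run
    reply = input(question + " [y/N] ").strip().lower()
    if reply == "y":
        answers[key] = "y"
        with open(path, "w") as f:
            json.dump(answers, f)  # persist so an interrupted run can resume
        return True
    return False
```

Deleting the answer file restarts from a clean state, which is exactly what the script's final hint about `rm ${ANSWER_FILE}` does.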


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Work logged] (BEAM-3879) Automate preparation/checking of releases

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3879?focusedWorklogId=82498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82498
 ]

ASF GitHub Bot logged work on BEAM-3879:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:39
Start Date: 20/Mar/18 21:39
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4896: 
[BEAM-3879] Automate checking of release steps from the release guide
URL: https://github.com/apache/beam/pull/4896#discussion_r175930450
 
 

 ##
 File path: release/check_release.sh
 ##
 @@ -0,0 +1,91 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+# This script checks that the release instructions in https://beam.apache.org/contribute/release-guide
+# have been followed.
+
+echo 'Check preparation of a new release for Apache Beam.'
+echo ' '
+
+# Load the functions in release_helper.sh.
+. $(dirname "${BASH_SOURCE[0]}")/release_helper.sh 
+
+# Load previous answers, to avoid re-prompting.
+ANSWER_FILE=~/.prepare_beam_release_answers.txt
+load_previous_answers
+
+# Check the needed software is installed.
+check_software gpg
+check_software git
+check_software mvn
+check_software svn
+check_software gpg-agent
+check_software python
+
+check_gpg_key
+check_access_to_nexus
+
+ensure_yes release_proposed "Has a release been proposed to the @dev list" "
+  Deciding to release and selecting a Release Manager is the first step of
+  the release process. This is a consensus-based decision of the entire
+  community. Anybody can propose a release on the dev@ mailing list, giving
+  a solid argument and nominating a committer as the Release Manager
+  (including themselves). There’s no formal process, no vote requirements,
+  and no timing requirements. Any objections should be resolved by consensus
+  before starting the release."
+
+ensure_set beam_version "What version number will be this release"
+check_beam_version_in_jira "${beam_version}" "current"
+beam_version_id=$found_beam_version_id
+
+ensure_yes website_setup "Have you set up access to the beam website?" "
+  You need to prepare access to the beam website to push changes there"
+
+set_next_version
+ensure_yes next_version_looks_ok "Will the next version (after ${beam_version}) be version ${next_version}" "
+  When contributors resolve an issue in JIRA, they are tagging it with a
+  release that will contain their changes. With the release currently underway,
+  new issues should be resolved against a subsequent future release."
+
+check_beam_version_in_jira "${next_version}" "next"
+
+check_no_unresolved_issues
 
 Review comment:
   It should be possible to start a release with some unresolved issues (but a 
warning or ack should be required at least). 
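The behavior robertwb suggests — warn about unresolved issues and require an explicit acknowledgement rather than refusing outright — could look like this (a Python sketch of the pattern only; the script itself is bash and its final behavior is decided in the review):

```python
def check_unresolved_issues(unresolved, ack=input):
    """Warn about unresolved issues tagged with this release instead of
    hard-failing; require an explicit acknowledgement to continue.
    `ack` is injectable so the prompt can be replaced in tests."""
    if not unresolved:
        return True
    print("Warning: %d unresolved issue(s) are tagged with this release:"
          % len(unresolved))
    for issue in unresolved:
        print("  - " + issue)
    # Proceed only on an explicit 'y' acknowledgement.
    return ack("Continue anyway? [y/N] ").strip().lower() == "y"
```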




Issue Time Tracking
---

Worklog Id: (was: 82498)
Time Spent: 50m  (was: 40m)

> Automate preparation/checking of releases
> -
>
> Key: BEAM-3879
> URL: https://issues.apache.org/jira/browse/BEAM-3879
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Automate checking/preparing releases based on instructions in 
> https://beam.apache.org/contribute/release-guide/
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=82495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82495
 ]

ASF GitHub Bot logged work on BEAM-3042:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:37
Start Date: 20/Mar/18 21:37
Worklog Time Spent: 10m 
  Work Description: pabloem opened a new pull request #4912: [BEAM-3042] 
Fixing check for sideinput_io_metrics experiment flag.
URL: https://github.com/apache/beam/pull/4912
 
 
   Fixing a bug that did not check for experiments properly in the worker.
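The class of bug being fixed — an experiments flag that may be absent entirely, so a naive membership check blows up or silently misbehaves — can be illustrated with a small Python sketch (illustrative names; this is not the Dataflow worker's actual code):

```python
def has_experiment(options, name):
    """Return True if `name` appears in the pipeline's experiments list.

    The options dict, or its 'experiments' entry, may be missing or None,
    so default to an empty list before testing membership.
    """
    experiments = (options or {}).get("experiments") or []
    return name in experiments
```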




Issue Time Tracking
---

Worklog Id: (was: 82495)
Time Spent: 10m
Remaining Estimate: 0h

> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.
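The kind of instrumentation the issue describes — per-side-input bytes read and time spent — can be sketched in plain Python by wrapping the source iterable (no Beam APIs are used here; the metric names are illustrative, not the SDK's actual counters):

```python
import time

class SideInputReadMetrics:
    """Accumulates bytes read and wall time spent fetching a side input."""
    def __init__(self):
        self.bytes_read = 0
        self.read_seconds = 0.0

def read_with_metrics(source, metrics):
    """Yield byte chunks from `source`, recording size and fetch time."""
    it = iter(source)
    while True:
        start = time.monotonic()
        try:
            chunk = next(it)  # time only the fetch, not the consumer's work
        except StopIteration:
            return
        finally:
            metrics.read_seconds += time.monotonic() - start
        metrics.bytes_read += len(chunk)
        yield chunk
```

Timing around `next()` rather than around the whole loop body keeps consumer processing time out of the IO metric, which is the distinction the issue cares about.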





[jira] [Resolved] (BEAM-3563) Revise Fn API metrics protos

2018-03-20 Thread Pablo Estrada (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-3563.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Revise Fn API metrics protos
> 
>
> Key: BEAM-3563
> URL: https://issues.apache.org/jira/browse/BEAM-3563
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Pablo Estrada
>Priority: Major
>  Labels: portability
> Fix For: 2.3.0
>
>
> I understand there are some changes you want in this proto. I will wait to 
> integrate for now until the protos get the next rev.





[jira] [Closed] (BEAM-3570) More elegant interface for RuntimeValueProvider

2018-03-20 Thread Pablo Estrada (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada closed BEAM-3570.
---
   Resolution: Fixed
Fix Version/s: 2.3.0

> More elegant interface for RuntimeValueProvider
> ---
>
> Key: BEAM-3570
> URL: https://issues.apache.org/jira/browse/BEAM-3570
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.3.0
>
>
> The RuntimeValueProvider/ValueProvider classes require creating a new 
> instance whenever we want to retrieve a value from them. It may be desirable 
> to have a class method that pulls the value without creating a new instance.





[jira] [Work logged] (BEAM-3287) Go SDK support for portable pipelines

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3287?focusedWorklogId=82493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82493
 ]

ASF GitHub Bot logged work on BEAM-3287:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:35
Start Date: 20/Mar/18 21:35
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #4888: 
[BEAM-3287] Add Go support for universal runners, incl Flink
URL: https://github.com/apache/beam/pull/4888#discussion_r175930346
 
 

 ##
 File path: sdks/go/pkg/beam/runners/universal/universal.go
 ##
 @@ -0,0 +1,59 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package universal contains a general-purpose runner that can submit jobs
+// to any portable Beam runner.
+package universal
+
+import (
+   "context"
+   "fmt"
+
+   "github.com/apache/beam/sdks/go/pkg/beam"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/graphx"
+   _ "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/init"
+   "github.com/apache/beam/sdks/go/pkg/beam/options/jobopts"
+   "github.com/apache/beam/sdks/go/pkg/beam/runners/universal/runnerlib"
+)
+
+func init() {
+   // Note that we also _ import harness/init to setup the remote execution hook.
+   beam.RegisterRunner("universal", Execute)
+}
+
+// Execute execute the pipeline on a universal beam runner.
 
 Review comment:
   Done




Issue Time Tracking
---

Worklog Id: (was: 82493)
Time Spent: 4h 10m  (was: 4h)

> Go SDK support for portable pipelines
> -
>
> Key: BEAM-3287
> URL: https://issues.apache.org/jira/browse/BEAM-3287
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The Go SDK should participate in the portability framework, incl. job 
> submission w/ a docker container image.





[jira] [Work logged] (BEAM-3287) Go SDK support for portable pipelines

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3287?focusedWorklogId=82494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82494
 ]

ASF GitHub Bot logged work on BEAM-3287:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:35
Start Date: 20/Mar/18 21:35
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #4888: [BEAM-3287] Add Go 
support for universal runners, incl Flink
URL: https://github.com/apache/beam/pull/4888#issuecomment-374765140
 
 
   Thanks @lostluck! @aljoscha -- would you mind merging the code?




Issue Time Tracking
---

Worklog Id: (was: 82494)
Time Spent: 4h 20m  (was: 4h 10m)

> Go SDK support for portable pipelines
> -
>
> Key: BEAM-3287
> URL: https://issues.apache.org/jira/browse/BEAM-3287
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The Go SDK should participate in the portability framework, incl. job 
> submission w/ a docker container image.





[jira] [Work logged] (BEAM-3893) Go SDK GCS I/O shouldn't require credentials to read from public buckets

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3893?focusedWorklogId=82490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82490
 ]

ASF GitHub Bot logged work on BEAM-3893:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:26
Start Date: 20/Mar/18 21:26
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #4911: [BEAM-3893] Add 
fallback to unauthenticated access for Go GCS IO
URL: https://github.com/apache/beam/pull/4911#issuecomment-374762366
 
 
   Thanks @lukecwik. PTAL




Issue Time Tracking
---

Worklog Id: (was: 82490)
Time Spent: 50m  (was: 40m)

> Go SDK GCS I/O shouldn't require credentials to read from public buckets
> 
>
> Key: BEAM-3893
> URL: https://issues.apache.org/jira/browse/BEAM-3893
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should handle the case where application default credentials are not 
> available for public reads. Wordcount running a container fails currently:
> 2018/03/20 20:03:00 Failed to execute job: panic: panic: failed to create GCE 
> client: google: could not find default credentials. See 
> https://developers.google.com/accounts/docs/application-default-credentials 
> for more information. goroutine 1 [running]:
> runtime/debug.Stack(0xc4201728a0, 0xc39460, 0xc4201c0120)
>   /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
> github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx.CallNoPanic.func1(0xc420172f80)
>   
> /foo/src/github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx/call.go:98
>  +0x6e
> panic(0xc39460, 0xc4201c0120)
>   /usr/local/go/src/runtime/panic.go:491 +0x283
> github.com/apache/beam/sdks/go/pkg/beam/io/textio/gcs.New(0x12f3500, 
> 0xc420014060, 0xc420094c40, 0x2)
>   
> /foo/src/github.com/apache/beam/sdks/go/pkg/beam/io/textio/gcs/gcs.go:44 
> +0x16e
> github.com/apache/beam/sdks/go/pkg/beam/io/textio.newFileSystem(0x12f3500, 
> 0xc420014060, 0xc420094c40, 0x31, 0xc420172b50, 0x4bf4ef, 0xc37e60, 0xc37e60)
>   /foo/src/github.com/apache/beam/sdks/go/pkg/beam/io/textio/textio.go:68 
> +0xac





[jira] [Created] (BEAM-3896) Make the Go SDK GCS client load the credentials internally on the first 401

2018-03-20 Thread Henning Rohde (JIRA)
Henning Rohde created BEAM-3896:
---

 Summary: Make the Go SDK GCS client load the credentials 
internally on the first 401
 Key: BEAM-3896
 URL: https://issues.apache.org/jira/browse/BEAM-3896
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Henning Rohde


From lcwik in https://github.com/apache/beam/pull/491:

"Is there any way to make the client load the credentials internally on the 
first 401?

It seems like we should only get the credentials if they are asked. We should 
still be able to pass in the scope that we want for the client."
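The suggestion — attempt the request without credentials and only load them after the server answers 401 — can be sketched like this (Python for illustration; `fetch` and `load_credentials` are injected stand-ins for real HTTP and auth calls, not the Go SDK's actual API):

```python
class LazyCredentialClient:
    """Try unauthenticated access first; load credentials only after
    the first 401 response, then cache and reuse them."""

    def __init__(self, fetch, load_credentials):
        self._fetch = fetch                      # fetch(url, creds) -> (status, body)
        self._load_credentials = load_credentials
        self._creds = None

    def get(self, url):
        status, body = self._fetch(url, self._creds)
        if status == 401 and self._creds is None:
            # First 401 triggers credential loading, then one retry.
            self._creds = self._load_credentials()
            status, body = self._fetch(url, self._creds)
        return status, body
```

For a public bucket the first unauthenticated fetch simply succeeds and credentials are never loaded, which is the property the parent issue (BEAM-3893) wants.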






[jira] [Work logged] (BEAM-3895) Side Inputs should be available on ExecutableStage

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3895?focusedWorklogId=82489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82489
 ]

ASF GitHub Bot logged work on BEAM-3895:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:25
Start Date: 20/Mar/18 21:25
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4910: [BEAM-3895] Add Side 
Inputs to ExecutableStage
URL: https://github.com/apache/beam/pull/4910#issuecomment-374762250
 
 
   R: @tweise @bsidhom 




Issue Time Tracking
---

Worklog Id: (was: 82489)
Time Spent: 0.5h  (was: 20m)

> Side Inputs should be available on ExecutableStage
> --
>
> Key: BEAM-3895
> URL: https://issues.apache.org/jira/browse/BEAM-3895
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Doing this ensures that the runner will have the side inputs immediately 
> available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3893) Go SDK GCS I/O shouldn't require credentials to read from public buckets

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3893?focusedWorklogId=82488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82488
 ]

ASF GitHub Bot logged work on BEAM-3893:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:25
Start Date: 20/Mar/18 21:25
Worklog Time Spent: 10m 
  Work Description: herohde commented on a change in pull request #4911: 
[BEAM-3893] Add fallback to unauthenticated access for Go GCS IO
URL: https://github.com/apache/beam/pull/4911#discussion_r175927831
 
 

 ##
 File path: sdks/go/pkg/beam/util/gcsx/gcs.go
 ##
 @@ -39,6 +41,15 @@ func NewClient(ctx context.Context, scope string) 
(*storage.Service, error) {
return storage.New(cl)
 
 Review comment:
   I see your point, but not a good way to do that. It seems to go against the 
flow of these libraries and we'd have to do far more work manually to achieve 
that. I'd prefer not to take that on now. Filed 
https://issues.apache.org/jira/browse/BEAM-3896.




Issue Time Tracking
---

Worklog Id: (was: 82488)
Time Spent: 40m  (was: 0.5h)

> Go SDK GCS I/O shouldn't require credentials to read from public buckets
> 
>
> Key: BEAM-3893
> URL: https://issues.apache.org/jira/browse/BEAM-3893
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should handle the case where application default credentials are not 
> available for public reads. Wordcount running a container fails currently:
> 2018/03/20 20:03:00 Failed to execute job: panic: panic: failed to create GCE 
> client: google: could not find default credentials. See 
> https://developers.google.com/accounts/docs/application-default-credentials 
> for more information. goroutine 1 [running]:
> runtime/debug.Stack(0xc4201728a0, 0xc39460, 0xc4201c0120)
>   /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
> github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx.CallNoPanic.func1(0xc420172f80)
>   
> /foo/src/github.com/apache/beam/sdks/go/pkg/beam/core/util/reflectx/call.go:98
>  +0x6e
> panic(0xc39460, 0xc4201c0120)
>   /usr/local/go/src/runtime/panic.go:491 +0x283
> github.com/apache/beam/sdks/go/pkg/beam/io/textio/gcs.New(0x12f3500, 
> 0xc420014060, 0xc420094c40, 0x2)
>   
> /foo/src/github.com/apache/beam/sdks/go/pkg/beam/io/textio/gcs/gcs.go:44 
> +0x16e
> github.com/apache/beam/sdks/go/pkg/beam/io/textio.newFileSystem(0x12f3500, 
> 0xc420014060, 0xc420094c40, 0x31, 0xc420172b50, 0x4bf4ef, 0xc37e60, 0xc37e60)
>   /foo/src/github.com/apache/beam/sdks/go/pkg/beam/io/textio/textio.go:68 
> +0xac





[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=82485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82485
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:24
Start Date: 20/Mar/18 21:24
Worklog Time Spent: 10m 
  Work Description: tgroh closed pull request #4898: [BEAM-3565] Clean up 
ExecutableStage
URL: https://github.com/apache/beam/pull/4898
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto b/model/pipeline/src/main/proto/beam_runner_api.proto
index 3ed90dd036e..9fa301460fb 100644
--- a/model/pipeline/src/main/proto/beam_runner_api.proto
+++ b/model/pipeline/src/main/proto/beam_runner_api.proto
@@ -207,6 +207,8 @@ message PCollection {
 // ProcessBundleDescriptor.
 message ExecutableStagePayload {
 
+  // Environment in which this stage executes. We use an environment rather than environment id
+  // because ExecutableStages use environments directly. This may change in the future.
   Environment environment = 1;
 
   // Input PCollection id.
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
index 27bfed87553..e66148421fc 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
@@ -81,53 +81,53 @@
* follows:
*
* 
-   *   The {@link PTransform#getSubtransformsList()} contains no subtransforms. This ensures
+   *   The {@link PTransform#getSubtransformsList()} is empty. This ensures
*   that executable stages are treated as primitive transforms.
*   The only {@link PCollection} in the {@link PTransform#getInputsMap()} is the result of
*   {@link #getInputPCollection()}.
*   The output {@link PCollection PCollections} in the values of {@link
*   PTransform#getOutputsMap()} are the {@link PCollectionNode PCollections} returned by
*   {@link #getOutputPCollections()}.
-   *   The {@link FunctionSpec} contains an {@link ExecutableStagePayload} which has its input
-   *   and output PCollections set to the same values as the outer PTransform itself. It further
-   *   contains the environment set of transforms for this stage.
+   *   The {@link PTransform#getSpec()} contains an {@link ExecutableStagePayload} with inputs
+   *   and outputs equal to the PTransform's inputs and outputs, and transforms equal to the
+   *   result of {@link #getTransforms}.
* 
*
* The executable stage can be reconstructed from the resulting {@link ExecutableStagePayload}
* and components alone via {@link #fromPayload(ExecutableStagePayload, Components)}.
*/
   default PTransform toPTransform() {
+PTransform.Builder pt = PTransform.newBuilder();
 ExecutableStagePayload.Builder payload = ExecutableStagePayload.newBuilder();
 
 payload.setEnvironment(getEnvironment());
 
+// Populate inputs and outputs of the stage payload and outer PTransform simultaneously.
 PCollectionNode input = getInputPCollection();
+pt.putInputs("input", getInputPCollection().getId());
 payload.setInput(input.getId());
 
-for (PTransformNode transform : getTransforms()) {
-  payload.addTransforms(transform.getId());
-}
-
+int outputIndex = 0;
 for (PCollectionNode output : getOutputPCollections()) {
+  pt.putOutputs(String.format("materialized_%d", outputIndex), output.getId());
   payload.addOutputs(output.getId());
+  outputIndex++;
+}
+
+// Inner PTransforms of this stage are hidden from the outer pipeline and only belong in the
+// stage payload.
+for (PTransformNode transform : getTransforms()) {
+  payload.addTransforms(transform.getId());
 }
 
-PTransform.Builder pt = PTransform.newBuilder();
 pt.setSpec(FunctionSpec.newBuilder()
 .setUrn(ExecutableStage.URN)
 .setPayload(payload.build().toByteString())
 .build());
-pt.putInputs("input", getInputPCollection().getId());
-int outputIndex = 0;
-for (PCollectionNode pcNode : getOutputPCollections()) {
-  // Do something
-  pt.putOutputs(String.format("materialized_%d", outputIndex), pcNode.getId());
-  outputIndex++;
-}
+
 return pt.build();
   }
 
-  // TODO: Should this live under 

[jira] [Work logged] (BEAM-3702) Support system properties source for pipeline options

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3702?focusedWorklogId=82486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82486
 ]

ASF GitHub Bot logged work on BEAM-3702:


Author: ASF GitHub Bot
Created on: 20/Mar/18 21:24
Start Date: 20/Mar/18 21:24
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #4683: [BEAM-3702] adding 
fromJvm to create pipelineoptions from the system properties
URL: https://github.com/apache/beam/pull/4683#issuecomment-374761801
 
 
   Please fix up the javadoc errors:
    2018-03-18T19:18:01.876 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.0-M1:jar (attach-javadocs) on project beam-sdks-java-core: MavenReportException: Error while generating Javadoc:
    2018-03-18T19:18:01.876 [ERROR] Exit code: 1 - /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall@2/src/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java:1924: error: unexpected end tag: 
    2018-03-18T19:18:01.876 [ERROR] * key/values.
    2018-03-18T19:18:01.876 [ERROR] ^
    2018-03-18T19:18:01.876 [ERROR] /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall@2/src/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java:276: error: unexpected end tag: 
    2018-03-18T19:18:01.876 [ERROR] * as a prefix in this case.
    2018-03-18T19:18:01.876 [ERROR] ^




Issue Time Tracking
---

Worklog Id: (was: 82486)
Time Spent: 10h 20m  (was: 10h 10m)

> Support system properties source for pipeline options
> -
>
> Key: BEAM-3702
> URL: https://issues.apache.org/jira/browse/BEAM-3702
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Romain Manni-Bucau
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>





