Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4599

2017-08-18 Thread Apache Jenkins Server
See 


--
[...truncated 3.67 MB...]
2017-08-19T01:32:04.590 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-guice/2.9.2/sisu-guice-2.9.2.pom
 (8 KB at 247.5 KB/sec)
2017-08-19T01:32:04.592 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/inject/guice-parent/2.9.2/guice-parent-2.9.2.pom
2017-08-19T01:32:04.620 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/inject/guice-parent/2.9.2/guice-parent-2.9.2.pom
 (12 KB at 427.0 KB/sec)
2017-08-19T01:32:04.625 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/slf4j/jcl-over-slf4j/1.7.21/jcl-over-slf4j-1.7.21.pom
2017-08-19T01:32:04.654 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/slf4j/jcl-over-slf4j/1.7.21/jcl-over-slf4j-1.7.21.pom
 (963 B at 32.4 KB/sec)
2017-08-19T01:32:04.659 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpmime/4.5.2/httpmime-4.5.2.jar
2017-08-19T01:32:04.660 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-core/2.8.3/jackson-core-2.8.3.jar
2017-08-19T01:32:04.660 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.8.3/jackson-annotations-2.8.3.jar
2017-08-19T01:32:04.662 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/eclipse/jgit/org.eclipse.jgit/4.5.0.201609210915-r/org.eclipse.jgit-4.5.0.201609210915-r.jar
2017-08-19T01:32:04.662 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.8.3/jackson-databind-2.8.3.jar
2017-08-19T01:32:04.697 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/httpcomponents/httpmime/4.5.2/httpmime-4.5.2.jar
 (41 KB at 1054.4 KB/sec)
2017-08-19T01:32:04.698 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/jcraft/jsch/0.1.53/jsch-0.1.53.jar
2017-08-19T01:32:04.710 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.8.3/jackson-annotations-2.8.3.jar
 (55 KB at 1065.6 KB/sec)
2017-08-19T01:32:04.710 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/googlecode/javaewah/JavaEWAH/0.7.9/JavaEWAH-0.7.9.jar
2017-08-19T01:32:04.721 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-core/2.8.3/jackson-core-2.8.3.jar
 (275 KB at 4427.3 KB/sec)
2017-08-19T01:32:04.721 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.0.7/plexus-utils-2.0.7.jar
2017-08-19T01:32:04.758 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/googlecode/javaewah/JavaEWAH/0.7.9/JavaEWAH-0.7.9.jar
 (123 KB at 1273.1 KB/sec)
2017-08-19T01:32:04.758 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-inject-plexus/1.4.3.2/sisu-inject-plexus-1.4.3.2.jar
2017-08-19T01:32:04.806 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.0.7/plexus-utils-2.0.7.jar
 (219 KB at 1518.7 KB/sec)
2017-08-19T01:32:04.807 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-inject-bean/1.4.3.2/sisu-inject-bean-1.4.3.2.jar
2017-08-19T01:32:04.807 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/jcraft/jsch/0.1.53/jsch-0.1.53.jar 
(274 KB at 1887.2 KB/sec)
2017-08-19T01:32:04.807 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-guice/2.9.2/sisu-guice-2.9.2-no_aop.jar
2017-08-19T01:32:04.871 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-inject-bean/1.4.3.2/sisu-inject-bean-1.4.3.2.jar
 (157 KB at 749.6 KB/sec)
2017-08-19T01:32:04.871 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/slf4j/slf4j-api/1.7.21/slf4j-api-1.7.21.jar
2017-08-19T01:32:04.890 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-inject-plexus/1.4.3.2/sisu-inject-plexus-1.4.3.2.jar
 (201 KB at 881.4 KB/sec)
2017-08-19T01:32:04.890 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/slf4j/jcl-over-slf4j/1.7.21/jcl-over-slf4j-1.7.21.jar
2017-08-19T01:32:04.903 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/slf4j/slf4j-api/1.7.21/slf4j-api-1.7.21.jar
 (41 KB at 166.4 KB/sec)
2017-08-19T01:32:04.942 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/slf4j/jcl-over-slf4j/1.7.21/jcl-over-slf4j-1.7.21.jar
 (17 KB at 57.3 KB/sec)
2017-08-19T01:32:04.992 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/sisu/sisu-guice/2.9.2/sisu-guice-2.9.2-no_aop.jar
 (469 KB at 1419.9 KB/sec)
2017-08-19T01:32:05.010 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.8.3/jackson-databind-2.8.3.jar
 (1205 KB at 3442.3 KB/sec)
2017-08-19T01:32:05.024 [INFO] Downloaded: 
https://repo.maven.apache

Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3794

2017-08-18 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2854

2017-08-18 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2781) Should have a canonical Compression enum

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133779#comment-16133779
 ] 

ASF GitHub Bot commented on BEAM-2781:
--

GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3737

[BEAM-2781] Adds a canonical Compression enum for file-based IOs

R: @lukecwik 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam compression-type

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3737


commit 12876d43a961dd8ff9dba5f4cffcfbe643927eee
Author: Eugene Kirpichov 
Date:   2017-08-18T23:17:20Z

Adds a canonical Compression enum for file-based IOs




> Should have a canonical Compression enum
> 
>
> Key: BEAM-2781
> URL: https://issues.apache.org/jira/browse/BEAM-2781
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> There are multiple equivalent enums in the Beam Java SDK representing a 
> compression type:
> TextIO.CompressionType
> TFRecordIO.CompressionType
> XmlIO.CompressionType
> FileBasedSink.CompressionType
> CompressedSource.CompressionMode
> This is ugly and we should unify them. That is also necessary to enable 
> authors of new file-based IOs to support compression without duplicating this 
> functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3737: [BEAM-2781] Adds a canonical Compression enum for f...

2017-08-18 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3737

[BEAM-2781] Adds a canonical Compression enum for file-based IOs

R: @lukecwik 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam compression-type

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3737


commit 12876d43a961dd8ff9dba5f4cffcfbe643927eee
Author: Eugene Kirpichov 
Date:   2017-08-18T23:17:20Z

Adds a canonical Compression enum for file-based IOs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133775#comment-16133775
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3736

[BEAM-1347] Provide an abstraction which creates an Iterator view over the 
Beam Fn State API

Combining this with the DataStreams.DataStreamDecoder converts the Beam Fn 
State API into an input stream backed by multiple logical chunks.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject 
line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3736


commit 34f3646ec1c39a6fe0a3e8d4a2dfdcbc2b319827
Author: Luke Cwik 
Date:   2017-08-17T18:49:35Z

[BEAM-1347] Provide an abstraction which creates an Iterator view over the 
Beam Fn State API

Combining this with the DataStreams.DataStreamDecoder converts the Beam Fn 
State API into an input stream backed by multiple logical chunks.




> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api





[GitHub] beam pull request #3736: [BEAM-1347] Provide an abstraction which creates an...

2017-08-18 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3736

[BEAM-1347] Provide an abstraction which creates an Iterator view over the 
Beam Fn State API

Combining this with the DataStreams.DataStreamDecoder converts the Beam Fn 
State API into an input stream backed by multiple logical chunks.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject 
line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3736


commit 34f3646ec1c39a6fe0a3e8d4a2dfdcbc2b319827
Author: Luke Cwik 
Date:   2017-08-17T18:49:35Z

[BEAM-1347] Provide an abstraction which creates an Iterator view over the 
Beam Fn State API

Combining this with the DataStreams.DataStreamDecoder converts the Beam Fn 
State API into an input stream backed by multiple logical chunks.






[jira] [Created] (BEAM-2781) Should have a canonical Compression enum

2017-08-18 Thread Eugene Kirpichov (JIRA)
Eugene Kirpichov created BEAM-2781:
--

 Summary: Should have a canonical Compression enum
 Key: BEAM-2781
 URL: https://issues.apache.org/jira/browse/BEAM-2781
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov


There are multiple equivalent enums in the Beam Java SDK representing a 
compression type:
TextIO.CompressionType
TFRecordIO.CompressionType
XmlIO.CompressionType
FileBasedSink.CompressionType
CompressedSource.CompressionMode

This is ugly and we should unify them. That is also necessary to enable authors 
of new file-based IOs to support compression without duplicating this 
functionality.
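A unified enum along the lines proposed here might look like the following sketch. This is illustrative only: the member names, the wrapping methods (`readDecompressed`, `writeCompressed`), and the set of formats are assumptions for the sake of the example, not the API Beam ultimately adopted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical canonical enum that all file-based IOs could share. */
public enum Compression {
  UNCOMPRESSED {
    @Override public InputStream readDecompressed(InputStream in) { return in; }
    @Override public OutputStream writeCompressed(OutputStream out) { return out; }
  },
  GZIP {
    @Override public InputStream readDecompressed(InputStream in) throws IOException {
      return new GZIPInputStream(in);
    }
    @Override public OutputStream writeCompressed(OutputStream out) throws IOException {
      return new GZIPOutputStream(out);
    }
  };

  /** Wraps {@code in} so that reads yield decompressed bytes. */
  public abstract InputStream readDecompressed(InputStream in) throws IOException;

  /** Wraps {@code out} so that writes are compressed. */
  public abstract OutputStream writeCompressed(OutputStream out) throws IOException;
}
```

With a single enum like this, each IO would accept `Compression` directly instead of declaring its own `CompressionType` copy, and new file-based IOs would get compression support for free.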





Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Dataflow #3793

2017-08-18 Thread Apache Jenkins Server
See 


Changes:

[lcwik] [BEAM-1347] Convert an InputStream into an Iterable using the Beam Fn

--
[...truncated 327.50 KB...]
2017-08-18T23:19:32.407 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/thoughtworks/paranamer/paranamer/2.7/paranamer-2.7.pom
2017-08-18T23:19:32.438 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/thoughtworks/paranamer/paranamer/2.7/paranamer-2.7.pom
 (6 KB at 166.1 KB/sec)
2017-08-18T23:19:32.439 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/thoughtworks/paranamer/paranamer-parent/2.7/paranamer-parent-2.7.pom
2017-08-18T23:19:32.467 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/thoughtworks/paranamer/paranamer-parent/2.7/paranamer-parent-2.7.pom
 (12 KB at 401.2 KB/sec)
2017-08-18T23:19:32.469 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.4-M3/snappy-java-1.1.4-M3.pom
2017-08-18T23:19:32.494 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.4-M3/snappy-java-1.1.4-M3.pom
 (4 KB at 128.5 KB/sec)
2017-08-18T23:19:32.496 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-compress/1.14/commons-compress-1.14.pom
2017-08-18T23:19:32.523 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-compress/1.14/commons-compress-1.14.pom
 (13 KB at 475.3 KB/sec)
2017-08-18T23:19:32.525 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/42/commons-parent-42.pom
2017-08-18T23:19:32.557 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/42/commons-parent-42.pom
 (67 KB at 2073.9 KB/sec)
2017-08-18T23:19:32.562 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.6/commons-lang3-3.6.pom
2017-08-18T23:19:32.596 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.6/commons-lang3-3.6.pom
 (27 KB at 789.7 KB/sec)
2017-08-18T23:19:32.601 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/joda-time/joda-time/2.4/joda-time-2.4.pom
2017-08-18T23:19:32.628 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/joda-time/joda-time/2.4/joda-time-2.4.pom 
(26 KB at 953.3 KB/sec)
2017-08-18T23:19:32.631 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auto/service/auto-service/1.0-rc2/auto-service-1.0-rc2.pom
2017-08-18T23:19:32.664 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auto/service/auto-service/1.0-rc2/auto-service-1.0-rc2.pom
 (3 KB at 81.3 KB/sec)
2017-08-18T23:19:32.665 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auto/auto-parent/2/auto-parent-2.pom
2017-08-18T23:19:32.693 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auto/auto-parent/2/auto-parent-2.pom
 (5 KB at 158.1 KB/sec)
2017-08-18T23:19:32.695 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auto/auto-common/0.3/auto-common-0.3.pom
2017-08-18T23:19:32.721 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auto/auto-common/0.3/auto-common-0.3.pom
 (3 KB at 88.3 KB/sec)
2017-08-18T23:19:32.723 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auto/value/auto-value/1.4.1/auto-value-1.4.1.pom
2017-08-18T23:19:32.749 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auto/value/auto-value/1.4.1/auto-value-1.4.1.pom
 (7 KB at 239.5 KB/sec)
2017-08-18T23:19:32.751 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auto/auto-parent/3/auto-parent-3.pom
2017-08-18T23:19:32.777 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auto/auto-parent/3/auto-parent-3.pom
 (4 KB at 144.2 KB/sec)
2017-08-18T23:19:32.779 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/dataformat/jackson-dataformat-yaml/2.8.9/jackson-dataformat-yaml-2.8.9.pom
2017-08-18T23:19:32.808 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/dataformat/jackson-dataformat-yaml/2.8.9/jackson-dataformat-yaml-2.8.9.pom
 (7 KB at 207.5 KB/sec)
2017-08-18T23:19:32.811 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/yaml/snakeyaml/1.17/snakeyaml-1.17.pom
2017-08-18T23:19:32.839 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/yaml/snakeyaml/1.17/snakeyaml-1.17.pom 
(28 KB at 976.7 KB/sec)
2017-08-18T23:19:32.842 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/hamcrest/hamcrest-all/1.3/hamcrest-all-1.3.pom
2017-08-18T23:19:32.869 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/hamcrest/hamcrest-all/1.3/hamcrest-all-1.3.pom
 (650 B at 23.5 KB/sec)
2017-08-18T23:19:32.870 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3792

2017-08-18 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2853

2017-08-18 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2852

2017-08-18 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-2711) ByteKeyRangeTracker.getFractionConsumed() fails when out of range positions are claimed

2017-08-18 Thread Chamikara Jayalath (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath resolved BEAM-2711.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> ByteKeyRangeTracker.getFractionConsumed() fails when out of range positions 
> are claimed
> ---
>
> Key: BEAM-2711
> URL: https://issues.apache.org/jira/browse/BEAM-2711
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
> Fix For: 2.2.0
>
>
> ByteKeyRangeTracker.getFractionConsumed() invokes 
> range.estimateFractionForKey(position) at the following location:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java#L127
> This invocation fails for out of range positions.
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L170
> But ByteKeyRangeTracker may accept out of range positions at the following 
> location.
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java#L80
> We can fix this by updating ByteKeyRangeTracker.getFractionConsumed() to 
> return 1.0 for positions that are larger than the stop position, similar to 
> OffsetRangeTracker.
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRangeTracker.java#L176





[GitHub] beam pull request #3735: Add Proto Definitions for the Artifact API

2017-08-18 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3735

Add Proto Definitions for the Artifact API

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam artifact_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3735


commit 38d92dad62e14ec0945395d32a273c1f32b94af5
Author: Thomas Groh 
Date:   2017-08-18T00:45:09Z

Add Proto Definitions for the Artifact API






[GitHub] beam pull request #3724: [BEAM-1347] Convert an InputStream into an Iterable...

2017-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3724




[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133678#comment-16133678
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3724


> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api





[1/2] beam git commit: [BEAM-1347] Convert an InputStream into an Iterable using the Beam Fn data specification

2017-08-18 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master ae9a2dcfd -> 95cd37faf


[BEAM-1347] Convert an InputStream into an Iterable using the Beam Fn data 
specification

This is towards sharing common code that supports the Beam Fn State API and the 
Beam Fn Data API.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b949aa1b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b949aa1b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b949aa1b

Branch: refs/heads/master
Commit: b949aa1bbfd7fbb1a8159e6d650dae6196015e5c
Parents: ae9a2dc
Author: Luke Cwik 
Authored: Wed Aug 16 16:44:59 2017 -0700
Committer: Luke Cwik 
Committed: Fri Aug 18 14:59:49 2017 -0700

--
 .../beam/fn/harness/stream/DataStreams.java |  73 +++-
 .../beam/fn/harness/stream/DataStreamsTest.java | 165 ++-
 2 files changed, 192 insertions(+), 46 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b949aa1b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/DataStreams.java
--
diff --git 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/DataStreams.java
 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/DataStreams.java
index d23d784..6967160 100644
--- 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/DataStreams.java
+++ 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/DataStreams.java
@@ -17,19 +17,24 @@
  */
 package org.apache.beam.fn.harness.stream;
 
+import static com.google.common.base.Preconditions.checkState;
+
 import com.google.common.io.ByteStreams;
+import com.google.common.io.CountingInputStream;
 import com.google.protobuf.ByteString;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
+import java.io.PushbackInputStream;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
 import java.util.concurrent.BlockingQueue;
 import org.apache.beam.fn.harness.fn.CloseableThrowingConsumer;
+import org.apache.beam.sdk.coders.Coder;
 
 /**
 * {@link #inbound(Iterator)} treats multiple {@link ByteString}s as a single input stream and
- * {@link #outbound(CloseableThrowingConsumer)} treats a single {@link OutputStream} as mulitple
+ * {@link #outbound(CloseableThrowingConsumer)} treats a single {@link OutputStream} as multiple
  * {@link ByteString}s.
  */
 public class DataStreams {
@@ -100,6 +105,72 @@ public class DataStreams {
   }
 
+  /**
+   * An adapter which converts an {@link InputStream} to an {@link Iterator} of {@code T} values
+   * using the specified {@link Coder}.
+   *
+   * <p>Note that this adapter follows the Beam Fn API specification for forcing values that decode
+   * consuming zero bytes to consuming exactly one byte.
+   *
+   * <p>Note that access to the underlying {@link InputStream} is lazy and will only be invoked on
+   * first access to {@link #next()} or {@link #hasNext()}.
+   */
+  public static class DataStreamDecoder<T> implements Iterator<T> {
+    private enum State { READ_REQUIRED, HAS_NEXT, EOF };
+
+    private final CountingInputStream countingInputStream;
+    private final PushbackInputStream pushbackInputStream;
+    private final Coder<T> coder;
+    private State currentState;
+    private T next;
+    public DataStreamDecoder(Coder<T> coder, InputStream inputStream) {
+      this.currentState = State.READ_REQUIRED;
+      this.coder = coder;
+      this.pushbackInputStream = new PushbackInputStream(inputStream, 1);
+      this.countingInputStream = new CountingInputStream(pushbackInputStream);
+    }
+
+    @Override
+    public boolean hasNext() {
+      switch (currentState) {
+        case EOF:
+          return false;
+        case READ_REQUIRED:
+          try {
+            int nextByte = pushbackInputStream.read();
+            if (nextByte == -1) {
+              currentState = State.EOF;
+              return false;
+            }
+
+            pushbackInputStream.unread(nextByte);
+            long count = countingInputStream.getCount();
+            next = coder.decode(countingInputStream);
+            // Skip one byte if decoding the value consumed 0 bytes.
+            if (countingInputStream.getCount() - count == 0) {
+              checkState(countingInputStream.read() != -1, "Unexpected EOF reached");
+            }
+            currentState = State.HAS_NEXT;
+          } catch (IOException e) {
+            throw new IllegalStateException(e);
+          }
+          // fall through expected
+        case HAS_NEXT:
+          return true;
+      }
+      throw new IllegalStateException(String.format("Unknown state %s", currentState));
+    }
+
+    @Override
+    public T next() {
+      if (!hasNext()) {
+        throw new
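The zero-byte rule in the diff above (a value whose decoding consumes no bytes must be forced to consume exactly one byte, per the Fn API specification) can be illustrated without Beam or Guava dependencies. In this sketch, `CountingIn` is a minimal stand-in for Guava's `CountingInputStream` and `decodeOneByte` stands in for a `Coder.decode` call; both names are hypothetical. The loop mirrors the peek / unread / decode / forced-skip structure of `hasNext()`:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.ArrayList;
import java.util.List;

public class ZeroByteSkipDemo {
  /** Minimal stand-in for Guava's CountingInputStream: counts bytes read. */
  static final class CountingIn extends FilterInputStream {
    long count;
    CountingIn(InputStream in) { super(in); }
    @Override public int read() throws IOException {
      int b = super.read();
      if (b != -1) { count++; }
      return b;
    }
  }

  /** Stand-in "coder": decodes a single byte as an Integer. */
  static Integer decodeOneByte(InputStream in) throws IOException {
    return in.read();
  }

  /** Mirrors the peek / unread / decode / forced-skip loop of DataStreamDecoder. */
  static List<Integer> decodeAll(byte[] bytes) throws IOException {
    PushbackInputStream push = new PushbackInputStream(new ByteArrayInputStream(bytes), 1);
    CountingIn counting = new CountingIn(push);
    List<Integer> decoded = new ArrayList<>();
    while (true) {
      int next = push.read();           // peek one byte to detect EOF
      if (next == -1) {
        break;
      }
      push.unread(next);                // not EOF: push the byte back
      long before = counting.count;
      decoded.add(decodeOneByte(counting));
      if (counting.count == before) {   // decoder consumed zero bytes:
        counting.read();                // skip exactly one byte to guarantee progress
      }
    }
    return decoded;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(decodeAll(new byte[] {1, 2, 3}));  // prints [1, 2, 3]
  }
}
```

The forced skip is what lets the stream terminate: without it, a coder that reads nothing would make `hasNext()` loop forever on the same position.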

[2/2] beam git commit: [BEAM-1347] Convert an InputStream into an Iterable using the Beam Fn data specification

2017-08-18 Thread lcwik
[BEAM-1347] Convert an InputStream into an Iterable using the Beam Fn data 
specification

This closes #3724


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/95cd37fa
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/95cd37fa
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/95cd37fa

Branch: refs/heads/master
Commit: 95cd37faff1c57dc4184ed001d8d168bc42d174e
Parents: ae9a2dc b949aa1
Author: Luke Cwik 
Authored: Fri Aug 18 15:00:15 2017 -0700
Committer: Luke Cwik 
Committed: Fri Aug 18 15:00:15 2017 -0700

--
 .../beam/fn/harness/stream/DataStreams.java |  73 +++-
 .../beam/fn/harness/stream/DataStreamsTest.java | 165 ++-
 2 files changed, 192 insertions(+), 46 deletions(-)
--




[jira] [Commented] (BEAM-2711) ByteKeyRangeTracker.getFractionConsumed() fails when out of range positions are claimed

2017-08-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133658#comment-16133658
 ] 

ASF GitHub Bot commented on BEAM-2711:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3715


> ByteKeyRangeTracker.getFractionConsumed() fails when out of range positions 
> are claimed
> ---
>
> Key: BEAM-2711
> URL: https://issues.apache.org/jira/browse/BEAM-2711
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> ByteKeyRangeTracker.getFractionConsumed() invokes 
> range.estimateFractionForKey(position) at the following location:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java#L127
> This invocation fails for out of range positions.
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRange.java#L170
> But ByteKeyRangeTracker may accept out of range positions at following 
> location.
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java#L80
> We can fix this by updating ByteKeyRangeTracker.getFractionConsumed() to 
> return 1.0 for positions that are larger than the stop position, similar to 
> OffsetRangeTracker.
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRangeTracker.java#L176
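The proposed clamping behavior can be illustrated with a minimal Python sketch. This is a toy model over integer keys, not Beam's Java implementation; the class and method names below are hypothetical stand-ins for the tracker API being discussed:

```python
# Toy range tracker over integer keys [start, end); mirrors the proposed fix:
# fractions for done trackers or out-of-range positions report 1.0 instead of
# failing in estimateFractionForKey().
class FakeRangeTracker:
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.position = None
        self.done = False

    def try_return_record_at(self, is_at_split_point, position):
        # The tracker records an out-of-range claim but rejects it ("no more
        # records in range"), so position may legitimately exceed end.
        self.position = position
        return position < self.end

    def mark_done(self):
        self.done = True

    def get_fraction_consumed(self):
        if self.position is None:
            return 0.0
        if self.done or self.position >= self.end:
            return 1.0  # clamp rather than fail for out-of-range positions
        return (self.position - self.start) / (self.end - self.start)
```

For example, after an out-of-range claim (`try_return_record_at(False, 0x35)` against a range ending at `0x34`), `get_fraction_consumed()` returns 1.0 instead of raising.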



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3715: [BEAM-2711] Updates ByteKeyRangeTracker so that get...

2017-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3715


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Updates ByteKeyRangeTracker so that getFractionConsumed() does not fail for completed trackers.

2017-08-18 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master d03a1284c -> ae9a2dcfd


Updates ByteKeyRangeTracker so that getFractionConsumed() does not fail for 
completed trackers.

After this update:
* getFractionConsumed() returns 1.0 after markDone() is set.
* getFractionConsumed() returns 1.0 after tryReturnRecordAt() is invoked for a 
position that is larger than or equal to the end key.

This is similar to how getFractionConsumed() method of OffsetRangeTracker is 
implemented.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1b81f1dc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1b81f1dc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1b81f1dc

Branch: refs/heads/master
Commit: 1b81f1dc2bfad434fb764c61106679b4d6c94377
Parents: d03a128
Author: chamik...@google.com 
Authored: Thu Aug 10 17:35:37 2017 -0700
Committer: chamik...@google.com 
Committed: Fri Aug 18 14:09:39 2017 -0700

--
 .../beam/sdk/io/range/ByteKeyRangeTracker.java  |  5 +
 .../sdk/io/range/ByteKeyRangeTrackerTest.java   | 23 
 2 files changed, 28 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/1b81f1dc/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java
index b889ec7..509e434 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/ByteKeyRangeTracker.java
@@ -127,7 +127,12 @@ public final class ByteKeyRangeTracker implements 
RangeTracker {
   public synchronized double getFractionConsumed() {
 if (position == null) {
   return 0;
+} else if (done) {
+  return 1.0;
+} else if (position.compareTo(range.getEndKey()) >= 0) {
+  return 1.0;
 }
+
 return range.estimateFractionForKey(position);
   }
 

http://git-wip-us.apache.org/repos/asf/beam/blob/1b81f1dc/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTrackerTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTrackerTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTrackerTest.java
index 8deaf44..0523d75 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTrackerTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/range/ByteKeyRangeTrackerTest.java
@@ -38,6 +38,7 @@ public class ByteKeyRangeTrackerTest {
   private static final ByteKey NEW_MIDDLE_KEY = ByteKey.of(0x24);
   private static final ByteKey BEFORE_END_KEY = ByteKey.of(0x33);
   private static final ByteKey END_KEY = ByteKey.of(0x34);
+  private static final ByteKey KEY_LARGER_THAN_END = ByteKey.of(0x35);
   private static final double INITIAL_RANGE_SIZE = 0x34 - 0x12;
   private static final ByteKeyRange INITIAL_RANGE = 
ByteKeyRange.of(INITIAL_START_KEY, END_KEY);
   private static final double NEW_RANGE_SIZE = 0x34 - 0x14;
@@ -98,6 +99,28 @@ public class ByteKeyRangeTrackerTest {
 assertEquals(1 - 1 / INITIAL_RANGE_SIZE, tracker.getFractionConsumed(), 
delta);
   }
 
+  @Test
+  public void testGetFractionConsumedAfterDone() {
+ByteKeyRangeTracker tracker = ByteKeyRangeTracker.of(INITIAL_RANGE);
+double delta = 0.1;
+
+assertTrue(tracker.tryReturnRecordAt(true, INITIAL_START_KEY));
+tracker.markDone();
+
+assertEquals(1.0, tracker.getFractionConsumed(), delta);
+  }
+
+  @Test
+  public void testGetFractionConsumedAfterOutOfRangeClaim() {
+ByteKeyRangeTracker tracker = ByteKeyRangeTracker.of(INITIAL_RANGE);
+double delta = 0.1;
+
+assertTrue(tracker.tryReturnRecordAt(true, INITIAL_START_KEY));
+assertTrue(tracker.tryReturnRecordAt(false, KEY_LARGER_THAN_END));
+
+assertEquals(1.0, tracker.getFractionConsumed(), delta);
+  }
+
   /** Tests for {@link ByteKeyRangeTracker#getFractionConsumed()} with updated 
start key. */
   @Test
   public void testGetFractionConsumedUpdateStartKey() {



[2/2] beam git commit: This closes #3715

2017-08-18 Thread chamikara
This closes #3715


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ae9a2dcf
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ae9a2dcf
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ae9a2dcf

Branch: refs/heads/master
Commit: ae9a2dcfdfa3b3b38274cde0f5779ffec2301955
Parents: d03a128 1b81f1d
Author: chamik...@google.com 
Authored: Fri Aug 18 14:44:11 2017 -0700
Committer: chamik...@google.com 
Committed: Fri Aug 18 14:44:11 2017 -0700

--
 .../beam/sdk/io/range/ByteKeyRangeTracker.java  |  5 +
 .../sdk/io/range/ByteKeyRangeTrackerTest.java   | 23 
 2 files changed, 28 insertions(+)
--




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3791

2017-08-18 Thread Apache Jenkins Server
See 




[beam-site] branch asf-site updated (44d3769 -> 4c5cf9a)

2017-08-18 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from 44d3769  Regenerates website again - mergebot failed to add new files
 add c01fac2  Add Manu Zhang to committers list
 add cda8a7d  This closes #291
 new 4c5cf9a  Prepare repository for deployment.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/contribute/team/index.html | 9 +
 src/_beam_team/team.md | 6 ++
 2 files changed, 15 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org".


[beam-site] 01/01: Prepare repository for deployment.

2017-08-18 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 4c5cf9aaa76e952c43921a7bb74bf9192882e6d8
Author: Mergebot 
AuthorDate: Fri Aug 18 19:51:26 2017 +

Prepare repository for deployment.
---
 content/contribute/team/index.html | 9 +
 1 file changed, 9 insertions(+)

diff --git a/content/contribute/team/index.html 
b/content/contribute/team/index.html
index c6c189f..42992e6 100644
--- a/content/contribute/team/index.html
+++ b/content/contribute/team/index.html
@@ -375,6 +375,15 @@
 
   
 
+  Manu Zhang
+  mauzhang
+  mauzhang [at] apache [dot] org
+  Vipshop
+  committer
+  +8
+
+  
+
   Aviem Zur
   aviemzur
   aviemzur [at] apache [dot] org



[beam-site] 01/02: Add Manu Zhang to committers list

2017-08-18 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit c01fac2329b4d3cc6a7db5857860982d110428a4
Author: manuzhang 
AuthorDate: Sat Aug 12 09:28:43 2017 +0800

Add Manu Zhang to committers list
---
 src/_beam_team/team.md | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/_beam_team/team.md b/src/_beam_team/team.md
index 6f54dfa..34ab204 100644
--- a/src/_beam_team/team.md
+++ b/src/_beam_team/team.md
@@ -146,6 +146,12 @@ members:
 organization: Slack
 roles: committer, PMC
 time_zone: "-8"
+  - name: Manu Zhang
+apache_id: mauzhang
+email: mauzhang [at] apache [dot] org
+organization: Vipshop
+roles: committer
+time_zone: "+8"
   - name: Aviem Zur
 apache_id: aviemzur
 email: aviemzur [at] apache [dot] org



[beam-site] branch mergebot updated (3d89790 -> cda8a7d)

2017-08-18 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from 3d89790  This closes #292
 add b548e9b  Prepare repository for deployment.
 add 44d3769  Regenerates website again - mergebot failed to add new files
 new c01fac2  Add Manu Zhang to committers list
 new cda8a7d  This closes #291

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/blog/2017/08/16/splittable-do-fn.html  | 752 +
 content/blog/index.html|  22 +
 content/feed.xml   | 599 ++--
 .../images/blog/splittable-do-fn/blocks.png| Bin
 .../blog/splittable-do-fn/jdbcio-expansion.png | Bin
 .../blog/splittable-do-fn/kafka-splitting.png  | Bin
 .../images/blog/splittable-do-fn/restrictions.png  | Bin
 .../blog/splittable-do-fn/transform-expansion.png  | Bin
 content/index.html |  10 +-
 src/_beam_team/team.md |   6 +
 10 files changed, 1341 insertions(+), 48 deletions(-)
 create mode 100644 content/blog/2017/08/16/splittable-do-fn.html
 copy {src => content}/images/blog/splittable-do-fn/blocks.png (100%)
 copy {src => content}/images/blog/splittable-do-fn/jdbcio-expansion.png (100%)
 copy {src => content}/images/blog/splittable-do-fn/kafka-splitting.png (100%)
 copy {src => content}/images/blog/splittable-do-fn/restrictions.png (100%)
 copy {src => content}/images/blog/splittable-do-fn/transform-expansion.png 
(100%)



[beam-site] 02/02: This closes #291

2017-08-18 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit cda8a7d2d39987371dd0d271ec415fbd5ad4195c
Merge: 44d3769 c01fac2
Author: Mergebot 
AuthorDate: Fri Aug 18 19:48:29 2017 +

This closes #291

 src/_beam_team/team.md | 6 ++
 1 file changed, 6 insertions(+)



Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3790

2017-08-18 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2780) Fix Window sample code in programming guide

2017-08-18 Thread Melissa Pashniak (JIRA)
Melissa Pashniak created BEAM-2780:
--

 Summary: Fix Window sample code in programming guide
 Key: BEAM-2780
 URL: https://issues.apache.org/jira/browse/BEAM-2780
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Melissa Pashniak
Assignee: Melissa Pashniak
Priority: Minor


Window.into should be Window.into






Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2851

2017-08-18 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2776) TextIO should support reading header lines

2017-08-18 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov updated BEAM-2776:
---
Component/s: sdk-py

> TextIO should support reading header lines
> --
>
> Key: BEAM-2776
> URL: https://issues.apache.org/jira/browse/BEAM-2776
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py
>Reporter: Eugene Kirpichov
>
> Users frequently request the ability to skip some header rows when reading 
> text files.
> https://stackoverflow.com/questions/28450554/skipping-header-rows-is-it-possible-with-cloud-dataflow
> https://stackoverflow.com/questions/43551876/how-do-i-read-and-transform-csv-headers-before-bigqueryio-write
> https://stackoverflow.com/questions/41297704/reading-csv-header-with-dataflow
> https://stackoverflow.com/questions/45554466/google-cloud-dataflow-apache-beam-how-to-process-gzipped-csv-files-with-a-he
> https://stackoverflow.com/questions/44045744/how-do-i-skip-header-files-when-reading-from-google-cloud-storage-in-a-dataflow
> This is also relevant for reading file formats such as VCF, see thread 
> https://lists.apache.org/thread.html/dc7e5c3ff20d9270f06c1a298ad949da018a83f900b22d58f6b4c468@%3Cdev.beam.apache.org%3E
> Python supports this partially https://github.com/apache/beam/pull/1771/files 
> via skip_header_lines, but the header lines can have useful content, and the 
> number of header lines is not fixed (in VCF).
> We should figure out a good API for this and support this natively in TextIO. 
> The API decisions would be:
> - How do we specify how much of the beginning of each file is the header: 
> options could be e.g. a certain number of lines; or lines that start with a 
> certain character; or a custom predicate.
> - How do we make the header contents accessible to a user of TextIO. Since 
> the header can be different in each file, we can't return it as a 
> PCollectionView<List<String>>. Instead I suppose, when you use a header, 
> you'd need to specify a SerializableFunction<KV<List<String>, String>, T> or 
> something like that for parsing (header, line) -> user type. Note that 
> currently TextIO.Read does not support returning a user type anyway, so 
> that'd need to be done too.
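One way the API decisions above could shake out — sketched here in plain Python as a hypothetical illustration, not TextIO's actual or planned API — is a reader that takes a header predicate plus a `parse(header, line)` function, which handles the variable-length headers of formats like VCF:

```python
# Hypothetical sketch: split a file's lines into a predicate-delimited header
# and records, then parse each record with access to the header lines.
def read_with_header(lines, is_header_line, parse):
    header = []
    records = []
    in_header = True
    for line in lines:
        if in_header and is_header_line(line):
            header.append(line)
        else:
            in_header = False
            records.append(parse(header, line))
    return records
```

With VCF-like input, `is_header_line=lambda l: l.startswith("#")` consumes however many header lines exist, and each record parser sees the full header — which a fixed `skip_header_lines` count cannot express.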





[jira] [Commented] (BEAM-2774) Add I/O source for VCF files (python)

2017-08-18 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133395#comment-16133395
 ] 

Eugene Kirpichov commented on BEAM-2774:


Related issue: https://issues.apache.org/jira/browse/BEAM-2776

> Add I/O source for VCF files (python)
> -
>
> Key: BEAM-2774
> URL: https://issues.apache.org/jira/browse/BEAM-2774
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Asha Rostamianfar
>Assignee: Asha Rostamianfar
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> A new I/O source for reading (and eventually writing) VCF files [1] for 
> Python. The design doc is available at 
> https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> [1] http://samtools.github.io/hts-specs/VCFv4.3.pdf





[jira] [Resolved] (BEAM-2431) Model Runner interactions in RPC layer for Runner API

2017-08-18 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-2431.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

> Model Runner interactions in RPC layer for Runner API
> -
>
> Key: BEAM-2431
> URL: https://issues.apache.org/jira/browse/BEAM-2431
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Sourabh Bajaj
>  Labels: beam-python-everywhere
> Fix For: 2.2.0
>
>
> The "Runner API" today is actually just a definition of what constitutes a 
> Beam pipeline. It needs to actually be a (very small) API.
> This would allow e.g. a Java-based job launcher to respond to launch requests 
> and state queries from a Python-based adapter.
> The expected API would be something like a distillation of the APIs for 
> PipelineRunner and PipelineResult (which is really "Job") via analyzing how 
> these both look in Java and Python.





[jira] [Commented] (BEAM-2600) Artifact for Python SDK harness that can be referenced in pipeline definition

2017-08-18 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133360#comment-16133360
 ] 

Kenneth Knowles commented on BEAM-2600:
---

Noting for the benefit of the JIRA that there's 
https://hub.docker.com/r/apache/ to be considered. (link comes via [~lcwik] and 
[~herohde]) 

> Artifact for Python SDK harness that can be referenced in pipeline definition
> -
>
> Key: BEAM-2600
> URL: https://issues.apache.org/jira/browse/BEAM-2600
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Kenneth Knowles
>Assignee: Ahmet Altay
>  Labels: beam-python-everywhere
>
> In order to build a pipeline that invokes a Python UDF, we need to be able to 
> construct something like this:
> {code}
> SdkFunctionSpec {
>   environment = ,
>   spec = {
> urn = ,
> data = 
>   }
> }
> {code}
> I could be out of date, but based on a couple of conversations I do not know 
> that there exists anything we can put for "" today. For 
> prototyping, it could be just a symbol that runners have to know. But 
> eventually it should be something that runners can instantiate without 
> knowing anything about the SDK that put it there. I imagine it may encompass 
> "custom containers" eventually, though that doesn't block anything 
> immediately.





[jira] [Updated] (BEAM-2779) PipelineOptionsFactory should prevent non PipelineOptions interfaces from being constructed.

2017-08-18 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2779:

Summary: PipelineOptionsFactory should prevent non PipelineOptions 
interfaces from being constructed.  (was: PipelineOptions should not serialize 
data for types on interfaces which aren't PipelineOptions)

> PipelineOptionsFactory should prevent non PipelineOptions interfaces from 
> being constructed.
> 
>
> Key: BEAM-2779
> URL: https://issues.apache.org/jira/browse/BEAM-2779
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Luke Cwik
>Priority: Minor
>  Labels: starter
>
> PipelineOptions currently serializes information about all getter/setter 
> pairs on interfaces which don't extend PipelineOptions.
> For example:
> {code:java}
> interface Foo extends PipelineOptions, Bar {
>   String getFoo();
>   void setFoo(String value);
> }
> interface Bar {
>   String getBar();
>   void setBar();
> }
> {code}
> The serialization of the above (when both *foo* and *bar* are set) will 
> produce JSON where we only include display data for *foo* but data for both 
> *foo* and *bar*. During validation of an interface in 
> *PipelineOptionsFactory*, we should throw an error if one of the user's 
> interfaces doesn't extend *PipelineOptions* (note that we should ignore the 
> HasDisplayData interface).





[jira] [Updated] (BEAM-2779) PipelineOptions should not serialize data for types on interfaces which aren't PipelineOptions

2017-08-18 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2779:

Labels: starter  (was: )

> PipelineOptions should not serialize data for types on interfaces which 
> aren't PipelineOptions
> --
>
> Key: BEAM-2779
> URL: https://issues.apache.org/jira/browse/BEAM-2779
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Luke Cwik
>Priority: Minor
>  Labels: starter
>
> PipelineOptions currently serializes information about all getter/setter 
> pairs on interfaces which don't extend PipelineOptions.
> For example:
> {code:java}
> interface Foo extends PipelineOptions, Bar {
>   String getFoo();
>   void setFoo(String value);
> }
> interface Bar {
>   String getBar();
>   void setBar();
> }
> {code}
> The serialization of the above (when both *foo* and *bar* are set) will 
> produce JSON where we only include display data for *foo* but data for both 
> *foo* and *bar*. During validation of an interface in 
> *PipelineOptionsFactory*, we should throw an error if one of the user's 
> interfaces doesn't extend *PipelineOptions* (note that we should ignore the 
> HasDisplayData interface).





[jira] [Updated] (BEAM-2779) PipelineOptions should not serialize data for types on interfaces which aren't PipelineOptions

2017-08-18 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2779:

Affects Version/s: 2.0.0

> PipelineOptions should not serialize data for types on interfaces which 
> aren't PipelineOptions
> --
>
> Key: BEAM-2779
> URL: https://issues.apache.org/jira/browse/BEAM-2779
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Luke Cwik
>Priority: Minor
>  Labels: starter
>
> PipelineOptions currently serializes information about all getter/setter 
> pairs on interfaces which don't extend PipelineOptions.
> For example:
> {code:java}
> interface Foo extends PipelineOptions, Bar {
>   String getFoo();
>   void setFoo(String value);
> }
> interface Bar {
>   String getBar();
>   void setBar();
> }
> {code}
> The serialization of the above (when both *foo* and *bar* are set) will 
> produce JSON where we only include display data for *foo* but data for both 
> *foo* and *bar*. During validation of an interface in 
> *PipelineOptionsFactory*, we should throw an error if one of the user's 
> interfaces doesn't extend *PipelineOptions* (note that we should ignore the 
> HasDisplayData interface).





[jira] [Created] (BEAM-2779) PipelineOptions should not serialize data for types on interfaces which aren't PipelineOptions

2017-08-18 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-2779:
---

 Summary: PipelineOptions should not serialize data for types on 
interfaces which aren't PipelineOptions
 Key: BEAM-2779
 URL: https://issues.apache.org/jira/browse/BEAM-2779
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Luke Cwik
Priority: Minor


PipelineOptions currently serializes information about all getter/setter pairs 
on interfaces which don't extend PipelineOptions.

For example:

{code:java}
interface Foo extends PipelineOptions, Bar {
  String getFoo();
  void setFoo(String value);
}

interface Bar {
  String getBar();
  void setBar();
}
{code}

The serialization of the above (when both *foo* and *bar* are set) will produce 
JSON where we only include display data for *foo* but data for both *foo* and 
*bar*. During validation of an interface in *PipelineOptionsFactory*, we should 
throw an error if one of the user's interfaces doesn't extend *PipelineOptions* 
(note that we should ignore the HasDisplayData interface).
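The proposed validation can be sketched outside of Java. The following Python analogue (hypothetical names, not Beam's actual PipelineOptionsFactory) rejects an options type whose ancestry contains an interface that is neither PipelineOptions nor the display-data marker:

```python
# Marker classes standing in for Beam's PipelineOptions and HasDisplayData
# interfaces (assumed names for illustration only).
class PipelineOptions: ...
class HasDisplayData: ...

def validate_options_type(cls):
    """Raise TypeError if any ancestor interface of `cls` does not extend
    PipelineOptions (ignoring HasDisplayData), mirroring the check proposed
    for PipelineOptionsFactory."""
    for base in cls.__mro__[1:]:
        if base is object:
            continue
        if not issubclass(base, (PipelineOptions, HasDisplayData)):
            raise TypeError(f"{base.__name__} does not extend PipelineOptions")
    return cls
```

Under this check, the `Foo extends PipelineOptions, Bar` example from the description would fail fast at validation time because `Bar` does not extend `PipelineOptions`, instead of silently serializing `bar` without display data.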





Jenkins build is back to normal : beam_PostCommit_Python_Verify #2947

2017-08-18 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3789

2017-08-18 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2850

2017-08-18 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Python_Verify #2946

2017-08-18 Thread Apache Jenkins Server
See 


--
[...truncated 660.32 KB...]
  "kind": "ParallelDo", 
  "name": "s14", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": ""
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "write/Write/WriteImpl/Extract.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s13"
}, 
"serialized_fn": "", 
"user_name": "write/Write/WriteImpl/Extract"
  }
}, 
{
  "kind": "CollectionToSingleton", 
  "name": "SideInput-s15", 
  "properties": {
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}
  ]
}, 
"output_name": "out", 
"user_name": 
"write/Write/WriteImpl/FinalizeWrite/AsSingleton(InitializeWrite.out.0).output"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s8"
}, 
"user_name": 
"write/Write/WriteImpl/FinalizeWrite/AsSingleton(InitializeWrite.out.0)"
  }
}, 
{
  "kind": "CollectionToSingleton", 
  "name": "SideInput-s16", 
  "properties": {
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitive

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3788

2017-08-18 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2778) Serialization stack error using spark stream

2017-08-18 Thread li yuntian (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

li yuntian updated BEAM-2778:
-
Description: 
options..
Pipeline pipeline = Pipeline.create(options);
KafkaIO.Read<String, String> read = KafkaIO.<String, String>read()
    .withBootstrapServers("10.139.7.xx:9092")
    .withTopics(Collections.singletonList("test"))
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class);
PCollection<String> kafkaJsonPc = pipeline.apply(read.withoutMetadata())
    .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply(Values.<String>create());
kafkaJsonPc.apply(WithKeys.of("global"))
    .apply(GroupByKey.<String, String>create());

I get errors. If I delete ".apply(GroupByKey.create())", everything is fine.
So I think there may be something wrong with the GroupByKey transform in Spark 
streaming. I found https://issues.apache.org/jira/browse/BEAM-1624 - is this the 
same issue, and when will it be fixed?

errors:
17/08/18 15:31:37 INFO BlockManagerInfo: Added broadcast_42_piece0 in memory on 
localhost:56153 (size: 399.0 B, free: 1804.1 MB)
17/08/18 15:31:37 INFO SparkContext: Created broadcast 42 from broadcast at 
GlobalWatermarkHolder.java:135
17/08/18 15:31:37 WARN JobGenerator: Timed out while stopping the job generator 
(timeout = 5000)
17/08/18 15:31:37 INFO JobGenerator: Waited for jobs to be processed and 
checkpoints to be written
17/08/18 15:31:47 INFO CheckpointWriter: CheckpointWriter executor terminated ? 
false, waited for 10001 ms.
17/08/18 15:31:47 WARN Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at 
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:970)
at org.apache.hadoop.ipc.Client.call(Client.java:1320)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy38.rename(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:396)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy41.rename(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1555)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:522)
at 
org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:238)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/08/18 15:31:47 INFO JobGenerator: Stopped JobGenerator
17/08/18 15:31:47 WARN Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:970)
at org.apache.hadoop.ipc.Client.call(Client.java:1320)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy38.rename(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:396)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy41.rename(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1555)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:522)
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:238)
at java.util.concurrent.T

[jira] [Created] (BEAM-2778) Serialization stack error using spark stream

2017-08-18 Thread li yuntian (JIRA)
li yuntian created BEAM-2778:


 Summary: Serialization stack error using spark stream
 Key: BEAM-2778
 URL: https://issues.apache.org/jira/browse/BEAM-2778
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Affects Versions: 2.0.0
Reporter: li yuntian
Assignee: Amit Sela


options..
Pipeline pipeline = Pipeline.create(options);
KafkaIO.Read<String, String> read = KafkaIO.<String, String>read()
    .withBootstrapServers("10.139.7.xx:9092")
    .withTopics(Collections.singletonList("test"))
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class);
PCollection<String> kafkaJsonPc = pipeline.apply(read.withoutMetadata())
    .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply(Values.<String>create());
kafkaJsonPc.apply(WithKeys.of("global"))
    .apply(GroupByKey.create());

I get this error:
17/08/18 15:31:37 INFO BlockManagerInfo: Added broadcast_42_piece0 in memory on localhost:56153 (size: 399.0 B, free: 1804.1 MB)
17/08/18 15:31:37 INFO SparkContext: Created broadcast 42 from broadcast at GlobalWatermarkHolder.java:135
17/08/18 15:31:37 WARN JobGenerator: Timed out while stopping the job generator (timeout = 5000)
17/08/18 15:31:37 INFO JobGenerator: Waited for jobs to be processed and checkpoints to be written
17/08/18 15:31:47 INFO CheckpointWriter: CheckpointWriter executor terminated ? false, waited for 10001 ms.
17/08/18 15:31:47 WARN Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:970)
at org.apache.hadoop.ipc.Client.call(Client.java:1320)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy38.rename(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:396)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy41.rename(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1555)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:522)
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/08/18 15:31:47 INFO JobGenerator: Stopped JobGenerator
17/08/18 15:31:47 WARN Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:970)
at org.apache.hadoop.ipc.Client.call(Client.java:1320)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy38.rename(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:396)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy41.rename(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1555)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:522)
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool

[GitHub] beam pull request #3734: to merge Pr-3624 with fixups

2017-08-18 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/3734

to merge Pr-3624 with fixups

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam PR-3624

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3734


commit 9d4de1b2f5bf4816cb834e08aae8c127d2dca4cf
Author: Pei He 
Date:   2017-04-06T06:53:04Z

[BEAM-1899] Start jstorm runner module in feature branch.

commit 15ebaf0f77c6194f4f676644a2ff79fb24a1
Author: Pei He 
Date:   2017-04-06T06:55:25Z

[BEAM-1899] Add JStormRunnerRegistrar and empty implementations of 
PipelineRunner, RunnerResult, PipelineOptions.

commit f6a89b0fc2428d2f85e087525a6ddb5361eed4cb
Author: Pei He 
Date:   2017-04-22T07:12:45Z

This closes #2457

commit f1e170a5fa9dc4d462af42f9f382afd0ecd798b6
Author: Pei He 
Date:   2017-04-25T09:37:52Z

Merge branch 'master' upto commit 686b774ceda8bee32032cb421651e8350ca5bf3d 
into jstorm-runner

commit 58d4b97c0a218d01e1b64d5fced693b15d941074
Author: Kenneth Knowles 
Date:   2017-04-25T17:29:18Z

This closes #2672: Merge branch 'master' upto commit 686b774 into 
jstorm-runner

  [BEAM-1993] Remove special unbounded Flink source/sink
  Remove flink-annotations dependency
  Fix Javadoc warnings on Flink Runner
  Enable flink dependency enforcement and make dependencies explicit
  [BEAM-59] Register standard FileSystems wherever we register 
IOChannelFactories
  [BEAM-1991] Sum.SumDoubleFn => Sum.ofDoubles
  clean up description for sdk_location
  Set the Project of a Table Reference at Runtime
  Only compile HIFIO ITs when compiling with java 8.
  Update assertions of source_test_utils from camelcase to 
underscore-separated.
  Add no-else return to pylintrc
  Remove getSideInputWindow
  Remove reference to the isStreaming flag
  Javadoc fixups after style guide changes
  Update Dataflow Worker Version
  [BEAM-1922] Close datasource in JdbcIO when possible
  Fix javadoc warnings
  Add javadoc to getCheckpointMark in UnboundedSource
  Removes final minor usages of OldDoFn outside OldDoFn itself
  [BEAM-1915] Removes use of OldDoFn from Apex
  Update Signature of PTransformOverrideFactory
  [BEAM-1964] Fix lint issues and pylint upgrade
  Rename DoFn.Context#sideOutput to output
  [BEAM-1964] Fix lint issues for linter upgrade -3
  [BEAM-1964] Fix lint issues for linter upgrade -2
  Avoid repackaging bigtable classes in dataflow runner.
  ApexRunner: register standard IOs when deserializing pipeline options
  Add PCollections Utilities
  Free PTransform Names if they are being Replaced
  [BEAM-1347] Update protos related to State API for prototyping purposes.
  Update java8 examples pom files to include maven-shade-plugin.
  fix the simplest typo
  [BEAM-1964] Fix lint issues for linter upgrade
  Merge PR#2423: Add Kubernetes scripts for clusters for Performance and 
Integration tests of Cassandra and ES for Hadoop Input Format IO
  Remove Triggers.java from SDK entirely
  [BEAM-1708] Improve error message when GCP not installed
  Improve gcloud logging message
  [BEAM-1101, BEAM-1068] Remove service account name credential pipeline 
options
  Update user_score.py
  Pin versions in tox script
  Improve Empty Create Default Coder Error Message
  Represent a Pipeline via a list of Top-level Transforms
  Test all Known Coders to ensure they Serialize via URN
  [BEAM-1950] Add missing 'static' keyword to 
MicrobatchSource#initReaderCache
  Move Triggers from sdk-core 

[GitHub] beam pull request #3733: Fix a function serialization bug in AvroIO

2017-08-18 Thread azurezyq
GitHub user azurezyq opened a pull request:

https://github.com/apache/beam/pull/3733

Fix a function serialization bug in AvroIO

Hi @jkff , can you please take a look?

If a ValueProvider is specified as the outputPrefix, a 
NestedValueProvider is created with an enclosed SerializableFunction that 
performs some translation work.
However, in the original code the function is created as an anonymous 
class, which captures the enclosing TypedWrite instance as "this"; that 
instance contains a non-serializable member, Schema. When the pipeline runs, 
expand() throws an exception because the function is not serializable.
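
The capture problem described above can be reproduced outside Beam. The 
following is a minimal, hypothetical sketch (the class names `CaptureDemo`, 
`SerFn`, `Schema`, and `Translate` are illustrative stand-ins, not Beam's 
actual types): an anonymous inner class keeps an implicit reference to its 
enclosing instance, so serializing the function also serializes every field 
of the outer object, including non-serializable ones. A static nested class 
(or a non-capturing lambda) avoids the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class CaptureDemo {
    // Stand-in for Avro's Schema: deliberately NOT Serializable.
    static class Schema {}

    // Stand-in for Beam's SerializableFunction.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    static class TypedWrite implements Serializable {
        Schema schema = new Schema(); // non-serializable member

        // Buggy: the anonymous class captures TypedWrite.this,
        // dragging the Schema field into the serialized graph.
        SerFn<String, String> buggy() {
            return new SerFn<String, String>() {
                @Override public String apply(String s) { return s + "-out"; }
            };
        }

        // Fixed: a static nested class holds no reference to the
        // enclosing instance, so only its own (empty) state is written.
        SerFn<String, String> fixed() {
            return new Translate();
        }
    }

    static class Translate implements SerFn<String, String> {
        @Override public String apply(String s) { return s + "-out"; }
    }

    // Returns true iff the object round-trips through Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        TypedWrite w = new TypedWrite();
        System.out.println(serializes(w.buggy())); // false: Schema dragged in
        System.out.println(serializes(w.fixed())); // true: nothing captured
    }
}
```

This is the same shape of fix the pull request applies: move the translation 
function out of the enclosing instance's scope so serialization no longer 
touches the Schema field.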

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/azurezyq/beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3733


commit cd43044d7b80a3c36b2cb9fa18665ff12bb77760
Author: Yunqing Zhou 
Date:   2017-08-18T06:17:52Z

Fix a bug in AvroIO, in which a SerializableFunction is created with a 
context containing a non-serializable member (Schema)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---