[jira] [Assigned] (BEAM-407) Inconsistent synchronization in OffsetRangeTracker.copy

2017-08-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reassigned BEAM-407:
-

Assignee: Justin Tumale

> Inconsistent synchronization in OffsetRangeTracker.copy
> ---
>
> Key: BEAM-407
> URL: https://issues.apache.org/jira/browse/BEAM-407
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Justin Tumale
>Priority: Minor
>  Labels: findbugs, newbie, starter
>
> [FindBugs 
> IS2_INCONSISTENT_SYNC|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L148]:
>  Inconsistent synchronization
> Applies to: 
> [OffsetRangeTracker.copy|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/core/src/main/java/org/apache/beam/sdk/io/range/OffsetRangeTracker.java#L263].
>  Its mutating methods are all marked synchronized otherwise.
> This is a good starter bug. When fixing, please remove the corresponding 
> entries from 
> [findbugs-filter.xml|https://github.com/apache/incubator-beam/blob/master/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml]
>  and verify the build passes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2781) Should have a canonical Compression enum

2017-08-30 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2781.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> Should have a canonical Compression enum
> 
>
> Key: BEAM-2781
> URL: https://issues.apache.org/jira/browse/BEAM-2781
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 2.2.0
>
>
> There are multiple equivalent enums in the Beam Java SDK representing a 
> compresison type:
> TextIO.CompressionType
> TFRecordIO.CompressionType
> XmlIO.CompressionType
> FileBasedSink.CompressionType
> CompressedSource.CompressionMode
> This is ugly and we should unify them. That is also necessary to enable 
> authors of new file-based IOs to support compression without duplicating this 
> functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2632) TextIOReadTest creates pipelines with non-unique application names

2017-08-30 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2632:
--
Summary: TextIOReadTest creates pipelines with non-unique application names 
 (was: TextIOReadTest create pipelines with non-unique application names)

> TextIOReadTest creates pipelines with non-unique application names
> --
>
> Key: BEAM-2632
> URL: https://issues.apache.org/jira/browse/BEAM-2632
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Huafeng Wang
>Priority: Trivial
>  Labels: newbie, starter
>
> The test {{TextIOReadTest}} uses a loop to create a few tests within a single 
> test method. This results in a pipeline with non-unique applied transform 
> nodes.
> Perhaps the best way to fix this is to use a JUnit {{Paramaterized}} test 
> suite, or multiple. It does seem that the test is basically doing the full 
> product of empty/tiny/large with various compression types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2516) User reports 4 minutes to process 1 million line CSV in DirectRunner

2017-08-30 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148418#comment-16148418
 ] 

Kenneth Knowles commented on BEAM-2516:
---

I think for 2.2.0 it is best to remove the translation to/from a proto by 
hiding it behind PipelineOptions.

There's a lot of overhead right now because of the impedance mismatch between 
the parts that are still Java-specific and the parts which are SDK-agnostic. In 
the full story for the portability framework, the DoFns and other UDFs can't 
even be deserialized, but shipped to the SDK harness. The harness will own the 
caching, so it probably doesn't make sense to add it to the DirectRunner unless 
there's one silly repeated deserialization we can eliminate. Based on the 
profiling results, perhaps there is, but no need to block anything on it.

> User reports 4 minutes to process 1 million line CSV in DirectRunner
> 
>
> Key: BEAM-2516
> URL: https://issues.apache.org/jira/browse/BEAM-2516
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Priority: Minor
> Fix For: 2.2.0
>
>
> https://stackoverflow.com/questions/44736414/simple-apache-beam-manipulations-work-very-slow
> I don't know what the expectation are here, so I wasn't ready to say this is 
> WAI. Low priority since it isn't what the runner is for anyhow, but this 
> seems like the scale of data that should be snappy. Worth investigating, or 
> maybe you can quickly indicate why it is expected?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3876

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1189) Add guide for release verifiers in the release guide

2017-08-30 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148407#comment-16148407
 ] 

Kenneth Knowles commented on BEAM-1189:
---

Worth mentioning this start on a sign-up sheet template, so we don't duplicate 
all the work: 
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit?usp=sharing

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2830) Main thread needs to re-throw worker exceptions for ValidatesRunner tests that expects exceptions.

2017-08-30 Thread Pei He (JIRA)
Pei He created BEAM-2830:


 Summary: Main thread needs to re-throw worker exceptions for 
ValidatesRunner tests that expects exceptions.
 Key: BEAM-2830
 URL: https://issues.apache.org/jira/browse/BEAM-2830
 Project: Beam
  Issue Type: New Feature
  Components: runner-mapreduce
Reporter: Pei He






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2783) Support Counters/Metrics and run ValidatesRunner tests

2017-08-30 Thread Pei He (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He reassigned BEAM-2783:


Assignee: Pei He

> Support Counters/Metrics and run ValidatesRunner tests
> --
>
> Key: BEAM-2783
> URL: https://issues.apache.org/jira/browse/BEAM-2783
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-mapreduce
>Reporter: Pei He
>Assignee: Pei He
>
> It is important to be able to run ValidatesRunner tests.
> And, we need wire MapReduce runner with the framework's counters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #4689

2017-08-30 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4688

2017-08-30 Thread Apache Jenkins Server
See 


Changes:

[altay] Add a log message to ValueError in AsSingleton

--
[...truncated 3.69 MB...]
2017-08-31T01:10:28.638 [INFO] Excluding it.unimi.dsi:fastutil:jar:7.0.6 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.apex:apex-shaded-ning19:jar:1.0.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding org.apache.apex:apex-engine:jar:3.6.0 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding org.apache.bval:bval-jsr303:jar:0.5 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding org.apache.bval:bval-core:jar:0.5 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.apex:apex-bufferserver:jar:3.6.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.httpcomponents:httpclient:jar:4.3.6 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.httpcomponents:httpcore:jar:4.3.3 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.sun.jersey.contribs:jersey-apache-client4:jar:1.9 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.hadoop:hadoop-yarn-client:jar:2.6.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding commons-lang:commons-lang:jar:2.6 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding commons-cli:commons-cli:jar:1.2 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding log4j:log4j:jar:1.2.17 from the shaded 
jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.hadoop:hadoop-annotations:jar:2.6.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.hadoop:hadoop-yarn-api:jar:2.6.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.hadoop:hadoop-yarn-common:jar:2.6.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.2 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding javax.xml.stream:stax-api:jar:1.0-2 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.google.inject.extensions:guice-servlet:jar:3.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding com.google.inject:guice:jar:3.0 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding javax.inject:javax.inject:jar:1 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding aopalliance:aopalliance:jar:1.0 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding com.sun.jersey:jersey-server:jar:1.9 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding asm:asm:jar:3.1 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.sun.jersey.contribs:jersey-guice:jar:1.9 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding jline:jline:jar:2.11 from the shaded 
jar.
2017-08-31T01:10:28.638 [INFO] Excluding org.apache.ant:ant:jar:1.9.2 from the 
shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding org.apache.ant:ant-launcher:jar:1.9.2 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding net.engio:mbassador:jar:1.1.9 from the 
shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding net.lingala.zip4j:zip4j:jar:1.3.2 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding commons-codec:commons-codec:jar:1.10 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.xbean:xbean-asm5-shaded:jar:4.3 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding org.jctools:jctools-core:jar:1.1 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.beam:beam-sdks-common-runner-api:jar:2.2.0-SNAPSHOT from the shaded 
jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Including com.google.guava:guava:jar:20.0 in the 
shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding io.grpc:grpc-core:jar:1.2.0 from the 
shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.google.errorprone:error_prone_annotations:jar:2.0.15 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding io.grpc:grpc-context:jar:1.2.0 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.google.instrumentation:instrumentation-api:jar:0.3.0 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding io.grpc:grpc-protobuf:jar:1.2.0 from 
the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding io.grpc:grpc-protobuf-lite:jar:1.2.0 
from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the 
shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
org.apache.beam:beam-sdks-java-core:jar:2.2.0-SNAPSHOT from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-core:jar:2.8.9 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9 from the shaded jar.
2017-08-31T01:10:28.638 [INFO] Excluding 

[jira] [Commented] (BEAM-2829) Add ability to set job labels in DataflowPipelineOptions

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148285#comment-16148285
 ] 

ASF GitHub Bot commented on BEAM-2829:
--

GitHub user zongweiz opened a pull request:

https://github.com/apache/beam/pull/3797

[BEAM-2829] Add ability to set job labels for Dataflow runners in BEAM Java 
SDK

Dataflow runner supports job labels specified by users and the labels will 
be populated to billing records and make query easier. This change enables BEAM 
Java SDK to use job labels on Dataflow runners. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zongweiz/beam billing-label-java

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3797


commit 5d09ac483e9eacf30b4174c19f1b5c7a2fe967d4
Author: Zongwei Zhou 
Date:   2017-08-31T00:45:34Z

Add ability to set job labels to BEAM Java SDK




> Add ability to set job labels in DataflowPipelineOptions
> 
>
> Key: BEAM-2829
> URL: https://issues.apache.org/jira/browse/BEAM-2829
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Zongwei Zhou
>Assignee: Zongwei Zhou
>Priority: Minor
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Enable setting job labels --labels in DataflowPipelineOptions (earlier 
> Dataflow SDK 1.x supports this)
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3797: [BEAM-2829] Add ability to set job labels for Dataf...

2017-08-30 Thread zongweiz
GitHub user zongweiz opened a pull request:

https://github.com/apache/beam/pull/3797

[BEAM-2829] Add ability to set job labels for Dataflow runners in BEAM Java 
SDK

Dataflow runner supports job labels specified by users and the labels will 
be populated to billing records and make query easier. This change enables BEAM 
Java SDK to use job labels on Dataflow runners. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zongweiz/beam billing-label-java

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3797


commit 5d09ac483e9eacf30b4174c19f1b5c7a2fe967d4
Author: Zongwei Zhou 
Date:   2017-08-31T00:45:34Z

Add ability to set job labels to BEAM Java SDK




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2781) Should have a canonical Compression enum

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148273#comment-16148273
 ] 

ASF GitHub Bot commented on BEAM-2781:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3737


> Should have a canonical Compression enum
> 
>
> Key: BEAM-2781
> URL: https://issues.apache.org/jira/browse/BEAM-2781
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> There are multiple equivalent enums in the Beam Java SDK representing a 
> compresison type:
> TextIO.CompressionType
> TFRecordIO.CompressionType
> XmlIO.CompressionType
> FileBasedSink.CompressionType
> CompressedSource.CompressionMode
> This is ugly and we should unify them. That is also necessary to enable 
> authors of new file-based IOs to support compression without duplicating this 
> functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3737: [BEAM-2781] Adds a canonical Compression enum for f...

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3737


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam git commit: This closes #3737: [BEAM-2781] Adds a canonical Compression enum for file-based IOs

2017-08-30 Thread jkff
This closes #3737: [BEAM-2781] Adds a canonical Compression enum for file-based 
IOs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5cb7be78
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5cb7be78
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5cb7be78

Branch: refs/heads/master
Commit: 5cb7be78bf1871e8ca17cbe00ad1d154f20f9c22
Parents: afe8b0e 54489f0
Author: Eugene Kirpichov 
Authored: Wed Aug 30 17:42:17 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 17:42:17 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroSink.java   |   2 +-
 .../apache/beam/sdk/io/CompressedSource.java| 292 ++-
 .../org/apache/beam/sdk/io/Compression.java | 228 +++
 .../org/apache/beam/sdk/io/FileBasedSink.java   | 113 +++
 .../java/org/apache/beam/sdk/io/TFRecordIO.java | 153 --
 .../java/org/apache/beam/sdk/io/TextIO.java | 178 +--
 .../beam/sdk/io/CompressedSourceTest.java   |  17 +-
 .../apache/beam/sdk/io/FileBasedSinkTest.java   |  41 ++-
 .../java/org/apache/beam/sdk/io/SimpleSink.java |  23 +-
 .../org/apache/beam/sdk/io/TFRecordIOTest.java  |  35 ++-
 .../org/apache/beam/sdk/io/TextIOReadTest.java  |  81 +++--
 .../org/apache/beam/sdk/io/WriteFilesTest.java  |   9 +-
 .../java/org/apache/beam/sdk/io/xml/XmlIO.java  |  96 +++---
 13 files changed, 672 insertions(+), 596 deletions(-)
--




[2/3] beam git commit: Adds a canonical Compression enum for file-based IOs

2017-08-30 Thread jkff
Adds a canonical Compression enum for file-based IOs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/54489f0d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/54489f0d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/54489f0d

Branch: refs/heads/master
Commit: 54489f0d52e354d8233bf297cce6ce451a05f6a5
Parents: afe8b0e
Author: Eugene Kirpichov 
Authored: Fri Aug 18 16:17:20 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 17:40:52 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroSink.java   |   2 +-
 .../apache/beam/sdk/io/CompressedSource.java| 292 ++-
 .../org/apache/beam/sdk/io/Compression.java | 228 +++
 .../org/apache/beam/sdk/io/FileBasedSink.java   | 113 +++
 .../java/org/apache/beam/sdk/io/TFRecordIO.java | 153 --
 .../java/org/apache/beam/sdk/io/TextIO.java | 178 +--
 .../beam/sdk/io/CompressedSourceTest.java   |  17 +-
 .../apache/beam/sdk/io/FileBasedSinkTest.java   |  41 ++-
 .../java/org/apache/beam/sdk/io/SimpleSink.java |  23 +-
 .../org/apache/beam/sdk/io/TFRecordIOTest.java  |  35 ++-
 .../org/apache/beam/sdk/io/TextIOReadTest.java  |  81 +++--
 .../org/apache/beam/sdk/io/WriteFilesTest.java  |   9 +-
 .../java/org/apache/beam/sdk/io/xml/XmlIO.java  |  96 +++---
 13 files changed, 672 insertions(+), 596 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/54489f0d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
index acd3ea6..888db85 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
@@ -40,7 +40,7 @@ class AvroSink extends 
FileBasedSink 
dynamicDestinations,
   boolean genericRecords) {
 // Avro handle compression internally using the codec.
-super(outputPrefix, dynamicDestinations, CompressionType.UNCOMPRESSED);
+super(outputPrefix, dynamicDestinations, Compression.UNCOMPRESSED);
 this.dynamicDestinations = dynamicDestinations;
 this.genericRecords = genericRecords;
   }

http://git-wip-us.apache.org/repos/asf/beam/blob/54489f0d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java
index 6943a02..ae55d80 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CompressedSource.java
@@ -20,28 +20,17 @@ package org.apache.beam.sdk.io;
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
-import com.google.common.io.ByteStreams;
-import com.google.common.primitives.Ints;
 import java.io.IOException;
-import java.io.InputStream;
-import java.io.PushbackInputStream;
 import java.io.Serializable;
 import java.nio.ByteBuffer;
-import java.nio.channels.Channels;
 import java.nio.channels.ReadableByteChannel;
 import java.util.NoSuchElementException;
-import java.util.zip.GZIPInputStream;
-import java.util.zip.ZipEntry;
-import java.util.zip.ZipInputStream;
 import javax.annotation.concurrent.GuardedBy;
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.io.fs.MatchResult.Metadata;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.transforms.display.DisplayData;
-import 
org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
-import 
org.apache.commons.compress.compressors.deflate.DeflateCompressorInputStream;
-import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
 import org.joda.time.Instant;
 
 /**
@@ -54,21 +43,20 @@ import org.joda.time.Instant;
  * FileBasedSource mySource = ...;
  * PCollection collection = p.apply(Read.from(CompressedSource
  * .from(mySource)
- * .withDecompression(CompressedSource.CompressionMode.GZIP)));
+ * .withCompression(Compression.GZIP)));
  * } 
  *
- * Supported compression algorithms are {@link CompressionMode#GZIP},
- * {@link CompressionMode#BZIP2}, {@link CompressionMode#ZIP} and {@link 
CompressionMode#DEFLATE}.
- * User-defined compression types are supported by implementing

[1/3] beam git commit: Adds a canonical Compression enum for file-based IOs

2017-08-30 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master afe8b0ea1 -> 5cb7be78b


http://git-wip-us.apache.org/repos/asf/beam/blob/54489f0d/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java
index aa6090d..65253f9 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOReadTest.java
@@ -19,12 +19,12 @@ package org.apache.beam.sdk.io;
 
 import static org.apache.beam.sdk.TestUtils.LINES_ARRAY;
 import static org.apache.beam.sdk.TestUtils.NO_LINES_ARRAY;
-import static org.apache.beam.sdk.io.TextIO.CompressionType.AUTO;
-import static org.apache.beam.sdk.io.TextIO.CompressionType.BZIP2;
-import static org.apache.beam.sdk.io.TextIO.CompressionType.DEFLATE;
-import static org.apache.beam.sdk.io.TextIO.CompressionType.GZIP;
-import static org.apache.beam.sdk.io.TextIO.CompressionType.UNCOMPRESSED;
-import static org.apache.beam.sdk.io.TextIO.CompressionType.ZIP;
+import static org.apache.beam.sdk.io.Compression.AUTO;
+import static org.apache.beam.sdk.io.Compression.BZIP2;
+import static org.apache.beam.sdk.io.Compression.DEFLATE;
+import static org.apache.beam.sdk.io.Compression.GZIP;
+import static org.apache.beam.sdk.io.Compression.UNCOMPRESSED;
+import static org.apache.beam.sdk.io.Compression.ZIP;
 import static 
org.apache.beam.sdk.transforms.Watch.Growth.afterTimeSinceNewOutput;
 import static 
org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasDisplayItem;
 import static 
org.apache.beam.sdk.transforms.display.DisplayDataMatchers.hasValue;
@@ -63,7 +63,6 @@ import java.util.zip.ZipEntry;
 import java.util.zip.ZipOutputStream;
 import org.apache.beam.sdk.coders.StringUtf8Coder;
 import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
-import org.apache.beam.sdk.io.TextIO.CompressionType;
 import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
@@ -121,7 +120,7 @@ public class TextIOReadTest {
 
   @Rule public ExpectedException expectedException = ExpectedException.none();
 
-  private static File writeToFile(List lines, String filename, 
CompressionType compression)
+  private static File writeToFile(List lines, String filename, 
Compression compression)
   throws IOException {
 File file = tempFolder.resolve(filename).toFile();
 OutputStream output = new FileOutputStream(file);
@@ -153,19 +152,19 @@ public class TextIOReadTest {
   public static void setupClass() throws IOException {
 tempFolder = Files.createTempDirectory("TextIOTest");
 // empty files
-emptyTxt = writeToFile(EMPTY, "empty.txt", CompressionType.UNCOMPRESSED);
+emptyTxt = writeToFile(EMPTY, "empty.txt", UNCOMPRESSED);
 emptyGz = writeToFile(EMPTY, "empty.gz", GZIP);
 emptyBzip2 = writeToFile(EMPTY, "empty.bz2", BZIP2);
 emptyZip = writeToFile(EMPTY, "empty.zip", ZIP);
 emptyDeflate = writeToFile(EMPTY, "empty.deflate", DEFLATE);
 // tiny files
-tinyTxt = writeToFile(TINY, "tiny.txt", CompressionType.UNCOMPRESSED);
+tinyTxt = writeToFile(TINY, "tiny.txt", UNCOMPRESSED);
 tinyGz = writeToFile(TINY, "tiny.gz", GZIP);
 tinyBzip2 = writeToFile(TINY, "tiny.bz2", BZIP2);
 tinyZip = writeToFile(TINY, "tiny.zip", ZIP);
 tinyDeflate = writeToFile(TINY, "tiny.deflate", DEFLATE);
 // large files
-largeTxt = writeToFile(LARGE, "large.txt", CompressionType.UNCOMPRESSED);
+largeTxt = writeToFile(LARGE, "large.txt", UNCOMPRESSED);
 largeGz = writeToFile(LARGE, "large.gz", GZIP);
 largeBzip2 = writeToFile(LARGE, "large.bz2", BZIP2);
 largeZip = writeToFile(LARGE, "large.zip", ZIP);
@@ -235,7 +234,7 @@ public class TextIOReadTest {
 
   @Test
   public void testReadDisplayData() {
-TextIO.Read read = TextIO.read().from("foo.*").withCompressionType(BZIP2);
+TextIO.Read read = TextIO.read().from("foo.*").withCompression(BZIP2);
 
 DisplayData displayData = DisplayData.from(read);
 
@@ -274,11 +273,11 @@ public class TextIOReadTest {
   }
 
   @Test
-  public void testCompressionTypeIsSet() throws Exception {
+  public void testCompressionIsSet() throws Exception {
 TextIO.Read read = TextIO.read().from("/tmp/test");
-assertEquals(AUTO, read.getCompressionType());
-read = TextIO.read().from("/tmp/test").withCompressionType(GZIP);
-assertEquals(GZIP, read.getCompressionType());
+assertEquals(AUTO, read.getCompression());
+read = TextIO.read().from("/tmp/test").withCompression(GZIP);
+assertEquals(GZIP, read.getCompression());
   }
 
   /**
@@ -299,34 +298,34 @@ public class TextIOReadTest {
*
* The transforms being verified are:
* 
-   *   

Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #4686

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-2753) File DynamicDestinations side inputs don't work with sharding

2017-08-30 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2753.
--
Resolution: Fixed

> File DynamicDestinations side inputs don't work with sharding
> -
>
> Key: BEAM-2753
> URL: https://issues.apache.org/jira/browse/BEAM-2753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 2.2.0
>
>
> WriteWithShardingFactory uses PTransformReplacements.getSingletonMaininput 
> https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WriteWithShardingFactory.java#L74
> However if the dynamic destinations have a side input, then the transform has 
> more than 1 input and the function fails:
> Exception in thread "main" java.lang.IllegalArgumentException: Got multiple 
> inputs that are not additional inputs for a singleton main input: Avro schema 
> side input/ParMultiDo(Anonymous).out0 [PCollection] and Run read all/Execute 
> queries/ParMultiDo(NaiveSpannerRead).out0 [PCollection]
>   at 
> org.apache.beam.runners.direct.repackaged.runners.core.construction.java.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:383)
>   at 
> org.apache.beam.runners.direct.repackaged.runners.core.construction.PTransformReplacements.getSingletonMainInput(PTransformReplacements.java:50)
>   at 
> org.apache.beam.runners.direct.repackaged.runners.core.construction.PTransformReplacements.getSingletonMainInput(PTransformReplacements.java:41)
>   at 
> org.apache.beam.runners.direct.WriteWithShardingFactory.getReplacementTransform(WriteWithShardingFactory.java:74)
>   at org.apache.beam.sdk.Pipeline.applyReplacement(Pipeline.java:540)
>   at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:280)
>   at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:201)
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:169)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
> This is not caught by unit tests because unit tests specify withoutSharding().
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java#L644
> CC: [~mkhadikov]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2803) JdbcIO read is very slow when query return a lot of rows

2017-08-30 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-2803:
--

Assignee: Eugene Kirpichov  (was: Jean-Baptiste Onofré)

> JdbcIO read is very slow when query return a lot of rows
> 
>
> Key: BEAM-2803
> URL: https://issues.apache.org/jira/browse/BEAM-2803
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: Not applicable
>Reporter: Jérémie Vexiau
>Assignee: Eugene Kirpichov
>  Labels: performance
> Fix For: 2.2.0
>
> Attachments: test1500K.png, test1M.png, test2M.jpg, test500k.png
>
>
> Hi,
> I'm using JdbcIO reader in batch mode with the postgresql driver.
> my select query return more than 5 Millions rows
> using cursors with Statement.setFetchSize().
> these ParDo are OK :
> {code:java}
>   .apply(ParDo.of(new ReadFn<>(this))).setCoder(getCoder())
>   .apply(ParDo.of(new DoFn>() {
> private Random random;
> @Setup
> public void setup() {
>   random = new Random();
> }
> @ProcessElement
> public void processElement(ProcessContext context) {
>   context.output(KV.of(random.nextInt(), context.element()));
> }
>   }))
> {code}
> but reshuffle is very very slow. 
> it must be the GroupByKey with more than 5 millions of Key.
> {code:java}
> .apply(GroupByKey.create())
> {code}
> is there a way to optimize the reshuffle, or use another method to prevent 
> fusion ? 
> thanks in advance,
> edit: 
> I add some tests 
> I use google dataflow as runner, with 1 worker, 2 max, and workerMachineType 
> n1-standard-2
> and  autoscalingAlgorithm THROUGHPUT_BASED
> First one : query return 500 000 results : 
> !test500k.png!
> as we can see,
>  parDo(Read) is about 1300 r/s
> groupByKey is about 1080 r/s
> 2nd : query return 1 000 000 results 
> !test1M.png!
> parDo(read) => 1480 r/s
> groupByKey => 634 r/s
> 3rd : query return 1 500 000 results
> !test1500K.png!
> parDo(read) => 1700 r/s
> groupByKey => 565 r/s
> 4th query return 2 000 000 results
> !test2M.jpg!
> parDo(read) => 1485 r/s
> groupByKey => 537 r/s
> As we can see, groupByKey  rate decrease when number of record are more 
> important.
> ps:  2nd worker start just after ParDo(read) is succeed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2803) JdbcIO read is very slow when query return a lot of rows

2017-08-30 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148245#comment-16148245
 ] 

Eugene Kirpichov commented on BEAM-2803:


The conclusion of my experiments is that the combined approach is best, and 
produces performance equivalent to using shuffle service. I'm going to 
implement this.

> JdbcIO read is very slow when query return a lot of rows
> 
>
> Key: BEAM-2803
> URL: https://issues.apache.org/jira/browse/BEAM-2803
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: Not applicable
>Reporter: Jérémie Vexiau
>Assignee: Jean-Baptiste Onofré
>  Labels: performance
> Fix For: 2.2.0
>
> Attachments: test1500K.png, test1M.png, test2M.jpg, test500k.png
>
>
> Hi,
> I'm using JdbcIO reader in batch mode with the postgresql driver.
> my select query return more than 5 Millions rows
> using cursors with Statement.setFetchSize().
> these ParDo are OK :
> {code:java}
>   .apply(ParDo.of(new ReadFn<>(this))).setCoder(getCoder())
>   .apply(ParDo.of(new DoFn>() {
> private Random random;
> @Setup
> public void setup() {
>   random = new Random();
> }
> @ProcessElement
> public void processElement(ProcessContext context) {
>   context.output(KV.of(random.nextInt(), context.element()));
> }
>   }))
> {code}
> but reshuffle is very very slow. 
> it must be the GroupByKey with more than 5 millions of Key.
> {code:java}
> .apply(GroupByKey.create())
> {code}
> is there a way to optimize the reshuffle, or use another method to prevent 
> fusion ? 
> thanks in advance,
> edit: 
> I add some tests 
> I use google dataflow as runner, with 1 worker, 2 max, and workerMachineType 
> n1-standard-2
> and  autoscalingAlgorithm THROUGHPUT_BASED
> First one : query return 500 000 results : 
> !test500k.png!
> as we can see,
>  parDo(Read) is about 1300 r/s
> groupByKey is about 1080 r/s
> 2nd : query return 1 000 000 results 
> !test1M.png!
> parDo(read) => 1480 r/s
> groupByKey => 634 r/s
> 3rd : query return 1 500 000 results
> !test1500K.png!
> parDo(read) => 1700 r/s
> groupByKey => 565 r/s
> 4th query return 2 000 000 results
> !test2M.jpg!
> parDo(read) => 1485 r/s
> groupByKey => 537 r/s
> As we can see, groupByKey  rate decrease when number of record are more 
> important.
> ps:  2nd worker start just after ParDo(read) is succeed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2718) Add bundle retry logic to the DirectRunner

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148240#comment-16148240
 ] 

ASF GitHub Bot commented on BEAM-2718:
--

GitHub user mariapython opened a pull request:

https://github.com/apache/beam/pull/3796

[BEAM-2718] Improve bundle retry display.

While --direct_runner_bundle_retry is experimental, if an exception occurs, 
the user should be informed that the option to use retry is available.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariapython/incubator-beam retry

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3796


commit 593b6f6659973b2774659a66e9be613c535c3f16
Author: Maria Garcia Herrero 
Date:   2017-08-31T00:00:13Z

Improve bundle retry display.




> Add bundle retry logic to the DirectRunner
> --
>
> Key: BEAM-2718
> URL: https://issues.apache.org/jira/browse/BEAM-2718
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Affects Versions: Not applicable
>Reporter: María GH
>Assignee: María GH
>Priority: Minor
> Fix For: Not applicable
>
>
> When processing of a bundle fails, the bundle should be retried 3 times (for 
> a total of 4 attempts to process it).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3796: [BEAM-2718] Improve bundle retry display.

2017-08-30 Thread mariapython
GitHub user mariapython opened a pull request:

https://github.com/apache/beam/pull/3796

[BEAM-2718] Improve bundle retry display.

While --direct_runner_bundle_retry is experimental, if an exception occurs, 
the user should be informed that the option to use retry is available.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mariapython/incubator-beam retry

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3796


commit 593b6f6659973b2774659a66e9be613c535c3f16
Author: Maria Garcia Herrero 
Date:   2017-08-31T00:00:13Z

Improve bundle retry display.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3777: [BEAM-2814] Update error message in AsSingleton

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3777


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3777

2017-08-30 Thread altay
This closes #3777


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/afe8b0ea
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/afe8b0ea
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/afe8b0ea

Branch: refs/heads/master
Commit: afe8b0ea1783ac7df7a435f385af8a4301e5c2e6
Parents: 097aec7 521b2d7
Author: Ahmet Altay 
Authored: Wed Aug 30 16:49:18 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Aug 30 16:49:18 2017 -0700

--
 sdks/python/apache_beam/pvalue.py  | 5 +++--
 sdks/python/apache_beam/pvalue_test.py | 8 
 2 files changed, 11 insertions(+), 2 deletions(-)
--




[jira] [Commented] (BEAM-2814) test_as_singleton_with_different_defaults test is flaky

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148226#comment-16148226
 ] 

ASF GitHub Bot commented on BEAM-2814:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3777


> test_as_singleton_with_different_defaults test is flaky
> ---
>
> Key: BEAM-2814
> URL: https://issues.apache.org/jira/browse/BEAM-2814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Priority: Critical
>
> {{test_as_singleton_with_different_defaults}} is flaky and failed in the post 
> commit test 3013, but there is no related change to trigger this.
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/3013/consoleFull
> (https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2017-08-28_11_08_56-17324181904913254210?project=apache-beam-testing)
> Dataflow error form the console:
>   (b4d390f9f9e033b4): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 582, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File "apache_beam/runners/worker/operations.py", line 294, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10607)
> def start(self):
>   File "apache_beam/runners/worker/operations.py", line 295, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10501)
> with self.scoped_start_state:
>   File "apache_beam/runners/worker/operations.py", line 323, in 
> apache_beam.runners.worker.operations.DoOperation.start 
> (apache_beam/runners/worker/operations.c:10322)
> self.dofn_runner = common.DoFnRunner(
>   File "apache_beam/runners/common.py", line 378, in 
> apache_beam.runners.common.DoFnRunner.__init__ 
> (apache_beam/runners/common.c:10018)
> self.do_fn_invoker = DoFnInvoker.create_invoker(
>   File "apache_beam/runners/common.py", line 154, in 
> apache_beam.runners.common.DoFnInvoker.create_invoker 
> (apache_beam/runners/common.c:5212)
> return PerWindowInvoker(
>   File "apache_beam/runners/common.py", line 219, in 
> apache_beam.runners.common.PerWindowInvoker.__init__ 
> (apache_beam/runners/common.c:7109)
> input_args, input_kwargs, [si[global_window] for si in side_inputs])
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py",
>  line 63, in __getitem__
> _FilteringIterable(self._iterable, target_window), self._view_options)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/pvalue.py", line 
> 332, in _from_runtime_iterable
> 'PCollection with more than one element accessed as '
> ValueError: PCollection with more than one element accessed as a singleton 
> view.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2753) File DynamicDestinations side inputs don't work with sharding

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148217#comment-16148217
 ] 

ASF GitHub Bot commented on BEAM-2753:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3765


> File DynamicDestinations side inputs don't work with sharding
> -
>
> Key: BEAM-2753
> URL: https://issues.apache.org/jira/browse/BEAM-2753
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 2.2.0
>
>
> WriteWithShardingFactory uses PTransformReplacements.getSingletonMaininput 
> https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WriteWithShardingFactory.java#L74
> However if the dynamic destinations have a side input, then the transform has 
> more than 1 input and the function fails:
> Exception in thread "main" java.lang.IllegalArgumentException: Got multiple 
> inputs that are not additional inputs for a singleton main input: Avro schema 
> side input/ParMultiDo(Anonymous).out0 [PCollection] and Run read all/Execute 
> queries/ParMultiDo(NaiveSpannerRead).out0 [PCollection]
>   at 
> org.apache.beam.runners.direct.repackaged.runners.core.construction.java.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:383)
>   at 
> org.apache.beam.runners.direct.repackaged.runners.core.construction.PTransformReplacements.getSingletonMainInput(PTransformReplacements.java:50)
>   at 
> org.apache.beam.runners.direct.repackaged.runners.core.construction.PTransformReplacements.getSingletonMainInput(PTransformReplacements.java:41)
>   at 
> org.apache.beam.runners.direct.WriteWithShardingFactory.getReplacementTransform(WriteWithShardingFactory.java:74)
>   at org.apache.beam.sdk.Pipeline.applyReplacement(Pipeline.java:540)
>   at org.apache.beam.sdk.Pipeline.replace(Pipeline.java:280)
>   at org.apache.beam.sdk.Pipeline.replaceAll(Pipeline.java:201)
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:169)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
> This is not caught by unit tests because unit tests specify withoutSharding().
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java#L644
> CC: [~mkhadikov]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3765: [BEAM-2753] Fixes translation of WriteFiles side in...

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3765


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3765: [BEAM-2753] Fixes translation of WriteFiles side inputs

2017-08-30 Thread jkff
This closes #3765: [BEAM-2753] Fixes translation of WriteFiles side inputs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/097aec7a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/097aec7a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/097aec7a

Branch: refs/heads/master
Commit: 097aec7a36a01145e1f4e0332a3172ae9243bfa7
Parents: 1cd87e3 783f26f
Author: Eugene Kirpichov 
Authored: Wed Aug 30 16:29:10 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 16:29:10 2017 -0700

--
 .../core/construction/PipelineTranslation.java  |  55 ++
 .../direct/WriteWithShardingFactory.java|  13 +--
 .../java/org/apache/beam/sdk/io/AvroIOTest.java | 106 +--
 3 files changed, 112 insertions(+), 62 deletions(-)
--




[1/2] beam git commit: [BEAM-2753] Fixes translation of WriteFiles side inputs

2017-08-30 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master 1cd87e325 -> 097aec7a3


[BEAM-2753] Fixes translation of WriteFiles side inputs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/783f26f3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/783f26f3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/783f26f3

Branch: refs/heads/master
Commit: 783f26f3a80a3f2a9d5a0fafc33778e046fe6b36
Parents: 1cd87e3
Author: Eugene Kirpichov 
Authored: Fri Aug 25 14:49:07 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 16:29:05 2017 -0700

--
 .../core/construction/PipelineTranslation.java  |  55 ++
 .../direct/WriteWithShardingFactory.java|  13 +--
 .../java/org/apache/beam/sdk/io/AvroIOTest.java | 106 +--
 3 files changed, 112 insertions(+), 62 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/783f26f3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
index d928338..8a2faf3 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
@@ -152,30 +152,24 @@ public class PipelineTranslation {
 RunnerApi.FunctionSpec transformSpec = transformProto.getSpec();
 
 // By default, no "additional" inputs, since that is an SDK-specific thing.
-// Only ParDo really separates main from side inputs
+// Only ParDo and WriteFiles really separate main from side inputs
 Map additionalInputs = Collections.emptyMap();
 
-// TODO: ParDoTranslator should own it - 
https://issues.apache.org/jira/browse/BEAM-2674
+// TODO: ParDoTranslation should own it - 
https://issues.apache.org/jira/browse/BEAM-2674
 if 
(transformSpec.getUrn().equals(PTransformTranslation.PAR_DO_TRANSFORM_URN)) {
-  RunnerApi.ParDoPayload payload =
-  RunnerApi.ParDoPayload.parseFrom(transformSpec.getPayload());
-
-  List views = new ArrayList<>();
-  for (Map.Entry sideInputEntry :
-  payload.getSideInputsMap().entrySet()) {
-String localName = sideInputEntry.getKey();
-RunnerApi.SideInput sideInput = sideInputEntry.getValue();
-PCollection pCollection =
-(PCollection) checkNotNull(rehydratedInputs.get(new 
TupleTag<>(localName)));
-views.add(
-ParDoTranslation.viewFromProto(
-sideInputEntry.getValue(),
-sideInputEntry.getKey(),
-pCollection,
-transformProto,
-rehydratedComponents));
-  }
-  additionalInputs = PCollectionViews.toAdditionalInputs(views);
+  RunnerApi.ParDoPayload payload = 
RunnerApi.ParDoPayload.parseFrom(transformSpec.getPayload());
+  additionalInputs =
+  sideInputMapToAdditionalInputs(
+  transformProto, rehydratedComponents, rehydratedInputs, 
payload.getSideInputsMap());
+}
+
+// TODO: WriteFilesTranslation should own it - 
https://issues.apache.org/jira/browse/BEAM-2674
+if 
(transformSpec.getUrn().equals(PTransformTranslation.WRITE_FILES_TRANSFORM_URN))
 {
+  RunnerApi.WriteFilesPayload payload =
+  RunnerApi.WriteFilesPayload.parseFrom(transformSpec.getPayload());
+  additionalInputs =
+  sideInputMapToAdditionalInputs(
+  transformProto, rehydratedComponents, rehydratedInputs, 
payload.getSideInputsMap());
 }
 
 // TODO: CombineTranslator should own it - 
https://issues.apache.org/jira/browse/BEAM-2674
@@ -216,6 +210,25 @@ public class PipelineTranslation {
 }
   }
 
+  private static Map sideInputMapToAdditionalInputs(
+  RunnerApi.PTransform transformProto,
+  RehydratedComponents rehydratedComponents,
+  Map rehydratedInputs,
+  Map sideInputsMap)
+  throws IOException {
+List views = new ArrayList<>();
+for (Map.Entry sideInputEntry : 
sideInputsMap.entrySet()) {
+  String localName = sideInputEntry.getKey();
+  RunnerApi.SideInput sideInput = sideInputEntry.getValue();
+  PCollection pCollection =
+  (PCollection) checkNotNull(rehydratedInputs.get(new 

[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148211#comment-16148211
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3795

[BEAM-1347] Wire up the BeamFnStateGrpcClientCache implementation into the 
ProcessBundleHandler

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject 
line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
Add a BeamFnStateClient that is dependent on whether the State API service 
descriptor is populated.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3795


commit 9e2e999f6ee5e5ffa2e0fc4937d4f4b717f19259
Author: Luke Cwik 
Date:   2017-08-30T20:56:40Z

[BEAM-1347] Wire up the BeamFnStateGrpcClientCache implementation into the 
ProcessBundleHandler

Add a BeamFnStateClient that is dependent on whether the State API service 
descriptor is populated.




> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3795: [BEAM-1347] Wire up the BeamFnStateGrpcClientCache ...

2017-08-30 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3795

[BEAM-1347] Wire up the BeamFnStateGrpcClientCache implementation into the 
ProcessBundleHandler

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject 
line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
Add a BeamFnStateClient that is dependent on whether the State API service 
descriptor is populated.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3795


commit 9e2e999f6ee5e5ffa2e0fc4937d4f4b717f19259
Author: Luke Cwik 
Date:   2017-08-30T20:56:40Z

[BEAM-1347] Wire up the BeamFnStateGrpcClientCache implementation into the 
ProcessBundleHandler

Add a BeamFnStateClient that is dependent on whether the State API service 
descriptor is populated.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148196#comment-16148196
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3788


> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3788: [BEAM-1347] Implement a BeamFnStateClient which com...

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3788


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: [BEAM-1347] Implement a BeamFnStateClient which communicates over gRPC.

2017-08-30 Thread lcwik
[BEAM-1347] Implement a BeamFnStateClient which communicates over gRPC.

This closes #3788


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1cd87e32
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1cd87e32
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1cd87e32

Branch: refs/heads/master
Commit: 1cd87e325798178c2176a9d090dbc5de7f9b46d2
Parents: 585440d fb2d6b5
Author: Luke Cwik 
Authored: Wed Aug 30 16:10:52 2017 -0700
Committer: Luke Cwik 
Committed: Wed Aug 30 16:10:52 2017 -0700

--
 .../org/apache/beam/fn/harness/IdGenerator.java |  33 +++
 .../state/BeamFnStateGrpcClientCache.java   | 173 ++
 .../apache/beam/fn/harness/IdGeneratorTest.java |  40 
 .../state/BeamFnStateGrpcClientCacheTest.java   | 234 +++
 4 files changed, 480 insertions(+)
--




[1/2] beam git commit: [BEAM-1347] Implement a BeamFnStateClient which communicates over gRPC.

2017-08-30 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master 585440d22 -> 1cd87e325


[BEAM-1347] Implement a BeamFnStateClient which communicates over gRPC.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fb2d6b58
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fb2d6b58
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fb2d6b58

Branch: refs/heads/master
Commit: fb2d6b58c065604daedf02a492457ce35bacfde2
Parents: 585440d
Author: Luke Cwik 
Authored: Tue Aug 29 18:31:39 2017 -0700
Committer: Luke Cwik 
Committed: Wed Aug 30 16:10:25 2017 -0700

--
 .../org/apache/beam/fn/harness/IdGenerator.java |  33 +++
 .../state/BeamFnStateGrpcClientCache.java   | 173 ++
 .../apache/beam/fn/harness/IdGeneratorTest.java |  40 
 .../state/BeamFnStateGrpcClientCacheTest.java   | 234 +++
 4 files changed, 480 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/fb2d6b58/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/IdGenerator.java
--
diff --git 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/IdGenerator.java 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/IdGenerator.java
new file mode 100644
index 000..1112f43
--- /dev/null
+++ 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/IdGenerator.java
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness;
+
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * An id generator.
+ *
+ * This encapsulation exists to prevent usage of the wrong method on a 
shared {@link AtomicLong}.
+ */
+public final class IdGenerator {
+  private static final AtomicLong idGenerator = new AtomicLong(-1);
+
+  public static String generate() {
+return Long.toString(idGenerator.getAndDecrement());
+  }
+}

http://git-wip-us.apache.org/repos/asf/beam/blob/fb2d6b58/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java
--
diff --git 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java
 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java
new file mode 100644
index 000..316e3e6
--- /dev/null
+++ 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/BeamFnStateGrpcClientCache.java
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.fn.harness.state;
+
+import io.grpc.ManagedChannel;
+import io.grpc.stub.StreamObserver;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.BiFunction;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import org.apache.beam.fn.harness.data.BeamFnDataGrpcClient;
+import org.apache.beam.fn.v1.BeamFnApi;
+import org.apache.beam.fn.v1.BeamFnApi.ApiServiceDescriptor;
+import org.apache.beam.fn.v1.BeamFnApi.StateRequest;
+import 

Jenkins build is back to normal : beam_PostCommit_Python_Verify #3034

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2803) JdbcIO read is very slow when query return a lot of rows

2017-08-30 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148145#comment-16148145
 ] 

Eugene Kirpichov commented on BEAM-2803:


Side inputs (at least in Dataflow) have the disadvantage that they materialize 
via Avro, which has pretty high granularity of reading parallelism (unlike 
shuffle). I'm currently running some experiments to compare 3 approaches: 
shuffle, side input, and side input THEN shuffle (the latter approach may be 
good because then we're shuffling data that's already distributed across 
multiple workers)

> JdbcIO read is very slow when query return a lot of rows
> 
>
> Key: BEAM-2803
> URL: https://issues.apache.org/jira/browse/BEAM-2803
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: Not applicable
>Reporter: Jérémie Vexiau
>Assignee: Jean-Baptiste Onofré
>  Labels: performance
> Fix For: 2.2.0
>
> Attachments: test1500K.png, test1M.png, test2M.jpg, test500k.png
>
>
> Hi,
> I'm using JdbcIO reader in batch mode with the postgresql driver.
> my select query return more than 5 Millions rows
> using cursors with Statement.setFetchSize().
> these ParDo are OK :
> {code:java}
>   .apply(ParDo.of(new ReadFn<>(this))).setCoder(getCoder())
>   .apply(ParDo.of(new DoFn>() {
> private Random random;
> @Setup
> public void setup() {
>   random = new Random();
> }
> @ProcessElement
> public void processElement(ProcessContext context) {
>   context.output(KV.of(random.nextInt(), context.element()));
> }
>   }))
> {code}
> but reshuffle is very very slow. 
> it must be the GroupByKey with more than 5 millions of Key.
> {code:java}
> .apply(GroupByKey.create())
> {code}
> is there a way to optimize the reshuffle, or use another method to prevent 
> fusion ? 
> thanks in advance,
> edit: 
> I add some tests 
> I use google dataflow as runner, with 1 worker, 2 max, and workerMachineType 
> n1-standard-2
> and  autoscalingAlgorithm THROUGHPUT_BASED
> First one : query return 500 000 results : 
> !test500k.png!
> as we can see,
>  parDo(Read) is about 1300 r/s
> groupByKey is about 1080 r/s
> 2nd : query return 1 000 000 results 
> !test1M.png!
> parDo(read) => 1480 r/s
> groupByKey => 634 r/s
> 3rd : query return 1 500 000 results
> !test1500K.png!
> parDo(read) => 1700 r/s
> groupByKey => 565 r/s
> 4th query return 2 000 000 results
> !test2M.jpg!
> parDo(read) => 1485 r/s
> groupByKey => 537 r/s
> As we can see, groupByKey  rate decrease when number of record are more 
> important.
> ps:  2nd worker start just after ParDo(read) is succeed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2829) Add ability to set job labels in DataflowPipelineOptions

2017-08-30 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-2829:
---

Assignee: Zongwei Zhou  (was: Thomas Groh)

> Add ability to set job labels in DataflowPipelineOptions
> 
>
> Key: BEAM-2829
> URL: https://issues.apache.org/jira/browse/BEAM-2829
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Zongwei Zhou
>Assignee: Zongwei Zhou
>Priority: Minor
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Enable setting job labels --labels in DataflowPipelineOptions (earlier 
> Dataflow SDK 1.x supports this)
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1189) Add guide for release verifiers in the release guide

2017-08-30 Thread Griselda Cuevas Zambrano (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148102#comment-16148102
 ] 

Griselda Cuevas Zambrano commented on BEAM-1189:


Hi there - I'm planning to use the content of the Acceptance Criteria section 
in this doc: 
https://docs.google.com/document/d/1XwojJ4Mj3wSlnBO1YlBs51P8kuGygYRj2lrNrqmAUvo/edit#

To work on this, let me know if I should consider something else. 

G 

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1927) Add tfrecord io to built in IO list

2017-08-30 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148100#comment-16148100
 ] 

Ahmet Altay commented on BEAM-1927:
---

Fixed in: https://github.com/apache/beam-site/pull/303

> Add tfrecord io to built in IO list
> ---
>
> Key: BEAM-1927
> URL: https://issues.apache.org/jira/browse/BEAM-1927
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py, website
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Minor
>  Labels: documentation, newbie, starter
> Fix For: 2.2.0
>
>
> Add tfrecordio 
> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/tfrecordio.py)
>  to built-in io list (https://beam.apache.org/documentation/io/built-in/).
> cc: [~melap]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-1927) Add tfrecord io to built in IO list

2017-08-30 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-1927.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

> Add tfrecord io to built in IO list
> ---
>
> Key: BEAM-1927
> URL: https://issues.apache.org/jira/browse/BEAM-1927
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py, website
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Minor
>  Labels: documentation, newbie, starter
> Fix For: 2.2.0
>
>
> Add tfrecordio 
> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/tfrecordio.py)
>  to built-in io list (https://beam.apache.org/documentation/io/built-in/).
> cc: [~melap]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1927) Add tfrecord io to built in IO list

2017-08-30 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-1927:
-

Assignee: Ahmet Altay

> Add tfrecord io to built in IO list
> ---
>
> Key: BEAM-1927
> URL: https://issues.apache.org/jira/browse/BEAM-1927
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py, website
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Minor
>  Labels: documentation, newbie, starter
> Fix For: 2.2.0
>
>
> Add tfrecordio 
> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/tfrecordio.py)
>  to built-in io list (https://beam.apache.org/documentation/io/built-in/).
> cc: [~melap]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Python_Verify #3033

2017-08-30 Thread Apache Jenkins Server
See 


--
[...truncated 651.77 KB...]
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/mock-2.0.0.tar.gz",
 
"name": "mock-2.0.0.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-36.2.3.zip",
 
"name": "setuptools-36.2.3.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/pbr-3.0.0.tar.gz",
 
"name": "pbr-3.0.0.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/pbr-2.1.0.tar.gz",
 
"name": "pbr-2.1.0.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/pbr-3.1.1.tar.gz",
 
"name": "pbr-3.1.1.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/funcsigs-1.0.2.tar.gz",
 
"name": "funcsigs-1.0.2.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/PyHamcrest-1.9.0.tar.gz",
 
"name": "PyHamcrest-1.9.0.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-36.2.0.zip",
 
"name": "setuptools-36.2.0.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/pbr-1.10.0.tar.gz",
 
"name": "pbr-1.10.0.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-36.2.1.zip",
 
"name": "setuptools-36.2.1.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-36.0.1.zip",
 
"name": "setuptools-36.0.1.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-35.0.0.zip",
 
"name": "setuptools-35.0.0.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/pbr-3.1.0.tar.gz",
 
"name": "pbr-3.1.0.tar.gz"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-34.3.1.zip",
 
"name": "setuptools-34.3.1.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-36.2.5.zip",
 
"name": "setuptools-36.2.5.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-34.3.3.zip",
 
"name": "setuptools-34.3.3.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-34.3.2.zip",
 
"name": "setuptools-34.3.2.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/setuptools-34.1.1.zip",
 
"name": "setuptools-34.1.1.zip"
  }, 
  {
"location": 
"storage.googleapis.com/temp-storage-for-end-to-end-tests/staging-validatesrunner-test/beamapp-jenkins-0830211705-985000.1504127825.985239/pyparsing-2.1.10.zip",
 
"name": 

[jira] [Assigned] (BEAM-1189) Add guide for release verifiers in the release guide

2017-08-30 Thread Griselda Cuevas Zambrano (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Griselda Cuevas Zambrano reassigned BEAM-1189:
--

Assignee: Griselda Cuevas Zambrano

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4685

2017-08-30 Thread Apache Jenkins Server
See 


Changes:

[lcwik] [BEAM-1347] Create value state, combining state, and bag state views

--
[...truncated 98.45 KB...]
2017-08-30T21:32:59.407 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
2017-08-30T21:32:59.426 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/aether/aether-util/1.7/aether-util-1.7.jar
 (106 KB at 411.8 KB/sec)
2017-08-30T21:32:59.426 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/plexus/plexus-cipher/1.4/plexus-cipher-1.4.jar
2017-08-30T21:32:59.430 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-interpolation/1.14/plexus-interpolation-1.14.jar
 (60 KB at 229.4 KB/sec)
2017-08-30T21:32:59.430 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.24/plexus-utils-3.0.24.jar
2017-08-30T21:32:59.436 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-component-annotations/1.6/plexus-component-annotations-1.6.jar
 (5 KB at 15.7 KB/sec)
2017-08-30T21:32:59.436 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/enforcer/enforcer-api/3.0.0-M1/enforcer-api-3.0.0-M1.jar
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
2017-08-30T21:32:59.454 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/plexus/plexus-cipher/1.4/plexus-cipher-1.4.jar
 (14 KB at 46.4 KB/sec)
2017-08-30T21:32:59.454 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/enforcer/enforcer-rules/3.0.0-M1/enforcer-rules-3.0.0-M1.jar
2017-08-30T21:32:59.463 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/enforcer/enforcer-api/3.0.0-M1/enforcer-api-3.0.0-M1.jar
 (12 KB at 39.2 KB/sec)
2017-08-30T21:32:59.463 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar
2017-08-30T21:32:59.486 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.24/plexus-utils-3.0.24.jar
 (242 KB at 764.4 KB/sec)
2017-08-30T21:32:59.487 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.6/commons-codec-1.6.jar
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
2017-08-30T21:32:59.494 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/enforcer/enforcer-rules/3.0.0-M1/enforcer-rules-3.0.0-M1.jar
 (103 KB at 316.9 KB/sec)
2017-08-30T21:32:59.494 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/beanshell/bsh/2.0b4/bsh-2.0b4.jar
2017-08-30T21:32:59.525 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar
 (469 KB at 1320.1 KB/sec)
2017-08-30T21:32:59.525 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-compat/3.0/maven-compat-3.0.jar
2017-08-30T21:32:59.539 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.6/commons-codec-1.6.jar
 (228 KB at 616.0 KB/sec)
2017-08-30T21:32:59.539 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/wagon/wagon-provider-api/1.0-beta-6/wagon-provider-api-1.0-beta-6.jar
2017-08-30T21:32:59.541 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
 (28 KB at 75.2 KB/sec)
2017-08-30T21:32:59.549 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/beanshell/bsh/2.0b4/bsh-2.0b4.jar (276 
KB at 725.8 KB/sec)
2017-08-30T21:32:59.571 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/wagon/wagon-provider-api/1.0-beta-6/wagon-provider-api-1.0-beta-6.jar
 (52 KB at 129.5 KB/sec)
2017-08-30T21:32:59.573 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-compat/3.0/maven-compat-3.0.jar
 (279 KB at 690.4 KB/sec)
2017-08-30T21:33:00.319 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/ibm/icu/icu4j/56.1/icu4j-56.1.jar 
(10731 KB at 9339.3 KB/sec)
[WARNING] Failed to getClass for org.apache.maven.plugins.enforcer.EnforceMojo
[JENKINS] Archiving disabled
2017-08-30T21:33:02.863 [INFO]  
   
2017-08-30T21:33:02.863 [INFO] 

2017-08-30T21:33:02.863 [INFO] Skipping Apache Beam :: Parent
2017-08-30T21:33:02.863 [INFO] This project has been banned from the build due 

[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148084#comment-16148084
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3783


> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3783: [BEAM-1347] Create value state, combining state, an...

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3783


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: [BEAM-1347] Create value state, combining state, and bag state views over the BagUserState.

2017-08-30 Thread lcwik
[BEAM-1347] Create value state, combining state, and bag state views over the 
BagUserState.

This closes #3783


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/585440d2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/585440d2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/585440d2

Branch: refs/heads/master
Commit: 585440d228db2eae841bc92fa0babd9e131ef839
Parents: f6c8405 e0f628c
Author: Luke Cwik 
Authored: Wed Aug 30 14:30:51 2017 -0700
Committer: Luke Cwik 
Committed: Wed Aug 30 14:30:51 2017 -0700

--
 .../apache/beam/fn/harness/FnApiDoFnRunner.java | 380 ++-
 .../beam/fn/harness/FnApiDoFnRunnerTest.java| 229 +++
 .../fn/harness/state/FakeBeamFnStateClient.java |   2 +-
 3 files changed, 605 insertions(+), 6 deletions(-)
--




[1/2] beam git commit: [BEAM-1347] Create value state, combining state, and bag state views over the BagUserState.

2017-08-30 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master f6c840533 -> 585440d22


[BEAM-1347] Create value state, combining state, and bag state views over the 
BagUserState.

Also bind the state persistence to the end of finishBundle.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e0f628cc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e0f628cc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e0f628cc

Branch: refs/heads/master
Commit: e0f628cc7fbf6cbfb46825d6ee7bbc29e0bd66f5
Parents: f6c8405
Author: Luke Cwik 
Authored: Tue Aug 29 10:45:04 2017 -0700
Committer: Luke Cwik 
Committed: Wed Aug 30 14:30:27 2017 -0700

--
 .../apache/beam/fn/harness/FnApiDoFnRunner.java | 380 ++-
 .../beam/fn/harness/FnApiDoFnRunnerTest.java| 229 +++
 .../fn/harness/state/FakeBeamFnStateClient.java |   2 +-
 3 files changed, 605 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/e0f628cc/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
--
diff --git 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
index d325bb2..c361647 100644
--- 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
+++ 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
@@ -18,45 +18,77 @@
 package org.apache.beam.fn.harness;
 
 import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
 
 import com.google.auto.service.AutoService;
+import com.google.common.base.Suppliers;
 import com.google.common.collect.Collections2;
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.ImmutableMultimap;
 import com.google.common.collect.Multimap;
 import com.google.protobuf.ByteString;
+import java.io.IOException;
+import java.util.ArrayList;
 import java.util.Collection;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Iterator;
 import java.util.Map;
 import java.util.Objects;
 import java.util.function.Consumer;
+import java.util.function.Function;
 import java.util.function.Supplier;
 import org.apache.beam.fn.harness.data.BeamFnDataClient;
 import org.apache.beam.fn.harness.fn.ThrowingConsumer;
 import org.apache.beam.fn.harness.fn.ThrowingRunnable;
+import org.apache.beam.fn.harness.state.BagUserState;
 import org.apache.beam.fn.harness.state.BeamFnStateClient;
+import org.apache.beam.fn.v1.BeamFnApi.StateKey;
+import org.apache.beam.fn.v1.BeamFnApi.StateRequest;
+import org.apache.beam.fn.v1.BeamFnApi.StateRequest.Builder;
 import org.apache.beam.runners.core.DoFnRunner;
 import org.apache.beam.runners.core.construction.ParDoTranslation;
 import org.apache.beam.runners.dataflow.util.DoFnInfo;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.KvCoder;
 import org.apache.beam.sdk.common.runner.v1.RunnerApi;
 import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.CombiningState;
+import org.apache.beam.sdk.state.MapState;
+import org.apache.beam.sdk.state.ReadableState;
+import org.apache.beam.sdk.state.ReadableStates;
+import org.apache.beam.sdk.state.SetState;
 import org.apache.beam.sdk.state.State;
+import org.apache.beam.sdk.state.StateBinder;
+import org.apache.beam.sdk.state.StateContext;
+import org.apache.beam.sdk.state.StateSpec;
 import org.apache.beam.sdk.state.TimeDomain;
 import org.apache.beam.sdk.state.Timer;
+import org.apache.beam.sdk.state.ValueState;
+import org.apache.beam.sdk.state.WatermarkHoldState;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.CombineWithContext.CombineFnWithContext;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.DoFn.OnTimerContext;
 import org.apache.beam.sdk.transforms.DoFn.ProcessContext;
 import org.apache.beam.sdk.transforms.reflect.DoFnInvoker;
 import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;
 import org.apache.beam.sdk.transforms.reflect.DoFnSignature;
+import org.apache.beam.sdk.transforms.reflect.DoFnSignature.StateDeclaration;
 import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
 import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
 import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.PaneInfo;
+import 

[jira] [Updated] (BEAM-2803) JdbcIO read is very slow when query return a lot of rows

2017-08-30 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov updated BEAM-2803:
---
Fix Version/s: (was: Not applicable)
   2.2.0

> JdbcIO read is very slow when query return a lot of rows
> 
>
> Key: BEAM-2803
> URL: https://issues.apache.org/jira/browse/BEAM-2803
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: Not applicable
>Reporter: Jérémie Vexiau
>Assignee: Jean-Baptiste Onofré
>  Labels: performance
> Fix For: 2.2.0
>
> Attachments: test1500K.png, test1M.png, test2M.jpg, test500k.png
>
>
> Hi,
> I'm using JdbcIO reader in batch mode with the postgresql driver.
> my select query return more than 5 Millions rows
> using cursors with Statement.setFetchSize().
> these ParDo are OK :
> {code:java}
>   .apply(ParDo.of(new ReadFn<>(this))).setCoder(getCoder())
>   .apply(ParDo.of(new DoFn>() {
> private Random random;
> @Setup
> public void setup() {
>   random = new Random();
> }
> @ProcessElement
> public void processElement(ProcessContext context) {
>   context.output(KV.of(random.nextInt(), context.element()));
> }
>   }))
> {code}
> but reshuffle is very very slow. 
> it must be the GroupByKey with more than 5 millions of Key.
> {code:java}
> .apply(GroupByKey.create())
> {code}
> is there a way to optimize the reshuffle, or use another method to prevent 
> fusion ? 
> thanks in advance,
> edit: 
> I add some tests 
> I use google dataflow as runner, with 1 worker, 2 max, and workerMachineType 
> n1-standard-2
> and  autoscalingAlgorithm THROUGHPUT_BASED
> First one : query return 500 000 results : 
> !test500k.png!
> as we can see,
>  parDo(Read) is about 1300 r/s
> groupByKey is about 1080 r/s
> 2nd : query return 1 000 000 results 
> !test1M.png!
> parDo(read) => 1480 r/s
> groupByKey => 634 r/s
> 3rd : query return 1 500 000 results
> !test1500K.png!
> parDo(read) => 1700 r/s
> groupByKey => 565 r/s
> 4th query return 2 000 000 results
> !test2M.jpg!
> parDo(read) => 1485 r/s
> groupByKey => 537 r/s
> As we can see, groupByKey  rate decrease when number of record are more 
> important.
> ps:  2nd worker start just after ParDo(read) is succeed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2829) Add ability to set job labels in DataflowPipelineOptions

2017-08-30 Thread Zongwei Zhou (JIRA)
Zongwei Zhou created BEAM-2829:
--

 Summary: Add ability to set job labels in DataflowPipelineOptions
 Key: BEAM-2829
 URL: https://issues.apache.org/jira/browse/BEAM-2829
 Project: Beam
  Issue Type: New Feature
  Components: runner-dataflow
Affects Versions: 2.1.0, 2.0.0
Reporter: Zongwei Zhou
Assignee: Thomas Groh
Priority: Minor
 Fix For: 2.2.0


Enable setting job labels --labels in DataflowPipelineOptions (earlier Dataflow 
SDK 1.x supports this)
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2644) Make it easier to test runtime-accessible ValueProvider's

2017-08-30 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-2644.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> Make it easier to test runtime-accessible ValueProvider's
> -
>
> Key: BEAM-2644
> URL: https://issues.apache.org/jira/browse/BEAM-2644
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 2.2.0
>
>
> Many transforms that take ValueProvider's have different codepaths for when 
> the provider is accessible or not. However, as far as I can tell, there is no 
> good way to construct a transform with an inaccessible ValueProvider, and 
> then test how it runs with an actual value supplied.
> The only way I could come up with is mimicking 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/options/ValueProviderTest.java#L202
>  , which is very ugly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4684

2017-08-30 Thread Apache Jenkins Server
See 


Changes:

[ekirpichov] [BEAM-2644] Introduces TestPipeline.newProvider()

--
[...truncated 1.12 MB...]
2017-08-30T20:08:34.222 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/fusesource/sigar/1.6.4/sigar-1.6.4.jar 
(419 KB at 63.4 KB/sec)
2017-08-30T20:08:34.222 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.jar
2017-08-30T20:08:34.276 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/cassandra/cassandra-thrift/3.9/cassandra-thrift-3.9.jar
 (1858 KB at 279.3 KB/sec)
2017-08-30T20:08:34.276 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/caffinitas/ohc/ohc-core/0.4.3/ohc-core-0.4.3.jar
2017-08-30T20:08:34.331 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/caffinitas/ohc/ohc-core/0.4.3/ohc-core-0.4.3.jar
 (125 KB at 18.5 KB/sec)
2017-08-30T20:08:34.331 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/2.2.6/caffeine-2.2.6.jar
2017-08-30T20:08:34.591 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final.jar
 (2219 KB at 318.4 KB/sec)
2017-08-30T20:08:34.591 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.1.1/cassandra-driver-core-3.1.1.jar
2017-08-30T20:08:34.604 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/2.2.6/caffeine-2.2.6.jar
 (926 KB at 132.6 KB/sec)
2017-08-30T20:08:34.735 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.jar
 (2257 KB at 317.3 KB/sec)
2017-08-30T20:08:34.811 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.1.1/cassandra-driver-core-3.1.1.jar
 (1029 KB at 143.0 KB/sec)
2017-08-30T20:08:35.390 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/it/unimi/dsi/fastutil/6.5.7/fastutil-6.5.7.jar
 (16508 KB at 2125.6 KB/sec)
2017-08-30T20:08:36.049 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/storm/storm-core/1.0.1/storm-core-1.0.1.jar
 (19650 KB at 2330.9 KB/sec)
2017-08-30T20:08:36.114 [INFO] Downloading: 
http://conjars.org/repo/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.jar
2017-08-30T20:08:36.114 [INFO] Downloading: 
http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
2017-08-30T20:08:36.115 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
2017-08-30T20:08:36.115 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-core/2.6.3/cascading-core-2.6.3.jar
2017-08-30T20:08:36.116 [INFO] Downloading: 
http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar
2017-08-30T20:08:36.173 [INFO] Downloading: 
http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
2017-08-30T20:08:36.230 [INFO] Downloaded: 
http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar (12 KB at 96.4 
KB/sec)
2017-08-30T20:08:36.230 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
2017-08-30T20:08:36.287 [INFO] Downloaded: 
http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
 (48 KB at 274.1 KB/sec)
2017-08-30T20:08:36.399 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
 (43 KB at 149.6 KB/sec)
2017-08-30T20:08:36.516 [INFO] Downloaded: 
http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
 (230 KB at 573.1 KB/sec)
2017-08-30T20:08:36.529 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
 (246 KB at 593.9 KB/sec)
2017-08-30T20:08:36.777 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-core/2.6.3/cascading-core-2.6.3.jar 
(680 KB at 1024.8 KB/sec)
2017-08-30T20:08:36.779 [INFO] Downloading: 
http://clojars.org/repo/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.jar
2017-08-30T20:08:36.836 [INFO] Downloading: 
http://www.datanucleus.org/downloads/maven2/org/apache/hive/hive-exec/1.2.1/hive-exec-1.2.1.jar
[JENKINS] Archiving disabled
2017-08-30T20:08:38.001 [INFO]  
   
2017-08-30T20:08:38.001 [INFO] 

2017-08-30T20:08:38.001 [INFO] Skipping Apache Beam :: Parent
2017-08-30T20:08:38.001 [INFO] This project has been banned from the build due 
to previous failures.
2017-08-30T20:08:38.001 [INFO] 

[JENKINS] Archiving disabled
[JENKINS] Archiving disabled

Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Apex #2285

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2516) User reports 4 minutes to process 1 million line CSV in DirectRunner

2017-08-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147945#comment-16147945
 ] 

Ismaël Mejía commented on BEAM-2516:


It seems that with all the ongoing work to support the Fn/Runner API we have 
introduced a regression in the performance of the Direct runner.

The classical quickstart wordcount with the kinglear.txt file (170KB) on my 
machine passed from 5s with Beam 2.1.0 to 126s using the current 2.2.0-SNAPSHOT.

I executed the same wordcount with different inputs and got these times:
File size: Beam 2.1.0 vs Beam 2.2.0-SNAPSHOT
0.17MB: 5s vs 126s 
1MB: 8s vs 149s
11MB: 28s vs 170s

>From a quick view it seems that the regression does not seem to be because of 
>the size of the input. I profiled the execution and noticed that GC and 
>threads are OK, but CPU use is really high now because of Serialization on the 
>different transforms. I suppose this is the price to pay to have all the 
>multilanguage proto magic. I know that for example for a big batch job this 
>‘set-up’ time may be negligible but this is still a considerable regression.

The performance can be improved by avoiding the translation into the Runner API 
(in particular for a Java job which really does not benefit of it) but imagine 
this is not the goal, so maybe we need to explore other ways to tackle this by 
caching or avoiding some serialization weight.

> User reports 4 minutes to process 1 million line CSV in DirectRunner
> 
>
> Key: BEAM-2516
> URL: https://issues.apache.org/jira/browse/BEAM-2516
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Priority: Minor
>
> https://stackoverflow.com/questions/44736414/simple-apache-beam-manipulations-work-very-slow
> I don't know what the expectation are here, so I wasn't ready to say this is 
> WAI. Low priority since it isn't what the runner is for anyhow, but this 
> seems like the scale of data that should be snappy. Worth investigating, or 
> maybe you can quickly indicate why it is expected?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #3871

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2644) Make it easier to test runtime-accessible ValueProvider's

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147893#comment-16147893
 ] 

ASF GitHub Bot commented on BEAM-2644:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3753


> Make it easier to test runtime-accessible ValueProvider's
> -
>
> Key: BEAM-2644
> URL: https://issues.apache.org/jira/browse/BEAM-2644
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> Many transforms that take ValueProvider's have different codepaths for when 
> the provider is accessible or not. However, as far as I can tell, there is no 
> good way to construct a transform with an inaccessible ValueProvider, and 
> then test how it runs with an actual value supplied.
> The only way I could come up with is mimicking 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/options/ValueProviderTest.java#L202
>  , which is very ugly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3753: [BEAM-2644] Introduces TestPipeline.newProvider()

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3753


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-2644] Introduces TestPipeline.newProvider()

2017-08-30 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master 5c2cab017 -> f6c840533


[BEAM-2644] Introduces TestPipeline.newProvider()


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f1b19b71
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f1b19b71
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f1b19b71

Branch: refs/heads/master
Commit: f1b19b71d2905079a4640d9fb89e02985ca6e873
Parents: 5c2cab0
Author: Eugene Kirpichov 
Authored: Wed Aug 23 19:13:46 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 12:14:34 2017 -0700

--
 .../apache/beam/sdk/options/ValueProvider.java  | 10 ++--
 .../apache/beam/sdk/options/ValueProviders.java | 15 +++---
 .../apache/beam/sdk/testing/TestPipeline.java   | 49 +++-
 .../sdk/transforms/display/DisplayData.java |  5 +-
 .../java/org/apache/beam/sdk/io/AvroIOTest.java | 14 --
 .../sdk/options/ProxyInvocationHandlerTest.java |  4 +-
 .../beam/sdk/options/ValueProviderTest.java | 23 +
 .../beam/sdk/testing/TestPipelineTest.java  | 37 ++-
 .../apache/beam/sdk/transforms/CreateTest.java  | 22 ++---
 9 files changed, 127 insertions(+), 52 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f1b19b71/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java
index 15413e8..3e6a24b 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java
@@ -41,6 +41,7 @@ import java.lang.reflect.Proxy;
 import java.util.concurrent.ConcurrentHashMap;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.values.PCollection;
@@ -54,18 +55,21 @@ import org.apache.beam.sdk.values.PCollection;
  * A common task is to create a {@link PCollection} containing the value of 
this
  * {@link ValueProvider} regardless of whether it's accessible at construction 
time or not.
  * For that, use {@link Create#ofProvider}.
+ *
+ * For unit-testing a transform against a {@link ValueProvider} that only 
becomes available
+ * at runtime, use {@link TestPipeline#newProvider}.
  */
 @JsonSerialize(using = ValueProvider.Serializer.class)
 @JsonDeserialize(using = ValueProvider.Deserializer.class)
 public interface ValueProvider extends Serializable {
   /**
-   * Return the value wrapped by this {@link ValueProvider}.
+   * Returns the runtime value wrapped by this {@link ValueProvider} in case 
it is {@link
+   * #isAccessible}, otherwise fails.
*/
   T get();
 
   /**
-   * Whether the contents of this {@link ValueProvider} is available to
-   * routines that run at graph construction time.
+   * Whether the contents of this {@link ValueProvider} is currently available 
via {@link #get}.
*/
   boolean isAccessible();
 

http://git-wip-us.apache.org/repos/asf/beam/blob/f1b19b71/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProviders.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProviders.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProviders.java
index 2fa..9345462 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProviders.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProviders.java
@@ -22,17 +22,19 @@ import static 
com.google.common.base.Preconditions.checkNotNull;
 import com.fasterxml.jackson.databind.node.ObjectNode;
 import java.io.IOException;
 import java.util.Map;
+import org.apache.beam.sdk.testing.TestPipeline;
 
-/**
- * Utilities for working with the {@link ValueProvider} interface.
- */
+/** Utilities for working with the {@link ValueProvider} interface. */
 public class ValueProviders {
   private ValueProviders() {}
 
   /**
-   * Given {@code serializedOptions} as a JSON-serialized {@link 
PipelineOptions}, updates
-   * the values according to the provided values in {@code runtimeValues}.
+   * Given {@code serializedOptions} as a JSON-serialized {@link 
PipelineOptions}, updates the
+   * values according to the provided values in {@code runtimeValues}.
+   *
+   * @deprecated Use {@link TestPipeline#newProvider} for testing {@link 
ValueProvider} code.
*/
+  @Deprecated
   

[2/2] beam git commit: This closes #3753: [BEAM-2644] Introduces TestPipeline.newProvider()

2017-08-30 Thread jkff
This closes #3753: [BEAM-2644] Introduces TestPipeline.newProvider()


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f6c84053
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f6c84053
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f6c84053

Branch: refs/heads/master
Commit: f6c840533fbfce6e4aec87bbfc3d2ce813a7131d
Parents: 5c2cab0 f1b19b7
Author: Eugene Kirpichov 
Authored: Wed Aug 30 12:15:19 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 12:15:19 2017 -0700

--
 .../apache/beam/sdk/options/ValueProvider.java  | 10 ++--
 .../apache/beam/sdk/options/ValueProviders.java | 15 +++---
 .../apache/beam/sdk/testing/TestPipeline.java   | 49 +++-
 .../sdk/transforms/display/DisplayData.java |  5 +-
 .../java/org/apache/beam/sdk/io/AvroIOTest.java | 14 --
 .../sdk/options/ProxyInvocationHandlerTest.java |  4 +-
 .../beam/sdk/options/ValueProviderTest.java | 23 +
 .../beam/sdk/testing/TestPipelineTest.java  | 37 ++-
 .../apache/beam/sdk/transforms/CreateTest.java  | 22 ++---
 9 files changed, 127 insertions(+), 52 deletions(-)
--




[GitHub] beam pull request #3725: [BEAM-2827] Introduces AvroIO.watchForNewFiles

2017-08-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3725


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[5/6] beam git commit: Fixes a findbugs error in Apex runner

2017-08-30 Thread jkff
Fixes a findbugs error in Apex runner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/6590aed4
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/6590aed4
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/6590aed4

Branch: refs/heads/master
Commit: 6590aed4091a1fbff75311afc45c3d3df1b80d38
Parents: f1f3987
Author: Eugene Kirpichov 
Authored: Wed Aug 16 17:50:25 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 11:55:19 2017 -0700

--
 .../beam/runners/apex/translation/utils/ApexStateInternals.java  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/6590aed4/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/ApexStateInternals.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/ApexStateInternals.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/ApexStateInternals.java
index 18ea8e4..e23601d 100644
--- 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/ApexStateInternals.java
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/ApexStateInternals.java
@@ -431,7 +431,7 @@ public class ApexStateInternals implements 
StateInternals {
 /**
  * Serializable state for internals (namespace to state tag to coded 
value).
  */
-private Map> perKeyState = new 
HashMap<>();
+private Map> perKeyState = 
new HashMap<>();
 private final Coder keyCoder;
 
 private ApexStateInternalsFactory(Coder keyCoder) {
@@ -451,7 +451,7 @@ public class ApexStateInternals implements 
StateInternals {
   } catch (CoderException e) {
 throw new RuntimeException(e);
   }
-  Table stateTable = perKeyState.get(keyBytes);
+  HashBasedTable stateTable = 
perKeyState.get(keyBytes);
   if (stateTable == null) {
 stateTable = HashBasedTable.create();
 perKeyState.put(keyBytes, stateTable);



[1/6] beam git commit: Adds AvroIO watchForNewFiles

2017-08-30 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master d64f2cce8 -> 5c2cab017


Adds AvroIO watchForNewFiles


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f1f39871
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f1f39871
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f1f39871

Branch: refs/heads/master
Commit: f1f39871da3668bb2ffbc1c27449d36c995b645b
Parents: 82b0852
Author: Eugene Kirpichov 
Authored: Wed Aug 16 14:41:32 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 11:55:18 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroIO.java | 132 ++-
 .../java/org/apache/beam/sdk/io/AvroIOTest.java |  94 +++--
 2 files changed, 212 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f1f39871/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
index 9601a7d..f6f3308 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
@@ -19,6 +19,7 @@ package org.apache.beam.sdk.io;
 
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
+import static org.apache.beam.sdk.transforms.Watch.Growth.ignoreInput;
 
 import com.google.auto.value.AutoValue;
 import com.google.common.annotations.VisibleForTesting;
@@ -48,12 +49,14 @@ import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.SerializableFunctions;
+import org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition;
 import org.apache.beam.sdk.transforms.display.DisplayData;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PDone;
 import org.apache.beam.sdk.values.TypeDescriptor;
 import org.apache.beam.sdk.values.TypeDescriptors;
+import org.joda.time.Duration;
 
 /**
  * {@link PTransform}s for reading and writing Avro files.
@@ -76,6 +79,9 @@ import org.apache.beam.sdk.values.TypeDescriptors;
  * allows them in case the filepattern contains a glob wildcard character. Use 
{@link
  * Read#withEmptyMatchTreatment} to configure this behavior.
  *
+ * By default, the filepatterns are expanded only once. {@link 
Read#watchForNewFiles}
+ * allows streaming of new files matching the filepattern(s).
+ *
  * Reading records of a known schema
  *
  * To read specific records, such as Avro-generated classes, use {@link 
#read(Class)}. To read
@@ -137,6 +143,20 @@ import org.apache.beam.sdk.values.TypeDescriptors;
  * filepatterns.apply(AvroIO.parseAllGenericRecords(new 
SerializableFunction...);
  * }
  *
+ * Streaming new files matching a filepattern
+ * {@code
+ * Pipeline p = ...;
+ *
+ * PCollection lines = p.apply(AvroIO
+ * .read(AvroAutoGenClass.class)
+ * .from("gs://my_bucket/path/to/records-*.avro")
+ * .watchForNewFiles(
+ *   // Check for new files every minute
+ *   Duration.standardMinutes(1),
+ *   // Stop watching the filepattern if no new files appear within an hour
+ *   afterTimeSinceNewOutput(Duration.standardHours(1;
+ * }
+ *
  * Reading a very large number of files
  *
  * If it is known that the filepattern will match a very large number of 
files (e.g. tens of
@@ -406,6 +426,8 @@ public class AvroIO {
   public abstract static class Read extends PTransform {
 @Nullable abstract ValueProvider getFilepattern();
 abstract EmptyMatchTreatment getEmptyMatchTreatment();
+@Nullable abstract Duration getWatchForNewFilesInterval();
+@Nullable abstract TerminationCondition 
getWatchForNewFilesTerminationCondition();
 @Nullable abstract Class getRecordClass();
 @Nullable abstract Schema getSchema();
 abstract boolean getHintMatchesManyFiles();
@@ -416,6 +438,9 @@ public class AvroIO {
 abstract static class Builder {
   abstract Builder setFilepattern(ValueProvider filepattern);
   abstract Builder setEmptyMatchTreatment(EmptyMatchTreatment 
emptyMatchTreatment);
+  abstract Builder setWatchForNewFilesInterval(Duration 
watchForNewFilesInterval);
+  abstract Builder setWatchForNewFilesTerminationCondition(
+  TerminationCondition condition);
   abstract Builder setRecordClass(Class recordClass);
   abstract Builder setSchema(Schema schema);
   abstract Builder 

[2/6] beam git commit: Gets rid of raw type in TextIO.Read.watchForNewFiles

2017-08-30 Thread jkff
Gets rid of raw type in TextIO.Read.watchForNewFiles


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/184f7a9b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/184f7a9b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/184f7a9b

Branch: refs/heads/master
Commit: 184f7a9b31641641cdb4bc7ddcf3556c0514f71b
Parents: d64f2cc
Author: Eugene Kirpichov 
Authored: Wed Aug 16 14:25:33 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 11:55:18 2017 -0700

--
 .../java/org/apache/beam/sdk/io/TextIO.java | 15 ---
 .../org/apache/beam/sdk/transforms/Watch.java   | 42 
 2 files changed, 51 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/184f7a9b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
index cbc17ff..835008f 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java
@@ -20,6 +20,7 @@ package org.apache.beam.sdk.io;
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 import static com.google.common.base.Preconditions.checkState;
+import static org.apache.beam.sdk.transforms.Watch.Growth.ignoreInput;
 
 import com.google.auto.value.AutoValue;
 import com.google.common.annotations.VisibleForTesting;
@@ -250,7 +251,7 @@ public class TextIO {
 abstract Duration getWatchForNewFilesInterval();
 
 @Nullable
-abstract TerminationCondition getWatchForNewFilesTerminationCondition();
+abstract TerminationCondition 
getWatchForNewFilesTerminationCondition();
 
 abstract boolean getHintMatchesManyFiles();
 abstract EmptyMatchTreatment getEmptyMatchTreatment();
@@ -262,7 +263,8 @@ public class TextIO {
   abstract Builder setFilepattern(ValueProvider filepattern);
   abstract Builder setCompressionType(CompressionType compressionType);
   abstract Builder setWatchForNewFilesInterval(Duration 
watchForNewFilesInterval);
-  abstract Builder 
setWatchForNewFilesTerminationCondition(TerminationCondition condition);
+  abstract Builder setWatchForNewFilesTerminationCondition(
+  TerminationCondition condition);
   abstract Builder setHintMatchesManyFiles(boolean hintManyFiles);
   abstract Builder setEmptyMatchTreatment(EmptyMatchTreatment treatment);
 
@@ -312,7 +314,8 @@ public class TextIO {
  * @see TerminationCondition
  */
 @Experimental(Kind.SPLITTABLE_DO_FN)
-public Read watchForNewFiles(Duration pollInterval, TerminationCondition 
terminationCondition) {
+public Read watchForNewFiles(
+Duration pollInterval, TerminationCondition 
terminationCondition) {
   return toBuilder()
   .setWatchForNewFilesInterval(pollInterval)
   .setWatchForNewFilesTerminationCondition(terminationCondition)
@@ -352,9 +355,9 @@ public class TextIO {
   .withCompressionType(getCompressionType())
   .withEmptyMatchTreatment(getEmptyMatchTreatment());
   if (getWatchForNewFilesInterval() != null) {
-readAll =
-readAll.watchForNewFiles(
-getWatchForNewFilesInterval(), 
getWatchForNewFilesTerminationCondition());
+TerminationCondition readAllCondition =
+ignoreInput(getWatchForNewFilesTerminationCondition());
+readAll = readAll.watchForNewFiles(getWatchForNewFilesInterval(), 
readAllCondition);
   }
   return input
   .apply("Create filepattern", Create.ofProvider(getFilepattern(), 
StringUtf8Coder.of()))

http://git-wip-us.apache.org/repos/asf/beam/blob/184f7a9b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
index 9da2408..21f0641 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
@@ -264,6 +264,15 @@ public class Watch {
 }
 
 /**
+ * Wraps a given input-independent {@link TerminationCondition} as an 
equivalent condition
+ * with a given input type, passing {@code null} to the original condition 
as input.
+ */
+public static  TerminationCondition 
ignoreInput(
+

[4/6] beam git commit: Better-organized javadocs for TextIO and AvroIO

2017-08-30 Thread jkff
Better-organized javadocs for TextIO and AvroIO


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/84eb7f3a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/84eb7f3a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/84eb7f3a

Branch: refs/heads/master
Commit: 84eb7f3ae431b467828a76e305123601d4ee333a
Parents: 184f7a9
Author: Eugene Kirpichov 
Authored: Wed Aug 16 14:29:52 2017 -0700
Committer: Eugene Kirpichov 
Committed: Wed Aug 30 11:55:18 2017 -0700

--
 .../java/org/apache/beam/sdk/io/AvroIO.java | 83 +---
 .../java/org/apache/beam/sdk/io/TextIO.java | 30 ---
 2 files changed, 75 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/84eb7f3a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
index 9e0422e..d4a7cbb 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
@@ -57,13 +57,20 @@ import org.apache.beam.sdk.values.TypeDescriptors;
 /**
  * {@link PTransform}s for reading and writing Avro files.
  *
+ * Reading Avro files
+ *
  * To read a {@link PCollection} from one or more Avro files with the same 
schema known at
- * pipeline construction time, use {@code AvroIO.read()}, using {@link 
AvroIO.Read#from} to specify
- * the filename or filepattern to read from. Alternatively, if the 
filepatterns to be read are
- * themselves in a {@link PCollection}, apply {@link #readAll}.
+ * pipeline construction time, use {@link #read}, using {@link 
AvroIO.Read#from} to specify the
+ * filename or filepattern to read from. If the filepatterns to be read are 
themselves in a {@link
+ * PCollection}, apply {@link #readAll}. If the schema is unknown at pipeline 
construction time, use
+ * {@link #parseGenericRecords} or {@link #parseAllGenericRecords}.
+ *
+ * Many configuration options below apply to several or all of these 
transforms.
  *
  * See {@link FileSystems} for information on supported file systems and 
filepatterns.
  *
+ * Reading records of a known schema
+ *
  * To read specific records, such as Avro-generated classes, use {@link 
#read(Class)}. To read
  * {@link GenericRecord GenericRecords}, use {@link 
#readGenericRecords(Schema)} which takes a
  * {@link Schema} object, or {@link #readGenericRecords(String)} which takes 
an Avro schema in a
@@ -71,26 +78,34 @@ import org.apache.beam.sdk.values.TypeDescriptors;
  * schema. Likewise, to read a {@link PCollection} of filepatterns, apply 
{@link
  * #readAllGenericRecords}.
  *
- * To read records from files whose schema is unknown at pipeline 
construction time or differs
- * between files, use {@link #parseGenericRecords} - in this case, you will 
need to specify a
- * parsing function for converting each {@link GenericRecord} into a value of 
your custom type.
- * Likewise, to read a {@link PCollection} of filepatterns with unknown 
schema, use {@link
- * #parseAllGenericRecords}.
- *
  * For example:
  *
  * {@code
  * Pipeline p = ...;
  *
- * // A simple Read of a local file (only runs locally):
+ * // Read Avro-generated classes from files on GCS
  * PCollection records =
- * p.apply(AvroIO.read(AvroAutoGenClass.class).from("/path/to/file.avro"));
+ * 
p.apply(AvroIO.read(AvroAutoGenClass.class).from("gs://my_bucket/path/to/records-*.avro"));
  *
- * // A Read from a GCS file (runs locally and using remote execution):
+ * // Read GenericRecord's of the given schema from files on GCS
  * Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
  * PCollection records =
  * p.apply(AvroIO.readGenericRecords(schema)
  *.from("gs://my_bucket/path/to/records-*.avro"));
+ * }
+ *
+ * Reading records of an unknown schema
+ *
+ * To read records from files whose schema is unknown at pipeline 
construction time or differs
+ * between files, use {@link #parseGenericRecords} - in this case, you will 
need to specify a
+ * parsing function for converting each {@link GenericRecord} into a value of 
your custom type.
+ * Likewise, to read a {@link PCollection} of filepatterns with unknown 
schema, use {@link
+ * #parseAllGenericRecords}.
+ *
+ * For example:
+ *
+ * {@code
+ * Pipeline p = ...;
  *
  * PCollection records =
  * p.apply(AvroIO.parseGenericRecords(new 
SerializableFunction() {
@@ -101,12 +116,7 @@ import org.apache.beam.sdk.values.TypeDescriptors;
  * }));
  * }
  *
- * If it is known that the filepattern will match a very large number of 

Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Spark #2954

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147718#comment-16147718
 ] 

ASF GitHub Bot commented on BEAM-301:
-

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3794

[TEST PR] to verify [BEAM-301]

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam VERIFY_MERGE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3794


commit 95e14214cbf49c8358872c9d34edc6713bc02e0d
Author: Kenneth Knowles 
Date:   2017-05-26T23:07:45Z

Port DirectRunner WriteFiles override to SDK-agnostic APIs

commit 4a350608100ad3f5dd3bf2798fc9976dcec95467
Author: Kenneth Knowles 
Date:   2017-06-08T20:39:32Z

Port DirectRunner TestStream override to SDK-agnostic APIs

commit fd74b5ca83a32afc67a318f1f791726c6326c6b4
Author: Ismaël Mejía 
Date:   2017-06-05T21:20:27Z

[BEAM-2412] Update HBaseIO to use HBase client 1.2.6

commit 43b8e5f7b2e33c10d823037fa3af3e04e19c373d
Author: JingsongLi 
Date:   2017-06-07T06:34:25Z

Use CoderTypeSerializer and remove unuse code in FlinkStateInternals

commit 1d6c466018aa85a7ef41a0535a1db744fd18e87e
Author: JingsongLi 
Date:   2017-06-07T06:40:30Z

[BEAM-1483] Support SetState in Flink runner and fix MapState to be 
consistent with InMemoryStateInternals.

commit cfbeee2e5bf2ef2d0d0ca839ed2c915e302b0f14
Author: JingsongLi 
Date:   2017-06-07T17:31:34Z

[BEAM-2423] Abstract StateInternalsTest for the different state internals

commit 4f406f50fee333e871459bb5b5d54c7b1719a030
Author: chamik...@google.com 
Date:   2017-06-08T21:56:24Z

Adds ability to dynamically replace PTransforms during runtime.

To this end, adds two interfaces, PTransformMatcher and PTransformOverride.

Currently only supports replacements where input and output types are an 
exact match (we have to address complexities due to type hints before 
supporting replacements with different types).

This will be used by SplittableDoFn where matching ParDo transforms will be 
dynamically replaced by SplittableParDo.

commit 75710a6fd9fa9849680349def917c1c9ddcdd56e
Author: Robert Bradshaw 
Date:   2017-06-09T23:44:55Z

Actually test the fn_api_runner.

The test suite was not being run due to a typo.
Fix breakage due to changes in the code in the meantime.

commit cf33f5ee4db646ab5a53f30fb6de084859e3a86a
Author: Maria Garcia Herrero 
Date:   2017-06-10T06:34:59Z

Make unique test names for value-provider arguments

commit d9887e572902beaf97cf956d8062f61cf9396fb7
Author: Thomas Groh 
Date:   2017-06-12T18:07:47Z

Cleanup Combine Tests with Context

Split out the "shared" bit of all the accumulators, so they show up as
an explicit component of the final result string. Update timestamped
creation logic.

commit 17d2466d5ed6dd8ac9b9fe23f9ae5f9b2583bc23
Author: Eugene Kirpichov 
Date:   2017-06-12T18:51:39Z

Improves message when transitively serializing PipelineOptions

commit 7816f9844fa871bfc0e05e87c7876e9a5eb1e0e2
Author: Charles Chen 
Date:   2017-06-12T21:17:50Z

Reverse removal of NativeWrite evaluator in Python DirectRunner

commit 705db38a964525330ae1c13e270d2e793eda59d0
Author: Thomas Groh 
Date:   2017-06-12T23:55:59Z

Check for Deferral on Non-additional inputs


[jira] [Commented] (BEAM-2826) Need to generate a single XML file when write is performed on small amount of data

2017-08-30 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147720#comment-16147720
 ] 

Eugene Kirpichov commented on BEAM-2826:


The solution to this bug would be either augmenting XmlIO.write() with similar 
builders like TextIO and AvroIO (controlling sharding, and potentially also 
windowed writes, dynamic destinations), or figuring out a good way to do it 
generally for all file-based sinks. I'm not sure if the WriteFiles transform is 
in sufficient shape to be used like that.

I suppose we can start with adding sharding controls to XmlIO.write() - that'd 
be an easy starter task.

> Need to generate a single XML file when write is performed on small amount of 
> data
> --
>
> Key: BEAM-2826
> URL: https://issues.apache.org/jira/browse/BEAM-2826
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Affects Versions: 2.0.0
>Reporter: Balajee Venkatesh
>Assignee: Kenneth Knowles
>
> I'm trying to write an XML file where the source is a text file stored in 
> GCS. The code is running fine but instead of a single XML file, it is 
> generating multiple XML files. (No. of XML files seem to follow total no. of 
> records present in source text file). I have observed this scenario while 
> using 'DataflowRunner'.
> When I run the same code in local then two files get generated. First one 
> contains all the records with proper elements and the second one contains 
> only opening and closing root element.
> As I learnt,it is expected that it may produce multiple files: e.g. if the 
> runner chooses to process your data parallelizing it into 3 tasks 
> ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some 
> cases, but the total data written will always add up to the expected data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3794: [TEST PR] to verify [BEAM-301]

2017-08-30 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3794

[TEST PR] to verify [BEAM-301]

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam VERIFY_MERGE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3794


commit 95e14214cbf49c8358872c9d34edc6713bc02e0d
Author: Kenneth Knowles 
Date:   2017-05-26T23:07:45Z

Port DirectRunner WriteFiles override to SDK-agnostic APIs

commit 4a350608100ad3f5dd3bf2798fc9976dcec95467
Author: Kenneth Knowles 
Date:   2017-06-08T20:39:32Z

Port DirectRunner TestStream override to SDK-agnostic APIs

commit fd74b5ca83a32afc67a318f1f791726c6326c6b4
Author: Ismaël Mejía 
Date:   2017-06-05T21:20:27Z

[BEAM-2412] Update HBaseIO to use HBase client 1.2.6

commit 43b8e5f7b2e33c10d823037fa3af3e04e19c373d
Author: JingsongLi 
Date:   2017-06-07T06:34:25Z

Use CoderTypeSerializer and remove unuse code in FlinkStateInternals

commit 1d6c466018aa85a7ef41a0535a1db744fd18e87e
Author: JingsongLi 
Date:   2017-06-07T06:40:30Z

[BEAM-1483] Support SetState in Flink runner and fix MapState to be 
consistent with InMemoryStateInternals.

commit cfbeee2e5bf2ef2d0d0ca839ed2c915e302b0f14
Author: JingsongLi 
Date:   2017-06-07T17:31:34Z

[BEAM-2423] Abstract StateInternalsTest for the different state internals

commit 4f406f50fee333e871459bb5b5d54c7b1719a030
Author: chamik...@google.com 
Date:   2017-06-08T21:56:24Z

Adds ability to dynamically replace PTransforms during runtime.

To this end, adds two interfaces, PTransformMatcher and PTransformOverride.

Currently only supports replacements where input and output types are an 
exact match (we have to address complexities due to type hints before 
supporting replacements with different types).

This will be used by SplittableDoFn where matching ParDo transforms will be 
dynamically replaced by SplittableParDo.

commit 75710a6fd9fa9849680349def917c1c9ddcdd56e
Author: Robert Bradshaw 
Date:   2017-06-09T23:44:55Z

Actually test the fn_api_runner.

The test suite was not being run due to a typo.
Fix breakage due to changes in the code in the meantime.

commit cf33f5ee4db646ab5a53f30fb6de084859e3a86a
Author: Maria Garcia Herrero 
Date:   2017-06-10T06:34:59Z

Make unique test names for value-provider arguments

commit d9887e572902beaf97cf956d8062f61cf9396fb7
Author: Thomas Groh 
Date:   2017-06-12T18:07:47Z

Cleanup Combine Tests with Context

Split out the "shared" bit of all the accumulators, so they show up as
an explicit component of the final result string. Update timestamped
creation logic.

commit 17d2466d5ed6dd8ac9b9fe23f9ae5f9b2583bc23
Author: Eugene Kirpichov 
Date:   2017-06-12T18:51:39Z

Improves message when transitively serializing PipelineOptions

commit 7816f9844fa871bfc0e05e87c7876e9a5eb1e0e2
Author: Charles Chen 
Date:   2017-06-12T21:17:50Z

Reverse removal of NativeWrite evaluator in Python DirectRunner

commit 705db38a964525330ae1c13e270d2e793eda59d0
Author: Thomas Groh 
Date:   2017-06-12T23:55:59Z

Check for Deferral on Non-additional inputs

Because Side Inputs are represented within the expanded inputs, the
check that the transform is a Combine with Side Inputs would never be
hit. This ensures that we do not consider additional inputs during the
check to 

Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink #3722

2017-08-30 Thread Apache Jenkins Server
See 


--
[...truncated 485.41 KB...]
2017-08-30T18:08:19.119 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/zookeeper/zookeeper/3.4.8/zookeeper-3.4.8.pom
2017-08-30T18:08:19.146 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/zookeeper/zookeeper/3.4.8/zookeeper-3.4.8.pom
 (4 KB at 145.6 KB/sec)
2017-08-30T18:08:19.148 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/log4j/log4j/1.2.16/log4j-1.2.16.pom
2017-08-30T18:08:19.177 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/log4j/log4j/1.2.16/log4j-1.2.16.pom (20 KB 
at 684.9 KB/sec)
2017-08-30T18:08:19.181 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/javassist/javassist/3.18.1-GA/javassist-3.18.1-GA.pom
2017-08-30T18:08:19.208 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/javassist/javassist/3.18.1-GA/javassist-3.18.1-GA.pom
 (10 KB at 344.4 KB/sec)
2017-08-30T18:08:19.243 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-clients_2.10/1.3.0/flink-clients_2.10-1.3.0.jar
2017-08-30T18:08:19.244 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-optimizer_2.10/1.3.0/flink-optimizer_2.10-1.3.0.jar
2017-08-30T18:08:19.245 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/force-shading/1.3.0/force-shading-1.3.0.jar
2017-08-30T18:08:19.246 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-core/1.3.0/flink-core-1.3.0.jar
2017-08-30T18:08:19.250 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-annotations/1.3.0/flink-annotations-1.3.0.jar
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
2017-08-30T18:08:19.278 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-annotations/1.3.0/flink-annotations-1.3.0.jar
 (8 KB at 218.5 KB/sec)
2017-08-30T18:08:19.278 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/esotericsoftware/kryo/kryo/2.24.0/kryo-2.24.0.jar
2017-08-30T18:08:19.326 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-clients_2.10/1.3.0/flink-clients_2.10-1.3.0.jar
 (89 KB at 964.2 KB/sec)
2017-08-30T18:08:19.326 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/objenesis/objenesis/2.1/objenesis-2.1.jar
2017-08-30T18:08:19.346 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/flink/force-shading/1.3.0/force-shading-1.3.0.jar
 (8 KB at 69.7 KB/sec)
2017-08-30T18:08:19.346 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-metrics-core/1.3.0/flink-metrics-core-1.3.0.jar
2017-08-30T18:08:19.356 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/objenesis/objenesis/2.1/objenesis-2.1.jar
 (41 KB at 377.6 KB/sec)
2017-08-30T18:08:19.356 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-java/1.3.0/flink-java-1.3.0.jar
2017-08-30T18:08:19.376 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-metrics-core/1.3.0/flink-metrics-core-1.3.0.jar
 (16 KB at 122.0 KB/sec)
2017-08-30T18:08:19.376 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop2/1.3.0/flink-shaded-hadoop2-1.3.0.jar
2017-08-30T18:08:19.388 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/esotericsoftware/kryo/kryo/2.24.0/kryo-2.24.0.jar
 (332 KB at 2366.7 KB/sec)
2017-08-30T18:08:19.388 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/xmlenc/xmlenc/0.52/xmlenc-0.52.jar
2017-08-30T18:08:19.418 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/xmlenc/xmlenc/0.52/xmlenc-0.52.jar (15 KB 
at 86.2 KB/sec)
2017-08-30T18:08:19.418 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.4/commons-codec-1.4.jar
2017-08-30T18:08:19.455 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.4/commons-codec-1.4.jar
 (57 KB at 275.7 KB/sec)
2017-08-30T18:08:19.455 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-net/commons-net/3.1/commons-net-3.1.jar
2017-08-30T18:08:19.570 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-java/1.3.0/flink-java-1.3.0.jar
 (752 KB at 2335.0 KB/sec)
2017-08-30T18:08:19.570 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar
2017-08-30T18:08:19.574 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
2017-08-30T18:08:19.578 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-net/commons-net/3.1/commons-net-3.1.jar
 (267 KB at 811.4 KB/sec)

Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Apex #2284

2017-08-30 Thread Apache Jenkins Server
See 


--
[...truncated 501.40 KB...]
2017-08-30T18:06:55.378 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/inject/guice/3.0/guice-3.0.pom
2017-08-30T18:06:55.403 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/inject/guice/3.0/guice-3.0.pom 
(8 KB at 284.5 KB/sec)
2017-08-30T18:06:55.405 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.pom
2017-08-30T18:06:55.432 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.pom
 (12 KB at 418.1 KB/sec)
2017-08-30T18:06:55.434 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/asm/asm/3.1/asm-3.1.pom
2017-08-30T18:06:55.461 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/asm/asm/3.1/asm-3.1.pom (278 B at 10.1 
KB/sec)
2017-08-30T18:06:55.462 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/asm/asm-parent/3.1/asm-parent-3.1.pom
2017-08-30T18:06:55.487 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/asm/asm-parent/3.1/asm-parent-3.1.pom (5 
KB at 162.2 KB/sec)
2017-08-30T18:06:55.489 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/sun/jersey/contribs/jersey-guice/1.9/jersey-guice-1.9.pom
2017-08-30T18:06:55.516 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/sun/jersey/contribs/jersey-guice/1.9/jersey-guice-1.9.pom
 (8 KB at 270.4 KB/sec)
2017-08-30T18:06:55.518 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/jline/jline/2.11/jline-2.11.pom
2017-08-30T18:06:55.545 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/jline/jline/2.11/jline-2.11.pom (17 KB at 
610.1 KB/sec)
2017-08-30T18:06:55.547 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant/1.9.2/ant-1.9.2.pom
2017-08-30T18:06:55.573 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant/1.9.2/ant-1.9.2.pom (10 
KB at 355.5 KB/sec)
2017-08-30T18:06:55.575 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-parent/1.9.2/ant-parent-1.9.2.pom
2017-08-30T18:06:55.602 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-parent/1.9.2/ant-parent-1.9.2.pom
 (6 KB at 202.7 KB/sec)
2017-08-30T18:06:55.604 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-launcher/1.9.2/ant-launcher-1.9.2.pom
2017-08-30T18:06:55.630 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-launcher/1.9.2/ant-launcher-1.9.2.pom
 (3 KB at 87.7 KB/sec)
2017-08-30T18:06:55.631 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/net/engio/mbassador/1.1.9/mbassador-1.1.9.pom
2017-08-30T18:06:55.657 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/net/engio/mbassador/1.1.9/mbassador-1.1.9.pom
 (9 KB at 314.0 KB/sec)
2017-08-30T18:06:55.659 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/net/lingala/zip4j/zip4j/1.3.2/zip4j-1.3.2.pom
2017-08-30T18:06:55.684 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/net/lingala/zip4j/zip4j/1.3.2/zip4j-1.3.2.pom
 (922 B at 34.6 KB/sec)
2017-08-30T18:06:55.686 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.10/commons-codec-1.10.pom
2017-08-30T18:06:55.711 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.10/commons-codec-1.10.pom
 (12 KB at 453.5 KB/sec)
2017-08-30T18:06:55.712 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/35/commons-parent-35.pom
2017-08-30T18:06:55.741 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/35/commons-parent-35.pom
 (57 KB at 1945.4 KB/sec)
2017-08-30T18:06:55.745 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/xbean/xbean-asm5-shaded/4.3/xbean-asm5-shaded-4.3.pom
2017-08-30T18:06:55.772 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/xbean/xbean-asm5-shaded/4.3/xbean-asm5-shaded-4.3.pom
 (4 KB at 121.1 KB/sec)
2017-08-30T18:06:55.774 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/xbean/xbean/4.3/xbean-4.3.pom
2017-08-30T18:06:55.802 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/xbean/xbean/4.3/xbean-4.3.pom 
(18 KB at 616.9 KB/sec)
2017-08-30T18:06:55.803 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/geronimo/genesis/genesis-java5-flava/2.1/genesis-java5-flava-2.1.pom
2017-08-30T18:06:55.829 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/geronimo/genesis/genesis-java5-flava/2.1/genesis-java5-flava-2.1.pom
 (6 KB at 206.0 KB/sec)
2017-08-30T18:06:55.830 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/geronimo/genesis/genesis-default-flava/2.1/genesis-default-flava-2.1.pom
2017-08-30T18:06:55.856 [INFO] Downloaded: 

[jira] [Comment Edited] (BEAM-2826) Need to generate a single XML file when write is performed on small amount of data

2017-08-30 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147446#comment-16147446
 ] 

Luke Cwik edited comment on BEAM-2826 at 8/30/17 3:34 PM:
--

What about doing this in your pipeline:
{code}
PC -> DoFn(AssignVoidKey) -> PC> -> GroupByKey -> PC -> DoFn(Format as XML string) -> PC -> 
TextIO.withNumShards(1).withSuffix("xml");
{code}


was (Author: lcwik):
What about doing this in your pipeline:
```
PC -> DoFn(AssignVoidKey) -> PC> -> GroupByKey -> PC -> DoFn(Format as XML string) -> PC -> 
TextIO.withNumShards(1).withSuffix("xml");
```

> Need to generate a single XML file when write is performed on small amount of 
> data
> --
>
> Key: BEAM-2826
> URL: https://issues.apache.org/jira/browse/BEAM-2826
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Affects Versions: 2.0.0
>Reporter: Balajee Venkatesh
>Assignee: Kenneth Knowles
>
> I'm trying to write an XML file where the source is a text file stored in 
> GCS. The code is running fine but instead of a single XML file, it is 
> generating multiple XML files. (No. of XML files seem to follow total no. of 
> records present in source text file). I have observed this scenario while 
> using 'DataflowRunner'.
> When I run the same code in local then two files get generated. First one 
> contains all the records with proper elements and the second one contains 
> only opening and closing root element.
> As I learnt,it is expected that it may produce multiple files: e.g. if the 
> runner chooses to process your data parallelizing it into 3 tasks 
> ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some 
> cases, but the total data written will always add up to the expected data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2826) Need to generate a single XML file when write is performed on small amount of data

2017-08-30 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147446#comment-16147446
 ] 

Luke Cwik commented on BEAM-2826:
-

What about doing this in your pipeline:
```
PC -> DoFn(AssignVoidKey) -> PC> -> GroupByKey -> PC -> DoFn(Format as XML string) -> PC -> 
TextIO.withNumShards(1).withSuffix("xml");
```

> Need to generate a single XML file when write is performed on small amount of 
> data
> --
>
> Key: BEAM-2826
> URL: https://issues.apache.org/jira/browse/BEAM-2826
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Affects Versions: 2.0.0
>Reporter: Balajee Venkatesh
>Assignee: Kenneth Knowles
>
> I'm trying to write an XML file where the source is a text file stored in 
> GCS. The code is running fine but instead of a single XML file, it is 
> generating multiple XML files. (No. of XML files seem to follow total no. of 
> records present in source text file). I have observed this scenario while 
> using 'DataflowRunner'.
> When I run the same code in local then two files get generated. First one 
> contains all the records with proper elements and the second one contains 
> only opening and closing root element.
> As I learnt,it is expected that it may produce multiple files: e.g. if the 
> runner chooses to process your data parallelizing it into 3 tasks 
> ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some 
> cases, but the total data written will always add up to the expected data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark #2953

2017-08-30 Thread Apache Jenkins Server
See 


--
[...truncated 453.07 KB...]
2017-08-30T14:28:55.640 [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.AfterAllStateMachineTest
2017-08-30T14:28:55.641 [INFO] Running 
org.apache.beam.runners.core.triggers.ReshuffleTriggerStateMachineTest
2017-08-30T14:28:55.643 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.ReshuffleTriggerStateMachineTest
2017-08-30T14:28:55.643 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterProcessingTimeStateMachineTest
2017-08-30T14:28:55.655 [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.009 s - in 
org.apache.beam.runners.core.triggers.AfterProcessingTimeStateMachineTest
2017-08-30T14:28:55.655 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterPaneStateMachineTest
2017-08-30T14:28:55.657 [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterPaneStateMachineTest
2017-08-30T14:28:55.657 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterSynchronizedProcessingTimeStateMachineTest
2017-08-30T14:28:55.659 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterSynchronizedProcessingTimeStateMachineTest
2017-08-30T14:28:56.068 [INFO] 
2017-08-30T14:28:56.068 [INFO] Results:
2017-08-30T14:28:56.068 [INFO] 
2017-08-30T14:28:56.068 [INFO] Tests run: 230, Failures: 0, Errors: 0, Skipped: 0
2017-08-30T14:28:56.068 [INFO] 
[JENKINS] Recording test results
2017-08-30T14:28:56.628 [INFO] 
2017-08-30T14:28:56.628 [INFO] --- 
build-helper-maven-plugin:3.0.0:regex-properties (render-artifact-id) @ 
beam-runners-core-java ---
2017-08-30T14:28:56.736 [INFO] 
2017-08-30T14:28:56.736 [INFO] --- jacoco-maven-plugin:0.7.8:report (report) @ 
beam-runners-core-java ---
2017-08-30T14:28:56.738 [INFO] Loading execution data file 

2017-08-30T14:28:56.775 [INFO] Analyzed bundle 'Apache Beam :: Runners :: Core 
Java' with 193 classes
2017-08-30T14:28:57.267 [INFO] 
2017-08-30T14:28:57.267 [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ 
beam-runners-core-java ---
2017-08-30T14:28:57.295 [INFO] Building jar: 

2017-08-30T14:28:57.434 [INFO] 
2017-08-30T14:28:57.434 [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ beam-runners-core-java ---
2017-08-30T14:28:57.806 [INFO] 
2017-08-30T14:28:57.806 [INFO] --- maven-jar-plugin:3.0.2:test-jar 
(default-test-jar) @ beam-runners-core-java ---
2017-08-30T14:28:57.820 [INFO] Building jar: 

2017-08-30T14:28:57.943 [INFO] 
2017-08-30T14:28:57.943 [INFO] --- maven-shade-plugin:3.0.0:shade 
(bundle-and-repackage) @ beam-runners-core-java ---
2017-08-30T14:28:57.946 [INFO] Excluding 
org.apache.beam:beam-sdks-java-core:jar:2.2.0-SNAPSHOT from the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding 
com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-core:jar:2.8.9 from the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9 from the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-databind:jar:2.8.9 from the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.14 from 
the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding net.bytebuddy:byte-buddy:jar:1.6.8 
from the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding org.apache.avro:avro:jar:1.8.2 from 
the shaded jar.
2017-08-30T14:28:57.946 [INFO] Excluding 
org.codehaus.jackson:jackson-core-asl:jar:1.9.13 from the shaded jar.
2017-08-30T14:28:57.947 [INFO] Excluding 
org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 from the shaded jar.
2017-08-30T14:28:57.947 [INFO] Excluding 
com.thoughtworks.paranamer:paranamer:jar:2.7 from the shaded jar.
2017-08-30T14:28:57.947 [INFO] Excluding org.tukaani:xz:jar:1.5 from the shaded 
jar.
2017-08-30T14:28:57.947 [INFO] Excluding 
org.xerial.snappy:snappy-java:jar:1.1.4-M3 from the shaded jar.
2017-08-30T14:28:57.947 [INFO] Excluding 
org.apache.commons:commons-compress:jar:1.14 from the shaded jar.
2017-08-30T14:28:57.947 [INFO] Excluding 
org.apache.commons:commons-lang3:jar:3.6 from the shaded jar.
2017-08-30T14:28:57.947 [INFO] Excluding 

[jira] [Created] (BEAM-2826) Need to generate a single XML file when write is performed on small amount of data

2017-08-30 Thread Balajee Venkatesh (JIRA)
Balajee Venkatesh created BEAM-2826:
---

 Summary: Need to generate a single XML file when write is 
performed on small amount of data
 Key: BEAM-2826
 URL: https://issues.apache.org/jira/browse/BEAM-2826
 Project: Beam
  Issue Type: New Feature
  Components: beam-model
Affects Versions: 2.0.0
Reporter: Balajee Venkatesh
Assignee: Kenneth Knowles


I'm trying to write an XML file where the source is a text file stored in GCS. 
The code is running fine but instead of a single XML file, it is generating 
multiple XML files. (No. of XML files seem to follow total no. of records 
present in source text file). I have observed this scenario while using 
'DataflowRunner'.

When I run the same code in local then two files get generated. First one 
contains all the records with proper elements and the second one contains only 
opening and closing root element.

As I learnt,it is expected that it may produce multiple files: e.g. if the 
runner chooses to process your data parallelizing it into 3 tasks ("bundles"), 
you'll get 3 files. Some of the parts may turn out empty in some cases, but the 
total data written will always add up to the expected data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3870

2017-08-30 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #4681

2017-08-30 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3705: [BEAM-165] Initial implementation of the MapReduce ...

2017-08-30 Thread peihe
GitHub user peihe reopened a pull request:

https://github.com/apache/beam/pull/3705

[BEAM-165] Initial implementation of the MapReduce runner.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam mr-runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3705


commit 3260057b7beffc9b2bd2bf8f722db4805f5016b5
Author: Pei He 
Date:   2017-08-30T13:05:38Z

test builds




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2825) Improve readability of SparkGroupAlsoByWindowViaWindowSet

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147189#comment-16147189
 ] 

ASF GitHub Bot commented on BEAM-2825:
--

GitHub user staslev opened a pull request:

https://github.com/apache/beam/pull/3793

[BEAM-2825] Refactored SparkGroupAlsoByWindowViaWindowSet to improve …

…readability.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/staslev/beam 
BEAM-2825-Improve-readability-of-SparkGroupAlsoByWindowViaWindowSet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3793


commit 04068c95b158f04843517309ffe5d2543b206637
Author: Stas Levin 
Date:   2017-08-30T09:01:32Z

[BEAM-2825] Refactored SparkGroupAlsoByWindowViaWindowSet to improve 
readability.




> Improve readability of SparkGroupAlsoByWindowViaWindowSet
> -
>
> Key: BEAM-2825
> URL: https://issues.apache.org/jira/browse/BEAM-2825
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Stas Levin
>Assignee: Stas Levin
>Priority: Minor
>
> It would be nice to address some of the readability issues one can currently 
> observe in {{SparkGroupAlsoByWindowViaWindowSet}}, in particular:
> # Long methods
> # Long anonymous classes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3793: [BEAM-2825] Refactored SparkGroupAlsoByWindowViaWin...

2017-08-30 Thread staslev
GitHub user staslev opened a pull request:

https://github.com/apache/beam/pull/3793

[BEAM-2825] Refactored SparkGroupAlsoByWindowViaWindowSet to improve …

…readability.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/staslev/beam 
BEAM-2825-Improve-readability-of-SparkGroupAlsoByWindowViaWindowSet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3793


commit 04068c95b158f04843517309ffe5d2543b206637
Author: Stas Levin 
Date:   2017-08-30T09:01:32Z

[BEAM-2825] Refactored SparkGroupAlsoByWindowViaWindowSet to improve 
readability.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-165) Add Hadoop MapReduce runner

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147188#comment-16147188
 ] 

ASF GitHub Bot commented on BEAM-165:
-

Github user peihe closed the pull request at:

https://github.com/apache/beam/pull/3705


> Add Hadoop MapReduce runner
> ---
>
> Key: BEAM-165
> URL: https://issues.apache.org/jira/browse/BEAM-165
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas, runner-mapreduce
>Reporter: Jean-Baptiste Onofré
>Assignee: Pei He
>
> I think a MapReduce runner could be a good addition to Beam. It would allow 
> users to smoothly "migrate" from MapReduce to Spark or Flink.
> Of course, the MapReduce runner will run in batch mode (not stream).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3705: [BEAM-165] Initial implementation of the MapReduce ...

2017-08-30 Thread peihe
Github user peihe closed the pull request at:

https://github.com/apache/beam/pull/3705


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-2825) Improve readability of SparkGroupAlsoByWindowViaWindowSet

2017-08-30 Thread Stas Levin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stas Levin reassigned BEAM-2825:


Assignee: Stas Levin  (was: Amit Sela)

> Improve readability of SparkGroupAlsoByWindowViaWindowSet
> -
>
> Key: BEAM-2825
> URL: https://issues.apache.org/jira/browse/BEAM-2825
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Stas Levin
>Assignee: Stas Levin
>Priority: Minor
>
> It would be nice to address some of the readability issues one can currently 
> observe in {{SparkGroupAlsoByWindowViaWindowSet}}, in particular:
> # Long methods
> # Long anonymous classes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2457) Error: "Unable to find registrar for hdfs" - need to prevent/improve error message

2017-08-30 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146873#comment-16146873
 ] 

Aljoscha Krettek commented on BEAM-2457:


(This is on the Cloudera Quickstart VM)

I noticed that this doesn't work:
{code}
$ java -cp word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount 
--inputFile=hdfs:///user/aljoscha/wc-in --output=hdfs:///tmp/wc-out-beam 
--runner=DirectRunner
{code}

This also doesn't work:
{code}
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ java -cp word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount 
--inputFile=hdfs:///user/aljoscha/wc-in --output=hdfs:///tmp/wc-out-beam 
--runner=DirectRunner
{code}

This one also doesn't work:
{code}
$ java -cp word-count-beam-bundled-0.1.jar:/etc/hadoop/conf 
org.apache.beam.examples.WordCount --inputFile=hdfs:///user/aljoscha/wc-in 
--output=hdfs:///tmp/wc-out-beam --runner=DirectRunner
{code}

But this works:
{code}
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ java -cp word-count-beam-bundled-0.1.jar:/etc/hadoop/conf 
org.apache.beam.examples.WordCount --inputFile=hdfs:///user/aljoscha/wc-in 
--output=hdfs:///tmp/wc-out-beam --runner=DirectRunner
{code}

> Error: "Unable to find registrar for hdfs" - need to prevent/improve error 
> message
> --
>
> Key: BEAM-2457
> URL: https://issues.apache.org/jira/browse/BEAM-2457
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Stephen Sisk
>Assignee: Flavio Fiszman
>
> I've noticed a number of user reports where jobs are failing with the error 
> message "Unable to find registrar for hdfs": 
> * 
> https://stackoverflow.com/questions/44497662/apache-beamunable-to-find-registrar-for-hdfs/44508533?noredirect=1#comment76026835_44508533
> * 
> https://lists.apache.org/thread.html/144c384e54a141646fcbe854226bb3668da091c5dc7fa2d471626e9b@%3Cuser.beam.apache.org%3E
> * 
> https://lists.apache.org/thread.html/e4d5ac744367f9d036a1f776bba31b9c4fe377d8f11a4b530be9f829@%3Cuser.beam.apache.org%3E
>  
> This isn't too many reports, but it is the only time I can recall so many 
> users reporting the same error message in a such a short amount of time. 
> We believe the problem is one of two things: 
> 1) bad uber jar creation
> 2) incorrect HDFS configuration
> However, it's highly possible this could have some other root cause. 
> It seems like it'd be useful to:
> 1) Follow up with the above reports to see if they've resolved the issue, and 
> if so what fixed it. There may be another root cause out there.
> 2) Improve the error message to include more information about how to resolve 
> it
> 3) See if we can improve detection of the error cases to give more specific 
> information (specifically, if HDFS is miconfigured, can we detect that 
> somehow and tell the user exactly that?)
> 4) update documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2824) Support PipelineResult in JStormRunner

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146826#comment-16146826
 ] 

ASF GitHub Bot commented on BEAM-2824:
--

GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/3792

[BEAM-2824]: support PipelineResult.waitUntilFinish() in JStormRunner local 
mode and uses it in TestJStormRunner.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam jstorm-runner-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3792


commit 18cfc8ecc06d79d14cc9bcd07414a2471725883d
Author: Pei He 
Date:   2017-08-29T12:10:06Z

[BEAM-2824]: support PipelineResult.waitUntilFinish() in jstorm local mode.

commit 37bc0bd827ce5612b94d03685cc2ab176c691627
Author: Pei He 
Date:   2017-08-30T06:50:20Z

[BEAM-2824] Uses PipelineResult in TestJStormRunner.




> Support PipelineResult in JStormRunner
> --
>
> Key: BEAM-2824
> URL: https://issues.apache.org/jira/browse/BEAM-2824
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-jstorm
>Reporter: Pei He
>Assignee: Pei He
>
> Here are the work items:
> 1. supports metrics() in local mode.
> 2. supports waitUntilFinish() in local mode.
> 3. uses PipelineResult in TestJStormRunner.
> 4. supports metrics() in cluster mode.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3792: [BEAM-2824]: support PipelineResult.waitUntilFinish...

2017-08-30 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/3792

[BEAM-2824]: support PipelineResult.waitUntilFinish() in JStormRunner local 
mode and uses it in TestJStormRunner.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam jstorm-runner-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3792.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3792


commit 18cfc8ecc06d79d14cc9bcd07414a2471725883d
Author: Pei He 
Date:   2017-08-29T12:10:06Z

[BEAM-2824]: support PipelineResult.waitUntilFinish() in jstorm local mode.

commit 37bc0bd827ce5612b94d03685cc2ab176c691627
Author: Pei He 
Date:   2017-08-30T06:50:20Z

[BEAM-2824] Uses PipelineResult in TestJStormRunner.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #4680

2017-08-30 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2824) Support PipelineResult in JStormRunner

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146800#comment-16146800
 ] 

ASF GitHub Bot commented on BEAM-2824:
--

GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/3791

[BEAM-2824] support gauge and PipelineResults.metrics() in local mode.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam jstorm-runner-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3791


commit 60dbcd2f968aa6ba6fa8cbd1f4affe8506e8ef1c
Author: Pei He 
Date:   2017-08-30T07:16:44Z

[BEAM-2824] support gauge and PipelineResults.metrics() in local mode.




> Support PipelineResult in JStormRunner
> --
>
> Key: BEAM-2824
> URL: https://issues.apache.org/jira/browse/BEAM-2824
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-jstorm
>Reporter: Pei He
>Assignee: Pei He
>
> Here are the work items:
> 1. supports metrics() in local mode.
> 2. supports waitUntilFinish() in local mode.
> 3. uses PipelineResult in TestJStormRunner.
> 4. supports metrics() in cluster mode.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3791: [BEAM-2824] support gauge and PipelineResults.metri...

2017-08-30 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/3791

[BEAM-2824] support gauge and PipelineResults.metrics() in local mode.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam jstorm-runner-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3791


commit 60dbcd2f968aa6ba6fa8cbd1f4affe8506e8ef1c
Author: Pei He 
Date:   2017-08-30T07:16:44Z

[BEAM-2824] support gauge and PipelineResults.metrics() in local mode.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam git commit: This closes #3789

2017-08-30 Thread pei
This closes #3789


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/91488992
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/91488992
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/91488992

Branch: refs/heads/jstorm-runner
Commit: 9148899254ea873b4a2c5f3314fa30c3b633cea9
Parents: e00e0e8 c952686
Author: Pei He 
Authored: Wed Aug 30 14:57:48 2017 +0800
Committer: Pei He 
Committed: Wed Aug 30 14:57:48 2017 +0800

--
 .../java/org/apache/beam/runners/jstorm/JStormRunnerResult.java | 5 +++--
 .../apache/beam/runners/jstorm/translation/MetricsReporter.java | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)
--




[2/3] beam git commit: jstorm-runner: Fix incorrect updating of counter metrics

2017-08-30 Thread pei
jstorm-runner: Fix incorrect updating of counter metrics


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c9526869
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c9526869
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c9526869

Branch: refs/heads/jstorm-runner
Commit: c95268691f78a629866f722df1a3f7ef5e76a256
Parents: 557d703
Author: basti.lj 
Authored: Wed Aug 30 10:45:45 2017 +0800
Committer: Pei He 
Committed: Wed Aug 30 14:55:17 2017 +0800

--
 .../apache/beam/runners/jstorm/translation/MetricsReporter.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c9526869/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/translation/MetricsReporter.java
--
diff --git 
a/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/translation/MetricsReporter.java
 
b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/translation/MetricsReporter.java
index e7f3285..0315a59 100644
--- 
a/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/translation/MetricsReporter.java
+++ 
b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/translation/MetricsReporter.java
@@ -72,7 +72,7 @@ class MetricsReporter {
 AsmCounter counter = metricClient.registerCounter(metricName);
 Long incValue = (oldValue == null ? updateValue : updateValue - 
oldValue);
 counter.update(incValue);
-reportedCounters.put(metricName, incValue);
+reportedCounters.put(metricName, updateValue);
   }
 }
   }



[1/3] beam git commit: jstorm-runner: Fix the bug that max waiting time is missing on local mode

2017-08-30 Thread pei
Repository: beam
Updated Branches:
  refs/heads/jstorm-runner e00e0e841 -> 914889925


jstorm-runner: Fix the bug that max waiting time is missing on local mode


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/557d7036
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/557d7036
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/557d7036

Branch: refs/heads/jstorm-runner
Commit: 557d7036efe3bcb83ea99ca14ad052052bab5add
Parents: e00e0e8
Author: basti.lj 
Authored: Wed Aug 30 10:45:17 2017 +0800
Committer: Pei He 
Committed: Wed Aug 30 14:55:15 2017 +0800

--
 .../java/org/apache/beam/runners/jstorm/JStormRunnerResult.java | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/557d7036/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunnerResult.java
--
diff --git 
a/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunnerResult.java
 
b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunnerResult.java
index 4b1850e..b6b5281 100644
--- 
a/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunnerResult.java
+++ 
b/runners/jstorm/src/main/java/org/apache/beam/runners/jstorm/JStormRunnerResult.java
@@ -64,8 +64,8 @@ public abstract class JStormRunnerResult implements 
PipelineResult {
 
   private static class LocalJStormPipelineResult extends JStormRunnerResult {
 
-private LocalCluster localCluster;
-private long localModeExecuteTimeSecs;
+private final LocalCluster localCluster;
+private final long localModeExecuteTimeSecs;
 
 LocalJStormPipelineResult(
 String topologyName,
@@ -74,6 +74,7 @@ public abstract class JStormRunnerResult implements 
PipelineResult {
 long localModeExecuteTimeSecs) {
   super(topologyName, config);
   this.localCluster = checkNotNull(localCluster, "localCluster");
+  this.localModeExecuteTimeSecs = localModeExecuteTimeSecs;
 }
 
 @Override



[jira] [Commented] (BEAM-2632) TextIOReadTest create pipelines with non-unique application names

2017-08-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146770#comment-16146770
 ] 

ASF GitHub Bot commented on BEAM-2632:
--

GitHub user huafengw opened a pull request:

https://github.com/apache/beam/pull/3790

[BEAM-2632] Use Junit Paramaterized test suits in TextIOReadTest

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huafengw/beam fixtest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3790


commit cf75821f3175135e60999ba748377b0202f7cfcd
Author: huafengw 
Date:   2017-08-30T06:18:21Z

[BEAM-2632] Use Junit Paramaterized test suits in TextIOReadTest




> TextIOReadTest create pipelines with non-unique application names
> -
>
> Key: BEAM-2632
> URL: https://issues.apache.org/jira/browse/BEAM-2632
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Huafeng Wang
>Priority: Trivial
>  Labels: newbie, starter
>
> The test {{TextIOReadTest}} uses a loop to create a few tests within a single 
> test method. This results in a pipeline with non-unique applied transform 
> nodes.
> Perhaps the best way to fix this is to use a JUnit {{Paramaterized}} test 
> suite, or multiple. It does seem that the test is basically doing the full 
> product of empty/tiny/large with various compression types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3790: [BEAM-2632] Use Junit Paramaterized test suits in T...

2017-08-30 Thread huafengw
GitHub user huafengw opened a pull request:

https://github.com/apache/beam/pull/3790

[BEAM-2632] Use Junit Paramaterized test suits in TextIOReadTest

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huafengw/beam fixtest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3790


commit cf75821f3175135e60999ba748377b0202f7cfcd
Author: huafengw 
Date:   2017-08-30T06:18:21Z

[BEAM-2632] Use Junit Paramaterized test suits in TextIOReadTest




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2824) Support PipelineResult in JStormRunner

2017-08-30 Thread Pei He (JIRA)
Pei He created BEAM-2824:


 Summary: Support PipelineResult in JStormRunner
 Key: BEAM-2824
 URL: https://issues.apache.org/jira/browse/BEAM-2824
 Project: Beam
  Issue Type: New Feature
  Components: runner-jstorm
Reporter: Pei He
Assignee: Pei He


Here are the work items:
1. supports metrics() in local mode.
2. supports waitUntilFinish() in local mode.
3. uses PipelineResult in TestJStormRunner.
4. supports metrics() in cluster mode.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)