[jira] [Updated] (BEAM-2480) ACOS, ASIN, ATAN, COS, COT, DEGREES, RADIANS, SIN, TAN, SIGN, LN, LOG10, EXP Functions

2017-06-21 Thread Tarush Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarush Grover updated BEAM-2480:

Summary: ACOS, ASIN, ATAN, COS, COT, DEGREES, RADIANS, SIN, TAN, SIGN, LN, 
LOG10, EXP Functions  (was: ACOS, ASIN, ATAN, COS, COT, DEGREES, RADIANS, SIN, 
TAN, SIGN Functions)

> ACOS, ASIN, ATAN, COS, COT, DEGREES, RADIANS, SIN, TAN, SIGN, LN, LOG10, EXP 
> Functions
> --
>
> Key: BEAM-2480
> URL: https://issues.apache.org/jira/browse/BEAM-2480
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Tarush Grover
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Dataflow #3422

2017-06-21 Thread Apache Jenkins Server
See 


--
[...truncated 927.59 KB...]
2017-06-22T06:08:17.131 [INFO]  
   
2017-06-22T06:08:17.131 [INFO] 

2017-06-22T06:08:17.131 [INFO] Building Apache Beam :: SDKs :: Java :: 
Extensions 2.1.0-SNAPSHOT
2017-06-22T06:08:17.131 [INFO] 

2017-06-22T06:08:17.134 [INFO] 
2017-06-22T06:08:17.134 [INFO] --- maven-clean-plugin:3.0.0:clean 
(default-clean) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:17.137 [INFO] Deleting 

 (includes = [**/*.pyc, **/*.egg-info/, **/sdks/python/LICENSE, 
**/sdks/python/NOTICE, **/sdks/python/README.md], excludes = [])
2017-06-22T06:08:17.191 [INFO] 
2017-06-22T06:08:17.191 [INFO] --- maven-enforcer-plugin:1.4.1:enforce 
(enforce) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:17.242 [INFO] 
2017-06-22T06:08:17.243 [INFO] --- maven-enforcer-plugin:1.4.1:enforce 
(enforce-banned-dependencies) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:17.296 [INFO] 
2017-06-22T06:08:17.296 [INFO] --- maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:17.356 [INFO] 
2017-06-22T06:08:17.356 [INFO] --- maven-checkstyle-plugin:2.17:check (default) 
@ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:17.969 [INFO] Starting audit...
Audit done.
2017-06-22T06:08:18.019 [INFO] 
2017-06-22T06:08:18.019 [INFO] --- 
build-helper-maven-plugin:3.0.0:regex-properties (render-artifact-id) @ 
beam-sdks-java-extensions-parent ---
2017-06-22T06:08:18.070 [INFO] 
2017-06-22T06:08:18.070 [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:18.171 [INFO] 
2017-06-22T06:08:18.171 [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ 
beam-sdks-java-extensions-parent ---
2017-06-22T06:08:18.172 [INFO] Skipping packaging of the jar
2017-06-22T06:08:18.271 [INFO] 
2017-06-22T06:08:18.272 [INFO] --- maven-jar-plugin:3.0.2:test-jar 
(default-test-jar) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:18.273 [INFO] Skipping packaging of the test-jar
2017-06-22T06:08:18.322 [INFO] 
2017-06-22T06:08:18.322 [INFO] --- maven-shade-plugin:3.0.0:shade 
(bundle-and-repackage) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:18.326 [INFO] Replacing original artifact with shaded artifact.
2017-06-22T06:08:18.376 [INFO] 
2017-06-22T06:08:18.376 [INFO] --- maven-dependency-plugin:3.0.1:analyze-only 
(default) @ beam-sdks-java-extensions-parent ---
2017-06-22T06:08:18.377 [INFO] Skipping pom project
[JENKINS] Archiving disabled
2017-06-22T06:08:19.676 [INFO]  
   
2017-06-22T06:08:19.676 [INFO] 

2017-06-22T06:08:19.676 [INFO] Building Apache Beam :: SDKs :: Java :: 
Extensions :: Google Cloud Platform Core 2.1.0-SNAPSHOT
2017-06-22T06:08:19.676 [INFO] 

2017-06-22T06:08:19.682 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/http-client/google-http-client-jackson2/1.22.0/google-http-client-jackson2-1.22.0.pom
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
2017-06-22T06:08:19.744 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auth/google-auth-library-oauth2-http/0.6.1/google-auth-library-oauth2-http-0.6.1.pom
2017-06-22T06:08:19.772 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auth/google-auth-library-oauth2-http/0.6.1/google-auth-library-oauth2-http-0.6.1.pom
 (3 KB at 76.3 KB/sec)
2017-06-22T06:08:19.774 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/auth/google-auth-library-parent/0.6.1/google-auth-library-parent-0.6.1.pom
2017-06-22T06:08:19.801 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/auth/google-auth-library-parent/0.6.1/google-auth-library-parent-0.6.1.pom
 (8 KB at 294.6 KB/sec)
2017-06-22T06:08:19.804 [IN

[GitHub] beam pull request #3387: Beam-2461 Ln function

2017-06-21 Thread app-tarush
Github user app-tarush closed the pull request at:

https://github.com/apache/beam/pull/3387


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1458) Checkpoint support in Beam

2017-06-21 Thread Reuven Lax (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058773#comment-16058773
 ] 

Reuven Lax commented on BEAM-1458:
--

Correct - I'm working on a doc for snapshot/update support for Beam. My 
proposal is a bit more general than this as it includes the following addition. 
Start a new (possibly different) pipeline from a checkpoint. This allows the 
user to modify their code, possibly add new stages, and then resume that 
pipeline from a checkpoint. This is a generalization of the "update" feature 
that Dataflow has. It requires a compatibility check to ensure that the new 
pipeline is compatible with the checkpoint.


> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoint should provides a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2473) Dataflow runner should reject unsupported state & timers (MapState, SetState) at construction time, and documentation should be more prominent

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058690#comment-16058690
 ] 

ASF GitHub Bot commented on BEAM-2473:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3420

[BEAM-2473] DataflowRunner: Reject SetState and MapState

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @jkff 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam 
DataflowRunner-reject-fancy-state

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3420


commit c49b4111805b34e797dc29d490d87e588bcdabb0
Author: Kenneth Knowles 
Date:   2017-06-22T03:58:35Z

DataflowRunner: Reject SetState and MapState




> Dataflow runner should reject unsupported state & timers (MapState, SetState) 
> at construction time, and documentation should be more prominent
> --
>
> Key: BEAM-2473
> URL: https://issues.apache.org/jira/browse/BEAM-2473
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3420: [BEAM-2473] DataflowRunner: Reject SetState and Map...

2017-06-21 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3420

[BEAM-2473] DataflowRunner: Reject SetState and MapState

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @jkff 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam 
DataflowRunner-reject-fancy-state

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3420


commit c49b4111805b34e797dc29d490d87e588bcdabb0
Author: Kenneth Knowles 
Date:   2017-06-22T03:58:35Z

DataflowRunner: Reject SetState and MapState




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2485) Reject stateful ParDo / DoFn in merging windows at construction time in Dataflow runner

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058670#comment-16058670
 ] 

ASF GitHub Bot commented on BEAM-2485:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3419

[BEAM-2485] DataflowRunner: Reject merging windowing for stateful ParDo

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @jkff 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam DataflowRunner-merging-state

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3419


commit 36f7b093db198ac029a596dbf21bfe63ed782f29
Author: Kenneth Knowles 
Date:   2017-06-22T03:25:31Z

DataflowRunner: Reject merging windowing for stateful ParDo




> Reject stateful ParDo / DoFn in merging windows at construction time in 
> Dataflow runner
> ---
>
> Key: BEAM-2485
> URL: https://issues.apache.org/jira/browse/BEAM-2485
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>
> The Dataflow runner does not support state in merging windows. This should be 
> rejected before the job is submitted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3419: [BEAM-2485] DataflowRunner: Reject merging windowin...

2017-06-21 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3419

[BEAM-2485] DataflowRunner: Reject merging windowing for stateful ParDo

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @jkff 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam DataflowRunner-merging-state

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3419


commit 36f7b093db198ac029a596dbf21bfe63ed782f29
Author: Kenneth Knowles 
Date:   2017-06-22T03:25:31Z

DataflowRunner: Reject merging windowing for stateful ParDo




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-1612) Support real Bundle in Flink runner

2017-06-21 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated BEAM-1612:
---
Fix Version/s: 2.1.0

> Support real Bundle in Flink runner
> ---
>
> Key: BEAM-1612
> URL: https://issues.apache.org/jira/browse/BEAM-1612
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
> Fix For: 2.1.0
>
>
> The Bundle is very important in the beam model. Users can use the bundle to 
> flush buffer, can reuse many heavyweight resources in a bundle. Most IO 
> plugins use the bundle to flush. 
> Moreover, FlinkRunner can also use Bundle to reduce access to the FlinkState, 
> such as first placed in JavaHeap, flush into RocksDbState when invoke 
> finishBundle , this can reduce the number of serialization.
> But now FlinkRunner calls the finishBundle every processElement. We need 
> support real Bundle.
> I think we can have the following implementations:
> 1.Invoke finishBundle and next startBundle in {{snapshot}} of Flink. But 
> sometimes this "Bundle" maybe too big. This depends on the user's checkpoint 
> configuration.
> 2.Manually control the size of the bundle. The half-bundle will be flushed to 
> a full-bundle by count or eventTime or processTime or {{snapshot}}. We do not 
> need to wait, just call the startBundle and finishBundle at the right time.
> [Proposal 
> document|https://docs.google.com/document/d/1UzELM4nFu8SIeu-QJkbs0sv7Uzd1Ux4aXXM3cw4s7po/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-2486) Should throws some useful messages when statefulParDo use non-KV input

2017-06-21 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058646#comment-16058646
 ] 

Kenneth Knowles edited comment on BEAM-2486 at 6/22/17 2:56 AM:


Hmm, I thought this was done in {{DoFnSignatures}}. It is definitely a 
runner-independent error. I will check it out.


was (Author: kenn):
Hmm, I thought this was done in {{DoFnSignatures}}. It is definitely a 
runner-independent error.

> Should throws some useful messages when statefulParDo use non-KV input
> --
>
> Key: BEAM-2486
> URL: https://issues.apache.org/jira/browse/BEAM-2486
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-flink
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>
> Now Flink runner will throws a ClassCastException without detail messages 
> when a statefulParDo use non-KV input. It is not easy for users to find 
> errors and causes. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2486) Should throws some useful messages when statefulParDo use non-KV input

2017-06-21 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058646#comment-16058646
 ] 

Kenneth Knowles commented on BEAM-2486:
---

Hmm, I thought this was done in {{DoFnSignatures}}. It is definitely a 
runner-independent error.

> Should throws some useful messages when statefulParDo use non-KV input
> --
>
> Key: BEAM-2486
> URL: https://issues.apache.org/jira/browse/BEAM-2486
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-flink
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>
> Now Flink runner will throws a ClassCastException without detail messages 
> when a statefulParDo use non-KV input. It is not easy for users to find 
> errors and causes. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2486) Should throws some useful messages when statefulParDo use non-KV input

2017-06-21 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2486:
--
Fix Version/s: 2.1.0

> Should throws some useful messages when statefulParDo use non-KV input
> --
>
> Key: BEAM-2486
> URL: https://issues.apache.org/jira/browse/BEAM-2486
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-flink
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>
> Now Flink runner will throws a ClassCastException without detail messages 
> when a statefulParDo use non-KV input. It is not easy for users to find 
> errors and causes. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2473) Dataflow runner should reject unsupported state & timers (MapState, SetState) at construction time, and documentation should be more prominent

2017-06-21 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2473:
--
Summary: Dataflow runner should reject unsupported state & timers 
(MapState, SetState) at construction time, and documentation should be more 
prominent  (was: Dataflow runner should reject unsupported state & timers at 
construction time, and documentation should be more prominent)

> Dataflow runner should reject unsupported state & timers (MapState, SetState) 
> at construction time, and documentation should be more prominent
> --
>
> Key: BEAM-2473
> URL: https://issues.apache.org/jira/browse/BEAM-2473
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2473) Dataflow runner should reject unsupported state & timers (MapState, SetState) at construction time, and documentation should be more prominent

2017-06-21 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2473:
--
Fix Version/s: 2.1.0

> Dataflow runner should reject unsupported state & timers (MapState, SetState) 
> at construction time, and documentation should be more prominent
> --
>
> Key: BEAM-2473
> URL: https://issues.apache.org/jira/browse/BEAM-2473
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4179

2017-06-21 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3418: Java Dataflow runner harness compatibility.

2017-06-21 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3418

Java Dataflow runner harness compatibility.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-harness-compat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3418


commit 5467e95fc401db25713ef0982efe27a050657800
Author: Robert Bradshaw 
Date:   2017-06-22T01:09:48Z

Java Dataflow runner harness compatibility.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2498) Dataflow runner should shade Runner/Fn API protos

2017-06-21 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-2498:
-

 Summary: Dataflow runner should shade Runner/Fn API protos
 Key: BEAM-2498
 URL: https://issues.apache.org/jira/browse/BEAM-2498
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


Just checked, and runners-core-construction is shaded but not the Runner API 
protos. There may be a technical reason this cannot be done trivially, but we 
need to work at it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2486) Should throws some useful messages when statefulParDo use non-KV input

2017-06-21 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-2486:
-

Assignee: Kenneth Knowles

> Should throws some useful messages when statefulParDo use non-KV input
> --
>
> Key: BEAM-2486
> URL: https://issues.apache.org/jira/browse/BEAM-2486
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-flink
>Reporter: Jingsong Lee
>Assignee: Kenneth Knowles
>
> Now Flink runner will throws a ClassCastException without detail messages 
> when a statefulParDo use non-KV input. It is not easy for users to find 
> errors and causes. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4178

2017-06-21 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3399: Add example for Bigquery streaming sink

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3399


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Add example for Bigquery streaming sink

2017-06-21 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master ae50fdd9e -> b3099bba2


Add example for Bigquery streaming sink


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/aa65ea11
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/aa65ea11
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/aa65ea11

Branch: refs/heads/master
Commit: aa65ea11e6e0d50864de21340219b5f4d019dbc2
Parents: ae50fdd
Author: Sourabh Bajaj 
Authored: Wed Jun 21 10:32:14 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 17:05:22 2017 -0700

--
 .../apache_beam/examples/windowed_wordcount.py  | 93 
 1 file changed, 93 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/aa65ea11/sdks/python/apache_beam/examples/windowed_wordcount.py
--
diff --git a/sdks/python/apache_beam/examples/windowed_wordcount.py 
b/sdks/python/apache_beam/examples/windowed_wordcount.py
new file mode 100644
index 000..bd57847
--- /dev/null
+++ b/sdks/python/apache_beam/examples/windowed_wordcount.py
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A streaming word-counting workflow.
+
+Important: streaming pipeline support in Python Dataflow is in development
+and is not yet available for use.
+"""
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+
+
+import apache_beam as beam
+import apache_beam.transforms.window as window
+
+TABLE_SCHEMA = ('word:STRING, count:INTEGER, '
+'window_start:TIMESTAMP, window_end:TIMESTAMP')
+
+
+def find_words(element):
+  import re
+  return re.findall(r'[A-Za-z\']+', element)
+
+
+class FormatDoFn(beam.DoFn):
+  def process(self, element, window=beam.DoFn.WindowParam):
+ts_format = '%Y-%m-%d %H:%M:%S.%f UTC'
+window_start = window.start.to_utc_datetime().strftime(ts_format)
+window_end = window.end.to_utc_datetime().strftime(ts_format)
+return [{'word': element[0],
+ 'count': element[1],
+ 'window_start':window_start,
+ 'window_end':window_end}]
+
+
+def run(argv=None):
+  """Build and run the pipeline."""
+
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+  '--input_topic', required=True,
+  help='Input PubSub topic of the form "/topics//".')
+  parser.add_argument(
+  '--output_table', required=True,
+  help=
+  ('Output BigQuery table for results specified as: PROJECT:DATASET.TABLE '
+   'or DATASET.TABLE.'))
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  with beam.Pipeline(argv=pipeline_args) as p:
+
+# Read the text from PubSub messages
+lines = p | beam.io.ReadStringsFromPubSub(known_args.input_topic)
+
+# Capitalize the characters in each line.
+transformed = (lines
+   | 'Split' >> (beam.FlatMap(find_words)
+ .with_output_types(unicode))
+   | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
+   | beam.WindowInto(window.FixedWindows(2*60, 0))
+   | 'Group' >> beam.GroupByKey()
+   | 'Count' >> beam.Map(lambda (word, ones): (word, 
sum(ones)))
+   | 'Format' >> beam.ParDo(FormatDoFn()))
+
+# Write to BigQuery.
+# pylint: disable=expression-not-assigned
+transformed | 'Write' >> beam.io.WriteToBigQuery(
+known_args.output_table,
+schema=TABLE_SCHEMA,
+create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
+write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
+
+
+if __name__ == '__main__':
+  logging.getLogger().setLevel(logging.INFO)
+  run()



[2/2] beam git commit: This closes #3399

2017-06-21 Thread altay
This closes #3399


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b3099bba
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b3099bba
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b3099bba

Branch: refs/heads/master
Commit: b3099bba2d1b26a3bdd9df0f92d5d2f85f065e21
Parents: ae50fdd aa65ea1
Author: Ahmet Altay 
Authored: Wed Jun 21 17:05:24 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 17:05:24 2017 -0700

--
 .../apache_beam/examples/windowed_wordcount.py  | 93 
 1 file changed, 93 insertions(+)
--




[jira] [Commented] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-21 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058426#comment-16058426
 ] 

Chamikara Jayalath commented on BEAM-2490:
--

I ran the same pipeline with a similar but smaller (8 1MB gzip files)  input 
pattern but didn't observe any data loss. It is possible to share your input 
(or a smaller sample of it from which issue can be reproduced) ?

Also can you try running without the 'SplitLines' step. ReadFromText already 
splits input based on end of line characters, so 'SplitLines' should not have 
any effect.

Size estimation, as name suggests, is an estimation, and we use sampling here. 
So there could be variations between estimated size displayed in the step vs 
the actual size observed.

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well but there is only a 288.62 MB file in output of this pipeline 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates me that the estimated size of the output after the 
> ReadFromText step is 602.29 MB only, which not correspond to any unique input 
> file size nor the overall file size matching with the pattern.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2495) Python IT test fails if six<1.9

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058405#comment-16058405
 ] 

ASF GitHub Bot commented on BEAM-2495:
--

Github user markflyhigh closed the pull request at:

https://github.com/apache/beam/pull/3416


> Python IT test fails if six<1.9
> ---
>
> Key: BEAM-2495
> URL: https://issues.apache.org/jira/browse/BEAM-2495
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Python uses nose.plugins.0.10 to manage customized commandline argument: 
> --test-pipeline-options to build test pipeline options and run integration 
> test. nose.plugins.0.10 is required six>=1.9, otherwise, following error will 
> occur:
> {code}
> /tmp/perfkitbenchmarker/runs/985bc121/beam/sdks/python/nose-1.3.7-py2.7.egg/nose/plugins/manager.py:395:
>  RuntimeWarning: Unable to load plugin beam_test_plugin = 
> test_config:BeamTestPlugin: (six 1.5.2 (/usr/lib/python2.7/dist-packages), 
> Requirement.parse('six>=1.9'))
>   RuntimeWarning)
> usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>or: setup.py --help [cmd1 cmd2 ...]
>or: setup.py --help-commands
>or: setup.py cmd --help
> error: option --test-pipeline-options not recognized
> {code}
> I think we should add this dependency to REQUIRED_TEST_PACKAGES in setup.py



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3416: [BEAM-2495] Add Python test dependency six>=1.9

2017-06-21 Thread markflyhigh
Github user markflyhigh closed the pull request at:

https://github.com/apache/beam/pull/3416


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3416

2017-06-21 Thread altay
This closes #3416


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ae50fdd9
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ae50fdd9
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ae50fdd9

Branch: refs/heads/master
Commit: ae50fdd9ed327c56c15380956c5873533eb2ba52
Parents: e015168 17c5012
Author: Ahmet Altay 
Authored: Wed Jun 21 15:59:47 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 15:59:47 2017 -0700

--
 sdks/python/setup.py | 2 ++
 1 file changed, 2 insertions(+)
--




[1/2] beam git commit: [BEAM-2495] Add Python test dependency six>=1.9

2017-06-21 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master e015168a3 -> ae50fdd9e


[BEAM-2495] Add Python test dependency six>=1.9


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/17c50122
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/17c50122
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/17c50122

Branch: refs/heads/master
Commit: 17c50122e684655846c4e07f19d16a38fa47d5a3
Parents: e015168
Author: Mark Liu 
Authored: Wed Jun 21 14:28:26 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 15:59:44 2017 -0700

--
 sdks/python/setup.py | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/17c50122/sdks/python/setup.py
--
diff --git a/sdks/python/setup.py b/sdks/python/setup.py
index 584c852..6646a58 100644
--- a/sdks/python/setup.py
+++ b/sdks/python/setup.py
@@ -112,6 +112,8 @@ REQUIRED_SETUP_PACKAGES = [
 
 REQUIRED_TEST_PACKAGES = [
 'pyhamcrest>=1.9,<2.0',
+# Six required by nose plugins management.
+'six>=1.9',
 ]
 
 GCP_REQUIREMENTS = [



Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4177

2017-06-21 Thread Apache Jenkins Server
See 


Changes:

[pei] ReduceFnRunner.onTrigger: skip storeCurrentPaneInfo() if trigger

--
[...truncated 1.28 MB...]
2017-06-21T22:23:42.046 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/activemq/activemq-jaas/5.13.1/activemq-jaas-5.13.1.pom
2017-06-21T22:23:42.072 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/activemq/activemq-jaas/5.13.1/activemq-jaas-5.13.1.pom
 (6 KB at 207.6 KB/sec)
2017-06-21T22:23:42.076 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/activemq/activemq-kahadb-store/5.13.1/activemq-kahadb-store-5.13.1.pom
2017-06-21T22:23:42.104 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/activemq/activemq-kahadb-store/5.13.1/activemq-kahadb-store-5.13.1.pom
 (10 KB at 352.2 KB/sec)
2017-06-21T22:23:42.108 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/activemq/protobuf/activemq-protobuf/1.1/activemq-protobuf-1.1.pom
2017-06-21T22:23:42.135 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/activemq/protobuf/activemq-protobuf/1.1/activemq-protobuf-1.1.pom
 (3 KB at 107.5 KB/sec)
2017-06-21T22:23:42.136 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/activemq/protobuf/activemq-protobuf-pom/1.1/activemq-protobuf-pom-1.1.pom
2017-06-21T22:23:42.164 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/activemq/protobuf/activemq-protobuf-pom/1.1/activemq-protobuf-pom-1.1.pom
 (12 KB at 411.9 KB/sec)
2017-06-21T22:23:42.166 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-net/commons-net/3.3/commons-net-3.3.pom
2017-06-21T22:23:42.193 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-net/commons-net/3.3/commons-net-3.3.pom
 (19 KB at 693.0 KB/sec)
2017-06-21T22:23:42.195 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/30/commons-parent-30.pom
2017-06-21T22:23:42.226 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/commons/commons-parent/30/commons-parent-30.pom
 (55 KB at 1759.5 KB/sec)
[JENKINS] Archiving disabled
2017-06-21T22:23:43.475 [INFO]  
   
2017-06-21T22:23:43.475 [INFO] 

2017-06-21T22:23:43.475 [INFO] Skipping Apache Beam :: Parent
2017-06-21T22:23:43.475 [INFO] This project has been banned from the build due 
to previous failures.
2017-06-21T22:23:43.475 [INFO] 

[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
2017-06-21T22:24:00.924 [INFO] 

2017-06-21T22:24:00.924 [INFO] Reactor Summary:
2017-06-21T22:24:00.924 [INFO] 
2017-06-21T22:24:00.924 [INFO] Apache Beam :: Parent 
.. SUCCESS [ 25.308 s]
2017-06-21T22:24:00.924 [INFO] Apache Beam :: SDKs :: Java :: Build Tools 
. SUCCESS [ 16.610 s]
2017-06-21T22:24:00.924 [INFO] Apache Beam :: SDKs 
 SUCCESS [  4.103 s]
2017-06-21T22:24:00.924 [INFO] Apache Beam :: SDKs :: Common 
.. SUCCESS [  1.706 s]
2017-06-21T22:24:00.924 [INFO] Apache Beam :: SDKs :: Common :: Runner API 
 SUCCESS [ 22.352 s]
2017-06-21T22:24:00.924 [INFO] Apache Beam :: SDKs :: Common :: Fn API 
 SUCCESS [ 17.377 s]
2017-06-21T22:24:00.924 [INFO] Apache Beam :: SDKs :: Java 
.

[jira] [Commented] (BEAM-1377) Support Splittable DoFn in Dataflow streaming runner

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058370#comment-16058370
 ] 

ASF GitHub Bot commented on BEAM-1377:
--

GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3417

[BEAM-1377] Uses KV in SplittableParDo expansion instead of 
ElementAndRestriction

This is a workaround for the following issue.

ElementAndRestriction is in runners-core, which may be shaded by runners
(and is shaded by Dataflow runner), hence it should be *both* produced
and consumed by workers - but currently it's produced by (shaded)
SplittableParDo and consumed by (differently shaded) ProcessFn in the
runner's worker code.

There are several ways out of this, e.g. moving EAR into the SDK (icky
because it's an implementation detail of SplittableParDo), or using
a type that's already in the SDK. There may be other more complicated
ways too.

(This PR will require building a compatible Dataflow worker, so it will 
naturally not pass tests initially)

R: @kennknowles 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam sdf-kv

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3417


commit c4885833c6edd37fcd3161a124292044ebe820da
Author: Eugene Kirpichov 
Date:   2017-06-21T22:21:32Z

Uses KV in SplittableParDo expansion instead of ElementAndRestriction

This is a workaround for the following issue.

ElementAndRestriction is in runners-core, which may be shaded by runners
(and is shaded by Dataflow runner), hence it should be *both* produced
and consumed by workers - but currently it's produced by (shaded)
SplittableParDo and consumed by (differently shaded) ProcessFn in the
runner's worker code.

There are several ways out of this, e.g. moving EAR into the SDK (icky
because it's an implementation detail of SplittableParDo), or using
a type that's already in the SDK. There may be other more complicated
ways too.




> Support Splittable DoFn in Dataflow streaming runner
> 
>
> Key: BEAM-1377
> URL: https://issues.apache.org/jira/browse/BEAM-1377
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
> Fix For: 2.1.0
>
>
> Dataflow runner should support splittable DoFn.
> However, Dataflow batch and streaming runners will support it quite 
> differently, streaming being the somewhat easier one. The current issue is 
> about the streaming runner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3417: [BEAM-1377] Uses KV in SplittableParDo expansion in...

2017-06-21 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3417

[BEAM-1377] Uses KV in SplittableParDo expansion instead of 
ElementAndRestriction

This is a workaround for the following issue.

ElementAndRestriction is in runners-core, which may be shaded by runners
(and is shaded by Dataflow runner), hence it should be *both* produced
and consumed by workers - but currently it's produced by (shaded)
SplittableParDo and consumed by (differently shaded) ProcessFn in the
runner's worker code.

There are several ways out of this, e.g. moving EAR into the SDK (icky
because it's an implementation detail of SplittableParDo), or using
a type that's already in the SDK. There may be other more complicated
ways too.

(This PR will require building a compatible Dataflow worker, so it will 
naturally not pass tests initially)

R: @kennknowles 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam sdf-kv

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3417


commit c4885833c6edd37fcd3161a124292044ebe820da
Author: Eugene Kirpichov 
Date:   2017-06-21T22:21:32Z

Uses KV in SplittableParDo expansion instead of ElementAndRestriction

This is a workaround for the following issue.

ElementAndRestriction is in runners-core, which may be shaded by runners
(and is shaded by Dataflow runner), hence it should be *both* produced
and consumed by workers - but currently it's produced by (shaded)
SplittableParDo and consumed by (differently shaded) ProcessFn in the
runner's worker code.

There are several ways out of this, e.g. moving EAR into the SDK (icky
because it's an implementation detail of SplittableParDo), or using
a type that's already in the SDK. There may be other more complicated
ways too.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2497) TextIO can't read concatenated gzip files

2017-06-21 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-2497:
-

 Summary: TextIO can't read concatenated gzip files
 Key: BEAM-2497
 URL: https://issues.apache.org/jira/browse/BEAM-2497
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Affects Versions: 2.0.0
Reporter: Ahmet Altay
Assignee: Chamikara Jayalath
 Fix For: 2.1.0


I can reproduce this with python {{DirectRunner}} using the same repro steps as 
https://issues.apache.org/jira/browse/BEAM-167.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4176

2017-06-21 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3235: ReduceFnRunner.onTrigger: skip storeCurrentPaneInfo...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3235


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3235

2017-06-21 Thread pei
This closes #3235


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e015168a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e015168a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e015168a

Branch: refs/heads/master
Commit: e015168a379e47686a24e48bdce7c14cea94f495
Parents: 3746d4c cd88630
Author: Pei He 
Authored: Wed Jun 21 14:57:33 2017 -0700
Committer: Pei He 
Committed: Wed Jun 21 14:57:33 2017 -0700

--
 .../java/org/apache/beam/runners/core/ReduceFnRunner.java| 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--




[1/2] beam git commit: ReduceFnRunner.onTrigger: skip storeCurrentPaneInfo() if trigger isFinished.

2017-06-21 Thread pei
Repository: beam
Updated Branches:
  refs/heads/master 3746d4cad -> e015168a3


ReduceFnRunner.onTrigger: skip storeCurrentPaneInfo() if trigger isFinished.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/cd886300
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/cd886300
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/cd886300

Branch: refs/heads/master
Commit: cd886300719ac9d702fbe7b105b09bdc5bbe0d3b
Parents: 3746d4c
Author: Author: 波特 
Authored: Fri May 26 17:40:27 2017 +0800
Committer: Pei He 
Committed: Wed Jun 21 14:56:58 2017 -0700

--
 .../java/org/apache/beam/runners/core/ReduceFnRunner.java| 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/cd886300/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
index 62d519f..b5c3e3e 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
@@ -948,7 +948,7 @@ public class ReduceFnRunner {
   private Instant onTrigger(
   final ReduceFn.Context directContext,
   ReduceFn.Context renamedContext,
-  boolean isFinished, boolean isEndOfWindow)
+  final boolean isFinished, boolean isEndOfWindow)
   throws Exception {
 Instant inputWM = timerInternals.currentInputWatermarkTime();
 
@@ -1005,9 +1005,11 @@ public class ReduceFnRunner {
 @Override
 public void output(OutputT toOutput) {
   // We're going to output panes, so commit the (now used) 
PaneInfo.
-  // TODO: This is unnecessary if the trigger isFinished since 
the saved
+  // This is unnecessary if the trigger isFinished since the 
saved
   // state will be immediately deleted.
-  paneInfoTracker.storeCurrentPaneInfo(directContext, pane);
+  if (!isFinished) {
+paneInfoTracker.storeCurrentPaneInfo(directContext, pane);
+  }
 
   // Output the actual value.
   outputter.outputWindowedValue(



[jira] [Commented] (BEAM-2496) Use virtualenv in Performance test for dependency management

2017-06-21 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058320#comment-16058320
 ] 

Mark Liu commented on BEAM-2496:


looks like we can use virtualenv in Jenkins DSL 
(https://jenkinsci.github.io/job-dsl-plugin/#path/javaposse.jobdsl.dsl.DslFactory.job-steps-virtualenv),
 which however need  ShiningPanda plugin installed.

> Use virtualenv in Performance test for dependency management
> 
>
> Key: BEAM-2496
> URL: https://issues.apache.org/jira/browse/BEAM-2496
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Currently, we directly install required python packages on Jenkins executors 
> in performance test , which will cause problem if new version of a package is 
> required, or previous test mess up with the environment.
> We should use virtualenv or similar tools to manage the requirement packages 
> to build a clean environment with no leftover after whole test is finished.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4175

2017-06-21 Thread Apache Jenkins Server
See 


Changes:

[altay] Allow production of unprocessed bundles, introduce TestStream evaluator

--
[...truncated 3.34 MB...]
2017-06-21T21:49:42.429 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/javax/servlet/javax.servlet-api/3.0.1/javax.servlet-api-3.0.1.jar
 (84 KB at 40.3 KB/sec)
2017-06-21T21:49:42.429 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-framework/2.1.2/grizzly-framework-2.1.2.jar
2017-06-21T21:49:42.435 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/sun/jersey/jersey-grizzly2/1.9/jersey-grizzly2-1.9.jar
 (18 KB at 8.3 KB/sec)
2017-06-21T21:49:42.435 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/gmbal/gmbal-api-only/3.0.0-b023/gmbal-api-only-3.0.0-b023.jar
2017-06-21T21:49:42.449 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-common/2.2.0/hadoop-common-2.2.0.jar
 (2672 KB at 1279.4 KB/sec)
2017-06-21T21:49:42.449 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/external/management-api/3.0.0-b012/management-api-3.0.0-b012.jar
2017-06-21T21:49:42.463 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/gmbal/gmbal-api-only/3.0.0-b023/gmbal-api-only-3.0.0-b023.jar
 (22 KB at 10.1 KB/sec)
2017-06-21T21:49:42.463 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-http-server/2.1.2/grizzly-http-server-2.1.2.jar
2017-06-21T21:49:42.470 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.2.0/hadoop-yarn-common-2.2.0.jar
 (1272 KB at 602.7 KB/sec)
2017-06-21T21:49:42.470 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-rcm/2.1.2/grizzly-rcm-2.1.2.jar
2017-06-21T21:49:42.496 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/external/management-api/3.0.0-b012/management-api-3.0.0-b012.jar
 (42 KB at 19.3 KB/sec)
2017-06-21T21:49:42.496 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-http-servlet/2.1.2/grizzly-http-servlet-2.1.2.jar
2017-06-21T21:49:42.500 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-rcm/2.1.2/grizzly-rcm-2.1.2.jar
 (8 KB at 3.7 KB/sec)
2017-06-21T21:49:42.500 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/glassfish/javax.servlet/3.1/javax.servlet-3.1.jar
2017-06-21T21:49:42.518 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-http/2.1.2/grizzly-http-2.1.2.jar
 (248 KB at 114.6 KB/sec)
2017-06-21T21:49:42.519 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-core/2.4.4/jackson-core-2.4.4.jar
2017-06-21T21:49:42.521 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-http-server/2.1.2/grizzly-http-server-2.1.2.jar
 (194 KB at 89.7 KB/sec)
2017-06-21T21:49:42.521 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.4.4/jackson-annotations-2.4.4.jar
2017-06-21T21:49:42.536 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-framework/2.1.2/grizzly-framework-2.1.2.jar
 (675 KB at 310.2 KB/sec)
2017-06-21T21:49:42.538 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.4.4/jackson-databind-2.4.4.jar
2017-06-21T21:49:42.536 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/javax.servlet/3.1/javax.servlet-3.1.jar
 (82 KB at 37.7 KB/sec)
2017-06-21T21:49:42.538 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar
2017-06-21T21:49:42.553 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.4.4/jackson-annotations-2.4.4.jar
 (38 KB at 17.2 KB/sec)
2017-06-21T21:49:42.553 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/kafka/kafka-clients/0.9.0.1/kafka-clients-0.9.0.1.jar
2017-06-21T21:49:42.594 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/glassfish/grizzly/grizzly-http-servlet/2.1.2/grizzly-http-servlet-2.1.2.jar
 (330 KB at 147.5 KB/sec)
2017-06-21T21:49:42.594 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/kafka/kafka_2.10/0.9.0.1/kafka_2.10-0.9.0.1.jar
2017-06-21T21:49:42.606 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-core/2.4.4/jackson-core-2.4.4.jar
 (221 KB at 98.0 KB/sec)
2017-06-21T21:49:42.606 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/101tec/zkclient/0.7/zkclient-0.7.jar
2017-06-21T21:49:42.647 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/101tec/zkclient/0.7/zkclient-0.7.jar 
(73 KB at 31.5 KB/sec)
2017-06-21T21:49:42.

[jira] [Created] (BEAM-2496) Use virtualenv in Performance test for dependency management

2017-06-21 Thread Mark Liu (JIRA)
Mark Liu created BEAM-2496:
--

 Summary: Use virtualenv in Performance test for dependency 
management
 Key: BEAM-2496
 URL: https://issues.apache.org/jira/browse/BEAM-2496
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu


Currently, we directly install required python packages on Jenkins executors in 
performance test , which will cause problem if new version of a package is 
required, or previous test mess up with the environment.

We should use virtualenv or similar tools to manage the requirement packages to 
build a clean environment with no leftover after whole test is finished.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2495) Python IT test fails if six<1.9

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058302#comment-16058302
 ] 

ASF GitHub Bot commented on BEAM-2495:
--

GitHub user markflyhigh opened a pull request:

https://github.com/apache/beam/pull/3416

[BEAM-2495] Add Python test dependency six>=1.9

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

It's required by nose plugins management which is used to define customized 
commandline argument `--test-pipeline-options` and helps to run python 
integration test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markflyhigh/incubator-beam update-python-six

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3416


commit 541f5b2e42076460a11a54b7f29c2bb60c9c8dd6
Author: Mark Liu 
Date:   2017-06-21T21:28:26Z

[BEAM-2495] Add Python test dependency six>=1.9




> Python IT test fails if six<1.9
> ---
>
> Key: BEAM-2495
> URL: https://issues.apache.org/jira/browse/BEAM-2495
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Python uses nose.plugins.0.10 to manage customized commandline argument: 
> --test-pipeline-options to build test pipeline options and run integration 
> test. nose.plugins.0.10 is required six>=1.9, otherwise, following error will 
> occur:
> {code}
> /tmp/perfkitbenchmarker/runs/985bc121/beam/sdks/python/nose-1.3.7-py2.7.egg/nose/plugins/manager.py:395:
>  RuntimeWarning: Unable to load plugin beam_test_plugin = 
> test_config:BeamTestPlugin: (six 1.5.2 (/usr/lib/python2.7/dist-packages), 
> Requirement.parse('six>=1.9'))
>   RuntimeWarning)
> usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>or: setup.py --help [cmd1 cmd2 ...]
>or: setup.py --help-commands
>or: setup.py cmd --help
> error: option --test-pipeline-options not recognized
> {code}
> I think we should add this dependency to REQUIRED_TEST_PACKAGES in setup.py



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3416: [BEAM-2495] Add Python test dependency six>=1.9

2017-06-21 Thread markflyhigh
GitHub user markflyhigh opened a pull request:

https://github.com/apache/beam/pull/3416

[BEAM-2495] Add Python test dependency six>=1.9

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

It's required by nose plugins management which is used to define customized 
commandline argument `--test-pipeline-options` and helps to run python 
integration test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markflyhigh/incubator-beam update-python-six

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3416


commit 541f5b2e42076460a11a54b7f29c2bb60c9c8dd6
Author: Mark Liu 
Date:   2017-06-21T21:28:26Z

[BEAM-2495] Add Python test dependency six>=1.9




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink #3204

2017-06-21 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4174

2017-06-21 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2495) Python IT test fails if six<1.9

2017-06-21 Thread Mark Liu (JIRA)
Mark Liu created BEAM-2495:
--

 Summary: Python IT test fails if six<1.9
 Key: BEAM-2495
 URL: https://issues.apache.org/jira/browse/BEAM-2495
 Project: Beam
  Issue Type: Bug
  Components: sdk-py, testing
Reporter: Mark Liu
Assignee: Mark Liu


Python uses nose.plugins.0.10 to manage customized commandline argument: 
--test-pipeline-options to build test pipeline options and run integration 
test. nose.plugins.0.10 is required six>=1.9, otherwise, following error will 
occur:
{code}
/tmp/perfkitbenchmarker/runs/985bc121/beam/sdks/python/nose-1.3.7-py2.7.egg/nose/plugins/manager.py:395:
 RuntimeWarning: Unable to load plugin beam_test_plugin = 
test_config:BeamTestPlugin: (six 1.5.2 (/usr/lib/python2.7/dist-packages), 
Requirement.parse('six>=1.9'))
  RuntimeWarning)
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: option --test-pipeline-options not recognized
{code}

I think we should add this dependency to REQUIRED_TEST_PACKAGES in setup.py



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2493) TestStream.Builder.addElements() should return the same builder

2017-06-21 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2493:
--

Assignee: Thomas Groh  (was: Davor Bonaci)

> TestStream.Builder.addElements() should return the same builder
> ---
>
> Key: BEAM-2493
> URL: https://issues.apache.org/jira/browse/BEAM-2493
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Keith Berkoben
>Assignee: Thomas Groh
>
> When writing tests for pipelines, it is commonly the case where a TestStream 
> must be built in steps ex: 
> TestStream.Builder tsb = 
> TestStream.create().advanceWatermarkTo(new Instant(0);
> if(){
>   tsb.addElements();
> }
> TestStream  stream = tsb.advanceWatermarkToInfinity();
> The above code does not  work, however, because addElements() is creating a 
> NEW builder rather than augmenting the existing one.  This is a-typical for a 
> builder pattern and requires the user to do 
> tsb = tsb.addElements()
> which is more verbose and counterintuitive if one is expecting a builder. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-2494) Remove 'GroupedShuffleRangeTracker' which is unused in the SDK

2017-06-21 Thread Chamikara Jayalath (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath resolved BEAM-2494.
--
   Resolution: Fixed
Fix Version/s: 2.1.0

> Remove 'GroupedShuffleRangeTracker' which is unused in the SDK
> --
>
> Key: BEAM-2494
> URL: https://issues.apache.org/jira/browse/BEAM-2494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2494) Remove 'GroupedShuffleRangeTracker' which is unused in the SDK

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058232#comment-16058232
 ] 

ASF GitHub Bot commented on BEAM-2494:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3414


> Remove 'GroupedShuffleRangeTracker' which is unused in the SDK
> --
>
> Key: BEAM-2494
> URL: https://issues.apache.org/jira/browse/BEAM-2494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3414: [BEAM-2494] Remove GroupedShuffleRangeTracker which...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3414


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Remove GroupedShuffleRangeTracker which is unused in the SDK

2017-06-21 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master f0467b72f -> 3746d4cad


Remove GroupedShuffleRangeTracker which is unused in the SDK


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c6d0d798
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c6d0d798
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c6d0d798

Branch: refs/heads/master
Commit: c6d0d7983b19ce2e01b7b06a12f704fef17a00cc
Parents: f0467b7
Author: chamik...@google.com 
Authored: Wed Jun 21 10:37:11 2017 -0700
Committer: chamik...@google.com 
Committed: Wed Jun 21 14:05:17 2017 -0700

--
 sdks/python/apache_beam/io/range_trackers.py| 130 -
 .../apache_beam/io/range_trackers_test.py   | 186 ---
 2 files changed, 316 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c6d0d798/sdks/python/apache_beam/io/range_trackers.py
--
diff --git a/sdks/python/apache_beam/io/range_trackers.py 
b/sdks/python/apache_beam/io/range_trackers.py
index 9cb36e7..bef77d4 100644
--- a/sdks/python/apache_beam/io/range_trackers.py
+++ b/sdks/python/apache_beam/io/range_trackers.py
@@ -193,136 +193,6 @@ class OffsetRangeTracker(iobase.RangeTracker):
 self._split_points_unclaimed_callback = callback
 
 
-class GroupedShuffleRangeTracker(iobase.RangeTracker):
-  """For internal use only; no backwards-compatibility guarantees.
-
-  A 'RangeTracker' for positions used by'GroupedShuffleReader'.
-
-  These positions roughly correspond to hashes of keys. In case of hash
-  collisions, multiple groups can have the same position. In that case, the
-  first group at a particular position is considered a split point (because
-  it is the first to be returned when reading a position range starting at this
-  position), others are not.
-  """
-
-  def __init__(self, decoded_start_pos, decoded_stop_pos):
-super(GroupedShuffleRangeTracker, self).__init__()
-self._decoded_start_pos = decoded_start_pos
-self._decoded_stop_pos = decoded_stop_pos
-self._decoded_last_group_start = None
-self._last_group_was_at_a_split_point = False
-self._split_points_seen = 0
-self._lock = threading.Lock()
-
-  def start_position(self):
-return self._decoded_start_pos
-
-  def stop_position(self):
-return self._decoded_stop_pos
-
-  def last_group_start(self):
-return self._decoded_last_group_start
-
-  def _validate_decoded_group_start(self, decoded_group_start, split_point):
-if self.start_position() and decoded_group_start < self.start_position():
-  raise ValueError('Trying to return record at %r which is before the'
-   ' starting position at %r' %
-   (decoded_group_start, self.start_position()))
-
-if (self.last_group_start() and
-decoded_group_start < self.last_group_start()):
-  raise ValueError('Trying to return group at %r which is before the'
-   ' last-returned group at %r' %
-   (decoded_group_start, self.last_group_start()))
-if (split_point and self.last_group_start() and
-self.last_group_start() == decoded_group_start):
-  raise ValueError('Trying to return a group at a split point with '
-   'same position as the previous group: both at %r, '
-   'last group was %sat a split point.' %
-   (decoded_group_start,
-('' if self._last_group_was_at_a_split_point
- else 'not ')))
-if not split_point:
-  if self.last_group_start() is None:
-raise ValueError('The first group [at %r] must be at a split point' %
- decoded_group_start)
-  if self.last_group_start() != decoded_group_start:
-# This case is not a violation of general RangeTracker semantics, but 
it
-# is contrary to how GroupingShuffleReader in particular works. Hitting
-# it would mean it's behaving unexpectedly.
-raise ValueError('Trying to return a group not at a split point, but '
- 'with a different position than the previous group: '
- 'last group was %r at %r, current at a %s split'
- ' point.' %
- (self.last_group_start()
-  , decoded_group_start
-  , ('' if self._last_group_was_at_a_split_point
- else 'non-')))
-
-  def try_claim(self, decoded_group_start):
-with self._lock:
-  self._validate_decoded_group_start(decoded_group_start, True)
-  if (self.stop_position()
-  and decoded_group_start >= self.stop_position()):
-return False
-
-  self._decoded_last

[2/2] beam git commit: This closes #3414

2017-06-21 Thread chamikara
This closes #3414


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3746d4ca
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3746d4ca
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3746d4ca

Branch: refs/heads/master
Commit: 3746d4cadca034d6c45436efb0fc71304d093f91
Parents: f0467b7 c6d0d79
Author: chamik...@google.com 
Authored: Wed Jun 21 14:05:44 2017 -0700
Committer: chamik...@google.com 
Committed: Wed Jun 21 14:05:44 2017 -0700

--
 sdks/python/apache_beam/io/range_trackers.py| 130 -
 .../apache_beam/io/range_trackers_test.py   | 186 ---
 2 files changed, 316 deletions(-)
--




Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink #3203

2017-06-21 Thread Apache Jenkins Server
See 


Changes:

[altay] Allow production of unprocessed bundles, introduce TestStream evaluator

--
[...truncated 465.87 KB...]
2017-06-21T20:49:50.968 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-beanutils/commons-beanutils-bean-collections/1.8.3/commons-beanutils-bean-collections-1.8.3.pom
 (2 KB at 72.9 KB/sec)
2017-06-21T20:49:50.970 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-daemon/commons-daemon/1.0.13/commons-daemon-1.0.13.pom
2017-06-21T20:49:50.998 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-daemon/commons-daemon/1.0.13/commons-daemon-1.0.13.pom
 (5 KB at 151.7 KB/sec)
2017-06-21T20:49:51.001 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.pom
2017-06-21T20:49:51.030 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.pom
 (6 KB at 187.6 KB/sec)
2017-06-21T20:49:51.032 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.pom
2017-06-21T20:49:51.069 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.pom
 (962 B at 25.4 KB/sec)
2017-06-21T20:49:51.070 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/javax/activation/activation/1.1/activation-1.1.pom
2017-06-21T20:49:51.097 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/javax/activation/activation/1.1/activation-1.1.pom
 (2 KB at 38.3 KB/sec)
2017-06-21T20:49:51.099 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-runtime_2.10/1.3.0/flink-runtime_2.10-1.3.0.pom
2017-06-21T20:49:51.139 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/flink/flink-runtime_2.10/1.3.0/flink-runtime_2.10-1.3.0.pom
 (14 KB at 348.2 KB/sec)
2017-06-21T20:49:51.143 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/netty/netty-all/4.0.27.Final/netty-all-4.0.27.Final.pom
2017-06-21T20:49:51.171 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/netty/netty-all/4.0.27.Final/netty-all-4.0.27.Final.pom
 (17 KB at 601.8 KB/sec)
2017-06-21T20:49:51.173 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/netty/netty-parent/4.0.27.Final/netty-parent-4.0.27.Final.pom
2017-06-21T20:49:51.208 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/netty/netty-parent/4.0.27.Final/netty-parent-4.0.27.Final.pom
 (48 KB at 1361.2 KB/sec)
2017-06-21T20:49:51.215 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/javassist/javassist/3.18.2-GA/javassist-3.18.2-GA.pom
2017-06-21T20:49:51.246 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/javassist/javassist/3.18.2-GA/javassist-3.18.2-GA.pom
 (10 KB at 300.0 KB/sec)
2017-06-21T20:49:51.248 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.pom
2017-06-21T20:49:51.275 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.pom
 (2 KB at 73.9 KB/sec)
2017-06-21T20:49:51.277 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/data-artisans/flakka-actor_2.10/2.3-custom/flakka-actor_2.10-2.3-custom.pom
2017-06-21T20:49:51.313 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/data-artisans/flakka-actor_2.10/2.3-custom/flakka-actor_2.10-2.3-custom.pom
 (2 KB at 52.5 KB/sec)
2017-06-21T20:49:51.315 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/typesafe/config/1.2.1/config-1.2.1.pom
2017-06-21T20:49:51.343 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/typesafe/config/1.2.1/config-1.2.1.pom 
(2 KB at 65.1 KB/sec)
2017-06-21T20:49:51.345 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/data-artisans/flakka-remote_2.10/2.3-custom/flakka-remote_2.10-2.3-custom.pom
2017-06-21T20:49:51.375 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/data-artisans/flakka-remote_2.10/2.3-custom/flakka-remote_2.10-2.3-custom.pom
 (4 KB at 116.8 KB/sec)
2017-06-21T20:49:51.377 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/netty/netty/3.8.0.Final/netty-3.8.0.Final.pom
2017-06-21T20:49:51.407 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/netty/netty/3.8.0.Final/netty-3.8.0.Final.pom
 (26 KB at 853.0 KB/sec)
2017-06-21T20:49:51.412 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.pom
2017-06-21T20:49:51.445 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.pom
 (2 KB at 48.8 KB/sec)
2017-06-21T20:49:51.446 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/data-artisans/flakka-slf4j_2.10/2.3-custom/flakka-slf4j_2.10-2.3-custom.pom
2017-06-21T20:49:51.475 [INF

[jira] [Commented] (BEAM-1265) Add streaming support to Python DirectRunner

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058195#comment-16058195
 ] 

ASF GitHub Bot commented on BEAM-1265:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3409


> Add streaming support to Python DirectRunner
> 
>
> Key: BEAM-1265
> URL: https://issues.apache.org/jira/browse/BEAM-1265
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>
> Continue the work started in https://issues.apache.org/jira/browse/BEAM-428



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3409: [BEAM-1265] Allow production of unprocessed bundles...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3409


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3409

2017-06-21 Thread altay
This closes #3409


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f0467b72
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f0467b72
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f0467b72

Branch: refs/heads/master
Commit: f0467b72fd8dfbffaaeb353abfaa1fefa1ee0092
Parents: 28c6fd4 3520f94
Author: Ahmet Altay 
Authored: Wed Jun 21 13:44:08 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 13:44:08 2017 -0700

--
 .../runners/direct/evaluation_context.py| 14 ++--
 .../apache_beam/runners/direct/executor.py  | 40 +++--
 .../runners/direct/transform_evaluator.py   | 88 ++--
 sdks/python/apache_beam/runners/direct/util.py  |  4 +-
 .../runners/direct/watermark_manager.py | 11 ++-
 sdks/python/apache_beam/testing/test_stream.py  |  5 ++
 .../apache_beam/testing/test_stream_test.py | 37 
 7 files changed, 176 insertions(+), 23 deletions(-)
--




[1/2] beam git commit: Allow production of unprocessed bundles, introduce TestStream evaluator in DirectRunner

2017-06-21 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 28c6fd42e -> f0467b72f


Allow production of unprocessed bundles, introduce TestStream evaluator in 
DirectRunner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3520f948
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3520f948
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3520f948

Branch: refs/heads/master
Commit: 3520f94882b00aa8db64f6379044689d1b78ac06
Parents: 28c6fd4
Author: Charles Chen 
Authored: Tue Jun 20 17:16:20 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 13:44:05 2017 -0700

--
 .../runners/direct/evaluation_context.py| 14 ++--
 .../apache_beam/runners/direct/executor.py  | 40 +++--
 .../runners/direct/transform_evaluator.py   | 88 ++--
 sdks/python/apache_beam/runners/direct/util.py  |  4 +-
 .../runners/direct/watermark_manager.py | 11 ++-
 sdks/python/apache_beam/testing/test_stream.py  |  5 ++
 .../apache_beam/testing/test_stream_test.py | 37 
 7 files changed, 176 insertions(+), 23 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3520f948/sdks/python/apache_beam/runners/direct/evaluation_context.py
--
diff --git a/sdks/python/apache_beam/runners/direct/evaluation_context.py 
b/sdks/python/apache_beam/runners/direct/evaluation_context.py
index 976e9e8..669a68a 100644
--- a/sdks/python/apache_beam/runners/direct/evaluation_context.py
+++ b/sdks/python/apache_beam/runners/direct/evaluation_context.py
@@ -208,11 +208,12 @@ class EvaluationContext(object):
   the committed bundles contained within the handled result.
 """
 with self._lock:
-  committed_bundles = self._commit_bundles(
-  result.uncommitted_output_bundles)
+  committed_bundles, unprocessed_bundles = self._commit_bundles(
+  result.uncommitted_output_bundles,
+  result.unprocessed_bundles)
   self._watermark_manager.update_watermarks(
   completed_bundle, result.transform, completed_timers,
-  committed_bundles, result.watermark_hold)
+  committed_bundles, unprocessed_bundles, result.watermark_hold)
 
   self._metrics.commit_logical(completed_bundle,
result.logical_metric_updates)
@@ -252,14 +253,17 @@ class EvaluationContext(object):
   executor_service.submit(task)
 self._pending_unblocked_tasks = []
 
-  def _commit_bundles(self, uncommitted_bundles):
+  def _commit_bundles(self, uncommitted_bundles, unprocessed_bundles):
 """Commits bundles and returns a immutable set of committed bundles."""
 for in_progress_bundle in uncommitted_bundles:
   producing_applied_ptransform = in_progress_bundle.pcollection.producer
   watermarks = self._watermark_manager.get_watermarks(
   producing_applied_ptransform)
   in_progress_bundle.commit(watermarks.synchronized_processing_output_time)
-return tuple(uncommitted_bundles)
+
+for unprocessed_bundle in unprocessed_bundles:
+  unprocessed_bundle.commit(None)
+return tuple(uncommitted_bundles), tuple(unprocessed_bundles)
 
   def get_execution_context(self, applied_ptransform):
 return _ExecutionContext(

http://git-wip-us.apache.org/repos/asf/beam/blob/3520f948/sdks/python/apache_beam/runners/direct/executor.py
--
diff --git a/sdks/python/apache_beam/runners/direct/executor.py 
b/sdks/python/apache_beam/runners/direct/executor.py
index a0a3886..e70e326 100644
--- a/sdks/python/apache_beam/runners/direct/executor.py
+++ b/sdks/python/apache_beam/runners/direct/executor.py
@@ -227,17 +227,25 @@ class _CompletionCallback(object):
 self._all_updates = all_updates
 self._timer_firings = timer_firings or []
 
-  def handle_result(self, input_committed_bundle, transform_result):
+  def handle_result(self, transform_executor, input_committed_bundle,
+transform_result):
 output_committed_bundles = self._evaluation_context.handle_result(
 input_committed_bundle, self._timer_firings, transform_result)
 for output_committed_bundle in output_committed_bundles:
   self._all_updates.offer(_ExecutorServiceParallelExecutor._ExecutorUpdate(
-  output_committed_bundle, None))
+  transform_executor,
+  committed_bundle=output_committed_bundle))
+for unprocessed_bundle in transform_result.unprocessed_bundles:
+  self._all_updates.offer(
+  _ExecutorServiceParallelExecutor._ExecutorUpdate(
+  transform_executor,
+  unprocessed_bundle=unprocessed_bundle))
 return output_committed_bundles
 
-  def handle_exception(self, exception):
+  de

[1/2] beam git commit: Fix minor issues on HCatalogIO

2017-06-21 Thread iemejia
Repository: beam
Updated Branches:
  refs/heads/master 2d25b6840 -> 28c6fd42e


Fix minor issues on HCatalogIO

- Restrict access level when possible
- Rename Filter to Partition for the write to be coherent with the HCatalog API
- Improve test coverage
- Fix documentation details
- Implement TearDown method for the writer


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c11f0ff5
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c11f0ff5
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c11f0ff5

Branch: refs/heads/master
Commit: c11f0ff57efca5786fb5da20006d9eb96b44cffe
Parents: 2d25b68
Author: Ismaël Mejía 
Authored: Fri Jun 9 00:01:55 2017 +0200
Committer: Ismaël Mejía 
Committed: Wed Jun 21 21:58:14 2017 +0200

--
 .../apache/beam/sdk/io/hcatalog/HCatalogIO.java | 113 ---
 .../io/hcatalog/EmbeddedMetastoreService.java   |   3 +-
 .../beam/sdk/io/hcatalog/HCatalogIOTest.java|  54 +
 .../sdk/io/hcatalog/HCatalogIOTestUtils.java|  22 ++--
 4 files changed, 90 insertions(+), 102 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c11f0ff5/sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/HCatalogIO.java
--
diff --git 
a/sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/HCatalogIO.java
 
b/sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/HCatalogIO.java
index 07b56e3..1549dab 100644
--- 
a/sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/HCatalogIO.java
+++ 
b/sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/HCatalogIO.java
@@ -78,11 +78,10 @@ import org.slf4j.LoggerFactory;
  *
  * pipeline
  *   .apply(HCatalogIO.read()
- *   .withConfigProperties(configProperties) //mandatory
- *   .withTable("employee") //mandatory
+ *   .withConfigProperties(configProperties)
  *   .withDatabase("default") //optional, assumes default if none specified
- *   .withFilter(filterString) //optional,
- *   should be specified if the table is partitioned
+ *   .withTable("employee")
+ *   .withFilter(filterString) //optional, may be specified if the table 
is partitioned
  * }
  *
  * Writing using HCatalog
@@ -100,13 +99,11 @@ import org.slf4j.LoggerFactory;
  * pipeline
  *   .apply(...)
  *   .apply(HiveIO.write()
- *   .withConfigProperties(configProperties) //mandatory
- *   .withTable("employee") //mandatory
+ *   .withConfigProperties(configProperties)
  *   .withDatabase("default") //optional, assumes default if none specified
- *   .withFilter(partitionValues) //optional,
- *   should be specified if the table is partitioned
- *   .withBatchSize(1024L)) //optional,
- *   assumes a default batch size of 1024 if none specified
+ *   .withTable("employee")
+ *   .withPartition(partitionValues) //optional, may be specified if the 
table is partitioned
+ *   .withBatchSize(1024L)) //optional, assumes a default batch size of 
1024 if none specified
  * }
  */
 @Experimental
@@ -114,14 +111,17 @@ public class HCatalogIO {
 
   private static final Logger LOG = LoggerFactory.getLogger(HCatalogIO.class);
 
+  private static final long BATCH_SIZE = 1024L;
+  private static final String DEFAULT_DATABASE = "default";
+
   /** Write data to Hive. */
   public static Write write() {
-return new 
AutoValue_HCatalogIO_Write.Builder().setBatchSize(1024L).build();
+return new 
AutoValue_HCatalogIO_Write.Builder().setBatchSize(BATCH_SIZE).build();
   }
 
   /** Read data from Hive. */
   public static Read read() {
-return new 
AutoValue_HCatalogIO_Read.Builder().setDatabase("default").build();
+return new 
AutoValue_HCatalogIO_Read.Builder().setDatabase(DEFAULT_DATABASE).build();
   }
 
   private HCatalogIO() {}
@@ -130,44 +130,26 @@ public class HCatalogIO {
   @VisibleForTesting
   @AutoValue
   public abstract static class Read extends PTransform> {
-@Nullable
-abstract Map getConfigProperties();
-
-@Nullable
-abstract String getDatabase();
-
-@Nullable
-abstract String getTable();
-
-@Nullable
-abstract String getFilter();
-
-@Nullable
-abstract ReaderContext getContext();
-
-@Nullable
-abstract Integer getSplitId();
-
+@Nullable abstract Map getConfigProperties();
+@Nullable abstract String getDatabase();
+@Nullable abstract String getTable();
+@Nullable abstract String getFilter();
+@Nullable abstract ReaderContext getContext();
+@Nullable abstract Integer getSplitId();
 abstract Builder toBuilder();
 
 @AutoValue.Builder
 abstract static class Builder {
   abstract Builder setConfigProperties(Map 
configProperties);
-
   abstra

[GitHub] beam pull request #3412: Fix minor issues on HCatalogIO

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3412


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3412

2017-06-21 Thread iemejia
This closes #3412


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/28c6fd42
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/28c6fd42
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/28c6fd42

Branch: refs/heads/master
Commit: 28c6fd42eb95fc44fbc038758c31499a9482514d
Parents: 2d25b68 c11f0ff
Author: Ismaël Mejía 
Authored: Wed Jun 21 21:58:21 2017 +0200
Committer: Ismaël Mejía 
Committed: Wed Jun 21 21:58:21 2017 +0200

--
 .../apache/beam/sdk/io/hcatalog/HCatalogIO.java | 113 ---
 .../io/hcatalog/EmbeddedMetastoreService.java   |   3 +-
 .../beam/sdk/io/hcatalog/HCatalogIOTest.java|  54 +
 .../sdk/io/hcatalog/HCatalogIOTestUtils.java|  22 ++--
 4 files changed, 90 insertions(+), 102 deletions(-)
--




[jira] [Commented] (BEAM-2392) Avoid use of proto builder clone

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058091#comment-16058091
 ] 

ASF GitHub Bot commented on BEAM-2392:
--

GitHub user nkilmer opened a pull request:

https://github.com/apache/beam/pull/3415

[BEAM-2392] Remove uses of proto builder clone

Hi @lukecwik, could you please review this?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkilmer/beam remove-proto-clone

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3415


commit 46a5d0b8c9ca9ecaf7b91d3736d111f331f35a9c
Author: Nigel Kilmer 
Date:   2017-06-21T18:26:10Z

Removed uses of proto builder clone method




> Avoid use of proto builder clone
> 
>
> Key: BEAM-2392
> URL: https://issues.apache.org/jira/browse/BEAM-2392
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 2.1.0
>Reporter: Nigel Kilmer
>Assignee: Nigel Kilmer
>Priority: Minor
>
> BigtableServiceImpl uses the clone method of the MutateRowResponse proto 
> builder here:
> https://github.com/apache/beam/blob/04e3261818aed0c129e7c715e371463bf5b5c1b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImpl.java#L212
> This method is not generated by the Google-internal Java proto generator, so 
> I had to change this to get it to work with an internal project. Are you 
> interested in adding this change to the main repository for compatibility, or 
> would you prefer to keep the cleaner version that uses clone?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3415: [BEAM-2392] Remove uses of proto builder clone

2017-06-21 Thread nkilmer
GitHub user nkilmer opened a pull request:

https://github.com/apache/beam/pull/3415

[BEAM-2392] Remove uses of proto builder clone

Hi @lukecwik, could you please review this?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkilmer/beam remove-proto-clone

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3415


commit 46a5d0b8c9ca9ecaf7b91d3736d111f331f35a9c
Author: Nigel Kilmer 
Date:   2017-06-21T18:26:10Z

Removed uses of proto builder clone method




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4173

2017-06-21 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4172

2017-06-21 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink #3201

2017-06-21 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4171

2017-06-21 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2494) Remove 'GroupedShuffleRangeTracker' which is unused in the SDK

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057895#comment-16057895
 ] 

ASF GitHub Bot commented on BEAM-2494:
--

GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3414

[BEAM-2494] Remove GroupedShuffleRangeTracker which is unused in the SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam 
remove_grouped_shuffle_range_tracker

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3414


commit fbe89781bbf32421cbafe19313e6fbe070115dc2
Author: chamik...@google.com 
Date:   2017-06-21T17:37:11Z

Remove GroupedShuffleRangeTracker which is unused in the SDK




> Remove 'GroupedShuffleRangeTracker' which is unused in the SDK
> --
>
> Key: BEAM-2494
> URL: https://issues.apache.org/jira/browse/BEAM-2494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3414: [BEAM-2494] Remove GroupedShuffleRangeTracker which...

2017-06-21 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3414

[BEAM-2494] Remove GroupedShuffleRangeTracker which is unused in the SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam 
remove_grouped_shuffle_range_tracker

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3414


commit fbe89781bbf32421cbafe19313e6fbe070115dc2
Author: chamik...@google.com 
Date:   2017-06-21T17:37:11Z

Remove GroupedShuffleRangeTracker which is unused in the SDK




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2494) Remove 'GroupedShuffleRangeTracker' which is unused in the SDK

2017-06-21 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-2494:


 Summary: Remove 'GroupedShuffleRangeTracker' which is unused in 
the SDK
 Key: BEAM-2494
 URL: https://issues.apache.org/jira/browse/BEAM-2494
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Affects Versions: 2.1.0
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2447) Reintroduce DoFn.ProcessContinuation

2017-06-21 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057882#comment-16057882
 ] 

Kenneth Knowles commented on BEAM-2447:
---

Is "After a failed tryClaim() call, the ProcessElement method MUST return 
stop()" necessary? What is the pitfall of the runner ignoring the value and 
treating everything as stop()?

> Reintroduce DoFn.ProcessContinuation
> 
>
> Key: BEAM-2447
> URL: https://issues.apache.org/jira/browse/BEAM-2447
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> ProcessContinuation.resume() is useful for tailing files - when we reach 
> current EOF, we want to voluntarily suspend the process() call rather than 
> wait for runner to checkpoint us.
> In BEAM-1903, DoFn.ProcessContinuation was removed because there was 
> ambiguity about the semantics of resume() especially w.r.t. the following 
> situation described in 
> https://docs.google.com/document/d/1BGc8pM1GOvZhwR9SARSVte-20XEoBUxrGJ5gTWXdv3c/edit
>  : the runner has taken a checkpoint on the tracker, and then the 
> ProcessElement call returns resume() signaling that the work is still not 
> done - then there's 2 checkpoints to deal with.
> Instead, the proper way to refine this semantics is:
> - After checkpoint() on a RestrictionTracker, the tracker MUST fail all 
> subsequent tryClaim() calls, and MUST succeed in checkDone().
> - After a failed tryClaim() call, the ProcessElement method MUST return stop()
> - So ProcessElement can return resume() only *instead* of doing tryClaim()
> - Then, if the runner has already taken a checkpoint but tracker has returned 
> resume(), we do not need to take a new checkpoint - the one already taken 
> already accurately describes the remainder of the work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink #3200

2017-06-21 Thread Apache Jenkins Server
See 


Changes:

[github] Turn notifications for broken Windows test off.

--
[...truncated 291.64 KB...]
2017-06-21T17:14:40.490 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-antlr/1.9.4/ant-antlr-1.9.4.jar
2017-06-21T17:14:40.518 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-antlr/1.9.4/ant-antlr-1.9.4.jar
 (12 KB at 82.2 KB/sec)
2017-06-21T17:14:40.519 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-groovydoc/2.4.7/groovy-groovydoc-2.4.7.jar
2017-06-21T17:14:40.520 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-ant/2.4.7/groovy-ant-2.4.7.jar
 (69 KB at 486.0 KB/sec)
2017-06-21T17:14:40.520 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-templates/2.4.7/groovy-templates-2.4.7.jar
2017-06-21T17:14:40.561 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-templates/2.4.7/groovy-templates-2.4.7.jar
 (96 KB at 528.0 KB/sec)
2017-06-21T17:14:40.561 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-junit/1.9.4/ant-junit-1.9.4.jar
2017-06-21T17:14:40.563 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-groovydoc/2.4.7/groovy-groovydoc-2.4.7.jar
 (133 KB at 721.4 KB/sec)
2017-06-21T17:14:40.563 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-xml/2.4.7/groovy-xml-2.4.7.jar
2017-06-21T17:14:40.607 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-junit/1.9.4/ant-junit-1.9.4.jar
 (116 KB at 506.7 KB/sec)
2017-06-21T17:14:40.607 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-impl/2.1/maven-reporting-impl-2.1.jar
2017-06-21T17:14:40.611 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-xml/2.4.7/groovy-xml-2.4.7.jar
 (211 KB at 910.9 KB/sec)
2017-06-21T17:14:40.611 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-project/2.0.10/maven-project-2.0.10.jar
2017-06-21T17:14:40.624 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant/1.9.4/ant-1.9.4.jar 
(1972 KB at 8115.0 KB/sec)
2017-06-21T17:14:40.624 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-settings/2.0.10/maven-settings-2.0.10.jar
2017-06-21T17:14:40.633 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-impl/2.1/maven-reporting-impl-2.1.jar
 (17 KB at 64.0 KB/sec)
2017-06-21T17:14:40.633 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-profile/2.0.10/maven-profile-2.0.10.jar
2017-06-21T17:14:40.645 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-project/2.0.10/maven-project-2.0.10.jar
 (121 KB at 454.0 KB/sec)
2017-06-21T17:14:40.645 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-registry/2.0.10/maven-plugin-registry-2.0.10.jar
2017-06-21T17:14:40.653 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-settings/2.0.10/maven-settings-2.0.10.jar
 (50 KB at 183.1 KB/sec)
2017-06-21T17:14:40.653 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-interpolation/1.1/plexus-interpolation-1.1.jar
2017-06-21T17:14:40.657 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/code/findbugs/findbugs/3.0.1/findbugs-3.0.1.jar
 (3711 KB at 13299.7 KB/sec)
2017-06-21T17:14:40.657 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/commons-validator/commons-validator/1.2.0/commons-validator-1.2.0.jar
2017-06-21T17:14:40.662 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-profile/2.0.10/maven-profile-2.0.10.jar
 (37 KB at 127.8 KB/sec)
2017-06-21T17:14:40.662 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.jar
2017-06-21T17:14:40.673 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-registry/2.0.10/maven-plugin-registry-2.0.10.jar
 (30 KB at 101.3 KB/sec)
2017-06-21T17:14:40.673 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-core/1.4/doxia-core-1.4.jar
2017-06-21T17:14:40.681 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-interpolation/1.1/plexus-interpolation-1.1.jar
 (35 KB at 115.1 KB/sec)
2017-06-21T17:14:40.681 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-doxia-tools/1.2.1/maven-doxia-tools-1.2.1.jar
2017-06-21T17:14:40.688 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/commons-validator/commons-validator/1.2.0/commons-validator-1.2.0.ja

[jira] [Created] (BEAM-2493) TestStream.Builder.addElements() should return the same builder

2017-06-21 Thread Keith Berkoben (JIRA)
Keith Berkoben created BEAM-2493:


 Summary: TestStream.Builder.addElements() should return the same 
builder
 Key: BEAM-2493
 URL: https://issues.apache.org/jira/browse/BEAM-2493
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Affects Versions: 2.0.0
Reporter: Keith Berkoben
Assignee: Davor Bonaci


When writing tests for pipelines, it is commonly the case where a TestStream 
must be built in steps ex: 

TestStream.Builder tsb = 
TestStream.create().advanceWatermarkTo(new Instant(0);
if(){
  tsb.addElements();
}
TestStream  stream = tsb.advanceWatermarkToInfinity();

The above code does not  work, however, because addElements() is creating a NEW 
builder rather than augmenting the existing one.  This is a-typical for a 
builder pattern and requires the user to do 

tsb = tsb.addElements()

which is more verbose and counterintuitive if one is expecting a builder. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[1/2] beam git commit: Turn notifications for broken Windows test off.

2017-06-21 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master d1d78121b -> 2d25b6840


Turn notifications for broken Windows test off.

Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/65a6d662
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/65a6d662
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/65a6d662

Branch: refs/heads/master
Commit: 65a6d66251b081d540de85ed55dce3b62de797e7
Parents: d1d7812
Author: jasonkuster 
Authored: Wed Jun 21 09:40:51 2017 -0700
Committer: GitHub 
Committed: Wed Jun 21 09:40:51 2017 -0700

--
 .../jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/65a6d662/.test-infra/jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy
--
diff --git 
a/.test-infra/jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy 
b/.test-infra/jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy
index f781b4e..6ef272c 100644
--- a/.test-infra/jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy
+++ b/.test-infra/jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy
@@ -32,7 +32,8 @@ mavenJob('beam_PostCommit_Java_MavenInstall_Windows') {
   common_job_properties.setMavenConfig(delegate, 'Maven 3.3.3 (Windows)')
 
   // Sets that this is a PostCommit job.
-  common_job_properties.setPostCommit(delegate, '0 */6 * * *', false)
+  // TODO(BEAM-1042, BEAM-1045, BEAM-2269, BEAM-2299) Turn notifications back 
on once fixed.
+  common_job_properties.setPostCommit(delegate, '0 */6 * * *', false, '', 
false)
 
   // Allows triggering this build against pull requests.
   common_job_properties.enablePhraseTriggeringFromPullRequest(



[GitHub] beam pull request #3413: Disable notifications for Windows test until bugs a...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3413


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3413: Turn notifications for broken Windows test off.

2017-06-21 Thread kenn
This closes #3413: Turn notifications for broken Windows test off.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2d25b684
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2d25b684
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2d25b684

Branch: refs/heads/master
Commit: 2d25b6840009ccaaccc90337ea7812ce39188e02
Parents: d1d7812 65a6d66
Author: Kenneth Knowles 
Authored: Wed Jun 21 10:11:05 2017 -0700
Committer: Kenneth Knowles 
Committed: Wed Jun 21 10:11:05 2017 -0700

--
 .../jenkins/job_beam_PostCommit_Java_MavenInstall_Windows.groovy  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--




[jira] [Commented] (BEAM-1458) Checkpoint support in Beam

2017-06-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057839#comment-16057839
 ] 

Ismaël Mejía commented on BEAM-1458:


I am curious isn't this a bit correlated to the recently discuss API for 
Pipeline drain by [~reuvenlax] ?
https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8/edit

> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoint should provides a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3413: Disable notifications for Windows test until bugs a...

2017-06-21 Thread jasonkuster
GitHub user jasonkuster opened a pull request:

https://github.com/apache/beam/pull/3413

Disable notifications for Windows test until bugs are fixed.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jasonkuster/beam patch-4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3413


commit 65a6d66251b081d540de85ed55dce3b62de797e7
Author: jasonkuster 
Date:   2017-06-21T16:40:51Z

Turn notifications for broken Windows test off.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1265) Add streaming support to Python DirectRunner

2017-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057785#comment-16057785
 ] 

ASF GitHub Bot commented on BEAM-1265:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3405


> Add streaming support to Python DirectRunner
> 
>
> Key: BEAM-1265
> URL: https://issues.apache.org/jira/browse/BEAM-1265
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Charles Chen
>
> Continue the work started in https://issues.apache.org/jira/browse/BEAM-428



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3405: [BEAM-1265] Use state / timer API for DirectRunner ...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3405


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3405

2017-06-21 Thread altay
This closes #3405


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d1d78121
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d1d78121
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d1d78121

Branch: refs/heads/master
Commit: d1d78121b64b6d30dfddd023eb1145d6983d96d8
Parents: 50acc6c 56041b7
Author: Ahmet Altay 
Authored: Wed Jun 21 09:23:17 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 09:23:17 2017 -0700

--
 .../runners/direct/evaluation_context.py|  3 +-
 .../apache_beam/runners/direct/executor.py  | 37 -
 .../runners/direct/transform_evaluator.py   | 48 +---
 .../runners/direct/transform_result.py  | 40 --
 sdks/python/apache_beam/runners/direct/util.py  | 58 
 .../runners/direct/watermark_manager.py | 56 +++
 sdks/python/apache_beam/transforms/trigger.py   | 10 +++-
 7 files changed, 163 insertions(+), 89 deletions(-)
--




[1/2] beam git commit: Use state / timer API for DirectRunner timer firings

2017-06-21 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 50acc6c20 -> d1d78121b


Use state / timer API for DirectRunner timer firings


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/56041b78
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/56041b78
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/56041b78

Branch: refs/heads/master
Commit: 56041b7850abfbb10d4a6ff2ddecb227a0a4e7c8
Parents: 50acc6c
Author: Charles Chen 
Authored: Tue Jun 20 15:22:58 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Jun 21 09:23:13 2017 -0700

--
 .../runners/direct/evaluation_context.py|  3 +-
 .../apache_beam/runners/direct/executor.py  | 37 -
 .../runners/direct/transform_evaluator.py   | 48 +---
 .../runners/direct/transform_result.py  | 40 --
 sdks/python/apache_beam/runners/direct/util.py  | 58 
 .../runners/direct/watermark_manager.py | 56 +++
 sdks/python/apache_beam/transforms/trigger.py   | 10 +++-
 7 files changed, 163 insertions(+), 89 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/56041b78/sdks/python/apache_beam/runners/direct/evaluation_context.py
--
diff --git a/sdks/python/apache_beam/runners/direct/evaluation_context.py 
b/sdks/python/apache_beam/runners/direct/evaluation_context.py
index 8fa8e06..976e9e8 100644
--- a/sdks/python/apache_beam/runners/direct/evaluation_context.py
+++ b/sdks/python/apache_beam/runners/direct/evaluation_context.py
@@ -148,7 +148,8 @@ class EvaluationContext(object):
 self._transform_keyed_states = self._initialize_keyed_states(
 root_transforms, value_to_consumers)
 self._watermark_manager = WatermarkManager(
-Clock(), root_transforms, value_to_consumers)
+Clock(), root_transforms, value_to_consumers,
+self._transform_keyed_states)
 self._side_inputs_container = _SideInputsContainer(views)
 self._pending_unblocked_tasks = []
 self._counter_factory = counters.CounterFactory()

http://git-wip-us.apache.org/repos/asf/beam/blob/56041b78/sdks/python/apache_beam/runners/direct/executor.py
--
diff --git a/sdks/python/apache_beam/runners/direct/executor.py 
b/sdks/python/apache_beam/runners/direct/executor.py
index eff2d3c..a0a3886 100644
--- a/sdks/python/apache_beam/runners/direct/executor.py
+++ b/sdks/python/apache_beam/runners/direct/executor.py
@@ -222,14 +222,14 @@ class _CompletionCallback(object):
   or for a source transform.
   """
 
-  def __init__(self, evaluation_context, all_updates, timers=None):
+  def __init__(self, evaluation_context, all_updates, timer_firings=None):
 self._evaluation_context = evaluation_context
 self._all_updates = all_updates
-self._timers = timers
+self._timer_firings = timer_firings or []
 
   def handle_result(self, input_committed_bundle, transform_result):
 output_committed_bundles = self._evaluation_context.handle_result(
-input_committed_bundle, self._timers, transform_result)
+input_committed_bundle, self._timer_firings, transform_result)
 for output_committed_bundle in output_committed_bundles:
   self._all_updates.offer(_ExecutorServiceParallelExecutor._ExecutorUpdate(
   output_committed_bundle, None))
@@ -251,11 +251,12 @@ class TransformExecutor(_ExecutorService.CallableTask):
   """
 
   def __init__(self, transform_evaluator_registry, evaluation_context,
-   input_bundle, applied_ptransform, completion_callback,
-   transform_evaluation_state):
+   input_bundle, fired_timers, applied_ptransform,
+   completion_callback, transform_evaluation_state):
 self._transform_evaluator_registry = transform_evaluator_registry
 self._evaluation_context = evaluation_context
 self._input_bundle = input_bundle
+self._fired_timers = fired_timers
 self._applied_ptransform = applied_ptransform
 self._completion_callback = completion_callback
 self._transform_evaluation_state = transform_evaluation_state
@@ -288,6 +289,10 @@ class TransformExecutor(_ExecutorService.CallableTask):
   self._applied_ptransform, self._input_bundle,
   side_input_values, scoped_metrics_container)
 
+  if self._fired_timers:
+for timer_firing in self._fired_timers:
+  evaluator.process_timer_wrapper(timer_firing)
+
   if self._input_bundle:
 for value in self._input_bundle.get_elements_iterable():
   evaluator.process_element(value)
@@ -379,11 +384,11 @@ class _ExecutorServiceParallelExecutor(object):
 if committed_bundle.pcollection in self.value_to_consumers:
   co

[jira] [Assigned] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-21 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-2490:
-

Assignee: Chamikara Jayalath  (was: Ahmet Altay)

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well but there is only a 288.62 MB file in output of this pipeline 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates me that the estimated size of the output after the 
> ReadFromText step is 602.29 MB only, which not correspond to any unique input 
> file size nor the overall file size matching with the pattern.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Ben Chambers (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057756#comment-16057756
 ] 

Ben Chambers commented on BEAM-2491:


That seems OK. I think both of the classes in the core-construction-java module 
are only used internally by some runners to implement filtering, etc.

> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Minor comment : some tools (e.g. Elasticsearch test framework) detect this 
> duplication and fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2492) Have PipelineOptions DisplayData filter out attributes marked with @org.apache.beam.sdk.options.Hidden

2017-06-21 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-2492:
---

 Summary: Have PipelineOptions DisplayData filter out attributes 
marked with @org.apache.beam.sdk.options.Hidden
 Key: BEAM-2492
 URL: https://issues.apache.org/jira/browse/BEAM-2492
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Luke Cwik
Priority: Minor


Currently DisplayData for PipelineOptions filters out attributes if they:
* represent the default value
* are tagged with @JsonIgnore

Code pointer:
https://github.com/apache/beam/blob/50acc6c20f03b4616c80027fa2ff750c77a18664/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L291



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057615#comment-16057615
 ] 

Etienne Chauchot edited comment on BEAM-2491 at 6/21/17 3:19 PM:
-

I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java*. Indeed of the two packages, it is the 
smallest one, so it will impact less code.
[~bchambers] WDYT?




was (Author: echauchot):
I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java*.Indeed of the two packages, it is the 
smallest one, so it will impact less code.
[~bchambers] WDYT?



> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Minor comment : some tools (e.g. Elasticsearch test framework) detect this 
> duplication and fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057615#comment-16057615
 ] 

Etienne Chauchot edited comment on BEAM-2491 at 6/21/17 3:19 PM:
-

I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java*.Indeed of the two packages, it is the 
smallest one, so it will impact less code.
[~bchambers] WDYT?




was (Author: echauchot):
I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java* 
[~bchambers] WDYT?



> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Minor comment : some tools (e.g. Elasticsearch test framework) detect this 
> duplication and fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057615#comment-16057615
 ] 

Etienne Chauchot edited comment on BEAM-2491 at 6/21/17 3:14 PM:
-

I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java* 
[~bchambers] WDYT?




was (Author: echauchot):
I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java* : currently there is only one external 
use of the classes in this package in 
org.apache.beam.runners.direct.DirectMetricsTest. 
[~bchambers] WDYT?



> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Minor comment : some tools (e.g. Elasticsearch test framework) detect this 
> duplication and fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-2491:
---
Description: 
There is twice the package org.apache.beam.runners.core.metrics in the code 
base:
* one in the module beam-runners-core-construction-java
* one in the module beam-runners-core-java

Consequently there will be two 
org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.

Minor comment : some tools (e.g. Elasticsearch test framework) detect this 
duplication and fail at runtime.


  was:
There is twice the package org.apache.beam.runners.core.metrics in the code 
base:
* one in the module beam-runners-core-construction-java
* one in the module beam-runners-core-java

Consequently there will be two 
org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
Some tools (e.g. Elasticsearch test framework) detect this duplication and fail 
at runtime.



> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Minor comment : some tools (e.g. Elasticsearch test framework) detect this 
> duplication and fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057615#comment-16057615
 ] 

Etienne Chauchot commented on BEAM-2491:


I propose to move the package org.apache.beam.runners.core.metrics to  
org.apache.beam.runners.core.*construction*.metrics in module 
*beam-runners-core-construction-java* : currently there is only one external 
use of the classes in this package in 
org.apache.beam.runners.direct.DirectMetricsTest. 
[~bchambers] WDYT?



> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Some tools (e.g. Elasticsearch test framework) detect this duplication and 
> fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-2491:
---
Priority: Trivial  (was: Minor)

> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Trivial
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Some tools (e.g. Elasticsearch test framework) detect this duplication and 
> fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-2491:
---
Description: 
There is twice the package org.apache.beam.runners.core.metrics in the code 
base:
* one in the module beam-runners-core-construction-java
* one in the module beam-runners-core-java

Consequently there will be two 
org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
Some tools (e.g. Elasticsearch test framework) detect this duplication and fail 
at runtime.


  was:
There is twice the package org.apache.beam.runners.core.metrics in the code 
base:
* one in the module beam-runners-core-construction-java
* one in the module beam-runners-core-java
Consequently there will be two 
org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
Some tools (e.g. Elasticsearch test framework) detect this duplication and fail 
at runtime.



> Duplicate org.apache.beam.runners.core.metrics.pachage-info.class
> -
>
> Key: BEAM-2491
> URL: https://issues.apache.org/jira/browse/BEAM-2491
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Minor
>
> There is twice the package org.apache.beam.runners.core.metrics in the code 
> base:
> * one in the module beam-runners-core-construction-java
> * one in the module beam-runners-core-java
> Consequently there will be two 
> org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
> Some tools (e.g. Elasticsearch test framework) detect this duplication and 
> fail at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2491) Duplicate org.apache.beam.runners.core.metrics.pachage-info.class

2017-06-21 Thread Etienne Chauchot (JIRA)
Etienne Chauchot created BEAM-2491:
--

 Summary: Duplicate 
org.apache.beam.runners.core.metrics.pachage-info.class
 Key: BEAM-2491
 URL: https://issues.apache.org/jira/browse/BEAM-2491
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot
Priority: Minor


There is twice the package org.apache.beam.runners.core.metrics in the code 
base:
* one in the module beam-runners-core-construction-java
* one in the module beam-runners-core-java
Consequently there will be two 
org.apache.beam.runners.core.metrics.pachage-info.class in the classpath.
Some tools (e.g. Elasticsearch test framework) detect this duplication and fail 
at runtime.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1458) Checkpoint support in Beam

2017-06-21 Thread Rafal Wojdyla (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057577#comment-16057577
 ] 

Rafal Wojdyla commented on BEAM-1458:
-

Fyi - we have a first version of checkpoints in 
[Scio](https://github.com/spotify/scio) - [checkpoint 
package](https://github.com/spotify/scio/blob/master/scio-extra/src/main/scala/com/spotify/scio/extra/checkpoint/package.scala).

> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>Assignee: Rafal Wojdyla
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoint should provides a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1458) Checkpoint support in Beam

2017-06-21 Thread Rafal Wojdyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafal Wojdyla reassigned BEAM-1458:
---

Assignee: (was: Rafal Wojdyla)

> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoint should provides a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-1458) Checkpoint support in Beam

2017-06-21 Thread Rafal Wojdyla (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057577#comment-16057577
 ] 

Rafal Wojdyla edited comment on BEAM-1458 at 6/21/17 2:09 PM:
--

Fyi - we have a first version of checkpoints in 
[Scio|https://github.com/spotify/scio] - [checkpoint 
package|https://github.com/spotify/scio/blob/master/scio-extra/src/main/scala/com/spotify/scio/extra/checkpoint/package.scala].


was (Author: ravwojdyla):
Fyi - we have a first version of checkpoints in 
[Scio](https://github.com/spotify/scio) - [checkpoint 
package](https://github.com/spotify/scio/blob/master/scio-extra/src/main/scala/com/spotify/scio/extra/checkpoint/package.scala).

> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>Assignee: Rafal Wojdyla
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoint should provides a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4170

2017-06-21 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Java_MavenInstall_Windows #160

2017-06-21 Thread Apache Jenkins Server
See 


Changes:

[iemejia] Return a valid Coder for any subtype of Mutation on

[jbonofre] [BEAM-2481] Update commons-lang3 dependency to version 3.6

--
[...truncated 2.66 MB...]
2017-06-21T12:38:26.826 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/gossip/gossip/1.8/gossip-1.8.pom
2017-06-21T12:38:26.837 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/gossip/gossip/1.8/gossip-1.8.pom
 (12 KB at 1023.3 KB/sec)
2017-06-21T12:38:26.840 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/forge/forge-parent/9/forge-parent-9.pom
2017-06-21T12:38:26.852 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/forge/forge-parent/9/forge-parent-9.pom
 (13 KB at 1069.5 KB/sec)
2017-06-21T12:38:26.856 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/gossip/gossip-core/1.8/gossip-core-1.8.pom
2017-06-21T12:38:26.865 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/gossip/gossip-core/1.8/gossip-core-1.8.pom
 (3 KB at 241.4 KB/sec)
2017-06-21T12:38:26.869 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/sonatype/gossip/gossip-bootstrap/1.8/gossip-bootstrap-1.8.pom
2017-06-21T12:38:26.879 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/sonatype/gossip/gossip-bootstrap/1.8/gossip-bootstrap-1.8.pom
 (2 KB at 157.3 KB/sec)
2017-06-21T12:38:26.883 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.pom
2017-06-21T12:38:26.893 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/guava/guava/14.0.1/guava-14.0.1.pom
 (6 KB at 525.0 KB/sec)
2017-06-21T12:38:26.896 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/guava/guava-parent/14.0.1/guava-parent-14.0.1.pom
2017-06-21T12:38:26.905 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/guava/guava-parent/14.0.1/guava-parent-14.0.1.pom
 (3 KB at 277.2 KB/sec)
2017-06-21T12:38:26.909 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-classworlds/2.4.2/plexus-classworlds-2.4.2.pom
2017-06-21T12:38:26.917 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-classworlds/2.4.2/plexus-classworlds-2.4.2.pom
 (4 KB at 428.3 KB/sec)
2017-06-21T12:38:26.921 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-interpolation/1.16/plexus-interpolation-1.16.pom
2017-06-21T12:38:26.932 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-interpolation/1.16/plexus-interpolation-1.16.pom
 (2 KB at 91.2 KB/sec)
2017-06-21T12:38:26.937 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/eclipse/aether/aether-api/0.9.0.M2/aether-api-0.9.0.M2.pom
2017-06-21T12:38:26.948 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/eclipse/aether/aether-api/0.9.0.M2/aether-api-0.9.0.M2.pom
 (2 KB at 154.2 KB/sec)
2017-06-21T12:38:26.953 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/gmaven/gmaven-adapter-api/2.0/gmaven-adapter-api-2.0.pom
2017-06-21T12:38:26.961 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/gmaven/gmaven-adapter-api/2.0/gmaven-adapter-api-2.0.pom
 (2 KB at 220.5 KB/sec)
2017-06-21T12:38:26.966 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/gmaven/gmaven-adapter-impl/2.0/gmaven-adapter-impl-2.0.pom
2017-06-21T12:38:26.977 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/gmaven/gmaven-adapter-impl/2.0/gmaven-adapter-impl-2.0.pom
 (3 KB at 241.3 KB/sec)
2017-06-21T12:38:26.982 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-all/2.1.5/groovy-all-2.1.5.pom
2017-06-21T12:38:27.075 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/codehaus/groovy/groovy-all/2.1.5/groovy-all-2.1.5.pom
 (18 KB at 189.4 KB/sec)
2017-06-21T12:38:27.080 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant/1.8.4/ant-1.8.4.pom
2017-06-21T12:38:27.093 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant/1.8.4/ant-1.8.4.pom (10 
KB at 725.2 KB/sec)
2017-06-21T12:38:27.097 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-parent/1.8.4/ant-parent-1.8.4.pom
2017-06-21T12:38:27.108 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-parent/1.8.4/ant-parent-1.8.4.pom
 (5 KB at 404.9 KB/sec)
2017-06-21T12:38:27.113 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-launcher/1.8.4/ant-launcher-1.8.4.pom
2017-06-21T12:38:27.123 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/ant/ant-launcher/1.8.4/ant-launcher-1.8.4.pom
 (3 KB at 230.6 KB/sec)
2017-06-21T12:38:27.128 [INFO] Downloading: 
https://repo.maven.apache.org/maven

Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #4169

2017-06-21 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-21 Thread Olivier NGUYEN QUOC (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier NGUYEN QUOC updated BEAM-2490:
--
Description: 
I run a very simple pipeline:
* Read my files from Google Cloud Storage
* Split with '\n' char
* Write in on a Google Cloud Storage

I have 8 files that match with the pattern:
* my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
* my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
* my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
* my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
* my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
* my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)

This code should take them all:

{code:python}
beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
{code}

It runs well but there is only a 288.62 MB file in output of this pipeline 
(instead of a 1.5 GB file).

The whole pipeline code:

{code:python}
data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
   | 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
)
output = (
  data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
num_shards=1)
)
{code}

Dataflow indicates me that the estimated size   of the output after the 
ReadFromText step is 602.29 MB only, which not correspond to any unique input 
file size nor the overall file size matching with the pattern.


  was:
I run a very simple pipeline:
* Read my files from Google Cloud Storage
* Split with '\n' char
* Write in on a Google Cloud Storage

I have 6 files matching with the pattern:
* my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
* my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
* my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
* my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
* my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
* my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)

This code should take them all:

{code:python}
beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
{code}

It runs well but there is only a 288.62 MB file in output of this pipeline 
(instead of a 1.5 GB file).

The whole pipeline code:

{code:python}
data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
   | 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
)
output = (
  data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
num_shards=1)
)
{code}

Dataflow indicates me that the estimated size   of the output after the 
ReadFromText step is 602.29 MB only, which not correspond to any unique input 
file size nor the overall file size matching with the pattern.



> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Ahmet Altay
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
>

[jira] [Updated] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-21 Thread Olivier NGUYEN QUOC (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier NGUYEN QUOC updated BEAM-2490:
--
Description: 
I run a very simple pipeline:
* Read my files from Google Cloud Storage
* Split with '\n' char
* Write in on a Google Cloud Storage

I have 6 files matching with the pattern:
* my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
* my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
* my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
* my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
* my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
* my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)

This code should take them all:

{code:python}
beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
{code}

It runs well but there is only a 288.62 MB file in output of this pipeline 
(instead of a 1.5 GB file).

The whole pipeline code:

{code:python}
data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
   | 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
)
output = (
  data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
num_shards=1)
)
{code}

Dataflow indicates me that the estimated size   of the output after the 
ReadFromText step is 602.29 MB only, which not correspond to any unique input 
file size nor the overall file size matching with the pattern.


  was:
I run a very simple pipeline:
* Read my files from storage
* Split with '\n' char
* Write in on a Google Cloud Storage

I have 6 files matching with the pattern:
* my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
* my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
* my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
* my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
* my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
* my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)

This code should take them all:

{code:python}
beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
{code}

It runs well but there is only a 288.62 MB file in output of this pipeline 
(instead of a 1.5 GB file).

The whole pipeline code:

{code:python}
data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
   | 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
)
output = (
  data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
num_shards=1)
)
{code}

Dataflow indicates me that the estimated size   of the output after the 
ReadFromText step is 602.29 MB only, which not correspond to any unique input 
file size nor the overall file size matching with the pattern.



> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Ahmet Altay
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 6 files matching with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should

[jira] [Updated] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-21 Thread Olivier NGUYEN QUOC (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier NGUYEN QUOC updated BEAM-2490:
--
Description: 
I run a very simple pipeline:
* Read my files from storage
* Split with '\n' char
* Write in on a Google Cloud Storage

I have 6 files matching with the pattern:
* my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
* my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
* my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
* my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
* my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
* my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)

This code should take them all:

{code:python}
beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
{code}

It runs well but there is only a 288.62 MB file in output of this pipeline 
(instead of a 1.5 GB file).

The whole pipeline code:

{code:python}
data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
   | 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
)
output = (
  data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
num_shards=1)
)
{code}

Dataflow indicates me that the estimated size   of the output after the 
ReadFromText step is 602.29 MB only, which not correspond to any unique input 
file size nor the overall file size matching with the pattern.


  was:
I run a very simple pipeline:
* Read my files from storage
* Split with '\n' char
* Write in on a Google Cloud Storage

I have 6 files matching with the pattern:
* my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
* my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
* my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
* my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
* my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
* my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
* my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)

It run well but there is only a 288.62 MB file in output of this pipeline 
(instead of a 1.5 GB file).

{code:python}
data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
  "gs://_folder1/my_files_20160901*.csv.gz",
  skip_header_lines=1,
  compression_type=beam.io.filesystem.CompressionTypes.GZIP
  )
   | 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
)
output = (
  data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
num_shards=1)
)
{code}

Dataflow indicates me that the estimated size   of the output after the 
ReadFromText step is 602.29 MB only, which not correspond to any unique input 
file size nor the overall file size matching with the pattern.



> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Ahmet Altay
>
> I run a very simple pipeline:
> * Read my files from storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 6 files matching with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well but there is only a 288.62 MB file in outp

[jira] [Updated] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-21 Thread Olivier NGUYEN QUOC (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier NGUYEN QUOC updated BEAM-2490:
--
Summary: ReadFromText function is not taking all data with glob operator 
(*)   (was: ReadFromText function not taking all data with glob operator (*) )

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Ahmet Altay
>
> I run a very simple pipeline:
> * Read my files from storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 6 files matching with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> It run well but there is only a 288.62 MB file in output of this pipeline 
> (instead of a 1.5 GB file).
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates me that the estimated size of the output after the 
> ReadFromText step is 602.29 MB only, which not correspond to any unique input 
> file size nor the overall file size matching with the pattern.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3412: Fix minor issues on HCatalogIO

2017-06-21 Thread iemejia
GitHub user iemejia opened a pull request:

https://github.com/apache/beam/pull/3412

Fix minor issues on HCatalogIO

- Restrict access level when possible
- Rename Filter to Partition for the write to be coherent with the HCatalog 
API
- Improve test coverage
- Fix documentation details
- Implement TearDown method for the writer

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/beam static-analysis-fixes-hcatalogio

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3412


commit 0c74a4796d379eb5e459905d73f177a4326e3404
Author: Ismaël Mejía 
Date:   2017-06-08T22:01:55Z

Fix minor issues on HCatalogIO

- Restrict access level when possible
- Rename Filter to Partition for the write to be coherent with the HCatalog 
API
- Improve test coverage
- Fix documentation details
- Implement TearDown method for the writer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


  1   2   >