Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #3043

2017-03-26 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Dataflow #235

2017-03-26 Thread Apache Jenkins Server
See 


Changes:

[iemejia] Fix Regex#AllMatches javadoc

[iemejia] Fix Regex#FindAll javadoc

--
[...truncated 268.47 KB...]
 * [new ref] refs/pull/928/head -> origin/pr/928/head
 * [new ref] refs/pull/929/head -> origin/pr/929/head
 * [new ref] refs/pull/93/head -> origin/pr/93/head
 * [new ref] refs/pull/930/head -> origin/pr/930/head
 * [new ref] refs/pull/930/merge -> origin/pr/930/merge
 * [new ref] refs/pull/931/head -> origin/pr/931/head
 * [new ref] refs/pull/931/merge -> origin/pr/931/merge
 * [new ref] refs/pull/932/head -> origin/pr/932/head
 * [new ref] refs/pull/932/merge -> origin/pr/932/merge
 * [new ref] refs/pull/933/head -> origin/pr/933/head
 * [new ref] refs/pull/933/merge -> origin/pr/933/merge
 * [new ref] refs/pull/934/head -> origin/pr/934/head
 * [new ref] refs/pull/934/merge -> origin/pr/934/merge
 * [new ref] refs/pull/935/head -> origin/pr/935/head
 * [new ref] refs/pull/936/head -> origin/pr/936/head
 * [new ref] refs/pull/936/merge -> origin/pr/936/merge
 * [new ref] refs/pull/937/head -> origin/pr/937/head
 * [new ref] refs/pull/937/merge -> origin/pr/937/merge
 * [new ref] refs/pull/938/head -> origin/pr/938/head
 * [new ref] refs/pull/939/head -> origin/pr/939/head
 * [new ref] refs/pull/94/head -> origin/pr/94/head
 * [new ref] refs/pull/940/head -> origin/pr/940/head
 * [new ref] refs/pull/940/merge -> origin/pr/940/merge
 * [new ref] refs/pull/941/head -> origin/pr/941/head
 * [new ref] refs/pull/941/merge -> origin/pr/941/merge
 * [new ref] refs/pull/942/head -> origin/pr/942/head
 * [new ref] refs/pull/942/merge -> origin/pr/942/merge
 * [new ref] refs/pull/943/head -> origin/pr/943/head
 * [new ref] refs/pull/943/merge -> origin/pr/943/merge
 * [new ref] refs/pull/944/head -> origin/pr/944/head
 * [new ref] refs/pull/945/head -> origin/pr/945/head
 * [new ref] refs/pull/945/merge -> origin/pr/945/merge
 * [new ref] refs/pull/946/head -> origin/pr/946/head
 * [new ref] refs/pull/946/merge -> origin/pr/946/merge
 * [new ref] refs/pull/947/head -> origin/pr/947/head
 * [new ref] refs/pull/947/merge -> origin/pr/947/merge
 * [new ref] refs/pull/948/head -> origin/pr/948/head
 * [new ref] refs/pull/948/merge -> origin/pr/948/merge
 * [new ref] refs/pull/949/head -> origin/pr/949/head
 * [new ref] refs/pull/949/merge -> origin/pr/949/merge
 * [new ref] refs/pull/95/head -> origin/pr/95/head
 * [new ref] refs/pull/95/merge -> origin/pr/95/merge
 * [new ref] refs/pull/950/head -> origin/pr/950/head
 * [new ref] refs/pull/951/head -> origin/pr/951/head
 * [new ref] refs/pull/951/merge -> origin/pr/951/merge
 * [new ref] refs/pull/952/head -> origin/pr/952/head
 * [new ref] refs/pull/952/merge -> origin/pr/952/merge
 * [new ref] refs/pull/953/head -> origin/pr/953/head
 * [new ref] refs/pull/954/head -> origin/pr/954/head
 * [new ref] refs/pull/954/merge -> origin/pr/954/merge
 * [new ref] refs/pull/955/head -> origin/pr/955/head
 * [new ref] refs/pull/955/merge -> origin/pr/955/merge
 * [new ref] refs/pull/956/head -> origin/pr/956/head
 * [new ref] refs/pull/957/head -> origin/pr/957/head
 * [new ref] refs/pull/958/head -> origin/pr/958/head
 * [new ref] refs/pull/959/head -> origin/pr/959/head
 * [new ref] refs/pull/959/merge -> origin/pr/959/merge
 * [new ref] refs/pull/96/head -> origin/pr/96/head
 * [new ref] refs/pull/96/merge -> origin/pr/96/merge
 * [new ref] refs/pull/960/head -> origin/pr/960/head
 * [new ref] refs/pull/960/merge -> origin/pr/960/merge
 * [new ref] refs/pull/961/head -> origin/pr/961/head
 * [new ref] refs/pull/962/head -> origin/pr/962/head
 * [new ref] refs/pull/962/merge -> origin/pr/962/merge
 * [new ref] refs/pull/963/head -> origin/pr/963/head
 * [new ref] refs/pull/963/merge -> origin/pr/963/merge
 * [new ref] refs/pull/964/head -> origin/pr/964/head
 * [new ref] refs/pull/965/head -> origin/pr/965/head
 * [new ref] refs/pull/965/merge -> origin/pr/965/merge
 * [new ref] refs/pull/966/head -> origin/pr/966/head
 * [new ref] refs/pull/967/head -> origin/pr/967/head
 * [new ref] refs/pull/967/merge -> origin/pr/967/merge
 * [new ref] refs/pull/968/head -> origin/pr/968/head
 * [new ref] refs/pull/968/merge -> origin/pr/968/merge
 * [new ref] refs/pull/969/head -> origin/pr/969/head
 * [new ref] refs/pull/969/merge -

[jira] [Commented] (BEAM-1806) a new option `asLeftOuterJoin` for CoGroupByKey

2017-03-26 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942642#comment-15942642
 ] 

Aviem Zur commented on BEAM-1806:
-

Have you taken a look at 
[join-library|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java]?

> a new option `asLeftOuterJoin` for CoGroupByKey
> ---
>
> Key: BEAM-1806
> URL: https://issues.apache.org/jira/browse/BEAM-1806
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> Similar to BEAM-1805, but restricted to a left-outer join. 
> The first {{PCollection}} is used as the key; if it's empty, the output is 
> ignored.
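The join-library linked above already provides this. A minimal left-outer-join sketch, assuming the {{Join.leftOuterJoin(left, right, nullValue)}} signature from that extension; the class name and sample data here are hypothetical:

{code}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.joinlibrary.Join;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class LeftOuterJoinExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Hypothetical keyed inputs.
    PCollection<KV<String, Integer>> orders =
        p.apply("Orders", Create.of(KV.of("alice", 3), KV.of("bob", 1)));
    PCollection<KV<String, String>> profiles =
        p.apply("Profiles", Create.of(KV.of("alice", "premium")));

    // Every left element is kept; keys missing on the right get the null
    // value ("none"), matching the left-outer-join semantics asked for here.
    PCollection<KV<String, KV<Integer, String>>> joined =
        Join.leftOuterJoin(orders, profiles, "none");

    p.run();
  }
}
{code}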



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1805) a new option `asInnerJoin` for CoGroupByKey

2017-03-26 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942640#comment-15942640
 ] 

Aviem Zur commented on BEAM-1805:
-

Have you taken a look at 
[join-library|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java]?

> a new option `asInnerJoin` for CoGroupByKey
> ---
>
> Key: BEAM-1805
> URL: https://issues.apache.org/jira/browse/BEAM-1805
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> {{CoGroupByKey}} joins multiple {{PCollection<KV<K, V>>}}s, acting as a full-outer join.
> An option {{asInnerJoin()}} would restrict the output to inner-join 
> behavior.
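Likewise for the inner-join case, a minimal sketch against the same join-library (sample data hypothetical):

{code}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.joinlibrary.Join;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class InnerJoinExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    PCollection<KV<String, Integer>> left =
        p.apply("Left", Create.of(KV.of("a", 1), KV.of("b", 2)));
    PCollection<KV<String, String>> right =
        p.apply("Right", Create.of(KV.of("a", "x")));

    // Only key "a" survives: an inner join drops keys absent from either side.
    PCollection<KV<String, KV<Integer, String>>> joined = Join.innerJoin(left, right);

    p.run();
  }
}
{code}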



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1810) Spark runner combineGlobally uses Kryo serialization

2017-03-26 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1810.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Spark runner combineGlobally uses Kryo serialization
> 
>
> Key: BEAM-1810
> URL: https://issues.apache.org/jira/browse/BEAM-1810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> {{TransformTranslator#combineGlobally}} inadvertently uses Kryo serialization 
> to serialize data. This should never happen.
> This is due to Spark's implementation of {{RDD.isEmpty}}, which takes the 
> first element to check whether there are any elements in the {{RDD}}; that 
> element is then serialized with Spark's configured serializer (in our case, Kryo).
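A minimal sketch of the workaround direction, not the actual PR code; it assumes Beam's {{CoderUtils.encodeToByteArray}} helper and encodes elements with the Beam coder before calling {{isEmpty}}, so Kryo only ever sees byte arrays:

{code}
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.util.CoderUtils;
import org.apache.spark.api.java.JavaRDD;

public final class RddEmptiness {

  // RDD.isEmpty() is take(1) under the hood: one element is shipped back to
  // the driver through Spark's configured serializer (Kryo here). Encoding
  // elements with the Beam coder first means Kryo only ever sees byte arrays.
  public static <T> boolean isEmptyViaCoder(JavaRDD<T> rdd, Coder<T> coder) {
    return rdd.map(value -> CoderUtils.encodeToByteArray(coder, value)).isEmpty();
  }

  private RddEmptiness() {}
}
{code}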



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2651

2017-03-26 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1810) Spark runner combineGlobally uses Kryo serialization

2017-03-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942618#comment-15942618
 ] 

Ismaël Mejía commented on BEAM-1810:


I updated after the PR was merged, and I can confirm the fix works for me. 
Thanks Aviem!

> Spark runner combineGlobally uses Kryo serialization
> 
>
> Key: BEAM-1810
> URL: https://issues.apache.org/jira/browse/BEAM-1810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> {{TransformTranslator#combineGlobally}} inadvertently uses Kryo serialization 
> to serialize data. This should never happen.
> This is due to Spark's implementation of {{RDD.isEmpty}}, which takes the 
> first element to check whether there are any elements in the {{RDD}}; that 
> element is then serialized with Spark's configured serializer (in our case, Kryo).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-1802.
--

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)
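A minimal sketch of the shared-context pattern implied by the stack trace, with hypothetical names; it is not the actual {{SparkContextFactory}} code:

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// SPARK-2243 forbids two live SparkContexts per JVM, so sequential pipelines
// must either share one context or stop it between runs.
public final class SharedSparkContext {

  private static JavaSparkContext context;

  public static synchronized JavaSparkContext getOrCreate(SparkConf conf) {
    if (context == null) {
      context = new JavaSparkContext(conf);
    }
    return context;
  }

  // Call between pipelines if a fresh context is needed.
  public static synchronized void stop() {
    if (context != null) {
      context.stop();
      context = null;
    }
  }

  private SharedSparkContext() {}
}
{code}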



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942617#comment-15942617
 ] 

Ismaël Mejía commented on BEAM-1802:


Works perfectly with the merged PR, thanks.

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #3042

2017-03-26 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Python_Verify #1631

2017-03-26 Thread Apache Jenkins Server
See 


Changes:

[iemejia] Fix Regex#AllMatches javadoc

[iemejia] Fix Regex#FindAll javadoc

--
[...truncated 411.33 KB...]
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.3.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
test_empty_side_outputs (apache_beam.transforms.ptransform_test.PTransformTest) 
... ok
:243:
 DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  e.exception.message.startswith(
test_as_singleton_with_different_defaults_without_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
:132:
 UserWarning: Using fallback coder for typehint: List[int].
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
:132:
 UserWarning: Using fallback coder for typehint: Union[].
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.3.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
test_as_list_without_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
test_as_singleton_with_different_defaults_with_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-re

Jenkins build is back to normal : beam_PostCommit_Python_Verify #1630

2017-03-26 Thread Apache Jenkins Server
See 




[3/3] beam git commit: This closes #2329

2017-03-26 Thread iemejia
This closes #2329


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/026aec85
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/026aec85
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/026aec85

Branch: refs/heads/master
Commit: 026aec856c13a7e1e3da08646b9b97ccf1f40e3c
Parents: c9e55a4 b92b966
Author: Ismaël Mejía 
Authored: Mon Mar 27 05:20:22 2017 +0200
Committer: Ismaël Mejía 
Committed: Mon Mar 27 05:20:22 2017 +0200

--
 .../java/org/apache/beam/sdk/transforms/Regex.java | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
--




[1/3] beam git commit: Fix Regex#AllMatches javadoc

2017-03-26 Thread iemejia
Repository: beam
Updated Branches:
  refs/heads/master c9e55a436 -> 026aec856


Fix Regex#AllMatches javadoc


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b49bae3a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b49bae3a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b49bae3a

Branch: refs/heads/master
Commit: b49bae3a5881778e051ca7b285ac6ce5938239f4
Parents: c9e55a4
Author: wtanaka.com 
Authored: Sun Mar 26 01:07:54 2017 -1000
Committer: Ismaël Mejía 
Committed: Mon Mar 27 05:19:51 2017 +0200

--
 .../src/main/java/org/apache/beam/sdk/transforms/Regex.java | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b49bae3a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
index 7e85605..a494fc9 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
@@ -486,7 +486,8 @@ public class Regex {
 
   /**
    * {@code Regex.MatchesName} takes a {@code PCollection<String>} and returns a {@code
-   * PCollection<String>} representing the value extracted from all the Regex groups of the input
+   * PCollection<List<String>>} representing the value extracted from all the
+   * Regex groups of the input
    * {@code PCollection} to the number of times that element occurs in the input.
    *
    * <p>This transform runs a Regex on the entire input line. If the entire line does not match the
@@ -497,8 +498,8 @@ public class Regex {
    *
    * <pre>{@code
    * PCollection<String> words = ...;
-   * PCollection<String> values =
-   *     words.apply(Regex.matches("myregex (mygroup)"));
+   * PCollection<List<String>> values =
+   *     words.apply(Regex.allMatches("myregex (mygroup)"));
    * }</pre>
    */
   public static class AllMatches
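
For context, a minimal runnable sketch of the corrected {{allMatches}} example; the pipeline setup and sample data are hypothetical:

{code}
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Regex;
import org.apache.beam.sdk.values.PCollection;

public class AllMatchesExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    PCollection<String> words = p.apply(Create.of("foo bar", "no-match"));

    // The whole element must match the regex; matching elements emit the list
    // of extracted group values, non-matching elements are dropped.
    PCollection<List<String>> groups =
        words.apply(Regex.allMatches("(\\w+) (\\w+)"));

    p.run();
  }
}
{code}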



[GitHub] beam pull request #2329: Fix type signature typo in Javadoc

2017-03-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2329


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/3] beam git commit: Fix Regex#FindAll javadoc

2017-03-26 Thread iemejia
Fix Regex#FindAll javadoc


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b92b9664
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b92b9664
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b92b9664

Branch: refs/heads/master
Commit: b92b96643732b05326150decace502194656662c
Parents: b49bae3
Author: wtanaka.com 
Authored: Sun Mar 26 01:11:26 2017 -1000
Committer: Ismaël Mejía 
Committed: Mon Mar 27 05:19:56 2017 +0200

--
 .../src/main/java/org/apache/beam/sdk/transforms/Regex.java | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b92b9664/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
index a494fc9..690d321 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java
@@ -710,7 +710,8 @@ public class Regex {
 
   /**
    * {@code Regex.Find} takes a {@code PCollection<String>} and returns a {@code
-   * PCollection<String>} representing the value extracted from the Regex groups of the input {@code
+   * PCollection<List<String>>} representing the value extracted from the
+   * Regex groups of the input {@code
    * PCollection} to the number of times that element occurs in the input.
    *
    * <p>This transform runs a Regex on the entire input line. If a portion of the line does not
@@ -721,8 +722,8 @@ public class Regex {
    *
    * <pre>{@code
    * PCollection<String> words = ...;
-   * PCollection<String> values =
-   *     words.apply(Regex.find("myregex (mygroup)"));
+   * PCollection<List<String>> values =
+   *     words.apply(Regex.findAll("myregex (mygroup)"));
    * }</pre>
    */
   public static class FindAll
       extends PTransform<PCollection<String>, PCollection<List<String>>> {
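
And the corresponding sketch for the corrected {{findAll}} example; unlike {{allMatches}}, it only needs the regex to match a portion of each element (sample data hypothetical):

{code}
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Regex;
import org.apache.beam.sdk.values.PCollection;

public class FindAllExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    PCollection<String> lines = p.apply(Create.of("id=1 id=2", "plain text"));

    // Each element emits the list of matches found within it.
    PCollection<List<String>> found = lines.apply(Regex.findAll("id=\\d+"));

    p.run();
  }
}
{code}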



Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2650

2017-03-26 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2332: BEAM-1760 Potential null dereference in HDFSFileSin...

2017-03-26 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/beam/pull/2332

BEAM-1760 Potential null dereference in HDFSFileSink#doFinalize

Check whether s.getPath().getParent() is null.
If it is null, break out of the loop.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2332


commit 6a36f605e6ba36559ebc71b9551c7fa2308fe906
Author: tedyu 
Date:   2017-03-27T00:49:47Z

BEAM-1760 Potential null dereference in HDFSFileSink#doFinalize




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1760) Potential null dereference in HDFSFileSink#doFinalize

2017-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942509#comment-15942509
 ] 

ASF GitHub Bot commented on BEAM-1760:
--

GitHub user tedyu opened a pull request:

https://github.com/apache/beam/pull/2332

BEAM-1760 Potential null dereference in HDFSFileSink#doFinalize

Check whether s.getPath().getParent() is null.
If it is null, break out of the loop.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2332.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2332


commit 6a36f605e6ba36559ebc71b9551c7fa2308fe906
Author: tedyu 
Date:   2017-03-27T00:49:47Z

BEAM-1760 Potential null dereference in HDFSFileSink#doFinalize




> Potential null dereference in HDFSFileSink#doFinalize
> -
>
> Key: BEAM-1760
> URL: https://issues.apache.org/jira/browse/BEAM-1760
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
>   for (FileStatus s : statuses) {
> String name = s.getPath().getName();
> int pos = name.indexOf('.');
> String ext = pos > 0 ? name.substring(pos) : "";
> fs.rename(
> s.getPath(),
> new Path(s.getPath().getParent(), String.format("part-r-%05d%s", 
> i, ext)));
> i++;
>   }
> }
> {code}
> We should check whether s.getPath().getParent() is null.
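
A sketch of the guarded loop the PR describes; the enclosing method and its signature are hypothetical:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RenameOutputs {

  // Hypothetical wrapper around the loop from the issue, with the PR's guard.
  static void renameWithGuard(FileSystem fs, FileStatus[] statuses) throws IOException {
    int i = 0;
    for (FileStatus s : statuses) {
      String name = s.getPath().getName();
      int pos = name.indexOf('.');
      String ext = pos > 0 ? name.substring(pos) : "";
      Path parent = s.getPath().getParent();
      if (parent == null) {
        break; // per the PR: a root path has no parent, so stop renaming
      }
      fs.rename(s.getPath(), new Path(parent, String.format("part-r-%05d%s", i, ext)));
      i++;
    }
  }

  private RenameOutputs() {}
}
{code}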



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PerformanceTests_Dataflow #234

2017-03-26 Thread Apache Jenkins Server
See 


--
[...truncated 263.43 KB...]
 * [new ref] refs/pull/928/head -> origin/pr/928/head
 * [new ref] refs/pull/929/head -> origin/pr/929/head
 * [new ref] refs/pull/93/head -> origin/pr/93/head
 * [new ref] refs/pull/930/head -> origin/pr/930/head
 * [new ref] refs/pull/930/merge -> origin/pr/930/merge
 * [new ref] refs/pull/931/head -> origin/pr/931/head
 * [new ref] refs/pull/931/merge -> origin/pr/931/merge
 * [new ref] refs/pull/932/head -> origin/pr/932/head
 * [new ref] refs/pull/932/merge -> origin/pr/932/merge
 * [new ref] refs/pull/933/head -> origin/pr/933/head
 * [new ref] refs/pull/933/merge -> origin/pr/933/merge
 * [new ref] refs/pull/934/head -> origin/pr/934/head
 * [new ref] refs/pull/934/merge -> origin/pr/934/merge
 * [new ref] refs/pull/935/head -> origin/pr/935/head
 * [new ref] refs/pull/936/head -> origin/pr/936/head
 * [new ref] refs/pull/936/merge -> origin/pr/936/merge
 * [new ref] refs/pull/937/head -> origin/pr/937/head
 * [new ref] refs/pull/937/merge -> origin/pr/937/merge
 * [new ref] refs/pull/938/head -> origin/pr/938/head
 * [new ref] refs/pull/939/head -> origin/pr/939/head
 * [new ref] refs/pull/94/head -> origin/pr/94/head
 * [new ref] refs/pull/940/head -> origin/pr/940/head
 * [new ref] refs/pull/940/merge -> origin/pr/940/merge
 * [new ref] refs/pull/941/head -> origin/pr/941/head
 * [new ref] refs/pull/941/merge -> origin/pr/941/merge
 * [new ref] refs/pull/942/head -> origin/pr/942/head
 * [new ref] refs/pull/942/merge -> origin/pr/942/merge
 * [new ref] refs/pull/943/head -> origin/pr/943/head
 * [new ref] refs/pull/943/merge -> origin/pr/943/merge
 * [new ref] refs/pull/944/head -> origin/pr/944/head
 * [new ref] refs/pull/945/head -> origin/pr/945/head
 * [new ref] refs/pull/945/merge -> origin/pr/945/merge
 * [new ref] refs/pull/946/head -> origin/pr/946/head
 * [new ref] refs/pull/946/merge -> origin/pr/946/merge
 * [new ref] refs/pull/947/head -> origin/pr/947/head
 * [new ref] refs/pull/947/merge -> origin/pr/947/merge
 * [new ref] refs/pull/948/head -> origin/pr/948/head
 * [new ref] refs/pull/948/merge -> origin/pr/948/merge
 * [new ref] refs/pull/949/head -> origin/pr/949/head
 * [new ref] refs/pull/949/merge -> origin/pr/949/merge
 * [new ref] refs/pull/95/head -> origin/pr/95/head
 * [new ref] refs/pull/95/merge -> origin/pr/95/merge
 * [new ref] refs/pull/950/head -> origin/pr/950/head
 * [new ref] refs/pull/951/head -> origin/pr/951/head
 * [new ref] refs/pull/951/merge -> origin/pr/951/merge
 * [new ref] refs/pull/952/head -> origin/pr/952/head
 * [new ref] refs/pull/952/merge -> origin/pr/952/merge
 * [new ref] refs/pull/953/head -> origin/pr/953/head
 * [new ref] refs/pull/954/head -> origin/pr/954/head
 * [new ref] refs/pull/954/merge -> origin/pr/954/merge
 * [new ref] refs/pull/955/head -> origin/pr/955/head
 * [new ref] refs/pull/955/merge -> origin/pr/955/merge
 * [new ref] refs/pull/956/head -> origin/pr/956/head
 * [new ref] refs/pull/957/head -> origin/pr/957/head
 * [new ref] refs/pull/958/head -> origin/pr/958/head
 * [new ref] refs/pull/959/head -> origin/pr/959/head
 * [new ref] refs/pull/959/merge -> origin/pr/959/merge
 * [new ref] refs/pull/96/head -> origin/pr/96/head
 * [new ref] refs/pull/96/merge -> origin/pr/96/merge
 * [new ref] refs/pull/960/head -> origin/pr/960/head
 * [new ref] refs/pull/960/merge -> origin/pr/960/merge
 * [new ref] refs/pull/961/head -> origin/pr/961/head
 * [new ref] refs/pull/962/head -> origin/pr/962/head
 * [new ref] refs/pull/962/merge -> origin/pr/962/merge
 * [new ref] refs/pull/963/head -> origin/pr/963/head
 * [new ref] refs/pull/963/merge -> origin/pr/963/merge
 * [new ref] refs/pull/964/head -> origin/pr/964/head
 * [new ref] refs/pull/965/head -> origin/pr/965/head
 * [new ref] refs/pull/965/merge -> origin/pr/965/merge
 * [new ref] refs/pull/966/head -> origin/pr/966/head
 * [new ref] refs/pull/967/head -> origin/pr/967/head
 * [new ref] refs/pull/967/merge -> origin/pr/967/merge
 * [new ref] refs/pull/968/head -> origin/pr/968/head
 * [new ref] refs/pull/968/merge -> origin/pr/968/merge
 * [new ref] refs/pull/969/head -> origin/pr/969/head
 * [new ref] refs/pull/969/merge -> origin/pr/969/merge
 * [new ref] refs/pull/97/head -> origin/pr/97/head
 * [new ref]  

Build failed in Jenkins: beam_PostCommit_Python_Verify #1629

2017-03-26 Thread Apache Jenkins Server
See 


--
[...truncated 404.85 KB...]
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting pyparsing (from packaging>=16.8->setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/pyparsing-2.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr packaging 
appdirs pyparsing
test_undeclared_side_outputs 
(apache_beam.transforms.ptransform_test.PTransformTest) ... ok
:132:
 UserWarning: Using fallback coder for typehint: List[int].
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
:132:
 UserWarning: Using fallback coder for typehint: Union[].
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-34.3.3.zip
test_empty_side_outputs (apache_beam.transforms.ptransform_test.PTransformTest) 
... ok
:132:
 UserWarning: Using fallback coder for typehint: List[int].
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
:132:
 UserWarning: Using fallback coder for typehint: Union[].
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-2.0.0.tar.gz
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting packaging>=16.8 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/packaging-16.8.tar.gz
Collecting appdirs>=1.4.0 (from setuptools->pyhamcrest->-r 
postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/appdirs-1.4.3.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-c

Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2649

2017-03-26 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1772) Support merging WindowFn other than IntervalWindow on Flink Runner

2017-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942410#comment-15942410
 ] 

ASF GitHub Bot commented on BEAM-1772:
--

GitHub user JingsongLi opened a pull request:

https://github.com/apache/beam/pull/2331

[BEAM-1772] Support merging WindowFn other than IntervalWindow on Flink 
Runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JingsongLi/beam BEAM-1772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2331


commit 03bbbf98f57c3b9590a83264945f8a6fa4dfd17a
Author: JingsongLi 
Date:   2017-03-26T18:09:32Z

Abstract combine in Flink Batch Runner

commit ca0098795e73a1c004c059048158cc762a133fc3
Author: JingsongLi 
Date:   2017-03-26T19:05:57Z

[BEAM-1772] Support merging WindowFn other than IntervalWindow on Flink 
Runner




> Support merging WindowFn other than IntervalWindow on Flink Runner
> --
>
> Key: BEAM-1772
> URL: https://issues.apache.org/jira/browse/BEAM-1772
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Assignee: Jingsong Lee
>
> Flink currently supports merging IntervalWindows; however, if you have a 
> WindowFn whose window type extends IntervalWindow, the execution breaks.
> I found this while executing a Pipeline in Flink's batch mode.
> This will probably involve changing the window merging logic in 
> `FlinkMergingReduceFunction.mergeWindows()` and other similar parts to really 
> use the merging logic of the `WindowFn`.
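
A sketch of what delegating to the {{WindowFn}}'s own merge logic could look like, not the actual PR code; it assumes {{WindowFn.mergeWindows(MergeContext)}} from the Beam SDK:

{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.WindowFn;

public final class WindowMerges {

  /** Ask the WindowFn itself which windows merge, instead of assuming IntervalWindow. */
  public static <T, W extends BoundedWindow> Map<W, Collection<W>> mergeViaWindowFn(
      WindowFn<T, W> windowFn, Collection<W> windows) throws Exception {
    List<W> input = new ArrayList<>(windows);
    Map<W, Collection<W>> merges = new HashMap<>();
    windowFn.mergeWindows(
        windowFn.new MergeContext() {
          @Override
          public Collection<W> windows() {
            return input;
          }

          @Override
          public void merge(Collection<W> toBeMerged, W mergeResult) {
            // Record each merge decision made by the WindowFn.
            merges.put(mergeResult, new ArrayList<>(toBeMerged));
          }
        });
    return merges;
  }

  private WindowMerges() {}
}
{code}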



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2331: [BEAM-1772] Support merging WindowFn other than Int...

2017-03-26 Thread JingsongLi
GitHub user JingsongLi opened a pull request:

https://github.com/apache/beam/pull/2331

[BEAM-1772] Support merging WindowFn other than IntervalWindow on Flink 
Runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JingsongLi/beam BEAM-1772

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2331


commit 03bbbf98f57c3b9590a83264945f8a6fa4dfd17a
Author: JingsongLi 
Date:   2017-03-26T18:09:32Z

Abstract combine in Flink Batch Runner

commit ca0098795e73a1c004c059048158cc762a133fc3
Author: JingsongLi 
Date:   2017-03-26T19:05:57Z

[BEAM-1772] Support merging WindowFn other than IntervalWindow on Flink 
Runner




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1772) Support merging WindowFn other than IntervalWindow on Flink Runner

2017-03-26 Thread Jingsong Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942408#comment-15942408
 ] 

Jingsong Lee commented on BEAM-1772:


Considering performance, I retained the combining over sorted data, and 
extended it with a combining based on HashMap state for non-IntervalWindow merging.

> Support merging WindowFn other than IntervalWindow on Flink Runner
> --
>
> Key: BEAM-1772
> URL: https://issues.apache.org/jira/browse/BEAM-1772
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Assignee: Jingsong Lee
>
> Flink currently supports merging IntervalWindows; however, if you have a 
> WindowFn whose window type extends IntervalWindow, the execution breaks.
> I found this while executing a Pipeline in Flink's batch mode.
> This will probably involve changing the window merging logic in 
> `FlinkMergingReduceFunction.mergeWindows()` and other similar parts to really 
> use the merging logic of the `WindowFn`.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1813) --setup_file needs fuller-path

2017-03-26 Thread Mike Lambert (JIRA)
Mike Lambert created BEAM-1813:
--

 Summary: --setup_file needs fuller-path
 Key: BEAM-1813
 URL: https://issues.apache.org/jira/browse/BEAM-1813
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Mike Lambert
Assignee: Ahmet Altay
Priority: Minor


Running {{--setup_file setup.py}} fails:

{noformat}
  File 
"/Users/lambert/Projects/dancedeets-monorepo/dataflow/lib/apache_beam/runners/dataflow/internal/dependency.py",
 line 402, in _build_setup_package
os.chdir(os.path.dirname(setup_file))
OSError: [Errno 2] No such file or directory: ''
{noformat}

Mainly, this is because it ends up calling {{os.chdir('')}}.

Running {{--setup_file ./setup.py}} works fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PerformanceTests_Dataflow #233

2017-03-26 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam5 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Pruning obsolete local branches
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* +refs/pull/*:refs/remotes/origin/pr/* 
 > --prune
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision c9e55a4360a9fe06d6ed943a222bce524a6b10af (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c9e55a4360a9fe06d6ed943a222bce524a6b10af
 > git rev-list c9e55a4360a9fe06d6ed943a222bce524a6b10af # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Dataflow] $ /bin/bash -xe 
/tmp/hudson900773831144653672.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Dataflow] $ /bin/bash -xe 
/tmp/hudson4934110300191816667.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Dataflow] $ /bin/bash -xe 
/tmp/hudson2300103037968440314.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.0.4 
in /home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in 
/usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt 
(line 16))
Requirement already satisfied (use --upgrade to upgrade): 
colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.11 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Dataflow] $ /bin/bash -xe 
/tmp/hudson3916549271958939212.sh
+ python PerfKitBenchmarker/pkb.py --project=apache-beam-testing 
--dpb_log_level=INFO --maven_binary=/home/jenkins/tools/maven/latest/bin/mvn 
--bigquery_table=beam_performance.pkb_results --official=true 
--benchmarks=dpb_wordcount_benchmark 
--dpb_dataflow_staging_location=gs://temp-storage-for-perf-tests/staging 
--dpb_wordcount_input=dataflow-samples/shakespeare/kinglear.txt 
--config_override=dpb_wordcount_benchmark.dpb_service.service_type=dataflow
WARNING:root:File resource loader root perfkitbenchmarker/data/ycsb is not a 
directory.
2017-03-26 18:01:58,740 48272949 MainThread INFO Verbose logging to: 
/tmp/perfkitbenchmarker/runs/48272949/pkb.log
2017-03-26 18:01:58,740 48272949 MainThread INFO PerfKitBenchmarker 
version: v1.11.0-23-gdc9ac3e
2017-03-26 18:01:58,741 48272949 MainThread INFO Flag values:
--maven_binary=/home/jenkins/tools/maven/latest/bin/mvn
--project=apache-beam-testing
--bigquery_table=beam_performanc

[jira] [Closed] (BEAM-1788) Using google-cloud-datastore in Beam requires re-authentication

2017-03-26 Thread Mike Lambert (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lambert closed BEAM-1788.
--
   Resolution: Won't Fix
Fix Version/s: Not applicable

Okay, it seems that the official answers were discussed in the associated GitHub 
thread:
- do not pickle datastore Clients
- rebuild Clients inside the dataflow workers, ideally in start_bundle
- the datastore team may add warnings to help guide users

Going to close this bug now...

> Using google-cloud-datastore in Beam requires re-authentication
> ---
>
> Key: BEAM-1788
> URL: https://issues.apache.org/jira/browse/BEAM-1788
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Mike Lambert
>Assignee: Ahmet Altay
>Priority: Minor
>  Labels: datastore, python
> Fix For: Not applicable
>
>
> When I run a pipeline, I believe everything (params, lexically scoped 
> variables) must be pickleable for the individual processing stages.
> I have to load a dependent datastore record in one of my processing 
> pipelines. (Horribly inefficient, I know, but it's my DB design for now...)
> A {{google.cloud.datastore.Client()}} is not serializable due to the 
> {{google.cloud.datastore._http.Connection}} it contains, which uses gRPC:
> {noformat}
>   File "lib/apache_beam/transforms/ptransform.py", line 474, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
>   File "lib/apache_beam/internal/pickler.py", line 212, in loads
> return dill.loads(s)
>   File "/Users/me/Library/Python/2.7/lib/python/site-packages/dill/dill.py", 
> line 277, in loads
> return load(file)
>   File "/Users/me/Library/Python/2.7/lib/python/site-packages/dill/dill.py", 
> line 266, in load
> obj = pik.load()
>   File 
> "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 864, in load
> dispatch[key](self)
>   File 
> "/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
>  line 1089, in load_newobj
> obj = cls.__new__(cls, *args)
>   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 35, in 
> grpc._cython.cygrpc.Channel.__cinit__ 
> (src/python/grpcio/grpc/_cython/cygrpc.c:4022)
> TypeError: __cinit__() takes at least 2 positional arguments (0 given)
> {noformat}
> So instead, I construct a Client inside my pipeline... and it appears to 
> jump through hoops to recreate the Client, in that *each* execution of my 
> pipeline is printing:
> {{DEBUG:google_auth_httplib2:Making request: POST 
> https://accounts.google.com/o/oauth2/token}}
> I'm sure Google SRE would be very unhappy if I scaled up this mapreduce. :)
> This is a tricky cross-team interaction issue (only occurs for those using 
> google-cloud-datastore *and* apache-beam google-dataflow), so not sure the 
> proper place to file this. I've cross-posted it at 
> https://github.com/GoogleCloudPlatform/google-cloud-python/issues/3191



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942302#comment-15942302
 ] 

ASF GitHub Bot commented on BEAM-1573:
--

GitHub user peay opened a pull request:

https://github.com/apache/beam/pull/2330

[BEAM-1573] Use Kafka serializers instead of coders in KafkaIO

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



**This is a work in progress, do not merge**.

Modified:
- Use Kafka serializers and deserializers in KafkaIO
- Added helper methods `fromAvro` and `toAvro`, to use serialization based 
on `AvroCoder`. This is uniform with other IO such as HDFS.
- Moved `CoderBasedKafkaSerializer` out, and added 
`CoderBaseKafkaDeserializer`. These are used for `toAvro/fromAvro`, and can be 
useful for porting existing code that relies on coders.
- Added `InstantSerializer` and `InstantDeserializer`, as `Instant` is used 
in some of the tests.
 
Writer lets Kafka handle serialization itself. Reader uses Kafka byte 
deserializers, and calls the user-provided Kafka deserializer from `advance`. 
Note that Kafka serializers and deserializers are not themselves 
`Serializable`. Hence, I've used a `Class<..>` in the `spec` both for read and 
write.

There is still an issue, though. `Read` still takes **both a deserializer 
and a coder**. This is because the source must implement 
`getDefaultOutputCoder`, and I am not sure how to infer it. Having to provide 
the two is heavy, but I am not sure how to infer the coders in this context. 
Any thoughts?

cc @rangadi @jkff


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peay/beam BEAM-1573

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2330


commit 511a1301190b08b05573e3025d7ade2746d61e5f
Author: peay 
Date:   2017-03-26T14:51:59Z

[BEAM-1573] Use Kafka serializers instead of coders in KafkaIO




> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
> Fix For: Not applicable
>
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow the use of 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of unnecessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and error-prone.
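
For illustration, a simplified sketch of the coder-backed wrapper the PR mentions; the constructor-injected coder is an assumption (the real class presumably receives it via Kafka's {{configure}} map):

{code}
import java.util.Map;

import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.util.CoderUtils;
import org.apache.kafka.common.serialization.Serializer;

/** Simplified sketch: adapt a Beam Coder to Kafka's Serializer API. */
public class CoderBasedKafkaSerializer<T> implements Serializer<T> {
  private final Coder<T> coder;

  public CoderBasedKafkaSerializer(Coder<T> coder) {
    this.coder = coder; // assumption: real code would obtain this via configure()
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {}

  @Override
  public byte[] serialize(String topic, T data) {
    try {
      // Note: a plain Coder never sees the topic name, which is exactly the
      // limitation the issue description calls out.
      return CoderUtils.encodeToByteArray(coder, data);
    } catch (CoderException e) {
      throw new RuntimeException(e);
    }
  }

  @Override
  public void close() {}
}
{code}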



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2330: [BEAM-1573] Use Kafka serializers instead of coders...

2017-03-26 Thread peay
GitHub user peay opened a pull request:

https://github.com/apache/beam/pull/2330

[BEAM-1573] Use Kafka serializers instead of coders in KafkaIO

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



**This is a work in progress, do not merge**.

Modified:
- Use Kafka serializers and deserializers in KafkaIO
- Added helper methods `fromAvro` and `toAvro`, to use serialization based 
on `AvroCoder`. This is uniform with other IO such as HDFS.
- Moved `CoderBasedKafkaSerializer` out, and added 
`CoderBaseKafkaDeserializer`. These are used for `toAvro/fromAvro`, and can be 
useful for porting existing code that relies on coders.
- Added `InstantSerializer` and `InstantDeserializer`, as `Instant` is used 
in some of the tests.
 
Writer lets Kafka handle serialization itself. Reader uses Kafka byte 
deserializers, and calls the user-provided Kafka deserializer from `advance`. 
Note that Kafka serializers and deserializers are not themselves 
`Serializable`. Hence, I've used a `Class<..>` in the `spec` both for read and 
write.

There is still an issue, though. `Read` still takes **both a deserializer 
and a coder**. This is because the source must implement 
`getDefaultOutputCoder`, and I am not sure how to infer it. Having to provide 
the two is heavy, but I am not sure how to infer the coders in this context. 
Any thoughts?

cc @rangadi @jkff


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peay/beam BEAM-1573

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2330


commit 511a1301190b08b05573e3025d7ade2746d61e5f
Author: peay 
Date:   2017-03-26T14:51:59Z

[BEAM-1573] Use Kafka serializers instead of coders in KafkaIO




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2648

2017-03-26 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Dataflow #232

2017-03-26 Thread Apache Jenkins Server
See 


Changes:

[aviemzur] [BEAM-1810] Replace usage of RDD#isEmpty on non-serialized RDDs

--
[...truncated 231.31 KB...]
 ! 77fee20...7435b56 refs/pull/2154/merge -> origin/pr/2154/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2156/merge: No such 
file or directory
 ! 816ea27...0d9235e refs/pull/2156/merge -> origin/pr/2156/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2157/merge: No such 
file or directory
 ! fb7adab...3f207b5 refs/pull/2157/merge -> origin/pr/2157/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2162/merge: No such 
file or directory
 ! 2ac1b6b...0e0ce79 refs/pull/2162/merge -> origin/pr/2162/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2166/merge: No such 
file or directory
 ! c8946a1...766592c refs/pull/2166/merge -> origin/pr/2166/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2175/merge: No such 
file or directory
 ! e3c9273...76b35fb refs/pull/2175/merge -> origin/pr/2175/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2193/merge: No such 
file or directory
 ! 08b9b5f...72fb5c2 refs/pull/2193/merge -> origin/pr/2193/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2204/merge: No such 
file or directory
 ! f21eeb0...b260f77 refs/pull/2204/merge -> origin/pr/2204/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2207/merge: No such 
file or directory
 ! eae253e...b88c35e refs/pull/2207/merge -> origin/pr/2207/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2209/merge: No such 
file or directory
 ! 2681dd3...88634c8 refs/pull/2209/merge -> origin/pr/2209/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2211/merge: No such 
file or directory
 ! 35887b1...19bf250 refs/pull/2211/merge -> origin/pr/2211/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2212/merge: No such 
file or directory
 ! 77692f9...1b0cc01 refs/pull/2212/merge -> origin/pr/2212/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2214/merge: No such 
file or directory
 ! ca71215...17bb487 refs/pull/2214/merge -> origin/pr/2214/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2216/merge: No such 
file or directory
 ! bb73bd3...9048601 refs/pull/2216/merge -> origin/pr/2216/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2243/merge: No such 
file or directory
 ! 9bd7670...475e2d7 refs/pull/2243/merge -> origin/pr/2243/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2245/merge: No such 
file or directory
 ! be6e174...3afe794 refs/pull/2245/merge -> origin/pr/2245/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2263/merge: No such 
file or directory
 ! fedd6a8...4bad184 refs/pull/2263/merge -> origin/pr/2263/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2266/merge: No such 
file or directory
 ! 5af7d9a...135fb1f refs/pull/2266/merge -> origin/pr/2266/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2269/merge: No such 
file or directory
 ! b082eb8...1e92c1b refs/pull/2269/merge -> origin/pr/2269/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2271/merge: No such 
file or directory
 ! 341b2eb...7e2b80d refs/pull/2271/merge -> origin/pr/2271/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2273/merge: No such 
file or directory
 ! c06d8f6...a28a9a1 refs/pull/2273/merge -> origin/pr/2273/merge  (unable to 
update local ref)
error: unable to resolve reference refs/remotes/origin/pr/2282/merge: No such 
file or directory
 ! b815133...f74db06 refs/pull/2282/merge -> origin/pr/2282/merge  (unable to 
update local ref)
 * [new ref] refs/pull/2283/head -> origin/pr/2283/head
 * [new ref] refs/pull/2283/merge -> origin/pr/2283/merge
 * [new ref] refs/pull/2284/head -> origin/pr/2284/head
 * [new ref] refs/pull/2284/merge -> origin/pr/2284/merge
 * [new ref] refs/pull/2285/head -> origin/pr/2285/head
 * [new ref] refs/pull/2285/merge -> origin/pr/2285/merge
 * [new ref] refs/pull/2286/head -> origin/pr/2286/head
 * [new ref] refs/pull/2286/merge -> origin/pr/2286/merge
 * [new ref] refs/pull/2287/head -> orig

[GitHub] beam pull request #2329: Fix type signature typo in Javadoc

2017-03-26 Thread wtanaka
GitHub user wtanaka opened a pull request:

https://github.com/apache/beam/pull/2329

Fix type signature typo in Javadoc



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wtanaka/beam signaturetypo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2329


commit fe33b3b0d7483dcb294ab5d14c6e36029d39da12
Author: wtanaka.com 
Date:   2017-03-26T11:07:54Z

Fix type signature typo in Javadoc




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-1812) Allow configuring checkpoints in Flink Runner PipelineOptions

2017-03-26 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-1812:
--

 Summary: Allow configuring checkpoints in Flink Runner 
PipelineOptions
 Key: BEAM-1812
 URL: https://issues.apache.org/jira/browse/BEAM-1812
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


Flink allows fine-grained configuration of checkpointing: 
https://github.com/aljoscha/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java

Among other things, this allows configuring externalized checkpoints, which is 
a valuable feature when running a job in production because it allows 
restoring a job after a failure, much as from a savepoint.
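
A sketch of the kind of settings this issue proposes surfacing through the 
Flink runner's PipelineOptions, written against the Flink 1.2-era 
CheckpointConfig API directly (the values are illustrative):

{code:java}
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigExample {
  public static void main(String[] args) {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000L); // take a checkpoint every 60 seconds

    CheckpointConfig config = env.getCheckpointConfig();
    config.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    config.setMinPauseBetweenCheckpoints(30_000L);
    config.setCheckpointTimeout(120_000L);

    // Externalized checkpoints are retained after the job terminates, so a
    // failed (or, with this setting, cancelled) job can be restored from its
    // last checkpoint, much like restoring from a savepoint.
    config.enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
  }
}
{code}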



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2647

2017-03-26 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #3038

2017-03-26 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1810) Spark runner combineGlobally uses Kryo serialization

2017-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942200#comment-15942200
 ] 

ASF GitHub Bot commented on BEAM-1810:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2328


> Spark runner combineGlobally uses Kryo serialization
> 
>
> Key: BEAM-1810
> URL: https://issues.apache.org/jira/browse/BEAM-1810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> {{TransformTranslator#combineGlobally}} inadvertently uses Kryo serialization 
> to serialize data. This should never happen.
> This is due to Spark's implementation of {{RDD.isEmpty}}, which takes the 
> first element to check whether there are elements in the {{RDD}}; that 
> element is then serialized with Spark's configured serializer (in our case, 
> Kryo).
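
A condensed sketch of the workaround the linked PR takes (the full diff 
appears in the commit mail below): perform the emptiness check on an RDD of 
coder-encoded bytes, so the element Spark pulls back for {{isEmpty}} never 
passes through Kryo:

{code:java}
// byte[] elements round-trip safely through whatever serializer Spark is
// configured with, so encode with the Beam coder before any Spark action.
JavaRDD<byte[]> inputRDDBytes = rdd.map(CoderHelpers.toByteFunction(wviCoder));
if (inputRDDBytes.isEmpty()) {
  return Optional.absent(); // empty input: skip the combine entirely
}
{code}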



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2328: [BEAM-1810] Replace usage of RDD#isEmpty on non-ser...

2017-03-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2328


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2328

2017-03-26 Thread aviemzur
This closes #2328


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c9e55a43
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c9e55a43
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c9e55a43

Branch: refs/heads/master
Commit: c9e55a4360a9fe06d6ed943a222bce524a6b10af
Parents: 348d335 b32f048
Author: Aviem Zur 
Authored: Sun Mar 26 11:23:45 2017 +0300
Committer: Aviem Zur 
Committed: Sun Mar 26 11:23:45 2017 +0300

--
 .../translation/GroupCombineFunctions.java  | 15 ++-
 .../spark/translation/TransformTranslator.java  | 26 
 2 files changed, 25 insertions(+), 16 deletions(-)
--




[1/2] beam git commit: [BEAM-1810] Replace usage of RDD#isEmpty on non-serialized RDDs

2017-03-26 Thread aviemzur
Repository: beam
Updated Branches:
  refs/heads/master 348d33588 -> c9e55a436


[BEAM-1810] Replace usage of RDD#isEmpty on non-serialized RDDs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b32f0482
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b32f0482
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b32f0482

Branch: refs/heads/master
Commit: b32f0482784b9df7ce67226b32febe6e664a45b6
Parents: 348d335
Author: Aviem Zur 
Authored: Sat Mar 25 21:49:06 2017 +0300
Committer: Aviem Zur 
Committed: Sun Mar 26 10:31:40 2017 +0300

--
 .../translation/GroupCombineFunctions.java  | 15 ++-
 .../spark/translation/TransformTranslator.java  | 26 
 2 files changed, 25 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b32f0482/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
index b2a589d..917a9ee 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
@@ -18,8 +18,7 @@
 
 package org.apache.beam.runners.spark.translation;
 
-import static com.google.common.base.Preconditions.checkArgument;
-
+import com.google.common.base.Optional;
 import org.apache.beam.runners.spark.coders.CoderHelpers;
 import org.apache.beam.runners.spark.util.ByteArray;
 import org.apache.beam.sdk.coders.Coder;
@@ -67,14 +66,12 @@ public class GroupCombineFunctions {
   /**
    * Apply a composite {@link org.apache.beam.sdk.transforms.Combine.Globally} transformation.
    */
-  public static <InputT, AccumT> Iterable<WindowedValue<AccumT>> combineGlobally(
+  public static <InputT, AccumT> Optional<Iterable<WindowedValue<AccumT>>> combineGlobally(
       JavaRDD<WindowedValue<InputT>> rdd,
       final SparkGlobalCombineFn<InputT, AccumT> sparkCombineFn,
       final Coder<InputT> iCoder,
       final Coder<AccumT> aCoder,
       final WindowingStrategy<?, ?> windowingStrategy) {
-    checkArgument(!rdd.isEmpty(), "CombineGlobally computation should be skipped for empty RDDs.");
-
     // coders.
     final WindowedValue.FullWindowedValueCoder<InputT> wviCoder =
         WindowedValue.FullWindowedValueCoder.of(iCoder,
@@ -93,6 +90,11 @@ public class GroupCombineFunctions {
     // AccumT: A
     // InputT: I
     JavaRDD<byte[]> inputRDDBytes = rdd.map(CoderHelpers.toByteFunction(wviCoder));
+
+    if (inputRDDBytes.isEmpty()) {
+      return Optional.absent();
+    }
+
     /*Itr<WV<A>>*/ byte[] accumulatedBytes = inputRDDBytes.aggregate(
         CoderHelpers.toByteArray(sparkCombineFn.zeroValue(), iterAccumCoder),
         new Function2<byte[], byte[], byte[]>() {
@@ -115,7 +117,8 @@ public class GroupCombineFunctions {
   }
 }
 );
-    return CoderHelpers.fromByteArray(accumulatedBytes, iterAccumCoder);
+
+    return Optional.of(CoderHelpers.fromByteArray(accumulatedBytes, iterAccumCoder));
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/beam/blob/b32f0482/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
index b4362b0..ffb207a 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
@@ -27,6 +27,7 @@ import static org.apache.beam.runners.spark.io.hadoop.ShardNameBuilder.replaceSh
 import static org.apache.beam.runners.spark.translation.TranslationUtils.rejectSplittable;
 import static org.apache.beam.runners.spark.translation.TranslationUtils.rejectStateAndTimers;
 
+import com.google.common.base.Optional;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Maps;
@@ -259,9 +260,20 @@ public final class TransformTranslator {
             ((BoundedDataset<InputT>) context.borrowDataset(transform)).getRDD();
 
         JavaRDD<WindowedValue<OutputT>> outRdd;
-        // handle empty input RDD, which will naturally skip the entire execution
-        // as Spark will not run on empty RDDs.
-        if (inRdd.isEmpty()) {
+
+        Optional<Iterable<WindowedValue<AccumT>>> maybeAccumulated =
+            GroupCombineFunctions.combineGlobally(inRdd, sparkCombineFn, iCoder, aCoder,
+                windowingStrategy);
+

[jira] [Updated] (BEAM-1811) Extract common class for WithTimestamps.AddTimestampsDoFn and Create.TimestampedValues.ConvertTimestamps

2017-03-26 Thread Wesley Tanaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wesley Tanaka updated BEAM-1811:

Description: 
It seems like these APIs are largely duplicative of each other, and it's hard 
to find one of them if you only know about the other.

https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithTimestamps.java#L134

https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java#L560

What would make the most sense to me is if TimestampedValues were implemented 
in terms of both Values and WithTimestamps.  I'm still learning about Beam 
though -- would this approach cause some kind of performance problem?

  was:
It seems like these APIs are largely duplicative of each other, and it's hard 
to find one of them if you only know about the other.

https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithTimestamps.java#L134

https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java#L560

What would make the most sense to me is if TimestampedValues were implemented 
in terms of both Values and WithTimestamps.


> Extract common class for WithTimestamps.AddTimestampsDoFn and 
> Create.TimestampedValues.ConvertTimestamps
> 
>
> Key: BEAM-1811
> URL: https://issues.apache.org/jira/browse/BEAM-1811
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Wesley Tanaka
>Assignee: Frances Perry
>Priority: Minor
>
> It seems like these APIs are largely duplicative of each other, and it's 
> hard to find one of them if you only know about the other.
> https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithTimestamps.java#L134
> https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java#L560
> What would make the most sense to me is if TimestampedValues were implemented 
> in terms of both Values and WithTimestamps.  I'm still learning about Beam 
> though -- would this approach cause some kind of performance problem?
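
A hypothetical sketch (this class does not exist in Beam) of the kind of 
shared DoFn the extraction could produce, which both WithTimestamps and 
Create.TimestampedValues might then delegate to:

{code:java}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.joda.time.Instant;

// Shared timestamp-assignment logic: compute a timestamp for each element
// and re-emit the element with that timestamp. Name and placement are
// assumptions for illustration only.
class AssignTimestampsDoFn<T> extends DoFn<T, T> {
  private final SerializableFunction<T, Instant> timestampFn;

  AssignTimestampsDoFn(SerializableFunction<T, Instant> timestampFn) {
    this.timestampFn = timestampFn;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.outputWithTimestamp(c.element(), timestampFn.apply(c.element()));
  }
}
{code}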



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1811) Extract common class for WithTimestamps.AddTimestampsDoFn and Create.TimestampedValues.ConvertTimestamps

2017-03-26 Thread Wesley Tanaka (JIRA)
Wesley Tanaka created BEAM-1811:
---

 Summary: Extract common class for WithTimestamps.AddTimestampsDoFn 
and Create.TimestampedValues.ConvertTimestamps
 Key: BEAM-1811
 URL: https://issues.apache.org/jira/browse/BEAM-1811
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Wesley Tanaka
Assignee: Frances Perry
Priority: Minor


It seems like these APIs are largely duplicative of each other, and it's hard 
to find one of them if you only know about the other.

https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithTimestamps.java#L134

https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java#L560

What would make the most sense to me is if TimestampedValues were implemented 
in terms of both Values and WithTimestamps.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2646

2017-03-26 Thread Apache Jenkins Server
See