[beam] branch master updated (e24c25b -> fc38698)

2018-09-27 Thread herohde
This is an automated email from the ASF dual-hosted git repository.

herohde pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from e24c25b  Merge pull request #6506: [BEAM-4494] Migrate recent website 
changes from beam-site to beam
 add bb4d322  added avroio package
 add 3ac1bfb  updated read emits to support both string and custom type 
reflects
 add b88c91b  added avro write support
 add 599cba6  updated to be in-line with beam project specifications
 add e9685ba  update package log prints
 add 20cc1a3  added readavro example
 add ea8ed43  updated example package header
 add 936bafd  removed output.avro file
 new fc38698  Go SDK avroio Package - Read/Write Avro Files

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/go/examples/readavro/readavro.go |  91 +++
 sdks/go/pkg/beam/io/avroio/avroio.go  | 201 ++
 2 files changed, 292 insertions(+)
 create mode 100644 sdks/go/examples/readavro/readavro.go
 create mode 100644 sdks/go/pkg/beam/io/avroio/avroio.go



[beam] 01/01: Go SDK avroio Package - Read/Write Avro Files

2018-09-27 Thread herohde

herohde pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit fc386986c7bc40f6a2094bdbc05150d7b86e419d
Merge: e24c25b 936bafd
Author: Henning Rohde 
AuthorDate: Thu Sep 27 10:33:26 2018 -0700

Go SDK avroio Package - Read/Write Avro Files

 sdks/go/examples/readavro/readavro.go |  91 +++
 sdks/go/pkg/beam/io/avroio/avroio.go  | 201 ++
 2 files changed, 292 insertions(+)



Build failed in Jenkins: beam_PostCommit_Py_VR_Dataflow #1180

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[mergebot] [BEAM-5436] Improve docs for Go SDK

[kedin] Fix Java11 Jira link

--
[...truncated 76.12 KB...]
  File was already downloaded /tmp/dataflow-requirements-cache/setuptools-40.4.3.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.11.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  Could not find a version that satisfies the requirement funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2)) (from versions: )
  File was already downloaded /tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
No matching distribution found for funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
test_as_list_twice (apache_beam.transforms.sideinputs_test.SideInputsTest) ... ERROR
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-4.2.0.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 

[jira] [Work logged] (BEAM-5399) Beam Dependency Update Request: com.commercehub.gradle.plugin:gradle-avro-plugin 0.15.1

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5399?focusedWorklogId=148803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148803
 ]

ASF GitHub Bot logged work on BEAM-5399:


Author: ASF GitHub Bot
Created on: 27/Sep/18 16:17
Start Date: 27/Sep/18 16:17
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6437: [BEAM-5399] Upgrade 
gradle-avro-plugin to latest version
URL: https://github.com/apache/beam/pull/6437#issuecomment-425154045
 
 
   It's not clear to me why java pre-commits are [timing 
out](https://builds.apache.org/job/beam_PreCommit_Java_Phrase/303/). Not sure 
if it's due to this PR or general flakiness.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148803)
Time Spent: 0.5h  (was: 20m)

> Beam Dependency Update Request: 
> com.commercehub.gradle.plugin:gradle-avro-plugin 0.15.1
> ---
>
> Key: BEAM-5399
> URL: https://issues.apache.org/jira/browse/BEAM-5399
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Scott Wegner
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 2018-09-17 12:19:04.601511
> Please review and upgrade the 
> com.commercehub.gradle.plugin:gradle-avro-plugin to the latest version 0.15.1 
>  
> cc: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4783) Spark SourceRDD Not Designed With Dynamic Allocation In Mind

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4783?focusedWorklogId=148801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148801
 ]

ASF GitHub Bot logged work on BEAM-4783:


Author: ASF GitHub Bot
Created on: 27/Sep/18 16:12
Start Date: 27/Sep/18 16:12
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6181: [BEAM-4783] 
Add bundleSize for splitting BoundedSources.
URL: https://github.com/apache/beam/pull/6181#issuecomment-425152159
 
 
   I don't know how to proceed.
   
   I am convinced that in batch mode my proposal is the correct way to proceed. 
Another example of a silly interaction that occurs due to using 
defaultParallelism in SourceRDD is reading 2 different files. If one of the two 
files is a couple of orders of magnitude larger, you will need to allocate 
enough resources to the job to read the larger file, let's say n cores; then the 
smaller file will get split into n pieces, which will result in the smaller file 
being broken up into many very small bundles.
   
   The issue is I do not understand the repercussions this change will have on 
the streaming mode. Maybe we will need to have two different approaches to the 
groupBy logic, one for each mode.
   
   I am OK with this being experimental and only working if you supply 
--bundleSize in the pipeline options. I would like an answer to the last 
question I asked, to understand whether in batch mode I can always use the new 
experimental groupByKeyOnlyDefaultPartitioner, because I believe it will not 
cause a double shuffle in batch mode.
   
   Other than that, I believe I need a code review to make sure everyone agrees 
with the approach.
   
   If this is not agreed upon, I would hope someone could give me some advice on 
how to get the SparkRunner to work with dynamicAllocation (starting with 2 
cores and spinning up more if the files are large and are split into more 
bundles).
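The two-file interaction described above can be illustrated with a minimal Go sketch. It assumes hypothetical sizes (a 6400 MiB file and a 64 MiB file, 100x apart), n = 64 cores, and a hypothetical 64 MiB bundle size; `splitEvenly` and `splitBySize` are illustrative helpers, not SourceRDD or Beam APIs.

```go
package main

import "fmt"

const mib = int64(1) << 20

// splitEvenly divides a source into exactly n bundles (n > 0), as happens when
// the split count comes from the parallelism alone, regardless of file size.
func splitEvenly(sizeBytes, n int64) []int64 {
	bundles := make([]int64, n)
	base, rem := sizeBytes/n, sizeBytes%n
	for i := range bundles {
		bundles[i] = base
		if int64(i) < rem {
			bundles[i]++ // spread the remainder over the first bundles
		}
	}
	return bundles
}

// splitBySize divides a source into bundles of at most bundleSize bytes, so the
// bundle count follows the input size rather than the cluster size.
func splitBySize(sizeBytes, bundleSize int64) []int64 {
	var bundles []int64
	for remaining := sizeBytes; remaining > 0; remaining -= bundleSize {
		if remaining < bundleSize {
			bundles = append(bundles, remaining)
		} else {
			bundles = append(bundles, bundleSize)
		}
	}
	return bundles
}

func main() {
	// Hypothetical inputs: two files 100x apart, a job sized with n = 64 cores
	// for the larger file, and a 64 MiB bundle size.
	const n, bundleSize = int64(64), 64 * mib
	for _, size := range []int64{6400 * mib, 64 * mib} {
		even := splitEvenly(size, n)
		bySize := splitBySize(size, bundleSize)
		fmt.Printf("%4d MiB file: by parallelism %d bundles of %d MiB, by size %d bundles of %d MiB\n",
			size/mib, len(even), even[0]/mib, len(bySize), bySize[0]/mib)
	}
}
```

With these inputs, parallelism-based splitting turns the small file into 64 bundles of 1 MiB each, while size-based splitting keeps it as a single 64 MiB bundle; the large file gets 64 and 100 bundles respectively.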



Issue Time Tracking
---

Worklog Id: (was: 148801)
Time Spent: 3h 10m  (was: 3h)

> Spark SourceRDD Not Designed With Dynamic Allocation In Mind
> 
>
> Key: BEAM-4783
> URL: https://issues.apache.org/jira/browse/BEAM-4783
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Affects Versions: 2.5.0
>Reporter: Kyle Winkelman
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: newbie
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When the spark-runner is used along with the configuration 
> spark.dynamicAllocation.enabled=true, the SourceRDD does not detect this. It 
> then falls back to the value calculated in this description:
>   // when running on YARN/SparkDeploy it's the result of max(totalCores, 2).
>   // when running on Mesos it's 8.
>   // when running local it's the total number of cores (local = 1, local[N] = N,
>   // local[*] = estimation of the machine's cores).
>   // ** the configuration "spark.default.parallelism" takes precedence over all of the above **
> So in most cases this default is quite small. This is an issue when using a 
> very large input file, as it will only get split in half.
> I believe that when Dynamic Allocation is enabled, the SourceRDD should use the 
> DEFAULT_BUNDLE_SIZE and possibly expose an option in SparkPipelineOptions that 
> allows you to change this DEFAULT_BUNDLE_SIZE.
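The fallback rules quoted in the description, and the size-based alternative it proposes, can be sketched in Go as a simplified model. This is not Spark's or Beam's code: the `deployMode` values, `defaultParallelism`, `bundlesForSize`, the 2-core starting point, and the 64 MiB bundle size are all hypothetical.

```go
package main

import "fmt"

// deployMode names the cases listed in the issue description. These
// identifiers are hypothetical, not Spark or Beam API names.
type deployMode int

const (
	yarnOrStandalone deployMode = iota
	mesos
	local
)

// defaultParallelism models the quoted rules. configured stands for
// spark.default.parallelism (0 means unset); totalCores is the number of
// cores visible when the source is split.
func defaultParallelism(mode deployMode, totalCores, configured int) int {
	if configured > 0 {
		return configured // the explicit setting takes precedence
	}
	switch mode {
	case mesos:
		return 8
	case local:
		return totalCores // local = 1, local[N] = N, local[*] = machine cores
	default: // YARN / standalone: max(totalCores, 2)
		if totalCores > 2 {
			return totalCores
		}
		return 2
	}
}

// bundlesForSize derives the split count from the input size instead:
// ceil(sizeBytes / bundleSize).
func bundlesForSize(sizeBytes, bundleSize int64) int64 {
	return (sizeBytes + bundleSize - 1) / bundleSize
}

func main() {
	const gib, mib = int64(1) << 30, int64(1) << 20

	// Hypothetical: dynamic allocation starts the job with only 2 cores.
	fmt.Println("splits from default parallelism:", defaultParallelism(yarnOrStandalone, 2, 0))
	fmt.Println("splits for a 100 GiB file with 64 MiB bundles:", bundlesForSize(100*gib, 64*mib))
}
```

In this model the parallelism-based count is 2 (the input is only split in half), while the size-based count for the same 100 GiB input is 1600, because it grows with the input rather than with the cores visible at split time.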





Build failed in Jenkins: beam_PostCommit_Website_Publish #15

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[mergebot] [BEAM-5436] Improve docs for Go SDK

[kedin] Fix Java11 Jira link

--
[...truncated 8.34 KB...]
> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 2,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task ':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 2,5,main]) completed. Took 1.481 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 2,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task ':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 2,5,main]) completed. Took 0.043 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.009 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.003 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.002 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. Took 0.003 secs.
:buildSrc:check 

[jira] [Work logged] (BEAM-4494) Migrate website source code to apache/beam [website-migration] branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4494?focusedWorklogId=148799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148799
 ]

ASF GitHub Bot logged work on BEAM-4494:


Author: ASF GitHub Bot
Created on: 27/Sep/18 16:07
Start Date: 27/Sep/18 16:07
Worklog Time Spent: 10m 
  Work Description: swegner closed pull request #6506: [BEAM-4494] Migrate 
recent website changes from beam-site to beam
URL: https://github.com/apache/beam/pull/6506
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/website/src/contribute/index.md b/website/src/contribute/index.md
index 82d629fbe83..9390e65a95e 100644
--- a/website/src/contribute/index.md
+++ b/website/src/contribute/index.md
@@ -338,7 +338,7 @@ When submitting a new PR, please tag [@RobbeSneyders](https://github.com/robbesn
 
 Work to support the next LTS release of Java is in progress. For more details about the scope and info on the various tasks please see the JIRA ticket.
 
-- JIRA: [BEAM-2530](https://issues.apache.org/jira/issues/BEAM-2530)
+- JIRA: [BEAM-2530](https://issues.apache.org/jira/browse/BEAM-2530)
 - Contact: [Ismaël Mejía](mailto:ieme...@gmail.com)
 
 ### IO Performance Testing
diff --git a/website/src/get-started/quickstart-go.md b/website/src/get-started/quickstart-go.md
index 3dcd1562665..a14965eb3b3 100644
--- a/website/src/get-started/quickstart-go.md
+++ b/website/src/get-started/quickstart-go.md
@@ -61,11 +61,14 @@ $ wordcount --input  --output counts
 {:.runner-dataflow}
 ```
 $ go install github.com/apache/beam/sdks/go/examples/wordcount
+# As part of the initial setup, for non linux users - install package unix before run
+$ go get -u golang.org/x/sys/unix
 $ wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
 --output gs:///counts \
 --runner dataflow \
 --project your-gcp-project \
 --temp_location gs:///tmp/ \
+--staging_location gs:///binaries/ \
 --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
 ```
 
diff --git a/website/src/get-started/wordcount-example.md b/website/src/get-started/wordcount-example.md
index aad78576397..684f5acada7 100644
--- a/website/src/get-started/wordcount-example.md
+++ b/website/src/get-started/wordcount-example.md
@@ -464,11 +464,14 @@ This runner is not yet available for the Go SDK.
 {:.runner-dataflow}
 ```
 $ go install github.com/apache/beam/sdks/go/examples/wordcount
+# As part of the initial setup, for non linux users - install package unix before run
+$ go get -u golang.org/x/sys/unix
 $ wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
 --output gs:///counts \
 --runner dataflow \
 --project your-gcp-project \
 --temp_location gs:///tmp/ \
+--staging_location gs:///binaries/ \
 --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
 ```
 
@@ -791,11 +794,14 @@ This runner is not yet available for the Go SDK.
 {:.runner-dataflow}
 ```
 $ go install github.com/apache/beam/sdks/go/examples/debugging_wordcount
+# As part of the initial setup, for non linux users - install package unix before run
+$ go get -u golang.org/x/sys/unix
 $ debugging_wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
   --output gs:///counts \
   --runner dataflow \
   --project your-gcp-project \
   --temp_location gs:///tmp/ \
+  --staging_location gs:///binaries/ \
   --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
 ```
 
@@ -1115,11 +1121,14 @@ This runner is not yet available for the Go SDK.
 {:.runner-dataflow}
 ```
 $ go install github.com/apache/beam/sdks/go/examples/windowed_wordcount
+# As part of the initial setup, for non linux users - install package unix before run
+$ go get -u golang.org/x/sys/unix
 $ windowed_wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
 --output gs:///counts \
 --runner dataflow \
 --project your-gcp-project \
 --temp_location gs:///tmp/ \
+--staging_location gs:///binaries/ \
 --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
 ```
 


 



[jira] [Work logged] (BEAM-4494) Migrate website source code to apache/beam [website-migration] branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4494?focusedWorklogId=148798=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148798
 ]

ASF GitHub Bot logged work on BEAM-4494:


Author: ASF GitHub Bot
Created on: 27/Sep/18 16:07
Start Date: 27/Sep/18 16:07
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6506: [BEAM-4494] Migrate 
recent website changes from beam-site to beam
URL: https://github.com/apache/beam/pull/6506#issuecomment-425150536
 
 
   We're not quite ready yet, but I hope we'll be ready by the end of this 
week. You can follow migration progress here: 
https://issues.apache.org/jira/projects/BEAM/issues/BEAM-5456
   
   We plan on updating documentation, and I'll send an email when we're ready 
to make the switch.



Issue Time Tracking
---

Worklog Id: (was: 148798)
Time Spent: 2h 40m  (was: 2.5h)

> Migrate website source code to apache/beam [website-migration] branch
> -
>
> Key: BEAM-4494
> URL: https://issues.apache.org/jira/browse/BEAM-4494
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
> Fix For: 2.5.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[beam] branch master updated (cdabaf9 -> e24c25b)

2018-09-27 Thread scott

scott pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from cdabaf9  [BEAM-5518] Ignore failing ssl validation of globenewswire 
(#6502)
 add a4e771f  [BEAM-5436] Improve docs for Go SDK
 add dbda665  This closes https://github.com/apache/beam-site/pull/557
 add 0f3beb5  Fix Java11 Jira link
 add 364091d  This closes https://github.com/apache/beam-site/pull/564
 add 1544edd  Migrate beam-site sources to apache/beam
 add e24c25b  Merge pull request #6506: [BEAM-4494] Migrate recent website 
changes from beam-site to beam

No new revisions were added by this update.

Summary of changes:
 website/src/contribute/index.md  | 2 +-
 website/src/get-started/quickstart-go.md | 3 +++
 website/src/get-started/wordcount-example.md | 9 +
 3 files changed, 13 insertions(+), 1 deletion(-)



[jira] [Work logged] (BEAM-4499) Migrate Apache website publishing to use apache/beam asf-site branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4499?focusedWorklogId=148797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148797
 ]

ASF GitHub Bot logged work on BEAM-4499:


Author: ASF GitHub Bot
Created on: 27/Sep/18 16:03
Start Date: 27/Sep/18 16:03
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #565: [BEAM-4499] Remove 
non-release documentation sources and generated content
URL: https://github.com/apache/beam-site/pull/565#issuecomment-425149048
 
 
   The diff view here isn't useful; it's easier to see the directory structure 
from the [tree 
view](https://github.com/apache/beam-site/tree/8d95bc45cc05533cc6793482ce3dfa8a92ef6432).



Issue Time Tracking
---

Worklog Id: (was: 148797)
Time Spent: 0.5h  (was: 20m)

> Migrate Apache website publishing to use apache/beam asf-site branch
> 
>
> Key: BEAM-4499
> URL: https://issues.apache.org/jira/browse/BEAM-4499
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4499) Migrate Apache website publishing to use apache/beam asf-site branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4499?focusedWorklogId=148795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148795
 ]

ASF GitHub Bot logged work on BEAM-4499:


Author: ASF GitHub Bot
Created on: 27/Sep/18 15:56
Start Date: 27/Sep/18 15:56
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #565: [BEAM-4499] Remove 
non-release documentation sources and generated content
URL: https://github.com/apache/beam-site/pull/565#issuecomment-425146638
 
 
   R: @alanmyrvold
   
   Note that the plan is to merge this into a new `release-docs` branch, which 
will serve only the release-based javadocs / pydocs.



Issue Time Tracking
---

Worklog Id: (was: 148795)
Time Spent: 20m  (was: 10m)

> Migrate Apache website publishing to use apache/beam asf-site branch
> 
>
> Key: BEAM-4499
> URL: https://issues.apache.org/jira/browse/BEAM-4499
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4499) Migrate Apache website publishing to use apache/beam asf-site branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4499?focusedWorklogId=148792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148792
 ]

ASF GitHub Bot logged work on BEAM-4499:


Author: ASF GitHub Bot
Created on: 27/Sep/18 15:53
Start Date: 27/Sep/18 15:53
Worklog Time Spent: 10m 
  Work Description: swegner opened a new pull request #565: [BEAM-4499] 
Remove non-release documentation sources and generated content
URL: https://github.com/apache/beam-site/pull/565
 
 
   



Issue Time Tracking
---

Worklog Id: (was: 148792)
Time Spent: 10m
Remaining Estimate: 0h

> Migrate Apache website publishing to use apache/beam asf-site branch
> 
>
> Key: BEAM-4499
> URL: https://issues.apache.org/jira/browse/BEAM-4499
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[beam-site] branch release-docs created (now 87fb41e)

2018-09-27 Thread scott

scott pushed a change to branch release-docs
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


  at 87fb41e  Prepare repository for deployment.

No new revisions were added by this update.



[jira] [Work logged] (BEAM-4494) Migrate website source code to apache/beam [website-migration] branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4494?focusedWorklogId=148791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148791
 ]

ASF GitHub Bot logged work on BEAM-4494:


Author: ASF GitHub Bot
Created on: 27/Sep/18 15:50
Start Date: 27/Sep/18 15:50
Worklog Time Spent: 10m 
  Work Description: akedin commented on issue #6506: [BEAM-4494] Migrate 
recent website changes from beam-site to beam
URL: https://github.com/apache/beam/pull/6506#issuecomment-425144835
 
 
   Should we start directly adding the docs to beam instead of beam-site? Or 
have a script to move stuff between them?



Issue Time Tracking
---

Worklog Id: (was: 148791)
Time Spent: 2.5h  (was: 2h 20m)

> Migrate website source code to apache/beam [website-migration] branch
> -
>
> Key: BEAM-4494
> URL: https://issues.apache.org/jira/browse/BEAM-4494
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
> Fix For: 2.5.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4494) Migrate website source code to apache/beam [website-migration] branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4494?focusedWorklogId=148790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148790
 ]

ASF GitHub Bot logged work on BEAM-4494:


Author: ASF GitHub Bot
Created on: 27/Sep/18 15:45
Start Date: 27/Sep/18 15:45
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6506: [BEAM-4494] Migrate 
recent website changes from beam-site to beam
URL: https://github.com/apache/beam/pull/6506#issuecomment-425142933
 
 
   R: @akedin 



Issue Time Tracking
---

Worklog Id: (was: 148790)
Time Spent: 2h 20m  (was: 2h 10m)

> Migrate website source code to apache/beam [website-migration] branch
> -
>
> Key: BEAM-4494
> URL: https://issues.apache.org/jira/browse/BEAM-4494
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
> Fix For: 2.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4494) Migrate website source code to apache/beam [website-migration] branch

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4494?focusedWorklogId=148789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148789
 ]

ASF GitHub Bot logged work on BEAM-4494:


Author: ASF GitHub Bot
Created on: 27/Sep/18 15:45
Start Date: 27/Sep/18 15:45
Worklog Time Spent: 10m 
  Work Description: swegner opened a new pull request #6506: [BEAM-4494] 
Migrate recent website changes from beam-site to beam
URL: https://github.com/apache/beam/pull/6506
 
 
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---




Issue Time Tracking
---

Worklog Id: (was: 148789)
Time Spent: 2h 10m  (was: 2h)

> Migrate website source code to apache/beam [website-migration] branch
> -
>
> Key: BEAM-4494
> URL: https://issues.apache.org/jira/browse/BEAM-4494
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott 

[jira] [Updated] (BEAM-5500) Portable python sdk worker leaks memory in streaming mode

2018-09-27 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-5500:
--
Labels: portability-flink  (was: )

> Portable python sdk worker leaks memory in streaming mode
> -
>
> Key: BEAM-5500
> URL: https://issues.apache.org/jira/browse/BEAM-5500
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Micah Wylde
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability-flink
> Attachments: chart.png
>
>
> When using the portable Python SDK with Flink in streaming mode, we see that 
> the Python worker processes steadily increase memory usage until they are 
> OOM-killed. This behavior is consistent across various kinds of streaming 
> pipelines, including those with fixed windows and global windows.
> A simple wordcount-like pipeline demonstrates the issue for us (note that it 
> runs on the [Lyft beam fork|https://github.com/lyft/beam/], which provides 
> access to Kinesis as a portable streaming source):
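> {code:python}
> import logging
>
> import apache_beam as beam
> from apache_beam.transforms import window
> from apache_beam.transforms.trigger import AccumulationMode, AfterProcessingTime
>
> # p is an existing beam.Pipeline; FlinkKinesisInput comes from the Lyft fork,
> # and decode / count_ones are user-defined functions.
> counts = (p
> | 'Kinesis' >> FlinkKinesisInput().with_stream('test-stream')
> | 'decode' >> beam.FlatMap(decode) # parses from json into python objs
> | 'pair_with_one' >> beam.Map(lambda x: (x["event_name"], 1))
> | 'window' >> beam.WindowInto(window.GlobalWindows(),
>   trigger=AfterProcessingTime(15 * 1000),  # delay is in seconds in the Beam Python SDK
>   accumulation_mode=AccumulationMode.DISCARDING)
> | 'group' >> beam.GroupByKey()
> | 'count' >> beam.Map(count_ones)
> | beam.Map(lambda x: logging.warning("count: %s", str(x)) or x))
> {code}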
> When run, we see a steady increase in memory usage in the sdk_worker process. 
> Using [heapy|http://guppy-pe.sourceforge.net/#Heapy] I've analyzed the memory 
> usage over time and found that it's largely dicts and strings (see attached 
> chart).
>  
>  
>  





[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148775
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 15:01
Start Date: 27/Sep/18 15:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6375: [BEAM-4858] Clean up 
division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#issuecomment-425126489
 
 
   You're right, the a and b were switched in computing the error term when I 
copied this to the PR. This meant that significantly more points were 
considered outliers (but enough retained to typically give a reasonable 
regression). Unfortunately this fix means that it's still pretty sensitive to 
multiple outliers...
   
   I'm trying a simpler approach: just assume the top quantile consists of outliers. We 
have enough data to make this pretty robust. Running experiments now. 
   
   (As for computing h, I used sagemath.)
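The "assume the top quantile is outliers" idea can be sketched as follows: fit ordinary least squares, treat the points with the largest positive residuals (the batches slowest relative to the fit) as the outlying quantile, drop them, and refit. This is a hypothetical, dependency-free illustration, not the code in PR #6375. The function names, the 10% default, and the choice to rank points by residual are all assumptions.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def fit_dropping_top_quantile(xs, ys, outlier_fraction=0.1):
    """Fit once, discard the top `outlier_fraction` of points by residual, refit."""
    a, b = least_squares(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Point indices ordered from smallest to largest residual.
    order = sorted(range(len(xs)), key=lambda i: residuals[i])
    n_drop = int(len(xs) * outlier_fraction)
    keep = order[:len(xs) - n_drop] if n_drop else order
    if len(keep) < 2:  # a line needs at least two points
        return a, b
    return least_squares([xs[i] for i in keep], [ys[i] for i in keep])


# Hypothetical data: batch sizes 1..10 costing 2 * size + 1, plus one slow batch.
sizes = list(range(1, 11))
costs = [2 * s + 1 for s in sizes]
costs[-1] = 100
print(fit_dropping_top_quantile(sizes, costs))  # (1.0, 2.0)
```

Trimming by residual after a first fit keeps the approach simple. The trade-off is that a very large outlier can distort the first fit enough to shift which points land in the dropped quantile.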




Issue Time Tracking
---

Worklog Id: (was: 148775)
Time Spent: 4.5h  (was: 4h 20m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I was using to 
> benchmark these changes. The performance impact likely comes from changes in 
> the logic that depends on how division is evaluated, not from the 
> performance of the division operation itself.
> For the Python 3 conversion, the best course of action that avoids a 
> regression seems to be to preserve the existing Python 2 behavior using 
> {{old_div}} from {{past.utils}}; in the medium term, we should clean 
> up the logic. We may want to add a targeted microbenchmark to evaluate the 
> performance of this code, and maybe cythonize it, since it seems to be 
> performance-sensitive.
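To make the Python 2 division semantics concrete, here is a minimal re-implementation of the behavior of `old_div` from the `future` package's `past.utils` module. It uses floor division when both operands are integers and true division otherwise. The key values are hypothetical illustrations, not data from `_BatchSizeEstimator`.

```python
import numbers


def old_div(a, b):
    """Emulate Python 2 `a / b` (without `from __future__ import division`)."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b  # both ints: Python 2 floors the result
    return a / b  # any float operand: true division


# Hypothetical keys: integers as first recorded, floats after "thinning".
int_keys = (7, 2)
float_keys = (7.0, 2.0)

print(old_div(*int_keys))         # 3
print(old_div(*float_keys))       # 3.5
print(int_keys[0] / int_keys[1])  # 3.5 under Python 3 true division
```

The same ratio `x2 / x1` therefore changes value once integer keys become floats, which is why the comparator's behavior depended on whether thinning had run.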





[beam] 14/15: Publishing website 2018/09/26 03:27:28 at commit b50c0ae

2018-09-27 Thread scott
This is an automated email from the ASF dual-hosted git repository.

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 638229394ba8c8ddf204f1676902cb207eb8ca5c
Author: jenkins 
AuthorDate: Wed Sep 26 03:27:28 2018 +

Publishing website 2018/09/26 03:27:28 at commit b50c0ae



[beam] 10/15: Publishing website 2018/09/26 03:03:34 at commit 7576a9b

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 20cc7978f97cfb6cb3a737550bbc6f04e2a676b6
Author: jenkins 
AuthorDate: Wed Sep 26 03:03:34 2018 +

Publishing website 2018/09/26 03:03:34 at commit 7576a9b



[beam] 03/15: Publishing website 2018/09/26 02:14:31 at commit d87b42f

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 615322eacf535e2694a478bfd856e8dda69ed900
Author: jenkins 
AuthorDate: Wed Sep 26 02:14:31 2018 +

Publishing website 2018/09/26 02:14:31 at commit d87b42f



[beam] 12/15: Publishing website 2018/09/26 03:14:38 at commit b16bd7d

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit b3c8fc8fd18fc6fc87ab834b29ffc9209e97ad3a
Author: jenkins 
AuthorDate: Wed Sep 26 03:14:38 2018 +

Publishing website 2018/09/26 03:14:38 at commit b16bd7d



[beam] 05/15: Publishing website 2018/09/26 02:26:05 at commit fdcaa7c

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit bbb83a40a34aac3aa632b4c851925970d7d7fa9b
Author: jenkins 
AuthorDate: Wed Sep 26 02:26:05 2018 +

Publishing website 2018/09/26 02:26:05 at commit fdcaa7c



[beam] 08/15: Publishing website 2018/09/26 02:51:58 at commit 6fb19ce

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 42e0dec6bced4ddb2ed004075f7f94d5b5020fb1
Author: jenkins 
AuthorDate: Wed Sep 26 02:51:58 2018 +

Publishing website 2018/09/26 02:51:58 at commit 6fb19ce



[beam] 04/15: Publishing website 2018/09/26 02:20:18 at commit 8cf8566

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit a156066da26500b5edb8f7e56f5b0805b666d5af
Author: jenkins 
AuthorDate: Wed Sep 26 02:20:18 2018 +

Publishing website 2018/09/26 02:20:18 at commit 8cf8566



[beam] 13/15: Publishing website 2018/09/26 03:22:29 at commit 4edeecc

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 12b0ceb7b61880be74b5eeb24b3e370c569f813c
Author: jenkins 
AuthorDate: Wed Sep 26 03:22:29 2018 +

Publishing website 2018/09/26 03:22:29 at commit 4edeecc



[beam] 09/15: Publishing website 2018/09/26 03:00:36 at commit ba06d1f

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit be93908adce4d832917fcdcbf5444f0505f4e507
Author: jenkins 
AuthorDate: Wed Sep 26 03:00:36 2018 +

Publishing website 2018/09/26 03:00:36 at commit ba06d1f



[beam] 11/15: Publishing website 2018/09/26 03:07:13 at commit 86f7f05

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit ac3df3e4ab719e7d14752910378aac9bc604d4e0
Author: jenkins 
AuthorDate: Wed Sep 26 03:07:13 2018 +

Publishing website 2018/09/26 03:07:13 at commit 86f7f05



[beam] 07/15: Publishing website 2018/09/26 02:39:15 at commit 7875d2c

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 827cc9aa30c589ccd460303cfdce1cb9224561d7
Author: jenkins 
AuthorDate: Wed Sep 26 02:39:15 2018 +

Publishing website 2018/09/26 02:39:15 at commit 7875d2c



[beam] 01/15: Add a README for the asf-site branch

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 71d5b0e126e13dd629ce0e2adcf2449ad6adec55
Author: Scott Wegner 
AuthorDate: Thu Sep 27 07:49:34 2018 -0700

Add a README for the asf-site branch
---
 README.md | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/README.md b/README.md
new file mode 100644
index 000..15962d5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,26 @@
+
+
+This branch contains the generated code for http://beam.apache.org.
+
+To contribute to the  website, please modify the
+[website sources on master](https://github.com/apache/beam/tree/master/website).
+See the
+[contribution guide](https://beam.apache.org/contribute/#contributing-to-the-website)
+for details.



[beam] 06/15: Publishing website 2018/09/26 02:31:39 at commit 076f73c

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit b2c6acd05a34b03e9f6d3ab97136227674e6c32b
Author: jenkins 
AuthorDate: Wed Sep 26 02:31:39 2018 +

Publishing website 2018/09/26 02:31:39 at commit 076f73c



[beam] 02/15: Publishing website 2018/09/26 01:51:44 at commit e9b7e5c

2018-09-27 Thread scott

scott pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

commit e97397e8eb696de164537d619adf2525bd8e0e8d
Author: jenkins 
AuthorDate: Wed Sep 26 01:51:44 2018 +

Publishing website 2018/09/26 01:51:44 at commit e9b7e5c
---
 content  |   1 +
 website/generated-content/index.html | 405 +++
 2 files changed, 406 insertions(+)

diff --git a/content b/content
new file mode 12
index 000..6cb4abe
--- /dev/null
+++ b/content
@@ -0,0 +1 @@
+website/generated-content
\ No newline at end of file
diff --git a/website/generated-content/index.html b/website/generated-content/index.html
new file mode 100644
index 000..dbdaf3d
--- /dev/null
+++ b/website/generated-content/index.html
@@ -0,0 +1,405 @@
+
+
+
+
+  
+
+
+  
+  
+  
+  Apache Beam
+  
+  https://fonts.googleapis.com/css?family=Roboto:100,300,400; rel="stylesheet">
+  
+  https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";>
+  
+  
+  
+  
+  
+  https://beam.apache.org/; data-proofer-ignore>
+  
+  https://beam.apache.org/feed.xml;>
+  
+(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+ga('create', 'UA-73650088-1', 'auto');
+ga('send', 'pageview');
+  
+
+
+  
+
+
+
+
+  
+Toggle navigation
+
+
+
+  
+
+  
+
+  
+
+
+
+
+
+  
+
+  Get Started
+
+
+  Documentation
+
+
+  SDKS
+
+
+  RUNNERS
+
+
+  Contribute
+
+
+  Community
+
+Blog
+  
+  
+
+  https://www.apache.org/foundation/press/kit/feather_small.png; alt="Apache Logo" style="height:20px;">
+  
+http://www.apache.org/;>ASF Homepage
+http://www.apache.org/licenses/;>License
+http://www.apache.org/security/;>Security
+http://www.apache.org/foundation/thanks.html;>Thanks
+http://www.apache.org/foundation/sponsorship.html;>Sponsorship
+https://www.apache.org/foundation/policies/conduct;>Code of Conduct
+  
+
+  
+
+
+
+
+  
+
+  
+
+  
+
+  
+Apache Beam: An advanced unified programming model
+  
+  
+Implement batch and streaming data processing jobs that run on any execution engine.
+  
+  
+Learn more
+Download Beam SDK 2.6.0
+  
+  
+Java Quickstart
+Python Quickstart
+   Go Quickstart
+  
+
+  
+  
+
+  
+The latest from the blog
+  
+  
+
+
+  Beam Summit Europe 2018
+  Aug 21, 2018
+
+
+
+  A review of input streaming connectors
+  Aug 20, 2018
+
+
+
+  Apache Beam 2.6.0
+  Aug 10, 2018
+
+
+  
+
+  
+
+  
+
+
+
+  
+All about Apache Beam
+  
+  
+
+
+  
+Unified
+  
+  
+Use a single programming model for both batch and streaming use cases.
+  
+
+
+
+  
+Portable
+  
+  
+Execute pipelines on multiple execution environments.
+  
+
+
+
+  
+Extensible
+  
+  
+Write and share new SDKs, IO connectors, and transformation libraries.
+  
+
+
+  
+
+
+
+
+
+
+
+
+
+  
+Works with
+  
+  
+
+
+  http://apex.apache.org;>
+
+
+
+  http://flink.apache.org;>
+
+
+
+  http://spark.apache.org/;>
+
+
+
+  https://cloud.google.com/dataflow/;>
+
+
+
+  http://gearpump.apache.org/;>
+
+
+
+  http://samza.apache.org/;>
+
+
+  
+
+
+
+  
+
+  Testimonials
+
+
+  
+  
+
+  A framework that delivers the flexibility and advanced functionality our customers need.
+
+
+  
+  
+–Talend
+  
+
+  
+  
+  
+
+  Apache Beam has powerful semantics that solve real-world challenges of stream processing.
+
+
+  
+  
+–PayPal
+  
+
+  
+  
+  
+
+  Apache Beam represents a 

[jira] [Work logged] (BEAM-2887) Python SDK support for portable pipelines

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2887?focusedWorklogId=148771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148771
 ]

ASF GitHub Bot logged work on BEAM-2887:


Author: ASF GitHub Bot
Created on: 27/Sep/18 14:53
Start Date: 27/Sep/18 14:53
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6504: [BEAM-2887] Remove 
special FnApi version of wordcount.
URL: https://github.com/apache/beam/pull/6504#issuecomment-425123358
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 148771)
Time Spent: 20m  (was: 10m)

> Python SDK support for portable pipelines
> -
>
> Key: BEAM-2887
> URL: https://issues.apache.org/jira/browse/BEAM-2887
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Ahmet Altay
>Priority: Major
>  Labels: portability
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148770
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 14:52
Start Date: 27/Sep/18 14:52
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6375: [BEAM-4858] Clean 
up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#issuecomment-425123012
 
 
   Thanks. The code LGTM, PTAL at test failures and please do a performance 
check using updated code before merging.




Issue Time Tracking
---

Worklog Id: (was: 148770)
Time Spent: 4h 20m  (was: 4h 10m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-2887) Python SDK support for portable pipelines

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2887?focusedWorklogId=148769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148769
 ]

ASF GitHub Bot logged work on BEAM-2887:


Author: ASF GitHub Bot
Created on: 27/Sep/18 14:52
Start Date: 27/Sep/18 14:52
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #6504: [BEAM-2887] 
Remove special FnApi version of wordcount.
URL: https://github.com/apache/beam/pull/6504
 
 
   The "standard" wordcount now runs on FnApi; use it for integration testing.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 148769)
Time Spent: 10m
Remaining Estimate: 0h

> Python SDK support for portable pipelines
> -
>
> Key: BEAM-2887
> URL: https://issues.apache.org/jira/browse/BEAM-2887
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>

[jira] [Commented] (BEAM-5500) Portable python sdk worker leaks memory in streaming mode

2018-09-27 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630515#comment-16630515
 ] 

Thomas Weise commented on BEAM-5500:


The pipeline runs in streaming mode. With the Flink runner, every record is 
currently processed as a separate bundle. The pipeline above has multiple 
executable stages. The leak will be: number_of_records * number_of_stages * 300 bytes.

So if we process 1 million records and have 2 stages, we will have leaked 600 MB of 
memory.
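The estimate is a straight product. The sketch below assumes that the 300 figure means 300 bytes per bundle per stage, which is the value implied by 1 million records × 2 stages = 600 MB in decimal megabytes. The function name is hypothetical.

```python
def estimated_leak_bytes(records, stages, bytes_per_bundle_per_stage=300):
    """Leak estimate from the comment: one bundle per record in streaming mode."""
    return records * stages * bytes_per_bundle_per_stage


leak = estimated_leak_bytes(1_000_000, 2)
print(leak)                # 600000000
print(leak / 10**6, "MB")  # 600.0 MB
```

Because every record becomes its own bundle, the leak grows linearly with throughput rather than with the number of bundles a batch runner would create.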

 

> Portable python sdk worker leaks memory in streaming mode
> -
>
> Key: BEAM-5500
> URL: https://issues.apache.org/jira/browse/BEAM-5500
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Micah Wylde
>Assignee: Robert Bradshaw
>Priority: Major
> Attachments: chart.png
>
>





Jenkins build is back to normal : beam_PostCommit_Python_VR_Flink #158

2018-09-27 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread Alan Myrvold (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Myrvold resolved BEAM-5518.

   Resolution: Fixed
Fix Version/s: 2.8.0

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
> Fix For: 2.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Failure when running: 
> ./gradlew :beam-website:testWebsite
> Error includes:
>  - ./generated-content/blog/2017/02/01/graduation-media-recap.html
>   *  External link 
> [https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html]
>  failed: response code 0 means something's wrong.
>              It's possible libcurl couldn't connect to the server or perhaps 
> the request timed out.
>              Sometimes, making too many requests at once also breaks things.
>              Either way, the return message (if any) from the server is: Peer 
> certificate cannot be authenticated with given CA certificates
> rake aborted!
> HTML-Proofer found 1 failure!
>  
> Also fails when running:
> curl -v 
> [https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html]
>  
> Works fine opening in a browser.





Build failed in Jenkins: beam_PostCommit_Website_Publish #14

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[scott] [BEAM-5518] Ignore failing ssl validation of globenewswire (#6502)

--
[...truncated 8.31 KB...]
> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task ':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 1.367 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task ':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.024 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.002 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.002 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.002 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 3,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 3,5,main]) completed. Took 0.004 secs.
:buildSrc:check (Thread[Task 

[jira] [Work logged] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?focusedWorklogId=148753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148753
 ]

ASF GitHub Bot logged work on BEAM-5518:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:48
Start Date: 27/Sep/18 13:48
Worklog Time Spent: 10m 
  Work Description: swegner closed pull request #6502: [BEAM-5518] Ignore 
failing ssl validation of globenewswire
URL: https://github.com/apache/beam/pull/6502
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/website/Rakefile b/website/Rakefile
index 40283e8ecac..5160bad45f5 100644
--- a/website/Rakefile
+++ b/website/Rakefile
@@ -17,7 +17,8 @@ task :test do
 
 /jstorm.io/,
 /datatorrent.com/,
-/ai.google/ # https://issues.apache.org/jira/browse/INFRA-16527
+/ai.google/, # https://issues.apache.org/jira/browse/INFRA-16527
+/globenewswire.com/ # https://issues.apache.org/jira/browse/BEAM-5518
 ],
 :parallel => { :in_processes => Etc.nprocessors },
 }).run


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148753)
Time Spent: 1h  (was: 50m)

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[beam] branch master updated: [BEAM-5518] Ignore failing ssl validation of globenewswire (#6502)

2018-09-27 Thread scott
This is an automated email from the ASF dual-hosted git repository.

scott pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new cdabaf9  [BEAM-5518] Ignore failing ssl validation of globenewswire 
(#6502)
cdabaf9 is described below

commit cdabaf9d87c0ffa1a37f74b483fb4e4d6ce1686e
Author: Alan Myrvold 
AuthorDate: Thu Sep 27 09:48:02 2018 -0400

[BEAM-5518] Ignore failing ssl validation of globenewswire (#6502)
---
 website/Rakefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/website/Rakefile b/website/Rakefile
index 40283e8..5160bad 100644
--- a/website/Rakefile
+++ b/website/Rakefile
@@ -17,7 +17,8 @@ task :test do
 
 /jstorm.io/,
 /datatorrent.com/,
-/ai.google/ # https://issues.apache.org/jira/browse/INFRA-16527
+/ai.google/, # https://issues.apache.org/jira/browse/INFRA-16527
+/globenewswire.com/ # https://issues.apache.org/jira/browse/BEAM-5518
 ],
 :parallel => { :in_processes => Etc.nprocessors },
 }).run
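The fix works by adding `globenewswire.com` to the list of regexes whose matching external links HTML-Proofer skips. The sketch below is a hypothetical Python mirror of only the entries visible in this diff (the full list above them is not shown); `IGNORED_URL_PATTERNS` and `should_check_link` are illustrative names, not part of the Beam website tooling. Like a Ruby `/.../` literal, each pattern matches anywhere in the URL, so `re.search` is used.

```python
import re

# Hypothetical mirror of the ignore entries visible in website/Rakefile after
# commit cdabaf9. Dots are left unescaped exactly as in the Ruby regexes,
# so "." matches any character here.
IGNORED_URL_PATTERNS = [
    re.compile(r"jstorm.io"),
    re.compile(r"datatorrent.com"),
    re.compile(r"ai.google"),          # https://issues.apache.org/jira/browse/INFRA-16527
    re.compile(r"globenewswire.com"),  # https://issues.apache.org/jira/browse/BEAM-5518
]


def should_check_link(url: str) -> bool:
    """Return False when any ignore pattern matches anywhere in the URL."""
    return not any(pattern.search(url) for pattern in IGNORED_URL_PATTERNS)
```

Because the dots are unescaped, `ai.google` also matches a URL such as `aiXgoogle.example`; escaping them (`globenewswire\.com`) would make the ignore list stricter.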



[jira] [Work logged] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?focusedWorklogId=148751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148751
 ]

ASF GitHub Bot logged work on BEAM-5518:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:47
Start Date: 27/Sep/18 13:47
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #6502: [BEAM-5518] Ignore 
failing ssl validation of globenewswire
URL: https://github.com/apache/beam/pull/6502#issuecomment-425099041
 
 
   LGTM




Issue Time Tracking
---

Worklog Id: (was: 148751)
Time Spent: 50m  (was: 40m)

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?focusedWorklogId=148750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148750
 ]

ASF GitHub Bot logged work on BEAM-5518:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:47
Start Date: 27/Sep/18 13:47
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on a change in pull request 
#6502: [BEAM-5518] Ignore failing ssl validation of globenewswire
URL: https://github.com/apache/beam/pull/6502#discussion_r220929452
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -17,7 +17,8 @@ task :test do
 
 /jstorm.io/,
 /datatorrent.com/,
-/ai.google/ # https://issues.apache.org/jira/browse/INFRA-16527
+/ai.google/, # https://issues.apache.org/jira/browse/INFRA-16527
+/globenewswire.com/
 
 Review comment:
   done




Issue Time Tracking
---

Worklog Id: (was: 148750)
Time Spent: 40m  (was: 0.5h)

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?focusedWorklogId=148749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148749
 ]

ASF GitHub Bot logged work on BEAM-5518:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:46
Start Date: 27/Sep/18 13:46
Worklog Time Spent: 10m 
  Work Description: swegner commented on a change in pull request #6502: 
[BEAM-5518] Ignore failing ssl validation of globenewswire
URL: https://github.com/apache/beam/pull/6502#discussion_r220929161
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -17,7 +17,8 @@ task :test do
 
 /jstorm.io/,
 /datatorrent.com/,
-/ai.google/ # https://issues.apache.org/jira/browse/INFRA-16527
+/ai.google/, # https://issues.apache.org/jira/browse/INFRA-16527
+/globenewswire.com/
 
 Review comment:
   Can you add a comment as to why this is failing? Preferably a JIRA that we 
can use to track context and check up on to see when it is fixed.




Issue Time Tracking
---

Worklog Id: (was: 148749)
Time Spent: 0.5h  (was: 20m)

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-5272) Randomize the reduced splits in BigtableIO so that multiple workers may not hit the same tablet server

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5272?focusedWorklogId=148745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148745
 ]

ASF GitHub Bot logged work on BEAM-5272:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:43
Start Date: 27/Sep/18 13:43
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 closed pull request #6308: [BEAM-5272] 
Randomize the reduced splits in BigtableIO so that multiple workers may not hit 
the same tablet server
URL: https://github.com/apache/beam/pull/6308
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
index ae8fe7d04d9..cb5a174713e 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
@@ -848,18 +848,27 @@ protected BigtableSource withEstimatedSizeBytes(Long estimatedSizeBytes) {
   // Delegate to testable helper.
   List splits =
   splitBasedOnSamples(desiredBundleSizeBytes, getSampleRowKeys(options));
-  return reduceSplits(splits, options, MAX_SPLIT_COUNT);
+
+  // Reduce the splits.
+  List reduced = reduceSplits(splits, options, MAX_SPLIT_COUNT);
+  // Randomize the result before returning an immutable copy of the splits, the default behavior
+  // may lead to multiple workers hitting the same tablet.
+  Collections.shuffle(reduced);
+  return ImmutableList.copyOf(reduced);
 }
 
+/**
+ * Returns a mutable list of reduced splits.
+ */
 @VisibleForTesting
 protected List reduceSplits(
 List splits, PipelineOptions options, long maxSplitCounts)
 throws IOException {
   int numberToCombine = (int) ((splits.size() + maxSplitCounts - 1) / maxSplitCounts);
   if (splits.size() < maxSplitCounts || numberToCombine < 2) {
-return splits;
+return new ArrayList<>(splits);
   }
-  ImmutableList.Builder reducedSplits = ImmutableList.builder();
+  List reducedSplits = new ArrayList<>();
   List previousSourceRanges = new ArrayList();
   int counter = 0;
   long size = 0;
@@ -879,7 +888,7 @@ protected BigtableSource withEstimatedSizeBytes(Long estimatedSizeBytes) {
   if (size > 0) {
 reducedSplits.add(new BigtableSource(config, filter, previousSourceRanges, size));
   }
-  return reducedSplits.build();
+  return reducedSplits;
 }
 
 /**
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java
index 47727e5b8a1..518dc104c4e 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java
@@ -100,7 +100,7 @@
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.TypeDescriptor;
 import org.hamcrest.Matchers;
-import org.hamcrest.collection.IsIterableContainingInOrder;
+import org.hamcrest.collection.IsIterableContainingInAnyOrder;
 import org.junit.Before;
 import org.junit.Rule;
 import org.junit.Test;
@@ -680,10 +680,10 @@ public void testReduceSplitsWithSomeNonAdjacentRanges() throws Exception {
 keyRanges,
 null /*size*/);
 
-List splits =
-source.split(numRows * bytesPerRow / numSamples, null /* options */);
-
-assertThat(splits, hasSize(keyRanges.size()));
+List splits = new ArrayList<>();
+for (ByteKeyRange range : keyRanges) {
+  splits.add(source.withSingleRange(range));
+}
 
 List reducedSplits = source.reduceSplits(splits, null, maxSplit);
 
@@ -697,7 +697,8 @@ public void testReduceSplitsWithSomeNonAdjacentRanges() throws Exception {
 
 assertThat(
 actualRangesAfterSplit,
-IsIterableContainingInOrder.contains(expectedKeyRangesAfterReducedSplits.toArray()));
+IsIterableContainingInAnyOrder.containsInAnyOrder(
+expectedKeyRangesAfterReducedSplits.toArray()));
   }
 
   /** Tests reduce split with all non adjacent ranges. */
@@ -730,10 +731,10 @@ public void testReduceSplitsWithAllNonAdjacentRange() throws Exception {
 keyRanges,
 null /*size*/);
 
-List splits =
-

[jira] [Work logged] (BEAM-5272) Randomize the reduced splits in BigtableIO so that multiple workers may not hit the same tablet server

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5272?focusedWorklogId=148744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148744
 ]

ASF GitHub Bot logged work on BEAM-5272:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:43
Start Date: 27/Sep/18 13:43
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 commented on issue #6503: [BEAM-5272] 
Randomize the reduced splits in BigtableIO so that multiple workers may not hit 
the same tablet server
URL: https://github.com/apache/beam/pull/6503#issuecomment-425097437
 
 
   @chamikaramj, could you take a look? Thanks!




Issue Time Tracking
---

Worklog Id: (was: 148744)
Time Spent: 1h 10m  (was: 1h)

> Randomize the reduced splits in BigtableIO so that multiple workers may not 
> hit the same tablet server
> --
>
> Key: BEAM-5272
> URL: https://issues.apache.org/jira/browse/BEAM-5272
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kevin Si
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Randomize the reduced splits in BigtableIO so that multiple workers may not 
> hit the same tablet server.
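The change reduces the sampled splits and then shuffles them so that consecutive key ranges (which likely live on the same tablet server) are not handed to workers in order. The Java diff is truncated in the middle of `reduceSplits`, so the grouping rule below is an assumption: a simplified Python sketch in which `KeyRange`, `Split`, `reduce_splits` and `split_randomized` are hypothetical names. It mirrors the parts the patch does show: a ceiling-division `numberToCombine`, an early return of a mutable copy, a `Collections.shuffle`-style shuffle, and a final immutable copy.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple

KeyRange = Tuple[str, str]  # hypothetical (start_key, end_key) pair, end exclusive


class Split(NamedTuple):
    """Hypothetical stand-in for a BigtableSource: its key ranges and size."""
    ranges: Tuple[KeyRange, ...]
    size_bytes: int


def reduce_splits(splits: Sequence[Split], max_split_count: int) -> List[Split]:
    """Merge runs of adjacent single-range splits; always returns a new, mutable list."""
    # Ceiling division, as in the Java (size + max - 1) / max.
    number_to_combine = (len(splits) + max_split_count - 1) // max_split_count
    if len(splits) < max_split_count or number_to_combine < 2:
        return list(splits)
    reduced: List[Split] = []
    pending: List[KeyRange] = []
    counter = 0
    size = 0
    for split in splits:
        (key_range,) = split.ranges  # assumes one range per input split
        adjacent = not pending or pending[-1][1] == key_range[0]
        # Flush the current group when it is full or the next range is not adjacent.
        if pending and (counter == number_to_combine or not adjacent):
            reduced.append(Split(tuple(pending), size))
            pending, counter, size = [], 0, 0
        pending.append(key_range)
        counter += 1
        size += split.size_bytes
    if pending:
        reduced.append(Split(tuple(pending), size))
    return reduced


def split_randomized(splits: Sequence[Split], max_split_count: int,
                     rng: random.Random) -> Tuple[Split, ...]:
    """Reduce, shuffle so neighbouring key ranges are spread out, then freeze."""
    reduced = reduce_splits(splits, max_split_count)
    rng.shuffle(reduced)
    return tuple(reduced)  # immutable copy, analogous to ImmutableList.copyOf
```

Passing an explicit `random.Random` keeps the shuffle reproducible in tests; the patched Java test instead builds its input with `withSingleRange` and compares results with `containsInAnyOrder`.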





[jira] [Work logged] (BEAM-5272) Randomize the reduced splits in BigtableIO so that multiple workers may not hit the same tablet server

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5272?focusedWorklogId=148742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148742
 ]

ASF GitHub Bot logged work on BEAM-5272:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:42
Start Date: 27/Sep/18 13:42
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 opened a new pull request #6503: 
[BEAM-5272] Randomize the reduced splits in BigtableIO so that multiple workers 
may not hit the same tablet server
URL: https://github.com/apache/beam/pull/6503
 
 
   Randomize the reduced splits in BigtableIO so that multiple workers may not 
hit the same tablet server.
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 148742)
Time Spent: 50m  (was: 40m)

> Randomize the reduced splits in BigtableIO so that multiple workers may not 
> hit the same tablet server
> --
>
> Key: BEAM-5272
> URL: https://issues.apache.org/jira/browse/BEAM-5272
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kevin Si
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-5272) Randomize the reduced splits in BigtableIO so that multiple workers may not hit the same tablet server

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5272?focusedWorklogId=148743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148743
 ]

ASF GitHub Bot logged work on BEAM-5272:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:42
Start Date: 27/Sep/18 13:42
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 commented on issue #6308: [BEAM-5272] 
Randomize the reduced splits in BigtableIO so that multiple workers may not hit 
the same tablet server
URL: https://github.com/apache/beam/pull/6308#issuecomment-425097332
 
 
   Created a new PR: https://github.com/apache/beam/pull/6503




Issue Time Tracking
---

Worklog Id: (was: 148743)
Time Spent: 1h  (was: 50m)

> Randomize the reduced splits in BigtableIO so that multiple workers may not 
> hit the same tablet server
> --
>
> Key: BEAM-5272
> URL: https://issues.apache.org/jira/browse/BEAM-5272
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kevin Si
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread Alan Myrvold (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Myrvold updated BEAM-5518:
---
Description: 
Failure when running: 

./gradlew :beam-website:testWebsite

Error includes:
 - ./generated-content/blog/2017/02/01/graduation-media-recap.html

  *  External link 
[https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html]
 failed: response code 0 means something's wrong.

             It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.

             Sometimes, making too many requests at once also breaks things.

             Either way, the return message (if any) from the server is: Peer 
certificate cannot be authenticated with given CA certificates

rake aborted!

HTML-Proofer found 1 failure!

 

Also fails when running:

curl -v 
[https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html]

 

Works fine opening in a browser.

  was:
Failure when running: 

./gradlew :beam-website:testWebsite

Error includes:

- ./generated-content/blog/2017/02/01/graduation-media-recap.html

  *  External link 
https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html
 failed: response code 0 means something's wrong.

             It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.

             Sometimes, making too many requests at once also breaks things.

             Either way, the return message (if any) from the server is: Peer 
certificate cannot be authenticated with given CA certificates

rake aborted!

HTML-Proofer found 1 failure!

 

Also fails when running:

curl -v 
https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html


> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Failure when running: 
> ./gradlew :beam-website:testWebsite
> Error includes:
>  - ./generated-content/blog/2017/02/01/graduation-media-recap.html
>   *  External link 
> [https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html]
>  failed: response code 0 means something's wrong.
>              It's possible libcurl couldn't connect to the server or perhaps 
> the request timed out.
>              Sometimes, making too many requests at once also breaks things.
>              Either way, the return message (if any) from the server is: Peer 
> certificate cannot be authenticated with given CA certificates
> rake aborted!
> HTML-Proofer found 1 failure!
>  
> Also fails when running:
> curl -v 
> [https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html]
>  
> Works fine opening in a browser.
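A link checker can tolerate a host like this by skipping it. The change for this issue is described as ignoring the failing SSL validation of globenewswire. It configures HTML-Proofer, which is a Ruby tool, and its exact settings are not reproduced here. The Python below is a hypothetical sketch of the same idea: `IGNORED_URL_PATTERNS` and `links_to_check` are illustrative names, and only external HTTP(S) links that do not match the ignore list are passed on for validation.

```python
import re
from typing import Iterable, List, Pattern
from urllib.parse import urlparse

# Hypothetical ignore list for hosts whose certificates fail validation in CI.
IGNORED_URL_PATTERNS: List[Pattern[str]] = [
    re.compile(r"^https?://(www\.)?globenewswire\.com/"),
]


def links_to_check(links: Iterable[str],
                   ignored: Iterable[Pattern[str]] = IGNORED_URL_PATTERNS) -> List[str]:
    """Return the external links that should still be validated."""
    ignored = list(ignored)
    result = []
    for link in links:
        if urlparse(link).scheme not in ("http", "https"):
            continue  # only external HTTP(S) links are checked here
        if any(p.search(link) for p in ignored):
            continue  # skip hosts that are known to fail certificate checks
        result.append(link)
    return result
```

Ignoring the host trades coverage for a green build. If the link itself breaks later, the checker will no longer report it.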





[jira] [Commented] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-27 Thread Robert Bradshaw (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630428#comment-16630428
 ] 

Robert Bradshaw commented on BEAM-5509:
---

I don't think we should be representing integral values as floating point in 
the pipeline options representation (though perhaps we'd have to use strings 
given that JSON doesn't support ints.)
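The failure mode in the comment above can be reproduced in isolation. This is a sketch under an assumption: the options travel through a representation that stores every number as a double. protobuf's `Struct` is one such representation. The helpers and the `--num_workers` option are illustrative and do not come from Beam's code. An int comes back as `3.0`, and an `argparse` option declared with `type=int` then rejects it.

```python
import argparse
from typing import Any, Dict, List


def roundtrip_through_doubles(options: Dict[str, Any]) -> Dict[str, Any]:
    """Simulate a representation that stores every number as a double.

    protobuf's Struct type behaves this way (its numeric value is a double),
    so an int such as 3 comes back as 3.0.
    """
    return {
        k: float(v) if isinstance(v, (int, float)) and not isinstance(v, bool) else v
        for k, v in options.items()
    }


def to_argv(options: Dict[str, Any]) -> List[str]:
    """Render options back into --name=value command-line arguments."""
    return [f"--{k}={v}" for k, v in options.items()]


def restore_integral_floats(options: Dict[str, Any]) -> Dict[str, Any]:
    """Workaround sketch: turn floats with no fractional part back into ints."""
    return {
        k: int(v) if isinstance(v, float) and v.is_integer() else v
        for k, v in options.items()
    }


parser = argparse.ArgumentParser()
parser.add_argument("--num_workers", type=int)

serialized = roundtrip_through_doubles({"num_workers": 3})
print(to_argv(serialized))  # ['--num_workers=3.0']

try:
    parser.parse_args(to_argv(serialized))
except SystemExit:
    # argparse rejects "3.0" for a type=int option and exits.
    print("int parsing failed")

fixed = restore_integral_floats(serialized)
print(parser.parse_args(to_argv(fixed)).num_workers)  # 3
```

The `restore_integral_floats` heuristic would also turn a genuine float option such as `1.0` into an int. That ambiguity is one reason to avoid representing integral values as floating point in the first place.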

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability-flink
>
> An int option supplied at the command line is turned into a decimal 
> (floating-point) value during serialization, and the parser in the SDK harness 
> then fails to restore it as an int.





[jira] [Updated] (BEAM-5509) Python pipeline_options doesn't handle int type

2018-09-27 Thread Robert Bradshaw (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-5509:
--
Labels: portability-flink  (was: )

> Python pipeline_options doesn't handle int type
> ---
>
> Key: BEAM-5509
> URL: https://issues.apache.org/jira/browse/BEAM-5509
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability-flink
>
> An int option supplied at the command line is turned into a decimal 
> (floating-point) value during serialization, and the parser in the SDK harness 
> then fails to restore it as an int.





[jira] [Work logged] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?focusedWorklogId=148741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148741
 ]

ASF GitHub Bot logged work on BEAM-5518:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:37
Start Date: 27/Sep/18 13:37
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6502: [BEAM-5518] 
Ignore failing ssl validation of globenewswire
URL: https://github.com/apache/beam/pull/6502#issuecomment-425095412
 
 
   +R: @swegner PTAL?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148741)
Time Spent: 20m  (was: 10m)

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> URL: https://issues.apache.org/jira/browse/BEAM-5518
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.8.0
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Failure when running: 
> ./gradlew :beam-website:testWebsite
> Error includes:
> - ./generated-content/blog/2017/02/01/graduation-media-recap.html
>   *  External link 
> https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html
>  failed: response code 0 means something's wrong.
>              It's possible libcurl couldn't connect to the server or perhaps 
> the request timed out.
>              Sometimes, making too many requests at once also breaks things.
>              Either way, the return message (if any) from the server is: Peer 
> certificate cannot be authenticated with given CA certificates
> rake aborted!
> HTML-Proofer found 1 failure!
>  
> Also fails when running:
> curl -v 
> https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html





[jira] [Work logged] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5518?focusedWorklogId=148740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148740
 ]

ASF GitHub Bot logged work on BEAM-5518:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:36
Start Date: 27/Sep/18 13:36
Worklog Time Spent: 10m 
  Work Description: alanmyrvold opened a new pull request #6502: 
[BEAM-5518] Ignore failing ssl validation of globenewswire
URL: https://github.com/apache/beam/pull/6502
 
 
   Fixes the failure when running:
   ./gradlew :beam-website:testWebsite
   
   The same failure also appears in https://builds.apache.org/job/beam_PostCommit_Website_Publish/13
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 148740)
Time Spent: 10m
Remaining Estimate: 0h

> :beam-website:testWebsite fails due to validation of ssl cert for 
> globenewswire.com
> ---
>
> Key: BEAM-5518
> 

[jira] [Created] (BEAM-5518) :beam-website:testWebsite fails due to validation of ssl cert for globenewswire.com

2018-09-27 Thread Alan Myrvold (JIRA)
Alan Myrvold created BEAM-5518:
--

 Summary: :beam-website:testWebsite fails due to validation of ssl 
cert for globenewswire.com
 Key: BEAM-5518
 URL: https://issues.apache.org/jira/browse/BEAM-5518
 Project: Beam
  Issue Type: Bug
  Components: website
Affects Versions: 2.8.0
Reporter: Alan Myrvold
Assignee: Alan Myrvold


Failure when running: 

./gradlew :beam-website:testWebsite

Error includes:

- ./generated-content/blog/2017/02/01/graduation-media-recap.html

  *  External link 
https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html
 failed: response code 0 means something's wrong.

             It's possible libcurl couldn't connect to the server or perhaps 
the request timed out.

             Sometimes, making too many requests at once also breaks things.

             Either way, the return message (if any) from the server is: Peer 
certificate cannot be authenticated with given CA certificates

rake aborted!

HTML-Proofer found 1 failure!

 

Also fails when running:

curl -v 
https://globenewswire.com/news-release/2017/01/10/904692/0/en/The-Apache-Software-Foundation-Announces-Apache-Beam-as-a-Top-Level-Project.html





[jira] [Work logged] (BEAM-5420) BigtableIO tries to get runtime parameters when collecting display data at pipeline construction time

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5420?focusedWorklogId=148737=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148737
 ]

ASF GitHub Bot logged work on BEAM-5420:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:24
Start Date: 27/Sep/18 13:24
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 commented on issue #6501: [BEAM-5420] When 
getting display data from a runtime parameter, don't call get()
URL: https://github.com/apache/beam/pull/6501#issuecomment-425090432
 
 
   @chamikaramj, could you take a look? Thanks!




Issue Time Tracking
---

Worklog Id: (was: 148737)
Time Spent: 0.5h  (was: 20m)

> BigtableIO tries to get runtime parameters when collecting display data at 
> pipeline construction time
> -
>
> Key: BEAM-5420
> URL: https://issues.apache.org/jira/browse/BEAM-5420
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Si
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For example: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfig.java#L165]
> At Dataflow pipeline construction time calling getProjectId() gives an error.
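The fix in this issue follows a general pattern: when building display data, check whether a parameter is accessible before reading it. The sketch below is modeled on the `isAccessible()`/`get()` pair that Beam's `ValueProvider` exposes. Everything in it is hypothetical, including `StaticValue`, `RuntimeValue`, `supply()` and `display_value()`, and it is written in Python rather than the Java used by BigtableConfig. A runtime parameter describes itself instead of being forced with `get()` at construction time.

```python
from typing import Any, Optional


class StaticValue:
    """Hypothetical provider for a value known at pipeline construction time."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def is_accessible(self) -> bool:
        return True

    def get(self) -> Any:
        return self._value

    def __str__(self) -> str:
        return str(self._value)


class RuntimeValue:
    """Hypothetical template parameter that is only supplied when the job runs."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._value: Optional[Any] = None
        self._supplied = False

    def supply(self, value: Any) -> None:
        self._value = value
        self._supplied = True

    def is_accessible(self) -> bool:
        return self._supplied

    def get(self) -> Any:
        if not self._supplied:
            raise RuntimeError(f"{self._name} is not accessible at construction time")
        return self._value

    def __str__(self) -> str:
        return f"RuntimeValue({self._name})"


def display_value(provider: Any) -> str:
    """Describe a parameter for display data without forcing get()."""
    if provider.is_accessible():
        return str(provider.get())
    # Calling get() here is what fails at construction time, so fall back
    # to the provider's own description instead.
    return str(provider)


project = RuntimeValue("projectId")
print(display_value(StaticValue("my-project")))
print(display_value(project))
```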





[jira] [Work logged] (BEAM-5420) BigtableIO tries to get runtime parameters when collecting display data at pipeline construction time

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5420?focusedWorklogId=148735=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148735
 ]

ASF GitHub Bot logged work on BEAM-5420:


Author: ASF GitHub Bot
Created on: 27/Sep/18 13:16
Start Date: 27/Sep/18 13:16
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 opened a new pull request #6501: 
[BEAM-5420] When getting display data from a runtime parameter, don't call get()
URL: https://github.com/apache/beam/pull/6501
 
 
   When getting display data from a runtime parameter, don't call get()
   
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 148735)
Time Spent: 20m  (was: 10m)

> BigtableIO tries to get runtime parameters when collecting display data at 
> pipeline construction time
> -
>
> Key: BEAM-5420
> URL: https://issues.apache.org/jira/browse/BEAM-5420
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kevin Si
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableConfig.java#L165]
> At Dataflow pipeline construction time calling getProjectId() gives an error.





[jira] [Resolved] (BEAM-5413) Add method for defining composite transforms as lambda expressions

2018-09-27 Thread Jeff Klukas (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Klukas resolved BEAM-5413.
---
Resolution: Fixed

This has been merged.

> Add method for defining composite transforms as lambda expressions
> --
>
> Key: BEAM-5413
> URL: https://issues.apache.org/jira/browse/BEAM-5413
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jeff Klukas
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: 2.8.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Defining a composite transform today requires writing a full named subclass 
> of PTransform (as [the programming guide 
> documents|https://beam.apache.org/documentation/programming-guide/#composite-transforms]),
> but there are cases where users may want to define a fairly trivial 
> composite transform using a less verbose Java 8 lambda expression.
> Consider an example where the user has defined MyDeserializationTransform 
> that attempts to deserialize byte arrays into some object, returning a 
> PCollectionTuple with tags for successfully deserialized records (mainTag) 
> and for errors (errorTag).
> If we introduce a PTransform::compose method that takes in a 
> SerializableFunction, the user can handle errors in a small lambda expression:
>  
> {code:java}
> byteArrays
>     .apply("attempt to deserialize messages", new MyDeserializationTransform())
>     .apply("write deserialization errors",
>         PTransform.compose((PCollectionTuple input) -> {
>           input
>               .get(errorTag)
>               .apply(new MyErrorOutputTransform());
>           return input.get(mainTag);
>         }))
>     .apply("more processing on the deserialized messages", new MyOtherTransform());
> {code}
> This style allows a more concise and fluent pipeline definition than is 
> currently possible.
>  
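The pattern proposed here wraps a function as a transform object so that it can be applied inline in a fluent chain. The real proposal is Java's `PTransform.compose`, which takes a `SerializableFunction`. The Python below is only a toy model of the idea: `Transform`, `Collection`, `Deserialize` and `write_errors_and_pass_main` are all hypothetical and are not part of Beam's Python SDK. Python lambdas cannot contain statements, so a small named function stands in for the Java lambda.

```python
from typing import Any, Callable, Dict, List


class Transform:
    """Toy stand-in for a composite transform; subclasses override expand()."""

    def expand(self, data: Any) -> Any:
        raise NotImplementedError

    @staticmethod
    def compose(fn: Callable[[Any], Any]) -> "Transform":
        """Wrap a plain function as a Transform instead of a named subclass."""

        class _Composed(Transform):
            def expand(self, data: Any) -> Any:
                return fn(data)

        return _Composed()


class Collection:
    """Toy eager collection with a fluent apply(), loosely mirroring PCollection."""

    def __init__(self, data: Any) -> None:
        self.data = data

    def apply(self, label: str, transform: Transform) -> "Collection":
        # The label is unused here; a real SDK would use it to name the step.
        return Collection(transform.expand(self.data))


class Deserialize(Transform):
    """Hypothetical transform that tags outputs as 'main' or 'error'."""

    def expand(self, data: List[bytes]) -> Dict[str, List[Any]]:
        tagged: Dict[str, List[Any]] = {"main": [], "error": []}
        for raw in data:
            try:
                tagged["main"].append(int(raw.decode()))
            except ValueError:
                tagged["error"].append(raw)
        return tagged


error_sink: List[bytes] = []


def write_errors_and_pass_main(tagged: Dict[str, List[Any]]) -> List[Any]:
    # Send failures to a side sink, then continue with the successful records.
    error_sink.extend(tagged["error"])
    return tagged["main"]


result = (
    Collection([b"1", b"x", b"3"])
    .apply("attempt to deserialize messages", Deserialize())
    .apply("write deserialization errors", Transform.compose(write_errors_and_pass_main))
)
print(result.data)  # [1, 3]
```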





[jira] [Commented] (BEAM-5413) Add method for defining composite transforms as lambda expressions

2018-09-27 Thread Jeff Klukas (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630377#comment-16630377
 ] 

Jeff Klukas commented on BEAM-5413:
---

This was merged with an @Experimental annotation. My understanding is that it 
will be included in 2.8.0.

> Add method for defining composite transforms as lambda expressions
> --
>
> Key: BEAM-5413
> URL: https://issues.apache.org/jira/browse/BEAM-5413
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jeff Klukas
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: 2.8.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Defining a composite transform today requires writing a full named subclass 
> of PTransform (as [the programming guide 
> documents|https://beam.apache.org/documentation/programming-guide/#composite-transforms]),
> but there are cases where users may want to define a fairly trivial 
> composite transform using a less verbose Java 8 lambda expression.
> Consider an example where the user has defined MyDeserializationTransform 
> that attempts to deserialize byte arrays into some object, returning a 
> PCollectionTuple with tags for successfully deserialized records (mainTag) 
> and for errors (errorTag).
> If we introduce a PTransform::compose method that takes in a 
> SerializableFunction, the user can handle errors in a small lambda expression:
>  
> {code:java}
> byteArrays
>     .apply("attempt to deserialize messages", new MyDeserializationTransform())
>     .apply("write deserialization errors",
>         PTransform.compose((PCollectionTuple input) -> {
>           input
>               .get(errorTag)
>               .apply(new MyErrorOutputTransform());
>           return input.get(mainTag);
>         }))
>     .apply("more processing on the deserialized messages", new MyOtherTransform());
> {code}
> This style allows a more concise and fluent pipeline definition than is 
> currently possible.
>  





[jira] [Closed] (BEAM-4496) Create Jenkins job to push generated HTML to asf-site branch

2018-09-27 Thread Scott Wegner (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wegner closed BEAM-4496.
--
   Resolution: Fixed
Fix Version/s: 2.8.0

> Create Jenkins job to push generated HTML to asf-site branch
> 
>
> Key: BEAM-4496
> URL: https://issues.apache.org/jira/browse/BEAM-4496
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system, website
>Reporter: Scott Wegner
>Assignee: Alan Myrvold
>Priority: Major
>  Labels: beam-site-automation-reliability
> Fix For: 2.8.0
>
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-5500) Portable python sdk worker leaks memory in streaming mode

2018-09-27 Thread Robert Bradshaw (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630296#comment-16630296
 ] 

Robert Bradshaw commented on BEAM-5500:
---

I've tried running this repeatedly in the direct runner, and am only finding 
the aforementioned leak of ~300 bytes per stage per bundle (which, yes, needs 
to be resolved). How many bundles/second are you processing? 
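The question about bundles per second matters because a fixed per-bundle leak grows linearly with the bundle rate. Streaming pipelines that emit many small bundles therefore leak much faster than batch runs. The arithmetic below is a worked illustration. The ~300 bytes per stage per bundle comes from the comment above. The stage count, bundle rate and 1 GiB of headroom are hypothetical numbers chosen only to show the scale.

```python
def leak_bytes_per_second(bytes_per_stage_bundle: float, stages: int,
                          bundles_per_second: float) -> float:
    """Leak rate when every stage leaks a fixed amount for each bundle."""
    return bytes_per_stage_bundle * stages * bundles_per_second


def hours_until_exhausted(headroom_bytes: float, leak_rate: float) -> float:
    """Hours until the leak consumes the given memory headroom."""
    if leak_rate <= 0:
        return float("inf")
    return headroom_bytes / leak_rate / 3600


# Illustrative inputs: 300 bytes, 5 stages, 100 bundles/s, 1 GiB of headroom.
rate = leak_bytes_per_second(300, 5, 100)
print(rate)  # 150000
print(round(hours_until_exhausted(2**30, rate), 2))  # 1.99
```

At these made-up rates a worker would exhaust 1 GiB of headroom in about two hours. At one bundle per second, the same leak would take roughly 200 hours.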

> Portable python sdk worker leaks memory in streaming mode
> -
>
> Key: BEAM-5500
> URL: https://issues.apache.org/jira/browse/BEAM-5500
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Micah Wylde
>Assignee: Robert Bradshaw
>Priority: Major
> Attachments: chart.png
>
>
> When using the portable Python SDK with Flink in streaming mode, we see that 
> the Python worker processes steadily increase their memory usage until they are 
> OOM-killed. This behavior is consistent across various kinds of streaming 
> pipelines, including those with fixed windows and global windows.
> A simple wordcount-like pipeline demonstrates the issue for us (note that this 
> runs on the [Lyft Beam fork|https://github.com/lyft/beam/], which provides 
> access to Kinesis as a portable streaming source):
> {code:python}
> counts = (p
>           | 'Kinesis' >> FlinkKinesisInput().with_stream('test-stream')
>           | 'decode' >> beam.FlatMap(decode)  # parses JSON into Python objects
>           | 'pair_with_one' >> beam.Map(lambda x: (x["event_name"], 1))
>           | 'window' >> beam.WindowInto(window.GlobalWindows(),
>                                         trigger=AfterProcessingTime(15 * 1000),
>                                         accumulation_mode=AccumulationMode.DISCARDING)
>           | 'group' >> beam.GroupByKey()
>           | 'count' >> beam.Map(count_ones)
>           | beam.Map(lambda x: logging.warn("count: %s", str(x)) or x))
> {code}
> When run, we see a steady increase in memory usage in the sdk_worker process. 
> Using [heapy|http://guppy-pe.sourceforge.net/#Heapy] I've analyzed the memory 
> usage over time and found that it's largely dicts and strings (see attached 
> chart).
>  
>  
>  
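The heapy analysis above relies on a third-party tool. The standard-library `tracemalloc` module is an alternative way to locate this kind of growth, and it is swapped in here rather than taken from the report. It groups allocations by the source line that made them, whereas heapy breaks them down by object type. In this minimal sketch, `process_bundle` is a hypothetical stand-in for per-bundle work that accidentally keeps state alive.

```python
import tracemalloc
from typing import Any, Dict, List

# Hypothetical per-bundle state that is never released.
_leaked: List[Dict[str, Any]] = []


def process_bundle(bundle_id: int) -> None:
    # Each call keeps a small dict of strings alive, mimicking the reported growth.
    _leaked.append({"bundle": str(bundle_id), "payload": "x" * 100})


def top_growth(limit: int = 3) -> List[tracemalloc.StatisticDiff]:
    """Run many bundles and report which source lines gained the most memory."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(10_000):
            process_bundle(i)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts the differences from the biggest to the smallest.
    return after.compare_to(before, "lineno")[:limit]


for stat in top_growth():
    print(stat)
```

In a real worker you would take the snapshots periodically around bundle processing, not in one loop. Lines whose `size_diff` keeps growing between snapshots are the leak candidates.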





Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #157

2018-09-27 Thread Apache Jenkins Server
See 


--
[...truncated 51.23 MB...]
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (1/16) 
(a7cc7ea94d0a7ed3741c1da6471c1b6b).
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
314b035f323d32d475bf74a9582515d6.
[ToKeyedWorkItem (1/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (1/16) 
(a7cc7ea94d0a7ed3741c1da6471c1b6b) [FINISHED]
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
cd68f8ff7bff809c5fad12754514110e.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
aec835121b108095a746168c2a33ec13.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
9a45794ebe0fb838a72fddbb954a34c8.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(afa9b8a22c3f17dd345c952716691be7) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
d2ddc4147f52066c732ec6860ee01a37.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
2c4b3a065d63ccfafe25b47073de24f1.
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
ToKeyedWorkItem (10/16) (f0ea7380521359cefa3bb4bb8bcb6fda) switched from 
RUNNING to FINISHED.
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Freeing task resources for ToKeyedWorkItem (10/16) 
(f0ea7380521359cefa3bb4bb8bcb6fda).
[ToKeyedWorkItem (10/16)] INFO org.apache.flink.runtime.taskmanager.Task - 
Ensuring all FileSystem streams are closed for task ToKeyedWorkItem (10/16) 
(f0ea7380521359cefa3bb4bb8bcb6fda) [FINISHED]
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
1395f0910c2ec761f41eea394d8382aa.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (16/16) 
(ebc444a1c101622126873ad5a8be371d) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
86def6485f8fadf572998d60082e3548.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
f88f6db2c73ec7660535810f33533442.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 
0ec171fea3e204b783df36fd55a51261.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (3/16) 
(9bd8cf462d21260ce123fb44e1d10027) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
a8af47f8e6bb0e268fee4a9735eceacf.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
0a791f0c4b8b183a765537066baa4600.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task 

Build failed in Jenkins: beam_PreCommit_Website_Cron #110

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

--
[...truncated 8.25 KB...]
> Task :buildSrc:assemble
Skipping task ':buildSrc:assemble' as it has no actions.
:buildSrc:assemble (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
completed. Took 0.0 secs.
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
started.

> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
completed. Took 1.4 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) completed. Took 0.033 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) completed. Took 0.003 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) completed. Took 0.002 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
11,5,main]) completed. Took 0.002 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 7,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 7,5,main]) completed. 
Took 0.004 secs.
:buildSrc:check (Thread[Task worker for 

Build failed in Jenkins: beam_PostCommit_Website_Publish #13

2018-09-27 Thread Apache Jenkins Server
See 


--
[...truncated 8.95 KB...]
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 1.361 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.022 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
2,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 2,5,main]) completed. 
Took 0.002 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 2,5,main]) 
completed. Took 0.0 secs.
:buildSrc:build (Thread[Task worker for ':buildSrc' Thread 2,5,main]) started.

> Task :buildSrc:build
Skipping task ':buildSrc:build' as it has no actions.
:buildSrc:build 

Build failed in Jenkins: beam_PostCommit_Python_VR_Flink #156

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

--
[...truncated 51.27 MB...]
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
55c99867982a86d7e0802b2aedc6f0dd.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (10/16) 
(aff0aaa40d562f8c26adf701ce163985) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
dcfe569ded1dba7f83e9aaab7df32a42.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (5/16) 
(e4baa3f626a79a12e1a15054ed9643bc) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
a877246d81a2102eddcae785104cec54.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
fffad82ea6672e1ab5c76b9dc6c33e50.
[flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and 
sending final execution state FINISHED to JobManager for task ToKeyedWorkItem 
cd066ea14f80305c1844c712aac90dff.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (6/16) 
(deb74a745e76ac8d58b38987b227a11d) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (9/16) 
(88b34d563f260a8c33d1d03cadb0856f) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(12/16) (705af756ed83af98221f0954f1ef8059) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (3/16) 
(c3e41be8de581e71fd81edebd4914e84) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (15/16) 
(e3d7da277fe28766ef80ee080c8b220e) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(14/16) (c17975142216f690d7acd1fc520302fc) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - GroupByKey -> 
24GroupByKey/GroupByWindow.None/beam:env:docker:v1:0 (13/16) 
(dc7279ebf8e8befacd2f91fc9901400f) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (9/16) 
(536221b89c5a0b0c46de19af37bac888) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (6/16) 
(8c1e7aef95c0c4e6dc28952f607a2ac5) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(15/16) (0fcbc9326cbcfb05764bb27b1085d42b) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(13/16) (4332d705ac4e0c370aac1c4ce80b5c13) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(16/16) (834933a53c0a24fb18c5f29782506f59) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem 
(11/16) (55c99867982a86d7e0802b2aedc6f0dd) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (4/16) 
(dcfe569ded1dba7f83e9aaab7df32a42) switched from RUNNING to FINISHED.
[flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - ToKeyedWorkItem (2/16) 

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148691
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 11:16
Start Date: 27/Sep/18 11:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220884037
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,60 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a*x + b over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a*x + b over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a * xs + b - ys
+      s2 = sum(errs**2) / (n - 2)
+      if s2 == 0:
+        # It's an exact fit!
+        return a, b
+      h = (sum_x2 - 2 * sum_x * xs + n * xs**2) / (n * sum_x2 - sum_x**2)
+      cook_ds = 0.5 / s2 * errs**2 * (h / (1 - h)**2)
+
+      # Re-compute the regression, excluding those points with Cook's distance
+      # greater than 1.
+      b, a = np.polyfit(xs, ys, 1, w=cook_ds < 1)
 
 Review comment:
   Reconciled. 
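The two-pass fit in `linear_regression_numpy` above can also be written in plain Python without numpy. It does a least squares fit, then, once there are at least 10 points, refits after dropping points whose Cook's distance is 1 or more. This is a hypothetical illustration, not Beam code. `fit_line` and `robust_fit` are made-up names. The sketch assumes `a` is the intercept and `b` the slope, matching the `a + b*x` model used in `next_batch_size`. Because the weights are only 0 or 1, weighting the squared residuals here is equivalent to the boolean `w=` mask passed to `np.polyfit`.

```python
def fit_line(xs, ys, weights=None):
    """Weighted least squares fit of y = a + b*x; returns (intercept a, slope b)."""
    if weights is None:
        weights = [1.0] * len(xs)
    sw = float(sum(weights))
    xbar = sum(w * x for w, x in zip(weights, xs)) / sw
    ybar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def robust_fit(xs, ys, max_cook=1.0, min_points=10):
    """Fit once, then refit without points whose Cook's distance is >= max_cook."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    if n < min_points:
        return a, b
    errs = [a + b * x - y for x, y in zip(xs, ys)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(e * e for e in errs) / (n - 2)
    if s2 == 0:
        return a, b  # exact fit: nothing to discard
    xbar = sum(xs) / float(n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    keep = []
    for x, e in zip(xs, errs):
        h = 1.0 / n + (x - xbar) ** 2 / sxx  # leverage of this point
        cook_d = e * e / (2 * s2) * h / (1 - h) ** 2  # p = 2 parameters
        keep.append(1.0 if cook_d < max_cook else 0.0)
    return fit_line(xs, ys, keep)
```

Consider 11 points on y = 2 + 0.5x plus one wild point at x = 12. The first fit's slope is pulled above 4, while `robust_fit` discards the outlier and recovers `(2.0, 0.5)`.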


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148691)
Time Spent: 4h 10m  (was: 4h)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division 

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148690=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148690
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 11:16
Start Date: 27/Sep/18 11:16
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220883853
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,60 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a*x + b over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a*x + b over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a * xs + b - ys
+      s2 = sum(errs**2) / (n - 2)
 
 Review comment:
   The typical definition takes into account the number of degrees of freedom. 
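The comment refers to the usual unbiased estimate of the residual variance. A straight-line least squares fit imposes two linear constraints on its residuals, so only n - 2 of them are free. Under the standard assumption of independent errors with constant variance σ², dividing by n - 2 makes s² unbiased. The same count of fitted parameters (p = 2) explains the `0.5 / s2` factor in `cook_ds`, and the leverage term matches the `h` expression in the diff. Written out, with a as intercept and b as slope:

```latex
\begin{aligned}
e_i &= a + b x_i - y_i, \qquad \sum_{i=1}^{n} e_i = 0, \qquad \sum_{i=1}^{n} x_i e_i = 0,\\
s^2 &= \frac{1}{n-2} \sum_{i=1}^{n} e_i^2, \qquad \mathbb{E}\left[s^2\right] = \sigma^2,\\
h_i &= \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}
     = \frac{\sum_j x_j^2 - 2 x_i \sum_j x_j + n x_i^2}{n \sum_j x_j^2 - \left(\sum_j x_j\right)^2},\\
D_i &= \frac{e_i^2}{p\, s^2} \cdot \frac{h_i}{(1 - h_i)^2}, \qquad p = 2 .
\end{aligned}
```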




Issue Time Tracking
---

Worklog Id: (was: 148690)
Time Spent: 4h  (was: 3h 50m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I was using to 
> benchmark these changes. The performance impact likely comes from 

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148689
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 11:14
Start Date: 27/Sep/18 11:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220883501
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -316,17 +357,22 @@ def next_batch_size(self):
     last_batch_size = self._data[-1][0]
     cap = min(last_batch_size * self._MAX_GROWTH_FACTOR, self._max_batch_size)
 
+    target = self._max_batch_size
+
     if self._target_batch_duration_secs:
       # Solution to a + b*x = self._target_batch_duration_secs.
-      cap = min(cap, (self._target_batch_duration_secs - a) / b)
+      target = min(target, (self._target_batch_duration_secs - a) / b)
 
     if self._target_batch_overhead:
       # Solution to a / (a + b*x) = self._target_batch_overhead.
-      cap = min(cap, (a / b) * (1 / self._target_batch_overhead - 1))
+      target = min(target, (a / b) * (1 / self._target_batch_overhead - 1))
 
-    # Avoid getting stuck at min_batch_size.
+    # Avoid getting stuck.
     jitter = len(self._data) % 2
-    return int(max(self._min_batch_size + jitter, cap))
+    if len(self._data) > 10:
+      target += int(target * self._variance * 2 * (random.random() - .5))
+
+    return int(max(self._min_batch_size + jitter, min(target, cap)))
 
 Review comment:
   Clarified. The choice of approximation isn't important here (and I don't 
want to give the impression that it's a non-linear polynomial fit.)
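Both targets in the hunk above are closed-form solutions of linear equations in the batch size x, given intercept a (fixed per-batch cost) and slope b (cost per element). Solving a + b*x = T gives x = (T - a)/b. Solving a/(a + b*x) = O gives a(1 - O) = O*b*x, so x = (a/b)(1/O - 1). The standalone function below is a hypothetical sketch of that capping logic, not Beam's API. Its parameter names mirror the attributes in the diff, the default `max_growth_factor=2` is an arbitrary assumption, and the random jitter is left out.

```python
def target_batch_size(a, b, last_batch_size, min_batch_size, max_batch_size,
                      max_growth_factor=2, target_duration_secs=None,
                      target_overhead=None):
    """a: fixed per-batch cost (s); b: cost per element (s). Jitter is omitted."""
    # Never grow more than max_growth_factor per step, nor past the hard maximum.
    cap = min(last_batch_size * max_growth_factor, max_batch_size)
    target = max_batch_size
    if target_duration_secs:
        # a + b*x = target_duration_secs  =>  x = (target_duration_secs - a) / b
        target = min(target, (target_duration_secs - a) / b)
    if target_overhead:
        # a / (a + b*x) = target_overhead  =>  x = (a / b) * (1 / target_overhead - 1)
        target = min(target, (a / b) * (1.0 / target_overhead - 1))
    return int(max(min_batch_size, min(target, cap)))


print(target_batch_size(0.5, 0.125, 100, 1, 1000, target_duration_secs=8.0))  # 60
print(target_batch_size(0.5, 0.125, 100, 1, 1000, target_duration_secs=8.0,
                        target_overhead=0.25))  # 12
print(target_batch_size(0.5, 0.125, 10, 1, 1000, target_duration_secs=8.0))  # 20 (growth cap)
```

Keeping `target` separate from `cap`, as the new diff does, means the growth limit and the model-derived target are only combined at the end. That makes it straightforward to perturb `target` with jitter without also perturbing the growth limit.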




Issue Time Tracking
---

Worklog Id: (was: 148689)
Time Spent: 3h 50m  (was: 3h 40m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I was using to 
> benchmark these changes. The performance impact likely comes from changes in
> the logic that depends on how division is evaluated, not from the
> performance of the division operation itself.
> For the Python 3 conversion, the best course of action that avoids a
> regression seems to be to preserve the existing Python 2 behavior using
> {{old_div}} from {{past.utils}}; in the medium term we should clean
> up the logic. We may want to add a targeted microbenchmark to evaluate
> performance of this code, and maybe cythonize the code, since it seems to be 
> performance-sensitive.
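The Python 2 behavior described in the issue can be reproduced without the `future` package. `py2_style_div` below is a hypothetical stand-in that mirrors what `old_div` is meant to preserve: floor division when both operands are integers, true division otherwise. The pairs are made-up data. They show how a sort key shaped like `div_keys` in the old `_thin_data` ties on integer batch sizes but gives a different order once thinning has turned the keys into floats.

```python
import numbers


def py2_style_div(a, b):
    """Hypothetical stand-in for old_div: Python 2 semantics of a / b."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b  # both integers: floor division
    return a / b  # at least one float: true division


def div_keys(kv1_kv2):
    # Same shape as the old sort key: ratio of the two batch sizes in a pair.
    (x1, _), (x2, _) = kv1_kv2
    return py2_style_div(x2, x1)


# Made-up (batch_size, seconds) pairs, first with int keys, then with float keys.
int_pairs = [((10, 0.1), (19, 0.2)), ((10, 0.1), (11, 0.1))]
float_pairs = [((10.0, 0.1), (19.0, 0.2)), ((10.0, 0.1), (11.0, 0.1))]

print([div_keys(p) for p in int_pairs])    # [1, 1]: ratios tie, order unchanged
print([div_keys(p) for p in float_pairs])  # [1.9, 1.1]
print(sorted(float_pairs, key=div_keys)[0][1][0])  # 11.0: the 11/10 pair now sorts first
```

In Python 3, `/` always performs true division, so the conversion had to pick one of these behaviors explicitly.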



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148687
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 11:11
Start Date: 27/Sep/18 11:11
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220882847
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,68 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data).
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    n = float(len(xs))
 
 Review comment:
   Right. Done.
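The new `_thin_data` replaces the pair-averaging scheme with two random removals. Removing exactly two points keeps `len(self._data) % 2` unchanged, and `next_batch_size` uses that value as its jitter. The sketch below isolates the idea as a hypothetical standalone function. It assumes points are appended in arrival order, so low indices hold the oldest measurements. It also assumes at least 4 points, since `randrange(0)` raises `ValueError`.

```python
import random


def thin_data(data, rng=random):
    """Drop two points so that len(data) % 2, used as jitter, keeps its parity.

    Assumes len(data) >= 4 and that points are appended in arrival order,
    so low indices hold the oldest measurements.
    """
    data.pop(rng.randrange(len(data) // 4))  # one point from the oldest quarter
    data.pop(rng.randrange(len(data) // 2))  # one point from the oldest half


points = list(range(100))  # stand-ins for (batch_size, seconds) tuples
thin_data(points, random.Random(0))
print(len(points), len(points) % 2)  # 98 0
```

Because both indices fall in the older half, the newest half of the measurements always survives a thinning pass.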




Issue Time Tracking
---

Worklog Id: (was: 148687)
Time Spent: 3h 40m  (was: 3.5h)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I was using to 
> benchmark these changes. The performance impact likely comes from changes in
> the logic that depends on how division is evaluated, not from the
> performance of the division operation itself.
> For the Python 3 conversion, the best course of action that avoids a
> regression seems to be to preserve the existing Python 2 behavior using
> {{old_div}} from {{past.utils}}; in the medium term we should clean
> up the logic. We may want to add a targeted microbenchmark to evaluate
> performance of this code, and maybe cythonize the code, since it seems to be 
> performance-sensitive.





Build failed in Jenkins: beam_PostCommit_Website_Publish #12

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[robertwb] [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

--
[...truncated 8.99 KB...]
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 1.495 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 4,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task ':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 0.033 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 4,5,main]) started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 0.002 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 4,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 4,5,main]) completed. Took 0.016 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 7,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 7,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 7,5,main]) started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 7,5,main]) completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 7,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 7,5,main]) completed. Took 0.005 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 7,5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 7,5,main]) completed. Took 0.0 secs.
:buildSrc:build (Thread[Task worker for ':buildSrc' Thread 8,5,main]) 

[beam] 01/01: Merge pull request #6497 [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

2018-09-27 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit cd5f3536665f36daaf0aa11755e12c2d88446630
Merge: 1ffba44 a36dbde
Author: Robert Bradshaw 
AuthorDate: Thu Sep 27 13:06:49 2018 +0200

    Merge pull request #6497 [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

[BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

 sdks/python/apache_beam/coders/coder_impl.py |  4 
 sdks/python/apache_beam/coders/coders.py | 19 ---
 2 files changed, 16 insertions(+), 7 deletions(-)



[beam] branch master updated (1ffba44 -> cd5f353)

2018-09-27 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 1ffba44  Merge pull request #6443: Updating BigQuerySink pydoc
 add a36dbde  [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.
 new cd5f353  Merge pull request #6497 [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/coders/coder_impl.py |  4 
 sdks/python/apache_beam/coders/coders.py | 19 ---
 2 files changed, 16 insertions(+), 7 deletions(-)



[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=148686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148686
 ]

ASF GitHub Bot logged work on BEAM-5270:


Author: ASF GitHub Bot
Created on: 27/Sep/18 11:06
Start Date: 27/Sep/18 11:06
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #6497: [BEAM-5270] Fix 
ToString coder to return bytes objects in Python 3.
URL: https://github.com/apache/beam/pull/6497
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/coders/coder_impl.py b/sdks/python/apache_beam/coders/coder_impl.py
index 6fd9b169ed6..eb6f9a1e510 100644
--- a/sdks/python/apache_beam/coders/coder_impl.py
+++ b/sdks/python/apache_beam/coders/coder_impl.py
@@ -197,6 +197,10 @@ def get_estimated_size_and_observables(self, value, nested=False):
 
     return self.estimate_size(value, nested), []
 
+  def __repr__(self):
+    return 'CallbackCoderImpl[encoder=%s, decoder=%s]' % (
+        self._encoder, self._decoder)
+
 
 class DeterministicFastPrimitivesCoderImpl(CoderImpl):
   """For internal use only; no backwards-compatibility guarantees."""
diff --git a/sdks/python/apache_beam/coders/coders.py b/sdks/python/apache_beam/coders/coders.py
index ad4edbbb374..f0ed6dcbeb9 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -22,6 +22,7 @@
 from __future__ import absolute_import
 
 import base64
+import sys
 from builtins import object
 
 import google.protobuf.wrappers_pb2
@@ -314,13 +315,17 @@ def is_deterministic(self):
 class ToStringCoder(Coder):
   """A default string coder used if no sink coder is specified."""
 
-  def encode(self, value):
-    try:   # Python 2
-      if isinstance(value, unicode):   # pylint: disable=unicode-builtin
-        return value.encode('utf-8')
-    except NameError:  # Python 3
-      pass
-    return str(value)
+  if sys.version_info.major == 2:
+
+    def encode(self, value):
+      # pylint: disable=unicode-builtin
+      return (value.encode('utf-8') if isinstance(value, unicode)  # noqa: F821
+              else str(value))
+
+  else:
+
+    def encode(self, value):
+      return value if isinstance(value, bytes) else str(value).encode('utf-8')
 
   def decode(self, _):
     raise NotImplementedError('ToStringCoder cannot be used for decoding.')
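The Python 3 branch of the new `encode` can be tried on its own. A minimal sketch, lifted out of the class (the function name `to_string_encode` is hypothetical; in Beam this logic is the `ToStringCoder.encode` method in the diff above): bytes pass through unchanged, and everything else goes through `str()` and is encoded as UTF-8, so the coder always returns bytes.

```python
def to_string_encode(value):
    """Standalone copy of the Python 3 branch of ToStringCoder.encode.

    bytes are returned unchanged; any other value is converted with str()
    and then encoded as UTF-8, so the result is always a bytes object.
    """
    return value if isinstance(value, bytes) else str(value).encode('utf-8')


if __name__ == '__main__':
    print(to_string_encode(b'raw'))   # b'raw'
    print(to_string_encode('héllo'))  # b'h\xc3\xa9llo'
    print(to_string_encode(42))       # b'42'
```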


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148686)
Time Spent: 3h 10m  (was: 3h)

> Finish Python 3 porting for coders module
> -
>
> Key: BEAM-5270
> URL: https://issues.apache.org/jira/browse/BEAM-5270
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148625
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220826243
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,68 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data).
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit over all points.
+    n = len(xs)
+    sum_x = sum(xs)
+    sum_y = sum(ys)
+    xbar = sum_x / n
+    ybar = sum_y / n
+    b = sum((xs - xbar) * (ys - ybar)) / sum((xs - xbar)**2)
+    a = ybar - b * xbar
+
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      sum_x2 = sum(xs**2)
+      errs = a * xs + b - ys
+      s2 = sum(errs**2) / (n - 2)
+      if s2 == 0:
+        # It's an exact fit!
+        return a, b
+      h = (sum_x2 - 2 * sum_x * xs + n * xs**2) / (n * sum_x2 - sum_x**2)
+      cook_ds = 0.5 / s2 * errs**2 * (h / (1 - h)**2)
 
 Review comment:
   Thanks, cross-checked :-). Did you also compute h by pen-and-paper, or is 
there a faster way? 
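One way to reach the `h` expression without heavy algebra (an assumption about how it could be derived; the thread does not say how it was done) is to start from the textbook leverage of simple linear regression and put it over a common denominator. With `sum_x` $= \sum_j x_j$ and `sum_x2` $= \sum_j x_j^2$, this reproduces the `h` line of the diff, and Cook's distance with $p = 2$ fitted parameters gives the `cook_ds` line (the factor `0.5` is $1/p$):

```latex
% Leverage of point i in a simple linear regression y = a + b x
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}},
\qquad
S_{xx} = \sum_j x_j^2 - \frac{1}{n}\Big(\sum_j x_j\Big)^2 .

% Common denominator n S_{xx} = n \sum_j x_j^2 - (\sum_j x_j)^2, using
% n (x_i - \bar{x})^2 = n x_i^2 - 2 x_i \sum_j x_j + \frac{1}{n}(\sum_j x_j)^2 :
h_i = \frac{S_{xx} + n (x_i - \bar{x})^2}{n S_{xx}}
    = \frac{\sum_j x_j^2 - 2 x_i \sum_j x_j + n x_i^2}
           {n \sum_j x_j^2 - \big(\sum_j x_j\big)^2} .

% Cook's distance with p = 2 parameters and residual variance s^2
D_i = \frac{e_i^2}{p\, s^2} \cdot \frac{h_i}{(1 - h_i)^2},
\qquad
s^2 = \frac{1}{n - 2} \sum_j e_j^2 .
```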




Issue Time Tracking
---

Worklog Id: (was: 148625)
Time Spent: 3h 10m  (was: 3h)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> 

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148622=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148622
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220659176
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,68 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data).
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    n = float(len(xs))
 
 Review comment:
   Thanks. I think the comment should say `y = a + bx` here and in line 297 
below, according to the current naming?
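For reference, a minimal pure-Python sketch of the fit with the naming the comment asks for, y = a + b*x, where `a` is the intercept and `b` the slope. The function name is hypothetical; the arithmetic follows `linear_regression_no_numpy` in the diff.

```python
def linear_regression(xs, ys):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b): a is the intercept and b is the slope.
    """
    n = float(len(xs))
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = ybar - b * xbar    # intercept
    return a, b


if __name__ == '__main__':
    # Points exactly on y = 3 + 2x.
    print(linear_regression([1, 2, 3, 4], [5, 7, 9, 11]))  # (3.0, 2.0)
```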




Issue Time Tracking
---

Worklog Id: (was: 148622)
Time Spent: 2h 50m  (was: 2h 40m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I was using to 
> benchmark these changes. The performance impact likely comes from changes in 
> the logic that depends on how division is evaluated, not from the 
> performance of the division operation itself.
> In terms of Python 3 conversion, the best course of action that avoids 
> regression seems to be to preserve the existing Python 2 behavior using 
> {{old_div}} from {{past.utils}}; in the medium term, we should clean 
> up the logic. We may want to add a targeted microbenchmark to evaluate 
> the performance of this code, and maybe cythonize it, since it seems to be 
> performance-sensitive.
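The type dependence described above can be reproduced with a small stand-in for `old_div`. This is a sketch of its documented behavior (floor division when both operands are integers, true division otherwise), not the library code itself, and the name `old_div_like` is hypothetical. With integer keys, the pairs (3, 4) and (3, 5) both get the sort key 1, while the same keys as floats are told apart:

```python
import numbers


def old_div_like(a, b):
    """Python 2 style '/': floor division for two integers, else true division."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b
    return a / b


if __name__ == '__main__':
    # Integer keys (as first recorded) collapse to the same ratio...
    print(old_div_like(4, 3))      # 1
    print(old_div_like(5, 3))      # 1
    # ...while float keys (as produced by thinning) keep the fraction.
    print(old_div_like(4.0, 3.0))  # 1.3333333333333333
```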





[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148627
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220795308
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,60 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a*x + b over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a*x + b over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a * xs + b - ys
+      s2 = sum(errs**2) / (n - 2)
+      if s2 == 0:
+        # It's an exact fit!
+        return a, b
+      h = (sum_x2 - 2 * sum_x * xs + n * xs**2) / (n * sum_x2 - sum_x**2)
+      cook_ds = 0.5 / s2 * errs**2 * (h / (1 - h)**2)
+
+      # Re-compute the regression, excluding those points with Cook's distance
+      # greater than 1.
+      b, a = np.polyfit(xs, ys, 1, w=cook_ds < 1)
 
 Review comment:
   Nit: We exclude only those with D _greater_ than one, so shouldn't w be cook_ds <= 1?
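To see what the `<` versus `<=` choice means in practice, here is a pure-Python sketch (no numpy; the helper name `cooks_distances` is hypothetical) that fits y = a + b*x, computes each point's Cook's distance with the same closed-form leverage as the diff, and keeps the points with D <= 1 as the comment suggests. Unlike the diff at this revision, it computes residuals as intercept plus slope times x, `a + b * x - y`.

```python
def cooks_distances(xs, ys):
    """Cook's distance of each point under a least squares fit y = a + b*x.

    Uses the closed-form leverage h from the diff; requires len(xs) > 2.
    """
    n = len(xs)
    sum_x = float(sum(xs))
    sum_x2 = float(sum(x * x for x in xs))
    xbar = sum_x / n
    ybar = sum(ys) / float(n)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    errs = [a + b * x - y for x, y in zip(xs, ys)]  # residuals
    s2 = sum(e * e for e in errs) / (n - 2)          # residual variance
    if s2 == 0:
        return [0.0] * n  # exact fit: no point is influential
    ds = []
    for x, e in zip(xs, errs):
        h = (sum_x2 - 2 * sum_x * x + n * x * x) / (n * sum_x2 - sum_x ** 2)
        ds.append(0.5 / s2 * e * e * h / (1 - h) ** 2)
    return ds


if __name__ == '__main__':
    xs = list(range(1, 11))
    ys = xs[:9] + [100]          # y = x, except one outlier at x = 10
    ds = cooks_distances(xs, ys)
    keep = [d <= 1 for d in ds]  # the weight suggested in the comment
    print(keep)                  # only the last entry is False
```

With this sample data only the high-leverage outlier at x = 10 has D > 1 (roughly 2.1); the other points stay well below 1.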




Issue Time Tracking
---

Worklog Id: (was: 148627)
Time Spent: 3.5h  (was: 3h 20m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148623
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220792493
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -316,17 +357,22 @@ def next_batch_size(self):
     last_batch_size = self._data[-1][0]
     cap = min(last_batch_size * self._MAX_GROWTH_FACTOR, self._max_batch_size)
 
+    target = self._max_batch_size
+
     if self._target_batch_duration_secs:
       # Solution to a + b*x = self._target_batch_duration_secs.
-      cap = min(cap, (self._target_batch_duration_secs - a) / b)
+      target = min(target, (self._target_batch_duration_secs - a) / b)
 
     if self._target_batch_overhead:
       # Solution to a / (a + b*x) = self._target_batch_overhead.
-      cap = min(cap, (a / b) * (1 / self._target_batch_overhead - 1))
+      target = min(target, (a / b) * (1 / self._target_batch_overhead - 1))
 
-    # Avoid getting stuck at min_batch_size.
+    # Avoid getting stuck.
     jitter = len(self._data) % 2
-    return int(max(self._min_batch_size + jitter, cap))
+    if len(self._data) > 10:
+      target += int(target * self._variance * 2 * (random.random() - .5))
+
+    return int(max(self._min_batch_size + jitter, min(target, cap)))
 
 Review comment:
   Thanks, that helps. Consider the wording: ```...which would not allow us to 
do the approximation with a polynomial.```
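The target/cap arithmetic in this hunk can be sketched as a standalone function. This is an illustration only: the function, its parameter names, and passing `a`, `b` and the number of data points as arguments are assumptions. In Beam these values live on `_BatchSizeEstimator`, and `max_growth_factor` stands in for `_MAX_GROWTH_FACTOR`, whose value is not shown here.

```python
import random


def next_batch_size(a, b, last_batch_size, num_points, min_batch_size,
                    max_batch_size, max_growth_factor,
                    target_batch_duration_secs=None,
                    target_batch_overhead=None, variance=0.0, rng=random):
    """Hypothetical standalone version of the target/cap logic in the hunk.

    a, b: intercept and slope of the fitted model time = a + b * batch_size.
    num_points plays the role of len(self._data).
    """
    # Never grow by more than max_growth_factor per step.
    cap = min(last_batch_size * max_growth_factor, max_batch_size)
    target = max_batch_size
    if target_batch_duration_secs:
        # Solve a + b*x = target_batch_duration_secs for x.
        target = min(target, (target_batch_duration_secs - a) / b)
    if target_batch_overhead:
        # Solve a / (a + b*x) = target_batch_overhead for x.
        target = min(target, (a / b) * (1 / target_batch_overhead - 1))
    # Alternates between 0 and 1 as points are added, to avoid getting stuck.
    jitter = num_points % 2
    if num_points > 10:
        # Randomly perturb the target by roughly +/- variance * target.
        target += int(target * variance * 2 * (rng.random() - .5))
    return int(max(min_batch_size + jitter, min(target, cap)))


if __name__ == '__main__':
    # time = 1.0 + 0.5 * size, so an 11 s target gives (11 - 1) / 0.5 = 20.
    print(next_batch_size(1.0, 0.5, 100, 4, 1, 10000, 2,
                          target_batch_duration_secs=11.0))  # 20
```

Keeping `cap` separate from `target`, as the hunk does, means the growth limit is applied after the random perturbation rather than being folded into the duration and overhead targets.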


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148623)
Time Spent: 3h  (was: 2h 50m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148621
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220659438
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -316,17 +357,22 @@ def next_batch_size(self):
     last_batch_size = self._data[-1][0]
     cap = min(last_batch_size * self._MAX_GROWTH_FACTOR, self._max_batch_size)
 
+    target = self._max_batch_size
+
     if self._target_batch_duration_secs:
       # Solution to a + b*x = self._target_batch_duration_secs.
-      cap = min(cap, (self._target_batch_duration_secs - a) / b)
+      target = min(target, (self._target_batch_duration_secs - a) / b)
 
     if self._target_batch_overhead:
       # Solution to a / (a + b*x) = self._target_batch_overhead.
 
 Review comment:
   Thanks, I missed it.
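The two closed forms in the `# Solution to ...` comments follow from solving for the batch size x, writing T for `_target_batch_duration_secs` and o for `_target_batch_overhead`:

```latex
% Duration target: solve a + b x = T for x
a + b x = T \;\Longrightarrow\; x = \frac{T - a}{b}

% Overhead target: solve a / (a + b x) = o for x
\frac{a}{a + b x} = o
\;\Longrightarrow\; a = o a + o b x
\;\Longrightarrow\; a (1 - o) = o b x
\;\Longrightarrow\; x = \frac{a (1 - o)}{o b} = \frac{a}{b}\left(\frac{1}{o} - 1\right)
```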


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148621)
Time Spent: 2h 50m  (was: 2h 40m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148624
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220791614
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,60 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a*x + b over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a*x + b over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a * xs + b - ys
 
 Review comment:
   Shouldn't this be `a + b * xs - ys`, since we have `b, a = np.polyfit(xs, ys, 1)` 
and `polyfit` returns the coefficient for the highest power first? The test starts 
to fail with this change, though...
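A small check of the point raised above: `np.polyfit(xs, ys, 1)` returns its coefficients highest power first, that is `[slope, intercept]`, so `b, a = np.polyfit(...)` binds `b` to the slope and `a` to the intercept. With those bindings only `a + b * xs - ys` gives zero residuals on exact data. A stdlib-only sketch (the helper names are hypothetical):

```python
def residuals(a, b, xs, ys):
    """Residuals for y = a + b*x, with a the intercept and b the slope."""
    return [a + b * x - y for x, y in zip(xs, ys)]


def swapped_residuals(a, b, xs, ys):
    """The expression under review, a*x + b - y, which treats a as the slope."""
    return [a * x + b - y for x, y in zip(xs, ys)]


if __name__ == '__main__':
    # Data exactly on y = 3 + 2x: intercept a = 3.0, slope b = 2.0, the
    # bindings that `b, a = np.polyfit(xs, ys, 1)` would produce.
    xs, ys = [1, 2, 3, 4], [5, 7, 9, 11]
    print(residuals(3.0, 2.0, xs, ys))          # [0.0, 0.0, 0.0, 0.0]
    print(swapped_residuals(3.0, 2.0, xs, ys))  # [0.0, 1.0, 2.0, 3.0]
```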


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 148624)
Time Spent: 3h  (was: 2h 50m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I 
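The key-type dependence described above can be reproduced with a small helper. `py2_div` is a hypothetical name. It assumes that `past.utils.old_div` (used by the removed `div_keys` comparator) follows Python 2 semantics: floor division when both operands are integers, and true division otherwise. The second half shows how integer keys can collapse distinct ratios into ties when pairs are sorted by `x2 / x1`.

```python
import numbers


def py2_div(a, b):
    """Mimic Python 2 `a / b`: floor division for two integers, true division otherwise."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b
    return a / b


print(py2_div(3, 2))    # → 1    (integer keys, as originally recorded)
print(py2_div(3.0, 2))  # → 1.5  (float keys, e.g. after averaging)

# Sorting pairs by x2 / x1: integer division turns different ratios into ties.
pairs = [(2, 3), (4, 5)]
print([py2_div(x2, x1) for x1, x2 in pairs])  # → [1, 1]
print([x2 / x1 for x1, x2 in pairs])          # → [1.5, 1.25] under Python 3 division
```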

[jira] [Work logged] (BEAM-4858) Clean up _BatchSizeEstimator in element-batching transform.

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=148626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148626
 ]

ASF GitHub Bot logged work on BEAM-4858:


Author: ASF GitHub Bot
Created on: 27/Sep/18 07:59
Start Date: 27/Sep/18 07:59
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6375: 
[BEAM-4858] Clean up division in batch size estimator.
URL: https://github.com/apache/beam/pull/6375#discussion_r220792049
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -269,23 +270,60 @@ def record_time(self, batch_size):
 self._thin_data()
 
   def _thin_data(self):
-    sorted_data = sorted(self._data)
-    odd_one_out = [sorted_data[-1]] if len(sorted_data) % 2 == 1 else []
-    # Sort the pairs by how different they are.
-
-    def div_keys(kv1_kv2):
-      (x1, _), (x2, _) = kv1_kv2
-      return old_div(x2, x1) # TODO(BEAM-4858)
-
-    pairs = sorted(zip(sorted_data[::2], sorted_data[1::2]),
-                   key=div_keys)
-    # Keep the top 1/3 most different pairs, average the top 2/3 most similar.
-    threshold = 2 * len(pairs) // 3
-    self._data = (
-        list(sum(pairs[threshold:], ()))
-        + [((x1 + x2) / 2.0, (t1 + t2) / 2.0)
-           for (x1, t1), (x2, t2) in pairs[:threshold]]
-        + odd_one_out)
+    # Make sure we don't change the parity of len(self._data)
+    # As it's used below to alternate jitter.
+    self._data.pop(random.randrange(len(self._data) // 4))
+    self._data.pop(random.randrange(len(self._data) // 2))
+
+  @staticmethod
+  def linear_regression_no_numpy(xs, ys):
+    # Least squares fit for y = a*x + b over all points.
+    n = float(len(xs))
+    xbar = sum(xs) / n
+    ybar = sum(ys) / n
+    b = (sum([(x - xbar) * (y - ybar) for x, y in zip(xs, ys)])
+         / sum([(x - xbar)**2 for x in xs]))
+    a = ybar - b * xbar
+    return a, b
+
+  @staticmethod
+  def linear_regression_numpy(xs, ys):
+    # pylint: disable=wrong-import-order, wrong-import-position
+    import numpy as np
+    from numpy import sum
+    xs = np.asarray(xs, dtype=float)
+    ys = np.asarray(ys, dtype=float)
+
+    # First do a simple least squares fit for y = a*x + b over all points.
+    b, a = np.polyfit(xs, ys, 1)
+
+    n = len(xs)
+    if n < 10:
+      return a, b
+    else:
+      # Refine this by throwing out outliers, according to Cook's distance.
+      # https://en.wikipedia.org/wiki/Cook%27s_distance
+      sum_x = sum(xs)
+      sum_x2 = sum(xs**2)
+      errs = a * xs + b - ys
+      s2 = sum(errs**2) / (n - 2)
 
 Review comment:
   `s2` is the mean squared error, right? Why do we divide by `n-2` as opposed to
`n`?
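A common rationale for `n - 2` is that the straight-line fit estimates two parameters (intercept and slope) from the same data. That leaves n - 2 degrees of freedom, so dividing the residual sum of squares by n - 2 gives an unbiased estimate of the error variance, and the textbook Cook's distance formula uses that estimate.

The pure-Python sketch below uses a hypothetical helper, `cooks_distances`, which is not the code under review. It computes D_i = e_i^2 / (p * s2) * h_i / (1 - h_i)^2 with p = 2 and the simple-regression leverage h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point of a least-squares fit y = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / float(n)
    ybar = sum(ys) / float(n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx  # slope
    a = ybar - b * xbar                                             # intercept
    errs = [a + b * x - y for x, y in zip(xs, ys)]
    p = 2  # number of fitted parameters: intercept and slope
    s2 = sum(e ** 2 for e in errs) / (n - p)  # unbiased residual variance
    dists = []
    for x, e in zip(xs, errs):
        h = 1.0 / n + (x - xbar) ** 2 / sxx  # leverage of this point
        dists.append(e ** 2 / (p * s2) * h / (1 - h) ** 2)
    return dists


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.8, 40.0]  # last point is an outlier
d = cooks_distances(xs, ys)
print(max(range(len(d)), key=d.__getitem__))  # → 9 (the most influential point)
```

Dividing by `n` instead would shrink `s2` slightly and inflate every distance by the same factor, n / (n - 2). The ranking of points is unchanged, but any fixed cutoff on D_i would behave differently.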




Issue Time Tracking
---

Worklog Id: (was: 148626)
Time Spent: 3h 20m  (was: 3h 10m)

> Clean up _BatchSizeEstimator in element-batching transform.
> ---
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729] 
> non-trivial performance-sensitive logic in element-batching transform. Let's 
> take a look at 
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>  
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on 
> the type of the keys - whether they are integers or floats. 
> The keys of key-value pairs contained in {{self._data}} are added as integers 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
>  however, when we 'thin' the collected entries 
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
>  the keys will become floats. Surprisingly, using either integer or float 
> division consistently [in the 
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
>   negatively affects the performance of a custom pipeline I was using to 
> benchmark these changes. The performance impact 

[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=148603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148603
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 27/Sep/18 06:43
Start Date: 27/Sep/18 06:43
Worklog Time Spent: 10m 
  Work Description: joar commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220808845
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem.py
 ##
 @@ -531,24 +530,117 @@ def _list(self, dir_or_prefix):
 """
 raise NotImplementedError
 
+  @staticmethod
+  def _split_scheme(url_or_path):
+match = re.match(r'(^[a-z]+)://(.*)', url_or_path)
+if match is not None:
+  return match.groups()
+return None, url_or_path
+
+  @staticmethod
+  def _combine_scheme(scheme, path):
+if scheme is None:
+  return path
+return '{}://{}'.format(scheme, path)
+
   def _url_dirname(self, url_or_path):
 """Like posixpath.dirname, but preserves scheme:// prefix.
 
 Args:
   url_or_path: A string in the form of scheme://some/path OR /some/path.
 """
-match = re.match(r'([a-z]+://)(.*)', url_or_path)
-if match is None:
-  return posixpath.dirname(url_or_path)
-url_prefix, path = match.groups()
-return url_prefix + posixpath.dirname(path)
+scheme, path = self._split_scheme(url_or_path)
+return self._combine_scheme(scheme, posixpath.dirname(path))
+
+  def match_files(self, file_metas, pattern):
+"""Filter :class:`FileMetadata` objects by :data:`pattern`
+
+Args:
+  file_metas (:obj:`list` of :class:`FileMetadata`):
+Files to consider when matching
+  pattern (str): File pattern
+
+See Also:
+  :meth:`translate_pattern`
+
+Returns:
+  Generator of matching :class:`FileMetadata`
+"""
+re_pattern = re.compile(self.translate_pattern(pattern))
+match = re_pattern.match
+for file_metadata in file_metas:
+  is_match = match(file_metadata.path)
+  logger.debug('%r %r', is_match, file_metadata)
 
 Review comment:
   Fixed in 
https://github.com/apache/beam/pull/6423/commits/660797e60779b9825776d372fb39900538445a71
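The round trip performed by `_split_scheme`, `_combine_scheme` and `_url_dirname` in the hunk above can be exercised on its own. The module-level functions below copy that logic outside the `FileSystem` class, so they are a sketch for illustration rather than the Beam code itself.

```python
import posixpath
import re


def split_scheme(url_or_path):
    """Return (scheme, path); scheme is None for plain paths such as /tmp/x."""
    match = re.match(r'(^[a-z]+)://(.*)', url_or_path)
    if match is not None:
        return match.groups()
    return None, url_or_path


def combine_scheme(scheme, path):
    """Inverse of split_scheme: re-attach scheme:// when there is one."""
    if scheme is None:
        return path
    return '{}://{}'.format(scheme, path)


def url_dirname(url_or_path):
    """Like posixpath.dirname, but keeps a scheme:// prefix intact."""
    scheme, path = split_scheme(url_or_path)
    return combine_scheme(scheme, posixpath.dirname(path))


print(url_dirname('gs://my-bucket/test/a/file.txt'))  # → gs://my-bucket/test/a
print(url_dirname('/tmp/test/a/file.txt'))            # → /tmp/test/a
```

One edge case to be aware of: for a bare bucket such as `gs://my-bucket`, the path part contains no slash. `posixpath.dirname` then returns an empty string, and the result is `gs://`.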




Issue Time Tracking
---

Worklog Id: (was: 148603)
Time Spent: 5h  (was: 4h 50m)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Given the directory structure:
>  
> {noformat}
> .
> ├── filesystem-match-test
> │   ├── a
> │   │   └── file.txt
> │   └── b
> │   └── file.txt
> └── filesystem-match-test.py
> {noformat}
>  
> Where {{filesystem-match-test.py}} contains:
> {code:python}
> from __future__ import print_function
> import os
> import posixpath
> from apache_beam.io.filesystem import MatchResult
> from apache_beam.io.filesystems import FileSystems
> BASES = [
> os.path.join(os.path.dirname(__file__), "./"),
> "gs://my-bucket/test/",
> ]
> pattern = "filesystem-match-test/*/file.txt"
> for base_path in BASES:
> full_pattern = posixpath.join(base_path, pattern)
> print("full_pattern: {}".format(full_pattern))
> match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult
> print("metadata list: {}".format(match_result.metadata_list))
> {code}
> Running {{python filesystem-match-test.py}} does not match any files locally, 
> but does match files on GCS:
> {noformat}
> full_pattern: ./filesystem-match-test/*/file.txt
> metadata list: []
> full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
> metadata list: 
> [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), 
> FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
> {noformat}
> The expected result is that a/file.txt and b/file.txt should be matched for 
> both patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=148602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148602
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 27/Sep/18 06:43
Start Date: 27/Sep/18 06:43
Worklog Time Spent: 10m 
  Work Description: joar commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220808845
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem.py
 ##
 @@ -531,24 +530,117 @@ def _list(self, dir_or_prefix):
 """
 raise NotImplementedError
 
+  @staticmethod
+  def _split_scheme(url_or_path):
+match = re.match(r'(^[a-z]+)://(.*)', url_or_path)
+if match is not None:
+  return match.groups()
+return None, url_or_path
+
+  @staticmethod
+  def _combine_scheme(scheme, path):
+if scheme is None:
+  return path
+return '{}://{}'.format(scheme, path)
+
   def _url_dirname(self, url_or_path):
 """Like posixpath.dirname, but preserves scheme:// prefix.
 
 Args:
   url_or_path: A string in the form of scheme://some/path OR /some/path.
 """
-match = re.match(r'([a-z]+://)(.*)', url_or_path)
-if match is None:
-  return posixpath.dirname(url_or_path)
-url_prefix, path = match.groups()
-return url_prefix + posixpath.dirname(path)
+scheme, path = self._split_scheme(url_or_path)
+return self._combine_scheme(scheme, posixpath.dirname(path))
+
+  def match_files(self, file_metas, pattern):
+"""Filter :class:`FileMetadata` objects by :data:`pattern`
+
+Args:
+  file_metas (:obj:`list` of :class:`FileMetadata`):
+Files to consider when matching
+  pattern (str): File pattern
+
+See Also:
+  :meth:`translate_pattern`
+
+Returns:
+  Generator of matching :class:`FileMetadata`
+"""
+re_pattern = re.compile(self.translate_pattern(pattern))
+match = re_pattern.match
+for file_metadata in file_metas:
+  is_match = match(file_metadata.path)
+  logger.debug('%r %r', is_match, file_metadata)
 
 Review comment:
   Addressed in 
https://github.com/apache/beam/pull/6423/commits/660797e60779b9825776d372fb39900538445a71




Issue Time Tracking
---

Worklog Id: (was: 148602)
Time Spent: 4h 50m  (was: 4h 40m)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-5417) FileSystems.match behaviour diff between GCS and local file system

2018-09-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5417?focusedWorklogId=148601=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148601
 ]

ASF GitHub Bot logged work on BEAM-5417:


Author: ASF GitHub Bot
Created on: 27/Sep/18 06:42
Start Date: 27/Sep/18 06:42
Worklog Time Spent: 10m 
  Work Description: joar commented on a change in pull request #6423: 
[BEAM-5417] Parity between GCS and local match
URL: https://github.com/apache/beam/pull/6423#discussion_r220808666
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystem_test.py
 ##
 @@ -109,8 +113,58 @@ def _flatten_match(self, match_results):
             for match_result in match_results
             for file_metadata in match_result.metadata_list]
 
-  def test_match_glob(self):
-    bucket_name = 'gcsio-test'
+  @parameterized.expand([
+      ('**/*', all),
 
 Review comment:
   - `**` would match a string without a slash.
   - `**/*` wouldn't.
   
   I've added a test case to illustrate this: 
https://github.com/apache/beam/pull/6423/commits/437fc9afe779c21ee61357b6f3cb3c23311ec7eb#diff-ae713b87ea5350e5ff107a7620d45f2aR118
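The distinction can be reproduced with a simplified glob-to-regex translation in which `*` and `?` stay within one path segment and `**` may cross `/`. `glob_to_regex` and `match_files` here are hypothetical stand-ins written for illustration. Beam's own `translate_pattern` is not shown in this hunk and may handle more cases, such as character classes or escaping. The sample paths are taken from the directory tree in the issue description.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex: `**` crosses '/', `*` and `?` do not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**', i):
            parts.append('.*')     # any characters, including '/'
            i += 2
        elif pattern[i] == '*':
            parts.append('[^/]*')  # any characters within one segment
            i += 1
        elif pattern[i] == '?':
            parts.append('[^/]')   # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return ''.join(parts) + r'\Z'  # require the whole path to match


def match_files(paths, pattern):
    """Yield the paths that fully match the glob pattern."""
    matcher = re.compile(glob_to_regex(pattern)).match
    for path in paths:
        if matcher(path):
            yield path


paths = ['filesystem-match-test/a/file.txt',
         'filesystem-match-test/b/file.txt',
         'filesystem-match-test.py']
print(list(match_files(paths, '**')))    # all three paths
print(list(match_files(paths, '**/*')))  # only the two paths containing a '/'
print(list(match_files(paths, 'filesystem-match-test/*/file.txt')))  # a/ and b/ files
print(list(match_files(paths, '*')))     # → ['filesystem-match-test.py']
```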




Issue Time Tracking
---

Worklog Id: (was: 148601)
Time Spent: 4h 40m  (was: 4.5h)

> FileSystems.match behaviour diff between GCS and local file system
> --
>
> Key: BEAM-5417
> URL: https://issues.apache.org/jira/browse/BEAM-5417
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Joar Wandborg
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>





Jenkins build is back to normal : beam_PerformanceTests_TFRecordIOIT #1032

2018-09-27 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PreCommit_Website_Cron #109

2018-09-27 Thread Apache Jenkins Server
See 


Changes:

[pablo] Updating BigQuerySink pydoc

[ehudm] Pass --pubsubRooturl option to Dataflow runner.

[github] fix typo

[github] Adding clarification to use WriteToBigQuery

[yifanzou] [BEAM-5339] Modify the dependency tool based on new JIRA policy.

[github] Fix trailing whitespace

--
[...truncated 8.85 KB...]
> Task :buildSrc:spotlessGroovy
file or directory 
'
 not found
file or directory 
'
 not found
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 1.379 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.001 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.031 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.003 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.003 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
5,5,main]) completed. Took 0.002 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 5,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 5,5,main]) completed. 
Took 0.002 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 5,5,main]) started.

> Task :buildSrc:check
Skipping task 

Build failed in Jenkins: beam_PostCommit_Website_Publish #11

2018-09-27 Thread Apache Jenkins Server
See 


--
[...truncated 9.09 KB...]
file or directory 
'
 not found
Caching disabled for task ':buildSrc:spotlessGroovy': Caching has not been 
enabled for the task
Task ':buildSrc:spotlessGroovy' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovy'.
file or directory 
'
 not found
:buildSrc:spotlessGroovy (Thread[Task worker for ':buildSrc' Thread 11,5,main]) 
completed. Took 1.539 secs.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) started.

> Task :buildSrc:spotlessGroovyCheck
Skipping task ':buildSrc:spotlessGroovyCheck' as it has no actions.
:buildSrc:spotlessGroovyCheck (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) started.

> Task :buildSrc:spotlessGroovyGradle
Caching disabled for task ':buildSrc:spotlessGroovyGradle': Caching has not 
been enabled for the task
Task ':buildSrc:spotlessGroovyGradle' is not up-to-date because:
  No history is available.
All input files are considered out-of-date for incremental task 
':buildSrc:spotlessGroovyGradle'.
:buildSrc:spotlessGroovyGradle (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) completed. Took 0.033 secs.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) started.

> Task :buildSrc:spotlessGroovyGradleCheck
Skipping task ':buildSrc:spotlessGroovyGradleCheck' as it has no actions.
:buildSrc:spotlessGroovyGradleCheck (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) completed. Took 0.0 secs.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
started.

> Task :buildSrc:spotlessCheck
Skipping task ':buildSrc:spotlessCheck' as it has no actions.
:buildSrc:spotlessCheck (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
completed. Took 0.0 secs.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
started.

> Task :buildSrc:compileTestJava NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestJava' as it has no source files and no 
previous output files.
:buildSrc:compileTestJava (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
completed. Took 0.001 secs.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) started.

> Task :buildSrc:compileTestGroovy NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:compileTestGroovy' as it has no source files and no 
previous output files.
:buildSrc:compileTestGroovy (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) completed. Took 0.001 secs.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) started.

> Task :buildSrc:processTestResources NO-SOURCE
file or directory 
'
 not found
Skipping task ':buildSrc:processTestResources' as it has no source files and no 
previous output files.
:buildSrc:processTestResources (Thread[Task worker for ':buildSrc' Thread 
9,5,main]) completed. Took 0.001 secs.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
started.

> Task :buildSrc:testClasses UP-TO-DATE
Skipping task ':buildSrc:testClasses' as it has no actions.
:buildSrc:testClasses (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
completed. Took 0.0 secs.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 9,5,main]) started.

> Task :buildSrc:test NO-SOURCE
Skipping task ':buildSrc:test' as it has no source files and no previous output 
files.
:buildSrc:test (Thread[Task worker for ':buildSrc' Thread 9,5,main]) completed. 
Took 0.002 secs.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 9,5,main]) started.

> Task :buildSrc:check
Skipping task ':buildSrc:check' as it has no actions.
:buildSrc:check (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
completed. Took 0.0 secs.
:buildSrc:build (Thread[Task worker for ':buildSrc' Thread 9,5,main]) started.

> Task :buildSrc:build
Skipping task ':buildSrc:build' as it has no actions.
:buildSrc:build (Thread[Task worker for ':buildSrc' Thread 9,5,main]) 
completed. Took 0.0 secs.
Settings evaluated using settings file 
'
Using local directory build cache for the root build 
