[jira] [Work logged] (BEAM-3979) New DoFn should allow injecting of all parameters in ProcessContext

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3979?focusedWorklogId=94937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94937
 ]

ASF GitHub Bot logged work on BEAM-3979:


Author: ASF GitHub Bot
Created on: 25/Apr/18 05:20
Start Date: 25/Apr/18 05:20
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #4989: [BEAM-3979] Start 
completing the new DoFn vision: plumb context parameters into process functions.
URL: https://github.com/apache/beam/pull/4989#issuecomment-384163795
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94937)
Time Spent: 4h  (was: 3h 50m)

> New DoFn should allow injecting of all parameters in ProcessContext
> ---
>
> Key: BEAM-3979
> URL: https://issues.apache.org/jira/browse/BEAM-3979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This was intended in the past, but never completed. Ideally all primitive 
> parameters in ProcessContext should be injectable, and OutputReceiver 
> parameters can be used to collect output. So, we should be able to write a 
> DoFn as follows:
> @ProcessElement
> public void process(@Element String word, OutputReceiver<String> receiver) {
>   receiver.output(word.toUpperCase());
> }
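As an aside, the signature-driven injection described above can be illustrated with a toy sketch (Python, purely hypothetical, not Beam's actual implementation): inspect the process function's signature and supply each parameter from a context keyed by parameter name.

```python
import inspect

def invoke_process(fn, context):
    # Toy sketch (NOT Beam's implementation): read the process
    # function's signature and inject each parameter by name from
    # a dict of available context values.
    sig = inspect.signature(fn)
    args = [context[name] for name in sig.parameters]
    return fn(*args)

def process(element, receiver):
    # 'receiver' plays the role of an OutputReceiver here.
    receiver.append(element.upper())

outputs = []
invoke_process(process, {"element": "word", "receiver": outputs})
# outputs now holds ["WORD"]
```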



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3979) New DoFn should allow injecting of all parameters in ProcessContext

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3979?focusedWorklogId=94931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94931
 ]

ASF GitHub Bot logged work on BEAM-3979:


Author: ASF GitHub Bot
Created on: 25/Apr/18 04:42
Start Date: 25/Apr/18 04:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #4989: [BEAM-3979] Start 
completing the new DoFn vision: plumb context parameters into process functions.
URL: https://github.com/apache/beam/pull/4989#issuecomment-384158968
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 94931)
Time Spent: 3h 50m  (was: 3h 40m)

> New DoFn should allow injecting of all parameters in ProcessContext
> ---
>
> Key: BEAM-3979
> URL: https://issues.apache.org/jira/browse/BEAM-3979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This was intended in the past, but never completed. Ideally all primitive 
> parameters in ProcessContext should be injectable, and OutputReceiver 
> parameters can be used to collect output. So, we should be able to write a 
> DoFn as follows:
> @ProcessElement
> public void process(@Element String word, OutputReceiver<String> receiver) {
>   receiver.output(word.toUpperCase());
> }





[jira] [Created] (BEAM-4171) Python BigQuery source fails for timestamps < year 1900

2018-04-24 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-4171:


 Summary: Python BigQuery source fails for timestamps < year 1900
 Key: BEAM-4171
 URL: https://issues.apache.org/jira/browse/BEAM-4171
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Chamikara Jayalath


This seems to be due to the Python SDK using 'date.strftime', which has a known 
limitation:

[https://bugs.python.org/issue1777412]

This has been fixed in Python 3.3+.

 

We might want to consider one of the workarounds mentioned in the post below to 
fix BQ for Python 2.7.

[https://stackoverflow.com/questions/10263956/use-datetime-strftime-on-years-before-1900-require-year-1900]
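For illustration, one workaround along the lines of those in the linked post (a sketch under the assumption that shifting by a whole 400-year Gregorian cycle preserves the calendar layout; not the SDK's actual fix) looks like this:

```python
import datetime

def strftime_pre1900(d, fmt):
    # Workaround sketch for Python 2's date.strftime() refusing
    # years < 1900 (bpo-1777412): format a surrogate date shifted
    # forward by a multiple of 400 years (the Gregorian calendar
    # repeats exactly every 400 years, so month/day/weekday/leap
    # structure is preserved), then splice the real year back in.
    if d.year >= 1900:
        return d.strftime(fmt)
    delta = ((1900 - d.year) // 400 + 1) * 400
    surrogate = d.replace(year=d.year + delta)
    out = surrogate.strftime(fmt)
    return out.replace(str(surrogate.year), str(d.year).zfill(4), 1)
```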

 

This affects both Direct and Dataflow runners.

 

 





[jira] [Updated] (BEAM-4096) BigQueryIO ValueProvider support for NumFileShards and Triggering Frequency

2018-04-24 Thread Jan Peuker (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Peuker updated BEAM-4096:
-
Summary: BigQueryIO ValueProvider support for NumFileShards and Triggering 
Frequency  (was: BigQueryIO ValueProvider support for Method and Triggering 
Frequency)

> BigQueryIO ValueProvider support for NumFileShards and Triggering Frequency
> ---
>
> Key: BEAM-4096
> URL: https://issues.apache.org/jira/browse/BEAM-4096
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Ryan McDowell
>Assignee: Jan Peuker
>Priority: Minor
> Fix For: 2.5.0
>
>
> Enhance BigQueryIO to accept ValueProviders for:
>  * withTriggeringFrequency(..)
>  * withNumFileShards(..)
> It would allow Dataflow templates to accept these parameters at runtime 
> instead of being hardcoded. This opens up the ability to create Dataflow 
> templates which allow users to flip back-and-forth between batch and 
> streaming inserts.
> withMethod(..) cannot be changed at runtime currently.
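The ValueProvider mechanism this relies on can be sketched with toy classes (hypothetical names modeled on the pattern, not Beam's actual API): a transform stores a provider instead of a concrete value, so a template can resolve the value only when the job launches.

```python
class StaticValueProvider:
    # Toy sketch of the ValueProvider pattern (not Beam's API):
    # wraps a value known at pipeline-construction time.
    def __init__(self, value):
        self.value = value
    def get(self):
        return self.value

class RuntimeValueProvider:
    # Resolved only when the (template) job launches, from options
    # supplied at runtime; falls back to a default otherwise.
    runtime_options = {}
    def __init__(self, key, default=None):
        self.key, self.default = key, default
    def get(self):
        return self.runtime_options.get(self.key, self.default)

num_shards = RuntimeValueProvider("numFileShards", default=1)
RuntimeValueProvider.runtime_options["numFileShards"] = 10  # template launch
```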





[jira] [Commented] (BEAM-4096) BigQueryIO ValueProvider support for NumFileShards and Triggering Frequency

2018-04-24 Thread Jan Peuker (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451598#comment-16451598
 ] 

Jan Peuker commented on BEAM-4096:
--

Removed Method and will focus on Shards (correcting the Javadoc as well) and 
Triggering Frequency.

> BigQueryIO ValueProvider support for NumFileShards and Triggering Frequency
> ---
>
> Key: BEAM-4096
> URL: https://issues.apache.org/jira/browse/BEAM-4096
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Ryan McDowell
>Assignee: Jan Peuker
>Priority: Minor
> Fix For: 2.5.0
>
>
> Enhance BigQueryIO to accept ValueProviders for:
>  * withTriggeringFrequency(..)
>  * withNumFileShards(..)
> It would allow Dataflow templates to accept these parameters at runtime 
> instead of being hardcoded. This opens up the ability to create Dataflow 
> templates which allow users to flip back-and-forth between batch and 
> streaming inserts.
> withMethod(..) cannot be changed at runtime currently.





[jira] [Updated] (BEAM-4096) BigQueryIO ValueProvider support for Method and Triggering Frequency

2018-04-24 Thread Jan Peuker (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Peuker updated BEAM-4096:
-
Description: 
Enhance BigQueryIO to accept ValueProviders for:
 * withTriggeringFrequency(..)
 * withNumFileShards(..)

It would allow Dataflow templates to accept these parameters at runtime instead 
of being hardcoded. This opens up the ability to create Dataflow templates 
which allow users to flip back-and-forth between batch and streaming inserts.

withMethod(..) cannot be changed at runtime currently.

  was:
Enhance BigQueryIO to accept ValueProviders for:
 * withMethod(..)
 * withTriggeringFrequency(..)
 * withNumFileShards(..)

It would allow Dataflow templates to accept these parameters at runtime instead 
of being hardcoded. This opens up the ability to create Dataflow templates 
which allow users to flip back-and-forth between batch and streaming inserts.


> BigQueryIO ValueProvider support for Method and Triggering Frequency
> 
>
> Key: BEAM-4096
> URL: https://issues.apache.org/jira/browse/BEAM-4096
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Ryan McDowell
>Assignee: Jan Peuker
>Priority: Minor
> Fix For: 2.5.0
>
>
> Enhance BigQueryIO to accept ValueProviders for:
>  * withTriggeringFrequency(..)
>  * withNumFileShards(..)
> It would allow Dataflow templates to accept these parameters at runtime 
> instead of being hardcoded. This opens up the ability to create Dataflow 
> templates which allow users to flip back-and-forth between batch and 
> streaming inserts.
> withMethod(..) cannot be changed at runtime currently.





[jira] [Commented] (BEAM-3514) Use portable WindowIntoPayload in DataflowRunner

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451589#comment-16451589
 ] 

Kenneth Knowles commented on BEAM-3514:
---

I feel like I saw an identical JIRA. But I know you dealt with some aspect of 
this recently.

> Use portable WindowIntoPayload in DataflowRunner
> 
>
> Key: BEAM-3514
> URL: https://issues.apache.org/jira/browse/BEAM-3514
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>
> The Java-specific blobs transmitted to Dataflow need more context, in the 
> form of portability framework protos.






[jira] [Resolved] (BEAM-3715) Gradle build of hadoop-input-format is trying (and often failing) to fetch optional dependencies of elasticsearch-hadoop

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-3715.
---
   Resolution: Fixed
Fix Version/s: Not applicable

I'm not sure this was completely resolved, but I do not know of it blocking the 
build right now.

> Gradle build of hadoop-input-format is trying (and often failing) to fetch 
> optional dependencies of elasticsearch-hadoop
> 
>
> Key: BEAM-3715
> URL: https://issues.apache.org/jira/browse/BEAM-3715
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






Build failed in Jenkins: beam_PostCommit_Python_ValidatesContainer_Dataflow #114

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Stop implementing EvaluatorFactory in Registry

[Pablo] Python Metrics now rely on StateSampler state.

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[apilloud] [SQL] Embed BeamSqlTable in BeamCalciteTable

[owenzhang1990] [BEAM-4129] Run WordCount example on Gearpump runner with Gradle

[aromanenko.dev] [BEAM-4066] Moved anonymous classes into inner ones

[sidhom] [BEAM-4149] Set worker id to "" if it is not set in the request header

[sidhom] [BEAM-3327] Refactor ControlClientPool to allow client multiplexing

[sidhom] [BEAM-3327] Basic Docker environment factory

[sidhom] Fix python lint error

[robertwb] [BEAM-4097] Set environment for Python sdk function specs.

[robertwb] Logging around default docker image environment.

[iemejia] [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[kedin] Add JsonToRow Transform

[kedin] Convert JsonToRow from MapElements.via() to ParDo

[wcn] Allow request and init hooks to update the context.

--
[...truncated 1.03 KB...]
Fetching upstream changes from https://github.com/apache/beam.git
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9c2b43227e1ddac39676f6c09aca1af82a9d4cdb (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9c2b43227e1ddac39676f6c09aca1af82a9d4cdb
Commit message: "Merge pull request #5120: [BEAM-4160] Add JsonToRow transform"
 > git rev-list --no-walk 0f2ba71e1b6db88ed79744e363586a8ff16dcb08 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PostCommit_Python_ValidatesContainer_Dataflow] $ /bin/bash -xe 
/tmp/jenkins8141324131341984009.sh
+ cd src
+ bash sdks/python/container/run_validatescontainer.sh

# pip install --user installation location.
LOCAL_PATH=$HOME/.local/bin/

# Where to store integration test outputs.
GCS_LOCATION=gs://temp-storage-for-end-to-end-tests

# Project for the container and integration test
PROJECT=apache-beam-testing

# Verify in the root of the repository
test -d sdks/python/container

# Verify docker and gcloud commands exist
command -v docker
/usr/bin/docker
command -v gcloud
/usr/bin/gcloud
docker -v
Docker version 17.05.0-ce, build 89658be
gcloud -v
Google Cloud SDK 191.0.0
alpha 2018.02.23
beta 2018.02.23
bq 2.0.29
core 2018.02.23
gsutil 4.28

# ensure gcloud is version 186 or above
TMPDIR=$(mktemp -d)
mktemp -d
gcloud_ver=$(gcloud -v | head -1 | awk '{print $4}')
gcloud -v | head -1 | awk '{print $4}'
if [[ "$gcloud_ver" < "186" ]]
then
  pushd $TMPDIR
  curl 
https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-186.0.0-linux-x86_64.tar.gz
 --output gcloud.tar.gz
  tar xf gcloud.tar.gz
  ./google-cloud-sdk/install.sh --quiet
  . ./google-cloud-sdk/path.bash.inc
  popd
  gcloud components update --quiet || echo 'gcloud components update failed'
  gcloud -v
fi
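As an aside, the `[[ "$gcloud_ver" < "186" ]]` test above compares strings lexicographically. A numeric comparison (sketched here in Python for clarity, with a hypothetical "9.0.0" version) avoids that pitfall:

```python
def version_lt(a, b):
    # Numeric comparison of dotted version strings. A lexicographic
    # string comparison like [[ "$gcloud_ver" < "186" ]] would treat
    # a hypothetical "9.0.0" as newer than "186.0.0" because "9" > "1".
    return [int(p) for p in a.split(".")] < [int(p) for p in b.split(".")]
```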

# Build the container
TAG=$(date +%Y%m%d-%H%M%S)
date +%Y%m%d-%H%M%S
CONTAINER=us.gcr.io/$PROJECT/$USER/python
echo "Using container $CONTAINER"
Using container us.gcr.io/apache-beam-testing/jenkins/python
./gradlew :beam-sdks-python-container:docker 
-Pdocker-repository-root=us.gcr.io/$PROJECT/$USER -Pdocker-tag=$TAG
Parallel execution with configuration on demand is an incubating feature.
Applying build_rules.gradle to beam
createPerformanceTestHarness with default configuration for project beam
Adding 48 .gitignore exclusions to Apache Rat
Applying build_rules.gradle to beam-sdks-python-container
applyGoNature with default configuration for project beam-sdks-python-container
applyDockerNature with default configuration for project 
beam-sdks-python-container
containerImageName with [name:python] for project beam-sdks-python-container
Applying build_rules.gradle to beam-sdks-go
applyGoNature with default configuration for project beam-sdks-go
:beam-sdks-go:prepare
:beam-sdks-python-container:prepare
Use project 

[jira] [Assigned] (BEAM-3514) Use portable WindowIntoPayload in DataflowRunner

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3514:
-

Assignee: Henning Rohde  (was: Kenneth Knowles)

> Use portable WindowIntoPayload in DataflowRunner
> 
>
> Key: BEAM-3514
> URL: https://issues.apache.org/jira/browse/BEAM-3514
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Henning Rohde
>Priority: Major
>  Labels: portability
>
> The Java-specific blobs transmitted to Dataflow need more context, in the 
> form of portability framework protos.





[jira] [Resolved] (BEAM-3546) Fn API metrics in Dataflow

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-3546.
---
   Resolution: Fixed
Fix Version/s: 2.5.0

> Fn API metrics in Dataflow
> --
>
> Key: BEAM-3546
> URL: https://issues.apache.org/jira/browse/BEAM-3546
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Labels: portability
> Fix For: 2.5.0
>
>






[jira] [Assigned] (BEAM-4114) Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4114:
-

Assignee: Robin Trietsch  (was: Kenneth Knowles)

> Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()
> --
>
> Key: BEAM-4114
> URL: https://issues.apache.org/jira/browse/BEAM-4114
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-join-library
>Affects Versions: 2.4.0
>Reporter: Robin Trietsch
>Assignee: Robin Trietsch
>Priority: Major
>
> When using the 
> [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-],
>  a checkNotNull() is done for the 
> [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207]
>  and 
> [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].
> However, it makes more sense to allow null values, since sometimes, if the 
> key used for the join is not the same, you'd like to see that the value will 
> become null. This should be decided by the developer, and not by the join 
> library.
> Looking at the source code, this is also supported by 
> [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42]
>  (it allows null values), which is used in Join.fullOuterJoin().
> If required, I can create a pull request on GitHub.
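The requested semantics can be illustrated with a minimal dict-based sketch (a hypothetical helper, not the join library's API), where the default supplied for the missing side is allowed to be None:

```python
def full_outer_join(left, right, left_null=None, right_null=None):
    # Toy full outer join over dicts. Unlike the current library code,
    # no checkNotNull-style guard is applied to left_null/right_null,
    # so None is a legal "missing side" value, as BEAM-4114 requests.
    keys = set(left) | set(right)
    return {k: (left.get(k, left_null), right.get(k, right_null))
            for k in keys}

joined = full_outer_join({"a": 1, "b": 2}, {"b": 20, "c": 30})
```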





[jira] [Updated] (BEAM-4114) Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-4114:
--
Component/s: (was: sdk-java-core)
 sdk-java-join-library

> Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()
> --
>
> Key: BEAM-4114
> URL: https://issues.apache.org/jira/browse/BEAM-4114
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-join-library
>Affects Versions: 2.4.0
>Reporter: Robin Trietsch
>Assignee: Robin Trietsch
>Priority: Major
>
> When using the 
> [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-],
>  a checkNotNull() is done for the 
> [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207]
>  and 
> [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].
> However, it makes more sense to allow null values, since sometimes, if the 
> key used for the join is not the same, you'd like to see that the value will 
> become null. This should be decided by the developer, and not by the join 
> library.
> Looking at the source code, this is also supported by 
> [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42]
>  (it allows null values), which is used in Join.fullOuterJoin().
> If required, I can create a pull request on GitHub.





[jira] [Commented] (BEAM-4114) Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451581#comment-16451581
 ] 

Kenneth Knowles commented on BEAM-4114:
---

Your point makes sense. I don't know enough details to know if there is some 
reason it causes a problem, but I agree with the reasoning. We'd love a PR if 
you can put one together.

> Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()
> --
>
> Key: BEAM-4114
> URL: https://issues.apache.org/jira/browse/BEAM-4114
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Robin Trietsch
>Assignee: Kenneth Knowles
>Priority: Major
>
> When using the 
> [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-],
>  a checkNotNull() is done for the 
> [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207]
>  and 
> [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].
> However, it makes more sense to allow null values, since sometimes, if the 
> key used for the join is not the same, you'd like to see that the value will 
> become null. This should be decided by the developer, and not by the join 
> library.
> Looking at the source code, this is also supported by 
> [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42]
>  (it allows null values), which is used in Join.fullOuterJoin().
> If required, I can create a pull request on GitHub.





[jira] [Assigned] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3863:
-

Assignee: Aljoscha Krettek  (was: Kenneth Knowles)

> AfterProcessingTime trigger doesn't fire reliably
> -
>
> Key: BEAM-3863
> URL: https://issues.apache.org/jira/browse/BEAM-3863
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Pawel Bartoszek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Issue*
> The Beam AfterProcessingTime trigger doesn't always fire reliably after a 
> configured delay.
> The following job's triggers should fire after the watermark passes the end of 
> the window, then every 5 seconds for late data, and finally at the end of the 
> allowed lateness.
> *Expected behaviour*
> The late firing (AfterProcessingTime) should fire 5 seconds after the first 
> late record arrives in the pane.
> *Actual behaviour*
> From my testing, the late trigger works for some keys but not for others - 
> it's pretty random which keys are affected. The DummySource generates 15 
> distinct keys AA, BB, ..., PP. For each key it sends 5 on-time records and one 
> late record. If a late trigger firing is missed, it won't fire until the end 
> of the allowed lateness period.
> *Job code*
> {code:java}
> String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"};
> FlinkPipelineOptions options = 
>     PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class);
> Pipeline pipeline = Pipeline.create(options);
> PCollection<String> apply = pipeline.apply(Read.from(new DummySource()))
>     .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>         .triggering(AfterWatermark.pastEndOfWindow()
>             .withLateFirings(AfterProcessingTime
>                 .pastFirstElementInPane()
>                 .plusDelayOf(Duration.standardSeconds(5))))
>         .accumulatingFiredPanes()
>         .withAllowedLateness(Duration.standardMinutes(2), 
>             Window.ClosingBehavior.FIRE_IF_NON_EMPTY));
> apply.apply(Count.perElement())
>     .apply(ParDo.of(new DoFn<KV<String, Long>, Long>() {
>       @ProcessElement
>       public void process(ProcessContext context, BoundedWindow window) {
>         LOG.info("Count: {}. For window {}, Pane {}", 
>             context.element(), window, context.pane());
>       }
>     }));
> pipeline.run().waitUntilFinish();{code}
>  
> *How can you replicate the issue?*
> I've created a GitHub repo 
> [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown 
> above. Please check out the README file for details on how to replicate the 
> issue.
> *What is causing the issue?*
> I explained the cause in the PR.
>  





[jira] [Commented] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451578#comment-16451578
 ] 

Kenneth Knowles commented on BEAM-3863:
---

It looks like Flink. At some point there was an off-by-one problem between when 
timers are delivered and when shouldFire() returns true.
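That kind of off-by-one can be sketched abstractly (a toy model, not Beam or Flink code):

```python
class ProcessingTimeTrigger:
    # Toy model of the hazard: a timer set for time T is delivered
    # when the clock reaches T, but a strict '>' comparison in
    # shouldFire() never observes that exact instant, so the firing
    # is silently missed.
    def __init__(self, fire_at):
        self.fire_at = fire_at
    def should_fire_strict(self, now):
        return now > self.fire_at   # buggy: misses now == fire_at
    def should_fire(self, now):
        return now >= self.fire_at  # fires when the timer lands

trigger = ProcessingTimeTrigger(fire_at=5)
timer_delivery_time = 5  # the runner delivers the timer exactly at T
```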

> AfterProcessingTime trigger doesn't fire reliably
> -
>
> Key: BEAM-3863
> URL: https://issues.apache.org/jira/browse/BEAM-3863
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Pawel Bartoszek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Issue*
> The Beam AfterProcessingTime trigger doesn't always fire reliably after a 
> configured delay.
> The following job's triggers should fire after the watermark passes the end of 
> the window, then every 5 seconds for late data, and finally at the end of the 
> allowed lateness.
> *Expected behaviour*
> The late firing (AfterProcessingTime) should fire 5 seconds after the first 
> late record arrives in the pane.
> *Actual behaviour*
> From my testing, the late trigger works for some keys but not for others - 
> it's pretty random which keys are affected. The DummySource generates 15 
> distinct keys AA, BB, ..., PP. For each key it sends 5 on-time records and one 
> late record. If a late trigger firing is missed, it won't fire until the end 
> of the allowed lateness period.
> *Job code*
> {code:java}
> String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"};
> FlinkPipelineOptions options = 
>     PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class);
> Pipeline pipeline = Pipeline.create(options);
> PCollection<String> apply = pipeline.apply(Read.from(new DummySource()))
>     .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>         .triggering(AfterWatermark.pastEndOfWindow()
>             .withLateFirings(AfterProcessingTime
>                 .pastFirstElementInPane()
>                 .plusDelayOf(Duration.standardSeconds(5))))
>         .accumulatingFiredPanes()
>         .withAllowedLateness(Duration.standardMinutes(2), 
>             Window.ClosingBehavior.FIRE_IF_NON_EMPTY));
> apply.apply(Count.perElement())
>     .apply(ParDo.of(new DoFn<KV<String, Long>, Long>() {
>       @ProcessElement
>       public void process(ProcessContext context, BoundedWindow window) {
>         LOG.info("Count: {}. For window {}, Pane {}", 
>             context.element(), window, context.pane());
>       }
>     }));
> pipeline.run().waitUntilFinish();{code}
>  
> *How can you replicate the issue?*
> I've created a GitHub repo 
> [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown 
> above. Please check out the README file for details on how to replicate the 
> issue.
> *What is causing the issue?*
> I explained the cause in the PR.
>  





[jira] [Updated] (BEAM-3863) AfterProcessingTime trigger doesn't fire reliably

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3863:
--
Component/s: (was: sdk-java-core)
 runner-flink

> AfterProcessingTime trigger doesn't fire reliably
> -
>
> Key: BEAM-3863
> URL: https://issues.apache.org/jira/browse/BEAM-3863
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Pawel Bartoszek
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Issue*
> The Beam AfterProcessingTime trigger doesn't always fire reliably after a 
> configured delay.
> The following job's triggers should fire after the watermark passes the end of 
> the window, then every 5 seconds for late data, and finally at the end of the 
> allowed lateness.
> *Expected behaviour*
> The late firing (AfterProcessingTime) should fire 5 seconds after the first 
> late record arrives in the pane.
> *Actual behaviour*
> From my testing, the late trigger works for some keys but not for others - 
> it's pretty random which keys are affected. The DummySource generates 15 
> distinct keys AA, BB, ..., PP. For each key it sends 5 on-time records and one 
> late record. If a late trigger firing is missed, it won't fire until the end 
> of the allowed lateness period.
> *Job code*
> {code:java}
> String[] runnerArgs = {"--runner=FlinkRunner", "--parallelism=8"};
> FlinkPipelineOptions options = 
>     PipelineOptionsFactory.fromArgs(runnerArgs).as(FlinkPipelineOptions.class);
> Pipeline pipeline = Pipeline.create(options);
> PCollection<String> apply = pipeline.apply(Read.from(new DummySource()))
>     .apply(Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>         .triggering(AfterWatermark.pastEndOfWindow()
>             .withLateFirings(AfterProcessingTime
>                 .pastFirstElementInPane()
>                 .plusDelayOf(Duration.standardSeconds(5))))
>         .accumulatingFiredPanes()
>         .withAllowedLateness(Duration.standardMinutes(2), 
>             Window.ClosingBehavior.FIRE_IF_NON_EMPTY));
> apply.apply(Count.perElement())
>     .apply(ParDo.of(new DoFn<KV<String, Long>, Long>() {
>       @ProcessElement
>       public void process(ProcessContext context, BoundedWindow window) {
>         LOG.info("Count: {}. For window {}, Pane {}", 
>             context.element(), window, context.pane());
>       }
>     }));
> pipeline.run().waitUntilFinish();{code}
>  
> *How can you replicate the issue?*
> I've created a GitHub repo 
> [https://github.com/pbartoszek/BEAM-3863_late_trigger] with the code shown 
> above. Please check out the README file for details on how to replicate the 
> issue.
> *What is causing the issue?*
> I explained the cause in the PR.
>  





[jira] [Commented] (BEAM-1189) Add guide for release verifiers in the release guide

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451570#comment-16451570
 ] 

Kenneth Knowles commented on BEAM-1189:
---

[~griscz] is this obsolete? Still happening? I am just going through old bugs.

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.





[jira] [Commented] (BEAM-3830) Add validation spreadsheet to release guide on website

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451568#comment-16451568
 ] 

Kenneth Knowles commented on BEAM-3830:
---

I didn't actually add this - but has the situation changed now that the testing 
coverage is so much better? Or maybe this just sits unassigned.

> Add validation spreadsheet to release guide on website
> --
>
> Key: BEAM-3830
> URL: https://issues.apache.org/jira/browse/BEAM-3830
> Project: Beam
>  Issue Type: Wish
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Alan Myrvold
>Priority: Major
>






[jira] [Assigned] (BEAM-1189) Add guide for release verifiers in the release guide

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-1189:
-

Assignee: Griselda Cuevas Zambrano  (was: Kenneth Knowles)

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.





[jira] [Assigned] (BEAM-3830) Add validation spreadsheet to release guide on website

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3830:
-

Assignee: Alan Myrvold  (was: Kenneth Knowles)

> Add validation spreadsheet to release guide on website
> --
>
> Key: BEAM-3830
> URL: https://issues.apache.org/jira/browse/BEAM-3830
> Project: Beam
>  Issue Type: Wish
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Alan Myrvold
>Priority: Major
>






[jira] [Commented] (BEAM-3992) Use JSON-B instead of an hardcoded Jackson

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451566#comment-16451566
 ] 

Kenneth Knowles commented on BEAM-3992:
---

Seems nice. [~romain.manni-bucau] are you taking this on? Might be something to 
discuss on dev@.

> Use JSON-B instead of an hardcoded Jackson
> --
>
> Key: BEAM-3992
> URL: https://issues.apache.org/jira/browse/BEAM-3992
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> Currently Beam uses Jackson directly everywhere. This either breaks 
> environments where another Jackson version is present and can't be upgraded, 
> or it forces shading and a fat app where it is not needed. Using JSON-B would 
> allow swapping the implementation, so Beam would rely on ~200 KB of API 
> instead of megabytes of dependencies, making it smoother to integrate in all 
> environments.





[jira] [Assigned] (BEAM-3992) Use JSON-B instead of an hardcoded Jackson

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3992:
-

Assignee: (was: Kenneth Knowles)

> Use JSON-B instead of an hardcoded Jackson
> --
>
> Key: BEAM-3992
> URL: https://issues.apache.org/jira/browse/BEAM-3992
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.4.0
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> Currently Beam uses Jackson directly everywhere. This either breaks 
> environments where another Jackson version is present and can't be upgraded, 
> or it forces shading and a fat app where it is not needed. Using JSON-B would 
> allow swapping the implementation, so Beam would rely on ~200 KB of API 
> instead of megabytes of dependencies, making it smoother to integrate in all 
> environments.





[jira] [Commented] (BEAM-1287) Give new DoFn the ability to output to a particular window

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451563#comment-16451563
 ] 

Kenneth Knowles commented on BEAM-1287:
---

This is covered by your recent work, right?

> Give new DoFn the ability to output to a particular window
> --
>
> Key: BEAM-1287
> URL: https://issues.apache.org/jira/browse/BEAM-1287
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
>Priority: Major
>
> The new {{DoFn}} design allows us to have specialized output receivers, such 
> as a key-preserving output (the default is non-key-preserving) or 
> non-window-preserving (the default is window-preserving) output. This JIRA is 
> for the latter, with an emphasis on making the two as analogous as we can.
> {code}
> new DoFn() {
>   @ProcessElement
>   public void processElement(ProcessContext c, OutputToWindow receiver) {
> receiver.outputWithTimestamp(value, timestamp, window);
>   }
> }
> {code}
> After this change, window assignment need not be a primitive.
> Why is this OK? The primary motivation for keeping windows strongly separated 
> is because they yield parallelism if we don't impose any requirement that 
> multiple windows for a single key be co-located or linearized. We should be 
> able to process a single key with millions of non-merging windows in parallel 
> without having to reify the windows (though this isn't _that_ bad). That is a 
> major change/improvement over the vague assumption that keys are the atom of 
> parallelism.
> This change will not remove this property, as it pertains to input and state. 
> The analogy with keys:
>  - Stateful DoFn requires the ability to access key-and-window state. For 
> some runners, perhaps this does not require colocation. For runners that want 
> to do this efficiently/locally, it means some key-and-window colocation 
> operation followed by only key-and-window preserving transforms. So 
> outputting to a new window breaks the invariant, just as a non-key-preserving 
> transform would. Until we had the new {{DoFn}} we couldn't know if 
> non-window-preserving output was used.
>  - Non-key-preserving output also breaks any idea that combined aggregates 
> are actually one per key, etc. So windows can work the same way.
>  - Timestamps are interesting. By analogy with keys, timestamps would be just 
> part of the value and able to change freely. This doesn't work so well 
> because of lateness. To avoid digging deeper into changing anything, this 
> proposal just suggests that a timestamp is provided, and whether it is 
> allowed to be late is governed by the same rules as {{outputWithTimestamp}}.
>  - Not clear if this has uses for merging windows.
> This change is entirely backwards compatible, but given that it removes a 
> primitive and is rather little effort, it might bear earlier consideration. 
> No work will begin until it is brought to the dev list.





[jira] [Assigned] (BEAM-1287) Give new DoFn the ability to output to a particular window

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-1287:
-

Assignee: Reuven Lax  (was: Kenneth Knowles)

> Give new DoFn the ability to output to a particular window
> --
>
> Key: BEAM-1287
> URL: https://issues.apache.org/jira/browse/BEAM-1287
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
>Priority: Major
>
> The new {{DoFn}} design allows us to have specialized output receivers, such 
> as a key-preserving output (the default is non-key-preserving) or 
> non-window-preserving (the default is window-preserving) output. This JIRA is 
> for the latter, with an emphasis on making the two as analogous as we can.
> {code}
> new DoFn() {
>   @ProcessElement
>   public void processElement(ProcessContext c, OutputToWindow receiver) {
> receiver.outputWithTimestamp(value, timestamp, window);
>   }
> }
> {code}
> After this change, window assignment need not be a primitive.
> Why is this OK? The primary motivation for keeping windows strongly separated 
> is because they yield parallelism if we don't impose any requirement that 
> multiple windows for a single key be co-located or linearized. We should be 
> able to process a single key with millions of non-merging windows in parallel 
> without having to reify the windows (though this isn't _that_ bad). That is a 
> major change/improvement over the vague assumption that keys are the atom of 
> parallelism.
> This change will not remove this property, as it pertains to input and state. 
> The analogy with keys:
>  - Stateful DoFn requires the ability to access key-and-window state. For 
> some runners, perhaps this does not require colocation. For runners that want 
> to do this efficiently/locally, it means some key-and-window colocation 
> operation followed by only key-and-window preserving transforms. So 
> outputting to a new window breaks the invariant, just as a non-key-preserving 
> transform would. Until we had the new {{DoFn}} we couldn't know if 
> non-window-preserving output was used.
>  - Non-key-preserving output also breaks any idea that combined aggregates 
> are actually one per key, etc. So windows can work the same way.
>  - Timestamps are interesting. By analogy with keys, timestamps would be just 
> part of the value and able to change freely. This doesn't work so well 
> because of lateness. To avoid digging deeper into changing anything, this 
> proposal just suggests that a timestamp is provided, and whether it is 
> allowed to be late is governed by the same rules as {{outputWithTimestamp}}.
>  - Not clear if this has uses for merging windows.
> This change is entirely backwards compatible, but given that it removes a 
> primitive and is rather little effort, it might bear earlier consideration. 
> No work will begin until it is brought to the dev list.





[jira] [Closed] (BEAM-3918) test failure BeamSqlDslAggregationTest.testTriggeredTumble

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles closed BEAM-3918.
-
   Resolution: Cannot Reproduce
Fix Version/s: Not applicable

At this point I think this is obsolete, right?

> test failure BeamSqlDslAggregationTest.testTriggeredTumble
> --
>
> Key: BEAM-3918
> URL: https://issues.apache.org/jira/browse/BEAM-3918
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, testing
>Reporter: Xu Mingmin
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>
> I cannot pass the test case added in 
> [https://github.com/apache/beam/pull/4826], here's the error message (run 
> with {{mvn clean install -pl sdks/java/extensions/sql/}}):
> {code}
> [ERROR] Failures:
> [ERROR]   BeamSqlDslAggregationTest.testTriggeredTumble:384 Windowed 
> Query/BeamProjectRel.Transform/BEAMPROJECTREL_1566_149/ParMultiDo(BeamSqlProject).output:
> Expected: iterable over [ rowType=RowType{fieldNames=[f_int_sum], 
> fieldCoders=[org.apache.beam.sdk.extensions.sql.SqlTypeCoder$SqlIntegerCoder@436813f3]}}>,
>   fieldCoders=[org.apache.beam.sdk.extensions.sql.SqlTypeCoder$SqlIntegerCoder@436813f3]}}>,
>   fieldCoders=[org.apache.beam.sdk.extensions.sql.SqlTypeCoder$SqlIntegerCoder@436813f3]}}>]
>  in any order
>  but: No item matches:  rowType=RowType{fieldNames=[f_int_sum], 
> fieldCoders=[org.apache.beam.sdk.extensions.sql.SqlTypeCoder$SqlIntegerCoder@436813f3]}}>,
>   fieldCoders=[org.apache.beam.sdk.extensions.sql.SqlTypeCoder$SqlIntegerCoder@436813f3]}}>
>  in [ fieldCoders=[org.apache.beam.sdk.extensions.sql.SqlTypeCoder$SqlIntegerCoder@436813f3]}}>]
> {code}
> What confuses me is that the failure is not found in precommit jobs.
>  





Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #182

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[kedin] Add JsonToRow Transform

[kedin] Convert JsonToRow from MapElements.via() to ParDo

--
[...truncated 19.37 MB...]

org.apache.beam.examples.cookbook.TriggerExampleTest > testExtractTotalFlow 
STANDARD_ERROR
Apr 25, 2018 2:40:23 AM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > 
testFilterSingleMonthDataFn STANDARD_ERROR
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > testProjectionFn 
STANDARD_ERROR
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractCountryInfoFn 
STANDARD_ERROR
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractEventDataFn 
STANDARD_ERROR
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.DebuggingWordCountTest > testDebuggingWordCount 
STANDARD_ERROR
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.io.FileBasedSource 
getEstimatedSizeBytes
INFO: Filepattern /tmp/junit120444639599036259/junit192480798379155287.tmp 
matched 1 files with total size 54
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern 
/tmp/junit120444639599036259/junit192480798379155287.tmp into bundles of size 3 
took 1 ms and produced 1 files and 18 bundles

org.apache.beam.examples.WordCountTest > testExtractWordsFn STANDARD_ERROR
Apr 25, 2018 2:40:24 AM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.subprocess.ExampleEchoPipelineTest > 
testExampleEchoPipeline STANDARD_ERROR
Apr 25, 2018 2:40:26 AM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-Echo7195164603381926811.sh 
Apr 25, 2018 2:40:26 AM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 25, 2018 2:40:26 AM org.apache.beam.examples.subprocess.utils.FileUtils 
copyFileFromGCSToWorker
INFO: Moving File /tmp/test-Echo7195164603381926811.sh to 
/tmp/test-Echoo8917452202130660514/test-Echo7195164603381926811.sh 
Apr 25, 2018 2:40:26 AM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-EchoAgain591204249350566431.sh 
Apr 25, 2018 2:40:26 AM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 25, 2018 2:40:26 AM org.apache.beam.examples.subprocess.utils.FileUtils 
copyFileFromGCSToWorker
INFO: Moving File /tmp/test-EchoAgain591204249350566431.sh to 
/tmp/test-Echoo8917452202130660514/test-EchoAgain591204249350566431.sh 

org.apache.beam.examples.complete.game.HourlyTeamScoreTest > 
testUserScoresFilter STANDARD_OUT
GOT user3_BananaEmu,BananaEmu,17,144796569,2015-11-19 12:41:31.053
GOT user0_MagentaKangaroo,MagentaKangaroo,4,144796569,2015-11-19 
12:41:31.053
GOT user18_BananaEmu,BananaEmu,1,144796569,2015-11-19 12:41:31.053
GOT user2_AmberCockatoo,AmberCockatoo,13,144796569,2015-11-19 
12:41:31.053
GOT user0_MagentaKangaroo,MagentaKangaroo,3,144795563,2015-11-19 
09:53:53.444
GOT user18_ApricotCaneToad,ApricotCaneToad,14,144796569,2015-11-19 
12:41:31.053
GOT user19_BisqueBilby,BisqueBilby,6,144795563,2015-11-19 09:53:53.444
GOT user18_BananaEmu,BananaEmu,7,144796569,2015-11-19 12:41:31.053
GOT 
user7_AndroidGreenKookaburra,AndroidGreenKookaburra,11,144795563,2015-11-19 
09:53:53.444
GOT 
user7_AndroidGreenKookaburra,AndroidGreenKookaburra,12,144795563,2015-11-19 
09:53:53.444
GOT user19_BisqueBilby,BisqueBilby,8,144795563,2015-11-19 09:53:53.444
GOT user13_ApricotQuokka,ApricotQuokka,15,144795563,2015-11-19 
09:53:53.444
GOT user7_AlmondWallaby,AlmondWallaby,15,144795563,2015-11-19 
09:53:53.444
GOT 

[jira] [Commented] (BEAM-4150) Standardize use of PCollection coder proto attribute

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451527#comment-16451527
 ] 

Kenneth Knowles commented on BEAM-4150:
---

I think this is resolved in favor of the element coder, leaving the manner of 
managing windowing metadata up to the runner, yes?

> Standardize use of PCollection coder proto attribute
> 
>
> Key: BEAM-4150
> URL: https://issues.apache.org/jira/browse/BEAM-4150
> Project: Beam
>  Issue Type: Task
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In some places it's expected to be a WindowedCoder, in others the raw 
> ElementCoder. We should use the same convention (decided in discussion to be 
> the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the 
> attached windowing strategy, and the input/output ports should specify the 
> encoding directly rather than read the adjacent PCollection coder fields. 





[jira] [Commented] (BEAM-4150) Standardize use of PCollection coder proto attribute

2018-04-24 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451528#comment-16451528
 ] 

Kenneth Knowles commented on BEAM-4150:
---

Feel free to grab and reopen.

> Standardize use of PCollection coder proto attribute
> 
>
> Key: BEAM-4150
> URL: https://issues.apache.org/jira/browse/BEAM-4150
> Project: Beam
>  Issue Type: Task
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In some places it's expected to be a WindowedCoder, in others the raw 
> ElementCoder. We should use the same convention (decided in discussion to be 
> the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the 
> attached windowing strategy, and the input/output ports should specify the 
> encoding directly rather than read the adjacent PCollection coder fields. 





[jira] [Resolved] (BEAM-4150) Standardize use of PCollection coder proto attribute

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-4150.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Standardize use of PCollection coder proto attribute
> 
>
> Key: BEAM-4150
> URL: https://issues.apache.org/jira/browse/BEAM-4150
> Project: Beam
>  Issue Type: Task
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In some places it's expected to be a WindowedCoder, in others the raw 
> ElementCoder. We should use the same convention (decided in discussion to be 
> the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the 
> attached windowing strategy, and the input/output ports should specify the 
> encoding directly rather than read the adjacent PCollection coder fields. 





[jira] [Assigned] (BEAM-3617) Restore proto round trip for Java DirectRunner (was: Performance degradation on the direct runner)

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3617:
-

Assignee: (was: Kenneth Knowles)

> Restore proto round trip for Java DirectRunner (was: Performance degradation 
> on the direct runner)
> --
>
> Key: BEAM-3617
> URL: https://issues.apache.org/jira/browse/BEAM-3617
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Jean-Baptiste Onofré
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Running Nexmark queries with the direct runner between Beam 2.2.0 and 2.3.0 
> shows a performance degradation:
> {code}
> 
>   Query   Beam 2.2.0     Beam 2.3.0
>           Runtime(sec)   Runtime(sec)
> 
>   0000    6.4            10.6
>   0001    5.1            10.2
>   0002    3.0            5.8
>   0003    3.8            6.2
>   0004    0.9            1.4
>   0005    5.8            11.4
>   0006    0.8            1.4
>   0007    193.8          1249.1
>   0008    3.9            6.9
>   0009    0.9            1.3
>   0010    6.4            8.2
>   0011    5.0            9.4
>   0012    4.7            9.1
> {code}
> Notably, Query 7 is over six times slower.





[jira] [Assigned] (BEAM-3934) BoundedReader should be closed in JavaReadViaImpulse#ReadFromBoundedSourceFn

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3934:
-

Assignee: Chamikara Jayalath  (was: Kenneth Knowles)

> BoundedReader should be closed in JavaReadViaImpulse#ReadFromBoundedSourceFn
> 
>
> Key: BEAM-3934
> URL: https://issues.apache.org/jira/browse/BEAM-3934
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ted Yu
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> {code}
> public void readSoruce(ProcessContext ctxt) throws IOException {
>   BoundedSource.BoundedReader reader =
>   ctxt.element().createReader(ctxt.getPipelineOptions());
>   for (boolean more = reader.start(); more; more = reader.advance()) {
> ctxt.outputWithTimestamp(reader.getCurrent(), 
> reader.getCurrentTimestamp());
>   }
> }
> {code}
> The BoundedSource.BoundedReader instance should be closed before returning 
> from the method.
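A minimal, self-contained sketch of the suggested fix, assuming a reader shape like the snippet above (the `Reader` interface, its methods, and class names here are hypothetical stand-ins for `BoundedSource.BoundedReader`, which implements `AutoCloseable` in Beam): wrapping the reader in try-with-resources closes it on every exit path, including exceptions thrown by `start()` or `advance()`.

```java
import java.io.IOException;

public class ReaderCloseSketch {
  // Hypothetical stand-in for BoundedSource.BoundedReader; the real Beam
  // reader implements AutoCloseable, which try-with-resources requires.
  interface Reader extends AutoCloseable {
    boolean start() throws IOException;
    boolean advance() throws IOException;
    String getCurrent();
    @Override
    void close() throws IOException;
  }

  // Same iteration pattern as the DoFn above, but the reader is now
  // guaranteed to be closed before the method returns.
  static int readAll(Reader reader) throws IOException {
    int count = 0;
    try (Reader r = reader) {
      for (boolean more = r.start(); more; more = r.advance()) {
        count++; // the real DoFn would call ctxt.outputWithTimestamp(...) here
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    final String[] data = {"a", "b", "c"};
    final boolean[] closed = {false};
    Reader fake = new Reader() {
      int i = -1;
      public boolean start() { return ++i < data.length; }
      public boolean advance() { return ++i < data.length; }
      public String getCurrent() { return data[i]; }
      public void close() { closed[0] = true; }
    };
    // prints "read=3 closed=true"
    System.out.println("read=" + readAll(fake) + " closed=" + closed[0]);
  }
}
```

The same pattern applies to BEAM-3915, which reports the identical unclosed reader.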





[jira] [Assigned] (BEAM-3935) FileChannel instance should be closed in ArtifactServiceStager#StagingCallable#get

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3935:
-

Assignee: Ben Sidhom  (was: Kenneth Knowles)

> FileChannel instance should be closed in 
> ArtifactServiceStager#StagingCallable#get
> --
>
> Key: BEAM-3935
> URL: https://issues.apache.org/jira/browse/BEAM-3935
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ted Yu
>Assignee: Ben Sidhom
>Priority: Minor
>
> {code}
>   FileChannel channel = new FileInputStream(file).getChannel();
>   ByteBuffer readBuffer = ByteBuffer.allocate(bufferSize);
>   while (!responseObserver.isTerminal() && channel.position() < 
> channel.size()) {
> readBuffer.clear();
> channel.read(readBuffer);
> {code}
> The channel should be closed before returning from get()
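A minimal sketch of the suggested fix, assuming the read loop above (class and method names here are hypothetical, not the actual `ArtifactServiceStager` code): opening the channel in a try-with-resources statement guarantees it is closed before the method returns, even if a read throws or the loop exits early.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCloseSketch {
  // Reads the whole file in bufferSize-sized chunks, returning total bytes
  // read; the channel is closed on every exit path by try-with-resources.
  static long readFully(Path file, int bufferSize) throws IOException {
    long total = 0;
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer readBuffer = ByteBuffer.allocate(bufferSize);
      while (channel.position() < channel.size()) {
        readBuffer.clear();
        total += channel.read(readBuffer); // advances channel.position()
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("stager", ".bin");
    Files.write(tmp, new byte[10]);
    System.out.println(readFully(tmp, 4)); // prints 10
    Files.delete(tmp);
  }
}
```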





[jira] [Assigned] (BEAM-3870) Jars contain duplicate class definitions

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3870:
-

Assignee: Henning Rohde  (was: Kenneth Knowles)

> Jars contain duplicate class definitions
> 
>
> Key: BEAM-3870
> URL: https://issues.apache.org/jira/browse/BEAM-3870
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-harness
>Affects Versions: 2.4.0
>Reporter: Nigel Kilmer
>Assignee: Henning Rohde
>Priority: Minor
>
> There are a large number of classes with definitions in both 
> beam-sdks-java-harness-2.4.0-SNAPSHOT.jar and either 
> beam-model-pipeline-2.4.0-SNAPSHOT.jar or 
> beam-model-fn-execution-2.4.0-SNAPSHOT.jar.
> Some randomly chosen examples of classes with this problem (out of a much 
> larger set):
> org/apache/beam/model/pipeline/v1/Endpoints
> org/apache/beam/model/pipeline/v1/RunnerApi
> org/apache/beam/model/pipeline/v1/StandardWindowFns
> org/apache/beam/model/fnexecution/v1/ProvisionApi





[jira] [Assigned] (BEAM-3915) Unclosed reader in JavaReadViaImpulse#ReadFromBoundedSourceFn

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3915:
-

Assignee: Chamikara Jayalath  (was: Kenneth Knowles)

> Unclosed reader in JavaReadViaImpulse#ReadFromBoundedSourceFn
> -
>
> Key: BEAM-3915
> URL: https://issues.apache.org/jira/browse/BEAM-3915
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ted Yu
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> {code}
>   static class ReadFromBoundedSourceFn extends DoFn {
> @ProcessElement
> public void readSoruce(ProcessContext ctxt) throws IOException {
>   BoundedSource.BoundedReader reader =
>   ctxt.element().createReader(ctxt.getPipelineOptions());
> {code}
> The reader should be closed upon return from the method.





[jira] [Assigned] (BEAM-4068) Consistent option specification between SDKs and runners by URN

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4068:
-

Assignee: (was: Kenneth Knowles)

> Consistent option specification between SDKs and runners by URN
> ---
>
> Key: BEAM-4068
> URL: https://issues.apache.org/jira/browse/BEAM-4068
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Priority: Minor
>
> Pipeline options are materialized differently by different SDKs. However, in 
> some cases, runners require Java-specific options that are not available 
> elsewhere. We should decide on well-known URNs and use them across SDKs where 
> applicable.





[jira] [Assigned] (BEAM-3966) Move core utilities into a new top-level module

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3966:
-

Assignee: (was: Kenneth Knowles)

> Move core utilities into a new top-level module
> ---
>
> Key: BEAM-3966
> URL: https://issues.apache.org/jira/browse/BEAM-3966
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ben Sidhom
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As part of a longer-term dependency cleanup, fn-execution and similar 
> utilities should be moved into a new top-level module (util?) that can be 
> shared among runner and/or SDK code while clearly delineating the boundary 
> between runner and SDK.





[jira] [Assigned] (BEAM-3138) Stop depending on Test JARs

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3138:
-

Assignee: (was: Kenneth Knowles)

> Stop depending on Test JARs
> ---
>
> Key: BEAM-3138
> URL: https://issues.apache.org/jira/browse/BEAM-3138
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, runner-core, sdk-java-core, sdk-java-harness
>Reporter: Thomas Groh
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Testing components can be in a testing or otherwise signaled package, but 
> shouldn't really be depended on by depending on a test jar in the test scope.





[jira] [Assigned] (BEAM-4148) Local server api descriptors contain urls that work on Mac and Linux

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4148:
-

Assignee: (was: Kenneth Knowles)

> Local server api descriptors contain urls that work on Mac and Linux
> 
>
> Key: BEAM-4148
> URL: https://issues.apache.org/jira/browse/BEAM-4148
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Ben Sidhom
>Priority: Minor
>
> Docker for Mac does not allow host networking and thus will not allow SDK 
> harnesses to access runner services via `localhost`. Instead, a special DNS 
> name is used to refer to the host machine: docker.for.mac.host.internal. 
> (Note that this value sometimes changes between Docker releases).
> We should attempt to detect the host operating system and return different 
> API descriptors based on this.
> See 
> [https://github.com/bsidhom/beam/commit/3adaeb0d33dc26f0910c1f8af2821cce4ee0b965]
>  for how this might be done.
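A minimal sketch of the OS-detection idea described above (the class and method names are hypothetical, not the linked commit's code; `docker.for.mac.host.internal` is the Docker-for-Mac host alias named in the issue, and it has changed between Docker releases):

```java
public class HostnameSketch {
  // Pick the hostname an SDK harness container should use to reach runner
  // services on the host machine. Docker for Mac has no host networking,
  // so "localhost" inside the container does not reach the host there.
  static String apiServiceHost() {
    String os = System.getProperty("os.name").toLowerCase();
    if (os.contains("mac")) {
      return "docker.for.mac.host.internal"; // Docker-for-Mac host alias
    }
    return "localhost"; // Linux supports --network=host
  }

  public static void main(String[] args) {
    System.out.println(apiServiceHost());
  }
}
```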





[jira] [Assigned] (BEAM-4106) Merge staging file options between Dataflow runner and portable runner

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4106:
-

Assignee: (was: Kenneth Knowles)

> Merge staging file options between Dataflow runner and portable runner
> --
>
> Key: BEAM-4106
> URL: https://issues.apache.org/jira/browse/BEAM-4106
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-dataflow
>Reporter: Ben Sidhom
>Priority: Trivial
>
> Both runners (will) require staging file options. This should be refactored 
> such that the same options are used.
> Java pipelines stage all entries on the classpath (assuming a URLClassLoader 
> is used) by default. The new merged pipeline options should have similar 
> behavior and be documented as such.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4116) Add a timeout to artifact stager

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4116:
-

Assignee: (was: Kenneth Knowles)

> Add a timeout to artifact stager
> 
>
> Key: BEAM-4116
> URL: https://issues.apache.org/jira/browse/BEAM-4116
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Ben Sidhom
>Priority: Trivial
>
> ArtifactServiceStager currently blocks indefinitely while waiting for all 
> artifacts to stage. It would be nice to have a configurable timeout here.
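A configurable timeout around a blocking call can be sketched with plain JDK concurrency utilities. This is a generic sketch: `callWithTimeout` is a hypothetical helper, not ArtifactServiceStager's real API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TimedStager {
  /**
   * Runs a blocking task but gives up after the timeout, throwing
   * java.util.concurrent.TimeoutException instead of blocking forever.
   */
  static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = executor.submit(task);
      // Future.get with a deadline is what turns "blocks indefinitely"
      // into a configurable timeout.
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } finally {
      executor.shutdownNow(); // interrupt the task if it is still running
    }
  }
}
```

The same shape applies to staging: submit the staging work as the task and pass the timeout through from pipeline options.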



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3784) Enhance Apache Beam interpreter for Apache Zeppelin

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3784:
-

Assignee: (was: Kenneth Knowles)

> Enhance Apache Beam interpreter for Apache Zeppelin
> ---
>
> Key: BEAM-3784
> URL: https://issues.apache.org/jira/browse/BEAM-3784
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: SQL, bigdata, cloud, gsoc2018, java
>
> Apache Zeppelin includes an integration with Apache Beam: 
> https://zeppelin.apache.org/docs/0.7.0/interpreter/beam.html
> How well does this work for interactive exploration? Can this be enhanced to 
> support Beam SQL? What about unbounded data? Let's find out by exploring the 
> existing interpreter and enhancing it particularly for streaming SQL.
> This project will require the ability to read, write, and run Java and SQL. 
> You will come out of it with familiarity with two Apache big data projects 
> and lots of ideas!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4104) Experiments should be output as DisplayData

2018-04-24 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-4104:
-

Assignee: (was: Kenneth Knowles)

> Experiments should be output as DisplayData
> ---
>
> Key: BEAM-4104
> URL: https://issues.apache.org/jira/browse/BEAM-4104
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Priority: Minor
>
> Currently, any experiments specified via "--experiment=.." from 
> ExperimentalOptions.java are not output as display data because the 
> ExperimentalOptions interface is marked @Hidden. This makes it more difficult 
> to understand which experiments were set when investigating a job.
> I don't see any reason why this interface should be hidden; I suggest 
> removing the annotation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (50a6304 -> 9c2b432)

2018-04-24 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 50a6304  Merge pull request #5204: Allow request hooks to update the 
context.
 add 46211c3  Add JsonToRow Transform
 add c208ce7  Convert JsonToRow from MapElements.via() to ParDo
 new 9c2b432  Merge pull request #5120: [BEAM-4160] Add JsonToRow transform

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/beam/sdk/transforms/JsonToRow.java  | 126 +
 .../apache/beam/sdk/transforms/JsonToRowTest.java  |  94 +++
 .../beam/sdk/extensions/sql/JsonToRowSqlTest.java  |  90 +++
 3 files changed, 310 insertions(+)
 create mode 100644 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
 create mode 100644 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/JsonToRowTest.java
 create mode 100644 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/JsonToRowSqlTest.java

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[beam] 01/01: Merge pull request #5120: [BEAM-4160] Add JsonToRow transform

2018-04-24 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 9c2b43227e1ddac39676f6c09aca1af82a9d4cdb
Merge: 50a6304 c208ce7
Author: Kenn Knowles 
AuthorDate: Tue Apr 24 19:09:58 2018 -0700

Merge pull request #5120: [BEAM-4160] Add JsonToRow transform

 .../org/apache/beam/sdk/transforms/JsonToRow.java  | 126 +
 .../apache/beam/sdk/transforms/JsonToRowTest.java  |  94 +++
 .../beam/sdk/extensions/sql/JsonToRowSqlTest.java  |  90 +++
 3 files changed, 310 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Work logged] (BEAM-4160) Convert JSON objects to Rows

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?focusedWorklogId=94919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94919
 ]

ASF GitHub Bot logged work on BEAM-4160:


Author: ASF GitHub Bot
Created on: 25/Apr/18 02:10
Start Date: 25/Apr/18 02:10
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #5120: [BEAM-4160] Add 
JsonToRow transform
URL: https://github.com/apache/beam/pull/5120
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
new file mode 100644
index 000..eb3011df4d4
--- /dev/null
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import java.io.IOException;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.util.RowJsonDeserializer;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Experimental
+ *
+ * <p>Creates a {@link PTransform} to convert input JSON objects to {@link Row Rows}
+ * with given {@link Schema}.
+ *
+ * <p>Currently supported {@link Schema} field types are:
+ * <ul>
+ *   <li>{@link Schema.TypeName#BYTE}
+ *   <li>{@link Schema.TypeName#INT16}
+ *   <li>{@link Schema.TypeName#INT32}
+ *   <li>{@link Schema.TypeName#INT64}
+ *   <li>{@link Schema.TypeName#FLOAT}
+ *   <li>{@link Schema.TypeName#DOUBLE}
+ *   <li>{@link Schema.TypeName#BOOLEAN}
+ *   <li>{@link Schema.TypeName#STRING}
+ * </ul>
+ *
+ * <p>For specifics of JSON deserialization see {@link RowJsonDeserializer}.
+ *
+ * <p>Conversion is strict, with minimal type coercion:
+ *
+ * <p>Booleans are only parsed from {@code true} or {@code false} literals, not from
+ * {@code "true"} or {@code "false"} strings or any other values (an exception is
+ * thrown in these cases).
+ *
+ * <p>If a JSON number doesn't fit into the corresponding schema field type, an exception
+ * is thrown. Strings are not auto-converted to numbers. Floating point numbers are not
+ * auto-converted to integral numbers. Precision loss also causes exceptions.
+ *
+ * <p>Only JSON string values can be parsed into {@link TypeName#STRING}. Numbers and
+ * booleans are not automatically converted; exceptions are thrown in these cases.
+ *
+ * <p>If a schema field is missing from the JSON value, an exception will be thrown.
+ *
+ * <p>Explicit {@code null} literals are allowed in JSON objects. No other values are
+ * parsed into {@code null}.
+ */
+@Experimental
+public class JsonToRow {
+
+  public static PTransform<PCollection<String>, PCollection<Row>> withSchema(
+      Schema rowSchema) {
+    return JsonToRowFn.forSchema(rowSchema);
+  }
+
+  static class JsonToRowFn extends PTransform<PCollection<String>, PCollection<Row>> {
+    private transient volatile @Nullable ObjectMapper objectMapper;
+    private Schema schema;
+
+    static JsonToRowFn forSchema(Schema rowSchema) {
+      return new JsonToRowFn(rowSchema);
+    }
+
+    private JsonToRowFn(Schema schema) {
+      this.schema = schema;
+    }
+
+    @Override
+    public PCollection<Row> expand(PCollection<String> jsonStrings) {
+      return jsonStrings
+          .apply(
+              ParDo.of(
+                  new DoFn<String, Row>() {
+                    @ProcessElement
+                    public void processElement(ProcessContext context) {
+                      context.output(jsonToRow(context.element()));
+                    }
+                  }))
+          .setCoder(schema.getRowCoder());
+    }
+
+    private Row jsonToRow(String jsonString) {
+      try {
+        return objectMapper().readValue(jsonString, Row.class);
+      } catch (IOException e) {
+        throw new IllegalArgumentException("Unable to parse json object: " + jsonString, e);
+      }
+    }
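The "minimal type coercion" contract described in the Javadoc above (no string-to-number coercion, exceptions on overflow or precision loss) can be illustrated with a plain-JDK sketch; `strictToInt` is a hypothetical stand-in, not part of RowJsonDeserializer.

```java
public class StrictCoercion {
  /**
   * Accepts a JSON-style value for an INT32 field only if it is a number
   * that fits exactly; strings, fractional values, and overflow all throw.
   */
  static int strictToInt(Object jsonValue) {
    if (!(jsonValue instanceof Number)) {
      // Strings and booleans are never auto-converted to numbers.
      throw new IllegalArgumentException("Expected a JSON number, got: " + jsonValue);
    }
    double d = ((Number) jsonValue).doubleValue();
    int narrowed = (int) d;
    if ((double) narrowed != d) {
      // Covers both overflow and fractional values (precision loss).
      throw new IllegalArgumentException("Value does not fit INT32 exactly: " + d);
    }
    return narrowed;
  }
}
```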

[jira] [Work logged] (BEAM-4160) Convert JSON objects to Rows

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?focusedWorklogId=94917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94917
 ]

ASF GitHub Bot logged work on BEAM-4160:


Author: ASF GitHub Bot
Created on: 25/Apr/18 02:09
Start Date: 25/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#5120: [BEAM-4160] Add JsonToRow transform
URL: https://github.com/apache/beam/pull/5120#discussion_r183928514
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import java.io.IOException;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.util.RowJsonDeserializer;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * <p>Creates a {@link PTransform} to convert input JSON objects to {@link Row Rows}
+ * with given {@link Schema}.
+ *
+ * <p>Currently supported {@link Schema} field types are:
+ * <ul>
+ *   <li>{@link Schema.TypeName#BYTE}
+ *   <li>{@link Schema.TypeName#INT16}
+ *   <li>{@link Schema.TypeName#INT32}
+ *   <li>{@link Schema.TypeName#INT64}
+ *   <li>{@link Schema.TypeName#FLOAT}
+ *   <li>{@link Schema.TypeName#DOUBLE}
+ *   <li>{@link Schema.TypeName#BOOLEAN}
+ *   <li>{@link Schema.TypeName#STRING}
+ * </ul>
+ *
+ * <p>For specifics of JSON deserialization see {@link RowJsonDeserializer}.
+ *
+ * <p>Conversion is strict, with minimal type coercion:
+ *
+ * <p>Booleans are only parsed from {@code true} or {@code false} literals, not from
+ * {@code "true"} or {@code "false"} strings or any other values (an exception is
+ * thrown in these cases).
+ *
+ * <p>If a JSON number doesn't fit into the corresponding schema field type, an exception
+ * is thrown. Strings are not auto-converted to numbers. Floating point numbers are not
+ * auto-converted to integral numbers. Precision loss also causes exceptions.
+ *
+ * <p>Only JSON string values can be parsed into {@link TypeName#STRING}. Numbers and
+ * booleans are not automatically converted; exceptions are thrown in these cases.
+ *
+ * <p>If a schema field is missing from the JSON value, an exception will be thrown.
+ *
+ * <p>Explicit {@code null} literals are allowed in JSON objects. No other values are
+ * parsed into {@code null}.
+ */
+public class JsonToRow {
+
+  public static PTransform<PCollection<String>, PCollection<Row>> withSchema(
+      Schema rowSchema) {
+    return JsonToRowFn.forSchema(rowSchema);
+  }
+
+  static class JsonToRowFn extends PTransform<PCollection<String>, PCollection<Row>> {
+    private transient volatile @Nullable ObjectMapper objectMapper;
+    private Schema schema;
+
+    static JsonToRowFn forSchema(Schema rowSchema) {
+      return new JsonToRowFn(rowSchema);
+    }
+
+    private JsonToRowFn(Schema schema) {
+      this.schema = schema;
+    }
+
+    @Override
+    public PCollection<Row> expand(PCollection<String> jsonStrings) {
+      return jsonStrings
+          .apply(
+              ParDo.of(
+                  new DoFn<String, Row>() {
+                    @ProcessElement
+                    public void processElement(ProcessContext context) {
+                      context.output(jsonToRow(context.element()));
+                    }
+                  }))
+          .setCoder(schema.getRowCoder());
+    }
+
+    private Row jsonToRow(String jsonString) {
+      try {
+        return objectMapper().readValue(jsonString, Row.class);
+      } catch (IOException e) {
+        throw new IllegalArgumentException("Unable to parse json object: " + jsonString, e);
+      }
+    }
+
+    private ObjectMapper objectMapper() {
+      if (this.objectMapper == null) {
+        synchronized (this) {
+          if (this.objectMapper == null) {
+            this.objectMapper =
+                newObjectMapperWith(RowJsonDeserializer.forSchema(this.schema));
 
 Review comment:
   I think I meant 

[jira] [Work logged] (BEAM-4160) Convert JSON objects to Rows

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?focusedWorklogId=94918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94918
 ]

ASF GitHub Bot logged work on BEAM-4160:


Author: ASF GitHub Bot
Created on: 25/Apr/18 02:09
Start Date: 25/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#5120: [BEAM-4160] Add JsonToRow transform
URL: https://github.com/apache/beam/pull/5120#discussion_r183928530
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import java.io.IOException;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.util.RowJsonDeserializer;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * <p>Creates a {@link PTransform} to convert input JSON objects to {@link Row Rows}
+ * with given {@link Schema}.
+ *
+ * <p>Currently supported {@link Schema} field types are:
+ * <ul>
+ *   <li>{@link Schema.TypeName#BYTE}
+ *   <li>{@link Schema.TypeName#INT16}
+ *   <li>{@link Schema.TypeName#INT32}
+ *   <li>{@link Schema.TypeName#INT64}
+ *   <li>{@link Schema.TypeName#FLOAT}
+ *   <li>{@link Schema.TypeName#DOUBLE}
+ *   <li>{@link Schema.TypeName#BOOLEAN}
+ *   <li>{@link Schema.TypeName#STRING}
+ * </ul>
+ *
+ * <p>For specifics of JSON deserialization see {@link RowJsonDeserializer}.
+ *
+ * <p>Conversion is strict, with minimal type coercion:
+ *
+ * <p>Booleans are only parsed from {@code true} or {@code false} literals, not from
+ * {@code "true"} or {@code "false"} strings or any other values (an exception is
+ * thrown in these cases).
+ *
+ * <p>If a JSON number doesn't fit into the corresponding schema field type, an exception
+ * is thrown. Strings are not auto-converted to numbers. Floating point numbers are not
+ * auto-converted to integral numbers. Precision loss also causes exceptions.
+ *
+ * <p>Only JSON string values can be parsed into {@link TypeName#STRING}. Numbers and
+ * booleans are not automatically converted; exceptions are thrown in these cases.
+ *
+ * <p>If a schema field is missing from the JSON value, an exception will be thrown.
+ *
+ * <p>Explicit {@code null} literals are allowed in JSON objects. No other values are
+ * parsed into {@code null}.
+ */
+public class JsonToRow {
+
+  public static PTransform<PCollection<String>, PCollection<Row>> withSchema(
+      Schema rowSchema) {
+    return JsonToRowFn.forSchema(rowSchema);
+  }
+
+  static class JsonToRowFn extends PTransform<PCollection<String>, PCollection<Row>> {
+    private transient volatile @Nullable ObjectMapper objectMapper;
+    private Schema schema;
+
+    static JsonToRowFn forSchema(Schema rowSchema) {
+      return new JsonToRowFn(rowSchema);
+    }
+
+    private JsonToRowFn(Schema schema) {
+      this.schema = schema;
+    }
+
+    @Override
+    public PCollection<Row> expand(PCollection<String> jsonStrings) {
+      return jsonStrings
+          .apply(
+              ParDo.of(
+                  new DoFn<String, Row>() {
+                    @ProcessElement
+                    public void processElement(ProcessContext context) {
+                      context.output(jsonToRow(context.element()));
+                    }
+                  }))
+          .setCoder(schema.getRowCoder());
+    }
+
+    private Row jsonToRow(String jsonString) {
+      try {
+        return objectMapper().readValue(jsonString, Row.class);
+      } catch (IOException e) {
+        throw new IllegalArgumentException("Unable to parse json object: " + jsonString, e);
+      }
+    }
+
+    private ObjectMapper objectMapper() {
+      if (this.objectMapper == null) {
+        synchronized (this) {
+          if (this.objectMapper == null) {
+            this.objectMapper =
+                newObjectMapperWith(RowJsonDeserializer.forSchema(this.schema));
 
 Review comment:
   But it doesn't matter 
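The objectMapper() method under review is the standard double-checked locking idiom: the field must be volatile so that a fully constructed object is published safely to other threads. A self-contained JDK sketch of the same pattern (StringBuilder stands in for the real ObjectMapper):

```java
public class LazyHolder {
  // volatile is what makes double-checked locking correct under the
  // Java Memory Model: without it, another thread could observe a
  // partially constructed object through the unsynchronized first check.
  private volatile StringBuilder parser;

  StringBuilder parser() {
    if (parser == null) {              // first check: no lock taken
      synchronized (this) {
        if (parser == null) {          // second check: under the lock
          parser = new StringBuilder("ready");
        }
      }
    }
    return parser;
  }
}
```

After the first call, every subsequent call returns the cached instance without taking the lock.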

[jira] [Work logged] (BEAM-4113) Periodic cleanup and setup of Python environment on Jenkins machines

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4113?focusedWorklogId=94911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94911
 ]

ASF GitHub Bot logged work on BEAM-4113:


Author: ASF GitHub Bot
Created on: 25/Apr/18 01:53
Start Date: 25/Apr/18 01:53
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #5162: [BEAM-4113] Install 
latest pip/virtualenv in Gradle Python precommit.
URL: https://github.com/apache/beam/pull/5162#issuecomment-384136007
 
 
   Yep. Closing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94911)
Time Spent: 0.5h  (was: 20m)

> Periodic cleanup and setup of Python environment on Jenkins machines
> 
>
> Key: BEAM-4113
> URL: https://issues.apache.org/jira/browse/BEAM-4113
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Instead of having Python testing scripts tidy up the environment every time 
> they run, let's have a single script that runs once per day and does:
>  # Check if pip is installed under ~/.local and is the expected version. If 
> not, install the correct version.
>  # Same as above, but for virtualenv.
>  # Print a list of packages installed under ~/.local.
>  # Optional: Uninstall any unexpected packages under ~/.local. For example, 
> apache-beam shouldn't be installed under ~/.local, since 2 concurrent tests 
> on the same machine will attempt to install to the same location.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4113) Periodic cleanup and setup of Python environment on Jenkins machines

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4113?focusedWorklogId=94912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94912
 ]

ASF GitHub Bot logged work on BEAM-4113:


Author: ASF GitHub Bot
Created on: 25/Apr/18 01:53
Start Date: 25/Apr/18 01:53
Worklog Time Spent: 10m 
  Work Description: udim closed pull request #5162: [BEAM-4113] Install 
latest pip/virtualenv in Gradle Python precommit.
URL: https://github.com/apache/beam/pull/5162
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/build.gradle b/sdks/python/build.gradle
index 5984b35f791..7cb82062d7d 100644
--- a/sdks/python/build.gradle
+++ b/sdks/python/build.gradle
@@ -30,9 +30,38 @@ def tox_opts = "-c tox.ini --recreate"
 
 task setupVirtualenv {
   doLast {
-exec {
-  commandLine 'virtualenv', "${envdir}"
+if (System.getenv('JENKINS_URL') != null) {
+  // Upgrade pip and install virtualenv on a Jenkins machine.
+  exec {
+executable 'sh'
+args '-c', 'python -m pip install --user --upgrade pip'
+  }
+  exec {
+executable 'sh'
+args '-c', 'python -m pip install --user --upgrade virtualenv'
+  }
+  // Create virtual environment on a Jenkins machine.
+  exec {
+executable 'sh'
+args '-c', 'python --version; python -m pip --version; echo -n 
"virtualenv "; python -m virtualenv --version'
+  }
+  exec {
+executable 'sh'
+args '-c', "python -m virtualenv ${envdir}"
+  }
+} else {
+  // Create virtual environment on a development machine. Development
+  // machines should have correct pip and virtualenv versions installed.
+  exec {
+executable 'sh'
+args '-c', 'python --version; pip --version; echo -n "virtualenv "; 
virtualenv --version'
+  }
+  exec {
+executable 'sh'
+args '-c', "virtualenv ${envdir}"
+  }
 }
+
 exec {
   executable 'sh'
   args '-c', ". ${envdir}/bin/activate && pip install --upgrade tox==3.0.0 
grpcio-tools==1.3.5"


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94912)
Time Spent: 40m  (was: 0.5h)

> Periodic cleanup and setup of Python environment on Jenkins machines
> 
>
> Key: BEAM-4113
> URL: https://issues.apache.org/jira/browse/BEAM-4113
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Instead of having Python testing scripts tidy up the environment every time 
> they run, let's have a single script that runs once per day and does:
>  # Check if pip is installed under ~/.local and is the expected version. If 
> not, install the correct version.
>  # Same as above, but for virtualenv.
>  # Print a list of packages installed under ~/.local.
>  # Optional: Uninstall any unexpected packages under ~/.local. For example, 
> apache-beam shouldn't be installed under ~/.local, since 2 concurrent tests 
> on the same machine will attempt to install to the same location.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2163) Remove the dependency on examples from ptransform_test

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2163?focusedWorklogId=94904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94904
 ]

ASF GitHub Bot logged work on BEAM-2163:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:54
Start Date: 25/Apr/18 00:54
Worklog Time Spent: 10m 
  Work Description: JavierAntonioGonzalezTrejo commented on issue #5199: 
[BEAM-2163] Remove the dependency on examples from ptransform_test
URL: https://github.com/apache/beam/pull/5199#issuecomment-384125792
 
 
   Hi Pablo
   I put the class in a general snippets file inside the io folder. I named 
the file snippets in case any other function or class is required in the 
io folder but does not make sense under any other file.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94904)
Time Spent: 50m  (was: 40m)

> Remove the dependency on examples from ptransform_test
> --
>
> Key: BEAM-2163
> URL: https://issues.apache.org/jira/browse/BEAM-2163
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sourabh Bajaj
>Priority: Major
>  Labels: newbie, starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform_test.py#L176
> This validates runner test depends on the Counting source snippet and the 
> source should be copied here instead of this dependency.
> The actual beam code should not depend on the examples package at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #181

2018-04-24 Thread Apache Jenkins Server
See 


--
[...truncated 19.62 MB...]
INFO: 2018-04-25T00:47:14.516Z: Fusing consumer 
PAssert$3/CreateActual/Flatten.Iterables/FlattenIterables/FlatMap into 
PAssert$3/CreateActual/ExtractPane/Map
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.566Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/Values/Values/Map into 
PAssert$3/CreateActual/GatherPanes/GroupByKey/GroupByWindow
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.593Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/GroupByKey/GroupByWindow into 
PAssert$3/CreateActual/GatherPanes/GroupByKey/Read
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.617Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/GroupByKey/Reify into 
PAssert$3/CreateActual/GatherPanes/Window.Into()/Window.Assign
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.651Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/GroupByKey/Write into 
PAssert$3/CreateActual/GatherPanes/GroupByKey/Reify
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.671Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/Window.Into()/Window.Assign into 
PAssert$3/CreateActual/GatherPanes/WithKeys/AddKeys/Map
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.697Z: Unzipping flatten s18-u63 for input 
s19.output-c61
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.726Z: Fusing unzipped copy of 
PAssert$3/CreateActual/GatherPanes/Reify.Window/ParDo(Anonymous), through 
flatten s18-u63, into producer 
PAssert$3/CreateActual/FilterActuals/Window.Assign
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.757Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/Reify.Window/ParDo(Anonymous) into 
PAssert$3/CreateActual/FilterActuals/Window.Assign
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.777Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow)
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.810Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.836Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.872Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:47:14.898Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
Apr 25, 2018 12:47:17 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 

Build failed in Jenkins: beam_PerformanceTests_Spark #1632

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 85.37 KB...]
2018-04-25 00:17:04,088 073d9fd4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-25 00:17:29,143 073d9fd4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-25 00:17:32,717 073d9fd4 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r3f5510e62ff105ad_0162fa298c8b_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_r3f5510e62ff105ad_0162fa298c8b_1 
... (0s) Current status: RUNNING
  Waiting on 
bqjob_r3f5510e62ff105ad_0162fa298c8b_1 ... (0s) Current status: DONE   
2018-04-25 00:17:32,718 073d9fd4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-25 00:17:58,647 073d9fd4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-25 00:18:02,175 073d9fd4 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r31e14ef016565f2c_0162fa29ffed_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_r31e14ef016565f2c_0162fa29ffed_1 
... (0s) Current status: RUNNING
  Waiting on 
bqjob_r31e14ef016565f2c_0162fa29ffed_1 ... (0s) Current status: DONE   
2018-04-25 00:18:02,176 073d9fd4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-25 00:18:22,590 073d9fd4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-25 00:18:26,212 073d9fd4 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r548fbc5e2bd27b9e_0162fa2a5d57_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: Upload complete.Waiting on bqjob_r548fbc5e2bd27b9e_0162fa2a5d57_1 
... (0s) Current status: RUNNING
  Waiting on 
bqjob_r548fbc5e2bd27b9e_0162fa2a5d57_1 ... (0s) Current status: DONE   
2018-04-25 00:18:26,213 073d9fd4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.
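[Editor's note: the retry loop above fails the same way every time because `bq load --autodetect` infers the `timestamp` field as FLOAT when the JSON carries epoch seconds as a number, which conflicts with the table's existing TIMESTAMP column. One way out is to normalize the field to a timestamp string before loading. A minimal sketch; the function name and row shape are hypothetical, not part of PerfKit Benchmarker:]

```python
from datetime import datetime, timezone

def normalize_timestamps(rows, field="timestamp"):
    """Convert numeric epoch-seconds values to BigQuery-style timestamp
    strings so schema autodetection infers TIMESTAMP, not FLOAT."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are untouched
        value = row.get(field)
        if isinstance(value, (int, float)):
            row[field] = datetime.fromtimestamp(value, tz=timezone.utc).strftime(
                "%Y-%m-%d %H:%M:%S.%f UTC"
            )
        out.append(row)
    return out

# Hypothetical pkb_results row with a float epoch timestamp.
rows = [{"metric": "write_time", "value": 12.3, "timestamp": 1524614252.718}]
print(normalize_timestamps(rows)[0]["timestamp"])
```

[Alternatively, dropping `--autodetect` and passing an explicit schema to `bq load` would pin the column type regardless of how the JSON encodes it.]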

2018-04-25 00:18:41,674 073d9fd4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-25 00:18:45,238 073d9fd4 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job

[jira] [Closed] (BEAM-4169) Modify max worker of jenkins build

2018-04-24 Thread yifan zou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou closed BEAM-4169.
---
   Resolution: Duplicate
Fix Version/s: Not applicable

> Modify max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4169) Modify max worker of jenkins build

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?focusedWorklogId=94893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94893
 ]

ASF GitHub Bot logged work on BEAM-4169:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:24
Start Date: 25/Apr/18 00:24
Worklog Time Spent: 10m 
  Work Description: yifanzou closed pull request #5222: BEAM-4169, remove 
max worker for Jenkins build
URL: https://github.com/apache/beam/pull/5222
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/jenkins/common_job_properties.groovy 
b/.test-infra/jenkins/common_job_properties.groovy
index 1ab3a2f9acd..b13901c7644 100644
--- a/.test-infra/jenkins/common_job_properties.groovy
+++ b/.test-infra/jenkins/common_job_properties.groovy
@@ -170,10 +170,7 @@ class common_job_properties {
 "--info",
 // Continue the build even if there is a failure to show as many potential 
failures as possible.
 '--continue',
-// Limit background number of workers to prevent exhausting machine memory.
-// Jenkins machines have 15GB memory, and run 2 jobs in parallel; workers 
are configured with
-// JVM max heap size 3.5GB. So 2 jobs * 2 workers * 3.5GB heap = 14GB
-'--max-workers=2',
+'--scan',
   ]
 
   static void setGradleSwitches(context) {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94893)
Time Spent: 40m  (was: 0.5h)

> Modify max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
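[Editor's note: the memory arithmetic in the comment the diff removes (2 jobs * 2 workers * 3.5 GB heap = 14 GB, against 15 GB machines) and in the later JIRA description (4 workers * 2 jobs * 3.5 GB = 28 GB, against the new 104 GB machines) can be sanity-checked in a few lines. A sketch; the function name is hypothetical:]

```python
def gradle_heap_budget_gb(jobs, workers_per_job, heap_gb):
    """Worst-case JVM heap demand when every parallel Jenkins job
    runs its full complement of Gradle workers at once."""
    return jobs * workers_per_job * heap_gb

# Old executors: 15 GB RAM, 2 parallel jobs, --max-workers=2,
# 3.5 GB max heap per worker (figures from the removed comment).
old = gradle_heap_budget_gb(jobs=2, workers_per_job=2, heap_gb=3.5)
assert old == 14.0 and old <= 15  # fits, with ~1 GB headroom

# New executors: 104 GB RAM, so uncapping workers (e.g. 4 per job)
# leaves ample headroom, which is why the cap could be dropped.
new = gradle_heap_budget_gb(jobs=2, workers_per_job=4, heap_gb=3.5)
assert new == 28.0 and new <= 104
print(old, new)
```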


Jenkins build is back to normal : beam_PerformanceTests_JDBC #493

2018-04-24 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread yifan zou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-4169:

Description: (was: 16 new machines with 16 CPUs and 104 GB MEM are 
created for beam Jenkins. We have large enough memory to run gradle build with 
maximum workers. 4 workers * 2 jobs * 3.5GB = 28GB)

> Remove max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_XmlIOIT_HDFS #89

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 258.16 KB...]
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:249)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:236)
at 
org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy62.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at 

Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #91

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 37.72 KB...]
[INFO] Excluding io.grpc:grpc-protobuf:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.api:api-common:jar:1.0.0-rc2 from the shaded jar.
[INFO] Excluding com.google.api:gax:jar:1.3.1 from the shaded jar.
[INFO] Excluding org.threeten:threetenbp:jar:1.3.3 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core-grpc:jar:1.2.0 from the 
shaded jar.
[INFO] Excluding com.google.protobuf:protobuf-java-util:jar:3.2.0 from the 
shaded jar.
[INFO] Excluding com.google.code.gson:gson:jar:2.7 from the shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-pubsub:jar:v1-rev382-1.23.0 from the shaded 
jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 from the 
shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-protobuf:jar:1.23.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson:jar:1.23.0 
from the shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-auth:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-netty:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http2:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-buffer:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-common:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-transport:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-resolver:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-all:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-okhttp:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.squareup.okhttp:okhttp:jar:2.5.0 from the shaded jar.
[INFO] Excluding com.squareup.okio:okio:jar:1.6.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-lite:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-nano:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 
from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding commons-logging:commons-logging:jar:1.2 from the shaded jar.
[INFO] Excluding 

[jira] [Updated] (BEAM-4169) Modify max worker of jenkins build

2018-04-24 Thread yifan zou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-4169:

Summary: Modify max worker of jenkins build  (was: Remove max worker of 
jenkins build)

> Modify max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Compressed_TextIOIT_HDFS #90

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 77.37 KB...]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.copy(HadoopFileSystem.java:131)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:301)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy62.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.copy(HadoopFileSystem.java:131)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:301)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
at 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #179

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[wcn] Drain source when user function processing fails.

--
[...truncated 22.03 MB...]
INFO: 2018-04-25T00:14:48.904Z: Unzipping flatten s18-u63 for input 
s19.output-c61
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:48.947Z: Fusing unzipped copy of 
PAssert$3/CreateActual/GatherPanes/Reify.Window/ParDo(Anonymous), through 
flatten s18-u63, into producer 
PAssert$3/CreateActual/FilterActuals/Window.Assign
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:48.986Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/Reify.Window/ParDo(Anonymous) into 
PAssert$3/CreateActual/FilterActuals/Window.Assign
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.011Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow)
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.040Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.065Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.097Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.133Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.176Z: Fusing consumer 
PAssert$3/CreateActual/RewindowActuals/Window.Assign into 
PAssert$3/CreateActual/Flatten.Iterables/FlattenIterables/FlatMap
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.211Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Reify
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey+PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Partial
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.251Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
Apr 25, 2018 12:14:56 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:14:49.279Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Write
 into 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #180

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[wcn] Allow request and init hooks to update the context.

--
[...truncated 19.13 MB...]
Apr 25, 2018 12:15:33 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.412Z: Fusing consumer 
PAssert$3/CreateActual/GatherPanes/Reify.Window/ParDo(Anonymous) into 
PAssert$3/CreateActual/FilterActuals/Window.Assign
Apr 25, 2018 12:15:33 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.437Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow)
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Apr 25, 2018 12:15:33 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.461Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Apr 25, 2018 12:15:33 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.482Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
Apr 25, 2018 12:15:33 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.509Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
 into 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
Apr 25, 2018 12:15:33 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.537Z: Fusing consumer 
PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract into PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
Apr 25, 2018 12:15:33 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.572Z: Fusing consumer PAssert$3/CreateActual/RewindowActuals/Window.Assign into PAssert$3/CreateActual/Flatten.Iterables/FlattenIterables/FlatMap
Apr 25, 2018 12:15:33 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.604Z: Fusing consumer PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Reify into PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey+PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Partial
Apr 25, 2018 12:15:33 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.624Z: Fusing consumer PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues into PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
Apr 25, 2018 12:15:33 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.645Z: Fusing consumer PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Write into PAssert$3/CreateActual/View.AsSingleton/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Reify
Apr 25, 2018 12:15:33 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-25T00:15:26.671Z: Fusing consumer 

[jira] [Work logged] (BEAM-4125) Add a library to manipulate the proto representation of a pipeline

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4125?focusedWorklogId=94887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94887
 ]

ASF GitHub Bot logged work on BEAM-4125:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:15
Start Date: 25/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5172: 
[BEAM-4125] Add ProtoOverrides
URL: https://github.com/apache/beam/pull/5172#discussion_r183914531
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java
 ##
 @@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static org.hamcrest.Matchers.contains;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasKey;
+import static org.hamcrest.Matchers.not;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import com.google.protobuf.ByteString;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.AccumulationMode.Enum;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Coder;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ComponentsOrBuilder;
+import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec;
+import org.apache.beam.model.pipeline.v1.RunnerApi.MessageWithComponents;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
+import org.apache.beam.model.pipeline.v1.RunnerApi.SdkFunctionSpec;
+import org.apache.beam.model.pipeline.v1.RunnerApi.WindowingStrategy;
+import org.apache.beam.runners.core.construction.graph.ProtoOverrides.TransformReplacement;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/**
+ * Tests for {@link ProtoOverrides}.
+ */
+@RunWith(JUnit4.class)
+public class ProtoOverridesTest {
+  @Rule public ExpectedException thrown = ExpectedException.none();
+
+  @Test
+  public void replacesOnlyMatching() {
+    PTransform firstTransform =
+        PTransform.newBuilder().setSpec(FunctionSpec.newBuilder().setUrn("beam:first")).build();
+    RunnerApi.Pipeline p =
+        Pipeline.newBuilder()
+            .addAllRootTransformIds(ImmutableList.of("first", "second"))
+            .setComponents(
+                Components.newBuilder()
+                    .putTransforms("first", firstTransform)
+                    .putTransforms(
+                        "second",
+                        PTransform.newBuilder()
+                            .setSpec(FunctionSpec.newBuilder().setUrn("beam:second"))
+                            .build())
+                    .putPcollections(
+                        "intermediatePc",
+                        PCollection.newBuilder().setUniqueName("intermediate").build())
+                    .putCoders(
+                        "coder",
+                        Coder.newBuilder().setSpec(SdkFunctionSpec.getDefaultInstance()).build()))
+            .build();
+
+    PTransform secondReplacement =
+        PTransform.newBuilder()
+            .addSubtransforms("second_sub")
+            .setSpec(
+                FunctionSpec.newBuilder()
+                    .setUrn("beam:second:replacement")
+                    .setPayload(ByteString.copyFrom("foo-bar-baz".getBytes(
+            .build();
+    WindowingStrategy introducedWS =
+        WindowingStrategy.newBuilder().setAccumulationMode(Enum.ACCUMULATING).build();
+    RunnerApi.Components extraComponents =
+        Components.newBuilder()
+            .putPcollections(
+                "intermediatePc",
+                PCollection.newBuilder().setUniqueName("intermediate_replacement").build())
+            .putWindowingStrategies("new_ws", 

[jira] [Work logged] (BEAM-4125) Add a library to manipulate the proto representation of a pipeline

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4125?focusedWorklogId=94890=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94890
 ]

ASF GitHub Bot logged work on BEAM-4125:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:15
Start Date: 25/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5172: 
[BEAM-4125] Add ProtoOverrides
URL: https://github.com/apache/beam/pull/5172#discussion_r183915518
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java
 ##
 @@ -0,0 +1,253 @@

[jira] [Work logged] (BEAM-4125) Add a library to manipulate the proto representation of a pipeline

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4125?focusedWorklogId=94889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94889
 ]

ASF GitHub Bot logged work on BEAM-4125:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:15
Start Date: 25/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5172: 
[BEAM-4125] Add ProtoOverrides
URL: https://github.com/apache/beam/pull/5172#discussion_r183915612
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java
 ##
 @@ -0,0 +1,253 @@
+@RunWith(JUnit4.class)
+public class ProtoOverridesTest {
+  @Rule public ExpectedException thrown = ExpectedException.none();
 
 Review comment:
   I don't see this being used anywhere in the test. Can it be removed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94889)
Time Spent: 40m  (was: 0.5h)

> Add a library to manipulate the proto representation of a pipeline
> --
>
> Key: BEAM-4125
> URL: https://issues.apache.org/jira/browse/BEAM-4125
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is important for a transform which includes in-environment transforms 
> (such as a lifted Combine), or for runners which use the Beam representation
> as their internal representation (such as the DirectRunner).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4125) Add a library to manipulate the proto representation of a pipeline

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4125?focusedWorklogId=94888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94888
 ]

ASF GitHub Bot logged work on BEAM-4125:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:15
Start Date: 25/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5172: 
[BEAM-4125] Add ProtoOverrides
URL: https://github.com/apache/beam/pull/5172#discussion_r183907831
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ProtoOverrides.java
 ##
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.ComponentsOrBuilder;
+import org.apache.beam.model.pipeline.v1.RunnerApi.MessageWithComponents;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.runners.PTransformOverride;
+
+/**
+ * A way to apply a Proto-based {@link PTransformOverride}.
+ *
+ * This should generally be used to replace runner-executed transforms with runner-executed
+ * composites and simpler runner-executed primitives. It is generically less powerful than the
+ * native {@link org.apache.beam.sdk.Pipeline#replaceAll(List)} and more error-prone, so should only
+ * be used for relatively simple replacements.
+ */
+@Experimental
+public class ProtoOverrides {
+  /**
+   * Update all composites present in the {@code originalPipeline} have an URN equal to the provided
+   * {@code urn} using the provide {@link TransformReplacement}.
+   */
+  public static Pipeline updateTransform(
+      String urn, Pipeline originalPipeline, TransformReplacement compositeBuilder) {
+    Components.Builder resultComponents = originalPipeline.getComponents().toBuilder();
+    for (Map.Entry pt :
+        originalPipeline.getComponents().getTransformsMap().entrySet()) {
+      if (pt.getValue().getSpec() != null && urn.equals(pt.getValue().getSpec().getUrn())) {
+        MessageWithComponents updated =
+            compositeBuilder.getReplacement(pt.getKey(), originalPipeline.getComponents());
+        checkArgument(
+            updated.getPtransform().getOutputsMap().equals(pt.getValue().getOutputsMap()),
+            "A %s must produce all of the outputs of the original %s",
+            TransformReplacement.class.getSimpleName(),
+            PTransform.class.getSimpleName());
+        removeSubtransforms(pt.getValue(), resultComponents);
+        resultComponents
+            .mergeFrom(updated.getComponents())
+            .putTransforms(pt.getKey(), updated.getPtransform());
+      }
+    }
+    return originalPipeline.toBuilder().setComponents(resultComponents).build();
+  }
+
+  /**
+   * Remove any subtransforms that are produced by them but not by the 
enclosing transform.
 
 Review comment:
   Awkward function description; I don't know what "them" refers to.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94888)
Time Spent: 0.5h  (was: 20m)

> Add a library to manipulate the proto representation of a pipeline
> --
>
> Key: BEAM-4125
> URL: https://issues.apache.org/jira/browse/BEAM-4125
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: 

[jira] [Work logged] (BEAM-4125) Add a library to manipulate the proto representation of a pipeline

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4125?focusedWorklogId=94892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94892
 ]

ASF GitHub Bot logged work on BEAM-4125:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:15
Start Date: 25/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5172: 
[BEAM-4125] Add ProtoOverrides
URL: https://github.com/apache/beam/pull/5172#discussion_r183908747
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ProtoOverrides.java
 ##
 @@ -0,0 +1,103 @@
+/**
+ * A way to apply a Proto-based {@link PTransformOverride}.
+ *
+ * This should generally be used to replace runner-executed transforms with runner-executed
+ * composites and simpler runner-executed primitives. It is generically less powerful than the
+ * native {@link org.apache.beam.sdk.Pipeline#replaceAll(List)} and more error-prone, so should only
+ * be used for relatively simple replacements.
+ */
+@Experimental
+public class ProtoOverrides {
+  /**
+   * Update all composites present in the {@code originalPipeline} have an URN equal to the provided
 
 Review comment:
   This description is awkwardly worded, sounds like there's some small typos.
   
   Specifically:
   `{@code originalPipeline} have` -> `{@code originalPipeline} that have`
   `the provide {@link TransformReplacement}` -> `the provided {@link 
TransformReplacement}`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94892)
Time Spent: 1h  (was: 50m)

> Add a library to manipulate the proto representation of a pipeline
> --
>
> Key: BEAM-4125
> URL: https://issues.apache.org/jira/browse/BEAM-4125
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is important for a transform which includes in-environment transforms 
> (such as a lifted Combine), or for runners which use the Beam representation
> as their internal representation (such as the DirectRunner).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
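The `updateTransform` method quoted in the reviews above walks every transform in the pipeline proto and swaps in a replacement only when the spec URN matches. A minimal, dependency-free sketch of that pattern follows; plain strings stand in for the `RunnerApi` proto messages, so the class and parameter names here are illustrative, not Beam's API (the real method also merges replacement components and checks that outputs are preserved):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ProtoOverrides.updateTransform pattern: iterate all
// transforms, replacing only those whose URN matches. A Map<id, urn>
// stands in for the pipeline proto's transform components.
public class ProtoOverridesSketch {
  public static Map<String, String> updateTransform(
      String urn, Map<String, String> transformsById, String replacementUrn) {
    Map<String, String> result = new HashMap<>(transformsById);
    for (Map.Entry<String, String> pt : transformsById.entrySet()) {
      // Transforms whose URN does not match pass through untouched.
      if (urn.equals(pt.getValue())) {
        result.put(pt.getKey(), replacementUrn);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> transforms = new HashMap<>();
    transforms.put("first", "beam:first");
    transforms.put("second", "beam:second");
    Map<String, String> updated =
        updateTransform("beam:second", transforms, "beam:second:replacement");
    // Only "second" is rewritten; "first" keeps its original URN.
    System.out.println(updated);
  }
}
```

This mirrors the `replacesOnlyMatching` test case in the quoted diff: non-matching transforms must survive the override unchanged.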


[jira] [Work logged] (BEAM-4125) Add a library to manipulate the proto representation of a pipeline

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4125?focusedWorklogId=94891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94891
 ]

ASF GitHub Bot logged work on BEAM-4125:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:15
Start Date: 25/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #5172: 
[BEAM-4125] Add ProtoOverrides
URL: https://github.com/apache/beam/pull/5172#discussion_r183914417
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/ProtoOverridesTest.java
 ##
 @@ -0,0 +1,253 @@
+@RunWith(JUnit4.class)
+public class ProtoOverridesTest {
+  @Rule public ExpectedException thrown = ExpectedException.none();
+
+  @Test
+  public void replacesOnlyMatching() {
+PTransform firstTransform =
 
 Review comment:
   I think you can avoid declaring this as a separate variable to lower the 
mental load.
   
   You could define it inline while creating the pipeline (like you do with 
"second") and then use getTransformsOrThrow to retrieve it at the end for the 
assert.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94891)
Time Spent: 50m  (was: 40m)

> Add a library to manipulate the proto representation of a pipeline
> --
>
> Key: BEAM-4125
> URL: https://issues.apache.org/jira/browse/BEAM-4125
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is important for a transform which includes in-environment transforms 
> (such as a lifted Combine), or for runners which use the Beam representation
> as their internal representation (such as the DirectRunner).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
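The refactoring suggested in the review comment above — define the transform inline while building the components, then fetch it back by id for the assertion — can be sketched without Beam. Here `getTransformsOrThrow` is a hypothetical stand-in for the proto-generated map accessor of the same name, and a plain `Map` plays the role of the pipeline's transform components:

```java
import java.util.Map;

// Sketch of the reviewer's suggestion: rather than holding "firstTransform"
// in a local variable through the whole test, define it inline and look it
// up by id only where the assertion needs it. getTransformsOrThrow mimics
// the accessor generated for proto map fields.
public class InlineLookupSketch {
  static String getTransformsOrThrow(Map<String, String> transforms, String id) {
    String t = transforms.get(id);
    if (t == null) {
      throw new IllegalArgumentException("No transform with id: " + id);
    }
    return t;
  }

  public static void main(String[] args) {
    // Transforms defined inline, as the review suggests.
    Map<String, String> components =
        Map.of("first", "beam:first", "second", "beam:second");
    // Retrieved by id at assertion time.
    String first = getTransformsOrThrow(components, "first");
    System.out.println(first);
  }
}
```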


Build failed in Jenkins: beam_PerformanceTests_TextIOIT_HDFS #96

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 238.39 KB...]
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:249)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:236)
at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy62.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy63.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 

Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #182

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 48.14 KB...]
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the shaded jar.
[INFO] Excluding io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-common-protos:jar:0.1.9 from the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client:jar:1.23.0 from the shaded jar.
[INFO] Excluding com.google.oauth-client:google-oauth-client:jar:1.23.0 from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client:jar:1.23.0 from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson2:jar:1.23.0 from the shaded jar.
[INFO] Excluding com.google.apis:google-api-services-dataflow:jar:v1b3-rev221-1.23.0 from the shaded jar.
[INFO] Excluding com.google.apis:google-api-services-clouddebugger:jar:v2-rev233-1.23.0 from the shaded jar.
[INFO] Excluding com.google.apis:google-api-services-storage:jar:v1-rev124-1.23.0 from the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-credentials:jar:0.7.1 from the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-oauth2-http:jar:0.7.1 from the shaded jar.
[INFO] Excluding com.google.cloud.bigdataoss:util:jar:1.4.5 from the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client-java6:jar:1.23.0 from the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client-jackson2:jar:1.23.0 from the shaded jar.
[INFO] Excluding com.google.oauth-client:google-oauth-client-java6:jar:1.23.0 from the shaded jar.
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing 

 with 

[INFO] Replacing original test artifact with shaded test artifact.
[INFO] Replacing 

 with 

[INFO] Dependency-reduced POM written at: 

[INFO] 
[INFO] --- maven-failsafe-plugin:2.21.0:integration-test (default) @ 
beam-sdks-java-io-hadoop-input-format ---
[INFO] Failsafe report directory: 

[INFO] parallel='all', perCoreThreadCount=true, threadCount=4, 
useUnlimitedThreads=false, threadCountSuites=0, threadCountClasses=0, 
threadCountMethods=0, parallelOptimized=true
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0 s <<< 
FAILURE! - in org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT
[ERROR] 

[jira] [Work logged] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?focusedWorklogId=94885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94885
 ]

ASF GitHub Bot logged work on BEAM-4169:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:11
Start Date: 25/Apr/18 00:11
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5222: BEAM-4169, remove 
max worker for Jenkins build
URL: https://github.com/apache/beam/pull/5222#issuecomment-384119352
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94885)
Time Spent: 0.5h  (was: 20m)

> Remove max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 16 new machines with 16 CPUs and 104 GB of memory each were created for Beam 
> Jenkins. We have more than enough memory to run the Gradle build with the 
> maximum number of workers: 4 workers * 2 jobs * 3.5 GB = 28 GB.
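The headroom claim above can be checked with a quick back-of-envelope
calculation. The ~3.5 GB-per-worker figure comes from the ticket itself; it is
an estimate of the Gradle worker JVM footprint, not a measured value:

```python
# Peak memory if every concurrent Gradle build runs at maximum workers.
# Assumes ~3.5 GB per worker JVM, as estimated in the ticket.
workers_per_build = 4
concurrent_jobs = 2
gb_per_worker = 3.5

peak_gb = workers_per_build * concurrent_jobs * gb_per_worker
machine_mem_gb = 104  # new Jenkins executors: 16 CPUs, 104 GB

print("peak: %.1f GB of %d GB" % (peak_gb, machine_mem_gb))
# -> peak: 28.0 GB of 104 GB
```

Even at full parallelism, the build uses barely a quarter of a node's memory,
which is the justification for dropping the max-worker cap.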



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Python #1189

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[rangadi] Disable flaky unbounded pipeline test

[wcn] Drain source when user function processing fails.

[kedin] Add primitive java types support to Row generation logic, add example

[aaltay] Unpinning Python jobs from Jenkins machines. (#5214)

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[wcn] Allow request and init hooks to update the context.

--
[...truncated 4.30 KB...]
Collecting contextlib2>=0.5.1 (from -r PerfKitBenchmarker/requirements.txt 
(line 24))
  Using cached 
https://files.pythonhosted.org/packages/a2/71/8273a7eeed0aff6a854237ab5453bc9aa67deb49df4832801c21f0ff3782/contextlib2-0.5.5-py2.py3-none-any.whl
Collecting pywinrm (from -r PerfKitBenchmarker/requirements.txt (line 25))
  Using cached 
https://files.pythonhosted.org/packages/0d/12/13a3117bbd2230043aa32dcfa2198c33269665eaa1a8fa26174ce49b338f/pywinrm-0.3.0-py2.py3-none-any.whl
Requirement already satisfied: six in /usr/local/lib/python2.7/dist-packages 
(from absl-py->-r PerfKitBenchmarker/requirements.txt (line 14)) (1.11.0)
Requirement already satisfied: MarkupSafe>=0.23 in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15)) (1.0)
Collecting colorama; extra == "windows" (from colorlog[windows]==2.6.0->-r 
PerfKitBenchmarker/requirements.txt (line 17))
  Using cached 
https://files.pythonhosted.org/packages/db/c8/7dcf9dbcb22429512708fe3a547f8b6101c0d02137acbd892505aee57adf/colorama-0.3.9-py2.py3-none-any.whl
Collecting requests-ntlm>=0.3.0 (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
  Using cached 
https://files.pythonhosted.org/packages/03/4b/8b9a1afde8072c4d5710d9fa91433d504325821b038e00237dc8d6d833dc/requests_ntlm-1.1.0-py2.py3-none-any.whl
Requirement already satisfied: requests>=2.9.1 in 
/usr/local/lib/python2.7/dist-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (2.18.4)
Collecting xmltodict (from pywinrm->-r PerfKitBenchmarker/requirements.txt 
(line 25))
  Using cached 
https://files.pythonhosted.org/packages/42/a9/7e99652c6bc619d19d58cdd8c47560730eb5825d43a7e25db2e1d776ceb7/xmltodict-0.11.0-py2.py3-none-any.whl
Requirement already satisfied: cryptography>=1.3 in 
/usr/local/lib/python2.7/dist-packages (from requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (2.2.2)
Collecting ntlm-auth>=1.0.2 (from requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
  Using cached 
https://files.pythonhosted.org/packages/69/bc/230987c0dc22c763529330b2e669dbdba374d6a10c1f61232274184731be/ntlm_auth-1.1.0-py2.py3-none-any.whl
Requirement already satisfied: certifi>=2017.4.17 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (2018.4.16)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (3.0.4)
Requirement already satisfied: idna<2.7,>=2.5 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in 
/usr/local/lib/python2.7/dist-packages (from requests>=2.9.1->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (1.22)
Requirement already satisfied: cffi>=1.7; platform_python_implementation != 
"PyPy" in /usr/local/lib/python2.7/dist-packages (from 
cryptography>=1.3->requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (1.11.5)
Requirement already satisfied: enum34; python_version < "3" in 
/usr/local/lib/python2.7/dist-packages (from 
cryptography>=1.3->requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (1.1.6)
Requirement already satisfied: asn1crypto>=0.21.0 in 
/usr/local/lib/python2.7/dist-packages (from 
cryptography>=1.3->requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (0.24.0)
Requirement already satisfied: ipaddress; python_version < "3" in 
/usr/local/lib/python2.7/dist-packages (from 
cryptography>=1.3->requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (1.0.22)
Requirement already satisfied: pycparser in 
/usr/local/lib/python2.7/dist-packages (from cffi>=1.7; 
platform_python_implementation != 
"PyPy"->cryptography>=1.3->requests-ntlm>=0.3.0->pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25)) (2.18)
Installing collected packages: absl-py, colorama, colorlog, blinker, futures, 
pint, numpy, contextlib2, ntlm-auth, requests-ntlm, xmltodict, 

[jira] [Work logged] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?focusedWorklogId=94884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94884
 ]

ASF GitHub Bot logged work on BEAM-4169:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:06
Start Date: 25/Apr/18 00:06
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5222: BEAM-4169, remove 
max worker for Jenkins build
URL: https://github.com/apache/beam/pull/5222#issuecomment-384118379
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94884)
Time Spent: 20m  (was: 10m)

> Remove max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 16 new machines with 16 CPUs and 104 GB of memory each were created for Beam 
> Jenkins. We have more than enough memory to run the Gradle build with the 
> maximum number of workers: 4 workers * 2 jobs * 3.5 GB = 28 GB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4150) Standardize use of PCollection coder proto attribute

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4150?focusedWorklogId=94883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94883
 ]

ASF GitHub Bot logged work on BEAM-4150:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:05
Start Date: 25/Apr/18 00:05
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #4867: [BEAM-4150] Augment 
Python SDK worker to handle unwindowed coders.
URL: https://github.com/apache/beam/pull/4867#issuecomment-384118266
 
 
   Jenkins: retest this please.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94883)
Time Spent: 0.5h  (was: 20m)

> Standardize use of PCollection coder proto attribute
> 
>
> Key: BEAM-4150
> URL: https://issues.apache.org/jira/browse/BEAM-4150
> Project: Beam
>  Issue Type: Task
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In some places it's expected to be a WindowedCoder, in others the raw 
> ElementCoder. We should use the same convention (decided in discussion to be 
> the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the 
> attached windowing strategy, and the input/output ports should specify the 
> encoding directly rather than read the adjacent PCollection coder fields. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4113) Periodic cleanup and setup of Python environment on Jenkins machines

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4113?focusedWorklogId=94882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94882
 ]

ASF GitHub Bot logged work on BEAM-4113:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:04
Start Date: 25/Apr/18 00:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #5162: [BEAM-4113] Install 
latest pip/virtualenv in Gradle Python precommit.
URL: https://github.com/apache/beam/pull/5162#issuecomment-384118015
 
 
   Should we close this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94882)
Time Spent: 20m  (was: 10m)

> Periodic cleanup and setup of Python environment on Jenkins machines
> 
>
> Key: BEAM-4113
> URL: https://issues.apache.org/jira/browse/BEAM-4113
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Instead of having Python testing scripts tidy up the environment every time 
> they run, let's have a single script that runs once per day and does:
>  # Check if pip is installed under ~/.local and is the expected version. If 
> not, install the correct version.
>  # Same as above, but for virtualenv.
>  # Print a list of packages installed under ~/.local.
>  # Optional: Uninstall any unexpected packages under ~/.local. For example, 
> apache-beam shouldn't be installed under ~/.local, since 2 concurrent tests 
> on the same machine will attempt to install to the same location.
>  
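A minimal sketch of steps 1-2 of that daily script, in Python. The version
pins in `EXPECTED` and the helper names are hypothetical illustrations, not
values taken from this issue or from the Jenkins configuration:

```python
import subprocess

# Hypothetical expected pins for packages under ~/.local; the real values
# would come from the Jenkins configuration, not from this issue.
EXPECTED = {"pip": "9.0.3", "virtualenv": "15.2.0"}

def installed_version(pkg):
    """Return pkg.__version__ for the interpreter's installation, or None."""
    try:
        out = subprocess.check_output(
            ["python", "-c", "import %s; print(%s.__version__)" % (pkg, pkg)])
        return out.decode().strip()
    except subprocess.CalledProcessError:
        return None

def plan_reinstalls(current, expected):
    """Steps 1-2: packages needing 'pip install --user pkg==version'."""
    return [(p, v) for p, v in expected.items() if current.get(p) != v]

# Step 3 would be `pip list --user`; step 4 (optional) would uninstall
# unexpected packages such as apache-beam from ~/.local.
```

Splitting the "what needs fixing" decision into `plan_reinstalls` keeps the
version-checking logic testable without touching the machine's environment.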



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3914) 'Unzip' flattens before performing fusion

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3914?focusedWorklogId=94876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94876
 ]

ASF GitHub Bot logged work on BEAM-3914:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:03
Start Date: 25/Apr/18 00:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4977: 
[BEAM-3914] Deduplicate Unzipped Flattens after Pipeline Fusion
URL: https://github.com/apache/beam/pull/4977#discussion_r183911558
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java
 ##
 @@ -0,0 +1,438 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.collect.Iterables.getOnlyElement;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasEntry;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasSize;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import 
org.apache.beam.runners.core.construction.graph.OutputDeduplicator.DeduplicationResult;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link OutputDeduplicator}. */
+@RunWith(JUnit4.class)
+public class OutputDeduplicatorTest {
+  @Test
+  public void unchangedWithNoDuplicates() {
+/* When all the PCollections are produced by only one transform or stage, 
the result should be
+ * empty//identical to the input.
+ */
+PTransform one =
+PTransform.newBuilder().putInputs("in", "red.out").putOutputs("out", 
"one.out").build();
+PCollection oneOut = 
PCollection.newBuilder().setUniqueName("one.out").build();
+PTransform two =
+PTransform.newBuilder().putInputs("in", "red.out").putOutputs("out", 
"two.out").build();
+PCollection twoOut = 
PCollection.newBuilder().setUniqueName("two.out").build();
+PTransform red = PTransform.newBuilder().putOutputs("out", 
"red.out").build();
+PCollection redOut = 
PCollection.newBuilder().setUniqueName("red.out").build();
+PTransform blue =
+PTransform.newBuilder()
+.putInputs("one", "one.out")
+.putInputs("two", "two.out")
+.putOutputs("out", "blue.out")
+.build();
+PCollection blueOut = 
PCollection.newBuilder().setUniqueName("blue.out").build();
+RunnerApi.Components components =
+Components.newBuilder()
+.putTransforms("one", one)
+.putPcollections("one.out", oneOut)
+.putTransforms("two", two)
+.putPcollections("two.out", twoOut)
+.putTransforms("red", red)
+.putPcollections("red.out", redOut)
+.putTransforms("blue", blue)
+.putPcollections("blue.out", blueOut)
+.build();
+ExecutableStage oneStage =
+ImmutableExecutableStage.of(
+components,
+Environment.getDefaultInstance(),
+PipelineNode.pCollection("red.out", redOut),
+ImmutableList.of(),
+ImmutableList.of(PipelineNode.pTransform("one", one)),
+ImmutableList.of(PipelineNode.pCollection("one.out", oneOut)));
+ExecutableStage twoStage =
+  

[jira] [Work logged] (BEAM-3914) 'Unzip' flattens before performing fusion

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3914?focusedWorklogId=94878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94878
 ]

ASF GitHub Bot logged work on BEAM-3914:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:03
Start Date: 25/Apr/18 00:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4977: 
[BEAM-3914] Deduplicate Unzipped Flattens after Pipeline Fusion
URL: https://github.com/apache/beam/pull/4977#discussion_r183911824
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java
 ##
 @@ -0,0 +1,438 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.collect.Iterables.getOnlyElement;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasEntry;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasSize;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import 
org.apache.beam.runners.core.construction.graph.OutputDeduplicator.DeduplicationResult;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link OutputDeduplicator}. */
+@RunWith(JUnit4.class)
+public class OutputDeduplicatorTest {
+  @Test
+  public void unchangedWithNoDuplicates() {
+/* When all the PCollections are produced by only one transform or stage, 
the result should be
+ * empty//identical to the input.
+ */
+PTransform one =
+PTransform.newBuilder().putInputs("in", "red.out").putOutputs("out", 
"one.out").build();
+PCollection oneOut = 
PCollection.newBuilder().setUniqueName("one.out").build();
+PTransform two =
+PTransform.newBuilder().putInputs("in", "red.out").putOutputs("out", 
"two.out").build();
+PCollection twoOut = 
PCollection.newBuilder().setUniqueName("two.out").build();
+PTransform red = PTransform.newBuilder().putOutputs("out", 
"red.out").build();
+PCollection redOut = 
PCollection.newBuilder().setUniqueName("red.out").build();
+PTransform blue =
+PTransform.newBuilder()
+.putInputs("one", "one.out")
+.putInputs("two", "two.out")
+.putOutputs("out", "blue.out")
+.build();
+PCollection blueOut = 
PCollection.newBuilder().setUniqueName("blue.out").build();
+RunnerApi.Components components =
+Components.newBuilder()
+.putTransforms("one", one)
+.putPcollections("one.out", oneOut)
+.putTransforms("two", two)
+.putPcollections("two.out", twoOut)
+.putTransforms("red", red)
+.putPcollections("red.out", redOut)
+.putTransforms("blue", blue)
+.putPcollections("blue.out", blueOut)
+.build();
+ExecutableStage oneStage =
+ImmutableExecutableStage.of(
+components,
+Environment.getDefaultInstance(),
+PipelineNode.pCollection("red.out", redOut),
+ImmutableList.of(),
+ImmutableList.of(PipelineNode.pTransform("one", one)),
+ImmutableList.of(PipelineNode.pCollection("one.out", oneOut)));
+ExecutableStage twoStage =
+  

[jira] [Work logged] (BEAM-3914) 'Unzip' flattens before performing fusion

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3914?focusedWorklogId=94880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94880
 ]

ASF GitHub Bot logged work on BEAM-3914:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:03
Start Date: 25/Apr/18 00:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4977: 
[BEAM-3914] Deduplicate Unzipped Flattens after Pipeline Fusion
URL: https://github.com/apache/beam/pull/4977#discussion_r183913153
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicator.java
 ##
 @@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.collect.HashMultimap;
+import com.google.common.collect.Multimap;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.LinkedHashMap;
+import java.util.LinkedHashSet;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.FunctionSpec;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+
+/**
+ * Utilities to insert synthetic {@link PCollectionNode PCollections} for 
{@link PCollection
+ * PCollections} which are produced by multiple independently executable 
stages.
+ */
+class OutputDeduplicator {
+
+  /**
+   * Ensure that no {@link PCollection} output by any of the {@code stages} or 
{@code
+   * unfusedTransforms} is produced by more than one of those stages or 
transforms.
+   *
+   * For each {@link PCollection} output by multiple stages and/or 
transforms, each producer is
+   * rewritten to produce a partial {@link PCollection}, which are then 
flattened together via an
+   * introduced Flatten node which produces the original output.
+   */
+  static DeduplicationResult ensureSingleProducer(
+  QueryablePipeline pipeline,
+  Collection<ExecutableStage> stages,
+  Collection<PTransformNode> unfusedTransforms) {
+RunnerApi.Components.Builder unzippedComponents = 
pipeline.getComponents().toBuilder();
+
+Multimap<PCollectionNode, StageOrTransform> pcollectionProducers =
+getProducers(pipeline, stages, unfusedTransforms);
+Multimap<StageOrTransform, PCollectionNode> requiresNewOutput = 
HashMultimap.create();
+// Create a synthetic PCollection for each of these nodes. The transforms 
in the runner
+// portion of the graph that creates them should be replaced in the result 
components. The
+// ExecutableStage must also be rewritten to have updated outputs and 
transforms.
+for (Map.Entry<PCollectionNode, Collection<StageOrTransform>> 
collectionProducer :
+pcollectionProducers.asMap().entrySet()) {
+  if (collectionProducer.getValue().size() > 1) {
+for (StageOrTransform producer : collectionProducer.getValue()) {
+  requiresNewOutput.put(producer, collectionProducer.getKey());
+}
+  }
+}
+
+Map<ExecutableStage, ExecutableStage> updatedStages = new 
LinkedHashMap<>();
+Map<PTransformNode, PTransformNode> updatedTransforms = new LinkedHashMap<>();
+Multimap<PCollectionNode, PCollectionNode> originalToPartial = 
HashMultimap.create();
+for (Map.Entry<StageOrTransform, Collection<PCollectionNode>> 
deduplicationTargets :
+requiresNewOutput.asMap().entrySet()) {
+  if (deduplicationTargets.getKey().getStage() != null) {
+StageDeduplication deduplication =
+   

[jira] [Work logged] (BEAM-3914) 'Unzip' flattens before performing fusion

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3914?focusedWorklogId=94877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94877
 ]

ASF GitHub Bot logged work on BEAM-3914:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:03
Start Date: 25/Apr/18 00:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4977: 
[BEAM-3914] Deduplicate Unzipped Flattens after Pipeline Fusion
URL: https://github.com/apache/beam/pull/4977#discussion_r183910579
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java
 ##
 @@ -0,0 +1,438 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.collect.Iterables.getOnlyElement;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasEntry;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasSize;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import 
org.apache.beam.runners.core.construction.graph.OutputDeduplicator.DeduplicationResult;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link OutputDeduplicator}. */
+@RunWith(JUnit4.class)
+public class OutputDeduplicatorTest {
+  @Test
+  public void unchangedWithNoDuplicates() {
+/* When all the PCollections are produced by only one transform or stage, 
the result should be
+ * empty//identical to the input.
 
 Review comment:
   single slash


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94877)
Time Spent: 2h 20m  (was: 2h 10m)

> 'Unzip' flattens before performing fusion
> -
>
> Key: BEAM-3914
> URL: https://issues.apache.org/jira/browse/BEAM-3914
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This consists of duplicating nodes downstream of a flatten that exist within 
> an environment, and reintroducing the flatten immediately upstream of a 
> runner-executed transform (the flatten should be executed within the runner)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
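[Editorial note: the unzip-and-reintroduce approach described in the issue above can be sketched in miniature. This is a hedged illustration only; the function name `unzipFlatten` and the string-based graph encoding are hypothetical, not Beam's actual API.]

```go
package main

import (
	"fmt"
	"strings"
)

// unzipFlatten sketches the BEAM-3914 idea: duplicate the fused downstream
// stage once per flatten input, then reintroduce a single runner-executed
// flatten that merges the duplicated outputs.
func unzipFlatten(flattenInputs []string, fusedStage string) string {
	copies := make([]string, 0, len(flattenInputs))
	for _, in := range flattenInputs {
		// One duplicated copy of the downstream stage per flatten input.
		copies = append(copies, fusedStage+"("+in+")")
	}
	// The flatten now sits downstream of the copies, to run in the runner.
	return "flatten[" + strings.Join(copies, ", ") + "]"
}

func main() {
	fmt.Println(unzipFlatten([]string{"a.out", "b.out"}, "stage"))
	// prints: flatten[stage(a.out), stage(b.out)]
}
```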


[jira] [Work logged] (BEAM-3914) 'Unzip' flattens before performing fusion

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3914?focusedWorklogId=94881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94881
 ]

ASF GitHub Bot logged work on BEAM-3914:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:03
Start Date: 25/Apr/18 00:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4977: 
[BEAM-3914] Deduplicate Unzipped Flattens after Pipeline Fusion
URL: https://github.com/apache/beam/pull/4977#discussion_r183911633
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java
 ##
 @@ -0,0 +1,438 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.collect.Iterables.getOnlyElement;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasEntry;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasSize;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import 
org.apache.beam.runners.core.construction.graph.OutputDeduplicator.DeduplicationResult;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link OutputDeduplicator}. */
+@RunWith(JUnit4.class)
+public class OutputDeduplicatorTest {
+  @Test
+  public void unchangedWithNoDuplicates() {
+/* When all the PCollections are produced by only one transform or stage, 
the result should be
+ * empty//identical to the input.
+ */
+PTransform one =
+PTransform.newBuilder().putInputs("in", "red.out").putOutputs("out", 
"one.out").build();
+PCollection oneOut = 
PCollection.newBuilder().setUniqueName("one.out").build();
+PTransform two =
+PTransform.newBuilder().putInputs("in", "red.out").putOutputs("out", 
"two.out").build();
+PCollection twoOut = 
PCollection.newBuilder().setUniqueName("two.out").build();
+PTransform red = PTransform.newBuilder().putOutputs("out", 
"red.out").build();
+PCollection redOut = 
PCollection.newBuilder().setUniqueName("red.out").build();
+PTransform blue =
+PTransform.newBuilder()
+.putInputs("one", "one.out")
+.putInputs("two", "two.out")
+.putOutputs("out", "blue.out")
+.build();
+PCollection blueOut = 
PCollection.newBuilder().setUniqueName("blue.out").build();
+RunnerApi.Components components =
+Components.newBuilder()
+.putTransforms("one", one)
+.putPcollections("one.out", oneOut)
+.putTransforms("two", two)
+.putPcollections("two.out", twoOut)
+.putTransforms("red", red)
+.putPcollections("red.out", redOut)
+.putTransforms("blue", blue)
+.putPcollections("blue.out", blueOut)
+.build();
+ExecutableStage oneStage =
+ImmutableExecutableStage.of(
+components,
+Environment.getDefaultInstance(),
+PipelineNode.pCollection("red.out", redOut),
+ImmutableList.of(),
+ImmutableList.of(PipelineNode.pTransform("one", one)),
+ImmutableList.of(PipelineNode.pCollection("one.out", oneOut)));
+ExecutableStage twoStage =
+  
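[Editorial note: the pipeline graph built by the quoted test can be visualized as follows, derived from the putInputs/putOutputs calls shown above.]

```
                 +--> one --one.out--+
red --red.out--> |                   +--> blue --> blue.out
                 +--> two --two.out--+
```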

[jira] [Work logged] (BEAM-3914) 'Unzip' flattens before performing fusion

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3914?focusedWorklogId=94879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94879
 ]

ASF GitHub Bot logged work on BEAM-3914:


Author: ASF GitHub Bot
Created on: 25/Apr/18 00:03
Start Date: 25/Apr/18 00:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4977: 
[BEAM-3914] Deduplicate Unzipped Flattens after Pipeline Fusion
URL: https://github.com/apache/beam/pull/4977#discussion_r183911010
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/OutputDeduplicatorTest.java
 ##
 @@ -0,0 +1,438 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction.graph;
+
+import static com.google.common.collect.Iterables.getOnlyElement;
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.hasEntry;
+import static org.hamcrest.Matchers.hasItems;
+import static org.hamcrest.Matchers.hasSize;
+import static org.junit.Assert.assertThat;
+
+import com.google.common.collect.ImmutableList;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Components;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PCollection;
+import org.apache.beam.model.pipeline.v1.RunnerApi.PTransform;
+import 
org.apache.beam.runners.core.construction.graph.OutputDeduplicator.DeduplicationResult;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link OutputDeduplicator}. */
+@RunWith(JUnit4.class)
+public class OutputDeduplicatorTest {
+  @Test
+  public void unchangedWithNoDuplicates() {
+/* When all the PCollections are produced by only one transform or stage, 
the result should be
+ * empty//identical to the input.
+ */
+PTransform one =
 
 Review comment:
   A bit of ascii art would go a long way in understanding this graph. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94879)
Time Spent: 2h 40m  (was: 2.5h)

> 'Unzip' flattens before performing fusion
> -
>
> Key: BEAM-3914
> URL: https://issues.apache.org/jira/browse/BEAM-3914
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This consists of duplicating nodes downstream of a flatten that exist within 
> an environment, and reintroducing the flatten immediately upstream of a 
> runner-executed transform (the flatten should be executed within the runner)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4170) Gradle publish -Prelease should create beam-parent with source .zip

2018-04-24 Thread Alan Myrvold (JIRA)
Alan Myrvold created BEAM-4170:
--

 Summary: Gradle publish -Prelease should create beam-parent with 
source .zip
 Key: BEAM-4170
 URL: https://issues.apache.org/jira/browse/BEAM-4170
 Project: Beam
  Issue Type: Sub-task
  Components: build-system
Affects Versions: 2.5.0
Reporter: Alan Myrvold
Assignee: Luke Cwik


When I published locally with gradle, I didn't see a beam-parent directory, but 
I see it in 2.4.0.

./gradlew publish -Prelease -PdistMgmtSnapshotsUrl=http://someserver 
--no-parallel

In 2.4.0 it was 
https://repo1.maven.org/maven2/org/apache/beam/beam-parent/2.4.0/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #177

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[kedin] Add primitive java types support to Row generation logic, add example

--
[...truncated 18.68 MB...]
org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testNoTornadoes 
STANDARD_ERROR
Apr 24, 2018 11:40:09 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.TriggerExampleTest > testExtractTotalFlow 
STANDARD_ERROR
Apr 24, 2018 11:40:10 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > 
testFilterSingleMonthDataFn STANDARD_ERROR
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > testProjectionFn 
STANDARD_ERROR
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractCountryInfoFn 
STANDARD_ERROR
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractEventDataFn 
STANDARD_ERROR
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.DebuggingWordCountTest > testDebuggingWordCount 
STANDARD_ERROR
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.io.FileBasedSource 
getEstimatedSizeBytes
INFO: Filepattern 
/tmp/junit2245267671230643969/junit2476733885584945703.tmp matched 1 files with 
total size 54
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern 
/tmp/junit2245267671230643969/junit2476733885584945703.tmp into bundles of size 
3 took 0 ms and produced 1 files and 18 bundles

org.apache.beam.examples.WordCountTest > testExtractWordsFn STANDARD_ERROR
Apr 24, 2018 11:40:11 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.subprocess.ExampleEchoPipelineTest > 
testExampleEchoPipeline STANDARD_ERROR
Apr 24, 2018 11:40:13 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-Echo3173891149910928532.sh 
Apr 24, 2018 11:40:13 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:40:13 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-Echo3173891149910928532.sh to 
/tmp/test-Echoo802144664658894864/test-Echo3173891149910928532.sh 
Apr 24, 2018 11:40:13 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-EchoAgain3022186494051844904.sh 
Apr 24, 2018 11:40:13 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:40:13 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-EchoAgain3022186494051844904.sh to 
/tmp/test-Echoo802144664658894864/test-EchoAgain3022186494051844904.sh 

org.apache.beam.examples.complete.game.HourlyTeamScoreTest > 
testUserScoresFilter STANDARD_OUT
GOT user18_BananaEmu,BananaEmu,7,144796569,2015-11-19 12:41:31.053
GOT user6_AmberNumbat,AmberNumbat,11,144795563,2015-11-19 09:53:53.444
GOT user18_BananaEmu,BananaEmu,1,144796569,2015-11-19 12:41:31.053
GOT user0_MagentaKangaroo,MagentaKangaroo,4,144796569,2015-11-19 
12:41:31.053
GOT user3_BananaEmu,BananaEmu,17,144796569,2015-11-19 12:41:31.053
GOT user0_MagentaKangaroo,MagentaKangaroo,3,144795563,2015-11-19 
09:53:53.444
GOT user2_AmberCockatoo,AmberCockatoo,13,144796569,2015-11-19 
12:41:31.053
GOT user19_BisqueBilby,BisqueBilby,6,144795563,2015-11-19 09:53:53.444
GOT user13_ApricotQuokka,ApricotQuokka,15,144795563,2015-11-19 
09:53:53.444
GOT 
user7_AndroidGreenKookaburra,AndroidGreenKookaburra,12,144795563,2015-11-19 
09:53:53.444
GOT 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #178

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[kirpichov] [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

--
[...truncated 18.67 MB...]
org.apache.beam.examples.cookbook.FilterExamplesTest > 
testFilterSingleMonthDataFn STANDARD_ERROR
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > testProjectionFn 
STANDARD_ERROR
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractCountryInfoFn 
STANDARD_ERROR
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractEventDataFn 
STANDARD_ERROR
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.DebuggingWordCountTest > testDebuggingWordCount 
STANDARD_ERROR
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.io.FileBasedSource 
getEstimatedSizeBytes
INFO: Filepattern 
/tmp/junit7290404604861474461/junit7856709705613336601.tmp matched 1 files with 
total size 54
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern 
/tmp/junit7290404604861474461/junit7856709705613336601.tmp into bundles of size 
3 took 1 ms and produced 1 files and 18 bundles

org.apache.beam.examples.WordCountTest > testExtractWordsFn STANDARD_ERROR
Apr 24, 2018 11:40:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.subprocess.ExampleEchoPipelineTest > 
testExampleEchoPipeline STANDARD_ERROR
Apr 24, 2018 11:40:24 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-Echo5729650519303680627.sh 
Apr 24, 2018 11:40:24 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:40:24 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-Echo5729650519303680627.sh to 
/tmp/test-Echoo1092152855298419818/test-Echo5729650519303680627.sh 
Apr 24, 2018 11:40:24 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-EchoAgain3964862310941665987.sh 
Apr 24, 2018 11:40:24 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:40:24 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-EchoAgain3964862310941665987.sh to 
/tmp/test-Echoo1092152855298419818/test-EchoAgain3964862310941665987.sh 

org.apache.beam.examples.complete.game.HourlyTeamScoreTest > 
testUserScoresFilter STANDARD_OUT
GOT user13_ApricotQuokka,ApricotQuokka,15,144795563,2015-11-19 
09:53:53.444
GOT user6_AmberNumbat,AmberNumbat,11,144795563,2015-11-19 09:53:53.444
GOT user0_MagentaKangaroo,MagentaKangaroo,3,144795563,2015-11-19 
09:53:53.444
GOT user7_AlmondWallaby,AlmondWallaby,15,144795563,2015-11-19 
09:53:53.444
GOT 
user7_AndroidGreenKookaburra,AndroidGreenKookaburra,12,144795563,2015-11-19 
09:53:53.444
GOT 
user7_AndroidGreenKookaburra,AndroidGreenKookaburra,11,144795563,2015-11-19 
09:53:53.444
GOT user19_BisqueBilby,BisqueBilby,6,144795563,2015-11-19 09:53:53.444
GOT user0_MagentaKangaroo,MagentaKangaroo,4,144796569,2015-11-19 
12:41:31.053
GOT user18_BananaEmu,BananaEmu,7,144796569,2015-11-19 12:41:31.053
GOT user2_AmberCockatoo,AmberCockatoo,13,144796569,2015-11-19 
12:41:31.053
GOT user18_BananaEmu,BananaEmu,1,144796569,2015-11-19 12:41:31.053
GOT 
user0_AndroidGreenEchidna,AndroidGreenEchidna,0,144796569,2015-11-19 
12:41:31.053
GOT user19_BisqueBilby,BisqueBilby,8,144795563,2015-11-19 09:53:53.444
GOT user18_ApricotCaneToad,ApricotCaneToad,14,144796569,2015-11-19 
12:41:31.053
GOT user3_BananaEmu,BananaEmu,17,144796569,2015-11-19 12:41:31.053

org.apache.beam.examples.complete.game.UserScoreTest > testTeamScoreSums 
STANDARD_OUT
GOT user0_MagentaKangaroo,MagentaKangaroo,3,144795563,2015-11-19 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #176

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Cleanups in GroupByKeyOnlyEvaluatorFactory

--
[...truncated 19.49 MB...]
org.apache.beam.examples.cookbook.CombinePerKeyExamplesTest > 
testExtractLargeWordsFn STANDARD_ERROR
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.metrics.MetricsEnvironment 
getCurrentContainer
WARNING: Reporting metrics are not supported in the current execution 
environment.

org.apache.beam.examples.cookbook.MaxPerKeyExamplesTest > testFormatMaxesFn 
STANDARD_ERROR
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.MaxPerKeyExamplesTest > testExtractTempFn 
STANDARD_ERROR
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testFormatCounts 
STANDARD_ERROR
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testExtractTornadoes 
STANDARD_ERROR
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testNoTornadoes 
STANDARD_ERROR
Apr 24, 2018 11:32:49 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.TriggerExampleTest > testExtractTotalFlow 
STANDARD_ERROR
Apr 24, 2018 11:32:50 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > 
testFilterSingleMonthDataFn STANDARD_ERROR
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > testProjectionFn 
STANDARD_ERROR
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractCountryInfoFn 
STANDARD_ERROR
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractEventDataFn 
STANDARD_ERROR
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.DebuggingWordCountTest > testDebuggingWordCount 
STANDARD_ERROR
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.io.FileBasedSource 
getEstimatedSizeBytes
INFO: Filepattern 
/tmp/junit6674066865889181197/junit8041641849536197142.tmp matched 1 files with 
total size 54
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern 
/tmp/junit6674066865889181197/junit8041641849536197142.tmp into bundles of size 
3 took 0 ms and produced 1 files and 18 bundles

org.apache.beam.examples.WordCountTest > testExtractWordsFn STANDARD_ERROR
Apr 24, 2018 11:32:51 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.subprocess.ExampleEchoPipelineTest > 
testExampleEchoPipeline STANDARD_ERROR
Apr 24, 2018 11:32:53 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-Echo706128473611446487.sh 
Apr 24, 2018 11:32:53 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:32:53 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-Echo706128473611446487.sh to 

Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #175

2018-04-24 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Add a CollectionT to Bundle

[tgroh] Use Bundle in WatermarkManager

--
[...truncated 18.96 MB...]

org.apache.beam.examples.cookbook.MaxPerKeyExamplesTest > testExtractTempFn 
STANDARD_ERROR
Apr 24, 2018 11:31:21 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testFormatCounts 
STANDARD_ERROR
Apr 24, 2018 11:31:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testExtractTornadoes 
STANDARD_ERROR
Apr 24, 2018 11:31:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.BigQueryTornadoesTest > testNoTornadoes 
STANDARD_ERROR
Apr 24, 2018 11:31:22 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.TriggerExampleTest > testExtractTotalFlow 
STANDARD_ERROR
Apr 24, 2018 11:31:23 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > 
testFilterSingleMonthDataFn STANDARD_ERROR
Apr 24, 2018 11:31:23 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.FilterExamplesTest > testProjectionFn 
STANDARD_ERROR
Apr 24, 2018 11:31:23 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractCountryInfoFn 
STANDARD_ERROR
Apr 24, 2018 11:31:23 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.cookbook.JoinExamplesTest > testExtractEventDataFn 
STANDARD_ERROR
Apr 24, 2018 11:31:24 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.DebuggingWordCountTest > testDebuggingWordCount 
STANDARD_ERROR
Apr 24, 2018 11:31:24 PM org.apache.beam.sdk.io.FileBasedSource 
getEstimatedSizeBytes
INFO: Filepattern 
/tmp/junit5219256489574877044/junit5278720245691410365.tmp matched 1 files with 
total size 54
Apr 24, 2018 11:31:24 PM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern 
/tmp/junit5219256489574877044/junit5278720245691410365.tmp into bundles of size 
3 took 1 ms and produced 1 files and 18 bundles

org.apache.beam.examples.WordCountTest > testExtractWordsFn STANDARD_ERROR
Apr 24, 2018 11:31:24 PM org.apache.beam.sdk.transforms.DoFnTester of
WARNING: Your tests use DoFnTester, which may not exercise DoFns correctly. 
Please use TestPipeline instead.

org.apache.beam.examples.subprocess.ExampleEchoPipelineTest > 
testExampleEchoPipeline STANDARD_ERROR
Apr 24, 2018 11:31:26 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-Echo8566447910782988931.sh 
Apr 24, 2018 11:31:26 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:31:26 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-Echo8566447910782988931.sh to 
/tmp/test-Echoo508107964237318519/test-Echo8566447910782988931.sh 
Apr 24, 2018 11:31:26 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils initSemaphore
INFO: Initialized Semaphore for binary test-EchoAgain6057674206472611287.sh 
Apr 24, 2018 11:31:26 PM 
org.apache.beam.examples.subprocess.utils.CallingSubProcessUtils setUp
INFO: Calling filesetup to move Executables to worker.
Apr 24, 2018 11:31:26 PM 
org.apache.beam.examples.subprocess.utils.FileUtils copyFileFromGCSToWorker
INFO: Moving File /tmp/test-EchoAgain6057674206472611287.sh to 
/tmp/test-Echoo508107964237318519/test-EchoAgain6057674206472611287.sh 

org.apache.beam.examples.complete.game.HourlyTeamScoreTest > 
testUserScoresFilter STANDARD_OUT
GOT 

[jira] [Updated] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread yifan zou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-4169:

Description: 16 new machines with 16 CPUs and 104 MEM are created for beam 
Jenkins. We have large enough memory to run gradle build with maximum workers. 
4 workers * 2 jobs * 3.5GB = 28GB

> Remove max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 16 new machines with 16 CPUs and 104 MEM are created for beam Jenkins. We 
> have large enough memory to run gradle build with maximum workers. 4 workers 
> * 2 jobs * 3.5GB = 28GB



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread yifan zou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-4169:

Description: 16 new machines with 16 CPUs and 104 GB MEM are created for 
beam Jenkins. We have large enough memory to run gradle build with maximum 
workers. 4 workers * 2 jobs * 3.5GB = 28GB  (was: 16 new machines with 16 CPUs 
and 104 MEM are created for beam Jenkins. We have large enough memory to run 
gradle build with maximum workers. 4 workers * 2 jobs * 3.5GB = 28GB)

> Remove max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 16 new machines with 16 CPUs and 104 GB MEM are created for beam Jenkins. We 
> have large enough memory to run gradle build with maximum workers. 4 workers 
> * 2 jobs * 3.5GB = 28GB



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4141) Data channel deadlocks when user function fails

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4141?focusedWorklogId=94861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94861
 ]

ASF GitHub Bot logged work on BEAM-4141:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:25
Start Date: 24/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: tgroh closed pull request #5184: BEAM-4141: Drain 
source when user function processing fails.
URL: https://github.com/apache/beam/pull/5184
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/pkg/beam/core/runtime/harness/datamgr.go 
b/sdks/go/pkg/beam/core/runtime/harness/datamgr.go
index ffd82779570..89f6d7b65c8 100644
--- a/sdks/go/pkg/beam/core/runtime/harness/datamgr.go
+++ b/sdks/go/pkg/beam/core/runtime/harness/datamgr.go
@@ -28,7 +28,18 @@ import (
"google.golang.org/grpc"
 )
 
-const chunkSize = int(4e6) // Bytes to put in a single gRPC message. Max is 
slightly higher.
+const (
+   chunkSize   = int(4e6) // Bytes to put in a single gRPC message. Max is 
slightly higher.
+   bufElements = 20   // Number of chunks buffered per reader.
+)
+
+// This is a reduced version of the full gRPC interface to help with testing.
+// TODO(wcn): need a compile-time assertion to make sure this stays synced 
with what's
+// in pb.BeamFnData_DataClient
+type dataClient interface {
+   Send(*pb.Elements) error
+   Recv() (*pb.Elements, error)
+}
 
 // DataManager manages data channels to the FnHarness. A fixed number of 
channels
 // are generally used, each managing multiple logical byte streams.
@@ -75,7 +86,7 @@ func (m *DataManager) open(ctx context.Context, port 
exec.Port) (*DataChannel, e
 // DataChannel manages a single grpc connection to the FnHarness.
 type DataChannel struct {
cc *grpc.ClientConn
-   client pb.BeamFnData_DataClient
+   client dataClient
port   exec.Port
 
writers map[string]*dataWriter
@@ -95,7 +106,10 @@ func NewDataChannel(ctx context.Context, port exec.Port) 
(*DataChannel, error) {
cc.Close()
return nil, fmt.Errorf("failed to connect to data service: %v", 
err)
}
+   return makeDataChannel(ctx, cc, client, port)
+}
 
+func makeDataChannel(ctx context.Context, cc *grpc.ClientConn, client 
dataClient, port exec.Port) (*DataChannel, error) {
	ret := &DataChannel{
cc:  cc,
client:  client,
@@ -108,25 +122,25 @@ func NewDataChannel(ctx context.Context, port exec.Port) 
(*DataChannel, error) {
return ret, nil
 }
 
-func (m *DataChannel) OpenRead(ctx context.Context, id exec.StreamID) 
(io.ReadCloser, error) {
-   return m.makeReader(ctx, id), nil
+func (c *DataChannel) OpenRead(ctx context.Context, id exec.StreamID) 
(io.ReadCloser, error) {
+   return c.makeReader(ctx, id), nil
 }
 
-func (m *DataChannel) OpenWrite(ctx context.Context, id exec.StreamID) 
(io.WriteCloser, error) {
-   return m.makeWriter(ctx, id), nil
+func (c *DataChannel) OpenWrite(ctx context.Context, id exec.StreamID) 
(io.WriteCloser, error) {
+   return c.makeWriter(ctx, id), nil
 }
 
-func (m *DataChannel) read(ctx context.Context) {
+func (c *DataChannel) read(ctx context.Context) {
cache := make(map[string]*dataReader)
for {
-   msg, err := m.client.Recv()
+   msg, err := c.client.Recv()
if err != nil {
if err == io.EOF {
// TODO(herohde) 10/12/2017: can this happen 
before shutdown? Reconnect?
-   log.Warnf(ctx, "DataChannel %v closed", m.port)
+   log.Warnf(ctx, "DataChannel %v closed", c.port)
return
}
-   panic(fmt.Errorf("channel %v bad: %v", m.port, err))
+   panic(fmt.Errorf("channel %v bad: %v", c.port, err))
}
 
recordStreamReceive(msg)
@@ -136,19 +150,26 @@ func (m *DataChannel) read(ctx context.Context) {
// to reduce lock contention.
 
for _, elm := range msg.GetData() {
-   id := exec.StreamID{Port: m.port, Target: 
exec.Target{ID: elm.GetTarget().PrimitiveTransformReference, Name: 
elm.GetTarget().GetName()}, InstID: elm.GetInstructionReference()}
+   id := exec.StreamID{Port: c.port, Target: 
exec.Target{ID: elm.GetTarget().PrimitiveTransformReference, Name: 
elm.GetTarget().GetName()}, InstID: elm.GetInstructionReference()}
sid := id.String()
 
-   // log.Printf("Chan read (%v): 

[jira] [Work logged] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4169?focusedWorklogId=94859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94859
 ]

ASF GitHub Bot logged work on BEAM-4169:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:23
Start Date: 24/Apr/18 23:23
Worklog Time Spent: 10m 
  Work Description: yifanzou opened a new pull request #5222: BEAM-4169, 
remove max worker for Jenkins build
URL: https://github.com/apache/beam/pull/5222
 
 
   DESCRIPTION HERE
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `./gradlew build` to make sure basic checks pass. A more thorough 
check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94859)
Time Spent: 10m
Remaining Estimate: 0h

> Remove max worker of jenkins build
> --
>
> Key: BEAM-4169
> URL: https://issues.apache.org/jira/browse/BEAM-4169
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4160) Convert JSON objects to Rows

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?focusedWorklogId=94858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94858
 ]

ASF GitHub Bot logged work on BEAM-4160:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:23
Start Date: 24/Apr/18 23:23
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #5120: 
[BEAM-4160] Add JsonToRow transform
URL: https://github.com/apache/beam/pull/5120#discussion_r183907965
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import java.io.IOException;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.util.RowJsonDeserializer;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Creates a {@link PTransform} to convert input JSON objects to {@link Row 
Rows} with given {@link
+ * Schema}.
+ *
+ * <p>Currently supported {@link Schema} field types are:
+ *
+ * <ul>
+ *   <li>{@link Schema.TypeName#BYTE}
+ *   <li>{@link Schema.TypeName#INT16}
+ *   <li>{@link Schema.TypeName#INT32}
+ *   <li>{@link Schema.TypeName#INT64}
+ *   <li>{@link Schema.TypeName#FLOAT}
+ *   <li>{@link Schema.TypeName#DOUBLE}
+ *   <li>{@link Schema.TypeName#BOOLEAN}
+ *   <li>{@link Schema.TypeName#STRING}
+ * </ul>
+ *
+ * For specifics of JSON deserialization see {@link RowJsonDeserializer}.
+ *
+ * Conversion is strict, with minimal type coercion:
+ *
+ * Booleans are only parsed from {@code true} or {@code false} literals, 
not from {@code "true"}
+ * or {@code "false"} strings or any other values (exception is thrown in 
these cases).
+ *
+ * If a JSON number doesn't fit into the corresponding schema field type, 
an exception will be
+ * thrown. Strings are not auto-converted to numbers. Floating point numbers 
are not auto-converted
+ * to integral numbers. Precision loss also causes exceptions.
+ *
+ * Only JSON string values can be parsed into {@link TypeName#STRING}. 
Numbers, booleans are not
+ * automatically converted, exceptions are thrown in these cases.
+ *
+ * If a schema field is missing from the JSON value, an exception will be 
thrown.
+ *
+ * Explicit {@code null} literals are allowed in JSON objects. No other 
values are parsed into
+ * {@code null}.
+ */
+public class JsonToRow {
 
 Review comment:
   Added


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94858)
Time Spent: 40m  (was: 0.5h)

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
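[Editor's note: the JsonToRow review above documents strict conversion rules — no silent narrowing, precision loss raises an exception. The sketch below illustrates that contract in self-contained plain Java; it is not Beam API, and `StrictNarrowing` and its methods are hypothetical names used only for illustration.]

```java
// Sketch of JsonToRow-style strict numeric conversion: a number is only
// accepted into an INT32 field if it fits exactly; otherwise an exception
// is thrown instead of silently truncating.
class StrictNarrowing {

    // Accepts a long only if it fits in an int without loss.
    static int toInt32(long jsonNumber) {
        if (jsonNumber < Integer.MIN_VALUE || jsonNumber > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Value does not fit INT32: " + jsonNumber);
        }
        return (int) jsonNumber;
    }

    // Floating point values are accepted only when integral and in range,
    // mirroring "precision loss also causes exceptions".
    static int toInt32(double jsonNumber) {
        if (Double.isInfinite(jsonNumber) || jsonNumber != Math.floor(jsonNumber)) {
            throw new IllegalArgumentException("Not an integral value: " + jsonNumber);
        }
        return toInt32((long) jsonNumber);
    }
}
```

Under these rules, `toInt32(42L)` succeeds while `toInt32(4_000_000_000L)` throws, matching the "no auto-coercion" behavior the javadoc describes.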


[jira] [Created] (BEAM-4169) Remove max worker of jenkins build

2018-04-24 Thread yifan zou (JIRA)
yifan zou created BEAM-4169:
---

 Summary: Remove max worker of jenkins build
 Key: BEAM-4169
 URL: https://issues.apache.org/jira/browse/BEAM-4169
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: yifan zou
Assignee: yifan zou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4160) Convert JSON objects to Rows

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?focusedWorklogId=94852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94852
 ]

ASF GitHub Bot logged work on BEAM-4160:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:10
Start Date: 24/Apr/18 23:10
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #5120: 
[BEAM-4160] Add JsonToRow transform
URL: https://github.com/apache/beam/pull/5120#discussion_r183906007
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
 ##
 @@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import java.io.IOException;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.sdk.util.RowJsonDeserializer;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Creates a {@link PTransform} to convert input JSON objects to {@link Row 
Rows} with given {@link
+ * Schema}.
+ *
+ * <p>Currently supported {@link Schema} field types are:
+ *
+ * <ul>
+ *   <li>{@link Schema.TypeName#BYTE}
+ *   <li>{@link Schema.TypeName#INT16}
+ *   <li>{@link Schema.TypeName#INT32}
+ *   <li>{@link Schema.TypeName#INT64}
+ *   <li>{@link Schema.TypeName#FLOAT}
+ *   <li>{@link Schema.TypeName#DOUBLE}
+ *   <li>{@link Schema.TypeName#BOOLEAN}
+ *   <li>{@link Schema.TypeName#STRING}
+ * </ul>
+ *
+ * For specifics of JSON deserialization see {@link RowJsonDeserializer}.
+ *
+ * Conversion is strict, with minimal type coercion:
+ *
+ * Booleans are only parsed from {@code true} or {@code false} literals, 
not from {@code "true"}
+ * or {@code "false"} strings or any other values (exception is thrown in 
these cases).
+ *
+ * If a JSON number doesn't fit into the corresponding schema field type, 
an exception will be
+ * thrown. Strings are not auto-converted to numbers. Floating point numbers 
are not auto-converted
+ * to integral numbers. Precision loss also causes exceptions.
+ *
+ * Only JSON string values can be parsed into {@link TypeName#STRING}. 
Numbers, booleans are not
+ * automatically converted, exceptions are thrown in these cases.
+ *
+ * If a schema field is missing from the JSON value, an exception will be 
thrown.
+ *
+ * Explicit {@code null} literals are allowed in JSON objects. No other 
values are parsed into
+ * {@code null}.
+ */
+public class JsonToRow {
+
+  public static PTransform<PCollection<String>, PCollection<Row>> withSchema(
+  Schema rowSchema) {
+return JsonToRowFn.forSchema(rowSchema);
+  }
+
+  static class JsonToRowFn extends PTransform<PCollection<String>, PCollection<Row>> {
+private transient volatile @Nullable ObjectMapper objectMapper;
+private Schema schema;
+
+static JsonToRowFn forSchema(Schema rowSchema) {
+  return new JsonToRowFn(rowSchema);
+}
+
+private JsonToRowFn(Schema schema) {
+  this.schema = schema;
+}
+
+@Override
+public PCollection<Row> expand(PCollection<String> jsonStrings) {
+  return jsonStrings
+  .apply(
+  ParDo.of(
+  new DoFn<String, Row>() {
+@ProcessElement
+public void processElement(ProcessContext context) {
+  context.output(jsonToRow(context.element()));
+}
+  }))
+  .setCoder(schema.getRowCoder());
+}
+
+private Row jsonToRow(String jsonString) {
+  try {
+return objectMapper().readValue(jsonString, Row.class);
+  } catch (IOException e) {
+throw new IllegalArgumentException("Unable to parse json object: " + 
jsonString, e);
+  }
+}
+
+private ObjectMapper objectMapper() {
+  if (this.objectMapper == null) {
+synchronized (this) {
+  if (this.objectMapper == null) {
+this.objectMapper = 
newObjectMapperWith(RowJsonDeserializer.forSchema(this.schema));
 
 Review comment:
   It is safe, this 

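[Editor's note: the (truncated) reply above concerns why the lazy `objectMapper()` initialization is thread-safe: the field is declared `volatile`, so double-checked locking publishes the fully constructed object to other threads. A minimal self-contained sketch of that pattern in plain Java — `LazyHolder` is a hypothetical name, not Beam code:]

```java
// Double-checked locking: correct only because `cached` is volatile,
// which guarantees visibility of the fully constructed object.
class LazyHolder {
    private volatile Object cached;
    private int initCount = 0; // counts real initializations, for demonstration

    Object get() {
        Object local = cached;           // one volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = cached;          // re-check under the lock
                if (local == null) {
                    initCount++;
                    cached = local = new Object();
                }
            }
        }
        return local;
    }

    int initCount() { return initCount; }
}
```

Repeated calls to `get()` return the same instance and initialize it exactly once, which is the property the reviewer is confirming.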
[jira] [Work logged] (BEAM-3966) Move core utilities into a new top-level module

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3966?focusedWorklogId=94853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94853
 ]

ASF GitHub Bot logged work on BEAM-3966:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:10
Start Date: 24/Apr/18 23:10
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #4974: [BEAM-3966] Move 
`sdks/java/fn-execution` to `util/java/fn-execution`
URL: https://github.com/apache/beam/pull/4974#issuecomment-384108677
 
 
   OK. I thought it was still contentious, but if things look good as they are, 
I'll rebase and ping someone when it's ready for merging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94853)
Time Spent: 1.5h  (was: 1h 20m)

> Move core utilities into a new top-level module
> ---
>
> Key: BEAM-3966
> URL: https://issues.apache.org/jira/browse/BEAM-3966
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ben Sidhom
>Assignee: Kenneth Knowles
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As part of a longer-term dependency cleanup, fn-execution and similar 
> utilities should be moved into a new top-level module (util?) that can be 
> shared among runner and/or SDK code while clearly delineating the boundary 
> between runner and SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4166) FnApiDoFnRunner doesn't invoke setup/teardown

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4166?focusedWorklogId=94850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94850
 ]

ASF GitHub Bot logged work on BEAM-4166:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:09
Start Date: 24/Apr/18 23:09
Worklog Time Spent: 10m 
  Work Description: jkff closed pull request #5216: [BEAM-4166] Invoke 
@Setup in FnApiDoFnRunner
URL: https://github.com/apache/beam/pull/5216
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
index b899c16767a..ec1d9b081e4 100644
--- 
a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
+++ 
b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
@@ -384,6 +384,7 @@
 this.windowingStrategy = windowingStrategy;
 this.doFnSignature = DoFnSignatures.signatureForDoFn(doFn);
 this.doFnInvoker = DoFnInvokers.invokerFor(doFn);
+this.doFnInvoker.invokeSetup();
 this.stateBinder = new BeamFnStateBinder();
 this.startBundleContext = new StartBundleContext();
 this.processBundleContext = new ProcessBundleContext();
diff --git 
a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
 
b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
index 722561fc6fa..85aa564353a 100644
--- 
a/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
+++ 
b/sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
@@ -18,6 +18,7 @@
 
 package org.apache.beam.fn.harness;
 
+import static com.google.common.base.Preconditions.checkState;
 import static 
org.apache.beam.sdk.util.WindowedValue.timestampedValueInGlobalWindow;
 import static org.apache.beam.sdk.util.WindowedValue.valueInGlobalWindow;
 import static org.hamcrest.Matchers.contains;
@@ -97,10 +98,33 @@
private static final TupleTag<String> mainOutput = new 
TupleTag<>("mainOutput");
private static final TupleTag<String> additionalOutput = new 
TupleTag<>("output");
 
+private enum State {
+  NOT_SET_UP,
+  OUTSIDE_BUNDLE,
+  INSIDE_BUNDLE,
+}
+
+private State state = State.NOT_SET_UP;
+
 private BoundedWindow window;
 
+@Setup
+public void setUp() {
+  checkState(State.NOT_SET_UP.equals(state), "Unexpected state: %s", 
state);
+  state = State.OUTSIDE_BUNDLE;
+}
+
+// No testing for TearDown - it's currently not supported by FnHarness.
+
+@StartBundle
+public void startBundle() {
+  checkState(State.OUTSIDE_BUNDLE.equals(state), "Unexpected state: %s", 
state);
+  state = State.INSIDE_BUNDLE;
+}
+
 @ProcessElement
 public void processElement(ProcessContext context, BoundedWindow window) {
+  checkState(State.INSIDE_BUNDLE.equals(state), "Unexpected state: %s", 
state);
   context.output("MainOutput" + context.element());
   context.output(additionalOutput, "AdditionalOutput" + context.element());
   this.window = window;
@@ -108,6 +132,8 @@ public void processElement(ProcessContext context, 
BoundedWindow window) {
 
 @FinishBundle
 public void finishBundle(FinishBundleContext context) {
+  checkState(State.INSIDE_BUNDLE.equals(state), "Unexpected state: %s", 
state);
+  state = State.OUTSIDE_BUNDLE;
   if (window != null) {
 context.output("FinishBundle", window.maxTimestamp(), window);
 window = null;


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94850)
Time Spent: 50m  (was: 40m)

> FnApiDoFnRunner doesn't invoke setup/teardown
> -
>
> Key: BEAM-4166
> URL: https://issues.apache.org/jira/browse/BEAM-4166
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> And we apparently lack test coverage for that - one would think that 
> ValidatesRunner tests would check lifecycle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
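[Editor's note: the test added in the PR above encodes the DoFn lifecycle as a small state machine — setup must run once before any bundle, and elements are only processed between startBundle and finishBundle. The same ordering contract can be sketched without any Beam dependency; `LifecycleFn` below is a hypothetical plain-Java stand-in for an annotated DoFn:]

```java
// Lifecycle state machine mirroring the FnApiDoFnRunnerTest assertions:
// NOT_SET_UP -> OUTSIDE_BUNDLE -> INSIDE_BUNDLE -> OUTSIDE_BUNDLE ...
class LifecycleFn {
    enum State { NOT_SET_UP, OUTSIDE_BUNDLE, INSIDE_BUNDLE }

    private State state = State.NOT_SET_UP;

    private void check(State expected) {
        if (state != expected) {
            throw new IllegalStateException("Unexpected state: " + state);
        }
    }

    void setup()        { check(State.NOT_SET_UP);     state = State.OUTSIDE_BUNDLE; }
    void startBundle()  { check(State.OUTSIDE_BUNDLE); state = State.INSIDE_BUNDLE; }
    String process(String element) {
        check(State.INSIDE_BUNDLE);
        return "MainOutput" + element;   // same prefix convention as the test
    }
    void finishBundle() { check(State.INSIDE_BUNDLE);  state = State.OUTSIDE_BUNDLE; }
}
```

Calling `process` outside a bundle, or skipping `setup`, throws — which is exactly the gap the missing `invokeSetup()` call left undetected before this PR.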


[jira] [Work logged] (BEAM-4166) FnApiDoFnRunner doesn't invoke setup/teardown

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4166?focusedWorklogId=94851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94851
 ]

ASF GitHub Bot logged work on BEAM-4166:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:09
Start Date: 24/Apr/18 23:09
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5216: 
[BEAM-4166] Invoke @Setup in FnApiDoFnRunner
URL: https://github.com/apache/beam/pull/5216#discussion_r183905871
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
 ##
 @@ -384,6 +384,7 @@
 this.windowingStrategy = windowingStrategy;
 this.doFnSignature = DoFnSignatures.signatureForDoFn(doFn);
 this.doFnInvoker = DoFnInvokers.invokerFor(doFn);
+this.doFnInvoker.invokeSetup();
 
 Review comment:
   I wasn't sure how the DoFnRunner was used. That makes sense.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94851)
Time Spent: 1h  (was: 50m)

> FnApiDoFnRunner doesn't invoke setup/teardown
> -
>
> Key: BEAM-4166
> URL: https://issues.apache.org/jira/browse/BEAM-4166
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> And we apparently lack test coverage for that - one would think that 
> ValidatesRunner tests would check lifecycle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5216: [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

2018-04-24 Thread jkff
This is an automated email from the ASF dual-hosted git repository.

jkff pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 68974067fdb16bde0864a5ecc55326dd5d240f1a
Merge: 5ec708e 054654a
Author: Eugene Kirpichov 
AuthorDate: Tue Apr 24 16:09:19 2018 -0700

Merge pull request #5216: [BEAM-4166] Invoke @Setup in FnApiDoFnRunner

[BEAM-4166] Invoke @Setup in FnApiDoFnRunner

 .../apache/beam/fn/harness/FnApiDoFnRunner.java|  1 +
 .../beam/fn/harness/FnApiDoFnRunnerTest.java   | 26 ++
 2 files changed, 27 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
j...@apache.org.


[jira] [Work logged] (BEAM-3983) BigQuery writes from pure SQL

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3983?focusedWorklogId=94849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94849
 ]

ASF GitHub Bot logged work on BEAM-3983:


Author: ASF GitHub Bot
Created on: 24/Apr/18 23:08
Start Date: 24/Apr/18 23:08
Worklog Time Spent: 10m 
  Work Description: apilloud opened a new pull request #5220: 
[BEAM-3983][SQL] Add BigQuery table provider
URL: https://github.com/apache/beam/pull/5220
 
 
   This adds a bigquery table provider to Beam SQL.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [X] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [X] Write a pull request description that is detailed enough to 
understand:
  - [X] What the pull request does
  - [X] Why it does it
  - [X] How it does it
  - [X] Why this approach
- [X] Each commit in the pull request should have a meaningful subject line 
and body.
- [X] Run `./gradlew build` to make sure basic checks pass. A more thorough 
check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 94849)
Time Spent: 11h  (was: 10h 50m)

> BigQuery writes from pure SQL
> -
>
> Key: BEAM-3983
> URL: https://issues.apache.org/jira/browse/BEAM-3983
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> It would be nice if you could write to BigQuery in SQL without writing any 
> java code. For example:
> {code:java}
> INSERT INTO bigquery SELECT * FROM PCOLLECTION{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (5ec708e -> 6897406)

2018-04-24 Thread jkff
This is an automated email from the ASF dual-hosted git repository.

jkff pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 5ec708e  Merge pull request #5215: [BEAM-3157][SQL] Add primitive java 
types support to Row generation logic, add example
 add 054654a  [BEAM-4166] Invoke @Setup in FnApiDoFnRunner
 new 6897406  Merge pull request #5216: [BEAM-4166] Invoke @Setup in 
FnApiDoFnRunner

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../apache/beam/fn/harness/FnApiDoFnRunner.java|  1 +
 .../beam/fn/harness/FnApiDoFnRunnerTest.java   | 26 ++
 2 files changed, 27 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
j...@apache.org.

