Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #66

2018-04-13 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90968
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:30
Start Date: 13/Apr/18 20:30
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381252733
 
 
   Run Dataflow PostRelease


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90968)
Time Spent: 91h 50m  (was: 91h 40m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 91h 50m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (6107c31 -> a716332)

2018-04-13 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 6107c31  Merge pull request #4995: Fix GreedyPipelineFuser test
 add 908effc  Rename `defaultRegistry` to `javaSdkNativeRegistry`
 new a716332  Merge pull request #5123: Rename `defaultRegistry` to 
`javaSdkNativeRegistry`

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../main/java/org/apache/beam/runners/direct/DirectRunner.java   | 2 +-
 .../apache/beam/runners/direct/TransformEvaluatorRegistry.java   | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[beam] 01/01: Merge pull request #5123: Rename `defaultRegistry` to `javaSdkNativeRegistry`

2018-04-13 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit a716332a2b2babc7f50236c50762b0d9a938f395
Merge: 6107c31 908effc
Author: Thomas Groh 
AuthorDate: Fri Apr 13 14:50:24 2018 -0700

Merge pull request #5123: Rename `defaultRegistry` to 
`javaSdkNativeRegistry`

 .../main/java/org/apache/beam/runners/direct/DirectRunner.java   | 2 +-
 .../apache/beam/runners/direct/TransformEvaluatorRegistry.java   | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[jira] [Updated] (BEAM-4072) using instance of S3Options causes crash when no AWS region specified

2018-04-13 Thread Samuel Waggoner (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Waggoner updated BEAM-4072:
--
Summary: using instance of S3Options causes crash when no AWS region 
specified  (was: using instance of S3Options causes crash when no AWS 
credentials available)

> using instance of S3Options causes crash when no AWS region specified
> -
>
> Key: BEAM-4072
> URL: https://issues.apache.org/jira/browse/BEAM-4072
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Samuel Waggoner
>Assignee: Ismaël Mejía
>Priority: Major
>
> I want my pipeline to support writing to S3, but I also want it to work 
> locally when a user doesn't specify any S3-specific options. Instead, I get 
> the following exception:
> {code:java}
> com.amazonaws.SdkClientException: Unable to find a region via the region 
> provider chain. Must provide an explicit region in the builder or setup 
> environment to supply a region. E at 
> com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:371){code}
> I had thought that I should use a single PipelineOptions interface that 
> supports all available options for my pipeline. Is this wrong? Should using 
> an instance of S3Options for my pipeline options force the user to write to 
> S3 (or otherwise provide S3 options)?
> Should I choose a PipelineOptions subinterface based on the options the user 
> tries to use at runtime?
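
A minimal sketch of the setup being described (MyPipelineOptions is a 
hypothetical name, not from the report): one options interface extending 
S3Options so a single type carries every option the pipeline might use. 
Parsing succeeds without any S3 flags; the quoted SdkClientException surfaces 
later, once an S3 client is built without a resolvable region.

{code:java}
import org.apache.beam.sdk.io.aws.options.S3Options;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Hypothetical combined interface: every option the pipeline might use,
// including the S3 ones, in a single type.
interface MyPipelineOptions extends S3Options {
  // pipeline-specific options would be declared here
}

class OptionsDemo {
  public static void main(String[] args) {
    // Parsing works even when no S3 flags are passed...
    MyPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(MyPipelineOptions.class);
    // ...the quoted SdkClientException surfaces later, once an S3 client is
    // built from these options and no AWS region can be resolved.
    System.out.println(options.getAwsRegion()); // null when unset
  }
}
{code}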



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostRelease_NightlySnapshot #192

2018-04-13 Thread Apache Jenkins Server
See 


--
GitHub pull request #4788 of commit 69e26b6e387cd4f2c260c9621afdb06532171f13, 
no merge conflicts.
Setting status of 69e26b6e387cd4f2c260c9621afdb06532171f13 to PENDING with url 
https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/192/ and 
message: 'Build started sha1 is merged.'
Using context: Jenkins: ./gradlew :release:runQuickstartsJava
[EnvInject] - Loading node environment variables.
Building remotely on beam1 (beam) in workspace 

Cloning the remote Git repository
Cloning repository https://github.com/apache/beam.git
 > git init 
 >  # 
 > timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # 
 > timeout=10
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/4788/*:refs/remotes/origin/pr/4788/*
 > git rev-parse refs/remotes/origin/pr/4788/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/4788/merge^{commit} # timeout=10
Checking out Revision e4d6b3ee9907eeff5a9f5dfd88ca776f91f500c3 
(refs/remotes/origin/pr/4788/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e4d6b3ee9907eeff5a9f5dfd88ca776f91f500c3
Commit message: "Merge 69e26b6e387cd4f2c260c9621afdb06532171f13 into 
007553fc154b058fda5edcebc2ed54100b30f0cc"
 > git rev-list --no-walk 4f0cc202bad3e72bd0b7b15df2b8a5096b233577 # timeout=10
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[Gradle] - Launching build.
[src] $ 
 
--info --continue --rerun-tasks --no-daemon -Pver= -Prepourl= 
:release:runQuickstartsJava
Initialized native services in: /home/jenkins/.gradle/native
Using 4 worker leases.
Starting Build
Settings evaluated using settings file 
'
Projects loaded. Root project using build file 
'
Included projects: [root project 'beam', project ':beam-examples-java', project 
':beam-local-artifact-service-java', project ':beam-model-fn-execution', 
project ':beam-model-job-management', project ':beam-model-pipeline', project 
':beam-runners-apex', project ':beam-runners-core-construction-java', project 
':beam-runners-core-java', project ':beam-runners-direct-java', project 
':beam-runners-flink_2.11', project ':beam-runners-gcp-gcemd', project 
':beam-runners-gcp-gcsproxy', project ':beam-runners-gearpump', project 
':beam-runners-google-cloud-dataflow-java', project 
':beam-runners-java-fn-execution', project ':beam-runners-local-java-core', 
project ':beam-runners-reference-java', project 
':beam-runners-reference-job-server', project ':beam-runners-spark', project 
':beam-sdks-go', project ':beam-sdks-go-container', project 
':beam-sdks-go-examples', project ':beam-sdks-java-build-tools', project 
':beam-sdks-java-container', project ':beam-sdks-java-core', project 
':beam-sdks-java-extensions-google-cloud-platform-core', project 
':beam-sdks-java-extensions-join-library', project 
':beam-sdks-java-extensions-json-jackson', project 
':beam-sdks-java-extensions-protobuf', project 
':beam-sdks-java-extensions-sketching', project 
':beam-sdks-java-extensions-sorter', project ':beam-sdks-java-extensions-sql', 
project ':beam-sdks-java-fn-execution', project ':beam-sdks-java-harness', 
project ':beam-sdks-java-io-amazon-web-services', project 
':beam-sdks-java-io-amqp', project ':beam-sdks-java-io-cassandra', project 
':beam-sdks-java-io-common', project ':beam-sdks-java-io-elasticsearch', 
project ':beam-sdks-java-io-elasticsearch-tests-2', project 
':beam-sdks-java-io-elasticsearch-tests-5', project 
':beam-sdks-java-io-elasticsearch-tests-common', project 
':beam-sdks-java-io-file-based-io-tests', project 
':beam-sdks-java-io-google-cloud-platform', project 

Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #106

2018-04-13 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-3360) [SQL] Do not assign triggers for HOP/TUMBLE

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3360.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Fixed by https://github.com/apache/beam/pull/4546

> [SQL] Do not assign triggers for HOP/TUMBLE
> ---
>
> Key: BEAM-3360
> URL: https://issues.apache.org/jira/browse/BEAM-3360
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently when parsing HOP/TUMBLE/SESSION expressions we create a repeating 
> trigger for the defined windows, see:
> {code:java|title=BeamAggregationRule.java}
>   private Trigger createTriggerWithDelay(GregorianCalendar delayTime) {
>     return Repeatedly.forever(
>         AfterWatermark.pastEndOfWindow()
>             .withLateFirings(
>                 AfterProcessingTime.pastFirstElementInPane()
>                     .plusDelayOf(Duration.millis(delayTime.getTimeInMillis()))));
>   }
> {code}
> This will not work correctly with joins, as joins with multiple trigger 
> firings are currently broken: https://issues.apache.org/jira/browse/BEAM-3190
> Even if joins with multiple firings worked correctly, the SQL parsing stage 
> is still probably the wrong place to infer triggers.
> Better alternatives:
>  - inherit the user-defined triggers of the input PCollection without 
> modification (a minimal sketch follows below);
>  - triggering at sinks ( https://s.apache.org/beam-sink-triggers ) might 
> define a way to backpropagate triggers with correct semantics.
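
A minimal sketch of the first alternative (illustrative only, not Beam SQL 
code): assign only the window function for a TUMBLE and never call 
.triggering(...), so whatever trigger the user defined on the input 
PCollection carries through unchanged.

{code:java}
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class TumbleWithoutTrigger {
  // No .triggering(...) call: the input's user-defined (or default) trigger
  // is inherited unchanged, instead of being replaced by the repeated
  // late-firing trigger built in createTriggerWithDelay above.
  static <T> PCollection<T> tumble(PCollection<T> input, Duration size) {
    return input.apply(Window.<T>into(FixedWindows.of(size)));
  }
}
{code}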



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4066) Nexmark fails when running with PubSub as source/sink

2018-04-13 Thread Alexey Romanenko (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437551#comment-16437551
 ] 

Alexey Romanenko commented on BEAM-4066:


The root cause of this issue is that the anonymous inner DoFn classes are not 
serializable, so we need to extract them into named classes. I'll provide a PR 
for that.
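
For illustration, a minimal sketch of that extraction (names are hypothetical, 
not the PR's): an anonymous DoFn declared inside NexmarkLauncher captures the 
enclosing instance and fails to serialize; a named static class does not.

{code:java}
import org.apache.beam.sdk.transforms.DoFn;

class Fns {
  // Extracted, self-contained DoFn: serializable because it captures no
  // enclosing instance.
  static class PassThroughFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(c.element());
    }
  }
  // usage: input.apply(ParDo.of(new PassThroughFn()))
  // By contrast, `new DoFn<String, String>() {...}` written inline inside a
  // non-serializable class drags that class into the serialized closure.
}
{code}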

> Nexmark fails when running with PubSub as source/sink
> -
>
> Key: BEAM-4066
> URL: https://issues.apache.org/jira/browse/BEAM-4066
> Project: Beam
>  Issue Type: Bug
>  Components: examples-nexmark
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
> Attachments: gistfile1.txt
>
>
> Running Nexmark with PubSub cause this exception:
> {noformat}
> Caused by: java.io.NotSerializableException: 
> org.apache.beam.sdk.nexmark.NexmarkLauncher
> {noformat}
> Full log is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-3157) BeamSql transform should support other PCollection types

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3157.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> BeamSql transform should support other PCollection types
> 
>
> Key: BEAM-3157
> URL: https://issues.apache.org/jira/browse/BEAM-3157
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Ismaël Mejía
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data 
> represented as a BeamRecord. This seems to me like a usability limitation 
> (even if we can do a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map 
> name/type/value from an object field into Calcite, so it is convenient to 
> have a specific data type (BeamRecord) for this. However, we can accomplish 
> the same with a PCollection of JavaBeans (where we know the same information 
> via the field names/types/values) or with Avro records, where we also have 
> the schema information. For the output PCollection we can map the object via 
> a reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming for the moment simple mappings since the SQL does not 
> support composite types for the moment.
> A simple API idea would be something like this:
> A simple filter:
> PCollection<MyPojo> col = BeamSql.query("SELECT * FROM <table> WHERE 
> ...").from(MyPojo.class);
> A projection:
> PCollection<MyNewPojo> newCol = BeamSql.query("SELECT id, 
> name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns; 
> however, I suppose that for memory-use reasons mapping directly into 
> Calcite would make sense.
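
For context, a rough sketch of the ParDo workaround mentioned above (MyPojo 
and the toBeamRecord helper are illustrative stand-ins): this is the adapter 
boilerplate that the proposed .from()/.as() API would remove.

{code:java}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.BeamRecord;

class MyPojo { // illustrative POJO standing in for the user's type
  String id;
  String name;
}

// Hand-written adapter from the POJO to the BeamRecord that BeamSql
// currently requires on both sides of the query.
class PojoToRecordFn extends DoFn<MyPojo, BeamRecord> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(toBeamRecord(c.element()));
  }

  // Hypothetical helper: would copy field names/types/values into a
  // BeamRecord built against a matching record type (the schema).
  private BeamRecord toBeamRecord(MyPojo pojo) {
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}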



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90892
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:58
Start Date: 13/Apr/18 16:58
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async 
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181450180
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
 ##
 @@ -350,19 +353,21 @@ public TokenRange(
* Writer storing an entity into Apache Cassandra database.
*/
   protected class WriterImpl implements Writer {
-
+private static final int CONCURRENT_ASYNC_QUERIES = 100;
 
 Review comment:
   Though, it would make sense to add a comment explaining why this number was chosen.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90892)
Time Spent: 1h 50m  (was: 1h 40m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized and is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.
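
A minimal sketch of that strategy against the DataStax object mapper (an 
illustration of the approach, not the PR's code; the 100-query cap mirrors the 
CONCURRENT_ASYNC_QUERIES constant in the diff reviewed above):

{code:java}
import com.datastax.driver.mapping.Mapper;
import com.google.common.util.concurrent.ListenableFuture;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

class BoundedAsyncWriter<T> {
  private static final int CONCURRENT_ASYNC_QUERIES = 100;

  private final Mapper<T> mapper;
  private final List<ListenableFuture<Void>> inFlight = new ArrayList<>();

  BoundedAsyncWriter(Mapper<T> mapper) {
    this.mapper = mapper;
  }

  void write(T entity) throws ExecutionException, InterruptedException {
    inFlight.add(mapper.saveAsync(entity)); // non-blocking write
    if (inFlight.size() == CONCURRENT_ASYNC_QUERIES) {
      waitForCompletion(); // cap the number of in-flight queries
    }
  }

  // Called from write() and again when the bundle finishes, so no write is
  // left unacknowledged.
  void waitForCompletion() throws ExecutionException, InterruptedException {
    for (ListenableFuture<Void> future : inFlight) {
      future.get(); // blocks; surfaces any write failure
    }
    inFlight.clear();
  }
}
{code}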



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4059) Make sure Dataflow ValidatesRunner tests pass in Gradle

2018-04-13 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437583#comment-16437583
 ] 

Pablo Estrada edited comment on BEAM-4059 at 4/13/18 5:04 PM:
--

Looking at the latest passing test, I've grouped the jobs by the minute they 
were started; at times, 10 jobs were started in the same minute. I'd think this 
means the Maven-based tests were faster because they had more jobs running at 
the same time, which points towards increasing the number of parallel forks in 
the Gradle suite: 

 
{code:java}
 # tests, time, action
 6 2018-04-05_19_16 started job
 4 2018-04-05_19_19 started job
 4 2018-04-05_19_20 started job
 1 2018-04-05_19_22 started job
 5 2018-04-05_19_23 started job
 8 2018-04-05_19_26 started job
 1 2018-04-05_19_29 started job
 8 2018-04-05_19_30 started job
 4 2018-04-05_19_33 started job
 2 2018-04-05_19_36 started job
 8 2018-04-05_19_37 started job
 2 2018-04-05_19_39 started job
 10 2018-04-05_19_40 started job
 7 2018-04-05_19_43 started job
 5 2018-04-05_19_44 started job
 2 2018-04-05_19_46 started job
 5 2018-04-05_19_47 started job
 2 2018-04-05_19_48 started job
 1 2018-04-05_19_49 started job
 7 2018-04-05_19_50 started job
 1 2018-04-05_19_51 started job
 7 2018-04-05_19_53 started job
 1 2018-04-05_19_54 started job
 1 2018-04-05_19_55 started job
 1 2018-04-05_19_56 started job
 2 2018-04-05_19_57 started job
 6 2018-04-05_19_58 started job
 1 2018-04-05_20_00 started job
 4 2018-04-05_20_01 started job
 5 2018-04-05_20_02 started job
 2 2018-04-05_20_03 started job
 1 2018-04-05_20_04 started job
 8 2018-04-05_20_05 started job
 2 2018-04-05_20_06 started job
 2 2018-04-05_20_07 started job
 5 2018-04-05_20_08 started job
 6 2018-04-05_20_09 started job
 2 2018-04-05_20_10 started job
 2 2018-04-05_20_11 started job
 6 2018-04-05_20_12 started job
 3 2018-04-05_20_13 started job
 3 2018-04-05_20_14 started job
 5 2018-04-05_20_15 started job
 2 2018-04-05_20_16 started job
 3 2018-04-05_20_17 started job
 4 2018-04-05_20_18 started job
 5 2018-04-05_20_19 started job
 2 2018-04-05_20_20 started job
 3 2018-04-05_20_21 started job
 6 2018-04-05_20_22 started job
 1 2018-04-05_20_23 started job
 3 2018-04-05_20_24 started job
 6 2018-04-05_20_25 started job
 4 2018-04-05_20_26 started job
 2 2018-04-05_20_27 started job
 1 2018-04-05_20_28 started job
 9 2018-04-05_20_29 started job
 3 2018-04-05_20_30 started job
 10 2018-04-05_20_32 started job
 2 2018-04-05_20_33 started job
 2 2018-04-05_20_34 started job
 4 2018-04-05_20_35 started job
 4 2018-04-05_20_36 started job
 3 2018-04-05_20_37 started job
 2 2018-04-05_20_38 started job
 6 2018-04-05_20_39 started job
 3 2018-04-05_20_40 started job
 9 2018-04-05_20_42 started job
 3 2018-04-05_20_44 started job
 6 2018-04-05_20_45 started job
 3 2018-04-05_20_46 started job
 2 2018-04-05_20_47 started job
 1 2018-04-05_20_48 started job
 1 2018-04-05_20_49 started job
{code}
 


was (Author: pabloem):
Looking at the latest passing test, I've grouped the number of jobs started by 
the minute they were started. In the latest passing test, at times 10 jobs were 
started in the same minute. I'd think this means that the Maven-based tests 
were faster due to the fact that they had more jobs running at the same time. 
This would point towards increasing the no. of parallel forks in the Gradle 
suite: 

{{# of Tests, Time, Action

6 2018-04-05_19_16 started job
 4 2018-04-05_19_19 started job
 4 2018-04-05_19_20 started job
 1 2018-04-05_19_22 started job
 5 2018-04-05_19_23 started job
 8 2018-04-05_19_26 started job
 1 2018-04-05_19_29 started job
 8 2018-04-05_19_30 started job
 4 2018-04-05_19_33 started job
 2 2018-04-05_19_36 started job
 8 2018-04-05_19_37 started job
 2 2018-04-05_19_39 started job
 10 2018-04-05_19_40 started job
 7 2018-04-05_19_43 started job
 5 2018-04-05_19_44 started job
 2 2018-04-05_19_46 started job
 5 2018-04-05_19_47 started job
 2 2018-04-05_19_48 started job
 1 2018-04-05_19_49 started job
 7 2018-04-05_19_50 started job
 1 2018-04-05_19_51 started job
 7 2018-04-05_19_53 started job
 1 2018-04-05_19_54 started job
 1 2018-04-05_19_55 started job
 1 2018-04-05_19_56 started job
 2 2018-04-05_19_57 started job
 6 2018-04-05_19_58 started job
 1 2018-04-05_20_00 started job
 4 2018-04-05_20_01 started job
 5 2018-04-05_20_02 started job
 2 2018-04-05_20_03 started job
 1 2018-04-05_20_04 started job
 8 2018-04-05_20_05 started job
 2 2018-04-05_20_06 started job
 2 2018-04-05_20_07 started job
 5 2018-04-05_20_08 started job
 6 2018-04-05_20_09 started job
 2 2018-04-05_20_10 started job
 2 2018-04-05_20_11 started job
 6 2018-04-05_20_12 started job
 3 2018-04-05_20_13 started job
 3 2018-04-05_20_14 started job
 5 2018-04-05_20_15 started job
 2 2018-04-05_20_16 started job
 3 2018-04-05_20_17 started job
 

[jira] [Work logged] (BEAM-4069) Empty pipeline options can be gracefully serialized/deserialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4069?focusedWorklogId=90917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90917
 ]

ASF GitHub Bot logged work on BEAM-4069:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:12
Start Date: 13/Apr/18 18:12
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5126: [BEAM-4069] 
Gracefully deserialize empty options structs
URL: https://github.com/apache/beam/pull/5126#issuecomment-381218793
 
 
   R: @kennknowles 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90917)
Time Spent: 20m  (was: 10m)

> Empty pipeline options can be gracefully serialized/deserialized
> 
>
> Key: BEAM-4069
> URL: https://issues.apache.org/jira/browse/BEAM-4069
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> PipelineOptionsTranslation.fromProto currently crashes with a 
> NullPointerException when passed an empty options Struct. This is due to 
> ProxyInvocationHandler.Deserializer expecting a non-empty enclosing Struct.
> Empty pipeline options may be passed by SDKs interacting with a job server, 
> so this case needs to be handled. Note that testing a round-trip of an 
> effectively-empty Java PipelineOptions object is not sufficient to catch this 
> because "empty" Java options still contain default fields not defined in 
> other SDKs.
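
A minimal sketch of the graceful handling (the fallback choice here is an 
assumption, not taken from the PR): treat a null or empty Struct as "no 
options" instead of letting ProxyInvocationHandler.Deserializer hit a 
NullPointerException.

{code:java}
import com.google.protobuf.Struct;
import org.apache.beam.runners.core.construction.PipelineOptionsTranslation;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class SafeOptionsDeserialization {
  static PipelineOptions fromProtoSafely(Struct protoOptions) {
    if (protoOptions == null || protoOptions.getFieldsMap().isEmpty()) {
      // Nothing to deserialize: fall back to default options rather than
      // crashing on the missing enclosing struct.
      return PipelineOptionsFactory.create();
    }
    return PipelineOptionsTranslation.fromProto(protoOptions);
  }
}
{code}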



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3994) Use typed sinks and sources for FnApiControlClientPoolService

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3994?focusedWorklogId=90930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90930
 ]

ASF GitHub Bot logged work on BEAM-3994:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:26
Start Date: 13/Apr/18 18:26
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5008: 
[BEAM-3994] Use typed client pool sinks and sources
URL: https://github.com/apache/beam/pull/5008#discussion_r181470695
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java
 ##
 @@ -34,13 +34,14 @@
 implements FnService {
   private static final Logger LOGGER = 
LoggerFactory.getLogger(FnApiControlClientPoolService.class);
 
-  private final BlockingQueue<FnApiControlClient> clientPool;
+  private final ThrowingConsumer<FnApiControlClient> clientPool;
   private final Collection<FnApiControlClient> vendedClients = new 
CopyOnWriteArrayList<>();
   private final HeaderAccessor headerAccessor;
   private AtomicBoolean closed = new AtomicBoolean();
 
   private FnApiControlClientPoolService(
-  BlockingQueue<FnApiControlClient> clientPool, HeaderAccessor 
headerAccessor) {
+  ThrowingConsumer<FnApiControlClient> clientPool,
 
 Review comment:
   I tried to rename this, but it made for a surprisingly annoying time when 
trying to rebase this into the correct commit. I don't think it's worth making 
that change here because this file is going away anyway.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90930)
Time Spent: 2h  (was: 1h 50m)

> Use typed sinks and sources for FnApiControlClientPoolService
> -
>
> Key: BEAM-3994
> URL: https://issues.apache.org/jira/browse/BEAM-3994
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We operate with blocking queues directly when managing control clients with 
> the FnApiControlClientPoolService. This makes interactions with the client 
> pool difficult to understand. We should instead make client sources and sinks 
> explicit.
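
A minimal sketch of the contrast (Client stands in for Beam's 
FnApiControlClient; the ThrowingConsumer shape mirrors the diff above): the 
service depends on an explicit typed sink, and the queue becomes the caller's 
implementation detail.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TypedPoolSketch {
  interface ThrowingConsumer<T> {
    void accept(T input) throws Exception;
  }

  static class Client {} // stand-in for FnApiControlClient

  static void wire() {
    BlockingQueue<Client> pool = new LinkedBlockingQueue<>();
    // The service now sees only "a place to put new clients"; whether that
    // is a queue, a callback, or something else is the caller's concern.
    ThrowingConsumer<Client> clientSink = pool::put;
    // clientSink would be handed to FnApiControlClientPoolService.
  }
}
{code}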



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3994) Use typed sinks and sources for FnApiControlClientPoolService

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3994?focusedWorklogId=90931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90931
 ]

ASF GitHub Bot logged work on BEAM-3994:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:26
Start Date: 13/Apr/18 18:26
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5008: 
[BEAM-3994] Use typed client pool sinks and sources
URL: https://github.com/apache/beam/pull/5008#discussion_r181470775
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/function/package-info.java
 ##
 @@ -16,7 +16,5 @@
  * limitations under the License.
  */
 
-/**
- * Java 8 functional interface extensions.
- */
-package org.apache.beam.fn.harness.fn;
+/** Java 8 functional utilities for fn-execution. */
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90931)
Time Spent: 2h 10m  (was: 2h)

> Use typed sinks and sources for FnApiControlClientPoolService
> -
>
> Key: BEAM-3994
> URL: https://issues.apache.org/jira/browse/BEAM-3994
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We operate with blocking queues directly when managing control clients with 
> the FnApiControlClientPoolService. This makes interactions with the client 
> pool difficult to understand. We should instead make client sources and sinks 
> explicit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2898) Flink supports chaining/fusion of single-SDK stages

2018-04-13 Thread Ben Sidhom (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Sidhom reassigned BEAM-2898:


Assignee: Ben Sidhom

> Flink supports chaining/fusion of single-SDK stages
> ---
>
> Key: BEAM-2898
> URL: https://issues.apache.org/jira/browse/BEAM-2898
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The Fn API supports fused stages, which avoids unnecessarily round-tripping 
> the data over the Fn API between stages. The Flink runner should use that 
> capability for better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4072) using instance of S3Options causes crash when no AWS credentials available

2018-04-13 Thread Samuel Waggoner (JIRA)
Samuel Waggoner created BEAM-4072:
-

 Summary: using instance of S3Options causes crash when no AWS 
credentials available
 Key: BEAM-4072
 URL: https://issues.apache.org/jira/browse/BEAM-4072
 Project: Beam
  Issue Type: Bug
  Components: io-java-aws
Reporter: Samuel Waggoner
Assignee: Ismaël Mejía


I want my pipeline to support writing to S3, but I also want it to work locally 
when a user doesn't specify any S3-specific options. Instead, I get the 
following exception:
{code:java}
com.amazonaws.SdkClientException: Unable to find a region via the region 
provider chain. Must provide an explicit region in the builder or setup 
environment to supply a region. E at 
com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:371){code}
I had thought that I should use a single PipelineOptions interface that 
supports all available options for my pipeline. Is this wrong? Should using an 
instance of S3Options for my pipeline options force the user to write to S3 (or 
otherwise provide S3 options)?

Should I choose a PipelineOptions subinterface based on the options the user 
tries to use at runtime?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=90955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90955
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:08
Start Date: 13/Apr/18 20:08
Worklog Time Spent: 10m 
  Work Description: gkumar7 commented on issue #5111: BEAM-4038: Support 
Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#issuecomment-381247716
 
 
   @rangadi, let me know what you think. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90955)
Time Spent: 20m  (was: 10m)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Headers have been added to Kafka Consumer/Producer records (KAFKA-4208). The 
> purpose of this JIRA is to support this feature in KafkaIO.  
>  
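
For context, a small sketch of the header API the Kafka client itself exposes 
since KAFKA-4208 (plain Kafka consumer code; KafkaIO's own record type did not 
surface headers at the time):

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

class HeaderDump {
  static void printHeaders(ConsumerRecord<byte[], byte[]> record) {
    // Since KAFKA-4208, each record carries an iterable set of headers.
    for (Header header : record.headers()) {
      System.out.println(
          header.key() + " -> " + new String(header.value(), StandardCharsets.UTF_8));
    }
  }
}
{code}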



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90958
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:10
Start Date: 13/Apr/18 20:10
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381248361
 
 
   Run Dataflow PostRelease


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90958)
Time Spent: 91h 10m  (was: 91h)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 91h 10m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4073) The DirectRunner should interact with a Pipeline via an abstraction of the Graph rather than SDK types

2018-04-13 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-4073:
-

 Summary: The DirectRunner should interact with a Pipeline via an 
abstraction of the Graph rather than SDK types
 Key: BEAM-4073
 URL: https://issues.apache.org/jira/browse/BEAM-4073
 Project: Beam
  Issue Type: Bug
  Components: runner-direct
Reporter: Thomas Groh
Assignee: Thomas Groh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90971
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:48
Start Date: 13/Apr/18 20:48
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381256727
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90971)
Time Spent: 92h  (was: 91h 50m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 92h
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostRelease_NightlySnapshot #193

2018-04-13 Thread Apache Jenkins Server
See 


--
[...truncated 24.69 MB...]
[INFO] 
Verified [INFO] BUILD SUCCESS
cd word-count-beam
ls
pom.xml
src
Verified pom.xml
Verified src
ls src/main/java/org/apache/beam/examples/
common
complete
DebuggingWordCount.java
MinimalWordCount.java
subprocess
WindowedWordCount.java
WordCount.java
Verified WordCount.java
ls src/main/java/org/apache/beam/examples/complete/game/
GameStats.java
HourlyTeamScore.java
injector
LeaderBoard.java
StatefulTeamScore.java
UserScore.java
utils
Verified UserScore.java

**
* Test: Runs the WordCount Code with Dataflow runner
**

gsutil rm 
gs://temp-storage-for-release-validation-tests/nightly-snapshot-validation/count*
 || echo 'No files'
CommandException: No URLs matched: 
gs://temp-storage-for-release-validation-tests/nightly-snapshot-validation/count*
No files
mvn compile exec:java -q   
-Dexec.mainClass=org.apache.beam.examples.WordCount   
-Dexec.args="--runner=DataflowRunner
--project=apache-beam-testing
--gcpTempLocation=gs://temp-storage-for-release-validation-tests/nightly-snapshot-validation/tmp

--output=gs://temp-storage-for-release-validation-tests/nightly-snapshot-validation/counts
--inputFile=gs://apache-beam-samples/shakespeare/*" 
-Pdataflow-runner
Using maven /home/jenkins/tools/maven/apache-maven-3.5.2
Apr 13, 2018 9:07:22 PM 
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory
 create
INFO: No stagingLocation provided, falling back to gcpTempLocation
Apr 13, 2018 9:07:22 PM org.apache.beam.runners.dataflow.DataflowRunner 
fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from 
the classpath: will stage 109 files. Enable logging at DEBUG level to see which 
files will be staged.
Apr 13, 2018 9:07:23 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing 
implications related to Google Compute Engine usage and other Google Cloud 
Services.
Apr 13, 2018 9:07:23 PM org.apache.beam.runners.dataflow.util.PackageUtil 
stageClasspathElements
INFO: Uploading 109 files from PipelineOptions.filesToStage to staging location 
to prepare for execution.
Apr 13, 2018 9:07:25 PM org.apache.beam.runners.dataflow.util.PackageUtil 
tryStagePackage
INFO: Uploading 
/tmp/groovy-generated-8562441399955990022-tmpdir/word-count-beam/target/classes 
to 
gs://temp-storage-for-release-validation-tests/nightly-snapshot-validation/tmp/staging/classes-U_sqF91VWhdJaeBizUd1tg.jar
Apr 13, 2018 9:07:25 PM org.apache.beam.runners.dataflow.util.PackageUtil 
stageClasspathElements
INFO: Staging files complete: 108 files cached, 1 files newly uploaded
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding ReadLines/Read as step s1
Apr 13, 2018 9:07:26 PM org.apache.beam.sdk.io.FileBasedSource 
getEstimatedSizeBytes
INFO: Filepattern gs://apache-beam-samples/shakespeare/* matched 43 files with 
total size 5284696
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding WordCount.CountWords/ParDo(ExtractWords) as step s2
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding WordCount.CountWords/Count.PerElement/Init/Map as step s3
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey as step 
s4
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues
 as step s5
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding MapElements/Map as step s6
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding WriteCounts/WriteFiles/RewindowIntoGlobal/Window.Assign as step s7
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
WriteCounts/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnshardedBundles 
as step s8
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 
WriteCounts/WriteFiles/WriteUnshardedBundlesToTempFiles/GroupUnwritten as step 
s9
Apr 13, 2018 9:07:26 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding 

[jira] [Commented] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-13 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437961#comment-16437961
 ] 

Eugene Kirpichov commented on BEAM-4016:


Yeah, it's the desired order. The fix is to add a call to invoker.invokeSetup() 
at 
https://github.com/apache/beam/blob/6107c314e7a1af0d29b3cd865c0dee1013c2261c/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L357
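
A minimal sketch of the suggested ordering (illustrative, not the merged 
patch; the real split call's argument shape is runner internals, so a Runnable 
stands in for it):

{code:java}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.reflect.DoFnInvoker;
import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;

class SplitWithLifecycle {
  static <InputT, OutputT> void split(
      DoFn<InputT, OutputT> fn, Runnable invokeSplitRestriction) {
    DoFnInvoker<InputT, OutputT> invoker = DoFnInvokers.invokerFor(fn);
    invoker.invokeSetup(); // connections opened in @Setup are now available
    try {
      invokeSplitRestriction.run(); // @SplitRestriction can rely on @Setup
    } finally {
      invoker.invokeTeardown(); // balance the setup call
    }
  }
}
{code}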

> Direct runner incorrect lifecycle, @SplitRestriction should execute after 
> @Setup on SplittableDoFn
> --
>
> Key: BEAM-4016
> URL: https://issues.apache.org/jira/browse/BEAM-4016
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.4.0
>Reporter: Ismaël Mejía
>Assignee: Thomas Groh
>Priority: Major
> Attachments: sdf-splitrestriction-lifeycle-test.patch
>
>
> The method annotated with @SplitRestriction is the method where we can define 
> the RestrictionTrackers (splits) in advance in a SDF. It makes sense to 
> execute this after the @Setup method given that usually connections are 
> established at Setup and can be used to ask the different data stores about 
> the partitioning strategy. I added a test for this in the 
> SplittableDoFnTest.SDFWithLifecycle test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4071) Portable Runner Job API shim

2018-04-13 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4071:


 Summary: Portable Runner Job API shim
 Key: BEAM-4071
 URL: https://issues.apache.org/jira/browse/BEAM-4071
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Ben Sidhom
Assignee: Ben Sidhom


There needs to be a way to execute Java-SDK pipelines against a portable job 
server. The job server itself is expected to be started up out-of-band. The 
"PortableRunner" should take an option indicating the Job API endpoint and 
defer other runner configurations to the backend itself.
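
A sketch of the intended usage, with assumed names: neither a runner 
registered as "PortableRunner" nor a --jobEndpoint option existed yet when 
this issue was filed, so both flags here are illustrative.

{code:java}
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class PortableRunnerUsage {
  public static void main(String[] args) {
    // The job server is started separately, out-of-band; the shim only needs
    // its endpoint and defers all other configuration to the backend.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(
                "--runner=PortableRunner", "--jobEndpoint=localhost:8099")
            .create();
    // Pipeline.create(options) would then submit over the Job API.
  }
}
{code}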



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Apex_Gradle #90

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[sidhom] Support impulse transforms in Flink

[sidhom] Add Impulse ValidatesRunner test

--
[...truncated 27.96 MB...]
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.engine.Node emitEndStream
INFO: 14 sending EndOfStream
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.engine.Node emitEndStream
INFO: 15 sending EndOfStream
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.engine.Node emitEndStream
INFO: 16 sending EndOfStream
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.engine.Node emitEndStream
INFO: 17 sending EndOfStream
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.engine.Node emitEndStream
INFO: 18 sending EndOfStream
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.engine.Node emitEndStream
INFO: 19 sending EndOfStream
Apr 13, 2018 7:54:12 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.engine.StreamingContainer 
processHeartbeatResponse
INFO: Undeploy request: [12]
Apr 13, 2018 7:54:13 PM com.datatorrent.stram.engine.StreamingContainer 
undeploy
INFO: Undeploy complete.
Apr 13, 2018 7:54:13 PM com.datatorrent.bufferserver.server.Server$3 run
INFO: Removing ln 
LogicalNode@68178a9b{identifier=tcp://localhost:40192/12.outputPort.12, 
upstream=12.outputPort.12, group=stream9/13.data2, partitions=[], 
iterator=com.datatorrent.bufferserver.internal.DataList$DataListIterator@3da01154{da=com.datatorrent.bufferserver.internal.DataList$Block@5efb8cb6{identifier=12.outputPort.12,
 data=1048576, readingOffset=0, writingOffset=77, 
starting_window=5ad10ae1, ending_window=5ad10ae2, refCount=2, 
uniqueIdentifier=0, next=null, future=null}}} from dl 
com.datatorrent.bufferserver.internal.DataList@2a6e2f95 {12.outputPort.12}
Apr 13, 2018 7:54:14 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:14 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:14 PM com.datatorrent.stram.Journal write
WARNING: Journal output stream is null. Skipping write to the WAL.
Apr 13, 2018 7:54:14 

[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90960
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:25
Start Date: 13/Apr/18 20:25
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381251561
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90960)
Time Spent: 91h 20m  (was: 91h 10m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 91h 20m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90963
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:26
Start Date: 13/Apr/18 20:26
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381248361
 
 
   Run Dataflow PostRelease


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90963)
Time Spent: 91h 40m  (was: 91.5h)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 91h 40m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90962
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:26
Start Date: 13/Apr/18 20:26
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381242690
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90962)
Time Spent: 91.5h  (was: 91h 20m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 91.5h
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3952) GreedyStageFuserTest broken

2018-04-13 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-3952.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> GreedyStageFuserTest broken
> ---
>
> Key: BEAM-3952
> URL: https://issues.apache.org/jira/browse/BEAM-3952
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Thomas Groh
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The materializesWithDifferentEnvConsumer test is currently failing due to a 
> bad assertion. The fused subgraph contains the parDo.out PCollection but the 
> test expects an empty output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3952) GreedyStageFuserTest broken

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3952?focusedWorklogId=90982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90982
 ]

ASF GitHub Bot logged work on BEAM-3952:


Author: ASF GitHub Bot
Created on: 13/Apr/18 21:22
Start Date: 13/Apr/18 21:22
Worklog Time Spent: 10m 
  Work Description: tgroh closed pull request #4995: [BEAM-3952][BEAM-3988] 
Fix GreedyPipelineFuser test
URL: https://github.com/apache/beam/pull/4995
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
index 27c46fdabe3..4373d78eb58 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
@@ -110,16 +110,9 @@ private QueryablePipeline(Collection<String> transformIds, 
Components components
 return ids;
   }
 
-  /**
-   * Returns true if the provided transform is a primitive. A primitive has no 
subtransforms and
-   * produces a new {@link PCollection}.
-   *
-   * Note that this precludes primitive transforms which only consume input 
and produce no
-   * PCollections as output.
-   */
+  /** Returns true if the provided transform is a primitive. A primitive has 
no subtransforms. */
   private static boolean isPrimitiveTransform(PTransform transform) {
-    return transform.getSubtransformsCount() == 0
-        && !transform.getInputsMap().values().containsAll(transform.getOutputsMap().values());
+    return transform.getSubtransformsCount() == 0;
   }
 
   private MutableNetwork buildNetwork(
diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java
index ee64eca03e3..369160a73d5 100644
--- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java
+++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/graph/GreedyStageFuserTest.java
@@ -672,6 +672,7 @@ public void materializesWithDifferentEnvConsumer() {
     PTransform parDoTransform =
         PTransform.newBuilder()
             .putInputs("input", "impulse.out")
+            .putOutputs("out", "parDo.out")
             .setSpec(
                 FunctionSpec.newBuilder()
                     .setUrn(PTransformTranslation.PAR_DO_TRANSFORM_URN)
@@ -680,20 +681,20 @@ public void materializesWithDifferentEnvConsumer() {
                         .setDoFn(SdkFunctionSpec.newBuilder().setEnvironmentId("common"))
                         .build()
                         .toByteString()))
-            .putOutputs("out", "parDo.out")
             .build();
 
+    PCollection parDoOutput = PCollection.newBuilder().setUniqueName("parDo.out").build();
     QueryablePipeline p =
         QueryablePipeline.forPrimitivesIn(
             partialComponents
                 .toBuilder()
                 .putTransforms("parDo", parDoTransform)
-                .putPcollections(
-                    "parDo.out", PCollection.newBuilder().setUniqueName("parDo.out").build())
+                .putPcollections("parDo.out", parDoOutput)
                 .putTransforms(
                     "window",
                     PTransform.newBuilder()
                         .putInputs("input", "parDo.out")
+                        .putOutputs("output", "window.out")
                         .setSpec(
                             FunctionSpec.newBuilder()
                                 .setUrn(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN)
@@ -704,6 +705,8 @@ public void materializesWithDifferentEnvConsumer() {
                                 .build()
                                 .toByteString()))
                         .build())
+                .putPcollections(
+                    "window.out", PCollection.newBuilder().setUniqueName("window.out").build())
                 .putEnvironments("rare", Environment.newBuilder().setUrl("rare").build())
                 .putEnvironments("common", env)
                 .build());
@@ -711,7 +714,9 @@ public void materializesWithDifferentEnvConsumer() {
     ExecutableStage subgraph =
 

[jira] [Resolved] (BEAM-3988) QueryablePipeline should consider any PTransform with no subtransforms to be a primitive

2018-04-13 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-3988.
---
   Resolution: Fixed
Fix Version/s: 2.5.0

> QueryablePipeline should consider any PTransform with no subtransforms to be 
> a primitive
> 
>
> Key: BEAM-3988
> URL: https://issues.apache.org/jira/browse/BEAM-3988
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
> Fix For: 2.5.0
>
>
> This more closely aligns with what will actually need execution by the runner.
>  
> Some PTransforms will no longer be acceptable as composites, namely any 
> transform which decomposes some {{PInput}} to {{POutput}}. However, these 
> transforms are trivially authorable inline, so this shouldn't be a big deal.
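To make that last point concrete, here is a minimal sketch of how such a pass-through composite (one whose outputs are all contained in its inputs, which the old predicate excluded from primitives) could be authored inline so that it expands to a real sub-transform instead. The class name is illustrative and not part of the change; only the standard MapElements API is assumed:

{noformat}
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;

// A composite that simply returned its input would now look like a primitive
// (it has no subtransforms). Expanding to an identity MapElements keeps it a
// genuine composite with a real sub-transform and a new output PCollection.
class PassThrough<T> extends PTransform<PCollection<T>, PCollection<T>> {
  @Override
  public PCollection<T> expand(PCollection<T> input) {
    return input.apply(MapElements.into(input.getTypeDescriptor()).via((T x) -> x));
  }
}
{noformat}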



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90948
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 19:46
Start Date: 13/Apr/18 19:46
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381242690
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90948)
Time Spent: 91h  (was: 90h 50m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 91h
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3513) Use portable CombinePayload in Java DataflowRunner

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3513?focusedWorklogId=90969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90969
 ]

ASF GitHub Bot logged work on BEAM-3513:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:36
Start Date: 13/Apr/18 20:36
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #5119: [BEAM-3513] Removing 
PrimitiveCombineGroupedValues override w/ FnAPI.
URL: https://github.com/apache/beam/pull/5119#issuecomment-381254135
 
 
   Run Java PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90969)
Time Spent: 20m  (was: 10m)

> Use portable CombinePayload in Java DataflowRunner
> --
>
> Key: BEAM-3513
> URL: https://issues.apache.org/jira/browse/BEAM-3513
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Java-specific blobs transmitted to Dataflow need more context, in the 
> form of portability framework protos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2937) Fn API combiner support w/ lifting to PGBK

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2937?focusedWorklogId=90972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90972
 ]

ASF GitHub Bot logged work on BEAM-2937:


Author: ASF GitHub Bot
Created on: 13/Apr/18 20:51
Start Date: 13/Apr/18 20:51
Worklog Time Spent: 10m 
  Work Description: youngoli opened a new pull request #5128: [BEAM-2937] 
Update Portable Combine URNs to new URNs.
URL: https://github.com/apache/beam/pull/5128
 
 
   Changing combine URNs based on the portable combines doc:
   https://s.apache.org/beam-runner-api-combine-model
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [x] Write a pull request description that is detailed enough to 
understand:
  - [x] What the pull request does
  - [x] Why it does it
  - [x] How it does it
  - [x] Why this approach
- [x] Each commit in the pull request should have a meaningful subject line 
and body.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90972)
Time Spent: 10m
Remaining Estimate: 0h

> Fn API combiner support w/ lifting to PGBK
> --
>
> Key: BEAM-2937
> URL: https://issues.apache.org/jira/browse/BEAM-2937
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The FnAPI should support this optimization. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4074) Cleanup metrics/counters code

2018-04-13 Thread Robert Bradshaw (JIRA)
Robert Bradshaw created BEAM-4074:
-

 Summary: Cleanup metrics/counters code
 Key: BEAM-4074
 URL: https://issues.apache.org/jira/browse/BEAM-4074
 Project: Beam
  Issue Type: Task
  Components: sdk-py-core
Reporter: Robert Bradshaw
Assignee: Ahmet Altay


E.g. right now we have

metricbase.Distribution
metric.DelegatingDistribution
metric.cells.DistributionCell
metric.cells.DistributionResult
metric.cells.DistributionData
metric.cells.DistributionAggregator
transforms.cy_combiners.DataflowDistributionCounter
transforms.cy_combiners.DataflowDistributionCounterFn
transforms.cy_dataflow_distribution_counter.DataflowDistributionCounter

plus some code under util/counters. This is true for "ordinary" sum and max 
counters as well. We should consolidate/simplify this code. 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4016) Direct runner incorrect lifecycle, @SplitRestriction should execute after @Setup on SplittableDoFn

2018-04-13 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437961#comment-16437961
 ] 

Eugene Kirpichov edited comment on BEAM-4016 at 4/13/18 9:29 PM:
-

Yeah it's the desired order. The fix is runner-independent, and it is to add a 
call to invoker.invokeSetup() to 
https://github.com/apache/beam/blob/6107c314e7a1af0d29b3cd865c0dee1013c2261c/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L357
 . 


was (Author: jkff):
Yeah it's the desired order. The fix is to add a call to invoker.invokeSetup() 
to 
https://github.com/apache/beam/blob/6107c314e7a1af0d29b3cd865c0dee1013c2261c/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L357
 . 
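For context, here is a minimal sketch of what that fix amounts to. It assumes a wrapper DoFn similar in spirit to the one in SplittableParDo; the class and field names are illustrative rather than the actual Beam source, though DoFnInvokers.invokerFor() and invokeSetup() are the real reflect APIs. The point is only the ordering: the user's @Setup must run before any @SplitRestriction call.

{noformat}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.reflect.DoFnInvoker;
import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;

class SplitRestrictionFnSketch<InputT, OutputT> extends DoFn<InputT, OutputT> {
  private final DoFn<InputT, OutputT> splittableFn;
  private transient DoFnInvoker<InputT, OutputT> invoker;

  SplitRestrictionFnSketch(DoFn<InputT, OutputT> splittableFn) {
    this.splittableFn = splittableFn;
  }

  @Setup
  public void setup() {
    invoker = DoFnInvokers.invokerFor(splittableFn);
    // The missing call: run the user's @Setup before @SplitRestriction, so
    // connections opened in @Setup are available when computing splits.
    invoker.invokeSetup();
  }
}
{noformat}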

> Direct runner incorrect lifecycle, @SplitRestriction should execute after 
> @Setup on SplittableDoFn
> --
>
> Key: BEAM-4016
> URL: https://issues.apache.org/jira/browse/BEAM-4016
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.4.0
>Reporter: Ismaël Mejía
>Assignee: Thomas Groh
>Priority: Major
> Attachments: sdf-splitrestriction-lifeycle-test.patch
>
>
> The method annotated with @SplitRestriction is the method where we can define 
> the RestrictionTrackers (splits) in advance in a SDF. It makes sense to 
> execute this after the @Setup method given that usually connections are 
> established at Setup and can be used to ask the different data stores about 
> the partitioning strategy. I added a test for this in the 
> SplittableDoFnTest.SDFWithLifecycle test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #90

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Fix materializesWithDifferentEnvConsumer

[tgroh] Reduce Requirements to be considered a Primitive

--
[...truncated 1.24 MB...]
at 
org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:47)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:627)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:626)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:828)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:626)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:169)
at 
org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:123)
at 
org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:83)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:346)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:328)
at 
org.apache.beam.runners.spark.translation.streaming.CreateStreamTest.testFirstElementLate(CreateStreamTest.java:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy3.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 

[jira] [Commented] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-13 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437547#comment-16437547
 ] 

Anton Kedin commented on BEAM-3773:
---

Raw Beam JDBC Prototype: 
[https://github.com/akedin/beam/commit/096ca8d7185af6b6a01f8231cf85c40fa221051a]

[~apilloud] is looking at Calcite Avatica integration: 
[https://calcite.apache.org/avatica/docs/] 

 

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL
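To illustrate the integration surface such tools expect, here is a hypothetical usage sketch. The "jdbc:beam:" URL scheme is an assumption for illustration (no such driver ships today, and a real driver would have to be registered on the classpath), while the java.sql calls are the standard JDBC API:

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BeamSqlJdbcSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical connection string and table name, for illustration only.
    try (Connection connection = DriverManager.getConnection("jdbc:beam:");
        Statement statement = connection.createStatement();
        ResultSet rows = statement.executeQuery("SELECT * FROM my_table")) {
      while (rows.next()) {
        System.out.println(rows.getString(1));
      }
    }
  }
}
{noformat}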



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin reassigned BEAM-3773:
-

Assignee: Andrew Pilloud

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Compressed_TextIOIT_HDFS #45

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 40.48 KB...]
[INFO] Excluding com.google.auth:google-auth-library-credentials:jar:0.7.1 from 
the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-oauth2-http:jar:0.7.1 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigdataoss:util:jar:1.4.5 from the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client-java6:jar:1.22.0 from 
the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client-jackson2:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.oauth-client:google-oauth-client-java6:jar:1.22.0 
from the shaded jar.
[INFO] Excluding 
org.apache.beam:beam-sdks-java-io-hadoop-file-system:jar:2.5.0-SNAPSHOT from 
the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-hdfs:jar:2.7.1 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jetty:jar:6.1.26 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jetty-util:jar:6.1.26 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-core:jar:1.9 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-server:jar:1.9 from the shaded jar.
[INFO] Excluding asm:asm:jar:3.1 from the shaded jar.
[INFO] Excluding commons-cli:commons-cli:jar:1.2 from the shaded jar.
[INFO] Excluding commons-codec:commons-codec:jar:1.4 from the shaded jar.
[INFO] Excluding commons-io:commons-io:jar:2.4 from the shaded jar.
[INFO] Excluding commons-lang:commons-lang:jar:2.6 from the shaded jar.
[INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded jar.
[INFO] Excluding commons-daemon:commons-daemon:jar:1.0.13 from the shaded jar.
[INFO] Excluding log4j:log4j:jar:1.2.17 from the shaded jar.
[INFO] Excluding com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded 
jar.
[INFO] Excluding javax.servlet:servlet-api:jar:2.5 from the shaded jar.
[INFO] Excluding xmlenc:xmlenc:jar:0.52 from the shaded jar.
[INFO] Excluding io.netty:netty-all:jar:4.0.23.Final from the shaded jar.
[INFO] Excluding xerces:xercesImpl:jar:2.9.1 from the shaded jar.
[INFO] Excluding xml-apis:xml-apis:jar:1.3.04 from the shaded jar.
[INFO] Excluding org.apache.htrace:htrace-core:jar:3.1.0-incubating from the 
shaded jar.
[INFO] Excluding org.fusesource.leveldbjni:leveldbjni-all:jar:1.8 from the 
shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-client:jar:2.7.1 from the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-common:jar:2.7.3 from the shaded jar.
[INFO] Excluding org.apache.commons:commons-math3:jar:3.1.1 from the shaded jar.
[INFO] Excluding commons-httpclient:commons-httpclient:jar:3.1 from the shaded 
jar.
[INFO] Excluding commons-net:commons-net:jar:3.1 from the shaded jar.
[INFO] Excluding commons-collections:commons-collections:jar:3.2.2 from the 
shaded jar.
[INFO] Excluding javax.servlet.jsp:jsp-api:jar:2.1 from the shaded jar.
[INFO] Excluding commons-configuration:commons-configuration:jar:1.6 from the 
shaded jar.
[INFO] Excluding commons-digester:commons-digester:jar:1.8 from the shaded jar.
[INFO] Excluding commons-beanutils:commons-beanutils:jar:1.7.0 from the shaded 
jar.
[INFO] Excluding commons-beanutils:commons-beanutils-core:jar:1.8.0 from the 
shaded jar.
[INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.10 from the shaded jar.
[INFO] Excluding com.google.code.gson:gson:jar:2.2.4 from the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-auth:jar:2.7.3 from the shaded jar.
[INFO] Excluding 
org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15 from the 
shaded jar.
[INFO] Excluding org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15 from 
the shaded jar.
[INFO] Excluding org.apache.directory.api:api-asn1-api:jar:1.0.0-M20 from the 
shaded jar.
[INFO] Excluding org.apache.directory.api:api-util:jar:1.0.0-M20 from the 
shaded jar.
[INFO] Excluding org.apache.curator:curator-framework:jar:2.7.1 from the shaded 
jar.
[INFO] Excluding org.apache.curator:curator-client:jar:2.7.1 from the shaded 
jar.
[INFO] Excluding org.apache.curator:curator-recipes:jar:2.7.1 from the shaded 
jar.
[INFO] Excluding org.apache.zookeeper:zookeeper:jar:3.4.6 from the shaded jar.
[INFO] Excluding io.netty:netty:jar:3.7.0.Final from the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.7.1 from 
the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.7.1 
from the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-yarn-client:jar:2.7.1 from the shaded 
jar.
[INFO] Excluding org.apache.hadoop:hadoop-yarn-server-common:jar:2.7.1 from the 
shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.7.1 
from the shaded jar.
[INFO] 

[jira] [Work logged] (BEAM-4069) Empty pipeline options can be gracefully serialized/deserialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4069?focusedWorklogId=90916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90916
 ]

ASF GitHub Bot logged work on BEAM-4069:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:11
Start Date: 13/Apr/18 18:11
Worklog Time Spent: 10m 
  Work Description: bsidhom opened a new pull request #5126: [BEAM-4069] 
Gracefully deserialize empty options structs
URL: https://github.com/apache/beam/pull/5126
 
 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90916)
Time Spent: 10m
Remaining Estimate: 0h

> Empty pipeline options can be gracefully serialized/deserialized
> 
>
> Key: BEAM-4069
> URL: https://issues.apache.org/jira/browse/BEAM-4069
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> PipelineOptionsTranslation.fromProto currently crashes with a 
> NullPointerException when passed an empty options Struct. This is due to 
> ProxyInvocationHandler.Deserializer expecting a non-empty enclosing Struct.
> Empty pipeline options may be passed by SDKs interacting with a job server, 
> so this case needs to be handled. Note that testing a round-trip of an 
> effectively-empty Java PipelineOptions object is not sufficient to catch this 
> because "empty" Java options still contain default fields not defined in 
> other SDKs.
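A minimal sketch of the failing case described above, assuming the PipelineOptionsTranslation API in runners-core-construction at the time of this report:

{noformat}
import com.google.protobuf.Struct;
import org.apache.beam.runners.core.construction.PipelineOptionsTranslation;
import org.apache.beam.sdk.options.PipelineOptions;

public class EmptyOptionsRoundTrip {
  public static void main(String[] args) {
    // An empty Struct, as an SDK talking to a job server might send.
    Struct empty = Struct.getDefaultInstance();
    // Before the fix this threw a NullPointerException; afterwards it should
    // yield a PipelineOptions carrying only defaults.
    PipelineOptions options = PipelineOptionsTranslation.fromProto(empty);
    System.out.println(options);
  }
}
{noformat}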



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4068) Consistent option specification between SDKs and runners by URN

2018-04-13 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437686#comment-16437686
 ] 

Ben Sidhom commented on BEAM-4068:
--

See [https://github.com/axelmagn/beam/pull/5/files] for an example. Note that 
the specific runner options there are not relevant, but that runners in general 
require arbitrary options.

> Consistent option specification between SDKs and runners by URN
> ---
>
> Key: BEAM-4068
> URL: https://issues.apache.org/jira/browse/BEAM-4068
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Kenneth Knowles
>Priority: Minor
>
> Pipeline options are materialized differently by different SDKs. However, in 
> some cases, runners require Java-specific options that are not available 
> elsewhere. We should decide on well-known URNs and use them across SDKs where 
> applicable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4059) Make sure Dataflow ValidatesRunner tests pass in Gradle

2018-04-13 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437750#comment-16437750
 ] 

Pablo Estrada commented on BEAM-4059:
-

In my test runs with a higher number of parallel forks, I'm seeing that at 
around 207 jobs, Gradle gets stuck and starts executing new tests very slowly.

> Make sure Dataflow ValidatesRunner tests pass in Gradle
> ---
>
> Key: BEAM-4059
> URL: https://issues.apache.org/jira/browse/BEAM-4059
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #65

2018-04-13 Thread Apache Jenkins Server
See 


--
[...truncated 19.35 MB...]
INFO: Staging pipeline description to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testwrite-jenkins-0413190722-cb475153/output/results/staging/
Apr 13, 2018 7:07:45 PM org.apache.beam.runners.dataflow.util.PackageUtil 
tryStagePackage
INFO: Uploading <80353 bytes, hash QN6vgjf6loWRj6wqlfk0Dg> to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testwrite-jenkins-0413190722-cb475153/output/results/staging/pipeline-QN6vgjf6loWRj6wqlfk0Dg.pb

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_OUT
Dataflow SDK version: 2.5.0-SNAPSHOT

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_ERROR
Apr 13, 2018 7:07:46 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-04-13_12_07_45-8881118306076350727?project=apache-beam-testing

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_OUT
Submitted job: 2018-04-13_12_07_45-8881118306076350727

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_ERROR
Apr 13, 2018 7:07:46 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud dataflow jobs --project=apache-beam-testing cancel 
--region=us-central1 2018-04-13_12_07_45-8881118306076350727
Apr 13, 2018 7:07:46 PM org.apache.beam.runners.dataflow.TestDataflowRunner 
run
INFO: Running Dataflow job 2018-04-13_12_07_45-8881118306076350727 with 0 
expected assertions.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:45.610Z: Autoscaling is enabled for job 
2018-04-13_12_07_45-8881118306076350727. The number of workers will be between 
1 and 1000.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:45.637Z: Autoscaling was automatically enabled for 
job 2018-04-13_12_07_45-8881118306076350727.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:48.779Z: Checking required Cloud APIs are enabled.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:48.939Z: Checking permissions granted to controller 
Service Account.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:52.399Z: Worker configuration: n1-standard-1 in 
us-central1-f.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:52.767Z: Expanding CoGroupByKey operations into 
optimizable parts.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:52.943Z: Expanding GroupByKey operations into 
optimizable parts.
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:52.979Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:53.123Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:53.157Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create.Values/Read(CreateSource)
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:53.182Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey+SpannerIO.Write/Write
 mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Partial
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
Apr 13, 2018 7:07:55 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T19:07:53.214Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
 into SpannerIO.Write/Write 

[jira] [Work logged] (BEAM-4069) Empty pipeline options can be gracefully serialized/deserialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4069?focusedWorklogId=90946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90946
 ]

ASF GitHub Bot logged work on BEAM-4069:


Author: ASF GitHub Bot
Created on: 13/Apr/18 19:11
Start Date: 13/Apr/18 19:11
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #5126: [BEAM-4069] 
Gracefully deserialize empty options structs
URL: https://github.com/apache/beam/pull/5126#issuecomment-381234838
 
 
   If I'm reading the Jenkins output correctly, this appears to be failing due 
to `ERROR: beam2 is offline; cannot locate JDK 1.8 (latest)`. I'm not sure if 
it even attempted to run any tests.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90946)
Time Spent: 0.5h  (was: 20m)

> Empty pipeline options can be gracefully serialized/deserialized
> 
>
> Key: BEAM-4069
> URL: https://issues.apache.org/jira/browse/BEAM-4069
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PipelineOptionsTranslation.fromProto currently crashes with a 
> NullPointerException when passed an empty options Struct. This is due to 
> ProxyInvocationHandler.Deserializer expecting a non-empty enclosing Struct.
> Empty pipeline options may be passed by SDKs interacting with a job server, 
> so this case needs to be handled. Note that testing a round-trip of an 
> effectively-empty Java PipelineOptions object is not sufficient to catch this 
> because "empty" Java options still contain default fields not defined in 
> other SDKs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types

2018-04-13 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437561#comment-16437561
 ] 

Anton Kedin commented on BEAM-3157:
---

A PR is merged which allows Row generation for POJOs: 
[https://github.com/apache/beam/pull/4649/files]

Usage example: 
[https://github.com/apache/beam/blob/670c75e94795ad9da6a0690647e996dc97b60718/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredRowCoderSqlTest.java#L82]
 

Caveat: `SELECT *` doesn't work correctly due to undefined order of fields.

This PR resolves this Jira for now. Bugs and extra features will go into 
separate Jiras.

PS: there's also work to convert JSON objects directly to Rows: 
https://github.com/apache/beam/pull/5120

> BeamSql transform should support other PCollection types
> 
>
> Key: BEAM-3157
> URL: https://issues.apache.org/jira/browse/BEAM-3157
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Ismaël Mejía
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data 
> represented as a BeamRecord. This seems to me like a usability limitation 
> (even if we can do a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map 
> name/type/value from an object field into Calcite so it is convenient to have 
> a specific data type (BeamRecord) for this. However we can accomplish the 
> same by using a PCollection of JavaBean (where we know the same information 
> via the field names/types/values) or by using Avro records where we also have 
> the Schema information. For the output PCollection we can map the object via 
> a Reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming simple mappings for the moment, since the SQL does not 
> support composite types yet.
> A simple API idea would be something like this:
> A simple filter:
> PCollection<MyPojo> col = BeamSql.query("SELECT * FROM <table> WHERE ...").from(MyPojo.class);
> A projection:
> PCollection<MyNewPojo> newCol = BeamSql.query("SELECT id, name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns; 
> however, I suppose that for memory-use reasons, mapping directly into 
> Calcite would make sense.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4059) Make sure Dataflow ValidatesRunner tests pass in Gradle

2018-04-13 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437583#comment-16437583
 ] 

Pablo Estrada commented on BEAM-4059:
-

Looking at the latest passing test, I've grouped the jobs by the minute they 
were started; at times, 10 jobs were started in the same minute. I'd think 
this means the Maven-based tests were faster because they had more jobs 
running at the same time. This would point towards increasing the number of 
parallel forks in the Gradle suite (see the per-minute grouping sketch after 
the table): 

{{# of Jobs, Time, Action

6 2018-04-05_19_16 started job
 4 2018-04-05_19_19 started job
 4 2018-04-05_19_20 started job
 1 2018-04-05_19_22 started job
 5 2018-04-05_19_23 started job
 8 2018-04-05_19_26 started job
 1 2018-04-05_19_29 started job
 8 2018-04-05_19_30 started job
 4 2018-04-05_19_33 started job
 2 2018-04-05_19_36 started job
 8 2018-04-05_19_37 started job
 2 2018-04-05_19_39 started job
 10 2018-04-05_19_40 started job
 7 2018-04-05_19_43 started job
 5 2018-04-05_19_44 started job
 2 2018-04-05_19_46 started job
 5 2018-04-05_19_47 started job
 2 2018-04-05_19_48 started job
 1 2018-04-05_19_49 started job
 7 2018-04-05_19_50 started job
 1 2018-04-05_19_51 started job
 7 2018-04-05_19_53 started job
 1 2018-04-05_19_54 started job
 1 2018-04-05_19_55 started job
 1 2018-04-05_19_56 started job
 2 2018-04-05_19_57 started job
 6 2018-04-05_19_58 started job
 1 2018-04-05_20_00 started job
 4 2018-04-05_20_01 started job
 5 2018-04-05_20_02 started job
 2 2018-04-05_20_03 started job
 1 2018-04-05_20_04 started job
 8 2018-04-05_20_05 started job
 2 2018-04-05_20_06 started job
 2 2018-04-05_20_07 started job
 5 2018-04-05_20_08 started job
 6 2018-04-05_20_09 started job
 2 2018-04-05_20_10 started job
 2 2018-04-05_20_11 started job
 6 2018-04-05_20_12 started job
 3 2018-04-05_20_13 started job
 3 2018-04-05_20_14 started job
 5 2018-04-05_20_15 started job
 2 2018-04-05_20_16 started job
 3 2018-04-05_20_17 started job
 4 2018-04-05_20_18 started job
 5 2018-04-05_20_19 started job
 2 2018-04-05_20_20 started job
 3 2018-04-05_20_21 started job
 6 2018-04-05_20_22 started job
 1 2018-04-05_20_23 started job
 3 2018-04-05_20_24 started job
 6 2018-04-05_20_25 started job
 4 2018-04-05_20_26 started job
 2 2018-04-05_20_27 started job
 1 2018-04-05_20_28 started job
 9 2018-04-05_20_29 started job
 3 2018-04-05_20_30 started job
 10 2018-04-05_20_32 started job
 2 2018-04-05_20_33 started job
 2 2018-04-05_20_34 started job
 4 2018-04-05_20_35 started job
 4 2018-04-05_20_36 started job
 3 2018-04-05_20_37 started job
 2 2018-04-05_20_38 started job
 6 2018-04-05_20_39 started job
 3 2018-04-05_20_40 started job
 9 2018-04-05_20_42 started job
 3 2018-04-05_20_44 started job
 6 2018-04-05_20_45 started job
 3 2018-04-05_20_46 started job
 2 2018-04-05_20_47 started job
 1 2018-04-05_20_48 started job
 1 2018-04-05_20_49 started job}}
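For reference, a minimal sketch of that per-minute grouping, assuming job start timestamps formatted like those above (the sample values here are made up):

{noformat}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class JobsPerMinute {
  public static void main(String[] args) {
    List<String> startTimes = Arrays.asList(
        "2018-04-05_19_16_03", "2018-04-05_19_16_41", "2018-04-05_19_19_22");
    // Truncate each timestamp to yyyy-MM-dd_HH_mm and count jobs per minute.
    Map<String, Long> perMinute = startTimes.stream()
        .map(t -> t.substring(0, 16))
        .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    perMinute.forEach((minute, count) -> System.out.println(count + " " + minute));
  }
}
{noformat}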

> Make sure Dataflow ValidatesRunner tests pass in Gradle
> ---
>
> Key: BEAM-4059
> URL: https://issues.apache.org/jira/browse/BEAM-4059
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #63

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

--
[...truncated 19.28 MB...]
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample 
as view/GBKaSVForKeys as step s23
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample 
as view/ParDo(ToIsmMetadataRecordForKey) as step s24
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample 
as view/Flatten.PCollections as step s25
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Keys sample 
as view/CreateDataflowView as step s26
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Partition 
input as step s27
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Group by 
partition as step s28
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Batch 
mutations together as step s29
Apr 13, 2018 4:59:01 PM 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator addStep
INFO: Adding SpannerIO.Write/Write mutations to Cloud Spanner/Write 
mutations to Spanner as step s30
Apr 13, 2018 4:59:01 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Staging pipeline description to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testwrite-jenkins-0413165841-f8b730a/output/results/staging/
Apr 13, 2018 4:59:01 PM org.apache.beam.runners.dataflow.util.PackageUtil 
tryStagePackage
INFO: Uploading <80355 bytes, hash Q-DyMEGh2VHGbsHGLVF8oA> to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testwrite-jenkins-0413165841-f8b730a/output/results/staging/pipeline-Q-DyMEGh2VHGbsHGLVF8oA.pb

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_OUT
Dataflow SDK version: 2.5.0-SNAPSHOT

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_ERROR
Apr 13, 2018 4:59:03 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-04-13_09_59_02-938182063080779098?project=apache-beam-testing

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_OUT
Submitted job: 2018-04-13_09_59_02-938182063080779098

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testWrite STANDARD_ERROR
Apr 13, 2018 4:59:03 PM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud dataflow jobs --project=apache-beam-testing cancel 
--region=us-central1 2018-04-13_09_59_02-938182063080779098
Apr 13, 2018 4:59:03 PM org.apache.beam.runners.dataflow.TestDataflowRunner 
run
INFO: Running Dataflow job 2018-04-13_09_59_02-938182063080779098 with 0 
expected assertions.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T16:59:02.266Z: Autoscaling is enabled for job 
2018-04-13_09_59_02-938182063080779098. The number of workers will be between 1 
and 1000.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T16:59:02.285Z: Autoscaling was automatically enabled for 
job 2018-04-13_09_59_02-938182063080779098.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T16:59:04.787Z: Checking required Cloud APIs are enabled.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T16:59:04.875Z: Checking permissions granted to controller 
Service Account.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T16:59:08.072Z: Worker configuration: n1-standard-1 in 
us-central1-f.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-04-13T16:59:08.453Z: Expanding CoGroupByKey operations into 
optimizable parts.
Apr 13, 2018 4:59:14 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 

[jira] [Work logged] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3042?focusedWorklogId=90899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90899
 ]

ASF GitHub Bot logged work on BEAM-3042:


Author: ASF GitHub Bot
Created on: 13/Apr/18 17:14
Start Date: 13/Apr/18 17:14
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5075: [BEAM-3042] Refactor 
of TransformIOCounters (performance, inheritance). Time counter for sideinputs.
URL: https://github.com/apache/beam/pull/5075#issuecomment-381202532
 
 
As it turns out, this is significantly slower even with the flag 
deactivated. We can hold off on this change for now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90899)
Time Spent: 2h 40m  (was: 2.5h)

> Add tracking of bytes read / time spent when reading side inputs
> 
>
> Key: BEAM-3042
> URL: https://issues.apache.org/jira/browse/BEAM-3042
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> It is difficult for Dataflow users to understand how modifying a pipeline or 
> data set can affect how much inter-transform IO is used in their job. The 
> intent of this feature request is to help users understand how side inputs 
> behave when they are consumed.
> This will allow users to understand how much time and how much data their 
> pipeline uses to read/write to inter-transform IO. Users will also be able to 
> modify their pipelines and understand how their changes affect these IO 
> metrics.
> For further information, please review the internal Google doc 
> go/insights-transform-io-design-doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90903
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 17:23
Start Date: 13/Apr/18 17:23
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381204818
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90903)
Time Spent: 90.5h  (was: 90h 20m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 90.5h
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90944
 ]

ASF GitHub Bot logged work on BEAM-4056:


Author: ASF GitHub Bot
Created on: 13/Apr/18 19:00
Start Date: 13/Apr/18 19:00
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5118: 
[BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181474800
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/SideInputReference.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.graph;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import 
org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload.SideInputId;
+import 
org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode;
+
+/**
+ * A reference to a side input. This includes the PTransform that references 
the side input as well
+ * as the PCollection referenced. Both are necessary in order to fully resolve 
a view.
+ */
+@AutoValue
+public abstract class SideInputReference {
+
+  /** Create a side input reference. */
+  public static SideInputReference of(String transformId, String localName,
 
 Review comment:
   Yes, it's available. We already require components everywhere due to 
PCollectionNode. Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90944)
Time Spent: 1h 10m  (was: 1h)

> Identify Side Inputs by PTransform ID and local name
> 
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all 
> phases of portable pipeline execution (fusion, translation, and SDK 
> execution).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90945
 ]

ASF GitHub Bot logged work on BEAM-4056:


Author: ASF GitHub Bot
Created on: 13/Apr/18 19:00
Start Date: 13/Apr/18 19:00
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5118: 
[BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181473382
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -217,12 +217,12 @@ message ExecutableStagePayload {
   // PTransform the ExecutableStagePayload is the payload of.
   string input = 2;
 
-  // Side Input PCollection ids. Each must be present as a value in the inputs 
of
-  // any PTransform the ExecutableStagePayload is the payload of.
-  repeated string side_inputs = 3;
+  // The side inputs required for this executable stage. Each must be prsent 
as a side input of
 
 Review comment:
   Aren't you familiar with that common abbreviation??? It shaves off a whole 
character!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90945)
Time Spent: 1h 20m  (was: 1h 10m)

> Identify Side Inputs by PTransform ID and local name
> 
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all 
> phases of portable pipeline execution (fusion, translation, and SDK 
> execution).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4056) Identify Side Inputs by PTransform ID and local name

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4056?focusedWorklogId=90943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90943
 ]

ASF GitHub Bot logged work on BEAM-4056:


Author: ASF GitHub Bot
Created on: 13/Apr/18 19:00
Start Date: 13/Apr/18 19:00
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5118: 
[BEAM-4056] Identify side inputs by transform id and local name
URL: https://github.com/apache/beam/pull/5118#discussion_r181474184
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -122,11 +123,16 @@ default PTransform toPTransform() {
 pt.putInputs("input", getInputPCollection().getId());
 payload.setInput(input.getId());
 
-int sideInputIndex = 0;
-for (PCollectionNode sideInputNode : getSideInputPCollections()) {
-  pt.putInputs(String.format("side_input_%s", sideInputIndex), 
sideInputNode.getId());
-  payload.addSideInputs(sideInputNode.getId());
-  sideInputIndex++;
+for (SideInputReference sideInput : getSideInputs()) {
+  // Side inputs of the ExecutableStage itself can be uniquely identified 
by inner PTransform
+  // name and local name.
+  String outerLocalName = String.format("%s:%s",
+  sideInput.transformId(), sideInput.localName());
+  pt.putInputs(outerLocalName, sideInput.getCollection().getId());
+  payload.addSideInputs(SideInputId.newBuilder()
 
 Review comment:
   Let me know if this is better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90943)
Time Spent: 1h  (was: 50m)

> Identify Side Inputs by PTransform ID and local name
> 
>
> Key: BEAM-4056
> URL: https://issues.apache.org/jira/browse/BEAM-4056
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Ben Sidhom
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is necessary in order to correctly identify side inputs during all 
> phases of portable pipeline execution (fusion, translation, and SDK 
> execution).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4060) meta-info/poms missing in snapshots

2018-04-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437755#comment-16437755
 ] 

Ismaël Mejía commented on BEAM-4060:


This is similar in scope to BEAM-4057, but since this one tracks a smaller 
subset it is probably a good idea to keep it separate and to validate with 
[~ravwojdyla] once fixed, to make sure we are not breaking, for example, some 
use of this in Scio.

> meta-info/poms missing in snapshots
> ---
>
> Key: BEAM-4060
> URL: https://issues.apache.org/jira/browse/BEAM-4060
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Rafal Wojdyla
>Assignee: Luke Cwik
>Priority: Blocker
>
> Current snapshots are missing a bunch of meta-info files, including pom.xml 
> and pom.properties:
> 2.4.0-SNAPSHOT example:
> {noformat}
> jar -tf 
> ~/.ivy2/cache/org.apache.beam/beam-runners-direct-java/jars/beam-runners-direct-java-2.4.0-SNAPSHOT.jar
>  | grep META-INF
> META-INF/
> META-INF/DEPENDENCIES
> META-INF/LICENSE
> META-INF/MANIFEST.MF
> META-INF/NOTICE
> META-INF/maven/
> META-INF/maven/com.google.code.findbugs/
> META-INF/maven/com.google.code.findbugs/jsr305/
> META-INF/maven/com.google.code.findbugs/jsr305/pom.properties
> META-INF/maven/com.google.code.findbugs/jsr305/pom.xml
> META-INF/maven/com.google.guava/
> META-INF/maven/com.google.guava/guava/
> META-INF/maven/com.google.guava/guava/pom.properties
> META-INF/maven/com.google.guava/guava/pom.xml
> META-INF/maven/com.google.protobuf/
> META-INF/maven/com.google.protobuf/protobuf-java-util/
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.xml
> META-INF/maven/com.google.protobuf/protobuf-java/
> META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java/pom.xml
> META-INF/maven/org.apache.beam/
> META-INF/maven/org.apache.beam/beam-model-pipeline/
> META-INF/maven/org.apache.beam/beam-model-pipeline/pom.properties
> META-INF/maven/org.apache.beam/beam-model-pipeline/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-core-java/
> META-INF/maven/org.apache.beam/beam-runners-core-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-core-java/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-direct-java/
> META-INF/maven/org.apache.beam/beam-runners-direct-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-direct-java/pom.xml
> META-INF/services/
> META-INF/services/org.apache.beam.runners.direct.repackaged.runners.core.construction.CoderTranslatorRegistrar
> META-INF/services/org.apache.beam.runners.direct.repackaged.runners.core.construction.TransformPayloadTranslatorRegistrar
> META-INF/services/org.apache.beam.sdk.options.PipelineOptionsRegistrar
> META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar
> {noformat}
> 2.5.0-SNAPSHOT:
> {noformat}
> circleci@5abc19b95c60:~/scio$ jar -tf 
> ~/.ivy2/cache/org.apache.beam/beam-runners-direct-java/jars/beam-runners-direct-java-2.5.0-SNAPSHOT.jar
>   | grep META-INF
> META-INF/
> META-INF/MANIFEST.MF
> META-INF/services/
> META-INF/maven/
> META-INF/maven/com.google.guava/
> META-INF/maven/com.google.guava/guava/
> META-INF/maven/com.google.guava/guava/pom.properties
> META-INF/maven/com.google.guava/guava/pom.xml
> META-INF/maven/com.google.protobuf/
> META-INF/maven/com.google.protobuf/protobuf-java-util/
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.xml
> META-INF/maven/com.google.protobuf/protobuf-java/
> META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java/pom.xml
> META-INF/services/org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.TransformPayloadTranslatorRegistrar
> META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar
> META-INF/services/org.apache.beam.sdk.options.PipelineOptionsRegistrar
> META-INF/services/org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.CoderTranslatorRegistrar
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #89

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[sidhom] Support impulse transforms in Flink

[sidhom] Add Impulse ValidatesRunner test

--
[...truncated 1.24 MB...]
at 
org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:47)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:627)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:626)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:828)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:626)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:169)
at 
org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:123)
at 
org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:83)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:346)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:328)
at 
org.apache.beam.runners.spark.translation.streaming.CreateStreamTest.testFirstElementLate(CreateStreamTest.java:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy3.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

Jenkins build is back to normal : beam_PostCommit_Python_ValidatesRunner_Dataflow #1346

2018-04-13 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-4066) Nexmark fails when running with PubSub as source/sink

2018-04-13 Thread Alexey Romanenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-4066:
---
Attachment: gistfile1.txt

> Nexmark fails when running with PubSub as source/sink
> -
>
> Key: BEAM-4066
> URL: https://issues.apache.org/jira/browse/BEAM-4066
> Project: Beam
>  Issue Type: Bug
>  Components: examples-nexmark
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
> Attachments: gistfile1.txt
>
>
> Running Nexmark with PubSub causes this exception:
> {noformat}
> Caused by: java.io.NotSerializableException: 
> org.apache.beam.sdk.nexmark.NexmarkLauncher
> {noformat}
> Full log is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
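
The full log is attached to the issue; generically, a NotSerializableException
naming the launcher class usually means a DoFn (or other serialized function)
defined inside it implicitly captured the enclosing instance. The following is
a hypothetical illustration of that pattern, not the actual Nexmark code:

{code:java}
import org.apache.beam.sdk.transforms.DoFn;

public class Launcher { // stands in for a non-serializable enclosing class

  // Problematic: a non-static inner class keeps a hidden reference to the
  // enclosing Launcher, so serializing the DoFn drags the Launcher along
  // and fails with NotSerializableException.
  class CapturingFn extends DoFn<String, String> {
    @ProcessElement
    public void process(ProcessContext c) {
      c.output(c.element());
    }
  }

  // Safe: a static nested class holds no reference to the enclosing instance.
  static class StandaloneFn extends DoFn<String, String> {
    @ProcessElement
    public void process(ProcessContext c) {
      c.output(c.element());
    }
  }
}
{code}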


[jira] [Work logged] (BEAM-2898) Flink supports chaining/fusion of single-SDK stages

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2898?focusedWorklogId=90889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90889
 ]

ASF GitHub Bot logged work on BEAM-2898:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:54
Start Date: 13/Apr/18 16:54
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on issue #4783: [BEAM-2898] Support 
Impulse transforms in Flink batch runner
URL: https://github.com/apache/beam/pull/4783#issuecomment-381196752
 
 
   It looks like the precommits that are actually being run are passing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90889)
Time Spent: 4h 40m  (was: 4.5h)

> Flink supports chaining/fusion of single-SDK stages
> ---
>
> Key: BEAM-2898
> URL: https://issues.apache.org/jira/browse/BEAM-2898
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The Fn API supports fused stages, which avoids unnecessarily round-tripping 
> the data over the Fn API between stages. The Flink runner should use that 
> capability for better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4028) Step / Operation naming should rely on a NameContext class

2018-04-13 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437590#comment-16437590
 ] 

Pablo Estrada commented on BEAM-4028:
-

Before closing this, this log statement should be removed 
[https://github.com/apache/beam/pull/5043#discussion-diff-181442300R123]

> Step / Operation naming should rely on a NameContext class
> --
>
> Key: BEAM-4028
> URL: https://issues.apache.org/jira/browse/BEAM-4028
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Steps can have different names depending on the runner (stage, step, user, 
> system name...). 
> Depending on the needs of different components (operations, logging, metrics, 
> statesampling) these step names are passed around without a specific order.
> Instead, SDK should rely on `NameContext` objects that carry all the naming 
> information for a single step.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
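
The idea behind the issue, sketched here as a hypothetical value object (the
actual change lives in the Python SDK; Java is used below only for
illustration): bundle every name a step can carry into one immutable context
and pass that around instead of loose strings.

{code:java}
/** Illustrative only: carries all naming information for a single step. */
final class NameContext {
  private final String stageName;  // name of the fused stage
  private final String stepName;   // system-assigned step name
  private final String userName;   // name given by the pipeline author

  NameContext(String stageName, String stepName, String userName) {
    this.stageName = stageName;
    this.stepName = stepName;
    this.userName = userName;
  }

  String stageName() { return stageName; }
  String stepName() { return stepName; }
  String userName() { return userName; }
}
{code}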


[beam-site] 01/01: This closes #419

2018-04-13 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit b1d3ae3184ec390e7b2917d26cf8f5347f69fbc5
Merge: 6cfd3ba 07dd232
Author: Mergebot 
AuthorDate: Fri Apr 13 10:20:22 2018 -0700

This closes #419

 content/contribute/eclipse/index.html | 75 +--
 src/contribute/eclipse.md | 66 +-
 2 files changed, 84 insertions(+), 57 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] branch mergebot updated (c2e16d4 -> b1d3ae3)

2018-04-13 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


 discard c2e16d4  This closes #419
 new b1d3ae3  This closes #419

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (c2e16d4)
\
 N -- N -- N   refs/heads/mergebot (b1d3ae3)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


Build failed in Jenkins: beam_PerformanceTests_Python #1144

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 2.78 KB...]
Successfully installed setuptools pip
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins1282056347515699854.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6350825671478846649.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Collecting numpy==1.13.3 (from -r PerfKitBenchmarker/requirements.txt (line 22))
:339:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  SNIMissingWarning
:137:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe>=0.23 in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: colorama; extra == "windows" in 
/usr/lib/python2.7/dist-packages (from colorlog[windows]==2.6.0->-r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: xmltodict in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: requests-ntlm>=0.3.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: requests>=2.9.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from pywinrm->-r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: ntlm-auth>=1.0.2 in 
/home/jenkins/.local/lib/python2.7/site-packages (from 
requests-ntlm>=0.3.0->pywinrm->-r PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: cryptography>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from 
requests-ntlm>=0.3.0->pywinrm->-r 

Build failed in Jenkins: beam_PerformanceTests_AvroIOIT_HDFS #45

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 404.31 KB...]
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:248)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:235)
at 
org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy60.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy61.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:248)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:235)
at 
org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn$DoFnInvoker.invokeProcessElement(Unknown
 Source)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
at 
org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:138)
at 
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
at 
com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
at 
com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
at 
com.google.cloud.dataflow.worker.AssignWindowsParDoFnFactory$AssignWindowsParDoFn.processElement(AssignWindowsParDoFnFactory.java:118)
at 

Jenkins build is back to normal : beam_PerformanceTests_XmlIOIT #140

2018-04-13 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_TextIOIT_HDFS #51

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 114.13 KB...]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2447)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy60.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy61.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:248)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:235)
at 
org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
 Cannot create 
file/.temp-beam-2018-04-13_18-11-46-0/797164bb-e446-48d5-bfbb-af832cdb3c06. 
Name node is in safe mode.
The reported blocks 31 has reached the threshold 0.9990 of total blocks 31. The 
number of live datanodes 1 has reached the minimum number 0. In safe mode 
extension. Safe mode will be turned off automatically in 18 seconds.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1327)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2447)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
at 

[jira] [Commented] (BEAM-2898) Flink supports chaining/fusion of single-SDK stages

2018-04-13 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437763#comment-16437763
 ] 

Ben Sidhom commented on BEAM-2898:
--

Pipeline fusion has been implemented in runner-core. With support for impulse, 
the Flink runner will support pipeline fusion once translation by proto has 
been implemented.

> Flink supports chaining/fusion of single-SDK stages
> ---
>
> Key: BEAM-2898
> URL: https://issues.apache.org/jira/browse/BEAM-2898
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Henning Rohde
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The Fn API supports fused stages, which avoids unnecessarily round-tripping 
> the data over the Fn API between stages. The Flink runner should use that 
> capability for better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2588) Portable Flink Runner Job API

2018-04-13 Thread Ben Sidhom (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437781#comment-16437781
 ] 

Ben Sidhom commented on BEAM-2588:
--

I've renamed this bug to reflect the fact that the portable Flink runner will 
effectively be its own runner entrypoint. The Job API itself does not actually 
deal with Java "Runners" at all.

> Portable Flink Runner Job API
> -
>
> Key: BEAM-2588
> URL: https://issues.apache.org/jira/browse/BEAM-2588
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Axel Magnuson
>Priority: Major
>  Labels: portability
>
> Whatever the result of https://s.apache.org/beam-job-api we will need a way 
> for the JVM-based FlinkRunner to receive and run pipelines authored in Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2588) Portable Flink Runner Job API

2018-04-13 Thread Ben Sidhom (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Sidhom updated BEAM-2588:
-
Description: The portable Flink runner needs to be wired into a job server 
so that it can accept jobs through the job api 
(https://s.apache.org/beam-job-api).  (was: Whatever the result of 
https://s.apache.org/beam-job-api we will need a way for the JVM-based 
FlinkRunner to receive and run pipelines authored in Python.)

> Portable Flink Runner Job API
> -
>
> Key: BEAM-2588
> URL: https://issues.apache.org/jira/browse/BEAM-2588
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Axel Magnuson
>Priority: Major
>  Labels: portability
>
> The portable Flink runner needs to be wired into a job server so that it can 
> accept jobs through the job api (https://s.apache.org/beam-job-api).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90878
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:40
Start Date: 13/Apr/18 16:40
Worklog Time Spent: 10m 
  Work Description: adejanovski commented on a change in pull request 
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async 
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181445060
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
 ##
 @@ -350,19 +353,21 @@ public TokenRange(
* Writer storing an entity into Apache Cassandra database.
*/
   protected class WriterImpl implements Writer {
-
+private static final int CONCURRENT_ASYNC_QUERIES = 100;
 
 Review comment:
   Not really, because it doesn't relate to the number of nodes or vnodes; it's 
more of a best practice to prevent nodes from being overwhelmed. The capacity 
of the nodes to handle a lot of concurrent queries will depend partly on the 
number of threads in the read thread pool of Cassandra, which is a 
configuration element we cannot access from the client.
   I'm not sure users should be dealing with this as concurrency can already be 
handled by limiting the number of splits/workers.
   We can make this configurable but it could be confusing to users and 
wouldn't bring improvements in throughput IMHO.
   
   I'd be in favor of leaving this as a constant, but if you want to make it 
configurable I'll do it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90878)
Time Spent: 1h 10m  (was: 1h)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized, which is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
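
A minimal sketch of the batching scheme described in the issue, assuming the
DataStax object mapper's saveAsync() (which returns a ListenableFuture). This
is illustrative only, not the actual CassandraIO writer:

{code:java}
import com.datastax.driver.mapping.Mapper;
import com.google.common.util.concurrent.ListenableFuture;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

class AsyncWriter<T> {
  private static final int CONCURRENT_ASYNC_QUERIES = 100;
  private final Mapper<T> mapper;
  private List<ListenableFuture<Void>> futures = new ArrayList<>();

  AsyncWriter(Mapper<T> mapper) {
    this.mapper = mapper;
  }

  void write(T entity) throws ExecutionException, InterruptedException {
    futures.add(mapper.saveAsync(entity));
    if (futures.size() == CONCURRENT_ASYNC_QUERIES) {
      flush(); // cap the number of in-flight queries at 100
    }
  }

  void close() throws ExecutionException, InterruptedException {
    flush(); // wait for the last in-flight queries before finishing
  }

  private void flush() throws ExecutionException, InterruptedException {
    // get() blocks until each write completes and rethrows any failure,
    // so a failed write fails the bundle instead of being silently dropped.
    for (ListenableFuture<Void> future : futures) {
      future.get();
    }
    futures = new ArrayList<>();
  }
}
{code}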


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90887
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:51
Start Date: 13/Apr/18 16:51
Worklog Time Spent: 10m 
  Work Description: adejanovski commented on issue #5112: [BEAM-4049] 
Improve CassandraIO write throughput by performing async queries
URL: https://github.com/apache/beam/pull/5112#issuecomment-381196012
 
 
   @aromanenko-dev, I just pushed a new version that uses loops instead of 
`Futures.successfulAsList()`.
   I've tested it by generating timeouts in the middle of a pipeline's 
execution and the exceptions were correctly triggered and reported in the logs.
   The pipeline exited at the first exception, which is the desired behavior.
   Retry policies can be implemented in the driver to handle transient 
failures, but it should be part of another PR anyway.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90887)
Time Spent: 1.5h  (was: 1h 20m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized, which is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90886
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:51
Start Date: 13/Apr/18 16:51
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async 
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181448083
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
 ##
 @@ -373,11 +378,22 @@ public WriterImpl(CassandraIO.Write spec) {
 @Override
 public void write(T entity) {
   Mapper mapper = (Mapper) mappingManager.mapper(entity.getClass());
-  mapper.save(entity);
+  this.writeFutures.add(mapper.saveAsync(entity));
+  if (this.writeFutures.size() % CONCURRENT_ASYNC_QUERIES == 0) {
+// We reached the max number of allowed in flight queries
+LOG.debug("Waiting for a batch of 100 Cassandra writes to be 
executed...");
+Futures.successfulAsList(this.writeFutures);
 
 Review comment:
    I think we have to fail the whole bundle in case of failures. I'd recommend 
taking a look at the methods `BigtableWriterFn.checkForFailures()` or 
`KinesisWriterFn.checkForFailures()`. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90886)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized, which is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
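
The checkForFailures() pattern the reviewer points to (as used by
BigtableWriterFn and KinesisWriterFn) collects asynchronous failures and
rethrows them when the writer flushes, so the whole bundle fails. A sketch of
that pattern, with illustrative names rather than the actual Beam code:

{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentLinkedQueue;

class FailureCollector {
  private final ConcurrentLinkedQueue<Throwable> failures =
      new ConcurrentLinkedQueue<>();

  /** Called from async callbacks when a write fails. */
  void report(Throwable t) {
    failures.add(t);
  }

  /** Called at flush/finish time; fails the bundle if anything went wrong. */
  void checkForFailures() throws IOException {
    if (failures.isEmpty()) {
      return;
    }
    StringBuilder message = new StringBuilder("Some writes failed:");
    int reported = 0;
    for (Throwable t : failures) {
      if (reported++ == 10) {
        message.append("\n...");
        break;
      }
      message.append('\n').append(t);
    }
    throw new IOException(message.toString());
  }
}
{code}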


[jira] [Work logged] (BEAM-3981) Futurize and fix python 2 compatibility for coders package

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3981?focusedWorklogId=90914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90914
 ]

ASF GitHub Bot logged work on BEAM-3981:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:00
Start Date: 13/Apr/18 18:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #5053: [BEAM-3981] Futurize 
coders subpackage
URL: https://github.com/apache/beam/pull/5053#issuecomment-381215630
 
 
   I see the following error in the output
   
   ```
   I  /tmp/pip-cVZPmP-build/setup.py:79: UserWarning: You are using version 
0.27.3 of cython. However, version 0.28.1 is recommended. 
   I_CYTHON_VERSION, REQUIRED_CYTHON_VERSION 
   I   
   I  Error compiling Cython file: 
   I   
   I  ... 
   I  except NameError:   # Python 3 
   Ilong = int 
   Iunicode = str 
   I   
   I   
   I  class CoderImpl(object): 
   I  ^ 
   I   
   I   
   I  apache_beam/coders/coder_impl.py:68:0: 'object' is not a type name 
   I  Compiling apache_beam/coders/stream.pyx because it changed. 
   I  Compiling apache_beam/runners/worker/statesampler_fast.pyx because it 
changed. 
   I  Compiling apache_beam/coders/coder_impl.py because it changed. 
   I  Compiling apache_beam/metrics/execution.py because it changed. 
   I  Compiling apache_beam/runners/common.py because it changed. 
   I  Compiling apache_beam/runners/worker/logger.py because it changed. 
   I  Compiling apache_beam/runners/worker/opcounters.py because it 
changed. 
   I  Compiling apache_beam/runners/worker/operations.py because it 
changed. 
   I  Compiling apache_beam/transforms/cy_combiners.py because it changed. 
   I  Compiling apache_beam/utils/counters.py because it changed. 
   I  Compiling apache_beam/utils/windowed_value.py because it changed. 
   I  [ 1/11] Cythonizing apache_beam/coders/coder_impl.py 
   I  Traceback (most recent call last): 
   IFile "", line 1, in  
   IFile "/tmp/pip-cVZPmP-build/setup.py", line 204, in  
   I  'apache_beam/utils/windowed_value.py', 
   IFile 
"/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 
1039, in cythonize 
   I  cythonize_one(*args) 
   IFile 
"/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 
1161, in cythonize_one 
   I  raise CompileError(None, pyx_file) 
   I  Cython.Compiler.Errors.CompileError: apache_beam/coders/coder_impl.py 
   I   
   I   
   ```
   
    I assume this is because the new code requires the new Cython version; 
however, the Dataflow workers do not have it.
    
    @tvalentyn Could you upgrade the workers at head to use the 0.28.1 Cython 
version?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90914)
Time Spent: 11h  (was: 10h 50m)

> Futurize and fix python 2 compatibility for coders package
> --
>
> Key: BEAM-3981
> URL: https://issues.apache.org/jira/browse/BEAM-3981
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Run automatic conversion with futurize tool on coders subpackage and fix 
> python 2 compatibility. This prepares the subpackage for python 3 support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=90922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90922
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:20
Start Date: 13/Apr/18 18:20
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4793: [BEAM-3776] Fix issue 
with merging late windows where a watermark hold could be added behind the 
input watermark.
URL: https://github.com/apache/beam/pull/4793#issuecomment-381220818
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90922)
Time Spent: 5h 10m  (was: 5h)

> StateMerging.mergeWatermarks sets a late watermark hold for late merging 
> windows that depend only on the window
> ---
>
> Key: BEAM-3776
> URL: https://issues.apache.org/jira/browse/BEAM-3776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Critical
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> WatermarkHold.addElementHold and WatermarkHold.addGarbageCollectionHold take 
> care not to add holds that would be before the input watermark.
> However, WatermarkHold.onMerge calls StateMerging.mergeWatermarks which, if 
> the window depends only on the window, sets a hold for the end of the window 
> regardless of the input watermark.
> Thus if you have a WindowingStrategy such as:
> WindowingStrategy.of(Sessions.withGapDuration(gapDuration))
>  .withMode(AccumulationMode.DISCARDING_FIRED_PANES)
>  .withTrigger(
>  Repeatedly.forever(
>  AfterWatermark.pastEndOfWindow()
>  .withLateFirings(AfterPane.elementCountAtLeast(10
>  .withAllowedLateness(allowedLateness))
> and you merge windows that are late, you might end up holding the watermark 
> until the allowedLateness has passed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
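
The invariant the fix needs to restore, stated in one line: a hold installed
while merging windows must never land behind the current input watermark. A
purely conceptual sketch under that reading (hypothetical helper, not the
actual Beam internals):

{code:java}
import javax.annotation.Nullable;
import org.joda.time.Instant;

class WatermarkHolds {
  /**
   * Refuse an end-of-window hold that would sit behind the input watermark;
   * otherwise a late merge can hold the output watermark back until allowed
   * lateness expires, which is the failure mode described above.
   */
  @Nullable
  static Instant endOfWindowHold(Instant endOfWindow, Instant inputWatermark) {
    return endOfWindow.isBefore(inputWatermark) ? null : endOfWindow;
  }
}
{code}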


Build failed in Jenkins: beam_PerformanceTests_Spark #1586

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 88.52 KB...]
'apache-beam-testing:bqjob_r50269718b66c2f16_0162c0407f57_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-04-13 18:24:37,037 b58e2e45 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-13 18:24:57,246 b58e2e45 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-13 18:24:59,609 b58e2e45 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: Upload complete.
Waiting on bqjob_r28b01afee4f06294_0162c040d6d3_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r28b01afee4f06294_0162c040d6d3_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r28b01afee4f06294_0162c040d6d3_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-04-13 18:24:59,610 b58e2e45 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-13 18:25:16,362 b58e2e45 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-13 18:25:18,695 b58e2e45 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: Upload complete.
Waiting on bqjob_r4a79861ec52cff9a_0162c041219b_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r4a79861ec52cff9a_0162c041219b_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r4a79861ec52cff9a_0162c041219b_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-04-13 18:25:18,696 b58e2e45 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-13 18:25:35,888 b58e2e45 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-13 18:25:38,398 b58e2e45 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: Upload complete.
Waiting on bqjob_r1ae8607e47df9cba_0162c0416e8d_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r1ae8607e47df9cba_0162c0416e8d_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error 

[jira] [Closed] (BEAM-3547) [SQL] Nested Query Generates Incompatible Trigger

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3547.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Nested Query Generates Incompatible Trigger
> -
>
> Key: BEAM-3547
> URL: https://issues.apache.org/jira/browse/BEAM-3547
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> From 
> [https://stackoverflow.com/questions/48335383/nested-queries-in-beam-sql] :
>  
> SQL:
> {code:java}
> PCollection Query_Output = Query.apply(
> BeamSql.queryMulti("Select Orders.OrderID From Orders Where 
> Orders.CustomerID IN (Select Customers.CustomerID From Customers WHERE 
> Customers.CustomerID = 2)"));{code}
>  
> Error:
> {code:java}
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> validateAndConvert
> INFO: SQL:
> SELECT `Orders`.`OrderID`
> FROM `Orders` AS `Orders`
> WHERE `Orders`.`CustomerID` IN (SELECT `Customers`.`CustomerID`
> FROM `Customers` AS `Customers`
> WHERE `Customers`.`CustomerID` = 2)
> Jan 19, 2018 11:56:36 AM 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(OrderID=[$0])
>   LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> LogicalTableScan(table=[[Orders]])
> LogicalAggregate(group=[{0}])
>   LogicalProject(CustomerID=[$0])
> LogicalFilter(condition=[=($0, 2)])
>   LogicalTableScan(table=[[Customers]])
> Exception in thread "main" java.lang.IllegalStateException: 
> java.lang.IllegalStateException: Inputs to Flatten had incompatible triggers: 
> DefaultTrigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:165)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160)
> at com.bitwise.cloud.ExampleOfJoins.main(ExampleOfJoins.java:91)
> Caused by: java.lang.IllegalStateException: Inputs to Flatten had 
> incompatible triggers: DefaultTrigger, 
> Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:123)
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:101)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionList.apply(PCollectionList.java:182)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:124)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:74)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple.apply(KeyedPCollectionTuple.java:107)
> at org.apache.beam.sdk.extensions.joinlibrary.Join.innerJoin(Join.java:59)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.standardJoin(BeamJoinRel.java:217)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.buildBeamPipeline(BeamJoinRel.java:161)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:68)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:163)
> ... 5 more{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
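
A possible user-side workaround, hedged: the Flatten inside the generated join
rejects inputs whose triggers differ (DefaultTrigger vs.
Repeatedly.forever(AfterWatermark.pastEndOfWindow())), so explicitly
re-windowing the inputs to the same trigger before applying the query avoids
the mismatch. A sketch under those assumptions, not a confirmed fix:

{code:java}
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class TriggerAlignment {
  /** Give one join input the same trigger as the other before querying. */
  static <T> PCollection<T> alignTrigger(PCollection<T> input) {
    return input.apply(
        "AlignTrigger",
        Window.<T>configure()
            .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
            .discardingFiredPanes()
            .withAllowedLateness(Duration.ZERO));
  }
}
{code}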


[jira] [Created] (BEAM-4066) Nexmark fails when running with PubSub as source/sink

2018-04-13 Thread Alexey Romanenko (JIRA)
Alexey Romanenko created BEAM-4066:
--

 Summary: Nexmark fails when running with PubSub as source/sink
 Key: BEAM-4066
 URL: https://issues.apache.org/jira/browse/BEAM-4066
 Project: Beam
  Issue Type: Bug
  Components: examples-nexmark
Reporter: Alexey Romanenko
Assignee: Alexey Romanenko


Running Nexmark with PubSub causes this exception:

{noformat}
Caused by: java.io.NotSerializableException: 
org.apache.beam.sdk.nexmark.NexmarkLauncher
{noformat}

Full log is attached.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90873
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:34
Start Date: 13/Apr/18 16:34
Worklog Time Spent: 10m 
  Work Description: adejanovski commented on a change in pull request 
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async 
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181443693
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
 ##
 @@ -373,11 +378,22 @@ public WriterImpl(CassandraIO.Write spec) {
 @Override
 public void write(T entity) {
   Mapper mapper = (Mapper) mappingManager.mapper(entity.getClass());
-  mapper.save(entity);
+  this.writeFutures.add(mapper.saveAsync(entity));
+  if (this.writeFutures.size() % CONCURRENT_ASYNC_QUERIES == 0) {
+// We reached the max number of allowed in flight queries
+LOG.debug("Waiting for a batch of 100 Cassandra writes to be 
executed...");
+Futures.successfulAsList(this.writeFutures);
 
 Review comment:
   Yes, we definitely should, and successfulAsList() won't do this properly.
   
   I'll loop over the futures instead and call `.get()` so that we can throw 
the exceptions that would be triggered.
   What's the way of dealing with exceptions in Beam IOs? Should I log the 
error and re-throw the exception, or just not catch it and let it propagate up 
the stack?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90873)
Time Spent: 50m  (was: 40m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized, which is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90874
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:34
Start Date: 13/Apr/18 16:34
Worklog Time Spent: 10m 
  Work Description: adejanovski commented on a change in pull request 
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async 
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181443784
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
 ##
 @@ -373,11 +378,22 @@ public WriterImpl(CassandraIO.Write spec) {
 @Override
 public void write(T entity) {
   Mapper mapper = (Mapper) mappingManager.mapper(entity.getClass());
-  mapper.save(entity);
+  this.writeFutures.add(mapper.saveAsync(entity));
+  if (this.writeFutures.size() % CONCURRENT_ASYNC_QUERIES == 0) {
+// We reached the max number of allowed in flight queries
+LOG.debug("Waiting for a batch of 100 Cassandra writes to be 
executed...");
+Futures.successfulAsList(this.writeFutures);
+this.writeFutures = Lists.newArrayList();
+  }
 }
 
 @Override
 public void close() {
+  if (this.writeFutures.size() > 0) {
+// Waiting for the last in flight queries to end
+Futures.successfulAsList(this.writeFutures);
 
 Review comment:
   Same answer: a loop over the futures will be more appropriate. Nice catch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90874)
Time Spent: 1h  (was: 50m)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized, which is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3547) [SQL] Nested Query Generates Incompatible Trigger

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-3547:
--
Fix Version/s: (was: Not applicable)
   2.4.0

> [SQL] Nested Query Generates Incompatible Trigger
> -
>
> Key: BEAM-3547
> URL: https://issues.apache.org/jira/browse/BEAM-3547
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: 2.4.0
>
>
> From 
> [https://stackoverflow.com/questions/48335383/nested-queries-in-beam-sql] :
>  
> SQL:
> {code:java}
> PCollection Query_Output = Query.apply(
> BeamSql.queryMulti("Select Orders.OrderID From Orders Where 
> Orders.CustomerID IN (Select Customers.CustomerID From Customers WHERE 
> Customers.CustomerID = 2)"));{code}
>  
> Error:
> {code:java}
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> validateAndConvert
> INFO: SQL:
> SELECT `Orders`.`OrderID`
> FROM `Orders` AS `Orders`
> WHERE `Orders`.`CustomerID` IN (SELECT `Customers`.`CustomerID`
> FROM `Customers` AS `Customers`
> WHERE `Customers`.`CustomerID` = 2)
> Jan 19, 2018 11:56:36 AM 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(OrderID=[$0])
>   LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> LogicalTableScan(table=[[Orders]])
> LogicalAggregate(group=[{0}])
>   LogicalProject(CustomerID=[$0])
> LogicalFilter(condition=[=($0, 2)])
>   LogicalTableScan(table=[[Customers]])
> Exception in thread "main" java.lang.IllegalStateException: 
> java.lang.IllegalStateException: Inputs to Flatten had incompatible triggers: 
> DefaultTrigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:165)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160)
> at com.bitwise.cloud.ExampleOfJoins.main(ExampleOfJoins.java:91)
> Caused by: java.lang.IllegalStateException: Inputs to Flatten had 
> incompatible triggers: DefaultTrigger, 
> Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:123)
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:101)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionList.apply(PCollectionList.java:182)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:124)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:74)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple.apply(KeyedPCollectionTuple.java:107)
> at org.apache.beam.sdk.extensions.joinlibrary.Join.innerJoin(Join.java:59)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.standardJoin(BeamJoinRel.java:217)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.buildBeamPipeline(BeamJoinRel.java:161)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:68)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:163)
> ... 5 more{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2990) support data type MAP

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2990?focusedWorklogId=90894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90894
 ]

ASF GitHub Bot logged work on BEAM-2990:


Author: ASF GitHub Bot
Created on: 13/Apr/18 17:01
Start Date: 13/Apr/18 17:01
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on a change in pull request #5079: 
[BEAM-2990] support MAP in SQL schema
URL: https://github.com/apache/beam/pull/5079#discussion_r181450896
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##
 @@ -208,8 +209,12 @@ public int hashCode() {
 public static final Set STRING_TYPES = ImmutableSet.of(STRING);
 public static final Set DATE_TYPES = ImmutableSet.of(DATETIME);
 public static final Set CONTAINER_TYPES = ImmutableSet.of(ARRAY);
+public static final Set MAP_TYPES = ImmutableSet.of(MAP);
 public static final Set COMPOSITE_TYPES = ImmutableSet.of(ROW);
 
+public boolean isPrimitiveType() {
+  return isNumericType() || isStringType() || isDateType();
 
 Review comment:
   right, will change


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90894)
Time Spent: 4.5h  (was: 4h 20m)

> support data type MAP
> -
>
> Key: BEAM-2990
> URL: https://issues.apache.org/jira/browse/BEAM-2990
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> support Non-scalar types:
> MAP   Collection of keys mapped to values
> ARRAY Ordered, contiguous collection that may contain duplicates



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2990) support data type MAP

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2990?focusedWorklogId=90897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90897
 ]

ASF GitHub Bot logged work on BEAM-2990:


Author: ASF GitHub Bot
Created on: 13/Apr/18 17:10
Start Date: 13/Apr/18 17:10
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on a change in pull request #5079: 
[BEAM-2990] support MAP in SQL schema
URL: https://github.com/apache/beam/pull/5079#discussion_r181452879
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##
 @@ -208,6 +209,7 @@ public int hashCode() {
 public static final Set STRING_TYPES = ImmutableSet.of(STRING);
 public static final Set DATE_TYPES = ImmutableSet.of(DATETIME);
 public static final Set CONTAINER_TYPES = ImmutableSet.of(ARRAY);
+public static final Set MAP_TYPES = ImmutableSet.of(MAP);
 
 Review comment:
   In Java, a container extends `Collection` and a map extends `Map`; they're very 
different IMO. If we merge them together I don't see any benefit, as this is a 
backend function and developers use either 
`TypeName.ARRAY.type().withComponentType()` or 
`TypeName.MAP.type().withMapType`. 
   
   To make it clear, I would prefer the term **Collection** instead of 
*Component* or *Container*. Any comments? 
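   
   For context, a rough sketch of the two construction paths named above (the 
argument lists are assumptions based on this thread, not the final API):

{code:java}
// Hypothetical usage of the builders mentioned above; exact signatures may
// differ in the merged code.
Schema.FieldType arrayOfStrings =
    Schema.TypeName.ARRAY.type().withComponentType(Schema.TypeName.STRING.type());
Schema.FieldType mapOfStringToLong =
    Schema.TypeName.MAP.type().withMapType(
        Schema.TypeName.STRING.type(), Schema.TypeName.INT64.type());
{code}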


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90897)
Time Spent: 4h 40m  (was: 4.5h)

> support data type MAP
> -
>
> Key: BEAM-2990
> URL: https://issues.apache.org/jira/browse/BEAM-2990
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> support Non-scalar types:
> MAP   Collection of keys mapped to values
> ARRAY Ordered, contiguous collection that may contain duplicates



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4067) Add portable Flink test runner

2018-04-13 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4067:


 Summary: Add portable Flink test runner
 Key: BEAM-4067
 URL: https://issues.apache.org/jira/browse/BEAM-4067
 Project: Beam
  Issue Type: New Feature
  Components: runner-flink
Reporter: Ben Sidhom
Assignee: Aljoscha Krettek


The portable Flink runner cannot be tested through the normal mechanisms used 
for ValidatesRunner tests because it requires a job server to be constructed 
out of band and for pipelines to be run through it. We should implement a shim 
that acts as a standard Java SDK Runner that spins up the necessary server 
(possibly in-process) and runs against it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #64

2018-04-13 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #137

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 43.52 KB...]
[INFO] Excluding com.google.cloud.bigdataoss:gcsio:jar:1.4.5 from the shaded 
jar.
[INFO] Excluding 
com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev6-1.22.0 
from the shaded jar.
[INFO] Excluding 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.5.0-SNAPSHOT from 
the shaded jar.
[INFO] Excluding 
org.apache.beam:beam-sdks-java-extensions-protobuf:jar:2.5.0-SNAPSHOT from the 
shaded jar.
[INFO] Excluding io.grpc:grpc-core:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.errorprone:error_prone_annotations:jar:2.0.15 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-context:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.instrumentation:instrumentation-api:jar:0.3.0 from 
the shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-bigquery:jar:v2-rev374-1.22.0 from the 
shaded jar.
[INFO] Excluding com.google.api:gax-grpc:jar:0.20.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.api:api-common:jar:1.0.0-rc2 from the shaded jar.
[INFO] Excluding com.google.api:gax:jar:1.3.1 from the shaded jar.
[INFO] Excluding org.threeten:threetenbp:jar:1.3.3 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core-grpc:jar:1.2.0 from the 
shaded jar.
[INFO] Excluding com.google.apis:google-api-services-pubsub:jar:v1-rev10-1.22.0 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 from the 
shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-protobuf:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-auth:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-netty:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http2:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from 
the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the 
shaded jar.
[INFO] Excluding io.opencensus:opencensus-api:jar:0.7.0 from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-all:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-okhttp:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.squareup.okhttp:okhttp:jar:2.5.0 from the shaded jar.
[INFO] Excluding com.squareup.okio:okio:jar:1.6.0 from the shaded jar.
[INFO] Excluding 

Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #46

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 39.02 KB...]
[INFO] Excluding com.google.api:gax:jar:1.3.1 from the shaded jar.
[INFO] Excluding org.threeten:threetenbp:jar:1.3.3 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core-grpc:jar:1.2.0 from the 
shaded jar.
[INFO] Excluding com.google.apis:google-api-services-pubsub:jar:v1-rev10-1.22.0 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 from the 
shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-protobuf:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-auth:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-netty:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http2:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-buffer:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-common:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-transport:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-resolver:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding commons-logging:commons-logging:jar:1.2 from the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from 
the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the 
shaded jar.
[INFO] Excluding io.opencensus:opencensus-api:jar:0.7.0 from the shaded jar.
[INFO] Excluding io.dropwizard.metrics:metrics-core:jar:3.1.2 from the shaded 
jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-all:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-okhttp:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.squareup.okhttp:okhttp:jar:2.5.0 from the shaded jar.
[INFO] Excluding com.squareup.okio:okio:jar:1.6.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-lite:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-nano:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 
from the shaded jar.
[INFO] Excluding io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 
from the shaded jar.
[INFO] Excluding 

Build failed in Jenkins: beam_PerformanceTests_XmlIOIT_HDFS #44

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[ehudm] Add lint checks for modules under sdks/python/.

[kedin] Add Row Json Deserializer

[kedin] Add RowJsonValueExtractors

[aaltay] [BEAM-4028] Adding NameContext to Python SDK. (#5043)

--
[...truncated 181.54 KB...]
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy61.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:68)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:248)
at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:235)
at 
org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:923)
at 
org.apache.beam.sdk.io.WriteFiles$WriteUnshardedTempFilesWithSpillingFn.processElement(WriteFiles.java:503)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy60.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy61.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:109)
at 

[jira] [Work logged] (BEAM-2898) Flink supports chaining/fusion of single-SDK stages

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2898?focusedWorklogId=90921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90921
 ]

ASF GitHub Bot logged work on BEAM-2898:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:15
Start Date: 13/Apr/18 18:15
Worklog Time Spent: 10m 
  Work Description: tgroh closed pull request #4783: [BEAM-2898] Support 
Impulse transforms in Flink batch runner
URL: https://github.com/apache/beam/pull/4783
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/apex/pom.xml b/runners/apex/pom.xml
index 5faf95f8853..04147ba9bf5 100644
--- a/runners/apex/pom.xml
+++ b/runners/apex/pom.xml
@@ -231,6 +231,7 @@
 org.apache.beam.sdk.testing.UsesSplittableParDo,
 org.apache.beam.sdk.testing.UsesAttemptedMetrics,
 org.apache.beam.sdk.testing.UsesCommittedMetrics,
+org.apache.beam.sdk.testing.UsesImpulse,
 org.apache.beam.sdk.testing.UsesTestStream
   
   none
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java
index 40e08360087..128b826ecb0 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java
@@ -64,6 +64,8 @@ private static PTransformMatcher boundedMatcher() {
 ReadTranslation.sourceIsBounded(transform) == 
PCollection.IsBounded.BOUNDED);
   }
 
+  // TODO: https://issues.apache.org/jira/browse/BEAM-3859 Support unbounded 
reads via impulse.
+
   private static class BoundedReadViaImpulse extends PTransform {
 private final BoundedSource source;
 
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuser.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuser.java
index 35e079151bd..abbba2a91a9 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuser.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuser.java
@@ -70,13 +70,13 @@ private GreedyPipelineFuser(Pipeline p) {
   }
 
   /**
-   * Fuses a {@link Pipeline} into a collection of {@link ExecutableStage}s.
+   * Fuses a {@link Pipeline} into a collection of {@link ExecutableStage 
ExecutableStages}.
*
* This fuser expects each ExecutableStage to have exactly one input. 
This means that pipelines
* must be rooted at Impulse, or other runner-executed primitive transforms, 
instead of primitive
* Read nodes. The utilities in
-   * {@link org.apache.beam.runners.core.construction.JavaReadViaImpulse} can 
be used to translate
-   * non-compliant pipelines.
+   * {@link org.apache.beam.runners.core.construction.JavaReadViaImpulse} can 
be used to convert
+   * bounded pipelines using the Read primitive.
*/
   public static FusedPipeline fuse(Pipeline p) {
 GreedyPipelineFuser fuser = new GreedyPipelineFuser(p);
diff --git a/runners/flink/build.gradle b/runners/flink/build.gradle
index d6673ae246d..b633f7c060a 100644
--- a/runners/flink/build.gradle
+++ b/runners/flink/build.gradle
@@ -104,6 +104,7 @@ def createValidatesRunnerTask(Map m) {
 excludeCategories 
'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
 excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
 excludeCategories 'org.apache.beam.sdk.testing.UsesCommittedMetrics'
+excludeCategories 'org.apache.beam.sdk.testing.UsesImpulse'
 excludeCategories 'org.apache.beam.sdk.testing.UsesTestStream'
   }
 } else {
@@ -111,8 +112,8 @@ def createValidatesRunnerTask(Map m) {
 includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
 excludeCategories 
'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
 excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
-excludeCategories 'org.apache.beam.sdk.testing.UsesSplittableParDo'
 excludeCategories 'org.apache.beam.sdk.testing.UsesCommittedMetrics'
+excludeCategories 'org.apache.beam.sdk.testing.UsesSplittableParDo'
 excludeCategories 'org.apache.beam.sdk.testing.UsesTestStream'
   }
 }
diff --git 

[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90926
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 18:22
Start Date: 13/Apr/18 18:22
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-381204818
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90926)
Time Spent: 90h 50m  (was: 90h 40m)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 90h 50m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4060) meta-info/poms missing in snapshots

2018-04-13 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-4060:
---
Priority: Blocker  (was: Major)

> meta-info/poms missing in snapshots
> ---
>
> Key: BEAM-4060
> URL: https://issues.apache.org/jira/browse/BEAM-4060
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Rafal Wojdyla
>Assignee: Luke Cwik
>Priority: Blocker
>
> Current snapshots are missing a bunch of meta-info files, including pom.xml and 
> pom.properties:
> 2.4.0-SNAPSHOT example:
> {noformat}
> jar -tf 
> ~/.ivy2/cache/org.apache.beam/beam-runners-direct-java/jars/beam-runners-direct-java-2.4.0-SNAPSHOT.jar
>  | grep META-INFMETA-INF/
> META-INF/DEPENDENCIES
> META-INF/LICENSE
> META-INF/MANIFEST.MF
> META-INF/NOTICE
> META-INF/maven/
> META-INF/maven/com.google.code.findbugs/
> META-INF/maven/com.google.code.findbugs/jsr305/
> META-INF/maven/com.google.code.findbugs/jsr305/pom.properties
> META-INF/maven/com.google.code.findbugs/jsr305/pom.xml
> META-INF/maven/com.google.guava/
> META-INF/maven/com.google.guava/guava/
> META-INF/maven/com.google.guava/guava/pom.properties
> META-INF/maven/com.google.guava/guava/pom.xml
> META-INF/maven/com.google.protobuf/
> META-INF/maven/com.google.protobuf/protobuf-java-util/
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.xml
> META-INF/maven/com.google.protobuf/protobuf-java/
> META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java/pom.xml
> META-INF/maven/org.apache.beam/
> META-INF/maven/org.apache.beam/beam-model-pipeline/
> META-INF/maven/org.apache.beam/beam-model-pipeline/pom.properties
> META-INF/maven/org.apache.beam/beam-model-pipeline/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-core-java/
> META-INF/maven/org.apache.beam/beam-runners-core-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-core-java/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-direct-java/
> META-INF/maven/org.apache.beam/beam-runners-direct-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-direct-java/pom.xml
> META-INF/services/
> META-INF/services/org.apache.beam.runners.direct.repackaged.runners.core.construction.CoderTranslatorRegistrar
> META-INF/services/org.apache.beam.runners.direct.repackaged.runners.core.construction.TransformPayloadTranslatorRegistrar
> META-INF/services/org.apache.beam.sdk.options.PipelineOptionsRegistrar
> META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar
> {noformat}
> 2.5.0-SNAPSHOT:
> {noformat}
> circleci@5abc19b95c60:~/scio$ jar -tf 
> ~/.ivy2/cache/org.apache.beam/beam-runners-direct-java/jars/beam-runners-direct-java-2.5.0-SNAPSHOT.jar
>   | grep META-INF
> META-INF/
> META-INF/MANIFEST.MF
> META-INF/services/
> META-INF/maven/
> META-INF/maven/com.google.guava/
> META-INF/maven/com.google.guava/guava/
> META-INF/maven/com.google.guava/guava/pom.properties
> META-INF/maven/com.google.guava/guava/pom.xml
> META-INF/maven/com.google.protobuf/
> META-INF/maven/com.google.protobuf/protobuf-java-util/
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.xml
> META-INF/maven/com.google.protobuf/protobuf-java/
> META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java/pom.xml
> META-INF/services/org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.TransformPayloadTranslatorRegistrar
> META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar
> META-INF/services/org.apache.beam.sdk.options.PipelineOptionsRegistrar
> META-INF/services/org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.CoderTranslatorRegistrar
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90891
 ]

ASF GitHub Bot logged work on BEAM-4049:


Author: ASF GitHub Bot
Created on: 13/Apr/18 16:57
Start Date: 13/Apr/18 16:57
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async 
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181449900
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
 ##
 @@ -350,19 +353,21 @@ public TokenRange(
* Writer storing an entity into Apache Cassandra database.
*/
   protected class WriterImpl implements Writer {
-
+private static final int CONCURRENT_ASYNC_QUERIES = 100;
 
 Review comment:
   Ok, let's keep it as is for now to avoid adding new tuning knobs (as it's 
not recommended in Beam - 
https://beam.apache.org/contribute/ptransform-style-guide/#what-parameters-to-expose)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90891)
Time Spent: 1h 40m  (was: 1.5h)

> Improve write throughput of CassandraIO
> ---
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Affects Versions: 2.4.0
>Reporter: Alexander Dejanovski
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Labels: performance
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The CassandraIO currently uses the mapper to perform writes in a synchronous 
> fashion. 
> This implies that writes are serialized, which is a very suboptimal way of 
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and should wait 
> for completion each time 100 queries are in flight, in order to avoid 
> overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3485) CassandraIO.read() splitting produces invalid queries

2018-04-13 Thread Alexander Dejanovski (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437588#comment-16437588
 ] 

Alexander Dejanovski commented on BEAM-3485:


# So, from experience I know that most clusters out there run with 16 
to 256 vnodes per node; multiplied by the number of nodes, that will generate a lot 
of splits. Still, it would be good to be able to enforce a minimum number of 
splits if needed, so I'd be in favor of adding it as an optional input. If the 
computed number of splits is lower (or if Beam fails to compute it) then we 
should fall back to the user input (see the sketch below).
Tell me if you agree and I'll add it.
 # It is for Murmur3, but it could be good to support the RandomPartitioner, 
which uses tokens between 0 and 2^127-1, which is outside the range of a Long. 
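
A rough sketch of both points, with hypothetical names (chooseNumSplits and its 
parameters are illustrative, not the IO's actual fields):

{code:java}
import java.math.BigInteger;

// 1) Fall back to the user-supplied minimum when the computed split count is
//    lower, or entirely when it could not be computed (computedSplits <= 0).
static int chooseNumSplits(int computedSplits, int userMinSplits) {
  return computedSplits > 0
      ? Math.max(computedSplits, userMinSplits)
      : userMinSplits;
}

// 2) RandomPartitioner tokens span [0, 2^127 - 1], which overflows long, so
//    split arithmetic for that partitioner would need BigInteger.
static final BigInteger MAX_RANDOM_PARTITIONER_TOKEN =
    BigInteger.valueOf(2).pow(127).subtract(BigInteger.ONE);
{code}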

> CassandraIO.read() splitting produces invalid queries
> -
>
> Key: BEAM-3485
> URL: https://issues.apache.org/jira/browse/BEAM-3485
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Reporter: Eugene Kirpichov
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See 
> [https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264?noredirect=1#comment83548442_48131264]
> As the question author points out, the error is likely that token($pk) should 
> be token(pk). This was likely masked by BEAM-3424 and BEAM-3425, and the 
> splitting code path effectively was never invoked, and was broken from the 
> first PR - so there are likely other bugs.
> When testing this issue, we must ensure good code coverage in an IT against a 
> real Cassandra instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4068) Consistent option specification between SDKs and runners by URN

2018-04-13 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4068:


 Summary: Consistent option specification between SDKs and runners 
by URN
 Key: BEAM-4068
 URL: https://issues.apache.org/jira/browse/BEAM-4068
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Ben Sidhom
Assignee: Kenneth Knowles


Pipeline options are materialized differently by different SDKs. However, in 
some cases, runners require Java-specific options that are not available 
elsewhere. We should decide on well-known URNs and use them across SDKs where 
applicable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4069) Empty pipeline options can be gracefully serialized/deserialized

2018-04-13 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4069:


 Summary: Empty pipeline options can be gracefully 
serialized/deserialized
 Key: BEAM-4069
 URL: https://issues.apache.org/jira/browse/BEAM-4069
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Ben Sidhom
Assignee: Ben Sidhom


PipelineOptionsTranslation.fromProto currently crashes with a 
NullPointerException when passed an empty options Struct. This is due to 
ProxyInvocationHandler.Deserializer expecting a non-empty enclosing Struct.

Empty pipeline options may be passed by SDKs interacting with a job server, so 
this case needs to be handled. Note that testing a round-trip of an 
effectively-empty Java PipelineOptions object is not sufficient to catch this 
because "empty" Java options still contain default fields not defined in other 
SDKs.
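
A minimal reproduction sketch, assuming the translation API named above:

{code:java}
import com.google.protobuf.Struct;
import org.apache.beam.runners.core.construction.PipelineOptionsTranslation;
import org.apache.beam.sdk.options.PipelineOptions;

// An SDK talking to the job server may legitimately send an empty Struct.
Struct emptyOptions = Struct.getDefaultInstance();

// Expected: a usable default PipelineOptions.
// Actual (per this report): NullPointerException inside
// ProxyInvocationHandler.Deserializer.
PipelineOptions options = PipelineOptionsTranslation.fromProto(emptyOptions);
{code}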



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3339) Create post-release testing of the nightly snapshots

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3339?focusedWorklogId=90911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90911
 ]

ASF GitHub Bot logged work on BEAM-3339:


Author: ASF GitHub Bot
Created on: 13/Apr/18 17:45
Start Date: 13/Apr/18 17:45
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4788: [BEAM-3339] Mobile 
gaming automation for Java nightly snapshot on core runners
URL: https://github.com/apache/beam/pull/4788#issuecomment-371945017
 
 
   Run Dataflow PostRelease


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90911)
Time Spent: 90h 40m  (was: 90.5h)

> Create post-release testing of the nightly snapshots
> 
>
> Key: BEAM-3339
> URL: https://issues.apache.org/jira/browse/BEAM-3339
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Alan Myrvold
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 90h 40m
>  Remaining Estimate: 0h
>
> The nightly java snapshots in 
> https://repository.apache.org/content/groups/snapshots/org/apache/beam should 
> be verified by following the 
> https://beam.apache.org/get-started/quickstart-java/ instructions, to verify 
> that the release is usable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4070) Disable cython profiling by default

2018-04-13 Thread Boyuan Zhang (JIRA)
Boyuan Zhang created BEAM-4070:
--

 Summary: Disable cython profiling by default
 Key: BEAM-4070
 URL: https://issues.apache.org/jira/browse/BEAM-4070
 Project: Beam
  Issue Type: Task
  Components: sdk-py-core
Reporter: Boyuan Zhang
Assignee: Ahmet Altay


Enabling cython profiling adds some overhead.

http://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4070) Disable cython profiling by default

2018-04-13 Thread Boyuan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyuan Zhang reassigned BEAM-4070:
--

Assignee: Boyuan Zhang  (was: Ahmet Altay)

> Disable cython profiling by default
> ---
>
> Key: BEAM-4070
> URL: https://issues.apache.org/jira/browse/BEAM-4070
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> Enabling cython profiling adds some overhead.
> http://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4060) meta-info/poms missing in snapshots

2018-04-13 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-4060:
---
Issue Type: Sub-task  (was: Bug)
Parent: BEAM-3249

> meta-info/poms missing in snapshots
> ---
>
> Key: BEAM-4060
> URL: https://issues.apache.org/jira/browse/BEAM-4060
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Rafal Wojdyla
>Assignee: Luke Cwik
>Priority: Blocker
>
> Current snapshots are missing a bunch of meta-info files, including pom.xml and 
> pom.properties:
> 2.4.0-SNAPSHOT example:
> {noformat}
> jar -tf 
> ~/.ivy2/cache/org.apache.beam/beam-runners-direct-java/jars/beam-runners-direct-java-2.4.0-SNAPSHOT.jar
>  | grep META-INFMETA-INF/
> META-INF/DEPENDENCIES
> META-INF/LICENSE
> META-INF/MANIFEST.MF
> META-INF/NOTICE
> META-INF/maven/
> META-INF/maven/com.google.code.findbugs/
> META-INF/maven/com.google.code.findbugs/jsr305/
> META-INF/maven/com.google.code.findbugs/jsr305/pom.properties
> META-INF/maven/com.google.code.findbugs/jsr305/pom.xml
> META-INF/maven/com.google.guava/
> META-INF/maven/com.google.guava/guava/
> META-INF/maven/com.google.guava/guava/pom.properties
> META-INF/maven/com.google.guava/guava/pom.xml
> META-INF/maven/com.google.protobuf/
> META-INF/maven/com.google.protobuf/protobuf-java-util/
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.xml
> META-INF/maven/com.google.protobuf/protobuf-java/
> META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java/pom.xml
> META-INF/maven/org.apache.beam/
> META-INF/maven/org.apache.beam/beam-model-pipeline/
> META-INF/maven/org.apache.beam/beam-model-pipeline/pom.properties
> META-INF/maven/org.apache.beam/beam-model-pipeline/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-core-construction-java/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-core-java/
> META-INF/maven/org.apache.beam/beam-runners-core-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-core-java/pom.xml
> META-INF/maven/org.apache.beam/beam-runners-direct-java/
> META-INF/maven/org.apache.beam/beam-runners-direct-java/pom.properties
> META-INF/maven/org.apache.beam/beam-runners-direct-java/pom.xml
> META-INF/services/
> META-INF/services/org.apache.beam.runners.direct.repackaged.runners.core.construction.CoderTranslatorRegistrar
> META-INF/services/org.apache.beam.runners.direct.repackaged.runners.core.construction.TransformPayloadTranslatorRegistrar
> META-INF/services/org.apache.beam.sdk.options.PipelineOptionsRegistrar
> META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar
> {noformat}
> 2.5.0-SNAPSHOT:
> {noformat}
> circleci@5abc19b95c60:~/scio$ jar -tf 
> ~/.ivy2/cache/org.apache.beam/beam-runners-direct-java/jars/beam-runners-direct-java-2.5.0-SNAPSHOT.jar
>   | grep META-INF
> META-INF/
> META-INF/MANIFEST.MF
> META-INF/services/
> META-INF/maven/
> META-INF/maven/com.google.guava/
> META-INF/maven/com.google.guava/guava/
> META-INF/maven/com.google.guava/guava/pom.properties
> META-INF/maven/com.google.guava/guava/pom.xml
> META-INF/maven/com.google.protobuf/
> META-INF/maven/com.google.protobuf/protobuf-java-util/
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java-util/pom.xml
> META-INF/maven/com.google.protobuf/protobuf-java/
> META-INF/maven/com.google.protobuf/protobuf-java/pom.properties
> META-INF/maven/com.google.protobuf/protobuf-java/pom.xml
> META-INF/services/org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.TransformPayloadTranslatorRegistrar
> META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar
> META-INF/services/org.apache.beam.sdk.options.PipelineOptionsRegistrar
> META-INF/services/org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.CoderTranslatorRegistrar
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2588) Portable Flink Runner Job API

2018-04-13 Thread Ben Sidhom (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Sidhom updated BEAM-2588:
-
Summary: Portable Flink Runner Job API  (was: FlinkRunner shim for serving 
Job API)

> Portable Flink Runner Job API
> -
>
> Key: BEAM-2588
> URL: https://issues.apache.org/jira/browse/BEAM-2588
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Axel Magnuson
>Priority: Major
>  Labels: portability
>
> Whatever the result of https://s.apache.org/beam-job-api we will need a way 
> for the JVM-based FlinkRunner to receive and run pipelines authored in Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_Spark #1587

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[sidhom] Support impulse transforms in Flink

[sidhom] Add Impulse ValidatesRunner test

[tgroh] Fix materializesWithDifferentEnvConsumer

[tgroh] Reduce Requirements to be considered a Primitve

[sidhom] [BEAM-3994] Use typed client pool sinks and sources

[sidhom] [BEAM-3966] Move functional utilities into shared module

[sidhom] Use general functional interfaces in ControlClientPool

[sidhom] Rename createLinked() to createBuffered() in QueueControlClientPool

[github] Add region argument to dataflow.go

[github] Region isn't on proto; create and get instead.

[tgroh] Rename `defaultRegistry` to `javaSdkNativeRegistry`

--
[...truncated 92.24 KB...]
'apache-beam-testing:bqjob_r4516e5798ee1ecd7_0162c1b3b63f_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete. Waiting on bqjob_r4516e5798ee1ecd7_0162c1b3b63f_1 ... (0s) Current status: RUNNING
Waiting on bqjob_r4516e5798ee1ecd7_0162c1b3b63f_1 ... (0s) Current status: DONE
2018-04-14 01:10:05,000 ddb101a4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-14 01:10:23,596 ddb101a4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-14 01:10:26,013 ddb101a4 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r3e80b267b7e9fab4_0162c1b40810_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete. Waiting on bqjob_r3e80b267b7e9fab4_0162c1b40810_1 ... (0s) Current status: RUNNING
Waiting on bqjob_r3e80b267b7e9fab4_0162c1b40810_1 ... (0s) Current status: DONE
2018-04-14 01:10:26,014 ddb101a4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-14 01:10:42,791 ddb101a4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-04-14 01:10:45,022 ddb101a4 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r24a66a7f6bf88bd9_0162c1b452a4_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete. Waiting on bqjob_r24a66a7f6bf88bd9_0162c1b452a4_1 ... (0s) Current status: RUNNING
Waiting on bqjob_r24a66a7f6bf88bd9_0162c1b452a4_1 ... (0s) Current status: DONE
2018-04-14 01:10:45,023 ddb101a4 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-04-14 01:11:11,888 ddb101a4 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 

Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #94

2018-04-13 Thread Apache Jenkins Server
See 


Changes:

[altay] Cythonize DistributionAccumulator

--
[...truncated 1.24 MB...]
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:627)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:626)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:828)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:626)
at 
org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:169)
at 
org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:123)
at 
org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:83)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:346)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:328)
at 
org.apache.beam.runners.spark.translation.streaming.CreateStreamTest.testFirstElementLate(CreateStreamTest.java:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:317)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy3.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
