Build failed in Jenkins: beam_PerformanceTests_Python #1014

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[chamikara] [BEAM-3217] Jenkins job for HadoopInputFormatIOIT (#4758)

--
[...truncated 1.11 KB...]
 > git rev-list --no-walk 8739d179738cc6f13bbc46450fea5260d84480d5 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6350957535074079850.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4460057535873778469.sh
+ rm -rf .env
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins8377841373395627616.sh
+ virtualenv .env --system-site-packages
New python executable in 

Installing setuptools, pip, wheel...done.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins2944863083441394228.sh
+ .env/bin/pip install --upgrade setuptools pip
Requirement already up-to-date: setuptools in ./.env/lib/python2.7/site-packages
Requirement already up-to-date: pip in ./.env/lib/python2.7/site-packages
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4356819988515387982.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3714709656202860957.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Collecting numpy==1.13.3 (from -r PerfKitBenchmarker/requirements.txt (line 22))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe>=0.23 in 

Jenkins build is back to normal : beam_PerformanceTests_TextIOIT #259

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-3816) [nexmark] Something is slightly off with Query 6

2018-03-12 Thread Etienne Chauchot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394939#comment-16394939
 ] 

Etienne Chauchot commented on BEAM-3816:


Good catch!

When I submitted the first Nexmark PR, I wrote a short doc to ease the review. 
It contains pseudocode for each query, so it may be quicker for you than 
reading the code. Here it is:

[https://docs.google.com/document/d/1VgnGiVu8vSfm7Et-xAtQYv0PlEpqeyfmhpQUNPmWRJs/edit#heading=h.wvl8pjxbcudc]

Besides, the Nexmark presentation we gave at ApacheCon also has a diagram of 
Query 6 on slide 29:

https://drive.google.com/open?id=0ByTW9khE9-fVNzhlUUhYN0hWdEk

 

> [nexmark] Something is slightly off with Query 6
> 
>
> Key: BEAM-3816
> URL: https://issues.apache.org/jira/browse/BEAM-3816
> Project: Beam
> Issue Type: Bug
> Components: examples-nexmark
> Reporter: Andrew Pilloud
> Priority: Major
> Labels: easyfix, newbie, nexmark, test
> Time Spent: 50m
> Remaining Estimate: 0h
>
> java.lang.AssertionError: Query6/Query6.Stamp/ParMultiDo(Anonymous).output: 
> wrong pipeline output Expected: <[\{"seller":1048,"price":83609648}, 
> \{"seller":1052,"price":61788353}, \{"seller":1086,"price":33744823}, 
> \{"seller":1078,"price":19876735}, \{"seller":1058,"price":50692833}, 
> \{"seller":1044,"price":6719489}, \{"seller":1096,"price":31287415}, 
> \{"seller":1095,"price":37004879}, \{"seller":1082,"price":22528654}, 
> \{"seller":1006,"price":57288736}, \{"seller":1051,"price":3967261}, 
> \{"seller":1084,"price":6394160}, \{"seller":1020,"price":3871757}, 
> \{"seller":1007,"price":185293}, \{"seller":1031,"price":11840889}, 
> \{"seller":1080,"price":26896442}, \{"seller":1030,"price":294928}, 
> \{"seller":1066,"price":26839191}, \{"seller":1000,"price":28257749}, 
> \{"seller":1055,"price":17087173}, \{"seller":1072,"price":45662210}, 
> \{"seller":1057,"price":4568399}, \{"seller":1025,"price":29008970}, 
> \{"seller":1064,"price":85810641}, \{"seller":1040,"price":99819658}, 
> \{"seller":1014,"price":11256690}, \{"seller":1098,"price":97259323}, 
> \{"seller":1011,"price":20447800}, \{"seller":1092,"price":77520938}, 
> \{"seller":1010,"price":53323687}, \{"seller":1060,"price":70032044}, 
> \{"seller":1062,"price":29076960}, \{"seller":1075,"price":19451464}, 
> \{"seller":1087,"price":27669185}, \{"seller":1009,"price":22951354}, 
> \{"seller":1065,"price":71875611}, \{"seller":1063,"price":87596779}, 
> \{"seller":1021,"price":62918185}, \{"seller":1034,"price":18472448}, 
> \{"seller":1028,"price":68556008}, \{"seller":1070,"price":92550447}]> but: 
> was <[\{"seller":1048,"price":83609648}, \{"seller":1052,"price":61788353}, 
> \{"seller":1086,"price":33744823}, \{"seller":1078,"price":19876735}, 
> \{"seller":1058,"price":50692833}, \{"seller":1044,"price":6719489}, 
> \{"seller":1096,"price":31287415}, \{"seller":1095,"price":37004879}, 
> \{"seller":1082,"price":22528654}, \{"seller":1006,"price":57288736}, 
> \{"seller":1051,"price":3967261}, \{"seller":1084,"price":6394160}, 
> \{"seller":1000,"price":34395558}, \{"seller":1020,"price":3871757}, 
> \{"seller":1007,"price":185293}, \{"seller":1031,"price":11840889}, 
> \{"seller":1080,"price":26896442}, \{"seller":1030,"price":294928}, 
> \{"seller":1066,"price":26839191}, \{"seller":1055,"price":17087173}, 
> \{"seller":1072,"price":45662210}, \{"seller":1057,"price":4568399}, 
> \{"seller":1025,"price":29008970}, \{"seller":1064,"price":85810641}, 
> \{"seller":1040,"price":99819658}, \{"seller":1014,"price":11256690}, 
> \{"seller":1098,"price":97259323}, \{"seller":1011,"price":20447800}, 
> \{"seller":1092,"price":77520938}, \{"seller":1010,"price":53323687}, 
> \{"seller":1060,"price":70032044}, \{"seller":1062,"price":29076960}, 
> \{"seller":1075,"price":19451464}, \{"seller":1087,"price":27669185}, 
> \{"seller":1009,"price":22951354}, \{"seller":1065,"price":71875611}, 
> \{"seller":1063,"price":87596779}, \{"seller":1021,"price":62918185}, 
> \{"seller":1034,"price":18472448}, \{"seller":1028,"price":68556008}, 
> \{"seller":1070,"price":92550447}]>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PerformanceTests_HadoopInputFormat #9

2018-03-12 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #5138

2018-03-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_JDBC #320

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[chamikara] [BEAM-3217] Jenkins job for HadoopInputFormatIOIT (#4758)

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam1 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision af28351e896b71579bac3f530544fd5c9a7e9a44 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f af28351e896b71579bac3f530544fd5c9a7e9a44
Commit message: "[BEAM-3217] Jenkins job for HadoopInputFormatIOIT (#4758)"
 > git rev-list --no-walk 8739d179738cc6f13bbc46450fea5260d84480d5 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins2087494470902224142.sh
+ gcloud container clusters get-credentials io-datastores --zone=us-central1-a 
--verbosity=debug
DEBUG: Running gcloud.container.clusters.get-credentials with 
Namespace(__calliope_internal_deepest_parser=ArgumentParser(prog='gcloud.container.clusters.get-credentials',
 usage=None, description='See 
https://cloud.google.com/container-engine/docs/kubectl for\nkubectl 
documentation.', version=None, formatter_class=, conflict_handler='error', add_help=False), 
account=None, api_version=None, authority_selector=None, 
authorization_token_file=None, cmd_func=>, 
command_path=['gcloud', 'container', 'clusters', 'get-credentials'], 
configuration=None, credential_file_override=None, document=None, format=None, 
h=None, help=None, http_timeout=None, log_http=None, name='io-datastores', 
project=None, quiet=None, trace_email=None, trace_log=None, trace_token=None, 
user_output_enabled=None, verbosity='debug', version=None, 
zone='us-central1-a').
Fetching cluster endpoint and auth data.
DEBUG: Saved kubeconfig to /home/jenkins/.kube/config
kubeconfig entry generated for io-datastores.
INFO: Display format "default".
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins2531143870726764645.sh
+ cp /home/jenkins/.kube/config 

[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins57141665798749715.sh
+ kubectl 
--kubeconfig=
 create namespace jdbcioit-1520830946699
namespace "jdbcioit-1520830946699" created
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins8568486586580637780.sh
++ kubectl config current-context
+ kubectl 
--kubeconfig=
 config set-context gke_apache-beam-testing_us-central1-a_io-datastores 
--namespace=jdbcioit-1520830946699
Context "gke_apache-beam-testing_us-central1-a_io-datastores" modified.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4333419699857998037.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins5795402172359791504.sh
+ rm -rf .env
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins6905008518353066497.sh
+ virtualenv .env --system-site-packages
New python executable in 

Installing setuptools, pip, wheel...done.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins1274075968534736337.sh
+ .env/bin/pip install --upgrade setuptools pip
Requirement already up-to-date: setuptools in ./.env/lib/python2.7/site-packages
Requirement already up-to-date: pip in ./.env/lib/python2.7/site-packages
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins1259343240878361632.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins2101286615910043357.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Apex #3678

2018-03-12 Thread Apache Jenkins Server
See 




[beam] 01/01: Merge pull request #4143 from mdvorsky/faster_rnd

2018-03-12 Thread echauchot
This is an automated email from the ASF dual-hosted git repository.

echauchot pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 259e7f6030b23759ba4efbfc089af2b6a00e6ed5
Merge: af28351 1d781ea
Author: Etienne Chauchot 
AuthorDate: Mon Mar 12 09:02:38 2018 +0100

Merge pull request #4143 from mdvorsky/faster_rnd

Merge pull request #4143: Improve implementation of 
nexmark.StringsGenerator.nextExactString()

 .../sdk/nexmark/sources/generator/model/StringsGenerator.java  | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
echauc...@apache.org.


[beam] branch master updated (af28351 -> 259e7f6)

2018-03-12 Thread echauchot
This is an automated email from the ASF dual-hosted git repository.

echauchot pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from af28351  [BEAM-3217] Jenkins job for HadoopInputFormatIOIT (#4758)
 add 1d781ea  Improve implementation of 
nexmark.StringsGenerator.nextExactString()
 new 259e7f6  Merge pull request #4143 from mdvorsky/faster_rnd

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../sdk/nexmark/sources/generator/model/StringsGenerator.java  | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)



Build failed in Jenkins: beam_PerformanceTests_Spark #1458

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[chamikara] [BEAM-3217] Jenkins job for HadoopInputFormatIOIT (#4758)

--
[...truncated 91.99 KB...]
'apache-beam-testing:bqjob_r51a325bd1c71de61_016218dc72bd_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 06:18:45,247 c0b416ec MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 06:19:04,391 c0b416ec MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 06:19:06,579 c0b416ec MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.18s,  CPU:0.26s,  MaxMemory:25416kb 
STDOUT: Upload complete.
Waiting on bqjob_r58bcde494d2e3114_016218dcc65c_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r58bcde494d2e3114_016218dcc65c_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r58bcde494d2e3114_016218dcc65c_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 06:19:06,580 c0b416ec MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 06:19:31,846 c0b416ec MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 06:19:34,140 c0b416ec MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.28s,  CPU:0.37s,  MaxMemory:25524kb 
STDOUT: Upload complete.
Waiting on bqjob_r6169fb8a4d9cd638_016218dd3234_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r6169fb8a4d9cd638_016218dd3234_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r6169fb8a4d9cd638_016218dd3234_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 06:19:34,141 c0b416ec MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 06:19:50,563 c0b416ec MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 06:19:53,099 c0b416ec MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.52s,  CPU:0.22s,  MaxMemory:25328kb 
STDOUT: Upload complete.
Waiting on bqjob_r10b8d7cf49d8b436_016218dd7aa8_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r10b8d7cf49d8b436_016218dd7aa8_1 ... (0s) 
Current status: DONE   

Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Spark #4400

2018-03-12 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Apex #3679

2018-03-12 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5139

2018-03-12 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Flink #5220

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79336
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 12/Mar/18 06:31
Start Date: 12/Mar/18 06:31
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173702692
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   When the user provides their own `dataSourceFactory`, I guess they want 
complete control and can wrap the data source in a pool themselves, no? The 
built-in pooling applies when the user provides a `dataSource`, or a 
configuration from which the IO creates a `dataSource` internally (using DBCP). 
Thoughts?
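
For illustration only, a user-controlled factory of the kind this comment 
describes could look like the sketch below. The DataSourceFactory interface 
mirrors the one proposed in the BEAM-3500 description; UserFactorySketch, 
newPooledDataSource, roundTrip and shipFactoryAndCreate are hypothetical names, 
and the proxy is only a stand-in for whatever pool the user would build. It is 
not the code from the pull request. The point it tries to show is that only the 
factory has to be serializable, while the DataSource is created after 
deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Proxy;
import javax.sql.DataSource;

public class UserFactorySketch {

  /** Same shape as the interface proposed in the BEAM-3500 description. */
  interface DataSourceFactory extends Serializable {
    DataSource create();
  }

  /**
   * Hypothetical stand-in for a non-serializable pooled DataSource. A real
   * factory would build and configure its own pool here instead.
   */
  static DataSource newPooledDataSource() {
    return (DataSource) Proxy.newProxyInstance(
        UserFactorySketch.class.getClassLoader(),
        new Class<?>[] {DataSource.class},
        (proxy, method, args) -> {
          throw new UnsupportedOperationException(method.getName());
        });
  }

  /** Serializes and deserializes a value, as shipping it to a worker would. */
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  /** Ships only the factory, then creates the DataSource on the "worker" side. */
  static DataSource shipFactoryAndCreate() {
    // The method reference is serializable because its target interface is.
    DataSourceFactory factory = UserFactorySketch::newPooledDataSource;
    DataSourceFactory shipped = roundTrip(factory);
    return shipped.create();
  }

  public static void main(String[] args) {
    DataSource dataSource = shipFactoryAndCreate();
    System.out.println("created on worker side: " + (dataSource != null));
  }
}
```

With this shape, whether and how to pool is decided entirely inside create(), 
which matches the "complete control" reading of the comment.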


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79336)
Time Spent: 3h 40m  (was: 3.5h)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
> Issue Type: Improvement
> Components: io-java-jdbc
> Affects Versions: 2.2.0
> Reporter: Pawel Bartoszek
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> JdbcIO's write DoFn acquires a connection in {{@Setup}} and releases it in 
> {{@Teardown}}, which means that the connection might stay open for days in a 
> streaming job. Keeping a single connection open for that long is risky, as it 
> is exposed to database issues, network issues, and so on.
> *Taking a connection from the pool when it is actually needed*
> I suggest that the connection be taken from the connection pool in the 
> {{executeBatch}} method and released when the batch is flushed. This would 
> allow the pool to take care of any unhealthy connections that are returned.
> *Make JdbcIO accept a data source factory*
> It would be nice if JdbcIO accepted a DataSourceFactory rather than the 
> DataSource itself. I am saying this because the sink checks whether the 
> DataSource implements the `Serializable` interface, which makes it impossible 
> to pass a BasicDataSource (used internally by the sink), as it doesn’t 
> implement this interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}
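
The first suggestion in the description above (borrow a connection from the 
pool only while a batch is being flushed) can be sketched roughly as follows. 
This is a minimal illustration under stated assumptions, not the JdbcIO 
implementation: BatchWriter, countingPool and demo are hypothetical names, and 
the pool is a counting stub built with java.lang.reflect.Proxy so the sketch 
runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

public class PerBatchConnectionSketch {

  /** Hypothetical batching writer; it is not the actual JdbcIO write DoFn. */
  static final class BatchWriter {
    private final DataSource pool;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchWriter(DataSource pool, int batchSize) {
      this.pool = pool;
      this.batchSize = batchSize;
    }

    void add(String record) throws SQLException {
      pending.add(record);
      if (pending.size() >= batchSize) {
        flush();
      }
    }

    void flush() throws SQLException {
      if (pending.isEmpty()) {
        return;
      }
      // Borrow a connection only for this flush; close() hands it back to the pool.
      try (Connection connection = pool.getConnection()) {
        // A real writer would bind `pending` to a PreparedStatement obtained
        // from `connection` and call executeBatch() here.
        pending.clear();
      }
    }
  }

  /** Stub pool that only counts borrowed and returned connections. */
  static DataSource countingPool(AtomicInteger borrowed, AtomicInteger returned) {
    ClassLoader loader = PerBatchConnectionSketch.class.getClassLoader();
    Connection connection = (Connection) Proxy.newProxyInstance(
        loader,
        new Class<?>[] {Connection.class},
        (proxy, method, args) -> {
          if (method.getName().equals("close")) {
            returned.incrementAndGet();
          }
          return null;
        });
    return (DataSource) Proxy.newProxyInstance(
        loader,
        new Class<?>[] {DataSource.class},
        (proxy, method, args) -> {
          if (method.getName().equals("getConnection")) {
            borrowed.incrementAndGet();
            return connection;
          }
          return null;
        });
  }

  /** Writes five records in batches of two and returns {borrowed, returned}. */
  static int[] demo() {
    AtomicInteger borrowed = new AtomicInteger();
    AtomicInteger returned = new AtomicInteger();
    BatchWriter writer = new BatchWriter(countingPool(borrowed, returned), 2);
    try {
      for (int i = 0; i < 5; i++) {
        writer.add("record-" + i);
      }
      writer.flush(); // flush the final partial batch
    } catch (SQLException e) {
      throw new IllegalStateException(e);
    }
    return new int[] {borrowed.get(), returned.get()};
  }

  public static void main(String[] args) {
    int[] counts = demo();
    System.out.println("borrowed=" + counts[0] + ", returned=" + counts[1]);
  }
}
```

In this sketch, five records with a batch size of two lead to three flushes, 
so the stub pool sees three borrows and three returns, and no connection is 
held between flushes.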





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #5221

2018-03-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #10

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[mariand] Improve implementation of nexmark.StringsGenerator.nextExactString()

--
[...truncated 38.32 KB...]
[INFO] Excluding io.grpc:grpc-context:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.instrumentation:instrumentation-api:jar:0.3.0 from 
the shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-bigquery:jar:v2-rev374-1.22.0 from the 
shaded jar.
[INFO] Excluding com.google.api:gax-grpc:jar:0.20.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.api:api-common:jar:1.0.0-rc2 from the shaded jar.
[INFO] Excluding com.google.api:gax:jar:1.3.1 from the shaded jar.
[INFO] Excluding org.threeten:threetenbp:jar:1.3.3 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core-grpc:jar:1.2.0 from the 
shaded jar.
[INFO] Excluding com.google.protobuf:protobuf-java-util:jar:3.2.0 from the 
shaded jar.
[INFO] Excluding com.google.apis:google-api-services-pubsub:jar:v1-rev10-1.22.0 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 from the 
shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-protobuf:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson:jar:1.22.0 
from the shaded jar.
[INFO] Excluding com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-auth:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-netty:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.netty:netty-codec-http2:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-all:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-okhttp:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.squareup.okhttp:okhttp:jar:2.5.0 from the shaded jar.
[INFO] Excluding com.squareup.okio:okio:jar:1.6.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-lite:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-nano:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 
from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from 
the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the 
shaded jar.
[INFO] Excluding io.opencensus:opencensus-api:jar:0.7.0 from the shaded jar.
[INFO] Excluding io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client:jar:1.22.0 from the 
shaded jar.
[INFO] Excluding com.google.oauth-client:google-oauth-client:jar:1.22.0 from 
the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client:jar:1.22.0 from the 
shaded jar.
[INFO] Excluding 

Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Apex #3680

2018-03-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Spark #1459

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[mariand] Improve implementation of nexmark.StringsGenerator.nextExactString()

--
[...truncated 92.08 KB...]
'apache-beam-testing:bqjob_r5bc108e8431ef2a6_01621a2607a8_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 12:18:44,683 7f699926 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 12:19:08,211 7f699926 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 12:19:10,471 7f699926 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.25s,  CPU:0.22s,  MaxMemory:25476kb 
STDOUT: Upload complete.
Waiting on bqjob_r344f0f6a942e211f_01621a266c30_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r344f0f6a942e211f_01621a266c30_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r344f0f6a942e211f_01621a266c30_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 12:19:10,472 7f699926 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 12:19:34,738 7f699926 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 12:19:37,079 7f699926 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.33s,  CPU:0.20s,  MaxMemory:25360kb 
STDOUT: Upload complete.
Waiting on bqjob_r49c7ba1a9cae97ae_01621a26d3d4_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r49c7ba1a9cae97ae_01621a26d3d4_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r49c7ba1a9cae97ae_01621a26d3d4_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 12:19:37,079 7f699926 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.
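
Every attempt above returns the same schema error, so repeating the command cannot succeed. A minimal sketch of a retry loop that stops early on errors that look permanent follows. `run_with_retries`, `is_permanent`, and `PERMANENT_MARKERS` are hypothetical names for illustration, not PerfKitBenchmarker's API, and the marker list is an assumption based only on the message seen in this log.

```python
import time

# Assumption: output containing one of these substrings will never succeed on retry.
PERMANENT_MARKERS = ("Invalid schema update",)


def is_permanent(output):
    """Return True when the command output matches a known permanent failure."""
    return any(marker in output for marker in PERMANENT_MARKERS)


def run_with_retries(command, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call command() until it succeeds, fails permanently, or attempts run out.

    command() must return a (return_code, output) tuple.
    Returns (return_code, output, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        code, output = command()
        if code == 0 or is_permanent(output) or attempts >= max_attempts:
            return code, output, attempts
        sleep(delay_seconds)
```

With a classifier like this, the schema error would surface after the first attempt instead of after the full retry budget.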

2018-03-12 12:19:59,566 7f699926 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 12:20:01,617 7f699926 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.04s,  CPU:0.25s,  MaxMemory:25332kb 
STDOUT: Upload complete.
Waiting on bqjob_r6959a2f469ed50d9_01621a2734e3_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r6959a2f469ed50d9_01621a2734e3_1 ... (0s) 
Current status: 

Build failed in Jenkins: beam_PerformanceTests_Python #1015

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[mariand] Improve implementation of nexmark.StringsGenerator.nextExactString()

--
[...truncated 1.74 KB...]
+ rm -rf .env
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins5541120375992188366.sh
+ virtualenv .env --system-site-packages
New python executable in .env/bin/python
Installing setuptools, pip...done.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins2580336623527017581.sh
+ .env/bin/pip install --upgrade setuptools pip
Downloading/unpacking setuptools from 
https://pypi.python.org/packages/ad/dc/fcced9ec3f2561c0cbe8eb6527eef7cf4f4919a2b3a07891a36e846635af/setuptools-38.5.2-py2.py3-none-any.whl#md5=abd3307cdce6fb543b5a4d0e3e98bdb6
Downloading/unpacking pip from 
https://pypi.python.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#md5=297dbd16ef53bcef0447d245815f5144
Installing collected packages: setuptools, pip
  Found existing installation: setuptools 2.2
Uninstalling setuptools:
  Successfully uninstalled setuptools
  Found existing installation: pip 1.5.4
Uninstalling pip:
  Successfully uninstalled pip
Successfully installed setuptools pip
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3382612300424869383.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins822369944527949040.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Collecting numpy==1.13.3 (from -r PerfKitBenchmarker/requirements.txt (line 22))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe>=0.23 in 
/usr/local/lib/python2.7/dist-packages (from jinja2>=2.7->-r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: colorama; extra == "windows" in 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5140

2018-03-12 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #5222

2018-03-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_JDBC #321

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[mariand] Improve implementation of nexmark.StringsGenerator.nextExactString()

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 259e7f6030b23759ba4efbfc089af2b6a00e6ed5 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 259e7f6030b23759ba4efbfc089af2b6a00e6ed5
Commit message: "Merge pull request #4143 from mdvorsky/faster_rnd"
 > git rev-list --no-walk af28351e896b71579bac3f530544fd5c9a7e9a44 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins2551720807077895837.sh
+ gcloud container clusters get-credentials io-datastores --zone=us-central1-a 
--verbosity=debug
DEBUG: Running gcloud.container.clusters.get-credentials with 
Namespace(__calliope_internal_deepest_parser=ArgumentParser(prog='gcloud.container.clusters.get-credentials',
 usage=None, description='See 
https://cloud.google.com/container-engine/docs/kubectl for\nkubectl 
documentation.', version=None, formatter_class=, conflict_handler='error', add_help=False), 
account=None, api_version=None, authority_selector=None, 
authorization_token_file=None, cmd_func=>, 
command_path=['gcloud', 'container', 'clusters', 'get-credentials'], 
configuration=None, credential_file_override=None, document=None, format=None, 
h=None, help=None, http_timeout=None, log_http=None, name='io-datastores', 
project=None, quiet=None, trace_email=None, trace_log=None, trace_token=None, 
user_output_enabled=None, verbosity='debug', version=None, 
zone='us-central1-a').
WARNING: Accessing a Container Engine cluster requires the kubernetes 
commandline
client [kubectl]. To install, run
  $ gcloud components install kubectl

Fetching cluster endpoint and auth data.
DEBUG: Saved kubeconfig to /home/jenkins/.kube/config
kubeconfig entry generated for io-datastores.
INFO: Display format "default".
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins8831017092201434152.sh
+ cp /home/jenkins/.kube/config 

[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins7660060960970914979.sh
+ kubectl 
--kubeconfig=
 create namespace jdbcioit-1520848945942
namespace "jdbcioit-1520848945942" created
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins7326632921083589457.sh
++ kubectl config current-context
+ kubectl 
--kubeconfig=
 config set-context gke_apache-beam-testing_us-central1-a_io-datastores 
--namespace=jdbcioit-1520848945942
Context "gke_apache-beam-testing_us-central1-a_io-datastores" modified.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins4302574502270946759.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins6270057041953252361.sh
+ rm -rf .env
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins8585677650878772256.sh
+ virtualenv .env --system-site-packages
New python executable in 

Installing setuptools, pip, wheel...done.
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins6242283973574970383.sh
+ .env/bin/pip install --upgrade setuptools pip
Requirement already up-to-date: setuptools in ./.env/lib/python2.7/site-packages
Requirement already up-to-date: pip in ./.env/lib/python2.7/site-packages
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins6678308900151377160.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_JDBC] $ /bin/bash -xe /tmp/jenkins3921584920540159805.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already 

[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=79547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79547
 ]

ASF GitHub Bot logged work on BEAM-3819:


Author: ASF GitHub Bot
Created on: 12/Mar/18 17:58
Start Date: 12/Mar/18 17:58
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev opened a new pull request #4851: 
[BEAM-3819] Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851
 
 
   Add `KinesisIO.Read.withLimit()` to limit the maximum number of records 
fetched in each `GetRecordsResult`. This should help reduce the load on Kinesis when needed.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [x] What the pull request does
  - [x] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [x] Each commit in the pull request should have a meaningful subject line 
and body.
- [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79547)
Time Spent: 10m
Remaining Estimate: 0h

> Add withLimit() option to KinesisIO
> ---
>
> Key: BEAM-3819
> URL: https://issues.apache.org/jira/browse/BEAM-3819
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kinesis
> Reporter: Jean-Baptiste Onofré
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In some cases, the user might need to set the {{limit}} on the 
> {{SimplifiedKinesisClient}}, especially for performance reasons, depending on 
> the number of records.
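
To illustrate what a per-request limit does, here is a small Python 3 sketch that reads from an in-memory list standing in for a shard. `get_records` and `read_all` are hypothetical stand-ins written for this example; they are not the KinesisIO, `SimplifiedKinesisClient`, or AWS SDK API.

```python
def get_records(shard, position, limit=None):
    """Return (records, next_position), capped at `limit` records when given.

    `shard` is a plain list used as a stand-in for a stream shard.
    """
    if limit is not None and limit <= 0:
        raise ValueError("limit must be positive")
    end = len(shard) if limit is None else min(len(shard), position + limit)
    return shard[position:end], end


def read_all(shard, limit=None):
    """Drain the shard, returning the list of batches that were fetched."""
    position, batches = 0, []
    while position < len(shard):
        records, position = get_records(shard, position, limit)
        batches.append(records)
    return batches
```

A smaller limit means more, smaller requests; without one, a single call may return everything available.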



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3819) Add withLimit() option to KinesisIO

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3819?focusedWorklogId=79549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79549
 ]

ASF GitHub Bot logged work on BEAM-3819:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:01
Start Date: 12/Mar/18 18:01
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #4851: [BEAM-3819] 
Add withLimit() option to KinesisIO
URL: https://github.com/apache/beam/pull/4851#issuecomment-372406353
 
 
   @jbonofre @pawel-kaczmarczyk @iemejia  - please, take a look




Issue Time Tracking
---

Worklog Id: (was: 79549)
Time Spent: 20m  (was: 10m)

> Add withLimit() option to KinesisIO
> ---
>
> Key: BEAM-3819
> URL: https://issues.apache.org/jira/browse/BEAM-3819
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kinesis
> Reporter: Jean-Baptiste Onofré
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In some cases, the user might need to set the {{limit}} on the 
> {{SimplifiedKinesisClient}}, especially for performance reasons, depending on 
> the number of records.





[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=79554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79554
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:05
Start Date: 12/Mar/18 18:05
Worklog Time Spent: 10m 
  Work Description: huygaa11 commented on a change in pull request #4793: 
[BEAM-3776] Fix issue with merging late windows where a watermark hold could be 
added behind the input watermark.
URL: https://github.com/apache/beam/pull/4793#discussion_r173894122
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java
 ##
 @@ -247,8 +249,26 @@ private Instant addGarbageCollectionHold(
   /**
* Prefetch watermark holds in preparation for merging.
*/
-  public void prefetchOnMerge(MergingStateAccessor state) {
-StateMerging.prefetchWatermarks(state, elementHoldTag);
+  public void prefetchOnMerge(MergingStateAccessor context) {
+Map map = 
context.accessInEachMergingWindow(elementHoldTag);
 
 Review comment:
   Is there a specific reason why this code was moved out of StateMerging.java?




Issue Time Tracking
---

Worklog Id: (was: 79554)
Time Spent: 50m  (was: 40m)

> StateMerging.mergeWatermarks sets a late watermark hold for late merging 
> windows that depend only on the window
> ---
>
> Key: BEAM-3776
> URL: https://issues.apache.org/jira/browse/BEAM-3776
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Affects Versions: 2.1.0, 2.2.0, 2.3.0
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Critical
> Time Spent: 50m
> Remaining Estimate: 0h
>
> WatermarkHold.addElementHold and WatermarkHold.addGarbageCollectionHold take 
> care to not add holds that would be before the input watermark.
> However, WatermarkHold.onMerge calls StateMerging.mergeWatermarks, which, if the 
> window depends only on the window, sets a hold for the end of the window 
> regardless of the input watermark.
> Thus if you have a WindowingStrategy such as:
> WindowingStrategy.of(Sessions.withGapDuration(gapDuration))
>  .withMode(AccumulationMode.DISCARDING_FIRED_PANES)
>  .withTrigger(
>  Repeatedly.forever(
>  AfterWatermark.pastEndOfWindow()
>  .withLateFirings(AfterPane.elementCountAtLeast(10))))
>  .withAllowedLateness(allowedLateness)
> and you merge windows that are late, you might end up holding the watermark 
> until the allowedLateness has passed.
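
The difference between the two behaviours described above can be shown with a toy model in Python 3. This is only an illustration of the rule stated in the report, not Beam's implementation or the fix in the pull request; both function names are hypothetical, and times are plain integers.

```python
def unguarded_merge_hold(window_end, input_watermark):
    """Behaviour described in the report: hold at the end of the window,
    regardless of where the input watermark is."""
    return window_end


def guarded_merge_hold(window_end, input_watermark):
    """Apply the same rule the report attributes to the add*Hold methods:
    never return a hold that would be before the input watermark."""
    if window_end < input_watermark:
        return None  # The window is already late; no hold is placed.
    return window_end
```

For a merged window ending at 30 with the input watermark at 40, the unguarded version still returns a hold at 30, which is the situation the report says can stall the watermark.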





[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=79553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79553
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:05
Start Date: 12/Mar/18 18:05
Worklog Time Spent: 10m 
  Work Description: huygaa11 commented on a change in pull request #4793: 
[BEAM-3776] Fix issue with merging late windows where a watermark hold could be 
added behind the input watermark.
URL: https://github.com/apache/beam/pull/4793#discussion_r173888558
 
 

 ##
 File path: 
runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
 ##
 @@ -873,6 +907,288 @@ public void testWatermarkHoldAndLateData() throws 
Exception {
 tester.assertHasOnlyGlobalAndFinishedSetsFor();
   }
 
+  @Test
+  public void testMergingWatermarkHoldAndLateDataSpecific() throws Exception {
+LinkedList configurations = new LinkedList<>();
+
+// Simple: late new window
+LinkedList actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(1));
+configurations.add(actions);
+
+// Simple: late new window, closed and extended.
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(1));
+actions.add(Action.times(10));
+configurations.add(actions);
+
+// Simple: late new window, closed and merged
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(1));
+actions.add(Action.times(14));
+actions.add(Action.times(6));
+configurations.add(actions);
+
+// Simple: late new window, extended past watermark
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(33));
+configurations.add(actions);
+
+// Simple: late new window, extended past watermark, extend more
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(33));
+actions.add(Action.times(43));
+configurations.add(actions);
+
+// Simple: late new window, extended past watermark, extend more
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(33));
+actions.add(Action.times(11));
+configurations.add(actions);
+
+// Simple: new window closes, then extended
+actions = new LinkedList<>();
+actions.add(Action.times(11));
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(18));
+configurations.add(actions);
+
+// Merging: new window closes, then extended then merged with new window
+actions = new LinkedList<>();
+actions.add(Action.times(11));
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(18));
+actions.add(Action.times(41));
+actions.add(Action.times(27, 33));
+configurations.add(actions);
+
+// Merging: late window, merges with new window
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(29));
+actions.add(Action.times(41));
+configurations.add(actions);
+
+// Merging: late window, new window joined
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(29));
+actions.add(Action.times(45));
+actions.add(Action.times(36));
+configurations.add(actions);
+
+// Merging: late window, new window all at once
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(29, 45, 36));
+configurations.add(actions);
+
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(42));
+actions.add(Action.times(33));
+configurations.add(actions);
+
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(42));
+actions.add(Action.times(33, 21));
+actions.add(Action.inputWatermark(50));
+actions.add(Action.times(12));
+configurations.add(actions);
+
+for (LinkedList configuration : configurations) {
+  System.out.println("Running config " + configuration.toString());
 
 Review comment:
   Remove all the print statements.




Issue Time Tracking

[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=79555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79555
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:06
Start Date: 12/Mar/18 18:06
Worklog Time Spent: 10m 
  Work Description: huygaa11 commented on a change in pull request #4793: 
[BEAM-3776] Fix issue with merging late windows where a watermark hold could be 
added behind the input watermark.
URL: https://github.com/apache/beam/pull/4793#discussion_r173894122
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java
 ##
 @@ -247,8 +249,26 @@ private Instant addGarbageCollectionHold(
   /**
* Prefetch watermark holds in preparation for merging.
*/
-  public void prefetchOnMerge(MergingStateAccessor state) {
-StateMerging.prefetchWatermarks(state, elementHoldTag);
+  public void prefetchOnMerge(MergingStateAccessor context) {
+Map map = 
context.accessInEachMergingWindow(elementHoldTag);
 
 Review comment:
   Is there a specific reason why this code was moved out of StateMerging.java?




Issue Time Tracking
---

Worklog Id: (was: 79555)
Time Spent: 1h  (was: 50m)

> StateMerging.mergeWatermarks sets a late watermark hold for late merging 
> windows that depend only on the window
> ---
>
> Key: BEAM-3776
> URL: https://issues.apache.org/jira/browse/BEAM-3776
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Affects Versions: 2.1.0, 2.2.0, 2.3.0
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Critical
> Time Spent: 1h
> Remaining Estimate: 0h
>
> WatermarkHold.addElementHold and WatermarkHold.addGarbageCollectionHold take 
> care to not add holds that would be before the input watermark.
> However, WatermarkHold.onMerge calls StateMerging.mergeWatermarks, which, if the 
> window depends only on the window, sets a hold for the end of the window 
> regardless of the input watermark.
> Thus if you have a WindowingStrategy such as:
> WindowingStrategy.of(Sessions.withGapDuration(gapDuration))
>  .withMode(AccumulationMode.DISCARDING_FIRED_PANES)
>  .withTrigger(
>  Repeatedly.forever(
>  AfterWatermark.pastEndOfWindow()
>  .withLateFirings(AfterPane.elementCountAtLeast(10))))
>  .withAllowedLateness(allowedLateness)
> and you merge windows that are late, you might end up holding the watermark 
> until the allowedLateness has passed.
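
For readers less familiar with session windows, the merging step the description relies on can be sketched in Python 3 as follows. Each element at time `t` gets a window `[t, t + gap)`, and overlapping windows are merged. The gap of 10 used in the example is an assumption (the report does not state one), and `merge_sessions` is a hypothetical helper, not Beam code.

```python
def merge_sessions(timestamps, gap):
    """Merge per-element windows [t, t + gap) that overlap into sessions.

    Windows that merely touch (one ends exactly where the next starts)
    are treated as disjoint and stay separate.
    """
    windows = sorted((t, t + gap) for t in timestamps)
    merged = []
    for start, end in windows:
        if merged and start < merged[-1][1]:
            # Overlaps the previous session: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With a gap of 10, elements at 29, 45 and 36 merge into one session `[29, 55)`: the window for 29 alone would end at 39, but merging extends it well past that point.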





Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #11

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[klk] Add test for a trigger with windowed SQL query

[aljoscha.krettek] Annotate ParDoTest.duplicateTimerSetting with UsesTestStream

--
[...truncated 46.28 KB...]
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing 

 with 

[INFO] Replacing original test artifact with shaded test artifact.
[INFO] Replacing 

 with 

[INFO] Dependency-reduced POM written at: 

[INFO] 
[INFO] --- maven-failsafe-plugin:2.20.1:integration-test (default) @ 
beam-sdks-java-io-hadoop-input-format ---
[INFO] Failsafe report directory: 

[INFO] parallel='all', perCoreThreadCount=true, threadCount=4, 
useUnlimitedThreads=false, threadCountSuites=0, threadCountClasses=0, 
threadCountMethods=0, parallelOptimized=true
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 360.057 
s <<< FAILURE! - in 
org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT
[ERROR] 
readUsingHadoopInputFormat(org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT)
  Time elapsed: 360.057 s  <<< ERROR!
java.lang.RuntimeException: 
(f2e6dec52990d764): java.lang.RuntimeException: 
org.apache.beam.sdk.util.UserCodeException: org.postgresql.util.PSQLException: 
The connection attempt failed.
at 
com.google.cloud.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:190)
at 
com.google.cloud.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:161)
at 
com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
at 
com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
at 
com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
at 
com.google.cloud.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:121)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:328)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:284)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.beam.sdk.util.UserCodeException: 
org.postgresql.util.PSQLException: The connection attempt failed.
at 
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
at 
org.apache.beam.sdk.io.jdbc.JdbcIO$Write$WriteFn$DoFnInvoker.invokeSetup(Unknown
 Source)
at 
com.google.cloud.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:63)
at 
com.google.cloud.dataflow.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:45)
at 
com.google.cloud.dataflow.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:94)
at 

Build failed in Jenkins: beam_PerformanceTests_Python #1016

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[klk] Add test for a trigger with windowed SQL query

[aljoscha.krettek] Annotate ParDoTest.duplicateTimerSetting with UsesTestStream

--
[...truncated 1.21 KB...]
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins9025455714916000633.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6580474350851929166.sh
+ rm -rf .env
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6559286789200088391.sh
+ virtualenv .env --system-site-packages
New python executable in 

Installing setuptools, pip, wheel...done.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins1787265590884991539.sh
+ .env/bin/pip install --upgrade setuptools pip
Requirement already up-to-date: setuptools in ./.env/lib/python2.7/site-packages
Requirement already up-to-date: pip in ./.env/lib/python2.7/site-packages
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins8802388234752567971.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins9114410556895299689.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Collecting numpy==1.13.3 (from -r PerfKitBenchmarker/requirements.txt (line 22))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 
/home/jenkins/.local/lib/python2.7/site-packages (from absl-py->-r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: MarkupSafe>=0.23 in 
/usr/local/lib/python2.7/dist-packages (from 

Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Spark #4402

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-3830) Add validation spreadsheet to release guide on website

2018-03-12 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-3830:
-

 Summary: Add validation spreadsheet to release guide on website
 Key: BEAM-3830
 URL: https://issues.apache.org/jira/browse/BEAM-3830
 Project: Beam
  Issue Type: Wish
  Components: website
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Flink #5223

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3749) support customized trigger/accumulationMode in BeamSql

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3749?focusedWorklogId=79440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79440
 ]

ASF GitHub Bot logged work on BEAM-3749:


Author: ASF GitHub Bot
Created on: 12/Mar/18 13:43
Start Date: 12/Mar/18 13:43
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #4826: [BEAM-3749] Add 
test for a trigger with windowed SQL query
URL: https://github.com/apache/beam/pull/4826
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
index cdea0f8ce1f..ed19668d3b8 100644
--- 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
+++ 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
@@ -27,6 +27,8 @@
 import java.util.Iterator;
 import java.util.List;
 import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestStream;
+import org.apache.beam.sdk.testing.UsesTestStream;
 import org.apache.beam.sdk.transforms.Create;
 import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.windowing.AfterPane;
@@ -45,6 +47,7 @@
 import org.joda.time.Duration;
 import org.junit.Before;
 import org.junit.Test;
+import org.junit.experimental.categories.Category;
 
 /**
  * Tests for GROUP-BY/aggregation, with 
global_window/fix_time_window/sliding_window/session_window
@@ -328,6 +331,65 @@ private void runTumbleWindow(PCollection input) 
throws Exception {
 pipeline.run().waitUntilFinish();
   }
 
+  /**
+   * Tests that a trigger set up prior to a SQL statement still is effective
+   * within the SQL statement.
+   */
+  @Test
+  @Category(UsesTestStream.class)
+  public void testTriggeredTumble() throws Exception {
+RowType inputRowType =
+
RowSqlType.builder().withIntegerField("f_int").withTimestampField("f_timestamp").build();
+
+PCollection input =
+pipeline.apply(
+TestStream.create(inputRowType.getRowCoder())
+.addElements(
+Row.withRowType(inputRowType)
+.addValues(1, FORMAT.parse("2017-01-01 01:01:01"))
+.build(),
+Row.withRowType(inputRowType)
+.addValues(2, FORMAT.parse("2017-01-01 01:01:01"))
+.build())
+.addElements(
+Row.withRowType(inputRowType)
+.addValues(3, FORMAT.parse("2017-01-01 01:01:01"))
+.build())
+.addElements(
+Row.withRowType(inputRowType)
+.addValues(4, FORMAT.parse("2017-01-01 01:01:01"))
+.build())
+.advanceWatermarkToInfinity());
+
+String sql =
+"SELECT SUM(f_int) AS f_int_sum FROM PCOLLECTION"
++ " GROUP BY TUMBLE(f_timestamp, INTERVAL '1' HOUR)";
+
+RowType outputRowType = 
RowSqlType.builder().withIntegerField("fn_int_sum").build();
+
+PCollection result =
+input
+.apply(
+"Triggering",
+Window.configure()
+
.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
+.withAllowedLateness(Duration.ZERO)
+
.withOnTimeBehavior(Window.OnTimeBehavior.FIRE_IF_NON_EMPTY)
+.accumulatingFiredPanes())
+.apply("Windowed Query", BeamSql.query(sql));
+
+PAssert.that(result)
+.containsInAnyOrder(
+TestUtils.RowsBuilder.of(outputRowType)
+.addRows(3) // first bundle 1+2
+.addRows(6) // next bundle 1+2+3
+.addRows(10) // next bundle 1+2+3+4)
+.getRows());
+
+pipeline.run().waitUntilFinish();
+
+  }
+
   /**
* GROUP-BY with HOP window(aka sliding_window) with bounded PCollection.
*/
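
The expected rows 3, 6 and 10 in the test above follow from accumulatingFiredPanes(): each firing re-emits the sum of everything seen so far in the window. Below is a minimal plain-Java sketch of that arithmetic, not Beam code. The class name PaneSums and both method names are hypothetical, and it assumes the trigger fires exactly once per TestStream addElements batch, which a real runner does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: models one trigger firing per batch.
class PaneSums {
  /** Accumulating mode: every firing includes all elements seen so far. */
  static List<Integer> accumulating(int[][] batches) {
    List<Integer> fired = new ArrayList<>();
    int running = 0;
    for (int[] batch : batches) {
      for (int value : batch) {
        running += value;
      }
      fired.add(running);
    }
    return fired;
  }

  /** Discarding mode: every firing includes only the elements of that batch. */
  static List<Integer> discarding(int[][] batches) {
    List<Integer> fired = new ArrayList<>();
    for (int[] batch : batches) {
      int sum = 0;
      for (int value : batch) {
        sum += value;
      }
      fired.add(sum);
    }
    return fired;
  }

  public static void main(String[] args) {
    // Same batches as the three addElements calls in the test.
    int[][] batches = {{1, 2}, {3}, {4}};
    System.out.println(accumulating(batches)); // [3, 6, 10]
    System.out.println(discarding(batches));   // [3, 3, 4]
  }
}
```

With discardingFiredPanes() the same input would give 3, 3, 4 under the same one-firing-per-batch assumption, which is why the accumulation mode matters for the assertion.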


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79440)
Time Spent: 3h 20m  (was: 3h 10m)

> support customized trigger/accumulationMode in BeamSql
> 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5141

2018-03-12 Thread Apache Jenkins Server
See 




[beam] branch master updated (478d913 -> 793bfac)

2018-03-12 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 478d913  Merge pull request #4849: Annotate 
ParDoTest.duplicateTimerSetting with UsesTestStream
 add 22fb304  Add test for a trigger with windowed SQL query
 new 793bfac  Merge pull request #4826: [BEAM-3749] Add test for a trigger 
with windowed SQL query

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../extensions/sql/BeamSqlDslAggregationTest.java  | 62 ++
 1 file changed, 62 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[beam] 01/01: Merge pull request #4826: [BEAM-3749] Add test for a trigger with windowed SQL query

2018-03-12 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 793bfac7516c2532ad4c979da220f2a9801e1b19
Merge: 478d913 22fb304
Author: Kenn Knowles 
AuthorDate: Mon Mar 12 06:43:14 2018 -0700

Merge pull request #4826: [BEAM-3749] Add test for a trigger with windowed 
SQL query

 .../extensions/sql/BeamSqlDslAggregationTest.java  | 62 ++
 1 file changed, 62 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Assigned] (BEAM-3758) Migrate Python SDK Read transform to be Impulse->SDF

2018-03-12 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3758:
-

Assignee: Robert Bradshaw  (was: Kenneth Knowles)

> Migrate Python SDK Read transform to be Impulse->SDF
> 
>
> Key: BEAM-3758
> URL: https://issues.apache.org/jira/browse/BEAM-3758
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Major
>
> Currently, Read is the "primitive" even though portability doesn't even have 
> the concept. Anyhow at least the DataflowRunner should override it to be 
> impulse, since the service requires this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3831) Update Google Cloud Core version (which depends on org.json)

2018-03-12 Thread Paul Gerver (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Gerver updated BEAM-3831:
--
Issue Type: Wish  (was: Bug)

> Update Google Cloud Core version (which depends on org.json)
> 
>
> Key: BEAM-3831
> URL: https://issues.apache.org/jira/browse/BEAM-3831
> Project: Beam
>  Issue Type: Wish
>  Components: io-java-gcp
>Affects Versions: 2.3.0
>Reporter: Paul Gerver
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> Looking at dependencies pulled in for my application and saw that org.json 
> jar file was being used. Looked up the dependency tree and saw the following:
> {noformat}
> [INFO] |  +- 
> org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
> ...
> [INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
> [INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
> The Apache Foundation has noted that use of software with the json license 
> should not be used: [https://www.apache.org/legal/resolved.html#json]
>  
> The earliest version to exclude the dependency is v1.15.0: 
> https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3138) Stop depending on Test JARs

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3138?focusedWorklogId=79560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79560
 ]

ASF GitHub Bot logged work on BEAM-3138:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:29
Start Date: 12/Mar/18 18:29
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #4740: 
[BEAM-3138][BEAM-3573] Eliminate some test-jar deps
URL: https://github.com/apache/beam/pull/4740#issuecomment-372415747
 
 
   run java gradle precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79560)
Time Spent: 40m  (was: 0.5h)

> Stop depending on Test JARs
> ---
>
> Key: BEAM-3138
> URL: https://issues.apache.org/jira/browse/BEAM-3138
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, runner-core, sdk-java-core, sdk-java-harness
>Reporter: Thomas Groh
>Assignee: Kenneth Knowles
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Testing components can be in a testing or otherwise signaled package, but 
> shouldn't really be depended on by depending on a test jar in the test scope.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3758) Migrate Python SDK Read transform to be Impulse->SDF

2018-03-12 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395705#comment-16395705
 ] 

Kenneth Knowles commented on BEAM-3758:
---

Hitting [~bsidhom] [~axelmagn].

> Migrate Python SDK Read transform to be Impulse->SDF
> 
>
> Key: BEAM-3758
> URL: https://issues.apache.org/jira/browse/BEAM-3758
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> Currently, Read is the "primitive" even though portability doesn't even have 
> the concept. Anyhow at least the DataflowRunner should override it to be 
> impulse, since the service requires this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3758) Migrate Python SDK Read transform to be Impulse->SDF

2018-03-12 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3758:
-

Assignee: Kenneth Knowles  (was: Robert Bradshaw)

> Migrate Python SDK Read transform to be Impulse->SDF
> 
>
> Key: BEAM-3758
> URL: https://issues.apache.org/jira/browse/BEAM-3758
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> Currently, Read is the "primitive" even though portability doesn't even have 
> the concept. Anyhow at least the DataflowRunner should override it to be 
> impulse, since the service requires this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PerformanceTests_JDBC #322

2018-03-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Spark #1460

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[klk] Add test for a trigger with windowed SQL query

[aljoscha.krettek] Annotate ParDoTest.duplicateTimerSetting with UsesTestStream

--
[...truncated 90.64 KB...]
'apache-beam-testing:bqjob_r537af81d4f2b9cd8_01621b7041d9_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 18:19:26,747 ad6e44a0 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 18:19:47,074 ad6e44a0 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 18:19:49,362 ad6e44a0 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.28s,  CPU:0.22s,  MaxMemory:25480kb 
STDOUT: Upload complete.
Waiting on bqjob_r4f92b1897afddbc4_01621b709b0a_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r4f92b1897afddbc4_01621b709b0a_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r4f92b1897afddbc4_01621b709b0a_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 18:19:49,362 ad6e44a0 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 18:20:15,039 ad6e44a0 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 18:20:17,336 ad6e44a0 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.29s,  CPU:0.24s,  MaxMemory:25468kb 
STDOUT: Upload complete.
Waiting on bqjob_r3ac98ac67406fc06_01621b71083b_1 ... (0s) Current status: 
RUNNING 
 Waiting on bqjob_r3ac98ac67406fc06_01621b71083b_1 ... (0s) 
Current status: DONE   
BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r3ac98ac67406fc06_01621b71083b_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)

2018-03-12 18:20:17,336 ad6e44a0 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-12 18:20:38,503 ad6e44a0 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-12 18:20:41,011 ad6e44a0 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.50s,  CPU:0.22s,  MaxMemory:25276kb 
STDOUT: Upload complete.
Waiting on bqjob_r21c1c990e29a6f85_01621b7163e5_1 ... (0s) Current status: 
RUNNING 
 Waiting on 

[jira] [Created] (BEAM-3831) Update Google Cloud Core version (which depends on org.json)

2018-03-12 Thread Paul Gerver (JIRA)
Paul Gerver created BEAM-3831:
-

 Summary: Update Google Cloud Core version (which depends on 
org.json)
 Key: BEAM-3831
 URL: https://issues.apache.org/jira/browse/BEAM-3831
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.3.0
Reporter: Paul Gerver
Assignee: Chamikara Jayalath


Looking at dependencies pulled in for my application and saw that org.json jar 
file was being used. Looked up the dependency tree and saw the following:
{noformat}
[INFO] |  +- 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
...
[INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
[INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
The Apache Foundation has noted that use of software with the json license 
should not be used: [https://www.apache.org/legal/resolved.html#json]

 

The earliest version to exclude the dependency is v1.15.0 
([https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0)]
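
The lookup described above (finding which dependency drags in org.json) can be sketched as a small scan over dependency-tree output. This is an illustration only: the class DependencyTreeScan and its pathTo method are hypothetical names, and it assumes each tree entry sits on a single line in the usual "[INFO] |  +- group:artifact:..." layout, not wrapped as in this email.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper: returns the chain of tree entries leading to the first
// artifact whose coordinates start with the given prefix.
class DependencyTreeScan {
  static List<String> pathTo(List<String> treeLines, String coordinatePrefix) {
    // Column of the "+- " or "\- " marker -> artifact currently open at that depth.
    TreeMap<Integer, String> ancestors = new TreeMap<>();
    for (String line : treeLines) {
      String body = line.startsWith("[INFO] ") ? line.substring(7) : line;
      int marker = body.indexOf("+- ");
      int lastChild = body.indexOf("\\- ");
      if (marker < 0 || (lastChild >= 0 && lastChild < marker)) {
        marker = lastChild;
      }
      if (marker < 0) {
        continue; // root line, "..." or other non-dependency output
      }
      String artifact = body.substring(marker + 3).trim();
      // Drop entries at the same or deeper indentation, then record this one.
      ancestors.tailMap(marker, true).clear();
      ancestors.put(marker, artifact);
      if (artifact.startsWith(coordinatePrefix)) {
        return new ArrayList<>(ancestors.values());
      }
    }
    return new ArrayList<>();
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    lines.add("[INFO] |  +- org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile");
    lines.add("[INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile");
    lines.add("[INFO] |  |  |  \\- org.json:json:jar:20160810:compile");
    for (String entry : pathTo(lines, "org.json:json:")) {
      System.out.println(entry);
    }
  }
}
```

Run against the three lines quoted in the description, it prints the Beam GCP IO artifact, then google-cloud-core, then org.json, which is the chain reported in this issue.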



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3831) Update Google Cloud Core version (which depends on org.json)

2018-03-12 Thread Paul Gerver (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Gerver updated BEAM-3831:
--
Description: 
Looking at dependencies pulled in for my application and saw that org.json jar 
file was being used. Looked up the dependency tree and saw the following:
{noformat}
[INFO] |  +- 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
...
[INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
[INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
The Apache Foundation has noted that use of software with the json license 
should not be used: [https://www.apache.org/legal/resolved.html#json]

 

The earliest version to exclude the dependency is 
[v1.15.0|[https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0|https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0)]]

  was:
Looking at dependencies pulled in for my application and saw that org.json jar 
file was being used. Looked up the dependency tree and saw the following:
{noformat}
[INFO] |  +- 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
...
[INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
[INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
The Apache Foundation has noted that use of software with the json license 
should not be used: [https://www.apache.org/legal/resolved.html#json]

 

The earliest version to exclude the dependency is v1.15.0 
([https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0)]


> Update Google Cloud Core version (which depends on org.json)
> 
>
> Key: BEAM-3831
> URL: https://issues.apache.org/jira/browse/BEAM-3831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.3.0
>Reporter: Paul Gerver
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> Looking at dependencies pulled in for my application and saw that org.json 
> jar file was being used. Looked up the dependency tree and saw the following:
> {noformat}
> [INFO] |  +- 
> org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
> ...
> [INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
> [INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
> The Apache Foundation has noted that use of software with the json license 
> should not be used: [https://www.apache.org/legal/resolved.html#json]
>  
> The earliest version to exclude the dependency is 
> [v1.15.0|[https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0|https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0)]]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3831) Update Google Cloud Core version (which depends on org.json)

2018-03-12 Thread Paul Gerver (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Gerver updated BEAM-3831:
--
Description: 
Looking at dependencies pulled in for my application and saw that org.json jar 
file was being used. Looked up the dependency tree and saw the following:
{noformat}
[INFO] |  +- 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
...
[INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
[INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
The Apache Foundation has noted that use of software with the json license 
should not be used: [https://www.apache.org/legal/resolved.html#json]

 

The earliest version to exclude the dependency is v1.15.0: 
https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0

  was:
Looking at dependencies pulled in for my application and saw that org.json jar 
file was being used. Looked up the dependency tree and saw the following:
{noformat}
[INFO] |  +- 
org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
...
[INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
[INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
The Apache Foundation has noted that use of software with the json license 
should not be used: [https://www.apache.org/legal/resolved.html#json]

 

The earliest version to exclude the dependency is 
[v1.15.0|[https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0|https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0)]]


> Update Google Cloud Core version (which depends on org.json)
> 
>
> Key: BEAM-3831
> URL: https://issues.apache.org/jira/browse/BEAM-3831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.3.0
>Reporter: Paul Gerver
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> Looking at dependencies pulled in for my application and saw that org.json 
> jar file was being used. Looked up the dependency tree and saw the following:
> {noformat}
> [INFO] |  +- 
> org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0:compile
> ...
> [INFO] |  |  +- com.google.cloud:google-cloud-core:jar:1.0.2:compile
> [INFO] |  |  |  \- org.json:json:jar:20160810:compile{noformat}
> The Apache Foundation has noted that use of software with the json license 
> should not be used: [https://www.apache.org/legal/resolved.html#json]
>  
> The earliest version to exclude the dependency is v1.15.0: 
> https://mvnrepository.com/artifact/com.google.cloud/google-cloud-core/1.15.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=79571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79571
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:46
Start Date: 12/Mar/18 18:46
Worklog Time Spent: 10m 
  Work Description: huygaa11 commented on a change in pull request #4793: 
[BEAM-3776] Fix issue with merging late windows where a watermark hold could be 
added behind the input watermark.
URL: https://github.com/apache/beam/pull/4793#discussion_r173895435
 
 

 ##
 File path: 
runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
 ##
 @@ -873,6 +907,288 @@ public void testWatermarkHoldAndLateData() throws 
Exception {
 tester.assertHasOnlyGlobalAndFinishedSetsFor();
   }
 
+  @Test
+  public void testMergingWatermarkHoldAndLateDataSpecific() throws Exception {
 
 Review comment:
   This test should be simpler and shorter.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79571)
Time Spent: 1h 20m  (was: 1h 10m)

> StateMerging.mergeWatermarks sets a late watermark hold for late merging 
> windows that depend only on the window
> ---
>
> Key: BEAM-3776
> URL: https://issues.apache.org/jira/browse/BEAM-3776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Critical
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> WatermarkHold.addElementHold and WatermarkHold.addGarbageCollectionHold take 
> care not to add holds that would be before the input watermark.
> However, WatermarkHold.onMerge calls StateMerging.mergeWatermarks, which, if the 
> window depends only on window, sets a hold for the end of the window 
> regardless of the input watermark.
> Thus if you have a WindowingStrategy such as:
> WindowingStrategy.of(Sessions.withGapDuration(gapDuration))
>  .withMode(AccumulationMode.DISCARDING_FIRED_PANES)
>  .withTrigger(
>  Repeatedly.forever(
>  AfterWatermark.pastEndOfWindow()
>  .withLateFirings(AfterPane.elementCountAtLeast(10))))
>  .withAllowedLateness(allowedLateness)
> and you merge windows that are late, you might end up holding the watermark 
> until the allowedLateness has passed.
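
The difference described above can be sketched in plain Java with millisecond timestamps standing in for Beam Instants. This is an illustration, not the actual WatermarkHold or StateMerging code: the class WatermarkHoldSketch and both methods are hypothetical, and the strict "before the input watermark" comparison is an assumption about where the cutoff lies.

```java
import java.util.OptionalLong;

// Hypothetical model of the two behaviours; an empty result means "no hold set".
class WatermarkHoldSketch {
  /** Behaviour reported in the issue: hold at the end of the merged window, always. */
  static OptionalLong unconditionalHold(long windowEndMillis, long inputWatermarkMillis) {
    return OptionalLong.of(windowEndMillis);
  }

  /** Guarded behaviour: skip the hold when it would be behind the input watermark. */
  static OptionalLong guardedHold(long windowEndMillis, long inputWatermarkMillis) {
    if (windowEndMillis < inputWatermarkMillis) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(windowEndMillis);
  }

  public static void main(String[] args) {
    long inputWatermark = 40;
    // Illustrative values: a 10 ms session gap puts an element at t=1 in a window ending at 11.
    System.out.println(unconditionalHold(11, inputWatermark)); // OptionalLong[11]
    System.out.println(guardedHold(11, inputWatermark));       // OptionalLong.empty
    // An element at t=41 gives a window ending at 51, which is ahead of the watermark.
    System.out.println(guardedHold(51, inputWatermark));       // OptionalLong[51]
  }
}
```

In the unconditional case the late merged window gets a hold at 11, behind the input watermark of 40, which is the situation the issue describes as potentially holding the watermark until allowedLateness has passed.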



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=79572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79572
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:46
Start Date: 12/Mar/18 18:46
Worklog Time Spent: 10m 
  Work Description: huygaa11 commented on a change in pull request #4793: 
[BEAM-3776] Fix issue with merging late windows where a watermark hold could be 
added behind the input watermark.
URL: https://github.com/apache/beam/pull/4793#discussion_r173896102
 
 

 ##
 File path: 
runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
 ##
 @@ -873,6 +907,288 @@ public void testWatermarkHoldAndLateData() throws 
Exception {
 tester.assertHasOnlyGlobalAndFinishedSetsFor();
   }
 
+  @Test
+  public void testMergingWatermarkHoldAndLateDataSpecific() throws Exception {
+LinkedList configurations = new LinkedList<>();
+
+// Simple: late new window
+LinkedList actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(1));
+configurations.add(actions);
+
+// Simple: late new window, closed and extended.
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(1));
+actions.add(Action.times(10));
+configurations.add(actions);
+
+// Simple: late new window, closed and merged
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(1));
+actions.add(Action.times(14));
+actions.add(Action.times(6));
+configurations.add(actions);
+
+// Simple: late new window, extended past watermark
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(33));
+configurations.add(actions);
+
+// Simple: late new window, extended past watermark, extend more
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(33));
+actions.add(Action.times(43));
+configurations.add(actions);
+
+// Simple: late new window, extended past watermark, extend more
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(33));
+actions.add(Action.times(11));
+configurations.add(actions);
+
+// Simple: new window closes, then extended
+actions = new LinkedList<>();
+actions.add(Action.times(11));
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(18));
+configurations.add(actions);
+
+// Merging: new window closes, then extended then merged with new window
+actions = new LinkedList<>();
+actions.add(Action.times(11));
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(18));
+actions.add(Action.times(41));
+actions.add(Action.times(27, 33));
+configurations.add(actions);
+
+// Merging: late window, merges with new window
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(29));
+actions.add(Action.times(41));
+configurations.add(actions);
+
+// Merging: late window, new window joined
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(29));
+actions.add(Action.times(45));
+actions.add(Action.times(36));
+configurations.add(actions);
+
+// Merging: late window, new window all at once
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(29, 45, 36));
+configurations.add(actions);
+
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(42));
+actions.add(Action.times(33));
+configurations.add(actions);
+
+actions = new LinkedList<>();
+actions.add(Action.inputWatermark(40));
+actions.add(Action.times(25));
+actions.add(Action.times(42));
+actions.add(Action.times(33, 21));
+actions.add(Action.inputWatermark(50));
+actions.add(Action.times(12));
+configurations.add(actions);
+
+for (LinkedList configuration : configurations) {
+  System.out.println("Running config " + configuration.toString());
+  MetricsContainerImpl container = new MetricsContainerImpl("any");
+  MetricsEnvironment.setCurrentContainer(container);
+  // Test handling of late data. Specifically, ensure the watermark hold 
is correct.
+  Duration allowedLateness = Duration.standardMinutes(1);
+  Duration gapDuration = Duration.millis(10);
+  System.out.printf("Gap duration %s\n", gapDuration);
+  

[jira] [Work logged] (BEAM-3776) StateMerging.mergeWatermarks sets a late watermark hold for late merging windows that depend only on the window

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3776?focusedWorklogId=79570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79570
 ]

ASF GitHub Bot logged work on BEAM-3776:


Author: ASF GitHub Bot
Created on: 12/Mar/18 18:46
Start Date: 12/Mar/18 18:46
Worklog Time Spent: 10m 
  Work Description: huygaa11 commented on a change in pull request #4793: 
[BEAM-3776] Fix issue with merging late windows where a watermark hold could be 
added behind the input watermark.
URL: https://github.com/apache/beam/pull/4793#discussion_r173894355
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java
 ##
 @@ -247,8 +249,26 @@ private Instant addGarbageCollectionHold(
   /**
* Prefetch watermark holds in preparation for merging.
*/
-  public void prefetchOnMerge(MergingStateAccessor state) {
-StateMerging.prefetchWatermarks(state, elementHoldTag);
+  public void prefetchOnMerge(MergingStateAccessor context) {
 
 Review comment:
   Is there any specific reason why the code was moved out of StateMerging.java?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79570)
Time Spent: 1h 10m  (was: 1h)

> StateMerging.mergeWatermarks sets a late watermark hold for late merging 
> windows that depend only on the window
> ---
>
> Key: BEAM-3776
> URL: https://issues.apache.org/jira/browse/BEAM-3776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.1.0, 2.2.0, 2.3.0
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Critical
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> WatermarkHold.addElementHold and WatermarkHold.addGarbageCollectionHold take 
> care not to add holds that would be before the input watermark.
> However, WatermarkHold.onMerge calls StateMerging.mergeWatermarks, which, if the 
> hold depends only on the window, sets a hold for the end of the window 
> regardless of the input watermark.
> Thus if you have a WindowingStrategy such as:
> WindowingStrategy.of(Sessions.withGapDuration(gapDuration))
>  .withMode(AccumulationMode.DISCARDING_FIRED_PANES)
>  .withTrigger(
>  Repeatedly.forever(
>  AfterWatermark.pastEndOfWindow()
>  .withLateFirings(AfterPane.elementCountAtLeast(10))))
>  .withAllowedLateness(allowedLateness)
> and you merge windows that are late, you might end up holding the watermark 
> until the allowedLateness has passed.
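
The guard described above can be sketched in plain Java. This is only an illustration of the rule "never add a hold behind the input watermark"; it is not Beam's WatermarkHold code, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper, not part of Beam: illustrates the guard the issue asks for. */
final class MergedWindowHoldSketch {
  private MergedWindowHoldSketch() {}

  /**
   * Returns the end-of-window hold for a merged window, or empty when that hold
   * would fall behind the input watermark (i.e. the merged window is late).
   */
  static Optional<Instant> endOfWindowHold(Instant windowEnd, Instant inputWatermark) {
    if (windowEnd.isBefore(inputWatermark)) {
      // Late window: a hold here would sit behind the input watermark, so skip it.
      return Optional.empty();
    }
    return Optional.of(windowEnd);
  }
}
```

With an input watermark of 40, a merged window ending at 39 gets no hold, while one ending at 51 does.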



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3758) Migrate Python SDK Read transform to be Impulse->SDF

2018-03-12 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3758:
-

Assignee: Robert Bradshaw  (was: Kenneth Knowles)

> Migrate Python SDK Read transform to be Impulse->SDF
> 
>
> Key: BEAM-3758
> URL: https://issues.apache.org/jira/browse/BEAM-3758
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Major
>
> Currently, Read is the "primitive" even though portability doesn't even have 
> the concept. Anyhow at least the DataflowRunner should override it to be 
> impulse, since the service requires this.





[jira] [Work logged] (BEAM-3138) Stop depending on Test JARs

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3138?focusedWorklogId=79583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79583
 ]

ASF GitHub Bot logged work on BEAM-3138:


Author: ASF GitHub Bot
Created on: 12/Mar/18 19:31
Start Date: 12/Mar/18 19:31
Worklog Time Spent: 10m 
  Work Description: kennknowles closed pull request #4740: 
[BEAM-3138][BEAM-3573] Eliminate some test-jar deps
URL: https://github.com/apache/beam/pull/4740
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/model/fn-execution/pom.xml b/model/fn-execution/pom.xml
index c568c4ff964..f6188e91636 100644
--- a/model/fn-execution/pom.xml
+++ b/model/fn-execution/pom.xml
@@ -31,16 +31,6 @@
   Portable definitions for execution user-defined 
functions
 
   
-
-  
-src/test/resources
-true
-  
-  
-
${project.build.directory}/original_sources_to_package
-  
-
-
 
   
   
diff --git 
a/model/fn-execution/src/test/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
 
b/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
similarity index 100%
rename from 
model/fn-execution/src/test/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
rename to 
model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
diff --git a/model/job-management/pom.xml b/model/job-management/pom.xml
index 4b723f3dae9..162d0aaf4a1 100644
--- a/model/job-management/pom.xml
+++ b/model/job-management/pom.xml
@@ -31,16 +31,6 @@
   Portable definitions for submitting pipelines.
 
   
-
-  
-src/main/resources
-true
-  
-  
-
${project.build.directory}/original_sources_to_package
-  
-
-
 
   
   
diff --git a/model/pipeline/pom.xml b/model/pipeline/pom.xml
index 73fea08c898..d7cbee7918d 100644
--- a/model/pipeline/pom.xml
+++ b/model/pipeline/pom.xml
@@ -31,16 +31,6 @@
   Portable definitions for building pipelines
 
   
-
-  
-src/main/resources
-true
-  
-  
-
${project.build.directory}/original_sources_to_package
-  
-
-
 
   
   
diff --git a/pom.xml b/pom.xml
index 42ac08a3cff..9573a07767f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -494,13 +494,6 @@
 ${project.version}
   
 
-  
-org.apache.beam
-beam-model-fn-execution
-${project.version}
-test-jar
-  
-
   
 org.apache.beam
 beam-sdks-java-core
@@ -562,13 +555,6 @@
 ${project.version}
   
 
-  
-org.apache.beam
-beam-sdks-java-fn-execution
-${project.version}
-test-jar
-  
-
   
 org.apache.beam
 beam-sdks-java-harness
@@ -749,6 +735,7 @@
 ${project.version}
   
 
+  
   
 org.apache.beam
 beam-runners-core-java
@@ -1409,6 +1396,20 @@
 ${commons.csv.version}
   
 
+  
+  
+com.google.guava
+guava-testlib
+${guava.version}
+  
+
+  
+junit
+junit
+${junit.version}
+provided
+  
+
   
 
   
@@ -1456,8 +1457,6 @@
   
 
   
-
 com.google.guava
 guava-testlib
 ${guava.version}
@@ -1955,10 +1954,6 @@
 
   
 com.google.common
-
-  
-  com.google.common.**.testing.*
-
 
 
   
org.apache.${renderedArtifactId}.repackaged.com.google.common
diff --git a/runners/apex/build.gradle b/runners/apex/build.gradle
index bb97e36a357..800e5947c5e 100644
--- a/runners/apex/build.gradle
+++ b/runners/apex/build.gradle
@@ -43,7 +43,7 @@ dependencies {
   shadow library.java.findbugs_jsr305
   shadow library.java.apex_engine
   testCompile project(path: ":sdks:java:core", configuration: "shadowTest")
-  testCompile project(":model:fn-execution").sourceSets.test.output
+  // ApexStateInternalsTest extends abstract StateInternalsTest
   testCompile project(":runners:core-java").sourceSets.test.output
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
diff --git a/runners/apex/pom.xml b/runners/apex/pom.xml
index e5ecaaf0499..5faf95f8853 100644
--- a/runners/apex/pom.xml
+++ b/runners/apex/pom.xml
@@ -185,10 +185,10 @@
 
   org.apache.beam
   beam-model-fn-execution
-  test-jar
   test
 
 
+
 
   org.apache.beam
   beam-runners-core-java
diff --git 

[beam] branch master updated (793bfac -> 52005c6)

2018-03-12 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 793bfac  Merge pull request #4826: [BEAM-3749] Add test for a trigger 
with windowed SQL query
 add d40ebab  Increase whitelist of false detections in 
SdkCoreApiSurfaceTest
 add c9e2855  Eliminate beam-model-fn-execution test-jar deps
 add 62f6ee3  Eliminate beam-sdks-java-fn-execution test-jar deps
 add 31c72f2  Eliminate incorrect sdks-java-core test-jar deps
 add 8bb8aa0  Notate uses of beam-runners-core-java test-jar
 new 52005c6  Merge pull request #4740: [BEAM-3138][BEAM-3573] Eliminate 
some test-jar deps

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 model/fn-execution/pom.xml | 10 ---
 .../beam/model/fnexecution/v1/standard_coders.yaml |  0
 model/job-management/pom.xml   | 10 ---
 model/pipeline/pom.xml | 10 ---
 pom.xml| 35 ++
 runners/apex/build.gradle  |  2 +-
 runners/apex/pom.xml   |  2 +-
 runners/direct-java/build.gradle   |  1 -
 runners/direct-java/pom.xml|  1 -
 runners/flink/build.gradle |  2 +-
 runners/flink/pom.xml  |  2 +-
 runners/google-cloud-dataflow-java/build.gradle|  1 -
 runners/google-cloud-dataflow-java/pom.xml |  1 -
 runners/java-fn-execution/build.gradle |  1 -
 runners/java-fn-execution/pom.xml  |  7 -
 .../logging/GrpcLoggingServiceTest.java|  2 +-
 runners/spark/build.gradle |  2 +-
 runners/spark/pom.xml  |  2 +-
 sdks/java/core/build.gradle|  4 +--
 sdks/java/core/pom.xml | 20 +
 .../apache/beam/sdk/io/fs/ResourceIdTester.java|  0
 .../java/org/apache/beam/sdk/util/ApiSurface.java  | 14 -
 .../org/apache/beam/SdkCoreApiSurfaceTest.java | 20 -
 sdks/java/fn-execution/pom.xml |  7 +++--
 .../org/apache/beam/sdk/fn/test/TestExecutors.java |  1 +
 .../org/apache/beam/sdk/fn/test/TestStreams.java   |  2 ++
 .../org/apache/beam/sdk/fn/test}/package-info.java |  6 ++--
 .../BeamFnDataBufferingOutboundObserverTest.java   |  2 +-
 .../java/org/apache/beam/sdk/fn/test/Consumer.java | 26 
 .../java/org/apache/beam/sdk/fn/test/Supplier.java | 26 
 sdks/java/harness/build.gradle |  2 --
 sdks/java/harness/pom.xml  |  7 -
 sdks/java/io/amazon-web-services/build.gradle  |  1 -
 33 files changed, 54 insertions(+), 175 deletions(-)
 rename model/fn-execution/src/{test => 
main}/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml (100%)
 rename sdks/java/core/src/{test => 
main}/java/org/apache/beam/sdk/io/fs/ResourceIdTester.java (100%)
 rename sdks/java/fn-execution/src/{test => 
main}/java/org/apache/beam/sdk/fn/test/TestExecutors.java (98%)
 rename sdks/java/fn-execution/src/{test => 
main}/java/org/apache/beam/sdk/fn/test/TestStreams.java (98%)
 copy 
sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta 
=> fn-execution/src/main/java/org/apache/beam/sdk/fn/test}/package-info.java 
(90%)
 delete mode 100644 
sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/Consumer.java
 delete mode 100644 
sdks/java/fn-execution/src/test/java/org/apache/beam/sdk/fn/test/Supplier.java

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[beam] 01/01: Merge pull request #4740: [BEAM-3138][BEAM-3573] Eliminate some test-jar deps

2018-03-12 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 52005c6fc929a7bd8ff8b5e6aef85533d267fb5b
Merge: 793bfac 8bb8aa0
Author: Kenn Knowles 
AuthorDate: Mon Mar 12 12:31:13 2018 -0700

Merge pull request #4740: [BEAM-3138][BEAM-3573] Eliminate some test-jar 
deps

 model/fn-execution/pom.xml | 10 ---
 .../beam/model/fnexecution/v1/standard_coders.yaml |  0
 model/job-management/pom.xml   | 10 ---
 model/pipeline/pom.xml | 10 ---
 pom.xml| 35 ++
 runners/apex/build.gradle  |  2 +-
 runners/apex/pom.xml   |  2 +-
 runners/direct-java/build.gradle   |  1 -
 runners/direct-java/pom.xml|  1 -
 runners/flink/build.gradle |  2 +-
 runners/flink/pom.xml  |  2 +-
 runners/google-cloud-dataflow-java/build.gradle|  1 -
 runners/google-cloud-dataflow-java/pom.xml |  1 -
 runners/java-fn-execution/build.gradle |  1 -
 runners/java-fn-execution/pom.xml  |  7 -
 .../logging/GrpcLoggingServiceTest.java|  2 +-
 runners/spark/build.gradle |  2 +-
 runners/spark/pom.xml  |  2 +-
 sdks/java/core/build.gradle|  4 +--
 sdks/java/core/pom.xml | 20 +
 .../apache/beam/sdk/io/fs/ResourceIdTester.java|  0
 .../java/org/apache/beam/sdk/util/ApiSurface.java  | 14 -
 .../org/apache/beam/SdkCoreApiSurfaceTest.java | 20 -
 sdks/java/fn-execution/pom.xml |  7 +++--
 .../org/apache/beam/sdk/fn/test/TestExecutors.java |  1 +
 .../org/apache/beam/sdk/fn/test/TestStreams.java   |  2 ++
 .../org/apache/beam/sdk/fn/test/package-info.java} |  8 +
 .../BeamFnDataBufferingOutboundObserverTest.java   |  2 +-
 .../java/org/apache/beam/sdk/fn/test/Consumer.java | 26 
 sdks/java/harness/build.gradle |  2 --
 sdks/java/harness/pom.xml  |  7 -
 sdks/java/io/amazon-web-services/build.gradle  |  1 -
 32 files changed, 53 insertions(+), 152 deletions(-)



[jira] [Work logged] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3714?focusedWorklogId=79535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79535
 ]

ASF GitHub Bot logged work on BEAM-3714:


Author: ASF GitHub Bot
Created on: 12/Mar/18 17:38
Start Date: 12/Mar/18 17:38
Worklog Time Spent: 10m 
  Work Description: evindj commented on issue #4786: [BEAM-3714]modified 
result set to be forward only and read only
URL: https://github.com/apache/beam/pull/4786#issuecomment-372397754
 
 
   Hi Eugene, I could not find time for it last week, but updates are coming up
   shortly.
   
   On Mon, Mar 12, 2018 at 12:41 PM Eugene Kirpichov wrote:
   
   > @evindj  Any updates here?




Issue Time Tracking
---

Worklog Id: (was: 79535)
Time Spent: 1h 10m  (was: 1h)

> JdbcIO.read() should create a forward-only, read-only result set
> 
>
> Key: BEAM-3714
> URL: https://issues.apache.org/jira/browse/BEAM-3714
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jdbc
>Reporter: Eugene Kirpichov
>Assignee: Innocent
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934]
>  - a user is trying to load a large table from MySQL, and the MySQL JDBC 
> driver requires special measures when loading large result sets.
> JdbcIO currently calls simply "connection.prepareStatement(query)" 
> https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508
>  - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY 
> - these values should always be used.
> Seems that different databases have different requirements for streaming 
> result sets.
> E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must 
> not be in autocommit mode." 
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . 
> Oracle, I think, doesn't have any special requirements but I don't know. 
> Fetch size should probably still be set to a reasonably large value.
> Seems that the common denominator of these requirements is: set fetch size to 
> a reasonably large but not maximum value; disable autocommit (there's nothing 
> to commit in read() anyway).
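
A minimal sketch of the "common denominator" described above, using only standard java.sql calls. The class name, method name, and the fetch size of 50,000 are assumptions for illustration; this is not the JdbcIO code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper, not part of JdbcIO. */
final class StreamingReadSketch {
  /** Assumed value: "reasonably large but not maximum"; tune per database. */
  static final int FETCH_SIZE = 50_000;

  private StreamingReadSketch() {}

  static PreparedStatement prepareStreamingRead(Connection connection, String query)
      throws SQLException {
    // There is nothing to commit in a read; PostgreSQL also needs this for cursor-based fetch.
    connection.setAutoCommit(false);
    // Forward-only, read-only result set, as the issue proposes.
    PreparedStatement statement =
        connection.prepareStatement(
            query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    // Hint to the driver to fetch rows in batches rather than all at once.
    statement.setFetchSize(FETCH_SIZE);
    return statement;
  }
}
```

Whether a given driver actually streams with these settings depends on the driver; the issue itself notes that requirements differ between databases.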





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #4401

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (BEAM-3417) Fix Calcite assertions

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390732#comment-16390732
 ] 

Anton Kedin edited comment on BEAM-3417 at 3/12/18 8:10 PM:


*What fails?*

[Assert in question is in VolcanoPlanner 
|https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546].
 It checks whether [all traits are 
simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517]
 by checking whether they're not instances of RelCompositeTrait.

*Why does it fail?*

In our case, when it fails, the trait set checked by traitSet.allSimple() has 2 traits. One is 
BeamLogicalConvention (it's not a composite trait), and the other is a 
collation-related composite trait, which causes the assertion to fail.

*Where does the composite trait come from?*

We specify the collation trait def in 
[BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89]
 before parsing. It then [gets replaced in 
LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102]
 with the [composite 
trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239]
 which causes the failure.

*Why does LogicalTableScan need to do the collation magic?*

Dunno, it seems that it adds the statistics information to the collation trait 
so that the engine can handle sorting correctly. It does so only when we ask it 
to by adding the collation trait def.

*Why doesn't VolcanoPlanner like CompositeTraitSet in that part?*

Dunno.

*Do we need the collation trait def?*

Dunno.

*What do we do?*

If we can, it probably makes sense to replace LogicalTableScan rel with some 
kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic 
or makes it configurable.

 - (update) On second look, this assertion seems to happen before the Rel 
replacement, so we probably won't be able to replace the logical table scan rel 
with our own logic.


was (Author: kedin):
*What fails?*

[Assert in question is in VolcanoPlanner 
|https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546].
 It checks whether [all traits are 
simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517]
 by checking whether they're not instances of RelCompositeTrait.

*Why it fails?*

In our case, when it fails, traitSet.allSimple() has 2 traits. One is 
BeamLogicalConvention (it's not a composite trait), and another is a 
collation-related composite trait which causes the assertion to fail.

*Where does the composite trait come from?*

We specify the collation trait def in 
[BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89]
 before parsing. It then [gets replaced in 
LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102]
 with the [composite 
trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239]
 which causes the failure.

*Why LogicalTableScan needs to do the collation magic?*

Dunno, it seems that it adds the statistics information to the collation trait 
so that the engine can handle sorting correctly. It does so only when we ask it 
to by adding the collation trait def.

*Why VolcanoPlanner doesn't like CompositeTraitSet in that part?*

Dunno.

*Do we need the collation trait def?*

Dunno.

*What do we do?*

If we can, it probably makes sense to replace LogicalTableScanRel with some 
kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic 
or makes it configurable

> Fix Calcite assertions
> --
>
> Key: BEAM-3417
> URL: https://issues.apache.org/jira/browse/BEAM-3417
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> Currently we disable assertions in test for every project which depends on 
> Beam SQL / Calcite. Otherwise it fails assertions when Calcite validates 
> relational representation of the query. E.g. in projects 

[jira] [Work logged] (BEAM-3425) CassandraIO fails to estimate size: Codec not found for requested operation: [varchar <-> java.lang.Long]

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3425?focusedWorklogId=79595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79595
 ]

ASF GitHub Bot logged work on BEAM-3425:


Author: ASF GitHub Bot
Created on: 12/Mar/18 20:17
Start Date: 12/Mar/18 20:17
Worklog Time Spent: 10m 
  Work Description: kubum commented on issue #4426: [BEAM-3425] Get the 
range start & end as String
URL: https://github.com/apache/beam/pull/4426#issuecomment-372448376
 
 
   Hi @jbonofre! Do you know what the state of splitting is at the moment? I see 
it is still the same in master; do you need any help with that?




Issue Time Tracking
---

Worklog Id: (was: 79595)
Time Spent: 10m
Remaining Estimate: 0h

> CassandraIO fails to estimate size: Codec not found for requested operation: 
> [varchar <-> java.lang.Long]
> -
>
> Key: BEAM-3425
> URL: https://issues.apache.org/jira/browse/BEAM-3425
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Reporter: Eugene Kirpichov
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See exception in 
> https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264#48131264
>  .
> The exception comes from 
> https://github.com/apache/beam/blob/master/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java#L279
>  , where I suppose "range_start" and "range_end" are really varchar, but the 
> code expects them to be long.
> Indeed they are varchar: 
> https://github.com/apache/cassandra/blob/4c80eeece37d79f434078224a0504400ae10a20d/src/java/org/apache/cassandra/db/SystemKeyspace.java#L238
>  and have been for at least the past 3 years.
> However really they seem to be storing longs: 
> https://github.com/apache/cassandra/blob/95b43b195e4074533100f863344c182a118a8b6c/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java#L229
> So I guess all that needs to be fixed is adding a Long.parseLong.
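
The proposed fix can be sketched as below: read "range_start" and "range_end" as strings and parse them with Long.parseLong. The class and method names are hypothetical, and the sketch assumes, as the issue suggests, that the varchar values always hold signed 64-bit tokens.

```java
/** Hypothetical helper, not the CassandraServiceImpl code. */
final class TokenRangeSketch {
  private TokenRangeSketch() {}

  /**
   * Converts the varchar range bounds into longs.
   * Returns a two-element array: {start, end}.
   */
  static long[] parseRange(String rangeStart, String rangeEnd) {
    // Long.parseLong throws NumberFormatException if a bound is not a valid long.
    return new long[] {Long.parseLong(rangeStart), Long.parseLong(rangeEnd)};
  }
}
```

A caller would fetch both columns as strings from the row before passing them in, instead of asking the driver for a long directly.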





Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #5143

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79518
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 12/Mar/18 16:44
Start Date: 12/Mar/18 16:44
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173865797
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   Can you think of a case where the current default pooling is not a good 
choice AND a user can provide a pooling configuration that will reliably give 
better behavior regardless of runner and of implementation details of JdbcIO?




Issue Time Tracking
---

Worklog Id: (was: 79518)
Time Spent: 3h 50m  (was: 3h 40m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires a connection in {{@Setup}} and releases it in 
> {{@Teardown}}, which means that the connection might stay open for days 
> in a streaming job. Keeping a single connection open for so long is 
> risky, as it is exposed to database and network issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that the connection be taken from the connection pool in the 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted a DataSourceFactory rather than the DataSource 
> itself. I am saying that because the sink checks whether the DataSource implements the 
> `Serializable` interface, which makes it impossible to pass 
> BasicDataSource (used internally by the sink), as it doesn't implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}
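
To illustrate why a serializable factory helps, here is a self-contained sketch: the factory (which holds only a URL string) is serialized and restored, and the non-serializable DataSource is built afterwards. The DataSourceFactory interface is the one from the issue; UrlDataSourceFactory, DriverManagerDataSource, and the round-trip helper are hypothetical stand-ins, not JdbcIO classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PrintWriter;
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

/** Interface as proposed in the issue. */
interface DataSourceFactory extends Serializable {
  DataSource create();
}

/** Hypothetical non-serializable DataSource backed by DriverManager. */
final class DriverManagerDataSource implements DataSource {
  private final String url;
  DriverManagerDataSource(String url) { this.url = url; }
  @Override public Connection getConnection() throws SQLException {
    return DriverManager.getConnection(url);
  }
  @Override public Connection getConnection(String user, String password) throws SQLException {
    return DriverManager.getConnection(url, user, password);
  }
  @Override public PrintWriter getLogWriter() { return null; }
  @Override public void setLogWriter(PrintWriter out) {}
  @Override public void setLoginTimeout(int seconds) {}
  @Override public int getLoginTimeout() { return 0; }
  @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    throw new SQLFeatureNotSupportedException();
  }
  @Override public <T> T unwrap(Class<T> iface) throws SQLException {
    throw new SQLException("not a wrapper");
  }
  @Override public boolean isWrapperFor(Class<?> iface) { return false; }
}

/** Hypothetical factory: only the URL is serialized, never the DataSource. */
final class UrlDataSourceFactory implements DataSourceFactory {
  private static final long serialVersionUID = 1L;
  private final String url;
  UrlDataSourceFactory(String url) { this.url = url; }
  @Override public DataSource create() { return new DriverManagerDataSource(url); }
}

final class DataSourceFactorySketch {
  private DataSourceFactorySketch() {}

  /** Serializes and deserializes the factory, as shipping it to a worker would. */
  static DataSourceFactory roundTrip(DataSourceFactory factory)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(factory);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (DataSourceFactory) in.readObject();
    }
  }
}
```

The DataSource is created only after deserialization, so it never needs to implement Serializable itself.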





[jira] [Work logged] (BEAM-3500) JdbcIO: Improve connection management

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3500?focusedWorklogId=79520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79520
 ]

ASF GitHub Bot logged work on BEAM-3500:


Author: ASF GitHub Bot
Created on: 12/Mar/18 16:48
Start Date: 12/Mar/18 16:48
Worklog Time Spent: 10m 
  Work Description: jkff commented on a change in pull request #4461: 
[BEAM-3500] "Attach" JDBC connection to the bundle and add DataSourceFactory 
allowing full control of the way the DataSource is created
URL: https://github.com/apache/beam/pull/4461#discussion_r173867345
 
 

 ##
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##
 @@ -525,18 +596,24 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
 
 private ReadFn(
 DataSourceConfiguration dataSourceConfiguration,
+DataSourceFactory dataSourceFactory,
 ValueProvider query,
 PreparedStatementSetter parameterSetter,
 RowMapper rowMapper) {
   this.dataSourceConfiguration = dataSourceConfiguration;
+  this.dataSourceFactory = dataSourceFactory;
   this.query = query;
   this.parameterSetter = parameterSetter;
   this.rowMapper = rowMapper;
 }
 
 @Setup
 public void setup() throws Exception {
-  dataSource = dataSourceConfiguration.buildDatasource();
+  if (dataSourceFactory != null) {
+dataSource = dataSourceFactory.create();
 
 Review comment:
   In other words: In my view, dataSourceFactory is rather for cases where the 
DataSource requires some trickery to even construct, e.g. requires setting some 
vendor-specific connection parameters to establish the connection. I don't know 
if such cases exist at all, and I'd be open to simply dropping this feature 
from this PR. But if we think it's necessary, then I think we still want to 
wrap that with pooling, and I think for a user to try to configure pooling 
themselves is always a bad idea.




Issue Time Tracking
---

Worklog Id: (was: 79520)
Time Spent: 4h  (was: 3h 50m)

> JdbcIO: Improve connection management
> -
>
> Key: BEAM-3500
> URL: https://issues.apache.org/jira/browse/BEAM-3500
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.2.0
>Reporter: Pawel Bartoszek
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> JdbcIO write DoFn acquires a connection in {{@Setup}} and releases it in 
> {{@Teardown}}, which means that the connection might stay open for days 
> in a streaming job. Keeping a single connection open for so long is 
> risky, as it is exposed to database and network issues.
> *Taking connection from the pool when it is actually needed*
> I suggest that the connection be taken from the connection pool in the 
> {{executeBatch}} method and released when the batch is flushed. This will 
> allow the pool to take care of any returned unhealthy connections.
> *Make JdbcIO accept data source factory*
>  It would be nice if JdbcIO accepted a DataSourceFactory rather than the DataSource 
> itself. I am saying that because the sink checks whether the DataSource implements the 
> `Serializable` interface, which makes it impossible to pass 
> BasicDataSource (used internally by the sink), as it doesn't implement this 
> interface. Something like:
> {code:java}
> interface DataSourceFactory extends Serializable{
>  DataSource create();
> }
> {code}





[jira] [Work logged] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3714?focusedWorklogId=79517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79517
 ]

ASF GitHub Bot logged work on BEAM-3714:


Author: ASF GitHub Bot
Created on: 12/Mar/18 16:41
Start Date: 12/Mar/18 16:41
Worklog Time Spent: 10m 
  Work Description: jkff commented on issue #4786: [BEAM-3714]modified 
result set to be forward only and read only
URL: https://github.com/apache/beam/pull/4786#issuecomment-372377397
 
 
   @evindj Any updates here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 79517)
Time Spent: 1h  (was: 50m)

> JdbcIO.read() should create a forward-only, read-only result set
> 
>
> Key: BEAM-3714
> URL: https://issues.apache.org/jira/browse/BEAM-3714
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jdbc
>Reporter: Eugene Kirpichov
>Assignee: Innocent
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934]
>  - a user is trying to load a large table from MySQL, and the MySQL JDBC 
> driver requires special measures when loading large result sets.
> JdbcIO currently calls simply "connection.prepareStatement(query)" 
> https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508
>  - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY 
> - these values should always be used.
> Seems that different databases have different requirements for streaming 
> result sets.
> E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must 
> not be in autocommit mode." 
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . 
> Oracle, I think, doesn't have any special requirements but I don't know. 
> Fetch size should probably still be set to a reasonably large value.
> Seems that the common denominator of these requirements is: set fetch size to 
> a reasonably large but not maximum value; disable autocommit (there's nothing 
> to commit in read() anyway).
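
The common denominator described above can be sketched with plain JDBC calls as follows. This is a minimal illustration under assumptions, not the change made in JdbcIO: the class name StreamingReadStatements and the fetchSize parameter are hypothetical, and only the java.sql calls and constants are standard.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of Beam.
final class StreamingReadStatements {
  private StreamingReadStatements() {}

  /**
   * Prepares a read statement with the settings suggested in the issue:
   * autocommit off, forward-only and read-only result set, explicit fetch size.
   */
  static PreparedStatement prepare(Connection connection, String query, int fetchSize)
      throws SQLException {
    // There is nothing to commit in a read; PostgreSQL needs this for cursor-based fetching.
    connection.setAutoCommit(false);
    PreparedStatement statement =
        connection.prepareStatement(
            query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    // The fetch size is a hint to the driver; which values enable streaming is driver-specific.
    statement.setFetchSize(fetchSize);
    return statement;
  }
}
```

Driver-specific rules still apply on top of this, so the fetch size that actually enables streaming should be checked against each driver's documentation.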





Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #5142

2018-03-12 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3785?focusedWorklogId=79778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79778
 ]

ASF GitHub Bot logged work on BEAM-3785:


Author: ASF GitHub Bot
Created on: 13/Mar/18 05:21
Start Date: 13/Mar/18 05:21
Worklog Time Spent: 10m 
  Work Description: akedin opened a new pull request #4857: 
[BEAM-3785][SQL] Add support for arrays of rows
URL: https://github.com/apache/beam/pull/4857
 
 
   Support array fields containing rows, not just primitive types
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   




Issue Time Tracking
---

Worklog Id: (was: 79778)
Time Spent: 1h 20m  (was: 1h 10m)

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Commented] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396555#comment-16396555
 ] 

Anton Kedin commented on BEAM-3785:
---

implemented arrays of rows

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79761
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:39
Start Date: 13/Mar/18 04:39
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372544755
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79761)
Time Spent: 1h 50m  (was: 1h 40m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3785?focusedWorklogId=79780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79780
 ]

ASF GitHub Bot logged work on BEAM-3785:


Author: ASF GitHub Bot
Created on: 13/Mar/18 05:23
Start Date: 13/Mar/18 05:23
Worklog Time Spent: 10m 
  Work Description: akedin commented on issue #4857: [BEAM-3785][SQL] Add 
support for arrays of rows
URL: https://github.com/apache/beam/pull/4857#issuecomment-372550789
 
 
   R: @apilloud @kennknowles @XuMingmin 




Issue Time Tracking
---

Worklog Id: (was: 79780)
Time Spent: 1.5h  (was: 1h 20m)

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79784
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 05:37
Start Date: 13/Mar/18 05:37
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372552829
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 79784)
Time Spent: 2h 10m  (was: 2h)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79755
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:27
Start Date: 13/Mar/18 04:27
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372543232
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 79755)
Time Spent: 1.5h  (was: 1h 20m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3326) Execute a Stage via the portability framework in the ReferenceRunner

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3326?focusedWorklogId=79766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79766
 ]

ASF GitHub Bot logged work on BEAM-3326:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:47
Start Date: 13/Mar/18 04:47
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4825: [BEAM-3326] Add an 
InProcess SdkHarness Rule
URL: https://github.com/apache/beam/pull/4825#issuecomment-372545825
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 79766)
Time Spent: 2h 40m  (was: 2.5h)

> Execute a Stage via the portability framework in the ReferenceRunner
> 
>
> Key: BEAM-3326
> URL: https://issues.apache.org/jira/browse/BEAM-3326
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This is the supertask for remote execution in the Universal Local Runner 
> (BEAM-2899).
> This executes a stage remotely via portability framework APIs





[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79786
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 05:44
Start Date: 13/Mar/18 05:44
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372553716
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79786)
Time Spent: 2h 20m  (was: 2h 10m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79657
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 12/Mar/18 23:25
Start Date: 12/Mar/18 23:25
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners
URL: https://github.com/apache/beam/pull/4853#issuecomment-372495713
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 79657)
Time Spent: 20m  (was: 10m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79656
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 12/Mar/18 23:25
Start Date: 12/Mar/18 23:25
Worklog Time Spent: 10m 
  Work Description: yifanzou opened a new pull request #4853: [BEAM-3840] 
Get python mobile-gaming automating on core runners
URL: https://github.com/apache/beam/pull/4853
 
 
   Python release automation of mobile gaming examples.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 79656)
Time Spent: 10m
Remaining Estimate: 0h

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






Build failed in Jenkins: beam_PerformanceTests_Spark #1461

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[klk] Increase whitelist of false detections in SdkCoreApiSurfaceTest

[klk] Eliminate beam-model-fn-execution test-jar deps

[klk] Eliminate beam-sdks-java-fn-execution test-jar deps

[klk] Eliminate incorrect sdks-java-core test-jar deps

[klk] Notate uses of beam-runners-core-java test-jar

--
[...truncated 90.62 KB...]
'apache-beam-testing:bqjob_r7205a581b0bd5c83_01621cb9e303_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete.Waiting on bqjob_r7205a581b0bd5c83_01621cb9e303_1 ... (0s) 
Current status: RUNNING 
 Waiting on 
bqjob_r7205a581b0bd5c83_01621cb9e303_1 ... (0s) Current status: DONE   
2018-03-13 00:19:29,063 12f62fa7 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-13 00:19:53,938 12f62fa7 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-13 00:19:56,403 12f62fa7 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.45s,  CPU:0.30s,  MaxMemory:28996kb 
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_rb52b19819a3516c_01621cba4d02_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete.Waiting on bqjob_rb52b19819a3516c_01621cba4d02_1 ... (0s) 
Current status: RUNNING 
Waiting on 
bqjob_rb52b19819a3516c_01621cba4d02_1 ... (0s) Current status: DONE   
2018-03-13 00:19:56,404 12f62fa7 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-13 00:20:15,312 12f62fa7 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-13 00:20:17,638 12f62fa7 MainThread INFO Ran: {bq load --autodetect 
--source_format=NEWLINE_DELIMITED_JSON beam_performance.pkb_results 

  ReturnCode:1,  WallTime:0:02.31s,  CPU:0.30s,  MaxMemory:28992kb 
STDOUT: 

BigQuery error in load operation: Error processing job
'apache-beam-testing:bqjob_r7216d1b00b1bfbe6_01621cbaa08f_1': Invalid schema
update. Field timestamp has changed type from TIMESTAMP to FLOAT

STDERR: 
/usr/lib/google-cloud-sdk/platform/bq/third_party/oauth2client/contrib/gce.py:73:
 UserWarning: You have requested explicit scopes to be used with a GCE service 
account.
Using this argument will have no effect on the actual scopes for tokens
requested. These scopes are set at VM instance creation time and
can't be overridden in the request.

  warnings.warn(_SCOPES_WARNING)
Upload complete.Waiting on bqjob_r7216d1b00b1bfbe6_01621cbaa08f_1 ... (0s) 
Current status: RUNNING 
 Waiting on 
bqjob_r7216d1b00b1bfbe6_01621cbaa08f_1 ... (0s) Current status: DONE   
2018-03-13 00:20:17,638 12f62fa7 MainThread INFO Retrying exception running 
IssueRetryableCommand: Command returned a non-zero exit code.

2018-03-13 00:20:45,546 12f62fa7 MainThread INFO Running: bq load 
--autodetect --source_format=NEWLINE_DELIMITED_JSON 
beam_performance.pkb_results 

2018-03-13 00:20:47,829 12f62fa7 MainThread INFO Ran: {bq load --autodetect 

[jira] [Resolved] (BEAM-3124) Make flatten explicit with portability

2018-03-12 Thread Daniel Oliveira (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira resolved BEAM-3124.
---
   Resolution: Done
Fix Version/s: 2.4.0

> Make flatten explicit with portability
> --
>
> Key: BEAM-3124
> URL: https://issues.apache.org/jira/browse/BEAM-3124
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>
> Some adjustments need to be made to the way flattens are handled to get 
> them working for portability. Ideally flattens should be present in the 
> execution graphs, there should be no DoFns with multiple inputs, and flattens 
> should execute on either the system runners or SDK runners when appropriate.





[jira] [Commented] (BEAM-1874) Google Cloud Storage TextIO read fails with gz-files having Content-Encoding: gzip header

2018-03-12 Thread Deepyaman Datta (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396465#comment-16396465
 ] 

Deepyaman Datta commented on BEAM-1874:
---

Hi,

I'm getting the same error using both `DirectRunner` and `DataflowRunner`. If I 
run my pipeline on a subset of files in GCS without `Content-Encoding` set, it 
works; if `Content-Encoding` is `gzip`, it fails. I have a mixture of files 
with and without `Content-Encoding` set, and I cannot touch the files.

[~smphhh] How are you removing the headers before reading the files with Beam?

Thanks!

Deepyaman

> Google Cloud Storage TextIO read fails with gz-files having Content-Encoding: 
> gzip header
> -
>
> Key: BEAM-1874
> URL: https://issues.apache.org/jira/browse/BEAM-1874
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 0.6.0
>Reporter: Samuli Holopainen
>Assignee: Chamikara Jayalath
>Priority: Major
>
> We have gzipped text files in Google Cloud Storage that have the following 
> metadata headers set:
> Content-Encoding: gzip
> Content-Type: application/octet-stream
> Trying to read these with apache_beam.io.ReadFromText yields the following 
> error:
> ERROR:root:Exception while fetching 341565 bytes from position 0 of 
> gs://...-c72fa25a-5d8a-4801-a0b4-54b58c4723ce.gz: Cannot have start index 
> greater than total size
> Traceback (most recent call last):
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py",
>  line 585, in _fetch_to_queue
> value = func(*args)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py",
>  line 610, in _get_segment
> downloader.GetRange(start, end)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apitools/base/py/transfer.py",
>  line 477, in GetRange
> progress, end_byte = self.__NormalizeStartEnd(start, end)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apitools/base/py/transfer.py",
>  line 340, in __NormalizeStartEnd
> 'Cannot have start index greater than total size')
> TransferInvalidError: Cannot have start index greater than total size
> WARNING:root:Task failed: Traceback (most recent call last):
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/runners/direct/executor.py",
>  line 300, in __call__
> result = evaluator.finish_bundle()
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py",
>  line 206, in finish_bundle
> bundles = _read_values_to_bundles(reader)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/runners/direct/transform_evaluator.py",
>  line 196, in _read_values_to_bundles
> read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/concat_source.py",
>  line 79, in read
> range_tracker.sub_range_tracker(source_ix)):
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/textio.py",
>  line 155, in read_records
> read_buffer)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/textio.py",
>  line 245, in _read_record
> sep_bounds = self._find_separator_bounds(file_to_read, read_buffer)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/textio.py",
>  line 190, in _find_separator_bounds
> file_to_read, read_buffer, current_pos + 1):
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/textio.py",
>  line 212, in _try_to_ensure_num_bytes_in_buffer
> read_data = file_to_read.read(self._buffer_size)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/fileio.py",
>  line 460, in read
> self._fetch_to_internal_buffer(num_bytes)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/fileio.py",
>  line 420, in _fetch_to_internal_buffer
> buf = self._file.read(self._read_size)
>   File 
> "/Users/samuli.holopainen/miniconda2/envs/python-dataflow/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py",
>  line 472, in read
> return self._read_inner(size=size, readline=False)
>   File 
> 

[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79757
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:33
Start Date: 13/Mar/18 04:33
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372543999
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79757)
Time Spent: 1h 40m  (was: 1.5h)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=79770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79770
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:50
Start Date: 13/Mar/18 04:50
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4751: [BEAM-3327] Implement 
simple Docker container manager
URL: https://github.com/apache/beam/pull/4751#issuecomment-372546237
 
 
   Bein' grumbly about wedged tests and timeouts, as you do




Issue Time Tracking
---

Worklog Id: (was: 79770)
Time Spent: 4.5h  (was: 4h 20m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=79768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79768
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:50
Start Date: 13/Mar/18 04:50
Worklog Time Spent: 10m 
  Work Description: tgroh commented on issue #4751: [BEAM-3327] Implement 
simple Docker container manager
URL: https://github.com/apache/beam/pull/4751#issuecomment-372546237
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 79768)
Time Spent: 4h 20m  (was: 4h 10m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79773
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:51
Start Date: 13/Mar/18 04:51
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r174016261
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
 ##
 @@ -61,12 +66,23 @@
    * The returned {@link QueryablePipeline} will contain only the primitive transforms present
    * within the provided components.
    */
-  public static QueryablePipeline fromComponents(Components components) {
+  public static QueryablePipeline forPrimitivesIn(Components components) {
+    return forComponents(retainOnlyPrimitives(components));
+  }
+
+  /**
+   * Create a new {@link QueryablePipeline} based on the provided components.
+   *
+   * Relationships between composite transforms and their subtransforms, and producer
+   * relationships between {@link PTransformNode transforms} and {@link PCollectionNode
+   * PCollections} are not yet modelled by {@link QueryablePipeline}, so the input {@link
+   * Components} should be treatable as though each node is a primitive.
+   */
+  static QueryablePipeline forComponents(Components components) {
 
 Review comment:
   https://github.com/apache/beam/pull/4844 performs a lot of this change, 
without embedding the components within the stage.




Issue Time Tracking
---

Worklog Id: (was: 79773)
Time Spent: 12.5h  (was: 12h 20m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Comment Edited] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396561#comment-16396561
 ] 

Anton Kedin edited comment on BEAM-3785 at 3/13/18 5:28 AM:


to go:

 - arrays of arrays

 - test complex indexing (nested expressions)

 - test aggregations, other complex operations

 - DOT operator


was (Author: kedin):
to go:

 - arrays of arrays

 - test complex indexing

 - DOT operator

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Commented] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396561#comment-16396561
 ] 

Anton Kedin commented on BEAM-3785:
---

to go:

 - arrays of arrays

 - test complex indexing

 - DOT operator

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79782
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 05:30
Start Date: 13/Mar/18 05:30
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372551729
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79782)
Time Spent: 2h  (was: 1h 50m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79774
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 04:52
Start Date: 13/Mar/18 04:52
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4777: 
[BEAM-3565] Add FusedPipeline#toPipeline
URL: https://github.com/apache/beam/pull/4777#discussion_r174016261
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java
 ##
 @@ -61,12 +66,23 @@
    * The returned {@link QueryablePipeline} will contain only the primitive transforms present
    * within the provided components.
    */
-  public static QueryablePipeline fromComponents(Components components) {
+  public static QueryablePipeline forPrimitivesIn(Components components) {
+    return forComponents(retainOnlyPrimitives(components));
+  }
+
+  /**
+   * Create a new {@link QueryablePipeline} based on the provided components.
+   *
+   * Relationships between composite transforms and their subtransforms, and producer
+   * relationships between {@link PTransformNode transforms} and {@link PCollectionNode
+   * PCollections} are not yet modelled by {@link QueryablePipeline}, so the input {@link
+   * Components} should be treatable as though each node is a primitive.
+   */
+  static QueryablePipeline forComponents(Components components) {
 
 Review comment:
   https://github.com/apache/beam/pull/4844 performs a lot of this change, 
without embedding the components within the stage; it will still require us to 
create partial components, but should ultimately cause this construction to 
go through the same path as the original `queryablePipeline`.
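
A rough illustration of the split in the diff above: `forPrimitivesIn` is a filter followed by the general constructor. The sketch is in Python rather than Java, the dictionary layout and function names are hypothetical, and it assumes a "primitive" is simply a transform with no subtransforms, which this thread does not state explicitly.

```python
def retain_only_primitives(transforms):
    """Drop every transform that lists subtransforms (assumed to be composites)."""
    return {
        transform_id: spec
        for transform_id, spec in transforms.items()
        if not spec.get('subtransforms')
    }


def for_components(transforms):
    """Hypothetical general constructor: treats every node it is given as primitive."""
    return sorted(transforms)


def for_primitives_in(transforms):
    """Filter first, then reuse the general constructor."""
    return for_components(retain_only_primitives(transforms))


# Made-up components: one composite wrapping two primitives.
example = {
    'composite': {'subtransforms': ['read', 'parse']},
    'read': {'subtransforms': []},
    'parse': {},
}
primitive_ids = for_primitives_in(example)
```

Calling `for_components` directly on `example` would keep the composite as a node, which is why the Javadoc asks callers of that path to pass components that can be treated as all-primitive.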




Issue Time Tracking
---

Worklog Id: (was: 79774)
Time Spent: 12h 40m  (was: 12.5h)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Created] (BEAM-3842) Allow static methods to be defined within PipelineOptions interfaces.

2018-03-12 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-3842:
---

 Summary: Allow static methods to be defined within PipelineOptions 
interfaces.
 Key: BEAM-3842
 URL: https://issues.apache.org/jira/browse/BEAM-3842
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Luke Cwik
Assignee: Luke Cwik








[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79664
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 12/Mar/18 23:52
Start Date: 12/Mar/18 23:52
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372500364
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79664)
Time Spent: 40m  (was: 0.5h)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3841) Python TestDataflowRunner should override run_pipeline

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3841?focusedWorklogId=79668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79668
 ]

ASF GitHub Bot logged work on BEAM-3841:


Author: ASF GitHub Bot
Created on: 13/Mar/18 00:07
Start Date: 13/Mar/18 00:07
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #4856: [BEAM-3841] Fix 
TestDataflowRunner.run to run_pipeline
URL: https://github.com/apache/beam/pull/4856#issuecomment-372502819
 
 
   +R: @robertwb 




Issue Time Tracking
---

Worklog Id: (was: 79668)
Time Spent: 20m  (was: 10m)

> Python TestDataflowRunner should override run_pipeline
> ---
>
> Key: BEAM-3841
> URL: https://issues.apache.org/jira/browse/BEAM-3841
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py]
>  inherits from 
> [DataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py].
>  Basically, it wraps DataflowRunner.run_pipeline and provides more test 
> actions. 
> However, DataflowRunner.run was renamed to run_pipeline in [this 
> commit|https://github.com/apache/beam/commit/8cf222d3db1188aff5432af548961fc670f97635],
>  but the run function in TestDataflowRunner didn't change.
> We should change it accordingly.





[jira] [Work logged] (BEAM-3841) Python TestDataflowRunner should override run_pipeline

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3841?focusedWorklogId=79667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79667
 ]

ASF GitHub Bot logged work on BEAM-3841:


Author: ASF GitHub Bot
Created on: 13/Mar/18 00:05
Start Date: 13/Mar/18 00:05
Worklog Time Spent: 10m 
  Work Description: markflyhigh opened a new pull request #4856: 
[BEAM-3841] Fix TestDataflowRunner.run to run_pipeline
URL: https://github.com/apache/beam/pull/4856
 
 
   Since DataflowRunner renamed `run` to `run_pipeline` 
[here](https://github.com/apache/beam/commit/8cf222d3db1188aff5432af548961fc670f97635),
 the same rename should also happen in TestDataflowRunner, whose `run` function 
overrides DataflowRunner's.
   
   Tested by running wordcount_it_test on Dataflow.
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 79667)
Time Spent: 10m
Remaining Estimate: 0h

> Python TestDataflowRunner should override run_pipeline
> ---
>
> Key: BEAM-3841
> URL: https://issues.apache.org/jira/browse/BEAM-3841
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py]
>  inherits from 
> [DataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py].
>  Basically, it wraps DataflowRunner.run_pipeline and provides more test 
> actions. 
> However, DataflowRunner.run was renamed to run_pipeline in [this 
> commit|https://github.com/apache/beam/commit/8cf222d3db1188aff5432af548961fc670f97635],
>  but the run function in TestDataflowRunner didn't change.
> We should change it accordingly.





[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79685
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 01:00
Start Date: 13/Mar/18 01:00
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372511269
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79685)
Time Spent: 50m  (was: 40m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-3841) Python TestDataflowRunner should override run_pipeline

2018-03-12 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-3841:
---
Issue Type: Bug  (was: Test)

> Python TestDataflowRunner should override run_pipeline
> ---
>
> Key: BEAM-3841
> URL: https://issues.apache.org/jira/browse/BEAM-3841
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>
> [TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py]
>  inherits from 
> [DataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py].
>  Basically, it wraps DataflowRunner.run_pipeline and provides more test 
> actions. 
> However, DataflowRunner.run was renamed to run_pipeline in [this 
> commit|https://github.com/apache/beam/commit/8cf222d3db1188aff5432af548961fc670f97635],
>  but the run function in TestDataflowRunner didn't change.
> We should change it accordingly.





[jira] [Created] (BEAM-3841) Python TestDataflowRunner should override run_pipeline

2018-03-12 Thread Mark Liu (JIRA)
Mark Liu created BEAM-3841:
--

 Summary: Python TestDataflowRunner should override run_pipeline
 Key: BEAM-3841
 URL: https://issues.apache.org/jira/browse/BEAM-3841
 Project: Beam
  Issue Type: Test
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu


[TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py]
 inherits from 
[DataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py].
 Basically, it wraps DataflowRunner.run_pipeline and provides more test actions. 

However, DataflowRunner.run was renamed to run_pipeline in [this 
commit|https://github.com/apache/beam/commit/8cf222d3db1188aff5432af548961fc670f97635],
 but the run function in TestDataflowRunner didn't change.

We should change it accordingly.
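
A minimal sketch of why the stale override matters. The classes below are hypothetical stand-ins, not the real Beam runners, and the `execute` caller is an assumption about how a framework might invoke the runner after the rename.

```python
class DataflowRunnerSketch(object):
    """Hypothetical stand-in for the base runner after the rename."""

    def run_pipeline(self, pipeline):
        return ['submitted:%s' % pipeline]


class StaleTestRunner(DataflowRunnerSketch):
    """Still overrides the old name, so callers of run_pipeline bypass it."""

    def run(self, pipeline):
        result = super(StaleTestRunner, self).run_pipeline(pipeline)
        result.append('verified')
        return result


class FixedTestRunner(DataflowRunnerSketch):
    """Overrides the new name, so the extra test action always runs."""

    def run_pipeline(self, pipeline):
        result = super(FixedTestRunner, self).run_pipeline(pipeline)
        result.append('verified')
        return result


def execute(runner, pipeline):
    """Hypothetical caller that only knows the new entry point."""
    return runner.run_pipeline(pipeline)


stale = execute(StaleTestRunner(), 'wordcount')
fixed = execute(FixedTestRunner(), 'wordcount')
```

In this sketch the stale subclass silently loses its extra step: `stale` contains only the submission record, while `fixed` also contains the verification entry.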





[jira] [Updated] (BEAM-3843) Simplify checking for experimental options

2018-03-12 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-3843:

Summary: Simplify checking for experimental options  (was: Simplify usage 
of ExperimentalOptions)

> Simplify checking for experimental options
> --
>
> Key: BEAM-3843
> URL: https://issues.apache.org/jira/browse/BEAM-3843
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>






Build failed in Jenkins: beam_PerformanceTests_Python #1017

2018-03-12 Thread Apache Jenkins Server
See 


Changes:

[klk] Increase whitelist of false detections in SdkCoreApiSurfaceTest

[klk] Eliminate beam-model-fn-execution test-jar deps

[klk] Eliminate beam-sdks-java-fn-execution test-jar deps

[klk] Eliminate incorrect sdks-java-core test-jar deps

[klk] Notate uses of beam-runners-core-java test-jar

--
[...truncated 1.66 KB...]
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins5953038148508546577.sh
+ rm -rf .env
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins542121057356018542.sh
+ virtualenv .env --system-site-packages
New python executable in .env/bin/python
Installing setuptools, pip...done.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4529282279178487927.sh
+ .env/bin/pip install --upgrade setuptools pip
Downloading/unpacking setuptools from 
https://pypi.python.org/packages/ad/dc/fcced9ec3f2561c0cbe8eb6527eef7cf4f4919a2b3a07891a36e846635af/setuptools-38.5.2-py2.py3-none-any.whl#md5=abd3307cdce6fb543b5a4d0e3e98bdb6
Downloading/unpacking pip from 
https://pypi.python.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#md5=297dbd16ef53bcef0447d245815f5144
Installing collected packages: setuptools, pip
  Found existing installation: setuptools 2.2
Uninstalling setuptools:
  Successfully uninstalled setuptools
  Found existing installation: pip 1.5.4
Uninstalling pip:
  Successfully uninstalled pip
Successfully installed setuptools pip
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins2475989602410480409.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins5815191060977757417.sh
+ .env/bin/pip install -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied: absl-py in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied: jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied: setuptools in ./.env/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied: colorlog[windows]==2.6.0 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 17))
Requirement already satisfied: blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied: futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied: PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied: pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Collecting numpy==1.13.3 (from -r PerfKitBenchmarker/requirements.txt (line 22))
:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied: contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Requirement already satisfied: pywinrm in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 25))
Requirement already satisfied: six in 

[jira] [Work logged] (BEAM-3840) Get Python Mobile-Gaming Running on Core Runners

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3840?focusedWorklogId=79704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79704
 ]

ASF GitHub Bot logged work on BEAM-3840:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:13
Start Date: 13/Mar/18 02:13
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4853: [BEAM-3840] Get 
python mobile-gaming automating on core runners -- testing
URL: https://github.com/apache/beam/pull/4853#issuecomment-372523145
 
 
   Run Python ReleaseCandidate




Issue Time Tracking
---

Worklog Id: (was: 79704)
Time Spent: 1h 20m  (was: 1h 10m)

> Get Python Mobile-Gaming Running on Core Runners
> 
>
> Key: BEAM-3840
> URL: https://issues.apache.org/jira/browse/BEAM-3840
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Affects Versions: 2.5.0
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79710
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:33
Start Date: 13/Mar/18 02:33
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4844: 
[BEAM-3565] Add ExecutableStagePayload to simplify runner stage reconstruction
URL: https://github.com/apache/beam/pull/4844#discussion_r174001443
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -84,64 +81,72 @@
    * follows:
    *
    * 
-   *   The {@link PTransform#getSubtransformsList()} contains the result of {@link
-   *   #getTransforms()}.
+   *   The {@link PTransform#getSubtransformsList()} contains no subtransforms. This ensures
+   *   that executable stages are treated as primitive transforms.
    *   The only {@link PCollection} in the {@link PTransform#getInputsMap()} is the result of
    *   {@link #getInputPCollection()}.
    *   The output {@link PCollection PCollections} in the values of {@link
    *   PTransform#getOutputsMap()} are the {@link PCollectionNode PCollections} returned by
    *   {@link #getOutputPCollections()}.
+   *   The {@link FunctionSpec} contains an {@link ExecutableStagePayload} which has its input
+   *   and output PCollections set to the same values as the outer PTransform itself. It further
+   *   contains the environment set of transforms for this stage.
    * 
+   *
+   * The executable stage can be reconstructed from the resulting {@link ExecutableStagePayload}
+   * and components alone via {@link #fromPayload(ExecutableStagePayload, Components)}.
    */
   default PTransform toPTransform() {
+    ExecutableStagePayload.Builder payload = ExecutableStagePayload.newBuilder();
+
+    payload.setEnvironment(getEnvironment());
+
+    PCollectionNode input = getInputPCollection();
+    payload.setInput(input.getId());
+
+    for (PTransformNode transform : getTransforms()) {
+      payload.addTransforms(transform.getId());
+    }
+
+    for (PCollectionNode output : getOutputPCollections()) {
+      payload.addOutputs(output.getId());
+    }
+
     PTransform.Builder pt = PTransform.newBuilder();
+    pt.setSpec(FunctionSpec.newBuilder()
+        .setUrn(ExecutableStage.URN)
+        .setPayload(payload.build().toByteString())
+        .build());
     pt.putInputs("input", getInputPCollection().getId());
-    int i = 0;
-    for (PCollectionNode materializedPCollection : getOutputPCollections()) {
-      pt.putOutputs(String.format("materialized_%s", i), materializedPCollection.getId());
-      i++;
-    }
-    for (PTransformNode fusedTransform : getTransforms()) {
-      pt.addSubtransforms(fusedTransform.getId());
+    int outputIndex = 0;
+    for (PCollectionNode pcNode : getOutputPCollections()) {
+      // Do something
 Review comment:
   What's this comment?
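
For readers following the Javadoc in the diff above, here is a small sketch of the round trip it describes: the payload records only ids, and the stage is rebuilt by looking those ids up in the components. It is written in Python for brevity. `StagePayload`, `to_payload`, `from_payload` and the dictionary layout are all hypothetical; only the four field names mirror the setters visible in the diff.

```python
import collections

# Hypothetical mirror of the fields the diff sets on ExecutableStagePayload.
StagePayload = collections.namedtuple(
    'StagePayload', ['environment', 'input', 'transforms', 'outputs'])


def to_payload(stage):
    """Record only ids; the definitions stay in the shared components."""
    return StagePayload(
        environment=stage['environment'],
        input=stage['input']['id'],
        transforms=[t['id'] for t in stage['transforms']],
        outputs=[pc['id'] for pc in stage['outputs']])


def from_payload(payload, components):
    """Rebuild the stage by looking each id up in the components."""
    return {
        'environment': payload.environment,
        'input': components['pcollections'][payload.input],
        'transforms': [components['transforms'][t] for t in payload.transforms],
        'outputs': [components['pcollections'][pc] for pc in payload.outputs],
    }


# Made-up components and stage, for illustration only.
components = {
    'pcollections': {'pc_in': {'id': 'pc_in'}, 'pc_out': {'id': 'pc_out'}},
    'transforms': {'parse': {'id': 'parse'}},
}
stage = {
    'environment': 'docker://example',
    'input': components['pcollections']['pc_in'],
    'transforms': [components['transforms']['parse']],
    'outputs': [components['pcollections']['pc_out']],
}
payload = to_payload(stage)
rebuilt = from_payload(payload, components)
```

Because the payload holds ids rather than nested definitions, the stage can be reconstructed from the payload and components alone, with no subtransform list on the outer transform.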




Issue Time Tracking
---

Worklog Id: (was: 79710)
Time Spent: 12h  (was: 11h 50m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79713
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:33
Start Date: 13/Mar/18 02:33
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4844: 
[BEAM-3565] Add ExecutableStagePayload to simplify runner stage reconstruction
URL: https://github.com/apache/beam/pull/4844#discussion_r174001198
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -84,64 +81,72 @@
    * follows:
    *
    * 
-   *   The {@link PTransform#getSubtransformsList()} contains the result of {@link
-   *   #getTransforms()}.
+   *   The {@link PTransform#getSubtransformsList()} contains no subtransforms. This ensures
 
 Review comment:
   s/contains no subtransforms/is empty/
   
   Just because it scans better




Issue Time Tracking
---

Worklog Id: (was: 79713)
Time Spent: 12h 20m  (was: 12h 10m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79711
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:33
Start Date: 13/Mar/18 02:33
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4844: 
[BEAM-3565] Add ExecutableStagePayload to simplify runner stage reconstruction
URL: https://github.com/apache/beam/pull/4844#discussion_r174001847
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -84,64 +81,72 @@
* follows:
*
* 
-   *   The {@link PTransform#getSubtransformsList()} contains the result of {@link
-   *   #getTransforms()}.
+   *   The {@link PTransform#getSubtransformsList()} contains no subtransforms. This ensures
+   *   that executable stages are treated as primitive transforms.
*   The only {@link PCollection} in the {@link PTransform#getInputsMap()} is the result of
*   {@link #getInputPCollection()}.
*   The output {@link PCollection PCollections} in the values of {@link
*   PTransform#getOutputsMap()} are the {@link PCollectionNode PCollections} returned by
*   {@link #getOutputPCollections()}.
+   *   The {@link FunctionSpec} contains an {@link ExecutableStagePayload} which has its input
+   *   and output PCollections set to the same values as the outer PTransform itself. It further
+   *   contains the environment set of transforms for this stage.
*
+   *
+   * The executable stage can be reconstructed from the resulting {@link ExecutableStagePayload}
+   * and components alone via {@link #fromPayload(ExecutableStagePayload, Components)}.
*/
   default PTransform toPTransform() {
+ExecutableStagePayload.Builder payload = ExecutableStagePayload.newBuilder();
+
+payload.setEnvironment(getEnvironment());
+
+PCollectionNode input = getInputPCollection();
+payload.setInput(input.getId());
+
+for (PTransformNode transform : getTransforms()) {
+  payload.addTransforms(transform.getId());
 
 Review comment:
   I'm not huge on the double iteration, given that I think you can populate
   both the payload and the transform builder in the same places to maximize
   mental locality (e.g. populate `payload.setInput` and then
   `pt.putInput("input", `)
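
   A minimal sketch of the single-pass population suggested above, for
   illustration only. `PayloadSketch` and `TransformSketch` are hypothetical
   stand-ins for the generated `ExecutableStagePayload.Builder` and
   `PTransform.Builder` classes, and the IDs are made up; this is not the code
   in the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SinglePassStageSketch {
  /** Hypothetical stand-in for ExecutableStagePayload.Builder (not the generated proto class). */
  public static final class PayloadSketch {
    public String input;
    public final List<String> transforms = new ArrayList<>();
    public final List<String> outputs = new ArrayList<>();
  }

  /** Hypothetical stand-in for PTransform.Builder (not the generated proto class). */
  public static final class TransformSketch {
    public final Map<String, String> inputs = new LinkedHashMap<>();
    public final Map<String, String> outputs = new LinkedHashMap<>();
    public PayloadSketch payload;
  }

  public static TransformSketch toTransform(
      String inputId, List<String> transformIds, List<String> outputIds) {
    PayloadSketch payload = new PayloadSketch();
    TransformSketch pt = new TransformSketch();

    // Input: set on the payload and on the outer transform in the same place.
    payload.input = inputId;
    pt.inputs.put("input", inputId);

    // Fused transforms are recorded only in the payload; the outer transform
    // carries no subtransforms.
    payload.transforms.addAll(transformIds);

    // Outputs: a single loop fills both the payload and the outer transform.
    int outputIndex = 0;
    for (String outputId : outputIds) {
      payload.outputs.add(outputId);
      pt.outputs.put(String.format("materialized_%d", outputIndex), outputId);
      outputIndex++;
    }

    pt.payload = payload;
    return pt;
  }

  public static void main(String[] args) {
    TransformSketch pt =
        toTransform("pc.in", Arrays.asList("parDo.a", "parDo.b"), Arrays.asList("pc.out0", "pc.out1"));
    System.out.println(pt.inputs);
    System.out.println(pt.outputs);
    System.out.println(pt.payload.outputs);
  }
}
```

   Each ID is written to both builders at the point where it is first read, so
   the output collection is iterated once rather than twice.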




Issue Time Tracking
---

Worklog Id: (was: 79711)
Time Spent: 12h 10m  (was: 12h)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79712
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:33
Start Date: 13/Mar/18 02:33
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4844: 
[BEAM-3565] Add ExecutableStagePayload to simplify runner stage reconstruction
URL: https://github.com/apache/beam/pull/4844#discussion_r174001680
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -84,64 +81,72 @@
* follows:
*
* 
-   *   The {@link PTransform#getSubtransformsList()} contains the result of {@link
-   *   #getTransforms()}.
+   *   The {@link PTransform#getSubtransformsList()} contains no subtransforms. This ensures
+   *   that executable stages are treated as primitive transforms.
*   The only {@link PCollection} in the {@link PTransform#getInputsMap()} is the result of
*   {@link #getInputPCollection()}.
*   The output {@link PCollection PCollections} in the values of {@link
*   PTransform#getOutputsMap()} are the {@link PCollectionNode PCollections} returned by
*   {@link #getOutputPCollections()}.
+   *   The {@link FunctionSpec} contains an {@link ExecutableStagePayload} which has its input
+   *   and output PCollections set to the same values as the outer PTransform itself. It further
+   *   contains the environment set of transforms for this stage.
*
+   *
+   * The executable stage can be reconstructed from the resulting {@link ExecutableStagePayload}
+   * and components alone via {@link #fromPayload(ExecutableStagePayload, Components)}.
*/
   default PTransform toPTransform() {
+ExecutableStagePayload.Builder payload = ExecutableStagePayload.newBuilder();
+
+payload.setEnvironment(getEnvironment());
+
+PCollectionNode input = getInputPCollection();
+payload.setInput(input.getId());
+
+for (PTransformNode transform : getTransforms()) {
+  payload.addTransforms(transform.getId());
+}
+
+for (PCollectionNode output : getOutputPCollections()) {
+  payload.addOutputs(output.getId());
+}
+
 PTransform.Builder pt = PTransform.newBuilder();
+pt.setSpec(FunctionSpec.newBuilder()
+.setUrn(ExecutableStage.URN)
+.setPayload(payload.build().toByteString())
+.build());
 pt.putInputs("input", getInputPCollection().getId());
-int i = 0;
-for (PCollectionNode materializedPCollection : getOutputPCollections()) {
-  pt.putOutputs(String.format("materialized_%s", i), materializedPCollection.getId());
-  i++;
-}
-for (PTransformNode fusedTransform : getTransforms()) {
-  pt.addSubtransforms(fusedTransform.getId());
+int outputIndex = 0;
+for (PCollectionNode pcNode : getOutputPCollections()) {
+  // Do something
+  pt.putOutputs(String.format("materialized_%d", outputIndex), pcNode.getId());
+  outputIndex++;
 }
-pt.setSpec(FunctionSpec.newBuilder().setUrn(ExecutableStage.URN));
 return pt.build();
   }
 
+  // TODO: Should this live under ExecutableStageTranslation?
 
 Review comment:
   Regardless of whether it should or shouldn't, you should have a JIRA to
   determine it.
   
   For what it's worth, I think the toProto and fromProto methods should be
   co-resident, and I think this is a totally reasonable place for them (it's
   already involved with the proto representation of the pipeline, so we don't
   get significant separability between the Java and proto representations).




Issue Time Tracking
---

Worklog Id: (was: 79712)
Time Spent: 12h 10m  (was: 12h)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
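> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.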

[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79708
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:33
Start Date: 13/Mar/18 02:33
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4844: 
[BEAM-3565] Add ExecutableStagePayload to simplify runner stage reconstruction
URL: https://github.com/apache/beam/pull/4844#discussion_r174001324
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/ExecutableStage.java
 ##
 @@ -84,64 +81,72 @@
* follows:
*
* 
-   *   The {@link PTransform#getSubtransformsList()} contains the result of {@link
-   *   #getTransforms()}.
+   *   The {@link PTransform#getSubtransformsList()} contains no subtransforms. This ensures
+   *   that executable stages are treated as primitive transforms.
*   The only {@link PCollection} in the {@link PTransform#getInputsMap()} is the result of
*   {@link #getInputPCollection()}.
*   The output {@link PCollection PCollections} in the values of {@link
*   PTransform#getOutputsMap()} are the {@link PCollectionNode PCollections} returned by
*   {@link #getOutputPCollections()}.
+   *   The {@link FunctionSpec} contains an {@link ExecutableStagePayload} which has its input
 
 Review comment:
   The `PTransform#getSpec()` contains an...




Issue Time Tracking
---

Worklog Id: (was: 79708)
Time Spent: 11h 50m  (was: 11h 40m)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.





[jira] [Work logged] (BEAM-3565) Add utilities for producing a collection of PTransforms that can execute in a single SDK Harness

2018-03-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3565?focusedWorklogId=79709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79709
 ]

ASF GitHub Bot logged work on BEAM-3565:


Author: ASF GitHub Bot
Created on: 13/Mar/18 02:33
Start Date: 13/Mar/18 02:33
Worklog Time Spent: 10m 
  Work Description: tgroh commented on a change in pull request #4844: 
[BEAM-3565] Add ExecutableStagePayload to simplify runner stage reconstruction
URL: https://github.com/apache/beam/pull/4844#discussion_r174001107
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -203,6 +203,23 @@ message PCollection {
   DisplayData display_data = 5;
 }
 
+// The payload for an executable stage. This will eventually be passed to an SDK in the form of a
+// ProcessBundleDescriptor.
+message ExecutableStagePayload {
+
+  Environment environment = 1;
 
 Review comment:
   I'd like a comment explaining why this is a value rather than an ID;
   otherwise, replace it with `string environment_id`.




Issue Time Tracking
---

Worklog Id: (was: 79709)

> Add utilities for producing a collection of PTransforms that can execute in a 
> single SDK Harness
> 
>
> Key: BEAM-3565
> URL: https://issues.apache.org/jira/browse/BEAM-3565
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>Priority: Major
>  Labels: portability
> Fix For: 2.4.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> An SDK Harness executes some ("fused") collection of PTransforms. The java 
> runner libraries should provide some way to take a Pipeline that executes in 
> both a runner and an environment and construct a collection of transforms 
> which can execute within a single environment.




