[jira] [Updated] (BEAM-3848) SolrIO: Add retrying mechanism in client writes

2018-06-11 Thread Tim Robertson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated BEAM-3848:

Issue Type: Improvement  (was: New Feature)

> SolrIO: Add retrying mechanism in client writes
> ---
>
> Key: BEAM-3848
> URL: https://issues.apache.org/jira/browse/BEAM-3848
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-solr
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Minor
> Fix For: 2.5.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> A busy SOLR server is prone to return RemoteSOLRException on writing which 
> currently fails a complete task (e.g. a partition of a spark RDD being 
> written to SOLR).
> A good addition would be the ability to provide a retrying mechanism for the 
> batch in flight, rather than failing fast, which will most likely trigger a 
> much larger retry of more writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3201) ElasticsearchIO should allow the user to optionally pass id, type and index per document

2018-06-11 Thread Tim Robertson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated BEAM-3201:

Issue Type: New Feature  (was: Improvement)

> ElasticsearchIO should allow the user to optionally pass id, type and index 
> per document
> 
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Tim Robertson
>Priority: Major
> Fix For: 2.5.0
>
>
> *Dynamic documents id*: Today the ESIO only inserts the payload of the ES 
> documents. Elasticsearch generates a document id for each record inserted. So 
> each new insertion is considered as a new document. Users want to be able to 
> update documents using the IO. So, for the write part of the IO, users should 
> be able to provide a document id so that they could update already stored 
> documents. Providing an id for the documents could also help the user on 
> indempotency.
> *Dynamic ES type and ES index*: In some cases (streaming pipeline with high 
> throughput) partitioning the PCollection to allow to plug to different ESIO 
> instances (pointing to different index/type) is not very practical, the users 
> would like to be able to set ES index/type per document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=110958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110958
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 12/Jun/18 05:50
Start Date: 12/Jun/18 05:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5519: 
[BEAM-4432] Adding Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#discussion_r194620716
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -117,6 +117,7 @@ def get_version():
 REQUIRED_TEST_PACKAGES = [
 'nose>=1.3.7',
 'pyhamcrest>=1.9,<2.0',
+'numpy>=1.14.3',
 
 Review comment:
   Numpy is needed here because we do a zipf distribution (i.e. heavily 
weighted towards a few keys). We can define an 'extra' that is `'perftests'`, 
so that numpy is only installed when we do `pip install -e .[perftest]`. WDYT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110958)
Time Spent: 2h  (was: 1h 50m)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> GenerateSequence fal.lls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements / 
> and hardcoded delays for autoscaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=110957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110957
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 12/Jun/18 05:48
Start Date: 12/Jun/18 05:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5519: 
[BEAM-4432] Adding Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#discussion_r194620716
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -117,6 +117,7 @@ def get_version():
 REQUIRED_TEST_PACKAGES = [
 'nose>=1.3.7',
 'pyhamcrest>=1.9,<2.0',
+'numpy>=1.14.3',
 
 Review comment:
   Numpy is needed here because we do a zipf distribution (i.e. heavily 
weighted towards a few keys). Another option would be to have these files not 
be part of Beam, but since they will be used for perf tests, it'll likely be 
worth having them directly n beam.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110957)
Time Spent: 1h 50m  (was: 1h 40m)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> GenerateSequence fal.lls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements / 
> and hardcoded delays for autoscaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=110956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110956
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 12/Jun/18 05:47
Start Date: 12/Jun/18 05:47
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5519: 
[BEAM-4432] Adding Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#discussion_r194620610
 
 

 ##
 File path: sdks/python/scripts/generate_pydoc.sh
 ##
 @@ -174,6 +174,7 @@ nitpicky = True
 nitpick_ignore = []
 nitpick_ignore += [('py:class', iden) for iden in ignore_identifiers]
 nitpick_ignore += [('py:obj', iden) for iden in ignore_identifiers]
+nitpick_ignore += [('py:exc', 'ValueError')]
 
 Review comment:
   I found this issue in Sphinx. 
https://github.com/sphinx-doc/sphinx/issues/1034


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110956)
Time Spent: 1h 40m  (was: 1.5h)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> GenerateSequence fal.lls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements / 
> and hardcoded delays for autoscaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #741

2018-06-11 Thread Apache Jenkins Server
See 


Changes:

[lukasz.gajowy] [BEAM-4415] Add jenkins job for HDFS ParquetIOIT

--
[...truncated 17.22 MB...]
INFO: Uploading <115996 bytes, hash E67Ewfupx338XgmvCsRyDA> to 
gs://temp-storage-for-end-to-end-tests/spannerwriteit0testreportfailures-jenkins-0612052921-91208746/output/results/staging/pipeline-E67Ewfupx338XgmvCsRyDA.pb

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_OUT
Dataflow SDK version: 2.6.0-SNAPSHOT

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_ERROR
Jun 12, 2018 5:29:25 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-06-11_22_29_25-14638156897536955410?project=apache-beam-testing

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_OUT
Submitted job: 2018-06-11_22_29_25-14638156897536955410

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_ERROR
Jun 12, 2018 5:29:25 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud dataflow jobs --project=apache-beam-testing cancel 
--region=us-central1 2018-06-11_22_29_25-14638156897536955410
Jun 12, 2018 5:29:25 AM org.apache.beam.runners.dataflow.TestDataflowRunner 
run
INFO: Running Dataflow job 2018-06-11_22_29_25-14638156897536955410 with 0 
expected assertions.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:25.214Z: Autoscaling is enabled for job 
2018-06-11_22_29_25-14638156897536955410. The number of workers will be between 
1 and 1000.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:25.238Z: Autoscaling was automatically enabled for 
job 2018-06-11_22_29_25-14638156897536955410.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:27.663Z: Checking required Cloud APIs are enabled.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:27.865Z: Checking permissions granted to controller 
Service Account.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:31.235Z: Worker configuration: n1-standard-1 in 
us-central1-b.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:31.567Z: Expanding CoGroupByKey operations into 
optimizable parts.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:31.729Z: Expanding GroupByKey operations into 
optimizable parts.
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:31.761Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:31.950Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:31.982Z: Elided trivial flatten 
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:32.008Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:32.024Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Jun 12, 2018 5:29:36 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T05:29:32.047Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Jun 12, 2018 5:29:36 AM 

[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=110951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110951
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 12/Jun/18 04:58
Start Date: 12/Jun/18 04:58
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5519: 
[BEAM-4432] Adding Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#discussion_r194614742
 
 

 ##
 File path: sdks/python/scripts/generate_pydoc.sh
 ##
 @@ -174,6 +174,7 @@ nitpicky = True
 nitpick_ignore = []
 nitpick_ignore += [('py:class', iden) for iden in ignore_identifiers]
 nitpick_ignore += [('py:obj', iden) for iden in ignore_identifiers]
+nitpick_ignore += [('py:exc', 'ValueError')]
 
 Review comment:
   Do you know why?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110951)
Time Spent: 1.5h  (was: 1h 20m)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> GenerateSequence fal.lls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements / 
> and hardcoded delays for autoscaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4432) Performance tests need a way to generate Synthetic data

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4432?focusedWorklogId=110950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110950
 ]

ASF GitHub Bot logged work on BEAM-4432:


Author: ASF GitHub Bot
Created on: 12/Jun/18 04:58
Start Date: 12/Jun/18 04:58
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #5519: 
[BEAM-4432] Adding Sources to produce Synthetic output for Batch pipelines
URL: https://github.com/apache/beam/pull/5519#discussion_r194614733
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -117,6 +117,7 @@ def get_version():
 REQUIRED_TEST_PACKAGES = [
 'nose>=1.3.7',
 'pyhamcrest>=1.9,<2.0',
+'numpy>=1.14.3',
 
 Review comment:
   This is a rather big dependency. Is it possible for us to not use it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110950)
Time Spent: 1h 20m  (was: 1h 10m)

> Performance tests need a way to generate Synthetic data
> ---
>
> Key: BEAM-4432
> URL: https://issues.apache.org/jira/browse/BEAM-4432
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> GenerateSequence fal.lls short in this regard, as we may want to generate 
> data in custom distributions, or with specific repeatability requirements / 
> and hardcoded delays for autoscaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam-site] 01/01: Prepare repository for deployment.

2018-06-11 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 8e72ae4e6a30fb3e9088e565e7f36361575bb683
Author: Mergebot 
AuthorDate: Mon Jun 11 21:53:41 2018 -0700

Prepare repository for deployment.
---
 content/documentation/io/built-in/index.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/content/documentation/io/built-in/index.html 
b/content/documentation/io/built-in/index.html
index a40af1f..4891e2d 100644
--- a/content/documentation/io/built-in/index.html
+++ b/content/documentation/io/built-in/index.html
@@ -315,6 +315,10 @@
 RestIOJava
 https://issues.apache.org/jira/browse/BEAM-1946;>BEAM-1946
   
+  
+Apache KafkaPython
+https://issues.apache.org/jira/browse/BEAM-3788;>BEAM-3788
+  
 
 
   

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] branch asf-site updated (0fbf549 -> 8e72ae4)

2018-06-11 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from 0fbf549  Prepare repository for deployment.
 add 5fa7d4d  Add Python Kafka to list of in-progress I/O transforms.
 add 58cf452  This closes #462
 new 8e72ae4  Prepare repository for deployment.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/documentation/io/built-in/index.html | 4 
 src/documentation/io/built-in.md | 4 
 2 files changed, 8 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[jira] [Work logged] (BEAM-3079) Samza runner

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3079?focusedWorklogId=110949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110949
 ]

ASF GitHub Bot logged work on BEAM-3079:


Author: ASF GitHub Bot
Created on: 12/Jun/18 04:51
Start Date: 12/Jun/18 04:51
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5517: [BEAM-3079] 
Update samza-runner with more features and improvements
URL: https://github.com/apache/beam/pull/5517#issuecomment-396464975
 
 
   There's a merge commit I see from `master` in there. And in the diff I see 
`UsesImpulse` being added. Can you separate the upstream sync from the upgrade, 
or is that not possible?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110949)
Time Spent: 1.5h  (was: 1h 20m)

> Samza runner
> 
>
> Key: BEAM-3079
> URL: https://issues.apache.org/jira/browse/BEAM-3079
> Project: Beam
>  Issue Type: Wish
>  Components: runner-ideas
>Reporter: Xinyu Liu
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Apache Samza is a distributed data-processing platform which supports both 
> stream and batch processing. It'll be awesome if we can run BEAM's advanced 
> data transform and multi-language sdks on top of Samza.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam-site] branch mergebot updated (8034d43 -> 58cf452)

2018-06-11 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


 discard 8034d43  This closes #460
 discard f955d76  [BEAM-2852] Update Nexmark documentation with Kafka support
 new 5fa7d4d  Add Python Kafka to list of in-progress I/O transforms.
 new 58cf452  This closes #462

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8034d43)
\
 N -- N -- N   refs/heads/mergebot (58cf452)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/documentation/io/built-in.md  |  4 
 src/documentation/sdks/nexmark.md | 26 ++
 2 files changed, 6 insertions(+), 24 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 02/02: This closes #462

2018-06-11 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 58cf452346b4a29903f0bc527e2591287c9606ab
Merge: 0fbf549 5fa7d4d
Author: Mergebot 
AuthorDate: Mon Jun 11 21:50:57 2018 -0700

This closes #462

 src/documentation/io/built-in.md | 4 
 1 file changed, 4 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam-site] 01/02: Add Python Kafka to list of in-progress I/O transforms.

2018-06-11 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 5fa7d4d4350af40a9f54b351fab589a2b00a4dab
Author: chamik...@google.com 
AuthorDate: Tue Jun 5 11:18:01 2018 -0700

Add Python Kafka to list of in-progress I/O transforms.
---
 src/documentation/io/built-in.md | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/documentation/io/built-in.md b/src/documentation/io/built-in.md
index 44b6945..1dae879 100644
--- a/src/documentation/io/built-in.md
+++ b/src/documentation/io/built-in.md
@@ -125,4 +125,8 @@ This table contains I/O transforms that are currently 
planned or in-progress. St
 RestIOJava
 https://issues.apache.org/jira/browse/BEAM-1946;>BEAM-1946
   
+  
+Apache KafkaPython
+https://issues.apache.org/jira/browse/BEAM-3788;>BEAM-3788
+  
 

-- 
To stop receiving notification emails like this one, please contact
mergebot-r...@apache.org.


[beam] branch master updated (a8ce41b -> 26c5d37)

2018-06-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a8ce41b  Remove new Go precommit dependencies due to failures
 add e18844b  [BEAM-4415] Add jenkins job for HDFS ParquetIOIT
 new 26c5d37  Merge pull request #5520 from 
lgajowy/BEAM-4415_HDFS_PARQUET_JENKINS_JOB

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy   | 11 +++
 1 file changed, 11 insertions(+)

-- 
To stop receiving notification emails like this one, please contact
al...@apache.org.


[jira] [Work logged] (BEAM-4415) Enable HDFS based Performance Test for ParquetIO

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4415?focusedWorklogId=110948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110948
 ]

ASF GitHub Bot logged work on BEAM-4415:


Author: ASF GitHub Bot
Created on: 12/Jun/18 04:49
Start Date: 12/Jun/18 04:49
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #5520: [BEAM-4415] Add 
jenkins job for HDFS ParquetIOIT
URL: https://github.com/apache/beam/pull/5520
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy 
b/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
index 62a2346fa17..53b1777db56 100644
--- a/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
+++ b/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
@@ -78,6 +78,17 @@ def testsConfigurations = [
 numberOfRecords: '10',
 charset: 'UTF-8'
 ]
+],
+[
+jobName   : 'beam_PerformanceTests_ParquetIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for 
beam_PerformanceTests_ParquetIOIT on HDFS',
+itClass   : 
'org.apache.beam.sdk.io.parquet.ParquetIOIT',
+bqTable   : 
'beam_performance.parquetioit_hdfs_pkb_results',
+prCommitStatusName: 'Java ParquetIOPerformance Test on HDFS',
+prTriggerPhase: 'Run Java ParquetIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
 ]
 ]
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110948)
Time Spent: 2h 10m  (was: 2h)

> Enable HDFS based Performance Test for ParquetIO
> 
>
> Key: BEAM-4415
> URL: https://issues.apache.org/jira/browse/BEAM-4415
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-parquet, testing
>Reporter: Łukasz Gajowy
>Assignee: Łukasz Gajowy
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There already is a running job for ParquetIO on Jenkins: 
> [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_ParquetIOIT/]
>  
> There are also Jenkins' Jobs running such tests on an HDFS cluster: 
> [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_AvroIOIT_HDFS/]
>  
> Therefore, we should provide a Performance Test for ParquetIO running on HDFS 
> cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5520 from lgajowy/BEAM-4415_HDFS_PARQUET_JENKINS_JOB

2018-06-11 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 26c5d374d87e53e6de761c8bc666b0ef6edb65dc
Merge: a8ce41b e18844b
Author: Ahmet Altay 
AuthorDate: Mon Jun 11 21:49:53 2018 -0700

Merge pull request #5520 from lgajowy/BEAM-4415_HDFS_PARQUET_JENKINS_JOB

[BEAM-4415] Add jenkins job for HDFS ParquetIOIT

 .../jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy   | 11 +++
 1 file changed, 11 insertions(+)


-- 
To stop receiving notification emails like this one, please contact
al...@apache.org.


[jira] [Work logged] (BEAM-2281) call SqlFunctions in operator implementation

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2281?focusedWorklogId=110946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110946
 ]

ASF GitHub Bot logged work on BEAM-2281:


Author: ASF GitHub Bot
Created on: 12/Jun/18 04:23
Start Date: 12/Jun/18 04:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5596: [BEAM-2281] 
Improve SQL expression testing so SQL syntax tests suffice; delete extraneous 
unit tests
URL: https://github.com/apache/beam/pull/5596#issuecomment-396461521
 
 
   Rebased past the deletion of `BeamSql`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110946)
Time Spent: 3h  (was: 2h 50m)

> call SqlFunctions in operator implementation
> 
>
> Key: BEAM-2281
> URL: https://issues.apache.org/jira/browse/BEAM-2281
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Calcite has a collections of functions in 
> {{org.apache.calcite.runtime.SqlFunctions}}. It sounds a good source to 
> leverage when adding operators as {{BeamSqlExpression}}. 
> [~xumingming] [~app-tarush], any comments?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4538) readAll for BigQuery IO

2018-06-11 Thread Ahmet Altay (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-4538:
-

Assignee: (was: Chamikara Jayalath)

> readAll for BigQuery IO
> ---
>
> Key: BEAM-4538
> URL: https://issues.apache.org/jira/browse/BEAM-4538
> Project: Beam
>  Issue Type: Wish
>  Components: io-java-gcp
>Reporter: Ahmet Altay
>Priority: Major
>
> Customer reported:
> """
> BigQueryIO.readTableRows() does not support reading partitions specified by 
> side inputs; the only way to select partitions is to know them ahead of time 
> and pass them in on the command line in PipelineOptions for selection in a 
> WHERE clause.
>  
> Ideally we'd have something like a readAll() transform, like we have for 
> TextIO, JdbcIO etc. that allows the reading configuration to be dynamic in a 
> sense.
> """



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4538) readAll for BigQuery IO

2018-06-11 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-4538:
-

 Summary: readAll for BigQuery IO
 Key: BEAM-4538
 URL: https://issues.apache.org/jira/browse/BEAM-4538
 Project: Beam
  Issue Type: Wish
  Components: io-java-gcp
Reporter: Ahmet Altay
Assignee: Chamikara Jayalath


Customer reported:

"""

BigQueryIO.readTableRows() does not support reading partitions specified by 
side inputs; the only way to select partitions is to know them ahead of time 
and pass them in on the command line in PipelineOptions for selection in a 
WHERE clause.
 
Ideally we'd have something like a readAll() transform, like we have for 
TextIO, JdbcIO etc. that allows the reading configuration to be dynamic in a 
sense.
"""



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4537) CASE expression output type mismatch

2018-06-11 Thread Kai Jiang (JIRA)
Kai Jiang created BEAM-4537:
---

 Summary: CASE expression output type mismatch
 Key: BEAM-4537
 URL: https://issues.apache.org/jira/browse/BEAM-4537
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Kai Jiang
Assignee: Kai Jiang


TPC-DS query 84 involves with keyword coalesce(). coalesce will expand into 
case expression.

output type of CASE expression should match its family type with input types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4008) Futurize and fix python 2 compatibility for utils subpackage

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4008?focusedWorklogId=110940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110940
 ]

ASF GitHub Bot logged work on BEAM-4008:


Author: ASF GitHub Bot
Created on: 12/Jun/18 03:35
Start Date: 12/Jun/18 03:35
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #5336: 
[BEAM-4008] Futurize utils subpackage
URL: https://github.com/apache/beam/pull/5336#discussion_r194606891
 
 

 ##
 File path: sdks/python/apache_beam/utils/windowed_value.py
 ##
 @@ -178,33 +182,14 @@ def __repr__(self):
 self.windows,
 self.pane_info)
 
-  def __hash__(self):
-return (hash(self.value) +
-3 * self.timestamp_micros +
-7 * hash(self.windows) +
-11 * hash(self.pane_info))
-
-  # We'd rather implement __eq__, but Cython supports that via __richcmp__
-  # instead.  Fortunately __cmp__ is understood by both (but not by Python 3).
-  def __cmp__(left, right):  # pylint: disable=no-self-argument
-"""Compares left and right for equality.
-
-For performance reasons, doesn't actually impose an ordering
-on unequal values (always returning 1).
-"""
-if type(left) is not type(right):
-  return cmp(type(left), type(right))
+  def _key(self):
+return self.value, self.timestamp_micros, self.windows, self.pane_info
 
-# TODO(robertwb): Avoid the type checks?
-# Returns False (0) if equal, and True (1) if not.
-return not WindowedValue._typed_eq(left, right)
+  def __eq__(self, other):
+return type(self) == type(other) and self._key() == other._key()
 
-  @staticmethod
-  def _typed_eq(left, right):
-return (left.timestamp_micros == right.timestamp_micros
-and left.value == right.value
-and left.windows == right.windows
-and left.pane_info == right.pane_info)
+  def __hash__(self):
 
 Review comment:
   Thanks for your patience for awaiting the feedback, @RobbeSneyders.
   
   I doublechecked  performance of __hash__  using a microbenchmark. Previous 
implementation of __hash__ appears to be at least 7.5% faster, possibly due to 
overheads to create a tuple, and/or the way Cython compiles it.
   
   I suggest we revert to previous implementation of __hash__. For consistency  
we can also implement __eq__  by comparing fields directly.  
   
   My microbenchmark can be found in 
https://github.com/apache/beam/compare/master...tvalentyn:utils_futurization_benchmark,
 see 
https://github.com/tvalentyn/beam/commit/67d3df0f1248f43feb84503361f0071efc560fc2.
 It depends on the mini-framework that's being finalized in 
https://github.com/apache/beam/pull/5565.
   
   CC: @robertwb 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110940)
Time Spent: 1h 20m  (was: 1h 10m)

> Futurize and fix python 2 compatibility for utils subpackage
> 
>
> Key: BEAM-4008
> URL: https://issues.apache.org/jira/browse/BEAM-4008
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4291) ArtifactRetrievalService that retrieves artifacts from a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=110935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110935
 ]

ASF GitHub Bot logged work on BEAM-4291:


Author: ASF GitHub Bot
Created on: 12/Jun/18 02:29
Start Date: 12/Jun/18 02:29
Worklog Time Spent: 10m 
  Work Description: axelmagn commented on issue #5584: [BEAM-4291] Add 
distributed artifact retrieval
URL: https://github.com/apache/beam/pull/5584#issuecomment-396445791
 
 
   do not merge yet.  this branch now contains WIP/FIX commits.
   
   Also, I've added a commit that uses the ProxyManifest.  Remaining TODO: 
write unit tests.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110935)
Time Spent: 2h 40m  (was: 2.5h)

> ArtifactRetrievalService that retrieves artifacts from a distributed 
> filesystem
> ---
>
> Key: BEAM-4291
> URL: https://issues.apache.org/jira/browse/BEAM-4291
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Axel Magnuson
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow

2018-06-11 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-4536:
---

 Summary: Python SDK: Pubsub reading with_attributes broken for 
Dataflow
 Key: BEAM-4536
 URL: https://issues.apache.org/jira/browse/BEAM-4536
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Affects Versions: 2.5.0
Reporter: Udi Meiri
Assignee: Udi Meiri


Using 
[ReadFromPubsub|https://github.com/apache/beam/blob/e30e0c807321934e862358e1e3be32dc74374aeb/sdks/python/apache_beam/io/gcp/pubsub.py#L106](with_attributes=True)
 will fail on Dataflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #740

2018-06-11 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-4535) Python tests are failing for Windows

2018-06-11 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-4535:


 Summary: Python tests are failing for Windows
 Key: BEAM-4535
 URL: https://issues.apache.org/jira/browse/BEAM-4535
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Chamikara Jayalath
Assignee: Udi Meiri


Error is:

Traceback (most recent call last):
  File "C:\Users\deft-testing-integra\python_sdk_download\apache_beam\io\fileba
sedsource_test.py", line 532, in test_read_auto_pattern
    compression_type=CompressionTypes.AUTO))
  File "C:\Users\deft-testing-integra\python_sdk_download\apache_beam\io\fileba
sedsource.py", line 119, in __init__
    self._validate()
  File "C:\Users\deft-testing-integra\python_sdk_download\apache_beam\options\v
alue_provider.py", line 133, in _f
    return fnc(self, *args, **kwargs)
  File "C:\Users\deft-testing-integra\python_sdk_download\apache_beam\io\fileba
sedsource.py", line 179, in _validate
    'No files found based on the file pattern %s' % pattern)
IOError: No files found based on the file pattern 
c:\windows\temp\tmpwon5_g\mytemp*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (BEAM-4475) Go precommit should include "go test ./..."

2018-06-11 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reopened BEAM-4475:
-

The PR broke Go precommits due to 
https://github.com/gogradle/gogradle/issues/225, so rolled back changes.

> Go precommit should include "go test ./..."
> ---
>
> Key: BEAM-4475
> URL: https://issues.apache.org/jira/browse/BEAM-4475
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Minor
> Fix For: 2.6.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would have prevented a recent break caused by a green PR: 
> https://github.com/apache/beam/pull/5558.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #739

2018-06-11 Thread Apache Jenkins Server
See 


--
[...truncated 17.19 MB...]

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_OUT
Dataflow SDK version: 2.6.0-SNAPSHOT

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_ERROR
Jun 12, 2018 12:49:00 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To access the Dataflow monitoring console, please navigate to 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-06-11_17_48_58-10758176347035259739?project=apache-beam-testing

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_OUT
Submitted job: 2018-06-11_17_48_58-10758176347035259739

org.apache.beam.sdk.io.gcp.spanner.SpannerWriteIT > testReportFailures 
STANDARD_ERROR
Jun 12, 2018 12:49:00 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud dataflow jobs --project=apache-beam-testing cancel 
--region=us-central1 2018-06-11_17_48_58-10758176347035259739
Jun 12, 2018 12:49:00 AM 
org.apache.beam.runners.dataflow.TestDataflowRunner run
INFO: Running Dataflow job 2018-06-11_17_48_58-10758176347035259739 with 0 
expected assertions.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:48:58.915Z: Autoscaling is enabled for job 
2018-06-11_17_48_58-10758176347035259739. The number of workers will be between 
1 and 1000.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:48:58.943Z: Autoscaling was automatically enabled for 
job 2018-06-11_17_48_58-10758176347035259739.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:01.789Z: Checking required Cloud APIs are enabled.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:01.893Z: Checking permissions granted to controller 
Service Account.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:05.341Z: Worker configuration: n1-standard-1 in 
us-central1-b.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:05.746Z: Expanding CoGroupByKey operations into 
optimizable parts.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:05.918Z: Expanding GroupByKey operations into 
optimizable parts.
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:05.935Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:06.123Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:06.144Z: Elided trivial flatten 
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:06.181Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:06.211Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:06.243Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Jun 12, 2018 12:49:08 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:49:06.277Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 

[beam] 01/01: Remove new Go precommit dependencies due to failures

2018-06-11 Thread herohde
This is an automated email from the ASF dual-hosted git repository.

herohde pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit a8ce41bc9713628052ed74a0b4b5b9fec5bcc34b
Merge: d906270 bd1ffe7
Author: Henning Rohde 
AuthorDate: Mon Jun 11 17:51:45 2018 -0700

Remove new Go precommit dependencies due to failures

 build.gradle | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
hero...@apache.org.


[beam] branch master updated (d906270 -> a8ce41b)

2018-06-11 Thread herohde
This is an automated email from the ASF dual-hosted git repository.

herohde pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from d906270  Merge pull request #5590: [SQL] Delete BeamSql class
 add bd1ffe7  Remove new Go precommit dependencies due to failures
 new a8ce41b  Remove new Go precommit dependencies due to failures

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 build.gradle | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
hero...@apache.org.


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110919
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:41
Start Date: 12/Jun/18 00:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5594: [BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not 
have dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194586156
 
 

 ##
 File path: build_rules.gradle
 ##
 @@ -1253,6 +1254,125 @@ ext.applyGrpcNature = {
 
 
/*/
 
+ext.applyPortabilityNature = {
+  println "applyPortabilityNature with " + (it ? "$it" : "default 
configuration") + " for project $project.name"
+  applyJavaNature(enableFindbugs: false, enableErrorProne: false, 
shadowClosure: {
+// guava uses the com.google.common and com.google.thirdparty package 
namespaces
+relocate "com.google.common", 
"org.apache.beam.vendor.guava.v20.com.google.common"
+relocate "com.google.thirdparty", 
"org.apache.beam.vendor.guava.v20.com.google.thirdparty"
+
+relocate "com.google.protobuf", 
"org.apache.beam.vendor.protobuf.v3.com.google.protobuf"
 
 Review comment:
   Is there a way to error out if a jar contains a package that is not 
relocated here ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110919)
Time Spent: 3h 40m  (was: 3.5h)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110917=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110917
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:41
Start Date: 12/Jun/18 00:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5594: [BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not 
have dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194587134
 
 

 ##
 File path: model/fn-execution/build.gradle
 ##
 @@ -17,17 +17,14 @@
  */
 
 apply from: project(":").file("build_rules.gradle")
-applyJavaNature(enableFindbugs: false, enableErrorProne: false)
-applyGrpcNature()
+applyPortabilityNature()
 
 description = "Apache Beam :: Model :: Fn Execution"
 ext.summary = "Portable definitions for execution user-defined functions."
 
 dependencies {
-  compile library.java.guava
-  shadow project(path: ":beam-model-pipeline", configuration: "shadow")
-  shadow library.java.protobuf_java
-  shadow library.java.grpc_core
-  shadow library.java.grpc_protobuf
-  shadow library.java.grpc_stub
+  // We purposely depend on the unshaded classes for compilation and
 
 Review comment:
   Can you explain how this works ? Won't compiled classes refer to unshaded 
classes ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110917)
Time Spent: 3.5h  (was: 3h 20m)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110918=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110918
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:41
Start Date: 12/Jun/18 00:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5594: [BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not 
have dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194586401
 
 

 ##
 File path: build_rules.gradle
 ##
 @@ -1253,6 +1254,125 @@ ext.applyGrpcNature = {
 
 
/*/
 
+ext.applyPortabilityNature = {
+  println "applyPortabilityNature with " + (it ? "$it" : "default 
configuration") + " for project $project.name"
+  applyJavaNature(enableFindbugs: false, enableErrorProne: false, 
shadowClosure: {
+// guava uses the com.google.common and com.google.thirdparty package 
namespaces
+relocate "com.google.common", 
"org.apache.beam.vendor.guava.v20.com.google.common"
+relocate "com.google.thirdparty", 
"org.apache.beam.vendor.guava.v20.com.google.thirdparty"
+
+relocate "com.google.protobuf", 
"org.apache.beam.vendor.protobuf.v3.com.google.protobuf"
+relocate "com.google.gson", 
"org.apache.beam.vendor.gson.v2.com.google.gson"
+relocate "io.grpc", "org.apache.beam.vendor.grpc.v1.io.grpc"
+relocate "com.google.auth", 
"org.apache.beam.vendor.google_auth_library_credentials.v0_9_1.com.google.auth"
 
 Review comment:
   Add a function for formatting relocation path from various parts 
("org.apache.beam", library name, package, etc) ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110918)
Time Spent: 3.5h  (was: 3h 20m)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110920
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:41
Start Date: 12/Jun/18 00:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5594: [BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not 
have dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194586642
 
 

 ##
 File path: build_rules.gradle
 ##
 @@ -1253,6 +1254,125 @@ ext.applyGrpcNature = {
 
 
/*/
 
+ext.applyPortabilityNature = {
+  println "applyPortabilityNature with " + (it ? "$it" : "default 
configuration") + " for project $project.name"
+  applyJavaNature(enableFindbugs: false, enableErrorProne: false, 
shadowClosure: {
+// guava uses the com.google.common and com.google.thirdparty package 
namespaces
+relocate "com.google.common", 
"org.apache.beam.vendor.guava.v20.com.google.common"
+relocate "com.google.thirdparty", 
"org.apache.beam.vendor.guava.v20.com.google.thirdparty"
+
+relocate "com.google.protobuf", 
"org.apache.beam.vendor.protobuf.v3.com.google.protobuf"
+relocate "com.google.gson", 
"org.apache.beam.vendor.gson.v2.com.google.gson"
+relocate "io.grpc", "org.apache.beam.vendor.grpc.v1.io.grpc"
+relocate "com.google.auth", 
"org.apache.beam.vendor.google_auth_library_credentials.v0_9_1.com.google.auth"
+relocate "com.google.api", 
"org.apache.beam.vendor.proto_google_common_protos.v1.com.google.api"
+relocate "com.google.cloud", 
"org.apache.beam.vendor.proto_google_common_protos.v1.com.google.cloud"
+relocate "com.google.logging", 
"org.apache.beam.vendor.proto_google_common_protos.v1.com.google.logging"
+relocate "com.google.longrunning", 
"org.apache.beam.vendor.proto_google_common_protos.v1.com.google.longrunning"
+relocate "com.google.rpc", 
"org.apache.beam.vendor.proto_google_common_protos.v1.com.google.rpc"
+relocate "com.google.type", 
"org.apache.beam.vendor.proto_google_common_protos.v1.com.google.type"
+relocate "io.opencensus", 
"org.apache.beam.vendor.opencensus.v0_11.io.opencensus"
+
+// Adapted from 
https://github.com/grpc/grpc-java/blob/e283f70ad91f99c7fee8b31b605ef12a4f9b1690/netty/shaded/build.gradle#L41
+relocate "io.netty", "org.apache.beam.vendor.netty.v4.io.netty"
 
 Review comment:
   Why not upgrade to gRPC netty shaded instead ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110920)
Time Spent: 3h 40m  (was: 3.5h)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-4534) Errors in the mobile gaming GameStats pipeline

2018-06-11 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-4534:
---

 Summary: Errors in the mobile gaming GameStats pipeline
 Key: BEAM-4534
 URL: https://issues.apache.org/jira/browse/BEAM-4534
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Affects Versions: 2.5.0
Reporter: Pablo Estrada
Assignee: Ahmet Altay


When running the GameStats pipeline, a bunch of dataflow-related erros seem to 
be occurring. This is not a 2.5.0 release blocker.

---
2018-06-11 14:39:35.725 PDT
Check failed: global_data_producers_.emplace(tag, 
computation->computation_id()) .second side0
Expand all | Collapse all {
 insertId:  "5542278371401226255:829484:0:12190"  
 jsonPayload: {
  line:  "environment.cc:60"   
  message:  "Check failed: global_data_producers_.emplace(tag, 
computation->computation_id()) .second side0"   
  thread:  "743"   
 }
 labels: {
  compute.googleapis.com/resource_id:  "5542278371401226255"   
  compute.googleapis.com/resource_name:  
"beamapp-boyuanz-061121170-06111417-y1cw-harness-sqw8"   
  compute.googleapis.com/resource_type:  "instance"   
  dataflow.googleapis.com/job_id:  "2018-06-11_14_17_05-8449944902450710343"   
  dataflow.googleapis.com/job_name:  "beamapp-boyuanz-0611211703-960649"   
  dataflow.googleapis.com/region:  "us-central1"   
 }
 logName:  
"projects/google.com:clouddfe/logs/dataflow.googleapis.com%2Fshuffler"  
 receiveTimestamp:  "2018-06-11T21:40:04.215594890Z"  
 resource: {
  labels: {
   job_id:  "2018-06-11_14_17_05-8449944902450710343"
   job_name:  "beamapp-boyuanz-0611211703-960649"
   project_id:  "google.com:clouddfe"
   region:  "us-central1"
   step_id:  ""
  }
  type:  "dataflow_step"   
 }
 severity:  "CRITICAL"  
 timestamp:  "2018-06-11T21:39:35.725620Z"  
}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110914
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:29
Start Date: 12/Jun/18 00:29
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5594: [BEAM-4481, 
BEAM-4484] Start vendoring portability dependencies to not have dependency 
conflicts
URL: https://github.com/apache/beam/pull/5594#issuecomment-396427930
 
 
   @kennknowles It isn't a separate package yet, its just being shaded into the 
model jars since one of the issues that I didn't figure out was how to get the 
generated code to use vendored classes.
   
   As a side note, I was hoping that you would use this as an example in the PR 
that you created that vendored guava and I would be able to re-use what you 
created.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110914)
Time Spent: 3h 20m  (was: 3h 10m)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4388) Support optimized logical plan

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4388?focusedWorklogId=110910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110910
 ]

ASF GitHub Bot logged work on BEAM-4388:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:26
Start Date: 12/Jun/18 00:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #5481: [BEAM-4388] Support 
optimized logical plan
URL: https://github.com/apache/beam/pull/5481#issuecomment-396427579
 
 
   That is correct, `BeamQueryPlanner.java` no longer matches the JDBC version. 
(There is basically no change you can make to that file without also changing 
code in calcite core.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110910)
Time Spent: 4.5h  (was: 4h 20m)

> Support optimized logical plan
> --
>
> Key: BEAM-4388
> URL: https://issues.apache.org/jira/browse/BEAM-4388
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Before converting into Beam Pipeline physical plan, logical plan should be 
> optimized and it will be super helpful for efficiently executing Beam 
> PTransforms pipeline. 
> Calcite has two ways for optimizing logical plan (HepPlanner and 
> VolcanoPlanner). We can support VolcanoPlanner first and apply calcite 
> builtin optimize rules (like 
> FilterJoinRule.FILTER_ON_JOIN) to sql query optimize plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110908
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:25
Start Date: 12/Jun/18 00:25
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5594: 
[BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not have 
dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194585437
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactServiceStager.java
 ##
 @@ -57,6 +54,9 @@
 import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceStub;
 import org.apache.beam.sdk.util.MoreFutures;
 import org.apache.beam.sdk.util.ThrowingSupplier;
+import org.apache.beam.vendor.grpc.v1.io.grpc.Channel;
 
 Review comment:
   Having sorted order is easiest and supported by the most amount of tooling 
so deviating makes those peoples lives worse.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110908)
Time Spent: 3h 10m  (was: 3h)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PerformanceTests_XmlIOIT #380

2018-06-11 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-4474] Ensure unbounded Dataflow jobs are submitted as streaming

[kedin] [SQL] Manually read rows in TestBigQuery

[robertwb] Use ByteCountingOutputStream for estimate_size.

[kedin] Add polling assertion support to TestBigQuery

[kedin] [SQL] Add Pubsub to BigQuery E2E integration test

[kedin] [SQL] Delete BeamSql

[kedin] [SQL] Rename QueryTransform to SqlTransform

[wcn] Encode position ids.

[robertwb] Explain why module namespace is modified in Cython codepath.

--
[...truncated 231.34 KB...]
INFO: 2018-06-12T00:05:49.967Z: Fusing consumer Calculate 
hashcode/Combine.perKey(Hashing)/Combine.GroupedValues into Calculate 
hashcode/Combine.perKey(Hashing)/GroupByKey/Read
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.002Z: Fusing consumer Write xml 
files/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Values/Values/Map 
into Write xml 
files/WriteFiles/GatherTempFileResults/Reshuffle.ViaRandomKey/Reshuffle/ExpandIterable
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.034Z: Fusing consumer Find 
files/Reshuffle.ViaRandomKey/Reshuffle/ExpandIterable into Find 
files/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.056Z: Fusing consumer Write xml 
files/WriteFiles/GatherTempFileResults/Reshuffle/ExpandIterable into Write xml 
files/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/GroupByWindow
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.083Z: Fusing consumer Write xml 
files/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign
 into Write xml 
files/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Pair with 
random key
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.113Z: Fusing consumer Calculate 
hashcode/Combine.perKey(Hashing)/GroupByKey/Reify into Calculate 
hashcode/Combine.perKey(Hashing)/GroupByKey+Calculate 
hashcode/Combine.perKey(Hashing)/Combine.GroupedValues/Partial
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.149Z: Fusing consumer Read xml 
files/ReadAllViaFileBasedSource/Reshuffle/Reshuffle/GroupByKey/GroupByWindow 
into Read xml 
files/ReadAllViaFileBasedSource/Reshuffle/Reshuffle/GroupByKey/Read
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.178Z: Fusing consumer Calculate 
hashcode/Combine.perKey(Hashing)/GroupByKey+Calculate 
hashcode/Combine.perKey(Hashing)/Combine.GroupedValues/Partial into Calculate 
hashcode/WithKeys/AddKeys/Map
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.212Z: Fusing consumer Find 
files/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Write into Find 
files/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/Reify
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.233Z: Fusing consumer Map xml records to 
strings/Map into Read xml files/ReadAllViaFileBasedSource/Read ranges
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.262Z: Fusing consumer Calculate 
hashcode/WithKeys/AddKeys/Map into Map xml records to strings/Map
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.295Z: Fusing consumer Find 
files/Reshuffle.ViaRandomKey/Reshuffle/Window.Into()/Window.Assign into Find 
files/Reshuffle.ViaRandomKey/Pair with random key
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.323Z: Fusing consumer Write xml 
files/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/ExpandIterable
 into Write xml 
files/WriteFiles/FinalizeTempFileBundles/Reshuffle.ViaRandomKey/Reshuffle/GroupByKey/GroupByWindow
Jun 12, 2018 12:05:58 AM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-12T00:05:50.348Z: Fusing consumer Write xml 
files/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/GroupByWindow into 
Write xml files/WriteFiles/GatherTempFileResults/Reshuffle/GroupByKey/Read
Jun 12, 2018 12:05:58 

[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110907
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:23
Start Date: 12/Jun/18 00:23
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5594: 
[BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not have 
dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194585279
 
 

 ##
 File path: build_rules.gradle
 ##
 @@ -560,6 +560,7 @@ ext.applyJavaNature = {
   shadowJar ({
 classifier = "shaded"
 mergeServiceFiles()
+zip64 true
 
 Review comment:
   It allows jar files to contain 2^16 or more files.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110907)
Time Spent: 3h  (was: 2h 50m)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110906
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:22
Start Date: 12/Jun/18 00:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #5594: 
[BEAM-4481, BEAM-4484] Start vendoring portability dependencies to not have 
dependency conflicts
URL: https://github.com/apache/beam/pull/5594#discussion_r194585154
 
 

 ##
 File path: build_rules.gradle
 ##
 @@ -1253,6 +1254,125 @@ ext.applyGrpcNature = {
 
 
/*/
 
+ext.applyPortabilityNature = {
+  println "applyPortabilityNature with " + (it ? "$it" : "default 
configuration") + " for project $project.name"
+  applyJavaNature(enableFindbugs: false, enableErrorProne: false, 
shadowClosure: {
 
 Review comment:
   These modules that this applies to only have generated code so I don't 
believe error prone will be of use which is why I don't expose it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110906)
Time Spent: 2h 50m  (was: 2h 40m)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Go_GradleBuild #248

2018-06-11 Thread Apache Jenkins Server
See 


Changes:

[herohde] [BEAM-4474] Ensure unbounded Dataflow jobs are submitted as streaming

[kedin] [SQL] Manually read rows in TestBigQuery

[robertwb] Use ByteCountingOutputStream for estimate_size.

[kedin] Add polling assertion support to TestBigQuery

[kedin] [SQL] Add Pubsub to BigQuery E2E integration test

[kedin] [SQL] Delete BeamSql

[kedin] [SQL] Rename QueryTransform to SqlTransform

[wcn] Encode position ids.

[robertwb] Explain why module namespace is modified in Cython codepath.

--
[...truncated 294.26 KB...]
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/bytes GLO}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/bytes GLO}] -> [Out: T -> {2: 
int/int[varintz] GLO}]
3: ParDo [In(Main): int <- {2: int/int[varintz] GLO}] -> [Out: int -> {3: 
int/int[varintz] GLO} Out: int -> {4: int/int[varintz] GLO} Out: int -> {5: 
int/int[varintz] GLO} Out: int -> {6: int/int[varintz] GLO} Out: int -> {7: 
int/int[varintz] GLO}]
2018/06/12 00:17:19 Plan[plan]:
8: Impulse[0]
1: Discard
2: Discard
3: Discard
4: Discard
5: Discard
6: ParDo[beam.partitionFn] Out:[1 2 3 4 5]
7: ParDo[beam.createFn] Out:[6]
--- PASS: TestPartitionFailures (0.00s)
=== RUN   TestPartitionFlattenIdentity
2018/06/12 00:17:19 Pipeline:
2018/06/12 00:17:19 Nodes: {1: []uint8/bytes GLO}
{2: int/int[varintz] GLO}
{3: int/int[varintz] GLO}
{4: int/int[varintz] GLO}
{5: int/int[varintz] GLO}
{6: []uint8/bytes GLO}
{7: int/int[varintz] GLO}
{8: int/int[varintz] GLO}
{9: int/int[varintz] GLO}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/bytes GLO}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/bytes GLO}] -> [Out: T -> {2: 
int/int[varintz] GLO}]
3: ParDo [In(Main): int <- {2: int/int[varintz] GLO}] -> [Out: int -> {3: 
int/int[varintz] GLO} Out: int -> {4: int/int[varintz] GLO}]
4: Flatten [In(Main): int <- {3: int/int[varintz] GLO} In(Main): int <- {4: 
int/int[varintz] GLO}] -> [Out: int -> {5: int/int[varintz] GLO}]
5: Impulse [] -> [Out: []uint8 -> {6: []uint8/bytes GLO}]
6: ParDo [In(Main): []uint8 <- {6: []uint8/bytes GLO} In(Iter): T <- {5: 
int/int[varintz] GLO} In(Iter): T <- {2: int/int[varintz] GLO}] -> [Out: T -> 
{7: int/int[varintz] GLO} Out: T -> {8: int/int[varintz] GLO} Out: T -> {9: 
int/int[varintz] GLO}]
7: ParDo [In(Main): X <- {7: int/int[varintz] GLO}] -> []
8: ParDo [In(Main): X <- {9: int/int[varintz] GLO}] -> []
2018/06/12 00:17:19 Plan[plan]:
12: Impulse[0]
13: Impulse[0]
1: ParDo[passert.failFn] Out:[]
2: Discard
3: ParDo[passert.failFn] Out:[]
4: ParDo[passert.diffFn] Out:[1 2 3]
5: wait[2] Out:4
6: buffer[6]. wait:5 Out:4
7: buffer[7]. wait:5 Out:4
8: Flatten[2]. Out:buffer[6]. wait:5 Out:4
9: ParDo[beam.partitionFn] Out:[8 8]
10: Multiplex. Out:[9 7]
11: ParDo[beam.createFn] Out:[10]
2018/06/12 00:17:19 wait[5] unblocked w/ 1 [false]
2018/06/12 00:17:19 wait[5] done
2018/06/12 00:17:19 Pipeline:
2018/06/12 00:17:19 Nodes: {1: []uint8/bytes GLO}
{2: int/int[varintz] GLO}
{3: int/int[varintz] GLO}
{4: int/int[varintz] GLO}
{5: int/int[varintz] GLO}
{6: []uint8/bytes GLO}
{7: int/int[varintz] GLO}
{8: int/int[varintz] GLO}
{9: int/int[varintz] GLO}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/bytes GLO}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/bytes GLO}] -> [Out: T -> {2: 
int/int[varintz] GLO}]
3: ParDo [In(Main): int <- {2: int/int[varintz] GLO}] -> [Out: int -> {3: 
int/int[varintz] GLO} Out: int -> {4: int/int[varintz] GLO}]
4: Flatten [In(Main): int <- {3: int/int[varintz] GLO} In(Main): int <- {4: 
int/int[varintz] GLO}] -> [Out: int -> {5: int/int[varintz] GLO}]
5: Impulse [] -> [Out: []uint8 -> {6: []uint8/bytes GLO}]
6: ParDo [In(Main): []uint8 <- {6: []uint8/bytes GLO} In(Iter): T <- {5: 
int/int[varintz] GLO} In(Iter): T <- {2: int/int[varintz] GLO}] -> [Out: T -> 
{7: int/int[varintz] GLO} Out: T -> {8: int/int[varintz] GLO} Out: T -> {9: 
int/int[varintz] GLO}]
7: ParDo [In(Main): X <- {7: int/int[varintz] GLO}] -> []
8: ParDo [In(Main): X <- {9: int/int[varintz] GLO}] -> []
2018/06/12 00:17:19 Plan[plan]:
12: Impulse[0]
13: Impulse[0]
1: ParDo[passert.failFn] Out:[]
2: Discard
3: ParDo[passert.failFn] Out:[]
4: ParDo[passert.diffFn] Out:[1 2 3]
5: wait[2] Out:4
6: buffer[6]. wait:5 Out:4
7: buffer[7]. wait:5 Out:4
8: Flatten[2]. Out:buffer[6]. wait:5 Out:4
9: ParDo[beam.partitionFn] Out:[8 8]
10: Multiplex. Out:[9 7]
11: ParDo[beam.createFn] Out:[10]
2018/06/12 00:17:19 wait[5] unblocked w/ 1 [false]
2018/06/12 00:17:19 wait[5] done
2018/06/12 00:17:19 Pipeline:
2018/06/12 00:17:19 Nodes: {1: []uint8/bytes GLO}
{2: int/int[varintz] GLO}
{3: int/int[varintz] GLO}
{4: int/int[varintz] GLO}
{5: int/int[varintz] GLO}
{6: int/int[varintz] GLO}
{7: int/int[varintz] GLO}
{8: int/int[varintz] GLO}
{9: int/int[varintz] GLO}
{10: int/int[varintz] GLO}
{11: []uint8/bytes GLO}
{12: int/int[varintz] GLO}
{13: int/int[varintz] GLO}
{14: int/int[varintz] 

[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110904
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 12/Jun/18 00:02
Start Date: 12/Jun/18 00:02
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #5601: [BEAM-4533] [SQL] 
Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#issuecomment-396423519
 
 
   run go precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110904)
Time Spent: 1h 10m  (was: 1h)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110900
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:57
Start Date: 11/Jun/18 23:57
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396422844
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110900)
Time Spent: 49h  (was: 48h 50m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 49h
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110895
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:47
Start Date: 11/Jun/18 23:47
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396421142
 
 
   Run Dependency Check


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110895)
Time Spent: 48h 50m  (was: 48h 40m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 48h 50m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110894
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:43
Start Date: 11/Jun/18 23:43
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396420521
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110894)
Time Spent: 48h 40m  (was: 48.5h)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 48h 40m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=110893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110893
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:34
Start Date: 11/Jun/18 23:34
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on a change in pull request #5592: 
[BEAM-3773] [SQL] Add support for setting PipelineOptions
URL: https://github.com/apache/beam/pull/5592#discussion_r194578203
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -29,11 +30,17 @@
 /** BeamRelNode to replace a {@code TableScan} node. */
 public class BeamIOSourceRel extends TableScan implements BeamRelNode {
 
-  private BeamSqlTable sqlTable;
+  private final BeamSqlTable sqlTable;
+  private final Map pipelineOptions;
 
 Review comment:
   That's the thread. My though of `pipelineOptions on session` level is, only 
keep `pipelineOptions` in `BeamCalciteSchema`, and expose `BeamCalciteSchema` 
where is needed. 
   I use `BeamSqlCli` on version 2.4.0, it refers to [these 
lines](https://github.com/apilloud/beam/blob/9c1cd62cdf568cdd3ebb87f44771ded298984fde/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlCli.java#L58-L63).
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110893)
Time Spent: 16h 20m  (was: 16h 10m)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=110892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110892
 ]

ASF GitHub Bot logged work on BEAM-4283:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:29
Start Date: 11/Jun/18 23:29
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #5464: [BEAM-4283] Write 
Nexmark execution times to bigquery
URL: https://github.com/apache/beam/pull/5464#issuecomment-396418368
 
 
   @echauchot please let me know if this is ready for another look.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110892)
Time Spent: 7h  (was: 6h 50m)

> Export nexmark execution times to bigQuery
> --
>
> Key: BEAM-4283
> URL: https://issues.apache.org/jira/browse/BEAM-4283
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-nexmark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Nexmark only outputs the results collection to bigQuery and prints in the 
> console the execution times. To supervise Nexmark execution times, we need to 
> store them as well per runner/query/mode



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110889
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:24
Start Date: 11/Jun/18 23:24
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396417556
 
 
   Run Dependency Check


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110889)
Time Spent: 48.5h  (was: 48h 20m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 48.5h
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110888
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 23:21
Start Date: 11/Jun/18 23:21
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396417011
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110888)
Time Spent: 48h 20m  (was: 48h 10m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 48h 20m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110878
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 22:40
Start Date: 11/Jun/18 22:40
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5591: [BEAM-4290] Beam 
File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#issuecomment-396409541
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110878)
Time Spent: 14h 40m  (was: 14.5h)

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for 
> arbitrary filenames and ideally for eventually avoiding uploading artifacts 
> that are already there.
> Handling credentials is TBD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110876
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 22:31
Start Date: 11/Jun/18 22:31
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #5591: [BEAM-4290] Beam 
File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#issuecomment-396407813
 
 
   retest please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110876)
Time Spent: 14.5h  (was: 14h 20m)

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for 
> arbitrary filenames and ideally for eventually avoiding uploading artifacts 
> that are already there.
> Handling credentials is TBD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4271) Executable stages allow side input coders to be set and/or queried

2018-06-11 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde updated BEAM-4271:

Labels: portability  (was: )

> Executable stages allow side input coders to be set and/or queried
> --
>
> Key: BEAM-4271
> URL: https://issues.apache.org/jira/browse/BEAM-4271
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ben Sidhom
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> ProcessBundleDescriptors may contain side input references from inner 
> PTransforms. These side inputs do not have explicit coders; instead, SDK 
> harnesses use the PCollection coders by default.
> Using the default PCollection coder as specified at pipeline construction is 
> in general not the correct thing to do. When PCollection elements are 
> materialized, any coders unknown to a runner a length-prefixed. This means 
> that materialized PCollections do not use their original element coders. Side 
> inputs are delivered to SDKs via MultimapSideInput StateRequests. The 
> responses to these requests are expected to contain all of the values for a 
> given key (and window), coded with the PCollection KV.value coder, 
> concatenated. However, at the time of serving these requests on the runner 
> side, we do not have enough information to reconstruct the original value 
> coders.
> There are different ways to address this issue. For example:
>  * Modify the associated PCollection coder to match the coder that the runner 
> uses to materialize elements. This means that anywhere a given PCollection is 
> used within a given bundle, it will use the runner-safe coder. This may 
> introduce inefficiencies but should be "correct".
>  * Annotate side inputs with explicit coders. This guarantees that the key 
> and value coders used by the runner match the coders used by SDKs. 
> Furthermore, it allows the _runners_ to specify coders. This involves changes 
> to the proto models and all SDKs.
>  * Annotate side input state requests with both key and value coders. This 
> inverts the expected responsibility and has the SDK determine runner coders. 
> Additionally, because runners do not understand all SDK types, additional 
> coder substitution will need to be done at request handling time to make sure 
> that the requested coder can be instantiated and will remain consistent with 
> the SDK coder. This requires only small changes to SDKs because they may opt 
> to use their default PCollection coders.
> All of the these approaches have their own downsides. Explicit side input 
> coders is probably the right thing to do long-term, but the simplest change 
> for now is to modify PCollection coders to match exactly how they're 
> materialized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2855) Implement a python version of the nexmark queries

2018-06-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

María GH reassigned BEAM-2855:
--

Assignee: María GH

> Implement a python version of the nexmark queries
> -
>
> Key: BEAM-2855
> URL: https://issues.apache.org/jira/browse/BEAM-2855
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, testing
>Reporter: Ismaël Mejía
>Assignee: María GH
>Priority: Minor
>  Labels: newbie, nexmark, starter
>
> Currently we have a Java only implementation of Nexmark, a python based 
> implementation would be nice to have to validate the direct and dataflow 
> runners, but also to validate the new support of multiple SDKs in multiple 
> runners via the runner/fn API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110872=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110872
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 22:21
Start Date: 11/Jun/18 22:21
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5601: 
[BEAM-4533] [SQL] Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#discussion_r194565931
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLTest.java
 ##
 @@ -129,6 +129,25 @@ public void testParseCreateTable_withoutLocation() throws 
Exception {
 tableProvider.getTables().get("person"));
   }
 
+  @Test
+  public void testParseCreateTable_minimal() throws Exception {
+TestTableProvider tableProvider = new TestTableProvider();
+BeamSqlEnv env = BeamSqlEnv.withTableProvider(tableProvider);
+
+env.executeDdl("CREATE TABLE person (\n" + "id INT) \n" + "TYPE text \n");
 
 Review comment:
   Thanks for spotting that! Fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110872)
Time Spent: 1h  (was: 50m)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110870=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110870
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 22:20
Start Date: 11/Jun/18 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5601: 
[BEAM-4533] [SQL] Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#discussion_r194565665
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl
 ##
 @@ -160,7 +160,12 @@ SqlCreate SqlCreateTable(Span s, boolean replace) :
  ifNotExists = IfNotExistsOpt()
 id = CompoundIdentifier()
 fieldList = FieldListParens()
- type = StringLiteral()
+
+(
+type = StringLiteral()
+|
+type = SimpleIdentifier()
 
 Review comment:
   If we only keep `SimpleIdentifier()`, this would be a breaking change to the 
synax.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110870)
Time Spent: 40m  (was: 0.5h)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110871
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 22:20
Start Date: 11/Jun/18 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5601: 
[BEAM-4533] [SQL] Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#discussion_r194565665
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl
 ##
 @@ -160,7 +160,12 @@ SqlCreate SqlCreateTable(Span s, boolean replace) :
  ifNotExists = IfNotExistsOpt()
 id = CompoundIdentifier()
 fieldList = FieldListParens()
- type = StringLiteral()
+
+(
+type = StringLiteral()
+|
+type = SimpleIdentifier()
 
 Review comment:
   If we only keep `SimpleIdentifier()`, this would be a breaking change to the 
syntax.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110871)
Time Spent: 50m  (was: 40m)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #738

2018-06-11 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110863
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 22:04
Start Date: 11/Jun/18 22:04
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396402142
 
 
   Run Dependency Check


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110863)
Time Spent: 48h 10m  (was: 48h)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 48h 10m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: beam_PostCommit_Java_GradleBuild #737

2018-06-11 Thread Apache Jenkins Server
See 


Changes:

[kedin] [SQL] Manually read rows in TestBigQuery

[kedin] Add polling assertion support to TestBigQuery

[kedin] [SQL] Add Pubsub to BigQuery E2E integration test

--
[...truncated 17.61 MB...]
INFO: 2018-06-11T21:59:00.971Z: Checking permissions granted to controller 
Service Account.
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:04.477Z: Worker configuration: n1-standard-1 in 
us-central1-b.
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:04.808Z: Expanding CoGroupByKey operations into 
optimizable parts.
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.013Z: Expanding GroupByKey operations into 
optimizable parts.
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.044Z: Lifting ValueCombiningMappingFns into 
MergeBucketsMappingFns
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.248Z: Fusing adjacent ParDo, Read, Write, and 
Flatten operations
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.280Z: Elided trivial flatten 
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.311Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map into SpannerIO.Write/Write 
mutations to Cloud Spanner/Create seed/Read(CreateSource)
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.337Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Read information schema into SpannerIO.Write/Write 
mutations to Cloud Spanner/Wait.OnSignal/Wait/Map
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.371Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Write
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/ParDo(UseWindowHashAsKeyAndWindowAsSortKey)
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.400Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/ParDo(IsmRecordForSingularValuePerWindow) 
into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/BatchViewOverrides.GroupByWindowHashAsKeyAndWindowAsSortKey/BatchViewOverrides.GroupByKeyAndSortValuesOnly/Read
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.435Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/WithKeys/AddKeys/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Read information schema
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.456Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/GroupByKey/Read
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.488Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Values/Values/Map
 into SpannerIO.Write/Write mutations to Cloud Spanner/Schema 
View/Combine.GloballyAsSingletonView/Combine.globally(Singleton)/Combine.perKey(Singleton)/Combine.GroupedValues/Extract
Jun 11, 2018 9:59:09 PM 
org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
INFO: 2018-06-11T21:59:05.512Z: Fusing consumer SpannerIO.Write/Write 
mutations to Cloud Spanner/Schema 

[jira] [Updated] (BEAM-214) Create Parquet IO

2018-06-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-214:
--
Issue Type: New Feature  (was: Improvement)

> Create Parquet IO
> -
>
> Key: BEAM-214
> URL: https://issues.apache.org/jira/browse/BEAM-214
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Neville Li
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
> Fix For: 2.5.0
>
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Would be nice to support Parquet files with projection and predicates.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-214) Create Parquet IO

2018-06-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-214.
---
   Resolution: Fixed
Fix Version/s: 2.5.0

> Create Parquet IO
> -
>
> Key: BEAM-214
> URL: https://issues.apache.org/jira/browse/BEAM-214
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Neville Li
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
> Fix For: 2.5.0
>
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Would be nice to support Parquet files with projection and predicates.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4291) ArtifactRetrievalService that retrieves artifacts from a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=110861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110861
 ]

ASF GitHub Bot logged work on BEAM-4291:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:46
Start Date: 11/Jun/18 21:46
Worklog Time Spent: 10m 
  Work Description: axelmagn commented on a change in pull request #5584: 
[BEAM-4291] Add distributed artifact retrieval
URL: https://github.com/apache/beam/pull/5584#discussion_r194558294
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/DfsArtifactRetrievalService.java
 ##
 @@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.artifact;
+
+
+import com.google.protobuf.ByteString;
+import io.grpc.stub.StreamObserver;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.ReadableByteChannel;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi;
+import org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+
+/**
+ * An {@link ArtifactRetrievalService} that uses distributed file systems as 
its backing storage.
 
 Review comment:
   Ok, after checking out your usage of ProxyManifest, I see what you mean.  I 
agree that it will resolve the concerns @jkff has about decoupling naming from 
storage, and allow the retrieval token to just be the manifest URI.  SGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110861)
Time Spent: 2.5h  (was: 2h 20m)

> ArtifactRetrievalService that retrieves artifacts from a distributed 
> filesystem
> ---
>
> Key: BEAM-4291
> URL: https://issues.apache.org/jira/browse/BEAM-4291
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Axel Magnuson
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=110860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110860
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:45
Start Date: 11/Jun/18 21:45
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5592: 
[BEAM-3773] [SQL] Add support for setting PipelineOptions
URL: https://github.com/apache/beam/pull/5592#discussion_r194557947
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -29,11 +30,17 @@
 /** BeamRelNode to replace a {@code TableScan} node. */
 public class BeamIOSourceRel extends TableScan implements BeamRelNode {
 
-  private BeamSqlTable sqlTable;
+  private final BeamSqlTable sqlTable;
+  private final Map pipelineOptions;
 
 Review comment:
   I remember a discussion about removing `BeamSqlEnv` and replacing it with 
`meta/store` and `CalciteSchema` (which is implemented by `BeamCalciteSchema`), 
is that the one? I'm still not following how the concepts of system and session 
apply here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110860)
Time Spent: 16h 10m  (was: 16h)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4388) Support optimized logical plan

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4388?focusedWorklogId=110859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110859
 ]

ASF GitHub Bot logged work on BEAM-4388:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:44
Start Date: 11/Jun/18 21:44
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5481: [BEAM-4388] 
Support optimized logical plan
URL: https://github.com/apache/beam/pull/5481#issuecomment-396397305
 
 
   R: @apilloud 
   
   I think we may not have resolution on whether JDBC and non-JDBC are sharing 
the rules.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110859)
Time Spent: 4h 20m  (was: 4h 10m)

> Support optimized logical plan
> --
>
> Key: BEAM-4388
> URL: https://issues.apache.org/jira/browse/BEAM-4388
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Before converting into Beam Pipeline physical plan, logical plan should be 
> optimized and it will be super helpful for efficiently executing Beam 
> PTransforms pipeline. 
> Calcite has two ways for optimizing logical plan (HepPlanner and 
> VolcanoPlanner). We can support VolcanoPlanner first and apply calcite 
> builtin optimize rules (like 
> FilterJoinRule.FILTER_ON_JOIN) to sql query optimize plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4291) ArtifactRetrievalService that retrieves artifacts from a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=110857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110857
 ]

ASF GitHub Bot logged work on BEAM-4291:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:38
Start Date: 11/Jun/18 21:38
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5584: 
[BEAM-4291] Add distributed artifact retrieval
URL: https://github.com/apache/beam/pull/5584#discussion_r194556385
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/DfsArtifactRetrievalService.java
 ##
 @@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.artifact;
+
+
+import com.google.protobuf.ByteString;
+import io.grpc.stub.StreamObserver;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.ReadableByteChannel;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi;
+import org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+
+/**
+ * An {@link ArtifactRetrievalService} that uses distributed file systems as 
its backing storage.
 
 Review comment:
   Artifact location can never be assumed to be $token/$artifactName.
   
   The artifacts are saved with a different name than the provided artifact 
name and the mapping of artifact name vs file name is stored in the manifest 
file.
   The ProxyManifest is the proto which describe the format of manifest file. 
Its an optional format but it fits well in our use case.
   
   By providing the base dir name as the token, the retrieval service assumes 
the folder structure (what i mean is it assumes that the manifest will be  
$stagingToken/MANIFEST) As we only need manifest file to get locations for all 
the artifacts, passing the path to manifest file as token will remove implicit 
directory structure and simplify the code.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110857)
Time Spent: 2h 20m  (was: 2h 10m)

> ArtifactRetrievalService that retrieves artifacts from a distributed 
> filesystem
> ---
>
> Key: BEAM-4291
> URL: https://issues.apache.org/jira/browse/BEAM-4291
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Axel Magnuson
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=110855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110855
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:34
Start Date: 11/Jun/18 21:34
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5592: 
[BEAM-3773] [SQL] Add support for setting PipelineOptions
URL: https://github.com/apache/beam/pull/5592#discussion_r194555475
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl
 ##
 @@ -280,4 +285,48 @@ Schema.FieldType SimpleType() :
 }
 }
 
+SqlSetOptionBeam SqlSetOptionBeam(Span s, String scope) :
+{
+SqlIdentifier name;
+final SqlNode val;
+}
+{
+(
+ {
 
 Review comment:
   The JDBC path gives us no way to extend execute. When accessing Beam SQL 
through that path all outputs from the parser must be self executing, which 
`SqlSetOptionBeam` is. Are you already using `SET/RESET` for something?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110855)
Time Spent: 16h  (was: 15h 50m)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4388) Support optimized logical plan

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4388?focusedWorklogId=110854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110854
 ]

ASF GitHub Bot logged work on BEAM-4388:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:34
Start Date: 11/Jun/18 21:34
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #5481: [BEAM-4388] Support 
optimized logical plan
URL: https://github.com/apache/beam/pull/5481#issuecomment-396394592
 
 
   @XuMingmin  I'll update it very quickly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110854)
Time Spent: 4h 10m  (was: 4h)

> Support optimized logical plan
> --
>
> Key: BEAM-4388
> URL: https://issues.apache.org/jira/browse/BEAM-4388
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Before converting into Beam Pipeline physical plan, logical plan should be 
> optimized and it will be super helpful for efficiently executing Beam 
> PTransforms pipeline. 
> Calcite has two ways for optimizing logical plan (HepPlanner and 
> VolcanoPlanner). We can support VolcanoPlanner first and apply calcite 
> builtin optimize rules (like 
> FilterJoinRule.FILTER_ON_JOIN) to sql query optimize plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110853
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:31
Start Date: 11/Jun/18 21:31
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #5601: 
[BEAM-4533] [SQL] Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#discussion_r194554614
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamDDLTest.java
 ##
 @@ -129,6 +129,25 @@ public void testParseCreateTable_withoutLocation() throws 
Exception {
 tableProvider.getTables().get("person"));
   }
 
+  @Test
+  public void testParseCreateTable_minimal() throws Exception {
+TestTableProvider tableProvider = new TestTableProvider();
+BeamSqlEnv env = BeamSqlEnv.withTableProvider(tableProvider);
+
+env.executeDdl("CREATE TABLE person (\n" + "id INT) \n" + "TYPE text \n");
 
 Review comment:
   multiline strings get mangled by spottlessApply


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110853)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4388) Support optimized logical plan

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4388?focusedWorklogId=110851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110851
 ]

ASF GitHub Bot logged work on BEAM-4388:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:31
Start Date: 11/Jun/18 21:31
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on issue #5481: [BEAM-4388] Support 
optimized logical plan
URL: https://github.com/apache/beam/pull/5481#issuecomment-396394035
 
 
   @vectorijk  could you solve the conflict? 
   
   @kennknowles  any pending changes required as there's a RED sign? Let's 
close this PR asap.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110851)
Time Spent: 4h  (was: 3h 50m)

> Support optimized logical plan
> --
>
> Key: BEAM-4388
> URL: https://issues.apache.org/jira/browse/BEAM-4388
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Before converting into Beam Pipeline physical plan, logical plan should be 
> optimized and it will be super helpful for efficiently executing Beam 
> PTransforms pipeline. 
> Calcite has two ways for optimizing logical plan (HepPlanner and 
> VolcanoPlanner). We can support VolcanoPlanner first and apply calcite 
> builtin optimize rules (like 
> FilterJoinRule.FILTER_ON_JOIN) to sql query optimize plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110852
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:31
Start Date: 11/Jun/18 21:31
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #5601: 
[BEAM-4533] [SQL] Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#discussion_r194554328
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl
 ##
 @@ -160,7 +160,12 @@ SqlCreate SqlCreateTable(Span s, boolean replace) :
  ifNotExists = IfNotExistsOpt()
 id = CompoundIdentifier()
 fieldList = FieldListParens()
- type = StringLiteral()
+
+(
+type = StringLiteral()
+|
+type = SimpleIdentifier()
 
 Review comment:
   Should we only keep `SimpleIdentifier()` ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110852)
Time Spent: 0.5h  (was: 20m)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4481) Remove duplicate dependency declarations from runners/direct-java

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4481?focusedWorklogId=110849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110849
 ]

ASF GitHub Bot logged work on BEAM-4481:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:26
Start Date: 11/Jun/18 21:26
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5594: [BEAM-4481, 
BEAM-4484] Start vendoring portability dependencies to not have dependency 
conflicts
URL: https://github.com/apache/beam/pull/5594#issuecomment-396392713
 
 
   Where is the bit where the vendored jar is published?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110849)
Time Spent: 2h 40m  (was: 2.5h)

> Remove duplicate dependency declarations from runners/direct-java
> -
>
> Key: BEAM-4481
> URL: https://issues.apache.org/jira/browse/BEAM-4481
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> beam-model-pipeline and others are duplicated in the dependency list



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110846
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:25
Start Date: 11/Jun/18 21:25
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194551115
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java
 ##
 @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws 
Exception {
 assertFiles(files.keySet(), stagingToken);
   }
 
+  @Test
+  public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception {
+String stagingSession = "123";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken,
+Paths.get(srcDir.toString(), 
fileName).toAbsolutePath().toString(), fileName);
+  } catch (Exception e) {
+Assert.fail(e.getMessage());
+  }
+  
metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build());
+});
+  }
+} finally {
+  executorService.shutdown();
+  executorService.awaitTermination(2, TimeUnit.SECONDS);
+}
+
+String stagingToken = commitManifest(stagingSessionToken, metadata);
+Assert.assertEquals(
+Paths.get(destDir.toAbsolutePath().toString(), stagingSession, 
"MANIFEST").toString(),
+stagingToken);
+assertFiles(files.keySet(), stagingToken);
+  }
+
+  @Test
+  public void putArtifactsMultipleFilesConcurrentSessionsTest() throws 
Exception {
+String stagingSession1 = "123";
+String stagingSession2 = "abc";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken1 = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession1, 
destDir.toUri().getPath());
+String stagingSessionToken2 = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession2, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken1,
 
 Review comment:
   The API does not allow uploading multiple chunks of same file in parallel.
   
   This testcase simulates file uploaded by 2 separate session in parallel. 
   
   I will create 2 sets of files here which should make the session completely 
different.


This is an automated message from the Apache Git Service.
To respond to the message, please log on 

[beam] 01/01: Merge pull request #5590: [SQL] Delete BeamSql class

2018-06-11 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit d906270f243bb4de20a7f0baf514667590c8c494
Merge: 75a619a db69d4a
Author: Kenn Knowles 
AuthorDate: Mon Jun 11 14:19:24 2018 -0700

Merge pull request #5590: [SQL] Delete BeamSql class

 .../apache/beam/sdk/extensions/sql/BeamSql.java| 92 --
 .../sql/{QueryTransform.java => SqlTransform.java} | 81 ---
 .../sdk/extensions/sql/example/BeamSqlExample.java |  8 +-
 .../extensions/sql/example/BeamSqlPojoExample.java |  8 +-
 .../beam/sdk/extensions/sql/impl/BeamSqlEnv.java   |  8 +-
 .../sql/BeamSqlDslAggregationCovarianceTest.java   | 10 ++-
 .../extensions/sql/BeamSqlDslAggregationTest.java  | 27 ---
 .../sql/BeamSqlDslAggregationVarianceTest.java |  8 +-
 .../sdk/extensions/sql/BeamSqlDslArrayTest.java| 25 +++---
 .../sdk/extensions/sql/BeamSqlDslFilterTest.java   | 10 +--
 .../sdk/extensions/sql/BeamSqlDslJoinTest.java | 12 +--
 .../extensions/sql/BeamSqlDslNestedRowsTest.java   | 10 +--
 .../sdk/extensions/sql/BeamSqlDslProjectTest.java  | 12 +--
 .../sdk/extensions/sql/BeamSqlDslUdfUdafTest.java  | 17 ++--
 .../beam/sdk/extensions/sql/BeamSqlMapTest.java|  8 +-
 .../sdk/extensions/sql/BeamSqlNonAsciiTest.java|  4 +-
 .../extensions/sql/InferredRowCoderSqlTest.java|  8 +-
 .../beam/sdk/extensions/sql/JsonToRowSqlTest.java  |  2 +-
 ...BeamSqlBuiltinFunctionsIntegrationTestBase.java |  4 +-
 .../BeamSqlDateFunctionsIntegrationTest.java   |  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery0.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery1.java| 10 ++-
 .../beam/sdk/nexmark/queries/sql/SqlQuery2.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery3.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery5.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery7.java|  4 +-
 26 files changed, 180 insertions(+), 208 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[beam] branch master updated (75a619a -> d906270)

2018-06-11 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 75a619a  Merge pull request #5589: [SQL] Pubsub to BigQuery e2e 
integration test
 add 1cfa0f9  [SQL] Delete BeamSql
 add db69d4a  [SQL] Rename QueryTransform to SqlTransform
 new d906270  Merge pull request #5590: [SQL] Delete BeamSql class

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../apache/beam/sdk/extensions/sql/BeamSql.java| 92 --
 .../sql/{QueryTransform.java => SqlTransform.java} | 81 ---
 .../sdk/extensions/sql/example/BeamSqlExample.java |  8 +-
 .../extensions/sql/example/BeamSqlPojoExample.java |  8 +-
 .../beam/sdk/extensions/sql/impl/BeamSqlEnv.java   |  8 +-
 .../sql/BeamSqlDslAggregationCovarianceTest.java   | 10 ++-
 .../extensions/sql/BeamSqlDslAggregationTest.java  | 27 ---
 .../sql/BeamSqlDslAggregationVarianceTest.java |  8 +-
 .../sdk/extensions/sql/BeamSqlDslArrayTest.java| 25 +++---
 .../sdk/extensions/sql/BeamSqlDslFilterTest.java   | 10 +--
 .../sdk/extensions/sql/BeamSqlDslJoinTest.java | 12 +--
 .../extensions/sql/BeamSqlDslNestedRowsTest.java   | 10 +--
 .../sdk/extensions/sql/BeamSqlDslProjectTest.java  | 12 +--
 .../sdk/extensions/sql/BeamSqlDslUdfUdafTest.java  | 17 ++--
 .../beam/sdk/extensions/sql/BeamSqlMapTest.java|  8 +-
 .../sdk/extensions/sql/BeamSqlNonAsciiTest.java|  4 +-
 .../extensions/sql/InferredRowCoderSqlTest.java|  8 +-
 .../beam/sdk/extensions/sql/JsonToRowSqlTest.java  |  2 +-
 ...BeamSqlBuiltinFunctionsIntegrationTestBase.java |  4 +-
 .../BeamSqlDateFunctionsIntegrationTest.java   |  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery0.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery1.java| 10 ++-
 .../beam/sdk/nexmark/queries/sql/SqlQuery2.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery3.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery5.java|  4 +-
 .../beam/sdk/nexmark/queries/sql/SqlQuery7.java|  4 +-
 26 files changed, 180 insertions(+), 208 deletions(-)
 delete mode 100644 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSql.java
 rename 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/{QueryTransform.java
 => SqlTransform.java} (61%)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110843
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:15
Start Date: 11/Jun/18 21:15
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194550351
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java
 ##
 @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws 
Exception {
 assertFiles(files.keySet(), stagingToken);
   }
 
+  @Test
+  public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception {
+String stagingSession = "123";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken,
+Paths.get(srcDir.toString(), 
fileName).toAbsolutePath().toString(), fileName);
+  } catch (Exception e) {
+Assert.fail(e.getMessage());
+  }
+  
metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build());
+});
+  }
+} finally {
+  executorService.shutdown();
+  executorService.awaitTermination(2, TimeUnit.SECONDS);
+}
+
+String stagingToken = commitManifest(stagingSessionToken, metadata);
+Assert.assertEquals(
+Paths.get(destDir.toAbsolutePath().toString(), stagingSession, 
"MANIFEST").toString(),
+stagingToken);
+assertFiles(files.keySet(), stagingToken);
+  }
+
+  @Test
+  public void putArtifactsMultipleFilesConcurrentSessionsTest() throws 
Exception {
+String stagingSession1 = "123";
+String stagingSession2 = "abc";
+Map files = new HashMap<>();
 
 Review comment:
   ImmutableMap.of(k,v) is only applicable for 5 k-v while we are having more 
KVs.
   Using builders to create the immutable map.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110843)
Time Spent: 14h 10m  (was: 14h)

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for 
> arbitrary filenames and ideally for eventually avoiding uploading artifacts 
> that are already there.
> Handling credentials is TBD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=110842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110842
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:13
Start Date: 11/Jun/18 21:13
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #5592: 
[BEAM-3773] [SQL] Add support for setting PipelineOptions
URL: https://github.com/apache/beam/pull/5592#discussion_r194549906
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl
 ##
 @@ -160,7 +160,12 @@ SqlCreate SqlCreateTable(Span s, boolean replace) :
  ifNotExists = IfNotExistsOpt()
 id = CompoundIdentifier()
 fieldList = FieldListParens()
- type = StringLiteral()
+
 
 Review comment:
   Ok, dropping this and adding it to #5601 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110842)
Time Spent: 15h 50m  (was: 15h 40m)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110841=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110841
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:12
Start Date: 11/Jun/18 21:12
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #5601: [BEAM-4533] [SQL] 
Support unquoted table types
URL: https://github.com/apache/beam/pull/5601#issuecomment-396388861
 
 
   R: @XuMingmin 
   CC: @kennknowles @xumingming @akedin 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110841)
Time Spent: 20m  (was: 10m)

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4533?focusedWorklogId=110840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110840
 ]

ASF GitHub Bot logged work on BEAM-4533:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:11
Start Date: 11/Jun/18 21:11
Worklog Time Spent: 10m 
  Work Description: apilloud opened a new pull request #5601: [BEAM-4533] 
[SQL] Support unquoted table types
URL: https://github.com/apache/beam/pull/5601
 
 
   For example `TYPE text` in addition to `TYPE 'text'`
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110840)
Time Spent: 10m
Remaining Estimate: 0h

> Beam SQL should support unquoted types
> --
>
> Key: BEAM-4533
> URL: https://issues.apache.org/jira/browse/BEAM-4533
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (57300e2 -> 75a619a)

2018-06-11 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 57300e2  Merge pull request #5509.
 add 698e285  [SQL] Manually read rows in TestBigQuery
 add 21d6762  Add polling assertion support to TestBigQuery
 add 7d2b34a  [SQL] Add Pubsub to BigQuery E2E integration test
 new 75a619a  Merge pull request #5589: [SQL] Pubsub to BigQuery e2e 
integration test

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../sdk/extensions/sql/BeamSqlCliPubsubTest.java   |  54 -
 .../sdk/extensions/sql/PubsubToBigqueryIT.java | 134 
 .../meta/provider/bigquery/BigQueryWriteIT.java|  59 --
 .../beam/sdk/io/gcp/bigquery/BigQueryUtils.java|  99 ++---
 .../beam/sdk/io/gcp/bigquery/TestBigQuery.java | 227 +++--
 5 files changed, 415 insertions(+), 158 deletions(-)
 delete mode 100644 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliPubsubTest.java
 create mode 100644 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/PubsubToBigqueryIT.java

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[beam] 01/01: Merge pull request #5589: [SQL] Pubsub to BigQuery e2e integration test

2018-06-11 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 75a619aebab467ae9a358d3c1f8d8899390ac3a6
Merge: 57300e2 7d2b34a
Author: Kenn Knowles 
AuthorDate: Mon Jun 11 14:11:15 2018 -0700

Merge pull request #5589: [SQL] Pubsub to BigQuery e2e integration test

 .../sdk/extensions/sql/BeamSqlCliPubsubTest.java   |  54 -
 .../sdk/extensions/sql/PubsubToBigqueryIT.java | 134 
 .../meta/provider/bigquery/BigQueryWriteIT.java|  59 --
 .../beam/sdk/io/gcp/bigquery/BigQueryUtils.java|  99 ++---
 .../beam/sdk/io/gcp/bigquery/TestBigQuery.java | 227 +++--
 5 files changed, 415 insertions(+), 158 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Created] (BEAM-4533) Beam SQL should support unquoted types

2018-06-11 Thread Andrew Pilloud (JIRA)
Andrew Pilloud created BEAM-4533:


 Summary: Beam SQL should support unquoted types
 Key: BEAM-4533
 URL: https://issues.apache.org/jira/browse/BEAM-4533
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Andrew Pilloud
Assignee: Andrew Pilloud


TYPE text in addition to TYPE 'text'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4291) ArtifactRetrievalService that retrieves artifacts from a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=110838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110838
 ]

ASF GitHub Bot logged work on BEAM-4291:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:06
Start Date: 11/Jun/18 21:06
Worklog Time Spent: 10m 
  Work Description: axelmagn commented on a change in pull request #5584: 
[BEAM-4291] Add distributed artifact retrieval
URL: https://github.com/apache/beam/pull/5584#discussion_r194547955
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/DfsArtifactRetrievalService.java
 ##
 @@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.artifact;
+
+
+import com.google.protobuf.ByteString;
+import io.grpc.stub.StreamObserver;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.channels.Channels;
+import java.nio.channels.ReadableByteChannel;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi;
+import org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+
+/**
+ * An {@link ArtifactRetrievalService} that uses distributed file systems as 
its backing storage.
 
 Review comment:
   @angoenka I think I disagree.  If we do that, then it complicates the 
implementation of artifact retrieval because the storage location is no longer 
`$token/$artifactName`.  I'm not sure the gains from disambiguating manifest 
location justify the added complexity.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110838)
Time Spent: 2h 10m  (was: 2h)

> ArtifactRetrievalService that retrieves artifacts from a distributed 
> filesystem
> ---
>
> Key: BEAM-4291
> URL: https://issues.apache.org/jira/browse/BEAM-4291
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Axel Magnuson
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110836
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194535246
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110830
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194525812
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110835
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194535202
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110832
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194527254
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110833
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194547405
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
 
 Review comment:
   sure.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110833)
Time Spent: 13h 40m  (was: 13.5h)

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for 
> arbitrary filenames and ideally for eventually avoiding uploading artifacts 
> that are already there.
> Handling credentials is TBD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110834
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194536154
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110837
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194534830
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110829
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194526541
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110831
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:04
Start Date: 11/Jun/18 21:04
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194531422
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110828
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:03
Start Date: 11/Jun/18 21:03
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194545548
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java
 ##
 @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws 
Exception {
 assertFiles(files.keySet(), stagingToken);
   }
 
+  @Test
+  public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception {
+String stagingSession = "123";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken,
+Paths.get(srcDir.toString(), 
fileName).toAbsolutePath().toString(), fileName);
+  } catch (Exception e) {
+Assert.fail(e.getMessage());
+  }
+  
metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build());
+});
+  }
+} finally {
+  executorService.shutdown();
+  executorService.awaitTermination(2, TimeUnit.SECONDS);
+}
+
+String stagingToken = commitManifest(stagingSessionToken, metadata);
+Assert.assertEquals(
+Paths.get(destDir.toAbsolutePath().toString(), stagingSession, 
"MANIFEST").toString(),
+stagingToken);
+assertFiles(files.keySet(), stagingToken);
+  }
+
+  @Test
+  public void putArtifactsMultipleFilesConcurrentSessionsTest() throws 
Exception {
+String stagingSession1 = "123";
+String stagingSession2 = "abc";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken1 = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession1, 
destDir.toUri().getPath());
+String stagingSessionToken2 = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession2, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken1,
 
 Review comment:
   There should be at least *some* difference in what is being placed to 
actually verify there is no cross-staging interference. Granted, there's also 
no need to place a huge number of multi-chunk files here; a single file for one 
and two for the other would be perfectly fine. 


This is an automated message from the Apache Git Service.
To respond to 

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110825
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:02
Start Date: 11/Jun/18 21:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194546744
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java
 ##
 @@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.artifact;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.hash.Hashing;
+import com.google.protobuf.util.JsonFormat;
+import io.grpc.stub.StreamObserver;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ArtifactMetadata;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestRequest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.CommitManifestResponse;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactApi.ProxyManifest.Location;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactMetadata;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactRequest;
+import org.apache.beam.model.jobmanagement.v1.ArtifactApi.PutArtifactResponse;
+import 
org.apache.beam.model.jobmanagement.v1.ArtifactStagingServiceGrpc.ArtifactStagingServiceImplBase;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.MoveOptions.StandardMoveOptions;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.util.MimeTypes;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link ArtifactStagingServiceImplBase} based on beam file system.
+ */
+public class BeamFileSystemArtifactStagingService extends 
ArtifactStagingServiceImplBase implements
+FnService {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(BeamFileSystemArtifactStagingService.class);
+  private static final ObjectMapper MAPPER = new ObjectMapper();
+  // Use UTF8 for all text encoding.
+  private static final Charset CHARSET = StandardCharsets.UTF_8;
+  public static final String MANIFEST = "MANIFEST";
+
+  @Override
+  public StreamObserver putArtifact(
+  StreamObserver responseObserver) {
+return new PutArtifactStreamObserver(responseObserver);
+  }
+
+  @Override
+  public void commitManifest(
+  CommitManifestRequest request, StreamObserver 
responseObserver) {
+try {
+  ResourceId manifestResourceId = 
getManifestFileResourceId(request.getStagingSessionToken());
+  ResourceId artifactDirResourceId = 
getArtifactDirResourceId(request.getStagingSessionToken());
+  ProxyManifest.Builder proxyManifestBuilder = ProxyManifest.newBuilder()
+  .setManifest(request.getManifest());
+  for (ArtifactMetadata artifactMetadata : 
request.getManifest().getArtifactList()) {
+proxyManifestBuilder.addLocation(Location.newBuilder()
+.setName(artifactMetadata.getName())
+.setUri(artifactDirResourceId
+.resolve(encodedFileName(artifactMetadata), 
StandardResolveOptions.RESOLVE_FILE)
+.toString()).build());
+  }
+  try (WritableByteChannel manifestWritableByteChannel = FileSystems
+  .create(manifestResourceId, MimeTypes.TEXT)) {
+

[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110827=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110827
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:02
Start Date: 11/Jun/18 21:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194544535
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java
 ##
 @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws 
Exception {
 assertFiles(files.keySet(), stagingToken);
   }
 
+  @Test
+  public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception {
+String stagingSession = "123";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken,
+Paths.get(srcDir.toString(), 
fileName).toAbsolutePath().toString(), fileName);
+  } catch (Exception e) {
+Assert.fail(e.getMessage());
+  }
+  
metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build());
+});
+  }
+} finally {
+  executorService.shutdown();
+  executorService.awaitTermination(2, TimeUnit.SECONDS);
+}
+
+String stagingToken = commitManifest(stagingSessionToken, metadata);
+Assert.assertEquals(
+Paths.get(destDir.toAbsolutePath().toString(), stagingSession, 
"MANIFEST").toString(),
+stagingToken);
+assertFiles(files.keySet(), stagingToken);
+  }
+
+  @Test
+  public void putArtifactsMultipleFilesConcurrentSessionsTest() throws 
Exception {
+String stagingSession1 = "123";
+String stagingSession2 = "abc";
+Map files = new HashMap<>();
 
 Review comment:
   Nit (here and above): I prefer using ImmutableMaps for constants like this. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110827)
Time Spent: 12h 50m  (was: 12h 40m)

> ArtifactStagingService that stages to a distributed filesystem
> --
>
> Key: BEAM-4290
> URL: https://issues.apache.org/jira/browse/BEAM-4290
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Using the job's staging directory from PipelineOptions.
> Physical layout on the distributed filesystem is TBD but it should allow for 
> arbitrary filenames and ideally for eventually avoiding uploading artifacts 
> that are already there.
> Handling credentials is TBD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4290) ArtifactStagingService that stages to a distributed filesystem

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4290?focusedWorklogId=110826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110826
 ]

ASF GitHub Bot logged work on BEAM-4290:


Author: ASF GitHub Bot
Created on: 11/Jun/18 21:02
Start Date: 11/Jun/18 21:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5591: 
[BEAM-4290] Beam File System based ArtifactStagingService
URL: https://github.com/apache/beam/pull/5591#discussion_r194545548
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingServiceTest.java
 ##
 @@ -251,17 +258,129 @@ public void putArtifactsMultipleFilesTest() throws 
Exception {
 assertFiles(files.keySet(), stagingToken);
   }
 
+  @Test
+  public void putArtifactsMultipleFilesConcurrentlyTest() throws Exception {
+String stagingSession = "123";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken,
+Paths.get(srcDir.toString(), 
fileName).toAbsolutePath().toString(), fileName);
+  } catch (Exception e) {
+Assert.fail(e.getMessage());
+  }
+  
metadata.add(ArtifactMetadata.newBuilder().setName(fileName).build());
+});
+  }
+} finally {
+  executorService.shutdown();
+  executorService.awaitTermination(2, TimeUnit.SECONDS);
+}
+
+String stagingToken = commitManifest(stagingSessionToken, metadata);
+Assert.assertEquals(
+Paths.get(destDir.toAbsolutePath().toString(), stagingSession, 
"MANIFEST").toString(),
+stagingToken);
+assertFiles(files.keySet(), stagingToken);
+  }
+
+  @Test
+  public void putArtifactsMultipleFilesConcurrentSessionsTest() throws 
Exception {
+String stagingSession1 = "123";
+String stagingSession2 = "abc";
+Map files = new HashMap<>();
+files.put("file5cb", (DATA_1KB / 2) /*500b*/);
+files.put("file1kb", DATA_1KB /*1 kb*/);
+files.put("file15cb", (DATA_1KB * 3) / 2  /*1.5 kb*/);
+files.put("nested/file1kb", DATA_1KB /*1 kb*/);
+files.put("file10kb", 10 * DATA_1KB /*10 kb*/);
+files.put("file100kb", 100 * DATA_1KB /*100 kb*/);
+
+final String text = "abcdefghinklmop\n";
+files.forEach((fileName, size) -> {
+  Path filePath = Paths.get(srcDir.toString(), fileName).toAbsolutePath();
+  try {
+Files.createDirectories(filePath.getParent());
+Files.write(filePath,
+Strings.repeat(text, Double.valueOf(Math.ceil(size * 1.0 / 
text.length())).intValue())
+.getBytes(CHARSET));
+  } catch (IOException ignored) {
+  }
+});
+String stagingSessionToken1 = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession1, 
destDir.toUri().getPath());
+String stagingSessionToken2 = BeamFileSystemArtifactStagingService
+.generateStagingSessionToken(stagingSession2, 
destDir.toUri().getPath());
+
+List metadata = new ArrayList<>();
+ExecutorService executorService = Executors.newFixedThreadPool(8);
+try {
+  for (String fileName : files.keySet()) {
+executorService.execute(() -> {
+  try {
+putArtifact(stagingSessionToken1,
 
 Review comment:
   There should be at least *some* difference in what is being placed to 
actually verify there is no cross-staging interference. Granted, there's also 
no need to place a huge number of files here, a single file for one and two for 
the other would be perfectly fine. 


This is an automated message from the Apache Git Service.
To respond to the message, 

[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=110821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110821
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 11/Jun/18 20:54
Start Date: 11/Jun/18 20:54
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on a change in pull request #5592: 
[BEAM-3773] [SQL] Add support for setting PipelineOptions
URL: https://github.com/apache/beam/pull/5592#discussion_r194544724
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -29,11 +30,17 @@
 /** BeamRelNode to replace a {@code TableScan} node. */
 public class BeamIOSourceRel extends TableScan implements BeamRelNode {
 
-  private BeamSqlTable sqlTable;
+  private final BeamSqlTable sqlTable;
+  private final Map pipelineOptions;
 
 Review comment:
   There was a discussion before to expose `BeamSqlEnv` in `BeamRelNode` or 
`BeamRelNode.toPTransform()`. This may be an option to have SET in session 
level. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110821)
Time Spent: 15h 40m  (was: 15.5h)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=110820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110820
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 11/Jun/18 20:50
Start Date: 11/Jun/18 20:50
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on a change in pull request #5592: 
[BEAM-3773] [SQL] Add support for setting PipelineOptions
URL: https://github.com/apache/beam/pull/5592#discussion_r194543480
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl
 ##
 @@ -280,4 +285,48 @@ Schema.FieldType SimpleType() :
 }
 }
 
+SqlSetOptionBeam SqlSetOptionBeam(Span s, String scope) :
+{
+SqlIdentifier name;
+final SqlNode val;
+}
+{
+(
+ {
 
 Review comment:
   I suppose this is about CLI part, and `BeamSqlCli` is the main point. --Pls 
correct me if any mis-understand since I see the test code uses `JdbcDriver`. 
In the prior way, you could extend  `execute` method to support `SqlSetOption` 
directly.
   
   Ps, my CLI(a bash script which invokes BeamSqlCli) looks like:
   ```
   CLI= BeamSqlCli.new
   loop:
 EXPLAIN ...
 | (INSERT... ) SELECT ...
 | CREATE TABLE ...
 | SET/UNSET ...
 | EXIT
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110820)
Time Spent: 15.5h  (was: 15h 20m)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110818
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 20:46
Start Date: 11/Jun/18 20:46
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396381647
 
 
   Run Dependency Check


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110818)
Time Spent: 48h  (was: 47h 50m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4302) Fix to dependency hell

2018-06-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4302?focusedWorklogId=110815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110815
 ]

ASF GitHub Bot logged work on BEAM-4302:


Author: ASF GitHub Bot
Created on: 11/Jun/18 20:41
Start Date: 11/Jun/18 20:41
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #5406: Do Not Merge, 
[BEAM-4302] add beam dependency checks
URL: https://github.com/apache/beam/pull/5406#issuecomment-396380041
 
 
   Run Seed Job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 110815)
Time Spent: 47h 50m  (was: 47h 40m)

> Fix to dependency hell
> --
>
> Key: BEAM-4302
> URL: https://issues.apache.org/jira/browse/BEAM-4302
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 47h 50m
>  Remaining Estimate: 0h
>
> # For Java, a daily Jenkins test to compare version of all Beam dependencies 
> to the latest version available in Maven Central.
>  # For Python, a daily Jenkins test to compare versions of all Beam 
> dependencies to the latest version available in PyPI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   >