[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139946
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 31/Aug/18 04:52
Start Date: 31/Aug/18 04:52
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #6309: [BEAM-5274][SQL] 
Check if iterator.hasNext
URL: https://github.com/apache/beam/pull/6309#issuecomment-417551085
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139946)
Time Spent: 1.5h  (was: 1h 20m)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5264) Reference DirectRunner implementation of Python user state and timers API

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5264?focusedWorklogId=139933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139933
 ]

ASF GitHub Bot logged work on BEAM-5264:


Author: ASF GitHub Bot
Created on: 31/Aug/18 01:42
Start Date: 31/Aug/18 01:42
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #6304: [BEAM-5264] 
Reference DirectRunner implementation of Python User State and Timers API
URL: https://github.com/apache/beam/pull/6304#issuecomment-417522850
 
 
   run python postcommit




Issue Time Tracking
---

Worklog Id: (was: 139933)
Time Spent: 20m  (was: 10m)

> Reference DirectRunner implementation of Python user state and timers API
> -
>
> Key: BEAM-5264
> URL: https://issues.apache.org/jira/browse/BEAM-5264
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue tracks the reference DirectRunner implementation of the Beam 
> Python User State and Timer API, described here: 
> [https://s.apache.org/beam-python-user-state-and-timers].





[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139928
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 31/Aug/18 00:38
Start Date: 31/Aug/18 00:38
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6282: [BEAM-4495] Website 
pre-commit job
URL: https://github.com/apache/beam/pull/6282#issuecomment-417512291
 
 
   @pabloem I'll try to "run seed job" and verify that the precommit runs on 
Jenkins. I'll try that tomorrow since builds.apache.org is still down.




Issue Time Tracking
---

Worklog Id: (was: 139928)
Time Spent: 7.5h  (was: 7h 20m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139929
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 31/Aug/18 00:38
Start Date: 31/Aug/18 00:38
Worklog Time Spent: 10m 
  Work Description: udim edited a comment on issue #6282: [BEAM-4495] 
Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#issuecomment-417512291
 
 
   @pabloem I'll try to "run seed job" and verify that the precommit runs on 
Jenkins. I'll try that tomorrow since builds.apache.org is still down.
   




Issue Time Tracking
---

Worklog Id: (was: 139929)
Time Spent: 7h 40m  (was: 7.5h)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>






[beam] 01/01: Merge pull request #6312: Add min_cpu_platform pipeline option

2018-08-30 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 8030204647ff2e13c8465ecca0f96ca82ab7c761
Merge: 568c96a c75dd4a
Author: Chamikara Jayalath 
AuthorDate: Thu Aug 30 17:03:22 2018 -0700

Merge pull request #6312: Add min_cpu_platform pipeline option

 sdks/python/apache_beam/options/pipeline_options.py |  6 ++
 sdks/python/apache_beam/runners/dataflow/dataflow_runner.py | 11 +++
 2 files changed, 17 insertions(+)




[beam] branch master updated (568c96a -> 8030204)

2018-08-30 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 568c96a  Config Gradle javadoc with UTF-8 encoding
 add c75dd4a  Add min_cpu_platform pipeline option
 new 8030204  Merge pull request #6312: Add min_cpu_platform pipeline option

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/options/pipeline_options.py |  6 ++
 sdks/python/apache_beam/runners/dataflow/dataflow_runner.py | 11 +++
 2 files changed, 17 insertions(+)



[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139924
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 23:58
Start Date: 30/Aug/18 23:58
Worklog Time Spent: 10m 
  Work Description: akedin commented on issue #6309: [BEAM-5274][SQL] Check 
if iterator.hasNext
URL: https://github.com/apache/beam/pull/6309#issuecomment-417505481
 
 
   @apilloud I assume you've looked at the updated PR. 
   
   > LGTM, but there is an implied // this should never happen here. 
   
   I don't think `.hasNext()` implies that. "`.hasNext()` + `.next()`" is the 
contract of the iterator interface, and it's good practice to guard `.next()` 
with `.hasNext()`. It's similar to checking `if (counters.size() > 0)` before 
`counters.get(0)`, which you probably always want to do unless there's a clear 
guarantee of non-emptiness. There's no such guarantee here, so I think it's 
correct to handle the empty case. I don't know whether a lack of metrics always 
means that `count = 0`, but it seems reasonable to me to assume so.
   
   I agree that it's unclear from just reading this line of code what the 
behavior of metrics is expected to be for an empty pipeline, or whether there 
are other cases when metrics can be empty. In my opinion, though, the ambiguity 
comes from the fact that an iterator was picked as the interface for extracting 
metrics, not from how we handle it.
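   The guard described above can be sketched in isolation. This is a minimal 
illustration using plain Java collections rather than Beam's metrics classes; 
`GuardedFirst`, `firstOrZero`, and the `Iterable<Long>` input are hypothetical 
names chosen for the example, and defaulting to 0 reflects the assumption 
discussed in the comment, not confirmed runner behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

class GuardedFirst {
  // Returns the first value, or 0 when the iterable is empty.
  // The 0 default is an assumption: "no counter reported" is read as "no rows".
  static long firstOrZero(Iterable<Long> counters) {
    Iterator<Long> it = counters.iterator();
    return it.hasNext() ? it.next() : 0L;
  }

  public static void main(String[] args) {
    System.out.println(firstOrZero(Arrays.asList(42L, 7L)));
    System.out.println(firstOrZero(Collections.<Long>emptyList()));
  }
}
```

   The empty case is handled as an ordinary branch, so no exception has to be 
raised or caught to reach the default.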
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 139924)
Time Spent: 1h 20m  (was: 1h 10m)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[beam] branch release-2.8.0-lyft deleted (was ff645a2)

2018-08-30 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a change to branch release-2.8.0-lyft
in repository https://gitbox.apache.org/repos/asf/beam.git.


 was ff645a2  [LYFT] Build support for 2.8 branch.

This change permanently discards the following revisions:

 discard ff645a2  [LYFT] Build support for 2.8 branch.



[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139921
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 23:20
Start Date: 30/Aug/18 23:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6309: [BEAM-5274][SQL] 
Check if iterator.hasNext
URL: https://github.com/apache/beam/pull/6309#issuecomment-417498570
 
 
   LGTM, but there is an implied `// this should never happen` here. You found 
this in the direct runner; I wonder what other runners do with regard to metrics 
when you run a pipeline with no elements. (Could this be a bug in the direct 
runner?)




Issue Time Tracking
---

Worklog Id: (was: 139921)
Time Spent: 1h 10m  (was: 1h)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (BEAM-5258) Investigate if we can disable Row type flattening in Calcite

2018-08-30 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-5258:
--

Assignee: (was: Rui Wang)

> Investigate if we can disable Row type flattening in Calcite
> 
>
> Key: BEAM-5258
> URL: https://issues.apache.org/jira/browse/BEAM-5258
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Either disable the flattening in PlannerImpl or Flattener could be a good 
> start.





[beam] 01/01: Config Gradle javadoc with UTF-8 encoding

2018-08-30 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 568c96ac189571f838e152c8044a3429a4db9493
Merge: 64ec7bd 778d498
Author: Lukasz Cwik 
AuthorDate: Thu Aug 30 16:00:43 2018 -0700

Config Gradle javadoc with UTF-8 encoding

 .../src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy| 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)



[beam] branch master updated (64ec7bd -> 568c96a)

2018-08-30 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 64ec7bd  Merge pull request #6247 from cclauss/patch-1
 add 778d498  Config Gradle javadoc
 new 568c96a  Config Gradle javadoc with UTF-8 encoding

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy| 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)



[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-30 Thread Reuven Lax (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598020#comment-16598020
 ] 

Reuven Lax commented on BEAM-5036:
--

Great. We should also fix GCS to use rewrite instead of copy/rename (I
think GCS rewrite didn't exist back when this code was originally written),
though that should probably be in a separate PR.

On Thu, Aug 30, 2018 at 3:13 PM Tim Robertson (JIRA) 



> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139915
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 22:41
Start Date: 30/Aug/18 22:41
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #6309: 
[BEAM-5274][SQL] Catch NoSuchElementException
URL: https://github.com/apache/beam/pull/6309#discussion_r214202380
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
 ##
 @@ -304,7 +305,11 @@ private static Object fieldToAvatica(Schema.FieldType type, Object beamValue) {
                MetricsFilter.builder()
                    .addNameFilter(MetricNameFilter.named(BeamEnumerableConverter.class, "rows"))
                    .build());
-      count = metrics.getCounters().iterator().next().getAttempted();
+      try {
+        count = metrics.getCounters().iterator().next().getAttempted();
+      } catch (NoSuchElementException e) {
 
 Review comment:
   Updated.
   
   Thought about it. Your approach is better from the perspective of avoiding 
catching other NoSuchElementExceptions that are not caused by this issue.
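   The difference between the two approaches can be sketched as follows. This 
is an illustration only, not the code in the PR: `CatchVersusGuard`, both 
method names, and the use of `Supplier<Long>` as a stand-in for a metric 
result are hypothetical. The point is that a `catch` around the whole chained 
call also swallows a `NoSuchElementException` raised further along the chain, 
while the `hasNext()` guard only covers the empty-iterator case.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

class CatchVersusGuard {
  // Broad version: any NoSuchElementException thrown inside the try block,
  // including one thrown by get(), is treated as "no counters".
  static long countWithCatch(Iterable<Supplier<Long>> counters) {
    try {
      return counters.iterator().next().get();
    } catch (NoSuchElementException e) {
      return 0L;
    }
  }

  // Guarded version: only an empty iterator maps to 0; other failures propagate.
  static long countWithGuard(Iterable<Supplier<Long>> counters) {
    Iterator<Supplier<Long>> it = counters.iterator();
    return it.hasNext() ? it.next().get() : 0L;
  }

  public static void main(String[] args) {
    // Hypothetical element whose evaluation fails for an unrelated reason.
    Supplier<Long> broken = () -> {
      throw new NoSuchElementException("unrelated failure");
    };
    List<Supplier<Long>> counters = Collections.singletonList(broken);

    System.out.println(countWithCatch(counters)); // failure is hidden behind the default
    try {
      countWithGuard(counters);
    } catch (NoSuchElementException e) {
      System.out.println("guarded version surfaced: " + e.getMessage());
    }
  }
}
```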




Issue Time Tracking
---

Worklog Id: (was: 139915)
Time Spent: 1h  (was: 50m)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[beam] branch master updated (865b147 -> 64ec7bd)

2018-08-30 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 865b147  Merge pull request #6291 from pabloem/for-test-ss
 add 9506b7a  tox.ini: Upgrade to current versions of PyLint
 add 60ad2d4  PyLint 1.9.3 --> 1.9.2
 add 4450af9  Add failing tests to .pylintrc disable section
 add b8cfbd1  Add disable directive logging-not-lazy to .pylintrc
 add c74b1d9  Drop back to PyLint v1.8 when the --py3k flag is used
 add db50be4  Drop back to PyLint v1.8 when the --py3k flag is used
 add 0a2de30  Drop back to PyLint v1.7 when the --py3k flag is used
 add 12e87dd  flake8 --exclude={toxinidir}/build/gradleenv
 add 085a974  Put a space between --exclude= and --select=
 add 1123909  Remove the unnecessary disable directives from .pylintrc
 add 6bd3277  Remove flake8 experimentation
 new 64ec7bd  Merge pull request #6247 from cclauss/patch-1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/.pylintrc |  7 +--
 sdks/python/tox.ini   | 12 ++--
 2 files changed, 11 insertions(+), 8 deletions(-)



[beam] 01/01: Merge pull request #6247 from cclauss/patch-1

2018-08-30 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 64ec7bd9612353555fc53e9a88f4c747e042fdaf
Merge: 865b147 6bd3277
Author: Charles Chen 
AuthorDate: Thu Aug 30 15:41:18 2018 -0700

Merge pull request #6247 from cclauss/patch-1

tox.ini: Upgrade to current versions of PyLint

 sdks/python/.pylintrc |  7 +--
 sdks/python/tox.ini   | 12 ++--
 2 files changed, 11 insertions(+), 8 deletions(-)



[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139913
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 22:32
Start Date: 30/Aug/18 22:32
Worklog Time Spent: 10m 
  Work Description: akedin commented on a change in pull request #6309: 
[BEAM-5274][SQL] Catch NoSuchElementException
URL: https://github.com/apache/beam/pull/6309#discussion_r214200886
 
 

 ##
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
 ##
 @@ -304,7 +305,11 @@ private static Object fieldToAvatica(Schema.FieldType type, Object beamValue) {
                MetricsFilter.builder()
                    .addNameFilter(MetricNameFilter.named(BeamEnumerableConverter.class, "rows"))
                    .build());
-      count = metrics.getCounters().iterator().next().getAttempted();
+      try {
+        count = metrics.getCounters().iterator().next().getAttempted();
+      } catch (NoSuchElementException e) {
 
 Review comment:
   I think it's better to guard `.next()` with `.hasNext()` instead of catching 
the exception.




Issue Time Tracking
---

Worklog Id: (was: 139913)
Time Spent: 50m  (was: 40m)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139909
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 22:20
Start Date: 30/Aug/18 22:20
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #6309: [BEAM-5274][SQL] 
Catch NoSuchElementException
URL: https://github.com/apache/beam/pull/6309#issuecomment-417486000
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 139909)
Time Spent: 0.5h  (was: 20m)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139910
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 22:20
Start Date: 30/Aug/18 22:20
Worklog Time Spent: 10m 
  Work Description: amaliujia removed a comment on issue #6309: 
[BEAM-5274][SQL] Catch NoSuchElementException
URL: https://github.com/apache/beam/pull/6309#issuecomment-417486000
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 139910)
Time Spent: 40m  (was: 0.5h)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Closed] (BEAM-5268) Ardagan's test ticket

2018-08-30 Thread Mikhail Gryzykhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Gryzykhin closed BEAM-5268.
---
   Resolution: Invalid
Fix Version/s: Not applicable

> Ardagan's test ticket 
> --
>
> Key: BEAM-5268
> URL: https://issues.apache.org/jira/browse/BEAM-5268
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
> Fix For: Not applicable
>
>






[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139907
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 22:19
Start Date: 30/Aug/18 22:19
Worklog Time Spent: 10m 
  Work Description: amaliujia opened a new pull request #6309: 
[BEAM-5274][SQL] Catch NoSuchElementException
URL: https://github.com/apache/beam/pull/6309
 
 
   I found that `INSERT INTO empty_table FROM empty_table` makes the SQL Shell 
throw a NoSuchElementException, which it shouldn't.
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 139907)
Time Spent: 10m
Remaining Estimate: 0h

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang

[jira] [Work logged] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5274?focusedWorklogId=139908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139908
 ]

ASF GitHub Bot logged work on BEAM-5274:


Author: ASF GitHub Bot
Created on: 30/Aug/18 22:19
Start Date: 30/Aug/18 22:19
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #6309: [BEAM-5274][SQL] 
Catch NoSuchElementException
URL: https://github.com/apache/beam/pull/6309#issuecomment-417485877
 
 
   R: @apilloud 
   CC: @akedin 




Issue Time Tracking
---

Worklog Id: (was: 139908)
Time Spent: 20m  (was: 10m)

> Handle NoSuchElementException When select from an empty table and insert into 
> another table
> ---
>
> Key: BEAM-5274
> URL: https://issues.apache.org/jira/browse/BEAM-5274
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-5275) Beam SQL RAND(seed) function isn't compatible with dofn

2018-08-30 Thread Andrew Pilloud (JIRA)
Andrew Pilloud created BEAM-5275:


 Summary: Beam SQL RAND(seed) function isn't compatible with dofn
 Key: BEAM-5275
 URL: https://issues.apache.org/jira/browse/BEAM-5275
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Andrew Pilloud
Assignee: Xu Mingmin


We currently wrap the RAND(seed) operator in a normal DoFn, but that isn't 
actually valid because the rows are not all processed by a single instance of 
that DoFn. We should revisit whether this is something we even want to support.
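
A minimal sketch of the problem in plain Java, not Beam code. It assumes each 
DoFn instance builds its own java.util.Random from the same seed; the class and 
method names are hypothetical. When the rows are split across two instances, 
the seeded sequence restarts in the second one instead of continuing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SeededRandSketch {
  /** Stand-in for one DoFn instance: it owns a Random created from the seed. */
  static List<Integer> processBundle(long seed, int rows) {
    Random random = new Random(seed);
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < rows; i++) {
      out.add(random.nextInt(100));
    }
    return out;
  }

  public static void main(String[] args) {
    // One instance sees all four rows: a single seeded sequence.
    List<Integer> single = processBundle(42L, 4);
    // Two instances see two rows each: the sequence restarts in the second one.
    List<Integer> split = new ArrayList<>(processBundle(42L, 2));
    split.addAll(processBundle(42L, 2));
    System.out.println("single instance: " + single);
    System.out.println("two instances:   " + split);
  }
}
```

With two instances the first two values are simply repeated, so the result 
depends on how the runner distributes rows.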



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-30 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597976#comment-16597976
 ] 

Tim Robertson commented on BEAM-5036:
-

Thanks [~reuvenlax] - that was in response to my concern about the files 
already existing, right? It doesn't affect whether we use the copy/delete or 
the rename approach, or am I missing something?

I have added {{FileAlreadyExistsException}} in the [PR for changing 
HDFSFileSystem.rename()|https://github.com/apache/beam/pull/6285]. With that we 
can handle a failure caused by the destination already existing: delete the 
destination and retry, thus forcing the overwrite. Together with 
IGNORE_MISSING_FILES, that should be as idempotent as we can achieve, I think.

Sound reasonable? 
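
The delete-and-retry idea can be sketched roughly as follows. This is an 
illustration against the local filesystem using java.nio.file, not the Hadoop 
or Beam FileSystem API; the class and method names are hypothetical, and the 
early return for a missing source is only an analogue of IGNORE_MISSING_FILES.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameRetrySketch {
  /** Moves src to dst, overwriting dst; a missing src is treated as already moved. */
  static void moveOverwriting(Path src, Path dst) throws IOException {
    if (Files.notExists(src)) {
      // Analogue of IGNORE_MISSING_FILES: an earlier attempt already moved src.
      return;
    }
    try {
      Files.move(src, dst);
    } catch (FileAlreadyExistsException e) {
      // Destination exists: delete it and retry, forcing the overwrite.
      Files.deleteIfExists(dst);
      Files.move(src, dst);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("rename-sketch");
    Path src = Files.write(dir.resolve("src.txt"), "new".getBytes());
    Path dst = Files.write(dir.resolve("dst.txt"), "old".getBytes());
    moveOverwriting(src, dst);
    moveOverwriting(src, dst); // Second call is a no-op because src is gone.
    System.out.println(new String(Files.readAllBytes(dst)));
  }
}
```

Calling the method twice leaves the destination in the same state, which is 
the idempotence the comment is after.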

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient for some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> The feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5274) Handle NoSuchElementException When select from an empty table and insert into another table

2018-08-30 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5274:
--

 Summary: Handle NoSuchElementException When select from an empty 
table and insert into another table
 Key: BEAM-5274
 URL: https://issues.apache.org/jira/browse/BEAM-5274
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-08-30 Thread Andrew Pilloud (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597973#comment-16597973
 ] 

Andrew Pilloud commented on BEAM-4704:
--

We can't override Calcite's implementation if Calcite is generating our code. 
Fixing this in Calcite blocks BEAM-5112.

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here two test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}
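
For comparison, a small sketch of the semantics described in the issue, where 
every character of the first string is stripped from the chosen end(s). This 
is plain Java written for illustration, not Beam's or Calcite's 
implementation; the class and method names are hypothetical.

```java
final class TrimSketch {
  /** Strips from the start of s every character that occurs in chars. */
  static String trimLeading(String chars, String s) {
    int start = 0;
    while (start < s.length() && chars.indexOf(s.charAt(start)) >= 0) {
      start++;
    }
    return s.substring(start);
  }

  /** Strips from the end of s every character that occurs in chars. */
  static String trimTrailing(String chars, String s) {
    int end = s.length();
    while (end > 0 && chars.indexOf(s.charAt(end - 1)) >= 0) {
      end--;
    }
    return s.substring(0, end);
  }

  /** Strips matching characters from both ends of s. */
  static String trimBoth(String chars, String s) {
    return trimTrailing(chars, trimLeading(chars, s));
  }

  public static void main(String[] args) {
    System.out.println(trimLeading("eh", "hehe__hehe"));
    System.out.println(trimTrailing("eh", "hehe__hehe"));
    System.out.println(trimBoth("eh", "hehe__hehe"));
  }
}
```

Under that definition the three queries above would yield "__hehe", "hehe__" 
and "__" respectively, which is what the sketch prints.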



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?focusedWorklogId=139899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139899
 ]

ASF GitHub Bot logged work on BEAM-4861:


Author: ASF GitHub Bot
Created on: 30/Aug/18 21:55
Start Date: 30/Aug/18 21:55
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6285: [BEAM-4861] 
Autocreate directories when doing an HDFS rename
URL: https://github.com/apache/beam/pull/6285#issuecomment-417479658
 
 
   @reuvenlax when builds.apache.org is back, can you PTAL?
   
   This is added to enable us to proceed with BEAM-5036, but I believe this is 
a good addition regardless of what we do on that ticket.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139899)
Time Spent: 0.5h  (was: 20m)

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi,
> Beam FileSystem operations copy, rename and delete are void in the SDK. Hadoop 
> native filesystem operations are not; they return a result. The current 
> implementation in Beam ignores the result and passes as long as no exception 
> is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If 
> the target directory does not exist, the operation returns false and does not 
> touch the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]
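
A rough sketch of the pattern the issue asks for: check the boolean result of 
a rename and turn a false into an exception, creating the target directory 
first. It uses java.io.File.renameTo as a stand-in because it also reports 
failure through its return value; it is not the Hadoop FileSystem API, and the 
class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;

final class CheckedRenameSketch {
  /** Renames src to dst, creating dst's parent directory and failing loudly. */
  static void rename(File src, File dst) throws IOException {
    File parent = dst.getParentFile();
    if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
      throw new IOException("Could not create directory " + parent);
    }
    // renameTo signals failure with false rather than an exception.
    if (!src.renameTo(dst)) {
      throw new IOException("Rename failed: " + src + " -> " + dst);
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = java.nio.file.Files.createTempDirectory("checked-rename").toFile();
    File src = new File(dir, "a.txt");
    if (!src.createNewFile()) {
      throw new IOException("Could not create " + src);
    }
    File dst = new File(new File(dir, "sub/dir"), "b.txt");
    rename(src, dst);
    System.out.println("moved: " + dst.isFile());
  }
}
```

The point is only that a false return value is never dropped silently.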



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?focusedWorklogId=139898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139898
 ]

ASF GitHub Bot logged work on BEAM-4861:


Author: ASF GitHub Bot
Created on: 30/Aug/18 21:50
Start Date: 30/Aug/18 21:50
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6285: [BEAM-4861] 
Autocreate directories when doing an HDFS rename
URL: https://github.com/apache/beam/pull/6285#issuecomment-417478467
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139898)
Time Spent: 20m  (was: 10m)

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi,
> Beam FileSystem operations copy, rename and delete are void in the SDK. Hadoop 
> native filesystem operations are not; they return a result. The current 
> implementation in Beam ignores the result and passes as long as no exception 
> is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If 
> the target directory does not exist, the operation returns false and does not 
> touch the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5273) Local file system does not work as expected on Portability Framework with Docker

2018-08-30 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-5273:
--

 Summary: Local file system does not work as expected on 
Portability Framework with Docker
 Key: BEAM-5273
 URL: https://issues.apache.org/jira/browse/BEAM-5273
 Project: Beam
  Issue Type: Bug
  Components: sdk-go, sdk-java-harness, sdk-py-harness
Reporter: Ankur Goenka
Assignee: Ankur Goenka


With the portability framework, the local file system reads from and writes to 
the Docker container's file system.

This makes using local files impossible with the portability framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5272) Randomize the reduced splits in BigtableIO so that multiple workers may not hit the same tablet server

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5272?focusedWorklogId=139887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139887
 ]

ASF GitHub Bot logged work on BEAM-5272:


Author: ASF GitHub Bot
Created on: 30/Aug/18 21:14
Start Date: 30/Aug/18 21:14
Worklog Time Spent: 10m 
  Work Description: kevinsi4508 opened a new pull request #6308: 
[BEAM-5272] Randomize the reduced splits in BigtableIO so that multiple workers 
may not hit the same tablet server
URL: https://github.com/apache/beam/pull/6308
 
 
   Randomize the reduced splits so that multiple workers may not hit the same 
tablet server
   
   
   
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139887)
Time Spent: 10m
Remaining Estimate: 0h

> Randomize the reduced splits in BigtableIO so that multiple workers may not 
> hit the same tablet server
> --
>
> Key: BEAM-5272
> URL: https://issues.apache.org/jira/browse/BEAM-5272
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kevin Si
>Assignee: Chamikara Jayalath
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Randomize the reduced splits in BigtableIO so that multiple workers may not 
> hit the same tablet server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5272) Randomize the reduced splits in BigtableIO so that multiple workers may not hit the same tablet server

2018-08-30 Thread Kevin Si (JIRA)
Kevin Si created BEAM-5272:
--

 Summary: Randomize the reduced splits in BigtableIO so that 
multiple workers may not hit the same tablet server
 Key: BEAM-5272
 URL: https://issues.apache.org/jira/browse/BEAM-5272
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Kevin Si
Assignee: Chamikara Jayalath


Randomize the reduced splits in BigtableIO so that multiple workers may not hit 
the same tablet server.
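
One way to picture the change is a shuffle of the split list before it is 
handed to workers. The sketch below is an assumption about the general idea, 
not the code in the pull request; the class and method names are hypothetical 
and strings stand in for the real split type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleSplitsSketch {
  /** Returns a shuffled copy of splits, leaving the input list untouched. */
  static <T> List<T> randomize(List<T> splits, Random random) {
    List<T> shuffled = new ArrayList<>(splits);
    Collections.shuffle(shuffled, random);
    return shuffled;
  }

  public static void main(String[] args) {
    // Adjacent key ranges are likely to live on the same tablet server.
    List<String> splits = Arrays.asList("[a,f)", "[f,k)", "[k,p)", "[p,u)", "[u,z)");
    System.out.println(randomize(splits, new Random()));
  }
}
```

Shuffling means workers that pick up consecutive splits are less likely to 
read neighbouring key ranges at the same time.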



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (5720c1d -> 865b147)

2018-08-30 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 5720c1d  [BEAM-5187] Add a ProcessJobBundleFactory for process-based 
execution (#6287)
 add d21d328  Adding for_test utility function for state sampler
 add 7ead7b8  Addressing comments
 new 865b147  Merge pull request #6291 from pabloem/for-test-ss

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/runners/worker/statesampler.py | 6 ++
 1 file changed, 6 insertions(+)



[beam] 01/01: Merge pull request #6291 from pabloem/for-test-ss

2018-08-30 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 865b14781ed28bab80be8db49843f9d2d3d15527
Merge: 5720c1d 7ead7b8
Author: Charles Chen 
AuthorDate: Thu Aug 30 14:04:34 2018 -0700

Merge pull request #6291 from pabloem/for-test-ss

Adding for_test utility function for state sampler

 sdks/python/apache_beam/runners/worker/statesampler.py | 6 ++
 1 file changed, 6 insertions(+)



[jira] [Created] (BEAM-5271) Support INSERT OVERWRITE Statement

2018-08-30 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5271:
--

 Summary: Support INSERT OVERWRITE Statement
 Key: BEAM-5271
 URL: https://issues.apache.org/jira/browse/BEAM-5271
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Rui Wang


We can support INSERT OVERWRITE to insert into a table while overwriting the 
table's existing contents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139882
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 20:32
Start Date: 30/Aug/18 20:32
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214170949
 
 

 ##
 File path: website/Dockerfile
 ##
 @@ -0,0 +1,33 @@
+###
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###
+
+# This image contains Ruby and dependencies required to build and test the Beam
+# website. It is used by tasks in build.gradle.
+
+FROM ruby:2.5
+
+WORKDIR /ruby
+RUN gem install bundler
+# Update buildDockerImage's inputs.files if you change this list.
+ADD Gemfile Gemfile.lock /ruby/
+RUN bundle install --deployment --path $GEM_HOME
+
+# Required for website testing using HTMLProofer.
+ENV LC_ALL C.UTF-8
+
+CMD sleep 3600
 
 Review comment:
   Used by `bundle install`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139882)
Time Spent: 7h 20m  (was: 7h 10m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139863
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 19:56
Start Date: 30/Aug/18 19:56
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214160845
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -3,16 +3,18 @@ require 'html-proofer'
 require 'etc'
 
 task :test do
-  FileUtils.rm_rf('./.testcontent')
-  sh "bundle exec jekyll build --config _config.yml,_config_test.yml"
-  HTMLProofer.check_directory("./.testcontent", {
+  HTMLProofer.check_directory("./content", {
 :typhoeus => {
   :timeout => 60,
   :connecttimeout => 40 },
 :allow_hash_href => true,
 :check_html => true,
 :file_ignore => [/javadoc/, /v2/, /pydoc/],
 :url_ignore => [
+# Javadocs and Pydocs are only available on asf-site branch
+/documentation\/sdks\/javadoc/,
+/documentation\/sdks\/pydoc/,
 
 Review comment:
   I'm not sure yet where the generated docs will go; it will either be 
apache/beam-site or a direct push to gitbox. TBD when I get responses from 
INFRA.
   
   This bug tracks generation of java and pydocs: 
https://issues.apache.org/jira/browse/BEAM-4498
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139863)
Time Spent: 7h 10m  (was: 7h)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139862
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 19:56
Start Date: 30/Aug/18 19:56
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214159508
 
 

 ##
 File path: website/Dockerfile
 ##
 @@ -0,0 +1,33 @@
+###
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###
+
+# This image contains Ruby and dependencies required to build and test the Beam
+# website. It is used by tasks in build.gradle.
+
+FROM ruby:2.5
+
+WORKDIR /ruby
+RUN gem install bundler
+# Update buildDockerImage's inputs.files if you change this list.
+ADD Gemfile Gemfile.lock /ruby/
+RUN bundle install --deployment --path $GEM_HOME
+
+# Required for website testing using HTMLProofer.
+ENV LC_ALL C.UTF-8
+
+CMD sleep 3600
 
 Review comment:
   Yes, that's specified in 
https://github.com/apache/beam/blob/master/website/Gemfile.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139862)
Time Spent: 7h 10m  (was: 7h)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4498) Migrate release Javadocs / Pydocs to [asf-site] branch and update release guide

2018-08-30 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597857#comment-16597857
 ] 

Udi Meiri commented on BEAM-4498:
-

Release script: 
https://github.com/apache/beam/blob/5720c1d22771a65ad5d7be6a06ad8aa0754fa64b/release/src/main/scripts/build_release_candidate.sh#L224

> Migrate release Javadocs / Pydocs to [asf-site] branch and update release 
> guide
> ---
>
> Key: BEAM-4498
> URL: https://issues.apache.org/jira/browse/BEAM-4498
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Major
>  Labels: beam-site-automation-reliability
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5270) Finish Python 3 porting for coders subpackage

2018-08-30 Thread Robbe (JIRA)
Robbe created BEAM-5270:
---

 Summary: Finish Python 3 porting for coders subpackage
 Key: BEAM-5270
 URL: https://issues.apache.org/jira/browse/BEAM-5270
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Robbe
Assignee: Robbe






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4511) Create a tox environment that uses Py3 interpreter for pre/post commit test suites, once codebase supports Py3.

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4511?focusedWorklogId=139839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139839
 ]

ASF GitHub Bot logged work on BEAM-4511:


Author: ASF GitHub Bot
Created on: 30/Aug/18 19:01
Start Date: 30/Aug/18 19:01
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on issue #6266: [BEAM-4511] 
added py3 tox env for first test
URL: https://github.com/apache/beam/pull/6266#issuecomment-417431322
 
 
   I've added a change which also regenerates the proto files when 
gen_protos.py has been updated, which is needed to apply the new changes 
automatically.
   I can also confirm that having "py3" in the testenv name uses the default 
python 3 version on my machine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139839)
Time Spent: 40m  (was: 0.5h)

> Create a tox environment that uses Py3 interpreter for pre/post commit test 
> suites, once codebase supports Py3. 
> 
>
> Key: BEAM-4511
> URL: https://issues.apache.org/jira/browse/BEAM-4511
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Matthias Feys
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5269) Create integration tests for BigQueryIORead pipeline

2018-08-30 Thread yifan zou (JIRA)
yifan zou created BEAM-5269:
---

 Summary: Create integration tests for BigQueryIORead pipeline
 Key: BEAM-5269
 URL: https://issues.apache.org/jira/browse/BEAM-5269
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: yifan zou
Assignee: yifan zou






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated: [BEAM-5187] Add a ProcessJobBundleFactory for process-based execution (#6287)

2018-08-30 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 5720c1d  [BEAM-5187] Add a ProcessJobBundleFactory for process-based 
execution (#6287)
5720c1d is described below

commit 5720c1d22771a65ad5d7be6a06ad8aa0754fa64b
Author: Maximilian Michels 
AuthorDate: Thu Aug 30 20:02:45 2018 +0200

[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution 
(#6287)
---
 .../control/DockerJobBundleFactory.java| 266 ++---
 ...undleFactory.java => JobBundleFactoryBase.java} | 137 +++
 .../control/ProcessJobBundleFactory.java   |  84 +++
 .../environment/ProcessEnvironment.java|  77 ++
 .../environment/ProcessEnvironmentFactory.java | 157 
 .../fnexecution/environment/ProcessManager.java| 225 +
 .../control/ProcessJobBundleFactoryTest.java   | 195 +++
 .../environment/ProcessEnvironmentFactoryTest.java | 127 ++
 .../environment/ProcessEnvironmentTest.java|  44 
 .../environment/ProcessManagerTest.java| 103 
 10 files changed, 1056 insertions(+), 359 deletions(-)

diff --git 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
index 1e7f48b..3178a2e 100644
--- 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
+++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
@@ -18,46 +18,19 @@
 package org.apache.beam.runners.fnexecution.control;
 
 import com.google.common.annotations.VisibleForTesting;
-import com.google.common.cache.CacheBuilder;
-import com.google.common.cache.CacheLoader;
-import com.google.common.cache.LoadingCache;
-import com.google.common.cache.RemovalNotification;
-import com.google.common.collect.ImmutableMap;
-import com.google.common.collect.Iterables;
 import com.google.common.net.HostAndPort;
-import java.io.IOException;
-import java.util.Map;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.atomic.AtomicReference;
 import javax.annotation.concurrent.ThreadSafe;
-import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
-import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
-import org.apache.beam.runners.core.construction.graph.ExecutableStage;
-import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
 import org.apache.beam.runners.fnexecution.GrpcFnServer;
 import org.apache.beam.runners.fnexecution.ServerFactory;
 import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
-import 
org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
-import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
-import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
-import org.apache.beam.runners.fnexecution.data.GrpcDataService;
 import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
 import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
-import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
 import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
-import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
 import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
 import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
-import org.apache.beam.runners.fnexecution.state.GrpcStateService;
-import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
-import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.fn.IdGenerator;
-import org.apache.beam.sdk.fn.IdGenerators;
-import org.apache.beam.sdk.fn.data.FnDataReceiver;
-import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
-import org.apache.beam.sdk.util.WindowedValue;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -67,7 +40,7 @@ import org.slf4j.LoggerFactory;
  * thread-safe. Instead, a new stage factory should be created for each client.
  */
 @ThreadSafe
-public class DockerJobBundleFactory implements JobBundleFactory {
+public class DockerJobBundleFactory extends JobBundleFactoryBase {
   private static final Logger LOG = 
LoggerFactory.getLogger(DockerJobBundleFactory.class);
 
   // Port offset for MacOS since we don't have host networking and need to use 
published ports
@@ -77,7 +50,7 @@ public class DockerJobBundleFactory implements 

[jira] [Assigned] (BEAM-5250) Python Wordcount fails with Flink portable streaming

2018-08-30 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-5250:


Assignee: Maximilian Michels

> Python Wordcount fails with Flink portable streaming
> 
>
> Key: BEAM-5250
> URL: https://issues.apache.org/jira/browse/BEAM-5250
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139800
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 17:50
Start Date: 30/Aug/18 17:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417408812
 
 
   @tweise Fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139800)
Time Spent: 6h 40m  (was: 6.5h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]
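
To illustrate the "directly fork SDK harness processes" idea above, here is a
minimal sketch using only the JDK's ProcessBuilder. It is not the code from the
pull request: the class name ForkSketch and the helper javaVersionCommand are
hypothetical, and the child process here is just `java -version` standing in
for a real SDK harness command.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Beam.
final class ForkSketch {

  /** Starts the given command as a child process that shares this JVM's stdio. */
  static Process fork(List<String> command) throws IOException {
    return new ProcessBuilder(command).inheritIO().start();
  }

  /** A stand-in command: the current JVM's own launcher, asked for its version. */
  static List<String> javaVersionCommand() {
    String javaBin =
        System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    return Arrays.asList(javaBin, "-version");
  }

  public static void main(String[] args) throws Exception {
    Process child = fork(javaVersionCommand());
    // waitFor() blocks until the child exits and returns its exit code.
    System.out.println("child exit code: " + child.waitFor());
  }
}
```

A real factory would presumably also pass endpoints and artifact locations to
the child; those details are not described in this thread, so they are left out.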



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5268) Ardagan's test ticket

2018-08-30 Thread Mikhail Gryzykhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Gryzykhin reassigned BEAM-5268:
---

Assignee: Mikhail Gryzykhin

> Ardagan's test ticket 
> --
>
> Key: BEAM-5268
> URL: https://issues.apache.org/jira/browse/BEAM-5268
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5268) Ardagan's test ticket

2018-08-30 Thread Mikhail Gryzykhin (JIRA)
Mikhail Gryzykhin created BEAM-5268:
---

 Summary: Ardagan's test ticket 
 Key: BEAM-5268
 URL: https://issues.apache.org/jira/browse/BEAM-5268
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Mikhail Gryzykhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5214) Update Java quickstart to use maven

2018-08-30 Thread Bruce Arctor (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597670#comment-16597670
 ] 

Bruce Arctor commented on BEAM-5214:


Should this be 'Update Java quickstart to use gradle'???

> Update Java quickstart to use maven
> ---
>
> Key: BEAM-5214
> URL: https://issues.apache.org/jira/browse/BEAM-5214
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, website
>Reporter: Robert Bradshaw
>Assignee: Reuven Lax
>Priority: Major
>
> The existing quickstart still uses mvn commands. 
> https://beam.apache.org/get-started/quickstart-java/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139788
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 16:52
Start Date: 30/Aug/18 16:52
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417389707
 
 
   Please run:  `./gradlew :beam-runners-java-fn-execution:check`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139788)
Time Spent: 6.5h  (was: 6h 20m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139785=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139785
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 16:43
Start Date: 30/Aug/18 16:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214101911
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -3,16 +3,18 @@ require 'html-proofer'
 require 'etc'
 
 task :test do
-  FileUtils.rm_rf('./.testcontent')
-  sh "bundle exec jekyll build --config _config.yml,_config_test.yml"
-  HTMLProofer.check_directory("./.testcontent", {
+  HTMLProofer.check_directory("./content", {
 
 Review comment:
   Thank you


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139785)
Time Spent: 7h  (was: 6h 50m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139782
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 16:43
Start Date: 30/Aug/18 16:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214101672
 
 

 ##
 File path: website/Dockerfile
 ##
 @@ -0,0 +1,33 @@
+###
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###
+
+# This image contains Ruby and dependencies required to build and test the Beam
+# website. It is used by tasks in build.gradle.
+
+FROM ruby:2.5
+
+WORKDIR /ruby
+RUN gem install bundler
+# Update buildDockerImage's inputs.files if you change this list.
+ADD Gemfile Gemfile.lock /ruby/
+RUN bundle install --deployment --path $GEM_HOME
+
+# Required for website testing using HTMLProofer.
+ENV LC_ALL C.UTF-8
+
+CMD sleep 3600
 
 Review comment:
   the other way around, does the ruby container have jekyll?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139782)
Time Spent: 6.5h  (was: 6h 20m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139783
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 16:43
Start Date: 30/Aug/18 16:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214101740
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// Define common lifecycle tasks and artifact types
+apply plugin: "base"
+
+def dockerImageTag = 'beam-website'
+def dockerWorkDir = "/repo"
+def buildDir = "$project.rootDir/build/website"
+
+task buildDockerImage(type: Exec) {
+  inputs.files 'Gemfile', 'Gemfile.lock'
+  commandLine 'docker', 'build', '-t', dockerImageTag, '.'
+}
+
+task createDockerContainer(type: Exec) {
+  dependsOn buildDockerImage
+  standardOutput = new ByteArrayOutputStream()
+  ext.containerId = {
+return standardOutput.toString().trim()
+  }
+  commandLine '/bin/bash', '-c',
+"docker create -v $project.rootDir:$dockerWorkDir -u \$(id -u):\$(id -g) 
$dockerImageTag"
+}
+
+task startDockerContainer(type: Exec) {
+  dependsOn createDockerContainer
+  ext.containerId = {
+return createDockerContainer.containerId()
+  }
+  commandLine 'docker', 'start',
+"${->createDockerContainer.containerId()}" // Lazily evaluate containerId.
+}
+
+task stopAndRemoveDockerContainer(type: Exec) {
+  commandLine 'docker', 'rm', '-f', "${->createDockerContainer.containerId()}"
+}
+
+task setupBuildDir(type: Copy) {
+  from('.') {
+include 'Gemfile*'
+include 'Rakefile'
+  }
+  into buildDir
+}
+
+task cleanWebsite(type: Delete) {
+  delete buildDir
+}
+clean.dependsOn cleanWebsite
+
+task buildWebsite(type: Exec) {
+  dependsOn startDockerContainer, setupBuildDir
+  finalizedBy stopAndRemoveDockerContainer
+  inputs.files 'Gemfile.lock', '_config.yml'
+  inputs.dir 'src'
+  outputs.dir "$buildDir/.sass-cache"
+  outputs.dir "$buildDir/content"
+  commandLine 'docker', 'exec', '-w', "$dockerWorkDir/build/website",
+"${->startDockerContainer.containerId()}", '/bin/bash', '-c',
+"""bundle exec jekyll build \
+  --config $dockerWorkDir/website/_config.yml \
+  --incremental \
+  --source $dockerWorkDir/website/src
+  """
+}
+build.dependsOn buildWebsite
+
+task testWebsite(type: Exec) {
+  dependsOn startDockerContainer, buildWebsite
 
 Review comment:
   Fair enough : )


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139783)
Time Spent: 6h 40m  (was: 6.5h)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139784
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 30/Aug/18 16:43
Start Date: 30/Aug/18 16:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r214101823
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -3,16 +3,18 @@ require 'html-proofer'
 require 'etc'
 
 
 Review comment:
   fair enough. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139784)
Time Spent: 6h 50m  (was: 6h 40m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #6302 from boyuanzz/add_new_dependency

2018-08-30 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit f469471cd850eba77fe7b8e4d3e385a6fb75416c
Merge: 6497b0b 71491db
Author: Pablo 
AuthorDate: Thu Aug 30 09:41:49 2018 -0700

Merge pull request #6302 from boyuanzz/add_new_dependency

Add boyuanzz as a owner of java powermock dependency

 ownership/JAVA_DEPENDENCY_OWNERS.yaml | 5 +
 1 file changed, 5 insertions(+)



[beam] branch master updated (6497b0b -> f469471)

2018-08-30 Thread pabloem
This is an automated email from the ASF dual-hosted git repository.

pabloem pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 6497b0b  Merge pull request #6300 from apilloud/index
 add 71491db  Add boyuanzz as a owner of powermock deps
 new f469471  Merge pull request #6302 from boyuanzz/add_new_dependency

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 ownership/JAVA_DEPENDENCY_OWNERS.yaml | 5 +
 1 file changed, 5 insertions(+)



[jira] [Assigned] (BEAM-4819) Make portable Flink runner JobBundleFactory configurable

2018-08-30 Thread Thomas Weise (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise reassigned BEAM-4819:
--

Assignee: Maximilian Michels  (was: Thomas Weise)

> Make portable Flink runner JobBundleFactory configurable
> 
>
> Key: BEAM-4819
> URL: https://issues.apache.org/jira/browse/BEAM-4819
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>
> BEAM-4791 introduces factory override for testing, expand that to allow users 
> to configure a different factory via service loader to adopt alternative 
> execution environments.
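
As a rough sketch of what "configure a different factory via service loader"
could look like, the example below uses java.util.ServiceLoader with a
fallback default. The names ProviderLookup, BundleFactoryProvider and
DefaultProvider are hypothetical and are not Beam APIs; the interface is
reduced to a single name() method to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical illustration only; not part of Beam.
final class ProviderLookup {

  /** Hypothetical extension point that users could implement and register. */
  interface BundleFactoryProvider {
    String name();
  }

  /** Fallback used when no implementation is registered on the classpath. */
  static final class DefaultProvider implements BundleFactoryProvider {
    @Override
    public String name() {
      return "default";
    }
  }

  /** Returns the first registered provider, or the default if none is found. */
  static BundleFactoryProvider load() {
    Iterator<BundleFactoryProvider> registered =
        ServiceLoader.load(BundleFactoryProvider.class).iterator();
    return registered.hasNext() ? registered.next() : new DefaultProvider();
  }

  public static void main(String[] args) {
    System.out.println("selected provider: " + load().name());
  }
}
```

An alternative implementation would be picked up by listing its class name in
a META-INF/services file named after the interface; without such a file the
lookup falls back to the default.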



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5124?focusedWorklogId=139770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139770
 ]

ASF GitHub Bot logged work on BEAM-5124:


Author: ASF GitHub Bot
Created on: 30/Aug/18 16:27
Start Date: 30/Aug/18 16:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #540: [BEAM-5124] DSL 
Euphoria documentation update
URL: https://github.com/apache/beam-site/pull/540#issuecomment-417381391
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139770)
Time Spent: 1h 40m  (was: 1.5h)

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3193) CoGroupByKey doesn't work in streaming mode

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3193?focusedWorklogId=139748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139748
 ]

ASF GitHub Bot logged work on BEAM-3193:


Author: ASF GitHub Bot
Created on: 30/Aug/18 15:38
Start Date: 30/Aug/18 15:38
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#5945: [BEAM-3193] Add SparkCoGroupByKeyStreaming validates runner to test 
CoGroupByKay bahavior in streaming mode on spark runner
URL: https://github.com/apache/beam/pull/5945#discussion_r214079472
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/SparkCoGroupByKeyStreamingTest.java
 ##
 @@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.spark.translation.streaming;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.junit.Assert.assertThat;
+
+import org.apache.beam.runners.spark.ReuseSparkContextRule;
+import org.apache.beam.runners.spark.SparkPipelineOptions;
+import org.apache.beam.runners.spark.StreamingTest;
+import org.apache.beam.runners.spark.io.CreateStream;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TimestampedValue;
+import org.apache.beam.sdk.values.TupleTag;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A test that verifies that CoGroupByKey works in streaming mode in spark 
runner. */
+public class SparkCoGroupByKeyStreamingTest {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(SparkCoGroupByKeyStreamingTest.class);
 
 Review comment:
   I think `LOG` is never used afterwards; it should be removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139748)
Time Spent: 1h  (was: 50m)

> CoGroupByKey doesn't work in streaming mode
> ---
>
> Key: BEAM-3193
> URL: https://issues.apache.org/jira/browse/BEAM-3193
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The CoGroupByKey PTransform doesn't throw an exception but doesn't actually 
> perform the grouping when used in streaming mode. I will attach a test 
> pipeline.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3193) CoGroupByKey doesn't work in streaming mode

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3193?focusedWorklogId=139749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139749
 ]

ASF GitHub Bot logged work on BEAM-3193:


Author: ASF GitHub Bot
Created on: 30/Aug/18 15:38
Start Date: 30/Aug/18 15:38
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on a change in pull request 
#5945: [BEAM-3193] Add SparkCoGroupByKeyStreaming validates runner to test 
CoGroupByKay bahavior in streaming mode on spark runner
URL: https://github.com/apache/beam/pull/5945#discussion_r214079123
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/SparkCoGroupByKeyStreamingTest.java
 ##
 @@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.spark.translation.streaming;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.junit.Assert.assertThat;
+
+import org.apache.beam.runners.spark.ReuseSparkContextRule;
+import org.apache.beam.runners.spark.SparkPipelineOptions;
+import org.apache.beam.runners.spark.StreamingTest;
+import org.apache.beam.runners.spark.io.CreateStream;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TimestampedValue;
+import org.apache.beam.sdk.values.TupleTag;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A test that verifies that CoGroupByKey works in streaming mode in spark 
runner. */
+public class SparkCoGroupByKeyStreamingTest {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(SparkCoGroupByKeyStreamingTest.class);
+  private static final TupleTag INPUT1_TAG = new TupleTag<>("input1");
+  private static final TupleTag INPUT2_TAG = new TupleTag<>("input2");
+
+  @Rule public final transient ReuseSparkContextRule noContextResue = 
ReuseSparkContextRule.no();
+
+  @Rule public final TestPipeline pipeline = TestPipeline.create();
+
+  private Duration batchDuration() {
+return Duration.millis(
+
(pipeline.getOptions().as(SparkPipelineOptions.class)).getBatchIntervalMillis());
+  }
+
+  @Category(StreamingTest.class)
+  @Test
+  public void testInStreamingMode() throws Exception {
+Instant instant = new Instant(0);
+CreateStream<KV<Integer, Integer>> source1 =
+CreateStream.of(KvCoder.of(VarIntCoder.of(), VarIntCoder.of()), 
batchDuration())
+.emptyBatch()
+.advanceWatermarkForNextBatch(instant)
+.nextBatch(
+TimestampedValue.of(KV.of(1, 1), instant),
+TimestampedValue.of(KV.of(1, 2), instant),
+TimestampedValue.of(KV.of(1, 3), instant))
+
.advanceWatermarkForNextBatch(instant.plus(Duration.standardSeconds(1L)))
+.nextBatch(
+TimestampedValue.of(KV.of(2, 4), 
instant.plus(Duration.standardSeconds(1L))),
+TimestampedValue.of(KV.of(2, 5), 
instant.plus(Duration.standardSeconds(1L))),
+TimestampedValue.of(KV.of(2, 6), 
instant.plus(Duration.standardSeconds(1L))))
+.advanceNextBatchWatermarkToInfinity();
+
+CreateStream<KV<Integer, Integer>> source2 =
+CreateStream.of(KvCoder.of(VarIntCoder.of(), VarIntCoder.of()), 
batchDuration())
+.emptyBatch()
+.advanceWatermarkForNextBatch(instant)
+.nextBatch(
+

[jira] [Work logged] (BEAM-5062) Add ability to configure S3ClientOptions

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5062?focusedWorklogId=139738=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139738
 ]

ASF GitHub Bot logged work on BEAM-5062:


Author: ASF GitHub Bot
Created on: 30/Aug/18 15:23
Start Date: 30/Aug/18 15:23
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6122: [BEAM-5062] Add 
ability to provide custom S3ClientOptions
URL: https://github.com/apache/beam/pull/6122#issuecomment-417359457
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139738)
Time Spent: 2h 40m  (was: 2.5h)

> Add ability to configure S3ClientOptions
> 
>
> Key: BEAM-5062
> URL: https://issues.apache.org/jira/browse/BEAM-5062
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> It would be very useful to be able to configure 
> [S3ClientOptions|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/S3ClientOptions.html]
>  for Apache Beam jobs.
> For example, some implementations of S3 do not support 
> virtual-hosted-style URLs for buckets, only path-style ones. Currently it is 
> impossible to enable path-style access for the Amazon S3 client used by 
> an Apache Beam job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-30 Thread Reuven Lax (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597563#comment-16597563
 ] 

Reuven Lax commented on BEAM-5036:
--

Once you get to the rename step, the set of files to rename should be
deterministic. This isn't currently true for the Flink runner (it is for
Dataflow) because support for @RequiresStableInput isn't fully implemented
there; without stable input to the rename step, many things can go wrong.
The Flink implementation of stable input will block the rename step from
executing until the snapshot is finalized, which means that a rollback will
only roll back that far and not regenerate new output files. This does work
in the current Spark runner (I believe) by forcing an RDD checkpoint.

Of course, if the user manually reruns a pipeline, this can happen.




> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139736=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139736
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 15:13
Start Date: 30/Aug/18 15:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417355811
 
 
   @tweise I made the ShutdownHook gracefully stop the processes, followed by a 
kill if there are still running processes. 
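
The "graceful stop, followed by a kill" behaviour described above can be sketched with the standard `java.lang.Process` API. This is not the code from the pull request; the class name, method name, and grace period are hypothetical.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not the code from the pull request.
final class GracefulStopper {
  // Asks every process to terminate, waits up to graceMillis in total, then
  // forcibly kills whatever is still alive. Returns how many were force-killed.
  static int stopAll(List<Process> processes, long graceMillis) {
    for (Process p : processes) {
      p.destroy(); // polite request; typically SIGTERM on POSIX systems
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(graceMillis);
    int forced = 0;
    for (Process p : processes) {
      boolean exited;
      try {
        long remaining = Math.max(deadline - System.nanoTime(), 0L);
        exited = p.waitFor(remaining, TimeUnit.NANOSECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt flag, stop waiting
        exited = !p.isAlive();
      }
      if (!exited) {
        p.destroyForcibly(); // the "hammer"
        forced++;
      }
    }
    return forced;
  }
}
```

From a shutdown hook this could be invoked as `Runtime.getRuntime().addShutdownHook(new Thread(() -> GracefulStopper.stopAll(processes, 500)))`, where `processes` and the 500 ms grace period are illustrative values only.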


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139736)
Time Spent: 6h 20m  (was: 6h 10m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similar 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139713
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 14:45
Start Date: 30/Aug/18 14:45
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417345794
 
 
   thanks!
   
   One more observation: The shutdown hook kills the top-level process, but not 
its child process(es). For bash -> python, only bash will be killed. Maybe we 
can add a (very small) graceful termination period before taking out the hammer?
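
One possible way to reach the child processes as well, sketched under the assumption that the JVM is Java 9 or later (the thread does not say which Java version is targeted), is the `ProcessHandle` API. The class and method names below are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; requires Java 9+ for ProcessHandle.
final class ProcessTreeKiller {
  // Snapshots the descendants first, because they may be re-parented (and no
  // longer discoverable from the root) once the top-level process has exited.
  // Returns the number of descendants that were asked to terminate.
  static int destroyTree(Process root) {
    List<ProcessHandle> descendants = root.descendants().collect(Collectors.toList());
    descendants.forEach(ProcessHandle::destroy);
    root.destroy();
    return descendants.size();
  }
}
```

For the bash -> python case, this would request termination of the python process before bash itself is destroyed.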


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139713)
Time Spent: 6h 10m  (was: 6h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similar 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139697
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 14:25
Start Date: 30/Aug/18 14:25
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417338622
 
 
   @tweise sure, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139697)
Time Spent: 6h  (was: 5h 50m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similar 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3912) Add support of HadoopOutputFormatIO

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3912?focusedWorklogId=139693=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139693
 ]

ASF GitHub Bot logged work on BEAM-3912:


Author: ASF GitHub Bot
Created on: 30/Aug/18 14:21
Start Date: 30/Aug/18 14:21
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on a change in pull request 
#6306: [BEAM-3912] Add HadoopOutputFormatIO support
URL: https://github.com/apache/beam/pull/6306#discussion_r214049997
 
 

 ##
 File path: sdks/java/io/hadoop-output-format/build.gradle
 ##
 @@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+applyJavaNature()
+provideIntegrationTestingDependencies()
+enableJavaPerformanceTesting()
+
+description = "Apache Beam :: SDKs :: Java :: IO :: Hadoop Output Format"
+ext.summary = "IO to write data to sinks that implement Hadoop Output Format."
+
+def log4j_version = "2.6.2"
+def elastic_search_version = "5.0.0"
+// Migrate to using a version of the driver compatible with Guava 20
+def cassandra_driver = "3.2.0"
+
+// Ban dependencies from the test runtime classpath
+configurations.testRuntimeClasspath {
+  // Ban hive-exec and mesos since they bundle protobuf without repackaging
+  exclude group: "org.apache.hive", module: "hive-exec"
+  exclude group: "org.apache.mesos", module: "mesos"
+  // Prevent a StackOverflow because of wiring LOG4J -> SLF4J -> LOG4J
+  exclude group: "org.slf4j", module: "log4j-over-slf4j"
+}
+
+dependencies {
+  shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
+  compile library.java.guava
+  shadow library.java.slf4j_api
+  shadow project(path: ":beam-sdks-java-io-hadoop-common", configuration: "shadow")
+  provided library.java.hadoop_common
+  provided library.java.hadoop_mapreduce_client_core
+  testCompile project(path: ":beam-runners-direct-java", configuration: "shadow")
+  testCompile project(path: ":beam-sdks-java-core", configuration: "shadowTest")
+  testCompile project(path: ":beam-sdks-java-io-common", configuration: "shadow")
+  testCompile project(path: ":beam-sdks-java-io-common", configuration: "shadowTest")
+  testCompile "io.netty:netty-transport-native-epoll:4.1.0.CR3"
 
 Review comment:
   Nit: 
   inlined version


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139693)
Time Spent: 50m  (was: 40m)

> Add support of HadoopOutputFormatIO
> ---
>
> Key: BEAM-3912
> URL: https://issues.apache.org/jira/browse/BEAM-3912
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> At the moment, there is only HadoopInputFormatIO in Beam. To support 
> writing to sinks that are not yet natively supported in Beam 
> (for example, Apache Orc or HBase bulk load), it would make sense to add 
> HadoopOutputFormatIO as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139690=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139690
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 14:16
Start Date: 30/Aug/18 14:16
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417335281
 
 
   @mxm rebased my branch 
https://github.com/tweise/beam/commit/c64add1a4eb2e1a4ae818e7891516a5c57ef1fe1
   
   Can you add the changes to ProcessManager and DockerJobBundleFactory to your 
PR?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139690)
Time Spent: 5h 50m  (was: 5h 40m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similar 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-30 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597502#comment-16597502
 ] 

Tim Robertson commented on BEAM-5036:
-

Thanks to everyone for contributing to this. [~JozoVilcek] I've come to a 
similar conclusion overnight and think we need to do one of:
 # surface {{FileAlreadyExistsException}} as well as {{FileNotFoundException}} 
from {{FileSystem.rename()}} and let the caller decide (here I presume we would 
opt to overwrite by deleting the target only if the source still exists and 
then retry)
 # document and implement that {{FileSystem.rename()}} will always replace 
existing files for all filesystems
 # expose a {{forceOverwrite}} flag / option and use it here

I propose we should open a separate issue to explore optimising rename for Gcs. 
I had simply overlooked the rewrite option (sorry, I am not all that familiar 
with Gcs).

I still have some concern about rewriting output files that already exist 
though. Isn't it the case that if "run 1" produced 45 avro file parts but for 
some reason "run 2" split differently and produced 43 file parts, anything 
using a glob on the directory would get incorrect data (i.e. the addition of 2 
parts from run 1)? This would be relevant for bounded, but possibly even a 
restart / recover of a streaming scenario?
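
The three options listed above can be sketched on the local filesystem with `java.nio.file.Files.move`. This does not use Beam's `FileSystem` API; the class and method names are hypothetical, and the behaviour shown is that of the JDK's default file system provider, not of any Beam filesystem.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper on the local filesystem, for illustration only;
// Beam's FileSystem API is not used here.
final class RenameSketch {
  // Option 1: surface the conflict and let the caller decide.
  static void renameStrict(Path src, Path dst) throws IOException {
    Files.move(src, dst); // fails with FileAlreadyExistsException if dst exists
  }

  // Options 2 and 3: always replace, or replace only when asked to.
  static void rename(Path src, Path dst, boolean forceOverwrite) throws IOException {
    if (forceOverwrite) {
      Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
    } else {
      renameStrict(src, dst);
    }
  }

  // Caller-side handling for option 1: delete the target only if the
  // source still exists, then retry the rename once.
  static void renameWithRetry(Path src, Path dst) throws IOException {
    try {
      renameStrict(src, dst);
    } catch (FileAlreadyExistsException e) {
      if (Files.exists(src)) {
        Files.delete(dst);
        renameStrict(src, dst);
      }
    }
  }
}
```

None of these variants addresses the stale-parts concern: extra parts left over from an earlier run are never the target of a rename, so they would have to be cleaned up separately.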

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5267) Update Flink Runner to Flink 1.6.x

2018-08-30 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-5267:
-
Priority: Major  (was: Minor)

> Update Flink Runner to Flink 1.6.x
> --
>
> Key: BEAM-5267
> URL: https://issues.apache.org/jira/browse/BEAM-5267
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.8.0
>
>
> For the next release, the Flink version should be bumped. As changes for 
> 2.7.0 are already frozen, it is going to be 2.8.0. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5267) Update Flink Runner to Flink 1.6.x

2018-08-30 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-5267:
-
Fix Version/s: 2.8.0

> Update Flink Runner to Flink 1.6.x
> --
>
> Key: BEAM-5267
> URL: https://issues.apache.org/jira/browse/BEAM-5267
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 2.8.0
>
>
> For the next release, the Flink version should be bumped. As changes for 
> 2.7.0 are already frozen, it is going to be 2.8.0. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5267) Update Flink Runner to Flink 1.6.x

2018-08-30 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-5267:
-
Affects Version/s: (was: 2.8.0)

> Update Flink Runner to Flink 1.6.x
> --
>
> Key: BEAM-5267
> URL: https://issues.apache.org/jira/browse/BEAM-5267
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>
> For the next release, the Flink version should be bumped. As changes for 
> 2.7.0 are already frozen, it is going to be 2.8.0. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5267) Update Flink Runner to Flink 1.6.x

2018-08-30 Thread Maximilian Michels (JIRA)
Maximilian Michels created BEAM-5267:


 Summary: Update Flink Runner to Flink 1.6.x
 Key: BEAM-5267
 URL: https://issues.apache.org/jira/browse/BEAM-5267
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Affects Versions: 2.8.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels


For the next release, the Flink version should be bumped. As changes for 2.7.0 
are already frozen, it is going to be 2.8.0. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139675
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 13:09
Start Date: 30/Aug/18 13:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r214023347
 
 

 ##
 File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment createStreamExecutionEnvironment(
 
 return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+  ExecutionConfig config, FlinkPipelineOptions options) {
+long latencyTrackingInterval = options.getLatencyTrackingInterval();
+if (latencyTrackingInterval >= 0) {
 
 Review comment:
   The default is now 0, so it gets disabled by default now. I agree with 
Aljoscha that the check could be removed entirely. The problem is, if you pass 
in negative numbers, they will simply be ignored and latency tracking will 
still be enabled...
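
One way to handle the negative-value problem, sketched under the assumption that rejecting such values is acceptable (the thread does not settle this), is to validate the option before applying it. The class and method names are hypothetical and this is not the code from the pull request.

```java
// Hypothetical sketch, for illustration only; not the code from the pull request.
final class LatencyTrackingOption {
  // Returns the interval to apply; 0 means the feature is disabled.
  // Negative values are rejected instead of being silently ignored.
  static long validate(long intervalMillis) {
    if (intervalMillis < 0) {
      throw new IllegalArgumentException(
          "latencyTrackingInterval must be >= 0, got " + intervalMillis);
    }
    return intervalMillis;
  }
}
```

With a check like this, a mistyped negative interval fails fast at pipeline construction rather than leaving latency tracking enabled without the user noticing.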


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139675)
Time Spent: 2h 20m  (was: 2h 10m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139674
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 13:09
Start Date: 30/Aug/18 13:09
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r214021455
 
 

 ##
 File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -167,4 +167,12 @@
   Boolean isShutdownSourcesOnFinalWatermark();
 
   void setShutdownSourcesOnFinalWatermark(Boolean shutdownOnFinalWatermark);
+
+  @Description(
+  "Interval in milliseconds for sending latency tracking marks from the sources to the sinks. " 
+  + "Interval value = 0 disablesthe feature.")
 
 Review comment:
   space missing `disablesthe`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139674)
Time Spent: 2h 10m  (was: 2h)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3912) Add support of HadoopOutputFormatIO

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3912?focusedWorklogId=139673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139673
 ]

ASF GitHub Bot logged work on BEAM-3912:


Author: ASF GitHub Bot
Created on: 30/Aug/18 12:57
Start Date: 30/Aug/18 12:57
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6306: [BEAM-3912] 
Add HadoopOutputFormatIO support
URL: https://github.com/apache/beam/pull/6306#issuecomment-417309907
 
 
   @aromanenko-dev thank you for including me. This is just to say that I am a 
little busy and probably can't do a thorough review within the next 2 weeks. I 
will comment as best I can and am very interested in this though.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139673)
Time Spent: 40m  (was: 0.5h)

> Add support of HadoopOutputFormatIO
> ---
>
> Key: BEAM-3912
> URL: https://issues.apache.org/jira/browse/BEAM-3912
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> At the moment, there is only HadoopInputFormatIO in Beam. To support 
> writing to sinks that are not yet natively supported in Beam 
> (for example, Apache Orc or HBase bulk load), it would make sense to add 
> HadoopOutputFormatIO as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5266) TextIO doesn't support http schema anymore

2018-08-30 Thread JIRA
Jean-Baptiste Onofré created BEAM-5266:
--

 Summary: TextIO doesn't support http schema anymore
 Key: BEAM-5266
 URL: https://issues.apache.org/jira/browse/BEAM-5266
 Project: Beam
  Issue Type: Bug
  Components: io-java-files
Affects Versions: 2.6.0
Reporter: Jean-Baptiste Onofré
Assignee: Jean-Baptiste Onofré


Up to Beam 2.4.0 (at least), it was possible to directly use the {{http}} scheme 
with {{TextIO}}.
However, now something like:

{code}
TextIO.read().from("http://;)
{code}

throws:

{code}
Caused by: java.lang.IllegalArgumentException: No filesystem found for scheme 
http
{code}

That's due to the "new" file system support. Both the {{file}} and {{http}} schemes 
should be handled for URLs.
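
The failure mode can be illustrated with a small scheme-based lookup. This is a hypothetical registry, not Beam's `FileSystems` class; the handler names and the fallback to `file` for specs without a scheme are assumptions made for the sketch.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, for illustration only; not Beam's FileSystems class.
final class SchemeRegistry {
  private final Map<String, String> handlers = new HashMap<>();

  void register(String scheme, String handlerName) {
    handlers.put(scheme, handlerName);
  }

  // Looks up a handler by URI scheme; specs without a scheme fall back to "file".
  String resolve(String spec) {
    String scheme = URI.create(spec).getScheme();
    if (scheme == null) {
      scheme = "file";
    }
    String handler = handlers.get(scheme);
    if (handler == null) {
      throw new IllegalArgumentException("No filesystem found for scheme " + scheme);
    }
    return handler;
  }
}
```

With only a `file` handler registered, any `http://` spec fails in the same way as the exception quoted above; registering a handler for `http` is what would make such a read resolvable again.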



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139647
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 11:42
Start Date: 30/Aug/18 11:42
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on issue #6278: [BEAM-5239] Enable 
to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#issuecomment-417289738
 
 
   Ah, Jenkins is down: https://status.apache.org


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139647)
Time Spent: 2h  (was: 1h 50m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5247?focusedWorklogId=139648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139648
 ]

ASF GitHub Bot logged work on BEAM-5247:


Author: ASF GitHub Bot
Created on: 30/Aug/18 11:42
Start Date: 30/Aug/18 11:42
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on issue #6284: [BEAM-5247] Remove 
slf4j-simple binding from dependencies
URL: https://github.com/apache/beam/pull/6284#issuecomment-417289783
 
 
   @lukecwik Nevermind, Jenkins is/was down: https://status.apache.org


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139648)
Time Spent: 1h  (was: 50m)

> Remove slf4j-simple binding from dependencies
> -
>
> Key: BEAM-5247
> URL: https://issues.apache.org/jira/browse/BEAM-5247
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jozef Vilcek
>Assignee: Jozef Vilcek
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Flink runner declares an slf4j-simple binding in its dependencies. This can break 
> an application's logging if it has its own binding and does not exclude 
> this one from Beam.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139646
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 11:40
Start Date: 30/Aug/18 11:40
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213997293
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment 
createStreamExecutionEnvironment(
 
 return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+  ExecutionConfig config, FlinkPipelineOptions options) {
+long latencyTrackingInterval = options.getLatencyTrackingInterval();
+if (latencyTrackingInterval >= 0) {
 
 Review comment:
   I think we should disable by default. This is also what will happen for the 
next Flink version, I think. And it might save people some headaches if they 
don't have to debug it and find out the hard way, as you did.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139646)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3912) Add support of HadoopOutputFormatIO

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3912?focusedWorklogId=139642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139642
 ]

ASF GitHub Bot logged work on BEAM-3912:


Author: ASF GitHub Bot
Created on: 30/Aug/18 11:19
Start Date: 30/Aug/18 11:19
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev removed a comment on issue #6306: 
[BEAM-3912] Add HadoopOutputFormatIO support
URL: https://github.com/apache/beam/pull/6306#issuecomment-417263723
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139642)
Time Spent: 0.5h  (was: 20m)

> Add support of HadoopOutputFormatIO
> ---
>
> Key: BEAM-3912
> URL: https://issues.apache.org/jira/browse/BEAM-3912
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the moment, Beam has only HadoopInputFormatIO. To support writing to 
> sinks that are not yet natively supported in Beam (for example, Apache ORC 
> or HBase bulk load), it would make sense to add HadoopOutputFormatIO as well.





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139639
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:41
Start Date: 30/Aug/18 10:41
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213983463
 
 

 ##
 File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment createStreamExecutionEnvironment(
 
     return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+      ExecutionConfig config, FlinkPipelineOptions options) {
+    long latencyTrackingInterval = options.getLatencyTrackingInterval();
+    if (latencyTrackingInterval >= 0) {
 
 Review comment:
   The idea was to stick to the default Flink configuration unless the user 
intends to overwrite it. With this PR I can disable latency tracking by passing 
`--latencyTrackingInterval=0` to the runner. 
   I did not want to disable it for everyone by default. Should I?
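
   To make the three cases discussed in this thread concrete, here is a minimal, self-contained sketch of the guard. It is not the code from the PR: `FakeExecutionConfig` is a hypothetical stand-in for Flink's `ExecutionConfig`, the 2000 ms default is an assumed placeholder, and the body of the `if` (a call to a setter) is an assumption, since the quoted diff is cut off at that line.

```java
public class LatencyTrackingSketch {

  /** Hypothetical stand-in for Flink's ExecutionConfig; only one setting is modeled. */
  static final class FakeExecutionConfig {
    // Assumed placeholder for whatever value Flink would use on its own.
    static final long ASSUMED_FLINK_DEFAULT_MS = 2000L;

    private long latencyTrackingInterval = ASSUMED_FLINK_DEFAULT_MS;

    void setLatencyTrackingInterval(long interval) {
      this.latencyTrackingInterval = interval;
    }

    long getLatencyTrackingInterval() {
      return latencyTrackingInterval;
    }
  }

  /**
   * Mirrors the guard under review: a negative option value (the default of -1)
   * means "not set by the user", so the config is left untouched.
   */
  static void applyLatencyTrackingInterval(FakeExecutionConfig config, long optionValue) {
    if (optionValue >= 0) {
      // Assumption: the truncated diff body forwards the value to the config.
      config.setLatencyTrackingInterval(optionValue);
    }
  }

  /** Returns the interval a fresh config ends up with for a given option value. */
  static long effectiveInterval(long optionValue) {
    FakeExecutionConfig config = new FakeExecutionConfig();
    applyLatencyTrackingInterval(config, optionValue);
    return config.getLatencyTrackingInterval();
  }

  public static void main(String[] args) {
    System.out.println(effectiveInterval(-1));  // option not set: assumed default is kept
    System.out.println(effectiveInterval(0));   // user passes 0: interval becomes 0
    System.out.println(effectiveInterval(500)); // user passes 500: interval becomes 500
  }
}
```

   With the check in place, `-1` keeps whatever the engine does by default and only an explicit `0` turns tracking off. Removing the check, as suggested in the review, would forward `-1` as well, so the result would then depend on how Flink treats a negative interval.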




Issue Time Tracking
---

Worklog Id: (was: 139639)
Time Spent: 1h 40m  (was: 1.5h)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5124?focusedWorklogId=139636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139636
 ]

ASF GitHub Bot logged work on BEAM-5124:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:21
Start Date: 30/Aug/18 10:21
Worklog Time Spent: 10m 
  Work Description: VaclavPlajt commented on issue #540: [BEAM-5124] DSL 
Euphoria documentation update
URL: https://github.com/apache/beam-site/pull/540#issuecomment-417269596
 
 
   Retest this please




Issue Time Tracking
---

Worklog Id: (was: 139636)
Time Spent: 1.5h  (was: 1h 20m)

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5247?focusedWorklogId=139635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139635
 ]

ASF GitHub Bot logged work on BEAM-5247:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:19
Start Date: 30/Aug/18 10:19
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on issue #6284: [BEAM-5247] Remove 
slf4j-simple binding from dependencies
URL: https://github.com/apache/beam/pull/6284#issuecomment-417269203
 
 
   @lukecwik Is "Run Flink ValidatesRunner" not the correct incantation 
anymore? I think there might be some other issue.




Issue Time Tracking
---

Worklog Id: (was: 139635)
Time Spent: 50m  (was: 40m)

> Remove slf4j-simple binding from dependencies
> -
>
> Key: BEAM-5247
> URL: https://issues.apache.org/jira/browse/BEAM-5247
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jozef Vilcek
>Assignee: Jozef Vilcek
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Flink runner declares an slf4j-simple binding in its dependencies. This 
> can break logging in applications that have their own binding and do not 
> exclude this one from Beam.





[jira] [Work logged] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5247?focusedWorklogId=139634=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139634
 ]

ASF GitHub Bot logged work on BEAM-5247:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:18
Start Date: 30/Aug/18 10:18
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on issue #6284: [BEAM-5247] Remove 
slf4j-simple binding from dependencies
URL: https://github.com/apache/beam/pull/6284#issuecomment-417269034
 
 
   Run Flink ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 139634)
Time Spent: 40m  (was: 0.5h)

> Remove slf4j-simple binding from dependencies
> -
>
> Key: BEAM-5247
> URL: https://issues.apache.org/jira/browse/BEAM-5247
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jozef Vilcek
>Assignee: Jozef Vilcek
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Flink runner declares an slf4j-simple binding in its dependencies. This 
> can break logging in applications that have their own binding and do not 
> exclude this one from Beam.





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139633
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:18
Start Date: 30/Aug/18 10:18
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on issue #6278: [BEAM-5239] Enable 
to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#issuecomment-417268836
 
 
   Run Flink ValidatesRunner
   




Issue Time Tracking
---

Worklog Id: (was: 139633)
Time Spent: 1.5h  (was: 1h 20m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139632
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:16
Start Date: 30/Aug/18 10:16
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213977137
 
 

 ##
 File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment createStreamExecutionEnvironment(
 
     return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+      ExecutionConfig config, FlinkPipelineOptions options) {
+    long latencyTrackingInterval = options.getLatencyTrackingInterval();
+    if (latencyTrackingInterval >= 0) {
 
 Review comment:
   Will this work? The default is `-1`, so this condition will be `false`, i.e. 
we never disable latency tracking in Flink which is the original point of this 
PR.
   
   I think we can just remove the check. What do you think?




Issue Time Tracking
---

Worklog Id: (was: 139632)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5247?focusedWorklogId=139631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139631
 ]

ASF GitHub Bot logged work on BEAM-5247:


Author: ASF GitHub Bot
Created on: 30/Aug/18 10:12
Start Date: 30/Aug/18 10:12
Worklog Time Spent: 10m 
  Work Description: aljoscha commented on issue #6284: [BEAM-5247] Remove 
slf4j-simple binding from dependencies
URL: https://github.com/apache/beam/pull/6284#issuecomment-417267284
 
 
   Run Flink ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 139631)
Time Spent: 0.5h  (was: 20m)

> Remove slf4j-simple binding from dependencies
> -
>
> Key: BEAM-5247
> URL: https://issues.apache.org/jira/browse/BEAM-5247
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jozef Vilcek
>Assignee: Jozef Vilcek
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Flink runner declares an slf4j-simple binding in its dependencies. This 
> can break logging in applications that have their own binding and do not 
> exclude this one from Beam.





[jira] [Work logged] (BEAM-3912) Add support of HadoopOutputFormatIO

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3912?focusedWorklogId=139630=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139630
 ]

ASF GitHub Bot logged work on BEAM-3912:


Author: ASF GitHub Bot
Created on: 30/Aug/18 09:59
Start Date: 30/Aug/18 09:59
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #6306: [BEAM-3912] 
Add HadoopOutputFormatIO support
URL: https://github.com/apache/beam/pull/6306#issuecomment-417263723
 
 
   Retest this please




Issue Time Tracking
---

Worklog Id: (was: 139630)
Time Spent: 20m  (was: 10m)

> Add support of HadoopOutputFormatIO
> ---
>
> Key: BEAM-3912
> URL: https://issues.apache.org/jira/browse/BEAM-3912
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At the moment, Beam has only HadoopInputFormatIO. To support writing to 
> sinks that are not yet natively supported in Beam (for example, Apache ORC 
> or HBase bulk load), it would make sense to add HadoopOutputFormatIO as well.





[jira] [Work logged] (BEAM-3912) Add support of HadoopOutputFormatIO

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3912?focusedWorklogId=139629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139629
 ]

ASF GitHub Bot logged work on BEAM-3912:


Author: ASF GitHub Bot
Created on: 30/Aug/18 09:56
Start Date: 30/Aug/18 09:56
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev opened a new pull request #6306: 
[BEAM-3912] Add HadoopOutputFormatIO support
URL: https://github.com/apache/beam/pull/6306
 
 
   At the moment, Beam has only `HadoopInputFormatIO`. To support writing to 
sinks that are not yet natively supported in Beam (for example, Apache ORC or 
HBase bulk load), this PR adds a new Java IO, `HadoopOutputFormatIO`, which 
allows writing data to any sink that implements Hadoop OutputFormat. 
   
   It is developed as a separate IO module, `hadoop-output-format`, to avoid 
confusion with the name of the existing module `hadoop-input-format`. Perhaps 
in future versions of Beam we should merge them into one common 
`HadoopFormatIO` module.
   
   It was tested with the unit tests and the integration test that are included 
in this PR.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   




Issue Time Tracking

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139628
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 09:42
Start Date: 30/Aug/18 09:42
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417258241
 
 
   > btw I noticed that after job server shutdown, launched processes still 
stick around and don't exit
   
   Please see the latest version of the PR from yesterday. I revised the 
shutdown logic to eventually kill processes if they don't stop gracefully. In 
your code you're using the old version. Also, I've added a ShutdownHook to kill 
running processes in case the JVM shuts down prematurely.
   
   @tweise @angoenka I've removed the singleton ProcessManager and instantiate 
it per `ProcessEnvironmentFactory` which should get rid of the duplicate worker 
ids you were seeing @tweise.
   
   @tweise For the IO, I've enabled inheriting IO when the log level is set to 
DEBUG. That should help with debugging process startup problems.
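
   The shutdown behaviour described in this comment (ask the process to stop, kill it if it does not, and keep a JVM shutdown hook as a safety net) can be sketched with plain `java.lang.Process` calls. This is only an illustration under assumptions, not the `ProcessManager` code from the PR: the class name, the method names and the timeout parameter are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class ProcessShutdownSketch {

  /**
   * Requests termination, waits up to the given timeout, then kills forcibly.
   * Returns true if the forcible kill was needed.
   */
  static boolean stopProcess(Process process, long timeout, TimeUnit unit)
      throws InterruptedException {
    // Polite request first, so the process gets a chance to clean up.
    process.destroy();
    if (process.waitFor(timeout, unit)) {
      return false;
    }
    // The process ignored the request within the timeout: kill it and wait for exit.
    process.destroyForcibly().waitFor();
    return true;
  }

  /** Registers a JVM shutdown hook that kills the process if it is still alive. */
  static Thread registerShutdownHook(Process process) {
    Thread hook = new Thread(() -> {
      if (process.isAlive()) {
        process.destroyForcibly();
      }
    });
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }
}
```

   In a sketch like this, inheriting the child's IO for debugging would correspond to calling `ProcessBuilder.inheritIO()` before starting the process.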




Issue Time Tracking
---

Worklog Id: (was: 139628)
Time Spent: 5h 40m  (was: 5.5h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-3193) CoGroupByKey doesn't work in streaming mode

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3193?focusedWorklogId=139621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139621
 ]

ASF GitHub Bot logged work on BEAM-3193:


Author: ASF GitHub Bot
Created on: 30/Aug/18 09:35
Start Date: 30/Aug/18 09:35
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #5945: [BEAM-3193] 
Add SparkCoGroupByKeyStreaming validates runner to test CoGroupByKay bahavior 
in streaming mode on spark runner
URL: https://github.com/apache/beam/pull/5945#issuecomment-417256115
 
 
   @echauchot Sure, I'll take a look today




Issue Time Tracking
---

Worklog Id: (was: 139621)
Time Spent: 50m  (was: 40m)

> CoGroupByKey doesn't work in streaming mode
> ---
>
> Key: BEAM-3193
> URL: https://issues.apache.org/jira/browse/BEAM-3193
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The CoGroupByKey PTransform doesn't throw an exception but doesn't actually 
> perform the grouping when used in streaming mode. I will attach a test 
> pipeline.





[jira] [Work logged] (BEAM-3193) CoGroupByKey doesn't work in streaming mode

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3193?focusedWorklogId=139618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139618
 ]

ASF GitHub Bot logged work on BEAM-3193:


Author: ASF GitHub Bot
Created on: 30/Aug/18 09:27
Start Date: 30/Aug/18 09:27
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #5945: [BEAM-3193] Add 
SparkCoGroupByKeyStreaming validates runner to test CoGroupByKay bahavior in 
streaming mode on spark runner
URL: https://github.com/apache/beam/pull/5945#issuecomment-417253207
 
 
   @aromanenko-dev can you please take a look? JB and Ismael seem busy.




Issue Time Tracking
---

Worklog Id: (was: 139618)
Time Spent: 40m  (was: 0.5h)

> CoGroupByKey doesn't work in streaming mode
> ---
>
> Key: BEAM-3193
> URL: https://issues.apache.org/jira/browse/BEAM-3193
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The CoGroupByKey PTransform doesn't throw an exception but doesn't actually 
> perform the grouping when used in streaming mode. I will attach a test 
> pipeline.





[jira] [Work logged] (BEAM-5265) Can not test Timer with processing time domain

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5265?focusedWorklogId=139614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139614
 ]

ASF GitHub Bot logged work on BEAM-5265:


Author: ASF GitHub Bot
Created on: 30/Aug/18 08:44
Start Date: 30/Aug/18 08:44
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on issue #6305: [BEAM-5265] Use 
currentProcessingTime() for onTime with processing time domain
URL: https://github.com/apache/beam/pull/6305#issuecomment-417239792
 
 
   Simple patch to see if it breaks any existing tests




Issue Time Tracking
---

Worklog Id: (was: 139614)
Time Spent: 20m  (was: 10m)

> Can not test Timer with processing time domain
> --
>
> Key: BEAM-5265
> URL: https://issues.apache.org/jira/browse/BEAM-5265
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-direct
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I have a stateful DoFn which has a timer in the PROCESSING_TIME domain. While 
> writing tests, I noticed that it does not react to `advanceProcessingTime()` 
> on the test stream. The problem seems to be here:
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L260]
> I can only tell that patching this place works for direct runner tests. I am not 
> sure about the broader impact on other runners, since it is in `runner-core`.





[jira] [Work logged] (BEAM-5265) Can not test Timer with processing time domain

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5265?focusedWorklogId=139613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139613
 ]

ASF GitHub Bot logged work on BEAM-5265:


Author: ASF GitHub Bot
Created on: 30/Aug/18 08:44
Start Date: 30/Aug/18 08:44
Worklog Time Spent: 10m 
  Work Description: JozoVilcek opened a new pull request #6305: [BEAM-5265] 
Use currentProcessingTime() for onTime with processing time domain
URL: https://github.com/apache/beam/pull/6305
 
 
   




Issue Time Tracking
---

Worklog Id: (was: 139613)
Time Spent: 10m
Remaining Estimate: 0h

> Can not test Timer with processing time domain
> --
>
> Key: BEAM-5265
> URL: https://issues.apache.org/jira/browse/BEAM-5265
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-direct
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have a stateful DoFn which has a timer in the PROCESSING_TIME domain. While 
> writing tests, I noticed that it does not react to `advanceProcessingTime()` 
> on the test stream. The problem seems to be here:
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L260]
> I can only tell that patching this place works for direct runner tests. I am not 
> sure about the broader impact on other runners, since it is in `runner-core`.





[jira] [Work logged] (BEAM-5264) Reference DirectRunner implementation of Python user state and timers API

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5264?focusedWorklogId=139612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139612
 ]

ASF GitHub Bot logged work on BEAM-5264:


Author: ASF GitHub Bot
Created on: 30/Aug/18 08:42
Start Date: 30/Aug/18 08:42
Worklog Time Spent: 10m 
  Work Description: charlesccychen opened a new pull request #6304: 
[BEAM-5264] Reference DirectRunner implementation of Python User State and 
Timers API
URL: https://github.com/apache/beam/pull/6304
 
 
   This change adds the reference DirectRunner implementation of the Python 
User State and Timers API.  With this change, a user can execute DoFns with 
state and timers on the DirectRunner.
   
   More details on the API design are available at 
https://s.apache.org/beam-python-user-state-and-timers.
   
   R: @robertwb 
   CC: @lukecwik @tweise @aaltay 




Issue Time Tracking
---

Worklog Id: (was: 139612)
Time Spent: 10m
Remaining Estimate: 0h

> Reference DirectRunner implementation of Python user state and timers API
> -
>
> Key: BEAM-5264
> URL: https://issues.apache.org/jira/browse/BEAM-5264
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Charles Chen
>Assignee: Charles Chen
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue tracks the reference DirectRunner implementation of the Beam 
> Python User State and Timer API, described here: 
> [https://s.apache.org/beam-python-user-state-and-timers].





[jira] [Created] (BEAM-5265) Can not test Timer with processing time domain

2018-08-30 Thread Jozef Vilcek (JIRA)
Jozef Vilcek created BEAM-5265:
--

 Summary: Can not test Timer with processing time domain
 Key: BEAM-5265
 URL: https://issues.apache.org/jira/browse/BEAM-5265
 Project: Beam
  Issue Type: Bug
  Components: runner-core, runner-direct
Reporter: Jozef Vilcek
Assignee: Kenneth Knowles


I have a stateful DoFn which has a timer in the PROCESSING_TIME domain. While 
writing tests, I noticed that it does not react to `advanceProcessingTime()` on 
the test stream. The problem seems to be here:

[https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L260]

I can only tell that patching this place works for direct runner tests. I am 
not sure about the broader impact on other runners, since it is in `runner-core`.





[jira] [Created] (BEAM-5264) Reference DirectRunner implementation of Python user state and timers API

2018-08-30 Thread Charles Chen (JIRA)
Charles Chen created BEAM-5264:
--

 Summary: Reference DirectRunner implementation of Python user 
state and timers API
 Key: BEAM-5264
 URL: https://issues.apache.org/jira/browse/BEAM-5264
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Affects Versions: 2.6.0
Reporter: Charles Chen
Assignee: Charles Chen


This issue tracks the reference DirectRunner implementation of the Beam Python 
User State and Timer API, described here: 
[https://s.apache.org/beam-python-user-state-and-timers].





[jira] [Created] (BEAM-5263) Add support for accumulators to `SingleValueCollector`

2018-08-30 Thread Vaclav Plajt (JIRA)
Vaclav Plajt created BEAM-5263:
--

 Summary: Add support for accumulators to `SingleValueCollector`
 Key: BEAM-5263
 URL: https://issues.apache.org/jira/browse/BEAM-5263
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-euphoria
Reporter: Vaclav Plajt
Assignee: Vaclav Plajt








[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139599
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 30/Aug/18 08:27
Start Date: 30/Aug/18 08:27
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213943954
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
+import java.io.File;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.concurrent.ThreadSafe;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A simple process manager which forks processes and kills them if necessary. */
+@ThreadSafe
+class ProcessManager {
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessManager.class);
+
+  private static final ProcessManager INSTANCE = new ProcessManager();
+
+  private final Map<String, Process> processes;
+
+  public static ProcessManager getInstance() {
+    return INSTANCE;
 
 Review comment:
   Absolutely, the `ProcessManager` needs to be instantiated per 
`ProcessEnvironmentFactory`. Getting rid of the static instance also makes 
testing easier.
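
   A minimal sketch of the per-factory arrangement this comment asks for, 
not the code that ended up in the PR. `ProcessTrackerSketch` and 
`EnvironmentFactorySketch` are hypothetical stand-ins, not Beam classes; the 
tracker only records ids and does not fork anything. The point is that the 
dependency arrives through the constructor, so there is no shared static state.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for a process manager; it only records ids instead of forking. */
final class ProcessTrackerSketch {
  private final Set<String> running = new HashSet<>();

  /** Registers an id and rejects duplicates, so a test can detect a double start. */
  synchronized void start(String id) {
    if (!running.add(id)) {
      throw new IllegalStateException("Process already running: " + id);
    }
  }

  /** Forgets an id; unknown ids are ignored. */
  synchronized void stop(String id) {
    running.remove(id);
  }

  synchronized int runningCount() {
    return running.size();
  }
}

/** Hypothetical factory that uses the tracker handed to its constructor. */
final class EnvironmentFactorySketch {
  private final ProcessTrackerSketch tracker;

  EnvironmentFactorySketch(ProcessTrackerSketch tracker) {
    this.tracker = Objects.requireNonNull(tracker, "tracker");
  }

  /** Records a new environment under the given id in this factory's own tracker. */
  void createEnvironment(String id) {
    tracker.start(id);
  }
}
```

   With this shape, a test can create a fresh tracker per test case and two 
factories never see each other's processes.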




Issue Time Tracking
---

Worklog Id: (was: 139599)
Time Spent: 5.5h  (was: 5h 20m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-690) Backoff in the DirectRunner Monitor if no work is Available

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-690?focusedWorklogId=139594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139594
 ]

ASF GitHub Bot logged work on BEAM-690:
---

Author: ASF GitHub Bot
Created on: 30/Aug/18 08:24
Start Date: 30/Aug/18 08:24
Worklog Time Spent: 10m 
  Work Description: janotav opened a new pull request #6303: [BEAM-690] 
Backoff in the DirectRunner if no work is available
URL: https://github.com/apache/beam/pull/6303
 
 
   Implementing backoff as described in the JIRA ticket for [BEAM-690]. 
Basically this PR:
   
   1. adds a new DriverState (CONTINUE_THROTTLE) that signals that no work was 
available
   2. performs capped exponential backoff in the 
ExecutorServiceParallelExecutor when CONTINUE_THROTTLE is encountered
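
   For readers unfamiliar with the technique, capped exponential backoff can 
be sketched roughly as follows. This is an illustration only and not the code 
in this PR: `CappedBackoffSketch` and its parameters are hypothetical names, 
and the doubling factor is an assumption. The caller would sleep for the 
returned delay whenever no work was available and call `reset()` once work 
shows up again.

```java
/** Hypothetical helper: yields delays that double on each idle round, up to a fixed cap. */
final class CappedBackoffSketch {
  private final long initialMillis;
  private final long maxMillis;
  private long nextMillis;

  CappedBackoffSketch(long initialMillis, long maxMillis) {
    if (initialMillis <= 0 || maxMillis < initialMillis) {
      throw new IllegalArgumentException("need 0 < initialMillis <= maxMillis");
    }
    this.initialMillis = initialMillis;
    this.maxMillis = maxMillis;
    this.nextMillis = initialMillis;
  }

  /** Returns the delay to wait now and prepares a doubled (but capped) delay for next time. */
  long nextDelayMillis() {
    long current = nextMillis;
    // Compare against half the cap first so the doubling can never overflow.
    nextMillis = (current > maxMillis / 2) ? maxMillis : current * 2;
    return current;
  }

  /** Call when work was found, so the next idle period starts from the initial delay. */
  void reset() {
    nextMillis = initialMillis;
  }
}
```

   The cap keeps the worst-case latency bounded: however long the executor 
has been idle, new work waits at most `maxMillis` before it is picked up.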
   
   @tgroh , you are probably the most suitable reviewer
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 139594)
Time Spent: 10m
Remaining Estimate: 0h

> Backoff in the DirectRunner Monitor if no work is Available
> ---
>
> Key: BEAM-690
> URL: 

[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=139586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139586
 ]

ASF GitHub Bot logged work on BEAM-4461:


Author: ASF GitHub Bot
Created on: 30/Aug/18 07:55
Start Date: 30/Aug/18 07:55
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #6298: [BEAM-4461] 
Introduce Group transform.
URL: https://github.com/apache/beam/pull/6298#issuecomment-417225567
 
 
   run java precommit




Issue Time Tracking
---

Worklog Id: (was: 139586)
Time Spent: 3h 10m  (was: 3h)

> Create a library of useful transforms that use schemas
> --
>
> Key: BEAM-4461
> URL: https://issues.apache.org/jira/browse/BEAM-4461
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> e.g. JoinBy(fields). Project, Filter, etc.





[jira] [Work logged] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5124?focusedWorklogId=139585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139585
 ]

ASF GitHub Bot logged work on BEAM-5124:


Author: ASF GitHub Bot
Created on: 30/Aug/18 07:40
Start Date: 30/Aug/18 07:40
Worklog Time Spent: 10m 
  Work Description: VaclavPlajt commented on issue #540: [BEAM-5124] DSL 
Euphoria documentation update
URL: https://github.com/apache/beam-site/pull/540#issuecomment-417221301
 
 
   Retest this please




Issue Time Tracking
---

Worklog Id: (was: 139585)
Time Spent: 1h 20m  (was: 1h 10m)

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





