[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325456
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 09/Oct/19 02:56
Start Date: 09/Oct/19 02:56
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9306: [BEAM-7939] 
ZetaSQL dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539799833
 
 
   My high level comment is this: I find the titles "Beam SQL lexical structure 
(Calcite)" and "Beam SQL Lexical Structure (ZetaSQL)" a bit confusing. I would 
rather lead with the dialect on the titles for those pages. I'm not sure what 
would improve it, but "Beam Calcite SQL Lexical structure" and "Beam ZetaSQL 
Lexical Structure" are a little better (but not nice on the eyes).
   
   Ditto "Beam SQL operators for ZetaSQL" where I would rather say "Operators 
(Beam ZetaSQL)" or "Operators in Beam ZetaSQL" or "ZetaSQL operators supported 
by Beam ZetaSQL" or some such.
   
   If I summarize the above into a single rule, it would be that once you are 
talking about a dialect, don't use the term "Beam SQL" any more.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325456)
Time Spent: 1h 40m  (was: 1.5h)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325454
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 09/Oct/19 02:52
Start Date: 09/Oct/19 02:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9306: [BEAM-7939] 
ZetaSQL dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539798907
 
 
   Well, some BQ folks have tweeted that ZetaSQL is the SQL used by BQ...
   
   But it is true that ZetaSQL has many knobs and BQ SQL represents one 
configuration. BQ SQL also does not include our streaming extensions, while 
Beam SQL's ZetaSQL Dialect does not support a lot of BQ SQL. And Beam ZetaSQL 
users may choose a different configuration of ZetaSQL than BQ, for example to 
match Spanner or some future user of ZetaSQL more closely.
   
   So I'm with Cyrus on this one - I like calling it "ZetaSQL dialect".
 



Issue Time Tracking
---

Worklog Id: (was: 325454)
Time Spent: 1.5h  (was: 1h 20m)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.





[jira] [Closed] (BEAM-7980) External environment with containerized worker pool

2019-10-08 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise closed BEAM-7980.
--
Resolution: Implemented

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  





[jira] [Reopened] (BEAM-7980) External environment with containerized worker pool

2019-10-08 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise reopened BEAM-7980:


> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  





[jira] [Resolved] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-08 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise resolved BEAM-6829.

Fix Version/s: 2.17.0
   Resolution: Fixed

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
> Fix For: 2.17.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}





[jira] [Work logged] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?focusedWorklogId=325450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325450
 ]

ASF GitHub Bot logged work on BEAM-6829:


Author: ASF GitHub Bot
Created on: 09/Oct/19 02:43
Start Date: 09/Oct/19 02:43
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #8585: [BEAM-6829] Use 
transform/pcollection name for metric namespace if none provided
URL: https://github.com/apache/beam/pull/8585#issuecomment-539796815
 
 
   Manual test shows that duplicate metric warnings are gone.
 



Issue Time Tracking
---

Worklog Id: (was: 325450)
Time Spent: 3h 40m  (was: 3.5h)

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}





[jira] [Work logged] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?focusedWorklogId=325449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325449
 ]

ASF GitHub Bot logged work on BEAM-6829:


Author: ASF GitHub Bot
Created on: 09/Oct/19 02:42
Start Date: 09/Oct/19 02:42
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #8585: [BEAM-6829] Use 
transform/pcollection name for metric namespace if none provided
URL: https://github.com/apache/beam/pull/8585
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 325449)
Time Spent: 3.5h  (was: 3h 20m)

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}





[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=325446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325446
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 09/Oct/19 02:29
Start Date: 09/Oct/19 02:29
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9735: [BEAM-8355] Add a 
standard boolean coder
URL: https://github.com/apache/beam/pull/9735#issuecomment-539793694
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 325446)
Time Spent: 3h  (was: 2h 50m)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This involves making the current java BooleanCoder a standard coder, and 
> implementing an equivalent coder in python





[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=325445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325445
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 09/Oct/19 02:27
Start Date: 09/Oct/19 02:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9735: [BEAM-8355] Add a 
standard boolean coder
URL: https://github.com/apache/beam/pull/9735#issuecomment-539793241
 
 
   Please file bugs for Jenkins flakes. You can often find an existing flake by 
searching JIRA for the job name.
 



Issue Time Tracking
---

Worklog Id: (was: 325445)
Time Spent: 2h 50m  (was: 2h 40m)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This involves making the current java BooleanCoder a standard coder, and 
> implementing an equivalent coder in python





[jira] [Work logged] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?focusedWorklogId=325428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325428
 ]

ASF GitHub Bot logged work on BEAM-6829:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:33
Start Date: 09/Oct/19 01:33
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #8585: [BEAM-6829] Use 
transform/pcollection name for metric namespace if none provided
URL: https://github.com/apache/beam/pull/8585#discussion_r332798020
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoMetricName.java
 ##
 @@ -51,7 +51,14 @@ private MonitoringInfoMetricName(String urn, Map<String, String> labels) {
   @Override
   public String getNamespace() {
     if (labels.containsKey(MonitoringInfoConstants.Labels.NAMESPACE)) {
+      // User-generated metric
       return labels.getOrDefault(MonitoringInfoConstants.Labels.NAMESPACE, null);
+    } else if (labels.containsKey(MonitoringInfoConstants.Labels.PCOLLECTION)) {
+      // System-generated metric, prepend with a colon
+      return ":" + labels.getOrDefault(MonitoringInfoConstants.Labels.PCOLLECTION, null);
+    } else if (labels.containsKey(MonitoringInfoConstants.Labels.PTRANSFORM)) {
+      // System-generated metric, prepend with a colon
+      return ":" + labels.getOrDefault(MonitoringInfoConstants.Labels.PTRANSFORM, null);
 
 Review comment:
   Agreed, there should be no need for this prefix. I'm going to remove it. I 
wouldn't be surprised though if we revisit the metric names as we test this 
with a real world metric backend. The current names are quite verbose. Examples:
   
   
`36read/Read/Reshuffle/RemoveRandomKeys.None/beam:env:docker:v1:0:beam:metric:user_distribution
 {NAMESPACE=__main__.WordExtractingDoFn, 
PTRANSFORM=ref_AppliedPTransform_split_17, NAME=word_len_dist}: 
DistributionResult{sum=114, count=29, min=0, max=11}`
   
   (user metric created as `self.word_lengths_dist = Metrics.distribution(self.__class__, 'word_len_dist')`)
   
   or 
   
   
`17read/Read/Impulse.None/beam:env:docker:v1:0:beam:metric:sampled_byte_size:v1 
{PCOLLECTION=ref_PCollection_PCollection_1}: DistributionResult{sum=13, 
count=1, min=13, max=13}, 
6format.None/beam:env:docker:v1:0:beam:metric:sampled_byte_size:v1 
{PCOLLECTION=ref_PCollection_PCollection_17}: DistributionResult{sum=395, 
count=18, min=19, max=29}`
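
   The fallback order under discussion (a user-set namespace label wins, otherwise fall back to the PCOLLECTION or PTRANSFORM label, with the colon prefix dropped per the review) can be sketched in Python. This is an illustration only; the plain-string label keys stand in for the Java `MonitoringInfoConstants.Labels.*` constants:

```python
def metric_namespace(labels):
    """Pick a namespace for a metric from its monitoring-info labels.

    A user-supplied NAMESPACE label wins; otherwise fall back to the
    PCOLLECTION or PTRANSFORM label (no ':' prefix, per the review).
    """
    for key in ('NAMESPACE', 'PCOLLECTION', 'PTRANSFORM'):
        if key in labels:
            return labels[key]
    return None


# A user metric keeps its declared namespace...
print(metric_namespace({'NAMESPACE': '__main__.WordExtractingDoFn'}))
# ...while a system metric falls back to its PCollection label.
print(metric_namespace({'PCOLLECTION': 'ref_PCollection_PCollection_1'}))
```

   The ordering matters: a metric can carry several labels at once, and the first match determines the namespace a backend will group by.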
 



Issue Time Tracking
---

Worklog Id: (was: 325428)
Time Spent: 3h 20m  (was: 3h 10m)

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}





[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325421
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:25
Start Date: 09/Oct/19 01:25
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332796781
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -21,7 +21,9 @@
 import logging
 import os
 import random
+import re
 
 Review comment:
   You can remove this import now.
 



Issue Time Tracking
---

Worklog Id: (was: 325421)
Time Spent: 1h 50m  (was: 1h 40m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325423
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:25
Start Date: 09/Oct/19 01:25
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332796540
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1580,6 +1584,38 @@ def test_split_half(self):
       raise unittest.SkipTest("This test is for a single worker only.")
 
 
+class FnApiBasedLullLoggingTest(unittest.TestCase):
+  def create_pipeline(self):
+    return beam.Pipeline(
+        runner=fn_api_runner.FnApiRunner(
+            default_environment=beam_runner_api_pb2.Environment(
+                urn=python_urns.EMBEDDED_PYTHON_GRPC),
+            progress_request_frequency=0.5))
+
+  def test_lull_logging(self):
+
+    if sys.version_info < (3, 4):
 
 Review comment:
   Ack. You could add a BEAM-1251 todo here so that we can remove it after 
python 2 support is removed.
 



Issue Time Tracking
---

Worklog Id: (was: 325423)
Time Spent: 2h 10m  (was: 2h)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325419
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:22
Start Date: 09/Oct/19 01:22
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332796401
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -403,10 +412,35 @@ def process_bundle_split(self, request, instruction_id):
           instruction_id=instruction_id,
           error='Instruction not running: %s' % instruction_id)
 
+  def _log_lull_in_bundle_processor(self, processor):
+    state_sampler = processor.state_sampler
+    sampler_info = state_sampler.get_info()
+    if (sampler_info
+        and sampler_info.time_since_transition
+        and sampler_info.time_since_transition > self.log_lull_timeout_ns):
+      step_name = sampler_info.state_name.step_name
+      state_name = sampler_info.state_name.name
+      state_lull_log = (
+          'There has been a processing lull of over %.2f seconds in state %s'
+          % (sampler_info.time_since_transition / 1e9, state_name))
+      step_name_log = (' in step %s ' % step_name) if step_name else ''
+
+      exec_thread = getattr(sampler_info, 'tracked_thread', None)
+      if exec_thread is not None:
+        thread_frame = sys._current_frames().get(exec_thread.ident)  # pylint: disable=protected-access
+        stack_trace = '\n'.join(
+            traceback.format_stack(thread_frame)) if thread_frame else ''
+      else:
+        stack_trace = '-NOT AVAILABLE-'
+
+      logging.warning(
+          '%s%s. Traceback:\n%s', state_lull_log, step_name_log, stack_trace)
+
   def process_bundle_progress(self, request, instruction_id):
     # It is an error to get progress for a not-in-flight bundle.
-    processor = self.bundle_processor_cache.lookup(
-        request.instruction_id)
+    processor = self.bundle_processor_cache.lookup(request.instruction_id)
+    if processor:
+      self._log_lull_in_bundle_processor(processor)
 
 Review comment:
   I have not benchmarked this, but a similar operation is done on the python 
prod worker, and it has not been a problem. Furthermore, a progress update 
involves going through all of the user and system metrics, and building updates 
for them. This should be equivalent to adding one more metric, as speaking to 
the state sampler is lockless.
 



Issue Time Tracking
---

Worklog Id: (was: 325419)
Time Spent: 1h 40m  (was: 1.5h)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325417
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:21
Start Date: 09/Oct/19 01:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332796139
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1580,6 +1584,38 @@ def test_split_half(self):
       raise unittest.SkipTest("This test is for a single worker only.")
 
 
+class FnApiBasedLullLoggingTest(unittest.TestCase):
+  def create_pipeline(self):
+    return beam.Pipeline(
+        runner=fn_api_runner.FnApiRunner(
+            default_environment=beam_runner_api_pb2.Environment(
+                urn=python_urns.EMBEDDED_PYTHON_GRPC),
+            progress_request_frequency=0.5))
+
+  def test_lull_logging(self):
+
+    if sys.version_info < (3, 4):
+      self.skipTest('Log-based assertions are supported after Python 3.4')
+    try:
+      utils.check_compiled('apache_beam.runners.worker.opcounters')
+    except RuntimeError:
+      self.skipTest('Cython is not available')
+
+    with self.assertLogs(level='WARNING') as logs:
+      with self.create_pipeline() as p:
+        sdk_worker.DEFAULT_LOG_LULL_TIMEOUT_NS = 1000 * 1000  # Lull after 1 ms
+
+        _ = (p
+             | beam.Create([1])
+             | beam.Map(time.sleep))
+
+    self.assertTrue(
 
 Review comment:
   Done!
 



Issue Time Tracking
---

Worklog Id: (was: 325417)
Time Spent: 1h 20m  (was: 1h 10m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325416
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:20
Start Date: 09/Oct/19 01:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332796127
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1580,6 +1584,38 @@ def test_split_half(self):
       raise unittest.SkipTest("This test is for a single worker only.")
 
 
+class FnApiBasedLullLoggingTest(unittest.TestCase):
+  def create_pipeline(self):
+    return beam.Pipeline(
+        runner=fn_api_runner.FnApiRunner(
+            default_environment=beam_runner_api_pb2.Environment(
+                urn=python_urns.EMBEDDED_PYTHON_GRPC),
+            progress_request_frequency=0.5))
+
+  def test_lull_logging(self):
+
+    if sys.version_info < (3, 4):
 
 Review comment:
   This is also for Python 2, that's right.
 



Issue Time Tracking
---

Worklog Id: (was: 325416)
Time Spent: 1h 10m  (was: 1h)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325418
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 01:21
Start Date: 09/Oct/19 01:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332796254
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -403,10 +412,35 @@ def process_bundle_split(self, request, instruction_id):
           instruction_id=instruction_id,
           error='Instruction not running: %s' % instruction_id)
 
+  def _log_lull_in_bundle_processor(self, processor):
+    state_sampler = processor.state_sampler
+    sampler_info = state_sampler.get_info()
+    if (sampler_info
+        and sampler_info.time_since_transition
+        and sampler_info.time_since_transition > self.log_lull_timeout_ns):
+      step_name = sampler_info.state_name.step_name
+      state_name = sampler_info.state_name.name
+      state_lull_log = (
+          'There has been a processing lull of over %.2f seconds in state %s'
+          % (sampler_info.time_since_transition / 1e9, state_name))
+      step_name_log = (' in step %s ' % step_name) if step_name else ''
+
+      exec_thread = getattr(sampler_info, 'tracked_thread', None)
+      if exec_thread is not None:
+        thread_frame = sys._current_frames().get(exec_thread.ident)  # pylint: disable=protected-access
+        stack_trace = '\n'.join(
+            traceback.format_stack(thread_frame)) if thread_frame else ''
+      else:
+        stack_trace = '-NOT AVAILABLE-'
 
 Review comment:
   This should not occur, as the state sampler always marks the thread that 
it's looking at - but I'd like to be cautious.
 



Issue Time Tracking
---

Worklog Id: (was: 325418)
Time Spent: 1.5h  (was: 1h 20m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325406
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:49
Start Date: 09/Oct/19 00:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332790525
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1580,6 +1584,38 @@ def test_split_half(self):
       raise unittest.SkipTest("This test is for a single worker only.")
 
 
+class FnApiBasedLullLoggingTest(unittest.TestCase):
+  def create_pipeline(self):
+    return beam.Pipeline(
+        runner=fn_api_runner.FnApiRunner(
+            default_environment=beam_runner_api_pb2.Environment(
+                urn=python_urns.EMBEDDED_PYTHON_GRPC),
+            progress_request_frequency=0.5))
+
+  def test_lull_logging(self):
+
+    if sys.version_info < (3, 4):
+      self.skipTest('Log-based assertions are supported after Python 3.4')
+    try:
+      utils.check_compiled('apache_beam.runners.worker.opcounters')
+    except RuntimeError:
+      self.skipTest('Cython is not available')
+
+    with self.assertLogs(level='WARNING') as logs:
+      with self.create_pipeline() as p:
+        sdk_worker.DEFAULT_LOG_LULL_TIMEOUT_NS = 1000 * 1000  # Lull after 1 ms
+
+        _ = (p
+             | beam.Create([1])
+             | beam.Map(time.sleep))
+
+    self.assertTrue(
 
 Review comment:
   Could we use assertRegexpMatches instead?
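[Editor's note] A minimal sketch (not Beam code) of the assertion style the reviewer suggests: `assertRegexpMatches` is the deprecated Python 2/3 alias of `assertRegex`, which reports the non-matching text when it fails, unlike a bare `assertTrue`. The test-case name and log message here are hypothetical.

```python
import logging
import unittest


class LullLogAssertionSketch(unittest.TestCase):
    """Hypothetical example: assert on captured log output with a regex."""

    def test_lull_warning_matches(self):
        # assertLogs captures records at WARNING and above on the root logger.
        with self.assertLogs(level='WARNING') as logs:
            logging.warning(
                'There has been a processing lull of over 1.23 seconds')
        # assertRegex (assertRegexpMatches is its deprecated alias) fails
        # with the actual captured text, which eases debugging flaky runs.
        self.assertRegex(
            '\n'.join(logs.output),
            r'processing lull of over \d+\.\d+ seconds')
```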
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325406)
Time Spent: 50m  (was: 40m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325407
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:49
Start Date: 09/Oct/19 00:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332791064
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -403,10 +412,35 @@ def process_bundle_split(self, request, instruction_id):
   instruction_id=instruction_id,
   error='Instruction not running: %s' % instruction_id)
 
+  def _log_lull_in_bundle_processor(self, processor):
+    state_sampler = processor.state_sampler
+    sampler_info = state_sampler.get_info()
+    if (sampler_info
+        and sampler_info.time_since_transition
+        and sampler_info.time_since_transition > self.log_lull_timeout_ns):
+      step_name = sampler_info.state_name.step_name
+      state_name = sampler_info.state_name.name
+      state_lull_log = (
+          'There has been a processing lull of over %.2f seconds in state %s'
+          % (sampler_info.time_since_transition / 1e9, state_name))
+      step_name_log = (' in step %s ' % step_name) if step_name else ''
+
+      exec_thread = getattr(sampler_info, 'tracked_thread', None)
+      if exec_thread is not None:
+        thread_frame = sys._current_frames().get(exec_thread.ident)  # pylint: disable=protected-access
+        stack_trace = '\n'.join(
+            traceback.format_stack(thread_frame)) if thread_frame else ''
+      else:
+        stack_trace = '-NOT AVAILABLE-'
 
 Review comment:
   When would this be the case?
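[Editor's note] A sketch of the mechanism behind the question (plain `threading`, not Beam's state sampler, and the `capture_stack` helper is hypothetical): `sys._current_frames()` maps thread idents to their current top frame, and an ident that is absent from the map, e.g. because the thread already exited, is one case where no stack is available.

```python
import sys
import threading
import traceback


def capture_stack(thread):
    """Return the formatted stack of another thread, or a marker when its
    frame is unavailable (e.g. the thread has already exited)."""
    frame = sys._current_frames().get(thread.ident)  # pylint: disable=protected-access
    if frame is None:
        return '-NOT AVAILABLE-'
    return ''.join(traceback.format_stack(frame))


started = threading.Event()
done = threading.Event()


def worker():
    started.set()
    done.wait()  # park the thread so its stack can be sampled


t = threading.Thread(target=worker)
t.start()
started.wait()
trace = capture_stack(t)   # thread is alive: its stack includes worker()
done.set()
t.join()
after = capture_stack(t)   # thread exited: its ident left the frame map
```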
 



Issue Time Tracking
---

Worklog Id: (was: 325407)
Time Spent: 50m  (was: 40m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325408
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:49
Start Date: 09/Oct/19 00:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332790861
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -403,10 +412,35 @@ def process_bundle_split(self, request, instruction_id):
   instruction_id=instruction_id,
   error='Instruction not running: %s' % instruction_id)
 
+  def _log_lull_in_bundle_processor(self, processor):
+    state_sampler = processor.state_sampler
+    sampler_info = state_sampler.get_info()
+    if (sampler_info
+        and sampler_info.time_since_transition
+        and sampler_info.time_since_transition > self.log_lull_timeout_ns):
+      step_name = sampler_info.state_name.step_name
+      state_name = sampler_info.state_name.name
+      state_lull_log = (
+          'There has been a processing lull of over %.2f seconds in state %s'
+          % (sampler_info.time_since_transition / 1e9, state_name))
+      step_name_log = (' in step %s ' % step_name) if step_name else ''
+
+      exec_thread = getattr(sampler_info, 'tracked_thread', None)
+      if exec_thread is not None:
+        thread_frame = sys._current_frames().get(exec_thread.ident)  # pylint: disable=protected-access
+        stack_trace = '\n'.join(
+            traceback.format_stack(thread_frame)) if thread_frame else ''
+      else:
+        stack_trace = '-NOT AVAILABLE-'
+
+      logging.warning(
+          '%s%s. Traceback:\n%s', state_lull_log, step_name_log, stack_trace)
+
   def process_bundle_progress(self, request, instruction_id):
     # It is an error to get progress for a not-in-flight bundle.
-    processor = self.bundle_processor_cache.lookup(
-        request.instruction_id)
+    processor = self.bundle_processor_cache.lookup(request.instruction_id)
+    if processor:
+      self._log_lull_in_bundle_processor(processor)
 
 Review comment:
   Do we know the performance impact? Especially since this will happen for many 
worker threads.
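[Editor's note] One rough way to get a ballpark on the cost in question (a sketch with an arbitrary thread count, not a benchmark of the actual SDK harness): `sys._current_frames()` snapshots every live thread, so its per-call cost grows with the number of threads.

```python
import sys
import threading
import timeit

# Spin up idle threads so the snapshot has something to walk.
stop = threading.Event()
threads = [threading.Thread(target=stop.wait) for _ in range(50)]
for t in threads:
    t.start()

# Average cost of one snapshot, in microseconds, with ~50 live threads.
per_call_us = timeit.timeit(sys._current_frames, number=1000) / 1000 * 1e6
print('~%.1f us per sys._current_frames() call with %d threads'
      % (per_call_us, len(threads)))

stop.set()
for t in threads:
    t.join()
```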
 



Issue Time Tracking
---

Worklog Id: (was: 325408)
Time Spent: 1h  (was: 50m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325409
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:49
Start Date: 09/Oct/19 00:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9746: [BEAM-8333] 
Adding lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#discussion_r332790181
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1580,6 +1584,38 @@ def test_split_half(self):
 raise unittest.SkipTest("This test is for a single worker only.")
 
 
+class FnApiBasedLullLoggingTest(unittest.TestCase):
+  def create_pipeline(self):
+    return beam.Pipeline(
+        runner=fn_api_runner.FnApiRunner(
+            default_environment=beam_runner_api_pb2.Environment(
+                urn=python_urns.EMBEDDED_PYTHON_GRPC),
+            progress_request_frequency=0.5))
+
+  def test_lull_logging(self):
+
+    if sys.version_info < (3, 4):
 
 Review comment:
   The SDK is not installable with Python < 3.4 anyway. You can probably drop this 
(https://github.com/apache/beam/blob/master/sdks/python/setup.py#L182)
   
   Or is this for protecting against py 2 users?
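[Editor's note] For context on the guard (a hypothetical check, not part of the PR): `assertLogs` was only added to `unittest.TestCase` in Python 3.4, so the version guard skips the test wherever the method simply does not exist, Python 2 being the practical case.

```python
import sys
import unittest

# assertLogs appeared in Python 3.4; on older interpreters the attribute is
# absent and calling it would raise AttributeError rather than skip cleanly.
has_assert_logs = hasattr(unittest.TestCase, 'assertLogs')
assert has_assert_logs == (sys.version_info >= (3, 4))
print('assertLogs available:', has_assert_logs)
```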
 



Issue Time Tracking
---

Worklog Id: (was: 325409)
Time Spent: 1h  (was: 50m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325405
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:37
Start Date: 09/Oct/19 00:37
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332789172
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 325405)
Time Spent: 12h 40m  (was: 12.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325403
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:36
Start Date: 09/Oct/19 00:36
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332789120
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.INT32,
+                             schema_pb2.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
 
 Review comment:
   Done.
 


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325402
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:34
Start Date: 09/Oct/19 00:34
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332788742
 
 

 ##
 File path: sdks/python/apache_beam/coders/standard_coders_test.py
 ##
 @@ -133,11 +171,17 @@ def parse_coder(self, spec):
  for c in spec.get('components', ())]
 context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=spec['urn'], payload=spec.get('payload')),
+urn=spec['urn'], payload=spec.get('payload', '').encode()),
 
 Review comment:
   Yeah, this was just a fix I slipped in to get the test working in Python 3 
and forgot about.
   
   It should probably specify an encoding. Do you think it should be `latin1` 
like we do with the [expected encoded 
value](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/standard_coders_test.py#L115)?
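[Editor's note] For background on why `latin1` works there (a standalone sketch, not the actual test file): latin-1 maps all 256 byte values one-to-one to code points, so arbitrary coder payload bytes survive a round-trip through a text field, which utf-8 cannot guarantee.

```python
payload = bytes(range(256))            # every possible byte value
as_text = payload.decode('latin1')     # never raises: 1:1 byte-to-char
assert as_text.encode('latin1') == payload

# utf-8, by contrast, rejects arbitrary byte sequences:
try:
    b'\x80'.decode('utf-8')            # a stray continuation byte
except UnicodeDecodeError:
    print('utf-8 cannot decode arbitrary payload bytes; latin1 can')
```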
 



Issue Time Tracking
---

Worklog Id: (was: 325402)
Time Spent: 12h 20m  (was: 12h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325398
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:12
Start Date: 09/Oct/19 00:12
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332785140
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +124,33 @@ public T fromComponents(List<Coder<?>> components) {
 };
   }
 
+  static CoderTranslator<RowCoder> row() {
+    return new CoderTranslator<RowCoder>() {
+      @Override
+      public List<? extends Coder<?>> getComponents(RowCoder from) {
+        return ImmutableList.of();
 
 Review comment:
   Yes right now there's just a fixed mapping from fieldtype to coder. There's 
not a bug filed for using components, I was thinking that we would just 
continue inlining everything. Do you think we should plan on using components 
instead? What does that get us?
 



Issue Time Tracking
---

Worklog Id: (was: 325398)
Time Spent: 12h 10m  (was: 12h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=325397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325397
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:11
Start Date: 09/Oct/19 00:11
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9735: [BEAM-8355] Add a 
standard boolean coder
URL: https://github.com/apache/beam/pull/9735#issuecomment-539756148
 
 
   This should be all ready to merge.   The jenkins jobs are being super flaky. 
 



Issue Time Tracking
---

Worklog Id: (was: 325397)
Time Spent: 2h 40m  (was: 2.5h)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This involves making the current java BooleanCoder a standard coder, and 
> implementing an equivalent coder in python





[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=325395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325395
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:10
Start Date: 09/Oct/19 00:10
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-539755932
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 325395)
Time Spent: 2h 10m  (was: 2h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.





[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=325396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325396
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:10
Start Date: 09/Oct/19 00:10
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-539756014
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 325396)
Time Spent: 2h 20m  (was: 2h 10m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.





[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=325393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325393
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:09
Start Date: 09/Oct/19 00:09
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-539755630
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 325393)
Time Spent: 5h  (was: 4h 50m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.





[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=325394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325394
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:09
Start Date: 09/Oct/19 00:09
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-539755717
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 325394)
Time Spent: 5h 10m  (was: 5h)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.





[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=325392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325392
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:08
Start Date: 09/Oct/19 00:08
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9735: [BEAM-8355] Add a 
standard boolean coder
URL: https://github.com/apache/beam/pull/9735#issuecomment-539755537
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 325392)
Time Spent: 2.5h  (was: 2h 20m)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This involves making the current java BooleanCoder a standard coder, and 
> implementing an equivalent coder in python
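> The Java coder writes a boolean as a single byte, so the Python equivalent must match that wire format. A minimal sketch of the idea (class and method names here are illustrative, not Beam's actual coder classes, which carry a larger interface):
>
> ```python
> class BooleanCoder:
>     """Sketch of a standard boolean coder: one byte on the wire,
>     0x01 for True and 0x00 for False. Illustrative only."""
>
>     def encode(self, value: bool) -> bytes:
>         return b"\x01" if value else b"\x00"
>
>     def decode(self, data: bytes) -> bool:
>         if data not in (b"\x00", b"\x01"):
>             raise ValueError("invalid encoded boolean: %r" % (data,))
>         return data == b"\x01"
>
>
> coder = BooleanCoder()
> assert coder.decode(coder.encode(True)) is True
> assert coder.decode(coder.encode(False)) is False
> ```
>
> The point of standardizing the coder is exactly this byte-level agreement: any SDK that emits the same single byte can decode the other's output.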



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=325391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325391
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:08
Start Date: 09/Oct/19 00:08
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9735: [BEAM-8355] Add a 
standard boolean coder
URL: https://github.com/apache/beam/pull/9735#issuecomment-539755521
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325391)
Time Spent: 2h 20m  (was: 2h 10m)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This involves making the current java BooleanCoder a standard coder, and 
> implementing an equivalent coder in python



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?focusedWorklogId=325387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325387
 ]

ASF GitHub Bot logged work on BEAM-6829:


Author: ASF GitHub Bot
Created on: 08/Oct/19 23:53
Start Date: 08/Oct/19 23:53
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #8585: [BEAM-6829] Use 
transform/pcollection name for metric namespace if none provided
URL: https://github.com/apache/beam/pull/8585#issuecomment-539751962
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325387)
Time Spent: 3h 10m  (was: 3h)

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}
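> The fix the PR title describes is to fall back to the enclosing transform/pcollection name when no metric namespace is provided, so that metrics from different transforms no longer land in one shared group and collide. A hypothetical sketch of that fallback (not Beam's actual API):
> {code:python}
> def metric_key(name, namespace=None, transform_name=None):
>     """Return a (namespace, name) pair, defaulting the namespace to the
>     enclosing transform's name. Hypothetical helper for illustration."""
>     if namespace is None:
>         namespace = transform_name or "default"
>     return (namespace, name)
>
>
> seen = set()
> for transform in ("ParDo(Parse)", "ParDo(Format)"):
>     key = metric_key("elements", transform_name=transform)
>     # Distinct namespaces keep same-named metrics from colliding.
>     assert key not in seen
>     seen.add(key)
> {code}
> With a single default namespace, both transforms would register ("default", "elements") and trigger the Flink name-collision warning on every bundle.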



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325386&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325386
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 23:47
Start Date: 08/Oct/19 23:47
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9731: [BEAM-8343] 
Added nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#discussion_r332780486
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java
 ##
 @@ -17,5 +17,38 @@
  */
 package org.apache.beam.sdk.extensions.sql.meta;
 
+import static 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.util.List;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+
 /** Basic implementation of {@link BeamSqlTable} methods used by predicate and 
filter push-down. */
-public abstract class BaseBeamTable implements BeamSqlTable {}
+public abstract class BaseBeamTable implements BeamSqlTable {
+
+  @Override
+  public PCollection<Row> buildIOReader(
+  PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+String error =
+this.getClass().getName()
++ " does not support predicate/project push-down, yet non-empty %s 
is passed.";
+checkArgument(
 
 Review comment:
   Updated to use `checkArgument` with string formatting.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325386)
Time Spent: 3h 50m  (was: 3h 40m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
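> The shape of the proposal is easiest to see in miniature: a table exposes whether it can prune columns, which predicates it can evaluate at the source, and a read that rejects push-down parameters it cannot honor. A toy Python sketch of those hooks (names mirror the description above; this is not Beam's Java API, and strings stand in for RexNode):
> {code:python}
> class SqlTable:
>     """Toy model of the proposed BeamSqlTable push-down hooks."""
>
>     def supports_projects(self) -> bool:
>         # Default: the IO cannot prune columns at the source.
>         return False
>
>     def construct_filter(self, predicates):
>         # Default: nothing is pushed down; every predicate is returned
>         # unchanged and must be applied in a Calc after the read.
>         return list(predicates)
>
>     def build_io_reader(self, filters, field_names):
>         # Default read rejects push-down parameters it cannot honor.
>         if filters or field_names:
>             raise ValueError(
>                 "%s does not support predicate/project push-down"
>                 % type(self).__name__)
>         return "full-table scan"
>
>
> table = SqlTable()
> assert table.build_io_reader([], []) == "full-table scan"
> assert table.construct_filter(["a > 1"]) == ["a > 1"]
> {code}
> An IO that does support push-down would override all three methods; the planner only calls the three-argument read when the table has claimed the corresponding capability.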



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325384
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 23:46
Start Date: 08/Oct/19 23:46
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9731: [BEAM-8343] 
Added nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#discussion_r332780266
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java
 ##
 @@ -17,22 +17,33 @@
  */
 package org.apache.beam.sdk.extensions.sql.meta;
 
+import java.util.List;
 import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.POutput;
 import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
 
 /** This interface defines a Beam Sql Table. */
 public interface BeamSqlTable {
  /** create a {@code PCollection<Row>} from source. */
  PCollection<Row> buildIOReader(PBegin begin);
 
+  /** create a {@code PCollection<Row>} from source with predicate and/or 
project pushed-down. */
+  PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List<String> fieldNames);
+
  /** create a {@code IO.write()} instance to write to target. */
  POutput buildIOWriter(PCollection<Row> input);
 
+  /** Generate an IO implementation of {@code BeamSqlTableFilter} for 
predicate push-down. */
+  BeamSqlTableFilter constructFilter(List<RexNode> filter);
+
+  /** Whether project push-down is supported by the IO API. */
+  Boolean supportsProjects();
 
 Review comment:
   No, value should not be nullable, update to use `boolean` instead.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325384)
Time Spent: 3h 40m  (was: 3.5h)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8354) Add RowJsonSerializer

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8354?focusedWorklogId=325380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325380
 ]

ASF GitHub Bot logged work on BEAM-8354:


Author: ASF GitHub Bot
Created on: 08/Oct/19 23:13
Start Date: 08/Oct/19 23:13
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9732: [BEAM-8354] Add 
`RowJsonSerializer`
URL: https://github.com/apache/beam/pull/9732#issuecomment-539742866
 
 
   Ah, everything in the `util` package is not for users. (they may or may not 
be on board with this design decision)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325380)
Time Spent: 40m  (was: 0.5h)

> Add RowJsonSerializer
> -
>
> Key: BEAM-8354
> URL: https://issues.apache.org/jira/browse/BEAM-8354
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should add a RowJsonSerializer that is compatible with RowJsonDeserializer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325373
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 22:49
Start Date: 08/Oct/19 22:49
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9731: [BEAM-8343] 
Added nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#discussion_r332731637
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java
 ##
 @@ -17,5 +17,38 @@
  */
 package org.apache.beam.sdk.extensions.sql.meta;
 
+import static 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.util.List;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+
 /** Basic implementation of {@link BeamSqlTable} methods used by predicate and 
filter push-down. */
-public abstract class BaseBeamTable implements BeamSqlTable {}
+public abstract class BaseBeamTable implements BeamSqlTable {
+
+  @Override
+  public PCollection<Row> buildIOReader(
+  PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+String error =
+this.getClass().getName()
++ " does not support predicate/project push-down, yet non-empty %s 
is passed.";
+checkArgument(
 
 Review comment:
   `checkArgument` has built in string formatting support that only runs on 
errors. You should use that:
   
https://guava.dev/releases/19.0/api/docs/com/google/common/base/Preconditions.html#checkArgument(boolean,%20java.lang.String,%20java.lang.Object...)
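   The value of that overload is that the message is only formatted when the check actually fails, so the happy path pays no formatting cost. A Python analogue of the same lazy-formatting behavior (hypothetical helper, not a real library function):
   {code:python}
   def check_argument(condition, message_template, *args):
       """Raise ValueError with a %-formatted message, formatting only
       on failure -- the lazy behavior of Guava's checkArgument overload."""
       if not condition:
           raise ValueError(message_template % args)


   # When the condition holds, the template is never formatted, so even
   # a mismatched template costs nothing here.
   check_argument(True, "%s %s", "only-one-arg")

   filters = ["a > 1"]
   try:
       check_argument(
           not filters,
           "%s does not support push-down, yet non-empty %s is passed.",
           "MyTable", filters)
   except ValueError as e:
       assert "MyTable" in str(e)
   {code}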
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325373)
Time Spent: 3.5h  (was: 3h 20m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325374
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 22:49
Start Date: 08/Oct/19 22:49
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9731: [BEAM-8343] 
Added nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#discussion_r332765742
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java
 ##
 @@ -17,22 +17,33 @@
  */
 package org.apache.beam.sdk.extensions.sql.meta;
 
+import java.util.List;
 import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.POutput;
 import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
 
 /** This interface defines a Beam Sql Table. */
 public interface BeamSqlTable {
  /** create a {@code PCollection<Row>} from source. */
  PCollection<Row> buildIOReader(PBegin begin);
 
+  /** create a {@code PCollection<Row>} from source with predicate and/or 
project pushed-down. */
+  PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List<String> fieldNames);
+
  /** create a {@code IO.write()} instance to write to target. */
  POutput buildIOWriter(PCollection<Row> input);
 
+  /** Generate an IO implementation of {@code BeamSqlTableFilter} for 
predicate push-down. */
+  BeamSqlTableFilter constructFilter(List<RexNode> filter);
+
+  /** Whether project push-down is supported by the IO API. */
+  Boolean supportsProjects();
 
 Review comment:
   Does this need to be nullable, or will a `boolean` work here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325374)
Time Spent: 3.5h  (was: 3h 20m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325370
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 08/Oct/19 22:36
Start Date: 08/Oct/19 22:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9746: [BEAM-8333] Adding 
lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#issuecomment-539732787
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325370)
Time Spent: 40m  (was: 0.5h)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=325358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325358
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 08/Oct/19 22:06
Start Date: 08/Oct/19 22:06
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9724: [BEAM-8348] make 
job_name a standard option in Python SDK
URL: https://github.com/apache/beam/pull/9724#issuecomment-539723909
 
 
   As discussed offline, I don't think we want to break a bunch of Dataflow 
users' pipelines over something trivial like this. Closing
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325358)
Time Spent: 50m  (was: 40m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]
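> The collision is easy to reproduce with plain argparse, which Python pipeline options are parsed with under the hood: registering the same flag twice on one parser raises an error, which is why a differently named option such as `portable_job_name` was proposed. A small demonstration (the option classes themselves are elided; this only shows the argparse-level conflict):
> {code:python}
> import argparse
>
> parser = argparse.ArgumentParser()
> parser.add_argument("--job_name", help="registered by one options class")
>
> # A second options class registering the same flag collides.
> try:
>     parser.add_argument("--job_name", help="registered again elsewhere")
>     collided = False
> except argparse.ArgumentError:
>     collided = True
> assert collided
>
> # A distinct flag avoids the conflict entirely.
> parser.add_argument("--portable_job_name")
> opts = parser.parse_args(["--portable_job_name", "myjob"])
> assert opts.portable_job_name == "myjob"
> {code}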



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=325359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325359
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 08/Oct/19 22:06
Start Date: 08/Oct/19 22:06
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9724: [BEAM-8348] make 
job_name a standard option in Python SDK
URL: https://github.com/apache/beam/pull/9724
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325359)
Time Spent: 1h  (was: 50m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325340
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 08/Oct/19 21:36
Start Date: 08/Oct/19 21:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9746: [BEAM-8333] Adding 
lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#issuecomment-539714669
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325340)
Time Spent: 0.5h  (was: 20m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325338
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 21:35
Start Date: 08/Oct/19 21:35
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9731: [BEAM-8343] 
Added nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#discussion_r332744724
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTableFilter.java
 ##
 @@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import java.util.List;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+
+/** This interface defines Beam SQL Table Filter. */
+public interface BeamSqlTableFilter {
+  /**
+   * Identify parts of a predicate that are not supported by the IO push-down 
capabilities to be
+   * preserved in a {@code Calc} following {@code BeamIOSourceRel}.
+   *
+   * @return {@code List<RexNode>} unsupported by the IO API. Should be empty 
when an entire
+   * condition is supported, or an unchanged {@code List<RexNode>} when 
predicate push-down is
+   * not supported at all.
+   */
+  List<RexNode> getNotSupported();
 
 Review comment:
   Changed method name to `constructFilter`.
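   The contract being discussed is a partition: the IO keeps the predicates it can evaluate at the source and hands everything else back to be preserved in the Calc that follows the read. A toy sketch of that split (strings stand in for RexNode trees; `supported` is a placeholder for whatever capability check a real IO performs):
   {code:python}
   def partition_predicates(predicates, supported):
       """Split predicates into (pushed, remainder). Anything the IO
       cannot evaluate must stay in the Calc after BeamIOSourceRel."""
       pushed, remainder = [], []
       for p in predicates:
           (pushed if supported(p) else remainder).append(p)
       return pushed, remainder


   pushed, rest = partition_predicates(
       ["a > 1", "lower(b) = 'x'"],
       supported=lambda p: "(" not in p,  # e.g. no function calls at source
   )
   assert pushed == ["a > 1"]
   assert rest == ["lower(b) = 'x'"]
   {code}
   The two boundary cases named in the javadoc fall out directly: an empty remainder means the entire condition was pushed down, and an unchanged list means push-down is not supported at all.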
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325338)
Time Spent: 3h 20m  (was: 3h 10m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=325336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325336
 ]

ASF GitHub Bot logged work on BEAM-876:
---

Author: ASF GitHub Bot
Created on: 08/Oct/19 21:28
Start Date: 08/Oct/19 21:28
Worklog Time Spent: 10m 
  Work Description: ziel commented on pull request #9524: [BEAM-876] 
Support schemaUpdateOption in BigQueryIO
URL: https://github.com/apache/beam/pull/9524#discussion_r332742208
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java
 ##
 @@ -302,6 +308,12 @@ public WriteTables(
 this.kmsKey = kmsKey;
   }
 
+  public WriteTables withSchemaUpdateOptions(
+  Set<SchemaUpdateOption> schemaUpdateOptions) {
+this.schemaUpdateOptions = schemaUpdateOptions;
+return this;
+  }
 
 Review comment:
   I pushed a mini update to address this bit. Going to see about an 
integration test when I next get a chance to look at this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325336)
Time Spent: 1.5h  (was: 1h 20m)

> Support schemaUpdateOption in BigQueryIO
> 
>
> Key: BEAM-876
> URL: https://issues.apache.org/jira/browse/BEAM-876
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Eugene Kirpichov
>Assignee: canaan silberberg
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> BigQuery recently added support for updating the schema as a side effect of 
> the load job.
> Here is the relevant API method in JobConfigurationLoad: 
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List)
> BigQueryIO should support this too. See user request for this: 
> http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab
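> The review thread above flags a `withSchemaUpdateOptions` that mutates `this`; the convention for such builders is to return a modified copy so a shared base configuration is never changed underneath a caller. A Python sketch of that copy-on-`with` pattern (toy class, not the BigQuery or Beam API; `ALLOW_FIELD_ADDITION` is one of BigQuery's documented schema update option values):
> {code:python}
> import copy
>
>
> class LoadJobConfig:
>     """Toy stand-in for a load-job configuration."""
>
>     def __init__(self):
>         self.schema_update_options = frozenset()
>
>     def with_schema_update_options(self, options):
>         # Return a modified copy rather than mutating self, matching
>         # the immutable-builder convention.
>         other = copy.copy(self)
>         other.schema_update_options = frozenset(options)
>         return other
>
>
> base = LoadJobConfig()
> updated = base.with_schema_update_options({"ALLOW_FIELD_ADDITION"})
> assert base.schema_update_options == frozenset()      # base untouched
> assert updated.schema_update_options == {"ALLOW_FIELD_ADDITION"}
> {code}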



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325329
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 20:58
Start Date: 08/Oct/19 20:58
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9731: [BEAM-8343] Added 
nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#issuecomment-539701443
 
 
   R: @amaliujia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325329)
Time Spent: 3h 10m  (was: 3h)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to the IO layer, 
> and by adding the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
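A minimal Python sketch of the push-down contract proposed above (names are illustrative stand-ins, not Beam's Java API): the planner asks the table what it can push down, then hands the surviving projection and filter to the reader so filtering happens at the IO layer.

```python
# Toy in-memory "table" demonstrating the proposed push-down surface:
# supports_projects / supports_filter advertise capabilities, and
# build_io_reader receives the pushed-down filter and field list.

class SqlTable:
    def __init__(self, rows):
        self.rows = rows  # list of dicts acting as a tiny in-memory table

    def supports_projects(self):
        # The table can skip reading unused columns.
        return True

    def supports_filter(self, predicate):
        # Return the portion of the predicate the IO can evaluate; here, all of it.
        return predicate

    def build_io_reader(self, predicate, field_names):
        # Apply the pushed-down filter and projection at the "IO layer".
        return [{k: r[k] for k in field_names}
                for r in self.rows if predicate(r)]

table = SqlTable([{"id": 1, "v": 10}, {"id": 2, "v": 3}])
out = table.build_io_reader(lambda r: r["v"] > 5, ["id"])
```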



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-08 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8365:

Description: 
* InMemoryTable should implement the following method:
{code:java}
public PCollection<Row> buildIOReader(
    PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
which should return a `PCollection<Row>` containing only the fields named in the `fieldNames` list.

 * Add a property "push_down" to TestTableProvider, which should allow the user to 
select which version of InMemoryTable to use: with project push-down, with 
predicate push-down (to be implemented later), both (to be implemented later), 
or with neither (the default).
 * Create a rule to push the fields used by a Calc (in its projects and in its 
condition) down into the TestTable IO.
 * Update that same Calc (from the previous step) to have the proper input and 
output schemas, removing unused fields.
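The rule described in the last two bullets can be sketched in a few lines of plain Python (illustrative only — these helper names are invented for this example, not Beam's): collect the fields a Calc actually uses, push that list into the scan, and re-base the Calc on the narrowed schema.

```python
# Project push-down in miniature: the scan materializes only the fields
# the Calc reads, either in its projects or in its filter condition.

def fields_used_by_calc(projects, condition_fields):
    # Order-preserving union of projected fields and fields the filter reads.
    seen, used = set(), []
    for f in list(projects) + list(condition_fields):
        if f not in seen:
            seen.add(f)
            used.append(f)
    return used

def scan(rows, field_names):
    # The "IO reader" now emits only the pushed-down fields.
    return [{k: r[k] for k in field_names} for r in rows]

rows = [{"a": 1, "b": 2, "c": 3}]
used = fields_used_by_calc(projects=["a"], condition_fields=["b"])
narrowed = scan(rows, used)  # field "c" is never read from the source
```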

  was:
* Create a class extending InMemoryTable and implement the following method:
{code:java}
public PCollection<Row> buildIOReader(
    PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
which should return a `PCollection<Row>` containing only the fields named in the `fieldNames` list.

 * Add a property "push_down" to TestTableProvider, which should allow the user to 
select which version of InMemoryTable to use: with project push-down, with 
predicate push-down (to be implemented later), both (to be implemented later), 
or with neither (the default).
 * Create a rule to push the fields used by a Calc (in its projects and in its 
condition) down into the TestTable IO.
 * Update that same Calc (from the previous step) to have the proper input and 
output schemas, removing unused fields.


> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
>     PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` containing only the fields named in 
> the `fieldNames` list.
>  * Add a property "push_down" to TestTableProvider, which should allow the user 
> to select which version of InMemoryTable to use: with project push-down, with 
> predicate push-down (to be implemented later), both (to be implemented later), 
> or with neither (the default).
>  * Create a rule to push the fields used by a Calc (in its projects and in its 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have the proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325327
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 08/Oct/19 20:54
Start Date: 08/Oct/19 20:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9746: [BEAM-8333] Adding 
lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#issuecomment-539700024
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325327)
Time Spent: 20m  (was: 10m)

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325317
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 20:36
Start Date: 08/Oct/19 20:36
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539693272
 
 
   Staging links:
   
http://apache-beam-website-pull-requests.storage.googleapis.com/9306/documentation/dsls/sql/overview/index.html
   
http://apache-beam-website-pull-requests.storage.googleapis.com/9306/documentation/dsls/sql/zetasql/overview/index.html
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325317)
Time Spent: 1h 20m  (was: 1h 10m)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8333?focusedWorklogId=325316=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325316
 ]

ASF GitHub Bot logged work on BEAM-8333:


Author: ASF GitHub Bot
Created on: 08/Oct/19 20:30
Start Date: 08/Oct/19 20:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9746: [BEAM-8333] Adding 
lull logging for SDK harness
URL: https://github.com/apache/beam/pull/9746#issuecomment-539690884
 
 
   This logs something like this:
   ```
   WARNING:root:There has been a processing lull of over 0.45 seconds in state 
process-msecs in step Map(sleep) . Traceback:
 File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
   self._bootstrap_inner()
   
 File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
   self.run()
   
 File "/usr/lib/python3.6/threading.py", line 864, in run
   self._target(*self._args, **self._kwargs)
   
 File "/usr/lib/python3.6/concurrent/futures/thread.py", line 69, in _worker
   work_item.run()
   
 File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
   result = self.fn(*self.args, **self.kwargs)
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 203, in task
   self._execute(lambda: worker.do_instruction(work), work)
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 170, in _execute
   response = task()
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 203, in <lambda>
   self._execute(lambda: worker.do_instruction(work), work)
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 360, in do_instruction
   request.instruction_id)
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 386, in process_bundle
   bundle_processor.process_bundle(instruction_id))
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 663, in process_bundle
   data.transform_id].process_encoded(data.data)
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
   self.output(decoded_value)
   
 File 
"/usr/local/google/home/pabloem/codes/log-lull-sdk/sdks/python/apache_beam/transforms/core.py",
 line 1356, in <lambda>
   wrapper = lambda x, *args, **kwargs: [fn(x, *args, **kwargs)]
   
   .
   ```
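   The idea behind that log output can be sketched in a few lines of standalone Python (simplified and illustrative — this is not the actual Beam harness code): a progress-reporting thread notices a step has been in the same state too long and logs that worker thread's current stack.
   
   ```python
   # Lull detection in miniature: grab another thread's current frame via
   # sys._current_frames() and format its stack into a warning message.
   
   import sys
   import threading
   import time
   import traceback
   
   def dump_lull(thread, lull_seconds, step_name):
       frame = sys._current_frames().get(thread.ident)
       stack = "".join(traceback.format_stack(frame)) if frame else "<no frame>"
       return ("There has been a processing lull of over %.2f seconds "
               "in step %s. Traceback:\n%s" % (lull_seconds, step_name, stack))
   
   worker = threading.Thread(target=time.sleep, args=(1.0,))
   worker.start()
   time.sleep(0.2)  # let the "lull" accumulate
   message = dump_lull(worker, 0.2, "Map(sleep)")
   worker.join()
   ```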
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325316)
Remaining Estimate: 0h
Time Spent: 10m

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8333) Python SDK Worker should log lulls with progress-reporting thread

2019-10-08 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947168#comment-16947168
 ] 

Pablo Estrada commented on BEAM-8333:
-

PR [https://github.com/apache/beam/pull/9746] should address this for the 
Python SDK.

> Python SDK Worker should log lulls with progress-reporting thread
> -
>
> Key: BEAM-8333
> URL: https://issues.apache.org/jira/browse/BEAM-8333
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325311
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 20:24
Start Date: 08/Oct/19 20:24
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539665140
 
 
   > High level question: Do we actually want the user facing name to be 
ZetaSQL dialect? Would Big Query Standard SQL dialect be more appropriate?
   
   I like the idea here - renaming to more accurately reflect the intended use 
case. But these are some points I'd say in favor of the ZetaSQL name:
   - ZetaSQL is open source whereas BigQuery is a closed-source, Google product 
(using the closed-source name may lead to some legal/trademark roadblocks)
   - The supported components in Beam SQL are more similar to Dataflow SQL, 
which is itself a ZetaSQL dialect (unlike BQ SQL, which is similar to but not 
exactly a ZetaSQL dialect; see below)
   - BQ Standard SQL is similar to ZetaSQL but (AFAIK) not based on the actual 
ZetaSQL framework, so using a name like BQ Standard SQL in Beam could 
misrepresent BQ Standard SQL as a ZetaSQL dialect
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325311)
Time Spent: 1h 10m  (was: 1h)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8202) Support ParquetTable Writer

2019-10-08 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-8202:
--

Assignee: Xianqiong Wu

> Support ParquetTable Writer
> ---
>
> Key: BEAM-8202
> URL: https://issues.apache.org/jira/browse/BEAM-8202
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Xianqiong Wu
>Priority: Major
>
> https://github.com/apache/beam/pull/9054 supported reader for Parquet Table 
> in BeamSQL. We can support writer as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325309
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 20:19
Start Date: 08/Oct/19 20:19
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539665140
 
 
   > High level question: Do we actually want the user facing name to be 
ZetaSQL dialect? Would Big Query Standard SQL dialect be more appropriate?
   
   I like the idea here - renaming to more accurately reflect the intended use 
case. But these are some points I'd say in favor of the ZetaSQL name:
   - ZetaSQL is open source whereas BigQuery is a closed-source, Google product 
(using the closed-source name may lead to some legal/trademark roadblocks)
   - The supported components in Beam SQL are more similar to Dataflow SQL, 
which is itself a ZetaSQL dialect (unlike BQ SQL, which is similar to but not 
exactly a ZetaSQL dialect)
   - BQ Standard SQL is similar to ZetaSQL but (AFAIK) not based on the actual 
ZetaSQL framework, so using a name like BQ Standard SQL in Beam could 
misrepresent BQ Standard SQL as a ZetaSQL dialect
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325309)
Time Spent: 1h  (was: 50m)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8363) Nexmark regression in direct runner in streaming mode

2019-10-08 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-8363:
-

Assignee: Mark Liu  (was: Kenneth Knowles)

> Nexmark regression in direct runner in streaming mode
> -
>
> Key: BEAM-8363
> URL: https://issues.apache.org/jira/browse/BEAM-8363
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.16.0
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
> Fix For: 2.17.0
>
> Attachments: regression_screenshot.png
>
>
> From Nexmark performance dashboard for direct runner: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, 
> there is a regression for streaming mode happened on August 25.
> The runtime increased about 25% in two days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8363) Nexmark regression in direct runner in streaming mode

2019-10-08 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-8363:
-

Assignee: (was: Mark Liu)

> Nexmark regression in direct runner in streaming mode
> -
>
> Key: BEAM-8363
> URL: https://issues.apache.org/jira/browse/BEAM-8363
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.16.0
>Reporter: Mark Liu
>Priority: Major
> Fix For: 2.17.0
>
> Attachments: regression_screenshot.png
>
>
> From Nexmark performance dashboard for direct runner: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, 
> there is a regression for streaming mode happened on August 25.
> The runtime increased about 25% in two days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8363) Nexmark regression in direct runner in streaming mode

2019-10-08 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-8363:
-

Assignee: Kenneth Knowles

> Nexmark regression in direct runner in streaming mode
> -
>
> Key: BEAM-8363
> URL: https://issues.apache.org/jira/browse/BEAM-8363
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.16.0
>Reporter: Mark Liu
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.17.0
>
> Attachments: regression_screenshot.png
>
>
> From Nexmark performance dashboard for direct runner: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, 
> there is a regression for streaming mode happened on August 25.
> The runtime increased about 25% in two days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=325294=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325294
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:51
Start Date: 08/Oct/19 19:51
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9190: [BEAM-7520] Fix 
timer firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-539674351
 
 
   I haven't had a chance to look again - thanks for the ping.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325294)
Time Spent: 10h 10m  (was: 10h)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set to fire before timerA (timerB.timestamp < timerA.timestamp)
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager,extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.
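The fix this issue calls for can be sketched with a heap (illustrative Python, not DirectRunner code): keep fireable timers in a priority queue and re-check it after every firing, so a timer set *while firing* is merged back in timestamp order instead of firing after already-extracted later timers.

```python
# Fire timers strictly in timestamp order, even when a firing callback
# sets a new timer that should fire before timers already due.

import heapq

def fire_timers(timers, watermark, callbacks):
    """timers: list of (timestamp, name); callbacks may return new timers."""
    heapq.heapify(timers)
    fired = []
    while timers and timers[0][0] <= watermark:
        ts, name = heapq.heappop(timers)
        fired.append((ts, name))
        for new_timer in callbacks.get(name, lambda t: [])(ts):
            heapq.heappush(timers, new_timer)  # merged back into firing order
    return fired

# timerB's callback sets a follow-up timer at timerB.timestamp + 1; it must
# still fire before timerA, which sits at the window's max timestamp.
callbacks = {"timerB": lambda ts: [(ts + 1, "timerB2")]}
fired = fire_timers([(10, "timerB"), (100, "timerA")], 100, callbacks)
```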



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325291
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:31
Start Date: 08/Oct/19 19:31
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539665140
 
 
   > High level question: Do we actually want the user facing name to be 
ZetaSQL dialect? Would Big Query Standard SQL dialect be more appropriate?
   
   I like the idea here - renaming to more accurately reflect the intended use 
case. But these are some points I'd say in favor of the ZetaSQL name:
   - ZetaSQL is open source whereas BigQuery is a closed-source, Google product 
(using the closed-source name may lead to some legal/trademark roadblocks)
   - The supported components in Beam SQL are more similar to Dataflow SQL, 
which is itself a ZetaSQL dialect (unlike BQ SQL, which is similar to but not 
exactly a ZetaSQL dialect)
   - BQ standard SQL is itself more or less another ZetaSQL dialect, so using a 
name like BQ Standard SQL in Beam shrouds the underlying SQL framework (since 
the dialect in Beam SQL is based off ZetaSQL directly, not the intermediate BQ 
standard SQL)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325291)
Time Spent: 50m  (was: 40m)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325289
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:30
Start Date: 08/Oct/19 19:30
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539665140
 
 
   > High level question: Do we actually want the user facing name to be 
ZetaSQL dialect? Would Big Query Standard SQL dialect be more appropriate?
   
   I like the idea here - renaming to more accurately reflect the intended use 
case. But these are some points I'd say in favor of the ZetaSQL name:
   - ZetaSQL is open source whereas BigQuery is a closed-source, Google product 
(using the closed-source name may lead to some legal/trademark roadblocks)
   - The supported components in Beam SQL are more similar to Dataflow SQL, 
which is itself a ZetaSQL dialect (unlike BQ SQL)
   - BQ standard SQL is itself more or less another ZetaSQL dialect, so using a 
name like BQ Standard SQL in Beam shrouds the underlying SQL framework (since 
the dialect in Beam SQL is based off ZetaSQL directly, not the intermediate BQ 
standard SQL)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325289)
Time Spent: 40m  (was: 0.5h)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325287
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:29
Start Date: 08/Oct/19 19:29
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539665140
 
 
   > High level question: Do we actually want the user facing name to be 
ZetaSQL dialect? Would Big Query Standard SQL dialect be more appropriate?
   
   I like the idea here - renaming to more accurately reflect the intended use 
case. But these are some points I'd say in favor of the ZetaSQL name:
   - ZetaSQL is open source whereas BigQuery is a closed-source, Google product
   - The supported components in Beam SQL are more similar to Dataflow SQL, 
which is itself a ZetaSQL dialect (unlike BQ SQL)
   - BQ standard SQL is itself more or less another ZetaSQL dialect, so using a 
name like BQ Standard SQL in Beam shrouds the underlying SQL framework (since 
the dialect in Beam SQL is based off ZetaSQL directly, not the intermediate BQ 
standard SQL)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325287)
Time Spent: 0.5h  (was: 20m)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325284
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:26
Start Date: 08/Oct/19 19:26
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539665140
 
 
   > High level question: Do we actually want the user facing name to be 
ZetaSQL dialect? Would Big Query Standard SQL dialect be more appropriate?
   
   I like the idea here - renaming to more accurately reflect the intended use 
case. But these are some points I'd say in favor of the ZetaSQL name:
   - ZetaSQL is open source whereas BigQuery is a closed-source, Google product
   - The supported components in Beam SQL are more similar to Dataflow SQL, 
which is itself a ZetaSQL dialect (unlike BQ SQL)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325284)
Time Spent: 20m  (was: 10m)

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7939) Document ZetaSQL dialect in Beam SQL

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7939?focusedWorklogId=325283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325283
 ]

ASF GitHub Bot logged work on BEAM-7939:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:21
Start Date: 08/Oct/19 19:21
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #9306: [BEAM-7939] ZetaSQL 
dialect documentation
URL: https://github.com/apache/beam/pull/9306#issuecomment-539663000
 
 
   High level question: Do we actually want the user facing name to be ZetaSQL 
dialect? Would Big Query Standard SQL dialect be more appropriate?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325283)
Remaining Estimate: 0h
Time Spent: 10m

> Document ZetaSQL dialect in Beam SQL
> 
>
> Key: BEAM-7939
> URL: https://issues.apache.org/jira/browse/BEAM-7939
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Blocked by BEAM-7832. ZetaSQL dialect source will be merged from #9210.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=325281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325281
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:04
Start Date: 08/Oct/19 19:04
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9642: [BEAM-8213] 
Split up monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325281)
Time Spent: 13.5h  (was: 13h 20m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8213) Run and report python tox tasks separately within Jenkins

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=325280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325280
 ]

ASF GitHub Bot logged work on BEAM-8213:


Author: ASF GitHub Bot
Created on: 08/Oct/19 19:04
Start Date: 08/Oct/19 19:04
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-539656743
 
 
   Closing this until we move to pytest and xdist
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325280)
Time Spent: 13h 20m  (was: 13h 10m)

> Run and report python tox tasks separately within Jenkins
> -
>
> Key: BEAM-8213
> URL: https://issues.apache.org/jira/browse/BEAM-8213
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> As a python developer, the speed and comprehensibility of the jenkins 
> PreCommit job could be greatly improved.
> Here are some of the problems
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)" 
> which is quite confusing
> - I have to wait for over an hour to discover the lint failed, which takes 
> about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of python they use.  I 
> click on Test results, then the test module, then the test class, then I see 
> 4 tests named the same thing.  I assume that the first is python 2.7, the 
> second is 3.5 and so on.   It takes 5 clicks and then reading the log output 
> to know which version of python a single error pertains to, then I need to 
> repeat for each failure.  This makes it very difficult to discover problems, 
> and deduce that they may have something to do with python version mismatches.
> I believe the solution to this is to split up the single monolithic python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps).  This 
> would give us the following benefits:
> - sub job results should become available as they finish, so for example, 
> lint results should be available very early on
> - sub job results will be reported separately, and there will be a job for 
> each py2, py35, py36 and so on, so it will be clear when an error is related 
> to a particular python version
> - sub jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=325273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325273
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 08/Oct/19 18:41
Start Date: 08/Oct/19 18:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-539647938
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325273)
Time Spent: 7h 10m  (was: 7h)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-08 Thread Chamikara Madhusanka Jayalath (Jira)
Chamikara Madhusanka Jayalath created BEAM-8367:
---

 Summary: Python BigQuery sink should use unique IDs for mode 
STREAMING_INSERTS
 Key: BEAM-8367
 URL: https://issues.apache.org/jira/browse/BEAM-8367
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Chamikara Madhusanka Jayalath
Assignee: Pablo Estrada


Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
example, we don't write the same record twice after a VM failure.

 

Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
after a VM failure, resulting in data duplication.

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]

 

The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
generated, similar to how the Java BQ sink operates.

[https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]

 

Pablo, can you do an initial assessment here?

I think this is a relatively small fix but I might be wrong.
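For illustration, the dedup behavior that unique IDs enable can be sketched in plain Python (the function names are illustrative, not the actual sink API): IDs are attached once before the checkpoint, and a retry that replays the same checkpointed IDs writes nothing new.

```python
import uuid

def with_insert_ids(rows):
    # Attach a unique insertId to each row. In the real pipeline this would
    # happen before a Reshuffle so the (id, row) pairs are checkpointed and
    # retries replay the same IDs instead of generating fresh ones.
    return [(str(uuid.uuid4()), row) for row in rows]

def streaming_insert(tagged_rows, seen_ids):
    # Simulate BigQuery's best-effort dedup keyed on insertId: rows whose
    # ID was already seen are dropped.
    written = []
    for insert_id, row in tagged_rows:
        if insert_id not in seen_ids:
            seen_ids.add(insert_id)
            written.append(row)
    return written
```

Without the checkpoint, a retried bundle would re-run `with_insert_ids` and produce fresh IDs, so the same rows would be written twice.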



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8355) Make BooleanCoder a standard coder

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8355?focusedWorklogId=325269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325269
 ]

ASF GitHub Bot logged work on BEAM-8355:


Author: ASF GitHub Bot
Created on: 08/Oct/19 18:30
Start Date: 08/Oct/19 18:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9735: [BEAM-8355] Add a 
standard boolean coder
URL: https://github.com/apache/beam/pull/9735#issuecomment-539643431
 
 
   Thanks. This looks good to me. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325269)
Time Spent: 2h 10m  (was: 2h)

> Make BooleanCoder a standard coder
> --
>
> Key: BEAM-8355
> URL: https://issues.apache.org/jira/browse/BEAM-8355
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This involves making the current Java BooleanCoder a standard coder, and 
> implementing an equivalent coder in Python
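As a sketch of what the Python side could look like, assuming the same single-byte wire format as the Java BooleanCoder (0x01 for true, 0x00 for false — an assumption here, not the finalized standard coder spec):

```python
class BooleanCoder:
    """Minimal boolean coder sketch: one byte, 0x01 for True, 0x00 for False."""

    def encode(self, value):
        # bool -> single byte
        return b"\x01" if value else b"\x00"

    def decode(self, data):
        # single byte -> bool
        return data != b"\x00"
```

A standard coder additionally needs a registered URN and cross-SDK conformance tests so both SDKs agree on the encoding.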



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8309) Update Python dependencies page for 2.16.0

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8309?focusedWorklogId=325265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325265
 ]

ASF GitHub Bot logged work on BEAM-8309:


Author: ASF GitHub Bot
Created on: 08/Oct/19 18:22
Start Date: 08/Oct/19 18:22
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9653: 
[BEAM-8309] Document SDK 2.16.0 Python dependencies
URL: https://github.com/apache/beam/pull/9653
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325265)
Time Spent: 1.5h  (was: 1h 20m)

> Update Python dependencies page for 2.16.0
> --
>
> Key: BEAM-8309
> URL: https://issues.apache.org/jira/browse/BEAM-8309
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Update Python dependencies page for 2.16.0
> [https://beam.apache.org/documentation/sdks/python-dependencies/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6884) NoSuchMethodError: descriptors$EnumValueDescriptor when deploying Beam Java SDK 2.10.0 to Dataflow

2019-10-08 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947098#comment-16947098
 ] 

Kirill Kozlov commented on BEAM-6884:
-

I had a similar issue when running tests using IntelliJ; running the same test 
with Gradle works.

> NoSuchMethodError: descriptors$EnumValueDescriptor when deploying Beam Java 
> SDK 2.10.0 to Dataflow
> --
>
> Key: BEAM-6884
> URL: https://issues.apache.org/jira/browse/BEAM-6884
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 2.10.0
>Reporter: Yohei Shimomae
>Priority: Major
>
> My working environment:
>  * Apache Beam Java SDK version: works with 2.9.0 but failed with 2.10.0
>  * Runner: failed with both Direct Runner and Dataflow Runner
>  * Application code: Scala (note I did not use Scio)
> I tried to change the Apache Beam Java SDK version from 2.9.0 to 2.10.0 and 
> deploy it to Dataflow, but I got this error. It works with 2.9.0. Am I missing 
> something?
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.beam.model.pipeline.v1.RunnerApi$BeamConstants$Constants.getValueDescriptor()Lorg/apache/beam/vendor/grpc/v1p13p1/com/google/protobuf/Descriptors$EnumValueDescriptor;
> at 
> org.apache.beam.sdk.transforms.windowing.BoundedWindow.extractTimestampFromProto(BoundedWindow.java:84)
> at 
> org.apache.beam.sdk.transforms.windowing.BoundedWindow.(BoundedWindow.java:49)
> at 
> org.apache.beam.sdk.coders.CoderRegistry$CommonTypes.(CoderRegistry.java:140)
> at 
> org.apache.beam.sdk.coders.CoderRegistry$CommonTypes.(CoderRegistry.java:97)
> at org.apache.beam.sdk.coders.CoderRegistry.(CoderRegistry.java:160)
> at org.apache.beam.sdk.Pipeline.getCoderRegistry(Pipeline.java:326)
> at org.apache.beam.sdk.io.kafka.KafkaIO$Read.expand(KafkaIO.java:707)
> at org.apache.beam.sdk.io.kafka.KafkaIO$Read.expand(KafkaIO.java:309)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
> at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:182)
> {code}
> My code is in Scala but it works well with Beam 2.9.0. 
> {code:java}
>   val p = Pipeline.create(options)
> p.apply(s"${bu.name}_ReadFromKafka", KafkaIO.read()
> .withBootstrapServers(options.getBootstreapServers)
> .updateConsumerProperties(config)
> .withTopics(util.Arrays.asList(topicName))
> .withKeyDeserializer(classOf[LongDeserializer])
> .withValueDeserializer(classOf[StringDeserializer])
> .withConsumerFactoryFn(
>   new KafkaTLSConsumerFactory(
> projectId, options.getSourceBucket, options.getTrustStoreGCSKey, 
> options.getKeyStoreGCSKey)))
>  .apply(s"${bu.name}_Convert", ParDo.of(new 
> ConvertJSONTextToEPCTransaction(bu)))
>  .apply(s"${bu.name}_WriteToBQ",  BigQueryIO.write()
>   .to(bqDestTable)
>   .withSchema(schema)
>   .withFormatFunction(new ConvertMessageToTable())
>   .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>   .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND))
>   }
>   p.run
> {code}
> According to the error log, it failed at this part.
>  
> [https://github.com/apache/beam/blob/v2.10.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/BoundedWindow.java#L81-L85]
> {code:java}
>   private static Instant 
> extractTimestampFromProto(RunnerApi.BeamConstants.Constants constant) {
> return new Instant(
> Long.parseLong(
> 
> constant.getValueDescriptor().getOptions().getExtension(RunnerApi.beamConstant)));
>   }
> {code}
> This constant come from this part.
>  
> [https://github.com/apache/beam/blob/v2.10.0/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/BoundedWindow.java#L48-L49]
> {code:java}
>   public static final Instant TIMESTAMP_MIN_VALUE =
>   
> extractTimestampFromProto(RunnerApi.BeamConstants.Constants.MIN_TIMESTAMP_MILLIS);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8362) Don't use ZetaSQL's unimplemented functions

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8362?focusedWorklogId=325212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325212
 ]

ASF GitHub Bot logged work on BEAM-8362:


Author: ASF GitHub Bot
Created on: 08/Oct/19 17:52
Start Date: 08/Oct/19 17:52
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9739: [BEAM-8362] 
Don't use ZetaSQL's unimplemented functions
URL: https://github.com/apache/beam/pull/9739#discussion_r332648960
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java
 ##
 @@ -517,13 +517,7 @@ public RexNode convertResolvedLiteral(ResolvedLiteral 
resolvedLiteral) {
 ret = convertValueToRexNode(resolvedLiteral.getType(), 
resolvedLiteral.getValue());
 break;
   default:
-throw new RuntimeException(
-MessageFormat.format(
-"Unsupported ResolvedLiteral type: {0}, kind: {1}, value: {2}, 
class: {3}",
-resolvedLiteral.getType().typeName(),
-kind,
-resolvedLiteral.getValue(),
-resolvedLiteral.getClass()));
+throw new RuntimeException("Unsupported ResolvedLiteral type.");
 
 Review comment:
   I would prefer a mixed approach:
   1. give the user the unsupported kind.
   2. log more detailed information to help development.
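That mixed approach can be sketched like this (Python for brevity; the function and message wording are illustrative, not the actual ExpressionConverter code): the user-facing exception carries only the kind, while the full details go to a debug log.

```python
import logging

logger = logging.getLogger("ExpressionConverter")

def unsupported_literal(kind, type_name, value, cls_name):
    # Log the full details for development and debugging...
    logger.debug(
        "Unsupported ResolvedLiteral type: %s, kind: %s, value: %r, class: %s",
        type_name, kind, value, cls_name)
    # ...but keep the user-facing message short, surfacing only the kind
    # so the user can tell which construct in their query is unsupported.
    return RuntimeError("Unsupported ResolvedLiteral kind: %s" % kind)
```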
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325212)
Time Spent: 1h  (was: 50m)

> Don't use ZetaSQL's unimplemented functions
> ---
>
> Key: BEAM-8362
> URL: https://issues.apache.org/jira/browse/BEAM-8362
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Affects Versions: 2.15.0
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Unfortunately a bunch of debug functionality is still unimplemented in 
> ZetaSQL. We should avoid calling those functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6544) Python wheel release process must not require inputting credentials to third party sites

2019-10-08 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-6544.

Fix Version/s: Not applicable
   Resolution: Fixed

> Python wheel release process must not require inputting credentials to third 
> party sites
> 
>
> Key: BEAM-6544
> URL: https://issues.apache.org/jira/browse/BEAM-6544
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Blocker
> Fix For: Not applicable
>
>
> According to the instructions at 
> https://beam.apache.org/contribute/release-guide/#build-and-stage-python-wheels
>  and https://github.com/apache/beam-wheels, the release manager must input 
> their ASF credentials to Travis-CI. This should not be required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6544) Python wheel release process must not require inputting credentials to third party sites

2019-10-08 Thread Mark Liu (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947080#comment-16947080
 ] 

Mark Liu commented on BEAM-6544:


Auto push from beam-wheels worked for me in the 2.16 release. I think this Jira 
can be marked as resolved.

> Python wheel release process must not require inputting credentials to third 
> party sites
> 
>
> Key: BEAM-6544
> URL: https://issues.apache.org/jira/browse/BEAM-6544
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Blocker
>
> According to the instructions at 
> https://beam.apache.org/contribute/release-guide/#build-and-stage-python-wheels
>  and https://github.com/apache/beam-wheels, the release manager must input 
> their ASF credentials to Travis-CI. This should not be required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8362) Don't use ZetaSQL's unimplemented functions

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8362?focusedWorklogId=325210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325210
 ]

ASF GitHub Bot logged work on BEAM-8362:


Author: ASF GitHub Bot
Created on: 08/Oct/19 17:51
Start Date: 08/Oct/19 17:51
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9739: [BEAM-8362] 
Don't use ZetaSQL's unimplemented functions
URL: https://github.com/apache/beam/pull/9739#discussion_r332648527
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java
 ##
 @@ -517,13 +517,7 @@ public RexNode convertResolvedLiteral(ResolvedLiteral 
resolvedLiteral) {
 ret = convertValueToRexNode(resolvedLiteral.getType(), 
resolvedLiteral.getValue());
 break;
   default:
-throw new RuntimeException(
-MessageFormat.format(
-"Unsupported ResolvedLiteral type: {0}, kind: {1}, value: {2}, 
class: {3}",
-resolvedLiteral.getType().typeName(),
-kind,
-resolvedLiteral.getValue(),
-resolvedLiteral.getClass()));
+throw new RuntimeException("Unsupported ResolvedLiteral type.");
 
 Review comment:
   It's actually very useful for development. And for users, at least the kind 
of ResolvedLiteral should be included in the thrown message, to hint at what 
could be wrong in their queries.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325210)
Time Spent: 50m  (was: 40m)

> Don't use ZetaSQL's unimplemented functions
> ---
>
> Key: BEAM-8362
> URL: https://issues.apache.org/jira/browse/BEAM-8362
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Affects Versions: 2.15.0
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Unfortunately a bunch of debug functionality is still unimplemented in 
> ZetaSQL. We should avoid calling those functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=325204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325204
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 08/Oct/19 17:45
Start Date: 08/Oct/19 17:45
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9731: [BEAM-8343] 
Added nessesary methods to BeamSqlTable to enable support for predicate/project 
push-down
URL: https://github.com/apache/beam/pull/9731#discussion_r332645483
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTableFilter.java
 ##
 @@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import java.util.List;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+
+/** This interface defines Beam SQL Table Filter. */
+public interface BeamSqlTableFilter {
+  /**
+   * Identify parts of a predicate that are not supported by the IO push-down 
capabilities to be
+   * preserved in a {@code Calc} following {@code BeamIOSourceRel}.
+   *
+   * @return {@code List<RexNode>} unsupported by the IO API. Should be empty 
when an entire
+   * condition is supported, or an unchanged {@code List<RexNode>} when 
predicate push-down is
+   * not supported at all.
+   */
+  List<RexNode> getNotSupported();
 
 Review comment:
   Change method name to something more accurate.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325204)
Time Spent: 3h  (was: 2h 50m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding the following methods to the BeamSqlTable interface to pass 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - Boolean supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
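The contract described above — the unsupported list is empty when the entire condition is pushed down, and equals the input when push-down isn't supported at all — can be sketched in a few lines (Python, with illustrative names):

```python
def split_filter(predicates, io_supports):
    # Partition a conjunction of predicates into the part the IO can push
    # down and the part that must stay in a Calc above BeamIOSourceRel.
    pushed = [p for p in predicates if io_supports(p)]
    kept = [p for p in predicates if not io_supports(p)]
    return pushed, kept
```

The planner would then keep a Calc only when `kept` is non-empty, and drop it entirely when the IO absorbed the whole condition.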



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8309) Update Python dependencies page for 2.16.0

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8309?focusedWorklogId=325203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325203
 ]

ASF GitHub Bot logged work on BEAM-8309:


Author: ASF GitHub Bot
Created on: 08/Oct/19 17:44
Start Date: 08/Oct/19 17:44
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9653: [BEAM-8309] Document 
SDK 2.16.0 Python dependencies
URL: https://github.com/apache/beam/pull/9653#issuecomment-539625259
 
 
   > LGTM.
   > 
   > @soyrice how did you generate the dependencies? Did you use a script? 
Should we update the release guide with instructions on re-generating this page 
after releases?
   
   Yeah, Melissa wrote a script that the TWs use: 
https://raw.githubusercontent.com/melap/beam/pydepsscript/website/scripts/generate_py_deps.py
   
   That sounds like a good idea to me. We should definitely add the docs 
components to the release guide. But these dependencies are also fairly easy to 
collect manually.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325203)
Time Spent: 1h 20m  (was: 1h 10m)

> Update Python dependencies page for 2.16.0
> --
>
> Key: BEAM-8309
> URL: https://issues.apache.org/jira/browse/BEAM-8309
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Python dependencies page for 2.16.0
> [https://beam.apache.org/documentation/sdks/python-dependencies/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8309) Update Python dependencies page for 2.16.0

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8309?focusedWorklogId=325196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325196
 ]

ASF GitHub Bot logged work on BEAM-8309:


Author: ASF GitHub Bot
Created on: 08/Oct/19 17:31
Start Date: 08/Oct/19 17:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9653: [BEAM-8309] Document 
SDK 2.16.0 Python dependencies
URL: https://github.com/apache/beam/pull/9653#issuecomment-539620354
 
 
   LGTM.
   
   @soyrice how did you generate the dependencies? Did you use a script? Should 
we update the release guide with instructions on re-generating this page after 
releases?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325196)
Time Spent: 1h 10m  (was: 1h)

> Update Python dependencies page for 2.16.0
> --
>
> Key: BEAM-8309
> URL: https://issues.apache.org/jira/browse/BEAM-8309
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Update Python dependencies page for 2.16.0
> [https://beam.apache.org/documentation/sdks/python-dependencies/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8309) Update Python dependencies page for 2.16.0

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8309?focusedWorklogId=325186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325186
 ]

ASF GitHub Bot logged work on BEAM-8309:


Author: ASF GitHub Bot
Created on: 08/Oct/19 16:59
Start Date: 08/Oct/19 16:59
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9653: [BEAM-8309] Document 
SDK 2.16.0 Python dependencies
URL: https://github.com/apache/beam/pull/9653#issuecomment-539607575
 
 
   > (undo approval) I saw Mark's comment. Yes, the dependencies need to be 
updated.
   
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 325186)
Time Spent: 1h  (was: 50m)

> Update Python dependencies page for 2.16.0
> --
>
> Key: BEAM-8309
> URL: https://issues.apache.org/jira/browse/BEAM-8309
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Update Python dependencies page for 2.16.0
> [https://beam.apache.org/documentation/sdks/python-dependencies/]





[jira] [Work logged] (BEAM-8309) Update Python dependencies page for 2.16.0

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8309?focusedWorklogId=325185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325185
 ]

ASF GitHub Bot logged work on BEAM-8309:


Author: ASF GitHub Bot
Created on: 08/Oct/19 16:59
Start Date: 08/Oct/19 16:59
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #9653: [BEAM-8309] Document 
SDK 2.16.0 Python dependencies
URL: https://github.com/apache/beam/pull/9653#issuecomment-539607502
 
 
   > @soyrice Can you repopulate this dependency list? AFAIK dill changed to 
0.3.0 in this release. Since 2.16 is published, everything is finalized.
   > 
   > +R: @tvalentyn
   
   Done!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325185)
Time Spent: 50m  (was: 40m)

> Update Python dependencies page for 2.16.0
> --
>
> Key: BEAM-8309
> URL: https://issues.apache.org/jira/browse/BEAM-8309
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Update Python dependencies page for 2.16.0
> [https://beam.apache.org/documentation/sdks/python-dependencies/]





[jira] [Updated] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-08 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-6995:

Fix Version/s: (was: Not applicable)
   2.17.0

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405
> 
> rel#103:BeamCalcRel.BEAM_LOGICAL(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={150.0 
> rows, 801.0 cpu, 0.0 io}
> rel#106:Subset#1.ENUMERABLE, best=rel#105, importance=0.405
> 
> rel#105:BeamEnumerableConverter.ENUMERABLE(input=rel#104:Subset#1.BEAM_LOGICAL),
>  rowcount=50.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#2, type: RecordType(INTEGER id, INTEGER EXPR$1)
> rel#99:Subset#2.NONE, best=null, importance=0.9
> 
> rel#98:LogicalAggregate.NONE(input=rel#97:Subset#1.NONE,group={0},EXPR$1=SUM($1)),
>  rowcount=5.0, cumulative cost={inf}
> rel#100:Subset#2.BEAM_LOGICAL, best=null, importance=1.0
> 
> rel#101:AbstractConverter.BEAM_LOGICAL(input=rel#99:Subset#2.NONE,convention=BEAM_LOGICAL),
>  rowcount=5.0, cumulative cost={inf}
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:437)
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:296)
> at 
> 

[jira] [Resolved] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-08 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-6995.
-
Fix Version/s: Not applicable
   Resolution: Fixed

Fixed in [https://github.com/apache/beam/pull/9703]

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input", 
> Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
> SqlTransform.query(
> "SELECT id, SUM(val) "
> + "FROM PCOLLECTION "
> + "WHERE val > 0 "
> + "GROUP BY id"));{code}
> If the WHERE clause is removed the code runs successfully.
> This may be similar to BEAM-5384 since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
> .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405
> 
> rel#103:BeamCalcRel.BEAM_LOGICAL(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={150.0 
> rows, 801.0 cpu, 0.0 io}
> rel#106:Subset#1.ENUMERABLE, best=rel#105, importance=0.405
> 
> rel#105:BeamEnumerableConverter.ENUMERABLE(input=rel#104:Subset#1.BEAM_LOGICAL),
>  rowcount=50.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#2, type: RecordType(INTEGER id, INTEGER EXPR$1)
> rel#99:Subset#2.NONE, best=null, importance=0.9
> 
> rel#98:LogicalAggregate.NONE(input=rel#97:Subset#1.NONE,group={0},EXPR$1=SUM($1)),
>  rowcount=5.0, cumulative cost={inf}
> rel#100:Subset#2.BEAM_LOGICAL, best=null, importance=1.0
> 
> rel#101:AbstractConverter.BEAM_LOGICAL(input=rel#99:Subset#2.NONE,convention=BEAM_LOGICAL),
>  rowcount=5.0, cumulative cost={inf}
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:437)
> at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:296)
> at 
> 

[jira] [Work logged] (BEAM-8352) Reading records in background may lead to OOM errors

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8352?focusedWorklogId=325159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325159
 ]

ASF GitHub Bot logged work on BEAM-8352:


Author: ASF GitHub Bot
Created on: 08/Oct/19 16:13
Start Date: 08/Oct/19 16:13
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #9745: 
[BEAM-8352] Add "withMaxCapacityPerShard()" to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9745
 
 
   Added a new configuration option `withMaxCapacityPerShard()` to 
`KinesisIO.Read` to make it possible to control the size of 
`ShardReadersPool.recordsQueue` in cases where the default value (10K records 
per shard) can cause OOM errors for streams with a large number of shards or 
large messages that can't fit into memory.
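   The bounded-queue idea behind such an option can be sketched in plain Java 
(a hypothetical `BoundedShardQueue` class for illustration only, not Beam's 
actual `ShardReadersPool` implementation): once the queue holds the configured 
number of records, the background reader can no longer enqueue, so buffered 
memory stays proportional to the capacity instead of growing without bound.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a capped per-shard read-ahead buffer. A bounded
// queue turns memory pressure into back-pressure on the background reader.
public class BoundedShardQueue {
    private final BlockingQueue<String> recordsQueue;

    public BoundedShardQueue(int maxCapacityPerShard) {
        // Capacity limits how many records can be buffered at once.
        this.recordsQueue = new ArrayBlockingQueue<>(maxCapacityPerShard);
    }

    // Background reader path: returns false when the queue is full,
    // signalling back-pressure instead of allocating more memory.
    public boolean offerRecord(String record) {
        return recordsQueue.offer(record);
    }

    // Consumer path: removing a record frees a slot for the reader.
    public String pollRecord() {
        return recordsQueue.poll();
    }
}
```

   With a small capacity, an attempt to enqueue beyond the limit fails until a 
consumer drains a record, which is the behavior the option is meant to expose.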
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-8344) Add infer schema support in ParquetIO and refactor ParquetTableProvider

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8344?focusedWorklogId=325132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325132
 ]

ASF GitHub Bot logged work on BEAM-8344:


Author: ASF GitHub Bot
Created on: 08/Oct/19 15:40
Start Date: 08/Oct/19 15:40
Worklog Time Spent: 10m 
  Work Description: bmv126 commented on pull request #9721: [BEAM-8344] Add 
inferSchema support in ParquetIO and refactor ParquetTableProvider
URL: https://github.com/apache/beam/pull/9721#discussion_r332587458
 
 

 ##
 File path: 
sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java
 ##
 @@ -122,15 +122,34 @@
* pattern).
*/
   public static Read read(Schema schema) {
-return new AutoValue_ParquetIO_Read.Builder().setSchema(schema).build();
+return new AutoValue_ParquetIO_Read.Builder()
+.setSchema(schema)
+.setInferBeamSchema(false)
+.build();
   }
 
   /**
* Like {@link #read(Schema)}, but reads each file in a {@link PCollection} 
of {@link
* org.apache.beam.sdk.io.FileIO.ReadableFile}, which allows more flexible 
usage.
*/
   public static ReadFiles readFiles(Schema schema) {
-return new 
AutoValue_ParquetIO_ReadFiles.Builder().setSchema(schema).build();
+return new AutoValue_ParquetIO_ReadFiles.Builder()
+.setSchema(schema)
+.setInferBeamSchema(false)
+.build();
+  }
+
+  private static <T> PCollection<T> setBeamSchema(
+  PCollection<T> pc, Class<T> clazz, @Nullable Schema schema) {
+org.apache.beam.sdk.schemas.Schema beamSchema =
+org.apache.beam.sdk.schemas.utils.AvroUtils.getSchema(clazz, schema);
+if (beamSchema != null) {
 
 Review comment:
   Thanks for reviewing the code.
   The idea here was to align it with AvroIO 
(https://github.com/apache/beam/blob/ad5d3836a47fe2cbd552fe3908e15ffc7f777f11/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L339).
 
   Currently, AvroUtils.getSchema() returns null for anything other than the 
GenericRecord type.
   Since ParquetIO currently handles only GenericRecord, this method will 
always return a schema.
   Following your suggestion to make schema inference non-optional, and since 
we handle only GenericRecord in ParquetIO, I will modify the code to always set 
the beamSchema without the boolean flag, as AvroIO does.
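   The null-guard pattern being discussed can be sketched in plain Java (a 
hypothetical `SchemaGuard` helper with `String` standing in for Beam's schema 
types; this is not the AvroUtils API): apply the inferred schema only when 
inference succeeded, otherwise keep the existing one.

```java
import java.util.Optional;

// Hypothetical illustration of the "if (beamSchema != null) set it" guard.
// String stands in for a schema type; inference may legitimately return null.
public class SchemaGuard {
    // Returns the inferred schema when available, else the existing schema.
    public static String chooseSchema(String inferred, String existing) {
        return Optional.ofNullable(inferred).orElse(existing);
    }
}
```

   Making inference non-optional amounts to guaranteeing the first argument is 
never null, at which point the guard (and the boolean flag) becomes redundant.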
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 325132)
Time Spent: 0.5h  (was: 20m)

> Add infer schema support in ParquetIO and refactor ParquetTableProvider
> ---
>
> Key: BEAM-8344
> URL: https://issues.apache.org/jira/browse/BEAM-8344
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, io-java-parquet
>Reporter: Vishwas
>Assignee: Vishwas
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add support for inferring Beam Schema in ParquetIO.
> Refactor ParquetTable code to use Convert.rows().
> Remove unnecessary java class GenericRecordReadConverter.





[jira] [Work logged] (BEAM-8306) improve estimation of data byte size reading from source in ElasticsearchIO

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8306?focusedWorklogId=325119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325119
 ]

ASF GitHub Bot logged work on BEAM-8306:


Author: ASF GitHub Bot
Created on: 08/Oct/19 14:47
Start Date: 08/Oct/19 14:47
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #9660: [BEAM-8306] improve 
estimation datasize elasticsearch io
URL: https://github.com/apache/beam/pull/9660#issuecomment-539549366
 
 
   Sorry, I don't have time to review this. @wscheep, maybe?
 



Issue Time Tracking
---

Worklog Id: (was: 325119)
Time Spent: 40m  (was: 0.5h)

> improve estimation of data byte size reading from source in ElasticsearchIO
> ---
>
> Key: BEAM-8306
> URL: https://issues.apache.org/jira/browse/BEAM-8306
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Derek He
>Assignee: Derek He
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ElasticsearchIO splits the BoundedSource based on the Elasticsearch index 
> size. We expect it would be more accurate to split based on the query result 
> size.
> Currently, we have a big Elasticsearch index, but the query result only 
> contains a few documents from the index. ElasticsearchIO splits it into up 
> to 1024 BoundedSources in Google Dataflow, and it takes a long time to finish 
> processing the small number of Elasticsearch documents.
>  
>  





[jira] [Resolved] (BEAM-7635) Migrate SnsIO to AWS SDK for Java 2

2019-10-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-7635.

Resolution: Fixed

> Migrate SnsIO to AWS SDK for Java 2
> ---
>
> Key: BEAM-7635
> URL: https://issues.apache.org/jira/browse/BEAM-7635
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7635) Migrate SnsIO to AWS SDK for Java 2

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7635?focusedWorklogId=325117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325117
 ]

ASF GitHub Bot logged work on BEAM-7635:


Author: ASF GitHub Bot
Created on: 08/Oct/19 14:44
Start Date: 08/Oct/19 14:44
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9393: [BEAM-7635] 
Migrate SnsIO to AWS SDK for Java 2
URL: https://github.com/apache/beam/pull/9393
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 325117)
Time Spent: 40m  (was: 0.5h)

> Migrate SnsIO to AWS SDK for Java 2
> ---
>
> Key: BEAM-7635
> URL: https://issues.apache.org/jira/browse/BEAM-7635
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7635) Migrate SnsIO to AWS SDK for Java 2

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7635?focusedWorklogId=325116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325116
 ]

ASF GitHub Bot logged work on BEAM-7635:


Author: ASF GitHub Bot
Created on: 08/Oct/19 14:44
Start Date: 08/Oct/19 14:44
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9393: [BEAM-7635] Migrate 
SnsIO to AWS SDK for Java 2
URL: https://github.com/apache/beam/pull/9393#issuecomment-539547898
 
 
   Closing manually since this was already merged. One extra comment: in the 
end I did not change the `PublishResponse` return type, in case users in a 
downstream step of the pipeline want to use the ids to request the status of 
the response.
 



Issue Time Tracking
---

Worklog Id: (was: 325116)
Time Spent: 0.5h  (was: 20m)

> Migrate SnsIO to AWS SDK for Java 2
> ---
>
> Key: BEAM-7635
> URL: https://issues.apache.org/jira/browse/BEAM-7635
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=325091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325091
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:53
Start Date: 08/Oct/19 13:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #9567: [BEAM-5690] 
Fix Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r332510266
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java
 ##
 @@ -451,6 +457,47 @@ public void 
testAdvanceWatermarkEqualToPositiveInfinityThrows() {
 source.advanceWatermarkForNextBatch(BoundedWindow.TIMESTAMP_MAX_VALUE);
   }
 
+  @Test
+  public void testInStreamingModeCountByKey() throws Exception {
+Instant instant = new Instant(0);
+
+CreateStream<KV<Integer, Long>> kvSource =
+CreateStream.of(KvCoder.of(VarIntCoder.of(), VarLongCoder.of()), 
batchDuration())
+.emptyBatch()
+.advanceWatermarkForNextBatch(instant)
+.nextBatch(
+TimestampedValue.of(KV.of(1, 100L), 
instant.plus(Duration.standardSeconds(3L))),
+TimestampedValue.of(KV.of(1, 300L), 
instant.plus(Duration.standardSeconds(4L))))
+
.advanceWatermarkForNextBatch(instant.plus(Duration.standardSeconds(7L)))
+.nextBatch(
+TimestampedValue.of(KV.of(1, 400L), 
instant.plus(Duration.standardSeconds(8L))))
+.advanceNextBatchWatermarkToInfinity();
+
+PCollection<KV<Integer, Long>> output =
+p.apply("create kv Source", kvSource)
+.apply(
+"window input",
+Window.<KV<Integer, Long>>into(FixedWindows.of(Duration.standardSeconds(3L)))
+.withAllowedLateness(Duration.ZERO))
+.apply(Count.perKey());
+
+PAssert.that("Wrong count value ", output)
+.satisfies(
+(SerializableFunction<Iterable<KV<Integer, Long>>, Void>)
+input -> {
+  for (KV<Integer, Long> element : input) {
+if (element.getKey() == 1) {
+  Long countValue = element.getValue();
+  assertNotEquals("Count Value is 0 !!!", 0L, 
countValue.longValue());
 
 Review comment:
   As I understand it, expired timers are not evicted, and the fact that they 
are triggered entails an empty collection as output. But that is not the case 
100% of the time, right, only in some corner cases? I see no corner case in 
this test case; there should be 3 output values (one per 3s window, with 
timestamps 3, 4 and 8). I don't understand how this test ensures that the fix 
works. Is this test really failing without the fix in 
`SparkGroupAlsoByWindowViaWindowSet`?
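   The window placement the comment reasons about can be checked with simple 
arithmetic (a plain-Java sketch, not Beam code; it assumes epoch-aligned fixed 
windows and a hypothetical `FixedWindowSketch` class): a timestamp t falls into 
the fixed window starting at t - (t mod size), so timestamps 3s and 4s share 
the window [3s, 6s) while 8s lands in [6s, 9s).

```java
// Sketch of epoch-aligned fixed-window assignment (not Beam code):
// a timestamp t (in seconds) belongs to the window starting at
// t - (t % size), covering [start, start + size).
public class FixedWindowSketch {
    public static long windowStart(long timestampSec, long sizeSec) {
        return timestampSec - (timestampSec % sizeSec);
    }
}
```

   Under this arithmetic, the elements at 3s and 4s map to the same 3-second 
window and the element at 8s to the next occupied one.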
 



Issue Time Tracking
---

Worklog Id: (was: 325091)
Time Spent: 1h 40m  (was: 1.5h)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  

[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=325092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325092
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:53
Start Date: 08/Oct/19 13:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #9567: [BEAM-5690] 
Fix Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r332510266
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java
 ##
 @@ -451,6 +457,47 @@ public void 
testAdvanceWatermarkEqualToPositiveInfinityThrows() {
 source.advanceWatermarkForNextBatch(BoundedWindow.TIMESTAMP_MAX_VALUE);
   }
 
+  @Test
+  public void testInStreamingModeCountByKey() throws Exception {
+Instant instant = new Instant(0);
+
+CreateStream<KV<Integer, Long>> kvSource =
+CreateStream.of(KvCoder.of(VarIntCoder.of(), VarLongCoder.of()), 
batchDuration())
+.emptyBatch()
+.advanceWatermarkForNextBatch(instant)
+.nextBatch(
+TimestampedValue.of(KV.of(1, 100L), 
instant.plus(Duration.standardSeconds(3L))),
+TimestampedValue.of(KV.of(1, 300L), 
instant.plus(Duration.standardSeconds(4L))))
+
.advanceWatermarkForNextBatch(instant.plus(Duration.standardSeconds(7L)))
+.nextBatch(
+TimestampedValue.of(KV.of(1, 400L), 
instant.plus(Duration.standardSeconds(8L))))
+.advanceNextBatchWatermarkToInfinity();
+
+PCollection<KV<Integer, Long>> output =
+p.apply("create kv Source", kvSource)
+.apply(
+"window input",
+Window.<KV<Integer, Long>>into(FixedWindows.of(Duration.standardSeconds(3L)))
+.withAllowedLateness(Duration.ZERO))
+.apply(Count.perKey());
+
+PAssert.that("Wrong count value ", output)
+.satisfies(
+(SerializableFunction<Iterable<KV<Integer, Long>>, Void>)
+input -> {
+  for (KV<Integer, Long> element : input) {
+if (element.getKey() == 1) {
+  Long countValue = element.getValue();
+  assertNotEquals("Count Value is 0 !!!", 0L, 
countValue.longValue());
 
 Review comment:
   As I understand it, expired timers are not evicted, and the fact that they 
are triggered entails an empty collection as output. But that is not the case 
100% of the time, right, only in some corner cases? I see no corner case in 
this test case; there should be 3 output values (one per 3s window, with 
timestamps 3, 4 and 8). I don't understand how this test ensures that the fix 
works. Is this test really failing without the fix in 
`SparkGroupAlsoByWindowViaWindowSet`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325092)
Time Spent: 1h 50m  (was: 1h 40m)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09 

[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=325078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325078
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:51
Start Date: 08/Oct/19 13:51
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #9567: [BEAM-5690] 
Fix Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r332478533
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java
 ##
 @@ -451,6 +457,47 @@ public void 
testAdvanceWatermarkEqualToPositiveInfinityThrows() {
 source.advanceWatermarkForNextBatch(BoundedWindow.TIMESTAMP_MAX_VALUE);
   }
 
+  @Test
+  public void testInStreamingModeCountByKey() throws Exception {
 
 Review comment:
  I first thought this could be a validatesRunner test, but then I 
remembered that Spark has its own validates-runner tests for streaming. So 
this place is not perfect, but it will be fine.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325078)
Time Spent: 1.5h  (was: 1h 20m)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=325075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325075
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:51
Start Date: 08/Oct/19 13:51
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #9567: [BEAM-5690] 
Fix Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r332510266
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java
 ##
 @@ -451,6 +457,47 @@ public void 
testAdvanceWatermarkEqualToPositiveInfinityThrows() {
 source.advanceWatermarkForNextBatch(BoundedWindow.TIMESTAMP_MAX_VALUE);
   }
 
+  @Test
+  public void testInStreamingModeCountByKey() throws Exception {
+    Instant instant = new Instant(0);
+
+    CreateStream<KV<Integer, Long>> kvSource =
+        CreateStream.of(KvCoder.of(VarIntCoder.of(), VarLongCoder.of()), batchDuration())
+            .emptyBatch()
+            .advanceWatermarkForNextBatch(instant)
+            .nextBatch(
+                TimestampedValue.of(KV.of(1, 100L), instant.plus(Duration.standardSeconds(3L))),
+                TimestampedValue.of(KV.of(1, 300L), instant.plus(Duration.standardSeconds(4L))))
+            .advanceWatermarkForNextBatch(instant.plus(Duration.standardSeconds(7L)))
+            .nextBatch(
+                TimestampedValue.of(KV.of(1, 400L), instant.plus(Duration.standardSeconds(8L))))
+            .advanceNextBatchWatermarkToInfinity();
+
+    PCollection<KV<Integer, Long>> output =
+        p.apply("create kv Source", kvSource)
+            .apply(
+                "window input",
+                Window.<KV<Integer, Long>>into(FixedWindows.of(Duration.standardSeconds(3L)))
+                    .withAllowedLateness(Duration.ZERO))
+            .apply(Count.perKey());
+
+    PAssert.that("Wrong count value ", output)
+        .satisfies(
+            (SerializableFunction<Iterable<KV<Integer, Long>>, Void>)
+                input -> {
+                  for (KV<Integer, Long> element : input) {
+                    if (element.getKey() == 1) {
+                      Long countValue = element.getValue();
+                      assertNotEquals("Count Value is 0 !!!", 0L, countValue.longValue());
 
 Review comment:
  As I understand it, expired timers are not evicted, and the fact that they are 
triggered results in an empty collection as output. But that is not the case 100% 
of the time, right, only in some corner cases? I see no corner case in this test: 
there should be 3 output values (one per 3s window, with timestamps 3, 4 and 8). 
I don't understand how this test ensures that the fix works. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325075)
Time Spent: 1h 10m  (was: 1h)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> 

[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=325077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325077
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:51
Start Date: 08/Oct/19 13:51
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #9567: [BEAM-5690] 
Fix Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r332522630
 
 

 ##
 File path: 
runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/CreateStreamTest.java
 ##
 @@ -451,6 +457,47 @@ public void 
testAdvanceWatermarkEqualToPositiveInfinityThrows() {
 source.advanceWatermarkForNextBatch(BoundedWindow.TIMESTAMP_MAX_VALUE);
   }
 
+  @Test
+  public void testInStreamingModeCountByKey() throws Exception {
+    Instant instant = new Instant(0);
+
+    CreateStream<KV<Integer, Long>> kvSource =
+        CreateStream.of(KvCoder.of(VarIntCoder.of(), VarLongCoder.of()), batchDuration())
+            .emptyBatch()
+            .advanceWatermarkForNextBatch(instant)
+            .nextBatch(
+                TimestampedValue.of(KV.of(1, 100L), instant.plus(Duration.standardSeconds(3L))),
+                TimestampedValue.of(KV.of(1, 300L), instant.plus(Duration.standardSeconds(4L))))
+            .advanceWatermarkForNextBatch(instant.plus(Duration.standardSeconds(7L)))
+            .nextBatch(
+                TimestampedValue.of(KV.of(1, 400L), instant.plus(Duration.standardSeconds(8L))))
+            .advanceNextBatchWatermarkToInfinity();
+
+    PCollection<KV<Integer, Long>> output =
+        p.apply("create kv Source", kvSource)
+            .apply(
+                "window input",
+                Window.<KV<Integer, Long>>into(FixedWindows.of(Duration.standardSeconds(3L)))
+                    .withAllowedLateness(Duration.ZERO))
+            .apply(Count.perKey());
 
 Review comment:
  You're lucky :) , Count is implemented in the SDK as Combine.perKey, which is 
translated in the current Spark runner as GBK + ParDo, and GBK's translation 
calls `SparkGroupAlsoByWindowViaWindowSet`, which contains your fix. So your fix 
is exercised even though you apply Count and not GBK. If the runner translated 
Combine.perKey directly (as the new Spark runner does), you would not be testing 
your fix.
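The translation chain described above (Count.perKey as Combine.perKey, which in turn lowers to GroupByKey + ParDo) can be illustrated with a plain-JDK sketch; this is only an analogy of the grouping-then-counting shape, it does not use the Beam API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CountViaGroupByKey {

  /** Count per key expressed as a "GroupByKey" step followed by a "ParDo" step. */
  static Map<Integer, Long> countPerKey(List<Map.Entry<Integer, Long>> input) {
    // "GroupByKey": collect all values for each key into one group.
    Map<Integer, List<Long>> grouped =
        input.stream()
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

    // "ParDo": map each group to its size, i.e. the count for that key.
    return grouped.entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey, e -> (long) e.getValue().size()));
  }

  public static void main(String[] args) {
    // Same key/value shape as the test above: three values for key 1.
    List<Map.Entry<Integer, Long>> input =
        Arrays.asList(Map.entry(1, 100L), Map.entry(1, 300L), Map.entry(1, 400L));
    System.out.println(countPerKey(input)); // prints {1=3}
  }
}
```

Because the Spark runner goes through the GroupByKey step for Count, any fix inside the GBK translation path is exercised by a Count-based test.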
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325077)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00 

[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=325076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325076
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:51
Start Date: 08/Oct/19 13:51
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #9567: [BEAM-5690] 
Fix Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r332476939
 
 

 ##
 File path: 
runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java
 ##
 @@ -338,6 +339,18 @@ public void outputWindowedValue(
   outputHolder.getWindowedValues();
 
   if (!outputs.isEmpty() || !stateInternals.getState().isEmpty()) {
+            Collection<TimerInternals.TimerData> filteredTimers =
+                timerInternals.getTimers().stream()
+                    .filter(
+                        timer ->
+                            timer
+                                .getTimestamp()
+                                .plus(windowingStrategy.getAllowedLateness())
+                                .isBefore(timerInternals.currentInputWatermarkTime()))
+                    .collect(Collectors.toList());
+
+            filteredTimers.forEach(timerInternals::deleteTimer);
+
+
 
 Review comment:
  The logic with watermarks is good. I first thought we should put that in the 
core utility method `LateDataUtils.dropExpiredWindows`, but this method drops 
elements based on their windows and the watermark. I now think that dropping the 
internal timers belongs more to this (timer-dedicated) part of the code, 
but you should wrap this code into a method such as dropExpiredTimers and maybe 
put it in a core utility such as `LateDataUtils`, because it is a common concern 
for all the runners.
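A minimal plain-JDK sketch of the suggested dropExpiredTimers utility, with TimerData as a simplified stand-in type (not Beam's real `TimerInternals.TimerData`), could look like this:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class DropExpiredTimersSketch {

  /** Simplified stand-in for a timer record; not the real Beam TimerData. */
  static final class TimerData {
    final Instant timestamp;

    TimerData(Instant timestamp) {
      this.timestamp = timestamp;
    }
  }

  /**
   * Keep only the timers that are still live: a timer is expired once its
   * timestamp plus the allowed lateness falls behind the input watermark.
   */
  static List<TimerData> dropExpiredTimers(
      List<TimerData> timers, Duration allowedLateness, Instant inputWatermark) {
    return timers.stream()
        .filter(t -> !t.timestamp.plus(allowedLateness).isBefore(inputWatermark))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Instant base = Instant.EPOCH;
    List<TimerData> timers =
        List.of(new TimerData(base.plusSeconds(3)), new TimerData(base.plusSeconds(8)));
    // With the watermark at t=7s and zero allowed lateness, the t=3s timer is
    // expired and dropped; only the t=8s timer survives.
    System.out.println(dropExpiredTimers(timers, Duration.ZERO, base.plusSeconds(7)).size()); // prints 1
  }
}
```

The filter condition mirrors the one in the patch above, only inverted: the patch collects the expired timers in order to delete them, while this helper returns the survivors.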
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325076)
Time Spent: 1h 20m  (was: 1h 10m)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql, and the trigger 
> used is the default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}

[jira] [Work logged] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7730?focusedWorklogId=325068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325068
 ]

ASF GitHub Bot logged work on BEAM-7730:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:40
Start Date: 08/Oct/19 13:40
Worklog Time Spent: 10m 
  Work Description: dmvk commented on pull request #9296: [BEAM-7730] 
Introduce Flink 1.9 Runner
URL: https://github.com/apache/beam/pull/9296#discussion_r332516978
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
 ##
 @@ -203,6 +205,7 @@ public void processTimer(
 currentTimer = timer;
 currentTimeDomain = timeDomain;
 try {
+  @SuppressWarnings("unchecked")
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325068)
Time Spent: 5h  (was: 4h 50m)

> Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
> -
>
> Key: BEAM-7730
> URL: https://issues.apache.org/jira/browse/BEAM-7730
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: David Moravek
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Apache Flink 1.9 is coming, and it would be good to add a Flink 1.9 build 
> target and make the Flink Runner compatible with Flink 1.9.
> I will add a brief summary of the changes after Flink 1.9.0 is released. 
> I would appreciate it if you could leave your suggestions or comments!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7730?focusedWorklogId=325066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325066
 ]

ASF GitHub Bot logged work on BEAM-7730:


Author: ASF GitHub Bot
Created on: 08/Oct/19 13:38
Start Date: 08/Oct/19 13:38
Worklog Time Spent: 10m 
  Work Description: dmvk commented on pull request #9296: [BEAM-7730] 
Introduce Flink 1.9 Runner
URL: https://github.com/apache/beam/pull/9296#discussion_r332515923
 
 

 ##
 File path: sdks/java/build-tools/src/main/resources/beam/suppressions.xml
 ##
 @@ -92,5 +92,6 @@
   
   
   
+  
 
 Review comment:
   BeamStoppableFunction is needed, explanation above.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 325066)
Time Spent: 4h 50m  (was: 4h 40m)

> Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
> -
>
> Key: BEAM-7730
> URL: https://issues.apache.org/jira/browse/BEAM-7730
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: David Moravek
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Apache Flink 1.9 is coming, and it would be good to add a Flink 1.9 build 
> target and make the Flink Runner compatible with Flink 1.9.
> I will add a brief summary of the changes after Flink 1.9.0 is released. 
> I would appreciate it if you could leave your suggestions or comments!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7184) Performance regression on spark-runner

2019-10-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946827#comment-16946827
 ] 

Ismaël Mejía commented on BEAM-7184:


No.

> Performance regression on spark-runner
> --
>
> Key: BEAM-7184
> URL: https://issues.apache.org/jira/browse/BEAM-7184
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> There is a performance degradation of +200% in the Spark runner, starting on 
> 04/10, for all the Nexmark queries. See 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-3185) Build blocks on parsing long as int from github status json

2019-10-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3185:
---
Priority: Major  (was: Blocker)

> Build blocks on parsing long as int from github status json
> ---
>
> Key: BEAM-3185
> URL: https://issues.apache.org/jira/browse/BEAM-3185
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Holden Karau
>Priority: Major
>
> (e.g. see 
> https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/818/console )
> `Caused by: com.fasterxml.jackson.databind.JsonMappingException: Numeric 
> value (4313677368) out of range of int`
> Assuming IDs are monotonically increasing, this might impact all new PRs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-6396) PipelineTest:test_memory_usage fails with new urllib3 on some platforms

2019-10-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-6396:
--

Assignee: kadir cetinkaya

> PipelineTest:test_memory_usage fails with new urllib3 on some platforms
> ---
>
> Key: BEAM-6396
> URL: https://issues.apache.org/jira/browse/BEAM-6396
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: kadir cetinkaya
>Assignee: kadir cetinkaya
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Disable memory usage test since it fails on some platforms with urllib3 
> v1.24.1
> After an update of urllib3 from 1.15.1 to 1.24.1, this test started to fail 
> with about a 50% increase in memory. After an initial analysis with 
> `objgraph`, I saw an increase in the number of tuples used by the test, 
> proportional to `num_maps * num_elements`.
> I've also opened the issue urllib3/urllib3#1514



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6396) PipelineTest:test_memory_usage fails with new urllib3 on some platforms

2019-10-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-6396.

Fix Version/s: 2.10.0
   Resolution: Fixed

> PipelineTest:test_memory_usage fails with new urllib3 on some platforms
> ---
>
> Key: BEAM-6396
> URL: https://issues.apache.org/jira/browse/BEAM-6396
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: kadir cetinkaya
>Assignee: kadir cetinkaya
>Priority: Blocker
> Fix For: 2.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Disable memory usage test since it fails on some platforms with urllib3 
> v1.24.1
> After an update of urllib3 from 1.15.1 to 1.24.1, this test started to fail 
> with about a 50% increase in memory. After an initial analysis with 
> `objgraph`, I saw an increase in the number of tuples used by the test, 
> proportional to `num_maps * num_elements`.
> I've also opened the issue urllib3/urllib3#1514



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7184) Performance regression on spark-runner

2019-10-08 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946820#comment-16946820
 ] 

Etienne Chauchot commented on BEAM-7184:


Do we know what was causing the perf drop ?

> Performance regression on spark-runner
> --
>
> Key: BEAM-7184
> URL: https://issues.apache.org/jira/browse/BEAM-7184
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> There is a performance degradation of +200% in the Spark runner, starting on 
> 04/10, for all the Nexmark queries. See 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6544) Python wheel release process must not require inputting credentials to third party sites

2019-10-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946817#comment-16946817
 ] 

Ismaël Mejía commented on BEAM-6544:


Is this still a thing? Pinging [~markflyhigh], who made the last release, to 
confirm whether this can be closed.

> Python wheel release process must not require inputting credentials to third 
> party sites
> 
>
> Key: BEAM-6544
> URL: https://issues.apache.org/jira/browse/BEAM-6544
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Blocker
>
> According to the instructions at 
> https://beam.apache.org/contribute/release-guide/#build-and-stage-python-wheels
>  and https://github.com/apache/beam-wheels, the release manager must input 
> their ASF credentials to Travis-CI. This should not be required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7184) Performance regression on spark-runner

2019-10-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946814#comment-16946814
 ] 

Ismaël Mejía commented on BEAM-7184:


It seems this could not be reproduced. Can this issue be closed now?

> Performance regression on spark-runner
> --
>
> Key: BEAM-7184
> URL: https://issues.apache.org/jira/browse/BEAM-7184
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> There is a performance degradation of +200% in the Spark runner, starting on 
> 04/10, for all the Nexmark queries. See 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

