[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=405234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405234
 ]

ASF GitHub Bot logged work on BEAM-7516:


Author: ASF GitHub Bot
Created on: 18/Mar/20 05:42
Start Date: 18/Mar/20 05:42
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10291: 
[BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive 
watermark manager
URL: https://github.com/apache/beam/pull/10291#issuecomment-600435754
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405234)
Time Spent: 7h 40m  (was: 7.5h)

> Add a watermark manager for the fn_api_runner
> -
>
> Key: BEAM-7516
> URL: https://issues.apache.org/jira/browse/BEAM-7516
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> To track watermarks for each stage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9537) Refactor FnApiRunner into its own package

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9537?focusedWorklogId=405233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405233
 ]

ASF GitHub Bot logged work on BEAM-9537:


Author: ASF GitHub Bot
Created on: 18/Mar/20 05:40
Start Date: 18/Mar/20 05:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11153: [BEAM-9537] Adding a 
new module for FnApiRunner
URL: https://github.com/apache/beam/pull/11153#issuecomment-600435252
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405233)
Remaining Estimate: 0h
Time Spent: 10m

> Refactor FnApiRunner into its own package
> -
>
> Key: BEAM-9537
> URL: https://issues.apache.org/jira/browse/BEAM-9537
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9503) SyntaxError in process worker startup

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9503?focusedWorklogId=405231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405231
 ]

ASF GitHub Bot logged work on BEAM-9503:


Author: ASF GitHub Bot
Created on: 18/Mar/20 05:39
Start Date: 18/Mar/20 05:39
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11124: 
[cherry-pick][release-2.20.0][BEAM-9503] Insert missing comma in process worker 
script.
URL: https://github.com/apache/beam/pull/11124#issuecomment-600434903
 
 
   Thank you! I have cherry-picked your commit directly. Closing this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405231)
Time Spent: 5h  (was: 4h 50m)

> SyntaxError in process worker startup
> -
>
> Key: BEAM-9503
> URL: https://issues.apache.org/jira/browse/BEAM-9503
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> WARNING:apache_beam.runners.worker.worker_pool_main:Starting worker with 
> command ['python', '-c', 'from apache_beam.runners.worker.sdk_worker import 
> SdkHarness; 
> SdkHarness("localhost:57103",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=0).run()']
> Note that 'state_cache_size=0data_buffer_time_limit_ms=0' is all mashed 
> together. Looks like we're missing a comma: 
> https://github.com/apache/beam/blob/feefaca793d8358d5386d0725863c03e4e37b5b1/sdks/python/apache_beam/runners/worker/worker_pool_main.py#L116



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9503) SyntaxError in process worker startup

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9503?focusedWorklogId=405232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405232
 ]

ASF GitHub Bot logged work on BEAM-9503:


Author: ASF GitHub Bot
Created on: 18/Mar/20 05:39
Start Date: 18/Mar/20 05:39
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11124: 
[cherry-pick][release-2.20.0][BEAM-9503] Insert missing comma in process worker 
script.
URL: https://github.com/apache/beam/pull/11124
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405232)
Time Spent: 5h 10m  (was: 5h)

> SyntaxError in process worker startup
> -
>
> Key: BEAM-9503
> URL: https://issues.apache.org/jira/browse/BEAM-9503
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> WARNING:apache_beam.runners.worker.worker_pool_main:Starting worker with 
> command ['python', '-c', 'from apache_beam.runners.worker.sdk_worker import 
> SdkHarness; 
> SdkHarness("localhost:57103",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=0).run()']
> Note that 'state_cache_size=0data_buffer_time_limit_ms=0' is all mashed 
> together. Looks like we're missing a comma: 
> https://github.com/apache/beam/blob/feefaca793d8358d5386d0725863c03e4e37b5b1/sdks/python/apache_beam/runners/worker/worker_pool_main.py#L116



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9536) Return type of window start/end functions is incorrectly inferred to be INT64

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9536?focusedWorklogId=405223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405223
 ]

ASF GitHub Bot logged work on BEAM-9536:


Author: ASF GitHub Bot
Created on: 18/Mar/20 05:20
Start Date: 18/Mar/20 05:20
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11152: [BEAM-9536] 
Specify return types of window start/end functions explicitly
URL: https://github.com/apache/beam/pull/11152#issuecomment-600429939
 
 
   LGTM
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405223)
Time Spent: 0.5h  (was: 20m)

> Return type of window start/end functions is incorrectly inferred to be INT64
> -
>
> Key: BEAM-9536
> URL: https://issues.apache.org/jira/browse/BEAM-9536
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Our current implementation does not have the correct return type information 
> of the window start/end functions (e.g. TUMBLE_START).
> LOC that cause the problem: 
> [https://github.com/apache/beam/blob/646f596988be9d6a739090f48d2fed07c8dfc17c/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L62]
> We should specify the return type explicitly here in the CREATE statements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9536) Return type of window start/end functions is incorrectly inferred to be INT64

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9536?focusedWorklogId=405222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405222
 ]

ASF GitHub Bot logged work on BEAM-9536:


Author: ASF GitHub Bot
Created on: 18/Mar/20 05:19
Start Date: 18/Mar/20 05:19
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11152: [BEAM-9536] 
Specify return types of window start/end functions explicitly
URL: https://github.com/apache/beam/pull/11152#issuecomment-600429785
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405222)
Time Spent: 20m  (was: 10m)

> Return type of window start/end functions is incorrectly inferred to be INT64
> -
>
> Key: BEAM-9536
> URL: https://issues.apache.org/jira/browse/BEAM-9536
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Our current implementation does not have the correct return type information 
> of the window start/end functions (e.g. TUMBLE_START).
> LOC that cause the problem: 
> [https://github.com/apache/beam/blob/646f596988be9d6a739090f48d2fed07c8dfc17c/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L62]
> We should specify the return type explicitly here in the CREATE statements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9537) Refactor FnApiRunner into its own package

2020-03-17 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9537:
---

 Summary: Refactor FnApiRunner into its own package
 Key: BEAM-9537
 URL: https://issues.apache.org/jira/browse/BEAM-9537
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9533) Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9533?focusedWorklogId=405206=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405206
 ]

ASF GitHub Bot logged work on BEAM-9533:


Author: ASF GitHub Bot
Created on: 18/Mar/20 04:11
Start Date: 18/Mar/20 04:11
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11149: [BEAM-9533] 
Adding tox cloud tests
URL: https://github.com/apache/beam/pull/11149
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405206)
Time Spent: 0.5h  (was: 20m)

> Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both
> -
>
> Key: BEAM-9533
> URL: https://issues.apache.org/jira/browse/BEAM-9533
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently there are `py37-gcp`, py37-aws test suites. Let's consolidate all 
> of them into py37-cloud, along with other py35-gcp, py27-gcp, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=405195=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405195
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 18/Mar/20 03:45
Start Date: 18/Mar/20 03:45
Worklog Time Spent: 10m 
  Work Description: pulasthi commented on issue #10888: [BEAM-7304] 
Twister2 Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-600409833
 
 
   @iemejia just wanted to check if you were able to review the code. And 
please let me know if you need anything from my end
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405195)
Time Spent: 13h 50m  (was: 13h 40m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9533) Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9533?focusedWorklogId=405185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405185
 ]

ASF GitHub Bot logged work on BEAM-9533:


Author: ASF GitHub Bot
Created on: 18/Mar/20 03:31
Start Date: 18/Mar/20 03:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11149: [BEAM-9533] Adding 
tox cloud tests
URL: https://github.com/apache/beam/pull/11149#issuecomment-600406930
 
 
   Run CommunityMetrics PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405185)
Time Spent: 20m  (was: 10m)

> Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both
> -
>
> Key: BEAM-9533
> URL: https://issues.apache.org/jira/browse/BEAM-9533
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently there are `py37-gcp`, py37-aws test suites. Let's consolidate all 
> of them into py37-cloud, along with other py35-gcp, py27-gcp, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-6874) HCatalogTableProvider always read all rows

2020-03-17 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-6874:
-

Assignee: (was: Ahmet Altay)

> HCatalogTableProvider always read all rows
> --
>
> Key: BEAM-6874
> URL: https://issues.apache.org/jira/browse/BEAM-6874
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, io-java-hcatalog
>Affects Versions: 2.11.0
>Reporter: Near
>Priority: Major
> Attachments: limit.png
>
>
> Hi,
> I'm using HCatalogTableProvider while doing SqlTransform.query. The query is 
> something like "select * from `hive`.`table_name` limit 10". Despite of the 
> limit clause, the data source still reads much more rows (the data of Hive 
> table are files on S3), even more than the number of rows in one file (or 
> partition).
>  
> Some more details:
>  # It is running on Flink.
>  # I actually implemented my own HiveTableProvider because HCatalogBeamSchema 
> only supports primitive types. However, the table provider works when I query 
> a small table with ~1k rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405172
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 03:03
Start Date: 18/Mar/20 03:03
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#issuecomment-600400560
 
 
   Great comments and questions. I’m in the middle of rolling out our COVID
   plan at work so it may take me a bit to get you proper answers but I’ll
   start chipping away at it as soon as I can.
   
   
   
   On Tue, Mar 17, 2020 at 7:30 PM Udi Meiri  wrote:
   
   > *@udim* commented on this pull request.
   >
   > This review is taking me forever. I'm at 18 out of 52 files reviewed, but
   > releasing 16 comments I've written so far.
   > --
   >
   > In sdks/python/apache_beam/runners/worker/operations.py
   > :
   >
   > > @@ -569,7 +580,7 @@ def __init__(self,
   >  self.tagged_receivers = None  # type: Optional[_TaggedReceivers]
   >  # A mapping of timer tags to the input "PCollections" they come in on.
   >  self.timer_inputs = timer_inputs or {}
   > -self.input_info = None  # type: Optional[Tuple[str, str, 
coders.WindowedValueCoder, MutableMapping[str, str]]]
   >
   > So the MutableMapping hint here was a mistake?
   > --
   >
   > In sdks/python/apache_beam/metrics/metric.py
   > :
   >
   > > @@ -56,7 +55,7 @@ class Metrics(object):
   >@staticmethod
   >def get_namespace(namespace):
   >  # type: (Union[Type, str]) -> str
   > -if inspect.isclass(namespace):
   > +if isinstance(namespace, type):
   >
   > Hopefully no one uses old-style classes any more (types.ClassType).
   > --
   >
   > In sdks/python/mypy.ini
   > :
   >
   > >  color_output = true
   > -# uncomment this to see how close we are to being complete
   > +# required setting for dmypy:
   > +follow_imports = error
   >
   > Does this mean having to supply all imported modules on the mypy command
   > line?
   > --
   >
   > In sdks/python/apache_beam/io/iobase.py
   > :
   >
   > > +
   > +class Position(Protocol):
   > +  def __lt__(self, other):
   > +pass
   > +
   > +  def __le__(self, other):
   > +pass
   > +
   > +  def __gt__(self, other):
   > +pass
   > +
   > +  def __ge__(self, other):
   > +pass
   > +
   > +
   > +PositionT = TypeVar('PositionT', bound='Position')
   >
   > I don't understand the usage of PositionT. Isn't Position already a type?
   > It seems that you could replace all uses of PositionT with Position and
   > it'd work the same.
   > --
   >
   > In sdks/python/apache_beam/io/iobase.py
   > :
   >
   > > @@ -95,8 +115,11 @@
   >  #
   >  # Type for start and stop positions are specific to the bounded source 
and must
   >  # be consistent throughout.
   > -SourceBundle = namedtuple(
   > -'SourceBundle', 'weight source start_position stop_position')
   > +SourceBundle = NamedTuple(
   > +'SourceBundle',
   > +[('weight', Optional[float]), ('source', 'BoundedSource'),
   >
   > It seems that weight is non-Optional.
   > --
   >
   > In sdks/python/apache_beam/io/range_trackers.py
   > :
   >
   > > @@ -42,7 +48,7 @@
   >  _LOGGER = logging.getLogger(__name__)
   >
   >
   > -class OffsetRangeTracker(iobase.RangeTracker):
   > +class OffsetRangeTracker(iobase.RangeTracker[int]):
   >"""A 'RangeTracker' for non-negative positions of type 'long'."""
   >
   > s/long/int/
   > --
   >
   > In sdks/python/apache_beam/io/range_trackers.py
   > :
   >
   > >  self._start_position = start_position
   >  self._stop_position = stop_position
   >  self._lock = threading.Lock()
   > -self._last_claim = self.UNSTARTED
   > +# the return on investment for properly typing this is low. cast it.
   >
   > Did you mean that the ROI is high, not low?
   > --
   >
   > In sdks/python/apache_beam/io/range_trackers.py
   > :
   >
   > >def fraction_to_position(self, fraction, start, end):
   > +# type: (float, Optional[iobase.PositionT], 

[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=405155=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405155
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:38
Start Date: 18/Mar/20 02:38
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #10369: [BEAM-8933] 
BigQueryIO Arrow for read
URL: https://github.com/apache/beam/pull/10369#issuecomment-600394738
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405155)
Time Spent: 11h 20m  (was: 11h 10m)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or 
> Avro (with Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405142
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394021216
 
 

 ##
 File path: sdks/python/apache_beam/io/range_trackers.py
 ##
 @@ -42,7 +48,7 @@
 _LOGGER = logging.getLogger(__name__)
 
 
-class OffsetRangeTracker(iobase.RangeTracker):
+class OffsetRangeTracker(iobase.RangeTracker[int]):
   """A 'RangeTracker' for non-negative positions of type 'long'."""
 
 Review comment:
   s/long/int/
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405142)
Time Spent: 75h 40m  (was: 75.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405147
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394029843
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -244,7 +272,7 @@ def is_bounded(self):
 return True
 
 
-class RangeTracker(object):
+class RangeTracker(Generic[PositionT]):
 
 Review comment:
   I'm not sure. Positions might be opaque byte strings, where splitting 
happens externally. This might be important for cross-language transforms.
   
   @lukecwik, @robertwb, @chamikaramj , do you have opinion on RangeTracker 
position types? Should they all support the Position protocol (defined above)?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405147)
Time Spent: 76h  (was: 75h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405152
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394056361
 
 

 ##
 File path: sdks/python/apache_beam/utils/sentinel.py
 ##
 @@ -0,0 +1,30 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import enum
+
+
+class Sentinel(enum.Enum):
+  """
+  A type-safe sentinel class
+  """
+
+  sentinel = object()
 
 Review comment:
   SG. 
   Sentinel is the type.
   SPLIT_POINTS_UNKNOWN is the unique value.
   Inheriting from Enum (vs calling Enum()) simplifies pickling (not sure 
necessary, but doesn't hurt).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405152)
Time Spent: 76.5h  (was: 76h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405149
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394038483
 
 

 ##
 File path: sdks/python/apache_beam/coders/coders.py
 ##
 @@ -387,8 +387,11 @@ def register_structured_urn(urn, cls):
 """Register a coder that's completely defined by its urn and its
 component(s), if any, which are passed to construct the instance.
 """
-cls.to_runner_api_parameter = (
-lambda self, unused_context: (urn, None, self._get_component_coders()))
+setattr(
 
 Review comment:
   Could you explain (in a comment perhaps) why using setattr here  is 
necessary? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405149)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405144
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r393281202
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -569,7 +580,7 @@ def __init__(self,
 self.tagged_receivers = None  # type: Optional[_TaggedReceivers]
 # A mapping of timer tags to the input "PCollections" they come in on.
 self.timer_inputs = timer_inputs or {}
-self.input_info = None  # type: Optional[Tuple[str, str, 
coders.WindowedValueCoder, MutableMapping[str, str]]]
 
 Review comment:
   So the MutableMapping hint here was a mistake?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405144)
Time Spent: 75h 50m  (was: 75h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405140
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394017739
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -95,8 +115,11 @@
 #
 # Type for start and stop positions are specific to the bounded source and must
 # be consistent throughout.
-SourceBundle = namedtuple(
-'SourceBundle', 'weight source start_position stop_position')
+SourceBundle = NamedTuple(
+'SourceBundle',
+[('weight', Optional[float]), ('source', 'BoundedSource'),
 
 Review comment:
   It seems that `weight` is non-Optional.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405140)
Time Spent: 75.5h  (was: 75h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405150=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405150
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394034914
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/naturallanguageml.py
 ##
 @@ -74,7 +75,9 @@ def __init__(
   def to_dict(document):
 # type: (Document) -> Mapping[str, Optional[str]]
 if document.from_gcs:
-  dict_repr = {'gcs_content_uri': document.content}
+  dict_repr = {
+  'gcs_content_uri': document.content
+  }  # type: Dict[str, Optional[str]]
 
 Review comment:
   `document.content` is not Optional
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405150)
Time Spent: 76h 20m  (was: 76h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405139
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r393289270
 
 

 ##
 File path: sdks/python/apache_beam/metrics/metric.py
 ##
 @@ -56,7 +55,7 @@ class Metrics(object):
   @staticmethod
   def get_namespace(namespace):
 # type: (Union[Type, str]) -> str
-if inspect.isclass(namespace):
+if isinstance(namespace, type):
 
 Review comment:
   Hopefully no one uses old-style classes any more (types.ClassType).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405139)
Time Spent: 75h 20m  (was: 75h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405141
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r393304392
 
 

 ##
 File path: sdks/python/mypy.ini
 ##
 @@ -17,16 +17,21 @@
 
 [mypy]
 python_version = 3.6
+files = apache_beam
 ignore_missing_imports = true
-follow_imports = true
-warn_no_return = true
 no_implicit_optional = true
+# warnings:
+warn_no_return = true
 warn_redundant_casts = true
 warn_unused_ignores = true
+# formatting:
 show_error_codes = true
-files = apache_beam
 color_output = true
-# uncomment this to see how close we are to being complete
+# required setting for dmypy:
+follow_imports = error
 
 Review comment:
   Does this mean having to supply all imported modules on the mypy command 
line?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405141)
Time Spent: 75h 40m  (was: 75.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405153
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394034514
 
 

 ##
 File path: sdks/python/apache_beam/io/restriction_trackers.py
 ##
 @@ -52,6 +55,7 @@ def __hash__(self):
 return hash((type(self), self.start, self.stop))
 
   def split(self, desired_num_offsets_per_split, min_num_offsets_per_split=1):
+# type: (...) -> Iterator[OffsetRange]
 
 Review comment:
   Input looks like `(int, int)`. Any reason to leave it empty?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405153)
Time Spent: 76h 40m  (was: 76.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405143
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r393383746
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -79,6 +82,23 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+
+class Position(Protocol):
+  def __lt__(self, other):
+pass
+
+  def __le__(self, other):
+pass
+
+  def __gt__(self, other):
+pass
+
+  def __ge__(self, other):
+pass
+
+
+PositionT = TypeVar('PositionT', bound='Position')
 
 Review comment:
   I don't understand the usage of PositionT. Isn't Position already a type?
   It seems that you could replace all uses of PositionT with Position and it'd 
work the same.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405143)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405146
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394026348
 
 

 ##
 File path: sdks/python/apache_beam/io/range_trackers.py
 ##
 @@ -233,19 +254,25 @@ class OrderedPositionRangeTracker(iobase.RangeTracker):
   UNSTARTED = object()
 
   def __init__(self, start_position=None, stop_position=None):
+# type: (Optional[iobase.PositionT], Optional[iobase.PositionT]) -> None
 self._start_position = start_position
 self._stop_position = stop_position
 self._lock = threading.Lock()
-self._last_claim = self.UNSTARTED
+# the return on investment for properly typing this is low. cast it.
 
 Review comment:
   Did you mean that the ROI is high, not low?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405146)
Time Spent: 76h  (was: 75h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405148
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394060396
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -353,6 +359,8 @@ def release(self, instruction_id):
 self.cached_bundle_processors[descriptor_id].append(processor)
 
   def shutdown(self):
+# type: (...) -> None
 
 Review comment:
   This can be `() -> None` right?
   Same for the next 2 hints below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405148)
Time Spent: 76h 10m  (was: 76h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405154
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394057261
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -171,7 +172,7 @@ def get_responses():
 self._worker_thread_pool.shutdown()
 # get_responses may be blocked on responses.get(), but we need to return
 # control to its caller.
-self._responses.put(no_more_work)
+self._responses.put(no_more_work)  # type: ignore[arg-type]
 
 Review comment:
   Sounds like a good idea. There are ~20 places in the code that assign 
`object()` as  value.
   One of them is even called `READER_THREAD_IS_DONE_SENTINEL`. :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405154)
Time Spent: 76h 40m  (was: 76.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405145
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394027007
 
 

 ##
 File path: sdks/python/apache_beam/io/range_trackers.py
 ##
 @@ -283,20 +313,28 @@ def try_split(self, position):
 return None
 
   def fraction_consumed(self):
+# type: () -> float
 if self._last_claim is self.UNSTARTED:
   return 0
 else:
   return self.position_to_fraction(
   self._last_claim, self._start_position, self._stop_position)
 
+  @classmethod
+  def position_to_fraction(cls, key, start=None, end=None):
+# type: (iobase.PositionT, Optional[iobase.PositionT], 
Optional[iobase.PositionT]) -> float
+raise NotImplementedError
+
   def fraction_to_position(self, fraction, start, end):
+# type: (float, Optional[iobase.PositionT], Optional[iobase.PositionT]) -> 
Optional[iobase.PositionT]
 
 Review comment:
   The return value seems to be non-Optional.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405145)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 75h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=405151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405151
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 18/Mar/20 02:29
Start Date: 18/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#discussion_r394072405
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1300,12 +1300,13 @@ def to_runner_api_parameter(self, context):
   common_urns.requirements.REQUIRES_STATEFUL_PROCESSING.urn)
 from apache_beam.runners.common import DoFnSignature
 sig = DoFnSignature(self.fn)
-is_splittable = sig.is_splittable_dofn()
 
 Review comment:
   Not sure if checking get_restriction_coder() return type instead of 
is_splittable_dofn() is future proof.
   
   Also, I don't understand the change, from a mypy correctness perspective.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405151)
Time Spent: 76.5h  (was: 76h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 76.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?

2020-03-17 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061320#comment-17061320
 ] 

Tomo Suzuki edited comment on BEAM-9444 at 3/18/20, 2:15 AM:
-

{{BEAM-9444-gcp-bom(fec5361f)}} still has problem in missing class such as 
{{org.apache.beam.sdk.options.PipelineOptions}} and 
{{com.google.protobuf.ByteString}}.

{noformat}
Linkage Check difference on beam-sdks-java-io-google-cloud-platform between 
master(b00885a1) and BEAM-9444-gcp-bom(fec5361f):
Lines starting with '<' mean the branch remedies the errors (good)
Lines starting with '>' mean the branch introduces new errors (bad)
7c7,436
< Class com.github.luben.zstd.ZstdInputStream is not found;
---
> Class org.apache.beam.sdk.options.PipelineOptions is not found;
>   referenced by 27 class files
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigtable.BigtableConfig 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.WriteRename 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteTables 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
...

> Class com.google.protobuf.ByteString is not found;
>   referenced by 13 class files
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigtable.BigtableServiceImpl 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
{noformat}


was (Author: suztomo):
 BEAM-9444-gcp-bom(fec5361f) still has problem in missing class such as 
{{org.apache.beam.sdk.options.PipelineOptions}} and 
{{com.google.protobuf.ByteString}}.

{noformat}
Linkage Check difference on beam-sdks-java-io-google-cloud-platform between 
master(b00885a1) and BEAM-9444-gcp-bom(fec5361f):
Lines starting with '<' mean the branch remedies the errors (good)
Lines starting with '>' mean the branch introduces new errors (bad)
7c7,436
< Class com.github.luben.zstd.ZstdInputStream is not found;
---
> Class org.apache.beam.sdk.options.PipelineOptions is not found;
>   referenced by 27 class files
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigtable.BigtableConfig 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.WriteRename 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteTables 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
...

> Class com.google.protobuf.ByteString is not found;
>   referenced by 13 class files
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigtable.BigtableServiceImpl 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
{noformat}

> Shall we use GCP Libraries BOM to specify Google-related library versions?
> --
>
> Key: BEAM-9444
> URL: https://issues.apache.org/jira/browse/BEAM-9444
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot 
> 2020-03-17 at 16.01.16.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Shall we use GCP Libraries BOM to specify Google-related library versions?
>   
>  I've been working on Beam's dependency upgrades in the past few months. I 
> think it's time to consider a long-term solution to keep the libraries 
> up-to-date with small maintenance effort. To achieve that, I propose Beam to 
> use GCP Libraries BOM to set the Google-related library versions, rather than 
> trying to make changes in each 

[jira] [Commented] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?

2020-03-17 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061320#comment-17061320
 ] 

Tomo Suzuki commented on BEAM-9444:
---

 BEAM-9444-gcp-bom(fec5361f) still has problem in missing class such as 
{{org.apache.beam.sdk.options.PipelineOptions}} and 
{{com.google.protobuf.ByteString}}.

{noformat}
Linkage Check difference on beam-sdks-java-io-google-cloud-platform between 
master(b00885a1) and BEAM-9444-gcp-bom(fec5361f):
Lines starting with '<' mean the branch remedies the errors (good)
Lines starting with '>' mean the branch introduces new errors (bad)
7c7,436
< Class com.github.luben.zstd.ZstdInputStream is not found;
---
> Class org.apache.beam.sdk.options.PipelineOptions is not found;
>   referenced by 27 class files
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigtable.BigtableConfig 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.WriteRename 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteTables 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
...

> Class com.google.protobuf.ByteString is not found;
>   referenced by 13 class files
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigtable.BigtableServiceImpl 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource 
> (beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.jar)
{noformat}

> Shall we use GCP Libraries BOM to specify Google-related library versions?
> --
>
> Key: BEAM-9444
> URL: https://issues.apache.org/jira/browse/BEAM-9444
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot 
> 2020-03-17 at 16.01.16.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Shall we use GCP Libraries BOM to specify Google-related library versions?
>   
>  I've been working on Beam's dependency upgrades in the past few months. I 
> think it's time to consider a long-term solution to keep the libraries 
> up-to-date with small maintenance effort. To achieve that, I propose Beam to 
> use GCP Libraries BOM to set the Google-related library versions, rather than 
> trying to make changes in each of ~30 Google libraries.
>   
> h1. Background
> A BOM is pom.xml that provides dependencyManagement to importing projects.
>   
>  GCP Libraries BOM is a BOM that includes many Google Cloud related libraries 
> + gRPC + protobuf. We (Google Cloud Java Diamond Dependency team) maintain 
> the BOM so that the set of the libraries are compatible with each other.
>   
> h1. Implementation
> Notes for obstacles.
> h2. BeamModulePlugin's "force" does not take BOM into account (thus fails)
> {{forcedModules}} via version resolution strategy is playing bad. This causes
> {noformat}
> A problem occurred evaluating project ':sdks:java:extensions:sql'. 
> Could not resolve all dependencies for configuration 
> ':sdks:java:extensions:sql:fmppTemplates'.
> Invalid format: 'com.google.cloud:google-cloud-core'. Group, name and version 
> cannot be empty. Correct example: 'org.gradle:gradle-core:1.0'{noformat}
> !Screen Shot 2020-03-13 at 13.33.01.png|width=489,height=287! 
>   
> h2. :sdks:java:maven-archetypes:examples needs the version of 
> google-http-client
> The task requires the version for the library:
> {code:java}
> 'google-http-client.version': 
> dependencies.create(project.library.java.google_http_client).getVersion(),
> {code}
> This would generate NullPointerException. Running gradlew without the 
> subproject:
>   
> {code:java}
> ./gradlew -p sdks/java check -x :sdks:java:maven-archetypes:examples:check
> {code}
> h1. Problem in Gradle-generated pom files
> The generated Maven artifact POM has invalid data due to the BOM change. For 
> example my locally installed 
> {{~/.m2/repository/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.21.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.pom}}
>  had the following problems.
> h2. The GCP Libraries BOM showing up in dependencies section:
> {noformat}
>   
> 
>   com.google.cloud
>  

[jira] [Commented] (BEAM-8494) Python 3.8 Support

2020-03-17 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061310#comment-17061310
 ] 

Valentyn Tymofieiev commented on BEAM-8494:
---

Sounds good!

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9503) SyntaxError in process worker startup

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9503?focusedWorklogId=405112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405112
 ]

ASF GitHub Bot logged work on BEAM-9503:


Author: ASF GitHub Bot
Created on: 18/Mar/20 01:45
Start Date: 18/Mar/20 01:45
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11124: 
[cherry-pick][release-2.20.0][BEAM-9503] Insert missing comma in process worker 
script.
URL: https://github.com/apache/beam/pull/11124#issuecomment-600382340
 
 
   Python2_PVR_Flink failures are probably flakes related to timeouts -- see 
BEAM-8912
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405112)
Time Spent: 4h 50m  (was: 4h 40m)

> SyntaxError in process worker startup
> -
>
> Key: BEAM-9503
> URL: https://issues.apache.org/jira/browse/BEAM-9503
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> WARNING:apache_beam.runners.worker.worker_pool_main:Starting worker with 
> command ['python', '-c', 'from apache_beam.runners.worker.sdk_worker import 
> SdkHarness; 
> SdkHarness("localhost:57103",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=0).run()']
> Note that 'state_cache_size=0data_buffer_time_limit_ms=0' is all mashed 
> together. Looks like we're missing a comma: 
> https://github.com/apache/beam/blob/feefaca793d8358d5386d0725863c03e4e37b5b1/sdks/python/apache_beam/runners/worker/worker_pool_main.py#L116



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9536) Return type of window start/end functions is incorrectly inferred to be INT64

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9536?focusedWorklogId=405107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405107
 ]

ASF GitHub Bot logged work on BEAM-9536:


Author: ASF GitHub Bot
Created on: 18/Mar/20 01:32
Start Date: 18/Mar/20 01:32
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #11152: [BEAM-9536] 
Specify return types of window start/end functions explicitly
URL: https://github.com/apache/beam/pull/11152
 
 
   See details in [BEAM-9536](https://issues.apache.org/jira/browse/BEAM-9536)
   
   r: @amaliujia 
   cc: @TheNeuralBit 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 

[jira] [Commented] (BEAM-8912) PreCommit_Python2_PVR_Flink_Commit flaky

2020-03-17 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061297#comment-17061297
 ] 

Kyle Weaver commented on BEAM-8912:
---

This failed again, but with helpful failure information:

==
ERROR: test_assert_that (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 105, in 
test_assert_that
assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
  File "apache_beam/pipeline.py", line 503, in __exit__
self.run().wait_until_finish()
  File "apache_beam/pipeline.py", line 483, in run
self._options).run(False)
  File "apache_beam/pipeline.py", line 496, in run
return self.runner.run_pipeline(self, self._options)
  File "apache_beam/runners/portability/portable_runner.py", line 399, in 
run_pipeline
result.wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 518, in 
wait_until_finish
for state_response in self._state_stream:
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py",
 line 413, in next
return self._next()
 Timed out after 60 seconds. 

# Thread: <_Worker(Thread-21, started daemon 139751759787776)>

# Thread: <_Worker(Thread-7, started daemon 139752397338368)>

  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py",
 line 697, in _next
_common.wait(self._state.condition.wait, _response_ready)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py",
 line 139, in wait
# Thread: 

# Thread: 

_wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
# Thread: <_MainThread(MainThread, started 139753682020096)>

# Thread: 
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py",
 line 104, in _wait_once
wait_fn(timeout=timeout)
 Timed out after 60 seconds. 

  File "/usr/lib/python2.7/threading.py", line 359, in wait
_sleep(delay)
# Thread: <_Worker(Thread-1500, started daemon 139752388945664)>

# Thread: 

# Thread: 

  File "apache_beam/runners/portability/portable_runner_test.py", line 82, in 
handler
# Thread: <_MainThread(MainThread, started 139753682020096)>

raise BaseException(msg)
# Thread: 
BaseException: Timed out after 60 seconds.

==
ERROR: test_combine_per_key (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 585, in 
test_combine_per_key
assert_that(res, equal_to([('a', 1.5), ('b', 3.0)]))
  File "apache_beam/pipeline.py", line 503, in __exit__
self.run().wait_until_finish()
  File "apache_beam/pipeline.py", line 483, in run
self._options).run(False)
  File "apache_beam/pipeline.py", line 496, in run
return self.runner.run_pipeline(self, self._options)
  File "apache_beam/runners/portability/portable_runner.py", line 399, in 
run_pipeline
result.wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 528, in 
wait_until_finish
(self._job_id, self._state, self._last_error_message()))
RuntimeError: Pipeline 
test_combine_per_key_1584490211.15_4578ed7a-90a6-41a5-802f-b4284178ca06 failed 
in state FAILED: akka.ConfigurationException: Could not start logger due to 
[akka.ConfigurationException: Logger specified in config can't be loaded 
[akka.event.slf4j.Slf4jLogger] due to 
[akka.event.Logging$LoggerInitializationException: Logger log1-Slf4jLogger did 
not respond with LoggerInitialized, sent instead [TIMEOUT]]]

==
ERROR: test_assert_that (__main__.FlinkRunnerTestOptimized)
--
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 105, in 
test_assert_that
assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
  File "apache_beam/pipeline.py", line 503, in __exit__
self.run().wait_until_finish()
  File "apache_beam/pipeline.py", line 483, in run
self._options).run(False)
  File "apache_beam/pipeline.py", line 496, in run
return self.runner.run_pipeline(self, self._options)
  File 

[jira] [Created] (BEAM-9536) Return type of window start/end functions is incorrectly inferred to be INT64

2020-03-17 Thread Yueyang Qiu (Jira)
Yueyang Qiu created BEAM-9536:
-

 Summary: Return type of window start/end functions is incorrectly 
inferred to be INT64
 Key: BEAM-9536
 URL: https://issues.apache.org/jira/browse/BEAM-9536
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql-zetasql
Reporter: Yueyang Qiu
Assignee: Yueyang Qiu


Our current implementation does not have the correct return type information of 
the window start/end functions (e.g. TUMBLE_START).

LOC that cause the problem: 
[https://github.com/apache/beam/blob/646f596988be9d6a739090f48d2fed07c8dfc17c/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L62]

We should specify the return type explicitly here in the CREATE statements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8494) Python 3.8 Support

2020-03-17 Thread yoshiki obata (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061291#comment-17061291
 ] 

yoshiki obata commented on BEAM-8494:
-

[~tvalentyn] Sounds good to sync with.
But I'm not good at speaking English, how about text chat?
If it's OK, I'll hangout you on 3/19 10:00 am (PDT).

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405098
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 18/Mar/20 01:18
Start Date: 18/Mar/20 01:18
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #11060: [BEAM-9454] Create 
Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#issuecomment-600376033
 
 
   Run Python 3.5 Flink ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405098)
Time Spent: 4.5h  (was: 4h 20m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405097
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 18/Mar/20 01:18
Start Date: 18/Mar/20 01:18
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #11060: [BEAM-9454] Create 
Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#issuecomment-600376012
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405097)
Time Spent: 4h 20m  (was: 4h 10m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9535) Cleanup portability protos.

2020-03-17 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-9535:
--
Summary: Cleanup portability protos.  (was: Clean up portability protos.)

> Cleanup portability protos.
> ---
>
> Key: BEAM-9535
> URL: https://issues.apache.org/jira/browse/BEAM-9535
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is to provide a stable version of the FnAPI protos going forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9503) SyntaxError in process worker startup

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9503?focusedWorklogId=405080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405080
 ]

ASF GitHub Bot logged work on BEAM-9503:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:47
Start Date: 18/Mar/20 00:47
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11124: 
[cherry-pick][release-2.20.0][BEAM-9503] Insert missing comma in process worker 
script.
URL: https://github.com/apache/beam/pull/11124#issuecomment-600368958
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405080)
Time Spent: 4h 40m  (was: 4.5h)

> SyntaxError in process worker startup
> -
>
> Key: BEAM-9503
> URL: https://issues.apache.org/jira/browse/BEAM-9503
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> WARNING:apache_beam.runners.worker.worker_pool_main:Starting worker with 
> command ['python', '-c', 'from apache_beam.runners.worker.sdk_worker import 
> SdkHarness; 
> SdkHarness("localhost:57103",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=0).run()']
> Note that 'state_cache_size=0data_buffer_time_limit_ms=0' is all mashed 
> together. Looks like we're missing a comma: 
> https://github.com/apache/beam/blob/feefaca793d8358d5386d0725863c03e4e37b5b1/sdks/python/apache_beam/runners/worker/worker_pool_main.py#L116



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9535) Clean up portability protos.

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9535?focusedWorklogId=405081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405081
 ]

ASF GitHub Bot logged work on BEAM-9535:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:47
Start Date: 18/Mar/20 00:47
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11150: [BEAM-9535] 
Remove unused ParDoPayload.Parameters.
URL: https://github.com/apache/beam/pull/11150
 
 
   I wasn't able to find any reference to these in the dataflow code either.
   
   R: @lukecwik 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-9535) Clean up portability protos.

2020-03-17 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-9535:
-

 Summary: Clean up portability protos.
 Key: BEAM-9535
 URL: https://issues.apache.org/jira/browse/BEAM-9535
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Robert Bradshaw
 Fix For: 2.21.0


This is to provide a stable version of the FnAPI protos going forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=405073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405073
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:34
Start Date: 18/Mar/20 00:34
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #11092: [BEAM-9085] Fix 
performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/11092#issuecomment-600365872
 
 
   What do you think about branching the generation based on Py version as a 
workaround until we deprecate Py2 support in Beam?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405073)
Time Spent: 6.5h  (was: 6h 20m)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=405072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405072
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:31
Start Date: 18/Mar/20 00:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #11092: [BEAM-9085] Fix 
performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/11092#issuecomment-600364689
 
 
   Yes, I agree; we should be migrating the tests to Python 3. I think we'll 
pick the option that works well on Py3 and move on, or branch the generation on 
Py2 vs Py3, if we don't find one-fits-all solution. Afterall, the input 
generation should not be the slowest part of the pipeline.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405072)
Time Spent: 6h 20m  (was: 6h 10m)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=405070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405070
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:28
Start Date: 18/Mar/20 00:28
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #11092: [BEAM-9085] Fix 
performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/11092#issuecomment-600364689
 
 
   Yes, I agree; we should be migrating the tests to Python 3. I think we'll 
pick the option that works well on Py3 and move or branch the generation on Py2 
vs Py3, if we don't find one-fit-all solution. Afterall, the input generation 
should not be the slowest part of the pipeline.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405070)
Time Spent: 6h 10m  (was: 6h)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9498) RowJson exception for unsupported types should list the relevant fields

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9498?focusedWorklogId=405068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405068
 ]

ASF GitHub Bot logged work on BEAM-9498:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:26
Start Date: 18/Mar/20 00:26
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9: 
[BEAM-9498] Include descriptor and type of unsupported fields in RowJson 
exception
URL: https://github.com/apache/beam/pull/9
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405068)
Time Spent: 40m  (was: 0.5h)

> RowJson exception for unsupported types should list the relevant fields
> ---
>
> Key: BEAM-9498
> URL: https://issues.apache.org/jira/browse/BEAM-9498
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9533) Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9533?focusedWorklogId=405066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405066
 ]

ASF GitHub Bot logged work on BEAM-9533:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:23
Start Date: 18/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11149: [BEAM-9533] Adding 
tox cloud tests
URL: https://github.com/apache/beam/pull/11149#issuecomment-600363403
 
 
   I also plan to send this update to dev@:
   ```
   Subject: [announce] tox test suites changed to *-cloud
   Hello all,
   as of today we'll be merging a change changing all the *-gcp, and *-aws tox 
suites to be *-cloud. If you were using any of them, please be informed that 
they will not work from today. Please use *-cloud tox targets from now on.
   
   Most of the gradle tasks will remain the same. Please see[1] for the changes.
   
   [1] https://github.com/apache/beam/pull/11149
   
   Thanks
   -P.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405066)
Remaining Estimate: 0h
Time Spent: 10m

> Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both
> -
>
> Key: BEAM-9533
> URL: https://issues.apache.org/jira/browse/BEAM-9533
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there are `py37-gcp`, py37-aws test suites. Let's consolidate all 
> of them into py37-cloud, along with other py35-gcp, py27-gcp, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8866) portableWordCount Flink/Spark - flaky post commits

2020-03-17 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8866.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> portableWordCount Flink/Spark - flaky post commits
> --
>
> Key: BEAM-8866
> URL: https://issues.apache.org/jira/browse/BEAM-8866
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ahmet Altay
>Assignee: Kyle Weaver
>Priority: Critical
>  Labels: portability-flink
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Logs: 
> [https://scans.gradle.com/s/rkdiftvzvr7cy/console-log?task=:sdks:python:test-suites:portable:py36:portableWordCountFlinkRunnerStreaming]
> Error:
> ..
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/localfilesystem.py",
>  line 335, in delete   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/localfilesystem.py",
>  line 335, in delete     raise BeamIOError("Delete operation failed", 
> exceptions) apache_beam.io.filesystem.BeamIOError: Delete operation failed 
> with exceptions \{'/tmp/py-wordcount-direct-1-of-2': OSError('No 
> files found to delete under: /tmp/py-wordcount-direct-1-of-2',), 
> '/tmp/py-wordcount-direct-0-of-2': OSError('No files found to delete 
> under: /tmp/py-wordcount-direct-0-of-2',)} During handling of the 
> above exception, another exception occurred:
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8866) portableWordCount Flink/Spark - flaky post commits

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8866?focusedWorklogId=405061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405061
 ]

ASF GitHub Bot logged work on BEAM-8866:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:15
Start Date: 18/Mar/20 00:15
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11140: [BEAM-8866] Use unique 
temp dir for output of portable word count tests.
URL: https://github.com/apache/beam/pull/11140#issuecomment-600361336
 
 
   Python precommit failure is a known flake (BEAM-9119). Java precommit 
failure also looks like a flake -- filed BEAM-9534
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405061)
Time Spent: 0.5h  (was: 20m)

> portableWordCount Flink/Spark - flaky post commits
> --
>
> Key: BEAM-8866
> URL: https://issues.apache.org/jira/browse/BEAM-8866
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ahmet Altay
>Assignee: Kyle Weaver
>Priority: Critical
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Logs: 
> [https://scans.gradle.com/s/rkdiftvzvr7cy/console-log?task=:sdks:python:test-suites:portable:py36:portableWordCountFlinkRunnerStreaming]
> Error:
> ..
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/localfilesystem.py",
>  line 335, in delete   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/localfilesystem.py",
>  line 335, in delete     raise BeamIOError("Delete operation failed", 
> exceptions) apache_beam.io.filesystem.BeamIOError: Delete operation failed 
> with exceptions \{'/tmp/py-wordcount-direct-1-of-2': OSError('No 
> files found to delete under: /tmp/py-wordcount-direct-1-of-2',), 
> '/tmp/py-wordcount-direct-0-of-2': OSError('No files found to delete 
> under: /tmp/py-wordcount-direct-0-of-2',)} During handling of the 
> above exception, another exception occurred:
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8866) portableWordCount Flink/Spark - flaky post commits

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8866?focusedWorklogId=405062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405062
 ]

ASF GitHub Bot logged work on BEAM-8866:


Author: ASF GitHub Bot
Created on: 18/Mar/20 00:15
Start Date: 18/Mar/20 00:15
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11140: [BEAM-8866] Use 
unique temp dir for output of portable word count tests.
URL: https://github.com/apache/beam/pull/11140
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405062)
Time Spent: 40m  (was: 0.5h)

> portableWordCount Flink/Spark - flaky post commits
> --
>
> Key: BEAM-8866
> URL: https://issues.apache.org/jira/browse/BEAM-8866
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ahmet Altay
>Assignee: Kyle Weaver
>Priority: Critical
>  Labels: portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Logs: 
> [https://scans.gradle.com/s/rkdiftvzvr7cy/console-log?task=:sdks:python:test-suites:portable:py36:portableWordCountFlinkRunnerStreaming]
> Error:
> ..
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/localfilesystem.py",
>  line 335, in delete   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/io/localfilesystem.py",
>  line 335, in delete     raise BeamIOError("Delete operation failed", 
> exceptions) apache_beam.io.filesystem.BeamIOError: Delete operation failed 
> with exceptions \{'/tmp/py-wordcount-direct-1-of-2': OSError('No 
> files found to delete under: /tmp/py-wordcount-direct-1-of-2',), 
> '/tmp/py-wordcount-direct-0-of-2': OSError('No files found to delete 
> under: /tmp/py-wordcount-direct-0-of-2',)} During handling of the 
> above exception, another exception occurred:
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9534) Flink testParDoRequiresStableInput timed out

2020-03-17 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-9534:
-

 Summary: Flink testParDoRequiresStableInput timed out
 Key: BEAM-9534
 URL: https://issues.apache.org/jira/browse/BEAM-9534
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Kyle Weaver


Regression
org.apache.beam.runners.flink.FlinkRequiresStableInputTest.testParDoRequiresStableInput

Failing for the past 1 build (Since Failed#10395 )
Took 30 sec.
Error Message
org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds
Stacktrace
org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds
at java.io.FileOutputStream.write(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:290)
at java.util.zip.ZipOutputStream.writeInt(ZipOutputStream.java:723)
at java.util.zip.ZipOutputStream.writeLOC(ZipOutputStream.java:401)
at java.util.zip.ZipOutputStream.putNextEntry(ZipOutputStream.java:238)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:266)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:253)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:253)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:253)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:253)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:253)
at 
org.apache.beam.sdk.util.ZipFiles.zipDirectoryInternal(ZipFiles.java:253)
at org.apache.beam.sdk.util.ZipFiles.zipDirectory(ZipFiles.java:223)
at 
org.apache.beam.runners.core.construction.Environments.zipDirectory(Environments.java:234)
at 
org.apache.beam.runners.core.construction.Environments.getArtifacts(Environments.java:219)
at 
org.apache.beam.runners.core.construction.Environments.createOrGetDefaultEnvironment(Environments.java:110)
at 
org.apache.beam.runners.core.construction.SdkComponents.create(SdkComponents.java:92)
at 
org.apache.beam.runners.core.construction.ParDoTranslation.getParDoPayload(ParDoTranslation.java:735)
at 
org.apache.beam.runners.core.construction.ParDoTranslation.getSchemaInformation(ParDoTranslation.java:344)
at 
org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$ParDoStreamingTranslator.translateNode(FlinkStreamingTransformTranslators.java:669)
at 
org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:157)
at 
org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:136)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
at 
org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:463)
at 
org.apache.beam.runners.flink.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:38)
at 
org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.translate(FlinkStreamingPipelineTranslator.java:88)
at 
org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:116)
at 
org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.getJobGraph(FlinkPipelineExecutionEnvironment.java:153)
at 
org.apache.beam.runners.flink.FlinkRunner.getJobGraph(FlinkRunner.java:209)
at 
org.apache.beam.runners.flink.FlinkRequiresStableInputTest.getJobGraph(FlinkRequiresStableInputTest.java:176)
at 
org.apache.beam.runners.flink.FlinkRequiresStableInputTest.restoreFromSavepoint(FlinkRequiresStableInputTest.java:201)
at 
org.apache.beam.runners.flink.FlinkRequiresStableInputTest.testParDoRequiresStableInput(FlinkRequiresStableInputTest.java:163)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 

[jira] [Commented] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061263#comment-17061263
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-9484:
-

Haven't been looking into this actively. So sending this your way. Thanks Pablo.

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath reassigned BEAM-9484:
---

Assignee: Pablo Estrada  (was: Chamikara Madhusanka Jayalath)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405056
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:57
Start Date: 17/Mar/20 23:57
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r394036805
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -65,6 +65,7 @@
 from apache_beam.transforms import environments
 from apache_beam.transforms import userstate
 from apache_beam.transforms import window
+from apache_beam.transforms.deduplicate import DeduplictaionWithinDuration
 
 Review comment:
   Thanks for the pointer!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405056)
Time Spent: 4h 10m  (was: 4h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9524) ib.show() spins forever when cells are re-executed

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9524?focusedWorklogId=405055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405055
 ]

ASF GitHub Bot logged work on BEAM-9524:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:54
Start Date: 17/Mar/20 23:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11128: [BEAM-9524] Fix for 
ib.show() executing indefinitely
URL: https://github.com/apache/beam/pull/11128#issuecomment-600355945
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405055)
Time Spent: 20m  (was: 10m)

> ib.show() spins forever when cells are re-executed
> --
>
> Key: BEAM-9524
> URL: https://issues.apache.org/jira/browse/BEAM-9524
> Project: Beam
>  Issue Type: Bug
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9533) Replace *-gcp/*-aws tox suites with *-cloud suites to run unit tests for both

2020-03-17 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9533:
---

 Summary: Replace *-gcp/*-aws tox suites with *-cloud suites to run 
unit tests for both
 Key: BEAM-9533
 URL: https://issues.apache.org/jira/browse/BEAM-9533
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Pablo Estrada
Assignee: Pablo Estrada


Currently there are `py37-gcp`, py37-aws test suites. Let's consolidate all of 
them into py37-cloud, along with other py35-gcp, py27-gcp, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405047
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:37
Start Date: 17/Mar/20 23:37
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #11060: [BEAM-9454] Create 
Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#issuecomment-600350868
 
 
   Run Python 3.5 Flink ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405047)
Time Spent: 4h  (was: 3h 50m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405046
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:36
Start Date: 17/Mar/20 23:36
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #11060: [BEAM-9454] Create 
Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#issuecomment-600350776
 
 
   Run Python Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405046)
Time Spent: 3h 50m  (was: 3h 40m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405045=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405045
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:35
Start Date: 17/Mar/20 23:35
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r394029968
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
+  """ A PTransform which deduplicates input records over a time domain and
+  threshold. Values in different windows will NOT be considered duplicates of
+  each other. Deduplication is best effort.
+
+  The durations specified may impose memory and/or storage requirements within
+  a runner and care might need to be used to ensure that the deduplication time
+  limit is long enough to remove duplicates but short enough to not cause
+  performance problems within a runner. Each runner may provide an optimized
+  implementation of their choice using the deduplication time domain and
+  threshold specified.
+
+  Does not preserve any order the input PCollection might have had.
+  """
+  def __init__(
+  self, time_domain=TimeDomain.REAL_TIME, duration=Duration(10 * 60)):
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError(
+  'Unsupported TimeDomain: %r. DeduplictaionWithinDuration'
+  'expects TimeDomain.WATERMARK or TimeDomain.REAL_TIME' %
+  (time_domain, ))
+self.time_domain = time_domain
+self.duration = duration
+
+  def _create_deduplicate_fn(self):
+timer_spec = userstate.TimerSpec('expiry_timer', self.time_domain)
+state_spec = userstate.BagStateSpec('seen', BooleanCoder())
+duration = self.duration
+domain = self.time_domain
+
+class DeduplicationFn(DoFn):
+  def process(
+  self,
+  element,
+  ts=DoFn.TimestampParam,
+  seen_state=DoFn.StateParam(state_spec),
+  expiry_timer=DoFn.TimerParam(timer_spec)):
+if True in seen_state.read():
+  return
+
+if domain == TimeDomain.REAL_TIME:
+  expiry_timer.set(Timestamp.now() + duration)
+elif domain == TimeDomain.WATERMARK:
+  expiry_timer.set(ts + duration)
 
 Review comment:
   Are you talking about the watermark case? yes the timer will be fires when 
the watermark advance to 11 if the timer is set by the value with timestamp 1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405045)
Time Spent: 3h 40m  (was: 3.5h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: 

[jira] [Comment Edited] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061257#comment-17061257
 ] 

Pablo Estrada edited comment on BEAM-9484 at 3/17/20, 11:34 PM:


[~chamikara] lmk if you need a hand looking into this, or if you got it. Feel 
free to assign to me too.


was (Author: pabloem):
[~chamikara] lmk if you need a hand looking into this, or if you got it.

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061257#comment-17061257
 ] 

Pablo Estrada commented on BEAM-9484:
-

[~chamikara] lmk if you need a hand looking into this, or if you got it.

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9498) RowJson exception for unsupported types should list the relevant fields

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9498?focusedWorklogId=405044=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405044
 ]

ASF GitHub Bot logged work on BEAM-9498:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:31
Start Date: 17/Mar/20 23:31
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9: 
[BEAM-9498] Include descriptor and type of unsupported fields in RowJson 
exception
URL: https://github.com/apache/beam/pull/9#discussion_r394028731
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
 ##
 @@ -84,41 +85,76 @@
   private static final ImmutableSet SUPPORTED_TYPES =
   ImmutableSet.of(BYTE, INT16, INT32, INT64, FLOAT, DOUBLE, BOOLEAN, 
STRING, DECIMAL);
 
+  /**
+   * Throws {@link UnsupportedRowJsonException} if {@code schema} contains an 
unsupported field
+   * type.
+   */
   public static void verifySchemaSupported(Schema schema) {
-schema.getFields().forEach(RowJson::verifyFieldTypeSupported);
+ImmutableList unsupportedFields = 
findUnsupportedFields(schema);
+if (!unsupportedFields.isEmpty()) {
+  throw new UnsupportedRowJsonException(
+  String.format(
+  "Field type%s %s not supported when converting between JSON and 
Rows. Supported types are: %s",
+  unsupportedFields.size() > 1 ? "s" : "",
+  unsupportedFields.toString(),
+  SUPPORTED_TYPES.toString()));
+}
+  }
+
+  private static class UnsupportedField {
+final String descriptor;
+final TypeName typeName;
+
+UnsupportedField(String descriptor, TypeName typeName) {
+  this.descriptor = descriptor;
+  this.typeName = typeName;
+}
+
+@Override
+public String toString() {
+  return this.descriptor + "=" + this.typeName;
+}
+  }
+
+  private static ImmutableList findUnsupportedFields(Schema 
schema) {
+return schema.getFields().stream()
+.flatMap((field) -> findUnsupportedFields(field).stream())
+.collect(toImmutableList());
   }
 
-  static void verifyFieldTypeSupported(Field field) {
+  private static ImmutableList findUnsupportedFields(Field 
field) {
 FieldType fieldType = field.getType();
 
 Review comment:
   Done! thanks
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405044)
Time Spent: 0.5h  (was: 20m)

> RowJson exception for unsupported types should list the relevant fields
> ---
>
> Key: BEAM-9498
> URL: https://issues.apache.org/jira/browse/BEAM-9498
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3415) JUnit5 support

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3415?focusedWorklogId=405041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405041
 ]

ASF GitHub Bot logged work on BEAM-3415:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:29
Start Date: 17/Mar/20 23:29
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #4360: [BEAM-3415] junit 5 
integration
URL: https://github.com/apache/beam/pull/4360#issuecomment-600348843
 
 
   It seems like it's a good amount of work for a small amount of value so
   people are having a hard time justifying to do the work over other things
   that could be done.
   
   On Tue, Mar 17, 2020 at 4:24 PM mgrey0  wrote:
   
   > Any news on this given that it's been 2 years? A project I am working on
   > could use junit 5 support as all other components of the project support 5,
   > leaving beam as the sole user of junit 4.
   >
   > —
   > You are receiving this because you are subscribed to this thread.
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405041)
Time Spent: 20m  (was: 10m)

> JUnit5 support
> --
>
> Key: BEAM-3415
> URL: https://issues.apache.org/jira/browse/BEAM-3415
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Romain Manni-Bucau
>Assignee: Romain Manni-Bucau
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405040
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:25
Start Date: 17/Mar/20 23:25
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r394026985
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
+  """ A PTransform which deduplicates input records over a time domain and
+  threshold. Values in different windows will NOT be considered duplicates of
+  each other. Deduplication is best effort.
+
+  The durations specified may impose memory and/or storage requirements within
+  a runner and care might need to be used to ensure that the deduplication time
+  limit is long enough to remove duplicates but short enough to not cause
+  performance problems within a runner. Each runner may provide an optimized
+  implementation of their choice using the deduplication time domain and
+  threshold specified.
+
+  Does not preserve any order the input PCollection might have had.
+  """
+  def __init__(
+  self, time_domain=TimeDomain.REAL_TIME, duration=Duration(10 * 60)):
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError(
+  'Unsupported TimeDomain: %r. DeduplictaionWithinDuration'
+  'expects TimeDomain.WATERMARK or TimeDomain.REAL_TIME' %
+  (time_domain, ))
+self.time_domain = time_domain
+self.duration = duration
+
+  def _create_deduplicate_fn(self):
+timer_spec = userstate.TimerSpec('expiry_timer', self.time_domain)
+state_spec = userstate.BagStateSpec('seen', BooleanCoder())
+duration = self.duration
+domain = self.time_domain
+
+class DeduplicationFn(DoFn):
+  def process(
+  self,
+  element,
+  ts=DoFn.TimestampParam,
+  seen_state=DoFn.StateParam(state_spec),
+  expiry_timer=DoFn.TimerParam(timer_spec)):
+if True in seen_state.read():
+  return
+
+if domain == TimeDomain.REAL_TIME:
+  expiry_timer.set(Timestamp.now() + duration)
+elif domain == TimeDomain.WATERMARK:
+  expiry_timer.set(ts + duration)
+else:
+  raise ValueError(
+  'Unsupported TimeDomain: %r. DeduplictaionWithinDuration'
+  'expects TimeDomain.WATERMARK or TimeDomain.REAL_TIME' %
+  (domain, ))
+seen_state.add(True)
+value, _ = element
+yield value
+
+  def infer_output_type(self, input_type):
+key_type, _ = trivial_inference.key_value_types(input_type)
+return key_type
+
+  @userstate.on_timer(timer_spec)
+  def process_timer(self, seen_state=DoFn.StateParam(state_spec)):
+seen_state.clear()
+
+return DeduplicationFn()
+
+  def expand(self, pcoll):
+return (
+pcoll
+| 'KeyByElement' >> Map(lambda x: (x, None))
+| 'DeduplicateFn' >> ParDo(self._create_deduplicate_fn()))
 
 Review comment:
   Are you suggesting that we provide both `DeduplicatePerKey` and 

[jira] [Work logged] (BEAM-3415) JUnit5 support

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3415?focusedWorklogId=405039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405039
 ]

ASF GitHub Bot logged work on BEAM-3415:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:24
Start Date: 17/Mar/20 23:24
Worklog Time Spent: 10m 
  Work Description: mgrey0 commented on issue #4360: [BEAM-3415] junit 5 
integration
URL: https://github.com/apache/beam/pull/4360#issuecomment-600347563
 
 
   Any news on this given that it's been 2 years? A project I am working on 
could use junit 5 support as all other components of the project support 5, 
leaving beam as the sole user of junit 4.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405039)
Remaining Estimate: 0h
Time Spent: 10m

> JUnit5 support
> --
>
> Key: BEAM-3415
> URL: https://issues.apache.org/jira/browse/BEAM-3415
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Romain Manni-Bucau
>Assignee: Romain Manni-Bucau
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9532) test_last_updated is failing in s3io_test

2020-03-17 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061253#comment-17061253
 ] 

Pablo Estrada commented on BEAM-9532:
-

cc: [~mattmorgis] fyi

> test_last_updated is failing in s3io_test
> -
>
> Key: BEAM-9532
> URL: https://issues.apache.org/jira/browse/BEAM-9532
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-aws
>Reporter: Pablo Estrada
>Priority: Major
>
> The timestamps are not set appropriately. For some reason they are one hour 
> away from each other:
>  
> ==
> FAIL: test_last_updated (apache_beam.io.aws.s3io_test.TestS3IO)
> --
> Traceback (most recent call last):
>  File "/home/pabloem/codes/meam/sdks/python/apache_beam/io/aws/s3io_test.py", 
> line 125, in test_last_updated
>  self.assertAlmostEqual(result, time.time(), delta=tolerance)
> AssertionError: 1584481946.282874 != 1584485546.2829826 within 300 delta
> --
>  
> Note that 1584481946.282874 - 1584485546.2829826 is 3600



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9532) test_last_updated is failing in s3io_test

2020-03-17 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9532:
---

 Summary: test_last_updated is failing in s3io_test
 Key: BEAM-9532
 URL: https://issues.apache.org/jira/browse/BEAM-9532
 Project: Beam
  Issue Type: Bug
  Components: io-py-aws
Reporter: Pablo Estrada


The timestamps are not set appropriately. For some reason they are one hour 
away from each other:

 

==
FAIL: test_last_updated (apache_beam.io.aws.s3io_test.TestS3IO)
--
Traceback (most recent call last):
 File "/home/pabloem/codes/meam/sdks/python/apache_beam/io/aws/s3io_test.py", 
line 125, in test_last_updated
 self.assertAlmostEqual(result, time.time(), delta=tolerance)
AssertionError: 1584481946.282874 != 1584485546.2829826 within 300 delta

--

 

Note that 1584481946.282874 - 1584485546.2829826 is 3600



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=405038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405038
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:21
Start Date: 17/Mar/20 23:21
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r394025988
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
+  """ A PTransform which deduplicates input records over a time domain and
+  threshold. Values in different windows will NOT be considered duplicates of
+  each other. Deduplication is best effort.
+
+  The durations specified may impose memory and/or storage requirements within
+  a runner and care might need to be used to ensure that the deduplication time
+  limit is long enough to remove duplicates but short enough to not cause
+  performance problems within a runner. Each runner may provide an optimized
+  implementation of their choice using the deduplication time domain and
+  threshold specified.
+
+  Does not preserve any order the input PCollection might have had.
+  """
+  def __init__(
+  self, time_domain=TimeDomain.REAL_TIME, duration=Duration(10 * 60)):
 
 Review comment:
   > Also, is it not possible to have both a real and watermark delay? This API 
forces you to do one or the other.
   
   Currently the API only allows either watermark based or walltime based.  I'm 
not sure what kind of strategy we can do if allowing both. Do we want to do no 
matter which fires first, just clear another?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405038)
Time Spent: 3h 20m  (was: 3h 10m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-2717) Infer coders in SDK prior to handing off pipeline to Runner

2020-03-17 Thread Joao Peixoto (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061250#comment-17061250
 ] 

Joao Peixoto commented on BEAM-2717:


Ran into this issue while trying to deploy a pipeline into GCP.

I'm not sure I understand what the underlying problem is or how to work around 
it. Could someone shed some light?

My pipeline uses protobuf to generate the types used within. Classes and 
functions are annotated with "@beam.typehints.with_input_types" and the like.

> Infer coders in SDK prior to handing off pipeline to Runner
> ---
>
> Key: BEAM-2717
> URL: https://issues.apache.org/jira/browse/BEAM-2717
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently all runners have to duplicate this work, and there's also a hack 
> storing the element type rather than the coder in the Runner protos.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=405034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405034
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:12
Start Date: 17/Mar/20 23:12
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11107: [BEAM-9468] 
[WIP] add HL7v2IO and FhirIO
URL: https://github.com/apache/beam/pull/11107
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405034)
Time Spent: 1h 40m  (was: 1.5h)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=405033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405033
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:12
Start Date: 17/Mar/20 23:12
Worklog Time Spent: 10m 
  Work Description: jaketf commented on issue #11107: [BEAM-9468] [WIP] add 
HL7v2IO and FhirIO
URL: https://github.com/apache/beam/pull/11107#issuecomment-600344533
 
 
   Closing this split into separate PRs for HL7v2IO and FhirIO
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405033)
Time Spent: 1.5h  (was: 1h 20m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9430) Migrate from ProcessContext#updateWatermark to WatermarkEstimators

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9430?focusedWorklogId=405029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405029
 ]

ASF GitHub Bot logged work on BEAM-9430:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:08
Start Date: 17/Mar/20 23:08
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11126: [BEAM-9430, 
BEAM-2939] Migrate from ProcessContext#updateWatermark to WatermarkEstimators
URL: https://github.com/apache/beam/pull/11126#issuecomment-600343432
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405029)
Time Spent: 50m  (was: 40m)

> Migrate from ProcessContext#updateWatermark to WatermarkEstimators
> --
>
> Key: BEAM-9430
> URL: https://issues.apache.org/jira/browse/BEAM-9430
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current discussion underway in 
> [https://lists.apache.org/thread.html/r5d974b6a58bc04ff4c02682fda4ef68608121f1bf23a86e9d592ca6e%40%3Cdev.beam.apache.org%3E]
>  
> Proposed API: [https://github.com/apache/beam/pull/10992]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061242#comment-17061242
 ] 

Ahmet Altay commented on BEAM-9484:
---

cc: [~pabloem]

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9484) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is flaky in DirectRunner Postcommits

2020-03-17 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-9484:
-

Assignee: Chamikara Madhusanka Jayalath

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests is 
> flaky in DirectRunner Postcommits
> 
>
> Key: BEAM-9484
> URL: https://issues.apache.org/jira/browse/BEAM-9484
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> From https://builds.apache.org/job/beam_PostCommit_Python37_PR/99/: 
> {noformat}
>  ==
> 04:40:28  FAIL: test_big_query_write 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
> 04:40:28  
> --
> 04:40:28  Traceback (most recent call last):
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 167, in test_big_query_write
> 04:40:28  write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 522, in __exit__
> 04:40:28  self.run().wait_until_finish()
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 495, in run
> 04:40:28  self._options).run(False)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 508, in run
> 04:40:28  return self.runner.run_pipeline(self, self._options)
> 04:40:28File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 53, in run_pipeline
> 04:40:28  hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 04:40:28  AssertionError: 
> 04:40:28  Expected: (Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')])
> 04:40:28   but: Expected data is [(1, 'abc'), (2, 'def'), (3, '你好'), (4, 
> 'привет')] Actual data is []
> 04:40:28  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=405027=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405027
 ]

ASF GitHub Bot logged work on BEAM-7516:


Author: ASF GitHub Bot
Created on: 17/Mar/20 23:00
Start Date: 17/Mar/20 23:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10291: 
[BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive 
watermark manager
URL: https://github.com/apache/beam/pull/10291#issuecomment-600341259
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405027)
Time Spent: 7.5h  (was: 7h 20m)

> Add a watermark manager for the fn_api_runner
> -
>
> Key: BEAM-7516
> URL: https://issues.apache.org/jira/browse/BEAM-7516
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> To track watermarks for each stage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9390) [PostCommit_Java_PortabilityApi] [BigQuery related ITs] UnsupportedOperationException: BigQuery source must be split before being read

2020-03-17 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061239#comment-17061239
 ] 

Luke Cwik edited comment on BEAM-9390 at 3/17/20, 10:57 PM:


The issue is that the BigQuerySource only works with initial splitting since 
initial splitting kicks off the import job.

 

This can be fixed by:

1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using 
beam_fn_api

2) Update the JRH to support doing the expansion for SplittableDoFns itself

3) Migrate BigQuerySource to a SplittableDoFn

4) Implement "concat" source over the AvroIO files.

5) Migrate to use the UW and then Dataflow will perform the SDF expansion and 
disable the JRH version of this test.

 

All but #3 is throw away work but #3 would make the source incompatible with 
non-portable runners. #5 is likely the easiest to do once it becomes available.


was (Author: lcwik):
The issue is that the BigQuerySource only works with initial splitting since 
initial splitting kicks off the import job.

 

This can be fixed by:

1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using 
beam_fn_api

2) Update the JRH to support doing the expansion for SplittableDoFns itself

3) Migrate BigQuerySource to a SplittableDoFn

4) Implement "concat" source over the AvroIO files.

5) Migrate to use the UW and then Dataflow will perform the SDF expansion.

 

All but #3 is throw away work but #3 would make the source incompatible with 
non-portable runners. #5 is likely the easiest to do once it becomes available.

> [PostCommit_Java_PortabilityApi] [BigQuery related ITs] 
> UnsupportedOperationException: BigQuery source must be split before being read
> --
>
> Key: BEAM-9390
> URL: https://issues.apache.org/jira/browse/BEAM-9390
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Luke Cwik
>Priority: Major
>  Labels: currently-failing
>
> Failed tests:
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2EBigQueryTornadoesWithExport
>  
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2eBigQueryTornadoesWithStorageApi
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringTableFunction
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringDynamicDestinations
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryTimePartitioning
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClustering
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testLegacyQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testStandardQueryWithoutCustom
> ([https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4226/#showFailuresLink)]
>  
> Example failures:
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -596: java.lang.UnsupportedOperationException: BigQuery source must be split 
> before being read at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146)
> ...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9390) [PostCommit_Java_PortabilityApi] [BigQuery related ITs] UnsupportedOperationException: BigQuery source must be split before being read

2020-03-17 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061239#comment-17061239
 ] 

Luke Cwik edited comment on BEAM-9390 at 3/17/20, 10:56 PM:


The issue is that the BigQuerySource only works with initial splitting since 
initial splitting kicks off the import job.

 

This can be fixed by:

1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using 
beam_fn_api

2) Update the JRH to support doing the expansion for SplittableDoFns itself

3) Migrate BigQuerySource to a SplittableDoFn

4) Implement "concat" source over the AvroIO files.

5) Migrate to use the UW and then Dataflow will perform the SDF expansion.

 

All but #3 is throw away work but #3 would make the source incompatible with 
non-portable runners. #5 is likely the easiest to do once it becomes available.


was (Author: lcwik):
The issue is that the BigQuerySource only works with initial splitting since 
initial splitting kicks off the import job.

 

This can be fixed by:

1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using 
beam_fn_api

2) Update the JRH to support doing the expansion for SplittableDoFns itself

3) Migrate BigQuerySource to a SplittableDoFn

4) Implement "concat" source over the AvroIO files.

 

All but #3 is throw away work but #3 would make the source incompatible with 
non-portable runners. #1 is likely the easiest to do.

> [PostCommit_Java_PortabilityApi] [BigQuery related ITs] 
> UnsupportedOperationException: BigQuery source must be split before being read
> --
>
> Key: BEAM-9390
> URL: https://issues.apache.org/jira/browse/BEAM-9390
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Luke Cwik
>Priority: Major
>  Labels: currently-failing
>
> Failed tests:
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2EBigQueryTornadoesWithExport
>  
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2eBigQueryTornadoesWithStorageApi
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringTableFunction
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringDynamicDestinations
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryTimePartitioning
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClustering
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testLegacyQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testStandardQueryWithoutCustom
> ([https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4226/#showFailuresLink)]
>  
> Example failures:
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -596: java.lang.UnsupportedOperationException: BigQuery source must be split 
> before being read at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146)
> ...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9531) Kickstart the C# .net SDK for Beam

2020-03-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9531:
---
Labels: gsoc gsoc2020 mentor  (was: )

> Kickstart the C# .net SDK for Beam
> --
>
> Key: BEAM-9531
> URL: https://issues.apache.org/jira/browse/BEAM-9531
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-ideas
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: gsoc, gsoc2020, mentor
>
> The idea of this GSoC project is to kickstart the creation of the Beam SDK 
> for C# (.net). The goal is to create the minimal set of pieces required to 
> allow a user to write and execute a WordCount type of pipeline in C# with 
> Beam.
> To do this we will need to implement the minimum set of abstractions of the 
> SDK: ParDo (Beam’s Map with super powers) + GroupByKey, as well as a Harness 
> capable of writing to the data channel, and some internal data 
> representations (WindowedValue and others) to be able to run the pipeline 
> using portable runners.
> Don’t worry if the Beam specific details are not clear, just familiarity with 
> the Big Data WordCount concepts are a prerequisite, as well as probably 
> reading some of the Beam introductory material [1-3]. Good knowledge of C# 
> and its idioms, as well as familiarity with the recent .net ecosystem are 
> required for the student who wants to apply for this project.
> [1] [https://beam.apache.org/]
> [2] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101]
> [3] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9390) [PostCommit_Java_PortabilityApi] [BigQuery related ITs] UnsupportedOperationException: BigQuery source must be split before being read

2020-03-17 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061239#comment-17061239
 ] 

Luke Cwik commented on BEAM-9390:
-

The issue is that the BigQuerySource only works with initial splitting since 
initial splitting kicks off the import job.

 

This can be fixed by:

1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using 
beam_fn_api

2) Update the JRH to support doing the expansion for SplittableDoFns itself

3) Migrate BigQuerySource to a SplittableDoFn

4) Implement "concat" source over the AvroIO files.

 

All but #3 is throw away work but #3 would make the source incompatible with 
non-portable runners. #1 is likely the easiest to do.

> [PostCommit_Java_PortabilityApi] [BigQuery related ITs] 
> UnsupportedOperationException: BigQuery source must be split before being read
> --
>
> Key: BEAM-9390
> URL: https://issues.apache.org/jira/browse/BEAM-9390
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Luke Cwik
>Priority: Major
>  Labels: currently-failing
>
> Failed tests:
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2EBigQueryTornadoesWithExport
>  
> org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2eBigQueryTornadoesWithStorageApi
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringTableFunction
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringDynamicDestinations
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryTimePartitioning
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClustering
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testLegacyQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithoutReshuffle
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testStandardQueryWithoutCustom
> ([https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4226/#showFailuresLink)]
>  
> Example failures:
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -596: java.lang.UnsupportedOperationException: BigQuery source must be split 
> before being read at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159)
>  at 
> org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146)
> ...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9531) Kickstart the C# .net SDK for Beam

2020-03-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9531:
---
Status: Open  (was: Triage Needed)

> Kickstart the C# .net SDK for Beam
> --
>
> Key: BEAM-9531
> URL: https://issues.apache.org/jira/browse/BEAM-9531
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-ideas
>Reporter: Ismaël Mejía
>Priority: Minor
>
> The idea of this GSoC project is to kickstart the creation of the Beam SDK 
> for C# (.net). The goal is to create the minimal set of pieces required to 
> allow a user to write and execute a WordCount type of pipeline in C# with 
> Beam.
> To do this we will need to implement the minimum set of abstractions of the 
> SDK: ParDo (Beam’s Map with super powers) + GroupByKey, as well as a Harness 
> capable of writing to the data channel, and some internal data 
> representations (WindowedValue and others) to be able to run the pipeline 
> using portable runners.
> Don’t worry if the Beam specific details are not clear, just familiarity with 
> the Big Data WordCount concepts are a prerequisite, as well as probably 
> reading some of the Beam introductory material [1-3]. Good knowledge of C# 
> and its idioms, as well as familiarity with the recent .net ecosystem are 
> required for the student who wants to apply for this project.
> [1] [https://beam.apache.org/]
> [2] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101]
> [3] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9531) Kickstart the C# .net SDK for Beam

2020-03-17 Thread Jira
Ismaël Mejía created BEAM-9531:
--

 Summary: Kickstart the C# .net SDK for Beam
 Key: BEAM-9531
 URL: https://issues.apache.org/jira/browse/BEAM-9531
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-ideas
Reporter: Ismaël Mejía


The idea of this GSoC project is to kickstart the creation of the Beam SDK for 
C# (.net). The goal is to create the minimal set of pieces required to allow a 
user to write and execute a WordCount type of pipeline in C# with Beam.

To do this we will need to implement the minimum set of abstractions of the 
SDK: ParDo (Beam’s Map with super powers) + GroupByKey, as well as a Harness 
capable of writing to the data channel, and some internal data representations 
(WindowedValue and others) to be able to run the pipeline using portable 
runners.

Don’t worry if the Beam specific details are not clear, just familiarity with 
the Big Data WordCount concepts are a prerequisite, as well as probably reading 
some of the Beam introductory material [1-3]. Good knowledge of C# and its 
idioms, as well as familiarity with the recent .net ecosystem are required for 
the student who wants to apply for this project.

[1] [https://beam.apache.org/]
[2] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101]
[3] [https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=405022=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405022
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 17/Mar/20 22:54
Start Date: 17/Mar/20 22:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11147: [BEAM-7923] Support 
dict and iterable PCollections in show
URL: https://github.com/apache/beam/pull/11147#issuecomment-600339328
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405022)
Time Spent: 4h 40m  (was: 4.5h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=405021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405021
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 17/Mar/20 22:53
Start Date: 17/Mar/20 22:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11147: [BEAM-7923] Support 
dict and iterable PCollections in show
URL: https://github.com/apache/beam/pull/11147#issuecomment-600339261
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405021)
Time Spent: 4.5h  (was: 4h 20m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=405003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405003
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:48
Start Date: 17/Mar/20 21:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11147: [BEAM-7923] Support 
dict and iterable PCollections in show
URL: https://github.com/apache/beam/pull/11147#issuecomment-600317504
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405003)
Time Spent: 4h 10m  (was: 4h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=405004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405004
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:48
Start Date: 17/Mar/20 21:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11147: [BEAM-7923] Support 
dict and iterable PCollections in show
URL: https://github.com/apache/beam/pull/11147#issuecomment-600317593
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405004)
Time Spent: 4h 20m  (was: 4h 10m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and do not have conflicting dependencies

2020-03-17 Thread David Yan (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061206#comment-17061206
 ] 

David Yan commented on BEAM-8551:
-

`pip check` is another way to check for broken dependencies.

> Beam Python containers should include all Beam SDK dependencies, and do not 
> have conflicting dependencies
> -
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Checks could be introduced during container creation, and be enforced by 
> ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to 
> makes sure all dependencies are installed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-9530) Add `pip check` to ensure good python dependencies

2020-03-17 Thread David Yan (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Yan closed BEAM-9530.
---
Fix Version/s: Not applicable
   Resolution: Duplicate

> Add `pip check` to ensure good python dependencies
> --
>
> Key: BEAM-9530
> URL: https://issues.apache.org/jira/browse/BEAM-9530
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: David Yan
>Priority: Major
> Fix For: Not applicable
>
>
> We should add {{pip check}} after pip install in our tests to make sure there 
> is no incompatibility.  {{pip install}} does not return an error exit code 
> for broken dependencies for historical reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404984
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393972784
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
 
 Review comment:
   PTransform names are generally verbs. Perhaps just call this `Deduplicate`. 
The duration argument would typically be part of the constructor, e.g. 
`Deduplicate(duration=...)`, so wouldn't need to be repeated in the name. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404984)
Time Spent: 2h 50m  (was: 2h 40m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404986
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393978547
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -448,6 +449,42 @@ def is_buffered_correctly(actual):
 
   assert_that(actual, is_buffered_correctly)
 
+  def test_deduplication_transform_with_processing_time(self):
+# Note that current FnApiRunner doesn't respect either real timestamp.
+with self.create_pipeline() as p:
+  inputs = [
+  window.TimestampedValue('value_1', 1),
+  window.TimestampedValue('value_1', 2),
+  window.TimestampedValue('value_1', 3)
+  ]
+  actual = (
+  p
+  | beam.Create(inputs)
+  | beam.WindowInto(window.FixedWindows(10))
+  | DeduplictaionWithinDuration(duration=timestamp.Duration(5)))
+  assert_that(actual, equal_to(['value_1']))
+
+  @unittest.skip('TestStream not yet supported')
+  def test_deduplication_transform_with_event_time(self):
+test_stream = (
+TestStream().advance_watermark_to(0).add_elements([
+window.TimestampedValue('value_1', 1),
+window.TimestampedValue('value_1', 2),
+window.TimestampedValue('value_1', 10)
 
 Review comment:
   It's be a stronger test to add elements one at a time. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404986)
Time Spent: 3h 10m  (was: 3h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404987
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393978328
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
+  """ A PTransform which deduplicates input records over a time domain and
+  threshold. Values in different windows will NOT be considered duplicates of
+  each other. Deduplication is best effort.
+
+  The durations specified may impose memory and/or storage requirements within
+  a runner and care might need to be used to ensure that the deduplication time
+  limit is long enough to remove duplicates but short enough to not cause
+  performance problems within a runner. Each runner may provide an optimized
+  implementation of their choice using the deduplication time domain and
+  threshold specified.
+
+  Does not preserve any order the input PCollection might have had.
+  """
+  def __init__(
+  self, time_domain=TimeDomain.REAL_TIME, duration=Duration(10 * 60)):
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError(
+  'Unsupported TimeDomain: %r. DeduplictaionWithinDuration'
+  'expects TimeDomain.WATERMARK or TimeDomain.REAL_TIME' %
+  (time_domain, ))
+self.time_domain = time_domain
+self.duration = duration
+
+  def _create_deduplicate_fn(self):
+timer_spec = userstate.TimerSpec('expiry_timer', self.time_domain)
+state_spec = userstate.BagStateSpec('seen', BooleanCoder())
+duration = self.duration
+domain = self.time_domain
+
+class DeduplicationFn(DoFn):
+  def process(
+  self,
+  element,
+  ts=DoFn.TimestampParam,
+  seen_state=DoFn.StateParam(state_spec),
+  expiry_timer=DoFn.TimerParam(timer_spec)):
+if True in seen_state.read():
+  return
+
+if domain == TimeDomain.REAL_TIME:
+  expiry_timer.set(Timestamp.now() + duration)
+elif domain == TimeDomain.WATERMARK:
+  expiry_timer.set(ts + duration)
 
 Review comment:
   Suppose one has values with timestamps 10, 1, 12, and duration 10. This will 
not properly deduplicate because the timer will fire at 11 clearing state 
before 12 gets seen. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404987)
Time Spent: 3h 10m  (was: 3h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>

[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404988
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393974676
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
+  """ A PTransform which deduplicates input records over a time domain and
+  threshold. Values in different windows will NOT be considered duplicates of
+  each other. Deduplication is best effort.
+
+  The durations specified may impose memory and/or storage requirements within
+  a runner and care might need to be used to ensure that the deduplication time
+  limit is long enough to remove duplicates but short enough to not cause
+  performance problems within a runner. Each runner may provide an optimized
+  implementation of their choice using the deduplication time domain and
+  threshold specified.
+
+  Does not preserve any order the input PCollection might have had.
+  """
+  def __init__(
+  self, time_domain=TimeDomain.REAL_TIME, duration=Duration(10 * 60)):
 
 Review comment:
   I would make duration a required parameter; I don't think there's a good 
default we can pick for all pipelines. (This also allows us to have a more 
intelligent default in the future, e.g. dynamically based on the actual 
watermark/propagation delay in the pipeline.)
   
   Also, is it not possible to have both a real and watermark delay? This API 
forces you to do one or the other. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404988)
Time Spent: 3h 10m  (was: 3h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404981
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393968846
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -65,6 +65,7 @@
 from apache_beam.transforms import environments
 from apache_beam.transforms import userstate
 from apache_beam.transforms import window
+from apache_beam.transforms.deduplicate import DeduplictaionWithinDuration
 
 Review comment:
   Generally we've been following the convention of importing modules rather 
than objects from modules. 
http://google.github.io/styleguide/pyguide.html#22-imports
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404981)
Time Spent: 2h 40m  (was: 2.5h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404980
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393968339
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -448,6 +449,42 @@ def is_buffered_correctly(actual):
 
   assert_that(actual, is_buffered_correctly)
 
+  def test_deduplication_transform_with_processing_time(self):
 
 Review comment:
   fn_api_runner_test primarily for testing primitives. I would put this in 
apache_beam.transforms.deduplicate_test (or util_test, if that's where we plan 
to move it). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404980)
Time Spent: 2.5h  (was: 2h 20m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404982
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393970454
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -448,6 +449,42 @@ def is_buffered_correctly(actual):
 
   assert_that(actual, is_buffered_correctly)
 
+  def test_deduplication_transform_with_processing_time(self):
+# Note that current FnApiRunner doesn't respect either real timestamp.
+with self.create_pipeline() as p:
+  inputs = [
+  window.TimestampedValue('value_1', 1),
+  window.TimestampedValue('value_1', 2),
+  window.TimestampedValue('value_1', 3)
+  ]
+  actual = (
+  p
+  | beam.Create(inputs)
+  | beam.WindowInto(window.FixedWindows(10))
+  | DeduplictaionWithinDuration(duration=timestamp.Duration(5)))
+  assert_that(actual, equal_to(['value_1']))
+
+  @unittest.skip('TestStream not yet supported')
+  def test_deduplication_transform_with_event_time(self):
+test_stream = (
+TestStream().advance_watermark_to(0).add_elements([
+window.TimestampedValue('value_1', 1),
+window.TimestampedValue('value_1', 2),
+window.TimestampedValue('value_1', 10)
+]).advance_watermark_to(10).add_elements([
+window.TimestampedValue('value_1', 6)
+]).add_elements([window.TimestampedValue('value_2', 15)
+ ]).advance_watermark_to_infinity())
+
+with self.create_pipeline() as p:
+  actual = (
+  p
+  | test_stream
+  | DeduplictaionWithinDuration(
+  duration=timestamp.Duration(10),
+  time_domain=userstate.TimeDomain.WATERMARK))
+  assert_that(actual, equal_to(['value_1', 'value_1', 'value_2']))
 
 Review comment:
   The duplication of value_1 probably merits a comment. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404982)
Time Spent: 2h 40m  (was: 2.5h)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404985
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393976398
 
 

 ##
 File path: sdks/python/apache_beam/transforms/deduplicate.py
 ##
 @@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+"""a collection of ptransforms for deduplicating elements."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+from apache_beam.coders.coders import BooleanCoder
+from apache_beam.transforms import ptransform
+from apache_beam.transforms import userstate
+from apache_beam.transforms.core import DoFn
+from apache_beam.transforms.core import Map
+from apache_beam.transforms.core import ParDo
+from apache_beam.transforms.timeutil import TimeDomain
+from apache_beam.typehints import trivial_inference
+from apache_beam.utils.timestamp import Duration
+from apache_beam.utils.timestamp import Timestamp
+
+__all__ = [
+'DeduplictaionWithinDuration',
+]
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
+  """ A PTransform which deduplicates input records over a time domain and
+  threshold. Values in different windows will NOT be considered duplicates of
+  each other. Deduplication is best effort.
+
+  The durations specified may impose memory and/or storage requirements within
+  a runner and care might need to be used to ensure that the deduplication time
+  limit is long enough to remove duplicates but short enough to not cause
+  performance problems within a runner. Each runner may provide an optimized
+  implementation of their choice using the deduplication time domain and
+  threshold specified.
+
+  Does not preserve any order the input PCollection might have had.
+  """
+  def __init__(
+  self, time_domain=TimeDomain.REAL_TIME, duration=Duration(10 * 60)):
+if time_domain not in (TimeDomain.WATERMARK, TimeDomain.REAL_TIME):
+  raise ValueError(
+  'Unsupported TimeDomain: %r. DeduplictaionWithinDuration'
+  'expects TimeDomain.WATERMARK or TimeDomain.REAL_TIME' %
+  (time_domain, ))
+self.time_domain = time_domain
+self.duration = duration
+
+  def _create_deduplicate_fn(self):
+timer_spec = userstate.TimerSpec('expiry_timer', self.time_domain)
+state_spec = userstate.BagStateSpec('seen', BooleanCoder())
+duration = self.duration
+domain = self.time_domain
+
+class DeduplicationFn(DoFn):
+  def process(
+  self,
+  element,
+  ts=DoFn.TimestampParam,
+  seen_state=DoFn.StateParam(state_spec),
+  expiry_timer=DoFn.TimerParam(timer_spec)):
+if True in seen_state.read():
+  return
+
+if domain == TimeDomain.REAL_TIME:
+  expiry_timer.set(Timestamp.now() + duration)
+elif domain == TimeDomain.WATERMARK:
+  expiry_timer.set(ts + duration)
+else:
+  raise ValueError(
+  'Unsupported TimeDomain: %r. DeduplictaionWithinDuration'
+  'expects TimeDomain.WATERMARK or TimeDomain.REAL_TIME' %
+  (domain, ))
+seen_state.add(True)
+value, _ = element
+yield value
+
+  def infer_output_type(self, input_type):
+key_type, _ = trivial_inference.key_value_types(input_type)
+return key_type
+
+  @userstate.on_timer(timer_spec)
+  def process_timer(self, seen_state=DoFn.StateParam(state_spec)):
+seen_state.clear()
+
+return DeduplicationFn()
+
+  def expand(self, pcoll):
+return (
+pcoll
+| 'KeyByElement' >> Map(lambda x: (x, None))
+| 'DeduplicateFn' >> ParDo(self._create_deduplicate_fn()))
 
 Review comment:
   I would expose a DeduplicatePerKey transform which emits a 

[jira] [Work logged] (BEAM-9454) Add Deduplication transform for SDF

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9454?focusedWorklogId=404983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404983
 ]

ASF GitHub Bot logged work on BEAM-9454:


Author: ASF GitHub Bot
Created on: 17/Mar/20 21:25
Start Date: 17/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11060: [BEAM-9454] 
Create Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r393969469
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -448,6 +449,42 @@ def is_buffered_correctly(actual):
 
   assert_that(actual, is_buffered_correctly)
 
+  def test_deduplication_transform_with_processing_time(self):
+# Note that current FnApiRunner doesn't respect either real timestamp.
 
 Review comment:
   IIRC, the FnApiRunner does respect timestamps. (Not sure about the Create 
transform though.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404983)
Time Spent: 2h 50m  (was: 2h 40m)

> Add Deduplication transform for SDF
> ---
>
> Key: BEAM-9454
> URL: https://issues.apache.org/jira/browse/BEAM-9454
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py-core
>Reporter: Boyuan Zhang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When SDF is used as a source-like operation, it's necessary to provide a 
> default Deduplication transform for the SDF user to deduplicate values by 
> certain unique id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=404969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404969
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 17/Mar/20 20:55
Start Date: 17/Mar/20 20:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11147: [BEAM-7923] Support 
dict and iterable PCollections in show
URL: https://github.com/apache/beam/pull/11147#issuecomment-600293642
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404969)
Time Spent: 4h  (was: 3h 50m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9430) Migrate from ProcessContext#updateWatermark to WatermarkEstimators

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9430?focusedWorklogId=404960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404960
 ]

ASF GitHub Bot logged work on BEAM-9430:


Author: ASF GitHub Bot
Created on: 17/Mar/20 20:43
Start Date: 17/Mar/20 20:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11126: [BEAM-9430, 
BEAM-2939] Migrate from ProcessContext#updateWatermark to WatermarkEstimators
URL: https://github.com/apache/beam/pull/11126#issuecomment-600287442
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404960)
Time Spent: 40m  (was: 0.5h)

> Migrate from ProcessContext#updateWatermark to WatermarkEstimators
> --
>
> Key: BEAM-9430
> URL: https://issues.apache.org/jira/browse/BEAM-9430
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current discussion underway in 
> [https://lists.apache.org/thread.html/r5d974b6a58bc04ff4c02682fda4ef68608121f1bf23a86e9d592ca6e%40%3Cdev.beam.apache.org%3E]
>  
> Proposed API: [https://github.com/apache/beam/pull/10992]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=404958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-404958
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 17/Mar/20 20:39
Start Date: 17/Mar/20 20:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11147: [BEAM-7923] Support 
dict and iterable PCollections in show
URL: https://github.com/apache/beam/pull/11147#issuecomment-600286040
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 404958)
Time Spent: 3h 50m  (was: 3h 40m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >