[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778469
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 05/Jun/22 22:51
Start Date: 05/Jun/22 22:51
Worklog Time Spent: 10m 
  Work Description: lostluck commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146897475

   Run Go Postcommit




Issue Time Tracking
---

Worklog Id: (was: 778469)
Time Spent: 7h 10m  (was: 7h)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-10785) Support for coder argument in WriteToBigQuery

2022-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10785?focusedWorklogId=778427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778427
 ]

ASF GitHub Bot logged work on BEAM-10785:
-

Author: ASF GitHub Bot
Created on: 05/Jun/22 06:58
Start Date: 05/Jun/22 06:58
Worklog Time Spent: 10m 
  Work Description: harrydrippin commented on PR #17518:
URL: https://github.com/apache/beam/pull/17518#issuecomment-1146753160

   @pabloem Just to be sure, do you mean it would be good to apply the change to 
the original `RowAsDictJsonCoder` implementation, as in my custom one (adding 
`ensure_ascii=False` to the `json.dumps` call)?
   
   I will submit another PR for this if you confirm. Thanks!
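
   For context, the effect of `ensure_ascii=False` can be seen with the standard library alone. This is only a minimal sketch, not the `RowAsDictJsonCoder` implementation; the `row` dictionary is a made-up example.

```python
import json

# Hypothetical row containing non-ASCII text (not taken from the report).
row = {"name": "東京"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(row)

# With ensure_ascii=False the characters are emitted as-is.
literal = json.dumps(row, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms decode back to the same dictionary.
assert json.loads(escaped) == json.loads(literal) == row
```

   Both strings are valid JSON; they differ only in whether the text is escaped in the serialized form.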




Issue Time Tracking
---

Worklog Id: (was: 778427)
Time Spent: 2h 20m  (was: 2h 10m)

> Support for coder argument in WriteToBigQuery
> -
>
> Key: BEAM-10785
> URL: https://issues.apache.org/jira/browse/BEAM-10785
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Nakamura Yu
>Assignee: Seunghwan Hong
>Priority: P1
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When using WriteToBigQuery to transfer data to BigQuery, non-ascii characters 
> are replaced with replacement characters.
> This was due to the RowAsDictJsonCoder being set as the coder for the 
> BigQueryBatchFileLoads called inside WriteToBigQuery.
> I want to add coder to the argument of WriteToBigQuery so that I can set a 
> coder other than RowAsDictJsonCoder.
> If no problem, I will create a Pull Request next weekend.





[jira] [Work logged] (BEAM-7439) Bigquery Write with schema None: TypeError: 'NoneType' object has no attribute '__getitem__'

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7439?focusedWorklogId=778368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778368
 ]

ASF GitHub Bot logged work on BEAM-7439:


Author: ASF GitHub Bot
Created on: 04/Jun/22 01:49
Start Date: 04/Jun/22 01:49
Worklog Time Spent: 10m 
  Work Description: kennknowles opened a new issue, #19463:
URL: https://github.com/apache/beam/issues/19463

   This is a followup item from https://issues.apache.org/jira/browse/BEAM-7439
   
   Update the documentation for comment 
https://issues.apache.org/jira/browse/BEAM-7439?focusedCommentId=16851420&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16851420
   
    
   
   cc: [~tvalentyn] [~chamikara] 
   
   Imported from Jira 
[BEAM-7482](https://issues.apache.org/jira/browse/BEAM-7482). Original Jira may 
contain additional context.
   Reported by: angoenka.




Issue Time Tracking
---

Worklog Id: (was: 778368)
Time Spent: 1h  (was: 50m)

> Bigquery Write with schema None: TypeError: 'NoneType' object has no 
> attribute '__getitem__'
> 
>
> Key: BEAM-7439
> URL: https://issues.apache.org/jira/browse/BEAM-7439
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Pablo Estrada
>Priority: P0
> Fix For: 2.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When running a simple write to bigquery on apache-beam==2.12.0
> {code:java}
> input_data = [
>     {'str': 'test'}
> ]
> (pipeline | 'create' >> beam.Create(input_data)
>           | 'write' >> beam.io.WriteToBigQuery(
>               ':beam_test.test'))
> {code}
>  
> I get the following error:
> {code:java}
> WARNING:root:Start running in the cloud
> Traceback (most recent call last):
>  File "test_pipeline.py", line 193, in <module>
>  main()
>  File "test_pipeline.py", line 183, in main
>  ':beam_test.test'))
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
>  line 112, in __or__
>  return self.pipeline.apply(ptransform, self)
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 470, in apply
>  label or transform.label)
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 480, in apply
>  return self.apply(transform, pvalueish)
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 516, in apply
>  pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/runners/runner.py",
>  line 193, in apply
>  return m(transform, input, options)
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 617, in apply_WriteToBigQuery
>  parse_table_schema_from_json(json.dumps(transform.schema)),
>  File 
> "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery_tools.py",
>  line 130, in parse_table_schema_from_json
>  fields = [_parse_schema_field(f) for f in json_schema['fields']]
> TypeError: 'NoneType' object has no attribute '__getitem__'{code}
> I already proposed a fix for this as part of a larger pr: 
> https://github.com/apache/beam/pull/8621/commits/41cdfbda5a4e2a56b6d10046ba265ad68c78675d
> I was wondering if this also needs to be patched for version 2.12.0?
> cc: [~tvalentyn] [~pabloem]
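
The traceback above ends with `json_schema['fields']` being evaluated when no schema was given. The sketch below reproduces that failure mode in plain Python. `parse_fields` and `parse_fields_safe` are hypothetical stand-ins, not Beam functions, and the guard shown is an assumption rather than the fix from the linked PR. On Python 3 the same `TypeError` reads "'NoneType' object is not subscriptable".

```python
import json


def parse_fields(schema_json):
    # Hypothetical stand-in for the failing lookup in the traceback.
    json_schema = json.loads(schema_json)
    return [f["name"] for f in json_schema["fields"]]


def parse_fields_safe(schema_json):
    # One possible guard (an assumption, not necessarily the actual fix):
    # treat a missing schema as "no fields".
    json_schema = json.loads(schema_json)
    if json_schema is None:
        return []
    return [f["name"] for f in json_schema["fields"]]


# json.dumps(None) yields "null", which loads back as None.
no_schema = json.dumps(None)

try:
    parse_fields(no_schema)
    raised = False
except TypeError:
    raised = True

print(raised)
print(parse_fields_safe(no_schema))
```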





[jira] [Work logged] (BEAM-14556) Honor custom formatters being installed on the root logging handler

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14556?focusedWorklogId=778365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778365
 ]

ASF GitHub Bot logged work on BEAM-14556:
-

Author: ASF GitHub Bot
Created on: 04/Jun/22 00:25
Start Date: 04/Jun/22 00:25
Worklog Time Spent: 10m 
  Work Description: lukecwik merged PR #17820:
URL: https://github.com/apache/beam/pull/17820




Issue Time Tracking
---

Worklog Id: (was: 778365)
Time Spent: 1h 20m  (was: 1h 10m)

> Honor custom formatters being installed on the root logging handler
> ---
>
> Key: BEAM-14556
> URL: https://issues.apache.org/jira/browse/BEAM-14556
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This will allow users to create custom formatters integrating with an MDC if 
> they so choose by using a JvmInitializer to install their custom formatter on 
> the root logging handler.





[jira] [Work logged] (BEAM-14556) Honor custom formatters being installed on the root logging handler

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14556?focusedWorklogId=778361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778361
 ]

ASF GitHub Bot logged work on BEAM-14556:
-

Author: ASF GitHub Bot
Created on: 04/Jun/22 00:03
Start Date: 04/Jun/22 00:03
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on PR #17820:
URL: https://github.com/apache/beam/pull/17820#issuecomment-1146467756

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778361)
Time Spent: 1h 10m  (was: 1h)

> Honor custom formatters being installed on the root logging handler
> ---
>
> Key: BEAM-14556
> URL: https://issues.apache.org/jira/browse/BEAM-14556
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This will allow users to create custom formatters integrating with an MDC if 
> they so choose by using a JvmInitializer to install their custom formatter on 
> the root logging handler.





[jira] [Work logged] (BEAM-14293) Add @DoFn.yields_batches and @DoFn.yields_elements decorators to override defaults

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14293?focusedWorklogId=778360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778360
 ]

ASF GitHub Bot logged work on BEAM-14293:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 23:51
Start Date: 03/Jun/22 23:51
Worklog Time Spent: 10m 
  Work Description: codecov[bot] commented on PR #19268:
URL: https://github.com/apache/beam/pull/19268#issuecomment-1146462274

   # 
[Codecov](https://codecov.io/gh/apache/beam/pull/19268?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#19268](https://codecov.io/gh/apache/beam/pull/19268?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (0f9a41e) into 
[master](https://codecov.io/gh/apache/beam/commit/c77971053d2eac2e2bcd1c36c3d312157fc4e450?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c779710) will **increase** coverage by `0.00%`.
   > The diff coverage is `85.81%`.
   
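   ```diff
   @@           Coverage Diff           @@
   ##           master   #19268   +/-   ##
   =======================================
     Coverage   74.09%   74.09%           
   =======================================
     Files         697      697           
     Lines       91986    92036   +50     
   =======================================
   + Hits        68154    68197   +43     
   - Misses      22583    22590    +7     
     Partials     1249     1249           
   ```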
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.75% <85.81%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/19268?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...on/apache\_beam/runners/direct/sdf\_direct\_runner.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvc2RmX2RpcmVjdF9ydW5uZXIucHk=)
 | `35.53% <75.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=)
 | `88.68% <83.33%> (+0.74%)` | :arrow_up: |
   | 
[sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5)
 | `92.38% <90.47%> (+0.08%)` | :arrow_up: |
   | 
[sdks/python/apache\_beam/utils/windowed\_value.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvd2luZG93ZWRfdmFsdWUucHk=)
 | `92.52% <100.00%> (+0.17%)` | :arrow_up: |
   | 
[.../python/apache\_beam/testing/test\_stream\_service.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy90ZXN0X3N0cmVhbV9zZXJ2aWNlLnB5)
 | `88.09% <0.00%> (-4.77%)` | :arrow_down: |
   | 
[.../apache\_beam/runners/interactive/dataproc/types.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kYXRhcHJvYy90eXBlcy5weQ==)
 | `93.10% <0.00%> (-3.45%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/internal/dill\_pickler.py](https://codecov.io/gh/apache/beam/pull/19268/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvZGlsbF9waWNrbGVyLnB5)
 | `84.89% <0.00%> (-1.44%)` | :arrow_down: |
   | 
[.

[jira] [Work logged] (BEAM-14293) Add @DoFn.yields_batches and @DoFn.yields_elements decorators to override defaults

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14293?focusedWorklogId=778359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778359
 ]

ASF GitHub Bot logged work on BEAM-14293:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 23:34
Start Date: 03/Jun/22 23:34
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on PR #19268:
URL: https://github.com/apache/beam/pull/19268#issuecomment-1146452287

   Despite the inlining, this PR seems to have a minor effect (0.1-0.2 
microseconds/element) on `map_fn_microbenchmark`, presumably due to the new 
branch.  Benchmarks were run on my desktop with an Intel Xeon W-2135 CPU. 
   
   c77971053d2e:
   ```
   Fixed cost   1.6063439090633391
   Per-element  7.317467689514159e-07
   R^2  0.9972847477152673
   ```
   
   This PR:
   ```
   Fixed cost   1.630842854329745
   Per-element  7.487253376931855e-07
   R^2  0.998137270424781
   ```
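
   As a rough aid for reading the two fits, the relative change in each fitted cost can be computed directly from the reported figures. This is only an illustrative calculation; the dictionary and function names are made up, and no assumption is made about the units of the fitted values.

```python
# Figures copied from the two benchmark outputs above.
base = {"fixed": 1.6063439090633391, "per_element": 7.317467689514159e-07}
pr = {"fixed": 1.630842854329745, "per_element": 7.487253376931855e-07}


def pct_change(old, new):
    # Relative change from old to new, in percent.
    return (new - old) / old * 100.0


fixed_pct = pct_change(base["fixed"], pr["fixed"])
per_element_pct = pct_change(base["per_element"], pr["per_element"])

print(f"fixed cost: {fixed_pct:+.2f}%")
print(f"per-element cost: {per_element_pct:+.2f}%")
```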




Issue Time Tracking
---

Worklog Id: (was: 778359)
Time Spent: 20m  (was: 10m)

> Add @DoFn.yields_batches and @DoFn.yields_elements decorators to override 
> defaults
> --
>
> Key: BEAM-14293
> URL: https://issues.apache.org/jira/browse/BEAM-14293
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-14293) Add @DoFn.yields_batches and @DoFn.yields_elements decorators to override defaults

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14293?focusedWorklogId=778358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778358
 ]

ASF GitHub Bot logged work on BEAM-14293:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 23:30
Start Date: 03/Jun/22 23:30
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit opened a new pull request, #19268:
URL: https://github.com/apache/beam/pull/19268

   This PR adds two new decorators, `@yields_batches` and `@yields_elements`, 
which override the default interpretation for the output of `DoFn.process` and 
`DoFn.process_batch`. These decorators are handled via changes to 
`OutputProcessor` (now renamed to `OutputHandler` for clarity), which branches 
depending on whether batches or elements are expected in the `results` iterable.
   
   Where possible, common logic in `OutputHandler` was extracted into private 
helper methods that can be re-used. The .pxd definition for these methods 
indicates they should be inlined in Cython-generated code so they don't affect 
performance.
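
   The branching described above can be illustrated with a plain-Python toy. This is not Beam's `OutputHandler` and does not use the Beam API: the decorators, the `_yields_batches` attribute, and `handle_outputs` are all hypothetical names chosen for the sketch.

```python
def yields_batches(fn):
    # Hypothetical marker decorator: flags the output as batches.
    fn._yields_batches = True
    return fn


def yields_elements(fn):
    # Hypothetical marker decorator: flags the output as single elements.
    fn._yields_batches = False
    return fn


def handle_outputs(fn, results, default_batches):
    # Toy output handler: branch on the marker, falling back to the default.
    as_batches = getattr(fn, "_yields_batches", default_batches)
    elements = []
    for result in results:
        if as_batches:
            elements.extend(result)  # a batch contributes many elements
        else:
            elements.append(result)  # an element contributes itself
    return elements


@yields_batches
def process(element):
    # Element-wise method that nevertheless emits a batch.
    yield [element, element * 2]


@yields_elements
def process_batch(batch):
    # Batch-wise method that nevertheless emits individual elements.
    for x in batch:
        yield x + 1


print(handle_outputs(process, process(3), default_batches=False))
print(handle_outputs(process_batch, process_batch([1, 2]), default_batches=True))
```

   Without the markers, the `default_batches` argument decides how the results are interpreted, which mirrors the idea of overriding a default.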
   




Issue Time Tracking
---

Worklog Id: (was: 778358)
Remaining Estimate: 0h
Time Spent: 10m

> Add @DoFn.yields_batches and @DoFn.yields_elements decorators to override 
> defaults
> --
>
> Key: BEAM-14293
> URL: https://issues.apache.org/jira/browse/BEAM-14293
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-14556) Honor custom formatters being installed on the root logging handler

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14556?focusedWorklogId=778355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778355
 ]

ASF GitHub Bot logged work on BEAM-14556:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 23:24
Start Date: 03/Jun/22 23:24
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on PR #17820:
URL: https://github.com/apache/beam/pull/17820#issuecomment-1146448098

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778355)
Time Spent: 1h  (was: 50m)

> Honor custom formatters being installed on the root logging handler
> ---
>
> Key: BEAM-14556
> URL: https://issues.apache.org/jira/browse/BEAM-14556
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This will allow users to create custom formatters integrating with an MDC if 
> they so choose by using a JvmInitializer to install their custom formatter on 
> the root logging handler.





[jira] [Work logged] (BEAM-14529) Beam BQIO not accepting integer values for BQ table fields of type Float64

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14529?focusedWorklogId=778349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778349
 ]

ASF GitHub Bot logged work on BEAM-14529:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 22:58
Start Date: 03/Jun/22 22:58
Worklog Time Spent: 10m 
  Work Description: reuvenlax merged PR #17779:
URL: https://github.com/apache/beam/pull/17779




Issue Time Tracking
---

Worklog Id: (was: 778349)
Time Spent: 2h 20m  (was: 2h 10m)

> Beam BQIO not accepting integer values for BQ table fields of type Float64
> --
>
> Key: BEAM-14529
> URL: https://issues.apache.org/jira/browse/BEAM-14529
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Assignee: Yiru Tang
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Customer (F5) is migrating to the new Storage API support in the Beam SDK 
> and is facing issues in the latest Snapshot build. Today, in production, 
> they use the legacy streaming API, and their pipeline works fine when 
> mapping an incoming integer field to a BQ table field of type Float64. 
> However, with the Storage API, their pipeline fails with the following 
> error: "Unexpected value :0, type: class java.lang.Integer. Table field 
> name: a, type: FLOAT64". The customer considers this a regression, and it 
> is preventing them from going live with the Storage API. The customer plans 
> to go live with the Storage API at the end of July and would like this fixed 
> in the 2.39.0 GA release.





[jira] [Work logged] (BEAM-14529) Beam BQIO not accepting integer values for BQ table fields of type Float64

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14529?focusedWorklogId=778346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778346
 ]

ASF GitHub Bot logged work on BEAM-14529:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 22:36
Start Date: 03/Jun/22 22:36
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on PR #17779:
URL: https://github.com/apache/beam/pull/17779#issuecomment-1146421195

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778346)
Time Spent: 2h 10m  (was: 2h)

> Beam BQIO not accepting integer values for BQ table fields of type Float64
> --
>
> Key: BEAM-14529
> URL: https://issues.apache.org/jira/browse/BEAM-14529
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Assignee: Yiru Tang
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Customer (F5) is migrating to the new Storage API support in the Beam SDK 
> and is facing issues in the latest Snapshot build. Today, in production, 
> they use the legacy streaming API, and their pipeline works fine when 
> mapping an incoming integer field to a BQ table field of type Float64. 
> However, with the Storage API, their pipeline fails with the following 
> error: "Unexpected value :0, type: class java.lang.Integer. Table field 
> name: a, type: FLOAT64". The customer considers this a regression, and it 
> is preventing them from going live with the Storage API. The customer plans 
> to go live with the Storage API at the end of July and would like this fixed 
> in the 2.39.0 GA release.





[jira] [Work logged] (BEAM-13806) [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13806?focusedWorklogId=778345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778345
 ]

ASF GitHub Bot logged work on BEAM-13806:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 22:25
Start Date: 03/Jun/22 22:25
Worklog Time Spent: 10m 
  Work Description: youngoli commented on PR #16818:
URL: https://github.com/apache/beam/pull/16818#issuecomment-114648

   Run XVR_GoUsingJava_Dataflow PostCommit




Issue Time Tracking
---

Worklog Id: (was: 778345)
Time Spent: 9h  (was: 8h 50m)

> [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.
> -
>
> Key: BEAM-13806
> URL: https://issues.apache.org/jira/browse/BEAM-13806
> Project: Beam
>  Issue Type: New Feature
>  Components: cross-language, io-go-gcp, sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: P2
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Title says it all. Add an integration test for cross-language BigQuery IO 
> that runs on Jenkins.





[jira] [Work logged] (BEAM-14529) Beam BQIO not accepting integer values for BQ table fields of type Float64

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14529?focusedWorklogId=778341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778341
 ]

ASF GitHub Bot logged work on BEAM-14529:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 22:07
Start Date: 03/Jun/22 22:07
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on PR #17779:
URL: https://github.com/apache/beam/pull/17779#issuecomment-1146401113

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778341)
Time Spent: 2h  (was: 1h 50m)

> Beam BQIO not accepting integer values for BQ table fields of type Float64
> --
>
> Key: BEAM-14529
> URL: https://issues.apache.org/jira/browse/BEAM-14529
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Assignee: Yiru Tang
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Customer (F5) is migrating to the new Storage API support in the Beam SDK 
> and is facing issues in the latest Snapshot build. Today, in production, 
> they use the legacy streaming API, and their pipeline works fine when 
> mapping an incoming integer field to a BQ table field of type Float64. 
> However, with the Storage API, their pipeline fails with the following 
> error: "Unexpected value :0, type: class java.lang.Integer. Table field 
> name: a, type: FLOAT64". The customer considers this a regression, and it 
> is preventing them from going live with the Storage API. The customer plans 
> to go live with the Storage API at the end of July and would like this fixed 
> in the 2.39.0 GA release.





[jira] [Work logged] (BEAM-14529) Beam BQIO not accepting integer values for BQ table fields of type Float64

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14529?focusedWorklogId=778325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778325
 ]

ASF GitHub Bot logged work on BEAM-14529:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 21:26
Start Date: 03/Jun/22 21:26
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on PR #17779:
URL: https://github.com/apache/beam/pull/17779#issuecomment-1146374757

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778325)
Time Spent: 1h 50m  (was: 1h 40m)

> Beam BQIO not accepting integer values for BQ table fields of type Float64
> --
>
> Key: BEAM-14529
> URL: https://issues.apache.org/jira/browse/BEAM-14529
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Assignee: Yiru Tang
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Customer (F5) is migrating to the new Storage API support in the Beam SDK 
> and is facing issues in the latest Snapshot build. Today, in production, 
> they use the legacy streaming API, and their pipeline works fine when 
> mapping an incoming integer field to a BQ table field of type Float64. 
> However, with the Storage API, their pipeline fails with the following 
> error: "Unexpected value :0, type: class java.lang.Integer. Table field 
> name: a, type: FLOAT64". The customer considers this a regression, and it 
> is preventing them from going live with the Storage API. The customer plans 
> to go live with the Storage API at the end of July and would like this fixed 
> in the 2.39.0 GA release.





[jira] [Work logged] (BEAM-13806) [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13806?focusedWorklogId=778318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778318
 ]

ASF GitHub Bot logged work on BEAM-13806:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 21:07
Start Date: 03/Jun/22 21:07
Worklog Time Spent: 10m 
  Work Description: youngoli commented on PR #16818:
URL: https://github.com/apache/beam/pull/16818#issuecomment-1146363217

   I added a test that reads from a Query, and while testing in my own 
environment I'm getting weird results. The elements output from a query read 
have all fields as pointers because SQL queries don't match the table's schema. 
That's pretty normal. But when I made an equivalent struct in Go, I get errors 
when decoding into that struct, even though all the fields seem to be matching 
exactly:
   ```
   panic: reflect: Call using struct { Counter *int64 "beam:\"counter\""; 
Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word 
*string "beam:\"word\"" } "beam:\"rand_data\"" } as type bigquery.TestRowPtrs
   Full error:
   while executing Process for Plan[s08-67]:
   2: DataSink[S[ptransform-65@localhost:12371]] 
Coder:W;coder-80>!GWC
   3: PCollection[pcollection-72] Out:[2]
   4: ParDo[bigquery.ReadFromQueryPipeline.func1] Out:[2]
   1: DataSource[S[ptransform-64@localhost:12371], 0] 
Coder:W;coder-76>!GWC Out:4
caused by:
   panic: reflect: Call using struct { Counter *int64 "beam:\"counter\""; 
Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word 
*string "beam:\"word\"" } "beam:\"rand_data\"" } as type bigquery.TestRowPtrs 
goroutine 60 [running]:
   runtime/debug.Stack()
   ...
   ```




Issue Time Tracking
---

Worklog Id: (was: 778318)
Time Spent: 8h 50m  (was: 8h 40m)

> [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.
> -
>
> Key: BEAM-13806
> URL: https://issues.apache.org/jira/browse/BEAM-13806
> Project: Beam
>  Issue Type: New Feature
>  Components: cross-language, io-go-gcp, sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: P2
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Title says it all. Add an integration test for cross-language BigQuery IO 
> that runs on Jenkins.





[jira] [Work logged] (BEAM-13806) [Cross-Language] Jenkins integration test for Go SDK BigQuery IO.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13806?focusedWorklogId=778317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778317
 ]

ASF GitHub Bot logged work on BEAM-13806:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 21:05
Start Date: 03/Jun/22 21:05
Worklog Time Spent: 10m 
  Work Description: codecov[bot] commented on PR #16818:
URL: https://github.com/apache/beam/pull/16818#issuecomment-1146361667

   # 
[Codecov](https://codecov.io/gh/apache/beam/pull/16818?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#16818](https://codecov.io/gh/apache/beam/pull/16818?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c9ce532) into 
[master](https://codecov.io/gh/apache/beam/commit/23aeca4e373ff5a4de176770d52f15a37653a32e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (23aeca4) will **increase** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   ```diff
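   @@            Coverage Diff             @@
   ##           master   #16818      +/-   ##
   ==========================================
   + Coverage   74.07%   74.10%   +0.02%
   ==========================================
     Files         697      697
     Lines       91927    92040     +113
   ==========================================
   + Hits        68091    68202     +111
   + Misses      22591    22587       -4
   - Partials     1245     1251       +6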
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | go | `50.95% <ø> (+0.19%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/16818?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[sdks/go/pkg/beam/runners/dataflow/dataflow.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9ydW5uZXJzL2RhdGFmbG93L2RhdGFmbG93Lmdv)
 | `58.42% <0.00%> (-0.20%)` | :arrow_down: |
   | 
[sdks/go/pkg/beam/testing/passert/sum.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS90ZXN0aW5nL3Bhc3NlcnQvc3VtLmdv)
 | `100.00% <0.00%> (ø)` | |
   | 
[sdks/go/pkg/beam/core/runtime/exec/fn.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZXhlYy9mbi5nbw==)
 | `69.55% <0.00%> (ø)` | |
   | 
[sdks/go/pkg/beam/core/runtime/exec/sdf.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZXhlYy9zZGYuZ28=)
 | `70.84% <0.00%> (+0.23%)` | :arrow_up: |
   | 
[sdks/go/pkg/beam/core/runtime/exec/pardo.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZXhlYy9wYXJkby5nbw==)
 | `59.43% <0.00%> (+0.32%)` | :arrow_up: |
   | 
[...o/pkg/beam/io/rtrackers/offsetrange/offsetrange.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9pby9ydHJhY2tlcnMvb2Zmc2V0cmFuZ2Uvb2Zmc2V0cmFuZ2UuZ28=)
 | `80.62% <0.00%> (+2.12%)` | :arrow_up: |
   | 
[sdks/go/pkg/beam/testing/passert/passert.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS90ZXN0aW5nL3Bhc3NlcnQvcGFzc2VydC5nbw==)
 | `81.11% <0.00%> (+2.36%)` | :arrow_up: |
   | 
[sdks/go/pkg/beam/testing/passert/count.go](https://codecov.io/gh/apache/beam/pull/16818/diff?src=pr&el=tree

[jira] [Work logged] (BEAM-14556) Honor custom formatters being installed on the root logging handler

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14556?focusedWorklogId=778315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778315
 ]

ASF GitHub Bot logged work on BEAM-14556:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 21:02
Start Date: 03/Jun/22 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on PR #17820:
URL: https://github.com/apache/beam/pull/17820#issuecomment-1146360017

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778315)
Time Spent: 50m  (was: 40m)

> Honor custom formatters being installed on the root logging handler
> ---
>
> Key: BEAM-14556
> URL: https://issues.apache.org/jira/browse/BEAM-14556
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This will allow users to create custom formatters integrating with an MDC if 
> they so choose by using a JvmInitializer to install their custom formatter on 
> the root logging handler.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778313
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 20:58
Start Date: 03/Jun/22 20:58
Worklog Time Spent: 10m 
  Work Description: benWize commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146356790

   Run Java KafkaIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 778313)
Time Spent: 6h 20m  (was: 6h 10m)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.
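
A minimal sketch of the "find an unused port dynamically" idea from the description above, assuming a plain TCP socket on the local host rather than the Kubernetes NodePort services in the failing job. The helper name `find_free_port` is hypothetical and not part of the Beam test infrastructure. For a NodePort service, the analogous approach is usually to leave `nodePort` unset so that Kubernetes allocates a free one.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`.

    Binding to port 0 lets the kernel choose any free port; the chosen
    port is read back from the socket before it is closed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(f"free port: {port}")
```

The port is only guaranteed to be free at the moment of the call; another process could claim it before it is reused, so callers that need certainty should keep the socket open or retry on failure.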



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14534) Add an interface to allow users to compress values being written to shuffle

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14534?focusedWorklogId=778310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778310
 ]

ASF GitHub Bot logged work on BEAM-14534:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:52
Start Date: 03/Jun/22 20:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on PR #17783:
URL: https://github.com/apache/beam/pull/17783#issuecomment-1146352894

   > > The other part of the change makes sense to reduce byte[] copies by 
using ByteString.
   > > CC: @tudorm
   > 
   > Maybe I'll pull the ByteString refactoring stuff out into another review 
just to make this easier? Do you have any particular issues with it using 
ByteString there?
   > 
   > The downsides with using Output/Input stream are really too big to ignore 
here, the performance differences are orders of magnitude in our tests. The 
main problem is that most "stream" compressor implementations are designed to 
compress a large amount of data, but in this case we're usually only 
compressing a few 100-1KB. It makes the overhead from creating/destroying the 
compressor streams very high (comparatively at least). We ran into this problem 
both with deflate and zstd, and it's one of the reasons we ended up with an 
interface like this. If it's really a non-starter putting this on OSS with a 
similar interface that's fine though, we can continue maintaining this in our 
own fork for the time being.
   > 
   > The PipelineVisitor idea is interesting, although I'm skeptical how well 
it'd work in practice. For example with a Combine the coder for the data being 
shuffled is the accumulator coder, not the value coder of the KV. I bet you'd 
need a bunch of special cases to pick the "right" coder to wrap for various 
transforms.
   
   I'm happy with using ByteStrings as it reduces copies and will review and 
merge that and then we can revisit this.
   
   I was unaware of the additional overhead of using a stream 
compressor/decompressor versus one that works on fixed-length byte strings. I'm 
curious to learn more about it, as I might see something that wasn't 
considered before. Also, with the migration to Dataflow runner v2 this option 
will no longer have any effect unless you go the coder route. The coder route 
has additional benefits: it reduces the amount of data sent over gRPC between 
the SDK harness and the runner and how much the runner materializes in 
shuffle, and it also covers other use cases such as side inputs, user state, 
and so on.
   
   Using the PipelineVisitor should work quite well, since you really only 
need to handle GBK, CoGBK and Combine; to my knowledge, that would give you 
everything that is optimized across runners today.
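
   To make the distinction in this thread concrete, here is a minimal sketch 
of the two API shapes being compared: a one-shot call on a byte string and a 
stream-style compressor object created per value. Python's `zlib` module is 
used purely as a stand-in for the Java deflate/zstd implementations discussed 
above; the function names and the sample payload are hypothetical, and the 
sketch does not reproduce the performance measurements mentioned in the quote.

```python
import zlib


def compress_one_shot(payload: bytes) -> bytes:
    """Compress a whole byte string with a single call."""
    return zlib.compress(payload)


def compress_streaming(payload: bytes) -> bytes:
    """Compress via a stream-style object that is created and flushed
    for every value, which is the pattern the thread describes as costly
    for small payloads."""
    compressor = zlib.compressobj()
    return compressor.compress(payload) + compressor.flush()


# A small, repetitive value of a few hundred bytes (illustrative only).
small_value = b"user-state-record:" * 20

one_shot = compress_one_shot(small_value)
streamed = compress_streaming(small_value)

# Both forms produce a valid zlib stream that decompresses to the input.
assert zlib.decompress(one_shot) == small_value
assert zlib.decompress(streamed) == small_value
print(len(small_value), len(one_shot), len(streamed))
```

   Any timing comparison between the two forms would need to be measured on 
the actual compressor library and payload sizes in question.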




Issue Time Tracking
---

Worklog Id: (was: 778310)
Time Spent: 1h 40m  (was: 1.5h)

> Add an interface to allow users to compress values being written to shuffle
> ---
>
> Key: BEAM-14534
> URL: https://issues.apache.org/jira/browse/BEAM-14534
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Values being shuffled are frequently large and compressible. While users can 
> compress them on their own by using a coder that compresses the data, it 
> would be nice to be able to do so globally for all values.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-10496) Eliminate nullability errors from :sdks:java:core

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10496?focusedWorklogId=778308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778308
 ]

ASF GitHub Bot logged work on BEAM-10496:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:52
Start Date: 03/Jun/22 20:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on PR #17819:
URL: https://github.com/apache/beam/pull/17819#issuecomment-1146352708

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778308)
Time Spent: 1.5h  (was: 1h 20m)

> Eliminate nullability errors from :sdks:java:core
> -
>
> Key: BEAM-10496
> URL: https://issues.apache.org/jira/browse/BEAM-10496
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: P3
>  Labels: Clarified, starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Just edit {{build.gradle}} and set {{enableChecker: true}} and fix some 
> errors!
> As of 2020-07-20 setting {{-Xmaxerrs 1}} there are 1451 errors in the 
> core Java SDK.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13756) [Playground] Merge Log and Output tabs into one and add there filtering

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13756?focusedWorklogId=778307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778307
 ]

ASF GitHub Bot logged work on BEAM-13756:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:51
Start Date: 03/Jun/22 20:51
Worklog Time Spent: 10m 
  Work Description: pabloem merged PR #17792:
URL: https://github.com/apache/beam/pull/17792




Issue Time Tracking
---

Worklog Id: (was: 778307)
Time Spent: 1h  (was: 50m)

> [Playground] Merge Log and Output tabs into one and add there filtering
> ---
>
> Key: BEAM-13756
> URL: https://issues.apache.org/jira/browse/BEAM-13756
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-playground
>Reporter: Artur Khanin
>Assignee: Alexander Zhuravlev
>Priority: P3
>  Labels: beam-playground-frontend
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As a Beam Playground {*}User{*}, I want to have one tab with Logs and Outputs 
> so I can filter them to see just Logs, Outputs, or both.
> *Acceptance criteria:*
>  * Logs and Outputs are on the same tab
>  * Logs and Outputs appear as they come
>  * Logs and Outputs can be filtered
>  * Logs and Outputs are both shown by default



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13756) [Playground] Merge Log and Output tabs into one and add there filtering

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13756?focusedWorklogId=778306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778306
 ]

ASF GitHub Bot logged work on BEAM-13756:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:51
Start Date: 03/Jun/22 20:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17792:
URL: https://github.com/apache/beam/pull/17792#issuecomment-1146352247

   LGTM thanks all!




Issue Time Tracking
---

Worklog Id: (was: 778306)
Time Spent: 50m  (was: 40m)

> [Playground] Merge Log and Output tabs into one and add there filtering
> ---
>
> Key: BEAM-13756
> URL: https://issues.apache.org/jira/browse/BEAM-13756
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-playground
>Reporter: Artur Khanin
>Assignee: Alexander Zhuravlev
>Priority: P3
>  Labels: beam-playground-frontend
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As a Beam Playground {*}User{*}, I want to have one tab with Logs and Outputs 
> so I can filter them to see just Logs, Outputs, or both.
> *Acceptance criteria:*
>  * Logs and Outputs are on the same tab
>  * Logs and Outputs appear as they come
>  * Logs and Outputs can be filtered
>  * Logs and Outputs are both shown by default



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778305
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:50
Start Date: 03/Jun/22 20:50
Worklog Time Spent: 10m 
  Work Description: pabloem merged PR #18374:
URL: https://github.com/apache/beam/pull/18374




Issue Time Tracking
---

Worklog Id: (was: 778305)
Time Spent: 15.5h  (was: 15h 20m)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from/writing to 
> BQ tables with columns that have the JSON type.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778304
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 20:50
Start Date: 03/Jun/22 20:50
Worklog Time Spent: 10m 
  Work Description: benWize commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146351526

   > sorry - I ran Seed Job again - can you try it once more after this job? 
https://ci-beam.apache.org/job/beam_SeedJob/9774/
   
   Ok, thanks




Issue Time Tracking
---

Worklog Id: (was: 778304)
Time Spent: 6h 10m  (was: 6h)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778302
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 20:48
Start Date: 03/Jun/22 20:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146350371

   sorry - I ran Seed Job again - can you try it once more after this job? 
https://ci-beam.apache.org/job/beam_SeedJob/9774/




Issue Time Tracking
---

Worklog Id: (was: 778302)
Time Spent: 6h  (was: 5h 50m)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778300
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 20:45
Start Date: 03/Jun/22 20:45
Worklog Time Spent: 10m 
  Work Description: benWize commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146348170

   Run Java KafkaIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 778300)
Time Spent: 5h 50m  (was: 5h 40m)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778295
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 20:35
Start Date: 03/Jun/22 20:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146341544

   https://ci-beam.apache.org/job/beam_SeedJob/9772/console
   
   I'll record a quick tutorial later




Issue Time Tracking
---

Worklog Id: (was: 778295)
Time Spent: 5h 40m  (was: 5.5h)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778291
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:25
Start Date: 03/Jun/22 20:25
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on PR #17462:
URL: https://github.com/apache/beam/pull/17462#issuecomment-1146334152

   Tested postcommit -> 
https://ci-beam.apache.org/job/beam_PostCommit_Python39_PR/24/testReport/




Issue Time Tracking
---

Worklog Id: (was: 778291)
Time Spent: 10h 20m  (was: 10h 10m)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate performance of Pipelines, which 
> represent common use cases of Beam + Dataflow in PyTorch, sklearn and 
> possibly TFX. These benchmarks would be the integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn and TensorFlow 
> Extended.
> We would use publicly available datasets (e.g., Kaggle). 
> Size: small / 10 GB / 1 TB etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (every release cycle).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14553) Dataflow portable job submission translate FileResultCoder only with window coder

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14553?focusedWorklogId=778290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778290
 ]

ASF GitHub Bot logged work on BEAM-14553:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:23
Start Date: 03/Jun/22 20:23
Worklog Time Spent: 10m 
  Work Description: y1chi commented on PR #17818:
URL: https://github.com/apache/beam/pull/17818#issuecomment-1146332642

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778290)
Time Spent: 2h 40m  (was: 2.5h)

> Dataflow portable job submission translate FileResultCoder only with window 
> coder
> -
>
> Key: BEAM-14553
> URL: https://issues.apache.org/jira/browse/BEAM-14553
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Priority: P2
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The destination coder is ignored: if there are multiple FileResultCoders 
> with different destination coders, only the first registration succeeds.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778289
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:16
Start Date: 03/Jun/22 20:16
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on PR #17462:
URL: https://github.com/apache/beam/pull/17462#issuecomment-1146327462

   Run Python 3.9 PostCommit




Issue Time Tracking
---

Worklog Id: (was: 778289)
Time Spent: 10h 10m  (was: 10h)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate performance of Pipelines, which 
> represent common use cases of Beam + Dataflow in PyTorch, sklearn and 
> possibly TFX. These benchmarks would be the integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn and TensorFlow 
> Extended.
> We would use publicly available datasets (e.g., Kaggle). 
> Size: small / 10 GB / 1 TB etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (every release cycle).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778288
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:14
Start Date: 03/Jun/22 20:14
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on PR #17462:
URL: https://github.com/apache/beam/pull/17462#issuecomment-1146326503

   @tvalentyn Changed the last few things. I kept run() and run_pipeline() 
separate because there are plans to return a pipeline object from 
`run_pipeline()` for benchmark tests (still working on that), so I thought 
it would be better to have a separate method. Right now, I have a different PR 
building on top of this code that returns a pipeline object from 
`run_pipeline()` in a different file. I will try to refactor the code later if 
that's okay.




Issue Time Tracking
---

Worklog Id: (was: 778288)
Time Spent: 10h  (was: 9h 50m)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate the performance of pipelines that 
> represent common use cases of Beam + Dataflow with PyTorch, scikit-learn and 
> possibly TFX. These benchmarks would be integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn and TensorFlow 
> Extended.
> We would use datasets that are publicly available (e.g., Kaggle). 
> Size: small / 10 GB / 1 TB, etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (every release cycle).





[jira] [Work logged] (BEAM-14337) Support **kwargs for PyTorch models.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14337?focusedWorklogId=778281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778281
 ]

ASF GitHub Bot logged work on BEAM-14337:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 20:02
Start Date: 03/Jun/22 20:02
Worklog Time Spent: 10m 
  Work Description: yeandy commented on PR #17470:
URL: https://github.com/apache/beam/pull/17470#issuecomment-1146317553

   retest this please




Issue Time Tracking
---

Worklog Id: (was: 778281)
Time Spent: 8h 20m  (was: 8h 10m)

> Support **kwargs for PyTorch models.
> 
>
> Key: BEAM-14337
> URL: https://issues.apache.org/jira/browse/BEAM-14337
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Andy Ye
>Priority: P2
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Some PyTorch models that subclass torch.nn.Module take extra 
> parameters in the forward function call. These extra parameters can be passed 
> as a Dict or as positional arguments. 
> Example of PyTorch models supported by Hugging Face -> 
> [https://huggingface.co/bert-base-uncased]
> [Some torch models on Hugging 
> face|https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py]
> Eg: 
> [https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertModel]
> {code:java}
> inputs = {
>      "input_ids": Tensor1,
>      "attention_mask": Tensor2,
>      "token_type_ids": Tensor3,
> } 
> model = BertModel.from_pretrained("bert-base-uncased") # which is a  
> # subclass of torch.nn.Module
> # The model's forward method should accept the keys in inputs as keyword
> # arguments.
> outputs = model(**inputs){code}
>  
> [Transformers|https://pytorch.org/hub/huggingface_pytorch-transformers/] 
> integrated in Pytorch is supported by Hugging Face as well. 
>  
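
The dict-unpacking call described in the issue can be sketched without PyTorch installed. This is only an illustration, not the RunInference implementation: `ToyModel` and `run_with_kwargs` are hypothetical names, and `ToyModel` is a plain-Python stand-in for a torch.nn.Module subclass whose forward method takes extra named parameters.

```python
from typing import Any, Dict, Optional, Sequence


class ToyModel:
    """Hypothetical stand-in for a torch.nn.Module subclass (not a real model)."""

    def forward(
        self,
        input_ids: Sequence[int],
        attention_mask: Optional[Sequence[int]] = None,
        token_type_ids: Optional[Sequence[int]] = None,
    ) -> Dict[str, Any]:
        # Echo back what was received so the argument binding is visible.
        return {
            "input_ids": list(input_ids),
            "attention_mask": attention_mask,
            "token_type_ids": token_type_ids,
        }

    def __call__(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        # torch.nn.Module.__call__ dispatches to forward(); mimic that here.
        return self.forward(*args, **kwargs)


def run_with_kwargs(model: ToyModel, inputs: Dict[str, Any]) -> Dict[str, Any]:
    # ** unpacks each dict key into a keyword argument of forward().
    return model(**inputs)


result = run_with_kwargs(
    ToyModel(), {"input_ids": [101, 102], "attention_mask": [1, 1]})
```

Keys omitted from the dict fall back to the defaults declared by forward, while a key that forward does not declare raises a TypeError, so the dict keys have to match the parameter names.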





[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778279
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:55
Start Date: 03/Jun/22 19:55
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889238242


##
sdks/python/scripts/run_integration_test.sh:
##
@@ -213,9 +218,13 @@ if [[ -z $PIPELINE_OPTS ]]; then
   # Install test dependencies for ValidatesRunner tests.
   # pyhamcrest==1.10.0 doesn't work on Py2.
   # See: https://github.com/hamcrest/PyHamcrest/issues/131.
-  echo "pyhamcrest!=1.10.0,<2.0.0" > postcommit_requirements.txt

Review Comment:
   The .txt extension is missing here, but it is present in .gitignore.



##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,160 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline that uses RunInference API to perform image classification."""
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Dict
+from typing import Iterable
+from typing import Optional
+from typing import Tuple
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+               path_to_dir: Optional[str] = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+    image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+    data = Image.open(io.BytesIO(file.read())).convert('RGB')
+    return image_file_name, data
+
+
+def preprocess_image(data: Image.Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # Pre-trained PyTorch models expect input images normalized the

Review Comment:
   ```suggestion
 # Pre-trained PyTorch models expect input images normalized with the
   ```



##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,160 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline that uses RunInference API to perform image classification."""
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Dict
+from typing import Iterable
+from typing import Optional
+from typing import Tuple
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torch

[jira] [Work logged] (BEAM-12572) All beam examples should get continuously exercised

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12572?focusedWorklogId=778278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778278
 ]

ASF GitHub Bot logged work on BEAM-12572:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:55
Start Date: 03/Jun/22 19:55
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on PR #17015:
URL: https://github.com/apache/beam/pull/17015#issuecomment-1146312464

   I'll review




Issue Time Tracking
---

Worklog Id: (was: 778278)
Time Spent: 71h  (was: 70h 50m)

> All beam examples should get continuously exercised
> ---
>
> Key: BEAM-12572
> URL: https://issues.apache.org/jira/browse/BEAM-12572
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Valentyn Tymofieiev
>Priority: P3
>  Time Spent: 71h
>  Remaining Estimate: 0h
>
> Sometimes our examples become broken without us noticing. For example, see: 
> https://lists.apache.org/thread.html/r45340bbee91a6caf798fe62d24388f645f8792cc7506351fd66adec3%40%3Cdev.beam.apache.org%3E
> We should test our examples better.
> Test examples on Dataflow, Spark, Flink and Direct runners 





[jira] [Work logged] (BEAM-12572) All beam examples should get continuously exercised

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12572?focusedWorklogId=778277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778277
 ]

ASF GitHub Bot logged work on BEAM-12572:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:32
Start Date: 03/Jun/22 19:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on PR #17015:
URL: https://github.com/apache/beam/pull/17015#issuecomment-1146295253

   I think @kileys or @kennknowles would be the right people to review this. I 
will ping them.




Issue Time Tracking
---

Worklog Id: (was: 778277)
Time Spent: 70h 50m  (was: 70h 40m)

> All beam examples should get continuously exercised
> ---
>
> Key: BEAM-12572
> URL: https://issues.apache.org/jira/browse/BEAM-12572
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Valentyn Tymofieiev
>Priority: P3
>  Time Spent: 70h 50m
>  Remaining Estimate: 0h
>
> Sometimes our examples become broken without us noticing. For example, see: 
> https://lists.apache.org/thread.html/r45340bbee91a6caf798fe62d24388f645f8792cc7506351fd66adec3%40%3Cdev.beam.apache.org%3E
> We should test our examples better.
> Test examples on Dataflow, Spark, Flink and Direct runners 





[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778276
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 19:31
Start Date: 03/Jun/22 19:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146294860

   @pabloem - could you please run the seed job for this PR? (I tried using 
your instructions from your email today but failed to have a successful run.)




Issue Time Tracking
---

Worklog Id: (was: 778276)
Time Spent: 5.5h  (was: 5h 20m)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.
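
One common way to find an unused port dynamically on a single host is to bind to port 0 and let the operating system pick. The sketch below shows only that general technique. `find_unused_port` is a hypothetical helper, not code from the Beam test infrastructure, and it does not talk to Kubernetes.

```python
import socket


def find_unused_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently free TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to choose any free ephemeral port.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


port = find_unused_port()
```

The port is released when the socket closes, so another process could claim it before it is reused. For the NodePort services in the log above, the closer equivalent would be to omit the fixed nodePort value so that Kubernetes allocates a free one from its node port range; whether that fits this test setup is an assumption.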





[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778274
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 19:28
Start Date: 03/Jun/22 19:28
Worklog Time Spent: 10m 
  Work Description: aaltay commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146293085

   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 778274)
Time Spent: 5h 20m  (was: 5h 10m)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778272
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:20
Start Date: 03/Jun/22 19:20
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146286647

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control




Issue Time Tracking
---

Worklog Id: (was: 778272)
Time Spent: 28h 40m  (was: 28.5h)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 28h 40m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778271
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:19
Start Date: 03/Jun/22 19:19
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146285870

   R: @lostluck 




Issue Time Tracking
---

Worklog Id: (was: 778271)
Time Spent: 28.5h  (was: 28h 20m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 28.5h
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778270
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:18
Start Date: 03/Jun/22 19:18
Worklog Time Spent: 10m 
  Work Description: damccorm commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146284943

   Run Go Samza ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 778270)
Time Spent: 7h  (was: 6h 50m)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778268
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:18
Start Date: 03/Jun/22 19:18
Worklog Time Spent: 10m 
  Work Description: damccorm commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146284732

   Run Go Flink ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 778268)
Time Spent: 6h 40m  (was: 6.5h)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778269
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:18
Start Date: 03/Jun/22 19:18
Worklog Time Spent: 10m 
  Work Description: damccorm commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146284860

   Run Go Spark ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 778269)
Time Spent: 6h 50m  (was: 6h 40m)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778267
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:14
Start Date: 03/Jun/22 19:14
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146282207

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control




Issue Time Tracking
---

Worklog Id: (was: 778267)
Time Spent: 6.5h  (was: 6h 20m)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778264
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:13
Start Date: 03/Jun/22 19:13
Worklog Time Spent: 10m 
  Work Description: asf-ci commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146281115

   Can one of the admins verify this patch?




Issue Time Tracking
---

Worklog Id: (was: 778264)
Time Spent: 6h  (was: 5h 50m)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778266
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:13
Start Date: 03/Jun/22 19:13
Worklog Time Spent: 10m 
  Work Description: damccorm commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146281396

   R: @lostluck since you have the most context - anything I'm missing here 
that makes this unsafe?




Issue Time Tracking
---

Worklog Id: (was: 778266)
Time Spent: 6h 20m  (was: 6h 10m)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.





[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778263
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:13
Start Date: 03/Jun/22 19:13
Worklog Time Spent: 10m 
  Work Description: damccorm opened a new pull request, #18533:
URL: https://github.com/apache/beam/pull/18533

   In #16729, we moved to using the RoleUrn for the Go Worker Binary, with the 
intent of removing the legacy check after a few sprints. That should be safe 
now.
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 778263)
Time Spent: 5h 50m  (was: 5h 40m)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13647) Go SDK FnApi environment version 9 should be compatible with Runner v2 artifact service

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13647?focusedWorklogId=778265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778265
 ]

ASF GitHub Bot logged work on BEAM-13647:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:13
Start Date: 03/Jun/22 19:13
Worklog Time Spent: 10m 
  Work Description: asf-ci commented on PR #18533:
URL: https://github.com/apache/beam/pull/18533#issuecomment-1146281116

   Can one of the admins verify this patch?




Issue Time Tracking
---

Worklog Id: (was: 778265)
Time Spent: 6h 10m  (was: 6h)

> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service
> ---
>
> Key: BEAM-13647
> URL: https://issues.apache.org/jira/browse/BEAM-13647
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-go
>Reporter: Heejong Lee
>Assignee: Danny McCormick
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Go SDK FnApi environment version 9 should be compatible with Runner v2 
> artifact service. We need to test and verify whether Go SDK works well with 
> Runner v2 artifact service before increasing the version number.
> For 2.37 and 2.38 we'll keep both current and old versions as transitional 
> code, but we should be able to comfortably pull out the old stuff in 2.39.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778261
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:01
Start Date: 03/Jun/22 19:01
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146273331

   issue unrelated. merging.




Issue Time Tracking
---

Worklog Id: (was: 778261)
Time Spent: 8h  (was: 7h 50m)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. Hl7 
> messageId) to be executed on the intermediate FHIR store.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778262
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 19:01
Start Date: 03/Jun/22 19:01
Worklog Time Spent: 10m 
  Work Description: pabloem merged PR #17741:
URL: https://github.com/apache/beam/pull/17741




Issue Time Tracking
---

Worklog Id: (was: 778262)
Time Spent: 8h 10m  (was: 8h)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. Hl7 
> messageId) to be executed on the intermediate FHIR store.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778258
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:51
Start Date: 03/Jun/22 18:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #18374:
URL: https://github.com/apache/beam/pull/18374#issuecomment-1146266043

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778258)
Time Spent: 15h 20m  (was: 15h 10m)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from/writing to 
> BQ tables with columns that have the JSON type.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778252
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:40
Start Date: 03/Jun/22 18:40
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889236325


##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,160 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline that uses RunInference API to perform image classification."""
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Dict
+from typing import Iterable
+from typing import Optional
+from typing import Tuple
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+               path_to_dir: Optional[str] = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+    image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+    data = Image.open(io.BytesIO(file.read())).convert('RGB')
+    return image_file_name, data
+
+
+def preprocess_image(data: Image.Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # Pre-trained PyTorch models expect input images normalized the
+  # below values ref: https://pytorch.org/vision/stable/models.html#
+  normalize = transforms.Normalize(
+      mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+  transform = transforms.Compose([
+      transforms.Resize(image_size),
+      transforms.ToTensor(),
+      normalize,
+  ])
+  return transform(data)
+
+
+class PostProcessor(beam.DoFn):
+  def process(self, element: Tuple[str, PredictionResult]) -> Iterable[str]:
+    filename, prediction_result = element
+    prediction = torch.argmax(prediction_result.inference, dim=0)
+    yield filename + ',' + str(prediction.item())
+
+
+def run_pipeline(
+    options: PipelineOptions,
+    model_class: Optional[torch.nn.Module],
+    model_params: Optional[Dict],
+    args=None):
+  """
+  Args:
+    options: options used to set up the pipeline.
+    model_class: Reference to the class definition of the model.
+                If None, MobilenetV2 will be used as default .
+    model_params: Parameters passed to the constructor of the model_class.
+                  These will be used to instantiate the model object in the
+                  RunInference API.
+    args: Command line arguments defined for this example.
+  """
+  if not model_class:
+    model_class = MobileNetV2
+    model_params = {'num_classes': 1000}
+
+  model_loader = PytorchModelLoader(
+      state_dict_path=args.model_state_dict_path,
+      model_class=model_class,
+      model_params=model_params)
+
+  with beam.Pipeline(options=options) as p:
+    filename_value_pair = (
+        p
+        | 'ReadImageNames' >> beam.io.ReadFromText(
+            args.input, skip_header_lines=1)
+        | 'ReadImageData' >> beam.Map(
+            partial(read_image, path_to_dir=args.images_dir))
+        | 'PreprocessImages' >> beam.MapTuple(
+            lambda file_name, data: (file_name, preprocess_image(data))))
+    predictions = (
+        filename_value_pair
+        | 'PyTorchRunInference' >> RunInference(model_loader).with_output_types(
+            Tuple[str, PredictionResult])
+        | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+
+    if args.output:
+      predictions | "WriteOutputToGCS" >> beam.io.WriteToText( # pylint: disable=expression-

[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778251
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:38
Start Date: 03/Jun/22 18:38
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146254738

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @riteshghorse for label go.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).




Issue Time Tracking
---

Worklog Id: (was: 778251)
Time Spent: 28h 20m  (was: 28h 10m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 28h 20m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778250
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:30
Start Date: 03/Jun/22 18:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146248768

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778250)
Time Spent: 7h 50m  (was: 7h 40m)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. Hl7 
> messageId) to be executed on the intermediate FHIR store.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778249
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:29
Start Date: 03/Jun/22 18:29
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on PR #17462:
URL: https://github.com/apache/beam/pull/17462#issuecomment-1146248454

   PTAL @tvalentyn 




Issue Time Tracking
---

Worklog Id: (was: 778249)
Time Spent: 9.5h  (was: 9h 20m)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate the performance of pipelines that 
> represent common use cases of Beam + Dataflow with PyTorch, scikit-learn, and 
> possibly TFX. These benchmarks would be integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn, and TensorFlow 
> Extended.
> We would use publicly available datasets (e.g., Kaggle). 
> Size: small / 10 GB / 1 TB, etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (every release cycle).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778244
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:12
Start Date: 03/Jun/22 18:12
Worklog Time Spent: 10m 
  Work Description: johnjcasey commented on PR #18374:
URL: https://github.com/apache/beam/pull/18374#issuecomment-1146234428

   Run Java Precommit




Issue Time Tracking
---

Worklog Id: (was: 778244)
Time Spent: 15h 10m  (was: 15h)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from/writing to 
> BQ tables with columns that have the JSON type.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778242
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:06
Start Date: 03/Jun/22 18:06
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889212195


##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,146 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Pipeline that uses RunInference API to perform classification task on imagenet dataset"""  # pylint: disable=line-too-long
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import Union
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+               path_to_dir: str = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+    image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+    data = Image.open(io.BytesIO(file.read())).convert('RGB')
+    return image_file_name, data
+
+
+def preprocess_image(data: Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # to use models in torch with imagenet weights,
+  # normalize the images using the below values.
+  # ref: https://pytorch.org/vision/stable/models.html#
+  normalize = transforms.Normalize(
+      mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+  transform = transforms.Compose([
+      transforms.Resize(image_size),
+      transforms.ToTensor(),
+      normalize,
+  ])
+  return transform(data)
+
+
+class PostProcessor(beam.DoFn):
+  def process(
+      self, element: Union[PredictionResult, Tuple[Any, PredictionResult]]
+  ) -> Iterable[str]:
+    filename, prediction_result = element
+    prediction = torch.argmax(prediction_result.inference, dim=0)
+    yield filename + ',' + str(prediction.item())
+
+
+def run_pipeline(options: PipelineOptions, args=None):
+  """Sets up PyTorch RunInference pipeline"""
+  # reference to the class definition of the model.
+  model_class = MobileNetV2
+  # params for model class constructor. These values will be used in
+  # RunInference API to instantiate the model object.
+  model_params = {'num_classes': 1000}  # imagenet has 1000 classes.
+  # for this example, the pretrained weights are downloaded from
+  # "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth";
+  # and saved on GCS bucket gs://apache-beam-ml/models/imagenet_classification_mobilenet_v2.pt,
+  # which will be used to load the model state_dict in the RunInference API.
+  model_loader = PytorchModelLoader(
+      state_dict_path=args.model_state_dict_path,
+      model_class=model_class,
+      model_params=model_params)
+  with beam.Pipeline(options=options) as p:
+    filename_value_pair = (
+        p
+        | 'Read from csv file' >> beam.io.ReadFromText(
+            args.input, skip_header_lines=1)
+        | 'Parse and read files from the input_file' >> beam.Map(
+            partial(read_image, path_to_dir=args.images_dir))
+        | 'Preprocess images' >> beam.MapTuple(
+            lambda file_name, data: (file_name, preprocess_image(data))))
+    predictions = (
+        filename_value_pair
+        | 'PyTorch RunInference' >> RunInference(model_loader)
+        | 'Process output' >> beam.ParDo(PostProcessor()))
+
+    if args.output:
+      predictions | "Write output to GCS" 

[jira] [Work logged] (BEAM-14535) Add support for Pandas Dataframes to sklearn RunInference Implementation

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14535?focusedWorklogId=778239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778239
 ]

ASF GitHub Bot logged work on BEAM-14535:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:00
Start Date: 03/Jun/22 18:00
Worklog Time Spent: 10m 
  Work Description: ryanthompson591 commented on PR #17800:
URL: https://github.com/apache/beam/pull/17800#issuecomment-1146225373

   R: @TheNeuralBit 




Issue Time Tracking
---

Worklog Id: (was: 778239)
Time Spent: 1.5h  (was: 1h 20m)

> Add support for Pandas Dataframes to sklearn RunInference Implementation
> 
>
> Key: BEAM-14535
> URL: https://issues.apache.org/jira/browse/BEAM-14535
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ryan Thompson
>Assignee: Ryan Thompson
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sklearn pipelines are often set up to take pandas dataframes.
>  
> Our current implementation only supports numpy arrays.
>  
> This FR allows the sklearn implementation to autodetect pandas dataframes or 
> numpy arrays and then combine them (via concat).
>  
> In the case of a pandas dataframe that value will be passed through to the 
> pipeline.
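
For illustration only, the autodetection described above could be sketched roughly as follows. The helper name `combine_batch` is hypothetical and is not taken from the PR, and stacking NumPy arrays along a new batch axis (rather than concatenating them) is an assumption made for this sketch.

```python
from typing import Sequence, Union

import numpy as np
import pandas as pd

Batch = Union[np.ndarray, pd.DataFrame]


def combine_batch(batch: Sequence[Batch]) -> Batch:
  """Hypothetical helper: merge a batch of examples into one model input."""
  if not batch:
    raise ValueError('batch must contain at least one element')
  if all(isinstance(x, pd.DataFrame) for x in batch):
    # DataFrames are concatenated row-wise, so column names are preserved.
    return pd.concat(batch, axis=0, ignore_index=True)
  if all(isinstance(x, np.ndarray) for x in batch):
    # Arrays are stacked along a new leading (batch) axis.
    return np.stack(batch, axis=0)
  raise TypeError('batch must contain only DataFrames or only ndarrays')


frames = [pd.DataFrame({'a': [1], 'b': [2]}), pd.DataFrame({'a': [3], 'b': [4]})]
arrays = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
combined_frame = combine_batch(frames)
combined_array = combine_batch(arrays)
```

Dispatching on the element type keeps the calling code identical for both input kinds; mixed batches are rejected here rather than coerced.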



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778241
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 18:03
Start Date: 03/Jun/22 18:03
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889210162


##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,146 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Pipeline that uses RunInference API to perform classification task on imagenet dataset"""  # pylint: disable=line-too-long
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import Union
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+               path_to_dir: str = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+    image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+    data = Image.open(io.BytesIO(file.read())).convert('RGB')
+    return image_file_name, data
+
+
+def preprocess_image(data: Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # to use models in torch with imagenet weights,
+  # normalize the images using the below values.
+  # ref: https://pytorch.org/vision/stable/models.html#
+  normalize = transforms.Normalize(
+      mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+  transform = transforms.Compose([
+      transforms.Resize(image_size),
+      transforms.ToTensor(),
+      normalize,
+  ])
+  return transform(data)
+
+
+class PostProcessor(beam.DoFn):
+  def process(
+      self, element: Union[PredictionResult, Tuple[Any, PredictionResult]]

Review Comment:
   1 seems to be simple for this example and it works. Thanks. I will commit 
that change





Issue Time Tracking
---

Worklog Id: (was: 778241)
Time Spent: 9h 10m  (was: 9h)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate the performance of pipelines that 
> represent common use cases of Beam + Dataflow with PyTorch, scikit-learn, and 
> possibly TFX. These benchmarks would be integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn, and TensorFlow 
> Extended.
> We would use publicly available datasets (e.g., Kaggle). 
> Size: small / 10 GB / 1 TB, etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (every release cycle).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-10785) Support for coder argument in WriteToBigQuery

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10785?focusedWorklogId=778237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778237
 ]

ASF GitHub Bot logged work on BEAM-10785:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:52
Start Date: 03/Jun/22 17:52
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17518:
URL: https://github.com/apache/beam/pull/17518#issuecomment-1146218977

   @harrydrippin thanks for the analysis, and great catch! - I think your fix 
to the coder itself would be a much better fix. Would you be willing to perform 
that fix instead?




Issue Time Tracking
---

Worklog Id: (was: 778237)
Time Spent: 2h 10m  (was: 2h)

> Support for coder argument in WriteToBigQuery
> -
>
> Key: BEAM-10785
> URL: https://issues.apache.org/jira/browse/BEAM-10785
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Nakamura Yu
>Assignee: Seunghwan Hong
>Priority: P1
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When using WriteToBigQuery to transfer data to BigQuery, non-ascii characters 
> are replaced with replacement characters.
> This was due to the RowAsDictJsonCoder being set as the coder for the 
> BigQueryBatchFileLoads called inside WriteToBigQuery.
> I want to add coder to the argument of WriteToBigQuery so that I can set a 
> coder other than RowAsDictJsonCoder.
> If no problem, I will create a Pull Request next weekend.
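
As a rough illustration of the behaviour the ticket asks for, here is a minimal standard-library sketch of a JSON row coder that round-trips non-ASCII text. The class name `Utf8JsonRowCoder` is hypothetical; it is not Beam's RowAsDictJsonCoder, and the sketch makes no claim about where the replacement characters actually originate in Beam.

```python
import json
from typing import Any, Dict


class Utf8JsonRowCoder:
  """Hypothetical coder: dict rows <-> UTF-8 encoded JSON bytes."""

  def encode(self, row: Dict[str, Any]) -> bytes:
    # ensure_ascii=False keeps non-ASCII characters as-is instead of
    # escaping them to \uXXXX; the text is then encoded as UTF-8.
    return json.dumps(row, ensure_ascii=False).encode('utf-8')

  def decode(self, encoded: bytes) -> Dict[str, Any]:
    # Decode with the same codec that encode() used.
    return json.loads(encoded.decode('utf-8'))


coder = Utf8JsonRowCoder()
row = {'name': '中村', 'city': 'Zürich'}
encoded = coder.encode(row)
decoded = coder.decode(encoded)
```

Escaped input (`\uXXXX`) decodes to the same text, so either representation survives the round trip as long as both sides agree on UTF-8.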



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778235
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:50
Start Date: 03/Jun/22 17:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146217741

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778235)
Time Spent: 7h 40m  (was: 7.5h)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. Hl7 
> messageId) to be executed on the intermediate FHIR store.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778232
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:47
Start Date: 03/Jun/22 17:47
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146215215

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778232)
Time Spent: 7.5h  (was: 7h 20m)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
> messageId) to be executed on the intermediate FHIR store.





[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778233
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:47
Start Date: 03/Jun/22 17:47
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889198878


##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,146 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Pipeline that uses RunInference API to perform classification task on 
imagenet dataset"""  # pylint: disable=line-too-long
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import Union
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+   path_to_dir: str = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+data = Image.open(io.BytesIO(file.read())).convert('RGB')
+return image_file_name, data
+
+
+def preprocess_image(data: Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # to use models in torch with imagenet weights,
+  # normalize the images using the below values.
+  # ref: https://pytorch.org/vision/stable/models.html#
+  normalize = transforms.Normalize(
+  mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+  transform = transforms.Compose([
+  transforms.Resize(image_size),
+  transforms.ToTensor(),
+  normalize,
+  ])
+  return transform(data)
+
+
+class PostProcessor(beam.DoFn):
+  def process(
+  self, element: Union[PredictionResult, Tuple[Any, PredictionResult]]

Review Comment:
   I think this is because RunInference API can produce predictions with and 
without original examples. I think there are two ways to resolve this:
   
   1) Clarify in RunInference transform that we will have only examples and 
predictions in the output
   ```
   | 'PyTorch RunInference' >> 
RunInference(model_loader).with_output_types(Tuple[str, PredictionResult])
   ```
   2) We generalize the postprocessor to handle both cases: when there are 
examples + predictions and when there are no examples, only predictions. In 
that case, the current type hint would make sense.
   
   If we go with 2), we can also consider adding a pipeline option that 
controls whether the final result includes examples or just predictions.
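
A minimal sketch of option 2), not the code in the PR: it assumes a stand-in `PredictionResult` (a plain `NamedTuple` with `example` and `inference` fields rather than the Beam class) and two hypothetical helpers, `split_element` and `format_output`. The idea is to normalize both input shapes first, so the rest of the postprocessor only deals with one.

```python
from typing import Any, NamedTuple, Optional, Tuple, Union


class PredictionResult(NamedTuple):
  """Stand-in for the Beam class; the field names are assumptions."""
  example: Any
  inference: Any


Element = Union[PredictionResult, Tuple[Any, PredictionResult]]


def split_element(element: Element) -> Tuple[Optional[Any], PredictionResult]:
  """Returns (key, prediction); key is None for a bare prediction."""
  # Check the bare case first: this stand-in is itself a 2-tuple, so
  # unpacking it directly would silently split its fields.
  if isinstance(element, PredictionResult):
    return None, element
  key, prediction = element
  return key, prediction


def format_output(element: Element) -> str:
  """Formats one output line for either input shape (hypothetical helper)."""
  key, prediction = split_element(element)
  if key is None:
    return str(prediction.inference)
  return f'{key},{prediction.inference}'
```

With this shape the `Union[...]` hint on `process` describes what the function really accepts; with option 1) the helper would be unnecessary.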





Issue Time Tracking
---

Worklog Id: (was: 778233)
Time Spent: 9h  (was: 8h 50m)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate the performance of pipelines that 
> represent common use cases of Beam + Dataflow with PyTorch, scikit-learn and 
> possibly TFX. These benchmarks would be integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn and TensorFlow 
> Extended.
> We would use publicly available datasets (e.g. Kaggle). 
> Size: small / 10 GB / 1 TB etc
> The

[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778231
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:43
Start Date: 03/Jun/22 17:43
Worklog Time Spent: 10m 
  Work Description: damccorm commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889196045


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   Yeah - I don't care if we make up empty functions for that, we already do 
that for a number of the existing snippets. The process continuation stuff 
should compile though, and that's the important bit anyways.





Issue Time Tracking
---

Worklog Id: (was: 778231)
Time Spent: 28h 10m  (was: 28h)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 28h 10m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778230
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:43
Start Date: 03/Jun/22 17:43
Worklog Time Spent: 10m 
  Work Description: damccorm commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889196045


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   Yeah - I don't care if we make up empty functions for that, we already do 
that for a number of the existing snippets. The process continuation stuff 
should compile though.





Issue Time Tracking
---

Worklog Id: (was: 778230)
Time Spent: 28h  (was: 27h 50m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 28h
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778229
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:42
Start Date: 03/Jun/22 17:42
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889195081


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   Adding an error parameter is reasonable, so that's added. It did lead to 
some extra error checking overhead since we don't have the try-catch mechanism 
the other SDKs leverage but that's not a huge problem.





Issue Time Tracking
---

Worklog Id: (was: 778229)
Time Spent: 27h 50m  (was: 27h 40m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 27h 50m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778227
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:41
Start Date: 03/Jun/22 17:41
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889194510


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {
+  position := rt.GetRestriction().(offsetrange.Restriction).Start
+  for {
+records, err := fn.ExternalService.readNextRecords(position)
+if err == fn.ExternalService.ThrottlingErr {
+  return sdf.ResumeProcessingIn(60 * time.Seconds)
+}
+if len(records) == 0 {
+  return sdf.ResumeProcessingIn(10 * time.Seconds)

Review Comment:
   That's a fair note. Adding clarifying comments is always good for a 
documentation snippet.





Issue Time Tracking
---

Worklog Id: (was: 778227)
Time Spent: 27h 40m  (was: 27.5h)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 27h 40m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778225
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:39
Start Date: 03/Jun/22 17:39
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889185574


##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,146 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Pipeline that uses RunInference API to perform classification task on 
imagenet dataset"""  # pylint: disable=line-too-long
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import Union
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+   path_to_dir: str = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+data = Image.open(io.BytesIO(file.read())).convert('RGB')
+return image_file_name, data
+
+
+def preprocess_image(data: Image) -> torch.Tensor:

Review Comment:
   Thanks for catching



##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,146 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Pipeline that uses RunInference API to perform classification task on 
imagenet dataset"""  # pylint: disable=line-too-long
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import Union
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+   path_to_dir: str = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+data = Image.open(io.BytesIO(file.read())).convert('RGB')
+return image_file_name, data
+
+
+def preprocess_image(data: Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # to use models in torch with imagenet w

[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778220
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:36
Start Date: 03/Jun/22 17:36
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889191309


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   IIRC if an SDF signature has a ProcessContinuation return we *always* expect 
either a Resume() or Stop() continuation and never a nil. 





Issue Time Tracking
---

Worklog Id: (was: 778220)
Time Spent: 27.5h  (was: 27h 20m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 27.5h
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778218
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:35
Start Date: 03/Jun/22 17:35
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889190683


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   This is a completely fictional IO since we don't actually have a robust 
native streaming IO, so there's nothing to compile. It's just modeled after the 
Python and Java versions. 





Issue Time Tracking
---

Worklog Id: (was: 778218)
Time Spent: 27h 20m  (was: 27h 10m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 27h 20m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-14068) RunInference Benchmarking tests

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14068?focusedWorklogId=778215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778215
 ]

ASF GitHub Bot logged work on BEAM-14068:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 17:27
Start Date: 03/Jun/22 17:27
Worklog Time Spent: 10m 
  Work Description: AnandInguva commented on code in PR #17462:
URL: https://github.com/apache/beam/pull/17462#discussion_r889185085


##
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##
@@ -0,0 +1,146 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Pipeline that uses RunInference API to perform classification task on 
imagenet dataset"""  # pylint: disable=line-too-long
+
+import argparse
+import io
+import os
+from functools import partial
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import Union
+
+import apache_beam as beam
+import torch
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.ml.inference.api import PredictionResult
+from apache_beam.ml.inference.api import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelLoader
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from PIL import Image
+from torchvision import transforms
+from torchvision.models.mobilenetv2 import MobileNetV2
+
+
+def read_image(image_file_name: str,
+   path_to_dir: str = None) -> Tuple[str, Image.Image]:
+  if path_to_dir is not None:
+image_file_name = os.path.join(path_to_dir, image_file_name)
+  with FileSystems().open(image_file_name, 'r') as file:
+data = Image.open(io.BytesIO(file.read())).convert('RGB')
+return image_file_name, data
+
+
+def preprocess_image(data: Image) -> torch.Tensor:
+  image_size = (224, 224)
+  # to use models in torch with imagenet weights,
+  # normalize the images using the below values.
+  # ref: https://pytorch.org/vision/stable/models.html#
+  normalize = transforms.Normalize(
+  mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+  transform = transforms.Compose([
+  transforms.Resize(image_size),
+  transforms.ToTensor(),
+  normalize,
+  ])
+  return transform(data)
+
+
+class PostProcessor(beam.DoFn):
+  def process(
+  self, element: Union[PredictionResult, Tuple[Any, PredictionResult]]

Review Comment:
   I had what you suggested here but I got this error.
   
   `apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
'ProcessOutput': requires Tuple[Any, PredictionResult] but got 
Union[PredictionResult, Tuple[Any, PredictionResult]] for element`.
   
   Output defined here: 
https://github.com/apache/beam/blob/6d6a54d21b483c20005debcfd0b6d9ff6739ec80/sdks/python/apache_beam/ml/inference/api.py#L39
   
   I am not sure of the behavior though. It should be able to accept `element: 
Tuple[str, PredictionResult]` but to avoid this error, I changed the code 





Issue Time Tracking
---

Worklog Id: (was: 778215)
Time Spent: 8h 40m  (was: 8.5h)

> RunInference Benchmarking tests
> ---
>
> Key: BEAM-14068
> URL: https://issues.apache.org/jira/browse/BEAM-14068
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Anand Inguva
>Priority: P2
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> RunInference benchmarks will evaluate the performance of pipelines that 
> represent common use cases of Beam + Dataflow with PyTorch, scikit-learn and 
> possibly TFX. These benchmarks would be integration tests that exercise 
> several software components using Beam, PyTorch, scikit-learn and TensorFlow 
> Extended.
> We would use publicly available datasets (e.g. Kaggle). 
> Size: small / 10 GB / 1 TB etc.
> The default execution runner would be Dataflow unless specified otherwise.
> These tests would be run infrequently (every release cycle).  



-

[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778161
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:59
Start Date: 03/Jun/22 15:59
Worklog Time Spent: 10m 
  Work Description: asf-ci commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146103311

   Can one of the admins verify this patch?




Issue Time Tracking
---

Worklog Id: (was: 778161)
Time Spent: 26h 40m  (was: 26.5h)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 26h 40m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778204
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:59
Start Date: 03/Jun/22 16:59
Worklog Time Spent: 10m 
  Work Description: damccorm commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889155710


##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   Can we put this in the snippets folder (example below in Watermark 
estimation section)? I know we haven't been clean on that before, but it:
   
   (a) makes sure that the code actually compiles
   (b) makes it easier to reuse (e.g. I know Dataflow has docs that use 
snippets from Beam)



##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {
+  position := rt.GetRestriction().(offsetrange.Restriction).Start
+  for {
+records, err := fn.ExternalService.readNextRecords(position)
+if err == fn.ExternalService.ThrottlingErr {
+  return sdf.ResumeProcessingIn(60 * time.Seconds)
+}
+if len(records) == 0 {
+  return sdf.ResumeProcessingIn(10 * time.Seconds)

Review Comment:
   Maybe add a comment along the lines of `// Wait for data to be available`? 
Might be nice to have a similar comment for the throttling case and the finish 
execution case as well.



##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   Could you return an `err` parameter as well (it can just return nil)? 
Something I realized w/ Bundle Finalization is that it's much more helpful if we 
provide the parameters that surround the one we are demonstrating because it 
allows users to see the ordering we require.
   
   Side note unrelated to this PR: We probably need better ordering error 
messages, they are pretty confusing right now.



##
website/www/site/content/en/documentation/programming-guide.md:
##
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit 
func(Record)) sdf.ProcessContinuation {

Review Comment:
   I'm also curious which process continuation we should return when we 
return a non-nil error: is it nil? It might be worth including that case, for 
example when `records, err := 
fn.ExternalService.readNextRecords(position)` returns a non-nil, non-throttling 
error.





Issue Time Tracking
---

Worklog Id: (was: 778204)
Time Spent: 27h 10m  (was: 27h)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 27h 10m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.





[jira] [Work logged] (BEAM-14337) Support **kwargs for PyTorch models.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14337?focusedWorklogId=778203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778203
 ]

ASF GitHub Bot logged work on BEAM-14337:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:57
Start Date: 03/Jun/22 16:57
Worklog Time Spent: 10m 
  Work Description: yeandy commented on PR #17470:
URL: https://github.com/apache/beam/pull/17470#issuecomment-1146178363

   R: @tvalentyn 




Issue Time Tracking
---

Worklog Id: (was: 778203)
Time Spent: 8h 10m  (was: 8h)

> Support **kwargs for PyTorch models.
> 
>
> Key: BEAM-14337
> URL: https://issues.apache.org/jira/browse/BEAM-14337
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Anand Inguva
>Assignee: Andy Ye
>Priority: P2
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Some PyTorch models that subclass torch.nn.Module have extra 
> parameters in their forward function call. These extra parameters can be passed 
> as a dict or as positional arguments. 
> Example of PyTorch models supported by Hugging Face -> 
> [https://huggingface.co/bert-base-uncased]
> [Some torch models on Hugging 
> face|https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py]
> Eg: 
> [https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertModel]
> {code:java}
> inputs = {
>      "input_ids": Tensor1,
>      "attention_mask": Tensor2,
>      "token_type_ids": Tensor3,
> } 
> model = BertModel.from_pretrained("bert-base-uncased") # which is a  
> # subclass of torch.nn.Module
> outputs = model(**inputs) # the model's forward method should accept the keys 
> # of inputs as keyword arguments.{code}
>  
> [Transformers|https://pytorch.org/hub/huggingface_pytorch-transformers/] 
> integrated in PyTorch is supported by Hugging Face as well. 
>  





[jira] [Work logged] (BEAM-14553) Dataflow portable job submission translate FileResultCoder only with window coder

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14553?focusedWorklogId=778202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778202
 ]

ASF GitHub Bot logged work on BEAM-14553:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:56
Start Date: 03/Jun/22 16:56
Worklog Time Spent: 10m 
  Work Description: y1chi commented on code in PR #17818:
URL: https://github.com/apache/beam/pull/17818#discussion_r889158954


##
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java:
##
@@ -1196,7 +1196,7 @@ public static <DestinationT> 
FileResultCoder<DestinationT> of(
 
 @Override
 public List<? extends Coder<?>> getCoderArguments() {
-  return Arrays.asList(windowCoder);
+  return Arrays.asList(windowCoder, destinationCoder);

Review Comment:
   So IIUC, the right thing to do is to have FileResult 
so that coder inference can work for both window type and destination type. But 
since all FileResult PCollections use explicit setCoders that's not a 
requirement.  I still think window coder needs to be in the components so that 
FileResultCoder with different window coders won't collide with each other.
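   The collision described here can be illustrated with a toy registry keyed 
by a coder's name and components. This is a sketch under assumptions: 
`registry`, `key`, and `register` are hypothetical names, and the real Dataflow 
translation code is not shown in this thread and may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a toy coder table: structural key -> assigned coder ID.
type registry map[string]string

// key builds a structural key from a coder name and its component coders.
func key(name string, components ...string) string {
	return name + "(" + strings.Join(components, ",") + ")"
}

// register assigns id to k unless k is already present, in which case the
// earlier ID wins. It returns the ID in effect for k.
func (r registry) register(k, id string) string {
	if existing, ok := r[k]; ok {
		return existing
	}
	r[k] = id
	return id
}

func main() {
	// Only the window coder is a component: both coders share one key,
	// so the second registration is silently mapped to the first ID.
	r1 := registry{}
	fmt.Println(r1.register(key("FileResultCoder", "GlobalWindow"), "c1"))
	fmt.Println(r1.register(key("FileResultCoder", "GlobalWindow"), "c2"))

	// Window and destination coders are both components: the keys differ.
	r2 := registry{}
	fmt.Println(r2.register(key("FileResultCoder", "GlobalWindow", "StringUtf8"), "c1"))
	fmt.Println(r2.register(key("FileResultCoder", "GlobalWindow", "VarInt"), "c2"))
}
```

   In this model, including every distinguishing coder among the components 
is what keeps two otherwise identical coders from sharing an entry.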





Issue Time Tracking
---

Worklog Id: (was: 778202)
Time Spent: 2.5h  (was: 2h 20m)

> Dataflow portable job submission translate FileResultCoder only with window 
> coder
> -
>
> Key: BEAM-14553
> URL: https://issues.apache.org/jira/browse/BEAM-14553
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Priority: P2
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The destination coder is neglected: if there are multiple FileResultCoders 
> with different destination coders, only the first registration succeeds.





[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778200
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:55
Start Date: 03/Jun/22 16:55
Worklog Time Spent: 10m 
  Work Description: asf-ci commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146177133

   Can one of the admins verify this patch?




Issue Time Tracking
---

Worklog Id: (was: 778200)
Time Spent: 7h 10m  (was: 7h)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
> messageId) to be executed on the intermediate FHIR store.





[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778201
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:55
Start Date: 03/Jun/22 16:55
Worklog Time Spent: 10m 
  Work Description: asf-ci commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146177154

   Can one of the admins verify this patch?




Issue Time Tracking
---

Worklog Id: (was: 778201)
Time Spent: 7h 20m  (was: 7h 10m)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
> messageId) to be executed on the intermediate FHIR store.





[jira] [Work logged] (BEAM-14553) Dataflow portable job submission translate FileResultCoder only with window coder

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14553?focusedWorklogId=778199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778199
 ]

ASF GitHub Bot logged work on BEAM-14553:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:53
Start Date: 03/Jun/22 16:53
Worklog Time Spent: 10m 
  Work Description: y1chi commented on PR #17818:
URL: https://github.com/apache/beam/pull/17818#issuecomment-1146175229

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778199)
Time Spent: 2h 20m  (was: 2h 10m)

> Dataflow portable job submission translate FileResultCoder only with window 
> coder
> -
>
> Key: BEAM-14553
> URL: https://issues.apache.org/jira/browse/BEAM-14553
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The destination coder is neglected: if there are multiple FileResultCoders 
> with different destination coders, only the first registration succeeds.





[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778198
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:49
Start Date: 03/Jun/22 16:49
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17741:
URL: https://github.com/apache/beam/pull/17741#issuecomment-1146170628

   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778198)
Time Spent: 7h  (was: 6h 50m)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
> messageId) to be executed on the intermediate FHIR store.





[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778195
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:46
Start Date: 03/Jun/22 16:46
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #17741: [BEAM-14504] Add 
support for including addittional parameters to executebundle method in fhirio.
URL: https://github.com/apache/beam/pull/17741




Issue Time Tracking
---

Worklog Id: (was: 778195)
Time Spent: 6h 40m  (was: 6.5h)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
> messageId) to be executed on the intermediate FHIR store.





[jira] [Work logged] (BEAM-14504) Add support for including addittional parameters to executebundle method in fhirio.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14504?focusedWorklogId=778196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778196
 ]

ASF GitHub Bot logged work on BEAM-14504:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:46
Start Date: 03/Jun/22 16:46
Worklog Time Spent: 10m 
  Work Description: fbeevikm opened a new pull request, #17741:
URL: https://github.com/apache/beam/pull/17741

   
   Add FhirBundleWithMetadata in executebundles method so that we can pass 
additional information like message id.
   FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
messageId) to be executed on the intermediate FHIR store.
   
   Make the FhirResourcePagesIterator constructor public. Whistle plugins call 
this constructor.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 778196)
Time Spent: 6h 50m  (was: 6h 40m)

> Add support for including addittional parameters to executebundle method in 
> fhirio.
> ---
>
> Key: BEAM-14504
> URL: https://issues.apache.org/jira/browse/BEAM-14504
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-healthcare
>Reporter: Fathima Mohammed
>Assignee: Fathima Mohammed
>Priority: P2
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Add FhirBundleWithMetadata in executebundles method so that we can pass 
> additional information like message id.
> FhirBundleWithMetadata represents a FHIR bundle, with its metadata (e.g. HL7 
> messageId) to be executed on the intermediate FHIR store.





[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778194
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:44
Start Date: 03/Jun/22 16:44
Worklog Time Spent: 10m 
  Work Description: pabloem merged PR #18038:
URL: https://github.com/apache/beam/pull/18038




Issue Time Tracking
---

Worklog Id: (was: 778194)
Time Spent: 15h  (was: 14h 50m)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from and writing to 
> BQ tables with columns that have the JSON type.





[jira] [Work logged] (BEAM-14546) [Go SDK] passert.Count succeeds for empty PCollections.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14546?focusedWorklogId=778189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778189
 ]

ASF GitHub Bot logged work on BEAM-14546:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:30
Start Date: 03/Jun/22 16:30
Worklog Time Spent: 10m 
  Work Description: lostluck merged PR #17813:
URL: https://github.com/apache/beam/pull/17813




Issue Time Tracking
---

Worklog Id: (was: 778189)
Time Spent: 3.5h  (was: 3h 20m)

> [Go SDK] passert.Count succeeds for empty PCollections.
> ---
>
> Key: BEAM-14546
> URL: https://issues.apache.org/jira/browse/BEAM-14546
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/sdks/v2.39.0/sdks/go/pkg/beam/testing/passert/count.go#L28
> Since it's using a Combine to do the count, it never executes for empty 
> PCollections, and is unable to fail.
> The fix is: when count > 0, plumb the pcollection through as a side input to 
> a DoFn that requires the side input to be non-empty. This would catch the 
> empty PCollection case.
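The failure mode and the proposed fix can be modelled with plain slices 
instead of PCollections. This is a simplified sketch, not the passert 
implementation: `combineCount`, `checkCombined`, and `checkWithSideInput` are 
hypothetical names, and a slice stands in for both the main input and the side 
input.

```go
package main

import (
	"errors"
	"fmt"
)

// combineCount mimics a combine that produces no output for empty input.
func combineCount(elems []string) []int {
	if len(elems) == 0 {
		return nil
	}
	return []int{len(elems)}
}

// checkCombined runs the assertion once per combined output, so for an
// empty input it never runs and cannot fail.
func checkCombined(elems []string, want int) error {
	for _, got := range combineCount(elems) {
		if got != want {
			return fmt.Errorf("count = %d, want %d", got, want)
		}
	}
	return nil
}

// checkWithSideInput models the proposed fix: when a non-zero count is
// expected, a step that always runs also looks at the whole collection
// and fails if it is empty.
func checkWithSideInput(elems []string, want int) error {
	if want > 0 && len(elems) == 0 {
		return errors.New("PCollection is empty, want a non-zero count")
	}
	return checkCombined(elems, want)
}

func main() {
	var empty []string
	fmt.Println(checkCombined(empty, 3))      // assertion never runs
	fmt.Println(checkWithSideInput(empty, 3)) // assertion reports the empty input
}
```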





[jira] [Work logged] (BEAM-974) Adds attributes support to PubsubIO

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-974?focusedWorklogId=778186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778186
 ]

ASF GitHub Bot logged work on BEAM-974:
---

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:29
Start Date: 03/Jun/22 16:29
Worklog Time Spent: 10m 
  Work Description: kennknowles opened a new issue, #18180:
URL: https://github.com/apache/beam/issues/18180

   Hello,
   I'm a Google Cloud Support Specialist who recently owned a case from a 
platinum customer who was reporting "validation of workflow failed" internal 
error when attempting to use new beta SDK.  Eventually, it was determined that 
the problem was that they weren't using .withCoder, which became required after 
BEAM-974[1].  They would like to request that a more informative error be 
thrown, as the current one is far too vague to be able to derive any useful 
information from.  Thank you.
   
   Regards,
   
   Garrett Anderson
   Cloud Support Specialist
   Google Cloud Platform Support 
   
   [1] https://issues.apache.org/jira/browse/BEAM-974
   
   Imported from Jira 
[BEAM-1662](https://issues.apache.org/jira/browse/BEAM-1662). Original Jira may 
contain additional context.
   Reported by: gfunk.




Issue Time Tracking
---

Worklog Id: (was: 778186)
Remaining Estimate: 0h
Time Spent: 10m

> Adds attributes support to PubsubIO
> ---
>
> Key: BEAM-974
> URL: https://issues.apache.org/jira/browse/BEAM-974
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P2
>  Labels: Done
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Adds support for Pubsub attributes by introducing a new PubsubMessage class 
> that allows manipulation of payload and attributes.





[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778176
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:15
Start Date: 03/Jun/22 16:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #17492:
URL: https://github.com/apache/beam/pull/17492#issuecomment-1146144655

   sorry about that. I've opened a rollback and a fix forward:
   
   https://github.com/apache/beam/pull/18038
   https://github.com/apache/beam/pull/18060




Issue Time Tracking
---

Worklog Id: (was: 778176)
Time Spent: 14h 50m  (was: 14h 40m)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from and writing to 
> BQ tables with columns that have the JSON type.





[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778175
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:15
Start Date: 03/Jun/22 16:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on PR #18038:
URL: https://github.com/apache/beam/pull/18038#issuecomment-1146144032

   This fixes breakage introduced in #17492




Issue Time Tracking
---

Worklog Id: (was: 778175)
Time Spent: 14h 40m  (was: 14.5h)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from and writing to 
> BQ tables with columns that have the JSON type.





[jira] [Work logged] (BEAM-9482) beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9482?focusedWorklogId=778170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778170
 ]

ASF GitHub Bot logged work on BEAM-9482:


Author: ASF GitHub Bot
Created on: 03/Jun/22 16:12
Start Date: 03/Jun/22 16:12
Worklog Time Spent: 10m 
  Work Description: benWize commented on PR #17727:
URL: https://github.com/apache/beam/pull/17727#issuecomment-1146141837

   Hi @aaltay, could you help me to run a seed job for this PR to validate my 
changes?




Issue Time Tracking
---

Worklog Id: (was: 778170)
Time Spent: 5h 10m  (was: 5h)

> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
> allocated"
> ---
>
> Key: BEAM-9482
> URL: https://issues.apache.org/jira/browse/BEAM-9482
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Benjamin Gonzalez
>Priority: P1
>  Labels: sickbay
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Kafka_IO/514/console]
>  
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-0.yml":
>  Service "outside-0" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32400: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-1.yml":
>  Service "outside-1" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32401: provided port is already allocated
> 18:55:33 Error from server (Invalid): error when creating 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Kafka_IO/src/.test-infra/kubernetes/kafka-cluster/04-outside-services/outside-2.yml":
>  Service "outside-2" is invalid: spec.ports[0].nodePort: Invalid value: 
> 32402: provided port is already allocated
> 1
>  
> Seems like we tried three ports but they were being used. Probably we should 
> update code to find an unused port dynamically.
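As a general illustration of finding an unused port dynamically, the sketch 
below asks the local operating system for a free TCP port by listening on 
port 0. It is an analogy only: the failing test uses Kubernetes NodePort 
services, where the closest equivalent is leaving `nodePort` unset so the 
cluster allocates one. `freePort` is a hypothetical helper name, and the 
returned port can be taken by another process after the listener is closed.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused TCP port on the loopback interface
// by binding to port 0, then releases it and returns the number.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("could not find a free port:", err)
		return
	}
	fmt.Println("free port:", port)
}
```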





[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778168
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:11
Start Date: 03/Jun/22 16:11
Worklog Time Spent: 10m 
  Work Description: pabloem opened a new pull request, #18038:
URL: https://github.com/apache/beam/pull/18038

   … BQ connector to support new JSON type "
   
   This reverts commit 12be69dcd14df43e314a4c3abb086a7066c2a6a0.
   
   




Issue Time Tracking
---

Worklog Id: (was: 778168)
Time Spent: 14.5h  (was: 14h 20m)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from and writing to 
> BQ tables with columns that have the JSON type.





[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778167
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:09
Start Date: 03/Jun/22 16:09
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146139843

   R: @riteshghorse @damccorm 




Issue Time Tracking
---

Worklog Id: (was: 778167)
Time Spent: 27h  (was: 26h 50m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 27h
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> Feature is written E2E and users will be able to return ProcessContinuations 
> from SDFs as of 2.39.0 but the full behavior has not been fully validated. An 
> integration test that validates self-checkpointing is working as-intended 
> will need to be written and passing before the feature is no longer 
> considered experimental and this ticket is marked as resolved.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13171) Support for stopReadTime on KafkaIO SDF

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13171?focusedWorklogId=778164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778164
 ]

ASF GitHub Bot logged work on BEAM-13171:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 16:06
Start Date: 03/Jun/22 16:06
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on PR #15951:
URL: https://github.com/apache/beam/pull/15951#issuecomment-1146131230

   I may be getting mixed up. I had thought that 
https://github.com/apache/beam/pull/15951#issuecomment-990238052 was addressed. 
Specifically that we would have a bounded-per-input-element SDF, hence bounded 
PCollections, when stop read time was specified. Is this false? Are 
PCollections still unbounded?




Issue Time Tracking
---

Worklog Id: (was: 778164)
Time Spent: 6h 50m  (was: 6h 40m)

> Support for stopReadTime on KafkaIO SDF 
> 
>
> Key: BEAM-13171
> URL: https://issues.apache.org/jira/browse/BEAM-13171
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Mostafa Aghajani
>Assignee: Mostafa Aghajani
>Priority: P2
> Fix For: 2.36.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> There is already support for startReadTime using SDF when the Kafka 
> version is supported.
> I want to add support for stopReadTime so that we can extract messages from 
> Kafka only up to a point in time, after which the task finishes.
> One use case: re-processing (re-reading) only a specific period of time for 
> a Kafka topic in your pipeline.
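
As a rough illustration of the requested behavior, here is a plain-Python sketch of reading a bounded time window from a stream of records. It is not KafkaIO code: `Record` and `read_window` are hypothetical names, the records are assumed to arrive in timestamp order, and treating the stop time as exclusive is an assumption made for the example.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical stand-in for a Kafka record."""
    offset: int
    timestamp_ms: int
    value: str


def read_window(
    records: Iterable[Record],
    start_ms: int,
    stop_ms: Optional[int] = None,
) -> Iterator[Record]:
    """Yield records with start_ms <= timestamp < stop_ms.

    Assumes records are ordered by timestamp. With stop_ms set, the read
    terminates at the first record at or past the stop time, so the output
    is bounded. With stop_ms=None the read continues for as long as the
    input does.
    """
    for record in records:
        if record.timestamp_ms < start_ms:
            continue  # before the requested window
        if stop_ms is not None and record.timestamp_ms >= stop_ms:
            return  # window is complete; stop reading
        yield record


records = [Record(i, ts, f"msg-{i}") for i, ts in enumerate([10, 20, 30, 40, 50])]
for record in read_window(records, start_ms=20, stop_ms=40):
    print(record.offset, record.timestamp_ms, record.value)
```

Returning at the stop time, rather than merely filtering, is what makes the read finish instead of waiting for more input.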



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778160
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:59
Start Date: 03/Jun/22 15:59
Worklog Time Spent: 10m 
  Work Description: jrmccluskey opened a new pull request, #17956:
URL: https://github.com/apache/beam/pull/17956

   Adds small code snippet example to the Beam Programming Guide that 
demonstrates self-checkpointing behavior in Beam Go.
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 778160)
Time Spent: 26.5h  (was: 26h 20m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 26.5h
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> The feature is written end to end, and users will be able to return 
> ProcessContinuations from SDFs as of 2.39.0, but the full behavior has not 
> yet been validated. An integration test that validates that 
> self-checkpointing works as intended must be written and passing before the 
> feature is no longer considered experimental and this ticket is marked as resolved.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-11104) [Go SDK] DoFn Self Checkpointing

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11104?focusedWorklogId=778162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778162
 ]

ASF GitHub Bot logged work on BEAM-11104:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:59
Start Date: 03/Jun/22 15:59
Worklog Time Spent: 10m 
  Work Description: asf-ci commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146103317

   Can one of the admins verify this patch?




Issue Time Tracking
---

Worklog Id: (was: 778162)
Time Spent: 26h 50m  (was: 26h 40m)

> [Go SDK] DoFn Self Checkpointing
> 
>
> Key: BEAM-11104
> URL: https://issues.apache.org/jira/browse/BEAM-11104
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
> Fix For: 2.40.0
>
>  Time Spent: 26h 50m
>  Remaining Estimate: 0h
>
> Allow SplittableDoFns to self checkpoint.
> Design doc: 
> [https://docs.google.com/document/d/1_JbzjY9JR07ZK5v7PcZevUfzHPsqwzfV7W6AouNpMPk/edit?usp=sharing]
>  
> The feature is written end to end, and users will be able to return 
> ProcessContinuations from SDFs as of 2.39.0, but the full behavior has not 
> yet been validated. An integration test that validates that 
> self-checkpointing works as intended must be written and passing before the 
> feature is no longer considered experimental and this ticket is marked as resolved.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-12572) All beam examples should get continuously exercised

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12572?focusedWorklogId=778159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778159
 ]

ASF GitHub Bot logged work on BEAM-12572:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:55
Start Date: 03/Jun/22 15:55
Worklog Time Spent: 10m 
  Work Description: fernando-wizeline commented on PR #17015:
URL: https://github.com/apache/beam/pull/17015#issuecomment-1146100411

   Hi @aaltay!
   Do you know someone who can help us to review this?
   Thanks!




Issue Time Tracking
---

Worklog Id: (was: 778159)
Time Spent: 70h 40m  (was: 70.5h)

> All beam examples should get continuously exercised
> ---
>
> Key: BEAM-12572
> URL: https://issues.apache.org/jira/browse/BEAM-12572
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Valentyn Tymofieiev
>Assignee: Fernando Morales
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 70h 40m
>  Remaining Estimate: 0h
>
> Sometimes our examples become broken without us noticing. For example, see: 
> https://lists.apache.org/thread.html/r45340bbee91a6caf798fe62d24388f645f8792cc7506351fd66adec3%40%3Cdev.beam.apache.org%3E
> We should test our examples better.
> Test examples on Dataflow, Spark, Flink and Direct runners 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-12918) TPC-DS: Add Jenkins jobs

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12918?focusedWorklogId=778155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778155
 ]

ASF GitHub Bot logged work on BEAM-12918:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:38
Start Date: 03/Jun/22 15:38
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on PR #17680:
URL: https://github.com/apache/beam/pull/17680#issuecomment-1146086878

   Run Dataflow Runner Nexmark Tests




Issue Time Tracking
---

Worklog Id: (was: 778155)
Time Spent: 10.5h  (was: 10h 20m)

> TPC-DS: Add Jenkins jobs
> 
>
> Key: BEAM-12918
> URL: https://issues.apache.org/jira/browse/BEAM-12918
> Project: Beam
>  Issue Type: Test
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14014) Support impersonation credentials in Dataflow runner.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14014?focusedWorklogId=778139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778139
 ]

ASF GitHub Bot logged work on BEAM-14014:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:20
Start Date: 03/Jun/22 15:20
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on code in PR #17394:
URL: https://github.com/apache/beam/pull/17394#discussion_r889045192


##
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java:
##
@@ -169,6 +169,25 @@ public interface GcpOptions extends GoogleApiDebugOptions, PipelineOptions {
 
   void setGcpCredential(Credentials value);
 
+  /**
+   * All API requests will be made as the given service account or target service account in an
+   * impersonation delegation chain instead of the currently selected account. You can specify
+   * either a single service account as the impersonator, or a comma-separated list of service
+   * accounts to create an impersonation delegation chain.
+   */
+  @Description(
+  "All API requests will be made as the given service account or"
+  + " target service account in an impersonation delegation chain"
+  + " instead of the currently selected account. You can specify"
+  + " either a single service account as the impersonator, or a"
+  + " comma-separated list of service accounts to create an"
+  + " impersonation delegation chain.")
+  @JsonIgnore
+  @Nullable
+  String getImpersonateServiceAccount();
+
+  void setImpersonateServiceAccount(String impersonateServiceAccount);

Review Comment:
   Ah, I did not realize this. That could be a useful follow-up. We do want the 
format of the parameter to be the same across SDKs and across Beam and gcloud, 
etc. But it seems that comma-separated strings will be the same across all of 
them.
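
To make the comma-separated format concrete, here is a minimal Python sketch of how such a value might be split into a target account and a delegation chain. The helper name `parse_impersonation_chain` is hypothetical, and the convention that the last entry is the account to impersonate while the earlier entries are delegates is an assumption for illustration; the parsing in this PR may differ.

```python
from typing import List, Tuple


def parse_impersonation_chain(value: str) -> Tuple[str, List[str]]:
    """Split a comma-separated option value into (target, delegates).

    Assumption: the last account is the one to impersonate, and any
    preceding accounts form the delegation chain, in order.
    """
    accounts = [part.strip() for part in value.split(",")]
    # An empty string or a stray comma produces an empty entry; reject it.
    if any(not account for account in accounts):
        raise ValueError(f"malformed service account list: {value!r}")
    return accounts[-1], accounts[:-1]


target, delegates = parse_impersonation_chain(
    "sa-1@example.iam.gserviceaccount.com,"
    "sa-2@example.iam.gserviceaccount.com,"
    "sa-3@example.iam.gserviceaccount.com"
)
print("target:", target)
print("delegates:", delegates)
```

Keeping the option a single string means the same value can be passed unchanged to any SDK or tool that accepts this format.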





Issue Time Tracking
---

Worklog Id: (was: 778139)
Time Spent: 9h  (was: 8h 50m)

> Support impersonation credentials in Dataflow runner.
> -
>
> Key: BEAM-14014
> URL: https://issues.apache.org/jira/browse/BEAM-14014
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ryan Thompson
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-13945) Update BQ connector to support new JSON type

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13945?focusedWorklogId=778129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778129
 ]

ASF GitHub Bot logged work on BEAM-13945:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:08
Start Date: 03/Jun/22 15:08
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on PR #17492:
URL: https://github.com/apache/beam/pull/17492#issuecomment-1146058065

   I believe this is the cause of the perma-red Java PreCommit, which is failing 
due to:
   ```
   09:55:24 * What went wrong:
   09:55:24 Execution failed for task ':sdks:java:io:google-cloud-platform:analyzeClassesDependencies'.
   09:55:24 > Dependency analysis found issues.
   09:55:24   usedUndeclaredArtifacts
   09:55:24- org.json:json:20200518@jar
   ```




Issue Time Tracking
---

Worklog Id: (was: 778129)
Time Spent: 14h 20m  (was: 14h 10m)

> Update BQ connector to support new JSON type
> 
>
> Key: BEAM-13945
> URL: https://issues.apache.org/jira/browse/BEAM-13945
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Ahmed Abualsaud
>Priority: P2
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> BQ has a new JSON type that is defined here: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
> We should update Beam BQ Java and Python connectors to support that for 
> various read methods (export jobs, storage API) and write methods (load jobs, 
> streaming inserts, storage API).
> We should also add integration tests that exercise reading from/writing to 
> BQ tables with columns that have the JSON type.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14441) Migrate from Jira to GitHub Issues

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14441?focusedWorklogId=778126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778126
 ]

ASF GitHub Bot logged work on BEAM-14441:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:03
Start Date: 03/Jun/22 15:03
Worklog Time Spent: 10m 
  Work Description: tvalentyn merged PR #17812:
URL: https://github.com/apache/beam/pull/17812




Issue Time Tracking
---

Worklog Id: (was: 778126)
Time Spent: 3h 10m  (was: 3h)

> Migrate from Jira to GitHub Issues
> --
>
> Key: BEAM-14441
> URL: https://issues.apache.org/jira/browse/BEAM-14441
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-community
>Reporter: Danny McCormick
>Assignee: Danny McCormick
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Discussion & Planning Thread 
> https://lists.apache.org/thread/q5nbwxqvfkzlz664c4kchzkbj26c3r89
> Migration thread: 
> https://lists.apache.org/thread/p623g3k8hx5dc9jzmso2n9mk2cdk15v9 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14441) Migrate from Jira to GitHub Issues

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14441?focusedWorklogId=778125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778125
 ]

ASF GitHub Bot logged work on BEAM-14441:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 15:00
Start Date: 03/Jun/22 15:00
Worklog Time Spent: 10m 
  Work Description: damccorm commented on PR #17812:
URL: https://github.com/apache/beam/pull/17812#issuecomment-1146050749

   R: @kennknowles 




Issue Time Tracking
---

Worklog Id: (was: 778125)
Time Spent: 3h  (was: 2h 50m)

> Migrate from Jira to GitHub Issues
> --
>
> Key: BEAM-14441
> URL: https://issues.apache.org/jira/browse/BEAM-14441
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-community
>Reporter: Danny McCormick
>Assignee: Danny McCormick
>Priority: P2
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Discussion & Planning Thread 
> https://lists.apache.org/thread/q5nbwxqvfkzlz664c4kchzkbj26c3r89
> Migration thread: 
> https://lists.apache.org/thread/p623g3k8hx5dc9jzmso2n9mk2cdk15v9 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14557) Migrate exclusively to the v1 progress/metrics API

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14557?focusedWorklogId=778108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778108
 ]

ASF GitHub Bot logged work on BEAM-14557:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 14:29
Start Date: 03/Jun/22 14:29
Worklog Time Spent: 10m 
  Work Description: riteshghorse commented on PR #17821:
URL: https://github.com/apache/beam/pull/17821#issuecomment-1146022317

   Okay, it looks like the universal runner extracts metrics from 
MonitoringInfos for producing the pipeline result. Need to change that as well.




Issue Time Tracking
---

Worklog Id: (was: 778108)
Remaining Estimate: 0h
Time Spent: 10m

> Migrate exclusively to the v1 progress/metrics API
> --
>
> Key: BEAM-14557
> URL: https://issues.apache.org/jira/browse/BEAM-14557
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: 2.40.0
>Reporter: Robert Burke
>Assignee: Ritesh Ghorse
>Priority: P2
> Fix For: 2.40.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Go SDK has continued to send the legacy metrics to runners, 
> simultaneously with the new metrics. We should remove the legacy approach and 
> only use the modern version which is more efficient.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14406) Integration testing of Pipeline drain in Go SDK

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14406?focusedWorklogId=778101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778101
 ]

ASF GitHub Bot logged work on BEAM-14406:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 14:04
Start Date: 03/Jun/22 14:04
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #17814:
URL: https://github.com/apache/beam/pull/17814#issuecomment-1145998797

   R: @youngoli for final approval




Issue Time Tracking
---

Worklog Id: (was: 778101)
Time Spent: 0.5h  (was: 20m)

> Integration testing of Pipeline drain in Go SDK
> ---
>
> Key: BEAM-14406
> URL: https://issues.apache.org/jira/browse/BEAM-14406
> Project: Beam
>  Issue Type: Test
>  Components: sdk-go
>Reporter: Ritesh Ghorse
>Assignee: Ritesh Ghorse
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add an integration test for the pipeline drain feature 
> [BEAM-11106|https://issues.apache.org/jira/browse/BEAM-11106] added in 
> https://github.com/apache/beam/pull/17432. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work logged] (BEAM-14546) [Go SDK] passert.Count succeeds for empty PCollections.

2022-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14546?focusedWorklogId=778098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778098
 ]

ASF GitHub Bot logged work on BEAM-14546:
-

Author: ASF GitHub Bot
Created on: 03/Jun/22 13:51
Start Date: 03/Jun/22 13:51
Worklog Time Spent: 10m 
  Work Description: jrmccluskey commented on PR #17813:
URL: https://github.com/apache/beam/pull/17813#issuecomment-1145987212

   Run GoPortable PreCommit




Issue Time Tracking
---

Worklog Id: (was: 778098)
Time Spent: 3h 20m  (was: 3h 10m)

> [Go SDK] passert.Count succeeds for empty PCollections.
> ---
>
> Key: BEAM-14546
> URL: https://issues.apache.org/jira/browse/BEAM-14546
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Jack McCluskey
>Priority: P3
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/sdks/v2.39.0/sdks/go/pkg/beam/testing/passert/count.go#L28
> Since it's using a Combine to do the count, it never executes for empty 
> PCollections, and is unable to fail.
> The fix is: when count > 0, plumb the PCollection through as a side input to 
> a DoFn that requires the side input to be non-empty. This would catch the 
> empty PCollection case.
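
The failure mode and the proposed fix can be modeled in a few lines of plain Python. This is a sketch, not Beam or passert code: all function names are hypothetical, lists stand in for PCollections, and the combine is modeled as emitting no output for empty input, as the description above implies.

```python
from typing import List, Sequence


def combine_count(elements: Sequence[object]) -> List[int]:
    """Model of a combine that emits nothing when its input is empty."""
    return [len(elements)] if elements else []


def check_count_combine_only(elements: Sequence[object], want: int) -> List[str]:
    """Original shape: the comparison runs once per combine output.

    For empty input there is no output, so the comparison never runs.
    """
    return [
        f"count = {got}, want {want}"
        for got in combine_count(elements)
        if got != want
    ]


def check_count_with_side_input(elements: Sequence[object], want: int) -> List[str]:
    """Proposed shape: when want > 0, a step that always runs also looks at
    the collection (modeling a side input) and fails if it is empty."""
    errors = check_count_combine_only(elements, want)
    if want > 0 and len(elements) == 0:
        errors.append(f"empty collection, want {want} elements")
    return errors


print(check_count_combine_only([], want=3))
print(check_count_with_side_input([], want=3))
```

The second check does not replace the first; it only covers the case the combine-based comparison cannot see.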



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

