[GitHub] [beam] epicfaace opened a new pull request #11802: [BEAM-9916] Update I/O documentation links and create more complete I/O matrix

2020-05-22 Thread GitBox


epicfaace opened a new pull request #11802:
URL: https://github.com/apache/beam/pull/11802


   Update I/O documentation links to point to the appropriate pydoc / javadoc / 
godoc pages. Also, create a more complete I/O matrix that has all the existing 
I/O's used in all three languages, with a tab to switch between languages, too. 
R: @pabloem 
   
   The page now looks like this:
   
   
![image](https://user-images.githubusercontent.com/1689183/82719976-298a5c80-9c7d-11ea-8614-a6f40e5e3921.png)
   
   Additionally, I've also added links to the built-in I/O connector guides on 
the side bar, because previously they didn't exist:
   
   
![image](https://user-images.githubusercontent.com/1689183/82719978-314a0100-9c7d-11ea-9071-c47f137c7c2b.png)
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[GitHub] [beam] TheNeuralBit merged pull request #11778: [BEAM-8889][release-2.22.0] Upgrades gcsio to 2.1.3

2020-05-22 Thread GitBox


TheNeuralBit merged pull request #11778:
URL: https://github.com/apache/beam/pull/11778


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11778: [BEAM-8889][release-2.22.0] Upgrades gcsio to 2.1.3

2020-05-22 Thread GitBox


TheNeuralBit commented on pull request #11778:
URL: https://github.com/apache/beam/pull/11778#issuecomment-632932510


   D'oh Python PreCommit is failing just because 2.22.0 containers haven't been 
built yet. I'll merge now



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] stale[bot] commented on pull request #10528: [BEAM-9064] Add pytype checks to tox

2020-05-22 Thread GitBox


stale[bot] commented on pull request #10528:
URL: https://github.com/apache/beam/pull/10528#issuecomment-632834816


   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] stale[bot] closed pull request #10528: [BEAM-9064] Add pytype checks to tox

2020-05-22 Thread GitBox


stale[bot] closed pull request #10528:
URL: https://github.com/apache/beam/pull/10528


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on a change in pull request #11780: [BEAM-9948] Uploading mascot to the website

2020-05-22 Thread GitBox


iemejia commented on a change in pull request #11780:
URL: https://github.com/apache/beam/pull/11780#discussion_r429122732



##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 
+
+You can browse the original mascot and its adaptations in different sizes and 
image formats in [this 
directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot).
 
+
+![Model Sheet](/images/mascot/model_sheet.png)

Review comment:
   Can we make this image clickable since it is barely readable in its 
compact form.

##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 

Review comment:
   (nit) upper case Apache **L**icense

##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 
+
+You can browse the original mascot and its adaptations in different sizes and 
image formats in [this 
directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot).
 

Review comment:
   Also I cannot see the svg images in the web browser it seems there is an 
issue with the Apache License headers, e.g. 
https://raw.githubusercontent.com/aijamalnk/beam/mascot-upload/website/www/site/static/images/mascot/beam_mascot.svg

##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 
+
+You can browse the original mascot and its adaptations in different sizes and 
image formats in [this 
directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot).
 

Review comment:
   Link does not work and can we make it relative too e.g. 
`(/images/mascot/)`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11778: [BEAM-8889][release-2.22.0] Upgrades gcsio to 2.1.3

2020-05-22 Thread GitBox


TheNeuralBit commented on pull request #11778:
URL: https://github.com/apache/beam/pull/11778#issuecomment-632831985


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace commented on a change in pull request #11780: [BEAM-9948] Uploading mascot to the website

2020-05-22 Thread GitBox


epicfaace commented on a change in pull request #11780:
URL: https://github.com/apache/beam/pull/11780#discussion_r429376885



##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 

Review comment:
   ```suggestion
   Beam Firefly is cute, friendly, agile, easy to use, and its main objective 
is to fetch streams and batches of data and process it. The mascot’s model 
sheet is useful to understand its features, capabilities, as well as its 
morphology.  The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 
   ```

##
File path: website/www/site/content/en/community/mascot.md
##
@@ -0,0 +1,52 @@
+---
+title: "Beam Mascot"
+---
+
+
+# Beam Mascot Design
+
+This page contains Apache Beam's mascot designs. 
+
+Beam firefly is cute, friendly, agile, easy to use, and its main objective is 
to fetch streams and batches of data and process it. The mascot’s model sheet 
is useful to understand its features, capabilities, as well as its morphology.  
The original design of the mascot was created by [Julian G. 
Bruno](https://www.artstation.com/jbruno) and was donated to the Apache Beam 
community under the [Apache license 
2.0](https://www.apache.org/licenses/LICENSE-2.0). 
+
+You can browse the original mascot and its adaptations in different sizes and 
image formats in [this 
directory](https://github.com/apache/beam/tree/mascot-upload/website/www/site/static/images/mascot).
 
+
+![Model Sheet](/images/mascot/model_sheet.png)
+
+## Beam mascot adaptations
+
+The original Beam mascot - simple and eye-catching, was developed with its 
body shaped like a “B” from Beam’s logo. The ideal choice for printing in any 
size or using for swags. The lit-up tail is another element to play with if you 
want to animate the firefly.
+
+
+
+The adaptation below represents independent learning. The bubble can be 
replaced by any other description. An ideal choice for promoting Beam talks, 
workshops, and webinars where the Beam community learns new things.
+
+
+
+The second adaptation below represents data processing. The data goes through 
the firefly’s tail, filling the entire body with streams of information (for 
example, codes) and when it is done processing the mascot’s entire body outline 
lits up in yellow color to tell that it is loaded with data and is ready to 
process it, while it becomes stronger and faster. Mascot returns to its 
original status after finishing this process.
+
+
+
+
+
+
+
+## Colors
+For Beam mascot we used Apache Beam project’s predefined colors and fonts. 
[This document](/downloads/palette.pdf) has more information. Color Palette 
(RGB), Orange : #FF570B, Yellow : #FFF200, White: #FF, Black: #1D1D1B

Review comment:
   ```suggestion
   For the Beam mascot, we used Apache Beam project’s predefined colors and 
fonts. [This document](/downloads/palette.pdf) has more information. Color 
Palette (RGB), Orange : #FF570B, Yellow : #FFF200, White: #FF, Black: 
#1D1D1B
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on a change in pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.

2020-05-22 Thread GitBox


lostluck commented on a change in pull request #11791:
URL: https://github.com/apache/beam/pull/11791#discussion_r429361145



##
File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
##
@@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot {
return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, 
Name: n.Name, Count: c}
 }
 
-// Split takes a sorted set of potential split indices, selects and actuates
-// split on an appropriate split index, and returns the selected split index
-// if successful. Returns an error when unable to split.
+// Split takes a sorted set of potential split indices and a fraction of the
+// remainder to split at, selects and actuates a split on an appropriate split
+// index, and returns the selected split index if successful. Returns an error
+// when unable to split.
 func (n *DataSource) Split(splits []int64, frac float64) (int64, error) {
-   if splits == nil {
-   return 0, fmt.Errorf("failed to split: requested splits were 
empty")
-   }
if n == nil {
return 0, fmt.Errorf("failed to split at requested splits: 
{%v}, DataSource not initialized", splits)
}
+   if frac > 1.0 {
+   frac = 1.0
+   } else if frac < 0.0 {
+   frac = 0.0
+   }
+
n.mu.Lock()
-   c := n.index
-   // Find the smallest split index that we haven't yet processed, and set
-   // the promised split index to this value.
-   for _, s := range splits {
-   // // Never split on the first element, or the current element.
-   if s > 0 && s > c && s <= n.splitIdx {
-   n.splitIdx = s
-   fs := n.splitIdx
-   n.mu.Unlock()
-   return fs, nil
-   }
+   s, err := splitHelper(n.index, n.splitIdx, splits, frac)
+   if err != nil {
+   n.mu.Unlock()
+   return 0, err
}
+   n.splitIdx = s
+   fs := n.splitIdx
n.mu.Unlock()
-   // If we can't find a suitable split index from the requested choices,
-   // return an error.
-   return 0, fmt.Errorf("failed to split at requested splits: {%v}, 
DataSource at index: %v", splits, c)
+   return fs, nil
+}
+
+// splitHelper is a helper function that finds a split point in a range.
+// currIdx and splitIdx should match the DataSource's index and splitIdx 
fields,
+// and represent the start and end of the splittable range respectively. splits
+// is an optional slice of valid split indices, and if nil then all indices are
+// considered valid split points. frac must be between [0, 1], and represents
+// a fraction of the remaining work that the split point aims to be as close
+// as possible to.
+func splitHelper(currIdx, splitIdx int64, splits []int64, frac float64) 
(int64, error) {
+   // Get split index from fraction. Find the closest index to the 
fraction of
+   // the remainder.
+   var start int64 = 0
+   if currIdx > start {
+   start = currIdx
+   }
+   // This is the first valid split index, since we should never split at 
0 or
+   // at the current element.
+   safeStart := start + 1
+   // The remainder starts at our actual progress (i.e. start), but our 
final
+   // split index has to be >= our safeStart.
+   fracIdx := start + int64(math.Round(frac*float64(splitIdx-start)))
+   if fracIdx < safeStart {
+   fracIdx = safeStart
+   }
+   if splits == nil {
+   // All split points are valid so just split at fraction.
+   return fracIdx, nil
+   } else {
+   // Find the closest unprocessed split point to our fraction.
+   sort.Slice(splits, func(i, j int) bool { return splits[i] < 
splits[j] })

Review comment:
   Consider https://golang.org/pkg/sort/#Ints

##
File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
##
@@ -266,33 +267,85 @@ func (n *DataSource) Progress() ProgressReportSnapshot {
return ProgressReportSnapshot{PID: n.outputPID, ID: n.SID.PtransformID, 
Name: n.Name, Count: c}
 }
 
-// Split takes a sorted set of potential split indices, selects and actuates
-// split on an appropriate split index, and returns the selected split index
-// if successful. Returns an error when unable to split.
+// Split takes a sorted set of potential split indices and a fraction of the
+// remainder to split at, selects and actuates a split on an appropriate split
+// index, and returns the selected split index if successful. Returns an error
+// when unable to split.
 func (n *DataSource) Split(splits []int64, frac float64) (int64, error) {
-   if splits == nil {
-   return 0, fmt.Errorf("failed to split: requested splits were 
empty")
-   }
if n == nil {
return 0, fmt.Errorf("failed to split at 

[GitHub] [beam] chadrik commented on a change in pull request #11632: [BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.*

2020-05-22 Thread GitBox


chadrik commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r429361766



##
File path: sdks/python/apache_beam/dataframe/convert.py
##
@@ -16,13 +16,23 @@
 
 from __future__ import absolute_import
 
+import typing
+
 import inspect
 
 from apache_beam import pvalue
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
 from apache_beam.dataframe import transforms
 
+if typing.TYPE_CHECKING:
+  # pylint: disable=ungrouped-imports
+  from typing import Any
+  from typing import Dict
+  from typing import Tuple
+  from typing import Union

Review comment:
   This note hasn't been addressed:  
   
   > I had a look at the lint errors, and they are legitimate, but scoping the 
imports is not the right solution.
   
   see above. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11782: [BEAM-10056] Fix validation for struct CoGBKs

2020-05-22 Thread GitBox


lostluck commented on pull request #11782:
URL: https://github.com/apache/beam/pull/11782#issuecomment-632807038


   I like that the tests are very nicely separated for the different cases eg. 
Unknown being more permissive than any of the stricter checks. Well done!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck merged pull request #11782: [BEAM-10056] Fix validation for struct CoGBKs

2020-05-22 Thread GitBox


lostluck merged pull request #11782:
URL: https://github.com/apache/beam/pull/11782


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck merged pull request #11768: [BEAM-10051] Move closed reader check after sentinel.

2020-05-22 Thread GitBox


lostluck merged pull request #11768:
URL: https://github.com/apache/beam/pull/11768


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace opened a new pull request #11801: [BEAM-10067] Minify website assets with --minify flag

2020-05-22 Thread GitBox


epicfaace opened a new pull request #11801:
URL: https://github.com/apache/beam/pull/11801


   Use Hugo's `--minify` flag to minify website assets, such as HTML files. 
Size of the `dist` folder has changed from 41M -> 39M. R: @aaltay 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[GitHub] [beam] epicfaace commented on pull request #11759: [BEAM-9926] Docs - show placeholder code snippets if code snippets are unavailable

2020-05-22 Thread GitBox


epicfaace commented on pull request #11759:
URL: https://github.com/apache/beam/pull/11759#issuecomment-632737396


   I don't like this approach. I think it's best to manually add code snippets 
as in #11790.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] santhh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


santhh commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429297864



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set. The
+ * situation is the same with deidentifyTemplateName and deidentifyConfig 
({@link DeidentifyConfig}.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ *
+ * The transform outputs {@link KV} of {@link String} (eg. filename) and 
{@link
+ * DeidentifyContentResponse}, which will contain {@link Table} of results for 
the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+extends PTransform<
+PCollection>, PCollection>> {
+
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  @Nullable
+  public abstract String deidentifyTemplateName();
+
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  @Nullable
+  public abstract DeidentifyConfig deidentifyConfig();
+
+  @Nullable
+  public abstract PCollectionView> csvHeader();
+
+  @Nullable
+  public abstract String csvDelimiter();
+
+  public abstract Integer batchSize();
+
+  public abstract String projectId();
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+public abstract Builder setInspectTemplateName(String inspectTemplateName);
+
+public abstract Builder setCsvHeader(PCollectionView> 
csvHeader);
+
+public abstract Builder setCsvDelimiter(String delimiter);
+
+public abstract Builder setBatchSize(Integer batchSize);
+
+public abstract Builder setProjectId(String projectId);
+
+public abstract Builder setDeidentifyTemplateName(String 
deidentifyTemplateName);
+
+public abstract Builder setInspectConfig(InspectConfig inspectConfig);
+
+public abstract Builder setDeidentifyConfig(DeidentifyConfig 
deidentifyConfig);
+
+public abstract DLPDeidentifyText build();
+  }
+
+  public static DLPDeidentifyText.Builder newBuilder() {
+return new AutoValue_DLPDeidentifyText.Builder();
+  }
+
+  /**
+   * The transform batches the contents of input PCollection and then calls 
Cloud DLP service to
+   * perform the deidentification.
+   *
+   * @param input input PCollection
+   * @return PCollection after transformations
+   */
+  @Override
+  public PCollection> expand(
+  PCollection> input) {
+return input
+.apply(ParDo.of(new MapStringToDlpRow(csvDelimiter(
+.apply("Batch Contents", ParDo.of(new BatchRequestForDLP(batchSize(
+.apply(
+"DLPDeidentify",
+ParDo.of(
+new 

[GitHub] [beam] epicfaace closed pull request #11759: [BEAM-9926] Docs - show placeholder code snippets if code snippets are unavailable

2020-05-22 Thread GitBox


epicfaace closed pull request #11759:
URL: https://github.com/apache/beam/pull/11759


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] santhh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


santhh commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429297298



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set. The
+ * situation is the same with deidentifyTemplateName and deidentifyConfig 
({@link DeidentifyConfig}.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ *
+ * The transform outputs {@link KV} of {@link String} (eg. filename) and 
{@link
+ * DeidentifyContentResponse}, which will contain {@link Table} of results for 
the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+extends PTransform<
+PCollection>, PCollection>> {
+
+  @Nullable
+  public abstract String inspectTemplateName();
+
+  @Nullable
+  public abstract String deidentifyTemplateName();
+
+  @Nullable
+  public abstract InspectConfig inspectConfig();
+
+  @Nullable
+  public abstract DeidentifyConfig deidentifyConfig();
+
+  @Nullable
+  public abstract PCollectionView> csvHeader();
+
+  @Nullable
+  public abstract String csvDelimiter();
+
+  public abstract Integer batchSize();
+
+  public abstract String projectId();
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+public abstract Builder setInspectTemplateName(String inspectTemplateName);
+
+public abstract Builder setCsvHeader(PCollectionView> 
csvHeader);
+
+public abstract Builder setCsvDelimiter(String delimiter);
+
+public abstract Builder setBatchSize(Integer batchSize);
+
+public abstract Builder setProjectId(String projectId);
+
+public abstract Builder setDeidentifyTemplateName(String 
deidentifyTemplateName);
+
+public abstract Builder setInspectConfig(InspectConfig inspectConfig);
+
+public abstract Builder setDeidentifyConfig(DeidentifyConfig 
deidentifyConfig);
+
+public abstract DLPDeidentifyText build();
+  }
+
+  public static DLPDeidentifyText.Builder newBuilder() {
+return new AutoValue_DLPDeidentifyText.Builder();
+  }
+
+  /**
+   * The transform batches the contents of input PCollection and then calls 
Cloud DLP service to
+   * perform the deidentification.
+   *
+   * @param input input PCollection
+   * @return PCollection after transformations
+   */
+  @Override
+  public PCollection> expand(
+  PCollection> input) {
+return input
+.apply(ParDo.of(new MapStringToDlpRow(csvDelimiter(
+.apply("Batch Contents", ParDo.of(new BatchRequestForDLP(batchSize(
+.apply(
+"DLPDeidentify",
+ParDo.of(
+new 

[GitHub] [beam] santhh commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


santhh commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429296219



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set. The
+ * situation is the same with deidentifyTemplateName and deidentifyConfig 
({@link DeidentifyConfig}.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.

Review comment:
   I think moving to the method definition will be more helpful.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-22 Thread GitBox


pulasthi commented on pull request #10888:
URL: https://github.com/apache/beam/pull/10888#issuecomment-632725227


   @iemejia Would we be able to merge this into the master so it would be 
included in the 2.23.0 release? let me know what else needs to happen on my end 
to proceed. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace opened a new pull request #11800: Fix typo copyLicenseScrips -> copyLicenseScripts

2020-05-22 Thread GitBox


epicfaace opened a new pull request #11800:
URL: https://github.com/apache/beam/pull/11800


   R: @youngoli 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[GitHub] [beam] kamilwu commented on pull request #11776: [BEAM-9421] Better documentation of output results from AnnotateText transform

2020-05-22 Thread GitBox


kamilwu commented on pull request #11776:
URL: https://github.com/apache/beam/pull/11776#issuecomment-632724995


   R: @aaltay Could you take a look?
   Website staging is not working at the moment, fix is in progress: 
https://github.com/apache/beam/pull/11796



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pulasthi commented on pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-22 Thread GitBox


pulasthi commented on pull request #10888:
URL: https://github.com/apache/beam/pull/10888#issuecomment-632724710


   @RyanSkraba Thank you so much for your review and feedback



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] naipath commented on pull request #11799: [BEAM-10066] Added support for ValueProvider in RedisConnectionConfiguration

2020-05-22 Thread GitBox


naipath commented on pull request #11799:
URL: https://github.com/apache/beam/pull/11799#issuecomment-632724291


   R: @jbonofre 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests

2020-05-22 Thread GitBox


epicfaace commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-632724234


   Run Python PreCommit
   
   Run Python 3.7 PostCommit
   
   Run Python 3.8 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace commented on a change in pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests

2020-05-22 Thread GitBox


epicfaace commented on a change in pull request #11788:
URL: https://github.com/apache/beam/pull/11788#discussion_r429283691



##
File path: .test-infra/jenkins/README.md
##
@@ -84,6 +84,7 @@ Beam Jenkins overview page: 
[link](https://builds.apache.org/view/A-D/view/Beam/
 | beam_PostCommit_Python35 | 
[cron](https://builds.apache.org/job/beam_PostCommit_Python35), 
[phrase](https://builds.apache.org/job/beam_PostCommit_Python35_PR/) | `Run 
Python 3.5 PostCommit` | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35)
 |
 | beam_PostCommit_Python36 | 
[cron](https://builds.apache.org/job/beam_PostCommit_Python36), 
[phrase](https://builds.apache.org/job/beam_PostCommit_Python36_PR/) | `Run 
Python 3.6 PostCommit` | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36)
 |
 | beam_PostCommit_Python37 | 
[cron](https://builds.apache.org/job/beam_PostCommit_Python37), 
[phrase](https://builds.apache.org/job/beam_PostCommit_Python37_PR/) | `Run 
Python 3.7 PostCommit` | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37)
 |
+| beam_PostCommit_Python38 | 
[cron](https://builds.apache.org/job/beam_PostCommit_Python38), 
[phrase](https://builds.apache.org/job/beam_PostCommit_Python38_PR/) | `Run 
Python 3.7 PostCommit` | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python38/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python38)
 |

Review comment:
   ```suggestion
   | beam_PostCommit_Python38 | 
[cron](https://builds.apache.org/job/beam_PostCommit_Python38), 
[phrase](https://builds.apache.org/job/beam_PostCommit_Python38_PR/) | `Run 
Python 3.8 PostCommit` | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python38/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python38)
 |
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pulasthi commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-22 Thread GitBox


pulasthi commented on a change in pull request #10888:
URL: https://github.com/apache/beam/pull/10888#discussion_r429283778



##
File path: 
runners/twister2/src/main/java/org/apache/beam/runners/twister2/package-info.java
##
@@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Internal implementation of the Beam runner for Apache Twister2. */

Review comment:
   Fixed it, it is good that you caught that, that would go unfixed for a 
long time otherwise





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] epicfaace commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests

2020-05-22 Thread GitBox


epicfaace commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-632723409


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] naipath opened a new pull request #11799: Added support for the ValueProvider for RedisConnectionConfiguration

2020-05-22 Thread GitBox


naipath opened a new pull request #11799:
URL: https://github.com/apache/beam/pull/11799


   This pull request adds support for `ValueProvider` for 
RedisConnectionConfiguration.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[GitHub] [beam] epicfaace opened a new pull request #11798: [BEAM-7370] Upgrade sphinx to 3.0.3

2020-05-22 Thread GitBox


epicfaace opened a new pull request #11798:
URL: https://github.com/apache/beam/pull/11798


   R: @udim.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[GitHub] [beam] epicfaace opened a new pull request #11797: [BEAM-10065] Fix beam release guide template

2020-05-22 Thread GitBox


epicfaace opened a new pull request #11797:
URL: https://github.com/apache/beam/pull/11797


   Fix beam release guide template so that it shows up, and so that it also 
highlights syntax in markdown.
   
   It turns out, the `` part of the template was what was causing 
the display issues.
   
   R: @ibzib 
   
   
![image](https://user-images.githubusercontent.com/1689183/82671081-3f136e00-9c0c-11ea-93eb-732517491a17.png)
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[GitHub] [beam] stale[bot] commented on pull request #11186: [BEAM-9564] Remove insecure ssl options from MongoDBIO

2020-05-22 Thread GitBox


stale[bot] commented on pull request #11186:
URL: https://github.com/apache/beam/pull/11186#issuecomment-632681994


   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429213975



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.

Review comment:
   Thanks for this remark, I added early validation to the code.

##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or 

[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429213709



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.privacy.dlp.v2.Table;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.StateSpec;
+import org.apache.beam.sdk.state.StateSpecs;
+import org.apache.beam.sdk.state.TimeDomain;
+import org.apache.beam.sdk.state.Timer;
+import org.apache.beam.sdk.state.TimerSpec;
+import org.apache.beam.sdk.state.TimerSpecs;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * DoFn batching the input PCollection into bigger requests in order to better 
utilize the Cloud DLP
+ * service.
+ */
+@Experimental
+class BatchRequestForDLP extends DoFn, KV>> {
+  public static final Logger LOG = 
LoggerFactory.getLogger(BatchRequestForDLP.class);
+
+  private final Counter numberOfRowsBagged =
+  Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged");
+  private final Integer batchSize;
+
+  @StateId("elementsBag")
+  private final StateSpec>> elementsBag = 
StateSpecs.bag();
+
+  @TimerId("eventTimer")
+  private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+  public BatchRequestForDLP(Integer batchSize) {
+this.batchSize = batchSize;
+  }
+
+  @ProcessElement
+  public void process(
+  @Element KV element,
+  @StateId("elementsBag") BagState> elementsBag,
+  @TimerId("eventTimer") Timer eventTimer,
+  BoundedWindow w) {
+elementsBag.add(element);
+eventTimer.set(w.maxTimestamp());
+  }
+
+  @OnTimer("eventTimer")
+  public void onTimer(
+  @StateId("elementsBag") BagState> elementsBag,
+  OutputReceiver>> output) {
+String key = elementsBag.read().iterator().next().getKey();
+AtomicInteger bufferSize = new AtomicInteger();
+List rows = new ArrayList<>();
+elementsBag
+.read()
+.forEach(
+element -> {
+  int elementSize = element.getValue().getSerializedSize();
+  boolean clearBuffer = bufferSize.intValue() + elementSize > 
batchSize;
+  if (clearBuffer) {
+numberOfRowsBagged.inc(rows.size());
+LOG.debug("Clear Buffer {} , Key {}", bufferSize.intValue(), 
element.getKey());
+output.output(KV.of(element.getKey(), rows));
+rows.clear();
+bufferSize.set(0);
+  }
+  rows.add(element.getValue());
+  bufferSize.getAndAdd(element.getValue().getSerializedSize());
+});
+if (!rows.isEmpty()) {
+  LOG.debug("Remaining rows {}", rows.size());

Review comment:
   Done, thanks

##
File path: 
sdks/java/extensions/ml/src/test/java/org/apache/beam/sdk/extensions/ml/DLPTextOperationsIT.java
##
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing 

[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429213646



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.privacy.dlp.v2.Table;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.StateSpec;
+import org.apache.beam.sdk.state.StateSpecs;
+import org.apache.beam.sdk.state.TimeDomain;
+import org.apache.beam.sdk.state.Timer;
+import org.apache.beam.sdk.state.TimerSpec;
+import org.apache.beam.sdk.state.TimerSpecs;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * DoFn batching the input PCollection into bigger requests in order to better 
utilize the Cloud DLP
+ * service.
+ */
+@Experimental
+class BatchRequestForDLP extends DoFn, KV>> {
+  public static final Logger LOG = 
LoggerFactory.getLogger(BatchRequestForDLP.class);
+
+  private final Counter numberOfRowsBagged =
+  Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged");
+  private final Integer batchSize;
+
+  @StateId("elementsBag")
+  private final StateSpec>> elementsBag = 
StateSpecs.bag();
+
+  @TimerId("eventTimer")
+  private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+  public BatchRequestForDLP(Integer batchSize) {
+this.batchSize = batchSize;
+  }
+
+  @ProcessElement
+  public void process(
+  @Element KV element,
+  @StateId("elementsBag") BagState> elementsBag,
+  @TimerId("eventTimer") Timer eventTimer,
+  BoundedWindow w) {
+elementsBag.add(element);
+eventTimer.set(w.maxTimestamp());
+  }
+
+  @OnTimer("eventTimer")
+  public void onTimer(
+  @StateId("elementsBag") BagState> elementsBag,
+  OutputReceiver>> output) {
+String key = elementsBag.read().iterator().next().getKey();

Review comment:
   Ok, done!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mwalenia commented on pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


mwalenia commented on pull request #11566:
URL: https://github.com/apache/beam/pull/11566#issuecomment-632663593


   @tysonjh What do you think about it now? I added Javadocs and some tests - 
do you feel some tests are still missing? Let me know!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mwalenia commented on a change in pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-22 Thread GitBox


mwalenia commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r429202254



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import com.google.privacy.dlp.v2.DeidentifyContentRequest;
+import com.google.privacy.dlp.v2.DeidentifyContentResponse;
+import com.google.privacy.dlp.v2.FieldId;
+import com.google.privacy.dlp.v2.InspectConfig;
+import com.google.privacy.dlp.v2.ProjectName;
+import com.google.privacy.dlp.v2.Table;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * A {@link PTransform} connecting to Cloud DLP and deidentifying text 
according to provided
+ * settings. The transform supports both CSV formatted input data and 
unstructured input.
+ *
+ * If the csvHeader property is set, csvDelimiter also should be, else the 
results will be
+ * incorrect. If csvHeader is not set, input is assumed to be unstructured.
+ *
+ * Either inspectTemplateName (String) or inspectConfig {@link 
InspectConfig} need to be set. The
+ * situation is the same with deidentifyTemplateName and deidentifyConfig 
({@link DeidentifyConfig}.
+ *
+ * Batch size defines how big are batches sent to DLP at once in bytes.
+ *
+ * The transform outputs {@link KV} of {@link String} (eg. filename) and 
{@link
+ * DeidentifyContentResponse}, which will contain {@link Table} of results for 
the user to consume.
+ */
+@Experimental
+@AutoValue
+public abstract class DLPDeidentifyText
+extends PTransform<
+PCollection>, PCollection>> {
+
+  @Nullable
+  public abstract String inspectTemplateName();

Review comment:
   Sure, I'll add comments





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kgabryje commented on pull request #11796: [BEAM-10003] Use local code for building code samples on website

2020-05-22 Thread GitBox


kgabryje commented on pull request #11796:
URL: https://github.com/apache/beam/pull/11796#issuecomment-632648552


   Hi @rezarokni , this PR solves the issue submitted by you. Can you take a 
look?
   Also @pabloem , I'd appreciate if you took a look as well.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kgabryje opened a new pull request #11796: [BEAM-10003] Use local code for building code samples on website

2020-05-22 Thread GitBox


kgabryje opened a new pull request #11796:
URL: https://github.com/apache/beam/pull/11796


   Currently code samples are fetched from Beam Github's master branch. So in 
order to make change in code and then write a document piece with code sample, 
it was necessary create 2 PRs - one with change in code and another with change 
in documentation. This PR solves this issue by using local code to build code 
samples for documentation instead of fetching it from Github.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[GitHub] [beam] kamilwu commented on pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model

2020-05-22 Thread GitBox


kamilwu commented on pull request #11795:
URL: https://github.com/apache/beam/pull/11795#issuecomment-632647909


   R: @TheNeuralBit 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model

2020-05-22 Thread GitBox


kamilwu commented on pull request #11795:
URL: https://github.com/apache/beam/pull/11795#issuecomment-632631059


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] purbanow commented on pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK

2020-05-22 Thread GitBox


purbanow commented on pull request #11794:
URL: https://github.com/apache/beam/pull/11794#issuecomment-632624409


   Hi @RyanSkraba will you find a moment to make a CR for this PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] RyanSkraba commented on pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-22 Thread GitBox


RyanSkraba commented on pull request #10888:
URL: https://github.com/apache/beam/pull/10888#issuecomment-632621478


   I ran the `validatesRunner` tests again and the partition /tmp directory 
wasn't filled with zip files!  Thanks for your speedy attention, LGTM!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model

2020-05-22 Thread GitBox


kamilwu commented on pull request #11795:
URL: https://github.com/apache/beam/pull/11795#issuecomment-632612022


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu opened a new pull request #11795: [BEAM-10058] Provide less strict assertion to make the test more resistant against future changes in a model

2020-05-22 Thread GitBox


kamilwu opened a new pull request #11795:
URL: https://github.com/apache/beam/pull/11795


   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[GitHub] [beam] RyanSkraba commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-22 Thread GitBox


RyanSkraba commented on a change in pull request #10888:
URL: https://github.com/apache/beam/pull/10888#discussion_r429155712



##
File path: 
runners/twister2/src/main/java/org/apache/beam/runners/twister2/package-info.java
##
@@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Internal implementation of the Beam runner for Apache Twister2. */

Review comment:
   I am SO sorry, such a minor nitpick but probably the right time to fix 
it!  You probably want to remove or move Apache here.
   ```suggestion
   /** Internal implementation of the Beam runner for Twister2. */
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] RyanSkraba commented on a change in pull request #10888: [BEAM-7304] Twister2 Beam runner

2020-05-22 Thread GitBox


RyanSkraba commented on a change in pull request #10888:
URL: https://github.com/apache/beam/pull/10888#discussion_r429149206



##
File path: 
runners/twister2/src/main/java/org/apache/beam/runners/twister2/BeamBatchWorker.java
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.twister2;
+
+import edu.iu.dsc.tws.api.config.Config;
+import edu.iu.dsc.tws.api.tset.TBase;
+import edu.iu.dsc.tws.api.tset.sets.TSet;
+import edu.iu.dsc.tws.api.tset.sets.batch.BatchTSet;
+import edu.iu.dsc.tws.tset.TBaseGraph;
+import edu.iu.dsc.tws.tset.env.BatchTSetEnvironment;
+import edu.iu.dsc.tws.tset.links.BaseTLink;
+import edu.iu.dsc.tws.tset.sets.BaseTSet;
+import edu.iu.dsc.tws.tset.sets.BuildableTSet;
+import edu.iu.dsc.tws.tset.sets.batch.CachedTSet;
+import edu.iu.dsc.tws.tset.sets.batch.ComputeTSet;
+import edu.iu.dsc.tws.tset.sets.batch.SinkTSet;
+import edu.iu.dsc.tws.tset.worker.BatchTSetIWorker;
+import java.io.Serializable;
+import java.util.ArrayDeque;
+import java.util.Deque;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Set;
+import org.apache.beam.runners.twister2.translators.functions.DoFnFunction;
+import 
org.apache.beam.runners.twister2.translators.functions.Twister2SinkFunction;
+
+/**
+ * The Twister2 worker that will execute the job logic once the job is 
submitted from the run
+ * method.
+ */
+public class BeamBatchWorker implements Serializable, BatchTSetIWorker {
+
+  private static final String SIDEINPUTS = "sideInputs";
+  private static final String LEAVES = "leaves";
+  private static final String GRAPH = "graph";
+  private HashMap> sideInputDataSets;
+  private Set leaves;

Review comment:
   Of course!  Objectively, this is **_only_** a preference, I'm OK for 
leaving it for a subsequent improvement/cleanup.

##
File path: 
runners/twister2/src/main/java/org/apache/beam/runners/twister2/Twister2LegacyRunner.java
##
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.twister2;
+
+import static 
org.apache.beam.runners.core.construction.resources.PipelineResources.detectClassPathResourcesToStage;
+
+import edu.iu.dsc.tws.api.JobConfig;
+import edu.iu.dsc.tws.api.Twister2Job;
+import edu.iu.dsc.tws.api.config.Config;
+import edu.iu.dsc.tws.api.driver.DriverJobState;
+import edu.iu.dsc.tws.api.exceptions.Twister2RuntimeException;
+import edu.iu.dsc.tws.api.scheduler.Twister2JobState;
+import edu.iu.dsc.tws.api.tset.sets.TSet;
+import edu.iu.dsc.tws.api.tset.sets.batch.BatchTSet;
+import edu.iu.dsc.tws.local.LocalSubmitter;
+import edu.iu.dsc.tws.rsched.core.ResourceAllocator;
+import edu.iu.dsc.tws.rsched.job.Twister2Submitter;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.logging.LogManager;
+import java.util.logging.Logger;
+import java.util.stream.Collectors;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipOutputStream;
+import org.apache.beam.runners.core.construction.PTransformMatchers;
+import 

[GitHub] [beam] mwalenia commented on pull request #11775: [BEAM-10050] Change labels checked in VideoIntelligenceIT

2020-05-22 Thread GitBox


mwalenia commented on pull request #11775:
URL: https://github.com/apache/beam/pull/11775#issuecomment-632601472







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mwalenia commented on pull request #11775: [BEAM-10050] Change labels checked in VideoIntelligenceIT

2020-05-22 Thread GitBox


mwalenia commented on pull request #11775:
URL: https://github.com/apache/beam/pull/11775#issuecomment-632580694


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mwalenia commented on pull request #11775: [BEAM-10050] Change labels checked in VideoIntelligenceIT

2020-05-22 Thread GitBox


mwalenia commented on pull request #11775:
URL: https://github.com/apache/beam/pull/11775#issuecomment-632580795


   First precommit passed, launching second



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11777: [BEAM-10054] Fix watermark hold for on_time_pane

2020-05-22 Thread GitBox


mxm commented on pull request #11777:
URL: https://github.com/apache/beam/pull/11777#issuecomment-632572299


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] purbanow commented on a change in pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK

2020-05-22 Thread GitBox


purbanow commented on a change in pull request #11794:
URL: https://github.com/apache/beam/pull/11794#discussion_r429105275



##
File path: 
sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeServiceImpl.java
##
@@ -1,90 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io.snowflake;
-
-import java.sql.Connection;
-import java.sql.PreparedStatement;
-import java.sql.ResultSet;
-import java.sql.SQLException;
-import java.util.function.Consumer;
-import javax.sql.DataSource;
-import org.apache.beam.sdk.transforms.SerializableFunction;
-
-/**
- * Implemenation of {@link org.apache.beam.sdk.io.snowflake.SnowflakeService} 
used in production.
- */
-public class SnowflakeServiceImpl implements SnowflakeService {

Review comment:
   Note: This file was moved to `services/SnowflakeServiceImpl.java`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] purbanow commented on a change in pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK

2020-05-22 Thread GitBox


purbanow commented on a change in pull request #11794:
URL: https://github.com/apache/beam/pull/11794#discussion_r429105275



##
File path: 
sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeServiceImpl.java
##
@@ -1,90 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io.snowflake;
-
-import java.sql.Connection;
-import java.sql.PreparedStatement;
-import java.sql.ResultSet;
-import java.sql.SQLException;
-import java.util.function.Consumer;
-import javax.sql.DataSource;
-import org.apache.beam.sdk.transforms.SerializableFunction;
-
-/**
- * Implemenation of {@link org.apache.beam.sdk.io.snowflake.SnowflakeService} 
used in production.
- */
-public class SnowflakeServiceImpl implements SnowflakeService {

Review comment:
   Note: This file moved to `services/SnowflakeServiceImpl.java`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #11780: [BEAM-9948] Uploading mascot to the website

2020-05-22 Thread GitBox


iemejia commented on pull request #11780:
URL: https://github.com/apache/beam/pull/11780#issuecomment-632558904


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #11780: [BEAM-9948] Uploading mascot to the website

2020-05-22 Thread GitBox


iemejia commented on pull request #11780:
URL: https://github.com/apache/beam/pull/11780#issuecomment-632558819


   Run CommunityMetrics PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] purbanow opened a new pull request #11794: [BEAM-9894] Add batch SnowflakeIO.Write to Java SDK

2020-05-22 Thread GitBox


purbanow opened a new pull request #11794:
URL: https://github.com/apache/beam/pull/11794


   This PR is part of adding SnowflakeIO to Java SDK 
[BEAM-9893](https://issues.apache.org/jira/browse/BEAM-9893). Precisely this PR 
is adding write operation to SnowflakeIO 
[BEAM-9894](https://issues.apache.org/jira/browse/BEAM-9894).
   
   The SnowflakeIO.Write works in the way that puts data on GCS as CSV files 
and then uses Snowflake's JDBC driver to run [COPY INTO 
TABLE](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html) 
statement to move CSV files from GCS to Snowflake table.
   
   As mentioned in the previous 
[PR](https://github.com/apache/beam/pull/11360),  next PR’s will contain 
integration tests, streaming and cross-language support.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski edited a comment on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-626801208


   @markflyhigh Could I ask you for review?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski edited a comment on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-63279


   @tvalentyn Thank you much for your review! I made the requested changes. I'm 
still not sure about the part of the dashboards so please add a word whether my 
answer is relevant.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski commented on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-63279


   @tvalentyn I made the requested changes. I'm not sure about the part of 
dashboards so please add a word whether my answer is sufficient.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski edited a comment on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-63279







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski edited a comment on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-63279


   @tvalentyn Thank you much for your review! I made the requested changes. I'm 
still not sure about the part of the dashboards so please add a word whether my 
answer is sufficient.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r429095714



##
File path: sdks/python/test-suites/dataflow/common.gradle
##
@@ -109,4 +109,21 @@ task validatesRunnerStreamingTests {
   args '-c', ". ${envdir}/bin/activate && 
${runScriptsDir}/run_integration_test.sh $cmdArgs"
 }
   }
-}
\ No newline at end of file
+}
+
+task runPerformanceTest {
+dependsOn 'installGcpTest'
+dependsOn ':sdks:python:sdist'
+
+def test = project.findProperty('test')
+def testOpts = project.findProperty('test-pipeline-options')
+testOpts += " 
--sdk_location=${files(configurations.distTarBall.files).singleFile}"
+
+  doLast {
+exec {
+  workingDir "${project.rootDir}/sdks/python"
+  executable 'sh'
+  args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py 
nosetests --tests=${test}  --test-pipeline-options=\"${testOpts}\" 
--ignore-files \'.*py3\\d?\\.py\$\'"

Review comment:
   I wanted to be on the safe side because in most places (not only in the 
bash scripts) it is added. But when I think about it we absolutely don't need 
to ignore anything, we run just one test. I'll remove it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r429094648



##
File path: sdks/python/test-suites/dataflow/common.gradle
##
@@ -109,4 +109,21 @@ task validatesRunnerStreamingTests {
   args '-c', ". ${envdir}/bin/activate && 
${runScriptsDir}/run_integration_test.sh $cmdArgs"
 }
   }
-}
\ No newline at end of file
+}
+
+task runPerformanceTest {
+dependsOn 'installGcpTest'
+dependsOn ':sdks:python:sdist'
+
+def test = project.findProperty('test')
+def testOpts = project.findProperty('test-pipeline-options')
+testOpts += " 
--sdk_location=${files(configurations.distTarBall.files).singleFile}"
+
+  doLast {
+exec {
+  workingDir "${project.rootDir}/sdks/python"
+  executable 'sh'
+  args '-c', ". ${envdir}/bin/activate && ${envdir}/bin/python setup.py 
nosetests --tests=${test}  --test-pipeline-options=\"${testOpts}\" 
--ignore-files \'.*py3\\d?\\.py\$\'"

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r429088765



##
File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
##
@@ -58,117 +26,59 @@ def dataflowPipelineArgs = [
 temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
 ]
 
-
-// Configurations of each Jenkins job.
-def testConfigurations = [
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py27',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py27 with 1Gb files',
-jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py27_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py2',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py35',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py35 with 1Gb files',
-jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py35_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py35',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py36',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py36 with 1Gb files',
-jobTriggerPhrase  : 'Run Python36 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py36_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py36',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py37',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py37 with 1Gb files',
-jobTriggerPhrase  : 'Run Python37 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py37_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py37',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-]
-
+testConfigurations = []
+pythonVersions = ['27', '35', '36', '37']
+
+for (pythonVersion in pythonVersions) {

Review comment:
   I'm not sure if I understand meaning of "dashboards" correctly here, but 
we are running tasks via proper python modules that have pythonVersion variable 
already set up, so there is no need to set -PpythonVersion manually.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please 

[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r429088004



##
File path: .test-infra/jenkins/job_PerformanceTests_Python.groovy
##
@@ -58,117 +26,59 @@ def dataflowPipelineArgs = [
 temp_location   : 'gs://temp-storage-for-end-to-end-tests/temp-it',
 ]
 
-
-// Configurations of each Jenkins job.
-def testConfigurations = [
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py27',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py27 with 1Gb files',
-jobTriggerPhrase  : 'Run Python27 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py27_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py2',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py35',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py35 with 1Gb files',
-jobTriggerPhrase  : 'Run Python35 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py35_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py35',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py36',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py36 with 1Gb files',
-jobTriggerPhrase  : 'Run Python36 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py36_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py36',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-new PerformanceTestConfigurations(
-jobName   : 'beam_PerformanceTests_WordCountIT_Py37',
-jobDescription: 'Python SDK Performance Test - Run WordCountIT in 
Py37 with 1Gb files',
-jobTriggerPhrase  : 'Run Python37 WordCountIT Performance Test',
-resultTable   : 'beam_performance.wordcount_py37_pkb_results',
-test  : 
'apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it',
-itModule  : ':sdks:python:test-suites:dataflow:py37',
-extraPipelineArgs : dataflowPipelineArgs + [
-input: 
'gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.*', // 1Gb
-output: 
'gs://temp-storage-for-end-to-end-tests/py-it-cloud/output',
-expect_checksum: 'ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710',
-num_workers: '10',
-autoscaling_algorithm: 'NONE',  // Disable autoscale the worker 
pool.
-],
-),
-]
-
+testConfigurations = []
+pythonVersions = ['27', '35', '36', '37']

Review comment:
   I tried to keep the effect of the job as close to original as possible. 
I agree that 2 versions of python sound sufficient.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-22 Thread GitBox


piotr-szuberski commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r429087219



##
File path: sdks/python/test-suites/dataflow/py2/build.gradle
##
@@ -205,3 +205,20 @@ task chicagoTaxiExample {
 }
   }
 }
+
+task runPerformanceTest {

Review comment:
   Sure, there can even be more code moved to common.gradle





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #11793: [BEAM-10064] Fix google3 import error for BEAM-9383

2020-05-22 Thread GitBox


ihji commented on pull request #11793:
URL: https://github.com/apache/beam/pull/11793#issuecomment-632517277


   R: @angoenka 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji opened a new pull request #11793: [BEAM-10064] Fix google3 import error for BEAM-9383

2020-05-22 Thread GitBox


ihji opened a new pull request #11793:
URL: https://github.com/apache/beam/pull/11793


   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[GitHub] [beam] chadrik commented on a change in pull request #11070: [BEAM-8280] Blog post: Python typing changes

2020-05-22 Thread GitBox


chadrik commented on a change in pull request #11070:
URL: https://github.com/apache/beam/pull/11070#discussion_r429063034



##
File path: website/src/_posts/2020-03-06-python-typing.md
##
@@ -0,0 +1,117 @@
+---
+layout: post
+title:  "Python SDK Typing Changes"
+date:   2020-03-06 00:00:01 -0800
+excerpt_separator: 
+categories: blog python typing
+authors:
+  - chadrik
+  - udim
+
+---
+
+
+TODO excerpt
+
+
+
+Python supports type annotations on functions (PEP 484). Static type checkers,
+such as mypy, are used to verify adherence to these types.
+For example:
+```py
+def f(v: int) -> int:
+  return v[0]
+```
+Running mypy on the above code will give the error:
+`Value of type "int" is not indexable`.
+
+We've recently made changes to Beam in 2 areas:
+
+Adding type hints throughout Beam. TODO expand
+
+Second, we've added support for Python 3 type annotations. This allows SDK
+users to specify a DoFn's type hints in one place. 
+We've also expanded Beam's support of `typing` module types.
+
+For more background see: 
+[Ensuring Python Type 
Safety](https://beam.apache.org/documentation/sdks/python-type-safety/).
+
+# Beam Is Typed
+
+TODO
+
+# New Ways to Annotate
+
+## Python 3 Syntax Annotations
+
+Coming in Beam 2.21 (BEAM-8280), you will be able to use Python annotation
+syntax to specify input and output types.
+
+For example, this new form:
+```py
+class MyDoFn(beam.DoFn):
+  def process(self, element: int) -> typing.Text:
+yield str(element)
+```
+is equivalent to this:
+```py
+@beam.typehints.with_input_types(int)
+@beam.typehints.with_output_types(typing.Text)
+class MyDoFn(beam.DoFn):
+  def process(self, element):
+yield str(element)
+```
+
+One of the advantages of the new form is that you may already be using it
+in tandem with a static type checker such as mypy, thus getting additional
+type checking for free.
+
+This feature will be enabled by default, and there will be 2 mechanisms in
+place to disable it:
+1. Calling `apache_beam.typehints.disable_type_annotations()` before pipeline
+construction will disable the new feature completely.
+1. Decorating a function with `@apache_beam.typehints.no_annotations` will
+tell Beam to ignore annotations for it. 
+ 
+Uses of Beam's `with_input_type`, `with_output_type` methods and decorators 
will 
+still work and take precedence over annotations.
+
+Sidebar:
+
+> You might ask: couldn't we use mypy to type check Beam pipelines? The main 
issue
+is that such a tool would have to understand type relations between
+pipeline graph nodes, e.g., that the type of element passed to a transform
+should be consistent with its annotated input type.

Review comment:
   I went very deep on making transforms and collections generic, and I got 
pretty close to making it work, but there was still a need for a plug-in. IIRC, 
it wasn’t possible for mypy to propagate all of the type information when the 
apply() method was used. When we get past this current push I’ll revive that 
old experiment and show you where I got. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim commented on pull request #11070: [BEAM-8280] Blog post: Python typing changes

2020-05-22 Thread GitBox


udim commented on pull request #11070:
URL: https://github.com/apache/beam/pull/11070#issuecomment-632506918


   CC: @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim commented on pull request #11070: [BEAM-8280] Blog post: Python typing changes

2020-05-22 Thread GitBox


udim commented on pull request #11070:
URL: https://github.com/apache/beam/pull/11070#issuecomment-632506706


   PTAL. Preview is here:
   
http://apache-beam-website-pull-requests.storage.googleapis.com/11070/blog/python-typing/index.html



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org