[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393527
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 14:43
Start Date: 26/Feb/20 14:43
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393527)
Time Spent: 5h 50m  (was: 5h 40m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Elias Djurfeldt
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393464
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:46
Start Date: 26/Feb/20 13:46
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591433459
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 393464)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393463
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:46
Start Date: 26/Feb/20 13:46
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591433644
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 393463)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393462
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:45
Start Date: 26/Feb/20 13:45
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591433459
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 393462)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393450
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:37
Start Date: 26/Feb/20 13:37
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591429417
 
 
   Thanks for the thorough review!
 



Issue Time Tracking
---

Worklog Id: (was: 393450)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393445
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:31
Start Date: 26/Feb/20 13:31
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591426682
 
 
   LGTM. I've left two small comments, but consider them non-blocking.
   
   Thanks for merging the PTransforms I mentioned and for documenting the 
batch_size parameters.
 



Issue Time Tracking
---

Worklog Id: (was: 393445)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393436
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:25
Start Date: 26/Feb/20 13:25
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r384488945
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml_test.py
 ##
 @@ -0,0 +1,252 @@
+# pylint: skip-file
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for visionml."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+from __future__ import unicode_literals
+
+import logging
+import unittest
+
+import mock
+
+import apache_beam as beam
+from apache_beam.metrics import MetricsFilter
+from apache_beam.typehints.decorators import TypeCheckError
+
+# Protect against environments where vision lib is not available.
+try:
+  from google.cloud.vision import ImageAnnotatorClient
+  from google.cloud import vision
+  from apache_beam.ml.gcp import visionml
+except ImportError:
+  ImageAnnotatorClient = None
+
+
+@unittest.skipIf(
+    ImageAnnotatorClient is None, 'Vision dependencies are not installed')
+class VisionTest(unittest.TestCase):
+  def setUp(self):
+    self._mock_client = mock.Mock()
+    self.m2 = mock.Mock()
+    self._mock_client.batch_annotate_images.return_value = None
+
+    feature_type = vision.enums.Feature.Type.TEXT_DETECTION
+    self.features = [
+        vision.types.Feature(
+            type=feature_type, max_results=3, model="builtin/stable")
+    ]
+    self.img_ctx = vision.types.ImageContext()
+    self.min_batch_size = 1
+    self.max_batch_size = 1
+
+  def test_AnnotateImage_URIs(self):
+    images_to_annotate = [
+        'gs://cloud-samples-data/vision/ocr/sign.jpg',
+        'gs://cloud-samples-data/vision/ocr/sign.jpg'
+    ]
+
+    expected_counter = len(images_to_annotate)
+    with mock.patch.object(visionml,
+                           'get_vision_client',
+                           return_value=self._mock_client):
+      p = beam.Pipeline()
+      _ = (
+          p
+          | "Create data" >> beam.Create(images_to_annotate)
+          | "Annotate image" >> visionml.AnnotateImage(
+              self.features,
+              min_batch_size=self.min_batch_size,
+              max_batch_size=self.max_batch_size))
+      result = p.run()
+      result.wait_until_finish()
+
+      read_filter = MetricsFilter().with_name('API Calls')
+      query_result = result.metrics().query(read_filter)
+      if query_result['counters']:
+        read_counter = query_result['counters'][0]
+        self.assertTrue(read_counter.committed == expected_counter)
 
 Review comment:
   It's safer to use the `result` property rather than `committed`, because 
`result` falls back to `attempted` when `committed` is empty. See the code:
   
https://github.com/apache/beam/blob/f9f3159ef6da97b41227d5fa52ac1c176179eb09/sdks/python/apache_beam/metrics/execution.py#L129-L134
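   
   The fallback described in this review comment can be sketched without Beam installed. The class below is a hypothetical stand-in, not Beam's actual metric result class; it assumes only what the comment states, namely that `result` returns `committed` when it is populated and `attempted` otherwise.

```python
class MetricResultSketch(object):
  """Hypothetical stand-in for a metric result with two values."""

  def __init__(self, attempted, committed=None):
    self.attempted = attempted
    self.committed = committed

  @property
  def result(self):
    # Fall back to the attempted value when committed was not populated,
    # for example on a runner that does not report committed metrics.
    return self.committed if self.committed else self.attempted


# A runner that reports only attempted metrics.
only_attempted = MetricResultSketch(attempted=2)
# A runner that reports both values.
both = MetricResultSketch(attempted=2, committed=2)

expected_counter = 2
# Comparing `committed` directly breaks when it is empty ...
committed_matches = only_attempted.committed == expected_counter
# ... while `result` gives the same answer in both cases.
result_matches = only_attempted.result == expected_counter
```

   Applied to the test quoted above, this would mean asserting on `read_counter.result` instead of `read_counter.committed`.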
 



Issue Time Tracking
---

Worklog Id: (was: 393436)
Time Spent: 4h 50m  (was: 4h 40m)


[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393423
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 13:16
Start Date: 26/Feb/20 13:16
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r384484177
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml_test.py
 ##
 @@ -0,0 +1,252 @@
+# pylint: skip-file
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for visionml."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+from __future__ import unicode_literals
+
+import logging
+import unittest
+
+import mock
+
+import apache_beam as beam
+from apache_beam.metrics import MetricsFilter
+from apache_beam.typehints.decorators import TypeCheckError
+
+# Protect against environments where vision lib is not available.
+try:
+  from google.cloud.vision import ImageAnnotatorClient
+  from google.cloud import vision
+  from apache_beam.ml.gcp import visionml
+except ImportError:
+  ImageAnnotatorClient = None
+
+
+@unittest.skipIf(
+    ImageAnnotatorClient is None, 'Vision dependencies are not installed')
+class VisionTest(unittest.TestCase):
+  def setUp(self):
+    self._mock_client = mock.Mock()
+    self.m2 = mock.Mock()
 
 Review comment:
   This mock seems to be unused
 



Issue Time Tracking
---

Worklog Id: (was: 393423)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393303
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 09:29
Start Date: 26/Feb/20 09:29
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591328033
 
 
   @aaltay I noticed an issue in `setup.py` for the videointelligence library: 
it was missing a comma. I added that fix in this PR if that's okay.
   
   I also updated `CHANGES.md` to specify that the videointelligence features 
are for the Python SDK.
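   
   A missing comma in a list of requirement strings is easy to overlook because Python silently joins adjacent string literals. The sketch below uses made-up package names and version bounds, not the actual entries from Beam's `setup.py`.

```python
# Hypothetical requirement entries; names and versions are illustrative only.
broken = [
    'example-videointelligence>=1.0.0,<2.0.0'  # <- missing comma
    'example-vision>=0.1.0,<0.2.0',
]

fixed = [
    'example-videointelligence>=1.0.0,<2.0.0',
    'example-vision>=0.1.0,<0.2.0',
]

# Adjacent string literals are concatenated, so `broken` holds a single
# string that is not a valid requirement, while `fixed` holds two entries.
print(len(broken), broken[0])
print(len(fixed))
```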
 



Issue Time Tracking
---

Worklog Id: (was: 393303)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393234
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 08:13
Start Date: 26/Feb/20 08:13
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591295088
 
 
   > Cool, nice to hear that!
   > 
   > One more thing that came to my mind.
   > Would it be possible to merge the `AnnotateImage` and `BatchAnnotateImage` 
transforms? I think the user has little interest in configuring 
`min_batch_size` and `max_batch_size`, because, in the end, responses are 
flattened (the output PCollection is of type `AnnotateImageResponse`, not 
`List[AnnotateImageResponse]`). Also, I expect that `batch_annotate_images` 
with `len(requests) == 1` works the same way as `annotate_image`. But maybe 
I'm wrong.
   
   Good guess! Indeed, looking at the source code for `annotate_image`, it is 
essentially just a wrapper around `batch_annotate_images` like so `r = 
self.batch_annotate_images([request], retry=retry, timeout=timeout)`.
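   
   A minimal sketch of that wrapper relationship, using a hypothetical `FakeAnnotatorClient` rather than the real `ImageAnnotatorClient`. Only the delegation line quoted above comes from the thread; the request and response shapes and the `gs://bucket/a.jpg` path are invented for illustration.

```python
class FakeAnnotatorClient(object):
  """Hypothetical client that records how many batch calls it receives."""

  def __init__(self):
    self.batch_calls = 0

  def batch_annotate_images(self, requests, retry=None, timeout=None):
    self.batch_calls += 1
    # One placeholder response per request, in order.
    return ['annotated:%s' % request for request in requests]

  def annotate_image(self, request, retry=None, timeout=None):
    # The single-image call delegates to the batch call with a
    # one-element list and unwraps the only response.
    r = self.batch_annotate_images([request], retry=retry, timeout=timeout)
    return r[0]


client = FakeAnnotatorClient()
single = client.annotate_image('gs://bucket/a.jpg')
batched = client.batch_annotate_images(['gs://bucket/a.jpg'])
```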
   
   I'll need to keep the `min_batch_size` and `max_batch_size` parameters for 
testing purposes, in order to get a predictable count of the API calls; 
otherwise the number of calls depends on the `BatchElements` transform. But 
I'll make sure to document these two parameters as such.
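   
   Why pinning both sizes makes the call count predictable can be shown with a simplified fixed-size batcher. This is not Beam's `BatchElements`, which picks batch sizes between the two bounds at run time; `fixed_batches` is a hypothetical helper that models the case where `min_batch_size == max_batch_size`.

```python
def fixed_batches(elements, batch_size):
  """Splits elements into consecutive batches of at most batch_size."""
  return [
      elements[i:i + batch_size] for i in range(0, len(elements), batch_size)
  ]


images = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']

# One API call per batch: with the batch size pinned to 1, the number of
# calls equals the number of input images.
calls_with_size_1 = len(fixed_batches(images, 1))
# With a larger pinned size the count is still known in advance.
calls_with_size_3 = len(fixed_batches(images, 3))
```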
 



Issue Time Tracking
---

Worklog Id: (was: 393234)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=393232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393232
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 26/Feb/20 08:12
Start Date: 26/Feb/20 08:12
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-591295088
 
 
   > Cool, nice to hear that!
   > 
   > One more thing that came to my mind.
   > Would it be possible to merge the `AnnotateImage` and `BatchAnnotateImage` 
transforms? I think the user has little interest in configuring 
`min_batch_size` and `max_batch_size`, because, in the end, responses are 
flattened (the output PCollection is of type `AnnotateImageResponse`, not 
`List[AnnotateImageResponse]`). Also, I expect that `batch_annotate_images` 
with `len(requests) == 1` works the same way as `annotate_image`. But maybe 
I'm wrong.
   
   Indeed, looking at the source code for `annotate_image`, it is essentially 
just a wrapper around `batch_annotate_images` like so `r = 
self.batch_annotate_images([request], retry=retry, timeout=timeout)`.
   
   I'll need to keep the `min_batch_size` and `max_batch_size` parameters for 
testing purposes, in order to get a predictable count of the API calls; 
otherwise the number of calls depends on the `BatchElements` transform. But 
I'll make sure to document these two parameters as such.
 



Issue Time Tracking
---

Worklog Id: (was: 393232)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392804
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 18:30
Start Date: 25/Feb/20 18:30
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590998207
 
 
   > Overall direction LGTM. One comment on the setup.py
   > 
   > 1. Do you need a lower limit?
   > 2. <0.43.0 ? if we don't want to automatically pick up the next version.
   
   1. Yes, sorry, I intend to test a few different versions below 0.42.0 
before finalising the PR, but I wanted to make sure that the PR was heading in 
the correct general direction before doing that.
   
   2. Good catch, that's a typo, thanks!
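   
   The two review points amount to a bounded version range. A sketch of such a check using only the standard library follows; the exclusive upper bound `0.43.0` is the one discussed above, while the lower bound `0.38.0` and the helper names are placeholders, since the thread does not say which lower limit was chosen.

```python
def parse_version(text):
  """Turns a dotted version such as '0.42.0' into a comparable tuple."""
  return tuple(int(part) for part in text.split('.'))


LOWER = parse_version('0.38.0')  # placeholder lower limit (assumption)
UPPER = parse_version('0.43.0')  # exclusive upper limit from the review


def in_supported_range(version):
  # Equivalent in spirit to a '>=LOWER,<UPPER' requirement specifier:
  # the next release is not picked up automatically.
  return LOWER <= parse_version(version) < UPPER
```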
 



Issue Time Tracking
---

Worklog Id: (was: 392804)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392803
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 18:30
Start Date: 25/Feb/20 18:30
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590998207
 
 
   > Overall direction LGTM. One comment on the setup.py
   > 
   > 1. Do you need a lower limit?
   > 2. <0.43.0 ? if we don't want to automatically pick up the next version.
   
   1. Yes, sorry, I intend to test a few different versions below 0.42.0 
before finalising the PR, but I wanted to make sure that the PR was heading in 
the correct general direction before doing that.
   
   2. Good catch, that's a typo, thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 392803)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392802
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 18:29
Start Date: 25/Feb/20 18:29
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590998207
 
 
   > Overall direction LGTM. One comment on the setup.py
   > 
   > 1. Do you need a lower limit?
   > 2. <0.43.0 ? if we don't want to automatically pick up the next version.
   
   1. Yes, sorry, I intend to test a few different versions below 0.42.0 
before finalising the PR, but I wanted to make sure that the PR was heading in 
the correct general direction before doing that.
   
   2. Good catch, that's a typo, thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 392802)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392766
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 18:29
Start Date: 25/Feb/20 18:29
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590998207
 
 
   > Overall direction LGTM. One comment on the setup.py
   > 
   > 1. Do you need a lower limit?
   > 2. <0.43.0 ? if we don't want to automatically pick up the next version.
   
   Yes, sorry, I intend to test a few different versions below 0.42.0 
before finalising the PR, but I wanted to make sure that the PR was heading in 
the correct general direction before doing that.
 



Issue Time Tracking
---

Worklog Id: (was: 392766)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392763
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 18:28
Start Date: 25/Feb/20 18:28
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590998207
 
 
   > Overall direction LGTM. One comment on the setup.py
   > 
   > 1. Do you need a lower limit?
   > 2. <0.43.0 ? if we don't want to automatically pick up the next version.
   
   Yes, sorry, I mean to test a few different versions below 0.43.0 before 
finalising the PR, but I wanted to make sure that the PR was heading in the 
right general direction before doing that.
 



Issue Time Tracking
---

Worklog Id: (was: 392763)
Time Spent: 3h 20m  (was: 3h 10m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392753
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 18:14
Start Date: 25/Feb/20 18:14
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590992133
 
 
   Overall direction LGTM. One comment on the setup.py
   
   1. Do you need a lower limit?
   2. <0.43.0 ? if we don't want to automatically pick up the next version.
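
   To make the two questions concrete, here is a rough sketch of what a 
bounded requirement such as `google-cloud-vision>=LOWER,<0.43.0` would accept. 
The lower bound below is only a placeholder (it is not taken from the PR), and 
`parse` and `in_range` are hypothetical helpers; real installers follow PEP 440, 
which covers more cases such as pre-releases.

```python
def parse(version: str) -> tuple:
    """Turn a simple dotted version such as '0.42.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, i.e. the '>=lower,<upper' rule."""
    return parse(lower) <= parse(version) < parse(upper)


LOWER = "0.38.0"  # placeholder lower bound, NOT taken from the PR
UPPER = "0.43.0"  # exclusive upper bound suggested in the review

for candidate in ["0.37.0", "0.42.0", "0.43.0"]:
    # Only versions inside [LOWER, UPPER) would be installable.
    print(candidate, in_range(candidate, LOWER, UPPER))
```

   With an exclusive upper bound, a future 0.43.0 release would not be picked 
up until the bound is raised deliberately.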
   
 



Issue Time Tracking
---

Worklog Id: (was: 392753)
Time Spent: 3h 10m  (was: 3h)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392668=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392668
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 16:13
Start Date: 25/Feb/20 16:13
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590947582
 
 
   Cool, nice to hear that!
   
   One more thing that came to my mind.
   Would it be possible to merge the `AnnotateImage` and `BatchAnnotateImage` 
transforms? I think the user has little interest in configuring 
`min_batch_size` and `max_batch_size` because, in the end, the responses are 
flattened (the output PCollection is of type `AnnotateImageResponse`, not 
`List[AnnotateImageResponse]`). Also, I expect that `batch_annotate_images` 
with `len(requests) == 1` works the same way as `annotate_image`. But maybe 
I'm wrong.
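
   A minimal plain-Python sketch of the point about flattening, assuming the 
batch call returns one response per request in order. `fake_batch_annotate` is 
a hypothetical stand-in and does not call the Vision API; `annotate_flattened` 
is likewise only an illustration, not the transform in this PR.

```python
from typing import Callable, Iterable, Iterator, List


def fake_batch_annotate(requests: List[str]) -> List[str]:
    """Stand-in for a batch API call: one response per request, in order."""
    return ["response for " + r for r in requests]


def annotate_flattened(
    requests: Iterable[str],
    batch_call: Callable[[List[str]], List[str]],
    max_batch_size: int,
) -> Iterator[str]:
    """Send requests in batches but yield the responses one at a time."""
    batch: List[str] = []
    for request in requests:
        batch.append(request)
        if len(batch) == max_batch_size:
            yield from batch_call(batch)
            batch = []
    if batch:  # send the final, possibly smaller, batch
        yield from batch_call(batch)


requests = ["img-%d" % i for i in range(7)]  # hypothetical inputs
one_by_one = list(annotate_flattened(requests, fake_batch_annotate, 1))
in_fives = list(annotate_flattened(requests, fake_batch_annotate, 5))
print(one_by_one == in_fives)
```

   Under that assumption the caller sees the same flat sequence of responses 
whatever the batch size, which is why the batch size looks like an internal 
detail rather than something the user needs to configure.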
 



Issue Time Tracking
---

Worklog Id: (was: 392668)
Time Spent: 3h  (was: 2h 50m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392608
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 14:59
Start Date: 25/Feb/20 14:59
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590909015
 
 
   Cheers for the tip about `BatchElements`. Using that reduces code 
duplication quite a bit as we can offload the creation of the 
`AnnotateImageRequest` to an earlier step in the `PTransform`, leaving us with 
only one `DoFn` for the two Batch transforms.
 



Issue Time Tracking
---

Worklog Id: (was: 392608)
Time Spent: 2h 50m  (was: 2h 40m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392549=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392549
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 13:31
Start Date: 25/Feb/20 13:31
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590867589
 
 
   > If you think that batches of <=5 items are useful, then perhaps we should 
go with sync (online) batch annotation rather than async.
   
   I don't see any obstacles.
   
   Also take a look at the built-in `BatchElements` PTransform (found in 
`sdks/python/apache_beam/transforms/util.py`), which could be useful in our 
case. This transform consumes a PCollection of element type T and produces a 
PCollection of type List[T]. The maximum size of each list can be configured by 
setting the `max_batch_size` parameter. This would speed things up a bit: it's 
much better to send one request with 5 files than 5 separate requests.
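
   As a rough illustration of the T to List[T] shape change, here is a 
plain-Python sketch. It is not Beam's implementation: `chunk` is a hypothetical 
helper that always fills batches up to the maximum, whereas `BatchElements` 
chooses batch sizes between its configured bounds. The URIs are made up.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk(elements: Iterable[T], max_batch_size: int = 5) -> Iterator[List[T]]:
    """Yield lists of at most max_batch_size consecutive elements."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[T] = []
    for element in elements:
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Hypothetical image URIs, used only for illustration.
uris = ["gs://example-bucket/img-%d.jpg" % i for i in range(12)]
batches = list(chunk(uris, max_batch_size=5))
print([len(b) for b in batches])
```

   Twelve inputs become three requests instead of twelve, which is where the 
saving would come from.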
 



Issue Time Tracking
---

Worklog Id: (was: 392549)
Time Spent: 2h 40m  (was: 2.5h)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392543
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 13:22
Start Date: 25/Feb/20 13:22
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590859920
 
 
   > I wonder whether it makes sense to support async (offline) annotation from 
Beam's perspective. Let's suppose that we don't return anything and 
`AsyncBatchAnnotateImage` essentially becomes a sink. In that case, Beam, as a 
data processing framework, doesn't provide much value. If all the transform 
does is just sending a request, there is no point in executing it on multiple 
Dataflow or Flink workers. We'd rather use a task orchestration tool, like 
Apache Airflow.
   > 
   > On the other hand, the main advantage of sync (online) annotation is that 
it returns results relatively fast for further processing.
   
   I agree that those sorts of tasks are better suited to e.g. Airflow, and 
that was something that crossed my mind too. 
   
   If you think that batches of <=5 items are useful, then perhaps we should 
go with sync (online) batch annotation rather than async. But I guess the 
question then is how much more useful (or efficient) a batch of 5 image 
annotations is compared with 5 separate annotations.
 



Issue Time Tracking
---

Worklog Id: (was: 392543)
Time Spent: 2.5h  (was: 2h 20m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392539
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 13:13
Start Date: 25/Feb/20 13:13
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590859920
 
 
   > I wonder whether it makes sense to support async (offline) annotation from 
Beam's perspective. Let's suppose that we don't return anything and 
`AsyncBatchAnnotateImage` essentially becomes a sink. In that case, Beam, as a 
data processing framework, doesn't provide much value. If all the transform 
does is just sending a request, there is no point in executing it on multiple 
Dataflow or Flink workers. We'd rather use a task orchestration tool, like 
Apache Airflow.
   > 
   > On the other hand, the main advantage of sync (online) annotation is that 
it returns results relatively fast for further processing.
   
   I agree that those sorts of tasks are better suited to e.g. Airflow, and 
that was something that crossed my mind too. 
   
   If you think that batches of <=5 items are useful, then perhaps we should 
go with sync (online) batch annotation rather than async.
 



Issue Time Tracking
---

Worklog Id: (was: 392539)
Time Spent: 2h 20m  (was: 2h 10m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392538
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 13:11
Start Date: 25/Feb/20 13:11
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r383868124
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml.py
 ##
 @@ -0,0 +1,526 @@
+# pylint: skip-file
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A connector for sending API requests to the GCP Vision API.
+"""
+
+from __future__ import absolute_import
+
+from typing import Optional
+from typing import Tuple
+from typing import Union
+
+from future.utils import binary_type
+from future.utils import text_type
+
+from apache_beam import typehints
+from apache_beam.metrics import Metrics
+from apache_beam.transforms import DoFn
+from apache_beam.transforms import ParDo
+from apache_beam.transforms import PTransform
+from cachetools.func import ttl_cache
+
+try:
+  from google.cloud import vision
+except ImportError:
+  raise ImportError(
+      'Google Cloud Vision not supported for this execution environment '
+      '(could not import google.cloud.vision).')
+
+__all__ = [
+    'AnnotateImage',
+    'AnnotateImageWithContext',
+    'AsyncBatchAnnotateImage',
+    'AsyncBatchAnnotateImageWithContext'
+]
+
+
+@ttl_cache(maxsize=128, ttl=3600)
+def get_vision_client(client_options=None):
+  """Returns a Cloud Vision API client."""
+  _client = vision.ImageAnnotatorClient(client_options=client_options)
+  return _client
+
+
+class AnnotateImage(PTransform):
+  """A ``PTransform`` for annotating images using the GCP Vision API
+  ref: https://cloud.google.com/vision/docs/
+
+  Sends each element to the GCP Vision API. Element is a
+  Union[text_type, binary_type] of either an URI (e.g. a GCS URI) or binary_type
+  base64-encoded image data.
+  Accepts an `AsDict` side input that maps each image to an image context.
+  """
+  def __init__(
+      self,
+      features,
+      retry=None,
+      timeout=120,
+      client_options=None,
+      context_side_input=None):
+    """
+    Args:
+      features: (List[``vision.types.Feature.enums.Feature``]) Required.
+        The Vision API features to detect
+      retry: (google.api_core.retry.Retry) Optional.
+        A retry object used to retry requests.
+        If None is specified (default), requests will not be retried.
+      timeout: (float) Optional.
+        The time in seconds to wait for the response from the
+        Vision API.
+      client_options: (Union[dict, google.api_core.client_options.ClientOptions])
+        Client options used to set user options on the client.
+        API Endpoint should be set through client_options.
+      context_side_input: (beam.pvalue.AsDict) Optional.
+        An ``AsDict`` of a PCollection to be passed to the
+        _ImageAnnotateFn as the image context mapping containing additional
+        image context and/or feature-specific parameters.
+        Example usage::
+
+          image_contexts =
+            [(''gs://cloud-samples-data/vision/ocr/sign.jpg'', Union[dict,
+            ``vision.types.ImageContext()``]),
+            (''gs://cloud-samples-data/vision/ocr/sign.jpg'', Union[dict,
+            ``vision.types.ImageContext()``]),]
+
+          context_side_input =
+            (
+              p
+              | "Image contexts" >> beam.Create(image_contexts)
+            )
+
+          visionml.AnnotateImage(features,
+            context_side_input=beam.pvalue.AsDict(context_side_input)))
+    """
+    super(AnnotateImage, self).__init__()
+    self.features = features
+    self.retry = retry
+    self.timeout = timeout
+    self.client_options = client_options
+    self.context_side_input = context_side_input
+
+  def expand(self, pvalue):
+    return pvalue | ParDo(
+        _ImageAnnotateFn(
+            features=self.features,
+

[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392537
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 13:09
Start Date: 25/Feb/20 13:09
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590858166
 
 
   > Synchronous batch annotation only supports batching <=5 elements at a 
time, so is not very useful
   
   From my perspective, there should be no problem in sending a request 
containing up to 5 files to be annotated, then waiting for the result and 
sending another request.
   
 



Issue Time Tracking
---

Worklog Id: (was: 392537)
Time Spent: 2h  (was: 1h 50m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392536
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 13:08
Start Date: 25/Feb/20 13:08
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590857909
 
 
   I wonder whether it makes sense to support async (offline) annotation from 
Beam's perspective. Let's suppose that we don't return anything and 
`AsyncBatchAnnotateImage` essentially becomes a sink. In that case, Beam, as a 
data processing framework, doesn't provide much value. If all the transform 
does is just sending a request, there is no point in executing it on multiple 
Dataflow or Flink workers. We'd rather use a task orchestration tool, like 
Apache Airflow.
   
   On the other hand, the main advantage of sync (online) annotation is that it 
returns results relatively fast for further processing. 
   
 



Issue Time Tracking
---

Worklog Id: (was: 392536)
Time Spent: 1h 50m  (was: 1h 40m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392525
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 12:37
Start Date: 25/Feb/20 12:37
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r383808713
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml.py
 ##
 @@ -0,0 +1,526 @@
+# pylint: skip-file
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A connector for sending API requests to the GCP Vision API.
+"""
+
+from __future__ import absolute_import
+
+from typing import Optional
+from typing import Tuple
+from typing import Union
+
+from future.utils import binary_type
+from future.utils import text_type
+
+from apache_beam import typehints
+from apache_beam.metrics import Metrics
+from apache_beam.transforms import DoFn
+from apache_beam.transforms import ParDo
+from apache_beam.transforms import PTransform
+from cachetools.func import ttl_cache
+
+try:
+  from google.cloud import vision
+except ImportError:
+  raise ImportError(
+      'Google Cloud Vision not supported for this execution environment '
+      '(could not import google.cloud.vision).')
+
+__all__ = [
+    'AnnotateImage',
+    'AnnotateImageWithContext',
+    'AsyncBatchAnnotateImage',
+    'AsyncBatchAnnotateImageWithContext'
+]
+
+
+@ttl_cache(maxsize=128, ttl=3600)
+def get_vision_client(client_options=None):
+  """Returns a Cloud Vision API client."""
+  _client = vision.ImageAnnotatorClient(client_options=client_options)
+  return _client
+
+
+class AnnotateImage(PTransform):
+  """A ``PTransform`` for annotating images using the GCP Vision API
+  ref: https://cloud.google.com/vision/docs/
+
+  Sends each element to the GCP Vision API. Element is a
+  Union[text_type, binary_type] of either an URI (e.g. a GCS URI) or binary_type
+  base64-encoded image data.
+  Accepts an `AsDict` side input that maps each image to an image context.
+  """
+  def __init__(
+      self,
+      features,
+      retry=None,
+      timeout=120,
+      client_options=None,
+      context_side_input=None):
+    """
+    Args:
+      features: (List[``vision.types.Feature.enums.Feature``]) Required.
+        The Vision API features to detect
+      retry: (google.api_core.retry.Retry) Optional.
+        A retry object used to retry requests.
+        If None is specified (default), requests will not be retried.
+      timeout: (float) Optional.
+        The time in seconds to wait for the response from the
+        Vision API.
+      client_options: (Union[dict, google.api_core.client_options.ClientOptions])
+        Client options used to set user options on the client.
+        API Endpoint should be set through client_options.
+      context_side_input: (beam.pvalue.AsDict) Optional.
+        An ``AsDict`` of a PCollection to be passed to the
+        _ImageAnnotateFn as the image context mapping containing additional
+        image context and/or feature-specific parameters.
+        Example usage::
+
+          image_contexts =
+            [(''gs://cloud-samples-data/vision/ocr/sign.jpg'', Union[dict,
+            ``vision.types.ImageContext()``]),
+            (''gs://cloud-samples-data/vision/ocr/sign.jpg'', Union[dict,
+            ``vision.types.ImageContext()``]),]
+
+          context_side_input =
+            (
+              p
+              | "Image contexts" >> beam.Create(image_contexts)
+            )
+
+          visionml.AnnotateImage(features,
+            context_side_input=beam.pvalue.AsDict(context_side_input)))
+    """
+    super(AnnotateImage, self).__init__()
+    self.features = features
+    self.retry = retry
+    self.timeout = timeout
+    self.client_options = client_options
+    self.context_side_input = context_side_input
+
+  def expand(self, pvalue):
+    return pvalue | ParDo(
+        _ImageAnnotateFn(
+            features=self.features,
+            retry=self.retry,

[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392515
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 12:09
Start Date: 25/Feb/20 12:09
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590835668
 
 
   > Has there been a discussion of whether this (rather specialized) kind of 
transform belongs in beam vs. as a separate library built on top of beam?
   
   There was a discussion on the dev list some time ago: 
[here](https://lists.apache.org/thread.html/rdc9647d00c4bb173f00ca433a2cdea3efe176aea19e631ab9723a80c%40%3Cdev.beam.apache.org%3E).
 The general conclusion was that it's fine to put Cloud AI transforms directly 
into Beam. Also, we have similar transforms already implemented: 
https://github.com/apache/beam/tree/master/sdks/python/apache_beam/ml/gcp
   
 



Issue Time Tracking
---

Worklog Id: (was: 392515)
Time Spent: 1.5h  (was: 1h 20m)

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/





[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392475
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 11:01
Start Date: 25/Feb/20 11:01
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r383808713
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml.py
 ##
 @@ -0,0 +1,526 @@
+# pylint: skip-file
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A connector for sending API requests to the GCP Vision API.
+"""
+
+from __future__ import absolute_import
+
+from typing import Optional
+from typing import Tuple
+from typing import Union
+
+from future.utils import binary_type
+from future.utils import text_type
+
+from apache_beam import typehints
+from apache_beam.metrics import Metrics
+from apache_beam.transforms import DoFn
+from apache_beam.transforms import ParDo
+from apache_beam.transforms import PTransform
+from cachetools.func import ttl_cache
+
+try:
+  from google.cloud import vision
+except ImportError:
+  raise ImportError(
+  'Google Cloud Vision not supported for this execution environment '
+  '(could not import google.cloud.vision).')
+
+__all__ = [
+'AnnotateImage',
+'AnnotateImageWithContext',
+'AsyncBatchAnnotateImage',
+'AsyncBatchAnnotateImageWithContext'
+]
+
+
+@ttl_cache(maxsize=128, ttl=3600)
+def get_vision_client(client_options=None):
+  """Returns a Cloud Vision API client."""
+  _client = vision.ImageAnnotatorClient(client_options=client_options)
+  return _client
+
+
+class AnnotateImage(PTransform):
+  """A ``PTransform`` for annotating images using the GCP Vision API
+  ref: https://cloud.google.com/vision/docs/
+
+  Sends each element to the GCP Vision API. Element is a
+  Union[text_type, binary_type] of either an URI (e.g. a GCS URI) or binary_type
+  base64-encoded image data.
+  Accepts an `AsDict` side input that maps each image to an image context.
+  """
+  def __init__(
+      self,
+      features,
+      retry=None,
+      timeout=120,
+      client_options=None,
+      context_side_input=None):
+    """
+    Args:
+      features: (List[``vision.types.Feature.enums.Feature``]) Required.
+        The Vision API features to detect
+      retry: (google.api_core.retry.Retry) Optional.
+        A retry object used to retry requests.
+        If None is specified (default), requests will not be retried.
+      timeout: (float) Optional.
+        The time in seconds to wait for the response from the
+        Vision API.
+      client_options: (Union[dict, google.api_core.client_options.ClientOptions])
+        Client options used to set user options on the client.
+        API Endpoint should be set through client_options.
+      context_side_input: (beam.pvalue.AsDict) Optional.
+        An ``AsDict`` of a PCollection to be passed to the
+        _ImageAnnotateFn as the image context mapping containing additional
+        image context and/or feature-specific parameters.
+        Example usage::
+
+          image_contexts =
+            [(''gs://cloud-samples-data/vision/ocr/sign.jpg'', Union[dict,
+            ``vision.types.ImageContext()``]),
+            (''gs://cloud-samples-data/vision/ocr/sign.jpg'', Union[dict,
+            ``vision.types.ImageContext()``]),]
+
+          context_side_input =
+            (
+              p
+              | "Image contexts" >> beam.Create(image_contexts)
+            )
+
+          visionml.AnnotateImage(features,
+            context_side_input=beam.pvalue.AsDict(context_side_input)))
+    """
+    super(AnnotateImage, self).__init__()
+    self.features = features
+    self.retry = retry
+    self.timeout = timeout
+    self.client_options = client_options
+    self.context_side_input = context_side_input
+
+  def expand(self, pvalue):
+    return pvalue | ParDo(
+        _ImageAnnotateFn(
+            features=self.features,
+            retry=self.retry,
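
The `Example usage` in the quoted docstring is schematic rather than runnable. As a rough sketch of the idea only, assuming the `AsDict` side input behaves like a plain dict from image URI to image context once it is materialized, the per-element lookup could resemble the following. The helper `build_request` and the request layout are hypothetical and are not taken from the PR; `language_hints` is used here as an example image-context field.

```python
from typing import Dict, List, Optional

# Hypothetical mapping, standing in for the materialized AsDict side input:
# image URI -> image context (a plain dict here).
image_contexts = {
    "gs://cloud-samples-data/vision/ocr/sign.jpg": {"language_hints": ["en"]},
}


def build_request(
    image_uri: str,
    features: List[str],
    contexts: Optional[Dict[str, dict]] = None,
) -> dict:
  """Builds an illustrative request dict for one image (hypothetical helper)."""
  request = {
      "image": {"source": {"image_uri": image_uri}},
      "features": [{"type": feature} for feature in features],
  }
  # Images without an entry in the mapping are sent without a context.
  context = (contexts or {}).get(image_uri)
  if context is not None:
    request["image_context"] = context
  return request
```

Keying the mapping by the element itself is what lets a single side input serve every image in the PCollection; elements with no entry simply fall back to a request without a context.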

[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392469
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 10:53
Start Date: 25/Feb/20 10:53
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r383792592
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml.py
 ##

[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392462
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 10:32
Start Date: 25/Feb/20 10:32
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#discussion_r383792592
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/visionml.py
 ##

[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392412=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392412
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 09:29
Start Date: 25/Feb/20 09:29
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590768636
 
 
   Most of the discussion took place in 
https://github.com/apache/beam/pull/10764, and the overarching Jira ticket is 
https://issues.apache.org/jira/browse/BEAM-9145.
   
   I agree that these transforms are slightly different, as they call an 
external API and let it do most of the heavy lifting.
   
   The batching pattern implemented here is quite specific to these transforms, 
so I'm not sure how useful it would be to pull it out.
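   
   The PR's actual batching code is not shown in this thread. Purely as an illustration of the general buffer-and-flush pattern under discussion (comparable to buffering in a `DoFn`'s `process` and flushing in `finish_bundle`), a generic sketch might look like this. `BatchBuffer` and `flush_fn` are hypothetical names, not part of Beam or of the PR.

```python
from typing import Callable, List


class BatchBuffer:
  """Collects elements and hands them to `flush_fn` in batches (hypothetical)."""

  def __init__(
      self, max_batch_size: int, flush_fn: Callable[[List[str]], None]):
    if max_batch_size < 1:
      raise ValueError("max_batch_size must be at least 1")
    self._max_batch_size = max_batch_size
    self._flush_fn = flush_fn
    self._buffer: List[str] = []

  def add(self, element: str) -> None:
    """Buffers one element and flushes as soon as the batch is full."""
    self._buffer.append(element)
    if len(self._buffer) >= self._max_batch_size:
      self.flush()

  def flush(self) -> None:
    """Sends any buffered elements; call once more at the end of the input."""
    if self._buffer:
      self._flush_fn(list(self._buffer))
      self._buffer.clear()
```

The final `flush()` call matters: without it, a trailing partial batch would never be sent.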
 



Issue Time Tracking
---

Worklog Id: (was: 392412)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392411
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 09:27
Start Date: 25/Feb/20 09:27
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590768636
 
 
   Most of the discussion took place in 
https://github.com/apache/beam/pull/10764, and the overarching Jira ticket is 
https://issues.apache.org/jira/browse/BEAM-9145.
   
   I agree that these transforms are slightly different, as they call an 
external API and let it do most of the heavy lifting.
 



Issue Time Tracking
---

Worklog Id: (was: 392411)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392405
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 09:19
Start Date: 25/Feb/20 09:19
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590765249
 
 
   Has there been a discussion of whether this (rather specialized) kind of 
transform belongs in Beam itself or in a separate library built on top of Beam?
   
   Also, is the synchronous/batching pattern something that would be generally 
useful and could be pulled out, or is it really integral to these transforms 
and contexts?
 



Issue Time Tracking
---

Worklog Id: (was: 392405)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392380
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 08:40
Start Date: 25/Feb/20 08:40
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10959: [BEAM-9247] Integrate 
GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959#issuecomment-590749149
 
 
   R: @aaltay @kamilwu 
 



Issue Time Tracking
---

Worklog Id: (was: 392380)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9247?focusedWorklogId=392376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392376
 ]

ASF GitHub Bot logged work on BEAM-9247:


Author: ASF GitHub Bot
Created on: 25/Feb/20 08:36
Start Date: 25/Feb/20 08:36
Worklog Time Spent: 10m 
  Work Description: EDjur commented on pull request #10959: [BEAM-9247] 
Integrate GCP Vision API functionality
URL: https://github.com/apache/beam/pull/10959
 
 
   This PR refers to https://issues.apache.org/jira/browse/BEAM-9247: [Python] 
PTransform that integrates Cloud Vision functionality.
   
   A few notes:
   The synchronous annotation is very similar to the videointelligence 
implementation.
   
   I opted to support only async (offline) batch annotation, for two reasons:
   * Synchronous batch annotation only supports batches of <=5 elements at a 
time, so it is not very useful (https://cloud.google.com/vision/docs/batch).
   * Synchronous batch annotation also doesn't make much sense for Beam, as 
the workers would just sit idle until a response is received from the Vision 
API, which seems inefficient.
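   
   To make the first point concrete, here is a small sketch of what a cap of 5 elements per synchronous request means for the number of round trips. It assumes only the "<=5" figure quoted above; `MAX_SYNC_BATCH_SIZE` and `chunked` are hypothetical names, and no Vision API call is made.

```python
from typing import Iterator, List, Sequence

# Assumed limit, taken from the "<=5 elements" figure quoted above.
MAX_SYNC_BATCH_SIZE = 5


def chunked(
    items: Sequence[str],
    size: int = MAX_SYNC_BATCH_SIZE) -> Iterator[List[str]]:
  """Yields consecutive slices of `items` with at most `size` elements each."""
  if size < 1:
    raise ValueError("size must be at least 1")
  for start in range(0, len(items), size):
    yield list(items[start:start + size])


# Twelve illustrative URIs end up in three separate synchronous requests.
uris = ["gs://example-bucket/image-%d.jpg" % i for i in range(12)]
batches = list(chunked(uris))
```

With a cap this small the number of blocking requests grows almost linearly with the input size, which is why synchronous batching offers little benefit here.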
   
   
   I'm still ironing out a few issues with the batch annotation and making 
sure all tests pass, so please consider this a draft PR for now.
   
   
   