Thanks Sourabh! That looks like it should work for me.

Tom

> On Apr 12, 2017, at 4:23 PM, Sourabh Bajaj <[email protected]> wrote:
> 
> Hi, 
> 
> One idea could be to first create a PCollection of the file names and then
> process each image in a DoFn, since my guess is that you probably don't want
> the PCollection to hold sliced images.
> 
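> import apache_beam as beam
> 
> # get_list_of_image_files and process_image_given_filename are user-defined.
> with beam.Pipeline() as p:
>     # Seed with one dummy element; FlatMap expands it into one element per file.
>     list_of_images = p | beam.Create([None]) | beam.FlatMap(get_list_of_image_files)
>     # Map calls the function once per file name.
>     processed_images = list_of_images | beam.Map(process_image_given_filename)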
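The snippet above leaves `get_list_of_image_files` and `process_image_given_filename` undefined. A minimal sketch of both, assuming the images sit in a local directory matched with `glob` (a stand-in for listing a GCS bucket; `IMAGE_GLOB` is a hypothetical path) and using file size plus a header check as a placeholder analysis:

```python
import glob
import os

# Hypothetical location of the images. On GCS you would list objects under a
# gs:// prefix instead of calling glob on a local directory.
IMAGE_GLOB = "/tmp/images/*"

# Leading bytes ("magic numbers") of a few common image formats.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def get_list_of_image_files(_seed, pattern=IMAGE_GLOB):
    """For beam.FlatMap: ignores the dummy seed element and yields one
    file name per match, so each file becomes its own element."""
    for path in sorted(glob.glob(pattern)):
        yield path


def process_image_given_filename(path):
    """For beam.Map: returns exactly one result per file name.

    Placeholder analysis: the file size plus a format guess from the header.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    fmt = next(
        (name for magic, name in _SIGNATURES.items() if header.startswith(magic)),
        "unknown",
    )
    return path, os.path.getsize(path), fmt
```

Both functions drop into the pipeline unchanged. Outside Beam, the FlatMap/Map pair behaves like `[process_image_given_filename(f) for f in get_list_of_image_files(None)]`, which is a quick way to test them locally.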
> 
> Does this fit your use case? The text sources read one record (line) at a
> time, so they can parallelize by splitting a single source file.
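The splitting idea can be sketched for newline-delimited data. This is a minimal illustration of the general technique, not Beam's actual TextIO code, and `split_offsets` and `read_records` are hypothetical names: the file's bytes are divided into contiguous ranges, and each record belongs to the range in which its first byte falls, so a reader that starts mid-line skips ahead to the next line boundary.

```python
def split_offsets(size, num_splits):
    """Divide [0, size) into at most num_splits contiguous byte ranges."""
    if size == 0:
        return []
    step = -(-size // num_splits)  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def read_records(data, start, end):
    """Return the newline-delimited records owned by the range [start, end).

    A record is owned by the range containing its first byte, so a reader
    starting mid-line skips to the next line, and the last record it reads
    may extend past `end`.
    """
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    while pos < end:
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records
```

Reading every range and concatenating the results yields each line exactly once, however many ranges are used, which is what lets several workers share one large text file. An image has no such record boundaries, so the per-file approach above fits it better.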
> 
> Thanks
> Sourabh
> 
> On Wed, Apr 12, 2017 at 1:06 PM Tom Pollard <[email protected] 
> <mailto:[email protected]>> wrote:
> I have a large collection of images on GCS and was interested in trying to
> use Dataflow/Beam to run analyses on these. It looks like the existing IOs
> are all oriented towards textual data or structured data, and that there's no
> IO that makes the metadata on GCS storage objects available to a Beam
> pipeline. Is that the case, or am I missing something?
> 
> Tom Pollard
> Senior Software Engineer
> ______________________________
> FLASHPOINT
> e:    [email protected] <mailto:[email protected]>
> w:    www.flashpoint-intel.com <http://www.flashpoint-intel.com/>
> 
