Thanks Sourabh! That looks like it should work for me.

Tom

> On Apr 12, 2017, at 4:23 PM, Sourabh Bajaj <[email protected]> wrote:
>
> Hi,
>
> One idea could be creating a PCollection of the file names first and then in
> a DoFn processing each image as my guess is you probably don't want the
> PCollection to have sliced images.
>
> # Boilerplate python code for Beam
> pcollection = p | beam.Create([None])
> list_of_images = pcollection | beam.FlatMap(get_list_of_image_files)
> processed_images = list_of_images | beam.Map(process_image_given_filename)
>
> Does this fit your use case? The text sources try to work on one record so
> can parallelize by splitting a single source file.
>
> Thanks
> Sourabh
>
> On Wed, Apr 12, 2017 at 1:06 PM Tom Pollard <[email protected]> wrote:
> I have a large collection of images on GCS and was interested in trying to
> use Dataflow/BEAM to run analyses on these. It looks like the existing IOs
> are all oriented towards textual data or structured data, and that there's no
> IO that makes the metadata on GCS storage objects available to a BEAM
> pipeline. Is that the case, or am I missing something?
>
> Tom Pollard
> Senior Software Engineer
> FLASHPOINT
> e: [email protected]
> w: www.flashpoint-intel.com
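
For reference, the approach quoted above could be fleshed out roughly as follows. This is only a sketch under assumptions: the bodies of get_list_of_image_files and process_image_given_filename are hypothetical placeholders (the thread names them but does not define them), the run() wrapper, the root and pattern arguments, and the "*.png" default are made up for illustration, and the listing walks a local directory rather than GCS. For gs:// paths the listing step would need something like Beam's FileSystems.match instead of os.walk.

```python
import fnmatch
import os


def get_list_of_image_files(_seed, root=".", pattern="*.png"):
    """Yield image paths under root. Hypothetical helper; the seed element is ignored."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(fnmatch.filter(filenames, pattern)):
            yield os.path.join(dirpath, name)


def process_image_given_filename(path):
    """Placeholder analysis: read the whole file and return (path, size in bytes)."""
    with open(path, "rb") as f:
        data = f.read()
    return path, len(data)


def run(root, pattern="*.png"):
    """Wire the two helpers into a Beam pipeline (assumes apache_beam is installed)."""
    import apache_beam as beam  # imported here so the helpers are usable without Beam

    with beam.Pipeline() as p:
        (
            p
            | "Seed" >> beam.Create([None])  # single dummy element to start from
            | "ListFiles" >> beam.FlatMap(get_list_of_image_files, root, pattern)
            | "Process" >> beam.Map(process_image_given_filename)
            | "Print" >> beam.Map(print)
        )
```

Because each element is a file name rather than a slice of a file, every image is read whole inside process_image_given_filename, which is the point of the suggestion. Calling run("/path/to/images") would execute it on the default local runner.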
