Hi Cory,

Apologies for the lack of response -- somehow, this ended up in my spam bucket.

Right now, TextIO does not expose the filename being read. You can, however, implement your own custom source that does so. I have an example here [1], though it is a little out of date -- it uses the old Dataflow APIs rather than the fancy new Beam APIs.

Thanks,
Dan

[1] https://github.com/dhalperi/dataflow-escience/blob/master/src/main/java/com/google/cloud/dataflow/examples/escience/seaflow/SeaFlowIO.java

On Tue, Aug 9, 2016 at 5:57 PM, Cory Tucker <[email protected]> wrote:
> I am reading a bunch of files from cloud storage using TextIO.Read and the
> data is all in the same format (CSV). I need to de-dupe the records by ID
> and keep only the latest one, but I cannot tell which one is the latest by
> only the row data, instead it is implied by the filename.
>
> Any way I can get access to the filename when using TextIO? If not, any
> suggestions on a workaround?
>
> thanks!
> --Cory
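
Once each line carries the name of the file it came from (for instance, as emitted by a custom source), the de-dupe step described above could look roughly like the sketch below. It is plain Java rather than Beam or Dataflow code, so it only illustrates the "keep the latest record per ID" logic. The class and method names are made up for illustration, and it assumes two things the thread does not state: that the ID is the first CSV column, and that filenames sort lexicographically in chronological order (for example, because they embed a date).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Beam or Dataflow API.
class LatestByFilename {

    /** A CSV line paired with the name of the file it was read from. */
    static final class TaggedLine {
        final String filename;
        final String line;

        TaggedLine(String filename, String line) {
            this.filename = filename;
            this.line = line;
        }
    }

    /**
     * Keeps one line per ID: the one whose filename sorts last.
     * Assumption: the ID is the first comma-separated field.
     * Assumption: a lexicographically greater filename means a newer file.
     */
    static Map<String, TaggedLine> latestById(List<TaggedLine> input) {
        Map<String, TaggedLine> latest = new LinkedHashMap<>();
        for (TaggedLine candidate : input) {
            String id = candidate.line.split(",", 2)[0];
            // On a tie the record seen first is kept.
            latest.merge(id, candidate, (current, next) ->
                    next.filename.compareTo(current.filename) > 0 ? next : current);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<TaggedLine> input = java.util.Arrays.asList(
                new TaggedLine("export-2016-08-01.csv", "42,alice,old"),
                new TaggedLine("export-2016-08-09.csv", "42,alice,new"),
                new TaggedLine("export-2016-08-05.csv", "7,bob,only"));
        for (Map.Entry<String, TaggedLine> e : latestById(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().line
                    + " (from " + e.getValue().filename + ")");
        }
    }
}
```

In a pipeline, the same comparison would presumably sit in a per-key combine step after keying each tagged line by its ID; the in-memory map here just stands in for that grouping.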
