Hi Cory,

Apologies for the lack of response -- somehow, this ended up in my spam
bucket.

Right now, TextIO does not expose the filename being read. You can,
however, implement your own custom source that does so. I have an example
here [1], though it is a little out of date -- it uses the old Dataflow
APIs rather than the fancy new Beam APIs.
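Once each line is paired with the name of the file it came from, the de-dupe itself is a per-ID reduction. Here is a minimal plain-Java sketch of that step, with the Beam wiring left out. In a pipeline, the same "pick the later one" merge could back a Combine.perKey keyed by ID. The sketch makes two hypothetical assumptions: the ID is the first CSV column, and filenames sort lexicographically from oldest to newest (e.g. timestamped names). `LatestById`, `TaggedLine`, and `idOf` are illustrative names only, not Beam APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatestById {

    /** A CSV line paired with the name of the file it was read from. */
    public static final class TaggedLine {
        final String filename;
        final String line;

        public TaggedLine(String filename, String line) {
            this.filename = filename;
            this.line = line;
        }
    }

    /**
     * Hypothetical: the record ID is the first CSV column.
     * Naive split; it does not handle quoted commas.
     */
    static String idOf(String csvLine) {
        int comma = csvLine.indexOf(',');
        return comma < 0 ? csvLine : csvLine.substring(0, comma);
    }

    /**
     * For each ID, keeps the line whose source filename sorts last.
     * Assumes (hypothetically) that a later filename means a newer file.
     */
    public static Map<String, String> latestById(List<TaggedLine> input) {
        Map<String, TaggedLine> best = new HashMap<>();
        for (TaggedLine t : input) {
            // merge() inserts t if the ID is new; otherwise keeps whichever
            // of the existing and incoming lines has the later filename.
            best.merge(idOf(t.line), t,
                (a, b) -> a.filename.compareTo(b.filename) >= 0 ? a : b);
        }
        Map<String, String> result = new HashMap<>();
        best.forEach((id, t) -> result.put(id, t.line));
        return result;
    }

    public static void main(String[] args) {
        List<TaggedLine> input = new ArrayList<>();
        input.add(new TaggedLine("data-2016-08-01.csv", "42,old"));
        input.add(new TaggedLine("data-2016-08-09.csv", "42,new"));
        input.add(new TaggedLine("data-2016-08-01.csv", "7,only"));
        System.out.println(latestById(input));
    }
}
```

If your filenames encode recency some other way, only the comparison inside `latestById` needs to change.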

Thanks,
Dan

[1] https://github.com/dhalperi/dataflow-escience/blob/master/src/main/java/com/google/cloud/dataflow/examples/escience/seaflow/SeaFlowIO.java

On Tue, Aug 9, 2016 at 5:57 PM, Cory Tucker wrote:

> I am reading a bunch of files from Cloud Storage using TextIO.Read, and the
> data is all in the same format (CSV). I need to de-dupe the records by ID
> and keep only the latest one, but I cannot tell which one is the latest
> from the row data alone; instead, it is implied by the filename.
>
> Is there any way I can get access to the filename when using TextIO? If
> not, any suggestions for a workaround?
>
> thanks!
> --Cory
>
