Hi!
I have run into this a few times but have only ever solved it with an ugly
hack, so I thought I'd ask this time.

If I have a bunch of files with timestamps in their names but with no
timestamps in the data, how should I best read them into a PCollection of
timestamped values?

The files are JSON or CSV files, and if I use TextIO.read I no longer have
the filenames available.

Is the best way to do this to write your own source? In that case, how can I
most easily get the filename or timestamp into the data while reusing
essentially everything else from TextIO? I tried doing this with a
file-based source, but it didn't pan out too well.

Or is it better to write a DoFn that takes a PCollection of filenames, reads
each file itself, and fans out the contents? I have had some bad experiences
with fan-out, so I'm not sure this is good either.
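
To make the DoFn idea concrete, here is a minimal sketch of the per-file
step in plain Java, with no Beam dependency. The filename pattern
(yyyyMMdd-HHmmss, read as UTC) and the names FilenameTimestamps,
timestampOf and readTimestamped are made up for illustration; inside a real
DoFn each line would presumably be emitted with the file's timestamp rather
than collected into a list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FilenameTimestamps {
    // Assumed (hypothetical) naming scheme: the name contains yyyyMMdd-HHmmss,
    // e.g. "events-20170501-120000.json", and the time is in UTC.
    private static final Pattern STAMP = Pattern.compile("(\\d{8}-\\d{6})");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Extracts the timestamp embedded in a filename, or fails loudly.
    static Instant timestampOf(String filename) {
        Matcher m = STAMP.matcher(filename);
        if (!m.find()) {
            throw new IllegalArgumentException("No timestamp in filename: " + filename);
        }
        return LocalDateTime.parse(m.group(1), FORMAT).toInstant(ZoneOffset.UTC);
    }

    // Reads one file and pairs every line with the timestamp from its name.
    // This is the fan-out step: one filename in, many (timestamp, line) pairs out.
    static List<Map.Entry<Instant, String>> readTimestamped(Path file) throws IOException {
        Instant ts = timestampOf(file.getFileName().toString());
        List<Map.Entry<Instant, String>> out = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            out.add(new SimpleImmutableEntry<>(ts, line));
        }
        return out;
    }
}
```

Note that this reads each file whole on a single worker, which is exactly
the part of the fan-out approach I am unsure about for large files.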

If anyone has solved this it would be really interesting to know what the
best approach would be.

Thanks!
Vilhelm von Ehrenheim
