Hi, I have a large number of CSV files stored in a GCS bucket, timestamped by their file path, e.g. "gs://deathstar/2017-10-05/plans.csv" and "gs://deathstar/2017-11-01/plans.csv".
I want to use TextIO.read() so that each line of every file is read into a PCollection, but I also need to extract the timestamp from the file path and attach it to each line (as a KV or something similar). It doesn't seem possible to get at this metadata unless I use FileIO, but then the entire file is read as a whole rather than split into individual lines.

Is it possible to read each line of the files matching a glob pattern and have some parsed file metadata (i.e. the timestamp from the file path) attached to each element?

Kind Regards,
Akash
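
P.S. To make the desired result concrete, here is a rough plain-Java sketch (no Beam involved) of the pairing I am after. The class name PathTimestamp, the methods extractDate and tagLines, and the regex are all made up for illustration, and it assumes the date is always a yyyy-MM-dd path segment as in the examples above. In a real pipeline I assume this logic would have to sit in some per-file step, which is exactly the part I don't know how to do.

```java
import java.time.LocalDate;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam: pairs each line of a file
// with the date parsed from that file's path.
class PathTimestamp {
    // Assumption: the date is a path segment of the form /yyyy-MM-dd/.
    private static final Pattern DATE_SEGMENT =
            Pattern.compile("/(\\d{4}-\\d{2}-\\d{2})/");

    // Returns the first yyyy-MM-dd segment of the path as a LocalDate.
    static LocalDate extractDate(String path) {
        Matcher m = DATE_SEGMENT.matcher(path);
        if (!m.find()) {
            throw new IllegalArgumentException("No date segment in path: " + path);
        }
        return LocalDate.parse(m.group(1)); // ISO-8601 is the default format
    }

    // Attaches the file's date to every line, giving (date, line) pairs.
    static List<Map.Entry<LocalDate, String>> tagLines(String path, List<String> lines) {
        LocalDate date = extractDate(path);
        List<Map.Entry<LocalDate, String>> tagged = new ArrayList<>();
        for (String line : lines) {
            tagged.add(new SimpleImmutableEntry<>(date, line));
        }
        return tagged;
    }
}
```

For example, calling PathTimestamp.tagLines("gs://deathstar/2017-10-05/plans.csv", lines) should give one entry per line, each keyed by the date 2017-10-05. What I am looking for is the Beam equivalent of this, with the lines still read and split by the SDK.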
