I have implemented a reader in the same way Lukasz described for a little toy project I wrote a few weeks ago:
https://github.com/davideanastasia/apache-beam-getting-started/blob/master/src/main/java/com/davideanastasia/beam/gs/fn/ReadFileFn.java

Hope it helps,
Davide

On 13 August 2018 at 15:53, Lukasz Cwik wrote:

> Not with the current capabilities of TextIO since it only provides the
> contents of the file and none of the metadata.
>
> FileIO allows you to use a ReadableByteChannel so you can read the file
> incrementally in a loop and as you already have figured out you'll need to
> parse and emit on every new line that you see yourself. Consider combining
> the ReadableByteChannel with a BufferedReader and leverage the readLine[1]
> method in a for loop.
>
> 1: https://docs.oracle.com/javase/8/docs/api/java/io/BufferedReader.html#readLine--
>
> On Mon, Aug 13, 2018 at 7:32 AM Akash Patel wrote:
>
>> Hi,
>>
>> I have a large amount of CSV files stored in a GCS bucket which are
>> timestamped according to their file pattern, i.e.
>> “gs://deathstar/2017-10-05/plans.csv”
>> “gs://deathstar/2017-11-01/plans.csv”
>>
>> So basically I want to utilise the functionality of TextIO.read() where
>> each line in every file is read into a PCollection but I also need to
>> extract the timestamp from the file pattern and link it to each line (KV or
>> something similar). However it doesn’t seem possible to extract this
>> metadata unless I use FileIO. However the problem here is that the entire
>> file is read not split into individual lines. Is it possible to read each
>> line in the specified globbed file pattern and have some parsed file
>> metadata (i.e. timestamp from file pattern) linked to the respective element?
>>
>> Kind Regards,
>> Akash
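
For anyone who does not want to open the repository, here is a minimal, self-contained sketch of the approach Lukasz describes: wrap a ReadableByteChannel in a BufferedReader, call readLine in a loop, and pair every line with the date parsed from the file path. It uses only the JDK so it can be run on its own, and it is not the code from the linked ReadFileFn. The class name, the method names and the date regex are hypothetical and assume the gs://deathstar/<yyyy-MM-dd>/plans.csv layout from the original question. Inside a Beam DoFn the channel would presumably come from FileIO.ReadableFile.open(), the path from getMetadata().resourceId().toString(), and each pair would be emitted with c.output(KV.of(date, line)) rather than collected into a list.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class TimestampedLineReader {

    // Assumed path layout: .../<yyyy-MM-dd>/<file name>
    private static final Pattern DATE_IN_PATH =
            Pattern.compile("/(\\d{4}-\\d{2}-\\d{2})/");

    // Extracts the date segment from the file path, or fails loudly.
    static String extractDate(String path) {
        Matcher m = DATE_IN_PATH.matcher(path);
        if (!m.find()) {
            throw new IllegalArgumentException("No date segment in path: " + path);
        }
        return m.group(1);
    }

    // Reads the channel line by line and pairs each line with the date
    // taken from the path. A DoFn would emit each pair as it is read
    // instead of buffering them in a list.
    static List<Map.Entry<String, String>> readLines(String path,
                                                     ReadableByteChannel channel) {
        String date = extractDate(path);
        List<Map.Entry<String, String>> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                Channels.newReader(channel, StandardCharsets.UTF_8.name()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.add(new SimpleImmutableEntry<>(date, line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a file opened from GCS.
        byte[] csv = "id,name\n1,alpha\n".getBytes(StandardCharsets.UTF_8);
        ReadableByteChannel channel =
                Channels.newChannel(new ByteArrayInputStream(csv));
        for (Map.Entry<String, String> e :
                readLines("gs://deathstar/2017-10-05/plans.csv", channel)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The date is parsed once per file rather than once per line, so the per-line work is just the readLine call and the emit.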
