Dear all,
I am using an FTP client to download some files dynamically; the files are CSV.
(This part is working fine.)
The next step is to open the files and read their lines.
Could somebody help me apply good practices to this approach?
I am using Java with Google Cloud Dataflow and Apache Beam 2.9.0.
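If each downloaded file is buffered in a ByteArrayOutputStream (as in the code below) and fits comfortably in worker memory, one option is to split those bytes into lines right away instead of saving them and reading them back. This is a minimal sketch assuming UTF-8 content; the class and method names (`DownloadLines`, `readLines`) are hypothetical and only JDK classes are used:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DownloadLines {
  // Hypothetical helper: splits an in-memory download into text lines.
  // Assumes UTF-8 content; BufferedReader.readLine() accepts \n, \r\n and \r endings
  // and strips them from the returned strings.
  static List<String> readLines(ByteArrayOutputStream download) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new ByteArrayInputStream(download.toByteArray()), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      // Not expected for an in-memory stream; rethrow unchecked so callers need no throws clause.
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  public static void main(String[] args) {
    ByteArrayOutputStream download = new ByteArrayOutputStream();
    byte[] csv = "id,name\r\n1,Alice\r\n2,Bob\n".getBytes(StandardCharsets.UTF_8);
    download.write(csv, 0, csv.length);
    System.out.println(readLines(download)); // prints [id,name, 1,Alice, 2,Bob]
  }
}
```

Inside processElement, each line could then be emitted with c.output(line), giving the next steps a PCollection<String> of CSV lines without a separate TextIO read. This buffers whole files in memory, so for large files saving to Cloud Storage and reading the saved paths back may still be the better route.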
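```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.commons.net.ftp.FTPClient;

// ftpInputs (hypothetical name): a PCollection<FtpInput> describing the files to fetch.
// A DoFn must be wrapped in ParDo.of(...) and applied to a PCollection, not to the Pipeline.
PCollection<String> fileTransfers = ftpInputs.apply("Transfer FTP",
    ParDo.of(new DoFn<FtpInput, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) throws IOException {
        ArgsOptions opt = c.getPipelineOptions().as(ArgsOptions.class);
        FtpInput f = c.element(); // the file currently being processed
        FTPClient ftp = new FTPClient();
        ftp.connect(opt.getFtpHost());
        ByteArrayOutputStream download = new ByteArrayOutputStream();
        boolean result = ftp.retrieveFile(f.getName(), download);
        saveCSV(download); // save the CSV in Google Cloud Storage
        c.output("???");   // what should be emitted here for the next step?
        ...
      }
    }));

fileTransfers
    .apply("Read File", TextIO.read().from("")) // This is not correct: TextIO.read() starts from the Pipeline (PBegin), not from a PCollection<String>
    .apply("Read CSV lines", ...)
    .apply("Convert to Avro", ...)
    .apply("Save in Avro", ...);
```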
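For the "Read CSV lines" step, each line still has to be split into fields before it can be mapped to an Avro record. A plain `line.split(",")` breaks on quoted fields that contain commas, so here is a minimal sketch of a splitter that handles double-quoted fields and doubled quotes (""), assuming comma-separated values with no line breaks inside quoted fields. The names `CsvLineSplitter` and `split` are hypothetical; a dedicated library such as Apache Commons CSV would be more robust in production:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSplitter {
  // Hypothetical helper: splits one CSV line into fields.
  // Handles "quoted, fields" and "" escapes; assumes no newlines inside quotes.
  static List<String> split(String line) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char ch = line.charAt(i);
      if (inQuotes) {
        if (ch == '"') {
          if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
            current.append('"'); // doubled quote inside a quoted field
            i++;
          } else {
            inQuotes = false; // closing quote
          }
        } else {
          current.append(ch); // commas inside quotes are kept as data
        }
      } else if (ch == '"') {
        inQuotes = true; // opening quote
      } else if (ch == ',') {
        fields.add(current.toString()); // field boundary
        current.setLength(0);
      } else {
        current.append(ch);
      }
    }
    fields.add(current.toString()); // last field (may be empty)
    return fields;
  }

  public static void main(String[] args) {
    List<String> fields = split("1,\"Smith, John\",\"say \"\"hi\"\"\"");
    System.out.println(fields.size() + " " + fields); // prints 3 [1, Smith, John, say "hi"]
  }
}
```

In the pipeline, a helper like this could sit inside a MapElements or ParDo step between the lines and the Avro conversion, so the conversion step receives fields rather than raw strings.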
What I found on the Internet are samples that take the easy way:
start the pipeline with TextIO.read().from("hardcoded path").
But I can't find any example for my situation, where the file paths are only known at runtime.
Has anyone already faced this challenge?
Thanks in advance