Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-16 Thread Matt Casters
Kettle indeed uses POI for xlsx, but you can configure it in the Excel Input step. Kettle on Apache Beam would read the file(s) in a single thread, as discussed earlier on the Beam user mailing list. You can download a version with Beam here: http://www.kettle.be/ Cheers, Matt --- Matt Caster

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-16 Thread Pablo Estrada
Hm, I am not very familiar with POI, but if its transforms are able to take in a file descriptor, you should be able to use FileIO.match()[0] to find your files (local, or in GCS/S3/HDFS), and FileIO.readMatches()[1] to get file descriptors for these files. If the POI libraries require the files to

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Henrique Molina
Hi Pablo, thanks for your attention. I am so sorry, I wrote "Cs extension" by mistake; I meant the .csv extension! The examples I found are like this one: load-csv-file-from-google-cloud-storage

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Pablo Estrada
Hello Henrique, I am not aware of existing Beam transforms specifically used for reading in XLSX data. Can you share what you mean by "examples related with Cs extension"? I am aware of some Python libraries for this sort of thing[1]. You could use the FileIO transforms in the Python SDK to find
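
A rough sketch of the approach suggested above, in Python: match the files with the FileIO transforms, then parse each matched file in a per-file step. This is only an illustration under some assumptions, not a confirmed recipe from the thread. openpyxl is assumed as the XLSX library (the message does not name one), the first row of each sheet is assumed to be a header, and the helper names rows_to_records, read_xlsx and build_pipeline are hypothetical. beam.FlatMap is used as a shorthand for a ParDo that emits several outputs per input.

```python
import io


def rows_to_records(path, sheet_name, rows):
    """Turn raw sheet rows into dicts, treating the first row as the header."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty sheet: nothing to emit
    # Unnamed header cells get a positional placeholder name (assumption).
    columns = [str(c) if c is not None else f"col_{i}"
               for i, c in enumerate(header)]
    for values in rows:
        record = dict(zip(columns, values))
        record["_file"] = path          # keep provenance for later loading
        record["_sheet"] = sheet_name
        yield record


def read_xlsx(readable_file):
    """Yield one dict per data row, for every sheet of one matched file."""
    import openpyxl  # third-party; assumed to be installed on the workers

    # An XLSX file is a zip archive and cannot be split, so each file is
    # read whole and parsed by a single worker.
    data = io.BytesIO(readable_file.read())
    workbook = openpyxl.load_workbook(data, read_only=True, data_only=True)
    for sheet in workbook.worksheets:
        yield from rows_to_records(
            readable_file.metadata.path,
            sheet.title,
            sheet.iter_rows(values_only=True),
        )


def build_pipeline(pipeline, pattern):
    """Attach the match/read/parse steps to an existing Beam pipeline."""
    import apache_beam as beam
    from apache_beam.io import fileio

    return (
        pipeline
        | "MatchFiles" >> fileio.MatchFiles(pattern)
        | "ReadMatches" >> fileio.ReadMatches()
        | "ParseXlsx" >> beam.FlatMap(read_xlsx)
    )
```

The resulting PCollection holds one dictionary per spreadsheet row, which is the general shape a BigQuery sink expects, although the column names and types would still need to be mapped to a table schema. The file pattern passed to build_pipeline can be a local glob or a remote one such as a gs:// path.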

Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Henrique Molina
Hello, I would like to follow Apache Beam best practices to read Xlsx files. However, I found examples only related to the Cs extension. Does someone have a sample using ParDo to collect all columns and sheets from an Excel xlsx file? Afterwards I will load the data into Google BigQuery. Thanks & Regards