Hello all,

I have a large database of Parquet files that I need to convert to CSV so they
can be read into Matlab. (Unless someone knows how to do this automatically
without an intermediate step?) I am using CTAS to create CSV files from a
directory of Parquet files, which works fine when the total size of the Parquet
files is smallish (~800MB). In that case, CTAS automatically generates a few
CSV files, ~200-300MB each, which are easily digestible by Matlab.

However, I also have a dataset consisting of 34 Parquet files, totaling 18GB.
When I run CTAS on this to create the CSV files using default parameters, it
automatically generates 12 CSV files, each >2GB. This is approaching the upper
limit of what Matlab can handle, so I am wondering if there are any parameters
I can set to limit the size of the individual CSV files that CTAS creates.
That is, is there a parameter for text files equivalent to the
store.parquet.block-size parameter for Parquet files?
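For reference, the workflow I'm running looks roughly like the following (paths and table names are placeholders, not my actual ones):

```sql
-- Tell Drill to write CTAS output as CSV (documented session option)
ALTER SESSION SET `store.format` = 'csv';

-- Convert a directory of Parquet files to CSV.
-- dfs.tmp is Drill's default writable workspace; the input path is hypothetical.
CREATE TABLE dfs.tmp.`my_dataset_csv` AS
SELECT * FROM dfs.`/path/to/parquet_dir`;
```

With the default settings, Drill decides on its own how many output files to write and how large each one is, which is the behavior I'd like to control.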

I am using Drill in embedded mode on a desktop.

Thanks!
Laura


Laura Mariano
Senior Member of the Technical Staff
Draper
[email protected]
(617) 258-2331

________________________________
Notice: This email and any attachments may contain proprietary (Draper 
non-public) and/or export-controlled information of Draper. If you are not the 
intended recipient of this email, please immediately notify the sender by 
replying to this email and immediately destroy all copies of this email.
________________________________
