How many columns do you need from the big file? Also, how CPU- and memory-intensive are the computations you want to perform?

Alexander Czech <alexander.cz...@googlemail.com> wrote on Mon, 27 Nov 2017 at 10:57:
> I want to load a 10TB parquet File from S3 and I'm trying to decide what
> EC2 instances to use.
>
> Should I go for instances that in total have a larger memory size than
> 10TB? Or is it enough that they have in total enough SSD storage so that
> everything can be spilled to disk?
>
> thanks
>