Re: parquet optimal file structure - flat vs nested

Steve Loughran Wed, 03 May 2017 06:55:53 -0700

> On 30 Apr 2017, at 09:19, Zeming Yu <zemin...@gmail.com> wrote:
> 
> Hi,
> 
> We're building a parquet based data lake. I was under the impression that 
> flat files are more efficient than deeply nested files (say 3 or 4 levels 
> down). Is that correct?
> 
> Thanks,
> Zeming


Where's the data going to live: HDFS or an object store? If it's somewhere like 
Amazon S3, I'd be biased towards the flatter directory structure. The way the 
client libraries mimic tree walking is pretty expensive in terms of HTTP calls, 
and those calls all take place during the initial, serialized query planning 
stage, where nothing runs in parallel to hide the latency.
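
To make the HTTP-call cost concrete, here is a toy sketch in Python. It is 
not how the S3 client libraries are implemented: ToyObjectStore, treewalk, 
nested_keys and flat_keys are hypothetical names made up for illustration, 
each simulated LIST stands in for one HTTP request, and paging of large 
listings is ignored. It only counts how many listings a directory-by-directory 
walk needs for the same number of files in a nested versus a flat layout.

```python
import itertools


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store; counts LIST calls."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.list_calls = 0

    def list_dir(self, prefix):
        """One simulated LIST request using '/' as delimiter.

        Returns (subdirectory prefixes, file keys) directly under `prefix`.
        """
        self.list_calls += 1
        subdirs, files = set(), []
        for key in self.keys:
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if "/" in rest:
                # Anything with a further '/' is reported as a "directory".
                subdirs.add(prefix + rest.split("/", 1)[0] + "/")
            else:
                files.append(key)
        return sorted(subdirs), files


def treewalk(store, prefix=""):
    """Walk the tree one 'directory' at a time, as a filesystem-style client would."""
    subdirs, files = store.list_dir(prefix)
    for sub in subdirs:
        files.extend(treewalk(store, sub))
    return files


def nested_keys(levels=3, fanout=4, files_per_leaf=2):
    """Keys laid out `levels` directories deep, `fanout` children per directory."""
    keys = []
    for path in itertools.product(range(fanout), repeat=levels):
        directory = "/".join(f"l{depth}={value}" for depth, value in enumerate(path))
        for i in range(files_per_leaf):
            keys.append(f"{directory}/part-{i}.parquet")
    return keys


def flat_keys(count):
    """The same number of files, all in a single directory."""
    return [f"part-{i}.parquet" for i in range(count)]


if __name__ == "__main__":
    nested = ToyObjectStore(nested_keys())
    flat = ToyObjectStore(flat_keys(len(nested.keys)))
    for name, store in (("nested", nested), ("flat", flat)):
        found = treewalk(store)
        print(f"{name}: {len(found)} files, {store.list_calls} simulated LIST calls")
```

With these assumed numbers (3 levels, 4 children per directory, 2 files per 
leaf, so 128 files), the nested walk makes 1 + 4 + 16 + 64 = 85 simulated LIST 
calls, while the flat layout needs 1. Real clients and real stores will differ 
in the details, but the shape of the cost is the point.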



