Re: Parquet Block Size Detection

2016-07-01 Thread Parth Chandra
For metadata, you can use 'parquet-tools dump' and pipe the output to more/less. Parquet dump will print the block (aka row group) and page-level metadata. It will then dump all the data, so be prepared to cancel when that happens. Setting dfs.blocksize == parquet.blocksize is a very good idea and
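An illustration of why matching the two sizes helps (an editorial sketch, not code from the thread): the hypothetical helper `blocks_spanned` counts how many filesystem blocks each row group overlaps, given each row group's starting byte offset and length. It assumes the file begins at a block boundary. A row group that straddles a block boundary may have to be read from two blocks, possibly on different nodes. The sizes below are illustrative only.

```python
from typing import List, Tuple

MB = 1024 * 1024


def blocks_spanned(row_groups: List[Tuple[int, int]], fs_block_size: int) -> List[int]:
    """For each (start_offset, length) row group, return how many
    filesystem blocks of size fs_block_size it overlaps.

    Hypothetical helper: assumes the file starts on a block boundary
    and that every row group has a positive length.
    """
    if fs_block_size <= 0:
        raise ValueError("fs_block_size must be positive")
    counts = []
    for start, length in row_groups:
        if length <= 0:
            raise ValueError("row group length must be positive")
        first_block = start // fs_block_size                # block holding the first byte
        last_block = (start + length - 1) // fs_block_size  # block holding the last byte
        counts.append(last_block - first_block + 1)
    return counts


# A 250 MB row group starting right after the 4-byte "PAR1" header.
print(blocks_spanned([(4, 250 * MB)], fs_block_size=256 * MB))  # [1]
print(blocks_spanned([(4, 250 * MB)], fs_block_size=128 * MB))  # [2]
```

With a 256 MB filesystem block the 250 MB row group stays inside one block; with a 128 MB block it straddles two.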

Re: Parquet Block Size Detection

2016-07-01 Thread John Omernik
I am looking forward to the MapR 1.7 dev preview because of the metadata user impersonation JIRA fix.

"Drill always writes one row group per file." So is this one Parquet block? "Row group" is a new term to this email :)

On Fri, Jul 1, 2016 at 2:09 PM, Abdel Hakim Deneche
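If a writer emits exactly one row group per file, as stated for Drill in this thread, then the row-group target size (in Drill, the `store.parquet.block-size` option) sets a lower bound on how many files a dataset produces, assuming no row group exceeds the target. A minimal sketch of that arithmetic; the function name `min_file_count` is hypothetical and the 1 TiB / 512 MiB figures are only example values:

```python
def min_file_count(total_bytes: int, row_group_bytes: int) -> int:
    """Lower bound on the number of files produced for total_bytes of
    on-disk data by a writer that emits one row group per file, assuming
    each row group is at most row_group_bytes."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if row_group_bytes <= 0:
        raise ValueError("row_group_bytes must be positive")
    # Integer ceiling division avoids float rounding on large sizes.
    return (total_bytes + row_group_bytes - 1) // row_group_bytes


print(min_file_count(1024**4, 512 * 1024**2))  # 2048
```

Smaller row groups mean more files, which ties into the later point in the thread about keeping the file count low.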

Re: Parquet Block Size Detection

2016-07-01 Thread Abdel Hakim Deneche
Just make sure you enable Parquet metadata caching; otherwise, the more files you have, the more time Drill will spend reading the metadata from every single file.

On Fri, Jul 1, 2016 at 11:17 AM, John Omernik wrote:
> In addition
> 7. Generally speaking, keeping number of files

Re: Parquet Block Size Detection

2016-07-01 Thread Abdel Hakim Deneche
Some answers inline:

On Fri, Jul 1, 2016 at 10:56 AM, John Omernik wrote:
> I looked at that, and both the meta and schema options didn't provide me
> block size.
>
> I may be looking at parquet block size wrong, so let me toss out some
> observations, and inferences I am

Re: Parquet Block Size Detection

2016-07-01 Thread John Omernik
In addition:

7. Generally speaking, keeping the number of files low will help in multiple phases of planning/execution. True/False?

On Fri, Jul 1, 2016 at 12:56 PM, John Omernik wrote:
> I looked at that, and both the meta and schema options didn't provide me
> block size.
>
> I

Re: Parquet Block Size Detection

2016-07-01 Thread John Omernik
I looked at that, and both the meta and schema options didn't provide me the block size. I may be looking at Parquet block size wrong, so let me toss out some observations and inferences I am making, and then others who know the spec/format can confirm or correct.

1. The block size in parquet is

Re: Parquet Block Size Detection

2016-07-01 Thread Parth Chandra
parquet-tools perhaps? https://github.com/Parquet/parquet-mr/tree/master/parquet-tools

On Fri, Jul 1, 2016 at 5:39 AM, John Omernik wrote:
> Is there any way, with Drill or with other tools, given a Parquet file, to
> detect the block size it was written with? I am copying

Parquet Block Size Detection

2016-07-01 Thread John Omernik
Is there any way, with Drill or with other tools, given a Parquet file, to detect the block size it was written with? I am copying data from one cluster to another, and trying to determine the block size. While I was able to get the size by asking the devs, I was wondering, is there any way to
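On the original question, some background on where the relevant numbers live. A Parquet file starts and ends with the 4-byte magic "PAR1". The Thrift-encoded FileMetaData sits just before a 4-byte little-endian footer length and the trailing magic, and it holds the per-row-group sizes (for example `RowGroup.total_byte_size`). As far as I know, the format has no field recording the writer's configured block size, so the practical route is to read the actual row-group sizes from this footer, which is the data tools like parquet-tools read when printing row-group metadata. The stdlib-only sketch below only locates and extracts the raw footer bytes; decoding the Thrift compact encoding is left to a real Parquet library. `read_footer` is a hypothetical name, and the demo uses synthetic bytes rather than a real file.

```python
import io
import os
import struct
from typing import BinaryIO

MAGIC = b"PAR1"


def read_footer(f: BinaryIO) -> bytes:
    """Return the raw (still Thrift-encoded) FileMetaData bytes.

    Assumed layout: "PAR1" | column chunks | FileMetaData | 4-byte LE length | "PAR1"
    """
    f.seek(0, os.SEEK_END)
    size = f.tell()
    if size < 12:  # leading magic + footer length + trailing magic
        raise ValueError("too small to be a Parquet file")
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    f.seek(size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length larger than file")
    f.seek(size - 8 - footer_len)  # metadata ends where the length field begins
    return f.read(footer_len)


# Synthetic file: magic, 16 bytes of fake column data, a 4-byte "footer".
fake = MAGIC + b"\x00" * 16 + b"META" + struct.pack("<I", 4) + MAGIC
print(read_footer(io.BytesIO(fake)))  # b'META'
```

A real file would be opened with `open(path, "rb")` and passed in the same way.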