Re: question for pre-filter parquet file data

Steven Phillips Tue, 27 Jan 2015 14:54:51 -0800

Our parquet reader doesn't currently have filter pushdown, but this is
something we will be adding in the near future. Once that work is done, we
will be able to skip entire pages as you describe.
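
For illustration only, the min/max check in question can be sketched as below. This is a hypothetical example, not Drill's reader code and not the Parquet library API: the class PageStats and the method canSkipForEquals are made-up names, and the statistics are assumed to be integer values already read from the page header.

```java
// Hypothetical sketch: PageStats and canSkipForEquals are illustrative names,
// not part of Drill or of the Parquet library.
final class PageStats {
    final long min; // smallest column value in the page (assumed read from the page header)
    final long max; // largest column value in the page (assumed read from the page header)

    PageStats(long min, long max) {
        this.min = min;
        this.max = max;
    }

    // For a predicate "column = target", the page cannot contain a match
    // when target lies outside [min, max], so the page can be skipped.
    boolean canSkipForEquals(long target) {
        return target < min || target > max;
    }

    public static void main(String[] args) {
        PageStats page = new PageStats(1, 9);
        System.out.println(page.canSkipForEquals(10)); // true: 10 is outside [1, 9]
        System.out.println(page.canSkipForEquals(5));  // false: 5 may be present
    }
}
```

With (min,max) = (1,9) and attribute_A = 10, as in the example quoted below, the check says the page can be skipped without being decoded.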


Also, could you file a jira for the TableStatsCalculator bug?
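
A small reproduction may help with the jira. The sketch below is not the actual TableStatsCalculator code; it assumes avgRowSizeInBytes is an int multiplied by 1024*1024 in int arithmetic, which is what the report quoted below suggests. The class and method names are hypothetical.

```java
// Hypothetical reproduction: RegionSizeOverflow, overflowing and widened are
// illustrative names, not the real TableStatsCalculator source.
final class RegionSizeOverflow {
    // Assumed shape of the bug: the product is computed in 32-bit int
    // arithmetic and wraps around before it is widened to long.
    static long overflowing(int avgRowSizeInBytes) {
        return avgRowSizeInBytes * 1024 * 1024;
    }

    // Suggested fix from the report: widen to long before multiplying.
    static long widened(int avgRowSizeInBytes) {
        return ((long) avgRowSizeInBytes) * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(overflowing(2048)); // -2147483648 (2^31 wraps in int)
        System.out.println(widened(2048));     // 2147483648
    }
}
```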

On Tue, Jan 27, 2015 at 12:23 AM, 蔡自强(伏念) <[email protected]>
wrote:

>
> Hi dear Drill developers,
>
> We are now deploying Drill 0.7 for statistical analysis. I found that
> Parquet files store column summary info in the page header (min, max,
> count, and so on), but the data reader does not seem to use this info for
> pre-filtering files. For example, when I search for records where
> attribute_A = 10 and the column's (min,max) = (1,9), skipping the scan of
> that data seems the best choice. I want to check whether Drill will do
> this during analysis.
>
> BTW: in the TableStatsCalculator.getRegionSizeInBytes method, if
> avgRowSizeInBytes is too large, the return value will be out of int
> range. So the code should be fixed like "return
> ((long)avgRowSizeInBytes)*1024L*1024L".
>
> Thanks & Regards




-- 
 Steven Phillips
 Software Engineer

 mapr.com
