A bloom filter index was recently added to ORC. For equality predicates it is 
much more accurate at row group elimination than the min/max based index.
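
To illustrate the difference, here is a rough Python sketch. It is not ORC's 
actual reader code: the BloomFilter class, build_index, groups_to_read, and 
the bit/hash counts are all made up for the example. It builds per-row-group 
min/max statistics plus a bloom filter over an unsorted column, then compares 
which row groups each approach would have to read for an equality lookup.

```python
import hashlib

ROW_GROUP_SIZE = 10_000  # default stride mentioned in the thread


class BloomFilter:
    """Toy bloom filter (hypothetical sizing, not ORC's implementation)."""

    def __init__(self, num_bits=1 << 17, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(repr(value).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))


def build_index(values, stride=ROW_GROUP_SIZE):
    """Collect min, max, and a bloom filter for each row group."""
    index = []
    for start in range(0, len(values), stride):
        group = values[start:start + stride]
        bloom = BloomFilter()
        for v in group:
            bloom.add(v)
        index.append({"min": min(group), "max": max(group), "bloom": bloom})
    return index


def groups_to_read(index, target, use_bloom):
    """Return the row groups that cannot be skipped for `col = target`."""
    selected = []
    for i, entry in enumerate(index):
        if not (entry["min"] <= target <= entry["max"]):
            continue  # eliminated by min/max statistics
        if use_bloom and not entry["bloom"].might_contain(target):
            continue  # eliminated by the bloom filter
        selected.append(i)
    return selected


# Unsorted column: every row group spans almost the whole value range.
values = [(i * 7919) % 1_000_003 for i in range(50_000)]
index = build_index(values)
target = values[23_456]  # this row lives in row group 2

print("min/max only:", groups_to_read(index, target, use_bloom=False))
print("with bloom:  ", groups_to_read(index, target, use_bloom=True))
```

On unsorted data like this, min/max alone cannot skip any row group because 
the target falls inside every group's range. The bloom filter never skips the 
group that really holds the value, and it skips the others except for 
occasional false positives.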

Thanks
Prasanth

> On Jul 16, 2015, at 9:07 AM, Thomas Abeler <[email protected]> wrote:
> 
> Hey,
> 
>  
> 
> I have a question about how indexing in ORC works.
> 
>  
> 
> As I understand it, ORC keeps statistics (min, max, sum) for every 10,000 
> rows (by default), and when I query the data it looks at those statistics 
> to decide whether it needs to read a given row chunk.
> 
>  
> 
> If that's true, is it possible to build an index on an ORC file that is more 
> similar to a database index? By that I mean another sorted data structure 
> that holds the field value and a pointer to the record it relates to.
> 
>  
> 
> The problem I have is that I have a huge dataset: >300 TB and 69 columns. 
> There is no 'key' column that gets queried frequently, and I would like to 
> run ad-hoc queries on nearly every one of these columns. I think building an 
> index on every column would be a good approach to get this ability.
> 
>  
> 
> Regards,
> 
> Thomas
> 
