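Just be aware that you should insert the data sorted, at least on the most 
discriminating column of your WHERE clause. Otherwise every row group covers 
the full range of values and the min/max statistics cannot rule any of them out.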
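
To illustrate why the sort order matters, here is a small Python sketch. It is 
not ORC's reader code; it only simulates a min/max index over row groups of 
10,000 rows and the predicate age > 100 mentioned in the quoted reply. The 
function names and the random data are made up for the example.

```python
import random

ROW_GROUP_SIZE = 10_000  # row-group size quoted in the thread


def build_row_group_index(values, group_size=ROW_GROUP_SIZE):
    """Return one (start_row, min, max) entry per row group.

    Hypothetical helper: a toy stand-in for a min/max row-group index.
    """
    index = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        index.append((start, min(chunk), max(chunk)))
    return index


def groups_to_read(index, threshold):
    """Start rows of the groups that may hold a value > threshold."""
    return [start for start, _lo, hi in index if hi > threshold]


# Made-up data: 100,000 ages spread uniformly over 0..120.
random.seed(0)
ages = [random.randint(0, 120) for _ in range(100_000)]

unsorted_hits = groups_to_read(build_row_group_index(ages), 100)
sorted_hits = groups_to_read(build_row_group_index(sorted(ages)), 100)

print(len(unsorted_hits), "row groups read when the data is unsorted")
print(len(sorted_hits), "row groups read when the data is sorted")
```

With unsorted data nearly every row group contains at least one age above 100, 
so its max passes the check and the group has to be read. With sorted data the 
matching rows are packed into the last few groups and the rest are skipped. In 
Hive that usually means adding a SORT BY on that column to the INSERT ... SELECT 
that loads the table.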

> On 19 Jan 2016, at 17:27, Owen O'Malley <omal...@apache.org> wrote:
> 
> It has both. Each index has statistics of min, max, count, and sum for each 
> column in the row group of 10,000 rows. It also has the location of the start 
> of each row group, so that the reader can jump straight to the beginning of 
> the row group. The reader takes a SearchArgument (e.g. age > 100) that limits 
> which rows are required for the query and can avoid reading an entire file, 
> or at least sections of the file.
> 
> .. Owen
> 
>> On Tue, Jan 19, 2016 at 7:50 AM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>> Hi,
>> 
>> I have read some notes on ORC files in Hive and indexes.
>> 
>> The document describes the indexes but makes reference to statistics:
>> 
>> Indexes (orc.apache.org)
>> "ORC provides three level of indexes within each file: file level - 
>> statistics about the values in each column across the entire file"
>> 
>> I am confused, as it seems to mix up indexes with statistics. Can someone 
>> clarify this?
>> 
>> Thanks
> 