The page indices should solve a large part of this problem, but I can
definitely come up with examples where they aren't sufficient to avoid
most of the materialization when the predicate is on an unsorted column.

E.g. suppose you have a predicate on a state column with 50 distinct values
(I'm being US-centric):

  select * from sales where state = 'MI'

Suppose there is some amount of locality in the data, so that on average you
get 2 states per data page. You're probably only going to be able to filter
out ~50% of pages using min-max filters, since 'MI' will lie between many
pairs of states. By contrast, if you scanned the 'state' column first and
materialized the other columns lazily, you could filter out a large majority
of the rows before materializing the other columns.
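
To make that concrete, here is a toy sketch in Python. It is not Impala code:
the page contents, the 4-rows-per-page layout, and all function names are
made up for illustration. It only contrasts how many rows get fully
materialized when pages are pruned by min/max alone versus when the 'state'
column is scanned first and the other columns are fetched only for matches.

```python
# Toy model, not Impala code. Each "page" is just the list of 'state'
# values stored in it; the values below are invented for illustration.
PAGES = [
    ["AL", "AL", "TX", "TX"],  # range AL..TX covers 'MI', but no match
    ["MI", "MI", "OH", "OH"],  # contains 'MI'
    ["CA", "CA", "NY", "NY"],  # range CA..NY covers 'MI', but no match
    ["AK", "AK", "AZ", "AZ"],  # max 'AZ' < 'MI', so the page can be skipped
    ["NV", "NV", "WY", "WY"],  # min 'NV' > 'MI', so the page can be skipped
    ["FL", "FL", "MI", "MI"],  # contains 'MI'
]


def survives_min_max(page, target):
    """A min/max filter can skip a page only if target is outside [min, max]."""
    return min(page) <= target <= max(page)


def rows_with_min_max_only(pages, target):
    """Rows materialized if every surviving page is decoded for all columns."""
    return sum(len(page) for page in pages if survives_min_max(page, target))


def rows_with_late_materialization(pages, target):
    """Rows materialized if 'state' is scanned first and the remaining
    columns are fetched only for rows that pass the predicate."""
    return sum(1 for page in pages for value in page if value == target)


if __name__ == "__main__":
    total = sum(len(page) for page in PAGES)
    print("total rows:", total)
    print("min/max only:", rows_with_min_max_only(PAGES, "MI"))
    print("late materialization:", rows_with_late_materialization(PAGES, "MI"))
```

With this made-up data, min/max pruning still materializes 16 of the 24 rows,
while lazy materialization touches the other columns for only the 4 matching
rows. The exact ratio obviously depends on how the data is laid out.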

On Tue, Mar 20, 2018 at 9:20 AM, Alexander Behm <[email protected]>
wrote:

> I think we do eventually want to support it. For highly selective queries
> the existing dictionary and min/max filtering can already be very
> effective. In addition, we plan to add indexes for finer-grained page
> pruning. See https://issues.apache.org/jira/browse/IMPALA-5842
>
> After all those improvements, it's not clear what the additional benefit
> of late materialization is going to be in practice.
>
> Do you have a case in mind that specifically requires late materialization
> to work well?
>
> On Tue, Mar 20, 2018 at 12:47 AM, Antoni Ivanov <[email protected]>
> wrote:
>
>> Hi,
>>
>>
>>
>> You can ignore my question. I found the relevant JIRA:
>> https://issues.apache.org/jira/browse/IMPALA-2017
>> So I guess the answer is not yet.
>>
>>
>>
>> Regards,
>>
>> Antoni
>>
>>
>>
>> *From:* Antoni Ivanov
>> *Sent:* Tuesday, March 20, 2018 9:45 AM
>> *To:* '[email protected]' <[email protected]>
>> *Subject:* Does Impala supports or plan to support Late Materialization
>>
>>
>>
>> I don’t mean partition pruning but as described in
>>
>> https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-redshift-introduces-late-materialization-for-faster-query-processing/
>>
>>
>>
>> It basically fetches the filter columns first and then, after applying
>> the filter, fetches the data from the rest of the columns only for the
>> rows that pass the filter.
>>
>>
>>
>> Thanks
>>
>
>
