@Antoni, check the blog post below; it has examples of how to optimize the schema for selective queries.
https://blog.cloudera.com/blog/2017/12/faster-performance-for-selective-queries/

On Tue, Mar 20, 2018 at 3:30 PM, Tim Armstrong <[email protected]> wrote:

> The page indices should solve a large part of this problem, but I can
> definitely come up with examples where the page indices aren't sufficient
> to avoid most materialisation if we have a predicate on an unsorted column.
>
> E.g. if you have a predicate on a state column with 50 distinct values
> (I'm being US-centric).
>
> select * from sales where state = 'MI'
>
> Suppose there is some amount of locality to the data and on average you
> get 2 states per data page. You're probably only going to be able to filter
> out ~50% of pages using min-max filters since 'MI' will lie in-between many
> pairs of states. Whereas if you scanned the 'state' column and materialized
> the other columns lazily, you could filter out a large majority of the data
> before materialising the other columns.
>
> On Tue, Mar 20, 2018 at 9:20 AM, Alexander Behm <[email protected]>
> wrote:
>
>> I think we do eventually want to support it. For highly selective queries
>> the existing dictionary and min/max filtering can already be very
>> effective. In addition, we plan to add indexes for finer-grained page
>> pruning. See https://issues.apache.org/jira/browse/IMPALA-5842
>>
>> After all those improvements, it's not clear what the additional benefit
>> of later materialization is going to be in practice.
>>
>> Do you have a case in mind that specifically requires late
>> materialization to work well?
>>
>> On Tue, Mar 20, 2018 at 12:47 AM, Antoni Ivanov <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> You can ignore my question, Found the relevant JIRA -
>>> https://issues.apache.org/jira/browse/IMPALA-2017 So I guess the answer
>>> is not yet.
>>>
>>> Regards,
>>>
>>> Antoni
>>>
>>> *From:* Antoni Ivanov
>>> *Sent:* Tuesday, March 20, 2018 9:45 AM
>>> *To:* '[email protected]' <[email protected]>
>>> *Subject:* Does Impala supports or plan to support Late Materialization
>>>
>>> I don’t mean partition pruning but as described in
>>>
>>> https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-redshift-introduces-late-materialization-for-faster-query-processing/
>>>
>>> It basically pre-fetches first the filter columns and then after
>>> applying the filter it fetches only the data from the rest of columns only
>>> if filter applies.
>>>
>>> Thanks
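
Tim's ~50% estimate for min-max pruning can be checked with a toy model. The sketch below is not Impala code. It assumes (hypothetically) that every data page holds exactly two distinct states and that every pair of states is equally likely, then counts how many pages a min-max filter could skip for state = 'MI' versus how many pages actually contain 'MI'. The function name and the page model are illustrative only.

```python
from itertools import combinations

# Two-letter codes for the 50 US states (the "state" column's distinct values).
STATES = [
    "AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD",
    "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH",
    "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY",
]


def page_stats(states, target):
    """Enumerate every possible two-state page (an assumption of this toy
    model) and count pages that min-max stats can skip for `target`."""
    # Each page is a (min, max) pair because the input is sorted.
    pages = list(combinations(sorted(states), 2))
    # Min-max pruning applies only when the target falls outside [min, max].
    skipped = sum(1 for lo, hi in pages if target < lo or target > hi)
    # Pages that really hold at least one matching row.
    containing = sum(1 for page in pages if target in page)
    return len(pages), skipped, containing


total, skipped, containing = page_stats(STATES, "MI")
print(f"pages skipped by min-max filter: {skipped}/{total} ({skipped / total:.0%})")
print(f"pages that contain 'MI':         {containing}/{total} ({containing / total:.0%})")
```

Under these assumptions the min-max filter skips 588 of 1225 pages (48%), while only 49 pages (4%) contain 'MI' at all. The difference is the set of pages whose other columns would still be materialised without late materialization.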
