RE: Function Based Index - Not Used ???

Cary Millsap Mon, 02 Jun 2003 19:39:41 -0700

To clarify my "Book! Book! Book! :)"...
This is my request for Tim to write [another] one.

Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com

Upcoming events:
- Hotsos Clinic 101 in Reykjavik, Ottawa, Dallas, Washington, Denver, Sydney
- Visit www.hotsos.com for schedule details...

-----Original Message-----
Sent: Friday, May 30, 2003 3:40 AM
To: Multiple recipients of list ORACLE-L

Book! Book! Book! :)

Speaking of query optimizers, here are some simple things about Oracle query
optimizers that I think a lot of people miss:

- CBO is just a module that chooses an execution plan
- RBO is another module that chooses an execution plan
- In 8.1.6+, CBO usually does a better job than RBO at computing a good plan
- In 9.2+, CBO almost always does a better job than RBO
- One advantage that CBO has over RBO is that it has more plans to choose
from; RBO, for example, doesn't know about hash joins
- Once a plan has been computed, it doesn't matter what optimizer was used
to choose it; the *execution* of a plan is not performed by a query
optimizer, it's performed by other (more stable) code path in the Oracle
kernel
- Therefore, for example, a full-table scan plan chosen by CBO will execute
no better and no worse than a full-table scan plan chosen by RBO

These may seem self-evident, or maybe even irrelevant, but in my experience,
misunderstanding these things has cause people to get so wrapped around
their own axle that they can't fix their problems.

Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com

Upcoming events:
- Hotsos Clinic 101 in Reykjavik, Ottawa, Dallas, Washington, Denver, Sydney
- Visit www.hotsos.com for schedule details...

-----Original Message-----
Sent: Friday, May 30, 2003 12:30 AM
To: Multiple recipients of list ORACLE-L

JP,

I apologize in advance for the long email, but I think you'll find it
rewarding to read it all the way through...

The CBO is just a mathematical processor, and a rather good one at that.  It
is choosing the best plan given the data it has been given, which is
admittedly often incomplete.

Let's take a look at the following test case, perhaps somewhat similar to
yours:

============== Begin test case #1 =======================
    SQL> create table t1
      2  (
      3          c1      varchar2(30),
      4          c2      number,
      5          c3      number
      6  )       tablespace tools;

    Table created.

    SQL> begin
      2    for i in 1..100000 loop
      3       insert into t1 values(to_char(mod(i,187)),i,mod(i,187));
      4    end loop;
      5  end;
      6  /

    PL/SQL procedure successfully completed.

    SQL> create index i1 on t1(upper(c1)) tablespace tools;

    Index created.

    SQL> analyze table t1 compute statistics;

    Table analyzed.

    SQL> set autotrace on
    SQL>
    SQL> select c1 from t1 where upper(c1) = '10000';

    no rows selected

    Execution Plan
    ----------------------------------------------------------
     0  SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=535 Bytes=1605)
     1  0  TABLE ACCESS (FULL) OF 'T1' (Cost=37 Card=535 Bytes=1605)
============== End test case #1 =======================

OK, so we've reproduced your situation.  Why didn't it use the index?  Are
function-based indexes somehow not working?  Or is the CBO choosing a FULL
table scan instead?

Here is some information to consider:

    * DBA_TABLES.NUM_ROWS = 100000
    * DBA_TABLES.BLOCKS = 232
    * DBA_INDEXES.CLUSTERING_FACTOR = 43197
    * DBA_INDEXES.AVG_LEAF_BLOCKS_PER_KEY = 1
    * DBA_INDEXES.AVG_DATA_BLOCKS_PER_KEY = 231
    * DBA_INDEXES.DISTINCT_KEYS = 187

The NUM_ROWS and BLOCKS are easy to understand.  The CLUFAC we'll come back
to later.  The AVG_LEAF_BLOCKS_PER_KEY is an estimate of how many index-leaf
blocks will be visited for each distinct key value.  The
AVG_DATA_BLOCKS_PER_KEY is the estimate of how many data blocks on the table
will be visited for each distinct key value.

That's a big number -- 231 blocks for each key value!  THIS IS IMPORTANT --
notice that this is almost exactly the same number of blocks in the entire
table!  With this fact in mind, how could the CBO possibly choose an indexed
access plan?

Why is this happening?  Notice how the data values were "scattered" using
the MOD() function?  That causes values to be scattered across all the
blocks of the table, and not "clustered" together onto one patch of blocks.
That's why the CLUSTERING_FACTOR is so large.  It is much "closer" to the
NUM_ROWS value (indicating bad clustering) as opposed to being closer in
value to BLOCKS (indicating good clustering).

All in all, not using the index is a very wise choice by the CBO.

Now, let's change the distribution of data and see what happens...

============== Begin test case #2 =======================
    SQL> truncate table t1;

    Table truncated.

    SQL> drop index i1;

    Index dropped.

    SQL> begin
      2    for i in 1..100000 loop
      3      insert into t1
      4         values(to_char(round(i/187,0)),i, round(i/187,0));
      5    end loop;
      6  end;
      7  /

    PL/SQL procedure successfully completed.

    SQL>
    SQL> create index i1 on t1(upper(c1)) tablespace tools;

    Index created.

    SQL> analyze table t1 compute statistics;

    Table analyzed.

    SQL> set autotrace on
    SQL>
    SQL> select c1 from t1 where upper(c1) = '10000';

    no rows selected

    Execution Plan
    ----------------------------------------------------------
     0  SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=187 Bytes=561)
     1  0  TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=187 Bytes=561)
     2  1    INDEX (RANGE SCAN) OF 'I1' (NON-UNIQUE) (Cost=1 Card=187)

============== End test case #2 =======================

Now we got what you wanted -- to use the index, without hints.  How did this
come about?

Notice how the data in the column C1 got populated in the anonymous PL/SQL
block.  Instead of scattering data values all over the place using the MOD()
function (as in the first case study), this time the data values are grouped
(a.k.a. clustered) together in the same blocks, because we're utilizing the
divisor instead of the remainder.

So, with this new data population, here are new statistics to consider:

    * DBA_TABLES.NUM_ROWS = 100000
    * DBA_TABLES.BLOCKS = 242
    * DBA_INDEXES.CLUSTERING_FACTOR = 354
    * DBA_INDEXES.AVG_LEAF_BLOCKS_PER_KEY = 1
    * DBA_INDEXES.AVG_DATA_BLOCKS_PER_KEY = 1
    * DBA_INDEXES.DISTINCT_KEYS = 536

Now, you can see that the AVG_DATA_BLOCKS_PER_KEY is only 1 (before it was
231), and the CLUSTERING_FACTOR is very close in value to BLOCKS.  Both of
these statistics indicate that an index range scan would be very efficient.
And so, the CBO chooses to use the index this time, without being forced...

----

Does this explain what is going on?  Check some of the same factors in your
own situation and see if they are similar...

----

The fact that you measured the FULL table scan at 5.0 secs elapsed time and
the indexed scan at 0.7 secs only indicates that you probably have a data
skew problem.  My example here show perfect data distribution (i.e. 187
distinct data values each time).  Does your example also have even data
distribution, or should you gather column statistics for all indexed columns
as well, to inform the CBO about "popular" and "unpopular" data values?

Hope this helps...

-Tim

on 5/29/03 7:23 PM, Prem Khanna J at [EMAIL PROTECTED] wrote:

> Thanks Tim.
> 
> But the SELECT returns just 2 of 20,00,000 records.
> and the Time elapsed for Index scan is 0.7 secs where as
> it is 5 secs for FTS.
> 
> Hell a lot of lousy things here Tim.
> just mending it one by one.
> 
> Regards,
> Jp.
> 
> 2003/05/29 22:30:02, Tim Gorman <[EMAIL PROTECTED]> wrote:
>> JP,
>> In the EXPLAIN PLAN, it says "Card=262146", indicating that the query
>> expects to retrieve over a quarter-million rows.  Is that in fact
correct?
>> If so, the CBO is making the correct decision to perform a FULL table
scan.
>> What was the comparison of elapsed times between the two plans, the one
>> being the FULL table scan and the other being the indexed scan?  I bet
the
>> FULL table scan query finished much more quickly...
>> You've got everything configured correctly -- simply a lousy index.  The
CBO
>> has to be coerced into using the index because it is not the best plan to
>> use.
>> Hope this helps...
>> -Tim
> 
> 
> 

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Tim Gorman
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Cary Millsap
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Cary Millsap
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

RE: Function Based Index - Not Used ???

Reply via email to