Vivek:

Skewed data is data that is not evenly distributed in a column. Let me try
to explain. Let's say you have a shipping company that does business
throughout the US (in all 50 states) and you have a shipping table with
address to customer that has a STATE column with the 2 letter state
abbreviation as part of their address. You have done business in all 50
states, but most of your business comes from the 6 states that are in your
region of the country. Therefore, statistically speaking, those 6 state
abbreviations are represented an abnormally high number of times; they may
represent as much as 80% of all the shipping orders. Without evidence to the
contrary, the CBO assumes evenly distributed data when it works. This would
mean, for the purposes of our example, that the CBO would assume that each
state is represented 2% of the time and that these 6 states taken together
would represent 12% of the total, not 60%.

So, let's say I run a query selecting all the ordered shipped to one of
those states. Something like "select * from orders where state='NY'. Now,
the CBO is going to assume that the state 'NY' represents 2% of the total
orders on average and do use the index on the "state" column. In fact,
however, much of our business comes from NY and it represents 35% of all the
orders. So, the CBO should go for a full table scan. But it doesn't know
that--it can only assume an even distribution of the data from the normal
statistics collected. A histogram on the "state" column would provide the
CBO with the additional information it needs to know that, in this case, the
expected number of rows to return from the query would represent a large
percentage of the total number of rows and that it should do a full table
scan.

So, as has been said earlier, you want to look for data in indexed columns
that are used in where clauses as literal predicates where the data is not
evenly distributed in the column. These generally make good candidates for
histograms.

Hope this makes sense.

--

Jon Walthour, OCDBA
Oracle DBA
Cincinnati, Ohio


----- Original Message -----
To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]>
Sent: Friday, July 06, 2001 1:28 PM



Can you possibly Detail what you mean by Skewed Data ?

> -----Original Message-----
> From: Jon Walthour [SMTP:[EMAIL PROTECTED]]
> Sent: Friday, July 06, 2001 8:25 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: RE: RULE versus CHOOSE
>
>
>
> Tom:
>
> I never did any official benchmark studies, per se. I studied
> the CBO/histograms about 4 months ago when we were converting
> an app over from Oracle 7 to 8i. The new version of the app had
> queries that were very different from its earlier version and
> so performance on 8i as compared to 7 was dramatically worse.
> Of course, the app owners blamed the database. In the course
> of my defense of the db, I discovered that the data (being primarily
> composed of case studies) was severely skewed to the more recent
> dates (i.e., the further back you went, the sparser the number
> of cases). I discovered that the optimizer was doing a lot of
> range scans for queries when it should have been doing full table
> scans according to the CBO thresholds. Histograms on certain
> date fields throughout the schema dropped times on certain large
> report queries from 30 minutes to under 2.
>
> That's all I know. I can't give you an hard empirical evidence,
> just anecdotal evidence that, when properly used, histograms
> do seem to have dramatic impacts.
>
> --
>
> Jon Walthour, OCDBA
> Oracle DBA
> Computer Horizons
> Cincinnati, Ohio
>
>
> >--- Original Message ---
> >From: "Terrian, Tom" <[EMAIL PROTECTED]>
> >To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]>
> >Date: 7/6/01 9:20:24 AM
> >
>
> >Jon,
> >
> >Great write up.
> >
> >Several times you mentioned creating Histograms for skewed data
> distributions.
> >I am just curious if you have ever studied the performance impact
> with and
> >without them?  We used to maintain histograms but when we studied
> the
> >performance impact (with and without them) we determined that
> there was very
> >little benefit with histograms.  The down side with them is
> that they
> >drastically increased the amount of time it took to analyze
> the tables at night.
> >We decided to do without them.  Have you ever studied their
> benefits verse
> >drawbacks?
> >
> >Tom
> >
> >Tom Terrian
> >Oracle DBA
> >WPAFB - DAASC
> >[EMAIL PROTECTED]
> >937-656-3844
> >
>
>
--
Please see the official ORACLE-L FAQ: http://www.orafaq.com
--
Author: VIVEK_SHARMA
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).



-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Jon Walthour
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Reply via email to