Fantastic research, John! ----- Original Message ----- To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]> Sent: Thursday, May 23, 2002 7:13 PM
> Jack, > > I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper and > arrived at the following: (cut-and-paste of relevant parts of the paper) > > --- Begin Quote --- > MYTH: "COMPUTE IS BETTER THAN ESTIMATE" > This one generates an endless debate actually, so we will not take a firm > stand either way. Rather, we will present some figures that throw some light > on the issue and allow us to step back and look at the situation. The > problem with COMPUTE is that it has to scan the entire table, sort it and > figure out the exact data distribution. On the other hand, ESTIMATE steps > through samples of the data, sorts and analyzes only a portion of the data. > > In a recent test for the effectiveness of COMPUTE versus ESTIMATE on a > static clone of a reasonably large Oracle Apps database, the statistics were > generated and stored for both COMPUTE and ESTIMATE. The Database consisted > of about 3,300 tables and 6,000 indexes and occupied approximately 120 Gb. > The ESTIMATE percentage was defaulted to 10% and no activity other than > ANALYZE was allowed on this clone during the entire period. Table statistics > including row count, average row length and blocks occupied were analyzed. > This showed that there were some differences in row count and average row > length on 321 of these tables. Row count differences ranged from a value of > 53 row less in the ESTIMATE of a table containing 205,743 rows (0.025%) all > the way up-to a count difference of 101,704 in 13,311,090 rows (0.76%). Even > assuming a difference of a maximum of 5% in these scenarios, you are not far > off the goal. Further analysis showed that a smaller average row length > coupled with a small table produced larger variations than was usually seen. > > > The differences however, were far more pronounced in Indexes - differences > of upto 300% were noticed. Further analysis showed that this was related to > the percentage of deleted leaf rows in the index. If this percentage is > high, the possibility of ESTIMATE going wrong was also high, as the > deletions are not factored in correctly. This was especially true if the > deletions occurred in leaf blocks that were probably not involved in the > ESTIMATE. When the deleted leaf rows was low or even nil within the index, > the percentage difference was much lower, in the range of 4 to 5%. > > The real myth killer is the cost of COMPUTE versus ESTIMATE - COMPUTE > required 66,553,308 reads versus 38,951,158 reads for ESTIMATE - almost 70% > more reads for COMPUTE. The sorting involved in determining the averages and > data distribution was a clincher - COMPUTE processed 4,263,724,259 rows in > sorting operations while ESTIMATE sorted just 18,025,069 - i.e. about 235% > more rows were sorted for the COMPUTE operation. The last nail in the coffin > was the time taken to COMPUTE statistics - about 36 hours against the time > to ESTIMATE of just 12 hours. > > While the figures speak for themselves, we will offer some general advice to > the cautious: ESTIMATE on tables and COMPUTE on Indexes. Columns are > analyzed by default, but serve no useful purpose other than showing data > spread. Hence, you could ANALYZE only Tables and Indexed columns alone. An > identified list of 'small' tables could also be COMPUTED rather than > ANALYZED. This advice is given because ESTIMATE on a table comes close as > far as row count goes, while COMPUTE on Indexes generates a more accurate > picture of both data distribution as well as object size statistics. Testing > the effectiveness of COMPUTE versus ANALYZE is simple and provides you with > figures that you can use to decide the strategy for your situation. > > Before we move to the next topic, keep in mind that an ANALYZE/ESTIMATE with > a sample size greater than or equal to 50% will result in COMPUTE. > > --- End Quote --- > > The problem is that this simple mathematical model looks only at object > sizes and did not look at Column spread and sensitivity. However, I believe > that the combination of ESTIMATE on Tables and COMPUTE on Indexes would > catch most of it. > > As always, YMMV! > > John Kanagaraj > Oracle Applications DBA > DBSoft Inc > (W): 408-970-7002 > > The manuals for Oracle are here: http://tahiti.oracle.com > The manual for Life is here: http://www.gospelcom.net > > ** The opinions and statements above are entirely my own and not those of my > employer or clients ** > > > > -----Original Message----- > > From: Jack Silvey [mailto:[EMAIL PROTECTED]] > > Sent: Tuesday, May 21, 2002 2:44 PM > > To: Multiple recipients of list ORACLE-L > > Subject: Statistical sampling and representative stats collection > > > > > > Hi all, > > > > Did some investigation about statistical sampling this > > weekend since we are going to optimize our analyze > > process soon, and would like some input from all you > > orabrains on this one. > > > > I opened a TAR with Oracle asking about the sampling > > algorithm of stats collection, and they assured me it > > was random. > > > > The goal of analyze...estimate is to collect stats > > that are representative of the data population as a > > whole using a given sample set. Since analyzing tables > > takes up resources (does sorts to order the data for > > investigation) the fewer rows you use in estimate, the > > less system resources you use and the faster the > > analyze will go. > > > > Since our goal is to get as small a sample as possible > > and still have stats that are representative, my > > contention is that we could start by finding what the > > margin of error will be for each sample size and gauge > > our tolerance for it. > > > > One standard way to calculate margin of error for a > > given sample is by using this formula: > > > > M = 1/SQRT(N) > > > > where: > > M = margin of error > > N=sample size > > > > So, if we can tolerate stats that have a 1% a margin > > of error (will deviate from representative of the > > whole population by 1%), our sample size should be > > 10,000 rows. > > > > Also, a corollary (not a toyota corollary, though) to > > this would be that the more rows you add to your > > sample, the closer to representative your sample will > > be. So, in order to test whether your sample is > > representative enough, you could analyze using either > > estimate 49% or compute, take a snapshot of the stats, > > and then compare the stats from a 10,000 row estimate > > to those. Then, add rows to your estimate until you > > are satisfied with the stats. > > > > This of course is a pie in the sky mathematical model, > > but seems like a reasonable place to start with > > testing. > > > > Input? Input? Buhler? Buhler? > > > > > > /jack silvey > > > > > > __________________________________________________ > > Do You Yahoo!? > > LAUNCH - Your Yahoo! Music Experience > > http://launch.yahoo.com > > -- > > Please see the official ORACLE-L FAQ: http://www.orafaq.com > > -- > > Author: Jack Silvey > > INET: [EMAIL PROTECTED] > > > > Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 > > San Diego, California -- Public Internet access / Mailing Lists > > -------------------------------------------------------------------- > > To REMOVE yourself from this mailing list, send an E-Mail message > > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > > the message BODY, include a line containing: UNSUB ORACLE-L > > (or the name of mailing list you want to be removed from). You may > > also send the HELP command for other information (like subscribing). > > > -- > Please see the official ORACLE-L FAQ: http://www.orafaq.com > -- > Author: John Kanagaraj > INET: [EMAIL PROTECTED] > > Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 > San Diego, California -- Public Internet access / Mailing Lists > -------------------------------------------------------------------- > To REMOVE yourself from this mailing list, send an E-Mail message > to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in > the message BODY, include a line containing: UNSUB ORACLE-L > (or the name of mailing list you want to be removed from). You may > also send the HELP command for other information (like subscribing). -- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: Tim Gorman INET: [EMAIL PROTECTED] Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).