Jack, Raj
I agree. The main point, I feel, is that if you follow statistical
theory, which a good part of our modern technology relies on, you will
sample a fixed number of rows rather than a percentage of the table's rows.
For a small table, you may have to sample the entire table to get
reliable results. As a real-life example, you wouldn't sample 10 US
Senators and expect those results to be accurate. No, you would simply
survey each Senator (and hope they don't change their minds). Similarly,
don't just sample 30 percent of a 1,000-row table. Sample all 1,000 rows.
For a large table, sampling a percentage would oversample and waste
effort. If you have a million-row table and a hundred-million-row
table, the same sample size will produce results nearly as accurate for
both. That is why you don't see nearly as many state political polls: it
costs nearly as much to accurately sample the citizens of one state as it
does to sample all the citizens of the US.
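To put rough numbers on this, here is a minimal Python sketch. It assumes
the textbook case of estimating a proportion, using Cochran's formula with
a finite population correction. The function name, the 3% margin, and the
95% confidence level (z = 1.96) are my own illustrative choices. This is
not how Oracle sizes its samples, and optimizer statistics such as distinct
counts or histograms behave differently from a simple proportion.

```python
import math

def required_sample_size(population, margin=0.03, z=1.96, p=0.5):
    """Rows to sample to estimate a proportion within +/- margin.

    Cochran's formula with a finite population correction.
    p=0.5 is the worst case (largest variance).
    """
    # Sample size needed for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it when the population itself is small.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

for rows in (1_000, 1_000_000, 100_000_000):
    n = required_sample_size(rows)
    print(f"{rows:>11,} rows: sample {n:,} ({n / rows:.4%} of the table)")
```

Under these assumptions the 1,000-row table needs more than half its rows
sampled, while the million-row and hundred-million-row tables need almost
exactly the same number of rows, a tiny percentage of either.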
Someone asked about skewed data. Well, that is the reason you
perform a RANDOM sample. That is the key point, and getting it wrong is
what produces many real-life statistical failures. A classic example is the
1936 Literary Digest poll of the Roosevelt-Landon presidential race. The
magazine drew its sample largely from telephone directories and automobile
registrations and confidently predicted Landon's victory. What it neglected
was that wealthier people had telephones in greater proportion than poor
people. So the sample was skewed, which produced bad results. Here, we're
betting on Oracle's statement that the sample is truly random.
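The telephone-poll failure above is easy to reproduce in a toy simulation.
The sketch below is purely illustrative: every number in it (share of
wealthy voters, phone ownership rates, support rates) is invented for the
example and is not historical data.

```python
import random

def simulate_poll(sample_size=2000, population_size=100_000, seed=42):
    """Compare a truly random sample with a phone-owners-only sample."""
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        # Hypothetical electorate: 30% wealthy, who are more likely both
        # to own a phone and to support candidate A.
        wealthy = rng.random() < 0.30
        has_phone = rng.random() < (0.80 if wealthy else 0.20)
        supports_a = rng.random() < (0.60 if wealthy else 0.35)
        population.append((has_phone, supports_a))

    def share(voters):
        # Fraction of the given voters who support candidate A.
        return sum(1 for _, supports_a in voters if supports_a) / len(voters)

    phone_owners = [voter for voter in population if voter[0]]
    return {
        "true": share(population),
        "random_sample": share(rng.sample(population, sample_size)),
        "phone_sample": share(rng.sample(phone_owners, sample_size)),
    }

result = simulate_poll()
for name, value in result.items():
    print(f"{name:>13}: {value:.1%} support candidate A")
```

With these made-up rates, the random sample lands close to the true
figure, while the phone-only sample overstates candidate A's support by
several points, no matter how many phone owners you call.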
Now, if you want a more accurate result, you will sample more. But
you are increasing the sample size to improve the accuracy, and to
compensate for any other inaccuracies, not because the table is larger.
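As a rough illustration of how accuracy depends on sample size alone, here
is a small Python sketch. It again assumes the simple case of estimating a
proportion from a random sample at 95% confidence (z = 1.96), and the
function name is just mine.

```python
import math

def margin_of_error(sample_size, z=1.96, p=0.5):
    """Approximate +/- margin for a proportion from a random sample.

    Note that the table (population) size does not appear anywhere.
    """
    return z * math.sqrt(p * (1 - p) / sample_size)

for n in (100, 400, 1_600, 6_400):
    print(f"n = {n:>5,}: +/- {margin_of_error(n):.2%}")
```

The margin shrinks with the square root of the sample size, so you have to
quadruple the sample to cut the error in half.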
Just a thought: if you're responsible for a data warehouse, you may
want to consider studying some basic statistics. Unfortunately, most
computer science curricula don't require a class in statistics. In fact,
since polls drive a lot of our political discussion, it wouldn't hurt to
require all citizens to have some statistical training. It might make it
harder for politicians to misconstrue statistical results. However, it is
hard enough to get people just to vote, so I suppose that one isn't going
to fly.
Dennis Williams
DBA
Lifetouch, Inc.
[EMAIL PROTECTED]
-----Original Message-----
Sent: Wednesday, May 22, 2002 9:39 AM
To: Multiple recipients of list ORACLE-L
Jack,
Nielsen Ratings (the TV rating company) monitors about 5,000 people (and
their TV watching habits) to supply ratings for all the shows on most of the
networks for the whole United States. So, as long as you have a working and
proven statistical model, and a good sample, it works. How do I know? Have
you ever seen anyone challenge the Nielsen Ratings for a show? I haven't.
Raj
______________________________________________________
Rajendra Jamadagni MIS, ESPN Inc.
Rajendra dot Jamadagni at ESPN dot com
Any opinion expressed here is personal and doesn't reflect that of ESPN Inc.
QOTD: Any clod can have facts, but having an opinion is an art!
Author: DENNIS WILLIAMS
INET: [EMAIL PROTECTED]