> [EMAIL PROTECTED] writes:
>> On a very basic level, why bother sampling the whole table at all? Why
>> not check one block and infer all information from that? Because we know
>> that isn't enough data. In a table of 4.6 million rows, can you say with
>> any mathematical certainty that a sample of 100 points can be, in any
>> way, representative?
>
> This is a statistical argument, not a rhetorical one, and I'm not going
> to bother answering handwaving. Show me some mathematical arguments for
> a specific sampling rule and I'll listen.
>
Tom, I am floored by this response; I am shaking my head in disbelief.

It is inarguable that increasing the sample size increases the accuracy of a study, especially when the diversity of the population is unknown. It is well established that reducing the sample size increases the probability of error in any poll or study. The required sample size depends on the variance of the whole population, so it is mathematically unsound to ASSUME that any particular sample size is valid without knowing the standard deviation of the set.

http://geographyfieldwork.com/MinimumSampleSize.htm

Again, I understand why you used the Vitter algorithm, but as used, it has proven insufficient for the US Census TIGER database. We know this because we have seen that the random sample, as implemented, does not contain enough information to characterize the variance in the data properly.
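To make the point about variance concrete, here is a minimal sketch of the textbook minimum-sample-size calculation for estimating a mean. It assumes simple random sampling and a known (or estimated) population standard deviation sigma, and it is not what ANALYZE actually computes. The formula is n0 = (z * sigma / E)^2 for a margin of error E, with the finite population correction n = n0 / (1 + (n0 - 1) / N). The function names `min_sample_size` and `margin_of_error` are hypothetical, chosen only for illustration.

```python
import math

def min_sample_size(sigma, margin, z=1.96, population=None):
    """Hypothetical helper: minimum n needed to estimate a mean within
    +/- margin at the confidence level implied by z (1.96 ~ 95%),
    assuming simple random sampling and population std dev sigma."""
    n0 = (z * sigma / margin) ** 2
    if population is not None:
        # Finite population correction; negligible for millions of rows.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

def margin_of_error(sigma, n, z=1.96):
    """Hypothetical helper: margin of error achievable with a sample of n."""
    return z * sigma / math.sqrt(n)

# Same precision target, same 4.6 million row table, different spread:
# doubling sigma roughly quadruples the required sample size.
print(min_sample_size(10, 1, population=4_600_000))
print(min_sample_size(20, 1, population=4_600_000))
# With only 100 points, the error bar scales directly with sigma.
print(margin_of_error(10, 100))
```

The required n is driven almost entirely by sigma and the desired precision, not by the table size, which is why a fixed sample size cannot be judged adequate without some knowledge of the data's spread.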