On Tue, Apr 17, 2012 at 5:33 PM, Christopher Browne <cbbro...@gmail.com> wrote:
> I get the feeling that this is a somewhat-magical feature (in that
> users haven't much hope of understanding in what ways the results are
> deterministic) that is sufficiently "magical" that anyone serious
> about their result sets is likely to be unhappy to use either SYSTEM
> or BERNOULLI.
These both sound pretty useful. "BERNOULLI" is fine for cases where you
aren't worried about time dependency in your data, for example if
you're looking for the average or total value of some column. SYSTEM
just means "I'm willing to trade some unspecified amount of accuracy
for some unspecified amount of speed", which presumably is only good if
you trust the database designers to make a reasonable trade-off for
cases where speed matters and the accuracy requirements aren't very
strict.

> Possibly the forms of sampling that people *actually* need, most of
> the time, are more like Dollar Unit Sampling, which are pretty
> deterministic, in ways that mandate that they be rather expensive
> (e.g. - guaranteeing Seq Scan).

I don't know about that, but the cases where I would expect to need
other distributions are ones where you're looking at the tuples in a
non-linear way. Things like "what's the average gap between events" or
"what's the average number of instances per value". These might require
a full table scan but might still be useful if the data is going to be
subsequently aggregated or joined in ways that would be too expensive
on the full data set.

But we shouldn't let the best be the enemy of the good here. Having
SYSTEM and BERNOULLI would solve most use cases, and having those would
make it easier to add more later.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
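To make the BERNOULLI vs SYSTEM distinction above a bit more concrete, here is a rough toy model in Python. It is only a sketch and not how any database implements sampling: the "table" is a list of fixed-size pages, and the names make_table, bernoulli_sample and system_sample are made up for illustration. The assumption is that BERNOULLI decides row by row and SYSTEM decides page by page.

```python
import random


def make_table(n_rows, rows_per_page):
    """Toy table: rows 0..n_rows-1 stored in insertion order, split into pages."""
    values = list(range(n_rows))
    return [values[i:i + rows_per_page] for i in range(0, n_rows, rows_per_page)]


def bernoulli_sample(pages, fraction, rng):
    """Row-level sampling: each row is kept independently with probability
    `fraction`. Every page still has to be read."""
    return [row for page in pages for row in page if rng.random() < fraction]


def system_sample(pages, fraction, rng):
    """Block-level sampling: each page is kept or skipped as a whole, so
    skipped pages never need to be read, but the sampled rows are clustered."""
    return [row for page in pages if rng.random() < fraction for row in page]


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so a run can be repeated
    table = make_table(n_rows=10_000, rows_per_page=100)

    by_row = bernoulli_sample(table, 0.1, rng)
    by_page = system_sample(table, 0.1, rng)

    print("bernoulli rows:", len(by_row))
    print("system rows:   ", len(by_page), "from", len(by_page) // 100, "pages")
```

In this model the row values grow with insertion order, so they stand in for time-correlated data. The page-level sample gets whole runs of neighbouring values, which is why its estimate of an average would be expected to vary more from run to run than the row-level one, in exchange for touching far fewer pages.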