Qi,

Yeah, I can see that. That's a sign that you had a good idea for a project, actually: your idea is interesting enough that people want to debate it. Make a proposal on Monday and our potential mentors will help you refine the idea.

----- Original Message -----
> >
> > Date: Thu, 22 Mar 2012 13:17:01 -0400
> > Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> > From: cbbro...@gmail.com
> > To: kevin.gritt...@wicourts.gov
> > CC: firstname.lastname@example.org
> >
> > On Thu, Mar 22, 2012 at 12:38 PM, Kevin Grittner
> > <kevin.gritt...@wicourts.gov> wrote:
> > > Tom Lane <t...@sss.pgh.pa.us> wrote:
> > >> Robert Haas <robertmh...@gmail.com> writes:
> > >>> Well, the standard syntax apparently aims to reduce the number of
> > >>> returned rows, which ORDER BY does not. Maybe you could do it
> > >>> with ORDER BY .. LIMIT, but the idea here I think is that we'd
> > >>> like to sample the table without reading all of it first, so that
> > >>> seems to miss the point.
> > >>
> > >> I think actually the traditional locution is more like
> > >>     WHERE random() < constant
> > >> where the constant is the fraction of the table you want. And
> > >> yeah, the presumption is that you'd like it to not actually read
> > >> every row. (Though unless the sampling density is quite a bit
> > >> less than 1 row per page, it's not clear how much you're really
> > >> going to win.)
> > >
> > > It's all going to depend on the use cases, which I don't think I've
> > > heard described very well yet.
> > >
> > > I've had to pick random rows from, for example, a table of
> > > disbursements to support a financial audit. In those cases it has
> > > been the sample size that mattered, and order didn't. One
> > > interesting twist there is that for some of these financial audits
> > > they wanted the probability of a row being selected to be
> > > proportional to the dollar amount of the disbursement. I don't
> > > think you can do this without a first pass across the whole data
> > > set.
> >
> > This one was commonly called "Dollar Unit Sampling," though the
> > terminology has gradually gotten internationalized.
> > http://www.dummies.com/how-to/content/how-does-monetary-unit-sampling-work.html
> >
> > What the article doesn't mention is that some particularly large items
> > might wind up covering multiple samples. In the example, they're
> > looking for a sample every $3125 down the list. If there was a single
> > transaction valued at $30000, that (roughly) covers 10 of the desired
> > samples.
> >
> > It isn't possible to do this without scanning across the entire
> > table.
> >
> > If you want repeatability, you probably want to instantiate a copy of
> > enough information to indicate the ordering chosen. That's probably
> > something that needs to be captured as part of the work of the audit,
> > so not only does it need to involve a pass across the data, it
> > probably requires capturing a fair bit of data for posterity.
> > --
> > When confronted by a difficult problem, solve it by reducing it to the
> > question, "How would the Lone Ranger handle this?"
> >
>
> The discussion till now has gone far beyond my understanding.....
> Could anyone explain briefly what is the idea for now?
> The designing detail for me is still unfamiliar. I can only take time
> to understand while possible after being selected and put time on it
> to read relevant material.
> For now, I'm still curious why Neil's implementation is no longer
> working? The Postgres has been patched a lot, but the general idea
> behind Neil's implementation should still work, isn't it?
> Besides, whether this query is needed is still not decided. Seems
> this is another hard to decide point. Is it that this topic is still
> not so prepared for the Gsoc yet? If really so, I think I still
> have time to switch to other topics. Any suggestion?
>
> Thanks.
>
> Best Regards and Thanks
> Huang Qi Victor
> Computer Science of National University of Singapore
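
For anyone trying to picture the Dollar Unit Sampling scheme described in the quoted message, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything proposed for PostgreSQL: the function name monetary_unit_sample, its parameters, and the sample amounts are all hypothetical, and amounts are taken to be whole dollars. It walks the list once, keeps a running total, and picks the row that contains each sampling point, which is why a single pass over all the data is needed and why one large item can cover several samples.

```python
from typing import List, Sequence, Tuple


def monetary_unit_sample(
    amounts: Sequence[int], interval: int, start: int
) -> List[Tuple[int, int]]:
    """Return (row_index, hits) for each row covering a sampling point.

    Sampling points sit at start, start + interval, start + 2 * interval, ...
    along the cumulative dollar total. All names here are hypothetical.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= start < interval:
        raise ValueError("start must satisfy 0 <= start < interval")

    selected: List[Tuple[int, int]] = []
    running_total = 0      # dollars seen so far, including the current row
    next_point = start     # next dollar position that must be sampled

    for index, amount in enumerate(amounts):
        running_total += amount
        hits = 0
        # Every sampling point below the running total falls in this row.
        while next_point < running_total:
            hits += 1
            next_point += interval
        if hits:
            selected.append((index, hits))
    return selected


# Made-up disbursements; the second one is the $30000 item from the thread.
print(monetary_unit_sample([1000, 30000, 500, 2000], interval=3125, start=1500))
```

With these made-up inputs the sketch prints [(1, 10), (3, 1)]: the $30000 row absorbs 10 of the sampling points, matching the "roughly 10" figure in the quoted message. In a real audit the start offset would normally be chosen at random within the interval and recorded, which is one way to get the repeatability mentioned above.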