Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-31 Thread Robert Haas
On Sun, Oct 30, 2011 at 8:02 AM, Kääriäinen Anssi anssi.kaariai...@thl.fi wrote: Table size is around 600MB, index size is around 350MB and VM on-disk size is 16kB with default fillfactor. With fillfactor = 10, the VM size is 104 KB, and table size is around 6GB.  The index size is the same.

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-31 Thread Anssi Kääriäinen
On 10/31/2011 02:44 PM, Robert Haas wrote: What I think you're probably measuring here (oprofile would tell us for sure) is that once the size of the table goes beyond about half a gigabyte, it will have more than one page in the visibility map. The index-only scan code keeps the most recently

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-31 Thread Robert Haas
On Mon, Oct 31, 2011 at 9:51 AM, Anssi Kääriäinen anssi.kaariai...@thl.fi wrote: Stupid question, but why not keep the whole VM pinned? It might be that keeping more than one VM page pinned is a good idea, but we'd have to think carefully about it. For example, if we pin too many pages in

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-30 Thread Kääriäinen Anssi
Quoting Robert Haas: I tried this on my MacBook Pro this morning, using pgbench -i -s 500 to create a database about 7.5GB in size, and then using SELECT sum(1) FROM pgbench_accounts as a test query, on a build WITHOUT --enable-cassert. This machine has 4GB of memory, and I set shared_buffers =

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-30 Thread Kääriäinen Anssi
Sorry, I forgot to include the version used some information about my setup: PostgreSQL version: Git HEAD as of: Date: Fri Oct 28 21:18:36 2011 -0400 Commit: 51eba98cf4595e90730dedd9305da8aa84b649ee Compiled with defaults, (only change --with-pgport = 5431). I used default settings,

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Bruce Momjian
Tom Lane wrote: I wonder how trustworthy the measure of the visibilitymap_test call site as a consumer of cycles really is. I've frequently noticed that oprofile blames remarkably large fractions of the runtime on individual statements that appear to be quite trivial. I'm not sure if that

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Robert Haas
On Mon, Oct 24, 2011 at 4:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Oct 24, 2011 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder how trustworthy the measure of the visibilitymap_test call site as a consumer of cycles really is. I'm

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I also tried changing the BufferIsValid() tests in visibilitymap_test() to use BufferIsInvalid() instead, with the sense of the tests reversed (see attached vismap-test-invalid.patch). Since BufferIsInvalid() just checks for InvalidBuffer instead of

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Robert Haas
On Fri, Oct 28, 2011 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I also tried changing the BufferIsValid() tests in visibilitymap_test() to use BufferIsInvalid() instead, with the sense of the tests reversed (see attached vismap-test-invalid.patch).

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 28, 2011 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Hmm.  I wonder whether it wouldn't be better to get rid of the range checks in BufferIsValid, or better convert them into Asserts.  It seems less than intuitive that BufferIsValid and

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Robert Haas
On Fri, Oct 28, 2011 at 3:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 28, 2011 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Hmm.  I wonder whether it wouldn't be better to get rid of the range checks in BufferIsValid, or better convert them

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 28, 2011 at 3:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, I find that unlikely as well.  But leaving Asserts in place would tell us. OK. Should I go do that, or are you all over it? Go for it. regards, tom

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-28 Thread Robert Haas
On Fri, Oct 28, 2011 at 3:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 28, 2011 at 3:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, I find that unlikely as well.  But leaving Asserts in place would tell us. OK.  Should I go do that, or are

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-25 Thread Wolfgang Wilhelm
Von: Tom Lane t...@sss.pgh.pa.us An: Robert Haas robertmh...@gmail.com Cc: Kevin Grittner kevin.gritt...@wicourts.gov; pgsql-hackers@postgresql.org Gesendet: 21:35 Montag, 24.Oktober 2011 Betreff: Re: [HACKERS] So, is COUNT(*) fast now? Robert Haas robertmh...@gmail.com writes

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Robert Haas
On Sun, Oct 23, 2011 at 7:01 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Oct 21, 2011 at 12:52 PM, Robert Haas robertmh...@gmail.com wrote: Also, this line is kind of expensive:        if (!visibilitymap_test(scandesc-heapRelation,                                

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: I had wondered whether it'd be worth optimizing that along the lines of slot_getallattrs(). But most indexes probably have only one column, or anyway not enough to make for a useful savings. From a heavily-used production database: cir= select indnatts,

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I had wondered whether it'd be worth optimizing that along the lines of slot_getallattrs(). But most indexes probably have only one column, or anyway not enough to make for a useful savings. From a

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Yeah, TOAST indexes are 2-column. It would be best to exclude those from your counts, since it seems pretty unlikely that anyone will care how fast nodeIndexonlyscan.c is for scans on toast tables. User indexes (excluding toast): indnatts | count

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Kevin Grittner
Copy/paste problems -- the first set includes the system tables except for toast. User tables would be the difference between the results below. Sorry. -Kevin Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane t...@sss.pgh.pa.us wrote: Yeah, TOAST indexes are 2-column. It

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Robert Haas
On Mon, Oct 24, 2011 at 2:15 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: It doesn't look to me like the mean is above 2 (unless you have many fewer toast tables than I suspect), so trying to optimize many-column cases isn't going to help. The mean is 2.4 (give or take a little

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: But even though Tom's statement that most indexes are one column might be a slight exaggeration, I suspect it probably is true that the optimizations he's talking about for large numbers of columns won't produce any material benefit even for a 3 or 4

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Josh Berkus
On 10/24/11 12:35 PM, Tom Lane wrote: Your point about people trying to create wider indexes to exploit index-only scans is an interesting one, but I think it's premature to optimize on the basis of hypotheses about what people might do in future. I don't think that this is hypothetical at

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Robert Haas
On Mon, Oct 24, 2011 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Your point about people trying to create wider indexes to exploit index-only scans is an interesting one, but I think it's premature to optimize on the basis of hypotheses about what people might do in future. Well, I don't

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote: On 10/24/11 12:35 PM, Tom Lane wrote: Your point about people trying to create wider indexes to exploit index-only scans is an interesting one, but I think it's premature to optimize on the basis of hypotheses about what people might do in future. I

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Oct 24, 2011 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder how trustworthy the measure of the visibilitymap_test call site as a consumer of cycles really is. I'm not sure either. I guess we could try short-circuiting

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-23 Thread Jeff Janes
On Fri, Oct 21, 2011 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Oct 21, 2011 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think HeapTupleSatisfiesMVCC is probably being skipped anyway in this case, since all the heap pages should be PD_ALL_VISIBLE. Proves my point ;-)

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: count(*) and sum(1) do different things internally, and in my hands sum(1) is ~10% slower. I don't know how to dump the output of ExecBuildProjectionInfo into a human readable form, so I don't know the basis of the difference. But I wonder if using

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-23 Thread Jeff Janes
On Fri, Oct 21, 2011 at 12:52 PM, Robert Haas robertmh...@gmail.com wrote: Also, this line is kind of expensive:        if (!visibilitymap_test(scandesc-heapRelation,                                ItemPointerGetBlockNumber(tid),                                node-ioss_VMBuffer)) Around

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Andres Freund
On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't know why you'd imagine that touching an index

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread desmodemone
2011/10/22 Andres Freund and...@anarazel.de On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: I

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Tom Lane
Andres Freund and...@anarazel.de writes: On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's not touching six times less data. It's touching the exact same number of tuples either way, just index tuples in one

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Andres Freund
On Saturday, October 22, 2011 05:20:26 PM Tom Lane wrote: Andres Freund and...@anarazel.de writes: On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's not touching six times less data. It's touching the exact

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Tom Lane
Andres Freund and...@anarazel.de writes: On Saturday, October 22, 2011 05:20:26 PM Tom Lane wrote: Huh? In the case he's complaining about, the index is all in RAM. Sequentiality of access is not an issue (at least not at the page level --- within a page I suppose there could be cache-line

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Robert Haas
On Fri, Oct 21, 2011 at 10:57 PM, Jeff Janes jeff.ja...@gmail.com wrote: Yeah, but it works out to fewer pages. But since those pages are already in RAM, why would it matter all that much?  (Other than in the case of highly concurrent access, which you don't seem to be testing?) Well,

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread karavelov
- Цитат от Tom Lane (t...@sss.pgh.pa.us), на 22.10.2011 в 19:19 - Andres Freund and...@anarazel.de writes: On Saturday, October 22, 2011 05:20:26 PM Tom Lane wrote: Huh? In the case he's complaining about, the index is all in RAM. Sequentiality of access is not an issue (at least not

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 10:57 PM, Jeff Janes jeff.ja...@gmail.com wrote: If count(*) could cause the index-only scan to happen in physical order of the index, rather than logical order, that might be a big win.  Both for all in memory and for

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Anyhow, here's the scoop. On my desktop machine running F14, running SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of

[HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Robert Haas
Laments at: http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F http://wiki.postgresql.org/wiki/Slow_Counting I tried this on my MacBook Pro this morning, using pgbench -i -s 500 to create a database about 7.5GB in size, and then using SELECT sum(1)

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: That's a bit disappointing - it's now more than a third faster to do the sequential scan, even though the sequential scan has to touch six times as many blocks (at scale factor 20, index is 43 MB, table is 256 MB) all of which are in cache. Of course,

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Robert Haas
On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: That's a bit disappointing - it's now more than a third faster to do the sequential scan, even though the sequential scan has to touch six times as many blocks (at scale factor 20,

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't know why you'd imagine that touching an index is free, or even cheap, CPU-wise.  The whole point of the index-only optimization is to avoid I/O.  When you try it on a case

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Robert Haas
On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't know why you'd imagine that touching an index is free, or even cheap, CPU-wise.  The whole point of the

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: What this test case proves is that btree's overhead per index tuple touched is significantly more than the cost of the fastest path through HeapTupleSatisfiesMVCC, which I don't find

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Robert Haas
On Fri, Oct 21, 2011 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think HeapTupleSatisfiesMVCC is probably being skipped anyway in this case, since all the heap pages should be PD_ALL_VISIBLE. Proves my point ;-) ... you're comparing a code path that's been beat on for *years* with one

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Robert Haas
On Fri, Oct 21, 2011 at 3:07 PM, Robert Haas robertmh...@gmail.com wrote: [ oprofile results ] *grovels through the line-by-line results* Hmm, I guess there is a bit of a hotspot in StoreIndexTuple, which is probably being folded into IndexOnlyNext in the per-function timings:

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Anyhow, here's the scoop. On my desktop machine running F14, running SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of oprofile data: 176830 13.0801 postgres postgres ExecProject Hm, that's

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Hmm, I guess there is a bit of a hotspot in StoreIndexTuple, which is probably being folded into IndexOnlyNext in the per-function timings: ExecClearTuple(slot); for (i = 0; i nindexatts; i++) values[i] = index_getattr(itup, i + 1,

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Robert Haas
On Fri, Oct 21, 2011 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Anyhow, here's the scoop.  On my desktop machine running F14, running SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of oprofile data: 176830   13.0801  postgres    

Re: [HACKERS] So, is COUNT(*) fast now?

2011-10-21 Thread Jeff Janes
On Fri, Oct 21, 2011 at 11:14 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't know why you'd imagine that