Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-24 Thread Vitalii Tymchyshyn
Hello. As of me, all this "hot" thing really looks like uncertain and dynamic enough. Two things that I could directly use right now (and they are needed in pair) are: 1)Per-table/index/database bufferpools (split shared buffer into parts, allow to specify which index/table/database goes where

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 3:08 PM, Josh Berkus wrote: > >> Well, all of that stuff sounds impractically expensive to me... but I >> just work here. > > I'll point out that the simple version, which just checks for hot tables > and indexes, would improve estimates greatly and be a LOT less > complica

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Josh Berkus
> Well, all of that stuff sounds impractically expensive to me... but I > just work here. I'll point out that the simple version, which just checks for hot tables and indexes, would improve estimates greatly and be a LOT less complicated than these proposals. Certainly having some form of block-

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Cédric Villemain
2011/5/19 Jim Nasby : > On May 19, 2011, at 9:53 AM, Robert Haas wrote: >> On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote: >>> Jim Nasby wrote: I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like >

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Robert Haas
On Thu, May 19, 2011 at 2:39 PM, Jim Nasby wrote: > On May 19, 2011, at 9:53 AM, Robert Haas wrote: >> On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote: >>> Jim Nasby wrote: I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X se

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Jim Nasby
On May 19, 2011, at 9:53 AM, Robert Haas wrote: > On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote: >> Jim Nasby wrote: >>> I think the challenge there would be how to define the scope of the >>> hot-spot. Is it the last X pages? Last X serial values? Something like >>> correlation? >>> >>> Hmm

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Robert Haas
On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote: > Jim Nasby wrote: >> I think the challenge there would be how to define the scope of the >> hot-spot. Is it the last X pages? Last X serial values? Something like >> correlation? >> >> Hmm... it would be interesting if we had average relation ac

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-18 Thread Greg Smith
Jim Nasby wrote: I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that woul

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-17 Thread Jim Nasby
On May 16, 2011, at 10:46 AM, Tom Lane wrote: > Robert Haas writes: >> On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote: >>> Ok, it may not work as well with index'es, since having 1% in cache may very >>> well mean that 90% of all requested blocks are there.. for tables in should >>> be more

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Nathan Boley writes: >> The accesses to an index are far more likely to be clustered than the >> accesses to the underlying table, because the index is organized in a >> way that's application-meaningful and the table not so much. > So, to clarify, are you saying that if query were actually reque

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Nathan Boley
> The accesses to an index are far more likely to be clustered than the > accesses to the underlying table, because the index is organized in a > way that's application-meaningful and the table not so much. So, to clarify, are you saying that if query were actually requesting rows uniformly random

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Jeff Janes writes: > On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh wrote: >> Ok, it may not work as well with index'es, since having 1% in cache may very >> well mean that 90% of all requested blocks are there.. for tables in should >> be more trivial. > Why would the index have a meaningful hot

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Jeff Janes
On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh wrote: > > Ok, it may not work as well with index'es, since having 1% in cache may very > well mean that 90% of all requested blocks are there.. for tables in should > be more trivial. Why would the index have a meaningful hot-spot unless the underlyi

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Robert Haas writes: > On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote: >> Ok, it may not work as well with index'es, since having 1% in cache may very >> well mean that 90% of all requested blocks are there.. for tables in should >> be more trivial. > Tables can have hot spots, too. Consid

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote: >> To me it seems like a robust and fairly trivial way to to get better >> numbers. The >> fear is that the OS-cache is too much in flux to get any stable numbers >> out >> of it. > > Ok, it may not work as well with index'es, since having 1% i

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Jesper Krogh
On 2011-05-16 06:41, Jesper Krogh wrote: On 2011-05-16 03:18, Greg Smith wrote: You can't do it in real-time. You don't necessarily want that to even if it were possible; too many possibilities for nasty feedback loops where you always favor using some marginal index that happens to be in me

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Jesper Krogh
On 2011-05-16 03:18, Greg Smith wrote: You can't do it in real-time. You don't necessarily want that to even if it were possible; too many possibilities for nasty feedback loops where you always favor using some marginal index that happens to be in memory, and therefore never page in things t

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Greg Smith
Craig Ringer wrote: AFAIK, mincore() is only useful for mmap()ed files and for finding out if it's safe to access certain blocks of memory w/o risking triggering heavy swapping. It doesn't provide any visibility into the OS's block device / file system caches; you can't ask it "how much of this

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Craig Ringer
On 16/05/11 05:45, Cédric Villemain wrote: > 2011/5/15 Josh Berkus : >> disk pages might be in cache. >> However, I think that's beyond feasibility for current software/OSes. > > maybe not :) mincore is available in many OSes, and windows have > options to get those stats too. AFAIK, mincore() is

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Cédric Villemain
2011/5/15 Josh Berkus : > Stuart, > >> I think random_page_cost causes problems because I need to combine >> disk random access time, which I can measure, with a guesstimate of >> the disk cache hit rate. > > See, that's wrong. Disk cache hit rate is what effective_cache_size > (ECS) is for. > > Re

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 2:08 PM, Josh Berkus wrote: >> All true.  I suspect that in practice the different between random and >> sequential memory page costs is small enough to be ignorable, although >> of course I might be wrong. > > This hasn't been my experience, although I have not carefully m

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Josh Berkus
Stuart, > I think random_page_cost causes problems because I need to combine > disk random access time, which I can measure, with a guesstimate of > the disk cache hit rate. See, that's wrong. Disk cache hit rate is what effective_cache_size (ECS) is for. Really, there's several factors which sh

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Josh Berkus
Robert, > All true. I suspect that in practice the different between random and > sequential memory page costs is small enough to be ignorable, although > of course I might be wrong. This hasn't been my experience, although I have not carefully measured it. In fact, there's good reason to sup

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-14 Thread Stuart Bishop
On Sat, May 14, 2011 at 3:13 AM, Josh Berkus wrote: > This is what the combination of random_page_cost and > effective_cache_size ought to supply, but I don't think it does, quite. I think random_page_cost causes problems because I need to combine disk random access time, which I can measure, wi

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus wrote: > Instead, we should be fixing the formulas these are based on and leaving > RPC alone. > > For any data page, there are actually four costs associated with each > tuple lookup, per: All true. I suspect that in practice the different between ra

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Cédric Villemain
2011/5/13 Josh Berkus : > >> I guess maybe the reason why it didn't matter for the OP is that - if >> the size of the index page in pages is smaller than the pro-rated >> fraction of effective_cache_size allowed to the index - then the exact >> value doesn't affect the answer. >> >> I apparently ne

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Josh Berkus
> I guess maybe the reason why it didn't matter for the OP is that - if > the size of the index page in pages is smaller than the pro-rated > fraction of effective_cache_size allowed to the index - then the exact > value doesn't affect the answer. > > I apparently need to study this code more. F

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 3:20 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure wrote: >>> The very first thing to check is effective_cache_size and to set it to >>> a reasonable value. > >> Actually, effective_cache_size has no impact on costing except

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Tom Lane
Robert Haas writes: > On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure wrote: >> The very first thing to check is effective_cache_size and to set it to >> a reasonable value. > Actually, effective_cache_size has no impact on costing except when > planning a nested loop with inner index scan. So,

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Kevin Grittner
Robert Haas wrote: > We've talked in the past (and I still think it's a good idea, but > haven't gotten around to doing anything about it) about adjusting > the planner to attribute to each relation the percentage of its > pages which we believe we'll find in cache. Although many > complicated

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure wrote: > On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner > wrote: >> Sok Ann Yap wrote: >> >>> So, index scan wins by a very small margin over sequential scan >>> after the tuning. I am a bit puzzled because index scan is more >>> than 3000 times f

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-28 Thread Jeff Janes
On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap wrote: > > I understand the need to tune PostgreSQL properly for my use case. > What I am curious about is, for the data set I have, under what > circumstances (hardware/workload/cache status/etc) would a sequential > scan really be faster than an index

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-28 Thread Samuel Gendler
On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap wrote: > On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner > > > I understand the need to tune PostgreSQL properly for my use case. > What I am curious about is, for the data set I have, under what > circumstances (hardware/workload/cache status/etc) wou

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Sok Ann Yap
On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner wrote: > Sok Ann Yap  wrote: >> Kevin Grittner  wrote: > >>> Please show us your overall configuration and give a description >>> of the hardware (how many of what kind of cores, how much RAM, >>> what sort of storage system). > >> Here's the configu

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Sok Ann Yap
On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner wrote: > Sok Ann Yap wrote: > >> Anyway, the overhead of spawning 44 extra queries means that it is >> still better off for me to stick with the original query and tune >> PostgreSQL to choose index scan. > > Maybe, but what is *best* for you is to

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Sok Ann Yap
On Wed, Apr 27, 2011 at 5:37 AM, Kevin Grittner wrote: > Sok Ann Yap wrote: > >> So, index scan wins by a very small margin over sequential scan >> after the tuning. I am a bit puzzled because index scan is more >> than 3000 times faster in this case, but the estimated costs are >> about the same

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Kevin Grittner
Sok Ann Yap wrote: > Anyway, the overhead of spawning 44 extra queries means that it is > still better off for me to stick with the original query and tune > PostgreSQL to choose index scan. Maybe, but what is *best* for you is to tune PostgreSQL so that your costs are accurately modeled, at w

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Kevin Grittner
Sok Ann Yap wrote: > Kevin Grittner wrote: >> Please show us your overall configuration and give a description >> of the hardware (how many of what kind of cores, how much RAM, >> what sort of storage system). > Here's the configuration (this is just a low end laptop): > version | PostgreSQ

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Claudio Freire
On Wed, Apr 27, 2011 at 9:22 AM, Claudio Freire wrote: > The problem there, I think, is that the planner is doing a full join, > instead of a semi-join. Or, rather, computing cost as if it was a full join. I'm not sure why. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresq

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Claudio Freire
On Wed, Apr 27, 2011 at 3:04 AM, Merlin Moncure wrote: > The very first thing to check is effective_cache_size and to set it to > a reasonable value. > The problem there, I think, is that the planner is doing a full join, instead of a semi-join. -- Sent via pgsql-performance mailing list (pgsql

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-26 Thread Merlin Moncure
On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner wrote: > Sok Ann Yap wrote: > >> So, index scan wins by a very small margin over sequential scan >> after the tuning. I am a bit puzzled because index scan is more >> than 3000 times faster in this case, but the estimated costs are >> about the same

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-26 Thread Kevin Grittner
Sok Ann Yap wrote: > So, index scan wins by a very small margin over sequential scan > after the tuning. I am a bit puzzled because index scan is more > than 3000 times faster in this case, but the estimated costs are > about the same. Did I do something wrong? Tuning is generally needed to ge

[PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-26 Thread Sok Ann Yap
Hi, I am using PostgreSQL 9.0. There is a salutations table with 44 rows, and a contacts table with more than a million rows. The contacts table has a nullable (only 0.002% null) salutation_id column, referencing salutations.id. With this query: SELECT salutations.id, salutations.name,