Hello.
To me, this whole "hot" tracking idea looks quite uncertain and dynamic.
Two things that I could directly use right now (and they are needed as a
pair) are:
1) Per-table/index/database buffer pools: split shared_buffers into parts
and allow specifying which index/table/database goes where (see the
sketch after this message for the closest thing available today). [...]

On Mon, May 23, 2011 at 3:08 PM, Josh Berkus wrote:
>
>> Well, all of that stuff sounds impractically expensive to me... but I
>> just work here.
>
> I'll point out that the simple version, which just checks for hot tables
> and indexes, would improve estimates greatly and be a LOT less
> complicated than these proposals. [...]

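PostgreSQL itself has no per-relation buffer pools, but the pg_buffercache
contrib module at least shows how shared_buffers is currently divided among
relations. A minimal sketch, assuming the module is installed and the
default 8 kB block size:

    -- Count shared_buffers pages held by each relation in the
    -- current database.
    SELECT c.relname,
           count(*) AS buffers,
           pg_size_pretty(count(*) * 8192) AS cached
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;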
> Well, all of that stuff sounds impractically expensive to me... but I
> just work here.

I'll point out that the simple version, which just checks for hot tables
and indexes, would improve estimates greatly and be a LOT less
complicated than these proposals. Certainly having some form of
block-[...]

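For the "just check for hot tables and indexes" version, the statistics
collector already exposes per-table buffer-hit counters; a rough sketch
using the standard pg_statio view (note it sees only PostgreSQL's own
buffer cache, not the OS page cache):

    SELECT relname,
           heap_blks_hit + heap_blks_read AS total_reads,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY total_reads DESC
    LIMIT 10;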
2011/5/19 Jim Nasby :
> On May 19, 2011, at 9:53 AM, Robert Haas wrote:
>> On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote:
>>> Jim Nasby wrote:
>>>> I think the challenge there would be how to define the scope of the
>>>> hot-spot. Is it the last X pages? Last X serial values? Something like
>>>> correlation? [...]

On Thu, May 19, 2011 at 2:39 PM, Jim Nasby wrote:
> On May 19, 2011, at 9:53 AM, Robert Haas wrote:
>> On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote:
>>> Jim Nasby wrote:
>>>> I think the challenge there would be how to define the scope of the
>>>> hot-spot. Is it the last X pages? Last X serial values? Something like
>>>> correlation? [...]

On May 19, 2011, at 9:53 AM, Robert Haas wrote:
> On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote:
>> Jim Nasby wrote:
>>> I think the challenge there would be how to define the scope of the
>>> hot-spot. Is it the last X pages? Last X serial values? Something like
>>> correlation?
>>>
>>> Hmm... it would be interesting if we had average relation access times
>>> for each stats bucket on a per-column basis; that would [...]

On Wed, May 18, 2011 at 11:00 PM, Greg Smith wrote:
> Jim Nasby wrote:
>> I think the challenge there would be how to define the scope of the
>> hot-spot. Is it the last X pages? Last X serial values? Something like
>> correlation?
>>
>> Hmm... it would be interesting if we had average relation access times
>> for each stats bucket on a per-column basis; that would [...]

Jim Nasby wrote:
> I think the challenge there would be how to define the scope of the
> hot-spot. Is it the last X pages? Last X serial values? Something like
> correlation?
>
> Hmm... it would be interesting if we had average relation access times
> for each stats bucket on a per-column basis; that would [...]

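On the correlation idea: the planner already keeps a physical-vs-logical
ordering statistic per column, which a hot-spot heuristic could start from.
A quick look via the standard pg_stats view:

    -- correlation is near +/-1 when a column's values match (reverse)
    -- heap order, near 0 when scattered; a serial key near +1 suggests
    -- recent rows cluster at the end of the table.
    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE schemaname = 'public'
    ORDER BY abs(correlation) DESC NULLS LAST
    LIMIT 10;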
On May 16, 2011, at 10:46 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote:
>>> Ok, it may not work as well with indexes, since having 1% in cache may
>>> very well mean that 90% of all requested blocks are there... for tables
>>> it should be more trivial. [...]

Nathan Boley writes:
>> The accesses to an index are far more likely to be clustered than the
>> accesses to the underlying table, because the index is organized in a
>> way that's application-meaningful and the table not so much.
> So, to clarify, are you saying that if the query were actually requesting
> rows uniformly random [...]

> The accesses to an index are far more likely to be clustered than the
> accesses to the underlying table, because the index is organized in a
> way that's application-meaningful and the table not so much.

So, to clarify, are you saying that if the query were actually requesting
rows uniformly random [...]

Jeff Janes writes:
> On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh wrote:
>> Ok, it may not work as well with indexes, since having 1% in cache may
>> very well mean that 90% of all requested blocks are there... for tables
>> it should be more trivial.
> Why would the index have a meaningful hot-spot unless the underlying [...]

On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh wrote:
>
> Ok, it may not work as well with indexes, since having 1% in cache may
> very well mean that 90% of all requested blocks are there... for tables
> it should be more trivial.

Why would the index have a meaningful hot-spot unless the underlying [...]

Robert Haas writes:
> On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote:
>> Ok, it may not work as well with indexes, since having 1% in cache may
>> very well mean that 90% of all requested blocks are there... for tables
>> it should be more trivial.
> Tables can have hot spots, too. Consider [...]

On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote:
>> To me it seems like a robust and fairly trivial way to get better
>> numbers. The fear is that the OS-cache is too much in flux to get any
>> stable numbers out of it.
>
> Ok, it may not work as well with indexes, since having 1% in cache may
> very well mean that 90% of all requested blocks are there... for tables
> it should be more trivial. [...]

On 2011-05-16 06:41, Jesper Krogh wrote:
> On 2011-05-16 03:18, Greg Smith wrote:
>> You can't do it in real-time. You don't necessarily want that even if
>> it were possible; too many possibilities for nasty feedback loops where
>> you always favor using some marginal index that happens to be in
>> memory [...]

On 2011-05-16 03:18, Greg Smith wrote:
> You can't do it in real-time. You don't necessarily want that even if
> it were possible; too many possibilities for nasty feedback loops where
> you always favor using some marginal index that happens to be in
> memory, and therefore never page in things that [...]

Craig Ringer wrote:
> AFAIK, mincore() is only useful for mmap()ed files and for finding out
> if it's safe to access certain blocks of memory w/o risking triggering
> heavy swapping.
>
> It doesn't provide any visibility into the OS's block device / file
> system caches; you can't ask it "how much of this [...]

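mincore() does work on any file you can open and mmap(), which is the
basis of Cédric's pgfincore extension for inspecting OS page-cache
residency from SQL. A sketch, assuming pgfincore is installed; the table
name is from this thread and the output columns may vary by version:

    -- pgfincore mmap()s each segment of the relation and calls
    -- mincore() to count resident OS pages.
    SELECT * FROM pgfincore('contacts');
    -- Output typically includes rel_os_pages (file size in OS pages)
    -- and pages_mem (pages currently in the OS page cache).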
On 16/05/11 05:45, Cédric Villemain wrote:
> 2011/5/15 Josh Berkus :
>> [...] disk pages might be in cache.
>> However, I think that's beyond feasibility for current software/OSes.
>
> maybe not :) mincore is available in many OSes, and Windows has
> options to get those stats too.

AFAIK, mincore() is only useful for mmap()ed files and for finding out
if it's safe to access certain blocks of memory w/o risking triggering
heavy swapping. [...]

2011/5/15 Josh Berkus :
> Stuart,
>
>> I think random_page_cost causes problems because I need to combine
>> disk random access time, which I can measure, with a guesstimate of
>> the disk cache hit rate.
>
> See, that's wrong. Disk cache hit rate is what effective_cache_size
> (ECS) is for.
>
> Really, there are several factors which [...]

On Sun, May 15, 2011 at 2:08 PM, Josh Berkus wrote:
>> All true. I suspect that in practice the difference between random and
>> sequential memory page costs is small enough to be ignorable, although
>> of course I might be wrong.
>
> This hasn't been my experience, although I have not carefully measured
> it. In fact, there's good reason to [...]

Stuart,

> I think random_page_cost causes problems because I need to combine
> disk random access time, which I can measure, with a guesstimate of
> the disk cache hit rate.

See, that's wrong. Disk cache hit rate is what effective_cache_size
(ECS) is for.

Really, there are several factors which [...]

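A sketch of that division of labor; the values are illustrative, not
recommendations:

    -- random_page_cost models the storage's random-vs-sequential
    -- penalty; effective_cache_size tells the planner how much data it
    -- may assume is cached somewhere (shared_buffers plus OS cache).
    -- It allocates nothing; it only shifts cost estimates.
    SET random_page_cost = 4.0;        -- spinning disk; lower for SSD
    SET effective_cache_size = '6GB';  -- e.g. ~75% of RAM on an 8GB box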
Robert,

> All true. I suspect that in practice the difference between random and
> sequential memory page costs is small enough to be ignorable, although
> of course I might be wrong.

This hasn't been my experience, although I have not carefully measured
it. In fact, there's good reason to [...]

On Sat, May 14, 2011 at 3:13 AM, Josh Berkus wrote:
> This is what the combination of random_page_cost and
> effective_cache_size ought to supply, but I don't think it does, quite.

I think random_page_cost causes problems because I need to combine
disk random access time, which I can measure, with a guesstimate of
the disk cache hit rate. [...]

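The symptom Stuart describes is easy to reproduce; a sketch against the
schema from the original post at the bottom of this thread:

    -- Lowering random_page_cost (session-only here) can flip the plan
    -- from a sequential scan to the index scan.
    SET random_page_cost = 4.0;
    EXPLAIN SELECT count(*) FROM contacts WHERE salutation_id = 1;
    SET random_page_cost = 2.0;
    EXPLAIN SELECT count(*) FROM contacts WHERE salutation_id = 1;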
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus wrote:
> Instead, we should be fixing the formulas these are based on and leaving
> RPC alone.
>
> For any data page, there are actually four costs associated with each
> tuple lookup, per: [...]

All true. I suspect that in practice the difference between random and
sequential memory page costs is small enough to be ignorable, although
of course I might be wrong.

2011/5/13 Josh Berkus :
>
>> I guess maybe the reason why it didn't matter for the OP is that - if
>> the size of the index in pages is smaller than the pro-rated
>> fraction of effective_cache_size allowed to the index - then the exact
>> value doesn't affect the answer.
>>
>> I apparently need to study this code more. [...]

> I guess maybe the reason why it didn't matter for the OP is that - if
> the size of the index in pages is smaller than the pro-rated
> fraction of effective_cache_size allowed to the index - then the exact
> value doesn't affect the answer.
>
> I apparently need to study this code more.

[...]

On Fri, May 13, 2011 at 3:20 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure wrote:
>>> The very first thing to check is effective_cache_size and to set it to
>>> a reasonable value.
>
>> Actually, effective_cache_size has no impact on costing except when
>> planning a nested loop with inner index scan. So, [...]

Robert Haas writes:
> On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure wrote:
>> The very first thing to check is effective_cache_size and to set it to
>> a reasonable value.

> Actually, effective_cache_size has no impact on costing except when
> planning a nested loop with inner index scan. So, [...]

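A quick way to see effective_cache_size move an index path's estimated
cost while leaving a sequential scan's untouched (values illustrative,
same hypothetical schema as above):

    SET effective_cache_size = '128MB';
    EXPLAIN SELECT * FROM contacts WHERE salutation_id = 1;  -- note cost
    SET effective_cache_size = '8GB';
    EXPLAIN SELECT * FROM contacts WHERE salutation_id = 1;  -- cost drops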
Robert Haas wrote:
> We've talked in the past (and I still think it's a good idea, but
> haven't gotten around to doing anything about it) about adjusting
> the planner to attribute to each relation the percentage of its
> pages which we believe we'll find in cache. Although many
> complicated [...]

On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure wrote:
> On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner wrote:
>> Sok Ann Yap wrote:
>>
>>> So, index scan wins by a very small margin over sequential scan
>>> after the tuning. I am a bit puzzled because index scan is more
>>> than 3000 times faster in this case, but the estimated costs are
>>> about the same. Did I do something wrong? [...]

On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap wrote:
>
> I understand the need to tune PostgreSQL properly for my use case.
> What I am curious about is, for the data set I have, under what
> circumstances (hardware/workload/cache status/etc) would a sequential
> scan really be faster than an index scan [...]

On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap wrote:
> On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner wrote:
> [...]
>
> I understand the need to tune PostgreSQL properly for my use case.
> What I am curious about is, for the data set I have, under what
> circumstances (hardware/workload/cache status/etc) would a sequential
> scan really be faster than an index scan [...]

On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner wrote:
> Sok Ann Yap wrote:
>> Kevin Grittner wrote:
>
>>> Please show us your overall configuration and give a description
>>> of the hardware (how many of what kind of cores, how much RAM,
>>> what sort of storage system).
>
>> Here's the configuration (this is just a low end laptop): [...]

On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner wrote:
> Sok Ann Yap wrote:
>
>> Anyway, the overhead of spawning 44 extra queries means that it is
>> still better for me to stick with the original query and tune
>> PostgreSQL to choose index scan.
>
> Maybe, but what is *best* for you is to tune PostgreSQL so that your
> costs are accurately modeled, at [...]

On Wed, Apr 27, 2011 at 5:37 AM, Kevin Grittner wrote:
> Sok Ann Yap wrote:
>
>> So, index scan wins by a very small margin over sequential scan
>> after the tuning. I am a bit puzzled because index scan is more
>> than 3000 times faster in this case, but the estimated costs are
>> about the same. Did I do something wrong? [...]

Sok Ann Yap wrote:
> Anyway, the overhead of spawning 44 extra queries means that it is
> still better for me to stick with the original query and tune
> PostgreSQL to choose index scan.

Maybe, but what is *best* for you is to tune PostgreSQL so that your
costs are accurately modeled, at [...]

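One modeling aid worth noting: since 9.0 the page-cost settings can be
attached to a tablespace, so storage with genuinely different
random-access behavior doesn't have to share one global random_page_cost.
The tablespace name here is hypothetical:

    ALTER TABLESPACE ssd_space
        SET (random_page_cost = 1.5, seq_page_cost = 1.0);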
Sok Ann Yap wrote:
> Kevin Grittner wrote:
>> Please show us your overall configuration and give a description
>> of the hardware (how many of what kind of cores, how much RAM,
>> what sort of storage system).

> Here's the configuration (this is just a low end laptop):
> version | PostgreSQL [...]

On Wed, Apr 27, 2011 at 9:22 AM, Claudio Freire wrote:
> The problem there, I think, is that the planner is doing a full join,
> instead of a semi-join.
Or, rather, computing cost as if it was a full join. I'm not sure why.

On Wed, Apr 27, 2011 at 3:04 AM, Merlin Moncure wrote:
> The very first thing to check is effective_cache_size and to set it to
> a reasonable value.
>
The problem there, I think, is that the planner is doing a full join,
instead of a semi-join.

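For reference, the semi-join shape comes out of an EXISTS subquery; a
minimal sketch against the thread's schema:

    -- A semi-join stops probing contacts after the first match per
    -- salutation; look for "Hash Semi Join" or "Nested Loop Semi Join"
    -- in the plan output.
    EXPLAIN
    SELECT s.id, s.name
    FROM salutations s
    WHERE EXISTS (SELECT 1
                  FROM contacts c
                  WHERE c.salutation_id = s.id);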
On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner wrote:
> Sok Ann Yap wrote:
>
>> So, index scan wins by a very small margin over sequential scan
>> after the tuning. I am a bit puzzled because index scan is more
>> than 3000 times faster in this case, but the estimated costs are
>> about the same. Did I do something wrong? [...]

Sok Ann Yap wrote:
> So, index scan wins by a very small margin over sequential scan
> after the tuning. I am a bit puzzled because index scan is more
> than 3000 times faster in this case, but the estimated costs are
> about the same. Did I do something wrong?

Tuning is generally needed to get [...]

Hi,
I am using PostgreSQL 9.0. There is a salutations table with 44 rows,
and a contacts table with more than a million rows. The contacts table
has a nullable (only 0.002% null) salutation_id column, referencing
salutations.id.
With this query:
SELECT
    salutations.id,
    salutations.name,
    [...]
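The archive truncates the query here. Given the thread's description (one
EXISTS probe per salutation, which the planner was costing like a full
join, plus the "44 extra queries" workaround mentioned above), a plausible
shape would be the sketch below; the column list and alias are guesses,
not the original text:

    SELECT salutations.id,
           salutations.name,
           EXISTS (SELECT 1
                   FROM contacts
                   WHERE contacts.salutation_id = salutations.id) AS in_use
    FROM salutations;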