Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Richard Huxton

Robert Treat wrote:

On Wed, 2005-11-30 at 13:33, Magnus Hagander wrote:

Someone suggested earlier that we should drop the binaries for
nonsupported versions completely from the ftp site. Thoughts on this?

If not, they should at least go into OLD as well. But personally, I'm
for dropping them completely. If you're on something that old (heck, we
have 7.0 binaries..), you can still build from source.



I'm against the idea... the cost for us is minimal, and the hassle
involved in building from source is quite large. 


I don't have a need for an old PG binary. But when I have needed really 
old binaries it's always been in the middle of the night, in front of a 
machine with a teletype terminal, in the dark, surrounded by wolves 
while a timer ticks into the red... Locating the right versions of 17 
different libraries and compiling from source is always my second choice.


If it's practical to keep them, I'd like to suggest doing so. If it's 
not practical, could we have a where_to_find_old_versions.txt file and 
open a project on sourceforge to keep them?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] generalizing the planner knobs

2005-12-01 Thread Neil Conway
There are currently some rather crude knobs for persuading the planner
to favour certain kinds of query plans: the enable_XXX GUC variables.
Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hints
at the moment, but ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.

On the other hand, the whole mechanism is still a hack. It also means
that applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, if
you're in the sort of desperate straits where this sort of hackery is
required, perhaps that's acceptable.
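
[To make the idea concrete, here is a minimal sketch in C of how a per-node-type
multiplier might be applied during costing; the GUC name seqscan_cost_multiplier,
the cost constants, and the function shape are invented for illustration and only
loosely imitate costsize.c.]

#include <stdbool.h>

/*
 * Hypothetical sketch only -- not PostgreSQL source.  "seqscan_cost_multiplier"
 * is an assumed new GUC, and the cost arithmetic is deliberately simplified.
 */
typedef double Cost;

static double seqscan_cost_multiplier = 1.0;   /* 1.0 = neutral, >1.0 = penalize */
static bool   enable_seqscan = true;           /* today's all-or-nothing knob */
static const Cost disable_cost = 1.0e8;        /* huge constant added when disabled */

static Cost
sketch_cost_seqscan(double disk_pages, double tuples)
{
	Cost	startup_cost = 0.0;
	Cost	run_cost = 0.0;

	if (!enable_seqscan)
		startup_cost += disable_cost;	/* current behaviour: effectively a veto */

	run_cost += disk_pages * 1.0;		/* sequential page fetches */
	run_cost += tuples * 0.01;			/* per-tuple CPU cost */

	/* proposed behaviour: scale the estimate instead of clobbering it */
	run_cost *= seqscan_cost_multiplier;

	return startup_cost + run_cost;
}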

Comments?

-Neil



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Docs misspelling

2005-12-01 Thread Christopher Kings-Lynne

36.7.3.5. FOR (integer variant)

In the 8.1 docs.  Label has been spelt Labal.

Chris


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Docs misspelling

2005-12-01 Thread Neil Conway
On Thu, 2005-12-01 at 18:33 +0800, Christopher Kings-Lynne wrote:
 36.7.3.5. FOR (integer variant)
 
 In the 8.1 docs.  Label has been spelt Labal.

Thanks, fixed in HEAD and REL8_1_STABLE.

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Euler Taveira de Oliveira
--- Richard Huxton dev@archonet.com wrote:

 If it's practical to keep them, I'd like to suggest doing so. If it's
 not practical, could we have a where_to_find_old_versions.txt file
 and 
 open a project on sourceforge to keep them?
 
What about a museum.postgresql.org to keep the old releases?


Euler Taveira de Oliveira
euler[at]yahoo_com_br


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Another way to reduce pg_subtrans lookup overhead

2005-12-01 Thread Alvaro Herrera
Tom Lane wrote:
 I mentioned yesterday that I'm looking at the problem of excessive
 accesses to pg_subtrans when there is an old open transaction:
 http://archives.postgresql.org/pgsql-hackers/2005-11/msg01547.php
 
 I thought of a different approach to it, which is to make snapshot
 checking take a hint from TransactionIdIsInProgress: use the subxid
 caches from the PG_PROC array.  The idea is to have GetSnapshot save
 not just the top-level active xids, but also the subxids, plus an
 indicator of whether any of the subxids caches are overflowed.
 Then, when checking to see if an xid is active according to a snapshot,
 we can skip the SubTransGetTopmostTransaction() call if the overflow
 flag isn't set --- instead, just look through the subxids for a match.
 
 This approach would reduce the cost of snapshot checking in the normal
 case where there are no overflowed subxid caches, but it doesn't help at
 all if there are; plus it increases the cost of capturing a snapshot.
 So I'm not sure how much net win there would be, if any.

Yeah, I had thought about using the XidCache while checking your other
approach yesterday.  I was thinking, however, not of saving the Xids in
the snapshot but of using the ones already in PGPROC.  I'm not sure that is
as useful, but it looks like it would at least be able to reduce the
probability of examining pg_subtrans in some cases without introducing
extra cost.

Except that PGPROC will be staying locked longer ... I wonder if that is
problematic.  If it is, maybe it can be alleviated by introducing one
spinlock per backend in ProcArray in addition to the global lock, so the
latter can be released while each backend is checked.

But then maybe I'm just talking out of my caffeine deficit.

 Thoughts anyone?

Maybe it's possible to get useful numbers from the XidCache
XIDCACHE_DEBUG code ...
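
[To make the proposal concrete, a minimal sketch of the snapshot-side fast path
follows; the SnapshotSketch structure and its field names are invented for
illustration and do not match the actual 8.1 data structures.]

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical snapshot layout for illustration only: the subxip and
 * suboverflowed fields are assumed additions, not the 8.1 SnapshotData.
 */
typedef struct
{
	TransactionId *xip;			/* top-level in-progress xids */
	int			xcnt;
	TransactionId *subxip;		/* subxids copied from the PGPROC caches */
	int			subxcnt;
	bool		suboverflowed;	/* true if any backend's subxid cache overflowed */
} SnapshotSketch;

/*
 * Returns true if the question "is xid running according to this snapshot?"
 * can be answered without consulting pg_subtrans; the answer goes into
 * *in_progress.  Returns false if the caller must fall back to
 * SubTransGetTopmostTransaction().
 */
static bool
xid_in_snapshot_fastpath(TransactionId xid, const SnapshotSketch *snap,
						 bool *in_progress)
{
	int			i;

	for (i = 0; i < snap->xcnt; i++)
		if (snap->xip[i] == xid)
		{
			*in_progress = true;
			return true;
		}

	if (snap->suboverflowed)
		return false;			/* subxid list is incomplete, must look up parent */

	for (i = 0; i < snap->subxcnt; i++)
		if (snap->subxip[i] == xid)
		{
			*in_progress = true;
			return true;
		}

	*in_progress = false;		/* definitely not running as of snapshot time */
	return true;
}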

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Vertical Partitioning with TOAST

2005-12-01 Thread Junji TERAMOTO
Hi all,

I wrote an experimental patch for a vertical partitioning
function.

I decided to reuse the TOAST code to implement the function
easily. In short, the column that the user specifies is forcibly
pushed out to TOAST storage.

A performance gain of about 10% was seen in our environment by
pushing out c_data of the customer table in the DBT-2 benchmark.

The TOAST mechanism is over-engineered for vertical partitioning;
because its processing overhead is large, performance may actually
degrade in some environments.

There are still many things that should be considered if vertical
partitioning is implemented this way.
For instance, the TOAST index could be omitted and the ctid used
for the link.

Your comments are welcome. Thanks.

---
How To Use
---
Use ALTER TABLE command.
http://www.postgresql.org/docs/8.1/static/sql-altertable.html

 ALTER TABLE name ALTER COLUMN column SET STORAGE FORCEEXTERNAL;

I am not sure whether FORCEEXTERNAL is an appropriate
keyword. Please let me know if there is a better word...


-- 
Junji Teramoto
diff -purN postgresql-8.1.0.org/src/backend/access/heap/heapam.c postgresql-8.1.0/src/backend/access/heap/heapam.c
--- postgresql-8.1.0.org/src/backend/access/heap/heapam.c	2005-10-15 11:49:08.0 +0900
+++ postgresql-8.1.0/src/backend/access/heap/heapam.c	2005-12-01 15:31:38.307713257 +0900
@@ -1070,6 +1070,36 @@ heap_get_latest_tid(Relation relation,
 	}							/* end of loop */
 }
 
+// Add by junji from here
+/*
+ * has_rel_forceexternal - does the relation have any SET STORAGE FORCEEXTERNAL columns?
+ */
+bool
+has_rel_forceexternal(Relation relation)
+{
+	TupleDesc	tupleDesc;
+	Form_pg_attribute *att;
+	int			numAttrs;
+	int			i;
+
+	/*
+	 * Get the tuple descriptor and break down the tuple(s) into fields.
+	 */
+	tupleDesc = relation->rd_att;
+	att = tupleDesc->attrs;
+	numAttrs = tupleDesc->natts;
+
+	for (i = 0; i < numAttrs; i++)
+	{
+		if (att[i]->attstorage == 'f')
+			return true;
+	}
+
+	return false;
+}
+// Add by junji to here
+
+
 /*
  * heap_insert - insert tuple into a heap
  *
@@ -1130,6 +1160,9 @@ heap_insert(Relation relation, HeapTuple
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
 	if (HeapTupleHasExternal(tup) ||
+// Add by junji from here
+		(has_rel_forceexternal(relation)) ||
+// Add by junji to here
 		(MAXALIGN(tup->t_len) > TOAST_TUPLE_THRESHOLD))
 		heap_tuple_toast_attrs(relation, tup, NULL);
 
@@ -1762,6 +1795,9 @@ l2:
 	 */
 	need_toast = (HeapTupleHasExternal(oldtup) ||
 				  HeapTupleHasExternal(newtup) ||
+// Add by junji from here
+				  (has_rel_forceexternal(relation)) ||
+// Add by junji to here
 				  (MAXALIGN(newtup->t_len) > TOAST_TUPLE_THRESHOLD));
 
 	newtupsize = MAXALIGN(newtup->t_len);
diff -purN postgresql-8.1.0.org/src/backend/access/heap/tuptoaster.c postgresql-8.1.0/src/backend/access/heap/tuptoaster.c
--- postgresql-8.1.0.org/src/backend/access/heap/tuptoaster.c	2005-10-15 11:49:09.0 +0900
+++ postgresql-8.1.0/src/backend/access/heap/tuptoaster.c	2005-12-01 15:29:29.722579466 +0900
@@ -512,6 +512,46 @@ toast_insert_or_update(Relation rel, Hea
 		}
 	}
 
+// Add by junji from here
+	/*
+	 * We look for attributes of attstorage 'f'.
+	 */
+	if (rel->rd_rel->reltoastrelid != InvalidOid)
+	{
+		Datum		old_value;
+
+		/*--
+		 * Force out-of-line storage for every attribute whose
+		 * attstorage is 'f', regardless of its size
+		 *--
+		 */
+		for (i = 0; i < numAttrs; i++)
+		{
+			if (toast_action[i] == 'p')
+				continue;
+			if (VARATT_IS_EXTERNAL(toast_values[i]))
+				continue;
+			if (att[i]->attstorage != 'f')
+				continue;

+			/*
+			 * Store this external
+			 */
+			old_value = toast_values[i];
+			toast_action[i] = 'p';
+			toast_values[i] = toast_save_datum(rel, toast_values[i]);
+			if (toast_free[i])
+				pfree(DatumGetPointer(old_value));
+
+			toast_free[i] = true;
+			toast_sizes[i] = VARATT_SIZE(toast_values[i]);
+
+			need_change = true;
+			need_free = true;
+		}
+	}
+// Add by junji to here
+
/* --
 * 

Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Peter Eisentraut
On Thursday, 1 December 2005 11:35, Euler Taveira de Oliveira wrote:
 What about a museum.postgresql.org to keep the old releases?

That gave me a good laugh, but there is something to be said about moving all 
no longer supported releases (according to the criteria that are being 
discussed) to an unmirrored site, say, archive.postgresql.org.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Csaba Nagy
Maybe mausoleum would be an even better name :-D

Cheers,
Csaba.

On Thu, 2005-12-01 at 11:35, Euler Taveira de Oliveira wrote:
 --- Richard Huxton dev@archonet.com wrote:
 
  If it's practical to keep them, I'd like to suggest doing so. If it's
  not practical, could we have a where_to_find_old_versions.txt file
  and 
  open a project on sourceforge to keep them?
  
 What about a museum.postgresql.org to keep the old releases?
 
 
 Euler Taveira de Oliveira
 euler[at]yahoo_com_br


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Richard Huxton

Csaba Nagy wrote:

Maybe mausoleum would be even better name :-D


Come on people, it's clearly: elephants-graveyard.postgresql.org

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] A couple of proposed pgbench changes

2005-12-01 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 ... I wonder if it would help for pgbench to
 fork off multiple sub-processes and have each sub-process tend just one
 backend.

 I'm not sure a multi-process version of pgbench would show better
 performance than the current implementation, because of process
 context-switching overhead. Maybe threading is better? Mr. Yasuo Ohgaki
 implemented a pthread version of pgbench.

Oh, I wasn't aware someone had done it.  Last night I rewrote pgbench
to fork() off one subprocess for each client.  At least on the TPC-B
script, it's a bit slower, which lends weight to your worry about extra
context swap overhead.  But there is something funny going on on the
backend side too.  I don't have the numbers in front of me right now,
but very roughly this is what I was seeing:

                    pgbench alone   + idle transaction

stock pgbench       680 tps         330 tps
fork() version      640 tps          50 tps

It's hard to explain that last number as being pgbench's fault.  I think
that somehow the fork() version is stressing the backends more than
stock pgbench does.  It might have to do with the fact that the stock
version delivers commands to multiple backends in lockstep whereas the
fork() version is more random.  I've been poking at it to determine why
without a lot of success, but one interesting thing I've found out is
that with the fork() version there is a *whole* lot more contention for
the SubTransControlLock.  I hacked lwlock.c to print out per-process
counts of LWLock acquisitions and number of times blocked, and this is
what I got (with the prototype patch I posted the other day to take
shared lock for SubTransGetParent):

stock pgbench, no idle xact:

PID 31529 lwlock 14: shacq 438 exacq 1 blk 0
PID 31530 lwlock 14: shacq 401 exacq 0 blk 0
PID 31531 lwlock 14: shacq 378 exacq 0 blk 0
PID 31532 lwlock 14: shacq 381 exacq 1 blk 0
PID 31533 lwlock 14: shacq 377 exacq 2 blk 0
PID 31534 lwlock 14: shacq 354 exacq 0 blk 0
PID 31535 lwlock 14: shacq 373 exacq 0 blk 0
PID 31536 lwlock 14: shacq 373 exacq 0 blk 0
PID 31537 lwlock 14: shacq 370 exacq 1 blk 0
PID 31538 lwlock 14: shacq 377 exacq 0 blk 0

fork(), no idle xact:

PID 414 lwlock 14: shacq 82401 exacq 0 blk 0
PID 415 lwlock 14: shacq 82500 exacq 3 blk 0
PID 417 lwlock 14: shacq 77727 exacq 2 blk 0
PID 419 lwlock 14: shacq 83272 exacq 2 blk 0
PID 421 lwlock 14: shacq 78579 exacq 2 blk 0
PID 424 lwlock 14: shacq 82704 exacq 0 blk 0
PID 426 lwlock 14: shacq 82252 exacq 2 blk 0
PID 429 lwlock 14: shacq 86002 exacq 0 blk 0
PID 431 lwlock 14: shacq 86617 exacq 2 blk 0
PID 432 lwlock 14: shacq 78842 exacq 1 blk 0

stock pgbench + idle xact:

PID 17868 lwlock 14: shacq 3342147 exacq 3250 blk 67
PID 17869 lwlock 14: shacq 3318728 exacq 3477 blk 74
PID 17870 lwlock 14: shacq 3324261 exacq 3858 blk 102
PID 17871 lwlock 14: shacq 3388431 exacq 3436 blk 120
PID 17872 lwlock 14: shacq 3409427 exacq 4232 blk 108
PID 17873 lwlock 14: shacq 3416117 exacq 5763 blk 130
PID 17874 lwlock 14: shacq 3396471 exacq 4860 blk 70
PID 17875 lwlock 14: shacq 3369113 exacq 4828 blk 161
PID 17876 lwlock 14: shacq 3428814 exacq 5286 blk 193
PID 17877 lwlock 14: shacq 3476198 exacq 5073 blk 147

fork() + idle xact:

PID 519 lwlock 14: shacq 83979662 exacq 2 blk 7
PID 526 lwlock 14: shacq 94968544 exacq 1 blk 1
PID 529 lwlock 14: shacq 91672324 exacq 0 blk 2
PID 530 lwlock 14: shacq 92307866 exacq 3 blk 16
PID 531 lwlock 14: shacq 93694118 exacq 0 blk 2
PID 532 lwlock 14: shacq 90776114 exacq 1 blk 2
PID 533 lwlock 14: shacq 89445464 exacq 1 blk 2
PID 534 lwlock 14: shacq 94407745 exacq 2 blk 2
PID 535 lwlock 14: shacq 88223627 exacq 2 blk 2
PID 536 lwlock 14: shacq 87223449 exacq 3 blk 6

I don't know yet why pgbench would be able to affect this, but it's
repeatable.  If anyone's interested in trying to duplicate it, I'll
post my version of pgbench.
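
[In outline, a fork()-per-client driver looks something like the sketch below;
this is not the actual rewritten pgbench, and the connection string, table name,
and transaction script are placeholders.]

/*
 * Minimal sketch of a fork()-per-client pgbench-style driver, for
 * illustration only.  Error handling is mostly omitted.
 */
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void
run_client(const char *conninfo, int ntrans)
{
	PGconn	   *conn = PQconnectdb(conninfo);
	int			i;

	if (PQstatus(conn) != CONNECTION_OK)
		exit(1);

	for (i = 0; i < ntrans; i++)
	{
		/* one stand-in transaction; the real script has several statements */
		PGresult   *res = PQexec(conn,
								 "BEGIN;"
								 "UPDATE accounts SET abalance = abalance + 1 WHERE aid = 1;"
								 "END;");

		PQclear(res);
	}
	PQfinish(conn);
	exit(0);
}

int
main(void)
{
	const char *conninfo = "dbname=pgbench";
	int			nclients = 10;
	int			ntrans = 1000;
	int			c;

	for (c = 0; c < nclients; c++)
	{
		pid_t		pid = fork();

		if (pid == 0)
			run_client(conninfo, ntrans);	/* child: one connection, one backend */
		else if (pid < 0)
			return 1;
	}

	/* parent: wait for every client before reporting */
	while (wait(NULL) > 0)
		;
	printf("all clients done\n");
	return 0;
}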

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Another way to reduce pg_subtrans lookup overhead

2005-12-01 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I thought of a different approach to it, which is to make snapshot
 checking take a hint from TransactionIdIsInProgress: use the subxid
 caches from the PG_PROC array.

 Yeah, I had thought about using the XidCache while checking your other
 approach yesterday.  I was thinking, however, not of saving the Xids in
 the snapshot but of using the ones already in PGPROC.  I'm not sure that is
 as useful, but it looks like it would at least be able to reduce the
 probability of examining pg_subtrans in some cases without introducing
 extra cost.

How would that work?  The ones in PGPROC, by the time you are examining
the snapshot, might have little to do with the ones that were valid at
the time the snap was taken.  Another problem is that (AFAICS) you'd
have to lock the ProcArray to look at them, and the whole point of this
exercise is to avoid needing to lock shared data structures during
HeapTupleSatisfiesSnapshot.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Jonah H. Harris
Hey Neil,

In the last couple weeks I too have been thinking about planner
hints. Assuming I have read your post correctly, the issue I see
with this idea is that, in most cases, there won't be much of a
difference between adding an arbitrary cost value to each type of node
and disabling it completely. Also, by fiddling with an arbitrary
cost the user may introduce a lot of variation into the planner which
may actually result in worse query plans.

While Tom's done a great job with the planner, there are certain cases
where a user knows exactly what type of join or index they want to use
for a query. In that case I'd favor run-time hints from the user
similar to Oracle. I've read about seven papers on query
optimization and planning in the last few weeks and have a lot of
ideas... I'm just not sure when I may get time to work on them :(

-Jonah



Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 ... ISTM that a simple improvement to what we have now
 would allow for a wider range of planner hints with only minor changes:
 we could replace the enable_XXX variables with a set of variables that
 would add an arbitrary constant to the estimated cost of each type of
 query node. (Alternatively, an arbitrary multiplier could be specified;
 I'm not sure which would be better.)

I think the multiplier would be better, because it'd avoid the problem
you mention later that useful values would be dependent on the planner's
cost units.  Also, one could sanely allow a multiplier less than one,
so as to favor instead of penalize a particular plan type.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Marc G. Fournier

On Thu, 1 Dec 2005, Peter Eisentraut wrote:

 On Thursday, 1 December 2005 11:35, Euler Taveira de Oliveira wrote:
  What about a museum.postgresql.org to keep the old releases?

 That gave me a good laugh, but there is something to be said about moving all
 no longer supported releases (according to the criteria that are being
 discussed) to an unmirrored site, say, archive.postgresql.org.

That would be fairly trivial ... let me add it to the 'todo list' ... I 
take it that it would be safe to relegate the /pub/source/OLD stuff there 
too?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Dave Page
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc G. Fournier
 Sent: 01 December 2005 17:01
 To: Peter Eisentraut
 Cc: pgsql-hackers@postgresql.org; Euler Taveira de Oliveira; 
 Richard Huxton; Robert Treat; Magnus Hagander; Marc G. 
 Fournier; [EMAIL PROTECTED]; Tom Lane; Andrew Dunstan
 Subject: Re: [pgsql-www] [HACKERS] Upcoming PG re-releases
 
 On Thu, 1 Dec 2005, Peter Eisentraut wrote:
 
  On Thursday, 1 December 2005 11:35, Euler Taveira de Oliveira wrote:
   What about a museum.postgresql.org to keep the old releases?
 
  That gave me a good laugh, but there is something to be said about moving
  all no longer supported releases (according to the criteria that are being
  discussed) to an unmirrored site, say, archive.postgresql.org.
 
 That would be fairly trivial ... let me add it to the 'todo list' ... I
 take it that it would be safe to relegate the /pub/source/OLD stuff there
 too?

Not so trivial to put behind a web interface or the download tracker
though. Is it really necessary to have a separate archive downloads
site? It's not like the old ones get in the way, or cost anything other
than disk space on the mirrors to store (and I've only ever heard mirror
admins say how small our site is compared to many others!).

Plus of course, weren't we trying to reduce the number of VMs/sites?

Regards, Dave.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Shared locking in slru.c

2005-12-01 Thread Kenneth Marshall
On Wed, Nov 30, 2005 at 03:23:55PM -0500, Tom Lane wrote:
 Kenneth Marshall [EMAIL PROTECTED] writes:
  ... In pseudo-code, the operations to
  read the control information are:
 
  WriteControl:
  1. Set latch.
  2. Update control information
  3. Increment latch version number.
  4. Unset latch.
 
  ReadControl:
  1. Read latch version number.
  2. Read control information.
  3. Check latch. If locked go to step 1.
  4. Read latch version number. If the value is different from the
 value read in step 1, go to step 1.
 
 Hm, I don't see how that's safe in the presence of concurrent would-be
 writers?  (Or is that what you meant by queuing up lock requests?)
 
   regards, tom lane

The latch is definitely safe for readers and writers concurrently
accessing the information. It does not provide the ordered waiting
for a lock that the LWLock does. It is also so lightweight that,
for the types of reads and updates made to these shared areas, it may
outperform the existing LWLock even under contention.

Ken

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-12-01 Thread Kevin Grittner
Due to its size, in the Windows environment we can't dump this
database in any format except plain text, so the zlib issues don't
apply here.

-Kevin


 Qingqing Zhou [EMAIL PROTECTED]  

 By the way, they found that they were getting this on a pg_dump,
 too.  We will test both failure cases.  If the test goes OK, we would
 be happy to leave it in production with this patch.


I can believe that pg_dump faces a similar situation, i.e., running out
of kernel buffers. But it seems pg_dump supports the -Z 0..9 option, which
uses some external I/O functions from zlib. That part may not be easy to
retry.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-12-01 Thread Kevin Grittner
[Apologies for the delayed response; fighting through a backlog.]

I checked with our DBAs, and they are willing to test it.

By the way, they found that they were getting this on a pg_dump,
too.  We will test both failure cases.  If the test goes OK, we would
be happy to leave it in production with this patch.

-Kevin


 Qingqing Zhou [EMAIL PROTECTED]  

Tom Lane [EMAIL PROTECTED] wrote

 Would a simple retry loop actually help?  It's not clear to me how
 persistent such a failure would be.


[with reply to all followup threads] Yeah, this is the key, and we definitely
have no 100% guarantee that several retries will solve the problem - just as
in the pg_unlink/pg_rename situation. But shall we do something now? If
Kevin could help with testing (you may have to revert the registry changes
:-( ), I would like to send a patch in the retry style.

Regards,
Qingqing 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Shared locking in slru.c

2005-12-01 Thread Kenneth Marshall
On Wed, Nov 30, 2005 at 01:53:13PM -0500, Tom Lane wrote:
 I've been looking at various ways to resolve this, but one thing that
 seems promising is to hack slru.c to take the control lock in shared
 mode, not exclusive mode, for read-only accesses to pages that are
 already in memory.  The vast majority of accesses to pg_subtrans in the
 problem scenario are read-only, and so this reduces the need to block
 and the consequent CS storm.
 
 A quick-hack prototype patch for this is attached.  (It changes only
 SubTransGetParent, but if applied of course TransactionIdGetStatus and
 so on would get the same treatment.)  The idea is that we take the lock
 in shared mode, look to see if the required page is in memory, and if so
 just return it; if not, release the lock, reacquire in exclusive mode,
 proceed with the existing code.
 
 The reason that the existing code always takes the lock in exclusive
 mode, even if there's no intention to change data, is that it also needs
 to adjust the LRU information to reflect the page access, and the way
 that we're doing that requires exclusive access.  So to make the idea
 work at all, we need some alternative way of managing recency-of-use
 info.
 
 The way the attached patch attacks this is for the shared-lock access
 case to simply set the page's LRU counter to zero, without bumping up
 the LRU counters of the other pages as the normal adjustment would do.
 This is safe to execute in a shared environment since even if someone
 else is concurrently touching the same page, they'll also be trying
 to set its counter to zero, and so there's no possibility of ending
 up in a bad state.  However, this leaves us with the likelihood that
 multiple pages will have equal LRU counters when it comes time to
 throw out a page to make room for another.  The patch deals with that
 by selecting the furthest-back page of those with the highest LRU
 setting.  I'm not totally happy with this heuristic, though, and was
 wondering if anyone had a better idea.  Anyone seen a lock-free
 data structure for LRU or approximately-LRU state tracking?
 
   regards, tom lane

Tom,

In this situation, if this code does not need the ability to queue
access to the control lock, it may be reasonable to use a latch process
to update the control information. In pseudo-code, the operations to
read the control information are:

WriteControl:
1. Set latch.
2. Update control information
3. Increment latch version number.
4. Unset latch.

ReadControl:
1. Read latch version number.
2. Read control information.
3. Check latch. If locked go to step 1.
4. Read latch version number. If the value is different from the
   value read in step 1, go to step 1.

In this way, a read operation will only need two reads of the version
number and one of the shared data. A write operation will be very
lightweight. Setting the latch can be a single TAS/CAS operation and
unsetting the latch and incrementing the version number can be done
in a single write. Of course, if you need to be able to queue up lock
requests this will not work.
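
[In C, this protocol is essentially a seqlock; a compact sketch follows, folding
the latch bit into the low bit of the version counter. The names and the GCC
__atomic builtins are illustrative choices, not a proposed PostgreSQL API, and
memory-ordering details are deliberately glossed over.]

#include <stdint.h>
#include <string.h>

typedef struct
{
	volatile uint32_t version;	/* odd = latched (writer active), even = stable */
	char		data[64];		/* the shared control information */
} SeqLatch;

/*
 * Writer: bump to odd, update, bump back to even.  Assumes writers are
 * already serialized among themselves (e.g. by an existing exclusive lock).
 * Real code would need carefully placed memory barriers.
 */
static void
latch_write(SeqLatch *l, const char *src, size_t len)
{
	__atomic_fetch_add(&l->version, 1, __ATOMIC_SEQ_CST);	/* set latch */
	memcpy((char *) l->data, src, len);						/* update control info */
	__atomic_fetch_add(&l->version, 1, __ATOMIC_SEQ_CST);	/* unlatch, new version */
}

/* Reader: retry until the same even version is seen before and after the copy. */
static void
latch_read(SeqLatch *l, char *dst, size_t len)
{
	uint32_t	v1,
				v2;

	do
	{
		v1 = __atomic_load_n(&l->version, __ATOMIC_SEQ_CST);
		memcpy(dst, (const char *) l->data, len);
		v2 = __atomic_load_n(&l->version, __ATOMIC_SEQ_CST);
	} while ((v1 & 1) != 0 || v1 != v2);	/* latched, or data changed underneath us */
}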

Ken

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Andrew Dunstan



Dave Page wrote:

 That would be fairly trivial ... let me add it to the 'todo list' ... I
 take it that it would be safe to relegate the /pub/source/OLD stuff there
 too?

Not so trivial to put behind a web interface or the download tracker
though. Is it really necessary to have a separate archive downloads
site? It's not like the old ones get in the way, or cost anything other
than disk space on the mirrors to store (and I've only ever heard mirror
admins say how small our site is compared to many others!).

Plus of course, weren't we trying to reduce the number of VMs/sites?

Agreed. I see no virtue in this at all. If we continue to make stuff 
available it must be because someone will need it. I can see that 
happening if some catastrophe happens on an old system, in which case 
the person hunting is likely to need to find it easily and possibly fast.


The network traffic involved in mirroring something that doesn't change 
is usually trivial, and disk space seems to be at most a few $ per Gb 
these days, so surely this is not a resource issue.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Qingqing Zhou

Neil Conway [EMAIL PROTECTED] wrote

 This would also be useful when diagnosing bad query plans: for example,
 setting enable_seqscan=false often causes the planner to disregard the
 use of *any* sequential scan, anywhere in the plan. The ability to
 slightly bump up the cost of particular operations would allow more
 alternative plans to be examined.


This method still has the same problem as enable_seqscan=false in some
situations. I would vote for implementing the final, general solution, such
as query plan hints, directly.

Regards,
Qingqing 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] A couple of proposed pgbench changes

2005-12-01 Thread Qingqing Zhou

Tom Lane [EMAIL PROTECTED] wrote

 I don't know yet why pgbench would be able to affect this, but it's
 repeatable.  If anyone's interested in trying to duplicate it, I'll
 post my version of pgbench.


I'd like to try to duplicate it here,

Regards,
Qingqing 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Strange interval arithmetic

2005-12-01 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 Michael Fuhr [EMAIL PROTECTED] writes:
  I usually check both in my own code but I noticed several places
  where PostgreSQL doesn't, so I kept that style.  I'll check both
  if that's preferred.
 
 I'd say not --- it's more code and it makes a possibly unwarranted
 assumption about strtol's behavior.
 

Generally speaking looking at errno when you haven't received an error return
from a libc function is asking for trouble. It could be leftover from any
previous libc error. 

That's how you get programs saying things like strtol: No such file or
directory ...


The strtol() function returns the result of the conversion, unless the value
would underflow or overflow. If an underflow occurs, strtol() returns
LONG_MIN. If an overflow occurs, strtol() returns LONG_MAX. In both cases,
errno is set to ERANGE. Precisely the same holds for strtoll() (with
LLONG_MIN and LLONG_MAX instead of LONG_MIN and LONG_MAX).


-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Strange interval arithmetic

2005-12-01 Thread Greg Stark

 Generally speaking looking at errno when you haven't received an error return
 from a libc function is asking for trouble. It could be leftover from any
 previous libc error. 

Oh, sorry, just reread the code snippet and see that isn't an issue. Never mind.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 In the last couple weeks I too have been thinking about planner hints.
 Assuming I have read your post correctly, the issue I see with this idea is
 that, in most cases, there won't be much of a difference between adding an
 arbitrary cost value to each type of node and disabling it completely.
 Also, by fiddling with an arbitrary cost the user may introduce a lot of
 variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with planner hints, too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen.  Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers.  While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Magnus Hagander
  That would be fairly trivial ... let me add it to the 'todo list' ... I
  take it that it would be safe to relegate the /pub/source/OLD stuff
  there too?
 
 Not so trivial to put behind a web interface or the download tracker
 though. Is it really necessary to have a separate archive downloads
 site? It's not like the old ones get in the way, or cost anything other
 than disk space on the mirrors to store (and I've only ever heard mirror
 admins say how small our site is compared to many others!).
 
 Plus of course, weren't we trying to reduce the number of VMs/sites?

Agreed. If we're going to keep it, just sticking it in a /old/ directory
is definitely a lot better.

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Jonah H. Harris
Tom,

Don't get me wrong, I agree with you completely. I would rather
put effort into enhancing the planner than in developing
work-arounds. In 99% of all cases the planner works correctly,
but I know people who actually have to disable planning options
(mergejoin) in production applications because they get bad
plans. The bad plans are not really bad in terms of what the
planner knows about the query, just in areas where the planner doesn't
take other things into account.

I also agree that a significant amount of work would be required to add
run-time hints which would be better spent enhancing the system as a
whole. My only suggestion was that it would be better than the first part
of Neil's proposal. Somehow I missed the mention of multipliers at the
end, which I agree requires less effort.



Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-12-01 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 I come up with a patch to fix server-side problem.

Applied.

 For windows, I set a one second waiting time -

The code actually does one millisecond; I left it that way since it
seems a reasonable value.
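
[For readers following along, the general shape of such a retry is sketched
below; this is an illustration of the approach rather than the applied patch,
and the retry cap is an arbitrary choice (the 1 ms pg_usleep() interval mirrors
the value mentioned above).]

#include <sys/types.h>
#include <unistd.h>

extern void pg_usleep(long microsec);	/* PostgreSQL's portable sleep */

static ssize_t
read_with_retry(int fd, void *buf, size_t len)
{
	int			tries;

	for (tries = 0; tries < 10; tries++)	/* arbitrary retry cap */
	{
		ssize_t		nread = read(fd, buf, len);

		if (nread >= 0)
			return nread;		/* success (or genuine end-of-file) */

		/*
		 * A transient failure (e.g. the kernel briefly running out of buffer
		 * space) may clear up if we back off for a moment before retrying.
		 */
		pg_usleep(1000L);		/* 1 millisecond */
	}
	return -1;					/* give up; caller reports errno as usual */
}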

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Strange interval arithmetic

2005-12-01 Thread Greg Stark
Greg Stark [EMAIL PROTECTED] writes:

 Generally speaking looking at errno when you haven't received an error return
 from a libc function is asking for trouble. It could be leftover from any
 previous libc error. 
 
 That's how you get programs saying things like strtol: No such file or
 directory ...

Ah, I take back my taking back of this. It's still dicey to be doing it this
way -- even if you reset errno before calling the library function.

The problem is that the function can call other libc functions, which may in
turn return errors. But these errors might be incidental and handled by the
function you're calling.

The typical case of this is calling printf which first calls isatty(). printf
then returns success but leaves errno set to ENOTTY. And programs that check
errno without checking the return value -- even if they clear it before
calling printf -- mysteriously print Not a Typewriter after correctly
printing the data.

The SUS standard protects against this though by only allowing this to happen
for functions that don't themselves use errno to signal errors:

  The value of errno may be set to nonzero by a library function call
  whether or not there is an error, provided the use of errno is not
  documented in the description of the function in this International
  Standard.

Older platforms may still have this behaviour, but strtol seems like a pretty
innocuous case. It's hard to imagine strtol needing to call much else. And
strtol was an ANSI addition so one imagines most platforms got it right from
the beginning.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Strange interval arithmetic

2005-12-01 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Ah, I take back my taking back of this. It's still dicey to be doing it this
 way -- even if you reset errno before calling the library function.

See the rest of the thread.  The I-believe-what-I-read-in-the-spec camp
may take comfort from what it says in the SUS strtol spec,
http://www.opengroup.org/onlinepubs/007908799/xsh/strtol.html

The strtol() function will not change the setting of errno if
successful.

Those who are disinclined to assume that implementations always follow
the spec may take comfort from the fact that we've been testing errno
only (without also inspecting the return value) in pg_atoi for years
without problems.

From neither point of view do I see an argument for adding a test on the
return value.  It's unnecessary according to the spec, and if you
disbelieve that strtol follows the spec, you should also be suspicious
about whether it's guaranteed to return LONG_MAX/LONG_MIN upon overflow
failure.  We have no track record that gives evidence that the latter is
universally true.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Strange interval arithmetic

2005-12-01 Thread Michael Fuhr
On Thu, Dec 01, 2005 at 03:31:41PM -0500, Greg Stark wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Generally speaking looking at errno when you haven't received an error 
  return
  from a libc function is asking for trouble. It could be leftover from any
  previous libc error. 
  
  That's how you get programs saying things like strtol: No such file or
  directory ...
 
 Ah, I take back my taking back of this. It's still dicey to be doing it this
 way -- even if you reset errno before calling the library function.
 
 The problem is that the function can call other libc functions, which may in
 turn return errors. But these errors might be incidental and handled by the
 function you're calling.

I had that concern, as I've seen such incidental errno changes
before.  But Tom pointed out the following from SUS:

  The strtol() function shall not change the setting of errno if
  successful.

Immediately after that the standard says:

  Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX}
  are returned on error and are also valid returns on success, an
  application wishing to check for error situations should set errno
  to 0, then call strtol() or strtoll(), then check errno.

I don't know if any systems are non-compliant in this respect, but
Tom said that we've been doing it that way (errno test only) for
many years without complaints.
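
[For reference, the errno-only idiom under discussion looks like this in
isolation; a generic illustration, not a quote of PostgreSQL's pg_atoi(), and
the test string is arbitrary.]

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	const char *str = "99999999999999999999999999";	/* overflows any long */
	char	   *end;
	long		val;

	errno = 0;						/* clear any leftover value first */
	val = strtol(str, &end, 10);

	if (errno == ERANGE)			/* per SUS, set on overflow or underflow */
		printf("out of range (clamped to %ld)\n", val);
	else if (end == str)
		printf("no digits at all\n");
	else
		printf("parsed %ld\n", val);

	return 0;
}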

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Greg Stark
Jonah H. Harris [EMAIL PROTECTED] writes:

 Tom,
 
 Don't get me wrong, I agree with you completely.  I would rather put effort
 into enhancing the planner than in developing work-arounds.  In 99% of all
 cases the planner works correctly, but I know people who actually have to
 disable planning options (mergejoin) in production applications because they
 get bad plans.  The bad plans are not really bad in terms of what the
 planner knows about the query, just in areas where the planner doesn't look
 at other things.

I would like to draw a distinction between two sorts of hints. Currently
you're talking about one sort of hint, namely hints that tell the planner to
alter its cost model and choose a different plan than its inputs would
dictate. Using these requires the user to have a fairly in-depth understanding
of the planner and what options it has available.

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

So for example I would love to see a hint that allowed you to specify the
selectivity of a where clause. And one that let you specify the density of a
grouping clause.

Most of the time the planner makes a mistake it's because of a bad
miscalculation in estimating these givens. If it had the correct values for
the inputs then it would make the right decision about the plan.

Making the planner very good at making the right decisions given accurate
inputs is an attainable goal. Computers are pretty deterministic and it's
possible to come up with very accurate cost models. Despite some known
problems with Postgres's current models they're remarkably good already. And
there's no particular reason to think they can't be made nearly perfect.

Making the planner very good at producing accurate estimates is a much harder
goal. No matter how accurate it gets there will always be more complex
expressions that are harder to predict and there will always be cases the
planner can't estimate well. The user however knows his own data and may well
know the answer.

In the extreme consider user-defined operators, which will always be dependent
on the user to provide estimator functions. If it's a rarely used operator the
user may find it easier to simply tell the planner the selectivity of each
expression rather than come up with a general solution.

(I also think things like joins and GROUP BY clauses will never be guaranteed
accurate estimates in general, not without a lot more cost up front, including
giving up on calculating statistics based on only a sample.)

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Bruce Momjian
Tom Lane wrote:
 [EMAIL PROTECTED] (Bruce Momjian) writes:
  Log Message:
  ---
  Add comments about why errno is set to zero.
 
 These comments seem a bit wrongheaded, since checking
 LONG_MIN/LONG_MAX is exactly not what we could do to detect an overflow
 error.

Yea, I noticed the 0 was listed as another value that needs to be
checked.  Should I just change them all to:

errno = 0;  /* avoid checking result for failure */

or should I add a macro to c.h as:

/* Sometimes we need to clear errno so we can check errno
 * without having to check for a failure value from the function
 * call.
 */ 
#define CLEAR_ERRNO \
do { \
    errno = 0; \
} while (0)


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 On the other hand the type I would prefer to see are hints that feed directly
 into filling in information the planner lacks. This only requires that the
 user understand his own data and still lets the planner pick the best plan
 based on the provided information.

This would avoid some issues, but it still is vulnerable to the problem
that the hint you put in your code today will fail to track changes in
your data tomorrow.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Should I just change them all to:

   errno = 0;  /* avoid checking result for failure */

No, that's still a completely inaccurate description of the reason
for having the statement.

 or should I add a macro to c.h as:

   /* Sometimes we need to clear errno so we can check errno
* without having to check for a failure value from the function
* call.
*/ 
   #define CLEAR_ERRNO \
   do { \
       errno = 0; \
   } while (0)

I vote neither.  Anyone who doesn't understand what this is for will
need to go read the C library man pages for a bit anyway.  Nor do I find
CLEAR_ERRNO an improvement over errno = 0.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Should I just change them all to:
 
  errno = 0;  /* avoid checking result for failure */
 
 No, that's still a completely inaccurate description of the reason
 for having the statement.
 
  or should I add a macro to c.h as:
 
  /* Sometimes we need to clear errno so we can check errno
   * without having to check for a failure value from the function
   * call.
   */ 
  #define CLEAR_ERRNO \
  do { \
      errno = 0; \
  } while (0)
 
 I vote neither.  Anyone who doesn't understand what this is for will
 need to go read the C library man pages for a bit anyway.  Nor do I find
 CLEAR_ERRNO an improvement over errno = 0.

Well, there seems to be enough confusion, even in this email list, that
identifying _why_ errno is being cleared is a good idea.

I modified it to:

errno = 0;  /* avoid having to check the result for failure */

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Pollard, Mike
Greg Stark [EMAIL PROTECTED] writes:
 On the other hand the type I would prefer to see are hints that feed directly
 into filling in information the planner lacks. This only requires that the
 user understand his own data and still lets the planner pick the best plan
 based on the provided information.

Optimizer hints were added because some databases just don't have a very
smart optimizer.  But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one.  That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem.  (the
other thing forgotten was that it was supposed to be a natural language.
NVL.  Bah.)

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Alvaro Herrera
Bruce Momjian wrote:
 Tom Lane wrote:

   or should I add a macro to c.h as:
  
 /* Sometimes we need to clear errno so we can check errno
  * without having to check for a failure value from the function
  * call.
  */ 
 #define CLEAR_ERRNO \
 do { \
     errno = 0; \
 } while (0)

May I vote against this kind of use of macros in general?  It doesn't
add much value (actually, none in this case) and it makes the code
harder to read.  For a pathological example I can point to PHP, which is
so full of strange macros that it's very very hard to read.

Of course there are places where macros are valuable tools, but this
doesn't seem to be one of them.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Martijn van Oosterhout
On Thu, Dec 01, 2005 at 04:12:30PM -0500, Bruce Momjian wrote:
 Well, there seems to be enough confusion, even in this email list, that
 identifying _why_ errno is being cleared is a good idea.
 
 I modified it to:
 
 errno = 0;  /* avoid having to check the result for failure */

I don't know about others but I find that wording ambiguous. Like it's
saying that once you've done that it can't fail. I think I'd prefer
something like:

errno = 0;   /* Make error condition detectable */

or even

errno = 0;   /* clear pending errors */

or

errno = 0;   /* clear prior detected errors */

YMMV,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] SHOW ALL output too wide

2005-12-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Dennis Bjorklund wrote:
  My motivation is only for the good of postgresql. If the majority want
  something else then that's what should happen (of course). I'm just 
  stating my view, nothing less and nothing more.
 
  I am surprised we have not gotten more comments about it.  I personally
  like the VERBOSE idea myself, but others didn't, so what we have now was
  the consensus.
 
 Personally I'd vote for losing the description, ie, revert SHOW ALL to
 the way it was in 8.0 (and never mind the VERBOSE option, either).
 I don't find the descriptions to be a helpful addition.

The big question is whether general users will find the descriptions
helpful.  I thought the agreement was that they were, similar to how we have
psql \h help.  It doesn't replace the documentation, but it does assist.

In fact, if we don't show the descriptions, why are we maintaining them? 
Just for SELECT * FROM pg_settings?

I do think we have divergent opinions on this and that is why our
default now is so inconsistent.  Can we get more comments from users?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set

2005-12-01 Thread Bruce Momjian
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
 On Thu, Dec 01, 2005 at 04:12:30PM -0500, Bruce Momjian wrote:
  Well, there seems to be enough confusion, even in this email list, that
  identifying _why_ errno is being cleared is a good idea.
  
  I modified it to:
  
  errno = 0;  /* avoid having to check the result for failure */
 
 I don't know about others but I find that wording ambiguous. Like it's
 saying that once you've done that it can't fail. I think I'd prefer
 something like:
 
 errno = 0;   /* Make error condition detectable */
 
 or even
 
 errno = 0;   /* clear pending errors */
 
 or
 
 errno = 0;   /* clear prior detected errors */

Maybe it should be:

errno = 0;  /* Allow unconditional errno check */

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is

2005-12-01 Thread Neil Conway
On Thu, 2005-12-01 at 16:38 -0500, Bruce Momjian wrote:
 Maybe it should be:
 
   errno = 0;  /* Allow unconditional errno check */

I think any solution that involves adding more duplication at each
strtol() callsite is not great (Don't Repeat Yourself). I'd still like
to see this refactored into a separate function, as I suggested on
-patches. If people would like to see a detailed explanation of the
interaction between strtol() and errno, a header comment to pg_strtol()
seems a good place to put it. IMO that is better than copying and
pasting a cryptic one-line comment to each and every callsite of
strtol().
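
(For concreteness, a rough sketch of what such a helper could look like --
the name pg_strtol and the bool-returning interface here are just
placeholders, not the actual proposal from -patches:)

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a base-10 long, centralizing the errno handling in one place.
 * Returns true on success and stores the value in *result; returns
 * false on overflow/underflow or if the string is not a valid integer.
 */
bool
pg_strtol(const char *str, long *result)
{
    char   *endptr;

    errno = 0;                  /* so a leftover ERANGE can't fool the check below */
    *result = strtol(str, &endptr, 10);

    if (errno == ERANGE)
        return false;           /* value out of range for long */
    if (endptr == str || *endptr != '\0')
        return false;           /* no digits at all, or trailing junk */

    return true;
}

A call site then collapses to a single if (!pg_strtol(str, &val)) test plus
whatever error report is appropriate, and the errno subtleties live in one
header comment instead of being repeated everywhere.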

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Improving count(*)

2005-12-01 Thread Bruce Momjian

Add idea to TODO:

* Allow data to be pulled directly from indexes

  Currently indexes do not have enough tuple visibility information
  to allow data to be pulled from the index without also accessing
  the heap.  One way to allow this is to set a bit on index tuples
  to indicate if a tuple is currently visible to all transactions
  when the first valid heap lookup happens.  This bit would have to
  be cleared when a heap tuple is expired.  Another idea is to maintain
  a bitmap of heap pages where all rows are visible to all backends,
  and allow index lookups to reference that bitmap to avoid heap
  lookups, perhaps the same bitmap we might add someday to determine
  which heap pages need vacuuming.

I am thinking we are at the point where between this usage and vacuum
that a bitmap for each table is something we should work on for 8.2.
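
(As a back-of-the-envelope illustration of the addressing such a centralized
one-bit-per-heap-page bitmap would need -- assuming stock 8k pages, and not
meant as a proposal for the actual on-disk format: one 8k bitmap page holds
65,536 bits, and 65,536 heap pages * 8k each is 512MB, which is where the
"500MB" figure in the quoted text below comes from.)

#include <stdint.h>

#define BLCKSZ                  8192            /* page size in bytes */
#define HEAP_PAGES_PER_BMPAGE   (BLCKSZ * 8)    /* 65,536 bits per bitmap page */

/*
 * Locate the bit describing heap page "heap_blkno" in a bitmap that
 * stores one bit per heap page.
 */
void
bitmap_locate(uint32_t heap_blkno,
              uint32_t *bmpage, uint32_t *byteno, uint32_t *bitno)
{
    *bmpage = heap_blkno / HEAP_PAGES_PER_BMPAGE;        /* which bitmap page */
    *byteno = (heap_blkno % HEAP_PAGES_PER_BMPAGE) / 8;  /* byte within that page */
    *bitno  = heap_blkno % 8;                            /* bit within that byte */
}

For example, heap page 100,000 maps to bitmap page 1, byte 4308, bit 0.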

---

Bruce Momjian wrote:
 [EMAIL PROTECTED] wrote:
  On Fri, Nov 18, 2005 at 03:46:42PM +, Richard Huxton wrote:
   Simon Riggs wrote:
   One of the major complaints is always "Select count(*) is slow".
   Although there seem to have been plenty of ideas on this they all seem 
   to just provide a solution for the whole table case. It might be that 
   the solution provides other benefits, but for this one case it does seem 
   like a lot of work.
  
  Or, it wasn't explained properly as to how the WHERE clause would
  function.
  
  The solution involving an index that has visibility information should
  work fine with a WHERE clause. Only index rows that match the clause
  are counted.
  
  A solution enhancing the above mentioned indexes, to maintain a count
  for whole index blocks, would allow whole index blocks that satisfy
  the WHERE clause to be counted, assuming the whole index block is
  visible in the current transaction.
 
 I think it would be very difficult to generate a per-index-page
 visibility bit because I can't think of a normal database operation that
 would allow us to update it.  It requires that an index scan visit every
 heap page to check the visibility of the tuples.  However, we almost
 never do a full-index scan because it is so much slower than a heap
 scan.  It would be easy to keep a heap-visible bit up-to-date (because
 we do full-heap scans occasionally), but that would require the index
 to load the heap page to find the bit.  (The bit isn't on the index, it
 is on the heap).
 
 Jan has been talking about have a bitmap to track pages that need
 vacuuming, and I am wondering if the same system could be used to track
 the heap-dirty bits.  Putting one bit on every 8k disk page means we have
 to load the 8k page to read the bit, while a centralized bitmap would
 load 64k page bits in a single 8k page.  That one page would cover 500MB
 of a table.  Seems vacuum could use the same bitmap values.
 
 Assuming we track 100 tables regularly, that would require 800k of shared
 memory.  We would have to pagein/out the bitmaps as needed, but we could
 throw them away on a crash and rebuild as part of normal operation.
 
 FSM has not been a concurrency bottleneck, so I am thinking this would
 not be either.
 
 I suppose it would require a new filesystem file for each table.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Fork-based version of pgbench

2005-12-01 Thread Tom Lane
Now that I've fixed the silly mistake in the fork-based version of
pgbench,
http://archives.postgresql.org/pgsql-patches/2005-12/msg00017.php
I'm seeing it consistently outperform the CVS-tip version by about 5%.
I get about 700 tps versus 670 tps; meanwhile top reports that idle
CPU percentage drops from around 15% to around 5%.  So I'm thinking
it may be worthwhile to fix the portability issues (ie, provide a
thread-based variant for Windows) and make it the standard version.
It's not so much that I want to inflate the measurements, as that
leaving 10% of the CPU on the table reduces pgbench's usefulness as
a way of stress-testing the backend.
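
(For anyone who hasn't looked at the patch: "fork-based" here just means one
forked child per simulated client, roughly along the lines of the sketch
below.  This is illustrative only, not the actual patch; run_client() is a
placeholder for the per-connection transaction loop.)

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void
run_client(int client_id, int ntxns)
{
    /* placeholder: open a connection and run ntxns transactions here */
    printf("client %d: would run %d transactions\n", client_id, ntxns);
}

int
main(void)
{
    int     nclients = 10;      /* as in the -c 10 run described below */
    int     ntxns = 3000;       /* as in the -t 3000 run described below */
    int     i;

    for (i = 0; i < nclients; i++)
    {
        pid_t   pid = fork();

        if (pid < 0)
        {
            perror("fork");
            exit(1);
        }
        if (pid == 0)           /* child: run its own client loop */
        {
            run_client(i, ntxns);
            _exit(0);
        }
    }

    while (wait(NULL) > 0)      /* parent: reap all the children */
        ;
    return 0;
}

Each child proceeds independently of the others once forked, which is the
structural difference from driving all clients out of a single process.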

The test case I'm looking at is on a dual Xeon, EM64T, hyperthreading
enabled (hence, 4 logical CPUs), running Fedora Core 4.  Database
parameters are stock except these changes to minimize I/O:
fsync = off
shared_buffers = 5
checkpoint_segments = 30
Database is initialized with pgbench -i -s 10 bench and then tested
with pgbench -c 10 -t 3000 bench; I usually do three runs and take
the median to have a trustworthy number.

It'd be interesting to find out if other people can get similar results
on other platforms.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is

2005-12-01 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 If people would like to see a detailed explanation of the
 interaction between strtol() and errno, a header comment to pg_strtol()
 seems a good place to put it. IMO that is better than copying and
 pasting a cryptic one-line comment to each and every callsite of
 strtol().

Next we'll be copying-and-pasting entire C-library man pages, no doubt.
I think this whole discussion is a waste of electrons, as are the
proposed comments.  No one ever asked for extra documentation in the
original coding in pg_atoi, or the other dozen or so places where we
have historically checked the result of strtol.  Why do we suddenly
feel it requires extra doc now?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 I modified it to:
 errno = 0;  /* avoid having to check the result for failure */

Just for the record, that's *still* wrong.  It implies that if we
tested (result == LONG_MAX && errno == ERANGE), without zeroing
errno beforehand, the code would be correct.  But it would not,
because the errno value could still be leftover.  The plain fact
of the matter is that if you're going to check for strtol overflow at
all, you have to zero errno beforehand.  This is perfectly well
explained in the strtol spec page, and I see no need to duplicate it:

Because 0, LONG_MIN and LONG_MAX are returned on error and are
also valid returns on success, an application wishing to check
for error situations should set errno to 0, then call strtol(),
then check errno.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set to zero.

2005-12-01 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
   errno = 0;   /* clear prior detected errors */

That one is at least a correct explanation of what the code is doing...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Gregory Maxwell
On 12/1/05, Pollard, Mike [EMAIL PROTECTED] wrote:
 Optimizer hints were added because some databases just don't have a very
 smart optimizer.  But you are much better served tracking down cases in
 which the optimizer makes a bad choice, and teaching the optimizer how
 to make a better one.  That way, all users get the benefit of the fix.
 Remember, the purpose of SQL is to isolate the end user from having to
 care about how the data is retrieved; that is the RDBMS' problem.  (the
 other thing forgotten was that it was supposed to be a natural language.
 NVL.  Bah.)

The flipside there is that a good set of hinting options  may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer.  The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

If we'd really like to avoid people using the knobs to rig queries,
how about making them only  work with explain analyze, useful for
debugging but not so useful for actual queries.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Reducing relation locking overhead

2005-12-01 Thread Tom Lane
In looking at the current pgbench results, I notice that one
considerable contribution to LWLock contention is access to the
heavyweight-lock manager.  Almost all of that comes from taking
relation-level locks, so we could cut down the contention if we could
reduce the number of distinct locks taken.  (The lmgr code already
avoids touching shared memory again if, say, you request another
AccessShareLock on a relation you've already AccessShareLocked in
the current transaction.)

I see several places where we might possibly make this kind of
savings:

1. In an UPDATE or DELETE query, we take RowExclusiveLock on the target
relation, and also take AccessShareLock in the scan node that reads the
relation.  It wouldn't take much extra code to make the scan nodes avoid
taking AccessShareLock if the relation is already opened as the query
result relation.  AFAICS there is no downside to this; any lock that
would conflict with AccessShareLock will also conflict with
RowExclusiveLock, so there's no real need to hold both lock types.
(Moreover, we already make a similar kind of optimization when accessing
system catalogs: those routines only take one lock not two.)
Does anyone have an objection to it?

2. In the same way, an index that is used to scan the target relation
will be locked in both RowExclusiveLock and AccessShareLock modes
(corresponding to its roles as both source and update target).  We could
avoid taking both lock types, but this would require some restructuring
because the code responsible for locking the indexes doesn't currently
have access to the EState to find out whether an index belongs to a
target relation.  What I'm thinking here is that maybe there's no point
in maintaining the read versus write distinction at all for indexes ---
we could just take AccessShareLock in both cases.  Any thoughts on the
pros and cons there?

3. We could also save some trips to the lock manager if we adopt the
same position for user indexes as we do for user tables, namely that
locks once taken are held till end of transaction.  Any thoughts about
that?

4. The only reason we need to take relation-level locks on indexes
at all is to make the world safe for REINDEX being done concurrently
with read-only accesses to the table (that don't use the index being
reindexed).  If we went back to requiring exclusive lock for reindex we
could forget all about both #2 and #3.  Particularly for updates of
relations with lots of indexes, this could be a pretty significant win.
However we'd definitely be giving up something that was seen as a
feature at one point, so I'm not sold on this idea ... unless someone
can see a way to reduce the overhead without giving up concurrent
REINDEX.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Reducing relation locking overhead

2005-12-01 Thread Christopher Kings-Lynne

4. The only reason we need to take relation-level locks on indexes
at all is to make the world safe for REINDEX being done concurrently
with read-only accesses to the table (that don't use the index being
reindexed).  If we went back to requiring exclusive lock for reindex we
could forget all about both #2 and #3.  Particularly for updates of
relations with lots of indexes, this could be a pretty significant win.
However we'd definitely be giving up something that was seen as a
feature at one point, so I'm not sold on this idea ... unless someone
can see a way to reduce the overhead without giving up concurrent
REINDEX.


Surely in the real world REINDEX is run so rarely compared to all those 
other operations it'd be a win...


Chris


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [COMMITTERS] pgsql: Add comments about why errno is set

2005-12-01 Thread Bruce Momjian

OK, comments removed, and comment added to port/strtol.c.

---

Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  I modified it to:
  errno = 0;  /* avoid having to check the result for failure */
 
 Just for the record, that's *still* wrong.  It implies that if we
 tested (result == LONG_MAX && errno == ERANGE), without zeroing
 errno beforehand, the code would be correct.  But it would not,
 because the errno value could still be leftover.  The plain fact
 of the matter is that if you're going to check for strtol overflow at
 all, you have to zero errno beforehand.  This is perfectly well
 explained in the strtol spec page, and I see no need to duplicate it:
 
   Because 0, LONG_MIN and LONG_MAX are returned on error and are
   also valid returns on success, an application wishing to check
   for error situations should set errno to 0, then call strtol(),
   then check errno.
 
   regards, tom lane
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Windows installation notes

2005-12-01 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


Thanks for responding to this and fixing what you could. I
understand now that a lot of the things are Window-isms
and thus not fixable.
  
 It's the default size for a Windows Installer installation program. We
 could make it larger, but then we'd not look very normal any more.
 AFAIK, it can't be made changeable.

How about a popup README option? (e.g. opens notepad for easier reading)

 http://postgresql.org/windows.html
  
 It's also available as
 http://www.postgresql.org/docs/faqs.FAQ_windows.html

 Not sure if it's all that much better.

Not really. Anyone listening that could create a simpler URL? Redirects
are cheap after all...

 More explanations would help throughout the process, such as
 explaining exactly what "Initialize database cluster" means,
 and what a port number is.

 Any specifics on how to do that would certainly be appreciated :-)

Heh. No idea. But is there a Windows TODO this could go on?

 FAT32 partition. Should this not have come up much earlier?
 For example, when I chose the partition?
  
 We can't put it up earlier, because it's only relevant if you actually
 do an initdb. We'll show it as soon as we've found that out.

Sure, but doing an initdb is the default action. Can't we do a simple check
early on just to let the user know that this directory cannot be
used for the data directory?

  
 Will fix. Been planning to do those screenshots for a long time,
 just never actually did it.

They're up now, and look great!

  
 For the conf files in the menus: is there a way to make the
 default editor wordpad instead of notepad?
  
 We could, but why do you want it? Specifically, I see several problems
 with using wordpad as the default:
 1) It doesn't live in the same place always, depending on the language
 of windows. (Yeah, we can work around this fairly easily)

Can't you just say "wordpad %1" or whatever the Win convention is.

 2) You can install Windows without it. You can't install Windows without
 notepad, it'll always be there.

Can't argue with that. But we could test for wordpad first...

 3) In some cases, Wordpad suddenly decides you want to use Word format
 for your .txt file, when you've pasted something in it for example. This
 is really bad. I've never really figured out when or why, but I've
 seen it happen.

Does it depend on the extension? The conf files should be safe if so...

 This is because we don't support readline on windows because it breaks
 in most locales.

Ouch. Those poor Windows users.

 Thanks for all your comments!

Thanks for all the work you and everyone else has done on making this
installer!

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200512012155
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-BEGIN PGP SIGNATURE-

iD8DBQFDj7f1vJuQZxSWSsgRAn9uAJ0fin3nzy81+/aqQ+dghfXIS7XqZQCgza/8
d4VGoLwg8LU1Ik1XZRPL4AU=
=SZy9
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Reducing relation locking overhead

2005-12-01 Thread Stephen Frost
* Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote:
 4. The only reason we need to take relation-level locks on indexes
 at all is to make the world safe for REINDEX being done concurrently
 with read-only accesses to the table (that don't use the index being
 reindexed).  If we went back to requiring exclusive lock for reindex we
 could forget all about both #2 and #3.  Particularly for updates of
 relations with lots of indexes, this could be a pretty significant win.
 However we'd definitely be giving up something that was seen as a
 feature at one point, so I'm not sold on this idea ... unless someone
 can see a way to reduce the overhead without giving up concurrent
 REINDEX.
 
 Surely in the real world REINDEX is run so rarely compared to all those 
 other operations it'd be a win...

Yeah, except that in the real world you don't want to bring everything
to a halt while you do a REINDEX.  For my use cases it'd be fine but I
could see cases where it wouldn't be.  Kinda makes me wish we could give
the user the option at runtime somehow but I'm not sure that could be
done...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Pollard, Mike
Gregory Maxwell [EMAIL PROTECTED] wrote:
 The flipside there is that a good set of hinting options  may increase
 the amount of detailed feedback we get from users on improvements
 needed in the optimizer.  The current knobs are pretty blunt and don't
 do as much as I'd like when trying to track down exactly where the
 optimiser has gone wrong.

Point conceded.  Any information that can help diagnose an issue is good
information.  I like the idea of only allowing it on explain.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Neil Conway
On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
 If we'd really like to avoid people using the knobs to rig queries,
 how about making them only  work with explain analyze, useful for
 debugging but not so useful for actual queries.

That seems a pretty arbitrary limitation. I agree that it's not ideal to
have users adjust planner behavior via this means, but until we have
something better, I think applying that limitation would only make the
status quo worse.

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
 If we'd really like to avoid people using the knobs to rig queries,
 how about making them only  work with explain analyze, useful for
 debugging but not so useful for actual queries.

 That seems a pretty arbitrary limitation. I agree that it's not ideal to
 have users adjust planner behavior via this means, but until we have
 something better, I think applying that limitation would only make the
 status quo worse.

Yeah, I agree.  Adding code to prevent people from using a facility
doesn't seem very reasonable, even if it's our policy that using the
facility for production purposes is not a good idea.  In fact, we just
today had a counterexample --- see this thread:
http://archives.postgresql.org/pgsql-performance/2005-12/msg00015.php
Being able to use enable_nestloop got Markus out of a short-term bind,
which to me is exactly what you want to be able to do with this sort
of thing.

I don't have any problem with expending small amounts of work to make
it easier to hack the planner in small ways.  The real problem I have
with a planner hints facility (in the form that I think most people
who ask for it have in mind) is that it would be a *very large* amount
of work to do it reasonably well, and I think that amount of effort
would be better spent in other ways.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Weird Grant/Revoke/Usage behavior

2005-12-01 Thread Joshua D. Drake

Hello,

The below seems incorrect. If I am in the schema the behavior seems 
correct. I can't see or select from the table.
However if I am not in the schema I am able to see the table and its 
structure. The user jd is not a superuser.


cleancontact=# revoke usage on schema financials from jd;
REVOKE
cleancontact=# \c cleancontact jd
You are now connected to database "cleancontact" as user "jd".
cleancontact=> \d financials.foo
                         Table "financials.foo"
 Column |  Type  |                        Modifiers
--------+--------+----------------------------------------------------------
 id     | bigint | not null default nextval('financials.foo_id_seq'::text)
 fname  | text   |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)

cleancontact=> set search_path='financials';
SET
cleancontact=> \d
No relations found.
cleancontact=> \d foo
Did not find any relation named "foo".
cleancontact=>


---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Additional Grant/revoke problem

2005-12-01 Thread Joshua D. Drake

Hello,

CREATE TABLE foo (id bigserial primary key, fname text);
GRANT INSERT ON foo to jd;
INSERT INTO foo(fname) VALUES ('Joshua');

This will fail, because I don't have permissions on the sequence 
foo_id_seq which is a dependency created
by PostgreSQL. It seems that if bigserial is the datatype the grant 
should automatically cascade.


This behavior would make sense if I did it the long way, where I used
bigint and then modified the column with ALTER TABLE after the fact.

Joshua D. Drake


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Fork-based version of pgbench

2005-12-01 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 It's not so much that I want to inflate the measurements, as that
 leaving 10% of the CPU on the table reduces pgbench's usefulness as
 a way of stress-testing the backend.

I suspect the difference is the same thing you theorised made the difference
before. Namely that they're no longer proceeding in lockstep. The progress is
more random allowing some processes to make more progress than average and
benefit from better, er, well some cache somewhere.

In any case it seems like there would be cases where each kind of behaviour
would be useful. It seems likely there would be bugs where the lockstep
behaviour was useful for testing, and other bugs where the randomized
behaviour would be useful for testing too.


-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Reducing relation locking overhead

2005-12-01 Thread Greg Stark
Christopher Kings-Lynne [EMAIL PROTECTED] writes:

 Surely in the real world REINDEX is run so rarely compared to all those other
 operations it'd be a win...

It's not a question of frequency. We're not talking about something like a 10%
performance loss. You're talking about whether REINDEX is useful at all.
Consider installations where REINDEX will require shutting down business
critical operations for days...

It was a *major* new feature that many people were waiting for when Oracle
finally implemented live CREATE INDEX and REINDEX. The ability to create
an index without blocking any operations on a table, even updates, was
absolutely critical for 24x7 operation.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Trent Shipley
On Thursday 2005-12-01 19:01, Gregory Maxwell wrote:
 On 12/1/05, Pollard, Mike [EMAIL PROTECTED] wrote:
  Optimizer hints were added because some databases just don't have a very
  smart optimizer.  But you are much better served tracking down cases in
  which the optimizer makes a bad choice, and teaching the optimizer how
  to make a better one.  That way, all users get the benefit of the fix.
  Remember, the purpose of SQL is to isolate the end user from having to
  care about how the data is retrieved; that is the RDBMS' problem.  (the
  other thing forgotten was that it was supposed to be a natural language.
  NVL.  Bah.)

 The flipside there is that a good set of hinting options  may increase
 the amount of detailed feedback we get from users on improvements
 needed in the optimizer.  The current knobs are pretty blunt and don't
 do as much as I'd like when trying to track down exactly where the
 optimiser has gone wrong.

 If we'd really like to avoid people using the knobs to rig queries,
 how about making them only  work with explain analyze, useful for
 debugging but not so useful for actual queries.

I'm all in favor of sticking to the declarative language ideal.

Also, I'm much in favor of protecting people from themselves.


On the other hand, if folks insist on engaging in extreme sports (like second 
guessing the optimizer) I'm against regulating their freedom.  I think 
exposing planner variables would be a good thing, on net.  Naturally, you 
would warn everyone not to touch them.  (Safety and freedom are both 
necessary.)

If you can play with the knobs, you should let them be used to return real 
result sets.  That way, when you get feedback, you will be able to tell if 
the cost estimator is broken.  Just returning a modified plan won't 
challenge costing assumptions.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] generalizing the planner knobs

2005-12-01 Thread Greg Stark

Pollard, Mike [EMAIL PROTECTED] writes:

 Optimizer hints were added because some databases just don't have a very
 smart optimizer.  But you are much better served tracking down cases in
 which the optimizer makes a bad choice, and teaching the optimizer how
 to make a better one.  

You more or less missed my entire point.

You can always teach the optimizer to make better decisions based on good
data. Your statement is basically right when talking about tweaking the
optimizer's decisions to ignore its best judgement.

But there are many many cases where the data the optimizer has available isn't
good and for good reason. And in plenty of those cases the data the optimizer
has available *can't* be good.

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like 

  WHERE radius_authenticate(user) = 'OK'

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fork-based version of pgbench

2005-12-01 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 It's not so much that I want to inflate the measurements, as that
 leaving 10% of the CPU on the table reduces pgbench's usefulness as
 a way of stress-testing the backend.
 ...
 In any case it seems like there would be cases where each kind of behaviour
 would be useful. It seems likely there would be bugs where the lockstep
 behaviour was useful for testing, and other bugs where the randomized
 behaviour would be useful for testing too.

No, I'm not thinking about bugs at all, but performance stress tests.
As an example, the various context-swap-storm behaviors we've been
chasing over the past year or so cannot arise unless you can get several
processes competing for the same resource at the same time.  If your
test configuration isn't capable of saturating the CPU then you have
a much lower probability of being able to observe such misbehavior.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Reducing relation locking overhead

2005-12-01 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 It was a *major* new feature that many people were waiting for when Oracle
 finally implemented live CREATE INDEX and REINDEX. The ability to run create
 an index without blocking any operations on a table, even updates, was
 absolutely critical for 24x7 operation.

Well, we're still not in *that* ballpark and I haven't seen any serious
proposals to make us so.  How absolutely critical is it really?
Is REINDEX-in-parallel-with-reads-but-not-writes, which is what we
actually have at the moment, an absolutely critical facility?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Buildfarm: Bear, Branch 2?

2005-12-01 Thread Michael Glaesemann

Out of curiosity, what is this beast?

http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=bear&dt=2005-11-13%2012:01:08


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org