[HACKERS] problem with web interface for mailing lists?

2006-12-21 Thread Pavel Stehule

Hello,

I see las actualisation from 18. december

regards
Pavel Stehule

_
Chcete sdilet sve obrazky a hudbu s prateli? http://messenger.msn.cz/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Inaam Rana

On 12/22/06, Takayuki Tsunakawa <[EMAIL PROTECTED]> wrote:


From: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>
> (5) (4) + /proc/sys/vm/dirty* tuning
> dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
> changed from 40 to 4.
>
> 308  349  84  349  84

Sorry, I forgot to include the result when using Itagaki-san's patch.
The patch showd the following tps for case (5).

323  350  340  59  225

The best response time was 4 msec, and the worst one was 16 seconds.




Which IO Shceduler (elevator) you are using?

--
Inaam Rana
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
From: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>
> (5) (4) + /proc/sys/vm/dirty* tuning
> dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
> changed from 40 to 4.
>
> 308  349  84  349  84

Sorry, I forgot to include the result when using Itagaki-san's patch.
The patch showd the following tps for case (5).

323  350  340  59  225

The best response time was 4 msec, and the worst one was 16 seconds.


- Original Message - 
From: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>
To: "ITAGAKI Takahiro" <[EMAIL PROTECTED]>
Cc: 
Sent: Friday, December 22, 2006 3:20 PM
Subject: Re: [HACKERS] Load distributed checkpoint


> Hello, Itagaki-san,
>
> Thank you for an interesting piece of information.
>
> From: "ITAGAKI Takahiro" <[EMAIL PROTECTED]>
>> If you use linux, try the following settings:
>>  1. Decrease /proc/sys/vm/dirty_ratio and dirty_background_ratio.
>>  2. Increase wal_buffers to redule WAL flushing.
>>  3. Set wal_sync_method to open_sync; O_SYNC is faster then
fsync().
>>  4. Separate data and WAL files into different partitions or disks.
>>
>> I suppose 1 is important for you, because kernel will not write
> dirty
>> buffers until 10% of buffers become dirty in default settings.
>> You have large memory (8GB), but small data set (800MB). So kernel
>> almost never writes buffers not in checkpoints. Accumulate dirty
> buffers
>> are written at a burst in fsync().
>
> I'll show the results of this tuning to share information with
people
> who don't have experience of this kind.
> The numbers shown below are the tps when running "pgbench -c16 -t100
> postgres" five times in succession.
>
> (1) Default case(this is show again for comparison and reminder)
> The bgwriter_* and checkpoint_* are set to those defaults.
> wal_buffers and wal_sync_method are also set to those defaults (64kB
> and fdatasync respectively.)
>
> 235  80  226  77  240
>
>
> (2) Default + WAL 1MB case
> The configuration is the same as case (1) except that wal_buffers is
> set to 1024kB.
>
> 302  328  82  330  85
>
> This is better improvement than I expected.
>
>
> (3) Default + wal_sync_method=open_sync case
> The configuration is the same as case (1) except that
wal_sync_method
> is set to open_sync.
>
> 162  67  176  67  164
>
> Too bad compared to case (2).  Do you know the reason?
>
>
> (4) (2)+(3) case
>
> 322  350  85  321  84
>
> This is good, too.
>
>
> (5) (4) + /proc/sys/vm/dirty* tuning
> dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
> changed from 40 to 4.
>
> 308  349  84  349  84
>
> The tuning of kernel cache doesn't appear to bring performance
> improvement in my env.  The kernel still waits too long before it
> starts flushing dirty buffers because the cache is large?  If so,
> increasingly available RAM may cause trouble more frequently in the
> near future.  Do the dirty_*_ratio accept values less than 1?
>
> BTW, in case (1), the best response time of a transaction was 6
> milliseconds.  On the other hand, the worst response time was 13
> seconds.
>
>
>> We would be happy if we would be free from a difficult combination
>> of tuning. If you have *idea for improvements*, please suggest it.
>> I think we've already understood *problem itself*.
>
> I agree with you.  Let's make the ideas more concrete, doing some
> experimentations.
>
>
>
>
> ---(end of
broadcast)---
> TIP 7: You can help support the PostgreSQL project by donating at
>
>http://www.postgresql.org/about/donate
>



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Greg Smith

On Wed, 20 Dec 2006, Inaam Rana wrote:

Talking of bgwriter_* parameters I think we are missing a crucial 
internal counter i.e. number of dirty pages. How much work bgwriter has 
to do at each wakeup call should be a function of total buffers and 
currently dirty buffers.


This is actually a question I'd been meaning to throw out myself to this 
list.  How hard would it be to add an internal counter to the buffer 
management scheme that kept track of the current number of dirty pages? 
I've been looking at the bufmgr code lately trying to figure out how to 
insert one as part of building an auto-tuning bgwriter, but it's unclear 
to me how I'd lock such a resource properly and scalably.  I have a 
feeling I'd be inserting a single-process locking bottleneck into that 
code with any of the naive implementations I considered.


The main problem I've been seeing is also long waits stuck behind a slow 
fsync on Linux.  What I've been moving toward testing is an approach 
slightly different from the proposals here.  What if all the database page 
writes (background writer, buffer eviction, or checkpoint scan) were 
counted and periodic fsync requests send to the bgwriter based on that? 
For example, when I know I have a battery-backed caching controller that 
will buffer 64MB worth of data for me, if I forced a fsync after every 
6000 8K writes, no single fsync would get stuck waiting for the disk to 
write for longer than I'd like.


Give the admin a max_writes_before_sync parameter, make the default of 0 
work just like the current behavior, and off you go; a simple tunable that 
doesn't require a complicated scheme to implement or break anybody's 
existing setup.  Combined with a properly tuned background writer, that 
would solve the issues I've been running into.  It would even make the 
problem of Linux caching too many writes until checkpoint time go away (I 
know how to eliminate that by adjusting caching policy, but I have to be 
root to do it; a DBA should be able to work around that issue even if they 
don't have access to the kernel tunables.)


While I'm all for testing to prove me wrong, my gut feel is that going all 
the way to sync writes a la Oracle is a doomed approach, particularly on 
low-end hardware where they're super expensive.  Following The Oracle Way 
is a good roadmap for a lot of things, but I wouldn't put building a lean 
enough database to run on modest hardware on that list.  You can do sync 
writes with perfectly good performance on systems with a good 
battery-backed cache, but I think you'll get creamed in comparisons 
against MySQL on IDE disks if you start walking down that path; since 
right now a fair comparison with similar logging behavior is an even match 
there, that's a step backwards.


Also on the topic of sync writes to the database proper:  wouldn't using 
O_DIRECT for those potentially counter-productive?  I was under the 
impressions that one of the behaviors counted on by Postgres was that data 
evicted from its buffer cache, eventually intended for writing to disk, 
was still kept around for a bit in the OS buffer cache.  A subsequent read 
because the data was needed again might find the data already in the OS 
buffer, therefore avoiding an actual disk read; that substantially reduces 
the typical penalty for the database engine making a bad choice on what to 
evict.  I fear a move to direct writes would put more pressure on the LRU 
implementation to be very smart, and that's code that you really don't 
want to be more complicated.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Tom Lane
"Jim Nasby" <[EMAIL PROTECTED]> writes:
> The only other thought that comes to mind is that such syntax will  
> make it a *lot* more verbose to set all the options for a table.

Which should surely make you wonder whether setting these options
per-table is the most important thing to do...

Arguing about syntax details is pretty premature, in my humble opinion.
We don't have agreement yet about what options we need or what scope
they should apply over.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
Hello, Itagaki-san,

Thank you for an interesting piece of information.

From: "ITAGAKI Takahiro" <[EMAIL PROTECTED]>
> If you use linux, try the following settings:
>  1. Decrease /proc/sys/vm/dirty_ratio and dirty_background_ratio.
>  2. Increase wal_buffers to redule WAL flushing.
>  3. Set wal_sync_method to open_sync; O_SYNC is faster then fsync().
>  4. Separate data and WAL files into different partitions or disks.
>
> I suppose 1 is important for you, because kernel will not write
dirty
> buffers until 10% of buffers become dirty in default settings.
> You have large memory (8GB), but small data set (800MB). So kernel
> almost never writes buffers not in checkpoints. Accumulate dirty
buffers
> are written at a burst in fsync().

I'll show the results of this tuning to share information with people
who don't have experience of this kind.
The numbers shown below are the tps when running "pgbench -c16 -t100
postgres" five times in succession.

(1) Default case(this is show again for comparison and reminder)
The bgwriter_* and checkpoint_* are set to those defaults.
wal_buffers and wal_sync_method are also set to those defaults (64kB
and fdatasync respectively.)

235  80  226  77  240


(2) Default + WAL 1MB case
The configuration is the same as case (1) except that wal_buffers is
set to 1024kB.

302  328  82  330  85

This is better improvement than I expected.


(3) Default + wal_sync_method=open_sync case
The configuration is the same as case (1) except that wal_sync_method
is set to open_sync.

162  67  176  67  164

Too bad compared to case (2).  Do you know the reason?


(4) (2)+(3) case

322  350  85  321  84

This is good, too.


(5) (4) + /proc/sys/vm/dirty* tuning
dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
changed from 40 to 4.

308  349  84  349  84

The tuning of kernel cache doesn't appear to bring performance
improvement in my env.  The kernel still waits too long before it
starts flushing dirty buffers because the cache is large?  If so,
increasingly available RAM may cause trouble more frequently in the
near future.  Do the dirty_*_ratio accept values less than 1?

BTW, in case (1), the best response time of a transaction was 6
milliseconds.  On the other hand, the worst response time was 13
seconds.


> We would be happy if we would be free from a difficult combination
> of tuning. If you have *idea for improvements*, please suggest it.
> I think we've already understood *problem itself*.

I agree with you.  Let's make the ideas more concrete, doing some
experimentations.




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Jim Nasby

On Dec 21, 2006, at 1:28 PM, Andrew Dunstan wrote:

Jim Nasby wrote:

How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]


Given these remarks from Tom:


Where we are currently at is experimenting to find
out what autovacuum's control knobs ought to be.  The catalog  
table was

a suitably low-effort way to expose a first cut at the knobs.


doesn't making language level changes seem more than somewhat  
premature? Or have we finished experimenting?


Well, the only one I could possibly see removing would be threshold,  
but the reality is that these parameters have been kicking around  
since 7.4, so...


But I do like Richard Huxton's suggestion for syntax... that looks a  
lot more flexible than what I proposed.


The only other thought that comes to mind is that such syntax will  
make it a *lot* more verbose to set all the options for a table. But  
I don't know how often people feel the need to set *all* of them at  
once... Still, it might be worth continuing to support people poking  
values directly into the table; I just don't think we want to make  
that the official interface.

--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Companies Contributing to Open Source

2006-12-21 Thread Guido Barosio


"Companies often bring fresh prespective, ideas, and testing
infrastucture to a project."


"prespective" || "perspective" ?

g.-


On 12/21/06, Kevin Grittner <[EMAIL PROTECTED]> wrote:

>>> On Tue, Dec 19, 2006 at  6:13 PM, in message
<[EMAIL PROTECTED]>, Bruce Momjian
<[EMAIL PROTECTED]> wrote:
> if the company dies, the community keeps going (as it did after
Great
> Bridge, without a hickup), but if the community dies, the company
dies
> too.

This statement seems to ignore organizations for which PostgreSQL is an
implementation detail in their current environment.  While we appreciate
PostgreSQL and are likely to try to make an occasional contribution,
where it seems to be mutually beneficial, the Wisconsin State Courts
would survive the collapse of the PostgreSQL community.

While I can only guess at the reasons you may have put the slant you
did on the document, I think it really should reflect the patient
assistance the community provides to those who read the developers FAQ
and make a good faith effort to comply with what is outlined there.  The
cooperative, professional, and helpful demeanor of the members of this
community is something which should balanced against the community's
need to act as a gatekeeper on submissions.

I have recent experience as a first time employee contributor.  When we
hit a bump in our initial use of PostgreSQL because of the non-standard
character string literals, you were gracious in accepting our quick
patch as being possibly of some value in the implementation of the
related TODO item.  You were then helpful in our effort to do a proper
implementation of the TODO item which fixes it.  I see that the patch I
submitted was improved by someone before it made the release, which is
great.

This illustrates how the process can work.  I informed management of
the problem, and presented the options -- we could do our own little
hack that we then had to maintain and apply as the versions moved along,
or we could try to do fix which the community would accept and have that
feature "just work" for us for all subsequent releases.  The latter was
a little more time up front, but resulted in a better quality product
for us, and less work in the long term.  It was also presumably of some
benefit to the community, which has indirect benefit to our
organization.  Nobody here wants to switch database products again soon,
so if we can solve our problem in a way that helps the product gain
momentum, all the better.

I ran a consulting business for decades, and I know that there is a
great variation in the attitudes among managers.  Many are quite
reasonable.  I'm reminded of a meeting early in my career with a
businessman who owned and operated half a dozen successful businesses in
a variety of areas.  He proposed a deal that I was on the verge of
accepting, albeit somewhat reluctantly.  He stopped me and told me that
he hoped to continue to do business with me, so any deal we made had to
benefit and work for both of us or it was no good at all; if I was
uncomfortable with something in the proposal, we should talk it out.
That's the core of what we're trying to say in this document, isn't it?
The rest is an executive overview of the developer FAQ?  I can't help
feeling that even with the revisions so far it could have a more
positive "spin".

-Kevin



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings




--
Guido Barosio
---
http://www.globant.com
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread ITAGAKI Takahiro

"Takayuki Tsunakawa" <[EMAIL PROTECTED]> wrote:

> > For pg, half RAM for shared_buffers is too much. The ratio is good for
> > other db software, that does not use the OS cache.
> 
> What percentage of RAM is recommended for shared buffers in general?
> 40%?  30%?  Or, is the general recommendation like "According to the
> amount of your data, this much RAM should be left for the kernel
> cache.  But tha's the story on Linux.  It may be different for other
> OSes."?
> Hmm,  if it is so, it sounds hard for system designers/administrators
> to judge.

If you use linux, try the following settings:
  1. Decrease /proc/sys/vm/dirty_ratio and dirty_background_ratio.
  2. Increase wal_buffers to redule WAL flushing.
  3. Set wal_sync_method to open_sync; O_SYNC is faster then fsync().
  4. Separate data and WAL files into different partitions or disks.

I suppose 1 is important for you, because kernel will not write dirty
buffers until 10% of buffers become dirty in default settings.
You have large memory (8GB), but small data set (800MB). So kernel
almost never writes buffers not in checkpoints. Accumulate dirty buffers
are written at a burst in fsync().


We would be happy if we would be free from a difficult combination
of tuning. If you have *idea for improvements*, please suggest it.
I think we've already understood *problem itself*.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Load distributed checkpoint patch

2006-12-21 Thread Takayuki Tsunakawa
Hello, Mr. Grittner,

From: "Kevin Grittner" <[EMAIL PROTECTED]>
> We have 3,000 "directly connected" users, various business partner
> interfaces, and public web entry doing OLTP in 72 databases
distributed
> around the state, with real-time replication to central databases
which
> are considered derived copies.

What a big system you have.

>   If all the pages modified on the central
> databases were held in buffers or cache until after peak hours,
query
> performance would suffer -- assuming it would all even fit in cache.
We
> must have a way for dirty pages to be written under load while
> responding to hundreds of thousands of queries per hour without
> disturbing "freezes" during checkpoints.

I agree with you.  My words were not good.  I consider it is necessary
to always advance checkpoints even under heavy load, caring OLTP
transactions.

> I raise this only to be sure that such environments are considered
with
> these changes, not to discourage improvements in the checkpoint
> techniques.  We have effectively eliminated checkpoint problems in
our
> environment with a combination of battery backed controller cache
and
> aggressive background writer configuration.  When you have a patch
which
> seems to help those who still have problems, I'll try to get time
> approved to run a transaction replication stream onto one of our
servers
> (in "catch up mode") while we do a web "stress test" by playing back
> requests from our production log.  That should indicate how the
patch
> will affect us.

Thank you very much for your kind offer.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Stats Collector Oddity

2006-12-21 Thread Christopher Browne
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Tom Lane) wrote:
> Chris Browne <[EMAIL PROTECTED]> writes:
>> There isn't any way, short of restarting the postmaster, to get rid of
>> that PID, is there?
>
> The entry will get overwritten when that BackendId slot gets re-used,
> so just starting enough concurrent backends should do it.  (Since
> incoming backends always take the lowest free slot, the fact that the
> dead entry has persisted awhile means that it must have a number higher
> than your normal number of concurrent sessions ... which is evidence
> in favor of the idea that it happened during a load spike ...)

Cool.  I started up a nice little bunch of psql sessions in the
background, and then once they were all up, shut down my shell
session, thereby eliminating them.  And that did, indeed, clear out
that pg_stat_activity entry.

... And five minutes later, Nagios sent me message indicating that
node had recovered from having an "ancient" open connection.

I'll re-add a few gratuitous details here in the hopes that that makes
this easily findable if anyone else should search for the issue...

The Problem:
 - pg_stat_activity was reporting an elderly transaction in progress

 - that backend process wasn't running anymore

 - pg_stat_activity *was* reporting other legitimate activity; this
   was not the scenario where it had gotten deranged (normally due to
   excessive load)

 - Per Tom's comments, there evidently *was* some load spike where
   the closing of this particular connection did not get logged by
   the stats collector

The Solution: 

 - We needed to roll the stats collector through a bunch of its slots
   in order to clean the apparently-still-populated entry out.

 - Ran, in a shell:
 for i in `seq 100`; do
 psql &
 done

   That left 100 psql sessions in the background, all connected to the
   database backend.

 - Closed the shell.  That then HUPped the 100 psql sessions.

That got the offending pg_stat_activity entry cleared out.
-- 
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/finances.html
"Temporary tattoos  are a CRITICAL  ELEMENT  of our security strategy.
To suggest otherwise is sheer lunacy."  -- Reid Fleming, cDc

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Load distributed checkpoint patch

2006-12-21 Thread ITAGAKI Takahiro

"Kevin Grittner" <[EMAIL PROTECTED]> wrote:

> > I consider that smoothing the load (more meaningfully, response time)
> > has higher priority over checkpoint punctuality in a practical sense,
>  
> I agree with that.

I agree with checkpoint_time is not so important, but we should
respect checkpoint_segements, or else new WAL files would be
created unboundedly, as Bruce pointed.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Actually, the more I think about it the more I think that 3 numbers
>> might be the answer.  99% of the code would use only the permanent ID.

> Don't we already have such a permanent number -- just one we don't use
> anywhere in the data model? Namely the oid of the pg_attribute entry.

Nope, because pg_attribute hasn't got OIDs.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Actually, the more I think about it the more I think that 3 numbers
> might be the answer.  99% of the code would use only the permanent ID.

Don't we already have such a permanent number -- just one we don't use
anywhere in the data model? Namely the oid of the pg_attribute entry. It's
actually a bit odd that we don't use it since we use the oid of just about
every other system catalog record as the primary key.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] Load distributed checkpoint patch

2006-12-21 Thread Kevin Grittner
>>> On Wed, Dec 20, 2006 at  6:05 AM, in message
<[EMAIL PROTECTED]>, "Takayuki Tsunakawa"
<[EMAIL PROTECTED]> wrote: 
> 
> I consider that smoothing the load (more meaningfully, response
time)
> has higher priority over checkpoint punctuality in a practical
sense,
> because the users of a system benefit from good steady response and
> give good reputation to the system.
 
I agree with that.
 
> If the checkpoint processing is
> not punctual, crash recovery would take longer time.  But which
would
> you give higher priority, the unlikely event (=crash of the system)
or
> likely event (=peek hours of the system)?  I believe the latter
should
> be regarded.
 
I'm still with you here.
 
> The system can write dirty buffers after the peek hours
> pass.
 
I don't see that in our busiest environment.
 
We have 3,000 "directly connected" users, various business partner
interfaces, and public web entry doing OLTP in 72 databases distributed
around the state, with real-time replication to central databases which
are considered derived copies.  If all the pages modified on the central
databases were held in buffers or cache until after peak hours, query
performance would suffer -- assuming it would all even fit in cache.  We
must have a way for dirty pages to be written under load while
responding to hundreds of thousands of queries per hour without
disturbing "freezes" during checkpoints.
 
On top of that, we monitor database requests on the source machines,
and during "idle time" we synchronize the data with all of the targets
to identify, log, and correct "drift".  So even if we could shift all
our disk writes to the end of the day, that would have its own down
side, in extending our synchronization cycle.
 
I raise this only to be sure that such environments are considered with
these changes, not to discourage improvements in the checkpoint
techniques.  We have effectively eliminated checkpoint problems in our
environment with a combination of battery backed controller cache and
aggressive background writer configuration.  When you have a patch which
seems to help those who still have problems, I'll try to get time
approved to run a transaction replication stream onto one of our servers
(in "catch up mode") while we do a web "stress test" by playing back
requests from our production log.  That should indicate how the patch
will affect us.
 
-Kevin
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Companies Contributing to Open Source

2006-12-21 Thread Kevin Grittner
>>> On Tue, Dec 19, 2006 at  6:13 PM, in message
<[EMAIL PROTECTED]>, Bruce Momjian
<[EMAIL PROTECTED]> wrote:
> if the company dies, the community keeps going (as it did after
Great
> Bridge, without a hickup), but if the community dies, the company
dies
> too.
 
This statement seems to ignore organizations for which PostgreSQL is an
implementation detail in their current environment.  While we appreciate
PostgreSQL and are likely to try to make an occasional contribution,
where it seems to be mutually beneficial, the Wisconsin State Courts
would survive the collapse of the PostgreSQL community.
 
While I can only guess at the reasons you may have put the slant you
did on the document, I think it really should reflect the patient
assistance the community provides to those who read the developers FAQ
and make a good faith effort to comply with what is outlined there.  The
cooperative, professional, and helpful demeanor of the members of this
community is something which should balanced against the community's
need to act as a gatekeeper on submissions.
 
I have recent experience as a first time employee contributor.  When we
hit a bump in our initial use of PostgreSQL because of the non-standard
character string literals, you were gracious in accepting our quick
patch as being possibly of some value in the implementation of the
related TODO item.  You were then helpful in our effort to do a proper
implementation of the TODO item which fixes it.  I see that the patch I
submitted was improved by someone before it made the release, which is
great.
 
This illustrates how the process can work.  I informed management of
the problem, and presented the options -- we could do our own little
hack that we then had to maintain and apply as the versions moved along,
or we could try to do fix which the community would accept and have that
feature "just work" for us for all subsequent releases.  The latter was
a little more time up front, but resulted in a better quality product
for us, and less work in the long term.  It was also presumably of some
benefit to the community, which has indirect benefit to our
organization.  Nobody here wants to switch database products again soon,
so if we can solve our problem in a way that helps the product gain
momentum, all the better.
 
I ran a consulting business for decades, and I know that there is a
great variation in the attitudes among managers.  Many are quite
reasonable.  I'm reminded of a meeting early in my career with a
businessman who owned and operated half a dozen successful businesses in
a variety of areas.  He proposed a deal that I was on the verge of
accepting, albeit somewhat reluctantly.  He stopped me and told me that
he hoped to continue to do business with me, so any deal we made had to
benefit and work for both of us or it was no good at all; if I was
uncomfortable with something in the proposal, we should talk it out. 
That's the core of what we're trying to say in this document, isn't it? 
The rest is an executive overview of the developer FAQ?  I can't help
feeling that even with the revisions so far it could have a more
positive "spin".
 
-Kevin
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
- Original Message - 
From: "Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]>
To: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>; "ITAGAKI
Takahiro" <[EMAIL PROTECTED]>
> > Yes, I used half the size of RAM as the shared buffers, which is
> > reasonable.  And I cached all the data.

> For pg, half RAM for shared_buffers is too much. The ratio is good
for
> other db software, that does not use the OS cache.

What percentage of RAM is recommended for shared buffers in general?
40%?  30%?  Or, is the general recommendation like "According to the
amount of your data, this much RAM should be left for the kernel
cache.  But tha's the story on Linux.  It may be different for other
OSes."?
Hmm,  if it is so, it sounds hard for system designers/administrators
to judge.




- Original Message - 
From: "Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]>
To: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>; "ITAGAKI
Takahiro" <[EMAIL PROTECTED]>
Cc: 
Sent: Thursday, December 21, 2006 11:04 PM
Subject: RE: [HACKERS] Load distributed checkpoint



> > You were running the test on the very memory-depend machine.
> >> shared_buffers = 4GB / The scaling factor is 50, 800MB of data.
> > Thet would be why the patch did not work. I tested it with DBT-2,
10GB of
> > data and 2GB of memory. Storage is always the main part of
performace here,
> > even not in checkpoints.
>
> Yes, I used half the size of RAM as the shared buffers, which is
> reasonable.  And I cached all the data.

For pg, half RAM for shared_buffers is too much. The ratio is good for
other db software, that does not use the OS cache.

Andreas



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Bruce Momjian
Lukas Kahwe Smith wrote:
> Bruce Momjian wrote:
> 
> >> [1] http://developer.postgresql.org/index.php/Todo:WishlistFor83
> > 
> > That looks helpful.
> 
> Ok good :)
> 
> Seriously though, please slap me when things are wrong, not assigned yet 
> to the correct person .. there was a bit of guess work involved with 
> some of the points .. especially with the names.
> 
> I will let things sit like they are for 1-2 weeks and I will probably 
> try to get a hold off each of the people still assigned to items then to 
> confirm that they are actually targeting the feature for 8.3 .. or am I 
> duplicating Bruce's efforts when I do this?

What I did for the 8.2 open items tracking during feature freeze was to
list the item description, status, and person's name and email in a
table that could be easily reviewed.  Let me know if you need help
setting that up.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] executing a dml within a utility

2006-12-21 Thread Alvaro Herrera
Ehab Galal wrote:
> I created a CMD_UTILITY, which upon being executed by the user should do 
> the following atomically:
> 1- drops a table
> 2- deletes from another table all rows with a certain criterion: "DELETE 
> FROM pg_t1 WHERE att = val"
> 3- deletes from a third table all rows with a certain criterion: "DELETE 
> FROM pg_t2 WHERE att = val"
> 
> I do the first step using ProcessUtility(). How can i do the 2nd/3rd steps? 
> Any suggestions?

Don't do that.  Instead, use performDeletion and register your tuples in
the pg_t1 and pg_t2 catalogs in pg_depend (using recordDependencyOn).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] executing a dml within a utility

2006-12-21 Thread Ehab Galal
I created a CMD_UTILITY, which upon being executed by the user should do the 
following atomically:

1- drops a table
2- deletes from another table all rows with a certain criterion: "DELETE 
FROM pg_t1 WHERE att = val"
3- deletes from a third table all rows with a certain criterion: "DELETE 
FROM pg_t2 WHERE att = val"


I do the first step using ProcessUtility(). How can i do the 2nd/3rd steps? 
Any suggestions?


I tried calling exec_simple_query() but this didn't work. I am getting an 
error "cannot drop active portal"



Thanks a lot,
ehab

_
Find sales, coupons, and free shipping, all in one place!  MSN Shopping 
Sales & Deals 
http://shopping.msn.com/content/shp/?ctid=198,ptnrid=176,ptnrdata=200639



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] inet/cidr

2006-12-21 Thread Andrew - Supernews
> "Worky" == Worky Workerson <[EMAIL PROTECTED]> writes:

 Worky> I was looking at upgrading to 8.2, but I make extensive use of
 Worky> the IP4 module.

The needed changes to ip4r to build on 8.2 are already in its CVS, and
as far as I know works, the only reason I've not done another release
yet is because I haven't had a chance to test for regressions on 8.1.

-- 
Andrew, Supernews
http://www.supernews.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [COMMITTERS] pgsql: Initial SQL/XML support: xml data type and initial set of

2006-12-21 Thread Peter Eisentraut
Stefan Kaltenbrunner wrote:
> Peter Eisentraut wrote:
> > Log Message:
> > ---
> > Initial SQL/XML support: xml data type and initial set of
> > functions.
>
> this seems to cause regression failures on all the buildfarm members

Should be fixed now.  I don't know why that one file was outdated.
-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [COMMITTERS] pgsql: Initial SQL/XML support: xml data type and

2006-12-21 Thread Stefan Kaltenbrunner
Peter Eisentraut wrote:
> Log Message:
> ---
> Initial SQL/XML support: xml data type and initial set of functions.

this seems to cause regression failures on all the buildfarm members
(none of them are yet building with xml support).

http://www.pgbuildfarm.org/cgi-bin/show_status.pl

Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] tsearch in core patch, for review

2006-12-21 Thread Teodor Sigaev



patch: http://www.sigaev.ru/misc/tsearch_core-0.27.gz

http://www.sigaev.ru/misc/tsearch_core-0.28.gz
new version, because of XML commit - old patch doesn't apply cleanly.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Richard Huxton

Gregory Stark wrote:

"Jim Nasby" <[EMAIL PROTECTED]> writes:


How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]

... or would that create a whole bunch of reserved words?


The way to predict when you're going to run into conflicts in a case like this
is to ask what happens if you have a column named "autovacuum" or
"autoanalyze"...


Might it not be cleaner to treat them as scoped configuration values?

ALTER TABLE foo SET autovacuum.threshold = ...

Presumably it's not going to be the last such setting, and would give 
you a common format for setting all manner of system-object related things:

 - column statistics
 - fill-factor
 - comment
 - per-column locale (when we get it)
 - any module-related tuning (tsearch2? slony?)

That way the parser just needs to treat the next thing after "SET" as a 
(possibly compound) identifier.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Bundle of patches

2006-12-21 Thread Teodor Sigaev

0.9 doesn't apply cleanly after Peter's changes, so, new version

http://www.sigaev.ru/misc/user_defined_typmod-0.10.gz

Teodor Sigaev wrote:

 >>   Perhaps an array of int4 would be better?  How much

Done
http://www.sigaev.ru/misc/user_defined_typmod-0.9.gz


The patch needs more cleanup before applying, too, eg make comments
match code, get rid of unused keywords added to gram.y.


Cleaned.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Andrew Dunstan

Jim Nasby wrote:

How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]



Given these remarks from Tom:


Where we are currently at is experimenting to find
out what autovacuum's control knobs ought to be.  The catalog table was
a suitably low-effort way to expose a first cut at the knobs.


doesn't making language level changes seem more than somewhat premature? 
Or have we finished experimenting?


cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Gregory Stark

"Jim Nasby" <[EMAIL PROTECTED]> writes:

> How about...
>
> ALTER TABLE ...
> ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
> ALTER AUTOANALYZE [ THRESHOLD | SCALE ]
>
> ... or would that create a whole bunch of reserved words?

The way to predict when you're going to run into conflicts in a case like this
is to ask what happens if you have a column named "autovacuum" or
"autoanalyze"...

Sometimes the parser can look ahead to the next keyword to determine which
production to use but usually you're best off just looking for a grammatical
construct that doesn't look ambiguous even to a naive human reader.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Jim Nasby

How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]

... or would that create a whole bunch of reserved words?

On Dec 21, 2006, at 10:04 AM, Simon Riggs wrote:


On Wed, 2006-12-20 at 09:47 -0500, Jim Nasby wrote:


On the other hand, this would be the only part of the system where
the official interface/API is a system catalog table. Do we really
want to expose the internal representation of something as our API?
That doesn't seem wise to me...


Define and agree the API (the hard bit) and I'll code it (the easy  
bit).


We may as well have something on the table, even if it changes later.

Dave: How does PgAdmin handle setting table-specific autovacuum
parameters? (Does it?)

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



---(end of  
broadcast)---

TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate



--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Tom Lane wrote:

Actually, the more I think about it the more I think that 3 numbers
might be the answer.  99% of the code would use only the permanent ID.
Display position would be used in *exactly* one place, namely while
expanding "SELECT foo.*" --- I can't think of any other part of the
backend that would care about it.


Insert without a column list will need the logical ordering, I think. 
Also use of "like foo" in a create table statement. I'm not dead sure 
there aren't one or two others lurking. But I agree that the number is 
small.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout  writes:
> I was envisiging the physical number to be fixed and immutable (ie
> storage position = permanent position).

There are two different problems being discussed here, and one of them
is insoluble if we take that position: people would like the system to
automatically lay out tables to minimize alignment overhead and access
costs (eg, put fixed-width columns first).  This is not the same as
"I would like to change the display column order".

It's true that for an ADD COLUMN that doesn't already force a table
rewrite, forcing one to improve packing is probably bad.  My thought
would be that we leave the column storage order alone if we don't have
to rewrite the table ... but any rewriting variant of ALTER TABLE could
optimize the storage order while it was at it.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Andrew Dunstan

Lukas Kahwe Smith wrote:

Bruce Momjian wrote:


[1] http://developer.postgresql.org/index.php/Todo:WishlistFor83


That looks helpful.


Ok good :)

Seriously though, please slap me when things are wrong, not assigned 
yet to the correct person .. there was a bit of guess work involved 
with some of the points .. especially with the names.


I will let things sit like they are for 1-2 weeks and I will probably 
try to get a hold off each of the people still assigned to items then 
to confirm that they are actually targeting the feature for 8.3 .. or 
am I duplicating Bruce's efforts when I do this?






To the best of my understanding, the current PSM effort is a pgFoundry 
project to create it as a loadable language. I am not aware of anything 
that would get it to where we could include it in 8.3, so this item 
should probably not be on the list.


Also, "clustered/replication solutions" seems way too vague. It is not 
clear to me what if anything is in prospect on this front for 8.3.



cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Companies Contributing to Open Source

2006-12-21 Thread Simon Riggs
The paper is a good one, from my perspective. It does address important
issues and maybe we don't all agree on the exact places lines have been
drawn, but I think we probably do agree that these things need to be
said. Now that they have been said, we must allow reasonable time for
the understanding to percolate and for appropriate changes of direction
to take place. We can't undo the past, but we can change the future.

On Tue, 2006-12-19 at 19:13 -0500, Bruce Momjian wrote:
> This actually brings up an important distinction.  Joshua is saying
> that
> the community is painted as "god" in the article, and I agree there is
> a
> basis for that, but I don't think you can consider the community and
> company as equals either.  

>  Also, the community is developing the software at a rate that
> almost no other company can match, so again the company is kind of in
> toe if they are working with the community process.  For example, the
> community is not submitting patches for the company to approve.

The community is developing software quickly because there are some/many
full-time paid staff assigned to the project. We (the Community) need to
recognise that the Community is *all* of us and that includes various
Companies i.e. Companies aren't distinct from the Community. In that
sense, I would agree that The Community is "above" Companies.

We must be appreciative of contributions made in imperfect conditions.
Frequently changes are made behind closed doors and then approval is
given to release the software, sometimes after extensive lobbying. We
shouldn't shun those contributions, even while advising those companies
that we'd prefer it if they didn't do it that way next time. We should
assume that all development is done with the best intentions, even if
things don't follow the FAQ. Now that we have some clear policy on this,
I look forward to people being able to say "best not do it that way, the
Community has a clear policy against that", that Teodor, myself and
others can advise sponsors about.

BTW, the phrase "Companies" must also include any external Enterprise,
since many good things come our way from Universities and Colleges. We
should also recognise that many enterprises are in fact non-profit, or
simply local/national government/administrative organisations. Profit
per se is not the only thing that drives requirements.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] New version of money type

2006-12-21 Thread D'Arcy J.M. Cain
On Thu, 21 Dec 2006 10:47:52 -0500
Tom Lane <[EMAIL PROTECTED]> wrote:
> "D'Arcy J.M. Cain"  writes:
> > Very good points.  However, like the currency symbol issue I would like
> > to separate that into another discussion.  The code already exists with
> > the warts you mention (and more) and this proposal is to fix one thing
> > that will make it more useful to others.  Let's get that change in and
> > then start fixing up some of those other issues.
> 
> I've forgotten now --- was this patch intended *only* to convert money
> from int4 to int8 underlying representation, or did you do other things?

Well, the main intention was to just widen the underlying storage and
thus increase the range to the point that the type is useful to more
users.  In fact, as you can see, I have removed the change to drop the
currency on output just to keep this change to a single issue.
However, there was a little bit of cleanup as well.  I removed some
self-balancing XXX comments for example.  That's what CVS log is for.
I moved a few functions around in order to make static functions self
prototyping.  I added some consts to variables where appropriate.  The
cash_words function needed to be changed to accomodate the billions,
trillions and quadrillions that can now be handled.

Everything else should be directly related to the type change and
self-explanatory.

> It looks like there are unrelated changes in the patch, but I'm not sure
> if you just moved code around or did something more interesting.

Hopefully nothing too "interesting."  :-)

> One bug I see in it is that you'd better make the alignment 'd' if the

Fixed in my local tree.  Thanks.

> type is to be int8.  Also I much dislike these changes:
> 
> - int32   i = PG_GETARG_INT32(1);
> + int64   i = PG_GETARG_INT32(1);
> 
> I think they may not actually be wrong, but they certainly *look* wrong;
> in general the declared type of a parameter variable in a C-coded SQL
> function ought to match what the SQL signature says.  Anyway there is no
> need that I can see to widen these variables.  Every C compiler knows
> what to do if you ask it for arithmetic on a long and an int.

Right but I still need to accept int64 args here.  I have changed the
two relevant places to use PG_GETARG_INT64(1).

> (Speaking of which, have you thought about what happens on a machine
> with no 64-bit int, such that "int64" is really just 32 bits?  Ideally
> the code should continue to function but with reduced range.  I didn't
> see any places where you were obviously depending on the range, but
> it's something to have in the back of your mind while coding.)

Does PGSQL run on any such machines?  If so perhaps someone can
volunteer to do some testing if they have one.

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 11:15:38AM -0500, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > Can we? For anything of any permenence (view definitions, rules,
> > compiled functions, plans, etc) you're going to want the physical
> > number, for the same reason we store the oids of functions and tables.
> 
> Not if we intend to rearrange the physical numbers during column
> add/drop to provide better packing.

Urk! If that's what people are suggesting, I'd run away very quickly.
Getting better packing during table create is a nice idea, but
preserving it across add/drop column is just... evil.

Run CLUSTER is you want that, I was expecting add/drop to be a simple
catalog change, nothing more.

> You could make a case that we need *three* numbers: a permanent column
> ID, a display position, and a storage position.

That's just way too complicated IMHO. It add's extra levels of
indirection all over the place.

I was envisiging the physical number to be fixed and immutable (ie
storage position = permanent position).

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> You could make a case that we need *three* numbers: a permanent column
>> ID, a display position, and a storage position.

> Could this not be handled by some catalog fixup after an add/drop? If we 
> get the having 3 numbers you will almost have me convinced that this 
> might be too complicated after all.

Actually, the more I think about it the more I think that 3 numbers
might be the answer.  99% of the code would use only the permanent ID.
Display position would be used in *exactly* one place, namely while
expanding "SELECT foo.*" --- I can't think of any other part of the
backend that would care about it.  (Obviously, client-side code such
as psql's \d would use it too.)  Use of storage position could be
localized into a few low-level tuple access functions, probably.

The problems we've been having with the concept stem precisely from
trying to misuse either display or storage position as a permanent ID.
That's fine as long as it actually is permanent, but as soon as you
want to change it then you have problems.  We should all understand
this perfectly well from a database theory standpoint: pg_attribute
has to have a persistent primary key.  (attrelid, attnum) is that key,
and we can't go around altering a column's attnum without creating
problems for ourselves.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Lukas Kahwe Smith

Bruce Momjian wrote:


[1] http://developer.postgresql.org/index.php/Todo:WishlistFor83


That looks helpful.


Ok good :)

Seriously though, please slap me when things are wrong, not assigned yet 
to the correct person .. there was a bit of guess work involved with 
some of the points .. especially with the names.


I will let things sit like they are for 1-2 weeks and I will probably 
try to get a hold off each of the people still assigned to items then to 
confirm that they are actually targeting the feature for 8.3 .. or am I 
duplicating Bruce's efforts when I do this?


regards,
Lukas


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Stats Collector Oddity

2006-12-21 Thread Tom Lane
Chris Browne <[EMAIL PROTECTED]> writes:
> There isn't any way, short of restarting the postmaster, to get rid of
> that PID, is there?

The entry will get overwritten when that BackendId slot gets re-used,
so just starting enough concurrent backends should do it.  (Since
incoming backends always take the lowest free slot, the fact that the
dead entry has persisted awhile means that it must have a number higher
than your normal number of concurrent sessions ... which is evidence
in favor of the idea that it happened during a load spike ...)

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Tom Lane wrote:

Martijn van Oosterhout  writes:
  

On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:


Really?  To me that's one of a large number of questions that are
unresolved about how we'd do this.  You can make a case for either
choice in quite a number of places.
  


  

Can we? For anything of any permenence (view definitions, rules,
compiled functions, plans, etc) you're going to want the physical
number, for the same reason we store the oids of functions and tables.



Not if we intend to rearrange the physical numbers during column
add/drop to provide better packing.

You could make a case that we need *three* numbers: a permanent column
ID, a display position, and a storage position.


  


Could this not be handled by some catalog fixup after an add/drop? If we 
get the having 3 numbers you will almost have me convinced that this 
might be too complicated after all.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch

2006-12-21 Thread Kevin Grittner
>>> On Wed, Dec 20, 2006 at  5:33 AM, in message
<[EMAIL PROTECTED]>,
Russell Smith <[EMAIL PROTECTED]> wrote: 
> 
> The 8.1 documentation for ALTER TABLE states the following.
> 
> Adding a column with a non-null default or changing the type of an 
> existing column will require the entire table to be rewritten. This
may 
> take a significant amount of time for a large table; and it will 
> temporarily require double the disk space.
> 
> 
> Now, we are rewriting the table from scratch anyway, the on disk
format 
> is changing.  What is stopping us from switching the column order at
the 
> same time.  The only thing I can think is that the catalogs will need

> more work to update them.  It's a middle sized price to pay for being

> able to reorder the columns in the table.  One of the problems I have
is 
> wanting to add a column in the middle of the table, but FK
constraints 
> stop me dropping the table to do the reorder.  If ALTER TABLE would
let 
> me stick it in the middle and rewrite the table on disk, I wouldn't 
> care.  It's likely that I would be rewriting the table anyway.  And
by 
> specifying AT POSITION, or BEFORE/AFTER you know for big tables it's

> going to take a while.
> 
> Not that I'm able to code this at all, but I'm interested in feedback
on 
> this option.
 
+1
 
Currently, I often have to make the choice between adding a column at
the "logical" place in relation to the other columns or adding it at the
end.  The former requires creating a whole new table, populating it with
INSERT/SELECT, dropping the old table, renaming the new table, and
restoring permissions, constraints, indexes, etc.  The latter is a
simple ALTER TABLE.  When I choose the former, I save significant time
and reduce errors by using pg_dump to generate a lot of the code; but it
should would be a nice feature if ALTER TABLE could do all this "under
the covers".
 
-Kevin
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

> > > The thing is, physical index numbers has meaning, the logical
index
> > > number does not. In a view definition we're going to store the
> > > physical index, not the logical one, for example. We don't want
> > > rearranging columns to invalidate view definitions or plans.
> > 
> > I think we lack a definition here:
> > 
> > logical number: the order of columns when doing select *
> > physical number:the position inside the heap tuple (maybe with
> > offset)
> > 
> > All views and plans and index definitions and most everyting else 
> > needs to reference the logical number.
> 
> Huh? If I have an index on the first two columns of a table, 
> it's going
> to refernce columns 1 and 2.
> 
> If you alter the table to put a column in front of those two, the new
> column will be physical 3, logical 1.

No, you change pg_index to now contain 2,3.

> If the index references logical numbers, the index has just been
> broken. If the index references physical numbers, everything works
> without changes.

yup, sinval

> Same with views, if you use logical numbers you have to rebuild the
> view each time. Why bother, when physical numbers work and don't have
> that problem?

Because it would imho be a nightmare to handle ...

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
>> Really?  To me that's one of a large number of questions that are
>> unresolved about how we'd do this.  You can make a case for either
>> choice in quite a number of places.

> Can we? For anything of any permenence (view definitions, rules,
> compiled functions, plans, etc) you're going to want the physical
> number, for the same reason we store the oids of functions and tables.

Not if we intend to rearrange the physical numbers during column
add/drop to provide better packing.

You could make a case that we need *three* numbers: a permanent column
ID, a display position, and a storage position.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 05:06:53PM +0100, Zeugswetter Andreas ADI SD wrote:
> > The thing is, physical index numbers has meaning, the logical index
> > number does not. In a view definition we're going to store the
> > physical index, not the logical one, for example. We don't want
> > rearranging columns to invalidate view definitions or plans.
> 
> I think we lack a definition here:
> 
> logical number:   the order of columns when doing select *
> physical number:  the position inside the heap tuple (maybe with
> offset)
> 
> All views and plans and index definitions and most everyting else 
> needs to reference the logical number.

Huh? If I have an index on the first two columns of a table, it's going
to refernce columns 1 and 2.

If you alter the table to put a column in front of those two, the new
column will be physical 3, logical 1.

If the index references logical numbers, the index has just been
broken. If the index references physical numbers, everything works
without changes.

Same with views, if you use logical numbers you have to rebuild the
view each time. Why bother, when physical numbers work and don't have
that problem?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > The thing is, physical index numbers has meaning, the logical index
> > number does not. In a view definition we're going to store the physical
> > index, not the logical one, for example.
> 
> Really?  To me that's one of a large number of questions that are
> unresolved about how we'd do this.  You can make a case for either
> choice in quite a number of places.

Can we? For anything of any permenence (view definitions, rules,
compiled functions, plans, etc) you're going to want the physical
number, for the same reason we store the oids of functions and tables.

I can't see the optimiser or executor caring about logical numbers
either. The planner would use it only when looking up column names.

The logical number isn't going to be used much I think. You can go from
column name to physical index directly, without ever looking up the
logical index. That's why I'm suggesting adding some large constant to
the logical numbers, since they're going to be less used in general.

Where do you think we have the choice?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

> > I don't think we should expose the offset to user view at all - this
is 
> > just for internal use, no?
> 
> The thing is, physical index numbers has meaning, the logical index
> number does not. In a view definition we're going to store the
physical
> index, not the logical one, for example. We don't want rearranging
> columns to invalidate view definitions or plans.

I think we lack a definition here:

logical number: the order of columns when doing select *
physical number:the position inside the heap tuple (maybe with
offset)

All views and plans and index definitions and most everyting else 
needs to reference the logical number.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Stats Collector Oddity

2006-12-21 Thread Chris Browne
[EMAIL PROTECTED] (Tom Lane) writes:
> Chris Browne <[EMAIL PROTECTED]> writes:
>> We're getting a bit of an anomaly relating to pg_stat_activity...
>> ...
>> That PID has been dead for several days, but this connection is marked
>> as being open, still, after lo many days.
>
> This probably just means that the "backend termination" stats message
> got dropped due to heavy load.  That's expected behavior in all pre-8.2
> releases: the stats system was never intended to provide
> guaranteed-exactly-correct status.  PG 8.2 has reimplemented the
> pg_stat_activity view to make it more trustworthy.  (The other stuff is
> still probabilistic, but being just event counters, message loss isn't
> so obvious.)

That seems a *bit* surprising; the system wasn't expected to be under
particularly heavy load during the period in question; I would have
expected "particularly light load."  No matter; there may have been
some brief heavy load to cause this.

There isn't any way, short of restarting the postmaster, to get rid of
that PID, is there?
-- 
"cbbrowne","@","linuxdatabases.info"
http://cbbrowne.com/info/linuxdistributions.html
"High-level languages are a pretty good indicator that all else is
seldom equal." - Tim Bradshaw, comp.lang.lisp

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Dave Page

Simon Riggs wrote:

On Wed, 2006-12-20 at 09:47 -0500, Jim Nasby wrote:

On the other hand, this would be the only part of the system where  
the official interface/API is a system catalog table. Do we really  
want to expose the internal representation of something as our API?  
That doesn't seem wise to me...


Define and agree the API (the hard bit) and I'll code it (the easy bit).

We may as well have something on the table, even if it changes later.

Dave: How does PgAdmin handle setting table-specific autovacuum
parameters? (Does it?)



Yes, it adds/removes/edits rows in pg_autovacuum as required.

Regards, Dave

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ERROR: tuple concurrently updated

2006-12-21 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > I havn't built a reliable test case yet but I *think* the tuple
> > concurrently updated problem is with an analyze being run inside of a
> > function and also being run by autovacuum.
> 
> If so it should be fixed as of 8.2 --- I believe we changed the locking
> rules to ensure only one ANALYZE at a time for any one table.
[...]
> I think we've got a solution for that in 8.2, also --- at least, the
> only common case I know of should be fixed, namely where a RENAME or
> similar has caused the same table name to be assigned to a new OID.

Great!  These were on 8.1 and I was actually just working to try and
reproduce them on 8.2 (without success so far!).  I'll see about
upgrading the production systems to 8.2 soon and will let ya'll know if
I see them again there.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout  writes:
> The thing is, physical index numbers has meaning, the logical index
> number does not. In a view definition we're going to store the physical
> index, not the logical one, for example.

Really?  To me that's one of a large number of questions that are
unresolved about how we'd do this.  You can make a case for either
choice in quite a number of places.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] New version of money type

2006-12-21 Thread Tom Lane
"D'Arcy J.M. Cain"  writes:
> Very good points.  However, like the currency symbol issue I would like
> to separate that into another discussion.  The code already exists with
> the warts you mention (and more) and this proposal is to fix one thing
> that will make it more useful to others.  Let's get that change in and
> then start fixing up some of those other issues.

I've forgotten now --- was this patch intended *only* to convert money
from int4 to int8 underlying representation, or did you do other things?
It looks like there are unrelated changes in the patch, but I'm not sure
if you just moved code around or did something more interesting.

One bug I see in it is that you'd better make the alignment 'd' if the
type is to be int8.  Also I much dislike these changes:

-   int32   i = PG_GETARG_INT32(1);
+   int64   i = PG_GETARG_INT32(1);

I think they may not actually be wrong, but they certainly *look* wrong;
in general the declared type of a parameter variable in a C-coded SQL
function ought to match what the SQL signature says.  Anyway there is no
need that I can see to widen these variables.  Every C compiler knows
what to do if you ask it for arithmetic on a long and an int.

(Speaking of which, have you thought about what happens on a machine
with no 64-bit int, such that "int64" is really just 32 bits?  Ideally
the code should continue to function but with reduced range.  I didn't
see any places where you were obviously depending on the range, but
it's something to have in the back of your mind while coding.)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 10:27:12AM -0500, Andrew Dunstan wrote:
> >Um, surely you meant "offset the physical numbers". Imho the logical
> >numbers
> >need to stay 1-n, because those numbers are used way more often and are 
> >more user visible than the physical.
> >
> >
> >  
> 
> I don't think we should expose the offset to user view at all - this is 
> just for internal use, no?

The thing is, physical index numbers has meaning, the logical index
number does not. In a view definition we're going to store the physical
index, not the logical one, for example. We don't want rearranging
columns to invalidate view definitions or plans.

The number of places needing the logical index are not that man,
relativelyy, and given it has no intrinsic meaning, it's better to give
it a numeric value which is obviously abritrary (like 10001).

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] ERROR: tuple concurrently updated

2006-12-21 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> I havn't built a reliable test case yet but I *think* the tuple
> concurrently updated problem is with an analyze being run inside of a
> function and also being run by autovacuum.

If so it should be fixed as of 8.2 --- I believe we changed the locking
rules to ensure only one ANALYZE at a time for any one table.

Conflicts from concurrent ANALYZEs are the only cases I've heard of
before that make this error occur in the field, but I suppose it would
be possible to get it from other things such as concurrently trying to
CREATE OR REPLACE the same function.

> The SysCache stuff I was
> thinking about previously was actually for another problem that I hadn't
> seen in a long time (because I hadn't been doing a particular set of
> operations, not because it's that difficult to have happen) but just ran
> into again today:
> ERROR:  cache lookup failed for relation ...

I think we've got a solution for that in 8.2, also --- at least, the
only common case I know of should be fixed, namely where a RENAME or
similar has caused the same table name to be assigned to a new OID.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Zeugswetter Andreas ADI SD wrote:

I'm not sure how much you can do with typing. Things like heap_getattr
are macros, and thus untyped. Most places use attr as an index to an
array, which also can't be type checked.

If you switched everything over to inline functions you might 
get it to

work, but that's about it.

IMHO the best solution is to offset the logical numbers by some
constant...



Um, surely you meant "offset the physical numbers". Imho the logical
numbers
need to stay 1-n, because those numbers are used way more often and are 
more user visible than the physical.



  


I don't think we should expose the offset to user view at all - this is 
just for internal use, no?


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] log_min_error_statement and parameters value

2006-12-21 Thread Tom Lane
"JEAN-PIERRE PELLETIER" <[EMAIL PROTECTED]> writes:
> Is it possible to set log_min_error_statement to display the parameters 
> value?

No --- displaying the parameter values requires being able to run
user-defined output functions, which we can't do in an already-failed
transaction.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] log_min_error_statement and parameters value

2006-12-21 Thread JEAN-PIERRE PELLETIER

Hi,

In PostgreSQL 8.2, log_statement display actual parameters value which is 
very neat as in

DETAIL:  parameters: $1 = '1', $2 = NULL

Currently I used log_statement='all' to display all statements and their 
parameters but
that produced a lot of output and I am really only interested in the 
statements displayed with log_min_error_statement.


Is it possible to set log_min_error_statement to display the parameters 
value?


Thanks,
Jean-Pierre Pelletier
e-djuster



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Simon Riggs
On Wed, 2006-12-20 at 09:47 -0500, Jim Nasby wrote:

> On the other hand, this would be the only part of the system where  
> the official interface/API is a system catalog table. Do we really  
> want to expose the internal representation of something as our API?  
> That doesn't seem wise to me...

Define and agree the API (the hard bit) and I'll code it (the easy bit).

We may as well have something on the table, even if it changes later.

Dave: How does PgAdmin handle setting table-specific autovacuum
parameters? (Does it?)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Tuning single row operations

2006-12-21 Thread Simon Riggs
On Thu, 2006-12-21 at 09:36 -0500, Matthew O'Connor wrote:
> Richard Huxton wrote:
> > Simon Riggs wrote:
> >> - improve RI check perf by caching small, static tables in each backend
> >> - apply index filter conditions on index scan to avoid heap lookup
> > 
> > For fkey checks against a basically static table could you get away with 
> > just checking the index and not the table?
> 
> I'm not sure that would fly, there is always the possibility it could 
> change, I think the ENUMs patch will solve this performance problem.

Not using SQL Standard syntax it won't. 

I'd be happier if it worked with DOMAINs and happier still if we can get
it to optimise just bare datatypes. My objective is to tune a database
without needing to reload any of the tables and to ensure that RI is
effective in both directions (from referencing and referenced tables).

Perhaps there's a way to make that happen...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

> I'm not sure how much you can do with typing. Things like heap_getattr
> are macros, and thus untyped. Most places use attr as an index to an
> array, which also can't be type checked.
> 
> If you switched everything over to inline functions you might 
> get it to
> work, but that's about it.
> 
> IMHO the best solution is to offset the logical numbers by some
> constant...

Um, surely you meant "offset the physical numbers". Imho the logical
numbers
need to stay 1-n, because those numbers are used way more often and are 
more user visible than the physical.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tuning single row operations

2006-12-21 Thread Matthew O'Connor

Richard Huxton wrote:

Simon Riggs wrote:

- improve RI check perf by caching small, static tables in each backend
- apply index filter conditions on index scan to avoid heap lookup


For fkey checks against a basically static table could you get away with 
just checking the index and not the table?


I'm not sure that would fly, there is always the possibility it could 
change, I think the ENUMs patch will solve this performance problem.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Matthew O'Connor

Russell Smith wrote:
I thought the plan was to change the ALTER TABLE command to allow vacuum 
settings to be set.  



That is my understanding too.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tuning single row operations

2006-12-21 Thread Richard Huxton

Simon Riggs wrote:

For 8.3 my goal is to improve the performance of single row operations,


Great. That's something that's useful across the board.


Currently, I'm aware of these possibilities, some fairly vague

...

- avoid RI checks for update of a column not mentioned in SET


Linked at least logically - conditional triggers ("where 
old.status<>new.status" or similar) could save on IF statements early in 
trigger functions.


...

- improve RI check perf by caching small, static tables in each backend
- apply index filter conditions on index scan to avoid heap lookup


For fkey checks against a basically static table could you get away with 
just checking the index and not the table?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Zeugswetter Andreas ADI SD

> > You were running the test on the very memory-depend machine.
> >> shared_buffers = 4GB / The scaling factor is 50, 800MB of data.
> > Thet would be why the patch did not work. I tested it with DBT-2,
10GB of
> > data and 2GB of memory. Storage is always the main part of
performace here,
> > even not in checkpoints.
> 
> Yes, I used half the size of RAM as the shared buffers, which is
> reasonable.  And I cached all the data.

For pg, half RAM for shared_buffers is too much. The ratio is good for
other db software, that does not use the OS cache.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] New version of money type

2006-12-21 Thread D'Arcy J.M. Cain
On Thu, 21 Dec 2006 00:21:08 -0800
David Fetter <[EMAIL PROTECTED]> wrote:
> On Wed, Dec 20, 2006 at 08:44:07PM -0500, D'Arcy J.M. Cain wrote:
> > Now that 8.3 has been branched shall I go ahead and commit?  As
> > discussed I will put the currency symbol back in just so that it can
> > be discussed and worked on as a completely separate issue.  I have
> > attached the current patch against HEAD.
> 
> I noticed that all your numbers are in English.  Is it necessary to
> hard-code all that?  Also, you're assuming that powers of 10 which are
> divisible by 3 are the relevant ones.  In China, it's powers of 10
> divisible by 4, and in India, it's 0, 1, 2, 3, followed by odd numbers
> up through 19.

Very good points.  However, like the currency symbol issue I would like
to separate that into another discussion.  The code already exists with
the warts you mention (and more) and this proposal is to fix one thing
that will make it more useful to others.  Let's get that change in and
then start fixing up some of those other issues.

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Tuning single row operations

2006-12-21 Thread Simon Riggs
For 8.3 my goal is to improve the performance of single row operations,
such as
INSERT INTO foo ... VALUES (...)
UPDATE foo SET  WHERE  = 
DELETE FROM foo WHERE  = 

OLTP Assumptions
- all statements are prepared first, then executed with bound
parameters. 
- need to cover cases where these are executed from functions in a PL,
as well as the case where they are executed via a protocol v3+ message
from a latest-edition driver.
- we are likely to be using RETURNING clauses
- we care about both single connections issuing a stream of these
requests, as well as performance with many concurrent sessions
- we have Referential Integrity constraints defined
- we care about both with/without Slony replication
- we care about log-shipping/warm standby also
- we want to support all other features also: Most Advanced OSDB etc

I would like to discuss what opportunities exist to improve these
operations and to prioritise them for work during 8.3 and beyond.

Currently, I'm aware of these possibilities, some fairly vague
- set up index scan at plan time, not in executor
- stop the index scan immediately a single row is returned
- reduce WAL for updates when SET clause doesn't mention all cols
- avoid RI checks for update of a column not mentioned in SET
- separate prepared plan from plan state, to avoid memcpy
- avoid double access of buffer for UPDATE/DELETE by producing new fast
path through executor, streamlined for unique accesses
- turn off WAL for (some?) indexes and rebuild them following a crash
- HOT updates: don't do index inserts for unchanged indexed cols
- avoid explicit locking of indexes (at cost of concurrent index ops)
- improve RI check perf by caching small, static tables in each backend
- apply index filter conditions on index scan to avoid heap lookup
- others... feel free to add your own etc

Clearly, some of these need further work. The question is which ones
have sufficient promise to be worth taking further and what would the
priority order for that work be? I assume that a full feasibility
investigation is needed for each item and that there is *no* presumption
that something prioritised higher means it is pre-approved for
inclusion. I'll document the responses as an additional section of the
public TODO, some of which may be removed later if they prove
infeasible. Those would possibly be labelled: OLTP Performance and
Investigations: Items thought to be worth investigation.

I'd like to initiate some open discussion on how, given the above goal,
to improve performance of PostgreSQL. If you don't have any ideas or
opinions now, you're welcome to reply to this thread in the future to
introduce new possibilities.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 06:46:36PM +0900, ITAGAKI Takahiro wrote:
> > Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
> > waiting for lseek() to complete when it committed.  But why does the
> > backend which is syncing WAL/pg_control have to wait for syncing the
> > data file?  They are, not to mention, different files, and WAL and
> > data files are stored on separate disks.
> 
> Backends call lseek() in planning, so they have to wait fsync() to
> the table that they will access. Even if all of data in the file is in
> the cache, lseek() conflict with fsync(). You can see a lot of backends
> are waiting in planning phase in checkpoints, not executing phase.

Hmm, there are other ways to sync parts of a file. For example doing an
mmap()/msync()/munmap() cycle to start an asyncronous flush. But given
what you're saying that might suffer from the same problem.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
To: "Takayuki Tsunakawa" <[EMAIL PROTECTED]>
Cc: 
Sent: Thursday, December 21, 2006 6:46 PM
Subject: Re: [HACKERS] Load distributed checkpoint


>
From: "ITAGAKI Takahiro" <[EMAIL PROTECTED]>
> "Takayuki Tsunakawa" <[EMAIL PROTECTED]> wrote:
>> Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
>> waiting for lseek() to complete when it committed.  But why does
the
>> backend which is syncing WAL/pg_control have to wait for syncing
the
>> data file?  They are, not to mention, different files, and WAL and
>> data files are stored on separate disks.
>
> Backends call lseek() in planning, so they have to wait fsync() to
> the table that they will access. Even if all of data in the file is
in
> the cache, lseek() conflict with fsync(). You can see a lot of
backends
> are waiting in planning phase in checkpoints, not executing phase.

I see.  I found one backend like the following.  But one in my case
one out of 16 backends.  Most of others are waiting to acquire
WALWRITE lock.

#0  0x003a629c6902 in __lseek_nocancel () from
/lib64/tls/libc.so.6
#1  0x0056789f in FileSeek ()
#2  0x00574053 in mdnblocks ()
#3  0x00574f4a in smgrnblocks ()
#4  0x005489e8 in estimate_rel_size ()
#5  0x00548bee in get_relation_info ()
#6  0x0054aa3d in build_simple_rel ()
#7  0x00539c6b in add_base_rels_to_query ()
#8  0x0053b955 in query_planner ()
#9  0x0053c1c9 in grouping_planner ()
#10 0x0053d3b4 in subquery_planner ()
#11 0x0053d5b3 in planner ()
#12 0x005778fc in pg_plan_query ()
#13 0x0057798c in pg_plan_queries ()
#14 0x00577c53 in exec_simple_query ()




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread ITAGAKI Takahiro

"Takayuki Tsunakawa" <[EMAIL PROTECTED]> wrote:

> > If you use Linux, it has very unpleased behavior in fsync(); It locks all
> > metadata of the file being fsync-ed. We have to wait for the completion of
> > fsync when we do read(), write(), and even lseek().
> 
> Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
> waiting for lseek() to complete when it committed.  But why does the
> backend which is syncing WAL/pg_control have to wait for syncing the
> data file?  They are, not to mention, different files, and WAL and
> data files are stored on separate disks.

Backends call lseek() in planning, so they have to wait fsync() to
the table that they will access. Even if all of data in the file is in
the cache, lseek() conflict with fsync(). You can see a lot of backends
are waiting in planning phase in checkpoints, not executing phase.


> > it's not difficult at all to insert sleeps between fsync()s.

> Can we say "our approach helps our
> environments, but it won't help you.  The kernel VM settings may help
> you.  Good luck!"?

I didn't say such a thing at all.
There are several opinions in the discussion:
  1. High bgwriter setting is enough!
  2. Change your OS :-)
  3. Use O_SYNC or O_DIRECT, but very poor performance.
  4. We may settle for single fsync(), but not many fsync()s in a short time.
I just suggested 4.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Bernd Helmle


On Wed, 20 Dec 2006 17:49:15 +0100, Kaare Rasmussen <[EMAIL PROTECTED]> wrote:
> I'm not sure, but as far as I remember, it will be a short release cycle
> for
> 8.3 in order to finish some big items that couldn't be ready in time for
> 8.2.
>
> But which items are more or less expected for 8.3? I recall
> - Hierarchical Queries
> - On disk bitmap index
> - Clustered/replication solutions
>
> being discussed. What are on people's minds?
>

I still have updatable views on my radar. I'm currently out of office,
but i'm preparing to put a project page into our wiki tomorrow, so people
could get a slight overview about current implementation and open items.

Bernd

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] New version of money type

2006-12-21 Thread David Fetter
On Wed, Dec 20, 2006 at 08:44:07PM -0500, D'Arcy J.M. Cain wrote:
> On Thu, 12 Oct 2006 14:24:22 -0400
> "D'Arcy J.M. Cain"  wrote:
> > On Thu, 12 Oct 2006 14:17:33 -0400
> > Tom Lane <[EMAIL PROTECTED]> wrote:
> > > "D'Arcy J.M. Cain"  writes:
> > > > Cool.  So what do I do with the patch?  Should I add the
> > > > currency symbol back in and commit or should I resubmit the
> > > > patch to hackers for further review?
> > > 
> > > Well, one thing you definitely *don't* do is commit right now,
> > > because we're in feature freeze, not to mention trying to avoid
> > > forced initdbs now that beta has started.  Sit on it till 8.3 is
> > > branched, and
> > 
> > OK.  I hadn't thought of it as a new feature per se but I
> > understand the initdb issue.  Holding at 30,000 feet, ground
> > control.
> > 
> > > meanwhile think about what you want to do with the
> > > currency-symbol issue...
> > 
> > Personally I don't see a need for it but I am currently in favour
> > of adding it back in before committing just so that we can deal
> > with the issue separately.  The same as the other changes being
> > discussed.
> 
> Now that 8.3 has been branched shall I go ahead and commit?  As
> discussed I will put the currency symbol back in just so that it can
> be discussed and worked on as a completely separate issue.  I have
> attached the current patch against HEAD.

I noticed that all your numbers are in English.  Is it necessary to
hard-code all that?  Also, you're assuming that powers of 10 which are
divisible by 3 are the relevant ones.  In China, it's powers of 10
divisible by 4, and in India, it's 0, 1, 2, 3, followed by odd numbers
up through 19.

Cheers,
D
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
From: "ITAGAKI Takahiro" <[EMAIL PROTECTED]>
> You were running the test on the very memory-depend machine.
>> shared_buffers = 4GB / The scaling factor is 50, 800MB of data.
> Thet would be why the patch did not work. I tested it with DBT-2,
10GB of
> data and 2GB of memory. Storage is always the main part of
performace here,
> even not in checkpoints.

Yes, I used half the size of RAM as the shared buffers, which is
reasonable.  And I cached all the data.  The effect of fsync() is a
heavier offence, isn't it?  System administrators would say "I have
enough memory.  The data hasn't exhausted the DB cache yet.  But the
users complain to me about the response.  Why?  What should I do?
What?  Checkpoint??  Why doesn't PostgreSQL take care of frontend
users?"
BTW, is DBT-2 an OLTP benchmark which randomly access some parts of
data, or a batch application which accesses all data?  I'm not
familiar with it.  I know that IPA opens it to the public.

> If you use Linux, it has very unpleased behavior in fsync(); It
locks all
> metadata of the file being fsync-ed. We have to wait for the
completion of
> fsync when we do read(), write(), and even lseek().
> Almost of your data is in the accounts table and it was stored in a
single
> file. All of transactions must wait for fsync to the single largest
file,
> so you saw the bottleneck was in the fsync.

Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
waiting for lseek() to complete when it committed.  But why does the
backend which is syncing WAL/pg_control have to wait for syncing the
data file?  They are, not to mention, different files, and WAL and
data files are stored on separate disks.


>> [Conclusion]
>> I believe that the problem cannot be solved in a real sense by
>> avoiding fsync/fdatasync().
>
> I think so, too. However, I assume we can resolve a part of the
> checkpoint spikes with smoothing of write() alone.

First, what's the goal (if possible numerically?  Have you explained
to community members why the patch would help many people?  At least,
I haven't heard that fsync() can be seriously bad and we would close
our eyes to what fsync() does.
By the way, what good results did you get with DBT-2?  If you don't
mind, can you show us?


> BTW, can we use the same way to fsync? We call fsync()s to all
modified
> files without rest in mdsync(), but it's not difficult at all to
insert
> sleeps between fsync()s. Do you think it helps us? One of issues is
that
> we have to sleep in file unit, which is maybe rough granularity.

No, it definitely won't help us.  There is no reason why it will help.
It might help in some limited environments, though, how can we
characterize such environments?  Can we say "our approach helps our
environments, but it won't help you.  The kernel VM settings may help
you.  Good luck!"?
We have to consider seriously.  I think it's time to face the problem
and we should follow the approaches of experts like Jim Gray and DBMS
vendors, unless we have a new clever idea like them.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org