Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
From: ITAGAKI Takahiro [EMAIL PROTECTED]
 You were running the test on a very memory-dependent machine:
 shared_buffers = 4GB, and with a scaling factor of 50, 800MB of data.
 That would be why the patch did not work. I tested it with DBT-2,
 10GB of data and 2GB of memory. Storage is always the main part of
 performance there, even outside of checkpoints.

Yes, I used half the size of RAM as the shared buffers, which is
reasonable.  And I cached all the data.  That makes the fsync() stall
an even heavier offence, doesn't it?  A system administrator would
say: I have enough memory, and the data hasn't exhausted the DB cache
yet, but the users complain to me about the response time.  Why?
What should I do?  What?  Checkpoint??  Why doesn't PostgreSQL take
care of frontend users?
BTW, is DBT-2 an OLTP benchmark that randomly accesses some parts of
the data, or a batch application that accesses all of it?  I'm not
familiar with it.  I know that IPA makes it available to the public.

 If you use Linux, it has very unpleasant behavior in fsync(): it
 locks all metadata of the file being fsync-ed, so we have to wait
 for the completion of fsync when we do read(), write(), and even
 lseek().  Almost all of your data is in the accounts table, and it
 is stored in a single file.  All transactions must wait for fsync
 on that single largest file, so you saw the bottleneck in fsync.

Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
waiting for lseek() to complete when it committed.  But why does the
backend which is syncing WAL/pg_control have to wait for the syncing
of a data file?  They are, needless to say, different files, and the
WAL and data files are stored on separate disks.


 [Conclusion]
 I believe that the problem cannot be solved in a real sense by
 avoiding fsync/fdatasync().

 I think so, too. However, I assume we can resolve a part of the
 checkpoint spikes with smoothing of write() alone.

First, what's the goal (numerically, if possible)?  Have you explained
to community members why the patch would help many people?  At least,
I haven't heard it argued that fsync() can be seriously bad and that
we should close our eyes to what fsync() does.
By the way, what good results did you get with DBT-2?  If you don't
mind, can you show us?


 BTW, can we use the same way for fsync?  We call fsync() on all
 modified files without rest in mdsync(), but it's not difficult at
 all to insert sleeps between the fsync()s.  Do you think it would
 help us?  One of the issues is that we have to sleep at file
 granularity, which may be too coarse.

No, it definitely won't help us.  There is no reason why it would.
It might help in some limited environments, but then how do we
characterize those environments?  Can we really say our approach
helps in our environment, but it won't help you; the kernel VM
settings may help you, good luck!?
We have to consider this seriously.  I think it's time to face the
problem, and we should follow the approaches of experts like Jim Gray
and the DBMS vendors, unless we have a clever new idea like theirs.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] New version of money type

2006-12-21 Thread David Fetter
On Wed, Dec 20, 2006 at 08:44:07PM -0500, D'Arcy J.M. Cain wrote:
 On Thu, 12 Oct 2006 14:24:22 -0400
 D'Arcy J.M. Cain darcy@druid.net wrote:
  On Thu, 12 Oct 2006 14:17:33 -0400
  Tom Lane [EMAIL PROTECTED] wrote:
   D'Arcy J.M. Cain darcy@druid.net writes:
Cool.  So what do I do with the patch?  Should I add the
currency symbol back in and commit or should I resubmit the
patch to hackers for further review?
   
   Well, one thing you definitely *don't* do is commit right now,
   because we're in feature freeze, not to mention trying to avoid
   forced initdbs now that beta has started.  Sit on it till 8.3 is
   branched, and
  
  OK.  I hadn't thought of it as a new feature per se but I
  understand the initdb issue.  Holding at 30,000 feet, ground
  control.
  
   meanwhile think about what you want to do with the
   currency-symbol issue...
  
  Personally I don't see a need for it but I am currently in favour
  of adding it back in before committing just so that we can deal
  with the issue separately.  The same as the other changes being
  discussed.
 
 Now that 8.3 has been branched shall I go ahead and commit?  As
 discussed I will put the currency symbol back in just so that it can
 be discussed and worked on as a completely separate issue.  I have
 attached the current patch against HEAD.

I noticed that all your numbers are in English.  Is it necessary to
hard-code all that?  Also, you're assuming that powers of 10 which are
divisible by 3 are the relevant ones.  In China, it's powers of 10
divisible by 4, and in India, it's 0, 1, 2, 3, followed by odd numbers
up through 19.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778    AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Bernd Helmle


On Wed, 20 Dec 2006 17:49:15 +0100, Kaare Rasmussen [EMAIL PROTECTED] wrote:
 I'm not sure, but as far as I remember, it will be a short release
 cycle for 8.3, in order to finish some big items that couldn't be
 ready in time for 8.2.

 But which items are more or less expected for 8.3? I recall
 - Hierarchical Queries
 - On-disk bitmap indexes
 - Clustered/replication solutions

 being discussed. What is on people's minds?


I still have updatable views on my radar.  I'm currently out of the
office, but I'm preparing to put a project page into our wiki tomorrow,
so people can get a quick overview of the current implementation and
the open items.

Bernd

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread ITAGAKI Takahiro

Takayuki Tsunakawa [EMAIL PROTECTED] wrote:

  If you use Linux, it has very unpleasant behavior in fsync(): it locks all
  metadata of the file being fsync-ed. We have to wait for the completion of
  fsync when we do read(), write(), and even lseek().
 
 Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
 waiting for lseek() to complete when it committed.  But why does the
 backend which is syncing WAL/pg_control have to wait for the syncing
 of a data file?  They are, needless to say, different files, and the
 WAL and data files are stored on separate disks.

Backends call lseek() during planning, so they have to wait for fsync()
on the tables they will access.  Even if all of the data in the file is
in the cache, lseek() conflicts with fsync().  You can see that during
checkpoints a lot of backends are waiting in the planning phase, not
the executing phase.
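
A quick way to observe this outside PostgreSQL is a standalone test like
the following (a sketch, assuming the Linux semantics described above;
sizes and the length of the stall will vary by kernel and filesystem):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int     fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
    char    buf[8192];
    int     i;
    struct timeval t0, t1;

    memset(buf, 'x', sizeof(buf));
    for (i = 0; i < 100000; i++)        /* dirty ~800MB of page cache */
        (void) write(fd, buf, sizeof(buf));

    if (fork() == 0)
    {
        fsync(fd);                      /* child: flush the whole file */
        _exit(0);
    }

    usleep(100 * 1000);                 /* let the child get into fsync() */
    gettimeofday(&t0, NULL);
    lseek(fd, 0, SEEK_END);             /* what mdnblocks() boils down to */
    gettimeofday(&t1, NULL);
    printf("lseek took %ld ms\n",
           (t1.tv_sec - t0.tv_sec) * 1000 +
           (t1.tv_usec - t0.tv_usec) / 1000);
    wait(NULL);
    return 0;
}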


  it's not difficult at all to insert sleeps between fsync()s.

 Can we really say our approach helps in our environment, but it won't
 help you; the kernel VM settings may help you, good luck!?

I didn't say such a thing at all.
There have been several opinions in the discussion:
  1. A high bgwriter setting is enough!
  2. Change your OS :-)
  3. Use O_SYNC or O_DIRECT, but with very poor performance.
  4. We may settle for a single fsync(), but not for many fsync()s in
     a short time.
I just suggested 4.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
To: Takayuki Tsunakawa [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Sent: Thursday, December 21, 2006 6:46 PM
Subject: Re: [HACKERS] Load distributed checkpoint



From: ITAGAKI Takahiro [EMAIL PROTECTED]
 Takayuki Tsunakawa [EMAIL PROTECTED] wrote:
  Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
  waiting for lseek() to complete when it committed.  But why does the
  backend which is syncing WAL/pg_control have to wait for the syncing
  of a data file?  They are, needless to say, different files, and the
  WAL and data files are stored on separate disks.

 Backends call lseek() during planning, so they have to wait for fsync()
 on the tables they will access.  Even if all of the data in the file is
 in the cache, lseek() conflicts with fsync().  You can see that during
 checkpoints a lot of backends are waiting in the planning phase, not
 the executing phase.

I see.  I found one backend like the following, but only one out of 16
backends in my case.  Most of the others were waiting to acquire the
WALWriteLock.

#0  0x003a629c6902 in __lseek_nocancel () from /lib64/tls/libc.so.6
#1  0x0056789f in FileSeek ()
#2  0x00574053 in mdnblocks ()
#3  0x00574f4a in smgrnblocks ()
#4  0x005489e8 in estimate_rel_size ()
#5  0x00548bee in get_relation_info ()
#6  0x0054aa3d in build_simple_rel ()
#7  0x00539c6b in add_base_rels_to_query ()
#8  0x0053b955 in query_planner ()
#9  0x0053c1c9 in grouping_planner ()
#10 0x0053d3b4 in subquery_planner ()
#11 0x0053d5b3 in planner ()
#12 0x005778fc in pg_plan_query ()
#13 0x0057798c in pg_plan_queries ()
#14 0x00577c53 in exec_simple_query ()




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 06:46:36PM +0900, ITAGAKI Takahiro wrote:
  Oh, really, what an evil fsync is!  Yes, I sometimes saw a backend
  waiting for lseek() to complete when it committed.  But why does the
  backend which is syncing WAL/pg_control have to wait for the syncing
  of a data file?  They are, needless to say, different files, and the
  WAL and data files are stored on separate disks.
 
 Backends call lseek() during planning, so they have to wait for fsync()
 on the tables they will access.  Even if all of the data in the file is
 in the cache, lseek() conflicts with fsync().  You can see that during
 checkpoints a lot of backends are waiting in the planning phase, not
 the executing phase.

Hmm, there are other ways to sync parts of a file.  For example, doing
an mmap()/msync()/munmap() cycle to start an asynchronous flush.  But
given what you're saying, that might suffer from the same problem.
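
Something like this, that is (a sketch: offset must be page-aligned,
error handling is elided, and whether MS_ASYNC really forces out pages
dirtied through write() is exactly what would need testing):

#include <sys/types.h>
#include <sys/mman.h>

static void
async_flush_range(int fd, off_t offset, size_t len)
{
    void   *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);

    if (p != MAP_FAILED)
    {
        msync(p, len, MS_ASYNC);    /* schedule writeback, don't wait */
        munmap(p, len);
    }
}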

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




[HACKERS] Tuning single row operations

2006-12-21 Thread Simon Riggs
For 8.3 my goal is to improve the performance of single row operations,
such as
INSERT INTO foo ... VALUES (...)
UPDATE foo SET ... WHERE unique index = values
DELETE FROM foo WHERE unique index = values

OLTP Assumptions
- all statements are prepared first, then executed with bound
parameters. 
- need to cover cases where these are executed from functions in a PL,
as well as the case where they are executed via a protocol v3+ message
from a latest-edition driver.
- we are likely to be using RETURNING clauses
- we care about both single connections issuing a stream of these
requests, as well as performance with many concurrent sessions
- we have Referential Integrity constraints defined
- we care about both with/without Slony replication
- we care about log-shipping/warm standby also
- we want to support all other features also: Most Advanced OSDB etc

I would like to discuss what opportunities exist to improve these
operations and to prioritise them for work during 8.3 and beyond.
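
As a concrete reference point, the workload assumed above is essentially
this libpq pattern (a minimal sketch; the connection string, table and
values are made up):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");
    PGresult   *res;
    const char *params[2] = { "42", "7" };

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* prepare once ... */
    res = PQprepare(conn, "upd",
                    "UPDATE foo SET val = $1 WHERE id = $2", 2, NULL);
    PQclear(res);

    /* ... then execute repeatedly with bound parameters (protocol v3) */
    res = PQexecPrepared(conn, "upd", 2, params, NULL, NULL, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));
    PQclear(res);

    PQfinish(conn);
    return 0;
}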

Currently, I'm aware of these possibilities, some fairly vague
- set up index scan at plan time, not in executor
- stop the index scan immediately a single row is returned
- reduce WAL for updates when SET clause doesn't mention all cols
- avoid RI checks for update of a column not mentioned in SET
- separate prepared plan from plan state, to avoid memcpy
- avoid double access of buffer for UPDATE/DELETE by producing new fast
path through executor, streamlined for unique accesses
- turn off WAL for (some?) indexes and rebuild them following a crash
- HOT updates: don't do index inserts for unchanged indexed cols
- avoid explicit locking of indexes (at cost of concurrent index ops)
- improve RI check perf by caching small, static tables in each backend
- apply index filter conditions on index scan to avoid heap lookup
- others... feel free to add your own etc

Clearly, some of these need further work. The question is which ones
have sufficient promise to be worth taking further and what would the
priority order for that work be? I assume that a full feasibility
investigation is needed for each item and that there is *no* presumption
that something prioritised higher means it is pre-approved for
inclusion. I'll document the responses as an additional section of the
public TODO, some of which may be removed later if they prove
infeasible. Those would possibly be labelled: OLTP Performance and
Investigations: Items thought to be worth investigation.

I'd like to initiate some open discussion on how, given the above goal,
to improve performance of PostgreSQL. If you don't have any ideas or
opinions now, you're welcome to reply to this thread in the future to
introduce new possibilities.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] New version of money type

2006-12-21 Thread D'Arcy J.M. Cain
On Thu, 21 Dec 2006 00:21:08 -0800
David Fetter [EMAIL PROTECTED] wrote:
 On Wed, Dec 20, 2006 at 08:44:07PM -0500, D'Arcy J.M. Cain wrote:
  Now that 8.3 has been branched shall I go ahead and commit?  As
  discussed I will put the currency symbol back in just so that it can
  be discussed and worked on as a completely separate issue.  I have
  attached the current patch against HEAD.
 
 I noticed that all your numbers are in English.  Is it necessary to
 hard-code all that?  Also, you're assuming that powers of 10 which are
 divisible by 3 are the relevant ones.  In China, it's powers of 10
 divisible by 4, and in India, it's 0, 1, 2, 3, followed by odd numbers
 up through 19.

Very good points.  However, like the currency symbol issue I would like
to separate that into another discussion.  The code already exists with
the warts you mention (and more) and this proposal is to fix one thing
that will make it more useful to others.  Let's get that change in and
then start fixing up some of those other issues.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Zeugswetter Andreas ADI SD

  You were running the test on a very memory-dependent machine:
  shared_buffers = 4GB, and with a scaling factor of 50, 800MB of data.
  That would be why the patch did not work. I tested it with DBT-2,
  10GB of data and 2GB of memory. Storage is always the main part of
  performance there, even outside of checkpoints.
 
 Yes, I used half the size of RAM as the shared buffers, which is
 reasonable.  And I cached all the data.

For pg, half of RAM for shared_buffers is too much.  That ratio is good
for other DB software that does not use the OS cache.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Tuning single row operations

2006-12-21 Thread Richard Huxton

Simon Riggs wrote:

For 8.3 my goal is to improve the performance of single row operations,


Great. That's something that's useful across the board.


Currently, I'm aware of these possibilities, some fairly vague

...

- avoid RI checks for update of a column not mentioned in SET


Linked at least logically - conditional triggers (where old.status <>
new.status or similar) could save on IF statements early in trigger
functions.


...

- improve RI check perf by caching small, static tables in each backend
- apply index filter conditions on index scan to avoid heap lookup


For fkey checks against a basically static table could you get away with 
just checking the index and not the table?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Matthew O'Connor

Russell Smith wrote:
I thought the plan was to change the ALTER TABLE command to allow vacuum 
settings to be set.  



That is my understanding too.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tuning single row operations

2006-12-21 Thread Matthew O'Connor

Richard Huxton wrote:

Simon Riggs wrote:

- improve RI check perf by caching small, static tables in each backend
- apply index filter conditions on index scan to avoid heap lookup


For fkey checks against a basically static table could you get away with 
just checking the index and not the table?


I'm not sure that would fly; there is always the possibility it could
change.  I think the ENUMs patch will solve this performance problem.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

 I'm not sure how much you can do with typing. Things like heap_getattr
 are macros, and thus untyped. Most places use attr as an index to an
 array, which also can't be type checked.
 
 If you switched everything over to inline functions you might get it
 to work, but that's about it.
 
 IMHO the best solution is to offset the logical numbers by some
 constant...

Um, surely you meant offset the physical numbers.  Imho the logical
numbers need to stay 1-n, because those numbers are used way more often
and are more user-visible than the physical ones.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tuning single row operations

2006-12-21 Thread Simon Riggs
On Thu, 2006-12-21 at 09:36 -0500, Matthew O'Connor wrote:
 Richard Huxton wrote:
  Simon Riggs wrote:
  - improve RI check perf by caching small, static tables in each backend
  - apply index filter conditions on index scan to avoid heap lookup
  
  For fkey checks against a basically static table could you get away with 
  just checking the index and not the table?
 
 I'm not sure that would fly; there is always the possibility it could
 change.  I think the ENUMs patch will solve this performance problem.

Not using SQL Standard syntax it won't. 

I'd be happier if it worked with DOMAINs and happier still if we can get
it to optimise just bare datatypes. My objective is to tune a database
without needing to reload any of the tables and to ensure that RI is
effective in both directions (from referencing and referenced tables).

Perhaps there's a way to make that happen...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Simon Riggs
On Wed, 2006-12-20 at 09:47 -0500, Jim Nasby wrote:

 On the other hand, this would be the only part of the system where  
 the official interface/API is a system catalog table. Do we really  
 want to expose the internal representation of something as our API?  
 That doesn't seem wise to me...

Define and agree the API (the hard bit) and I'll code it (the easy bit).

We may as well have something on the table, even if it changes later.

Dave: How does PgAdmin handle setting table-specific autovacuum
parameters? (Does it?)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] log_min_error_statement and parameters value

2006-12-21 Thread JEAN-PIERRE PELLETIER

Hi,

In PostgreSQL 8.2, log_statement displays the actual parameter values,
which is very neat, as in

DETAIL:  parameters: $1 = '1', $2 = NULL

Currently I use log_statement='all' to display all statements and their
parameters, but that produces a lot of output, and I am really only
interested in the statements displayed by log_min_error_statement.


Is it possible to get log_min_error_statement to display the parameter
values?


Thanks,
Jean-Pierre Pelletier
e-djuster



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] log_min_error_statement and parameters value

2006-12-21 Thread Tom Lane
JEAN-PIERRE PELLETIER [EMAIL PROTECTED] writes:
 Is it possible to set log_min_error_statement to display the parameters 
 value?

No --- displaying the parameter values requires being able to run
user-defined output functions, which we can't do in an already-failed
transaction.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Zeugswetter Andreas ADI SD wrote:

I'm not sure how much you can do with typing. Things like heap_getattr
are macros, and thus untyped. Most places use attr as an index to an
array, which also can't be type checked.

If you switched everything over to inline functions you might get it
to work, but that's about it.

IMHO the best solution is to offset the logical numbers by some
constant...



Um, surely you meant offset the physical numbers.  Imho the logical
numbers need to stay 1-n, because those numbers are used way more often
and are more user-visible than the physical ones.


I don't think we should expose the offset to user view at all - this is 
just for internal use, no?


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] ERROR: tuple concurrently updated

2006-12-21 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 I haven't built a reliable test case yet, but I *think* the tuple
 concurrently updated problem is with an analyze being run inside of a
 function and also being run by autovacuum.

If so it should be fixed as of 8.2 --- I believe we changed the locking
rules to ensure only one ANALYZE at a time for any one table.

Conflicts from concurrent ANALYZEs are the only cases I've heard of
before that make this error occur in the field, but I suppose it would
be possible to get it from other things such as concurrently trying to
CREATE OR REPLACE the same function.

 The SysCache stuff I was
 thinking about previously was actually for another problem that I hadn't
 seen in a long time (because I hadn't been doing a particular set of
 operations, not because it's that difficult to have happen) but just ran
 into again today:
 ERROR:  cache lookup failed for relation ...

I think we've got a solution for that in 8.2, also --- at least, the
only common case I know of should be fixed, namely where a RENAME or
similar has caused the same table name to be assigned to a new OID.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 10:27:12AM -0500, Andrew Dunstan wrote:
  Um, surely you meant offset the physical numbers.  Imho the logical
  numbers need to stay 1-n, because those numbers are used way more
  often and are more user-visible than the physical ones.

 I don't think we should expose the offset to user view at all - this
 is just for internal use, no?

The thing is, the physical index numbers have meaning, the logical index
numbers do not.  In a view definition we're going to store the physical
index, not the logical one, for example.  We don't want rearranging
columns to invalidate view definitions or plans.

The number of places needing the logical index is not that large,
relatively, and given that it has no intrinsic meaning, it's better to
give it a numeric value which is obviously arbitrary (like 10001).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] New version of money type

2006-12-21 Thread Tom Lane
D'Arcy J.M. Cain darcy@druid.net writes:
 Very good points.  However, like the currency symbol issue I would like
 to separate that into another discussion.  The code already exists with
 the warts you mention (and more) and this proposal is to fix one thing
 that will make it more useful to others.  Let's get that change in and
 then start fixing up some of those other issues.

I've forgotten now --- was this patch intended *only* to convert money
from int4 to int8 underlying representation, or did you do other things?
It looks like there are unrelated changes in the patch, but I'm not sure
if you just moved code around or did something more interesting.

One bug I see in it is that you'd better make the alignment 'd' if the
type is to be int8.  Also I much dislike these changes:

-   int32   i = PG_GETARG_INT32(1);
+   int64   i = PG_GETARG_INT32(1);

I think they may not actually be wrong, but they certainly *look* wrong;
in general the declared type of a parameter variable in a C-coded SQL
function ought to match what the SQL signature says.  Anyway there is no
need that I can see to widen these variables.  Every C compiler knows
what to do if you ask it for arithmetic on a long and an int.
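
For illustration, the convention would look like this (a sketch only;
the function name is made up, and it stands in for whichever money
function takes an int8 argument):

#include "postgres.h"
#include "fmgr.h"
#include "utils/cash.h"

/* The C variable's type matches the SQL signature (int8 -> int64). */
Datum
cash_mul_int8(PG_FUNCTION_ARGS)
{
    Cash    c = PG_GETARG_CASH(0);
    int64   i = PG_GETARG_INT64(1);

    PG_RETURN_CASH(c * i);
}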

(Speaking of which, have you thought about what happens on a machine
with no 64-bit int, such that int64 is really just 32 bits?  Ideally
the code should continue to function but with reduced range.  I didn't
see any places where you were obviously depending on the range, but
it's something to have in the back of your mind while coding.)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 The thing is, physical index numbers has meaning, the logical index
 number does not. In a view definition we're going to store the physical
 index, not the logical one, for example.

Really?  To me that's one of a large number of questions that are
unresolved about how we'd do this.  You can make a case for either
choice in quite a number of places.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] ERROR: tuple concurrently updated

2006-12-21 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  I haven't built a reliable test case yet, but I *think* the tuple
  concurrently updated problem is with an analyze being run inside of a
  function and also being run by autovacuum.
 
 If so it should be fixed as of 8.2 --- I believe we changed the locking
 rules to ensure only one ANALYZE at a time for any one table.
[...]
 I think we've got a solution for that in 8.2, also --- at least, the
 only common case I know of should be fixed, namely where a RENAME or
 similar has caused the same table name to be assigned to a new OID.

Great!  These were on 8.1, and I was actually just working to try and
reproduce them on 8.2 (without success so far!).  I'll see about
upgrading the production systems to 8.2 soon and will let y'all know if
I see them again there.

Thanks!

Stephen




Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Dave Page

Simon Riggs wrote:

On Wed, 2006-12-20 at 09:47 -0500, Jim Nasby wrote:

On the other hand, this would be the only part of the system where  
the official interface/API is a system catalog table. Do we really  
want to expose the internal representation of something as our API?  
That doesn't seem wise to me...


Define and agree the API (the hard bit) and I'll code it (the easy bit).

We may as well have something on the table, even if it changes later.

Dave: How does PgAdmin handle setting table-specific autovacuum
parameters? (Does it?)



Yes, it adds/removes/edits rows in pg_autovacuum as required.

Regards, Dave

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Stats Collector Oddity

2006-12-21 Thread Chris Browne
[EMAIL PROTECTED] (Tom Lane) writes:
 Chris Browne [EMAIL PROTECTED] writes:
 We're getting a bit of an anomaly relating to pg_stat_activity...
 ...
 That PID has been dead for several days, but this connection is marked
 as being open, still, after lo many days.

 This probably just means that the backend termination stats message
 got dropped due to heavy load.  That's expected behavior in all pre-8.2
 releases: the stats system was never intended to provide
 guaranteed-exactly-correct status.  PG 8.2 has reimplemented the
 pg_stat_activity view to make it more trustworthy.  (The other stuff is
 still probabilistic, but being just event counters, message loss isn't
 so obvious.)

That seems a *bit* surprising; the system wasn't expected to be under
particularly heavy load during the period in question; I would have
expected particularly light load.  No matter; there may have been
some brief heavy load to cause this.

There isn't any way, short of restarting the postmaster, to get rid of
that PID, is there?
-- 
cbbrowne,@,linuxdatabases.info
http://cbbrowne.com/info/linuxdistributions.html
High-level languages are a pretty good indicator that all else is
seldom equal. - Tim Bradshaw, comp.lang.lisp

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

  I don't think we should expose the offset to user view at all - this
  is just for internal use, no?

 The thing is, the physical index numbers have meaning, the logical
 index numbers do not.  In a view definition we're going to store the
 physical index, not the logical one, for example.  We don't want
 rearranging columns to invalidate view definitions or plans.

I think we lack a definition here:

logical number:  the order of columns when doing select *
physical number: the position inside the heap tuple (maybe with offset)

All views and plans and index definitions and most everything else
need to reference the logical number.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  The thing is, physical index numbers has meaning, the logical index
  number does not. In a view definition we're going to store the physical
  index, not the logical one, for example.
 
 Really?  To me that's one of a large number of questions that are
 unresolved about how we'd do this.  You can make a case for either
 choice in quite a number of places.

Can we?  For anything of any permanence (view definitions, rules,
compiled functions, plans, etc.) you're going to want the physical
number, for the same reason we store the oids of functions and tables.

I can't see the optimiser or executor caring about logical numbers
either.  The planner would use it only when looking up column names.

The logical number isn't going to be used much, I think.  You can go
from column name to physical index directly, without ever looking up
the logical index.  That's why I'm suggesting adding some large
constant to the logical numbers, since they're going to be less used
in general.
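
I.e. something like this (a sketch; the constant is arbitrary, as long
as the two ranges can never collide):

#define LOGICAL_ATTNUM_OFFSET 10000     /* arbitrary, per the above */

#define PhysicalToLogical(n)  ((n) + LOGICAL_ATTNUM_OFFSET)
#define LogicalToPhysical(n)  ((n) - LOGICAL_ATTNUM_OFFSET)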

Where do you think we have the choice?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 05:06:53PM +0100, Zeugswetter Andreas ADI SD wrote:
  The thing is, the physical index numbers have meaning, the logical
  index numbers do not.  In a view definition we're going to store the
  physical index, not the logical one, for example.  We don't want
  rearranging columns to invalidate view definitions or plans.
 
 I think we lack a definition here:

 logical number:  the order of columns when doing select *
 physical number: the position inside the heap tuple (maybe with offset)

 All views and plans and index definitions and most everything else
 need to reference the logical number.

Huh?  If I have an index on the first two columns of a table, it's
going to reference columns 1 and 2.

If you alter the table to put a column in front of those two, the new
column will be physical 3, logical 1.

If the index references logical numbers, the index has just been
broken. If the index references physical numbers, everything works
without changes.

Same with views: if you use logical numbers you have to rebuild the
view each time.  Why bother, when physical numbers work and don't have
that problem?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
 Really?  To me that's one of a large number of questions that are
 unresolved about how we'd do this.  You can make a case for either
 choice in quite a number of places.

 Can we?  For anything of any permanence (view definitions, rules,
 compiled functions, plans, etc.) you're going to want the physical
 number, for the same reason we store the oids of functions and tables.

Not if we intend to rearrange the physical numbers during column
add/drop to provide better packing.

You could make a case that we need *three* numbers: a permanent column
ID, a display position, and a storage position.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

   The thing is, the physical index numbers have meaning, the logical
   index numbers do not.  In a view definition we're going to store the
   physical index, not the logical one, for example.  We don't want
   rearranging columns to invalidate view definitions or plans.

  I think we lack a definition here:

  logical number:  the order of columns when doing select *
  physical number: the position inside the heap tuple (maybe with offset)

  All views and plans and index definitions and most everything else
  need to reference the logical number.
 
 Huh?  If I have an index on the first two columns of a table, it's
 going to reference columns 1 and 2.

 If you alter the table to put a column in front of those two, the new
 column will be physical 3, logical 1.

No, you change pg_index to now contain 2,3.

 If the index references logical numbers, the index has just been
 broken. If the index references physical numbers, everything works
 without changes.

yup, sinval

 Same with views: if you use logical numbers you have to rebuild the
 view each time.  Why bother, when physical numbers work and don't have
 that problem?

Because it would imho be a nightmare to handle ...

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch

2006-12-21 Thread Kevin Grittner
 On Wed, Dec 20, 2006 at  5:33 AM, in message
[EMAIL PROTECTED],
Russell Smith [EMAIL PROTECTED] wrote: 
 
 The 8.1 documentation for ALTER TABLE states the following.

 Adding a column with a non-null default or changing the type of an
 existing column will require the entire table to be rewritten. This
 may take a significant amount of time for a large table; and it will
 temporarily require double the disk space.

 Now, we are rewriting the table from scratch anyway; the on-disk
 format is changing.  What is stopping us from switching the column
 order at the same time?  The only thing I can think is that the
 catalogs will need more work to update them.  It's a middle-sized
 price to pay for being able to reorder the columns in the table.
 One of the problems I have is wanting to add a column in the middle
 of the table, but FK constraints stop me dropping the table to do
 the reorder.  If ALTER TABLE would let me stick it in the middle and
 rewrite the table on disk, I wouldn't care.  It's likely that I
 would be rewriting the table anyway.  And by specifying AT POSITION,
 or BEFORE/AFTER, you know for big tables it's going to take a while.

 Not that I'm able to code this at all, but I'm interested in
 feedback on this option.
 
+1
 
Currently, I often have to make the choice between adding a column at
the logical place in relation to the other columns or adding it at the
end.  The former requires creating a whole new table, populating it
with INSERT/SELECT, dropping the old table, renaming the new table, and
restoring permissions, constraints, indexes, etc.  The latter is a
simple ALTER TABLE.  When I choose the former, I save significant time
and reduce errors by using pg_dump to generate a lot of the code; but
it sure would be a nice feature if ALTER TABLE could do all this under
the covers.
 
-Kevin
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Tom Lane wrote:

 Martijn van Oosterhout kleptog@svana.org writes:
  On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
   Really?  To me that's one of a large number of questions that are
   unresolved about how we'd do this.  You can make a case for either
   choice in quite a number of places.

  Can we?  For anything of any permanence (view definitions, rules,
  compiled functions, plans, etc.) you're going to want the physical
  number, for the same reason we store the oids of functions and tables.

 Not if we intend to rearrange the physical numbers during column
 add/drop to provide better packing.

 You could make a case that we need *three* numbers: a permanent column
 ID, a display position, and a storage position.


Could this not be handled by some catalog fixup after an add/drop?  If
we get to having 3 numbers you will almost have me convinced that this
might be too complicated after all.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Stats Collector Oddity

2006-12-21 Thread Tom Lane
Chris Browne [EMAIL PROTECTED] writes:
 There isn't any way, short of restarting the postmaster, to get rid of
 that PID, is there?

The entry will get overwritten when that BackendId slot gets re-used,
so just starting enough concurrent backends should do it.  (Since
incoming backends always take the lowest free slot, the fact that the
dead entry has persisted awhile means that it must have a number higher
than your normal number of concurrent sessions ... which is evidence
in favor of the idea that it happened during a load spike ...)

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Lukas Kahwe Smith

Bruce Momjian wrote:


[1] http://developer.postgresql.org/index.php/Todo:WishlistFor83


That looks helpful.


Ok good :)

Seriously though, please slap me when things are wrong or not yet
assigned to the correct person .. there was a bit of guesswork involved
with some of the points .. especially with the names.


I will let things sit like they are for 1-2 weeks and then I will
probably try to get hold of each of the people still assigned to items,
to confirm that they are actually targeting the feature for 8.3 .. or
am I duplicating Bruce's efforts when I do this?


regards,
Lukas


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 You could make a case that we need *three* numbers: a permanent column
 ID, a display position, and a storage position.

 Could this not be handled by some catalog fixup after an add/drop? If we 
 get the having 3 numbers you will almost have me convinced that this 
 might be too complicated after all.

Actually, the more I think about it the more I think that 3 numbers
might be the answer.  99% of the code would use only the permanent ID.
Display position would be used in *exactly* one place, namely while
expanding SELECT foo.* --- I can't think of any other part of the
backend that would care about it.  (Obviously, client-side code such
as psql's \d would use it too.)  Use of storage position could be
localized into a few low-level tuple access functions, probably.
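
In data-structure terms, something like this (a sketch only; these are
not actual pg_attribute columns):

#include "access/attnum.h"

typedef struct ColumnNumbers
{
    AttrNumber  attnum;     /* permanent ID; part of pg_attribute's key */
    AttrNumber  attlognum;  /* display position: SELECT foo.* expansion */
    AttrNumber  attphysnum; /* storage position: on-disk tuple layout */
} ColumnNumbers;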

The problems we've been having with the concept stem precisely from
trying to misuse either display or storage position as a permanent ID.
That's fine as long as it actually is permanent, but as soon as you
want to change it then you have problems.  We should all understand
this perfectly well from a database theory standpoint: pg_attribute
has to have a persistent primary key.  (attrelid, attnum) is that key,
and we can't go around altering a column's attnum without creating
problems for ourselves.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 11:15:38AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Can we?  For anything of any permanence (view definitions, rules,
  compiled functions, plans, etc.) you're going to want the physical
  number, for the same reason we store the oids of functions and tables.
 
 Not if we intend to rearrange the physical numbers during column
 add/drop to provide better packing.

Urk! If that's what people are suggesting, I'd run away very quickly.
Getting better packing during table creation is a nice idea, but
preserving it across add/drop column is just... evil.

Run CLUSTER if you want that; I was expecting add/drop to be a simple
catalog change, nothing more.

 You could make a case that we need *three* numbers: a permanent column
 ID, a display position, and a storage position.

That's just way too complicated IMHO.  It adds extra levels of
indirection all over the place.

I was envisaging the physical number to be fixed and immutable (ie
storage position = permanent position).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] New version of money type

2006-12-21 Thread D'Arcy J.M. Cain
On Thu, 21 Dec 2006 10:47:52 -0500
Tom Lane [EMAIL PROTECTED] wrote:
 D'Arcy J.M. Cain darcy@druid.net writes:
  Very good points.  However, like the currency symbol issue I would like
  to separate that into another discussion.  The code already exists with
  the warts you mention (and more) and this proposal is to fix one thing
  that will make it more useful to others.  Let's get that change in and
  then start fixing up some of those other issues.
 
 I've forgotten now --- was this patch intended *only* to convert money
 from int4 to int8 underlying representation, or did you do other things?

Well, the main intention was to just widen the underlying storage and
thus increase the range to the point that the type is useful to more
users.  In fact, as you can see, I have removed the change to drop the
currency on output just to keep this change to a single issue.
However, there was a little bit of cleanup as well.  I removed some
self-balancing XXX comments, for example; that's what the CVS log is
for.  I moved a few functions around in order to make static functions
self-prototyping.  I added some consts to variables where appropriate.
The cash_words function needed to be changed to accommodate the
billions, trillions, and quadrillions that can now be handled.

Everything else should be directly related to the type change and
self-explanatory.

 It looks like there are unrelated changes in the patch, but I'm not sure
 if you just moved code around or did something more interesting.

Hopefully nothing too interesting.  :-)

 One bug I see in it is that you'd better make the alignment 'd' if the

Fixed in my local tree.  Thanks.

 type is to be int8.  Also I much dislike these changes:
 
 - int32   i = PG_GETARG_INT32(1);
 + int64   i = PG_GETARG_INT32(1);
 
 I think they may not actually be wrong, but they certainly *look* wrong;
 in general the declared type of a parameter variable in a C-coded SQL
 function ought to match what the SQL signature says.  Anyway there is no
 need that I can see to widen these variables.  Every C compiler knows
 what to do if you ask it for arithmetic on a long and an int.

Right, but I still need to accept int64 args here.  I have changed the
two relevant places to use PG_GETARG_INT64(1).

 (Speaking of which, have you thought about what happens on a machine
 with no 64-bit int, such that int64 is really just 32 bits?  Ideally
 the code should continue to function but with reduced range.  I didn't
 see any places where you were obviously depending on the range, but
 it's something to have in the back of your mind while coding.)

Does PGSQL run on any such machines?  If so perhaps someone can
volunteer to do some testing if they have one.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Companies Contributing to Open Source

2006-12-21 Thread Simon Riggs
The paper is a good one, from my perspective. It does address important
issues and maybe we don't all agree on the exact places lines have been
drawn, but I think we probably do agree that these things need to be
said. Now that they have been said, we must allow reasonable time for
the understanding to percolate and for appropriate changes of direction
to take place. We can't undo the past, but we can change the future.

On Tue, 2006-12-19 at 19:13 -0500, Bruce Momjian wrote:
 This actually brings up an important distinction.  Joshua is saying
 that the community is painted as god in the article, and I agree there
 is a basis for that, but I don't think you can consider the community
 and company as equals either.

 Also, the community is developing the software at a rate that almost
 no other company can match, so again the company is kind of in tow if
 they are working with the community process.  For example, the
 community is not submitting patches for the company to approve.

The community is developing software quickly because there are some/many
full-time paid staff assigned to the project. We (the Community) need to
recognise that the Community is *all* of us and that includes various
Companies i.e. Companies aren't distinct from the Community. In that
sense, I would agree that The Community is above Companies.

We must be appreciative of contributions made in imperfect conditions.
Frequently changes are made behind closed doors and then approval is
given to release the software, sometimes after extensive lobbying. We
shouldn't shun those contributions, even while advising those companies
that we'd prefer it if they didn't do it that way next time. We should
assume that all development is done with the best intentions, even if
things don't follow the FAQ.  Now that we have some clear policy on
this, I look forward to people being able to say best not do it that
way, the Community has a clear policy against that - advice that
Teodor, myself and others can give to sponsors.

BTW, the phrase Companies must also include any external Enterprise,
since many good things come our way from Universities and Colleges. We
should also recognise that many enterprises are in fact non-profit, or
simply local/national government/administrative organisations. Profit
per se is not the only thing that drives requirements.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Andrew Dunstan

Lukas Kahwe Smith wrote:

Bruce Momjian wrote:


[1] http://developer.postgresql.org/index.php/Todo:WishlistFor83


That looks helpful.


Ok good :)

Seriously though, please slap me when things are wrong, not assigned 
yet to the correct person .. there was a bit of guess work involved 
with some of the points .. especially with the names.


I will let things sit like they are for 1-2 weeks and I will probably 
try to get a hold off each of the people still assigned to items then 
to confirm that they are actually targeting the feature for 8.3 .. or 
am I duplicating Bruce's efforts when I do this?






To the best of my understanding, the current PSM effort is a pgFoundry 
project to create it as a loadable language. I am not aware of anything 
that would get it to where we could include it in 8.3, so this item 
should probably not be on the list.


Also, clustered/replication solutions seems way too vague. It is not 
clear to me what if anything is in prospect on this front for 8.3.



cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 I was envisiging the physical number to be fixed and immutable (ie
 storage position = permanent position).

There are two different problems being discussed here, and one of them
is insoluble if we take that position: people would like the system to
automatically lay out tables to minimize alignment overhead and access
costs (eg, put fixed-width columns first).  That is not the same
problem as I would like to change the display column order.

It's true that for an ADD COLUMN that doesn't already force a table
rewrite, forcing one to improve packing is probably bad.  My thought
would be that we leave the column storage order alone if we don't have
to rewrite the table ... but any rewriting variant of ALTER TABLE could
optimize the storage order while it was at it.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Tom Lane wrote:

Actually, the more I think about it the more I think that 3 numbers
might be the answer.  99% of the code would use only the permanent ID.
Display position would be used in *exactly* one place, namely while
expanding SELECT foo.* --- I can't think of any other part of the
backend that would care about it.


Insert without a column list will need the logical ordering, I think.
Also use of LIKE foo in a CREATE TABLE statement.  I'm not dead sure
there aren't one or two others lurking.  But I agree that the number is
small.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Jim Nasby

How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]

... or would that create a whole bunch of reserved words?

On Dec 21, 2006, at 10:04 AM, Simon Riggs wrote:


On Wed, 2006-12-20 at 09:47 -0500, Jim Nasby wrote:


On the other hand, this would be the only part of the system where
the official interface/API is a system catalog table. Do we really
want to expose the internal representation of something as our API?
That doesn't seem wise to me...


Define and agree the API (the hard bit) and I'll code it (the easy  
bit).


We may as well have something on the table, even if it changes later.

Dave: How does PgAdmin handle setting table-specific autovacuum
parameters? (Does it?)

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---

TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate



--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Gregory Stark

Jim Nasby [EMAIL PROTECTED] writes:

 How about...

 ALTER TABLE ...
 ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
 ALTER AUTOANALYZE [ THRESHOLD | SCALE ]

 ... or would that create a whole bunch of reserved words?

The way to predict when you're going to run into conflicts in a case like this
is to ask what happens if you have a column named autovacuum or
autoanalyze...

Sometimes the parser can look ahead to the next keyword to determine which
production to use but usually you're best off just looking for a grammatical
construct that doesn't look ambiguous even to a naive human reader.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Andrew Dunstan

Jim Nasby wrote:

How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]



Given these remarks from Tom:


Where we are currently at is experimenting to find
out what autovacuum's control knobs ought to be.  The catalog table was
a suitably low-effort way to expose a first cut at the knobs.


doesn't making language-level changes seem more than somewhat
premature?  Or have we finished experimenting?


cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Bundle of patches

2006-12-21 Thread Teodor Sigaev

0.9 doesn't apply cleanly after Peter's changes, so here is a new version:

http://www.sigaev.ru/misc/user_defined_typmod-0.10.gz

Teodor Sigaev wrote:

Perhaps an array of int4 would be better?  How much

Done
http://www.sigaev.ru/misc/user_defined_typmod-0.9.gz


The patch needs more cleanup before applying, too, eg make comments
match code, get rid of unused keywords added to gram.y.


Cleaned.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Richard Huxton

Gregory Stark wrote:

Jim Nasby [EMAIL PROTECTED] writes:


How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]

... or would that create a whole bunch of reserved words?


The way to predict when you're going to run into conflicts in a case like this
is to ask what happens if you have a column named autovacuum or
autoanalyze...


Might it not be cleaner to treat them as scoped configuration values?

ALTER TABLE foo SET autovacuum.threshold = ...

Presumably it's not going to be the last such setting, and would give 
you a common format for setting all manner of system-object related things:

 - column statistics
 - fill-factor
 - comment
 - per-column locale (when we get it)
 - any module-related tuning (tsearch2? slony?)

That way the parser just needs to treat the next thing after SET as a 
(possibly compound) identifier.
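
As a sketch of what that could look like (none of this syntax exists
today; the setting names are illustrative only):

    ALTER TABLE foo SET autovacuum.threshold = 1000;
    ALTER TABLE foo SET autovacuum.cost_delay = 20;
    ALTER TABLE foo SET storage.fill_factor = 70;
    ALTER TABLE foo ALTER COLUMN bar SET statistics.target = 200;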


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] tsearch in core patch, for review

2006-12-21 Thread Teodor Sigaev



patch: http://www.sigaev.ru/misc/tsearch_core-0.27.gz

http://www.sigaev.ru/misc/tsearch_core-0.28.gz
New version: because of the XML commit, the old patch doesn't apply cleanly.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [COMMITTERS] pgsql: Initial SQL/XML support: xml data type and

2006-12-21 Thread Stefan Kaltenbrunner
Peter Eisentraut wrote:
 Log Message:
 ---
 Initial SQL/XML support: xml data type and initial set of functions.

this seems to cause regression failures on all the buildfarm members
(none of them are yet building with xml support).

http://www.pgbuildfarm.org/cgi-bin/show_status.pl

Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Initial SQL/XML support: xml data type and initial set of

2006-12-21 Thread Peter Eisentraut
Stefan Kaltenbrunner wrote:
 Peter Eisentraut wrote:
  Log Message:
  ---
  Initial SQL/XML support: xml data type and initial set of
  functions.

 this seems to cause regression failures on all the buildfarm members

Should be fixed now.  I don't know why that one file was outdated.
-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] inet/cidr

2006-12-21 Thread Andrew - Supernews
 Worky == Worky Workerson [EMAIL PROTECTED] writes:

 Worky I was looking at upgrading to 8.2, but I make extensive use of
 Worky the IP4 module.

The needed changes for ip4r to build on 8.2 are already in its CVS and,
as far as I know, they work; the only reason I've not done another
release yet is that I haven't had a chance to test for regressions on 8.1.

-- 
Andrew, Supernews
http://www.supernews.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] executing a dml within a utility

2006-12-21 Thread Ehab Galal
I created a CMD_UTILITY, which upon being executed by the user should do the 
following atomically:

1- drops a table
2- deletes from another table all rows with a certain criterion: DELETE 
FROM pg_t1 WHERE att = val
3- deletes from a third table all rows with a certain criterion: DELETE 
FROM pg_t2 WHERE att = val


I do the first step using ProcessUtility(). How can I do the 2nd/3rd steps? 
Any suggestions?


I tried calling exec_simple_query(), but this didn't work; I am getting the 
error "cannot drop active portal".



Thanks a lot,
ehab

_
Find sales, coupons, and free shipping, all in one place!  MSN Shopping 
Sales  Deals 
http://shopping.msn.com/content/shp/?ctid=198,ptnrid=176,ptnrdata=200639



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] executing a dml within a utility

2006-12-21 Thread Alvaro Herrera
Ehab Galal wrote:
 I created a CMD_UTILITY, which upon being executed by the user should do 
 the following atomically:
 1- drops a table
 2- deletes from another table all rows with a certain criterion: DELETE 
 FROM pg_t1 WHERE att = val
 3- deletes from a third table all rows with a certain criterion: DELETE 
 FROM pg_t2 WHERE att = val
 
 I do the first step using ProcessUtility(). How can i do the 2nd/3rd steps? 
 Any suggestions?

Don't do that.  Instead, use performDeletion, and register the tuples in
the pg_t1 and pg_t2 catalogs as dependent on the table in pg_depend
(using recordDependencyOn), so that dropping the table removes them
automatically.
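
For orientation, the dependency edges that recordDependencyOn() creates
are ordinary rows in pg_depend, and performDeletion() follows them when
it drops an object.  A sketch of how to inspect them ('some_table' is a
placeholder):

    SELECT classid::regclass AS catalog, objid, deptype
    FROM pg_depend
    WHERE refobjid = 'some_table'::regclass;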

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Release 8.2.0 done, 8.3 development starts

2006-12-21 Thread Bruce Momjian
Lukas Kahwe Smith wrote:
 Bruce Momjian wrote:
 
  [1] http://developer.postgresql.org/index.php/Todo:WishlistFor83
  
  That looks helpful.
 
 Ok good :)
 
 Seriously though, please slap me when things are wrong, not assigned yet 
 to the correct person .. there was a bit of guess work involved with 
 some of the points .. especially with the names.
 
 I will let things sit like they are for 1-2 weeks and I will probably 
 try to get a hold off each of the people still assigned to items then to 
 confirm that they are actually targeting the feature for 8.3 .. or am I 
 duplicating Bruce's efforts when I do this?

What I did for the 8.2 open items tracking during feature freeze was to
list the item description, status, and person's name and email in a
table that could be easily reviewed.  Let me know if you need help
setting that up.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
- Original Message - 
From: Zeugswetter Andreas ADI SD [EMAIL PROTECTED]
To: Takayuki Tsunakawa [EMAIL PROTECTED]; ITAGAKI
Takahiro [EMAIL PROTECTED]
  Yes, I used half the size of RAM as the shared buffers, which is
  reasonable.  And I cached all the data.

 For pg, half RAM for shared_buffers is too much. The ratio is good
 for other db software, that does not use the OS cache.

What percentage of RAM is recommended for shared buffers in general?
40%?  30%?  Or is the general recommendation more like "according to
the amount of your data, this much RAM should be left for the kernel
cache"?  But that's the story on Linux; it may be different for other
OSes.
Hmm, if it is so, it sounds hard for system designers/administrators
to judge.





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Companies Contributing to Open Source

2006-12-21 Thread Kevin Grittner
 On Tue, Dec 19, 2006 at  6:13 PM, in message
[EMAIL PROTECTED], Bruce Momjian
[EMAIL PROTECTED] wrote:
 if the company dies, the community keeps going (as it did after
 Great Bridge, without a hiccup), but if the community dies, the
 company dies too.
 
This statement seems to ignore organizations for which PostgreSQL is an
implementation detail in their current environment.  While we appreciate
PostgreSQL and are likely to try to make an occasional contribution,
where it seems to be mutually beneficial, the Wisconsin State Courts
would survive the collapse of the PostgreSQL community.
 
While I can only guess at the reasons you may have put the slant you
did on the document, I think it really should reflect the patient
assistance the community provides to those who read the developers FAQ
and make a good faith effort to comply with what is outlined there.  The
cooperative, professional, and helpful demeanor of the members of this
community is something which should be balanced against the community's
need to act as a gatekeeper on submissions.
 
I have recent experience as a first time employee contributor.  When we
hit a bump in our initial use of PostgreSQL because of the non-standard
character string literals, you were gracious in accepting our quick
patch as being possibly of some value in the implementation of the
related TODO item.  You were then helpful in our effort to do a proper
implementation of the TODO item which fixes it.  I see that the patch I
submitted was improved by someone before it made the release, which is
great.
 
This illustrates how the process can work.  I informed management of
the problem, and presented the options -- we could do our own little
hack that we then had to maintain and apply as the versions moved along,
or we could try to do a fix which the community would accept and have that
feature just work for us for all subsequent releases.  The latter was
a little more time up front, but resulted in a better quality product
for us, and less work in the long term.  It was also presumably of some
benefit to the community, which has indirect benefit to our
organization.  Nobody here wants to switch database products again soon,
so if we can solve our problem in a way that helps the product gain
momentum, all the better.
 
I ran a consulting business for decades, and I know that there is a
great variation in the attitudes among managers.  Many are quite
reasonable.  I'm reminded of a meeting early in my career with a
businessman who owned and operated half a dozen successful businesses in
a variety of areas.  He proposed a deal that I was on the verge of
accepting, albeit somewhat reluctantly.  He stopped me and told me that
he hoped to continue to do business with me, so any deal we made had to
benefit and work for both of us or it was no good at all; if I was
uncomfortable with something in the proposal, we should talk it out. 
That's the core of what we're trying to say in this document, isn't it? 
The rest is an executive overview of the developer FAQ?  I can't help
feeling that even with the revisions so far it could have a more
positive spin.
 
-Kevin
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] Load distributed checkpoint patch

2006-12-21 Thread Kevin Grittner
 On Wed, Dec 20, 2006 at  6:05 AM, in message
[EMAIL PROTECTED], Takayuki Tsunakawa
[EMAIL PROTECTED] wrote: 
 
 I consider that smoothing the load (more meaningfully, response
 time) has higher priority over checkpoint punctuality in a practical
 sense, because the users of a system benefit from good steady
 response and give a good reputation to the system.
 
I agree with that.
 
 If the checkpoint processing is
 not punctual, crash recovery would take a longer time.  But which
 would you give higher priority, the unlikely event (=crash of the
 system) or the likely event (=peak hours of the system)?  I believe
 the latter should be regarded.
 
I'm still with you here.
 
 The system can write dirty buffers after the peak hours
 pass.
 
I don't see that in our busiest environment.
 
We have 3,000 directly connected users, various business partner
interfaces, and public web entry doing OLTP in 72 databases distributed
around the state, with real-time replication to central databases which
are considered derived copies.  If all the pages modified on the central
databases were held in buffers or cache until after peak hours, query
performance would suffer -- assuming it would all even fit in cache.  We
must have a way for dirty pages to be written under load while
responding to hundreds of thousands of queries per hour without
disturbing freezes during checkpoints.
 
On top of that, we monitor database requests on the source machines,
and during idle time we synchronize the data with all of the targets
to identify, log, and correct drift.  So even if we could shift all
our disk writes to the end of the day, that would have its own down
side, in extending our synchronization cycle.
 
I raise this only to be sure that such environments are considered with
these changes, not to discourage improvements in the checkpoint
techniques.  We have effectively eliminated checkpoint problems in our
environment with a combination of battery backed controller cache and
aggressive background writer configuration.  When you have a patch which
seems to help those who still have problems, I'll try to get time
approved to run a transaction replication stream onto one of our servers
(in catch up mode) while we do a web stress test by playing back
requests from our production log.  That should indicate how the patch
will affect us.
 
-Kevin
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Actually, the more I think about it the more I think that 3 numbers
 might be the answer.  99% of the code would use only the permanent ID.

Don't we already have such a permanent number -- just one we don't use
anywhere in the data model? Namely the oid of the pg_attribute entry. It's
actually a bit odd that we don't use it since we use the oid of just about
every other system catalog record as the primary key.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Actually, the more I think about it the more I think that 3 numbers
 might be the answer.  99% of the code would use only the permanent ID.

 Don't we already have such a permanent number -- just one we don't use
 anywhere in the data model? Namely the oid of the pg_attribute entry.

Nope, because pg_attribute hasn't got OIDs.
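
A quick check from psql (using the relhasoids flag in pg_class):

    SELECT relname, relhasoids FROM pg_class
    WHERE relname = 'pg_attribute';
    -- relhasoids is f, so there is no per-row OID to use as a key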

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] Load distributed checkpoint patch

2006-12-21 Thread ITAGAKI Takahiro

Kevin Grittner [EMAIL PROTECTED] wrote:

  I consider that smoothing the load (more meaningfully, response time)
  has higher priority over checkpoint punctuality in a practical sense,
  
 I agree with that.

I agree that checkpoint_timeout is not so important, but we should
respect checkpoint_segments, or else new WAL files would be created
without bound, as Bruce pointed out.
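(Normally the WAL directory holds at most about 2 * checkpoint_segments
+ 1 segment files of 16MB each; if checkpoints cannot keep up, it can
grow past that bound.)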

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Stats Collector Oddity

2006-12-21 Thread Christopher Browne
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Tom Lane) wrote:
 Chris Browne [EMAIL PROTECTED] writes:
 There isn't any way, short of restarting the postmaster, to get rid of
 that PID, is there?

 The entry will get overwritten when that BackendId slot gets re-used,
 so just starting enough concurrent backends should do it.  (Since
 incoming backends always take the lowest free slot, the fact that the
 dead entry has persisted awhile means that it must have a number higher
 than your normal number of concurrent sessions ... which is evidence
 in favor of the idea that it happened during a load spike ...)

Cool.  I started up a nice little bunch of psql sessions in the
background, and then once they were all up, shut down my shell
session, thereby eliminating them.  And that did, indeed, clear out
that pg_stat_activity entry.

... And five minutes later, Nagios sent me a message indicating that
the node had recovered from having an ancient open connection.

I'll re-add a few gratuitous details here in the hopes that that makes
this easily findable if anyone else should search for the issue...

The Problem:
 - pg_stat_activity was reporting an elderly transaction in progress

 - that backend process wasn't running anymore

 - pg_stat_activity *was* reporting other legitimate activity; this
   was not the scenario where it had gotten deranged (normally due to
   excessive load)

 - Per Tom's comments, there evidently *was* some load spike where
   the closing of this particular connection did not get logged by
   the stats collector

The Solution: 

 - We needed to roll the stats collector through a bunch of its slots
   in order to clean the apparently-still-populated entry out.

 - Ran, in a shell:
 for i in `seq 100`; do
     psql &
 done

   That left 100 psql sessions in the background, all connected to the
   database backend.

 - Closed the shell.  That then HUPped the 100 psql sessions.

That got the offending pg_stat_activity entry cleared out.
-- 
let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;;
http://linuxdatabases.info/info/finances.html
Temporary tattoos  are a CRITICAL  ELEMENT  of our security strategy.
To suggest otherwise is sheer lunacy.  -- Reid Fleming, cDc

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Load distributed checkpoint patch

2006-12-21 Thread Takayuki Tsunakawa
Hello, Mr. Grittner,

From: Kevin Grittner [EMAIL PROTECTED]
 We have 3,000 directly connected users, various business partner
 interfaces, and public web entry doing OLTP in 72 databases distributed
 around the state, with real-time replication to central databases which
 are considered derived copies.

What a big system you have.

   If all the pages modified on the central
 databases were held in buffers or cache until after peak hours, query
 performance would suffer -- assuming it would all even fit in cache.
 We must have a way for dirty pages to be written under load while
 responding to hundreds of thousands of queries per hour without
 disturbing freezes during checkpoints.

I agree with you.  My words were not good.  I consider it necessary
to keep checkpoints advancing even under heavy load, while taking care
of OLTP transactions.

 I raise this only to be sure that such environments are considered
 with these changes, not to discourage improvements in the checkpoint
 techniques.  We have effectively eliminated checkpoint problems in
 our environment with a combination of battery backed controller cache
 and aggressive background writer configuration.  When you have a patch
 which seems to help those who still have problems, I'll try to get time
 approved to run a transaction replication stream onto one of our
 servers (in catch up mode) while we do a web stress test by playing
 back requests from our production log.  That should indicate how the
 patch will affect us.

Thank you very much for your kind offer.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread ITAGAKI Takahiro

Takayuki Tsunakawa [EMAIL PROTECTED] wrote:

  For pg, half RAM for shared_buffers is too much. The ratio is good for
  other db software, that does not use the OS cache.
 
 What percentage of RAM is recommended for shared buffers in general?
 40%?  30%?  Or is the general recommendation more like "according to
 the amount of your data, this much RAM should be left for the kernel
 cache"?  But that's the story on Linux; it may be different for other
 OSes.
 Hmm, if it is so, it sounds hard for system designers/administrators
 to judge.

If you use Linux, try the following settings:
  1. Decrease /proc/sys/vm/dirty_ratio and dirty_background_ratio.
  2. Increase wal_buffers to reduce WAL flushing.
  3. Set wal_sync_method to open_sync; O_SYNC is faster than fsync().
  4. Separate data and WAL files into different partitions or disks.

I suppose 1 is important for you, because the kernel will not write
dirty buffers until 10% of memory becomes dirty under the default
settings.  You have large memory (8GB) but a small data set (800MB),
so the kernel almost never writes buffers except in checkpoints.
Accumulated dirty buffers are then written in a burst at fsync().
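
For reference, the same advice as a sketch of concrete settings (the
values are illustrative, not recommendations).  Note the arithmetic:
with 8GB of RAM, the default dirty_background_ratio of 10% lets roughly
800MB of dirty data accumulate, which here is the entire data set.

    # postgresql.conf
    wal_buffers = 1024kB            # default is 64kB
    wal_sync_method = open_sync     # default on Linux is fdatasync

    # Linux kernel, e.g. via /etc/sysctl.conf
    vm.dirty_background_ratio = 1   # default 10
    vm.dirty_ratio = 4              # default 40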


We would be happy if we could be free from this difficult combination
of tuning.  If you have *ideas for improvements*, please suggest them.
I think we've already understood the *problem itself*.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Companies Contributing to Open Source

2006-12-21 Thread Guido Barosio

 Companies often bring fresh prespective, ideas, and testing
 infrastucture to a project.

prespective || perspective ?

g.-






--
Guido Barosio
---
http://www.globant.com
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Jim Nasby

On Dec 21, 2006, at 1:28 PM, Andrew Dunstan wrote:

Jim Nasby wrote:

How about...

ALTER TABLE ...
ALTER AUTOVACUUM [ THRESHOLD | SCALE | COST DELAY | COST LIMIT ]
ALTER AUTOANALYZE [ THRESHOLD | SCALE ]


Given these remarks from Tom:


Where we are currently at is experimenting to find
out what autovacuum's control knobs ought to be.  The catalog  
table was

a suitably low-effort way to expose a first cut at the knobs.


doesn't making language level changes seem more than somewhat  
premature? Or have we finished experimenting?


Well, the only one I could possibly see removing would be threshold,  
but the reality is that these parameters have been kicking around  
since 7.4, so...


But I do like Richard Huxton's suggestion for syntax... that looks a  
lot more flexible than what I proposed.


The only other thought that comes to mind is that such syntax will  
make it a *lot* more verbose to set all the options for a table. But  
I don't know how often people feel the need to set *all* of them at  
once... Still, it might be worth continuing to support people poking  
values directly into the table; I just don't think we want to make  
that the official interface.
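
For concreteness, the current unofficial interface is a direct insert
into the system catalog, typically as superuser; a sketch (column list
as of 8.2, from memory -- check the docs; -1 means "use the default"):

    INSERT INTO pg_autovacuum
           (vacrelid, enabled,
            vac_base_thresh, vac_scale_factor,
            anl_base_thresh, anl_scale_factor,
            vac_cost_delay, vac_cost_limit,
            freeze_min_age, freeze_max_age)
    VALUES ('foo'::regclass, true, 1000, 0.2, 500, 0.1, -1, -1, -1, -1);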

--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
Hello, Itagaki-san,

Thank you for an interesting piece of information.

From: ITAGAKI Takahiro [EMAIL PROTECTED]
 If you use Linux, try the following settings:
  1. Decrease /proc/sys/vm/dirty_ratio and dirty_background_ratio.
  2. Increase wal_buffers to reduce WAL flushing.
  3. Set wal_sync_method to open_sync; O_SYNC is faster than fsync().
  4. Separate data and WAL files into different partitions or disks.

 I suppose 1 is important for you, because the kernel will not write
 dirty buffers until 10% of memory becomes dirty under the default
 settings.  You have large memory (8GB) but a small data set (800MB),
 so the kernel almost never writes buffers except in checkpoints.
 Accumulated dirty buffers are then written in a burst at fsync().

I'll show the results of this tuning to share information with people
who don't have experience of this kind.
The numbers shown below are the tps when running pgbench -c16 -t100
postgres five times in succession.

(1) Default case (this is shown again for comparison and as a reminder)
The bgwriter_* and checkpoint_* are set to those defaults.
wal_buffers and wal_sync_method are also set to those defaults (64kB
and fdatasync respectively.)

235  80  226  77  240


(2) Default + WAL 1MB case
The configuration is the same as case (1) except that wal_buffers is
set to 1024kB.

302  328  82  330  85

This is a better improvement than I expected.


(3) Default + wal_sync_method=open_sync case
The configuration is the same as case (1) except that wal_sync_method
is set to open_sync.

162  67  176  67  164

Too bad compared to case (2).  Do you know the reason?


(4) (2)+(3) case

322  350  85  321  84

This is good, too.


(5) (4) + /proc/sys/vm/dirty* tuning
dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
changed from 40 to 4.

308  349  84  349  84

The tuning of the kernel cache doesn't appear to bring a performance
improvement in my environment.  Does the kernel still wait too long
before it starts flushing dirty buffers because the cache is large?
If so, ever-growing RAM sizes may cause trouble more frequently in the
near future.  Do the dirty_*_ratio settings accept values less than 1?

BTW, in case (1), the best response time of a transaction was 6
milliseconds.  On the other hand, the worst response time was 13
seconds.


 We would be happy if we could be free from this difficult combination
 of tuning.  If you have *ideas for improvements*, please suggest them.
 I think we've already understood the *problem itself*.

I agree with you.  Let's make the ideas more concrete by doing some
experiments.




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Interface for pg_autovacuum

2006-12-21 Thread Tom Lane
Jim Nasby [EMAIL PROTECTED] writes:
 The only other thought that comes to mind is that such syntax will  
 make it a *lot* more verbose to set all the options for a table.

Which should surely make you wonder whether setting these options
per-table is the most important thing to do...

Arguing about syntax details is pretty premature, in my humble opinion.
We don't have agreement yet about what options we need or what scope
they should apply over.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Greg Smith

On Wed, 20 Dec 2006, Inaam Rana wrote:

Talking of bgwriter_* parameters I think we are missing a crucial 
internal counter i.e. number of dirty pages. How much work bgwriter has 
to do at each wakeup call should be a function of total buffers and 
currently dirty buffers.


This is actually a question I'd been meaning to throw out myself to this 
list.  How hard would it be to add an internal counter to the buffer 
management scheme that kept track of the current number of dirty pages? 
I've been looking at the bufmgr code lately trying to figure out how to 
insert one as part of building an auto-tuning bgwriter, but it's unclear 
to me how I'd lock such a resource properly and scalably.  I have a 
feeling I'd be inserting a single-process locking bottleneck into that 
code with any of the naive implementations I considered.


The main problem I've been seeing is also long waits stuck behind a slow 
fsync on Linux.  What I've been moving toward testing is an approach 
slightly different from the proposals here.  What if all the database page 
writes (background writer, buffer eviction, or checkpoint scan) were 
counted and periodic fsync requests sent to the bgwriter based on that? 
For example, when I know I have a battery-backed caching controller that 
will buffer 64MB worth of data for me, if I forced a fsync after every 
6000 8K writes, no single fsync would get stuck waiting for the disk to 
write for longer than I'd like.
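(To spell out the arithmetic: 64MB / 8kB = 8192 pages, so forcing a sync 
every 6000 writes keeps the pending data comfortably inside the cache.)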


Give the admin a max_writes_before_sync parameter, make the default of 0 
work just like the current behavior, and off you go; a simple tunable that 
doesn't require a complicated scheme to implement or break anybody's 
existing setup.  Combined with a properly tuned background writer, that 
would solve the issues I've been running into.  It would even make the 
problem of Linux caching too many writes until checkpoint time go away (I 
know how to eliminate that by adjusting caching policy, but I have to be 
root to do it; a DBA should be able to work around that issue even if they 
don't have access to the kernel tunables.)


While I'm all for testing to prove me wrong, my gut feel is that going all 
the way to sync writes a la Oracle is a doomed approach, particularly on 
low-end hardware where they're super expensive.  Following The Oracle Way 
is a good roadmap for a lot of things, but I wouldn't put building a lean 
enough database to run on modest hardware on that list.  You can do sync 
writes with perfectly good performance on systems with a good 
battery-backed cache, but I think you'll get creamed in comparisons 
against MySQL on IDE disks if you start walking down that path; since 
right now a fair comparison with similar logging behavior is an even match 
there, that's a step backwards.


Also on the topic of sync writes to the database proper:  wouldn't using 
O_DIRECT for those be potentially counter-productive?  I was under the 
impression that one of the behaviors counted on by Postgres was that data 
evicted from its buffer cache, eventually intended for writing to disk, 
was still kept around for a bit in the OS buffer cache.  A subsequent read 
because the data was needed again might find the data already in the OS 
buffer, therefore avoiding an actual disk read; that substantially reduces 
the typical penalty for the database engine making a bad choice on what to 
evict.  I fear a move to direct writes would put more pressure on the LRU 
implementation to be very smart, and that's code that you really don't 
want to be more complicated.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Takayuki Tsunakawa
From: Takayuki Tsunakawa [EMAIL PROTECTED]
 (5) (4) + /proc/sys/vm/dirty* tuning
 dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
 changed from 40 to 4.

 308  349  84  349  84

Sorry, I forgot to include the result when using Itagaki-san's patch.
The patch showed the following tps for case (5).

323  350  340  59  225

The best response time was 4 msec, and the worst one was 16 seconds.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Load distributed checkpoint

2006-12-21 Thread Inaam Rana

On 12/22/06, Takayuki Tsunakawa [EMAIL PROTECTED] wrote:


From: Takayuki Tsunakawa [EMAIL PROTECTED]
 (5) (4) + /proc/sys/vm/dirty* tuning
 dirty_background_ratio is changed from 10 to 1, and dirty_ratio is
 changed from 40 to 4.

 308  349  84  349  84

Sorry, I forgot to include the result when using Itagaki-san's patch.
The patch showed the following tps for case (5).

323  350  340  59  225

The best response time was 4 msec, and the worst one was 16 seconds.




Which IO scheduler (elevator) are you using?

--
Inaam Rana
EnterpriseDB   http://www.enterprisedb.com


[HACKERS] problem with web interface for mailing lists?

2006-12-21 Thread Pavel Stehule

Hello,

I see the last update is from 18 December.

regards
Pavel Stehule

_
Chcete sdilet sve obrazky a hudbu s prateli? http://messenger.msn.cz/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings