Re: [HACKERS] PG Killed by OOM Condition

2005-10-04 Thread Jeff Davis
 It's not an easy decision. Linux isn't wrong. Solaris isn't wrong.
 Most people never hit these problems, and the people that do, are
 just as likely to hit one problem, or the other. The grass is always
 greener on the side of the fence that isn't hurting me right now,
 and all that.
 
 Cheers,
 mark
 

Thanks, a very informative reply.

Do you have some references where I can learn more?

I think that I've run into the OOM killer without a fork() being
involved, but I could be wrong. Is it possible to be hit by the OOM
killer if no applications use fork()?

Regards,
Jeff Davis

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] PG Killed by OOM Condition

2005-10-04 Thread Dennis Bjorklund
On Mon, 3 Oct 2005, Jeff Davis wrote:

 involved, but I could be wrong. Is it possible to be hit by the OOM
 killer if no applications use fork()?

Sure, whenever the system is out of memory and the OS can't find a free page,
it kills a process. If you check the kernel log you can see whether the
OOM killer has been doing some work.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Tuning current tuplesort external sort code for 8.2

2005-10-04 Thread Simon Riggs
On Tue, 2005-10-04 at 00:21 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  AFAICS the following opportunities exist, without changing any of the
  theoretical algorithms or the flexibility of definable datatypes:
 
  1. tuplesort_heap_siftup and tuplesort_heap_insert make no attempt to
  cache the values of keys that have been obtained from *_getattr macros.
  The two routines navigate a tournament sort heap, so that on average 50%
  of comparisons use at least one immediately preceeding tuple and key
  values from that could be cached ready for the next call.
 
 Hmm ... this seems interesting, but you also have to look at the
 potential downside: what is the cost of doing the caching?

Actually, nothing for the cache-1-tuple case. We just re-initialise the
fcinfo based on whether we are caching 0, 1 or 2 fields. We keep two
fcinfo structures, so that the cache=0 case does not pollute the cache
for the cache=1 case.

tuplesort_heap_insert allows us to cache one value each time round the
inner loop, always using fcinfo1 (cache=1). tuplesort_heap_siftup has
two comparisons per iteration; it allows us to cache 0 values on the
first call, so we use fcinfo0. We can always cache one value for the
second call, exactly the same as the _heap_insert case, so we use
fcinfo1. Maybe we can also reuse the winner from the first call as the
second input to the second call. Ahem... it's easier to see when you
look at the code.

We need to work a bit harder to get the cache=2 case but its possible,
though I take your point that it may not be worth it. I'll do the
cache=1 case before attempting the cache=2 case, so we can measure the
gain.

We run _heap_insert and _heap_siftup once for each incoming tuple, so
the saving should be good.
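
To make that concrete, here is a rough sketch of the mechanism
(illustrative only, not the real tuplesort.c code; the function name and
the exact initialisation are assumptions):

#include "postgres.h"
#include "fmgr.h"

/*
 * Sketch of the caching idea described above.  fcinfo0 would be used when
 * neither key can be reused; fcinfo1 is the interesting one: its arg[0]
 * still holds the key datum extracted during the previous comparison, so
 * only the right-hand key needs to be loaded.  Both structs are assumed
 * to have been set up earlier with the sort operator's FmgrInfo, and NULL
 * handling would be done before reaching the comparison proc, as in the
 * existing code.
 */
static FunctionCallInfoData fcinfo0;    /* cache=0: both keys fetched fresh */
static FunctionCallInfoData fcinfo1;    /* cache=1: arg[0] kept from last call */

static int32
compare_with_cached_left(Datum rkey, bool risnull)
{
    /* only the right-hand argument changes between calls */
    fcinfo1.arg[1] = rkey;
    fcinfo1.argnull[1] = risnull;
    fcinfo1.isnull = false;
    return DatumGetInt32(FunctionCallInvoke(&fcinfo1));
}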

  All of the remaining ideas relate to NULL handling.
 
 I can't get excited about this. Most sort scenarios involve few if any
 nulls.

The idea would be much less important anyway if you follow up on this
next idea:

 One thought that comes to mind is that the current system structure
 encourages sorting on keys that are at the end of their tuples.
 For instance, SELECT foo, bar FROM ... ORDER BY baz will sort by
 the third column of the generated tuples, which is certainly the least
 efficient to access.  

Not sure what goes on in there. Are you saying heap sort cols are always
at the end, or only in the cases you mention? What about where we're
doing a GROUP BY, so all the sort columns are always part of the select
list and frequently (by coding convention) at the front of the row?

If the cols aren't always in resjunk, we can come up with some
performance test cases to check whether or not this is a problem, how
big it is, and where it starts to show itself.

 It'd be interesting to look into generating the
 working tuples with baz as the first column.  I fear this is nontrivial,
 because there are a lot of places that expect resjunk columns to come
 last, but we should study it.  

It could be worth it to arrange the main select's attr list so that the
sort keys always come first. I'd settle just for that if the second part
is too much work.

It sounds like hard work to optimise the case where the ORDER BY is on
columns not mentioned in the select. The latter seems worth it for when
we do a sort-merge based on join columns not mentioned in the SELECT,
but not for optimizing sort-aggregate or one-table-select cases.

 (Note: this will do nada for Josh's
 original complaint about index build time, since btree index sorts will
 by definition use all the tuple's columns, but it seems like a
 significant issue for ordinary sorts.)

Understood, but we do care about heap sorts too! Heap and index sorts
have many dissimilarities, but both are important, so perhaps we should
tune for each case separately.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] PG Killed by OOM Condition

2005-10-04 Thread Martijn van Oosterhout
On Mon, Oct 03, 2005 at 11:47:57PM -0700, Jeff Davis wrote:
 I think that I've run into the OOM killer without a fork() being
 involved, but I could be wrong. Is it possible to be hit by the OOM
 killer if no applications use fork()?

fork() is the obvious overcommitter. If Netscape wants to spawn a new
process, it first has to fork 50MB of memory, then free probably most
of it because it execs some little plugin. Another case is a process
that mmap()s a large block and then doesn't use it until later. Similar
idea with brk(). If you run out of swap at the wrong moment... Recent
versions are more clever about who to kill. Sometimes you just get
unlucky...
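
As a toy illustration of the mmap() case (a standalone demo, nothing to
do with the backend; the 1GB figure is arbitrary -- to actually provoke
the OOM killer it would have to exceed free RAM plus swap, so don't try
that on a box you care about):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
    size_t  size = (size_t) 1024 * 1024 * 1024;   /* 1GB of address space */
    char   *p;

    /* On Linux this normally succeeds instantly: the kernel only promises
     * the address space, it does not reserve any pages (overcommit). */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    /* Real memory is committed only as pages are touched; if the system
     * runs out right here, the OOM killer has to pick a victim. */
    memset(p, 1, size);

    return 0;
}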

It's always killed the right process for me (Mozilla derivative leaked
masses of memory over long period).
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgppqV21TxiXq.pgp
Description: PGP signature


Re: [HACKERS] Tuning current tuplesort external sort code for 8.2

2005-10-04 Thread Martijn van Oosterhout
On Tue, Oct 04, 2005 at 12:37:51AM +0100, Simon Riggs wrote:
 On Mon, 2005-10-03 at 23:25 +0200, Martijn van Oosterhout wrote:
  Please note: if inlineApplySortFunction() is actually inlined (it isn't
  by default)
 
 Can you explain your last post some more? That's not what I get.

The inline keyword is just a flag that you would like the compiler to
inline it. GCC will decide itself. It has a limit on the size of
functions that it will consider for inlining.

Quoting the gcc manual:

-finline-limit-N
 By default, gcc limits the size of functions that can be inlined.
 This flag allows the control of this limit for functions that are
 explicitly marked as inline (ie marked with the inline keyword or
 defined within the class definition in c++).  N is the size of
 functions that can be inlined in number of pseudo instructions
 (not counting parameter handling).  

It goes on to say that the limit is 10000 for gcc 2.95, but if you
examine the manual for gcc 3.3 it has the limit at 600. So it's
entirely possible that at the time the person wrote that code, it *was*
being inlined, but it sure isn't on some versions of some compilers. I
experimented and found that -finline-limit-1500 causes it to start
inlining.

The -Winline flag causes gcc to print a warning if you specified
inline but gcc didn't inline it, for whatever reason.

Hopefully this clears that up.
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpSmmXmACi5l.pgp
Description: PGP signature


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Martijn van Oosterhout
On Mon, Oct 03, 2005 at 10:51:32PM +0100, Simon Riggs wrote:
  Basically, I recommend adding -Winline -finline-limit-1500 to the
  default build while we discuss other options.
 
 I add -Winline but get no warnings. Why would I use -finline-limit-1500?
 
 I'm interested, but uncertain as to what difference this makes. Surely
 using -O3 works fine?

Different versions of gcc have different ideas of when a function can
be inlined. From my reading of the documentation, this decision is
independent of optimisation level. Maybe your gcc version has a limit
higher than 1500 by default.
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgppgbguCvZK1.pgp
Description: PGP signature


Re: [HACKERS] Vacuum and Transactions

2005-10-04 Thread Simon Riggs
On Tue, 2005-10-04 at 00:26 -0400, Rod Taylor wrote:
 As I understand it vacuum operates outside of the regular transaction
 and if you stop it (SIGTERM, or pg_cancel_backend()) some of the work it
 accomplished will be kept when it rolls back.
 
 For large structures with a ton of dead entries (of which I seem to have
 a case), running vacuum takes long enough that high-churn structures
 begin to experience difficulties.
 
 Is it reasonable to cancel and restart the vacuum process periodically
 (say every 12 hours) until it manages to complete the work? It takes
 about 2 hours to do the table scan, and should get in about 10 hours of
 index work each round.

That is what I've had to recommend in extreme cases, with some success. 

For a non-FULL VACUUM, all of the database changes it does will be kept,
though that is not the only cost, as you indicate.  However, you're
right to question it since it does have some downsides like not
correctly updating statistics at the end of the run.

I wouldn't try this with VACUUM FULL. In that case, I'd VACUUM first,
then when all dead-rows are gone go for the VACUUM FULL; but I would
find another way round that, like a CTAS.

The problem is that VACUUM doesn't emit enough messages for you to know
when it gets to the end of each phase, so you've not much clue about how
much of that 12 hours would be wasted. Though as you say, it seems
likely that much of it is worthwhile in the situation you describe.

The tipping point is when VACUUM finds more dead rows than fit within
maintenance_work_mem/(size of row pointer). That's when we start to do
multiple passes over each of the indexes.
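
(For a sense of scale, assuming the 8.x lazy vacuum remembers each dead
row as a 6-byte ItemPointerData: with maintenance_work_mem set to 1GB,
that is roughly 1GB / 6 bytes, i.e. somewhere around 180 million dead
row pointers, before a second pass over the indexes becomes necessary.)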

Maybe it would be good to have a VACUUM max-one-pass only command, to
allow you to break big VACUUMs down into smaller chunks. Or perhaps we
should have a trace_vacuum command as well to allow you to see where to
cancel it? (Put notices in lazy_vacuum_index and lazy_vacuum_heap).

Hope that helps.

Best Regards, Simon Riggs



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Tuning current tuplesort external sort code for 8.2

2005-10-04 Thread Martijn van Oosterhout
On Tue, Oct 04, 2005 at 11:55:58AM +0200, Martijn van Oosterhout wrote:
 It goes on to say that the limit is 10000 for gcc 2.95, but if you
 examine the manual for gcc 3.3 it has the limit at 600. So it's
 entirely possible that at the time the person wrote that code, it *was*
 being inlined, but it sure isn't on some versions of some compilers. I
 experimented and found that -finline-limit-1500 causes it to start
 inlining.

From searching the web, it appears the inline limit was dropped from
10000 to 600 between gcc 3.0.0 and 3.0.1 in response to complaints
about gcc memory usage. Any function that could be inlined needed to be
kept in memory in semi-compiled form, and in C++, where lots of
inlinable functions call each other, the memory usage blew up completely.

The difference between -O2 and -O3 is that the latter will consider any
function for inlining, even if you didn't ask. For C programs that
basically means any function declared static. Also, the number is
pseudo-instructions and their meaning can change from version to
version.

Since we're pretty much relying on gcc to inline for performance, I
still think we should add -Winline by default so we can tell when it's
not doing what we want.

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpkTvFoZ4PXA.pgp
Description: PGP signature


[HACKERS] New pg_config behavior

2005-10-04 Thread Peter Eisentraut
I have some doubts about the new pg_config behavior of printing "not recorded"
when an item is not known.  I doubt that this will cause automated users of 
this program to behave reasonably in error scenarios.  I think a more 
reasonable behavior would be to use a non-zero exit status and possibly print 
a message to stderr and nothing to stdout.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] memory bug debugging

2005-10-04 Thread Markus Schiltknecht
Hello hackers,

I'm fiddling around with the backend and have created a memory bug. I
guess I'm overwriting a palloc'd chunk, but can't figure out where.

The backend's report is:

WARNING:  problem in alloc set MessageContext: req size > alloc size for
chunk 0x8383ca8 in block 0x83834a0

Investigating a little I have found out that chunk size is 8, but
requested size is: 1912602626  (more than 1GB???). This certainly
doesn't make sense.

What tools or debugging options exist to find memory leaks / overwrites?
I've read the utils/mmgr/README, which leads me to the conclusion that I
cannot use valgrind or similar tools.

It would be very helpful to know what chunk it is, or better: where it
got allocated.

Thank you for your help

Markus



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Tuning current tuplesort external sort code for 8.2

2005-10-04 Thread Simon Riggs
On Tue, 2005-10-04 at 12:37 +0200, Martijn van Oosterhout wrote:

 Since we're pretty much relying on gcc to inline for performance, I
 still think we should add -Winline by default so we can tell when it's
 not doing what we want.

Very much agree to this. Can we put that in the build for 8.1 please?

People need to know we expect certain code to be inlined and they need
warnings when this does not occur.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Simon Riggs
On Tue, 2005-10-04 at 12:04 +0200, Martijn van Oosterhout wrote:
 On Mon, Oct 03, 2005 at 10:51:32PM +0100, Simon Riggs wrote:
   Basically, I recommend adding -Winline -finline-limit-1500 to the
   default build while we discuss other options.
  
  I add -Winline but get no warnings. Why would I use -finline-limit-1500?
  
  I'm interested, but uncertain as to what difference this makes. Surely
  using -O3 works fine?

How did you determine the 1500 figure? Can you give some more info to
surround that recommendation to allow everybody to evaluate it?

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] memory bug debugging

2005-10-04 Thread Martijn van Oosterhout
On Tue, Oct 04, 2005 at 01:11:41PM +0200, Markus Schiltknecht wrote:
 Hello hackers,
 
 I'm fiddling around with the backend and have created a memory bug. I
 guess I'm overriding a palloced chunk, but can't figure out where.

There are some defines (MEMORY_CONTEXT_CHECKING and
CLOBBER_FREED_MEMORY) which can help find leaks. Valgrind works too,
with a bit of attention. I was going to add some directives to allow
Valgrind to handle PostgreSQL's memory allocator, but haven't got there
yet.

-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpM1s4YvBPVo.pgp
Description: PGP signature


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Martijn van Oosterhout
On Tue, Oct 04, 2005 at 12:24:54PM +0100, Simon Riggs wrote:
 How did you determine the 1500 figure? Can you give some more info to
 surround that recommendation to allow everybody to evaluate it?

[EMAIL PROTECTED]:~/dl/cvs/pgsql-local/src/backend/utils/sort$ gcc 
-finline-limit-1000 -Winline -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wendif-labels -fno-strict-aliasing -g -I../../../../src/include -D_GNU_SOURCE  
 -c -o tuplesort.o tuplesort.c
tuplesort.c: In function 'applySortFunction':
tuplesort.c:1833: warning: inlining failed in call to 'inlineApplySortFunction'
tuplesort.c:1906: warning: called from here
tuplesort.c: In function 'comparetup_heap':
tuplesort.c:1833: warning: inlining failed in call to 'inlineApplySortFunction'
tuplesort.c:1937: warning: called from here
tuplesort.c: In function 'comparetup_index':
tuplesort.c:1833: warning: inlining failed in call to 'inlineApplySortFunction'
tuplesort.c:2048: warning: called from here
tuplesort.c: In function 'comparetup_datum':
tuplesort.c:1833: warning: inlining failed in call to 'inlineApplySortFunction'
tuplesort.c:2167: warning: called from here
[EMAIL PROTECTED]:~/dl/cvs/pgsql-local/src/backend/utils/sort$ gcc 
-finline-limit-1500 -Winline -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wendif-labels -fno-strict-aliasing -g -I../../../../src/include -D_GNU_SOURCE  
 -c -o tuplesort.o tuplesort.c
no warnings

A quick binary search puts the cutoff between 1200 and 1300. Given
version variation I picked a nice round number, 1500.

Ugh, that's for -O2, for -O3 and above it needs to be 4100 to work.
Maybe we should go for 5000 or so.

I'm using: gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpadXPF53tUp.pgp
Description: PGP signature


Re: [HACKERS] New Point Releases Available

2005-10-04 Thread Dave Page
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc 
 G. Fournier
 Sent: 04 October 2005 02:34
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] New Point Releases Available
 
 
 Just bundled up 7.3.11, 7.4.9 and 8.0.4 ... please look them over and
 make sure they look okay ... will announce late tomorrow to -announce ...

Website docs updated, Windows version of 8.0.4 bundled and uploaded, ftp
site symlinks updated.

Might take a day or so for the latter two to hit all the mirrors of
course...

Regards, Dave


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] New pg_config behavior

2005-10-04 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 I have some doubts about the new pg_config behavior of printing not 
 recorded 
 when an item is not known.  I doubt that this will cause automated users of 
 this program to behave reasonably in error scenarios.  I think a more 
 reasonable behavior would be to use a non-zero exit status and possibly print 
 a message to stderr and nothing to stdout.

Good point, although in practice that code only gets compiled in builds
with brain-dead Windows tools; I rather doubt that the resulting executable
would ever get used as you foresee.  But if you want to change it, OK
with me.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Ron Peacetree
The constants related to inlining involve pcode, not actual assembly 
instructions, and are compiler version dependent as well as subject to change 
without notice by the GNU folks...

from:
http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/Optimize-Options.html#Optimize-Options

-finline-limit=n
By default, gcc limits the size of functions that can be inlined. This flag 
allows the control of this limit for functions that are explicitly marked as 
inline (i.e., marked with the inline keyword or defined within the class 
definition in c++). n is the size of functions that can be inlined in number of 
pseudo instructions (not counting parameter handling). The default value of n 
is 600. Increasing this value can result in more inlined code at the cost of 
compilation time and memory consumption. Decreasing usually makes the 
compilation faster and less code will be inlined (which presumably means slower 
programs). This option is particularly useful for programs that use inlining 
heavily such as those based on recursive templates with C++.

Inlining is actually controlled by a number of parameters, which may be 
specified individually by using --param name=value. The -finline-limit=n option 
sets some of these parameters as follows:

max-inline-insns
is set to n.
max-inline-insns-single
is set to n/2.
max-inline-insns-auto
is set to n/2.
min-inline-insns
is set to 130 or n/4, whichever is smaller.
max-inline-insns-rtl
is set to n. 

Using -finline-limit=600 thus results in the default settings for these 
parameters. See below for a documentation of the individual parameters 
controlling inlining.

Note: pseudo instruction represents, in this particular context, an abstract 
measurement of function's size. In no way, it represents a count of assembly 
instructions and as such its exact meaning might change from one release to an 
another. 

Further Down It Says...

--param name=value
In some places, GCC uses various constants to control the amount of 
optimization that is done. For example, GCC will not inline functions that 
contain more that a certain number of instructions. You can control some of 
these constants on the command-line using the --param option.

The names of specific parameters, and the meaning of the values, are tied to 
the internals of the compiler, and are subject to change without notice in 
future releases.

In each case, the value is an integer. The allowable choices for name are given 
in the following table:

snip

max-inline-insns-single
Several parameters control the tree inliner used in gcc. This number sets the 
maximum number of instructions (counted in gcc's internal representation) in a 
single function that the tree inliner will consider for inlining. This only 
affects functions declared inline and methods implemented in a class 
declaration (C++). The default value is 300.

max-inline-insns-auto
When you use -finline-functions (included in -O3), a lot of functions that 
would otherwise not be considered for inlining by the compiler will be 
investigated. To those functions, a different (more restrictive) limit compared 
to functions declared inline can be applied. The default value is 300.

max-inline-insns
The tree inliner does decrease the allowable size for single functions to be 
inlined after we already inlined the number of instructions given here by 
repeated inlining. This number should be a factor of two or more larger than 
the single function limit. Higher numbers result in better runtime performance, 
but incur higher compile-time resource (CPU time, memory) requirements and 
result in larger binaries. Very high values are not advisable, as too large 
binaries may adversely affect runtime performance. The default value is 600.

max-inline-slope
After exceeding the maximum number of inlined instructions by repeated 
inlining, a linear function is used to decrease the allowable size for single 
functions. The slope of that function is the negative reciprocal of the number 
specified here. The default value is 32.

min-inline-insns
The repeated inlining is throttled more and more by the linear function after 
exceeding the limit. To avoid too much throttling, a minimum for this function 
is specified here to allow repeated inlining for very small functions even when 
a lot of repeated inlining already has been done. The default value is 130.

max-inline-insns-rtl
For languages that use the RTL inliner (this happens at a later stage than tree 
inlining), you can set the maximum allowable size (counted in RTL instructions) 
for the RTL inliner with this parameter. The default value is 600.


-Original Message-
From: Martijn van Oosterhout kleptog@svana.org
Sent: Oct 4, 2005 8:24 AM
To: Simon Riggs [EMAIL PROTECTED]
Cc: Tom Lane [EMAIL PROTECTED], Ron Peacetree [EMAIL PROTECTED], 
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External 

Re: [HACKERS] memory bug debugging

2005-10-04 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Tue, Oct 04, 2005 at 01:11:41PM +0200, Markus Schiltknecht wrote:
 I'm fiddling around with the backend and have created a memory bug. I
 guess I'm overriding a palloced chunk, but can't figure out where.

 There are some defines (MEMORY_CONTEXT_CHECKING and
 CLOBBER_FREED_MEMORY) which can help find leaks.

Those are on by default if you built with --enable-cassert (which you
should surely always do when testing new backend code).

Best idea that comes to mind is:

(1) make sure you have a reproducible scenario in which the clobber
always happens at the same physical location.  (This shouldn't be too
hard if you start a fresh backend for each trial.)

(2) run it under gdb and set a hardware watchpoint at that location.

You'll get some false matches from touches in the aset.c code, but
hopefully not too many.  The first breakpoint outside aset.c is your
problem.
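
For instance (purely illustrative, reusing the chunk address from your
report; substitute whatever address and pid your own run shows):

    $ gdb /path/to/postgres 12345        # attach to the backend under test
    (gdb) watch *(long *) 0x8383ca8      # hardware watchpoint on the clobbered spot
    (gdb) continue                       # run until something writes there
    (gdb) backtrace                      # if we stop outside aset.c, that's the culprit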

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] \d on database with a lot of tables is slow

2005-10-04 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 02:00:12PM -0400, Tom Lane wrote:
 I wrote:
  It's presumably mostly in the pg_table_is_visible() calls.
 
 I did some profiling on a test case with 10000 tables, and noticed that
 a big part of the problem is that the catalog caches become entirely
 useless: almost every catcache lookup ends up going to the underlying
 tables.  This is because MAXCCTUPLES in catcache.c is fixed at 5000,
 and that's not an adequate working set for this many tables.  If you
 are willing to throw memory at the problem, you could try increasing
 MAXCCTUPLES (to say 50K or 100K) and see if that helps.

Out of curiosity... does catcache cache all pg_* tables? Also, at what
point would it be good to up NCCBUCKETS?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 A quick binary search puts the cutoff between 1200 and 1300. Given
 version variation I picked a nice round number, 1500.

 Ugh, that's for -O2, for -O3 and above it needs to be 4100 to work.
 Maybe we should go for 5000 or so.

 I'm using: gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)

I don't know what the units of this number are, but it's apparently far
too gcc-version-dependent to consider putting into our build scripts.
Using gcc version 4.0.1 20050727 (current Fedora Core 4 compiler) on
i386, and compiling tuplesort.c as you did, I find:
-O2: warning goes away between 800 and 900
-O3: warning is always there (tried values up to 1000)
(the latter behavior may indicate a bug, not sure).

What's even more interesting is that the warning does not appear in
either case if I omit -finline-limit --- so the default value is plenty.

At least on this particular compiler, the proposed switch would be
counterproductive.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump versioning

2005-10-04 Thread Jim C. Nasby
On Mon, Oct 03, 2005 at 12:11:49AM -0400, Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Jim C. Nasby wrote:
  Watching the discussion about how to handle nextval() and keep it dump
  compatible makes me wonder: would it be useful to encode database or
  dump version info into dumps?
 
  If we ever get to a case where we _need_ to use it, it would be good to
  have, just in case.
 
 The trouble is that it won't help you until years after you invest
 the work --- invariably, the version info you wish you had is for
 distinguishing between different *old* releases.

True, but that will always be the case until someone adds it. And in
this particular case, ISTM it shouldn't require much effort to add the
info. Plus, it would certainly be useful for users to be able to examine
a dump and find out what exact version of the database/pgdump was used
to produce it. This would also allow for pg_restore or even a dump piped
into psql to throw a warning or error if being fed a dump version it
might not be able to handle (ie: feeding a newer dump to an older
version).

 I'm not real excited about it myself.  My own design experience says
 that version numbers aren't that hot as a way of determining "does this
 data have the X nature?".  If you can't test for the problem directly,
 you haven't thought hard enough.

True, but at least if the info is made available now it could be used
down the road if needed.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Simon Riggs
On Tue, 2005-10-04 at 16:30 +0200, Martijn van Oosterhout wrote:
 On Tue, Oct 04, 2005 at 10:06:24AM -0400, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
   I'm using: gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
  
  I don't know what the units of this number are, but it's apparently far
  too gcc-version-dependent to consider putting into our build scripts.
  Using gcc version 4.0.1 20050727 (current Fedora Core 4 compiler) on
  i386, and compiling tuplesort.c as you did, I find:
  -O2: warning goes away between 800 and 900
  -O3: warning is always there (tried values up to 1000)
  (the latter behavior may indicate a bug, not sure).

 Fascinating. The fact that the warning goes away if you don't specify
 -finline-limit seems to indicate they've gotten smarter. Or a bug.
 We'd have to check the asm code to see if it's actually inlined or
 not.

I've been using gcc 3.4 and saw no warning when using either -Winline
or -O3 -Winline.

Martijn, at the moment it sounds like this is a feature that we no
longer need to support - even if we should have done for previous
releases.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 1. Add -Winline so we can at least be aware of when it's (not) happening.

Yeah, I agree with that part, just not with adding a fixed -finline-limit
value.

While on the subject of gcc warnings ... if I touch that code, I want to
remove -Wold-style-definition from the default flags, too.  It's causing
much more clutter than it's worth, because all the flex files generate
several such warnings.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] effective SELECT from child tables

2005-10-04 Thread Jim C. Nasby
Is there enough for a TODO here?

On Mon, Oct 03, 2005 at 11:24:30PM -0400, Greg Stark wrote:
 
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  On P, 2005-10-02 at 23:00 -0400, Tom Lane wrote:
   
   Here's another interesting case to think about:
   
 ALTER TABLE ADD foo integer DEFAULT 1
 ...
 ALTER TABLE ALTER foo SET DEFAULT 2
  
   You'll have to pay the table-traversal cost on one step or the other.
  
  The second, ALTER ... SET DEFAULT, would only set default for newly
  inserted columns, not the ones which are missing due to tuples being
  created before the column existed.
 
 Hm. So you're saying there are only ever exactly two types of defaults. The
 initial default that applies to all tuples that were created before the
 column was added. And the current default that only ever applies to newly
 created tuples.
 
 That does seem to cleanly close this hole.
 
 -- 
 greg
 
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Vacuum and Transactions

2005-10-04 Thread Rod Taylor
  Is it reasonable to cancel and restart the vacuum process periodically
 
 No.
 
 How big is that table, anyway?  Are you trying a VACUUM FULL, or plain
 vacuum?

It's only about 60GB in size, but it appears to have been missed by
vacuum for nearly a month and probably has a large percentage of dead
material (30% or so).

I'm trying to run a plain vacuum and the Pg version is 8.0.3.

Building indexes on this structure also takes a significant amount of
time. I have maintenance_work_mem set to about 1GB in size.

The catch is that there are some other very active structures (like
pg_listener for Slony) which after a couple of hours without vacuuming
will quickly have the DB at an unreasonably high load (low tens) which
seems to all but halt the vacuum on the large structure.



Rightfully the table should be partitioned by time, but I haven't quite
figured out how to delete data from the old structure without increasing
the vacuum time required.

Another alternative I'm considering is to create new partial indexes for
the active structures, drop all of the old full table indexes and run
vacuum on that, then partition it.
-- 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Jim C. Nasby
On Sun, Oct 02, 2005 at 09:46:07PM -0400, Tom Lane wrote:
 Fredrik Olsson [EMAIL PROTECTED] writes:
  To allow indexes to be inherited so unique, foreign keys and such works 
  properly with inheritance has been on the todo for quite some time. I 
  thought that most probably it is a very non trivial thing, perhaps 
  completely rethinking how indexes are done.
 
 Yup, you're right.  There are a couple of major problems, to my mind:
 
 1. A cross-table index would need to store a table OID as well as the
 existing block/offset information in order to tell you what an entry is
 pointing at.  An extra 4 bytes per index entry (8 bytes if MAXALIGN is
 8) is a lot of overhead, so you'd not want to pay that all the time.
 Which means two index tuple header formats to support, which looks
 painful.  How can that be handled cleanly and efficiently?

Wouldn't it make more sense to use a smaller pointer to a table of OIDs
that that index covers? I don't know off-hand how much padding there
currently is in index tuples, but hopefully this would allow bringing
the space usage under control for common cases involving less than a few
dozen tables.

Another possibility is optimizing for the special case of indexing on a
partitioning key. In this case, index values would be very localized to
one table, so just storing the table info on each index page (or
something similar) would work well.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Martijn van Oosterhout
On Tue, Oct 04, 2005 at 10:06:24AM -0400, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  I'm using: gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
 
 I don't know what the units of this number are, but it's apparently far
 too gcc-version-dependent to consider putting into our build scripts.
 Using gcc version 4.0.1 20050727 (current Fedora Core 4 compiler) on
 i386, and compiling tuplesort.c as you did, I find:
   -O2: warning goes away between 800 and 900
   -O3: warning is always there (tried values up to 1000)
 (the latter behavior may indicate a bug, not sure).

Fascinating. The fact that the warning goes away if you don't specify
-finline-limit seems to indicate they've gotten smarter. Or a bug.
We'd have to check the asm code to see if it's actually inlined or
not.

Two options:
1. Add -Winline so we can at least be aware of when it's (not) happening.
2. If we can't get gcc to reliably inline, maybe we need to consider
other options?

In particular, move the isNull test statements out since they are ones
the optimiser can use to best effect.

And if we put in -Winline, it would be visible to users while
compiling so they can tweak their own build options (if they care).
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgppKVdbv5luV.pgp
Description: PGP signature


Re: [HACKERS] Vacuum and Transactions

2005-10-04 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 As I understand it vacuum operates outside of the regular transaction

It's a perfectly normal transaction.

 Is it reasonable to cancel and restart the vacuum process periodically

No.

How big is that table, anyway?  Are you trying a VACUUM FULL, or plain
vacuum?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] effective SELECT from child tables

2005-10-04 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 07:59:11PM +0100, Simon Riggs wrote:
 On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote:
  To clarify, this is a hard-coded implementation of what I'm asking for:
  http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell:
  
  CREATE TABLE log_other (
  project_id  smallint NOT NULL
  ...
  )
  
  CREATE TABLE log_8 (
  -- No project_id
  ...
  )
  CREATE TABLE log_24, log_25, log_5...
  CREATE VIEW log AS
  SELECT * FROM log_other
  UNION ALL SELECT 8 AS project_id, * FROM log_8
  ...
  
  So the end result is that for cases where project_id is 5, 8, 24, or 25,
  the data will be stored in tables that don't have the project_id.
  
  If I were to use this on the main table for
  http://stats.distributed.net, which has ~130M rows, I would be able to
  save 130M*4 bytes (4 instead of 2 due to alignment), or 520MB. The logdb
  will have many times that number of rows, so the savings will be even
  larger.
  
  Note that this technique wouldn't help at all for something like date
  partitioning, because you have to store the date in the partitioned
  table.
 
 Jim,
 
 Your idea was noted before and actually; I mentioned it to show that I
 listen and take note of ideas from any source.
 
 For everybody, I would note that the current behaviour is exactly the
 way that List Partitioning works on other systems.
 
 The cost of this technique is only paid if you choose to partition on
 something that you would not otherwise have included in your table. In
 many cases, you'll choose a column that would have been in the table if
 you created one big table so the additional cost is zero.

Well, the idea is to be more space efficient than if one big table was
used. This is unique to this class of partitioning problems.

 In your example, I would expect to see project_id in a superclass table
 and so there would be no cost.

Superclass table?

 The idea is neat, but IMHO the potential saving of this idea is not big
 enough for me to prioritise that very highly over other items at this
 time.

Certainly. I only chimed in with a specific example so people could
better understand what the idea was. I know it's on the list and might
be addressed at some point. In the mean time it's not too horrible to
hard-code a solution.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Martijn van Oosterhout
On Tue, Oct 04, 2005 at 03:56:53PM +0100, Simon Riggs wrote:
 I've been using gcc 3.4 and saw no warning when using either -Winline
 or -O3 -Winline.

Ok, I've just installed 3.4 and verified that. I examined the asm code
and gcc is inlining it. I concede, at this point just throw in -Winline
and monitor the situation.

As an aside, the *_getattr calls end up a bit suboptimal though. It's
producing code like:

  cmp attlen, 4
  je $elsewhere1
  cmp attlen, 2
  je $elsewhere2
  ld byte
here:

--- much later ---
elsewhere1:
  ld integer
  jmp $here
elsewhere2:
  ld short
  jmp $here

No idea whether we want to go down the path of hinting to gcc which
size will be the most common.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgp4B0wVNkzsI.pgp
Description: PGP signature


Re: [HACKERS] Vacuum and Transactions

2005-10-04 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 The catch is that there are some other very active structures (like
 pg_listener for Slony) which after a couple of hours without vacuuming
 will quickly have the DB at an unreasonably high load (low tens) which
 seems to all but halt the vacuum on the large structure.

Yeah.  We desperately need to reimplement listen/notify :-( ... that
code was never designed to handle high event rates.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Sun, Oct 02, 2005 at 09:46:07PM -0400, Tom Lane wrote:
 1. A cross-table index would need to store a table OID as well as the
 existing block/offset information in order to tell you what an entry is
 pointing at.

 Wouldn't it make more sense to use a smaller pointer to a table of OIDs
 that that index covers?

Smaller than what?  Don't tell me you want to restrict how many tables a
cross-table index can handle :-(

In any case, the gain from doing that would be exactly zero because of
alignment considerations: the size of an index tuple header really has
to be a multiple of MAXALIGN.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Zeugswetter Andreas DAZ SD

 Another possibility is optimizing for the special case of 
 indexing on a partitioning key. In this case, index values 
 would be very localized to one table, so just storing the 
 table info on each index page (or something similar) would work well.

If you have the partitioning key in the index and the partitions don't
overlap, it is better to create separate [unique] indexes on the
subtables.
Building separate indexes per partition is usually preferred because of:
1. performance of dropping a partition
2. smaller index for CE

Things usually only get ugly if you need an ORDER BY, without a sort
step, that spans more than one partition. Imho the best solution would
be a merge node that merges the results of several index accesses to
avoid a sort and still use separate indexes. Such a merge node could
probably also detect the case where accessing partitions in a certain
order still produces ordered results.

Usually you will only want the one big unique index when the
partitioning is not reflectable in the index keys, and then (also in
other db's) such an index is usually a pain ...

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Jim C. Nasby
On Tue, Oct 04, 2005 at 11:05:49AM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  On Sun, Oct 02, 2005 at 09:46:07PM -0400, Tom Lane wrote:
  1. A cross-table index would need to store a table OID as well as the
  existing block/offset information in order to tell you what an entry is
  pointing at.
 
  Wouldn't it make more sense to use a smaller pointer to a table of OIDs
  that that index covers?
 
 Smaller than what?  Don't tell me you want to restrict how many tables a
 cross-table index can handle :-(
 
 In any case, the gain from doing that would be exactly zero because of
 alignment considerations: the size of an index tuple header really has
 to be a multiple of MAXALIGN.

Hrm, I see that IndexTupleData doesn't have room 'left over' like
HeapTupleData does. If it did, it would probably be a win to allow for
indexes that are on less than X number of tables, where X is whatever
value we can fit into the tuple header. Since this could be exceeded at
any time, we'd also need a flag to indicate that a given tuple is for a
table that's not in the lookup table and that an actual OID is stored.

Given that that's not (currently) the case, it seems that the unused bit
could be used to indicate if the tuple was for a table other than the
one the index was originally created on. That would allow for adding a
table to an existing index without re-writing the entire thing. It could
also provide some speed improvement in cases where the table the index
was defined on contained the majority of the data, but that's a pretty
far-out corner case.

Of course, this is all academic performance tuning until we actually
have cross-table indexes.

Does that 'just' leave locking as the issue? I think cross-table indexes
are going to become a lot more important as our partitioning support
increases, so it would be good if this could get done for 8.2 (I think
it's on Simon's list right now).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Jim C. Nasby
Well, I never said unique, but you're correct, it's pretty undesirable
to put a global index on your partitioning key.

On Tue, Oct 04, 2005 at 06:16:21PM +0200, Zeugswetter Andreas DAZ SD wrote:
 
  Another possibility is optimizing for the special case of 
  indexing on a partitioning key. In this case, index values 
  would be very localized to one table, so just storing the 
  table info on each index page (or something similar) would work well.
 
 If you have the partitioning key in the index and the partitions don't
 overlap, it is better to create separate [unique] indexes on the
 subtables.
 Building separate indexes per partition is usually preferred because of:
 1. performance of dropping a partition
 2. smaller index for CE
 
 Things usually only get ugly if you need an ORDER BY, without a sort
 step, that spans more than one partition. Imho the best solution would
 be a merge node that merges the results of several index accesses to
 avoid a sort and still use separate indexes. Such a merge node could
 probably also detect the case where accessing partitions in a certain
 order still produces ordered results.
 
 Usually you will only want the one big unique index when the
 partitioning is not reflectable in the index keys, and then (also in
 other db's) such an index is usually a pain ...
 
 Andreas
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread mark
On Tue, Oct 04, 2005 at 05:23:41PM +0200, Martijn van Oosterhout wrote:
 On Tue, Oct 04, 2005 at 03:56:53PM +0100, Simon Riggs wrote:
  I've been using gcc 3.4 and saw no warning when using either -Winline
  or -O3 -Winline.
 Ok, I've just installed 3.4 and verified that. I examined the asm code
 and gcc is inlining it. I concede, at this point just throw in -Winline
 and monitor the situation.
 As an aside, the *_getattr calls end up a bit suboptimal though. It's
 producing code like:
   cmp attlen, 4
   je $elsewhere1
   cmp attlen, 2
   je $elsewhere2
   ld byte
 here:
 --- much later ---
 elsewhere1:
   ld integer
   jmp $here
 elsewhere2:
   ld short
   jmp $here
 No idea whether we want to go down the path of hinting to gcc which
 size will be the most common.

If it will very frequently be one value, and not the other values, I
don't see why we wouldn't want to hint? #ifdef it to expand to just the
expression if not using GCC (see the sketch below). It's important that
we know that the value would almost always be a certain value, however,
as GCC will try to make the path for the expected value as fast as
possible, at the cost of an unexpected value being slower.

  __builtin_expect (long EXP, long C)

 You may use `__builtin_expect' to provide the compiler with branch
 prediction information.  In general, you should prefer to use
 actual profile feedback for this (`-fprofile-arcs'), as
 programmers are notoriously bad at predicting how their programs
 actually perform.  However, there are applications in which this
 data is hard to collect.

 The return value is the value of EXP, which should be an integral
 expression.  The value of C must be a compile-time constant.  The
 semantics of the built-in are that it is expected that EXP == C.
 For example:

  if (__builtin_expect (x, 0))
foo ();

 would indicate that we do not expect to call `foo', since we
 expect `x' to be zero.  Since you are limited to integral
 expressions for EXP, you should use constructions such as

  if (__builtin_expect (ptr != NULL, 1))
error ();

 when testing pointer or floating-point values.
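
For example, a hedged sketch of how the hint could be wrapped (the
likely()/unlikely() macro names are just the usual convention, and
fetch_att_hinted() is a made-up illustration, not existing code):

#include "postgres.h"

#ifdef __GNUC__
#define likely(x)    __builtin_expect((x) != 0, 1)
#define unlikely(x)  __builtin_expect((x) != 0, 0)
#else
/* on non-GCC compilers, collapse to the bare expression */
#define likely(x)    (x)
#define unlikely(x)  (x)
#endif

/* Fetch a pass-by-value attribute, telling gcc that the 4-byte case is
 * expected to dominate -- an assumption that would need measuring first. */
static Datum
fetch_att_hinted(const char *ptr, int attlen)
{
    if (likely(attlen == 4))
        return Int32GetDatum(*(const int32 *) ptr);
    if (attlen == 2)
        return Int16GetDatum(*(const int16 *) ptr);
    return CharGetDatum(*ptr);
}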

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] [HACKERS] Query in SQL statement

2005-10-04 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 12:51:08PM -0700, Roger Hand wrote:
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] Behalf Of Jim C. Nasby
  Sent: Friday, September 30, 2005 4:49 PM
  Subject: Re: [PERFORM] [HACKERS] Query in SQL statement
  
  I suggest ditching the CamelCase and going with underline_separators.
  I'd also not use the bareword id, instead using bad_user_id. And I'd
  name the table bad_user. But that's just me. :)
 
 I converted a db from MS SQL, where tables and fields were CamelCase, and 
 just lowercased the ddl to create the tables.
 
 So table and fields names were all created in lowercase, but I didn't have to 
 change
 any of the application code: the SELECT statements worked fine with mixed 
 case.
 
 -- sample DDL
 CREATE TABLE testtable
 (
   fieldone int4
 ) 
 insert into TestTable (fieldone) values (11);

That will usually work (see Tom's reply), but fieldone is a heck of a
lot harder to read than field_one. But like I said, these are the coding
conventions I've found work well; YMMV.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Inherited indexes.

2005-10-04 Thread Simon Riggs
On Tue, 2005-10-04 at 18:16 +0200, Zeugswetter Andreas DAZ SD wrote:
  Another possibility is optimizing for the special case of 
  indexing on a partitioning key. In this case, index values 
  would be very localized to one table, so just storing the 
  table info on each index page (or something similar) would work well.
 
 If you have the partitioning key in the index and the partitions don't
 overlap, it is better to create separate [unique] indexes on the
 subtables.
 Building separate indexes per partition is usually preferred because of:
 1. performance of dropping a partition
 2. smaller index for CE

...

 Imho the best solution would be a merge node that merges the results of
 several index accesses to avoid a sort and still use separate indexes.
 Such a merge node could probably also detect the case where accessing
 partitions in a certain order still produces ordered results.

Yes, that was my conclusion also.

There are a number of intermediate steps along the way, so it will take
some time to achieve it.

 Usually you will only want the one big unique index when the
 partitioning is not reflectable in the index keys, and then (also in
 other db's) such an index is usually a pain ...

Agreed^2. The idea of a global index is a non-starter for all of the
reasons that Tom gave and the main one: it's unusably huge. There's no
point in partitioning a 1TB table if you then have to build a 500GB
index on it. The tree would be so deep that each insert would require
maybe 3 I/Os on index branch blocks before you got to the leaf. Insert
performance would suck real bad, which is a blocker since if you have a
large table you almost certainly have a lot of data to load. If you
don't have a big table you shouldn't be partitioning it anyway. 

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Announcing Veil

2005-10-04 Thread Marc Munro
This is to announce the first, Alpha, release of Veil, a database
security add-on for Postgres. It allows databases to be created with
efficient row-level access controls; different users have access to
different subsets of data.

Veil is hosted on pgfoundry at http://veil.projects.postgresql.org/
Documentation is here:
http://veil.projects.postgresql.org/curdocs/index.html

If there is sufficient interest, I would ultimately like Veil to be
added to contrib - we will have to see how things pan out.

I am announcing this to the postgres hackers list as I hope for review
comments from a qualified community.  All comments will be welcomed.

If you would like somewhere controversial to start your review, take a
look at veil_shmem.c which hooks into the postgres shared memory
subsystem.  Since I was unable to dynamically assign a LWLock using
LWLockAssign (none available), I have fairly arbitrarily overloaded the
use of existing LWLocks.  When the flames die down perhaps we can
discuss making a small number (one would be enough for me) of LWLocks
available.

Thank you.

__
Marc Munro


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Announcing Veil

2005-10-04 Thread Tom Lane
Marc Munro [EMAIL PROTECTED] writes:
 Since I was unable to dynamically assign a LWLock using
 LWLockAssign (none available), I have fairly arbitrarily overloaded the
 use of existing LWLocks.  When the flames die down perhaps we can
 discuss making a small number (one would be enough for me) of LWLocks
 available.

Perhaps you missed the comment in NumLWLocks()?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] prefix btree implementation

2005-10-04 Thread Qingqing Zhou

I am not sure if this idea was mentioned before.

The basic prefix btree idea is quite straightforward, i.e., try to
compress the key items within a data page by sharing the common prefix.
Thus the fanout of the page is increased, and the theoretical benefit is
obvious.

Consider the following cases: 1/ Multi-attributes index, example page
looks like this:
[1, 1, 'abc'], [1, 1, 'bde'], ..., [1, 22, 'mmk']
- So [1, 1] or [1] is compressible

2/ Index on string, example page looks like this:
['aaab'] ['aaac'], ..., ['aabc']
- So ['aaa'] or ['aa'] is compressible

3/ Both 1 and 2, example page looks like this:
[1, 'abc', 1], [1, 'abc', 2], ..., [1, 'abk', 1]
- So [1, 'abc'] or [1, 'ab'] is compressible

4/ And even this, example page looks like this:
[0x1234], [0x1235], ..., [0x2213]
- So [0x] is compressible

So together, there are basically four types of possible sharing:
column-wise (case 1), character-wise (case 2), column-character-wise (case
3), and byte-wise (case 4).

To compress these prefixes, we have the following questions:

1/ What types of prefix compression shall we support?

We can consider case 3 as a generalization of cases 1 and 2, so at least
we can implement cases 1, 2 and 3 now. Case 4 could be seen as a special
case of case 2, so the framework should be extensible enough to add
support for 4 later. In short, we will implement 1, 2 and 3 now.

2/ When to do prefix compression and what index operations will be
affected?

To simplify things, we can do it when we do a bottom-up index rebuild. We
won't try to compress the prefix when inserting new data. With this
design, the index operations affected will include btree build and btree
split (a new page has to inherit the prefix information).

3/ How to represent each index tuple and where to store the common prefix?

Index tuples involved in the compression have to remember how much has
been shared, and the shared prefix should be stored within the same page.
The basic idea is that we will use bit 13 of IndexTuple->t_info to
indicate whether a tuple shares a prefix or not, and the common prefix
will be stored in the special area of the index page. The common prefix
data will have the format {length|value}. Since PageHeader->pd_special
is dynamic, the common prefix can have a different length on each page.
Based on these changes, we will change index_getattr() and _bt_pageinit()
accordingly.
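
A rough sketch of what this might look like (names and layout are purely
hypothetical, just to make the proposal concrete; in a real patch the
prefix data would have to coexist with the existing BTPageOpaqueData in
the special space):

/* Hypothetical per-page common-prefix layout, stored in the special area. */
typedef struct BTPrefixData
{
    uint16      prefix_len;     /* number of shared leading bytes, 0 = none */
    char        prefix_val[1];  /* the shared bytes themselves follow */
} BTPrefixData;

/* Proposed use of bit 13 of IndexTupleData.t_info to mark tuples that
 * share the page's common prefix. */
#define INDEX_PREFIX_COMPRESSED     0x2000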

4/ WAL support?

We only affect btree build and btree split. For btree build, we don't
worry too much, since the data is already xlogged page-wise. We will add
prefix information to the btree split xlog record (xl_btree_split).

Comments?

Regards,
Qingqing

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org