Re: [HACKERS] Getting a move on for 8.2 beta
On Sat, Sep 02, 2006 at 09:26:36PM -0700, Joshua D. Drake wrote:
> > > I just spent 1/2 hour fixing the multi-value UPDATE patch for the code drift caused by UPDATE/RETURNING. The patch is a simple grammar macro. Any coder could have taken that, reviewed it, and applied it, but no one did.
> > Perhaps that's because nobody but you wanted it to go in.
> We got tons of people who wanted that. To further this, I didn't even know it was an issue. If it was only half an hour and it needed to be done, why wasn't it put out there? Thanks go to Bruce for fixing it, but I didn't know it was an issue. I have 5 C developers; if any of them could have done it -- then some communication is needed and I would have allocated someone to it. I am sure that is the case with others as well. I am not saying it is anyone's responsibility to speak up, but I can say that had I known some help was needed (especially something so trivial) I would have gladly donated some time.

There's been talk in the past of having some kind of system that automatically attempts to build things that are in the patch queue, both as an initial sanity check and as a means to detect when something bit-rots... perhaps it's becoming worthwhile to set that up.
--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Simplifying standby mode
Tom Lane [EMAIL PROTECTED] writes:
> The project policy has always been that we don't change on-disk formats in minor releases. I'm not entirely clear why you are so keen on carving out an exception for WAL data.

I had always thought of the policy as "initdb is not required", not "no on-disk format changes". In that light you're suggesting extending the policy, which I guess I just thought should be done explicitly rather than making policy by accident.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Tom Lane wrote:
> Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?
>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt
>
> Well, indeed it seems that the hash join is just an innocent bystander: the bulk of the runtime (all but about 120 sec in fact) is spent here:
>
>   ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1)
>         ->  Merge Join  (cost=0.00..10248.66 rows=2 width=41) (actual time=16.654..2578.060 rows=19837 loops=1)
>               ...
>         ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
>               Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>               ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
>                     Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>
> I suppose that the profile result you showed was taken during the startup transient where it was computing the hashtables that this loop's results are joined to ... but that's not where the problem is. The problem is repeating that bitmap scan on lineitem for nearly 20000 different l_suppkeys.

possible - I actually took them over a longer period of time

> Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join. The other plan uses a different join order and doesn't try to join lineitem until it's got orders.o_orderkey, whereupon it does a mergejoin against an indexscan on lineitem:
>
>   ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
>
> The runtimes for the remainders of the plans are roughly comparable, so it's the cost of joining lineitem that is hurting here. Is lineitem sorted (or nearly sorted) by l_orderkey?
>
> Part of the problem could be overestimating the cost of this indexscan. What are the physical sizes of lineitem and its indexes, and how do those compare to your RAM? What are you using for planner settings (particularly effective_cache_size)?

ouch - you are right (as usual) here. effective_cache_size was set to 10GB (my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan:
http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

as for the relation sizes:

dbt3=# select pg_relation_size('lineitem');
 pg_relation_size
------------------
      10832764928
(1 row)

dbt3=# select pg_total_relation_size('lineitem');
 pg_total_relation_size
------------------------
            22960259072
(1 row)

there are nine btree indexes on lineitem, all between 1.1GB and 1.4GB in size.

Stefan
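A quick back-of-the-envelope check of the figures quoted above (not part of the original exchange; all values are copied from the EXPLAIN ANALYZE output and the pg_relation_size results, nothing is newly measured):

```python
# Sanity-check the claim that the repeated inner bitmap scan dominates
# the nested loop's runtime, using the numbers from EXPLAIN ANALYZE.

loops = 19837                   # executions of the inner bitmap heap scan
per_loop_ms = 1254.535          # upper "actual time" per execution, ms
nested_loop_ms = 24923860.713   # total actual time of the nested loop, ms

inner_total_ms = loops * per_loop_ms
share = inner_total_ms / nested_loop_ms
print(f"inner scans account for {share:.1%} of the nested-loop runtime")

# Relation sizes reported by pg_relation_size / pg_total_relation_size,
# converted to GiB -- note the heap alone is about the size of the
# misconfigured 10GB effective_cache_size:
heap_gib = 10832764928 / 2**30
total_gib = 22960259072 / 2**30
print(f"lineitem heap: {heap_gib:.1f} GiB, with indexes: {total_gib:.1f} GiB")
# prints "lineitem heap: 10.1 GiB, with indexes: 21.4 GiB"
```

So virtually all of the nested loop's ~24900 seconds is the bitmap scan repeated ~20000 times, which is why the alternative join order wins so decisively.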
Re: [HACKERS] Getting a move on for 8.2 beta
Jim C. Nasby wrote:
> There's been talk in the past of having some kind of system that automatically attempts to build things that are in the patch queue, both as an initial sanity-check and as a means to detect when something bit-rots... perhaps it's becoming worthwhile to set that up.

After writing the enum patch, I hacked the buildfarm client code to apply a patch to the checked-out code before building. You could then run it like so:

./run_build.pl --nosend --nostatus --verbose \
    --patch=/home/tom/src/enums-v1.patch --patch-level=1

The idea was that patch authors could either run it manually or stick it in a cron job so they could get emailed when the patch no longer applied cleanly, or when the patched source failed in make, make check, etc. Obviously my motivation was to keep the enum patch up to date until we hit 8.3 and someone looks at it. To that end it might also be useful for it to die if duplicate_oids finds anything.

I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.

Note that it was intended for patch authors to run themselves rather than as any kind of central mechanism to test the patch queue. While it would obviously be nice to know what the current status of any given patch in the queue is, the thing about the patch queue is that it contains patches that we haven't had time to review yet. It would only take one patch containing a security vulnerability, or worse, a trojan, getting into the queue for that to seem unfortunate.

I had thoughts of hacking the buildfarm server to allow the posting of a patch along with results, so that authors could report results for their own patches, but ran out of time. Is there interest in doing that? Obviously it'd be a different server to the existing buildfarm.

Cheers

Tom
Re: [HACKERS] Optimizer improvements: to do or not to do?
Say42 wrote:
> Perhaps I am wrong, but I assume normalization is the usual case, and small master (parent) tables are not very rare either. Yes, my example is unusual, but it is _real_ and demonstrates PG optimizer inaccuracy. Why don't we make the PG optimizer closer to reality if we can? Is it so needless, and am I making a mountain out of a molehill?

All you have shown so far is that one particular query runs faster on your machine when sequential scans are turned off. That is certainly a problem that is worth addressing. But you haven't offered any analysis about the cause of this problem, so any speculation about normalization, usual cases, caching effects and so on is unfounded and premature.
--
Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Simplifying standby mode
On Tue, 2006-09-12 at 16:23 -0400, Tom Lane wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Simon Riggs [EMAIL PROTECTED] writes:
>>> My memory is lousy at the best of times, but when have we had a minor release that would have broken this due to a changed format?
>> Not often, which is why I mention the possibility of having interoperating minor release levels at all. If it was common, I'd just put a blanket warning on doing that. I don't know that it's happened in the past but I wouldn't be surprised. Consider that the bug being fixed in the point release may well be a bug in WAL log formatting.
> This would be the exception, not the rule, and should not be documented as if it were the rule. It's not really different from telling people to expect a forced initdb at a minor release: you are simply misrepresenting the project's policy.

OK, that's clear. I'll word it the other way around. SGML'd version will go straight to -patches.

Other Questions and Changes: please shout them in now.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Lock partitions
On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote:
> Strong, David [EMAIL PROTECTED] writes:
>> When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (a 532 second improvement) to acquire locks and 78 seconds (a 96 second improvement) to release locks. The LockMgr takes 235 seconds (a 127 second improvement) to acquire locks and 22 seconds (a 4 second improvement) to release locks.
> While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once.

Good thinking David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in "fewer == better" thoughts.

>> With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from the BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly?
> I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially.

David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue.

It seems reasonable that, having relieved the bottleneck on the BufMapping and LockMgr locks, we would then queue longer on the next bottleneck, WALInsertLock. So again, those tests seem reasonable to me so far.

These seem to be the beginnings of accurate wait time analysis, so I'm listening closely. Are you using a lightweight timer?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
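For readers following along: the idea behind partitioned lock tables is that each lock tag hashes to one of NUM_LOCK_PARTITIONS partitions, each guarded by its own LWLock, so two backends only contend when their tags land in the same partition. The sketch below is illustrative only (it is not PostgreSQL source; the hash and tag names are made up), but it shows why raising the partition count reduces same-partition collisions:

```python
# Illustrative sketch: map lock tags to partitions and count how many
# pairs of tags would share a partition (a proxy for lock contention).
import hashlib

def partition_for(tag: str, num_partitions: int) -> int:
    # num_partitions must be a power of two so masking the hash works,
    # which mirrors how PostgreSQL masks its hash values.
    h = int.from_bytes(hashlib.md5(tag.encode()).digest()[:4], "little")
    return h & (num_partitions - 1)

def same_partition_pairs(tags, num_partitions):
    # Count pairs of distinct tags mapping to the same partition.
    buckets = {}
    for t in tags:
        p = partition_for(t, num_partitions)
        buckets[p] = buckets.get(p, 0) + 1
    return sum(n * (n - 1) // 2 for n in buckets.values())

tags = [f"rel{i}" for i in range(64)]   # 64 hypothetical lock tags
for n in (4, 16, 128):
    print(n, same_partition_pairs(tags, n))
```

More partitions means fewer colliding pairs, but, as Tom notes above, each partition adds per-backend state and LockReleaseAll work, so the count can't be raised for free.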
Re: [HACKERS] dump / restore functionality
Hi, Tom,

Tom Lane wrote:
> AFAIR what was discussed was separating
> - schema stuff needed before loading data
> - table data
> - schema stuff needed after loading data
> where the last category boils down to indexes and then foreign keys. All the other stuff such as functions really needs to be in the first part ... or at least there's no visible benefit to delaying loading it.

I agree, it has to be in the first part, especially as data types and input functions needed for the table definitions and table data may be defined therein.

HTH,
Markus
--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
[HACKERS] GIN documentation
This patch adds GIN documentation and slightly improves the GiST docs. Could a native English speaker please check the text? Thank you.

Attachment: gindocs.gz (application/gzip)
Re: [HACKERS] Optimizer improvements: to do or not to do?
Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any speculation about normalization, usual cases, caching effects and so on are unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry. Below I'll try to explain what I mean when I speak about the caching effect. Let's take my previous example (I repost the query and some lines from 'explain' here for convenience):

select count(*) from conn.conn20060803 c
where exists (select code from belg_mobile tc
              where c.bnum = tc.code and c.bnum like tc.code || '%'
              order by tc.code desc limit 1)

Index Scan Backward using belg_mobile_pkey on belg_mobile tc (cost=0.00..6.42 rows=1 width=10) (actual time=0.012..0.012 rows=0 loops=494527)
Seq Scan on belg_mobile tc (cost=0.00..2.19 rows=1 width=10) (actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows, 1 heap page) and has a PK on the code column (2 index pages). indexCorrelation is equal to 0.0445 and hardly affects the cost estimation result.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap page) * 4 (random_page_cost)
    + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001 (cpu_index_tuple_cost) + 0.01 (cpu_tuple_cost))
      * 68 (record count) * 0.5 (selectivity of subquery)
    ~ 6 (page fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1 (heap page)
    + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.01 (cpu_tuple_cost)) * 68 (record count)
    = 1 (page fetch cost) + 1.19 (cpu cost) = 2.19

The estimation is ok if we touch the belg_mobile table only once. In the subquery we do it many times. After the first iteration of the subquery, all of belg_mobile's heap and index pages are in the cache, and the cost per iteration should be estimated using these formulae:

Index scan cost = (6 (page fetch cost) + 0.42 (cpu cost) * 500K (conn table row count)) / 500K ~ 0.42
Seq scan cost = (1 (page fetch cost) + 1.19 (cpu cost) * 500K (conn table row count)) / 500K ~ 1.19

The index scan is actually cheaper because less than one tenth of conn rows have matching codes in the belg_mobile table. That's what I want to say. I am not a veteran DBMS user, so I cannot gauge the importance of this cost inaccuracy on the whole. I hope you will help me to look at the problem (?) more widely than I can at the moment.
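Redoing the seq-scan side of the arithmetic above (this is the poster's simplified model with 8.x-era default cost parameters, not the planner's actual code path; the index-scan CPU figure in the message doesn't reduce to an obvious expression, so only the seq-scan numbers are reproduced here):

```python
# Reproduce the poster's seq scan estimate and its cached, amortized form.
cpu_operator_cost = 0.0025
cpu_tuple_cost = 0.01
rows = 68          # belg_mobile row count
heap_pages = 1

# One-shot estimate: page fetches plus per-tuple CPU work
# (three operator evaluations plus one tuple cost per row).
seq_cost = heap_pages + (cpu_operator_cost * 3 + cpu_tuple_cost) * rows
print(round(seq_cost, 2))   # 2.19, as quoted above

# Amortized over ~500K outer rows: the single page fetch is paid once
# (the page stays cached after the first iteration), so essentially only
# the CPU component remains per iteration.
outer_rows = 500_000
seq_amortized = (heap_pages + (seq_cost - heap_pages) * outer_rows) / outer_rows
print(round(seq_amortized, 2))   # 1.19
```

The point being argued is exactly this gap: once I/O amortizes away across repeated executions, the relative ranking of the two plans can flip, which the one-shot cost model does not see.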
Re: [HACKERS] Getting a move on for 8.2 beta
Tom Dunstan wrote:
> The idea was that patch authors could either run it manually or stick it in a cron so they could get emailed when the patch no longer cleanly applied, or when the patched source failed in make, make check etc. Obviously my motivation was to keep the enum patch up to date until we hit 8.3 and someone looks at it. To that end it might also be useful for it to die if duplicate_oids finds anything. I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.

Huh, why would you disable patching on vpath builds?

> Note that it was intended for patch authors to run themselves rather than any kind of central mechanism to test the patch queue.

Well, I'd think that one important benefit of passing patches through the buildfarm is detecting possible portability problems in it.

> While it would obviously be nice to know what the current status of any given patch in the queue is, the thing about the patch queue is that it contains patches that we haven't had time to review yet. It'll only take one patch to get into the queue containing a security vulnerability, or worse, a trojan, for it to seem unfortunate.

We could have a register of developers allowed to upload patches to the patched build queue, and have a policy that says that you should only upload a patch if it has already passed some review.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[HACKERS] Inconsistency in extended-query-protocol logging
Tom,

I'm currently resyncing my parser with the changes you made to prepared statement logging. Everything is OK apart from an inconsistency between log_duration and log_min_duration_statement.

* With log_duration, we have:
protocol: LOG: statement: execute my_query: SELECT * FROM shop WHERE $1 = $2
query:    LOG: statement: EXECUTE test1('lop1', 'lop2')

* With log_min_duration_statement, we have:
protocol: LOG: duration: 235.345 ms execute insert_query: INSERT INTO shop (name) VALUES($1)
query:    LOG: duration: 0.187 ms statement: EXECUTE test1('lop1', 'lop2')

As you can see, in the log_duration case we always have the "statement:" part, but we don't have it in the log_min_duration_statement case when using the protocol. I attached a patch to improve the consistency. It adds "statement:" in every case. Note that "statement:" was not there in the first version; Bruce added it afterwards, so I keep it in this patch. I don't really care whether we have "statement:" or not, but I'd really like consistent behaviour between both configurations.

Thanks,
--
Guillaume

Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.508
diff -c -r1.508 postgres.c
*** src/backend/tcop/postgres.c 8 Sep 2006 15:55:53 - 1.508
--- src/backend/tcop/postgres.c 13 Sep 2006 12:44:16 -
***************
*** 1256,1262 ****
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  parse %s: %s",
                    msec_str, *stmt_name ? stmt_name : "<unnamed>",
                    query_string)));
--- 1256,1262 ----
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  statement: parse %s: %s",
                    msec_str, *stmt_name ? stmt_name : "<unnamed>",
                    query_string)));
***************
*** 1610,1616 ****
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  bind %s to %s: %s",
                    msec_str, *portal_name ? portal_name : "<unnamed>",
                    *stmt_name ? stmt_name : "<unnamed>",
--- 1610,1616 ----
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  statement: bind %s to %s: %s",
                    msec_str, *portal_name ? portal_name : "<unnamed>",
                    *stmt_name ? stmt_name : "<unnamed>",
***************
*** 1819,1825 ****
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  %s %s%s%s%s%s",
                    msec_str,
                    execute_is_fetch ? _("execute fetch from") :
--- 1819,1825 ----
            break;
        case 2:
            ereport(LOG,
!               (errmsg("duration: %s ms  statement: %s %s%s%s%s%s",
                    msec_str,
                    execute_is_fetch ? _("execute fetch from") :
Re: [HACKERS] Getting a move on for 8.2 beta
Alvaro Herrera wrote:
>> I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.
> Huh, why would you disable patching on vpath builds?

As I understand it, vpath builds only do a checkout to a single place, and then build against that (from a different directory). Non-vpath builds take a copy of the checked-out source and build in the copy. If we patched the source in vpath builds, the patch would stick around when updating the source tree from CVS, and the next patch attempt would fail, etc. Non-vpath builds will get a clean copy to patch in subsequent runs. I suppose I could have made vpath builds take a copy when doing a patch, but it hardly seemed worth it.

> Well, I'd think that one important benefit of passing patches through the buildfarm is detecting possible portability problems in it.

Absolutely. As long as they're blessed as mentioned below...

> We could have a register of developers allowed to upload patches to the patched build queue, and have a policy that says that you should only upload a patch if it has already passed some review.

This was where I was originally heading when I first thought about this functionality. I didn't go that far with my fairly modest patch since a) it wasn't clear that there was manpower available to do the initial review to bless the patches, and b) what I did do solved my immediate problem.

If there is support for doing something like this, there are all kinds of things that could be done. For example, you could check which patches break which other ones. An even more way-out idea might be to have performance patches run pgbench or some other set of benchmarks. You have a performance-related patch? Let's stick it in the queue and see if it really improves things or not.

Note that the existing patch queue mechanism wouldn't suffice, since there's no standard way to pull patches through via http or whatever. We'd probably want to back it with a small database and webapp, which might just be a hacked buildfarm server.

Cheers

Tom
Re: [HACKERS] Getting a move on for 8.2 beta
Tom Dunstan wrote:
>>> I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so.
>> Huh, why would you disable patching on vpath builds?
> As I understand it, vpath builds only do a checkout to a single place, and then build against that (from a different directory). Non-vpath builds take a copy of the checked out source and build in the copy. If we patched the source in vpath builds, the patch would stick around when updating the source tree from CVS, and the next patch attempt would fail etc. Non-vpath builds will get a clean copy to patch in subsequent runs. I suppose I could have made vpath builds take a copy when doing a patch, but it hardly seemed worth it.

Huh, but the patch can be applied with -R to revert it after the buildfarm run ... the one problem I can see is if the patch fails for some reason; for which I'd suggest running a patch --dry-run as a first step, checking that it applies cleanly, and only continuing in that case.

[nice ideas snipped (not sniped)]

> Note that the existing patch queue mechanism wouldn't suffice, since there's no standard way to pull patches through via http or whatever. We'd probably want to back it with a small database and webapp, which might just be a hacked buildfarm server.

Oh, sure. I'd say there is no existing patch queue, just a Mhonarc archive which is just useful as a reminder of what patches are outstanding.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Getting a move on for 8.2 beta
Alvaro Herrera wrote:
> Huh, but the patch can be applied with -R to revert it after the buildfarm run ... the one problem I can see is if the patch fails for some reason; for which I'd suggest running a patch --dry-run as a first step, checking that it applies cleanly, and only continue in that case.

Unfortunately, this won't fly as we rely on file timestamps to tell us which files have changed. Tom's idea of making a temp copy of the repo and patching that would work, but if you're going to do that why do a vpath build anyway?

Regarding the idea of a list of approved patch authorisers, don't we have such a group now? i.e. committers.

cheers

andrew
Re: [HACKERS] Inconsistency in extended-query-protocol logging
Guillaume Smet [EMAIL PROTECTED] writes:
> I attached a patch to improve the consistency. It adds statement: for every case.

Isn't that just bloating the log? And weren't you the one complaining about log volumes to begin with?

The reason it is the way it is is that Bruce wanted the wording to reflect why you are getting the log message: anything starting "statement:" is being logged because of log_statement, anything starting "duration:" is being logged because of one of the log_duration variables. Your proposed patch blurs that distinction and doesn't buy anything much that I can see.

regards, tom lane
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join.
> effective_cache_size was set to 10GB (my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

Interesting. It used to be that effective_cache_size wasn't all that critical... what I think this report is showing is that with the 8.2 changes to try to account for caching effects in repeated indexscans, we've turned that into a pretty significant parameter. It'd be nice not to have to depend on the DBA to give us a good number for this setting. But I don't know of any portable ways to find out how much RAM is in the box, let alone what fraction of it we should assume is available per-query.

regards, tom lane
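To illustrate the portability problem Tom raises: there are OS-specific ways to ask for physical RAM, but no universal one. A sketch (not from the thread) using the sysconf interface, which works on Linux and some other Unixes but is not guaranteed everywhere, hence the fallback:

```python
# Non-portable RAM probe: _SC_PHYS_PAGES is a common extension, not a
# POSIX-required sysconf variable, so it may simply be absent on some
# platforms -- exactly why the server can't derive effective_cache_size
# itself and has to ask the DBA.
import os

def physical_ram_bytes():
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None  # platform doesn't expose it; fall back to the DBA

ram = physical_ram_bytes()
if ram is not None:
    print(f"{ram / 2**30:.1f} GiB of physical RAM")
else:
    print("physical RAM size not available on this platform")
```

And even where the probe works, it only answers "how much RAM is there", not "what fraction is really available to this query", which is the harder half of the problem.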
Re: [HACKERS] Getting a move on for 8.2 beta
Andrew Dunstan wrote:
> Tom's idea of making a temp copy of the repo and patching that would work, but if you're going to do that why do a vpath build anyway?

In this case, the answer is to make sure that your patch works when *someone else* does a vpath build.

> Regarding the idea of a list of approved patch authorisers, don't we have such a group now? i.e. committers.

Right, and if committers or others are willing to put in the time required to verify that patches aren't nasty before going onto the blessed patch queue, the idea could quite possibly work and provide some value. Note that all we really need to test for here is that the patch isn't malicious; patches that are bad design or whatever are unlikely to open security holes or fry your box. A major point of the queue is that the appropriate committer often doesn't have the time to review the patch right now. There might be some benefit in allowing a wider set of contributors to bless patches as non-nasty for testing purposes, rather than waste the limited time of qualified committers. Maybe such an army doesn't exist, though.

Cheers

Tom
Re: [HACKERS] Optimizer improvements: to do or not to do?
Say42 [EMAIL PROTECTED] writes:
> ... Let's take my previous example (I repost the query and some lines from 'explain' here for convenience):
> select count(*) from conn.conn20060803 c where exists (select code from belg_mobile tc where c.bnum = tc.code and c.bnum like tc.code || '%' order by tc.code desc limit 1)

I'm having a hard time getting excited about improving this query when it's so badly coded in the first place. What's an ORDER BY doing in an EXISTS subquery? The LIMIT is unnecessary too. And the inner WHERE says nothing so much as "I don't know how to design a database" :-(. If we're going to look at specific examples we should at least look at examples that are representative of typical good practice.

It is true that EXISTS() subqueries are planned independently without any idea of how often they might get re-executed. This would be good to fix, but I don't see any clear way to do it --- at the time we are processing the outer WHERE, we don't have enough context to judge how many times a particular clause might be evaluated. (Yeah, in this case it's pretty obvious that it'll be executed once per conn20060803 row, but in join situations, or even just with additional outer WHERE clauses, it's not nearly so obvious.)

regards, tom lane
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Tom Lane wrote:
> Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join.
>> effective_cache_size was set to 10GB (my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
> Interesting. It used to be that effective_cache_size wasn't all that critical... what I think this report is showing is that with the 8.2 changes to try to account for caching effects in repeated indexscans, we've turned that into a pretty significant parameter.

yes, I'm a bit worried about that too - it has been a bit of conventional wisdom that setting effective_cache_size optimistically will never hurt and that it encourages PostgreSQL to sometimes get a better plan by favouring index scans.

> It'd be nice not to have to depend on the DBA to give us a good number for this setting. But I don't know of any portable ways to find out how much RAM is in the box, let alone what fraction of it we should assume is available per-query.

well, there are really a number of things the DBA had better give the database accurate information about - though in this case we might go from "too much won't hurt" to "too much will hurt" ...

Stefan
Re: [HACKERS] Inconsistency in extended-query-protocol logging
On 9/13/06, Tom Lane [EMAIL PROTECTED] wrote:
> Guillaume Smet [EMAIL PROTECTED] writes:
>> I attached a patch to improve the consistency. It adds statement: for every case.
> Isn't that just bloating the log? And weren't you the one complaining about log volumes to begin with?

As I told you, I don't care whether we remove it or add it, but having:
statement: query -> duration: duration statement: query
in one case and
statement: query -> duration: duration query
in the other case is not consistent. Either we remove "statement:" for the extended protocol or we add it, but I don't think it's a good idea to have different behaviour between log_duration and log_min_duration_statement.

As for bloating the log, that's already the case with regular queries, so it won't change much. At least in the cases we have here, it's negligible compared to the query text. IMHO, it's not an argument for choosing either solution.

IMHO, it's more logical to remove it, as the text after "statement:" is not a statement in the extended query protocol case. I chose the other solution to be consistent with the choices Bruce made before.

Regards,
--
Guillaume
Re: AIX shared libraries (was Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries)
Tom Lane wrote: Is it possible that the rules have changed across AIX versions, and that the code in there now is needful for older versions? I don't think that this behaviour has changed. I remember it from AIX 4.3.2. AIX 4.3 is the first version to support the -brtl flag. The current code is in place to mimic the behaviour of dlopen, etc, on the older platforms. I think we are at a point that we can stop maintaining AIX older than 4.3 if we want. -rocco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
On Wed, Sep 13, 2006 at 10:47:09AM -0400, Tom Lane wrote: Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Tom Lane wrote: Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join. effective_cache_size was set to 10GB(my fault for copying over the conf from a 16GB box) during the run - lowering it just a few megabytes(!) or to a more realistic 6GB results in the following MUCH better plan: http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt Interesting. It used to be that effective_cache_size wasn't all that critical... what I think this report is showing is that with the 8.2 changes to try to account for caching effects in repeated indexscans, we've turned that into a pretty significant parameter. It'd be nice not to have to depend on the DBA to give us a good number for this setting. But I don't know of any portable ways to find out how much RAM is in the box, let alone what fraction of it we should assume is available per-query. That's fairly straight-forward, if a little crude. We ask the DBA and provide some tools for estimating and tuning same. :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
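David's "ask the DBA and provide some tools" suggestion could be scripted, at least per-platform. As a purely illustrative sketch (and only on Linux — which is exactly Tom's portability objection), one could derive a starting value for effective_cache_size from total RAM; the 75% fraction here is an arbitrary assumption for the example, not any project recommendation:

```shell
# Illustrative only: suggest an effective_cache_size from total RAM on
# Linux via /proc/meminfo. The 75% fraction is an assumption made for
# this sketch; a real tool would let the DBA tune it.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
suggested_mb=$(( total_kb * 3 / 4 / 1024 ))
echo "effective_cache_size = ${suggested_mb}MB"
```

The reliance on /proc/meminfo is the point: there is no portable equivalent, which is why this stays a DBA-side tool rather than something the server can do for itself.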
Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
The patch did not work for me :-( My buildfarm members failed in local testing to execute the install-check, because initdb failed to find libpq.so. Make check did succeed, so I think there is a possibility of getting it working, but it won't be as simple as adding -brtl to the template. I was also getting duplicate symbols in the link that I wasn't getting before. I don't have time right now to work through all the issues, but wanted to give feedback that the patch isn't quite this simple. Thanks, -rocco -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Albe Laurenz Sent: Tuesday, September 12, 2006 9:01 AM To: Tom Lane *EXTERN* Cc: pgsql-hackers@postgresql.org Subject: Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries Tom Lane wrote: In our case, we have libpq.a and libpq.so in the same directory, so unless you link with -brtl you will get a static link (because libpq.a is a static library). I wonder whether we ought to suppress building (or at least installing) our .a libraries at all on AIX. Adding -brtl to LDFLAGS would help within the context of our own build, but external clients that link to libpq without saying that are going to get undesirable results. I think there's a reasonable argument that by installing a .a file that isn't a shared library, we are violating the platform's conventions. The natural way in AIX would be: - Create libpq.so - Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so' - Install only libpq.a For a static build on AIX, you have to specify all the libraries and give the linker -bstatic and -bI:/lib/syscalls.exp Should -brtl be added to src/template/aix? Sounds that way, but that'll only help for psql and other stuff built within our build.
Could you try this against CVS tip: * add -brtl to LDFLAGS in the template * Remove the AIX-specific hack on $(libpq) at lines 349-354 of src/Makefile.global.in * see if it configures and builds I have done that (see the attached patch) and it works fine. I don't have the native AIX C compiler, so I could only test it with gcc. I have taken the liberty to modify the static link line in Makefile.global.in to contain the LDAP libraries, I hope that's appropriate. Yours, Laurenz Albe ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
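The "natural way in AIX" steps Laurenz lists can be sketched with plain ar(1). This is a self-contained toy: the libpq.so below is a dummy file created in a temp directory, standing in for the real shared library, purely to show the archive-wrapping convention:

```shell
# Sketch of the AIX convention described above: ship libpq.a as an
# archive whose only member is the shared object, so that linking
# against libpq.a yields a dynamic link. libpq.so is a dummy stand-in.
workdir=$(mktemp -d)
cd "$workdir"
printf 'dummy shared object contents\n' > libpq.so
rm -f libpq.a
ar -rc libpq.a libpq.so     # wrap the .so in an archive
ar -t libpq.a               # list the archive members
```

On a real AIX system only the resulting libpq.a would be installed, per the steps in the message above.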
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
The patch did not work for me :-( My buildfarm members failed in local testing to execute the install-check, because initdb failed to find libpq.so. Make check did succeed, so I think there is a possibility of getting it working, but it won't be as simple as adding -brtl to the template. I suspect that up to now the buildfarm had a static build of PostgreSQL. What is the output of 'ldd initdb' when it builds and runs correctly? Is libpq.so in a non-standard directory? If yes, one either has to export LIBPATH in the environment or link with -L/location/of/libpq for the executable to find it (similar to RPATH in Linux). Yours, Laurenz Albe ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Inconsistency in extended-query-protocol logging
Guillaume Smet [EMAIL PROTECTED] writes: IMHO, it's more logical to remove it as the text after statement: is not a statement in the extended query protocol case. Well, I was never particularly enamored of the idea that we should put statement: into everything generated by log_statement. My proposal would be to generate

    statement: querystring              Simple Query
    parse stmt: querystring             Parse
    bind stmt/portal: querystring       Bind
    execute stmt/portal: querystring    Execute

or these prefixed with duration: xxx, as appropriate. Bruce was pretty hot about having statement: in there, so the hard part might be to convince him. Also, the current code distinguishes a fetch from an already-partially-executed portal ... do you care about that? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Inconsistency in extended-query-protocol logging
On 9/13/06, Tom Lane [EMAIL PROTECTED] wrote:

    statement: querystring              Simple Query
    parse stmt: querystring             Parse
    bind stmt/portal: querystring       Bind
    execute stmt/portal: querystring    Execute

I agree with that. Hmmm, AFAICS, you changed stmt/portal to portal to statement in your last commit. Or did I misunderstand? or these prefixed with duration: xxx, as appropriate. Bruce was pretty hot about having statement: in there, so the hard part might be to convince him. Bruce, any opinion? I really think the extended query protocol is not a statement strictly speaking. Also, the current code distinguishes a fetch from an already-partially-executed portal ... do you care about that? I don't really understand what a portal is - I must admit I don't use libpq directly. I never saw a log file with fetch. Do you have an example? Should I consider an execute from fetch differently? -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] contrib uninstall scripts need some love
Joshua D. Drake [EMAIL PROTECTED] writes: I have seen some patches come across with this. Is this done, or do I still need to work on it? Teodor added an uninstall for hstore, and I think Michael has fixed all the problems in the existing scripts, but we still lack uninstall scripts for the other modules I mentioned. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] CVS commit messages and backpatching
FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Not only is this valuable for making the release notes, but it also helps people looking at the commit message. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Optimizer improvements: to do or not to do?
Simon Riggs wrote: On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote: That's what I want to do: 1. Replace not very useful indexCorrelation with indexClustering. An opinion such as not very useful isn't considered sufficient explanation or justification for a change around here. Not sufficient for some types of data would have been more fair. I speculate that a new additional stat of average # of unique values for a column within a block would go a long way to helping my worst queries. It's common here for queries to vastly overestimate the number of pages that would need to be read because postgresql guesses the correlation to be practically 0, despite the fact that the distinct values for any given column are closely packed on a few pages. Our biggest tables (180G or so) are mostly spatial data with columns like City, State, Zip, County, Street, School District, Police Beat, lat/long, etc.; and we cluster the table on zip, street. Note that practically all the rows for any single value of any of the columns will lie in the same few blocks. However, the calculated correlation comes out low because the total ordering of the other values doesn't match that of zip codes. This makes the optimizer vastly overestimate the cost of index scans because it guesses that most of the table will need to be read, even though in reality just a few pages are needed. If someone does look at the correlation calculations, I hope this type of data gets considered as well. I speculate that a new stat of average # of unique values for a column within a block could be useful here in addition to correlation. For almost all the columns in my big table, this stat would be 1 or 2, which I think would be a useful hint that despite a low correlation, the distinct values are indeed packed together in blocks. That way the optimizer can see that a smaller number of pages would need to be accessed than correlation alone would suggest. Does this make sense, or am I missing something?
---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian wrote: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Backpatches usually happen after the commit to the head branch has been made, so the information is not available at that time. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CVS commit messages and backpatching
Peter Eisentraut wrote: Bruce Momjian wrote: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Backpatches usually happen after the commit to the head branch has been made, so the information is not available at that time. Well, I have a script where I create a single commit message that I use to apply to multiple branches. Usually you know at the time of commit to HEAD that you are going to backpatch. And in such cases, it is helpful to mention it as part of the HEAD commit. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
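Bruce's habit of reusing one commit message across branches can be caricatured in a few lines. This is a hypothetical helper, not his actual script; the branch names and message wording are invented for the example:

```shell
# Hypothetical sketch of a "one commit message for all branches" helper:
# record the backpatch targets inside the message itself, so the HEAD
# commit mentions them. All names here are made up for illustration.
workdir=$(mktemp -d)
branches="REL8_1_STABLE REL8_0_STABLE"
{
  echo "Fix example overflow in example_func()."
  echo
  echo "Backpatch to: $branches"
} > "$workdir/commit-msg.txt"
cat "$workdir/commit-msg.txt"
```

The same file would then be passed to the commit command on HEAD and on each back branch, which is what makes the backpatch visible from the HEAD log alone.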
Re: [HACKERS] GIN documentation
On Wed, 2006-09-13 at 14:24 +0400, Teodor Sigaev wrote: Patch adds GIN documentation and slightly improves GiST docs. Somebody of native English speakers, pls, check the text... Thank you. I have some minor changes that reworded some parts that were confusing to me. I don't fully understand GIN, but I hope this is helpful. Regards, Jeff Davis --- gindocs.orig 2006-09-13 10:07:57.0 -0700 +++ gindocs 2006-09-13 10:43:25.0 -0700 @@ -89,7 +89,7 @@ +acronymGIN/acronym stands for Generalized Inverted Index. It is +an index structure storing a set of (key, posting list) pairs, where +'posting list' is a set of rows in which the key occurs. The -+row may contains a lot of keys. ++row may contain many keys. + /para + + para @@ -209,10 +209,10 @@ +termCreate vs insert/term +listitem + para -+ In most cases, insertion into acronymGIN/acronym index is slow enough -+ due to a lot keys should be inserted per one value. So, for bulk upload -+ data in table it will be useful to drop index and create it -+ after finishing upload. ++ In most cases, insertion into acronymGIN/acronym index is slow because ++ many GIN keys may be inserted for each table row. So, when loading data ++ in bulk it may be useful to drop index and recreate it ++ after the data is loaded in the table. + /para +/listitem + /varlistentry @@ -381,9 +381,8 @@ +para + Short-term share/exclusive page-level locks are used for + read/write access. Locks are released immediately after each -+ index row is fetched or inserted. But note, that GIN index -+ usually requires produce several inserts per one row, so, -+ GIN makes more work per one value's insertion. ++ index row is fetched or inserted. However, note that GIN index ++ usually requires several inserts per one table row. +/para + /listitem + /varlistentry ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] - Proposal for repreparing prepared statements
The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. I look forward to comments from the community. I propose creating a new system administration function to force repreparation of prepared statements in all backends. The functionality could be extended to include re-initialization of other kinds of per-backend data. This proposal addresses, to some degree, the prepare-alter-exec issue discussed in various mailing list postings, and the following wish-list item: # Invalidate prepared queries, like INSERT, when the table definition is altered However, the solution would only be partial, as it would be the responsibility of database clients to call the system administration function when needed. Alternately, additional integration work could be done to invoke this logic automatically whenever the columns of any table are altered. -- Here is what I propose: We define a new system administration function called pg_reload_per_backend_data. This function would work much like pg_reload_conf, i.e. it would require superuser privileges and would work by sending a signal to the postmaster that would then be propagated to all the child backends (but not the special ones, like the bgwriter). The signal handling logic for the backends would be modified to respond to the signal by reinitializing any data cached in the backend's memory space, such as prepared statements. Each kind of data that would be reinitialized would require special logic, as they would all be reinitialized in their own particular way. Choosing an appropriate signal to send might be difficult, as the list of available signals is somewhat restricted. 
The user-defined signals would be a natural choice, but it appears SIGUSR1 is used for sinval or catchup events, while SIGUSR2 is used for asynchronous notification. Use of the real time signals (signal numbers >= 32) might be possible, but could have portability problems. Another alternative would be to overload SIGHUP, so that it causes both configuration reloads and reloading of per-backend data. This makes some sense, since most configuration parameters are basically a special form of per-backend data. However, changing the behavior of an existing signal might have undesirable side effects. Overall, I'm very open to suggestions regarding the appropriate signal to use. To implement the repreparation logic, a new function called RepreparePreparedStatements() could be added to source files backend/commands/prepare.[ch]. This function would be called by a signal handler installed in the backends within backend/tcop/postgres.c. RepreparePreparedStatements would do the equivalent of iterating over the prepared_queries hash table and executing DropPreparedStatement() and PrepareQuery() on each. However, it is possible that some refactoring of the logic would be needed to improve performance and make the code more robust. The scope of pg_reload_per_backend_data could also be expanded to include reinitialization of other data that resides in the memory space of individual backend processes. An example of such cached entities are the reusable modules associated with a particular procedural language, e.g. the TCL modules found in the table pltcl_modules. Once such a module is used in a particular backend, it remains held in backend memory and changes to the disk version are not noticed. There is also no way to undefine any global variables associated with such modules.
I have not given much consideration to the implementation for reloading modules, but doing the equivalent of the SQL command LOAD 'libname' for all dynamically loaded libraries should have the desired effect (at least it does for the library that implements the PL/TCL language, pltcl.so). Perhaps the general response should be to reload any libraries that have been dynamically loaded by the particular backend. -- Here are a few permutations of this plan that could be considered:

1. Bundle pg_reload_per_backend_data functionality with pg_reload_conf.
   Pros: Avoids having to find an appropriate unused signal. Logical consistency with reloading config, which could be considered a special case of reloading per-backend data.
   Cons: Changes the behavior of existing functionality, which has the risk of unintended side effects. Gives less fine-grained control over when per-backend data is reloaded.

2. Break pg_reload_per_backend_data functionality into multiple functions.
   Pros: Can assign more descriptive names to the functionality, e.g. pg_reload_ddl, pg_reprepare_statements, etc. Finer-grained control over which kind of reloading is performed.
   Cons: Requires more use of the scarce list of available signals.
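The control flow the proposal describes — postmaster sends a signal, each backend's handler triggers the reload — can be caricatured in a few lines of shell. This is a toy model only: the trap below stands in for the backend signal handler that would call RepreparePreparedStatements(), and nothing here is actual PostgreSQL code:

```shell
# Toy model of the proposed mechanism: a parent process signals a
# "backend", whose trap stands in for the signal handler that would
# reprepare statements. All names and messages are illustrative.
out=$(
  (
    trap 'echo "backend: reloading per-backend data"; exit 0' USR1
    echo "backend: ready"
    while :; do sleep 0.1; done   # idle, waiting for the signal
  ) &
  backend=$!
  sleep 0.5                       # give the child time to install its trap
  kill -USR1 "$backend"           # "postmaster" propagates the signal
  wait "$backend"
)
echo "$out"
```

The real mechanism would of course not do work in the handler itself, as Tom notes later in the thread; the handler would only set a flag checked at a safe point.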
Re: [HACKERS] Lock partitions
Simon, In the 16/16 (16 buffer partitions/16 lock partitions) test, the WALInsertLock lock had 14643080 acquisition attempts and 12057678 successful acquisitions on the lock. That's 2585402 retries on the lock. That is to say that PGSemaphoreLock was invoked 2585402 times. In the 128/128 test, the WALInsertLock lock had 14991208 acquisition attempts and 12324765 successful acquisitions. That's 2666443 retries. The 128/128 test attempted 348128 more lock acquisitions than the 16/16 test and retried 81041 times more than the 16/16 test. We attribute the rise in WALInsertLock lock accesses to the reduction in time spent acquiring the BufMapping and LockMgr partition locks. Does this seem reasonable? The overhead of any monitoring is of great concern to us. We've tried both clock_gettime() and gettimeofday() calls. They both seem to have the same overhead, ~1 us/call (measured against the TSC of the CPU), and both seem to be accurate. We realize this can be a delicate point and so we would be happy to rerun any tests with a different timing mechanism. David -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:22 AM To: Tom Lane Cc: Strong, David; PostgreSQL-development Subject: Re: [HACKERS] Lock partitions On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote: Strong, David [EMAIL PROTECTED] writes: When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks.
While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. Good thinking David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in fewer == better thoughts. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue. It seems reasonable that relieving the bottleneck on BufMapping and LockMgr locks that we would then queue longer on the next bottleneck, WALInsertLock. 
So again, those tests seem reasonable to me so far. These seem to be the beginnings of accurate wait time analysis, so I'm listening closely. Are you using a lightweight timer? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian [EMAIL PROTECTED] writes: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Not only is this valuable for making the release notes, but it also helps people looking at the commit message. With the standard output from cvs2cl, this is pretty obvious anyway, no? I see entries like

    2006-08-29 09:39  teodor

        * contrib/tsearch2/: tsvector.c (REL8_1_STABLE), tsvector.c:
        Remove pos comparison in silly_cmp_tsvector(): it is not a
        semantically significant

so it seems to me that explicit mention of back-patching is mostly redundant. (Of course, this requires the committer to commit all the branches at about the same time, which I make an effort to do precisely so that the cvs log looks nice. If some time elapses between patching and back-patching then a mention in the commit message is definitely needed.) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] AIX shared libraries
[EMAIL PROTECTED] (Rocco Altier) writes: Tom Lane wrote: Is it possible that the rules have changed across AIX versions, and that the code in there now is needful for older versions? I don't think that this behaviour has changed. I remember it from AIX 4.3.2. AIX 4.3 is the first version to support the -brtl. The current code is in place to mimic the behaviour of dlopen, etc, on the older platforms. I think we are at a point that we can stop maintaining AIX older than 4.3 if we want. Version 5.1 is no longer being maintained by IBM; we were somewhat displeased when we heard that support offerings were expiring :-(. Fortunately, we already had plans in place for a migration to 5.3. I have to agree that even 4.3 is really rather old now. Looking at IBM's support lifecycle list... http://www-306.ibm.com/software/info/supportlifecycle/list/a.html

    AIX version    Available    Support Withdrawn
    -----------    ---------    -----------------
    5.1            May 2001     April 2006
    5.2            Oct 2002     Sept 2008
    5.3            Aug 2004     unannounced, presumably late 2010...

I'd guess that 4.3 fell out of support in late 2004. -- let name=cbbrowne and tld=linuxdatabases.info in String.concat @ [name;tld];; http://linuxfinances.info/info/ Of course 5 years from now that will be different, but 5 years from now everyone will be running free GNU on their 200 MIPS, 64M SPARCstation-5. -- Andrew Tanenbaum, 1992. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Stephen Marshall [EMAIL PROTECTED] writes: The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. There should be no need for users to concern themselves with this. The direction we've been intending to go in is to automatically invalidate stored plans when any related schema or statistics change occurs, forcing a re-plan on any subsequent use. See past discussions (IIRC, Neil Conway actually did some work on this idea earlier this year, but didn't get it done). The appropriate cross-backend communication mechanism already exists: it's the catcache/relcache invalidation code. No need to fool with finding a spare signal; and you can't do any meaningful work in a signal handler anyway. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Getting a move on for 8.2 beta
Jim C. Nasby wrote: There's been talk in the past of having some kind of system that automatically attempts to build things that are in the patch queue, both as an initial sanity-check and as a means to detect when something bit-rots... perhaps it's becoming worthwhile to set that up. After writing the enum patch, I hacked the buildfarm client code to apply a patch to the checked out code before building. You could then run it thusly:

    ./run_build.pl --nosend --nostatus --verbose \
        --patch=/home/tom/src/enums-v1.patch --patch-level=1

The idea was that patch authors could either run it manually or stick it in a cron job so they could get emailed when the patch no longer cleanly applied, or when the patched source failed in make, make check, etc. Obviously my motivation was to keep the enum patch up to date until we hit 8.3 and someone looks at it. To that end it might also be useful for it to die if duplicate_oids finds anything. I submitted a patch to Andrew, but it needed a couple of tweaks (disabling patching on vpath builds, for example) and I don't think I ever got around to resubmitting it, but if there's more general interest I'll do so. Note that it was intended for patch authors to run themselves rather than any kind of central mechanism to test the patch queue. While it would obviously be nice to know what the current status of any given patch in the queue is, the thing about the patch queue is that it contains patches that we haven't had time to review yet. It'll only take one patch containing a security vulnerability, or worse, a trojan, getting into the queue for that to seem unfortunate. I had thoughts of hacking the buildfarm server to allow the posting of a patch along with results, so that authors could report results for their own patches, but ran out of time. Is there interest in doing that? Obviously it'd be a different server to the existing buildfarm.
Cheers Tom ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
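The "does the patch still apply?" half of the bit-rot check discussed above boils down to a dry run of patch(1). This sketch is self-contained: the tree and the patch are throwaway stand-ins created on the spot, not anything from the real queue, and a real harness would of course go on to run make and make check:

```shell
# Sketch of a bit-rot check: attempt the patch in dry-run mode and
# report. tree/ and fix.patch are dummies standing in for a checkout
# and a queued patch.
workdir=$(mktemp -d)
cd "$workdir"
mkdir tree
printf 'line one\n' > tree/file.txt
cat > fix.patch <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1 +1 @@
-line one
+line two
EOF
if (cd tree && patch --dry-run -p1 < ../fix.patch >/dev/null); then
    echo "patch still applies cleanly"
else
    echo "patch has bit-rotted"
fi
```

Run from cron per patch in the queue, the failure branch is what would trigger the email to the author that Tom describes.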
Re: [HACKERS] CVS commit messages and backpatching
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: FYI, in going through the release notes, I would like to remind committers that it is important to mention if the commit was backpatched to any earlier release. Not only is this valuable for making the release notes, but it also helps people looking at the commit message. With the standard output from cvs2cl, this is pretty obvious anyway, no? I see entries like 2006-08-29 09:39 teodor * contrib/tsearch2/: tsvector.c (REL8_1_STABLE), tsvector.c: Remove pos comparison in silly_cmp_tsvector(): it is not a semantically significant so it seems to me that explicit mention of back-patching is mostly redundant. (Of course, this requires the committer to commit all the branches at about the same time, which I make an effort to do precisely so that the cvs log looks nice. If some time elapses between patching and back-patching then a mention in the commit message is definitely needed.) I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] - Proposal for repreparing prepared statements
Tom, Thanks for the update on the roadmap. I have a couple of questions: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? This is needed by backends that use loadable modules with procedural languages (like pltcl_modules) in cases where a loadable module gets updated. Thanks, Steve -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:08 PM To: Marshall, Steve Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] - Proposal for repreparing prepared statements Stephen Marshall [EMAIL PROTECTED] writes: The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. There should be no need for users to concern themselves with this. The direction we've been intending to go in is to automatically invalidate stored plans when any related schema or statistics change occurs, forcing a re-plan on any subsequent use. See past discussions (IIRC, Neil Conway actually did some work on this idea earlier this year, but didn't get it done). The appropriate cross-backend communication mechanism already exists: it's the catcache/relcache invalidation code. No need to fool with finding a spare signal; and you can't do any meaningful work in a signal handler anyway. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CVS commit messages and backpatching
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ Well, my script produces output that is closer to what I need to create the release notes. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. No, it did not get done; feel free to work on it for 8.3. 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? I would vote urgently against that, as not all loadable libraries are necessarily designed to survive being reloaded. Nor have I heard any previous requests for it; how often do loadable libraries get updated in production? Stuff like pltcl.so would only change at a version update, for which you have to restart the postmaster anyway to bring in the new executable image for postgres itself. The LOAD command is meant for development, not as something that would be invoked in production ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Optimizer improvements: to do or not to do?
Ron Mayer [EMAIL PROTECTED] writes: It's common here for queries to vastly overestimate the number of pages that would need to be read because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. That said I'm now going to do exactly what I just said we should stop doing and brain storm about an ad-hoc metric that might help: I wonder if what we need is something like: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records. We might also need a separate average number of n-block spans per value metric to predict how sequential the i/o will be in addition to how many pages will be fetched. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
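Gregory's brainstormed metric could be prototyped outside the backend first. A minimal sketch, assuming the ANALYZE sample were available as (value, block) pairs (SampleRow and avg_distinct_blocks_per_value are invented names): sort by value, then count distinct blocks per distinct value.

```c
#include <stdlib.h>

/* Sketch of the proposed metric: sort sampled rows by value and count
 * the average number of distinct blocks each value appears on.  All
 * names are invented for illustration. */

typedef struct SampleRow
{
    int value;
    int block;
} SampleRow;

static int
cmp_rows(const void *a, const void *b)
{
    const SampleRow *x = a, *y = b;

    if (x->value != y->value)
        return (x->value > y->value) - (x->value < y->value);
    return (x->block > y->block) - (x->block < y->block);
}

/* Average number of distinct blocks per distinct value in the sample. */
static double
avg_distinct_blocks_per_value(SampleRow *rows, int n)
{
    int distinct_values = 0, block_runs = 0;

    qsort(rows, n, sizeof(SampleRow), cmp_rows);
    for (int i = 0; i < n; i++)
    {
        if (i == 0 || rows[i].value != rows[i - 1].value)
        {
            distinct_values++;
            block_runs++;           /* first block for this value */
        }
        else if (rows[i].block != rows[i - 1].block)
            block_runs++;           /* another distinct block for same value */
    }
    return distinct_values ? (double) block_runs / distinct_values : 0.0;
}
```

A value whose rows are packed on few pages scores near 1.0 even when correlation over the whole table is near 0, which is exactly the case Ron describes being mis-costed.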
Re: [HACKERS] Lock partitions
Tom, We have some results for you. We left the buffer partition locks at 128 as this did not seem to be a concern and we're still using 25 backend processes. We ran tests for 4, 8 and 16 lock partitions. For 4 lock partitions, it took 620 seconds to acquire locks and 32 seconds to release locks. The test produced 199.95 TPS. For 8 lock partitions, it took 505 seconds to acquire locks and 31 seconds to release locks. The test produced 201.16 TPS. For 16 lock partitions, it took 362 seconds to acquire locks and 22 seconds to release locks. The test produced 200.75 TPS. And, just for grins, using 128 buffer and 128 lock partitions, it took 235 seconds to acquire locks and 22 seconds to release locks. The test produced 203.24 TPS. Let me know if we can provide any additional information from these tests and if there are any other tests that we can run. David

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Strong, David Sent: Wednesday, September 13, 2006 10:52 AM To: PostgreSQL-development Subject: Re: [HACKERS] Lock partitions

Simon, In the 16/16 (16 buffer partitions/16 lock partitions) test, the WALInsertLock lock had 14643080 acquisition attempts and 12057678 successful acquisitions on the lock. That's 2585402 retries on the lock. That is to say that PGSemaphoreLock was invoked 2585402 times. In the 128/128 test, the WALInsertLock lock had 14991208 acquisition attempts and 12324765 successful acquisitions. That's 2666443 retries. The 128/128 test attempted 348128 more lock acquisitions than the 16/16 test and retried 81041 times more than the 16/16 test. We attribute the rise in WALInsertLock lock accesses to the reduction in time on acquiring the BufMapping and LockMgr partition locks. Does this seem reasonable? The overhead of any monitoring is of great concern to us. We've tried both clock_gettime() and gettimeofday() calls.
They both seem to have the same overhead ~1 us/call (measured against the TSC of the CPU) and both seem to be accurate. We realize this can be a delicate point and so we would be happy to rerun any tests with a different timing mechanism. David -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:22 AM To: Tom Lane Cc: Strong, David; PostgreSQL-development Subject: Re: [HACKERS] Lock partitions On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote: Strong, David [EMAIL PROTECTED] writes: When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks. While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. Good thinking David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in fewer == better thoughts. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. 
However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue. It seems reasonable that relieving the bottleneck on BufMapping and LockMgr locks that we
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ Well, my script produces output that is closer to what I need to create the release notes. If there are procedures, please document them. Nobody ever told me much when I was given committer status, and I just did what it looked like you guys did, and no doubt made some mistakes. cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Optimizer improvements: to do or not to do?
On Sep 13, 2006, at 14:44, Gregory Stark wrote: I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. That said I'm now going to do exactly what I just said we should stop doing and brain storm about an ad-hoc metric that might help: I wonder if what we need is something like: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records. We might also need a separate average number of n-block spans per value metric to predict how sequential the i/o will be in addition to how many pages will be fetched. Currently, statistics are only collected during an ANALYZE. Why aren't statistics collected during actual query runs such as seq scans? One could turn such a beast off in order to get repeatable, deterministic optimizer results. -M ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. No, it did not get done; feel free to work on it for 8.3. [steve's reply]: Could you clue me into the buzz words for searching the lists to find out more details on the plan, what has been done, and what is left to do? 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? I would vote urgently against that, as not all loadable libraries are necessarily designed to survive being reloaded. Nor have I heard any previous requests for it; how often do loadable libraries get updated in production? Stuff like pltcl.so would only change at a version update, for which you have to restart the postmaster anyway to bring in the new executable image for postgres itself. The LOAD command is meant for development, not as something that would be invoked in production [steve's reply]: I can understand your reservations about reloading libraries, and I'm really not committed to doing that. Reloading libraries is just a mechanism to do what I really want to do. I guess that did not come across very clearly in my first email, so let me explain again. I want to be able to tell a backend to reinitialize some of the cached data it is holding in static variables. Generally I want to do this when something cached in memory has gotten out-of-sync with what is in the database. The two examples that affect me are: 1. A database table has changed in its structure, and the prepared statements related to it either no longer work, or do the wrong thing. 2. A TCL module in pltcl_modules is updated with a new version, but running processes that have already executed a function from this module do not pick up the change. 
The former is handled by the prepared statement invalidation plan, but the latter is not. Effectively, I want something that does the equivalent of what CREATE OR REPLACE FUNCTION does for SQL functions, except for these procedural language modules. The current way pltcl_modules works is like having a CREATE OR REPLACE FUNCTION capability that only affects newly started connections, but has no effect on existing connections. The reason I advocated the library reloading is not because pltcl.so has changed (which should be rare), but that by reloading it, the static variables associated with the library would get reinitialized. Essentially reloading the library pltcl.so has the side effect of causing the TCL modules in pltcl_modules to get reread from disk. I'd love to have a lighter-weight way to do this, but unfortunately, I have not come up with one. If someone has a better way to do this, I'm all ears. Currently the only way I know of to do a complete reinitialization of static variables is to force a disconnection on all the backend processes, and let them all reconnect to new postgres backends that have clean slates in their static variables. There does not seem to be a way to do this via SQL; as far as I know, this requires access to the database server as the postgres user or superuser and sending a TERM signal to each backend you want to kill. Basically, I've come up with two basic approaches for forcing a reload of pltcl_modules: have a system admin function to do some kind of resetting in each backend that cleans out the static variables associated with a procedural language, or kill most or all of the backends and let the client applications handle the errors and reset their connections. Currently, the only resetting operation I know to work is reloading the pltcl.so library. If anyone has any better ideas for how to crack this problem, I'd love to hear them. Here are some more details on the approaches I've thought of, for those who are interested: 1. 
Add special logic to pltcl_loadmod (the utility that facilitates inserting and updating TCL modules in the database). However, this would still require some kind of communication to all the other backend processes, which would probably require something like the next suggestion. 2. A system admin command to force the reinitialization of the static data associated with a procedural language. This command would need to communicate either with all the backends, or with a backend with a particular pid (like pg_cancel_backend does); it is not sufficient for it only to affect the backend associated with its connection. This command could be specific to the procedural language (e.g. reload_pltcl_modules), or allow a particular dynamically linked library to be reloaded (reload_library_if_already_loaded(libname)). There are probably other ways to effect the reinitialization, but all of them seem to require communication to a backend process other than the one we are connected to, which made it seem to me like a candidate for
Re: [HACKERS] Fixed length data types issue
On Sep 11, 2006, at 1:57 PM, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every if in that inner loop is expensive. I'll have to spend a few hours tomorrow becoming one with that section of code. I looked at it already and was surprised at how short it was already so I can understand what you mean. I'm guessing this won't change the design, but I'll throw it out anyway. I'd love to have the ability to control toasting thresholds manually. This could result in a lot of speed improvements in cases where a varlena field isn't frequently accessed and will be fairly large, yet not large enough to normally trigger toasting. An address field would be a good example. Being able to force a field to be toasted before it normally would could drastically improve tuple density without requiring the developer to use a 'side table' to store the data. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Getting a move on for 8.2 beta
On Sep 13, 2006, at 6:56 PM, Tom Dunstan wrote: Regarding the idea of a list of approved patch authorisers, don't we have such a group now? i.e. committers. Right, and if committers or others are willing to put in the time required to verify that patches aren't nasty before going onto the blessed patch queue, the idea could quite possibly work and provide some value. Note that all we really need to test for here is that the patch isn't malicious; patches that are bad design or whatever are unlikely to open security holes or fry your box. A major point of the queue is that the appropriate committer often doesn't have the time to review the patch right now. There might be some benefit in allowing a wider set of contributors to bless patches as non- nasty for testing purposes, rather than waste the limited time of qualified committers. Maybe such an army doesn't exist, though. That's something I'd be willing to do. And for many people that aren't committers but are still trusted in the community, we could probably bypass the checking. Another possibility would be to test these patches in some kind of virtual machine that gets blown away every X days, so that even if someone did get something malicious in there it wouldn't last long. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CVS commit messages and backpatching
Andrew Dunstan wrote: Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I pull activity only from HEAD, so I do not see that tag. In fact, I use our src/tools/pgcvslog rather than cvslog. Not sure why we are maintaining our own script when there are much better things out there: http://freshmeat.net/projects/cvs2cl.pl/ Well, my script produces output that is closer to what I need to create the release notes. If there are procedures, please document them. Nobody ever told me much when I was given committer status, and I just did what it looked like you guys did, and no doubt made some mistakes. I guess the question is whether it is possible using cvs2cl to show only HEAD, and then show if the same commit message also appears in the most recent back branch. And will that always work reliably? -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Lock partitions
Strong, David [EMAIL PROTECTED] writes: We have some results for you. We left the buffer partition locks at 128 as this did not seem to be a concern and we're still using 25 backend processes. We ran tests for 4, 8 and 16 lock partitions. For 4 lock partitions, it took 620 seconds to acquire locks and 32 seconds to release locks. The test produced 199.95 TPS. For 8 lock partitions, it took 505 seconds to acquire locks and 31 seconds to release locks. The test produced 201.16 TPS. For 16 lock partitions, it took 362 seconds to acquire locks and 22 seconds to release locks. The test produced 200.75 TPS. And, just for grins, using 128 buffer and 128 lock partitions, took 235 seconds to acquire locks and 22 seconds to release locks. The test produced 203.24 TPS. [ itch... ] I can't help thinking there's something wrong with this; the wait-time measurements seem sane, but why is there essentially no change in the TPS result? The above numbers are only for the lock-partition LWLocks, right? What are the totals --- that is, how much time is spent blocked vs. processing overall? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Lock partitions
On Sep 13, 2006, at 2:46 PM, Strong, David wrote: We have some results for you. We left the buffer partition locks at 128 as this did not seem to be a concern and we're still using 25 backend processes. We ran tests for 4, 8 and 16 lock partitions. Isn't having more lock partitions than buffer partitions pointless? -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: I want to be able to tell a backend to reinitialize some of the cached data it is holding in static variables. Generally I want to do this when something cached in memory has gotten out-of-sync with what is in the database. The two examples that affect me are: 1. A database table has changed in its structure, and the prepared statements related to it either no longer work, or do the wrong thing. This we need to fix. 2. A TCL module in pltcl_modules is updated with a new version, but running processes that have already executed a function from this module do not pick up the change. To be frank, this is not Postgres' problem, it's pltcl's. I think the pltcl_modules facility is poorly designed (precisely because it doesn't fit into any sane way of handling the schema-update problem) and needs to be thrown out and redone. If the units you were reloading were Postgres functions, or had some other way of being represented in the system catalogs, then we'd have a reasonable way to attack the problem. But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Mark Dilger wrote: Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: ... The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built-in type. (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = text, ascii3 = text, etc.) This would work all right for the string-category cases, since TEXT is the only thing you really care about having them cast to anyway. It probably won't work all that well for int1/int3, because you really want them to coerce implicitly to all the wider numeric types. Otherwise, perfectly sane queries like int8 + int1 fail. Part of the issue here is that we deliberately keep the parser from searching for multi-step coercions. So for example if you only provide int1->int2 then the existence of up-casts from int2 doesn't help you use an int1 with anything except int2. I am not sure whether any problems would be created if you did provide the full spectrum of up-casts. I remember having argued that there would be problems with trying to invent uint2/uint4 types, but that was a very long time ago, before we had pg_cast and some other changes in the type resolution rules. With the current system it might work OK. regards, tom lane Thanks Tom, I will try this then. I won't be proposing to ever put this in core, as the increased code size isn't justified for people who aren't using these types (IMHO). Any further feedback on why this wouldn't work is appreciated, as it might save me some time learning on my own. But otherwise I'll post back in a few days when this is finished. I've created the int1 and int3 types, with casts to/from each other as well as to/from int2, int4, int8, float4, float8, and numeric. They also have comparison operators for themselves, though you have to use casts if you want to compare against other numeric types. int1 works perfectly, as far as I can tell. 
int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table. Does anybody know whether storing data by reference works for data smaller than 4 bytes? If not, then I seem to be out of options for creating a 3-byte datatype. This would also seem to prevent the creation of an ascii3 type as well. mark tiny.tgz Description: GNU Unix tar archive ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Mark Dilger [EMAIL PROTECTED] writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table. Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. I'm wondering though what alignment you expect it to have. You'd need some pretty ugly code to pick up an unaligned 3-byte integer portably ... but if you align it, the space savings probably goes out the window. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
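The "pretty ugly code" Tom alludes to would look roughly like the following: a self-contained sketch of portably reading and writing a signed 3-byte integer stored unaligned, one byte at a time with explicit sign extension. A little-endian byte layout is assumed here purely for illustration; the function names are invented, not from any patch.

```c
#include <stdint.h>

/* Read a signed 24-bit integer from an unaligned buffer, byte by byte,
 * so no aligned load is ever issued.  Little-endian layout assumed. */
static int32_t
read_int3(const unsigned char *p)
{
    uint32_t u = (uint32_t) p[0]
               | ((uint32_t) p[1] << 8)
               | ((uint32_t) p[2] << 16);

    if (u & 0x800000)           /* sign bit of the 24-bit value set? */
        u |= 0xFF000000u;       /* extend the sign into the top byte */
    return (int32_t) u;
}

/* Write a signed 24-bit integer to an unaligned buffer. */
static void
write_int3(unsigned char *p, int32_t v)
{
    p[0] = (unsigned char) (v & 0xFF);
    p[1] = (unsigned char) ((v >> 8) & 0xFF);
    p[2] = (unsigned char) ((v >> 16) & 0xFF);
}
```

This works regardless of alignment, which is the trade-off Tom points out: byte-at-a-time access is what buys the space savings, at the cost of uglier and slower tuple decoding than an aligned 4-byte fetch.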
[HACKERS] Patch attribution and non-ASCII characters
I see a number of non-ASCII characters in the names of patch submitters in the CVS logs. Does anyone know a good way to have all these get the same encoding in the CVS commit logs? I am thinking that is impossible because we can't attach the email encoding to the commit message. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CVS commit messages and backpatching
Bruce Momjian [EMAIL PROTECTED] writes: I guess the question is whether it is possible using cvs2cl to show only HEAD, and then show if the same commit message also appears in the most recent back branch. It's so rare that we make a back-branch patch without a corresponding HEAD patch that I'm not clear why you are concerned about showing only HEAD for this purpose. I've always found that cvs2cl's behavior shows me exactly what I want to know for CVS log searching (and that includes release note making). If the output format isn't quite what you want, maybe you could turn on its XML-output option and then munge that into HTML. I've never had occasion to play with the XML-format option myself. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Lock partitions
Jim Nasby [EMAIL PROTECTED] writes: Isn't having more lock partitions than buffer partitions pointless? AFAIK they're pretty orthogonal. It's true though that a typical transaction doesn't hold all that many locks, which is why I don't see a need for a large number of lock partitions. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
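For readers following along, the partitioning under discussion maps each lock tag's hash code onto one of NUM_LOCK_PARTITIONS LWLocks, so unrelated lock-table traffic contends on different locks; buffer partitions do the same independently for the buffer mapping table, which is why the two settings are orthogonal. A self-contained sketch of the idea (the FNV-style hash below is a stand-in for illustration, not PostgreSQL's actual tag hash):

```c
#include <stdint.h>

#define NUM_LOCK_PARTITIONS 16      /* one LWLock protects each partition */

/* Stand-in hash of a lock tag (relation OID plus a key field). */
static uint32_t
tag_hash(uint32_t relid, uint32_t key)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */

    h = (h ^ relid) * 16777619u;    /* FNV prime */
    h = (h ^ key) * 16777619u;
    return h;
}

/* Map a hash code to its partition; the same tag always lands in the
 * same partition, so acquire/release agree on which LWLock to take. */
static unsigned
lock_partition(uint32_t hashcode)
{
    return hashcode % NUM_LOCK_PARTITIONS;
}
```

More partitions means less contention on any one partition LWLock, but, as Tom notes, every backend must sweep all partitions it touched at transaction end (LockReleaseAll), which is the cost of going to 128.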
Re: [HACKERS] Getting a move on for 8.2 beta
Jim Nasby wrote: That's something I'd be willing to do. And for many people that aren't committers but are still trusted in the community, we could probably bypass the checking. That's a worthwhile point. How many patches come from the general community vs out of the blue? Patches from regulars could probably get a free pass, which might cut down the review burden substantially. Another possibility would be to test these patches in some kind of virtual machine that gets blown away every X days, so that even if someone did get something malicious in there it wouldn't last long. Yeah, nasties could be roughly separated into two categories: stuff which affects your box, or stuff which uses your box to affect someone else. A VM fixes the first, and a firewall blocking outgoing connections (with exceptions for the CVS server and the patch buildfarm server, or whatever it is) largely fixes the second. I was under the impression that most VM products are x86-centric, which wouldn't lead to huge amounts of diversity in the buildfarm results. At least, not as far as architecture goes. Cheers Tom ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Optimizer improvements: to do or not to do?
Gregory Stark wrote: Ron Mayer [EMAIL PROTECTED] writes: ...vastly overestimate the number of pages .. because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. Do we know if any such people participate/lurk on this list, or if the conversation should go elsewhere? That said I'm now going to do exactly what I just said we should stop doing and brain storm about an ad-hoc metric that might help: I wonder if what we need is something like: sort the sampled values by value and count up the average number of distinct blocks per value. That might let us predict how many pages a fetch of a specific value would retrieve. Or perhaps we need a second histogram where the quantities are of distinct pages rather than total records. Either of these sound like they might be an improvement over correlation itself for estimating the number of pages it'd need to read. Would it be relatively easy or hard for a programmer not too familiar with the code to experiment with these ideas? Where would be a good place to look? We might also need a separate average number of n-block spans per value metric to predict how sequential the i/o will be in addition to how many pages will be fetched. I'm wildly guessing that the # of pages itself is a bigger factor than the sequential/random nature. For example, if I do a query for data from a particular small city, I'd only need dozens of pages, not many thousands. OTOH, it'd be neat to know if this were true. Is there any good way to make something like explain analyze show both the expected and actual # of pages and # of seeks? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Optimizer improvements: to do or not to do?
Ron Mayer [EMAIL PROTECTED] writes: I'm wildly guessing that, the # of pages itself seems to be a bigger factor than the sequential/random nature. No, they're both important: fetching N pages in sequence is way cheaper than fetching the same number of pages scattered all over. This is partly because you reduce seeking at the hardware level, and partly because sequential reads cue the kernel to do read-ahead, allowing overlap of I/O and computation. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
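The planner expresses exactly this asymmetry as a cost ratio: a random page fetch is charged several times a sequential one (random_page_cost defaults to 4.0 against a sequential page cost of 1.0). A toy sketch of how page count and scatter combine — fetch_cost and fraction_sequential are invented names for illustration, not planner API:

```c
/* Toy illustration of why scatter matters as much as page count: blend
 * the familiar planner defaults (sequential page = 1.0, random page =
 * 4.0, used here as assumptions) by the fraction of pages that are
 * read in sequence. */
static double
fetch_cost(int pages, double fraction_sequential)
{
    const double seq_page_cost = 1.0;
    const double random_page_cost = 4.0;

    return pages * (fraction_sequential * seq_page_cost +
                    (1.0 - fraction_sequential) * random_page_cost);
}
```

With these assumptions, 100 fully sequential pages cost 100 units while the same 100 pages scattered at random cost 400, so two estimates that agree on page count can still disagree badly on total I/O cost.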
Re: [HACKERS] - Proposal for repreparing prepared statements
But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it.

Are you saying that pltcl.so leaks memory in general, or that forcing a reload of the pltcl.so library leaks memory? If the former, I haven't seen evidence of it in my use of pltcl, but I'd be interested to know of problems others have had with it.

Yours, Steve Marshall ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Inconsistency in extended-query-protocol logging
Guillaume Smet [EMAIL PROTECTED] writes: On 9/13/06, Tom Lane [EMAIL PROTECTED] wrote:

	statement: <querystring>		-- Simple Query
	parse <stmt>: <querystring>		-- Parse
	bind <stmt>/<portal>: <querystring>	-- Bind
	execute <stmt>/<portal>: <querystring>	-- Execute

I agree with that. OK, Bruce hasn't said anything so I went ahead and changed it, as per attached patch.

Hmmm, AFAICS, you changed stmt/portal to portal to statement in your last commit. Or did I misunderstand? Yeah, that seemed like a good idea at the time ... but on reflection it's probably better to be consistent with the way the execute message is logged.

Also, the current code distinguishes a fetch from an already-partially-executed portal ... do you care about that? I don't really understand what a portal is - I must admit I don't use libpq directly. I never saw a log file with "fetch". Do you have an example? Should I consider an "execute fetch from" differently?

Well, the point is that if you see "execute <unnamed>: select ..." followed by "execute fetch from <unnamed>: select ..." the latter is a continuation of the same command, not a newly entered command. For resource-analysis purposes you can likely treat them the same, but if you were trying to debug an application you might need to know the difference.

regards, tom lane

*** src/backend/tcop/postgres.c.orig	Fri Sep  8 11:55:53 2006
--- src/backend/tcop/postgres.c	Wed Sep 13 17:51:35 2006
***************
*** 1610,1619 ****
  			break;
  		case 2:
  			ereport(LOG,
! 					(errmsg("duration: %s ms  bind %s to %s: %s",
  							msec_str,
- 							*portal_name ? portal_name : "<unnamed>",
  							*stmt_name ? stmt_name : "<unnamed>",
  							pstmt->query_string ? pstmt->query_string : "<source not stored>"),
  					 errdetail_params(params)));
  			break;
--- 1610,1620 ----
  			break;
  		case 2:
  			ereport(LOG,
! 					(errmsg("duration: %s ms  bind %s%s%s: %s",
  							msec_str,
  							*stmt_name ? stmt_name : "<unnamed>",
+ 							*portal_name ? "/" : "",
+ 							*portal_name ? portal_name : "",
  							pstmt->query_string ? pstmt->query_string : "<source not stored>"),
  					 errdetail_params(params)));
  			break;
***************
*** 1740,1747 ****
  		ereport(LOG,
  				(errmsg("%s %s%s%s%s%s",
  						execute_is_fetch ?
! 						_("statement: execute fetch from") :
! 						_("statement: execute"),
  						prepStmtName,
  						*portal_name ? "/" : "",
  						*portal_name ? portal_name : "",
--- 1741,1748 ----
  		ereport(LOG,
  				(errmsg("%s %s%s%s%s%s",
  						execute_is_fetch ?
! 						_("execute fetch from") :
! 						_("execute"),
  						prepStmtName,
  						*portal_name ? "/" : "",
  						*portal_name ? portal_name : "",

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Getting a move on for 8.2 beta
On Wed, 13 Sep 2006, Tom Dunstan wrote: Another possibility would be to test these patches in some kind of virtual machine that gets blown away every X days, so that even if someone did get something malicious in there it wouldn't last long. Or just have a snapshot which is reverted after each run, and read-only access to files used to do the build. I know vmware supports this, probably others too... Yeah, nasties could be roughly separated into two categories: stuff which affects your box, or stuff which uses your box to affect someone else. A VM fixes the first, and a firewall blocking outgoing connections (with exceptions for the CVS server and patch buildfarm or whatever it is server) largely fixes the second. I was under the impression that most VM products are x86 centric, which wouldn't lead to huge amounts of diversity in the buildfarm results. At least, not as far as architecture goes. I have played with QEmu (www.qemu.org) which is open source and supports multiple target architectures. I'm not sure how stable all of the different targets are, I know that sparc64 is not quite done yet. -- The problem with engineers is that they tend to cheat in order to get results. The problem with mathematicians is that they tend to work on toy problems in order to get results. The problem with program verifiers is that they tend to cheat at toy problems in order to get results. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. Are you saying that pltcl.so leaks memory in general, or that forcing a reload of the pltcl.so library leaks memory? The latter. There's no mechanism to release the previous instance's data structures (Tcl interpreters, function bodies, etc). regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Optimizer improvements: to do or not to do?
Ron Mayer wrote: Gregory Stark wrote: Ron Mayer [EMAIL PROTECTED] writes: ...vastly overestimate the number of pages .. because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. Do we know if any such people participate/lurk on this list, or if the conversation should go elsewhere? I lurk... I don't know if I'm a 'statistics jock', but I may be valuable if only I had a better understanding of how the optimizer works. I have been following this thread with interest, but could really do with a good pointer to background information beyond what I have read in the main postgres manual. Does such information exist, and if so, where ? Josh Reich ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
Albe Laurenz [EMAIL PROTECTED] writes: The patch did not work for me :-( Is libpq.so in a non-standard directory? If yes, one either has to export LIBPATH in the environment or link with -L/location/of/libpq for the executable to find it (similar to RPATH in Linux). libpq will definitely be in a nonstandard directory in buildfarm usage. It sounds like we should add the local equivalent of RPATH for AIX. Now I'm pretty dubious that -L is what we want --- -L should have the semantics of this is where to find the library right now. RPATH is used to point to an expected installation directory that might not contain the library at the time of linking ... or worse, might contain an old incompatible version of it. Does AIX have a true RPATH equivalent? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Patch attribution and non-ASCII characters
Bruce Momjian wrote: I see a number of non-ASCII characters in the names of patch submitters in the CVS logs. Does anyone know a good way to have all these get the same encoding in the CVS commit logs? I am thinking that is impossible because we can't attach the email encoding to the commit message.

Is this a problem now, or are you looking to solve it for future releases?

I think the best you could do is post the non-ASCII names here, and have affected people post back their names in HTML escaping or something that suits the SGML docs. For example my name is &Aacute;lvaro Herrera Or, in Latin-1, Álvaro Herrera

Most commit messages contain the ASCII version of my name, thus you wouldn't notice the problem anyway. The COPY (select) commit message, AFAIR, also has Zoltán's name in ASCII form (Zoltan).

-- Alvaro Herrera    http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Patch attribution and non-ASCII characters
Alvaro Herrera [EMAIL PROTECTED] writes: I think the best you could do is post the non-ASCII names here, and have affected people post back their names in HTML escaping or something that suits the SGML docs. That's probably the best way to close the loop. I know that when I'm committing such patches, I tend to copy-n-paste from a mail window to a shell window, and I'd not venture to guarantee anything about what encoding the text is in anyway. Sometimes it looks reasonable in the shell window and sometimes it doesn't ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
information_schema vs temp tables (was Re: [HACKERS] [COMMITTERS] pgsql: Sequences were not being shown due to the use of lowercase `s`)
Greg Sabino Mullane [EMAIL PROTECTED] writes: ... I can't think of a use case where a user would not want to append a is_visible clause to the query above. That or start tracking which pg_temp_ schema belongs to whom. Well, I'm still having a problem with this, because it seems like a pretty klugy solution. It's inefficient (the is_visible functions are not cheap) and it's not hard to fool: set search_path = pg_temp_N, ... (This won't work for a non-superuser, because he'll not have USAGE privilege on someone else's temp schema, but you seem to be worried about hiding temp tables from superusers.) If you're really intent on making it work this way, my vote is to expose namespace.c's isOtherTempNamespace() as a SQL-callable function, and add a test on that to the info-schema views, rather than relying on is_visible or explicit knowledge of the temp-schema naming convention. Perhaps we should expose both pg_is_my_temp_schema(schema_oid) pg_is_other_temp_schema(schema_oid) Thoughts? Opinions about the function names? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Jim Nasby) wrote: I'd love to have the ability to control toasting thresholds manually. ... Being able to force a field to be toasted before it normally would could drastically improve tuple density without requiring the developer to use a 'side table' to store the data. +1 :-) -arturo ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Patch attribution and non-ASCII characters
Alvaro Herrera wrote: Bruce Momjian wrote: I see a number of non-ASCII characters in the names of patch submitters in the CVS logs. Does anyone know a good way to have all these get the same encoding in the CVS commit logs? I am thinking that is impossible because we can't attach the email encoding to the commit message.

Is this a problem now, or are you looking to solve it for future releases?

Either. ;-)

I think the best you could do is post the non-ASCII names here, and have affected people post back their names in HTML escaping or something that suits the SGML docs. For example my name is &Aacute;lvaro Herrera Or, in Latin-1, Álvaro Herrera Most commit messages contain the ASCII version of my name, thus you wouldn't notice the problem anyway. The COPY (select) commit message, AFAIR, also has Zoltán's name in ASCII form (Zoltan).

Yep. I will grab the unknown names and ask the group to research HTML versions.

-- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB    http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Getting a move on for 8.2 beta
Jeremy Drake wrote: On Wed, 13 Sep 2006, Tom Dunstan wrote: I was under the impression that most VM products are x86 centric, which wouldn't lead to huge amounts of diversity in the buildfarm results. At least, not as far as architecture goes. I have played with QEmu (www.qemu.org) which is open source and supports multiple target architectures. I'm not sure how stable all of the different targets are, I know that sparc64 is not quite done yet.

Oh, I didn't realize QEmu did non-x86 architectures. Is it considered good enough at emulating e.g. a sparc for it to be useful to us? PearPC was a PowerPC emulator that got some press a while ago, although it appears that the project has stagnated a bit (probably because people who wanted to run OSX on intel hardware have a legit way to do it now :) )

The problem with these things is if something goes wrong, was it the patch that failed or the not-quite-perfect VM product? To cut down on those sorts of problems, I suppose we could have it do a clean, non-patched run first, and then only do the patched version if the clean version passed. We'd have to be reasonably unlucky to have a patch trigger a VM bug under those circumstances, I would think.

Cheers Tom ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Optimizer improvements: to do or not to do?
Joshua Reich [EMAIL PROTECTED] writes: I lurk... I don't know if I'm a 'statistics jock', but I may be valuable if only I had a better understanding of how the optimizer works. I have been following this thread with interest, but could really do with a good pointer to background information beyond what I have read in the main postgres manual. Does such information exist, and if so, where ? Well, there's the 2-foot view here: http://developer.postgresql.org/pgdocs/postgres/planner-optimizer.html but after that you have to start reading code. The optimizer README file may be useful: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/README but it goes into a lot of details that probably aren't interesting for your purposes. Most of the planner is just mechanism associated with generating different possible plans. The policy that determines which plan is chosen is the cost-estimation equations, and those are all in costsize.c and selfuncs.c: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/costsize.c http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c The division between these two files is a bit historical, but roughly speaking selfuncs.c knows about the behavior of specific WHERE-clause operators and index access methods, while costsize.c knows about the behavior of particular plan types. I'd like to think that costsize.c is well enough commented that you can follow it even without any C knowledge, but selfuncs.c may be a bit more daunting. Still, the comments are pretty extensive, and feel free to ask questions on pg-hackers. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table.

Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. I'm wondering though what alignment you expect it to have. You'd need some pretty ugly code to pick up an unaligned 3-byte integer portably ... but if you align it, the space savings probably goes out the window.

regards, tom lane

Program received signal SIGSEGV, Segmentation fault.
0xb7e01d45 in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0xb7e01d45 in memcpy () from /lib/libc.so.6
#1  0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181
#2  0x08078b0d in heap_form_tuple (tupleDescriptor=0x83c2d78, values=0x83c2e84, isnull=0x83c2e98 "") at heaptuple.c:749
#3  0x0815d2d9 in ExecCopySlotTuple (slot=0x83c26f4) at execTuples.c:558
#4  0x0815d393 in ExecMaterializeSlot (slot=0x83c26f4) at execTuples.c:639
#5  0x081560ca in ExecutorRun (queryDesc=0x83c2834, direction=ForwardScanDirection, count=0) at execMain.c:1401
#6  0x081e78e4 in ProcessQuery (parsetree=0x83c2240, plan=0x83b837c, params=0x3, dest=0x83b8290, completionTag=0xbfedffa0 "") at pquery.c:174
#7  0x081e89f9 in PortalRun (portal=0x83c0064, count=2147483647, dest=0x83b8290, altdest=0x83b8290, completionTag=0xbfedffa0 "") at pquery.c:1076
#8  0x081e4060 in exec_simple_query (query_string=0x83b7bbc "insert into test (a) values (3::int3);") at postgres.c:1004
#9  0x081e6074 in PostgresMain (argc=4, argv=0x836fab4, username=0x836fa8c "mark") at postgres.c:3219
#10 0x081b89b3 in ServerLoop () at postmaster.c:2854
#11 0x081ba21b in PostmasterMain (argc=1, argv=0x836d9f8) at postmaster.c:941
#12 0x081764a8 in main (argc=1, argv=0x836d9f8) at main.c:265

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CSStorm occurred again by postgreSQL8.2
I wrote: ... it seems like in the case where RecordSubTransactionCommit detects that the subxact has not stored its XID anywhere, we could immediately remove the XID from the PGPROC array, just as if it had aborted. This would avoid chewing subxid slots for cases such as exception blocks in plpgsql that are not modifying the database, but just catching computational errors. (and later realized that Alvaro had had the same idea awhile back, but I don't have his message at hand). I looked into this a bit more; it seems like basically it should only take addition of else XidCacheRemoveRunningXids(xid, 0, NULL); to the bottom of RecordSubTransactionCommit(), plus suitable adjustment of the comments in both routines. However, there's a problem: if we delete a second-level subxact's XID from PGPROC, and later its upper subtransaction aborts, XidCacheRemoveRunningXids will emit scary warnings when it doesn't find the sub-subxact in PGPROC. This could doubtless be fixed with sufficient jiggery-pokery --- simply removing the debug warnings would be a brute-force answer, but I'd like to find something a bit less brute-force. Maybe drop the sub-subxact from its parent's list immediately, instead of carrying it forward? Anyway, given that there's this one nonobvious gotcha, there might be others. My recommendation is that we take this off the open-items list for 8.2 and revisit it in the 8.3 cycle when there's more time. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Not-so-open items
There are several entries on the 8.2 open-items list that I think can be removed: Fix backward array comparison - subset Done (this was redundant with the containment-operator item) Store only active XIDs in subtransaction cache Per my note just now, this probably should wait for 8.3. Double vacuum speed on tables with no indexes I changed the locking thing I was worried about. Unless Greg wants to do some real-world performance measurements to confirm or refute that change, I think this can be closed. Fix excessive page checking for new btree split code Per my note yesterday, I can't reproduce the misbehavior I saw six weeks ago, so I recommend we leave the code alone. Suppress error on bind parameters of unknown types I haven't heard one single person speak up to say yeah, that's a good idea, so I conclude it probably isn't. Recommend we not change it. BTW, pushing out an 8.1.5 is probably a good idea, but what's it doing in the 8.2 open-items list? Especially in the documentation section? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 , data=0x83c2ef4 , infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181 Hm, are you sure you provided a valid pointer (not the integer value itself) as the Datum output from int3_in? (Looks at patch ... ) Um, I think you didn't, although that coding is far too cute to be actually readable ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Fixed length data types issue
Tom Lane wrote: Mark Dilger [EMAIL PROTECTED] writes: Tom Lane wrote: Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. (gdb) bt #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 , data=0x83c2ef4 , infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181 Hm, are you sure you provided a valid pointer (not the integer value itself) as the Datum output from int3_in? (Looks at patch ... ) Um, I think you didn't, although that coding is far too cute to be actually readable ... regards, tom lane I tracked this down to my implementation of Int24GetDatum. I've got that fixed now and have hit another bug, but I'm still working on it so I won't bother you about that yet. As for the patch, I will eventually submit a version without the cute code autogeneration stuff. mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Interesting tight loop
In production today (8.1.4), I ran into a backend process that wouldn't cancel right away -- minutes went by. It was in

[0] TransactionIdIsCurrentTransactionId
[1] HeapTupleSatisfiesSnapshot
...

But the interesting thing is that there were 4.6 million elements in the s->childXids list. Which is why it took so damn long. I can't quite figure out how I induced this state. It is an OLAP server with about 10-20 connections that run long queries (from 5 seconds to 24 hours).

If this is a common possible state, it seems that perhaps a hash of the childXids would be more appropriate. Does the order of the childXids chained off the current transaction state matter? Most of the places I could find that reference it seem to just attempt to find an Xid in there. O(1) sounds a lot better than O(n) :-D

Best regards, Theo

// Theo Schlossnagle // CTO -- http://www.omniti.com/~jesus/ // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Optimizer improvements: to do or not to do?
Tom Lane wrote: I'm having a hard time getting excited about improving this query when it's so badly coded in the first place. What's an ORDER BY doing in an EXISTS subquery? The LIMIT is unnecessary too. And the inner WHERE says nothing so much as "I don't know how to design a database" :-(.

It was the test query, which has the same execution plan for belg_mobile (and the same problem) as the production query below:

select (select max(code) from belg_mobile tc
        where c.bnum = tc.code and c.bnum like tc.code || '%') as code,
       c.cause, c.ilno, extract(hour from c.datetime) as hour,
       count(*) as cnt, sum(c.dur) as dur
  from conn.conn20060803 c
 where itgrp = :itgrp
 group by 1, 2, 3, 4

It's a simple OLAP query for analyzing telephone traffic distribution over time and trunk codes. max(code) is used to get the most specific matching code. For example, 84725 and 8472 are both valid codes, and number 84725123456 must match 84725 but not 8472. The 'c.bnum = tc.code' qual significantly reduces the index scan and execution time. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Draft release notes
Here is an early draft of the release notes. It needs more polish and review: http://momjian.us/cgi-bin/pgrelease I will catch up on my email tomorrow, update the open items list for 8.2, and then return to the release notes for cleanup. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings