Re: [HACKERS] Constraint Type Coercion issue?

2005-09-15 Thread Martijn van Oosterhout
On Wed, Sep 14, 2005 at 10:42:36PM -0400, Tom Lane wrote:
 The thing that's still fairly unclear to me is whether the collation
 information is attached to the operators/functions or to the data.
 I recall there's been some discussion of sticking collation IDs into
 individual text Datums, which is a completely different path than what
 you are positing above.  Does the SQL spec mandate one or the other of
 these approaches?  If it does, do we want to believe it?  (The more I
 read of SQL2003, the less impressed I get...)

The standard doesn't care I think. My reading is that the COLLATE
status is determined at parse time. From there you can plan however you
like.

AFAIUI, collate is a parameter/property of fields and domains and
affects operators and function, not the data. It only applies to
comparisons, not the output. You could add it as a property to the
data. I wrote a module, taggedtypes [1], which basically implemented
this. My main issue with it is that for the '' operator, the same
collate property has to be on both arguments or it has to bail. The
only reason why you can attach COLLATE to fields and domains is to give
a default in case the user doesn't specify anything. But if the COLLATE
is given explicitly, it overrides anything.

By way of example:

CREATE TABLE t (a text collate c1, b text collate c2);

select * from t where a  b;  -- ERROR: Indeterminate collate
select * from t where a  b COLLATE c3;   -- Order by c3

My worry about adding the collate to the Datum is that your execution
tree becomes more complex. The two types, with and without COLLATE data
are not interchangable and you would have to add or remove them
continuously.

In the above example, you would have to create an executor node whose
sole purpose is to add the collate bit to a and b before passing it to
the '' operator. An index can only support one collation at a time
also.

I don't think there is an easy way out...

[1] http://svana.org/kleptog/pgsql/taggedtypes.html
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


pgpZxzfFhqYrs.pgp
Description: PGP signature


Re: [HACKERS] random system table corruption ...

2005-09-15 Thread Hans-Juergen Schoenig

alvora,

what concerns me here: this is a sun system and the problem happened  
during normal operation.
there should not be a recovery related operation. something which is  
also interesting: there are two corrupted pages in there (page number  
22 and 26).

strange thing :(.

thanks a lot,

hans


On 11 Sep 2005, at 20:01, Alvaro Herrera wrote:


On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:


in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which  
occurs on

hackers' from time to time.
i have checked a broken file and i have seen that the corrupted  
page has

actually been zeroed out.



IIRC the XFS filesystem zeroes out pages that it recovers from the
journal but did not have a fsync on them (AFAIK XFS journals only
metadata, so page creation but not the content itself).  I don't think
this would be applicable to your case, because we do fsync modified
files on checkpoint, and rewrite them completely from WAL images after
that.  But I thought I'd mention it.

--
Alvaro Herrera -- Valdivia, Chile Architect,  
www.EnterpriseDB.com

Just treat us the way you want to be treated + some extra allowance
 for ignorance.(Michael Brusser)

---(end of  
broadcast)---

TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] bug #1702: nested composite types in plpgsql

2005-09-15 Thread Roman Neuhauser
# [EMAIL PROTECTED] / 2005-09-13 12:17:06 -0400:
 Roman Neuhauser [EMAIL PROTECTED] writes:
  Looking at src/pl/plpgsql/src/pl_exec.c for the first time, is it a problem
  of make_tuple_from_row() not accounting for nested composite types?
 
 Looks that way.  I've committed a fix to HEAD.  I'm not sure how hard
 it'd be to fix 8.0.

Thanks for the fast fix, it's really appreciated.

I tried to hammer[1] your patch[2] onto the REL8_0_STABLE branch
(attached), but am getting

ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
CONTEXT:  PL/pgSQL function breakme while storing call arguments
into local variables

I'm sorry to be a nuisance, but is this interesting enough for
someone in the know that they'd backport the patch into 8.0?

[1] see attachment
[2] 
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpgsql/src/pl_exec.c.diff?r1=1.151;r2=1.152

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991
Index: src/pl/plpgsql/src/pl_exec.c
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.127.4.3
diff -u -r1.127.4.3 pl_exec.c
--- src/pl/plpgsql/src/pl_exec.c20 Jun 2005 22:51:49 -  
1.127.4.3
+++ src/pl/plpgsql/src/pl_exec.c14 Sep 2005 13:04:13 -
@@ -3243,8 +3243,7 @@
  *
  * If expectedtypeid isn't InvalidOid, it is checked against the actual type.
  *
- * This obviously only handles scalar datums (not whole records or rows);
- * at present it doesn't need to handle PLpgSQL_expr datums, either.
+ * At present this doesn't handle PLpgSQL_expr or PLpgSQL_arrayelem datums.
  *
  * NOTE: caller must not modify the returned value, since it points right
  * at the stored value in the case of pass-by-reference datatypes.
@@ -3864,19 +3863,20 @@
 
for (i = 0; i  natts; i++)
{
-   PLpgSQL_var *var;
+   Oid fieldtypeid;
 
if (tupdesc-attrs[i]-attisdropped)
-   continue;   /* leave the column as 
null */
+   {
+   nulls[i] = true;/* leave the column as null */
+   continue;
+   }
if (row-varnos[i]  0) /* should not happen */
elog(ERROR, dropped rowtype entry for non-dropped 
column);
 
-   var = (PLpgSQL_var *) (estate-datums[row-varnos[i]]);
-   if (var-datatype-typoid != tupdesc-attrs[i]-atttypid)
+   exec_eval_datum(estate, estate-datums[row-varnos[i]],
+   InvalidOid, fieldtypeid, 
dvalues[i], nulls[i]);
+   if (fieldtypeid != tupdesc-attrs[i]-atttypid)
return NULL;
-   dvalues[i] = var-value;
-   if (!var-isnull)
-   nulls[i] = ' ';
}
 
tuple = heap_formtuple(tupdesc, dvalues, nulls);

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Per-table freeze limit proposal

2005-09-15 Thread Jim Buttafuoco
while you are at it, can you put in some audit timestamps as to when the vacuum 
occurred (full vs not full). 


-- Original Message ---
From: Alvaro Herrera [EMAIL PROTECTED]
To: Hackers pgsql-hackers@postgresql.org
Sent: Wed, 14 Sep 2005 22:14:23 -0400
Subject: [HACKERS] Per-table freeze limit proposal

 Hackers,
 
 As you've probably heard too many times already, I'm thinking in
 improving vacuum, so we can keep track of the freeze Xid on a table
 level, rather than database level.  Hopefully this will eliminate the
 need for database-wide vacuums.
 
 In fact this seems pretty easy to do.  Add a field to pg_class, tell
 VACUUM to update it using the determined freezeLimit, and that's it.
 (Note that if we ever implement partial vacuum, it won't be able to
 update the freeze point.  But that was true before anyway.)
 
 We also need to teach autovacuum to update pg_database.datfreezexid,
 using the minimum from pg_class.  (I don't think it's a good idea to
 seqscan pg_class to find out the minimum on each VACUUM call.) So, an
 autovacuum iteration would issue all needed VACUUM/ANALYZE calls, then
 get the minimum freezexid from pg_class to update pg_database.  This
 way, GetNewTransactionId can continue checking pg_database.datfreezexid
 as the hard limit for issuing warnings for Xid wraparound.
 
 Does anyone see a need for anything other than the autovacuum process to
 be updating pg_database.datfreezexid?  Of course, if autovacuum is not
 in use, things would continue as now, that is, manual database-wide
 VACUUM calls updating pg_database.datfreezexid.  But note that you can
 mark all tables as disabled on pg_autovacuum, issue your manuals VACUUM
 calls as needed (from cron or whatever), and use autovacuum to set
 pg_database.datfreezexid -- so autovacuum would in fact do nothing
 except set the freeze limit.
 
 The problem is, this seems so awfully simple that I fear I am missing
 something ...  Otherwise, does this sound like a plan?
 
 -- 
 Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
 The easiest way to resolve [trivial code guidelines disputes] is to fire
 one or both of the people involved.  (Damian Conway)
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
--- End of Original Message ---


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] pgplsql temporary tables

2005-09-15 Thread Maximiliano Di Rienzo



Hi, i didn't know 
where to send this, so forgive me if this isn't the right 
list.
In our company we 
had been using PostgreSQL for development for more than 2 years now, it's an 
excelent RDBMS, the only thing we miss is the ability to call 
storedprocedures (functions returning records in PG) with a simplier 
syntax, but at last we get used to... one problem that we all knowis the 
compiled pl/pgsql script that "remembers" the OID of temporary tables, and the 
alternative that PG devs recommend is using EXECUTE() to execute "dinamically" 
the query inside the function, this is tedious because for queries that uses the 
parameters of the function you have to concatenate them, and sometimes you can 
spend more time in debugging concatenation issues than the query 
itself.

So,since 
thePREPARE/EXECUTE (would be a little easier to code if supported) don't 
work in pl/pgsql and the OIDs are remembered, we develop a function that 
recompiles another function, i attach it here for your review. 


This way before 
calling a function with temporary tables we call "select 
recompilar('function_name')" and then call the actual function, that once 
recompiled works well. This is equivalent to recreate the function froma 
command line.

We know this is not 
the right way to do it and it affects the performance since PG must recompile 
the function script, but for functions that aren't called frecuently and are 
dificult to codeusing concatenatios this is a good 
approach.

This version only 
works on PostgreSQL 8 since it looks for argument names in the declaration of 
the function, you can remove the references to proargnames and should work on 
7.x too.

Best 
regards.
--
Lic. Maximiliano Di Rienzo
IT 
Manager
M.P.: 2502 - MCP ID: 
2725911 
[EMAIL PROTECTED]
http://www.fulfill-tech.com
Tel.: 351 
4283419 - Cel.: 351 155901603
Fulfill 
Technology S.A.
27 de 
Abril 424 - Piso3 Of. A -Córdoba 
Atención: 
Este mail es confidencial. En el caso que usted no sea el destinatario, 
no esta autorizado a reproducir o divulgar a terceros el contenido de este 
mensaje. Si usted lo ha recibido por error, por favor informenos inmediatamente 
devolviendo el correo electrónico y borrando el documento.Attention: 
This E-Mail is confidential. If you are not the intended recipient, you 
must not copy, disclose or use its contents. If you have received it in error, 
please inform us immediately by return E-Mail and delete the 
document.Atenção: Esta mensagem, e qualquer de seus anexos, 
eh confidencial e privilegiada. Caso voce nao seja o destinatario, nao esta 
autorizado a reproduzir ou divulgar a terceiros o conteudo desta mensagem e de 
qualquer anexo da mesma e deve apagar com os seus respectivos anexos. 
Aufmerksamkeit: Dieses E-Mail ist vertraulich. Wenn Sie 
nicht der rechtmaessige Empfaenger sind, duerfen Sie den Inhalt weder kopieren, 
verbreiten oder benutzen. Sollten Sie dieses E-Mail versehentlich erhalten 
haben, senden Sie es bitte an uns zurueck und loeschen es 
anschliessend. 



Recompilar.sql
Description: Binary data

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] pgplsql temporary tables

2005-09-15 Thread Merlin Moncure
Hi, i didn't know where to send this, so forgive me if this isn't the
right list.

probably general.

 compiled pl/pgsql script that remembers the OID of temporary tables,


Have you considered that if the temp table is created outside of the
function that uses it, this is not a problem?  

Make one function, init_temp_tables() which is called after connection.
It creates all your tables so they are persistent between calls.  (FWIW,
your function is clever).

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Per-table freeze limit proposal

2005-09-15 Thread Alvaro Herrera
On Wed, Sep 14, 2005 at 11:30:52PM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  In fact this seems pretty easy to do.  Add a field to pg_class, tell
  VACUUM to update it using the determined freezeLimit, and that's it.
 
 I think that it'd be worth fixing things so that the recorded value
 is not the freeze cutoff value (as now), but the actual lowest
 not-frozen XID present anywhere in the table.

Cool.  I wonder if the exact figure should be

min(lowest non-frozen Xid in table, GetOldestXmin(false))

just in case a long-running transaction inserts a new tuple after the
vacuum is done.  Also GetOldestXmin should be the value used for empty
tables.  For shared relations, we'd use GetOldestXmin(true).

Also, in light of this, it seems a bad idea to use the name freezexid
for the pg_class column; I would name it relminxid or something like
that (suggestions welcome).  Not sure about renaming the pg_database
column -- I don't see why not.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
La Primavera ha venido. Nadie sabe como ha sido (A. Machado)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] pg_autovacuum settings not saved on dump

2005-09-15 Thread Alvaro Herrera
Hi,

Robert Treat reminded me the other day that we don't currently save
pg_autovacuum settings on pg_dump.  This is expected, because we don't
want to dump pg_autovacuum as a regular table, or it would force us to
accept loading that forever; nor we do have ALTER TABLE commands to do
it on a higher level, because there wasn't agreement that it should be
implemented this early on autovacuum's life.

Expected or not, fact is it's not user friendly.  We should at least
document this somewhere so users can take care of it by themselves.  Not
sure where does it belong though.  The autovacuum section, the backup
section?  Does it merit a mention on the release notes?

Suggestions welcome.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
¿Que diferencia tiene para los muertos, los huérfanos, y aquellos que han
perdido su hogar, si la loca destrucción ha sido realizada bajo el nombre
del totalitarismo o del santo nombre de la libertad y la democracia? (Gandhi)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Per-table freeze limit proposal

2005-09-15 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Cool.  I wonder if the exact figure should be
 min(lowest non-frozen Xid in table, GetOldestXmin(false))

Actually just min(lowest Xid in table, RecentXmin).  You only need to be
sure there are no running transactions older than what you put into the
field; their xmins are not at issue.

 Also, in light of this, it seems a bad idea to use the name freezexid
 for the pg_class column; I would name it relminxid or something like
 that (suggestions welcome).

Works for me.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_autovacuum settings not saved on dump

2005-09-15 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Expected or not, fact is it's not user friendly.  We should at least
 document this somewhere so users can take care of it by themselves.  Not
 sure where does it belong though.  The autovacuum section, the backup
 section?  Does it merit a mention on the release notes?

Wherever we document that those settings exist at all (which is not real
prominent AFAIR).

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Michael Paesold

Tom Lane wrote:

I guess what this means is that there's no real problem with losing the
cache line while manipulating the LWLock, which is what the patch was
intended to prevent.  Instead, we're paying for swapping two cache lines
(the spinlock and the LWLock) across processors instead of just one line.
But that should at worst be a 2x inflation of the time previously spent
in LWLockAcquire/Release, which is surely not yet all of the application
;-).  Why the heck is this so bad?  Should we expect that apparently
minor changes in shared data structures might be costing equivalently
huge penalties in SMP performance elsewhere?

Unfortunately I don't have root on the Opteron and can't run oprofile.
But I'd really like to see some oprofile stats from these two cases
so we can figure out what in the world is going on here.  Can anyone
help?


I will try the patch here and see if it gives the same result. If so I could 
try to run with oprofile if you can give me a quick start.


Best Regards,
Michael Paesold 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Tom Lane
I wrote:
 I guess what this means is that there's no real problem with losing the
 cache line while manipulating the LWLock, which is what the patch was
 intended to prevent.  Instead, we're paying for swapping two cache lines
 (the spinlock and the LWLock) across processors instead of just one line.
 But that should at worst be a 2x inflation of the time previously spent
 in LWLockAcquire/Release, which is surely not yet all of the application
 ;-).  Why the heck is this so bad?  Should we expect that apparently
 minor changes in shared data structures might be costing equivalently
 huge penalties in SMP performance elsewhere?

I did some oprofile work and found that the cost seems to be because
(1) there's an increase in spinlock contention (more time spent in
s_lock), and (2) there's more time spent in LWLockAcquire/Release.
I'm not entirely clear about the reason for (1), but (2) apparently
is because of the extra cache line swapping, as posited above.  In
the oprofile trace it's clear that the extra cost comes exactly in the
statements that touch fields of the shared LWLock, which are the places
where you might have to wait to acquire a cache line.

I thought for a bit that the problem might come from having chosen to
put a pointer to the spinlock into each LWLock; fetching that pointer
implies an additional access to the contended cache line.  However,
changing the data structure to a simple linear array of spinlocks
didn't help.  So that whole idea seems to have died a painful death.

One other interesting result is that the data structure change neither
helped nor hurt on an EM64T machine with 2 physical (4 logical)
processors.  This is also x86_64, but evidently the caching behavior
is totally different from Opterons.

One thing that did seem to help a little bit was padding the LWLocks
to 32 bytes (by default they are 24 bytes each on x86_64) and ensuring
the array starts on a 32-byte boundary.  This ensures that we won't have
any LWLocks crossing cache lines --- contended access to such an LWLock
would probably incur the sort of large penalty seen above, because you'd
be trading two cache lines back and forth not one.  It seems that the
important locks are not split that way in CVS tip, because the gain
wasn't much, but I wonder whether some effect like this might explain
some of the unexplainable performance changes we've noticed in the past
(eg, in the dbt2 results).  A seemingly unrelated small change in the
size of other data structures in shared memory might move things around
enough to make a performance-critical lock cross a cache line boundary.

On regular x86, the LWLock struct is by chance exactly 16 bytes, so
there's no alignment issue.  But 64-bit platforms and platforms where
slock_t is bigger than char are exposed to this risk.

I'm going to go ahead and make that change, since it doesn't seem likely
to have any downside.  It might be interesting to look into forcing
proper alignment of the shared buffer headers as well.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] US Census database (Tiger 2004FE) - 4.4G

2005-09-15 Thread Darcy Buskermolen
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
 You can send it to me, and ehpg will host it.  I'll send you a
 private email with my info.

 Gavin

 On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
  It is 4.4G in space in a gzip package.
 
  I'll mail a DVD to two people who promise to host it for Hackers.

I'm wondering if this is now available for consumption by the rest of us??  

(ie what's the link) 


-- 
Darcy Buskermolen
Wavefire Technologies Corp.

http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Gavin Sherry
On Thu, 15 Sep 2005, Tom Lane wrote:

 One thing that did seem to help a little bit was padding the LWLocks
 to 32 bytes (by default they are 24 bytes each on x86_64) and ensuring
 the array starts on a 32-byte boundary.  This ensures that we won't have
 any LWLocks crossing cache lines --- contended access to such an LWLock
 would probably incur the sort of large penalty seen above, because you'd
 be trading two cache lines back and forth not one.  It seems that the
 important locks are not split that way in CVS tip, because the gain
 wasn't much, but I wonder whether some effect like this might explain
 some of the unexplainable performance changes we've noticed in the past
 (eg, in the dbt2 results).  A seemingly unrelated small change in the
 size of other data structures in shared memory might move things around
 enough to make a performance-critical lock cross a cache line boundary.

What about padding the LWLock to 64 bytes on these architectures. Both P4
and Opteron have 64 byte cache lines, IIRC. This would ensure that a
cacheline doesn't hold two LWLocks.

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Simon Riggs
 Tom Lane wrote
 I'm going to go ahead and make that change, since it doesn't
 seem likely
 to have any downside.  It might be interesting to look into forcing
 proper alignment of the shared buffer headers as well.

Just catching up on your mails - all of that sounds good so far.

Everything mentioned so far talks about spinlocks in the general sense,
rather than with respect to particular locks.

The different lock types we have are held for different amounts of time
and are also subject to differing amounts of contention. I think it
would be useful to put in a capability to tune each class of lock
according to the possibility for delay and contention on it.

Long delay, low contention e.g. CheckpointLock = waiter should sleep
immediately
Medium delay, high contention e.g. WALWriteLock = waiter spins if at
head of queue, else sleeps immediately
Short delay, high contention e.g. BufMappingLock = waiter spins and
retries forever, cos the lock is coming soon
Short delay, low contention, sometimes long waits at end of VACUUM
e.g. FreeSpaceLock = waiter spins, then eventually sleeps

I'm not sure whether you'll agree with my characterisation of these
locks, but the main thing I'm trying to get across is that
once-size-fits-all isn't the optimal approach. I'm not saying either
that we end up with individual characterisations for each lock type, but
a few key ones could be catered for differently.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Tom Lane
Gavin Sherry [EMAIL PROTECTED] writes:
 What about padding the LWLock to 64 bytes on these architectures. Both P4
 and Opteron have 64 byte cache lines, IIRC. This would ensure that a
 cacheline doesn't hold two LWLocks.

I tried that first, actually, but it was a net loss.  I guess enlarging
the array that much wastes too much cache space.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Beta2 Wrap Up ...

2005-09-15 Thread Marc G. Fournier


Tomorrow afternoon, we are planning on packaging up Beta2 .. if anyone is 
sitting on something that should get in before that happens, or has a bug 
they are sitting on, please let us know ...


I am planning on wrapping things at around noon my time (~3pm GMT, I 
believe, if I have my timezones right) ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Gavin Sherry
On Thu, 15 Sep 2005, Tom Lane wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
  What about padding the LWLock to 64 bytes on these architectures. Both P4
  and Opteron have 64 byte cache lines, IIRC. This would ensure that a
  cacheline doesn't hold two LWLocks.

 I tried that first, actually, but it was a net loss.  I guess enlarging
 the array that much wastes too much cache space.


Interesting. On Xeon (2 phys, 4 log), with LWLock padded to 64 bytes and
the cmpb/jump removed I get:

[EMAIL PROTECTED] pgsqlpad]$ for i in 1 2 4; do time ./nrun.sh $i; done

real0m54.362s
user0m0.003s
sys 0m0.009s

real1m9.788s
user0m0.011s
sys 0m0.013s

real2m8.870s
user0m0.016s
sys 0m0.028s
[EMAIL PROTECTED] pgsqlpad]$ for i in 1 2 4; do time ./nrun.sh $i; done

real0m55.544s
user0m0.006s
sys 0m0.007s

real1m9.313s
user0m0.007s
sys 0m0.018s

real2m1.769s
user0m0.017s
sys 0m0.027s

This compares to the following, which is unpadded but has cmpb/jump
removed but is otherwise vanilla:

1: 55: 2: 111: 4: 207

The decrease is small, but it's there.

Gavin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Tom Lane
Gregory Maxwell [EMAIL PROTECTED] writes:
 might be useful to align the structure so it always crosses two lines
 and measure the performance difference.. the delta could be basically
 attributed to the cache line bouncing since even one additional bounce
 would overwhelm the other performance effects from the changed
 alignment.

Good idea.  I goosed the struct declaration and setup code to arrange
that the BufMappingLock's spinlock and the rest of its data were in
different cache lines instead of the same one.  The results (still
on Red Hat's 4-way Opteron):

previous best code (slock-no-cmpb and spin-delay-2):
1 31s   2 42s   4 51s   8 100s
with LWLock padded to 32 bytes and correctly aligned:
1 31s   2 41s   4 51s   8 97s
with LWLocks 32 bytes, but deliberately misaligned:
1 30s   2 50s   4 102s  8 200s

There is no other reason than having to touch multiple cache lines for
the second and third cases to be different: the array indexing code
should be exactly the same.

These last numbers are pretty close to what I got from the
separated-spinlock patch:
1 31s   2 52s   4 106s  8 213s
So it seems there's no doubt that it's the doubled cache traffic that
was causing most of the problem there.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Beta2 Wrap Up ...

2005-09-15 Thread Neil Conway
On Thu, 2005-15-09 at 21:09 -0300, Marc G. Fournier wrote:
 Tomorrow afternoon, we are planning on packaging up Beta2 .. if anyone is 
 sitting on something that should get in before that happens, or has a bug 
 they are sitting on, please let us know ...

One change that I would like to get into beta2 is the proposed
refactoring of some of the new system info / administration functions.
Since these will require an initdb, it would be nice to get them into
the tree before beta2 (since we're requiring an initdb for beta2
anyway).

I've got the patch finished, but I'm just waiting for a consensus on the
right API for some of these functions (see the -hackers thread on the
subject). I'll check something into CVS by this evening EST...

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_autovacuum settings not saved on dump

2005-09-15 Thread Robert Treat
On Thursday 15 September 2005 15:41, Alvaro Herrera wrote:
 Hi,

 Robert Treat reminded me the other day that we don't currently save
 pg_autovacuum settings on pg_dump. 
snip
 Does it merit a mention on the release notes? 


ISTM this is a backwards incompatibility with previous installations that 
might have used autovacuum from contrib, so I think a mention of it is 
warrented in the release notes (with a recomendation on how to port your old 
autovacuum settings to the new system if anyone feels up to the task).

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Beta2 Wrap Up ...

2005-09-15 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 One change that I would like to get into beta2 is the proposed
 refactoring of some of the new system info / administration functions.

I thought we'd more or less dropped that idea based on Andreas'
responses.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Tom Lane
Gavin Sherry [EMAIL PROTECTED] writes:
 Interesting. On Xeon (2 phys, 4 log), with LWLock padded to 64 bytes and
 the cmpb/jump removed I get:
 [ 1 55s   2 69s   4 128s ]
 This compares to the following, which is unpadded but has cmpb/jump
 removed but is otherwise vanilla:
 1: 55: 2: 111: 4: 207

Hmm, that's pretty significant.  I tried it on a Xeon EM64T (thanks to
Stephen Frost for providing access), also 2 phys 4 log, and get:

Yesterday's CVS tip:
1 32s   2 46s   4 88s   8 168s
plus no-cmpb and spindelay2:
1 32s   2 48s   4 100s  8 177s
plus just-committed code to pad LWLock to 32:
1 33s   2 50s   4 98s   8 179s
alter to pad to 64:
1 33s   2 38s   4 108s  8 180s

I don't know what to make of the 2-process time going down while
4-process goes up; that seems just weird.  But both numbers are
repeatable.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Beta2 Wrap Up ...

2005-09-15 Thread Neil Conway
On Thu, 2005-15-09 at 22:31 -0400, Tom Lane wrote:
 I thought we'd more or less dropped that idea based on Andreas'
 responses.

I've heard no argument against renaming pg_complete_relation_size() to
pg_total_relation_size() and changing the functions that return an
integer status code to make them return a boolean (but I'm content with
not making them return void and report errors via elog).

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_autovacuum settings not saved on dump

2005-09-15 Thread Tom Lane
Robert Treat [EMAIL PROTECTED] writes:
 On Thursday 15 September 2005 15:41, Alvaro Herrera wrote:
 Robert Treat reminded me the other day that we don't currently save
 pg_autovacuum settings on pg_dump. 

 ISTM this is a backwards incompatibility with previous installations that 
 might have used autovacuum from contrib,

How do you figure that?  There was no corresponding capability at all in
previous versions, and pg_dump certainly didn't save any of the contrib
autovacuum daemon's arguments for you.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Gregory Maxwell
On 9/15/05, Tom Lane [EMAIL PROTECTED] wrote:
 Yesterday's CVS tip:
 1 32s   2 46s   4 88s   8 168s
 plus no-cmpb and spindelay2:
 1 32s   2 48s   4 100s  8 177s
 plus just-committed code to pad LWLock to 32:
 1 33s   2 50s   4 98s   8 179s
 alter to pad to 64:
 1 33s   2 38s   4 108s  8 180s
 
 I don't know what to make of the 2-process time going down while
 4-process goes up; that seems just weird.  But both numbers are
 repeatable.

It is odd.

In the two process case there is, assuming random behavior, a 1/2
chance that you've already got the right line, but in the 4 process
case only a 1/4 chance (since we're on a 4 way box). This would
explain why we don't see as much cost in the intentionally misaligned
case. You'd expect the a similar pattern of improvement with the
64byte alignment (some in the two process case, but more in the 4
case), but here we see more improvement in the two way case.

If I had to guess I might say that the 64byte alignment is removing
much of the unneeded line bouncing in the the two process case but is
at the same time creating more risk of bouncing caused by aliasing.
Since two processes have 1/2 chance the aliasing isn't a problem so
the change is a win, but in the four process case it's no longer a win
because with aliasing there is still a lot of fighting over the cache
lines even if you pack well, and the decrease in packing makes odd
aliasing somewhat more likely. This might also explain why the
misaligned case performed so poorly in the 4process case, since the
misalignment didn't just increase the cost 2x, it also increased the
likelihood of a bogus bounce due to aliasing..

If this is the case, then it may be possible through very careful
memory alignment to make sure that no two high contention locks that
are likely to be contended at once share the same line (through either
aliasing or through being directly within the same line).

Then again I could be completely wrong, my understanding of
multiprocessor cache coherency is very limited, and I have no clue how
cache aliasing fits into it... So the above is just uninformed
conjecture.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-15 Thread Tom Lane
Gregory Maxwell [EMAIL PROTECTED] writes:
 If I had to guess I might say that the 64byte alignment is removing
 much of the unneeded line bouncing in the the two process case but is
 at the same time creating more risk of bouncing caused by aliasing.

It's an idea ... not sure if it's right or not.

One thing I have noticed both on Xeon HT (2-physical-4-logical) and
on Opteron (4 real processors) is that the 2-process times are
significantly more variable than the 1, 4, or 8-process times: repeating
the measurements shows variance around the 10% level for 2 processes
but only around 1% for the others.

What I think this means is that the kernel is scheduling the 2 processes
onto 2 processors chosen-at-random, without awareness of whether those
two processors are on the same chip (in the Xeon case) or have closer
NUMA affinity (in the Opteron case).  The other test cases are all
symmetric and there's no way for the kernel to blow it too badly, but
in the 2-process case it will be very visible whether the kernel
understands the hardware or not.  And the impression I have (which
might be out of date) is that current Linux kernel releases are not
very bright about these things.

How does that tie into the point at hand about different results for
64-byte vs 32-byte LWLocks?  Not sure, but I feel it's related somehow.

Anyway, it'd be interesting to get some test results for these things
on SMP machines running non-Linux kernels.  Also, we ought to be more
rigorous about reporting exactly which kernel we are using for any
particular test.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Request to clarify on sql_numeric datatype

2005-09-15 Thread Raghavendra Reddy
Title: Message




Hi,

 I have a query on sql_numeric 
datatype Can you please clarify on this? 
 While inserting numeric data type into a 
table can you please let me know how you are storing the value in the 
database.
 
Whether it is stored as double or character or 
directly storing it as structure.
 

 I am implementing the slq_numeric datatype 
feature for our database and I am stuck up how to store the value in the 
database.
 Can you please help out regarding 
this.

Thanks in 
advance.

Thanks and 
Regards,
Raghavendra



Re: [HACKERS] Beta2 Wrap Up ...

2005-09-15 Thread Neil Conway
On Thu, 2005-15-09 at 22:06 -0400, Neil Conway wrote:
 One change that I would like to get into beta2 is the proposed
 refactoring of some of the new system info / administration functions.

Ok, this is done: the changes have been committed to CVS HEAD and the
catalog version number has been bumped.

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly