Re: [HACKERS] Constraint Type Coercion issue?
On Wed, Sep 14, 2005 at 10:42:36PM -0400, Tom Lane wrote: The thing that's still fairly unclear to me is whether the collation information is attached to the operators/functions or to the data. I recall there's been some discussion of sticking collation IDs into individual text Datums, which is a completely different path than what you are positing above. Does the SQL spec mandate one or the other of these approaches? If it does, do we want to believe it? (The more I read of SQL2003, the less impressed I get...)

The standard doesn't care, I think. My reading is that the COLLATE status is determined at parse time. From there you can plan however you like. AFAIUI, collate is a parameter/property of fields and domains and affects operators and functions, not the data. It only applies to comparisons, not the output.

You could add it as a property of the data. I wrote a module, taggedtypes [1], which basically implemented this. My main issue with it is that for the '<' operator, the same collate property has to be on both arguments or it has to bail.

The only reason why you can attach COLLATE to fields and domains is to give a default in case the user doesn't specify anything. But if the COLLATE is given explicitly, it overrides anything. By way of example:

CREATE TABLE t (a text collate c1, b text collate c2);
select * from t where a < b;            -- ERROR: Indeterminate collate
select * from t where a < b COLLATE c3; -- comparison uses c3

My worry about adding the collate to the Datum is that your execution tree becomes more complex. The two types, with and without COLLATE data, are not interchangeable and you would have to add or remove them continuously. In the above example, you would have to create an executor node whose sole purpose is to add the collate bit to a and b before passing them to the '<' operator. An index can only support one collation at a time also. I don't think there is an easy way out...
[1] http://svana.org/kleptog/pgsql/taggedtypes.html

-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
Re: [HACKERS] random system table corruption ...
alvaro, what concerns me here: this is a sun system and the problem happened during normal operation. there should not be a recovery-related operation. something which is also interesting: there are two corrupted pages in there (page numbers 22 and 26). strange thing :(. thanks a lot, hans

On 11 Sep 2005, at 20:01, Alvaro Herrera wrote: On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote: in the past we have faced a couple of problems with corrupted system tables. this seems to be a version independent problem which occurs on hackers' from time to time. i have checked a broken file and i have seen that the corrupted page has actually been zeroed out.

IIRC the XFS filesystem zeroes out pages that it recovers from the journal but did not have an fsync on them (AFAIK XFS journals only metadata, so page creation but not the content itself). I don't think this would be applicable to your case, because we do fsync modified files on checkpoint, and rewrite them completely from WAL images after that. But I thought I'd mention it.

-- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Just treat us the way you want to be treated + some extra allowance for ignorance." (Michael Brusser)

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] bug #1702: nested composite types in plpgsql
# [EMAIL PROTECTED] / 2005-09-13 12:17:06 -0400: Roman Neuhauser [EMAIL PROTECTED] writes: Looking at src/pl/plpgsql/src/pl_exec.c for the first time, is it a problem of make_tuple_from_row() not accounting for nested composite types? Looks that way. I've committed a fix to HEAD. I'm not sure how hard it'd be to fix 8.0.

Thanks for the fast fix, it's really appreciated. I tried to hammer[1] your patch[2] onto the REL8_0_STABLE branch (attached), but am getting:

ERROR: out of memory
DETAIL: Failed on request of size 1073741823.
CONTEXT: PL/pgSQL function "breakme" while storing call arguments into local variables

I'm sorry to be a nuisance, but is this interesting enough for someone in the know that they'd backport the patch into 8.0?

[1] see attachment
[2] http://developer.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpgsql/src/pl_exec.c.diff?r1=1.151;r2=1.152

-- How many Vietnam vets does it take to screw in a light bulb? You don't know, man. You don't KNOW. Cause you weren't THERE. http://bash.org/?255991

Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.127.4.3
diff -u -r1.127.4.3 pl_exec.c
--- src/pl/plpgsql/src/pl_exec.c	20 Jun 2005 22:51:49 -	1.127.4.3
+++ src/pl/plpgsql/src/pl_exec.c	14 Sep 2005 13:04:13 -
@@ -3243,8 +3243,7 @@
  *
  * If expectedtypeid isn't InvalidOid, it is checked against the actual type.
  *
- * This obviously only handles scalar datums (not whole records or rows);
- * at present it doesn't need to handle PLpgSQL_expr datums, either.
+ * At present this doesn't handle PLpgSQL_expr or PLpgSQL_arrayelem datums.
  *
  * NOTE: caller must not modify the returned value, since it points right
  * at the stored value in the case of pass-by-reference datatypes.
@@ -3864,19 +3863,20 @@
 	for (i = 0; i < natts; i++)
 	{
-		PLpgSQL_var *var;
+		Oid			fieldtypeid;

 		if (tupdesc->attrs[i]->attisdropped)
-			continue;			/* leave the column as null */
+		{
+			nulls[i] = true;	/* leave the column as null */
+			continue;
+		}
 		if (row->varnos[i] < 0)		/* should not happen */
 			elog(ERROR, "dropped rowtype entry for non-dropped column");

-		var = (PLpgSQL_var *) (estate->datums[row->varnos[i]]);
-		if (var->datatype->typoid != tupdesc->attrs[i]->atttypid)
+		exec_eval_datum(estate, estate->datums[row->varnos[i]],
+						InvalidOid, &fieldtypeid, &dvalues[i], &nulls[i]);
+		if (fieldtypeid != tupdesc->attrs[i]->atttypid)
 			return NULL;
-		dvalues[i] = var->value;
-		if (!var->isnull)
-			nulls[i] = ' ';
 	}
 	tuple = heap_formtuple(tupdesc, dvalues, nulls);
Re: [HACKERS] Per-table freeze limit proposal
while you are at it, can you put in some audit timestamps as to when the vacuum occurred (full vs. not full)?

-- Original Message --- From: Alvaro Herrera [EMAIL PROTECTED] To: Hackers pgsql-hackers@postgresql.org Sent: Wed, 14 Sep 2005 22:14:23 -0400 Subject: [HACKERS] Per-table freeze limit proposal

Hackers, As you've probably heard too many times already, I'm thinking about improving vacuum, so we can keep track of the freeze Xid at the table level, rather than the database level. Hopefully this will eliminate the need for database-wide vacuums.

In fact this seems pretty easy to do. Add a field to pg_class, tell VACUUM to update it using the determined freezeLimit, and that's it. (Note that if we ever implement partial vacuum, it won't be able to update the freeze point. But that was true before anyway.)

We also need to teach autovacuum to update pg_database.datfreezexid, using the minimum from pg_class. (I don't think it's a good idea to seqscan pg_class to find out the minimum on each VACUUM call.) So, an autovacuum iteration would issue all needed VACUUM/ANALYZE calls, then get the minimum freezexid from pg_class to update pg_database. This way, GetNewTransactionId can continue checking pg_database.datfreezexid as the hard limit for issuing warnings about Xid wraparound.

Does anyone see a need for anything other than the autovacuum process to be updating pg_database.datfreezexid? Of course, if autovacuum is not in use, things would continue as now, that is, manual database-wide VACUUM calls updating pg_database.datfreezexid. But note that you can mark all tables as disabled in pg_autovacuum, issue your manual VACUUM calls as needed (from cron or whatever), and use autovacuum to set pg_database.datfreezexid -- so autovacuum would in fact do nothing except set the freeze limit.

The problem is, this seems so awfully simple that I fear I am missing something ... Otherwise, does this sound like a plan?
-- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "The easiest way to resolve [trivial code guidelines disputes] is to fire one or both of the people involved." (Damian Conway)
--- End of Original Message ---
[HACKERS] pgplsql temporary tables
Hi, I didn't know where to send this, so forgive me if this isn't the right list. In our company we have been using PostgreSQL for development for more than 2 years now; it's an excellent RDBMS. The only thing we miss is the ability to call stored procedures (functions returning records in PG) with a simpler syntax, but at last we got used to it...

One problem that we all know is the compiled pl/pgsql script that "remembers" the OID of temporary tables. The alternative that PG devs recommend is using EXECUTE to run the query "dynamically" inside the function. This is tedious because, for queries that use the parameters of the function, you have to concatenate them, and sometimes you can spend more time debugging concatenation issues than the query itself.

So, since PREPARE/EXECUTE (which would be a little easier to code, if supported) doesn't work in pl/pgsql and the OIDs are remembered, we developed a function that recompiles another function; I attach it here for your review. This way, before calling a function with temporary tables, we call "select recompilar('function_name')" and then call the actual function, which, once recompiled, works well. This is equivalent to recreating the function from a command line.

We know this is not the right way to do it and it affects performance, since PG must recompile the function script, but for functions that aren't called frequently and are difficult to code using concatenations, this is a good approach. This version only works on PostgreSQL 8, since it looks for argument names in the declaration of the function; you can remove the references to proargnames and it should work on 7.x too.

Best regards.

-- Lic. Maximiliano Di Rienzo IT Manager M.P.: 2502 - MCP ID: 2725911 [EMAIL PROTECTED] http://www.fulfill-tech.com Tel.: 351 4283419 - Cel.: 351 155901603 Fulfill Technology S.A. 27 de Abril 424 - Piso 3 Of. A - Córdoba
Recompilar.sql Description: Binary data
Re: [HACKERS] pgplsql temporary tables
> Hi, i didn't know where to send this, so forgive me if this isn't the right list.

probably general.

> compiled pl/pgsql script that "remembers" the OID of temporary tables,

Have you considered that if the temp table is created outside of the function that uses it, this is not a problem? Make one function, init_temp_tables(), which is called after connection. It creates all your tables so they are persistent between calls. (FWIW, your function is clever).

Merlin
Re: [HACKERS] Per-table freeze limit proposal
On Wed, Sep 14, 2005 at 11:30:52PM -0400, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: In fact this seems pretty easy to do. Add a field to pg_class, tell VACUUM to update it using the determined freezeLimit, and that's it. I think that it'd be worth fixing things so that the recorded value is not the freeze cutoff value (as now), but the actual lowest not-frozen XID present anywhere in the table.

Cool. I wonder if the exact figure should be min(lowest non-frozen Xid in table, GetOldestXmin(false)), just in case a long-running transaction inserts a new tuple after the vacuum is done. Also, GetOldestXmin should be the value used for empty tables. For shared relations, we'd use GetOldestXmin(true).

Also, in light of this, it seems a bad idea to use the name freezexid for the pg_class column; I would name it relminxid or something like that (suggestions welcome). Not sure about renaming the pg_database column -- I don't see why not.

-- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Spring has come. No one knows how it happened." (A. Machado)
[HACKERS] pg_autovacuum settings not saved on dump
Hi, Robert Treat reminded me the other day that we don't currently save pg_autovacuum settings on pg_dump. This is expected, because we don't want to dump pg_autovacuum as a regular table, or it would force us to accept loading that forever; nor do we have ALTER TABLE commands to do it at a higher level, because there wasn't agreement that it should be implemented this early in autovacuum's life.

Expected or not, the fact is it's not user friendly. We should at least document this somewhere so users can take care of it by themselves. Not sure where it belongs, though. The autovacuum section? The backup section? Does it merit a mention in the release notes? Suggestions welcome.

-- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"What difference does it make to the dead, the orphans, and those who have lost their homes, whether the mad destruction is wrought under the name of totalitarianism or the holy name of liberty and democracy?" (Gandhi)
Re: [HACKERS] Per-table freeze limit proposal
Alvaro Herrera [EMAIL PROTECTED] writes: Cool. I wonder if the exact figure should be min(lowest non-frozen Xid in table, GetOldestXmin(false))

Actually just min(lowest Xid in table, RecentXmin). You only need to be sure there are no running transactions older than what you put into the field; their xmins are not at issue.

Also, in light of this, it seems a bad idea to use the name freezexid for the pg_class column; I would name it relminxid or something like that (suggestions welcome).

Works for me.

regards, tom lane
Re: [HACKERS] pg_autovacuum settings not saved on dump
Alvaro Herrera [EMAIL PROTECTED] writes: Expected or not, fact is it's not user friendly. We should at least document this somewhere so users can take care of it by themselves. Not sure where does it belong though. The autovacuum section, the backup section? Does it merit a mention on the release notes?

Wherever we document that those settings exist at all (which is not real prominent AFAIR).

regards, tom lane
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote: I guess what this means is that there's no real problem with losing the cache line while manipulating the LWLock, which is what the patch was intended to prevent. Instead, we're paying for swapping two cache lines (the spinlock and the LWLock) across processors instead of just one line. But that should at worst be a 2x inflation of the time previously spent in LWLockAcquire/Release, which is surely not yet all of the application ;-). Why the heck is this so bad? Should we expect that apparently minor changes in shared data structures might be costing equivalently huge penalties in SMP performance elsewhere? Unfortunately I don't have root on the Opteron and can't run oprofile. But I'd really like to see some oprofile stats from these two cases so we can figure out what in the world is going on here. Can anyone help?

I will try the patch here and see if it gives the same result. If so I could try to run with oprofile if you can give me a quick start.

Best Regards, Michael Paesold
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
I wrote: I guess what this means is that there's no real problem with losing the cache line while manipulating the LWLock, which is what the patch was intended to prevent. Instead, we're paying for swapping two cache lines (the spinlock and the LWLock) across processors instead of just one line. But that should at worst be a 2x inflation of the time previously spent in LWLockAcquire/Release, which is surely not yet all of the application ;-). Why the heck is this so bad? Should we expect that apparently minor changes in shared data structures might be costing equivalently huge penalties in SMP performance elsewhere?

I did some oprofile work and found that the cost seems to be because (1) there's an increase in spinlock contention (more time spent in s_lock), and (2) there's more time spent in LWLockAcquire/Release. I'm not entirely clear about the reason for (1), but (2) apparently is because of the extra cache line swapping, as posited above. In the oprofile trace it's clear that the extra cost comes exactly in the statements that touch fields of the shared LWLock, which are the places where you might have to wait to acquire a cache line.

I thought for a bit that the problem might come from having chosen to put a pointer to the spinlock into each LWLock; fetching that pointer implies an additional access to the contended cache line. However, changing the data structure to a simple linear array of spinlocks didn't help. So that whole idea seems to have died a painful death.

One other interesting result is that the data structure change neither helped nor hurt on an EM64T machine with 2 physical (4 logical) processors. This is also x86_64, but evidently the caching behavior is totally different from Opterons.

One thing that did seem to help a little bit was padding the LWLocks to 32 bytes (by default they are 24 bytes each on x86_64) and ensuring the array starts on a 32-byte boundary.
This ensures that we won't have any LWLocks crossing cache lines --- contended access to such an LWLock would probably incur the sort of large penalty seen above, because you'd be trading two cache lines back and forth instead of one. It seems that the important locks are not split that way in CVS tip, because the gain wasn't much, but I wonder whether some effect like this might explain some of the unexplainable performance changes we've noticed in the past (eg, in the dbt2 results). A seemingly unrelated small change in the size of other data structures in shared memory might move things around enough to make a performance-critical lock cross a cache line boundary.

On regular x86, the LWLock struct is by chance exactly 16 bytes, so there's no alignment issue. But 64-bit platforms and platforms where slock_t is bigger than char are exposed to this risk.

I'm going to go ahead and make that change, since it doesn't seem likely to have any downside. It might be interesting to look into forcing proper alignment of the shared buffer headers as well.

regards, tom lane
Re: [HACKERS] US Census database (Tiger 2004FE) - 4.4G
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote: You can send it to me, and ehpg will host it. I'll send you a private email with my info. Gavin On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote: It is 4.4G in space in a gzip package. I'll mail a DVD to two people who promise to host it for Hackers.

I'm wondering if this is now available for consumption by the rest of us? (i.e., what's the link?)

-- Darcy Buskermolen Wavefire Technologies Corp. http://www.wavefire.com ph: 250.717.0200 fx: 250.763.1759
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On Thu, 15 Sep 2005, Tom Lane wrote: One thing that did seem to help a little bit was padding the LWLocks to 32 bytes (by default they are 24 bytes each on x86_64) and ensuring the array starts on a 32-byte boundary. This ensures that we won't have any LWLocks crossing cache lines --- contended access to such an LWLock would probably incur the sort of large penalty seen above, because you'd be trading two cache lines back and forth not one. It seems that the important locks are not split that way in CVS tip, because the gain wasn't much, but I wonder whether some effect like this might explain some of the unexplainable performance changes we've noticed in the past (eg, in the dbt2 results). A seemingly unrelated small change in the size of other data structures in shared memory might move things around enough to make a performance-critical lock cross a cache line boundary.

What about padding the LWLock to 64 bytes on these architectures? Both P4 and Opteron have 64-byte cache lines, IIRC. This would ensure that a cache line doesn't hold two LWLocks.

Gavin
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane wrote: I'm going to go ahead and make that change, since it doesn't seem likely to have any downside. It might be interesting to look into forcing proper alignment of the shared buffer headers as well.

Just catching up on your mails - all of that sounds good so far. Everything mentioned so far talks about spinlocks in the general sense, rather than with respect to particular locks. The different lock types we have are held for different amounts of time and are also subject to differing amounts of contention. I think it would be useful to put in a capability to tune each class of lock according to the possibility for delay and contention on it.

Long delay, low contention, e.g. CheckpointLock => waiter should sleep immediately
Medium delay, high contention, e.g. WALWriteLock => waiter spins if at head of queue, else sleeps immediately
Short delay, high contention, e.g. BufMappingLock => waiter spins and retries forever, cos the lock is coming soon
Short delay, low contention, sometimes long waits at end of VACUUM, e.g. FreeSpaceLock => waiter spins, then eventually sleeps

I'm not sure whether you'll agree with my characterisation of these locks, but the main thing I'm trying to get across is that one-size-fits-all isn't the optimal approach. I'm not saying either that we end up with individual characterisations for each lock type, but a few key ones could be catered for differently.

Best Regards, Simon Riggs
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Gavin Sherry [EMAIL PROTECTED] writes: What about padding the LWLock to 64 bytes on these architectures. Both P4 and Opteron have 64 byte cache lines, IIRC. This would ensure that a cacheline doesn't hold two LWLocks. I tried that first, actually, but it was a net loss. I guess enlarging the array that much wastes too much cache space. regards, tom lane
[HACKERS] Beta2 Wrap Up ...
Tomorrow afternoon, we are planning on packaging up Beta2 .. if anyone is sitting on something that should get in before that happens, or has a bug they are sitting on, please let us know ... I am planning on wrapping things at around noon my time (~3pm GMT, I believe, if I have my timezones right) ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On Thu, 15 Sep 2005, Tom Lane wrote: Gavin Sherry [EMAIL PROTECTED] writes: What about padding the LWLock to 64 bytes on these architectures. Both P4 and Opteron have 64 byte cache lines, IIRC. This would ensure that a cacheline doesn't hold two LWLocks. I tried that first, actually, but it was a net loss. I guess enlarging the array that much wastes too much cache space.

Interesting. On Xeon (2 phys, 4 log), with LWLock padded to 64 bytes and the cmpb/jump removed I get:

[EMAIL PROTECTED] pgsqlpad]$ for i in 1 2 4; do time ./nrun.sh $i; done
real    0m54.362s
user    0m0.003s
sys     0m0.009s

real    1m9.788s
user    0m0.011s
sys     0m0.013s

real    2m8.870s
user    0m0.016s
sys     0m0.028s

[EMAIL PROTECTED] pgsqlpad]$ for i in 1 2 4; do time ./nrun.sh $i; done
real    0m55.544s
user    0m0.006s
sys     0m0.007s

real    1m9.313s
user    0m0.007s
sys     0m0.018s

real    2m1.769s
user    0m0.017s
sys     0m0.027s

This compares to the following, which is unpadded but has cmpb/jump removed but is otherwise vanilla:

1: 55s  2: 111s  4: 207s

The decrease is small, but it's there.

Gavin
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Gregory Maxwell [EMAIL PROTECTED] writes: might be useful to align the structure so it always crosses two lines and measure the performance difference.. the delta could be basically attributed to the cache line bouncing since even one additional bounce would overwhelm the other performance effects from the changed alignment.

Good idea. I goosed the struct declaration and setup code to arrange that the BufMappingLock's spinlock and the rest of its data were in different cache lines instead of the same one. The results (still on Red Hat's 4-way Opteron):

previous best code (slock-no-cmpb and spin-delay-2):
1 31s   2 42s   4 51s   8 100s

with LWLock padded to 32 bytes and correctly aligned:
1 31s   2 41s   4 51s   8 97s

with LWLocks 32 bytes, but deliberately misaligned:
1 30s   2 50s   4 102s  8 200s

There is no other reason than having to touch multiple cache lines for the second and third cases to be different: the array indexing code should be exactly the same. These last numbers are pretty close to what I got from the separated-spinlock patch:

1 31s   2 52s   4 106s  8 213s

So it seems there's no doubt that it's the doubled cache traffic that was causing most of the problem there.

regards, tom lane
Re: [HACKERS] Beta2 Wrap Up ...
On Thu, 2005-15-09 at 21:09 -0300, Marc G. Fournier wrote: Tomorrow afternoon, we are planning on packaging up Beta2 .. if anyone is sitting on something that should get in before that happens, or has a bug they are sitting on, please let us know ... One change that I would like to get into beta2 is the proposed refactoring of some of the new system info / administration functions. Since these will require an initdb, it would be nice to get them into the tree before beta2 (since we're requiring an initdb for beta2 anyway). I've got the patch finished, but I'm just waiting for a consensus on the right API for some of these functions (see the -hackers thread on the subject). I'll check something into CVS by this evening EST... -Neil
Re: [HACKERS] pg_autovacuum settings not saved on dump
On Thursday 15 September 2005 15:41, Alvaro Herrera wrote: Hi, Robert Treat reminded me the other day that we don't currently save pg_autovacuum settings on pg_dump. <snip> Does it merit a mention on the release notes?

ISTM this is a backwards incompatibility with previous installations that might have used autovacuum from contrib, so I think a mention of it is warranted in the release notes (with a recommendation on how to port your old autovacuum settings to the new system, if anyone feels up to the task).

-- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Beta2 Wrap Up ...
Neil Conway [EMAIL PROTECTED] writes: One change that I would like to get into beta2 is the proposed refactoring of some of the new system info / administration functions. I thought we'd more or less dropped that idea based on Andreas' responses. regards, tom lane
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Gavin Sherry [EMAIL PROTECTED] writes: Interesting. On Xeon (2 phys, 4 log), with LWLock padded to 64 bytes and the cmpb/jump removed I get: [ 1 55s 2 69s 4 128s ] This compares to the following, which is unpadded but has cmpb/jump removed but is otherwise vanilla: 1: 55; 2: 111; 4: 207

Hmm, that's pretty significant. I tried it on a Xeon EM64T (thanks to Stephen Frost for providing access), also 2 phys 4 log, and get:

Yesterday's CVS tip:
1 32s   2 46s   4 88s   8 168s

plus no-cmpb and spindelay2:
1 32s   2 48s   4 100s  8 177s

plus just-committed code to pad LWLock to 32:
1 33s   2 50s   4 98s   8 179s

alter to pad to 64:
1 33s   2 38s   4 108s  8 180s

I don't know what to make of the 2-process time going down while 4-process goes up; that seems just weird. But both numbers are repeatable.

regards, tom lane
Re: [HACKERS] Beta2 Wrap Up ...
On Thu, 2005-15-09 at 22:31 -0400, Tom Lane wrote: I thought we'd more or less dropped that idea based on Andreas' responses. I've heard no argument against renaming pg_complete_relation_size() to pg_total_relation_size() and changing the functions that return an integer status code to make them return a boolean (but I'm content with not making them return void and report errors via elog). -Neil
Re: [HACKERS] pg_autovacuum settings not saved on dump
Robert Treat [EMAIL PROTECTED] writes: On Thursday 15 September 2005 15:41, Alvaro Herrera wrote: Robert Treat reminded me the other day that we don't currently save pg_autovacuum settings on pg_dump. ISTM this is a backwards incompatibility with previous installations that might have used autovacuum from contrib, How do you figure that? There was no corresponding capability at all in previous versions, and pg_dump certainly didn't save any of the contrib autovacuum daemon's arguments for you. regards, tom lane
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
On 9/15/05, Tom Lane [EMAIL PROTECTED] wrote: Yesterday's CVS tip: 1 32s 2 46s 4 88s 8 168s plus no-cmpb and spindelay2: 1 32s 2 48s 4 100s 8 177s plus just-committed code to pad LWLock to 32: 1 33s 2 50s 4 98s 8 179s alter to pad to 64: 1 33s 2 38s 4 108s 8 180s I don't know what to make of the 2-process time going down while 4-process goes up; that seems just weird. But both numbers are repeatable. It is odd. In the two-process case there is, assuming random behavior, a 1/2 chance that you've already got the right line, but in the 4-process case only a 1/4 chance (since we're on a 4-way box). This would explain why we don't see as much cost in the intentionally misaligned case. You'd expect a similar pattern of improvement with the 64-byte alignment (some in the two-process case, but more in the 4-process case), but here we see more improvement in the two-way case. If I had to guess I might say that the 64-byte alignment is removing much of the unneeded line bouncing in the two-process case but is at the same time creating more risk of bouncing caused by aliasing. Since two processes have a 1/2 chance, the aliasing isn't a problem so the change is a win; but in the four-process case it's no longer a win, because with aliasing there is still a lot of fighting over the cache lines even if you pack well, and the decrease in packing makes odd aliasing somewhat more likely. This might also explain why the misaligned case performed so poorly in the 4-process case: the misalignment didn't just increase the cost 2x, it also increased the likelihood of a bogus bounce due to aliasing. If this is the case, then it may be possible through very careful memory alignment to make sure that no two high-contention locks that are likely to be contended at once share the same line (through either aliasing or through being directly within the same line). 
Then again I could be completely wrong; my understanding of multiprocessor cache coherency is very limited, and I have no clue how cache aliasing fits into it... So the above is just uninformed conjecture.
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Gregory Maxwell [EMAIL PROTECTED] writes: If I had to guess I might say that the 64byte alignment is removing much of the unneeded line bouncing in the two process case but is at the same time creating more risk of bouncing caused by aliasing. It's an idea ... not sure if it's right or not. One thing I have noticed both on Xeon HT (2-physical-4-logical) and on Opteron (4 real processors) is that the 2-process times are significantly more variable than the 1, 4, or 8-process times: repeating the measurements shows variance around the 10% level for 2 processes but only around 1% for the others. What I think this means is that the kernel is scheduling the 2 processes onto 2 processors chosen-at-random, without awareness of whether those two processors are on the same chip (in the Xeon case) or have closer NUMA affinity (in the Opteron case). The other test cases are all symmetric and there's no way for the kernel to blow it too badly, but in the 2-process case it will be very visible whether the kernel understands the hardware or not. And the impression I have (which might be out of date) is that current Linux kernel releases are not very bright about these things. How does that tie into the point at hand about different results for 64-byte vs 32-byte LWLocks? Not sure, but I feel it's related somehow. Anyway, it'd be interesting to get some test results for these things on SMP machines running non-Linux kernels. Also, we ought to be more rigorous about reporting exactly which kernel we are using for any particular test. regards, tom lane
[HACKERS] Request to clarify on sql_numeric datatype
Hi, I have a question about the sql_numeric datatype. Can you please clarify something for me? When a numeric value is inserted into a table, how is the value stored in the database: as a double, as characters, or directly as a structure? I am implementing the sql_numeric datatype feature for our database and I am stuck on how to store the value. Can you please help out regarding this? Thanks in advance. Thanks and Regards, Raghavendra
Re: [HACKERS] Beta2 Wrap Up ...
On Thu, 2005-15-09 at 22:06 -0400, Neil Conway wrote: One change that I would like to get into beta2 is the proposed refactoring of some of the new system info / administration functions. Ok, this is done: the changes have been committed to CVS HEAD and the catalog version number has been bumped. -Neil