Re: [HACKERS] Why do we want to %Remove behavior of postmaster -o
On Saturday, 3 June 2006 04:27, Tom Lane wrote:
> Actually, the TODO item is very badly worded.  The idea is to get rid of
> the spelling differences between postmaster and postgres options, and then
> there will be no need for '-o' because you'll just say what you want ---
> that is, "-o -foo" and "-foo" will be interchangeable.

This is already done.  I suppose that the idea was that the -o option should be phased out over a couple of releases.

-- Peter Eisentraut  http://developer.postgresql.org/~petere/
[HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint
I have a table that I am using to store email token data for DSPAM.  I'm noticing that a handful (4-16) of rows with duplicate columns (uid,token) are sneaking into the table every day despite the primary key constraint.  The server currently processes a few thousand emails per day, and this particular table currently has about 4.5 million rows in it.

I feel as though I must be missing something here, because I have always strongly assumed that postgresql prevents this sort of chaos from happening by default.

When these duplicate pairs make their way into the table, all havoc breaks loose ;)  The rows with the duplicate pairs seem to become land mines.  The postgresql process handling a query that subsequently runs across one of these rows dies, taking down the DSPAM daemon with it, and sometimes corrupting the postgresql shared memory enough that the postmaster has to shut down the other processes and restart everything anew [1].

I am usually able to clean things up by running the following, but once or twice I've had to drop the unique constraint before postgresql would process the request without choking:

    delete from dspam_token_data
      where row(uid,token) in
        (select uid, token from dspam_token_data
          group by uid, token having count(*) > 1);

(I don't worry about preserving one of the duplicates here.)

I'm running postgresql-8.1.3.  Here is the table in question:

    CREATE TABLE dspam.dspam_token_data
    (
      uid int4 NOT NULL,
      token int8 NOT NULL,
      spam_hits int4,
      innocent_hits int4,
      last_hit date,
      CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token)
    ) WITHOUT OIDS; [2]

What steps can I take to fix this?  Is there any additional information I can provide?  I've cleaned the table many times now.  I then run VACUUM ANALYZE.  My next step will probably be to hack the DSPAM sources to make the application more careful about not trying to insert rows that would violate the unique constraint.  Even still, it seems that would only reduce the frequency of these occurrences, not eliminate them completely.

Thanks!

Cheers,
-- Travis

Notes:
[1] A condensed log file showing off exactly what happens here is attached.
[2] Previously, the table lacked a primary key and instead used a unique constraint and index.  This yielded the exact same results I am currently seeing using a two-column primary key, as above.
The old table schema was:

    CREATE TABLE dspam.dspam_token_data
    (
      uid int4,
      token int8,
      spam_hits int4,
      innocent_hits int4,
      last_hit date,
      CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token)
    ) WITHOUT OIDS;

Jun 5 17:58:14 shadow postgres[28775]: [21-1] PANIC: right sibling's left-link doesn't match
Jun 5 17:58:14 shadow postgres[28775]: [21-2] STATEMENT: PREPARE dspam_update_plan (bigint) AS UPDATE dspam_token_data SET last_hit = CURRENT_DATE, innocent_hits =
Jun 5 17:58:14 shadow postgres[28775]: [21-3] innocent_hits + 1 WHERE uid = '1' AND token = $1;PREPARE dspam_insert_plan (bigint, int, int) AS INSERT INTO dspam_token_data
Jun 5 17:58:14 shadow postgres[28775]: [21-4] (uid, token, spam_hits, innocent_hits, last_hit) VALUES (1, $1, $2, $3, CURRENT_DATE);
Jun 5 17:58:14 shadow dspam[7780]: query error: (null): see sql.errors for more details
Jun 5 17:58:14 shadow postgres[10566]: [21-1] LOG: server process (PID 28775) was terminated by signal 6
Jun 5 17:58:14 shadow postgres[10566]: [22-1] LOG: terminating any other active server processes
Jun 5 17:58:14 shadow postgres[7219]: [21-1] WARNING: terminating connection because of crash of another server process
Jun 5 17:58:14 shadow postgres[7219]: [21-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Jun 5 17:58:14 shadow postgres[7219]: [21-3] process exited abnormally and possibly corrupted shared memory.
Jun 5 17:58:14 shadow postgres[7525]: [24-1] LOG: database system was interrupted at 2006-06-05 17:58:01 UTC
Jun 5 17:58:14 shadow postgres[7525]: [25-1] LOG: checkpoint record is at 16/DA1D6868
Jun 5 17:58:14 shadow postgres[7525]: [26-1] LOG: redo record is at 16/DA0081D8; undo record is at 0/0; shutdown FALSE
Jun 5 17:58:14 shadow postgres[7525]: [27-1] LOG: next transaction ID: 5691748; next OID: 148615
Jun 5 17:58:14 shadow postgres[7525]: [28-1] LOG: next MultiXactId: 4; next MultiXactOffset: 10
Jun 5 17:58:14 shadow postgres[7525]: [29-1] LOG: database system was not properly shut down; automatic recovery in progress
Jun 5 17:58:14 shadow postgres[7525]: [30-1] LOG: redo starts at 16/DA0081D8
Jun 5 17:58:14 shadow postgres[7525]: [31-1] LOG: record with zero length at 16/DC2858F8
Jun 5 17:58:14 shadow postgres[7525]: [32-1] LOG: redo done at 16/DC2858C8
Jun 5 17:58:16 shadow postgres[7542]: [24-1] FATAL: the database system is starting up
Jun 5 17:58:16 shadow postgres[7543]: [24-1] FATAL: the database
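[Editor's note: for readers in the same spot, a cleanup variant that keeps one merged survivor per (uid, token) pair, instead of discarding both duplicates, is to rebuild the table; this is only a hedged sketch against the schema above, and it has the side benefit of recreating the index from scratch:

    BEGIN;
    -- collapse each duplicate pair into a single row
    CREATE TABLE dspam.dspam_token_data_clean AS
        SELECT uid, token,
               max(spam_hits)     AS spam_hits,
               max(innocent_hits) AS innocent_hits,
               max(last_hit)      AS last_hit
          FROM dspam.dspam_token_data
         GROUP BY uid, token;
    DROP TABLE dspam.dspam_token_data;
    ALTER TABLE dspam.dspam_token_data_clean RENAME TO dspam_token_data;
    -- re-add the primary key; this rebuilds the (possibly corrupt) index
    ALTER TABLE dspam.dspam_token_data
        ADD CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token);
    COMMIT;

Taking max() of the hit counters is an arbitrary merge policy; pick whatever suits the application.]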
[HACKERS] AIX check in datetime.h
Can someone please explain why in include/utils/datetime.h (struct datetkn) there is a check for _AIX that either initializes a char* pointer or a char array?  Is there any advantage of a char array except for warnings from some compilers if the initialization string is too long?  Apart from that, I doubt that AIX cannot handle token[TOKMAXLEN], because similar declarations can be found in other headers without the _AIX check.  The struct definition is more than 9 years old and seems to show up first in Attic/dt.h.

    #define TOKMAXLEN 10   /* only this many chars are stored in datetktbl */

    /* keep this struct small; it gets used a lot */
    typedef struct
    {
    #if defined(_AIX)
        char       *token;
    #else
        char        token[TOKMAXLEN];
    #endif   /* _AIX */
        char        type;
        char        value;      /* this may be unsigned, alas */
    } datetkn;

Joachim
Re: [HACKERS] fillfactor using WITH syntax
On Tue, 2006-06-06 at 11:45 +0900, ITAGAKI Takahiro wrote:
> Hi Hackers, I'm rewriting the fillfactor patch, per the following discussion:
>   http://archives.postgresql.org/pgsql-hackers/2006-03/msg00287.php
> Now fillfactor can be set using WITH syntax:
>   - CREATE INDEX index ON table USING btree (columns) WITH (...)
>   - CREATE TABLE table (i integer PRIMARY KEY WITH (...))
>   - ALTER TABLE table ADD PRIMARY KEY (columns) WITH (...)

Sounds good.  This is important in other situations too, e.g.
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00851.php

> The settings are stored in pg_class.relfillfactor and the last value will
> be used on the next REINDEX.  The WITH parameter is a list of DefElems,
> so we can use it to pass additional parameters to index access methods.

Are you implementing the array of parameters on pg_index as Tom suggested, or pg_class.relfillfactor?  Why not implement an array of option parameters on pg_class, so both heaps and indexes can be given additional parameters?  That way you wouldn't need a specific relfillfactor attribute.  That would allow us to keep CREATE TABLE free of additional keywords also.

-- Simon Riggs  EnterpriseDB  http://www.enterprisedb.com
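[Editor's note: for concreteness, usage under the proposed patch would look something like the lines below; this is a sketch of the syntax under discussion, and the parameter name "fillfactor" is assumed from the thread, not from released documentation:

    CREATE INDEX i_foo ON foo USING btree (bar) WITH (fillfactor = 70);
    CREATE TABLE t (i integer PRIMARY KEY WITH (fillfactor = 90));
    ALTER TABLE t2 ADD PRIMARY KEY (i) WITH (fillfactor = 90);

A lower fillfactor leaves free space in each index page for later insertions, trading disk space for fewer page splits.]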
[HACKERS] How to avoid transaction ID wrap
OK, here's my problem: I have a nature study where we have about 10 video cameras taking 15 frames per second.  For each frame we make a few transactions on a PostgreSQL database.  We want to keep about a year's worth of data at any specific time.  We have triggers that fire if something interesting is found on insert.  We want this thing to run for a long time.

From the numbers, you can see the PostgreSQL database is VERY loaded.  Running VACUUM may not always be possible without losing data.  The numbers I have amount to 466,560,000 transactions per month, lasting a maximum of about 9 months until XID wrap.

I am thinking about a few workarounds, BEGIN/COMMIT to reduce the number of transactions, COPY, etc., so I'm not dead in the water, but I would be interested in any observations you may have.
Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much
On Monday, 2006-06-05 at 14:10, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Mon, 2006-06-05 at 19:17 +0200, Zoltan Boszormenyi wrote:
> > > The general case cannot be applied for all particular cases.  E.g. you
> > > cannot use cursors from shell scripts.
> >
> > This could be fixed by adding an option to psql to transparently produce
> > SELECT result sets via a cursor.
>
> I think this is an excellent idea.
>
>     psql --cursor --fetchby 1 -c "select ..." | myprogram
>
> Note of course that such a thing would push the incomplete-result problem
> further upstream.  For instance in (hypothetical --cursor switch)
>     psql --cursor -c "select ..." | myprogram
> there would be no very good way for myprogram to find out that it'd been
> sent an incomplete result due to error partway through the SELECT.

would it not learn about it at the point of error ?

even without --cursor there is still no very good way to find out when something else goes wrong, like the result inside libpq taking up all memory, so that psql runs out of memory while formatting some longer lines.

-- Hannu Krosing, Database Architect, Skype Technologies OÜ
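[Editor's note: for context, this is the cursor pattern that the hypothetical --cursor switch would automate; a sketch with made-up names, easy from an interactive session but awkward to drive from a shell script, which is what started the thread:

    BEGIN;
    DECLARE bigres CURSOR FOR SELECT * FROM bigtable;
    FETCH 1000 FROM bigres;   -- repeat until it returns fewer rows than asked
    CLOSE bigres;
    COMMIT;

Fetching in batches keeps only one batch of rows in client memory at a time, rather than materializing the whole result set in libpq.]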
Re: [HACKERS] How to avoid transaction ID wrap
On Tue, 6 Jun 2006, Mark Woodward wrote:
> OK, here's my problem: I have a nature study where we have about 10 video
> cameras taking 15 frames per second.  For each frame we make a few
> transactions on a PostgreSQL database.

I would suggest doing all the inserts of one frame in one transaction.  Maybe even multiple frames in one transaction.  That should bring down the number of transactions significantly.

> We want to keep about a year's worth of data at any specific time.  We
> have triggers that fire if something interesting is found on insert.  We
> want this thing to run for a long time.  From the numbers, you can see
> the PostgreSQL database is VERY loaded.  Running VACUUM may not always be
> possible without losing data.

Why not?

> The numbers I have amount to 466,560,000 transactions per month, lasting
> a maximum of about 9 months until XID wrap.

If you can get that maximum up above one year (which was how long you want to keep the data), you won't need to freeze the records to avoid ID wraparound.

- Heikki
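[Editor's note: a minimal sketch of that suggestion, with a hypothetical table and columns.  Every statement outside an explicit transaction block is its own transaction, so wrapping a batch of inserts in BEGIN/COMMIT consumes one XID per batch instead of one per statement:

    BEGIN;
    INSERT INTO frame_events (camera_id, frame_no, event) VALUES (3, 101, 'motion');
    INSERT INTO frame_events (camera_id, frame_no, event) VALUES (3, 102, 'motion');
    -- ... the rest of the rows for this batch of frames ...
    COMMIT;

Batching, say, a second's worth of frames per camera cuts XID consumption by roughly the batch size, pushing the wraparound horizon out proportionally.]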
Re: [HACKERS] How to avoid transaction ID wrap
On Tuesday, 2006-06-06 at 08:42, Mark Woodward wrote:
> OK, here's my problem: I have a nature study where we have about 10 video
> cameras taking 15 frames per second.  For each frame we make a few
> transactions on a PostgreSQL database.  We want to keep about a year's
> worth of data at any specific time.

partition by month, then you have better chances of removing old data without causing overload/data loss;

> We have triggers that fire if something interesting is found on insert.
> We want this thing to run for a long time.  From the numbers, you can see
> the PostgreSQL database is VERY loaded.  Running VACUUM may not always be
> possible without losing data.

why ?  just run it with very friendly delay settings.

> The numbers I have amount to 466,560,000 transactions per month, lasting
> a maximum of about 9 months until XID wrap.

actually 4.5 months, as you will start having problems at 2G xacts.

> I am thinking about a few workarounds, BEGIN/COMMIT to reduce the number
> of transactions, COPY, etc., so I'm not dead in the water, but I would be
> interested in any observations you may have.

-- Hannu Krosing, Database Architect, Skype Technologies OÜ
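[Editor's note: a sketch of the month-partitioning idea using 8.1-style inheritance, with all names hypothetical; retiring a month of data then becomes a cheap DROP or TRUNCATE of one child table rather than a huge DELETE followed by VACUUM:

    CREATE TABLE frames (
        captured_at  timestamptz NOT NULL,
        camera_id    integer,
        payload      bytea
    );

    CREATE TABLE frames_2006_06 (
        CHECK (captured_at >= '2006-06-01' AND captured_at < '2006-07-01')
    ) INHERITS (frames);

    -- with constraint_exclusion = on, queries against frames skip children
    -- whose CHECK constraint rules them out; old data goes away as a
    -- catalog operation:
    DROP TABLE frames_2005_06;

Inserts must be directed at the right child (by the application or a trigger); queries can go through the parent.]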
Re: [HACKERS] How to avoid transaction ID wrap
Mark Woodward wrote:
> OK, here's my problem: I have a nature study where we have about 10 video
> cameras taking 15 frames per second.  For each frame we make a few
> transactions on a PostgreSQL database.

Maybe if you grouped multiple operations into bigger transactions, the I/O savings could be enough to buy you the ability to vacuum once in a while.

Or consider buffering somehow -- save the data elsewhere, and have some sort of daemon to put it into the database.  This would make it possible to cope with the I/O increase during vacuum.

-- Alvaro Herrera  http://www.CommandPrompt.com/  PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Why do we want to %Remove behavior of postmaster -o
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> On Saturday, 3 June 2006 04:27, Tom Lane wrote:
> > Actually, the TODO item is very badly worded.  The idea is to get rid of
> > the spelling differences between postmaster and postgres options, and
> > then there will be no need for '-o' because you'll just say what you
> > want --- that is, "-o -foo" and "-foo" will be interchangeable.
>
> This is already done.

Oh, right.  The patch you still have uncommitted had to do with getting rid of the separate postmaster and postgres executables, right?  Is that going anywhere, or did you decide it's not worth the trouble?

regards, tom lane
Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much
Hannu Krosing <[EMAIL PROTECTED]> writes:
> On Monday, 2006-06-05 at 14:10, Tom Lane wrote:
> > Note of course that such a thing would push the incomplete-result
> > problem further upstream.  For instance in (hypothetical --cursor switch)
> >     psql --cursor -c "select ..." | myprogram
> > there would be no very good way for myprogram to find out that it'd been
> > sent an incomplete result due to error partway through the SELECT.
>
> would it not learn about it at the point of error ?

No, it would merely see EOF after some number of result rows.  (I'm assuming you're also using -A -t so that the output is unadorned.)

regards, tom lane
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
On Mon, Jun 05, 2006 at 11:02:33PM -0400, Tom Lane wrote:
> Just got this rather surprising result:
> [snip bogus explain output]
> The "Total runtime" is correct AFAICT, which puts the top node's actual
> time rather far out in left field.  This is pretty repeatable on my old
> slow HPPA machine.  A new Xeon shows less of a discrepancy, but it's
> still claiming top node actual > total, which is not right.

Weird.  Can you get the output of *instr in each call of InstrEndLoop?  Preferably after it does the calculation but before it clears the values, so we get an idea of the number of samples and what it guesses.  SampleOverhead would be good too.

I know my version produced sensible results on my machine and the handful of people testing, so I'll try it again with your changes and see how it looks...

Have a nice day,
-- Martijn van Oosterhout <kleptog@svana.org>  http://svana.org/kleptog/
Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint
Travis Cross <[EMAIL PROTECTED]> writes:
> I'm noticing that a handful (4-16) of rows with duplicate columns
> (uid,token) are sneaking into the table every day despite the primary key
> constraint.

Corrupt index, looks like ... you might try reindexing the index.

I don't believe that the PANIC you show has anything directly to do with duplicate entries.  It is a symptom of corrupt index structure.  Now a corrupt index might also explain failure to notice duplications, but changing your application isn't going to fix whatever is causing it.  You need to look for server-side causes.  Any database or system crashes on this server (before this problem started)?  Do you *know* that the disk drive will not lie about write complete?  What is the platform and storage system, anyway?

regards, tom lane
Re: [HACKERS] fillfactor using WITH syntax
Simon Riggs <[EMAIL PROTECTED]> writes:
> Why not implement an array of option parameters on pg_class, so both
> heaps and indexes can be given additional parameters?  That way you
> wouldn't need a specific relfillfactor attribute.  That would allow us to
> keep CREATE TABLE free of additional keywords also.

None of this should go anywhere near pg_class.  IIRC the solutions we discussed involved adding some sort of array to pg_index.  A solution that only works for FILLFACTOR is missing the point, too.

regards, tom lane
Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much
On Tue, Jun 06, 2006 at 09:48:43AM -0400, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > On Monday, 2006-06-05 at 14:10, Tom Lane wrote:
> > > Note of course that such a thing would push the incomplete-result
> > > problem further upstream.  For instance in (hypothetical --cursor switch)
> > >     psql --cursor -c "select ..." | myprogram
> > > there would be no very good way for myprogram to find out that it'd
> > > been sent an incomplete result due to error partway through the SELECT.
> >
> > would it not learn about it at the point of error ?
>
> No, it would merely see EOF after some number of result rows.  (I'm
> assuming you're also using -A -t so that the output is unadorned.)

So if an error occurs partway through reading a cursor, no error message is generated?  That certainly sounds like a bug to me...

-- Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software  http://pervasive.com
Re: [HACKERS] [PERFORM] psql -A (unaligned format) eats too much
Jim C. Nasby <[EMAIL PROTECTED]> writes:
> On Tue, Jun 06, 2006 at 09:48:43AM -0400, Tom Lane wrote:
> > >     psql --cursor -c "select ..." | myprogram
> > > there would be no very good way for myprogram to find out that it'd
> > > been sent an incomplete result due to error partway through the SELECT.
>
> So if an error occurs partway through reading a cursor, no error message
> is generated?  That certainly sounds like a bug to me...

Sure an error is generated.  But it goes to stderr.  The guy at the downstream end of the stdout pipe cannot see either the error message, or the nonzero status that psql will (hopefully) exit with.  You can theoretically deal with this by having the shell script calling this combination check psql's exit status and discard the results of myprogram on failure, but it's not easy or simple.

regards, tom lane
Re: [HACKERS] COPY (query) TO file
In article <[EMAIL PROTECTED]>, Andrew Dunstan <[EMAIL PROTECTED]> writes:

> Mark Woodward wrote:
> > Tom had posted a question about file compression with copy.  I thought
> > about it, and I want to throw this out and see if anyone thinks it is a
> > good idea.  Currently, the COPY command only copies a table; what if it
> > could operate with a query, as:
> >     COPY (select * from mytable where foo='bar') as BAR TO stdout
>
> Isn't this already being worked on?  The TODO list says:
>     Allow COPY to output from views
> IIRC Karel Zak posted a patch for that.  Another idea would be to allow
> actual SELECT statements in a COPY.  Personally I strongly favor the
> second option as being more flexible than the first.

How so?  I see that

    psql -h somehost somedb -c "copy 'somequery' to stdout" > localfile

would be more terse than

    psql -h somehost somedb -c "create temp view tmp as somequery; copy tmp to stdout" > localfile

but what's more flexible there?
Re: [HACKERS] COPY (query) TO file
On Tue, Jun 06, 2006 at 04:47:40PM +0200, Harald Fuchs wrote:
> In article <[EMAIL PROTECTED]>, Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > Isn't this already being worked on?  The TODO list says:
> >     Allow COPY to output from views
> > IIRC Karel Zak posted a patch for that.  Another idea would be to allow
> > actual SELECT statements in a COPY.  Personally I strongly favor the
> > second option as being more flexible than the first.
>
> How so?  I see that
>     psql -h somehost somedb -c "copy 'somequery' to stdout" > localfile
> would be more terse than
>     psql -h somehost somedb -c "create temp view tmp as somequery; copy tmp to stdout" > localfile
> but what's more flexible there?

Flexibility aside, doing this via a temporary view is a very non-intuitive way to go about it.  AFAIK CREATE TEMP VIEW is also regular DDL, which means more overhead in the system catalogs, along with more need to vacuum.

I really fail to see why we shouldn't support copying from a query unless there's some serious technical challenge.  If there was a serious technical challenge that using a temporary view solved, we should do the work of creating the temporary view for the user.

-- Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software  http://pervasive.com
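[Editor's note: until COPY accepts a query or a view, the only way to spell this against a stock 8.1 server needs a real relation, since COPY there handles only plain tables.  A sketch:

    -- materialize the query, copy it out, then clean up
    CREATE TEMP TABLE tmp_out AS SELECT * FROM mytable WHERE foo = 'bar';
    COPY tmp_out TO STDOUT;
    DROP TABLE tmp_out;

The temp table vanishes at session end anyway; the explicit DROP just keeps long-lived sessions tidy.]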
Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint
On 6/6/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Travis Cross <[EMAIL PROTECTED]> writes:
> > I'm noticing that a handful (4-16) of rows with duplicate columns
> > (uid,token) are sneaking into the table every day despite the primary
> > key constraint.
>
> Corrupt index, looks like ... you might try reindexing the index.
>
> I don't believe that the PANIC you show has anything directly to do with
> duplicate entries.  It is a symptom of corrupt index structure.  Now a
> corrupt index might also explain failure to notice duplications, but
> changing your application isn't going to fix whatever is causing it.  You
> need to look for server-side causes.  Any database or system crashes on
> this server (before this problem started)?  Do you *know* that the disk
> drive will not lie about write complete?  What is the platform and
> storage system, anyway?

FWIW I've seen similar behaviour to this (PostgreSQL processes exiting abnormally, index corruption with duplicate primary keys) on servers with defective RAM chips.

Ian Barwick
Re: [HACKERS] fillfactor using WITH syntax
On Tue, 2006-06-06 at 10:27 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Why not implement an array of option parameters on pg_class, so both
> > heaps and indexes can be given additional parameters?  That way you
> > wouldn't need a specific relfillfactor attribute.  That would allow us
> > to keep CREATE TABLE free of additional keywords also.
>
> None of this should go anywhere near pg_class.  IIRC the solutions we
> discussed involved adding some sort of array to pg_index.

Itagaki had suggested adding options to heaps also, so clearly we'd need to add that to pg_class, rather than pg_index, in that case.  PCTFREE would be useful for heaps as well as indexes, but there could be other options also.  Extending the thought for the general case, I see no reason why we would want to permanently exclude heaps from having a more flexible set of options when we aim to provide that for indexes.

-- Simon Riggs  EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
Kenneth Marshall wrote:
> We have used postgresql 7.4, 8.0, and 8.1 with DSPAM and have never had a
> single problem like you are describing.  In the past on this mailing
> list, these sorts of issues have been caused by hardware problems on the
> DB server in some cases.  Good luck with tracking it down.

Thanks.  It is always good to know that something *should* and *does* work well.  That gives me a good base for tracking down the issue.

Cheers,
-- Travis
Re: [HACKERS] AIX check in datetime.h
Wow, that is strange.  We could remove it for 8.2 and see how testing goes.

---------------------------------------------------------------------------

Joachim Wieland wrote:
> Can someone please explain why in include/utils/datetime.h (struct
> datetkn) there is a check for _AIX that either initializes a char*
> pointer or a char array?  Is there any advantage of a char array except
> for warnings from some compilers if the initialization string is too
> long?  Apart from that, I doubt that AIX cannot handle token[TOKMAXLEN],
> because similar declarations can be found in other headers without the
> _AIX check.  The struct definition is more than 9 years old and seems to
> show up first in Attic/dt.h.
>
>     #define TOKMAXLEN 10   /* only this many chars are stored in datetktbl */
>
>     /* keep this struct small; it gets used a lot */
>     typedef struct
>     {
>     #if defined(_AIX)
>         char       *token;
>     #else
>         char        token[TOKMAXLEN];
>     #endif   /* _AIX */
>         char        type;
>         char        value;      /* this may be unsigned, alas */
>     } datetkn;

-- Bruce Momjian  http://candle.pha.pa.us  EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] AIX check in datetime.h
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Joachim Wieland wrote:
> > Can someone please explain why in include/utils/datetime.h (struct
> > datetkn) there is a check for _AIX that either initializes a char*
> > pointer or a char array?
>
> Wow, that is strange.  We could remove it for 8.2 and see how testing goes.

It looks like a workaround for some ancient compiler problem.  [ digs for awhile... ]  Very ancient: we inherited that hack from Berkeley, see
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/Attic/nabstime.h

I bet we can remove it.

regards, tom lane
Re: [HACKERS] adding new field
hey, let me clarify something.

1. Whenever you run a query, its result or some other thing gets cached.  I want to flush this cache.  Also let me know when this cache gets flushed automatically.

2. After adding a new field in the Resdom structure and making necessary changes in outfuncs.c, copyfuncs.c, etc., I *sometimes* get a garbage value for the field added.  The field is initialized to zero in makefuncs.c.

3. **IMP*** Is there something in postgres that allocates memory for only N fields of a structure, i.e., do I need to modify this N somewhere to reflect the addition of a new field?

And that *weird* thing is:
1. I ran some query, didn't get results.
2. Then I commented the part which I modified [i.e., commented out that new field in the Resdom structure], ran the same query again, got results.
3. Next, I uncommented that part and ran the same query again.  I GOT THE RESULTS.

That's how my prog is behaving, behaving in a *weird* way.  It seems like some memory probs somewhere.  I can't figure it out.  Yes, I'm re-compiling and running initdb whenever I make some modifications in the code.  I'm using release 8.0.3 and due to some reasons can't shift to some other release.

Regds,
Ranbeer

On 6/6/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Mon, Jun 05, 2006 at 11:24:09PM +0530, ranbeer makin wrote:
> > > 2. I have added a new field in the RESDOM structure, made necessary
> > > modifications in outfuncs.c, copy, read, equalfuncs.c, but my prog is
> > > behaving in a weird way, seems like some memory probs... What other
> > > modifications need to be done to reflect the addition of this new field?
> >
> > Did you remember to recompile *everything* affected?  (--enable-depend
> > is useful for this).  You also have to initdb again.
>
> Also, if you're working on a patch you hope to someday contribute, you
> should be starting from CVS HEAD or some reasonable approximation of it.
> Resdom disappeared more than a year ago:
> http://archives.postgresql.org/pgsql-committers/2005-04/msg00060.php
>
> regards, tom lane
Re: [HACKERS] AIX check in datetime.h
I wrote:
> It looks like a workaround for some ancient compiler problem.  [ digs for
> awhile... ]  Very ancient: we inherited that hack from Berkeley,

In fact, now that I know where to look, I find the same thing in the postgres-v4r2 tarball, which means the hack is pre-1994.  I don't have anything older to look at.

regards, tom lane
Re: [HACKERS] AIX check in datetime.h
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Joachim Wieland wrote:
> > > Can someone please explain why in include/utils/datetime.h (struct
> > > datetkn) there is a check for _AIX that either initializes a char*
> > > pointer or a char array?
> >
> > Wow, that is strange.  We could remove it for 8.2 and see how testing
> > goes.
>
> It looks like a workaround for some ancient compiler problem.  [ digs for
> awhile... ]  Very ancient: we inherited that hack from Berkeley, see
> http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/Attic/nabstime.h
> I bet we can remove it.

OK, removed.  Let's see if we get failure feedback.

-- Bruce Momjian  http://candle.pha.pa.us  EnterpriseDB  http://www.enterprisedb.com
[HACKERS] AGGREGATE FUNCTIONS
Hello, I would like to know where in the source code of postgres the code of the aggregate functions min, max, and avg is located.  I wish to develop more statistical aggregate functions, and I prefer to use C rather than writing them in PL/R.  Thanks
Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
Tom Lane wrote:
> Travis Cross <[EMAIL PROTECTED]> writes:
> > I'm noticing that a handful (4-16) of rows with duplicate columns
> > (uid,token) are sneaking into the table every day despite the primary
> > key constraint.
>
> Corrupt index, looks like ... you might try reindexing the index.

I probably should have mentioned that I have indeed done a REINDEX on the table a couple of times in the past, suspecting that issue, and having seen it resolve similar issues on this list.  Upon your suggestion, I'm running one right now, and I will probably dump and reload the entire database after hours, unless anyone thinks that would be a bad idea (or unproductive in tracking this down).

> I don't believe that the PANIC you show has anything directly to do with
> duplicate entries.  It is a symptom of corrupt index structure.  Now a
> corrupt index might also explain failure to notice duplications, but
> changing your application isn't going to fix whatever is causing it.  You
> need to look for server-side causes.

Indeed, you are correct.  I should also mention that the problem seems to build over time, in the sense that everything will run fine for awhile (a few days), and then will crash repeatedly.  Deleting the duplicate rows seems to reset the counter -- of course, I cannot run a successful REINDEX until I have deleted those duplicate rows.

> Any database or system crashes on this server (before this problem started)?

No.  In fact, this box, and a sister box running similar hardware, have been models of system stability.  My uptimes are 46 and 87 days, respectively, representing the time since I've done a kernel upgrade and the time since I plugged the boxes into the rack.  The sister box is running real-time voice services.

> Do you *know* that the disk drive will not lie about write complete?

"Know" is such a strong word ;)  Honestly, I have very little idea.  I understand the nature of the problem this presents, as I've read the very fine PostgreSQL manual many times over the years.  Because the drives I use are specifically designed to operate well in a RAID environment, I would 'hope' that the drives perform honest write operations.  I wonder if there is a utility to perform a deterministic test of this...

> What is the platform and storage system, anyway?

The platform is: Linux 2.6.16.9 (w/o loadable modules), Supermicro PDSMi (a single-processor P-D board), 2G ECC DDRII SDRAM.

The storage system is: on-board SATA ICH7R controller, 2 x WD3200SD hard drives running in a Linux RAID 1 configuration.  That is to say: Western Digital 320G SATA 'enterprise' drives.  The drives have a somewhat unique feature: time-limited error recovery, which is supposed to let the RAID controller/software deal with errors after a certain point (7 seconds), rather than continuing to block and causing the drive to fall out of the array.  The drive:
http://www.westerndigital.com/en/products/products.asp?driveid=114&language=en

I'll run file system consistency checks tonight to see if I can pick out a proximal cause for all this chaos.  I really do appreciate the assistance.

Cheers,
-- Travis
Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
Ian Barwick wrote:
> FWIW I've seen similar behaviour to this (PostgreSQL processes exiting
> abnormally, index corruption with duplicate primary keys) on servers
> with defective RAM chips.

That's a good thought, and also a possibility.  I tend to distrust RAM inherently.  If nothing else seems verifiable as the cause, I may have to take the system down on a Saturday night for a good thorough run through memtest86.

Cheers,
-- Travis
Re: [HACKERS] AGGREGATE FUNCTIONS
On Tue, 6 Jun 2006, Roberto Rezende de Assis wrote:
> Hello, I would like to know where in the source code of postgres the
> code of the aggregate functions min, max, and avg is located.

They're in src/backend/utils/adt/numeric.c

> I wish to develop more statistical aggregate functions, and I prefer to
> use C rather than writing them in PL/R.

Make sure you read section 32.10, "User-Defined Aggregates", in the manual:
http://www.postgresql.org/docs/current/interactive/xaggr.html

Also, take a look at contrib/intagg for another example.

- Heikki
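[Editor's note: to make the moving parts concrete, here is a pure-SQL sketch of a user-defined aggregate with invented names; a C version differs only in that the transition and final functions are C-language functions:

    -- transition function: add the square of each input to the running state
    CREATE FUNCTION sum_sq_accum(float8, float8) RETURNS float8
        AS 'SELECT $1 + $2 * $2'
        LANGUAGE sql IMMUTABLE STRICT;

    -- the aggregate itself (8.1-era CREATE AGGREGATE syntax)
    CREATE AGGREGATE sum_squares (
        basetype = float8,
        sfunc    = sum_sq_accum,
        stype    = float8,
        initcond = '0'
    );

    SELECT sum_squares(x) FROM measurements;   -- hypothetical table

Because the transition function is STRICT, NULL inputs are simply skipped, matching the behavior of the built-in sum().]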
Re: [HACKERS] AGGREGATE FUNCTIONS
Roberto Rezende de Assis wrote:
> Hello, I would like to know where in the source code of postgres the code
> of the aggregate functions min, max, and avg is located.  I wish to
> develop more statistical aggregate functions, and I prefer to use C
> rather than writing them in PL/R.

http://projects.commandprompt.com/public/pgsql/browser/trunk/pgsql/src/backend/utils/adt
http://projects.commandprompt.com/public/pgsql/browser/trunk/pgsql/src/backend/utils/adt/numeric.c

That will give you an easy interface to view the code; everything after "browser" is the CVS source tree, so you can look for yourself within your copy of HEAD or 8.1 or whatever.

Sincerely,
Joshua D. Drake

-- The PostgreSQL Company: Command Prompt, Inc.  http://www.commandprompt.com/  Sales/Support: +1.503.667.4564
Re: [HACKERS] AIX check in datetime.h
In an attempt to throw the authorities off his trail, pgman@candle.pha.pa.us (Bruce Momjian) transmitted:
> Tom Lane wrote:
> > It looks like a workaround for some ancient compiler problem.  [ digs
> > for awhile... ]  Very ancient: we inherited that hack from Berkeley, see
> > http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/Attic/nabstime.h
> > I bet we can remove it.
>
> OK, removed.  Let's see if we get failure feedback.

I haven't been monitoring CVS HEAD, but you can be sure this will get tried out when 8.2 gets anywhere vaguely close to release...

-- Christopher Browne  http://linuxdatabases.info/info/slony.html
Signs of a Klingon Programmer - 8. "Debugging?  Klingons do not debug.  Our software does not coddle the weak.  Bugs are good for building character in the user."
Re: [HACKERS] How to avoid transaction ID wrap
Clinging to sanity, [EMAIL PROTECTED] (Hannu Krosing) mumbled into her beard:
> On Tuesday, 2006-06-06 at 08:42, Mark Woodward wrote:
> > OK, here's my problem: I have a nature study where we have about 10
> > video cameras taking 15 frames per second.  For each frame we make a
> > few transactions on a PostgreSQL database.  We want to keep about a
> > year's worth of data at any specific time.
>
> partition by month, then you have better chances of removing old data
> without causing overload/data loss;

It's certainly worth something to be able to TRUNCATE an elderly partition; that cleans things out very nicely...

> > We have triggers that fire if something interesting is found on insert.
> > We want this thing to run for a long time.  From the numbers, you can
> > see the PostgreSQL database is VERY loaded.  Running VACUUM may not
> > always be possible without losing data.
>
> why ?  just run it with very friendly delay settings.

Friendly delay settings can have adverse effects; it is likely to make vacuum run on the order of 3x as long, which means that if you have a very large table that takes 12h to VACUUM, vacuum delay will increase that to 36h, which means you'll have a transaction open for 36h.  That'll be very evil, to be sure...

> > The numbers I have amount to 466,560,000 transactions per month,
> > lasting a maximum of about 9 months until XID wrap.
>
> actually 4.5 months, as you will start having problems at 2G xacts.

Right.

> > I am thinking about a few workarounds, BEGIN/COMMIT to reduce the
> > number of transactions, COPY, etc., so I'm not dead in the water, but
> > I would be interested in any observations you may have.

Grouping work together to diminish the number of transactions is almost always something of a win...

-- Christopher Browne  http://linuxdatabases.info/info/rdbms.html
Re: [HACKERS] Connection Broken with Custom Dicts for TSearch2
Since we are on the topic, is there a timeline/plans for openfts being brought into core?  If not, I'll continue my work on bringing it into Gentoo Portage.

John

On 6/5/06, Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Mon, 5 Jun 2006, Teodor Sigaev wrote:
> > Sorry, it isn't mentioned on the page, but this example of code works
> > only with pre-8.1 versions.  In 8.1 the interface to dictionaries was
> > changed.  Try the attached dict_tmpl.c
> > 2Oleg: place file on site, pls
>
> done
>
> Regards, Oleg
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
Re: [HACKERS] How to avoid transaction ID wrap
On Tue, 2006-06-06 at 13:53 -0400, Christopher Browne wrote:
> Clinging to sanity, [EMAIL PROTECTED] (Hannu Krosing) mumbled into her beard:
> > partition by month, then you have better chances of removing old data
> > without causing overload/data loss;
>
> It's certainly worth something to be able to TRUNCATE an elderly
> partition; that cleans things out very nicely...

With one potential snafu -- it blocks new SELECTs against the parent table while the truncate runs on the child (happens with constraint exclusion as well).  If your transactions are short then it won't be an issue.  If you have mixed-length transactions (many short with the occasional long select) then it becomes tricky, since those short transactions will be blocked.

> > why ?  just run it with very friendly delay settings.
>
> Friendly delay settings can have adverse effects; it is likely to make
> vacuum run on the order of 3x as long, which means that if you have a
> very large table that takes 12h to VACUUM, vacuum delay will increase
> that to 36h, which means you'll have a transaction open for 36h.

Sounds like this is almost strictly inserts and selects, though.  If there is limited garbage collection (updates, deletes, rollbacks of inserts) required, then it isn't all that bad.

--
Re: [HACKERS] How to avoid transaction ID wrap
Rod Taylor wrote:
> With one potential snafu -- it blocks new SELECTs against the parent
> table while the truncate runs on the child (happens with constraint
> exclusion as well).  If your transactions are short then it won't be an
> issue.  If you have mixed-length transactions (many short with the
> occasional long select) then it becomes tricky, since those short
> transactions will be blocked.

One idea is to try to acquire the lock before issuing the TRUNCATE itself.  If the LOCK TABLE times out, you know you should wait for a long-running transaction ...

-- Alvaro Herrera  http://www.CommandPrompt.com/  The PostgreSQL Company - Command Prompt, Inc.
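[Editor's note: a sketch of that idea; the partition name is hypothetical, and statement_timeout stands in for a lock timeout, since LOCK TABLE itself takes no timeout argument:

    BEGIN;
    SET LOCAL statement_timeout = 5000;   -- milliseconds; give up if the lock
                                          -- isn't granted within 5 seconds
    LOCK TABLE frames_2005_06 IN ACCESS EXCLUSIVE MODE;
    SET LOCAL statement_timeout = 0;      -- don't let the timeout hit TRUNCATE
    TRUNCATE frames_2005_06;
    COMMIT;

If the LOCK times out, the transaction aborts and can simply be retried later, without ever having queued behind a long-running reader.]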
Re: [HACKERS] adding new field
On Tue, Jun 06, 2006 at 09:44:04PM +0530, ranbeer makin wrote:
> 1. Whenever you run a query, its result or some other thing gets cached.
> I want to flush this cache.  Also let me know when this cache gets
> flushed automatically.

What cache?  Query results are not cached, neither are plans.  Please be specific.

> 2. After adding a new field in the Resdom structure and making necessary
> changes in outfuncs.c, copyfuncs.c, etc., I *sometimes* get a garbage
> value for the field added.  The field is initialized to zero in
> makefuncs.c.

Check for other places in the code that create that structure.  makefuncs is more a convenience than anything else.

> 3. **IMP*** Is there something in postgres that allocates memory for only
> N fields of a structure, i.e., do I need to modify this N somewhere to
> reflect the addition of a new field?

Postgres uses sizeof() always, so as long as all the files are compiled the same way, everything will work.

> And that *weird* thing is:
> 1. I ran some query, didn't get results.
> 2. Then I commented the part which I modified, ran the same query again,
> got results.
> 3. Next, I uncommented that part and ran the same query again.  I GOT
> THE RESULTS.

Did you rerun "make install" and restart the postmaster between each run?

Have a nice day,
-- Martijn van Oosterhout <kleptog@svana.org>  http://svana.org/kleptog/
Re: [HACKERS] AIX check in datetime.h
Christopher Browne wrote:
> I haven't been monitoring CVS HEAD, but you can be sure this will get
> tried out when 8.2 gets anywhere vaguely close to release...

The whole point of having a buildfarm is that we shouldn't have to wait; we should be able to see very quickly if we have broken something.  We currently have AIX coverage for 5.2/ppc with both gcc and (I think) IBM cc.  If we need more coverage then feel free to add other AIX machines.

cheers
andrew
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
Martijn van Oosterhout <kleptog@svana.org> writes:
> Weird.  Can you get the output of *instr in each call of InstrEndLoop?
> Preferably after it does the calculation but before it clears the values,
> so we get an idea of the number of samples and what it guesses.
> SampleOverhead would be good too.

The problem looks to be an underestimation of SampleOverhead, and on reflection it's clear why: what CalculateSampleOverhead is measuring isn't the total overhead, but the time between the two gettimeofday calls.  Which is probably about half the true overhead.  What we ought to do is iterate InstrStartNode/InstrStopNode N times, and *separately* measure the total elapsed time spent.

It occurs to me that what we really want to know is not so much the total time spent in InstrStartNode/InstrStopNode, as the difference in the time spent when sampling is on vs when it is off.  I'm not quite sure if the time spent when it's off is negligible.  Off to do some measuring ...

regards, tom lane
Re: [HACKERS] Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE
On Tue, Jun 06, 2006 at 12:58:10PM -0400, Travis Cross wrote:
> > Do you *know* that the disk drive will not lie about write complete?
>
> "Know" is such a strong word ;)  Honestly, I have very little idea.  I
> understand the nature of the problem this presents, as I've read the
> very fine PostgreSQL manual many times over the years.  Because the
> drives I use are specifically designed to operate well in a RAID
> environment, I would 'hope' that the drives perform honest write
> operations.  I wonder if there is a utility to perform a deterministic
> test of this...

Brad from livejournal.com wrote a utility that does just that, though it requires 2 machines to run the test.

-- Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software  http://pervasive.com
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
I wrote:
> The problem looks to be an underestimation of SampleOverhead, and on
> reflection it's clear why: what CalculateSampleOverhead is measuring
> isn't the total overhead, but the time between the two gettimeofday
> calls.  Which is probably about half the true overhead.

On further thought, I take that back: the true overhead is not the point here.  The time elapsed during a plan node execution when sampling can be broken down into three phases:

    time before first gettimeofday call
    time between gettimeofday calls
    time after second gettimeofday call

Only the second part is actually measured by the instrumentation system; the other parts are overhead that has never been counted by EXPLAIN ANALYZE, sampling version or no.  Moreover, most of the runtime of InstrStartNode and InstrStopNode falls into the first or third parts.  What we would actually like to set SampleOverhead to is the portion of the second-part runtime that doesn't occur when sampling = false.

Assuming that gettimeofday() has consistent runtime and the actual time reported is measured at a consistent instant within that runtime, I believe that we should take the SampleOverhead as just equal to the runtime of a single gettimeofday() call.  The added or removed second-part time within InstrStartNode is just the tail time of gettimeofday, and the added or removed second-part time within InstrStopNode is basically just the head time of gettimeofday.  (To make this as true as possible, we need to change the order of operations so that gettimeofday is invoked *immediately* after the "if (sampling)" test, but that's easy.)

So this line of thought leads to the conclusion that CalculateSampleOverhead is actually overestimating SampleOverhead a bit, and we should simplify it to just time INSTR_TIME_SET_CURRENT().

But that still leaves me with a problem because my machine is clearly overestimating the correction needed.  I added some printouts and got

    raw totaltime = 0.370937
    per_iter = 0.000156913, SampleOverhead = 3.28e-06
    adj totaltime = 1.82976
    sampling = 0
    starttime = 0/00
    counter = 0/370937
    firsttuple = 0.258321
    tuplecount = 1
    itercount = 10001
    samplecount = 704
    nextsample = 10011
    startup = 0.258321
    total = 1.82976
    ntuples = 1
    nloops = 1

on a run with an actual elapsed time near 750 msec.  Clearly the sampling adjustment is wrong, but why?

I have a theory about this, and it's not pleasant at all.  What I think is that we have a Heisenberg problem here: the act of invoking gettimeofday() actually changes what is measured.  That is, the runtime of the second part of ExecProcNode is actually longer when we sample than when we don't, not merely due to the extra time spent in gettimeofday().  It's not very hard to guess at reasons why, either.  The kernel entry is probably flushing some part of the CPU's state, such as virtual/physical address mapping for the userland address space.  After returning from the kernel call, the time to reload that state shows up as more execution time within the second part.

This theory explains two observations that otherwise are hard to explain.  One, that the effect is platform-specific: your machine may avoid flushing as much state during a kernel call as mine does.  And two, that upper plan nodes seem much more affected than lower ones.  That makes sense because the execution cycle of an upper node will involve touching more userspace data than a lower node, and therefore more of the flushed TLB entries will need to be reloaded.
If this theory is correct, then the entire notion of EXPLAIN ANALYZE sampling has just crashed and burned.  We can't ship a measurement tool that is accurate on some platforms and not others.

I'm wondering if it's at all practical to go over to a gprof-like measurement method for taking EXPLAIN ANALYZE runtime measurements; that is, take an interrupt every so often and bump the count for the currently active plan node.  This would spread the TLB-flush overhead more evenly and thus avoid introducing that bias.  There may be too much platform dependency in this idea, though, and I also wonder if it'd break the ability to do normal gprof profiling of the backend.

regards, tom lane
Re: [HACKERS] AIX check in datetime.h
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> The whole point of having a buildfarm is that we shouldn't have to wait;
> we should be able to see very quickly if we have broken something.

And in fact kookaburra seems happy ...

regards, tom lane
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
On Tue, Jun 06, 2006 at 04:50:28PM -0400, Tom Lane wrote:
> I have a theory about this, and it's not pleasant at all.  What I think
> is that we have a Heisenberg problem here: the act of invoking
> gettimeofday() actually changes what is measured.  That is, the runtime
> of the second part of ExecProcNode is actually longer when we sample
> than when we don't, not merely due to the extra time spent in
> gettimeofday().  It's not very hard to guess at reasons why, either.
> The kernel entry is probably flushing some part of the CPU's state, such
> as virtual/physical address mapping for the userland address space.
> After returning from the kernel call, the time to reload that state
> shows up as more execution time within the second part.
>
> This theory explains two observations that otherwise are hard to
> explain.  One, that the effect is platform-specific: your machine may
> avoid flushing as much state during a kernel call as mine does.  And
> two, that upper plan nodes seem much more affected than lower ones.
> That makes sense because the execution cycle of an upper node will
> involve touching more userspace data than a lower node, and therefore
> more of the flushed TLB entries will need to be reloaded.

If that's the case, then maybe a more sophisticated method of measuring the overhead would work.  My thought is that on the second call to pull a tuple from a node (second because the first probably has some anomalies due to startup), we measure the overhead for that node.  This would probably mean doing the following:

    get start time    # not referring to this as gettimeofday, to avoid confusion
    gettimeofday()    # this is the gettimeofday call that will happen during normal operation
    get end time

Hopefully, there's no caching effect that would come into play from not actually touching any of the data structures after the gettimeofday() call.  If that's not the case, it makes measuring the overhead more complex, but I think it should still be doable...

-- Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software  http://pervasive.com
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
Jim C. Nasby [EMAIL PROTECTED] writes: If that's the case, then maybe a more sophisticated method of measuring the overhead would work. I think you missed the point: the time spent in gettimeofday() itself is not the major overhead of EXPLAIN ANALYZE. At least it appears that this is the case on my machine. I'm thinking that interrupt-driven sampling might work OK though. My previous worries were based on trying to use the ITIMER_PROF timer, which might not be portable and would conflict with gprof anyway. But EXPLAIN ANALYZE has always been interested in real time rather than CPU time, so the correct interrupt to use is ITIMER_REAL. That means we only have to share with our own existing usages of that interrupt, which turns it from a portability issue into just a Small Matter Of Programming. regards, tom lane
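A toy standalone sketch of the ITIMER_REAL idea follows. Here current_node and the tick table are invented stand-ins for the executor's notion of the currently active plan node, and the 1 ms period is arbitrary; this only illustrates the mechanism, not how it would be wired into the backend:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>

    #define NUM_NODES 2

    static volatile sig_atomic_t current_node = 0;
    static volatile long ticks[NUM_NODES];

    /* Each timer interrupt credits one tick to the active node. */
    static void sample_handler(int signo)
    {
        (void) signo;
        ticks[current_node]++;
    }

    int main(void)
    {
        struct sigaction sa;
        struct itimerval it;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sample_handler;
        sigaction(SIGALRM, &sa, NULL);

        memset(&it, 0, sizeof(it));
        it.it_interval.tv_usec = 1000;      /* fire every millisecond */
        it.it_value.tv_usec = 1000;
        setitimer(ITIMER_REAL, &it, NULL);

        /* Pretend to execute two plan nodes back to back. */
        for (int node = 0; node < NUM_NODES; node++)
        {
            current_node = node;
            for (volatile long i = 0; i < 50000000L; i++)
                ;                           /* busy work standing in for node execution */
        }

        for (int node = 0; node < NUM_NODES; node++)
            printf("node %d: %ld ticks\n", node, ticks[node]);
        return 0;
    }

Because the interrupts land at (roughly) uniform real-time intervals regardless of what the code is doing, the kernel-entry cost is spread evenly over all nodes instead of being charged only to the sampled executions.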
[HACKERS] SERIAL problems?
Hi, I just saw these in the TODO list:

    o %Disallow changing DEFAULT expression of a SERIAL column?
      This should be done only if the existing SERIAL problems cannot be fixed.
    o %Disallow ALTER SEQUENCE changes for SERIAL sequences
      because pg_dump does not dump the changes

What are the existing problems? I am asking because I am experimenting with implementing the SQL2003-compliant form of the serial type, to be able to specify the underlying sequence parameters:

    SERIAL [ GENERATED [ ALWAYS | BY DEFAULT ] AS IDENTITY
        ( [ INCREMENT [ BY ] increment ]
          [ MINVALUE minvalue | NO MINVALUE ]
          [ MAXVALUE maxvalue | NO MAXVALUE ]
          [ START [ WITH ] start ]
          [ CACHE cache ]
          [ [ NO ] CYCLE ] ) ]

Best regards, Zoltán Böszörményi
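Under the grammar above, a column definition might look like this (hypothetical, since the syntax is only an experiment at this point; stock PostgreSQL of this era accepts only plain SERIAL):

    CREATE TABLE frames (
        id      SERIAL GENERATED BY DEFAULT AS IDENTITY
                    (START WITH 1000 INCREMENT BY 10 NO CYCLE),
        camera  int4,
        taken   timestamp
    );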
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
On Tue, Jun 06, 2006 at 04:50:28PM -0400, Tom Lane wrote: But that still leaves me with a problem because my machine is clearly overestimating the correction needed. I added some printouts and got

    raw totaltime = 0.370937
    per_iter = 0.000156913, SampleOverhead = 3.28e-06
    adj totaltime = 1.82976
    sampling = 0
    starttime = 0/00
    counter = 0/370937
    firsttuple = 0.258321
    tuplecount = 1
    itercount = 10001
    samplecount = 704
    nextsample = 10011
    startup = 0.258321
    total = 1.82976
    ntuples = 1
    nloops = 1

on a run with an actual elapsed time near 750 msec. Clearly the sampling adjustment is wrong, but why?

This doesn't make any sense at all. How can a sampling run that only sampled 7% of the actual tuples end up with an actual measured time that's more than 50% of the actual final runtime? Can you get an estimate of the actual overhead (ie compare wall clock time of EXPLAIN ANALYZE vs the actual query)? The cost of gettimeofday() is on the order of a few hundred cycles; I'm not sure about the cost of TLB flushes (that's probably highly architecture-dependent).

To be honest, I wonder about caching effects, but for the disks. The first few cycles of any plan (like an index scan) are going to incur costs that won't happen later on. Because we sample much more heavily at the beginning than at the end, this will bias towards higher numbers. You should be able to check this by seeing whether queries that don't require disk access fare better. That would suggest a much more careful correction method that works for non-linear timing patterns...

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] SERIAL problems?
Zoltan Boszormenyi [EMAIL PROTECTED] writes: What are the existing problems? Please read the archives; this has been discussed recently. There's a lot of disagreement about what ALTER should allow and what pg_dump should do with an altered sequence. I am asking because I am experimenting with implementing the SQL2003-compliant form of the serial type, to be able to specify the underlying sequence parameters: Be aware that the big problem with SQL2003 is that it expects an action-at-a-distance behavior whereby different references to a generator all return the same result if executed within the same query cycle. This makes the construct not equivalent to either nextval() or currval(), but some hybrid with hidden state; and changing of that state would have to tie into core parts of the executor. It looks pretty messy :-( regards, tom lane
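To illustrate the action-at-a-distance behavior (NEXT VALUE FOR is the standard's spelling and is not a PostgreSQL feature; myseq and some_table are stand-ins):

    -- SQL:2003 semantics: a and b must be the SAME value within each row
    SELECT NEXT VALUE FOR myseq AS a,
           NEXT VALUE FOR myseq AS b
    FROM some_table;

    -- PostgreSQL semantics: the sequence advances twice per row,
    -- so a and b differ
    SELECT nextval('myseq') AS a,
           nextval('myseq') AS b
    FROM some_table;

It is that per-row "all references agree" rule that forces the hidden state Tom describes, since neither nextval() nor currval() alone can reproduce it.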
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
Martijn van Oosterhout kleptog@svana.org writes: This doesn't make any sense at all. How can a sampling run that only sampled 7% of the actual tuples end up with an actual measured time that's more than 50% of the actual final runtime?

AFAICS, the only conclusion is that the sampled executions are in fact taking longer than supposedly-equivalent unsampled ones, and by a pretty good percentage too. I'm growing unsure again about the mechanism responsible for that, however.

Can you get an estimate of the actual overhead (ie compare wall clock time of EXPLAIN ANALYZE vs actual query)?

Here's some examples. Keep in mind I've already determined that gettimeofday() takes about 3 usec on this hardware ...

    regression=# explain analyze select count(*) from
                 (select * from tenk1 a join tenk1 b on a.unique1 = b.unique2 offset 0) ss;

     Aggregate  (cost=2609.00..2609.01 rows=1 width=0) (actual time=797.412..797.416 rows=1 loops=1)
       ->  Limit  (cost=825.00..2484.00 rows=10000 width=488) (actual time=208.208..2576.528 rows=10000 loops=1)
             ->  Hash Join  (cost=825.00..2484.00 rows=10000 width=488) (actual time=208.190..2082.577 rows=10000 loops=1)
                   Hash Cond: (a.unique1 = b.unique2)
                   ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.082..3.718 rows=10000 loops=1)
                   ->  Hash  (cost=458.00..458.00 rows=10000 width=244) (actual time=207.933..207.933 rows=10000 loops=1)
                         ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.017..3.583 rows=10000 loops=1)
     Total runtime: 805.036 ms
    (8 rows)

    Time: 816.463 ms

    regression=# select count(*) from
                 (select * from tenk1 a join tenk1 b on a.unique1 = b.unique2 offset 0) ss;
     count
    -------
     10000
    (1 row)

    Time: 816.970 ms

The actual elapsed time for EXPLAIN ANALYZE seems to jump around quite a bit, probably because of the random variation we're using in the sampling interval. This particular combination was unusually close. But in any case, the *actual* overhead of EXPLAIN ANALYZE is clearly pretty small here; the problem is that we're incorrectly extrapolating the measured runtime to the unmeasured executions. What's especially interesting is that the excess time doesn't seem to show up if I form the query in a way that doesn't require pushing as much data around:

    regression=# explain analyze select count(*) from
                 (select * from tenk1 a join tenk1 b on a.unique1 = b.unique2) ss;

     Aggregate  (cost=1341.00..1341.01 rows=1 width=0) (actual time=212.313..212.317 rows=1 loops=1)
       ->  Hash Join  (cost=483.00..1316.00 rows=10000 width=0) (actual time=88.061..160.886 rows=10000 loops=1)
             Hash Cond: (a.unique1 = b.unique2)
             ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=10000 width=4) (actual time=0.071..4.068 rows=10000 loops=1)
             ->  Hash  (cost=458.00..458.00 rows=10000 width=4) (actual time=87.862..87.862 rows=10000 loops=1)
                   ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=4) (actual time=0.031..4.780 rows=10000 loops=1)
     Total runtime: 221.022 ms
    (7 rows)

    Time: 229.377 ms

    regression=# select count(*) from
                 (select * from tenk1 a join tenk1 b on a.unique1 = b.unique2) ss;
     count
    -------
     10000
    (1 row)

    Time: 202.531 ms

(Without the OFFSET 0, the planner flattens the subquery and discovers that it doesn't actually need to fetch any of the non-join-key table columns.) Note the only plan nodes showing whacked-out timings are the ones returning wide tuples (large width values). I'm not entirely sure what to make of this.
It could be interpreted as evidence for my theory about TLB reloads during userspace data access being the problem. But I'm getting a bit disenchanted with that theory after running the same test case in 8.1:

    regression=# explain analyze select count(*) from
                 (select * from tenk1 a join tenk1 b on a.unique1 = b.unique2 offset 0) ss;

     Aggregate  (cost=2609.00..2609.01 rows=1 width=0) (actual time=1033.866..1033.870 rows=1
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
Tom Lane [EMAIL PROTECTED] writes: And two, that upper plan nodes seem much more affected than lower ones. That makes sense because the execution cycle of an upper node will involve touching more userspace data than a lower node, and therefore more of the flushed TLB entries will need to be reloaded. I would have expected the opposite effect. If you only execute one instruction then the cache miss can make it take many times longer than normal. But as the number of instructions grows the cache gets repopulated and the overhead levels off and becomes negligible relative to the total time. The other option aside from gprof-like profiling would be to investigate those cpu timing instructions again. I know some of them are unsafe on multi-cpu systems but surely there's a solution out there. It's not like there aren't a million games, music playing, and other kewl kid toys that depend on accurate low overhead timing these days. -- greg
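The CPU timing instructions in question are presumably things like the x86 time-stamp counter. A minimal sketch of reading it (x86- and gcc-specific, and subject to exactly the cross-CPU skew mentioned above, since raw TSC values are not synchronized between processors):

    #include <stdint.h>
    #include <stdio.h>

    /* Read the x86 time-stamp counter via the RDTSC instruction.
     * Costs tens of cycles rather than the hundreds-plus of a
     * gettimeofday() kernel entry. */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t) hi << 32) | lo;
    }

    int main(void)
    {
        uint64_t t0 = rdtsc();
        /* ... code under measurement ... */
        uint64_t t1 = rdtsc();
        printf("elapsed: %llu cycles\n", (unsigned long long) (t1 - t0));
        return 0;
    }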
Re: [HACKERS] More thoughts about planner's cost estimates
Tom Lane wrote: One objection to this is that after moving off the gold standard of 1.0 = one page fetch, there is no longer any clear meaning to the cost estimate units; you're faced with the fact that they're just an arbitrary scale. I'm not sure that's such a bad thing, though. It seems to me the appropriate gold standard is Time, in microseconds or milliseconds. The default postgresql.conf can come with a set of hardcoded values that reasonably approximate some real-world system; and if that's documented in the file, someone reading it can say hey, my CPU's about the same but my disk subsystem is much faster, so I know in which direction to change things. And another person may say ooh, now I know that my 4GHz machines should have about twice the number here as my 2GHz box. People who *really* care a lot (HW vendors?) could eventually make measurements on their own systems.
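A fragment of that hypothetical postgresql.conf might read as follows. The parameter names are real planner knobs, but the millisecond calibrations in the comments are invented purely for illustration:

    # Planner cost constants. Suppose 1.0 = 1 ms on the reference system
    # (a mid-range server of this era); rescale if your hardware differs.
    random_page_cost     = 4.0     # ~4 ms per nonsequential page fetch
    cpu_tuple_cost       = 0.01    # ~10 us to process one tuple
    cpu_index_tuple_cost = 0.001   # ~1 us per index entry examined
    cpu_operator_cost    = 0.0025  # ~2.5 us per operator/function call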
Re: [HACKERS] AIX check in datetime.h
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Andrew Dunstan) wrote: Christopher Browne wrote: I haven't been monitoring CVS HEAD, but you can be sure this will get tried out when 8.2 gets anywhere vaguely close to release... The whole point of having a buildfarm is that we shouldn't have to wait, we should be able to see very quickly if we have broken something. We currently have AIX coverage for 5.2/ppc with both gcc and (I think) IBM cc. If we need more coverage then feel free to add other AIX machines. I'd very much like to add an AIX 5.3 system; that's awaiting some sysadmin activity that is more urgent... -- output = (cbbrowne @ gmail.com) http://linuxdatabases.info/info/internet.html ``What this means is that when people say, The X11 folks should have done this, done that, or included this or that, they really should be saying Hey, the X11 people were smart enough to allow me to add this, that and the other myself.'' -- David B. Lewis [EMAIL PROTECTED]
Re: [HACKERS] That EXPLAIN ANALYZE patch still needs work
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: And two, that upper plan nodes seem much more affected than lower ones. That makes sense because the execution cycle of an upper node will involve touching more userspace data than a lower node, and therefore more of the flushed TLB entries will need to be reloaded. I would have expected the opposite effect. If you only execute one instruction then the cache miss can make it take many times longer than normal. But as the number of instructions grows the cache gets repopulated and the overhead levels off and becomes negligible relative to the total time. Well, none of our plan nodes are in the one instruction regime ;-). I was thinking that the total volume of data accessed was the critical factor. Right at the moment I'm disillusioned with the TLB-access theory though. Something I'm noticing right now is that it seems like only hash joins are really seriously misestimated --- nest and merge joins have some small issues but only the hash is way out there. What's going on?? Can anyone else reproduce this? The other option aside from gprof-like profiling would be to investigate those cpu timing instructions again. I know some of them are unsafe on multi-cpu systems but surely there's a solution out there. It's not like there aren't a million games, music playing, and other kewl kid toys that depend on accurate low overhead timing these days. Yeah, and they all work only on Windoze and Intel chips :-( regards, tom lane
Re: [HACKERS] AGGREGATE FUNCTIONS
Hello, I would like to know where in the postgres source code the aggregate functions min, max, and avg are located. I wish to develop more statistical aggregate functions, and I prefer to use C rather than writing them in PL/R. There is a library in contrib called intagg. I wrote it a few years ago, and I have to laugh at the README file because I must have been stoned or something, because I can't understand it. Anyways, if you want to make an aggregate function, it covers what you need. Feel free to ignore the array stuff, because you probably won't need to deal with it. Aggregates basically have two functions: a state-transition function that is called for every row of the query (or GROUP BY group), and a final function that is called once at the end. The PostgreSQL docs are pretty good as well.
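For illustration, here is the two-function structure at the SQL level, using the 8.1-era CREATE AGGREGATE syntax. The names are invented for this example, and the support functions are written in SQL for brevity where a production statistical aggregate would likely be C:

    -- State-transition function: called once per input row,
    -- threading the running state through as the first argument.
    CREATE FUNCTION sum_sq_step(double precision, double precision)
    RETURNS double precision AS
      'SELECT $1 + $2 * $2'
    LANGUAGE SQL STRICT;

    -- Final function: called once at the end to produce the result.
    CREATE FUNCTION sum_sq_final(double precision)
    RETURNS double precision AS
      'SELECT sqrt($1)'
    LANGUAGE SQL STRICT;

    -- Root-sum-of-squares aggregate built from the two pieces.
    CREATE AGGREGATE root_sum_squares (
        basetype  = double precision,
        sfunc     = sum_sq_step,
        stype     = double precision,
        finalfunc = sum_sq_final,
        initcond  = '0'
    );

    -- Usage:
    --   SELECT root_sum_squares(x) FROM measurements;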
Re: [HACKERS] How to avoid transaction ID wrap
Mark Woodward wrote: OK, here's my problem, I have a nature study where we have about 10 video cameras taking 15 frames per second. For each frame we make a few transactions on a PostgreSQL database. Maybe if you grouped multiple operations into bigger transactions, the I/O savings could be enough to buy you the ability to vacuum once in a while. Or consider buffering somehow -- save the data elsewhere, and have some sort of daemon to put it into the database. This would allow you to cope with the I/O increase during vacuum. The problem is sufficiently large that any minor modification can easily hide the problem for a predictable amount of time. My hope was that someone would have a real long-term workaround.
[HACKERS] DROP INHERITS
I have a question about pg_depend, namely how do you identify the dependency tied to a given relationship. Specifically, to handle DROP INHERITS: is there any other reason there might be a dependency between two tables other than inheritance? If there were, how would you tell the dependencies apart? Also, it seems to me the INHRELID syscache is entirely pointless. There isn't a single consumer of it throughout the source tree. Nor can I think of any reason anyone would want to look up the nth parent of a table. Do I misunderstand what purpose this syscache serves? On the other hand, I see a few places where a syscache for a particular child-parent pair might be useful. Would it make sense to create an index and syscache for that? I suppose it makes more sense to optimize this on the basis of what's used in the planner and executor rather than ALTER TABLE commands, though. I don't know what would be helpful there. -- greg
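For concreteness, the kind of catalog lookup in question might look like the query below. This is a hypothetical sketch: 'kid' and 'dad' are stand-in table names, the column list is pg_depend's actual schema, and whether an inheritance link even gets such an entry is exactly the open question here:

    -- Find pg_depend entries linking child table 'kid' to parent 'dad'.
    SELECT classid::regclass, objid::regclass,
           refclassid::regclass, refobjid::regclass, deptype
    FROM pg_depend
    WHERE classid    = 'pg_class'::regclass
      AND objid      = 'kid'::regclass
      AND refclassid = 'pg_class'::regclass
      AND refobjid   = 'dad'::regclass;

If more than one kind of relationship between two tables could produce a row here, the deptype column alone would not be enough to tell them apart, which is the ambiguity being asked about.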
Re: [HACKERS] DROP INHERITS
Greg Stark [EMAIL PROTECTED] writes: I have a question about pg_depends, namely how do you identify the dependency tied to a given relationship. Specifically to handle DROP INHERITS. Is there any other reason there might be a dependency between two tables other than inheritance? If there was how would you tell the dependencies apart? I can't think of another reason --- if there is more than one reason for a dependency between particular objects, we would have to take a harder look at the issues. For the moment you could trawl the sources for recordDependencyOn calls to see if there is more than one reason. Also, it seems to me the INHRELID syscache is entirely pointless. I don't think anyone has checked for useless syscaches for a long time :-(. Please check the others while you are at it. On the other hand I see a few places where a syscache for a particular child-parent pair might be useful. Would it make sense to create an index and syscache for that? Adding an index where none exists now has a very definite cost. You need to argue why the usefulness of the cache exceeds the cost of the index maintenance. This goes both ways of course; there may well be syscaches/indexes we don't have today that would be worth having. I suppose it makes more sense to optimize this on the basis of what's used in the planner and executor rather than ALTER TABLE commands though. No, definitely not. Syscaches only exist to support hard-wired lookups in the backend C code. Indexes on system catalogs are of interest to the planner, but not syscaches. (So it is legitimate to have indexes with no associated syscache. The other way is not possible, though, because the syscache mechanism depends upon having a matching index.) regards, tom lane
Re: [HACKERS] How to avoid transaction ID wrap
I once proposed a patch for 64-bit transaction IDs, but this adds some overhead to each tuple (XMIN and XMAX). Pgbench with 64-bit transaction IDs pays about a couple of percent in performance. If a 64-bit transaction ID is a reasonable fix, I've already posted the patch; anyone can apply it to later versions. Mark Woodward wrote: Mark Woodward wrote: OK, here's my problem, I have a nature study where we have about 10 video cameras taking 15 frames per second. For each frame we make a few transactions on a PostgreSQL database. Maybe if you grouped multiple operations into bigger transactions, the I/O savings could be enough to buy you the ability to vacuum once in a while. Or consider buffering somehow -- save the data elsewhere, and have some sort of daemon to put it into the database. This would allow you to cope with the I/O increase during vacuum. The problem is sufficiently large that any minor modification can easily hide the problem for a predictable amount of time. My hope was that someone would have a real long-term workaround. -- Koichi Suzuki
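For a rough picture of where that per-tuple cost comes from, here is a simplified sketch. The types and struct names are invented, not the real HeapTupleHeaderData (which has several more fields); the point is just that widening xmin and xmax from 32 to 64 bits adds 8 bytes to every tuple header:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId32;
    typedef uint64_t TransactionId64;

    struct TupleHeaderSketch32 {
        TransactionId32 xmin;   /* inserting transaction ID */
        TransactionId32 xmax;   /* deleting transaction ID  */
        /* ... command IDs, infomask bits, etc. ... */
    };

    struct TupleHeaderSketch64 {
        TransactionId64 xmin;   /* now 8 bytes each ...      */
        TransactionId64 xmax;   /* ... 8 extra bytes per row */
        /* ... */
    };

    int main(void)
    {
        printf("32-bit XIDs: %zu-byte sketch header\n",
               sizeof(struct TupleHeaderSketch32));
        printf("64-bit XIDs: %zu-byte sketch header\n",
               sizeof(struct TupleHeaderSketch64));
        return 0;
    }

On narrow rows those extra bytes mean measurably more I/O per transaction, which is consistent with the couple-of-percent pgbench slowdown reported above.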
Re: [HACKERS] DROP INHERITS
Tom Lane [EMAIL PROTECTED] writes: I suppose it makes more sense to optimize this on the basis of what's used in the planner and executor rather than ALTER TABLE commands though. No, definitely not. Syscaches only exist to support hard-wired lookups in the backend C code. Indexes on system catalogs are of interest to the planner, but not syscaches. (So it is legitimate to have indexes with no associated syscache. The other way is not possible, though, because the syscache mechanism depends upon having a matching index.) I imagine the planner and/or executor have to execute hard-wired lookups in C code all day long to check for children of tables before they can execute queries on those tables. I meant that the performance of those lookups was undoubtedly more critical than the performance of DDL. -- greg