Re: [HACKERS] [GENERAL] INHERITS and planning
Edmund Dengler <[EMAIL PROTECTED]> writes:
> Is there an issue when a large number of INHERITS tables exist for planning?

Well, there are a number of issues whenever a single query references a whole lot of tables in any fashion. It's only with Neil Conway's rewrite of the List package in 8.0 that we had any hope of less than O(N^2) behavior for practically any measure of query complexity N. I have been spending some time over the past week or so attacking other O(N^2) behaviors, but it's not a finished project yet.

I tried to reproduce your issue by doing

	create table p1 (f1 int, f2 bigint);
	create table c0() inherits (p1);
	create table c1() inherits (p1);
	create table c2() inherits (p1);
	...
	create table c2298() inherits (p1);
	create table c2299() inherits (p1);

and then profiling

	select * from p1;

With no data in the tables, of course this is just measuring planning time and executor startup/shutdown overhead. But I suppose that you don't have a whole lot of data in the tables either, because the data-fetching stage is surely pretty linear and you'd not be complaining about overhead if there were much data to be read.

What I see in the profile is

	  %   cumulative    self                self    total
	 time   seconds   seconds     calls    s/call   s/call  name
	42.04     15.58     15.58      9214      0.00     0.00  list_nth_cell
	20.29     23.10      7.52  34524302      0.00     0.00  SHMQueueNext
	 8.34     26.19      3.09     29939      0.00     0.00  LockCountMyLocks
	 5.64     28.28      2.09   2960617      0.00     0.00  AllocSetAlloc
	 2.37     29.16      0.88      2354      0.00     0.00  AllocSetCheck
	 2.29     30.01      0.85    302960      0.00     0.00  hash_search
	 2.13     30.80      0.79   2902873      0.00     0.00  MemoryContextAlloc

The list_nth operations are all coming from rt_fetch() macros, so we could probably fix that by replacing rangetable Lists with arrays. This seems doable, but also tedious and bug-prone; there are too many places that figure they can randomly add onto rtable lists.
What I'm more interested in at the moment are the next two entries, SHMQueueNext and LockCountMyLocks --- it turns out that almost all the SHMQueueNext calls are coming from LockCountMyLocks, which is invoked during LockAcquire. This is another O(N^2) loop, and it's really a whole lot nastier than the rangetable ones, because it executes with the LockMgrLock held.

I spent a little time trying to see if we could avoid doing LockCountMyLocks altogether, but it didn't look very promising. What I am thinking, though, is that we could implement LockCountMyLocks as either a scan through all the proclocks attached to the target proc (the current way) or as a scan through all the proclocks attached to the target lock (proclocks are threaded both ways). There is no hard upper bound on the number of locks a proc holds, whereas there is a bound of MaxBackends on the number of procs linked to a lock. (Well, theoretically it could be 2*MaxBackends, considering the possibility of session locks, but that could only happen if all the backends are trying to vacuum the same relation.) So it seems like it might be a win to scan over the per-lock list instead. But I'm very unsure about what the *average* case is, as opposed to the worst case.

I'm also thinking that the shared memory lock structure may be overdesigned now that we've introduced the backend-private LocalLock table --- in particular, it's not clear why we still include transaction IDs in PROCLOCKTAG rather than leaving the backend to track all the different reasons why it wants to hold a lock. If we could get rid of that, then LockCountMyLocks reduces to a single PROCLOCK hashtable lookup.

Thoughts?

			regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[HACKERS] Make Problems
When I try to compile PostgreSQL 8 on a Windows machine, I get this error after issuing the make command:

	make -C doc all
	make[1]: Entering directory `/src/pgsql/doc'
	make[1]: Nothing to be done for `all'.
	make[1]: Leaving directory `/src/pgsql/doc'
	make -C src all
	make[1]: Entering directory `/src/pgsql/src'
	make -C port all
	make[2]: Entering directory `/src/pgsql/src/port'
	gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -fno-strict-aliasing -I../../src/port -DFRONTEND -I../../src/include -I./src/include/port/win32 -DEXEC_BACKEND -I../../src/include/port/win32 -c -o open.o open.c
	open.c: In function `win32_open':
	open.c:67: `O_SEQUENTIAL' undeclared (first use in this function)
	open.c:67: (Each undeclared identifier is reported only once
	open.c:67: for each function it appears in.)
	make[2]: *** [open.o] Error 1
	make[2]: Leaving directory `/src/pgsql/src/port'
	make[1]: *** [all] Error 2
	make[1]: Leaving directory `/src/pgsql/src'
	make: *** [all] Error 2
Re: [HACKERS] User Quota Implementation
Hi,

Did anything happen with implementing quotas yet? Since I did not see anything on the TODO list, I was wondering what is going on.

Regards,
Yann
[HACKERS] proposed TODO: non-locking CREATE INDEX / REINDEX
It seems that currently we do not allow modifying a table while an index is being built on it (at least my experience and some small tests I did indicate so). Strangely, I did not find a TODO item for it, so I may be overlooking something and we already have it. In case we really don't have it yet:

For big 24/7 tables this can be a real annoyance, especially in an evolving database. There are many ways this could be made to work, so it needs some discussion. I propose the following approach:

1) when CREATE INDEX starts, the ctid position (CTID_INDEX_MIN) of the last tuple is remembered, and all new tuples are inserted after that point while the index is being built.

2) the index is built in the usual fast way up to the remembered ctid, and made visible to all backends for inserting, but not yet to the planner for use. We remember the last ctid inserted while the fast-build phase was running (CTID_INDEX_MAX). The restriction to add new tuples only after CTID_INDEX_MIN is lifted.

3) then add index entries for all tuples whose ctid is between CTID_INDEX_MIN and CTID_INDEX_MAX.

4) declare the index usable by the planner.

Additionally, CREATE INDEX could be made to honour the vacuum_cost_* variables and not hog a busy database.

--
Hannu Krosing [EMAIL PROTECTED]
Re: [HACKERS] Daily DBT-3 (DSS) Results on CVS head
Hello Mark,

Mark Wong wrote:
> http://developer.osdl.org/markw/postgrescvs/

This site includes "Profile Report". It's very interesting and useful!

Then, I compared the result of 5/29 with the result of 5/30.

	new-order transactions per minute
	20050529  1779.41
	20050530  2320.41

And I looked at the Profile Report, Top 20 by Tick.

	http://khack.osdl.org/stp/302458/profile/Framework_Close-tick.top20
	http://khack.osdl.org/stp/302468/profile/Framework_Close-tick.top20

5/29
---
	11611612  default_idle             241908.5833
	  555980  _spin_unlock_irq          11582.9167
	  527285  __copy_to_user_ll          4119.4141
	  481664  __copy_from_user_ll        3763.
	  221535  _spin_unlock_irqrestore    4615.3125
---

5/30
---
	21246982  default_idle             442645.4583
	  198495  __copy_to_user_ll          1550.7422
	  192117  _spin_unlock_irq           4002.4375
	  183016  __copy_from_user_ll        1429.8125
	   70284  _spin_unlock_irqrestore    1464.2500
---

It can be seen that the frequency with which "_spin_unlock_irq" is called has decreased.

Is there something to be found there?

--
Junji Teramoto
Re: [HACKERS] [PATCHES] regexp_replace
Tom Flavel wrote:
> On 08/06/2005 21:57:29, Bruce Momjian wrote:
>> Andrew Dunstan wrote:
>>> surely something like
>>>   foo_replace(source text, pattern text, replacement text, flags text) returns text
>>> would fit the bill. OK, it reflects my Perl prejudices, but that
>>> looks more natural to me. You could overload it so that the flags
>>> default to none (which would be case sensitive, replace the first
>>> instance only, among other things).
>> Can we have the flags be a list of words, e.g. all, ignorecase? How
>> do we handle this type of problem in other cases?
> How about an array? Cumbersome, perhaps, but it makes more sense to me
> than delimiting with commas or using single characters.

I think it is good to specify the flags by one character, as in Perl. I propose the following specification:

	regexp_replace(source text, pattern text, replacement text, [flags text]) returns text

The flags can use the following values:

	g: global (replace all)
	i: ignore case

When the flags are not specified, matching is case sensitive and only the first instance is replaced.

regards,
--- Atsushi Ogawa
Re: [HACKERS] User Quota Implementation
Yann Michel wrote:
> Hi, did anything happen to implementing quotas, yet? Though I did not
> see anything on the TODO list, I was wondering what is going on.

No work has been done on it, and I don't even see a TODO item for it.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] proposed TODO: non-locking CREATE INDEX / REINDEX
Hannu Krosing <[EMAIL PROTECTED]> writes:
> There are many ways this could be made to work, so it needs some
> discussion.

(1) When do you ever catch up?

(2) If your answer to (1) involves increasing the strength of a lock, how do you avoid the risk of deadlock?

			regards, tom lane
Re: [HACKERS] User Quota Implementation
Yann Michel wrote:
> On Fri, Jun 10, 2005 at 09:45:32AM -0400, Bruce Momjian wrote:
>>> did anything happen to implementing quotas, yet?
>> No work has been done on it, and I don't even see a TODO item for it.
> Do you think that it is possible that one can generate a TODO item out
> of the request or do you rather think different?

Yes, sure. Ah, I found it. TODO now has:

	* Allow limits on per-db/user connections

That is pretty vague, but it is all we have so far. In fact, it refers more to the number of connections than to, say, disk space or CPU. The issue we have had with such limits in the past is that we aren't sure how they would be implemented or used.
Re: [HACKERS] User Quota Implementation
Hi Bruce,

On Fri, Jun 10, 2005 at 09:45:32AM -0400, Bruce Momjian wrote:
>> did anything happen to implementing quotas, yet? though I did not see
>> anything on the TODO List I was wondering what is going on.
> No work has been done on it, and I don't even see a TODO item for it.

Do you think a TODO item could be created out of this request, or do you see it differently?

Regards,
Yann
Re: [HACKERS] [PATCHES] Server instrumentation
Tom Lane wrote:
> Andreas Pflug <[EMAIL PROTECTED]> writes:
>> pg_terminate_backend() exposing kill -SIGTERM backendpid to the client
> The objections to this have not changed since last year; in fact they
> are stronger because we have at least one report of actual trouble
> with retail SIGTERMs.

I'm not arguing about that; still, SIGTERMing a single backend is widely used.

>> pg_file_stat()
>> pg_file_length()
>> pg_file_read()
>> pg_file_write()
>> pg_file_unlink()
>> pg_file_rename()
>> pg_dir_ls()
> I really have serious doubts about the value of this.

You've been insisting on this exceptionally weak argument for a year now. We all know that you personally do everything from the command line, but there *is* a requirement for additional ways to access config files. I have wide positive feedback on this; see the discussion a month back (and some more private mails). Besides, these functions are already widely used by pgAdmin from the win32 pgsql distro when displaying logfiles, which is installed by default.

Regards,
Andreas
[HACKERS] Tilde expansion on Win32
I noticed that we don't expand tildes on Win32 because of the use of tilde in short versions of long file names:

	char *
	expand_tilde(char **filename)
	{
		if (!filename || !(*filename))
			return NULL;

		/*
		 * MSDOS uses tilde for short versions of long file names,
		 * so skip it.
		 */
	#ifndef WIN32
		/* try tilde expansion */
		if (**filename == '~')
		{

However, I thought the tilde was usually used toward the end of the file name, not at the beginning. Is this true? Should this code be modified?
Re: [HACKERS] Daily DBT-3 (DSS) Results on CVS head
On Fri, 10 Jun 2005 19:17:58 +0900, Junji TERAMOTO <[EMAIL PROTECTED]> wrote:
> Then, I compared result of 5/29 with the result of 5/30.
>
>	new-order transactions per minute
>	20050529  1779.41
>	20050530  2320.41
>
> And I looked Profile Report, Top 20 by Tick.
>
>	http://khack.osdl.org/stp/302458/profile/Framework_Close-tick.top20
>	http://khack.osdl.org/stp/302468/profile/Framework_Close-tick.top20

It looks like the 20050530 run might be bogus. If we look at the top of the page here:

	http://khack.osdl.org/stp/302468/results/0/

the number of rollbacks is way too high. There should be only 1% for New Order alone. I should refer to the specification to see whether I should be counting those rolled-back transactions. My intuition tells me I've made a mistake in doing so. ;)

What you were seeing in the profiles is mostly a side effect of that. The database had crashed, so all the transactions were failing right away. We can see that here:

	http://khack.osdl.org/stp/302468/results/0/db/log

Mark
Re: [HACKERS] Tilde expansion on Win32
Bruce Momjian wrote:
> I noticed that we don't expand tildes in Win32 because of the use of
> tilde in short versions of long file names: [...]
> However, I thought the tilde was usually used toward the end of the
> file name, not at the beginning. Is this true? Should this code be
> modified?

If you're talking about Win32, the ~ is typically in the middle. Like:

	Program Files === progra~1

Sincerely,
Joshua D. Drake

--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org
Re: [HACKERS] Tilde expansion on Win32
Bruce Momjian wrote:
> I noticed that we don't expand tildes in Win32 because of the use of
> tilde in short versions of long file names: [...]
> However, I thought the tilde was usually used toward the end of the
> file name, not at the beginning. Is this true? Should this code be
> modified?

Yes, it's true - long filenames may be shortened to something like

	Long filename.document - longfi~1.doc

to munge them into 8.3 format. Without looking at the code, I assume it is expanding ~dpage into /home/dpage or whatever? If so, I'd be inclined to leave it - tilde isn't used like that on Windows.

Regards, Dave
Re: [HACKERS] Tilde expansion on Win32
Dave Page wrote:
> Yes, it's true - long filenames may be shortened to something like
>
>	Long filename.document - longfi~1.doc
>
> to munge them into 8.3 format. Without looking at the code, I assume
> it is expanding ~dpage into /home/dpage or whatever? If so, I'd be
> inclined to leave it - tilde isn't used like that on Windows.

OK, you d'man. I will leave it. I know we can't support ~user/ on Win32, but I thought maybe we should support ~/.
Re: [HACKERS] proposed TODO: non-locking CREATE INDEX / REINDEX
On R, 2005-06-10 at 09:47 -0400, Tom Lane wrote:
> (1) when do you ever catch up?
>
> (2) if your answer to (1) involves increasing the strength of a lock,
> how do you avoid risk of deadlock?

No. I don't plan on locking the table at all.

The only thing that changes during the initial fast-build-index phase is that new tuples are inserted after CTID_INDEX_MIN; and after the initial fast-build index is done, the only restriction is that the index can't be used in queries before the tuples between CTID_INDEX_MIN and CTID_INDEX_MAX have been added to the index.

As the number of tuples between CTID_INDEX_MIN and CTID_INDEX_MAX is finite, they must be added in finite time, by which time the index will be up-to-date and usable by the query planner (i.e. (1) is done).

All tuples inserted after the initial fast-build-index has finished and CTID_INDEX_MAX is fixed are inserted into the index at the time of inserting the tuple, like for any other index.
Re: [HACKERS] User Quota Implementation
On Fri, Jun 10, 2005 at 10:07:59AM -0400, Bruce Momjian wrote:
>> Do you think that it is possible that one can generate a TODO item
>> out of the request or do you rather think different?
> Yes, sure. Ah, I found it. TODO has now:
>	* Allow limits on per-db/user connections

Fine!

> That is pretty vague, but it is all we have so far. In fact, that
> refers more to the number of connections rather than say disk space or
> CPU. The issue we have had with these issues in the past is that we
> aren't sure how such limits would be implemented or used.

Well, I really have a lot of experience with Oracle usage and with its limitation capabilities. What I need the most is space limitation per tablespace. Since 9i there is also a possibility to restrict CPU usage for a certain consumer or group, but in fact I didn't need to do so, since most of the apps have their own database. Maybe it could be useful to have these groups later on. As far as I understood the thread above (and other mails), the space limitation would not only be nice for me.

BTW: Is there any patch available yet? This thread dealt with a patch, but I didn't see any!?

Regards,
Yann
Re: [HACKERS] proposed TODO: non-locking CREATE INDEX / REINDEX
On R, 2005-06-10 at 17:54 +0300, Hannu Krosing wrote:
> No. I don't plan on locking the table at all.
>
> The only thing that is changed during the initial fast-build-index is
> that new tuples are inserted after CTID_INDEX_MIN, and after the
> initial fastbuild index is done, the only restriction is that the
> index can't be used in queries before the tuples between
> CTID_INDEX_MIN and CTID_INDEX_MAX are added to the index.

Maybe I did not make it very clear, but the initial fast-build-index is done only for the tuples whose ctid is below CTID_INDEX_MIN, so this too will happen in a finite amount of time. The total time will be:

	initial_index_build_time +
	time_to_insert_tuples_added_while_doing_the_initial_build

The latter will be slowed down a little, as these insertions are competing with index entries from inserts happening in real time.

> As the number of tuples between CTID_INDEX_MIN and CTID_INDEX_MAX is
> finite, they must be added in finite time, by which time the index
> will be up-to-date and usable by the query planner. (i.e. (1) is done)
>
> All tuples inserted after the initial fast-build-index has finished
> and CTID_INDEX_MAX is fixed, are inserted into the index at the time
> of inserting the tuple, like for any other index.
Re: [HACKERS] User Quota Implementation
Yann Michel wrote:
> Well, I have really a lot of experience with Oracle usage and with its
> limitation capabilities. What I need the most is space limitation per
> tablespace. [...] As far as I understood the thread above (and other
> mails), the space limitation would not only be nice for me.

I assume you can't use file system quotas for the tablespace partitions?

> BTW: Is there any patch available, yet? This thread dealt with a patch
> but I didn't see any!?

Oh, there is no patch, just a discussion.
[HACKERS] Concrete proposal for large objects and MVCC
I spent a little bit of time thinking about what it would mean exactly for large-object operations to obey MVCC, and decided that there are more worms in that can than I had realized. Part of the problem is that we have no concept of a lock on an individual LO, and thus operations that really should be serialized, such as DROP, aren't going to work very well. We could implement DROP as the equivalent of

	DELETE FROM pg_largeobject WHERE loid = nnn;

with an MVCC snapshot --- but there is no guarantee that we won't miss a page that someone else is concurrently inserting into that same large object.

So what I'm thinking is that the prudent course is to leave writing semantics as they are, namely SnapshotNow rules. (What this means in practice is that you get "tuple concurrently updated" errors if two transactions try to write the same page of the same LO concurrently. We have seen few if any complaints about that error in connection with LO operations, so ISTM there's not a problem there that needs solving.)

The problem we do need to solve is letting pg_dump have a stable view of the database's large objects. I propose that we can fix this in a suitably narrow way by making the following definition:

* A large object descriptor opened for read-only access saves the current ActiveSnapshot and uses that snapshot to read pg_largeobject for the duration of its existence.

* A large object descriptor opened for write-only or read-write access uses SnapshotNow, same as before.

This avoids the risk of creating any serious backwards-compatibility issues: if there's anyone out there who does need SnapshotNow reads, they just have to be sure to open the LO in read-write mode to have fully backward compatible operation.

Comments, objections?

			regards, tom lane
Re: [HACKERS] Bug in pg_restore ... ?
On Thu, 9 Jun 2005, Tom Lane wrote:
> Joshua D. Drake <[EMAIL PROTECTED]> writes:
>> Yeah it is an ordering problem with pg_dump...
> If you are using pg_restore you can hack around the problem by using
> pg_restore's load-order-control switch (which was invented exactly to
> let people work around pg_dump's problems ;-)). In this case though,
> the simplest answer is probably to install tsearch2 into the new
> database before you run pg_restore. It'll complain that the functions
> already exist, but you can ignore that.

More than just that ... it errors out trying to create tables that already exist from loading tsearch2.sql:

	pg_restore: [archiver (db)] could not execute query:
	ERROR: relation "pg_ts_dict" already exists

So that doesn't appear to be an option either ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] proposed TODO: non-locking CREATE INDEX / REINDEX
Hannu Krosing <[EMAIL PROTECTED]> writes:
> As the number of tuples between CTID_INDEX_MIN and CTID_INDEX_MAX is
> finite, they must be added in finite time, by which time the index
> will be up-to-date and usable by the query planner. (i.e. (1) is done)

... and by which time, some more could have been added after CTID_INDEX_MAX. Have you forgotten Zeno's paradox? I don't see a reason to assume the indexer can *ever* catch up --- it's entirely likely that adding a new unindexed row is faster than adding an index entry for it.

> All tuples inserted after the initial fast-build-index has finished
> and CTID_INDEX_MAX is fixed, are inserted into the index at the time
> of inserting the tuple, like for any other index.

This implies that you are hoping for an asynchronous change in the behavior of other processes, which you are not going to get without taking out locks, which is what you wanted to avoid.

			regards, tom lane
[HACKERS] Gist Recovery testing
Ok, I have now implemented WAL logging in GiST and tested it with online backup. It works. Now I need to test it for recovery after a crash (power loss etc.), and there is a problem in GiST with incomplete inserts. I hope I have resolved it, but this problem, I think, has the same nature as incomplete splits in the btree code. To solve it, the btree xlog code uses a stack of incomplete splits, and if any are left incomplete, the btree_xlog_cleanup function completes them.

I used a similar idea, but I can't reproduce the situation where xlog_cleanup does some useful work rather than just looking at an empty stack. So the question: how is it possible to reproduce the desired behaviour? I tried kill(getpid(), SIGKILL) in the middle of an insertion (inside gist.c), but postgres doesn't try to recover the aborted transaction.

Thank you.

--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Re: [HACKERS] User Quota Implementation
Hi,

On Fri, Jun 10, 2005 at 10:59:46AM -0400, Bruce Momjian wrote:
>> Well, I have really a lot of experience with Oracle usage and with
>> its limitation capabilities. What I need the most is space limitation
>> per tablespace. [...]
> I assume you can't use file system quotas for the tablespace
> partitions?

No, that's definitely no solution, because I'm interested in a general solution which should be applicable on all platforms.

>> BTW: Is there any patch available, yet? This thread dealt with a
>> patch but I didn't see any!?
> Oh, there is no patch, just a discussion.

O.K., so I was not wrong about that. Is it useful to re-discuss some of the aspects to get a gist of what should probably be implemented/extended?

Regards,
Yann
Re: [HACKERS] User Quota Implementation
Yann Michel wrote:
> No, that's definitely no solution, because I'm interested in a general
> solution which should be applicable on all platforms. [...]
> O.K., so I was not wrong about that. Is it useful to re-discuss some
> of the aspects to get a gist of what should probably be
> implemented/extended?

Sure. Basically there has not been a lot of interest in this, and we are not sure how to implement it without a huge amount of work. Considering the other things we are working on, it hasn't been a priority, and lots of folks don't like the Oracle approach either.
Re: [HACKERS] Gist Recovery testing
Teodor Sigaev [EMAIL PROTECTED] writes: Ok, I have already implemented WAL in GiST and tested it with online backup. It works. Now I need to test it on recovery after a crash (power loss etc), and there is a problem in GiST with incomplete inserts. I hope I have resolved it, but this problem, I think, has the same nature as incomplete splits in the btree code. To solve it, the btree xlog code uses a stack of incomplete splits, and if they are left incomplete, the btree_xlog_cleanup function completes them. I make use of a similar idea, but I can't reproduce a situation in which xlog_cleanup does some useful work rather than just looking at an empty stack. So the question is: how is it possible to reproduce the desired behaviour?

It's not very easy; you have to arrange for the WAL history to end between the two action records that you are interested in. What I'd suggest is:

- run the test backend under GDB, with a breakpoint between making the log entry for the first split step and the entry for the second
- let it reach the breakpoint
- in another backend, commit some unrelated transaction to force the XLOG buffers to disk
- in GDB, kill -9 to prevent the test backend from proceeding

If you want to trace through the recovery as well, the best bet might be to temporarily add a sleep(30) near the beginning of StartupXlog so that you have time to reattach to the recovery process. regards, tom lane
Re: [HACKERS] Concrete proposal for large objects and MVCC
I spent a little bit of time thinking about what it would mean exactly for large-object operations to obey MVCC, and decided that there are more worms in that can than I had realized. Part of the problem is that we have no concept of a lock on an individual LO, and thus operations that really should be serialized, such as DROP, aren't going to work very well. We could implement DROP as the equivalent of DELETE FROM pg_largeobject WHERE loid = nnn; with an MVCC snapshot --- but there is no guarantee that we won't miss a page that someone else is concurrently inserting into that same large object. So what I'm thinking is that the prudent course is to leave writing semantics as they are, namely SnapshotNow rules. (What this means in practice is you get tuple concurrently updated errors if two transactions try to write the same page of the same LO concurrently. We have seen few if any complaints about that error in connection with LO operations, so ISTM there's not a problem there that needs solving.) The problem we do need to solve is letting pg_dump have a stable view of the database's large objects. I propose that we can fix this in a suitably narrow way by making the following definition: * A large object descriptor opened for read-only access saves the current ActiveSnapshot and uses that snapshot to read pg_largeobject for the duration of its existence. * A large object descriptor opened for write-only or read-write access uses SnapshotNow, same as before. This avoids the risk of creating any serious backwards-compatibility issues: if there's anyone out there who does need SnapshotNow reads, they just have to be sure to open the LO in read-write mode to have fully backward compatible operation. Comments, objections? Besides the MVCC issue, I am not sure it's a good idea for a LO to be bound to an OID. In my understanding the OID is solely used to distinguish each LO in a database. In other words, it's just a key to the LO.
I think giving an explicit key when creating a LO has some benefits: 1) no need to worry about the OID wraparound problem 2) easier to find orphan LOs 3) for replication systems it's easier to replicate LOs. What do you think? -- Tatsuo Ishii
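Tom's proposed read-only snapshot rule would be exercised through the ordinary server-side large-object interface. A rough sketch (the OID 16385 is made up for illustration; the INV_READ/INV_WRITE flag values come from libpq/libpq-fs.h):

```sql
BEGIN;
-- INV_READ = 0x40000: under the proposal, this descriptor would read
-- pg_largeobject using the snapshot saved at open time
SELECT lo_open(16385, x'40000'::int);
-- INV_WRITE = 0x20000: this descriptor would keep SnapshotNow semantics,
-- preserving backward-compatible behavior for writers
SELECT lo_open(16385, x'20000'::int);
COMMIT;
```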
Re: [HACKERS] Concrete proposal for large objects and MVCC
This avoids the risk of creating any serious backwards-compatibility issues: if there's anyone out there who does need SnapshotNow reads, they just have to be sure to open the LO in read-write mode to have fully backward compatible operation. Comments, objections? Besides the MVCC issue, I am not sure it's a good idea for a LO to be bound to an OID. In my understanding the OID is solely used to distinguish each LO in a database. In other words, it's just a key to the LO. I think giving an explicit key when creating a LO has some benefits: 1) no need to worry about the OID wraparound problem 2) easier to find orphan LOs 3) for replication systems it's easier to replicate LOs 4) No longer tied to a system object and thus no oddities needed for backup/restore. It should just be an int4 or int8 with a serial IMHO. Sincerely, Joshua D. Drake What do you think? -- Tatsuo Ishii

-- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [HACKERS] Concrete proposal for large objects and MVCC
Tatsuo Ishii [EMAIL PROTECTED] writes: Besides the MVCC issue, I am not sure it's a good idea for a LO to be bound to an OID. In my understanding the OID is solely used to distinguish each LO in a database. In other words, it's just a key to the LO. I think giving an explicit key when creating a LO has some benefits: I'm not excited about making non-backwards-compatible changes in LOs; the whole thing is pretty much a legacy feature in my mind, and changing it significantly seems to miss the point. However, we could offer a new variant of lo_creat that allows a particular OID to be specified. (That would simplify pg_dump tremendously, which is probably sufficient reason to do it.) I think the only other change needed is that the default form of lo_creat should loop until it finds a free OID, which is something I had intended to change anyway --- the current coding is failure-prone once the OID counter wraps around. This is really orthogonal to the MVCC issue, but I'm willing to change it at the same time if there's no objections. Anyone have a preference about the name for the new function? (At least at the libpq level, we have to invent a new name, we can't just overload.) I'm inclined to use lo_create, but maybe that's too close to lo_creat. regards, tom lane
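The interface change Tom proposes might be sketched like this (note lo_create and its signature are only his tentative suggestion here, not an existing function at the time of writing; the OID 16400 is made up):

```sql
-- existing behavior: the server picks the OID, which is failure-prone after
-- OID wraparound unless lo_creat loops until it finds a free one, as proposed
SELECT lo_creat(-1);

-- proposed variant: the caller specifies the OID, letting pg_dump restore
-- large objects under their original identifiers
SELECT lo_create(16400);
```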
Re: [HACKERS] User Quota Implementation
Bruce, Yann, Sure. Basically there has not been a lot of interest in this, and we are not sure how to implement it without a huge amount of work. Considering the other things we are working on, it hasn't been a priority, and lots of folks don't like the Oracle approach either. Yeah. I'd prefer per-database quotas, rather than per-user quotas, which seem kind of useless. The hard part is making any transaction which would exceed the per-database quota roll back cleanly with a comprehensible error message rather than just having the database shut down. If we had per-database user quotas, and per-database users, it would pretty much wind up all of the issues which ISPs have with Postgres. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] User Quota Implementation
Josh Berkus josh@agliodbs.com writes: Yeah. I'd prefer per-database quotas, rather than per-user quotas, which seem kind of useless. The hard part is making any transaction which would exceed the per-database quota roll back cleanly with a comprehensible error message rather than just having the database shut down. That part doesn't seem hard to me: we already recover reasonably well from smgrextend failures. The real difficulty is in monitoring the total database size to know when it's time to complain. We don't currently make any effort at all to measure that, let alone keep track of it in real time. Given that there might be lots of processes concurrently adding pages in different places, I don't think you could hope for an exact stop-on-a-dime limit, but maybe if you're willing to accept some fuzz it is doable ... regards, tom lane
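The closest one can get to a database size figure in this era is a rough estimate from the catalogs, which itself illustrates Tom's point: relpages is only refreshed by VACUUM/ANALYZE, so any quota check built on it would necessarily be fuzzy. A sketch (not part of any proposed patch, and assuming the default 8K block size):

```sql
-- approximate database size in bytes; stale until the next VACUUM or
-- ANALYZE updates relpages, so only a fuzzy lower bound on actual usage
SELECT sum(relpages) * 8192 AS approx_bytes FROM pg_class;
```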
Re: [HACKERS] The Contrib Roundup (long)
actually I think part of the point of this was to give a command line version of the reindex command, like we have for vacuum. If that still matters, then it should probably stay. Actually it should probably be converted to C and moved to /src/bin. Wouldn't something like echo 'REINDEX DATABASE {database};' | psql {database} be easier? Of course it would need a working shell, but even Windows can do this, I believe.
Re: [HACKERS] The Contrib Roundup (long)
On Friday 10 June 2005 10:54 am, Kaare Rasmussen wrote: actually I think part of the point of this was to give a command line version of the reindex command, like we have for vacuum. If that still matters, then it should probably stay. Actually it should probably be converted to C and moved to /src/bin. Wouldn't something like echo 'REINDEX DATABASE {database};' | psql {database} be easier? But not as easy as: psql -c "REINDEX DATABASE {database}" {database} Add connection options as desired. Cheers, Steve
Re: [HACKERS] The Contrib Roundup (long)
Josh Berkus josh@agliodbs.com writes: I had a lot of time to kill on airplanes recently so I've gone digging through /contrib in an effort to sort out what's in there and try to apply some consistent rules to it. Sorry for not responding sooner; I'm catching up on back email. As already noted, I agree with most of your goals here, though I'm with Peter that restructuring the directory hierarchy is more trouble than it's worth; just organizing the docs that way should be sufficient. Here are some comments about the individual modules (where not stated, I agree with your evaluation):

adddepend: is this still needed, or would a proper dump-and-reload from 7.2 add the dependency information anyway? It is still theoretically useful, but given that the author feels it is unmaintained and possibly broken, I would agree with removing it (or maybe better, pushing it to pgfoundry). Anyone trying to jump straight from 7.2-or-earlier to 8.1 is probably going to have worse issues than lack of dependencies anyway.

dbase: You seem to have missed this one. I would argue that it should go to pgfoundry as it is a data conversion tool.

findoidjoins: again, it's not clear what this module is for. We need it to generate the oidjoins regression test. Possibly it should move into src/tools; I can't see any reason that ordinary users would want it.

intagg: what does this module do which is not already available through the built-in array functions and operators? Maybe I don't understand what it does. Unattributed in the README. Move to pgfoundry? The aggregate is functionally equivalent to ARRAY(sub-SELECT), but I think that the aggregate notation is probably easier to use in many scenarios. The other function is basically the reverse conversion: given an array, return a setof integers. I don't think that we currently have a built-in equivalent to that.
The functionality is useful but severely limited by the fact that it only works on one datatype (int4) --- I'd like to see it reimplemented as polymorphic functions.

lo: another special data type. Is its functionality required anymore? The datatype as such is pretty much a waste of time --- you might as well use OID. (We could replace the datatype with a domain over OID and have a compatible one-line implementation...) The useful part of this is the lo_manage trigger, which essentially supports automatic dropping of large objects when the (assumed unique) references to them from the main database go away. It'd perhaps make sense to migrate lo_manage into the main backend and lose the rest.

misc_utils: I believe that all of these utils are obsoleted by built-in system commands or easily written userspace functions (like max(x,y)). Also, it is under the GPL (see above). Author Massimo Dal Zotto ([EMAIL PROTECTED]). I agree with just summarily removing this one.

noupdate: this is a cool example of a simple C trigger and would be lovely to have in a doc somewhere. As somebody else noted, it's completely broken: it does not do at all what the documentation claims. There are much more interesting trigger examples under spi/, so I'd agree with removal.

pg_dumplo: is this still required for pg large objects? If so, can't we integrate it into the core? utilities/ Probably drop; this was long ago superseded by functionality in pg_dump.

pg_upgrade: what's the status of this, Bruce? Does it work at all? Shouldn't this be moved to the pgfoundry project of the same name until it's stable? Doesn't work and hasn't worked in a long time. I'd agree with removal.

pgbench: I see repeated complaints on -performance about how pgbench results are misleading. Why are we shipping it with PostgreSQL then? It's handy to have *some* simple concurrent-behavior test included, even if it's not something we put a lot of stock in.
The parallel regression tests are a joke as far as exercising concurrent updates go --- I think pgbench is an important test tool for that reason. I'd not vote to remove this without a better replacement.

reindexdb: now obsolete per the REINDEX {database} command. Remove from contrib. Per other followups, this isn't obsolete at all. Possibly the functionality could be merged into vacuumdb, rather than writing a whole 'nother program?

spi: contains TimeTravel functions. Do these actually still work? The spi stuff is good for documentation purposes anyway ... but if the functions aren't working, they should be in the docs and not /contrib. Not only do they work, several of them are used in the regression tests.

string: data_types/ Same problem as Massimo's other library; it's GPL. Also, is it really needed at this point? Massimo ([EMAIL PROTECTED]). Actually, I've never looked closely at this before, and now that I have I've got a serious problem with the proposed mode of use: overwriting the typoutput functions for standard datatypes is just a guaranteed recipe for breaking client code left and right.
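The intagg equivalence Tom describes above can be sketched as follows (assuming contrib/intagg is installed, and a hypothetical table t with an int4 column f):

```sql
-- contrib/intagg aggregate vs. the built-in array constructor
SELECT int_array_aggregate(f) FROM t;      -- intagg
SELECT ARRAY(SELECT f FROM t);             -- built-in equivalent

-- the reverse conversion, array to setof int4, has no built-in counterpart:
SELECT int_array_enum('{1,2,3}'::int4[]);  -- returns a set of 1, 2, 3
```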
Re: [HACKERS] User Quota Implementation
Hi Josh! On Fri, Jun 10, 2005 at 10:13:52AM -0700, Josh Berkus wrote: Yeah. I'd prefer per-database quotas, rather than per-user quotas, which seem kind of useless. The hard part is making any transaction which would exceed the per-database quota roll back cleanly with a comprehensible error message rather than just having the database shut down. If we had per-database user quotas, and per-database users, it would pretty much wind up all of the issues which ISPs have with Postgres. O.K. This makes sense to me. Otherwise I'd like to see quotas per tablespace. As far as I got it, a tablespace may grow in size until the volume is full. Here a grace quota might be useful as well. Let's say a 5% threshold, like the ext filesystem has as a default, for generating a warning to the log files, letting the admin extend the volume(s) in time. Regards, Yann
Re: [HACKERS] User Quota Implementation
Hi Tom, On Fri, Jun 10, 2005 at 01:37:54PM -0400, Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Yeah. I'd prefer per-database quotas, rather than per-user quotas, which seem kind of useless. The hard part is making any transaction which would exceed the per-database quota roll back cleanly with a comprehensible error message rather than just having the database shut down. That part doesn't seem hard to me: we already recover reasonably well from smgrextend failures. The real difficulty is in monitoring the total database size to know when it's time to complain. We don't currently make any effort at all to measure that, let alone keep track of it in real time. Given that there might be lots of processes concurrently adding pages in different places, I don't think you could hope for an exact stop-on-a-dime limit, but maybe if you're willing to accept some fuzz it is doable ... Well, I think a fuzzy test is better than none. But I think one should be able to calculate how much later the quota is detected as exceeded than it is planned to be. Therefore a threshold is useful as well (for alerting). Regards, Yann
Re: [HACKERS] Proposed toast info extraction function for disaster recovery
Alvaro Herrera [EMAIL PROTECTED] writes: Now that I think about it, maybe my problem is not related to TOAST at all, but to a corrupted varlena field. Right. So if the corruption does not involve toasting, I'm in the same position as before, i.e. I haven't found out what is the corrupted tuple. Hmm. Maybe we need something more like a lint check for tables, ie run through and look for visibly corrupt data, such as obviously impossible lengths for varlena fields. (I thought about adding more checks to the standard code paths, such as heap_deformtuple, but aside from the speed penalty involved, most of those places don't actually have enough context to issue a useful error message. A lint check could reasonably expect to tell you by ctid which tuple has a problem.) Come to think of it, didn't someone already write something close to this a few years ago? regards, tom lane
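Pending such a lint tool, one common way to localize the damage is to force every varlena field to be read in full, row by row (the table and column names below are made up for illustration):

```sql
-- length() over a cast-to-text forces each datum to be fully detoasted
-- and read; the query errors out when it reaches the corrupted tuple,
-- and the last ctid printed before the error narrows down its location
SELECT ctid, length(payload::text) FROM mytable;
```

Combined with LIMIT/OFFSET one can bisect toward the bad tuple, then surgically delete it by ctid once found.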
Re: [HACKERS] The Contrib Roundup (long)
But not as easy as: psql -c "REINDEX DATABASE {database}" {database} Well, it was just to show that there really is no need for a program just for this functionality.
Re: [HACKERS] User Quota Implementation
Yann, O.K. This makes sense to me. Otherwise I'd like to see quotas per tablespace. As far as I got it, a tablespace may grow in size until the volume is full. Here a grace quota might be useful as well. Let's say a 5% threshold, like the ext filesystem has as a default, for generating a warning to the log files, letting the admin extend the volume(s) in time. Hmmm ... Tablespace quotas would be *even more* useful than database quotas. If it's just as easy for you? -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] User Quota Implementation
I have a patch for user quotas in (I think) 7.4.2. I was going to update it for 8.x but have been too busy. The discussion (in the past) was related to whether quotas would be applied to users or groups and whether it would be on tablespaces (which I think it should). I can spend some time reviving the patch this weekend if it is of interest to you. Just let me know. -Jonah Josh Berkus wrote: Yann, O.K. This makes sense to me. Otherwise I'd like to see quotas per tablespace. As far as I got it, a tablespace may grow in size until the volume is full. Here a grace quota might be useful as well. Let's say a 5% threshold, like the ext filesystem has as a default, for generating a warning to the log files, letting the admin extend the volume(s) in time. Hmmm ... Tablespace quotas would be *even more* useful than database quotas. If it's just as easy for you?
Re: [HACKERS] The Contrib Roundup (long)
On 2005-06-10, Kaare Rasmussen [EMAIL PROTECTED] wrote: But not as easy as: psql -c "REINDEX DATABASE {database}" {database} Well, it was just to show that there really is no need for a program just for this functionality. Either you're misunderstanding what REINDEX DATABASE does (it reindexes only the system catalogs), or you're misunderstanding what reindexdb does (it reindexes all indexes in the specified database, or all databases, unless told otherwise). -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
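The per-table reindexing that reindexdb performs, and that REINDEX DATABASE does not, can be approximated by generating REINDEX TABLE statements from the catalogs (a sketch; feed the output back into psql):

```sql
-- generate a REINDEX statement for every user table
SELECT 'REINDEX TABLE ' || quote_ident(schemaname) || '.'
       || quote_ident(tablename) || ';'
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
```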