AW: [HACKERS] selecting from cursor
>> That's gonna have to be fixed.  If you're not up for it, don't implement
>> this.  Given that cursors (are supposed to) support FETCH BACKWARDS, I
>> really don't see why they shouldn't be expected to handle ReScan...
> I thought only scrollable cursors can do that.  What if a cursor isn't
> scrollable?  Should it error during the execution?

In PostgreSQL, all cursors are scrollable.  The allowed grammar keyword is simply ignored.  I am actually not sure that this is optimal, since there are a few very effective optimizations that you can do if you know that ReScan is not needed (like e.g. not storing the result temporarily).

Andreas

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] selecting from cursor
Alex Pilosov [EMAIL PROTECTED] writes:
>> And what are you doing with the places that don't care which kind of
>> RTE they are dealing with (which is most of them IIRC)?  While you haven't
> They just have things declared as RangeTblEntry *, and as long as they
> don't access type-specific fields, they are fine.

So you have four (soon to be six or seven) different structs that *must* have the same fields?  I don't think that's cleaner than a union ... at the very least, declare it as structs containing RangeTblEntry, similar to the way the various Plan node types work (see plannodes.h).

> For scrollable cursors, Rescan should be implemented as 'scroll
> backwards until you can't scroll no more', correct?

No, it should be implemented as Rescan.  The portal mechanism needs to expose the Rescan call for the contained querytree.

regards, tom lane
Re: AW: [HACKERS] selecting from cursor
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
>>> this.  Given that cursors (are supposed to) support FETCH BACKWARDS, I
>>> really don't see why they shouldn't be expected to handle ReScan...
>> I thought only scrollable cursors can do that.  What if a cursor isn't
>> scrollable?  Should it error during the execution?
> In PostgreSQL, all cursors are scrollable.  The allowed grammar keyword
> is simply ignored.  I am actually not sure that this is optimal, since
> there are a few very effective optimizations that you can do if you
> know that ReScan is not needed (like e.g. not storing the result
> temporarily).

It's worse than that: we don't distinguish plans for cursors from plans for any other query, hence *all* query plans are supposed to be able to run backwards.  (In practice, a lot of them don't work :-(.)  Someday that needs to be improved.  It would be good if the system understood whether a particular plan node would ever be asked to rescan itself or run backwards, and could optimize things on that basis.

regards, tom lane
Re: [HACKERS] Re: Buffer access rules, and a probable bug
Hiroshi Inoue [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> I believe that nbtree.c's btbuild() code is currently in violation of
>> these rules, because it calls HeapTupleSatisfiesNow() while holding a
>> pin but no lock on the containing buffer.
> OK, we had better avoid using heapam routines in btbuild() ?

On further thought, btbuild is not that badly broken at the moment, because CREATE INDEX acquires ShareLock on the relation, so there can be no concurrent writers at the page level.  Still, it seems like it'd be a good idea to do LockBuffer(buffer, BUFFER_LOCK_SHARE) here, and probably also to invoke HeapTupleSatisfiesNow() via the HeapTupleSatisfies() macro so that infomask update is checked for.  Vadim, what do you think?

regards, tom lane
Re: [HACKERS] selecting from cursor
Alex Pilosov [EMAIL PROTECTED] writes:
> On Tue, 3 Jul 2001, Tom Lane wrote:
>> So you have four (soon to be six or seven) different structs that
>> *must* have the same fields?  I don't think that's cleaner than a
>> union ...
> Please see my diffs.  It's implemented via #define to declare all
> common fields:
>     #define RTE_COMMON_FIELDS \
>         NodeTag type; \
>         [etc]

I don't think that technique is cleaner than a union, either ;-).  The macro definition is a pain in the neck: you have to play games with semicolon placement, most tools won't autoindent it nicely, etc etc.  But the main point is that I think NodeType = RangeTblEntry with a separate subtype field is a better way to go than making a bunch of different NodeType values.  When most of the fields are common, as in this case, it's going to be true that many places only want to know "is it a rangetable entry or not?"

regards, tom lane
RE: [HACKERS] Re: Buffer access rules, and a probable bug
> On further thought, btbuild is not that badly broken at the moment,
> because CREATE INDEX acquires ShareLock on the relation, so there can
> be no concurrent writers at the page level.  Still, it seems like it'd
> be a good idea to do LockBuffer(buffer, BUFFER_LOCK_SHARE) here, and
> probably also to invoke HeapTupleSatisfiesNow() via the
> HeapTupleSatisfies() macro so that infomask update is checked for.
> Vadim, what do you think?

Looks like there is no drawback in locking the buffer, so let's lock it.

Vadim
Re: [HACKERS] [OT] Any major users of postgresql?
On Tue, 3 Jul 2001, Joe Brenner wrote:
> The reason I'm asking is that the place that I work is actually
> contemplating reverting from Oracle's expensive bugs to MySQL's
> (supposedly) cheap ones.  They'd consider postgresql, but they figure
> that with MySQL they can at least point to sites that pump a fair
> amount of data with it (e.g. mp3.com).  Please help save me from a
> life without referential integrity...

SourceForge, for one.  They were using MySQL, then changed.  Also, look at the postgres web site -- there is an article there where someone did a speed comparison between PG and MySQL.  Postgres came out on top, even in places where folks thought MySQL would win.  Also, it depends on what your application is.  If there is any amount of DB updates, PG will easily be the best choice. :-)

Take care,
Bill
Re: [HACKERS] selecting from cursor
Alex Pilosov [EMAIL PROTECTED] writes:
> True true.  On the other hand, unlike a union, it's automatically
> typechecked: you cannot by mistake reference a field you shouldn't be
> referencing.

Only true to the extent that you have cast a generic pointer to the correct type to begin with.  However, we've probably wasted more time arguing the point than it's really worth.  I would suggest leaving off the final semicolon in the macro definition so that you can write

	typedef struct RangeTblEntryRelation
	{
		RTE_COMMON_FIELDS;

		/* Fields valid for a plain relation RTE */
		char	   *relname;	/* real name of the relation */
		Oid			relid;		/* OID of the relation */

Without this, tools like pgindent will almost certainly mess up these struct declarations (I know emacs' C mode will get it wrong...)

regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii [EMAIL PROTECTED] writes:
> I added some code into HandleDeadLock to measure how long LockLockTable
> and DeadLockCheck calls take.  Following are the results from running
> pgbench -c 1000 (it failed with a stuck spin lock error).  "real time"
> shows how long they actually run (using gettimeofday); user time and
> system time are measured by calling getrusage.  The time unit is
> milliseconds.
>
> LockLockTable: real time
>  min |  max   |        avg
> -----+--------+-------------------
>    0 | 867873 | 152874.9015151515
>
> LockLockTable: user time
>  min | max |     avg
> -----+-----+--------------
>    0 |  30 | 1.2121212121
>
> LockLockTable: system time
>  min | max  |      avg
> -----+------+----------------
>    0 | 2140 | 366.5909090909
>
> DeadLockCheck: real time
>  min |  max  |       avg
> -----+-------+-----------------
>    0 | 87671 | 3463.6996197719
>
> DeadLockCheck: user time
>  min | max |      avg
> -----+-----+---------------
>    0 | 330 | 14.2205323194
>
> DeadLockCheck: system time
>  min | max |     avg
> -----+-----+--------------
>    0 | 100 | 2.5095057034

Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.  I have to suppose that the problem is (once again) our inefficient spinlock code.  If you think about it, on a typical platform where processes waiting for a time delay are released at a clock tick, what's going to be happening is that a whole lot of spinblocked processes will all be awoken in the same clock tick interrupt.  The first one of these that gets to run will acquire the spinlock, if it's free, and the rest will go back to sleep and try again at the next tick.  This could be highly unfair depending on just how the kernel's scheduler works --- for example, one could easily believe that the waiters might be awoken in process-number order, in which case backends with high process numbers might never get to acquire the spinlock, or at least would have such low probability of winning that they are prone to stuck spinlock timeout.  We really need to look at replacing the spinlock mechanism with something more efficient.

regards, tom lane
Re: [HACKERS] Re: Backup and Recovery
> With stock PostgreSQL... how many committed transactions can one lose
> on a simple system crash/reboot?  With Oracle or Informix, the answer
> is zero.  Is that true with PostgreSQL in fsync mode?  If not, does it
> lose all in the log, or just those not yet written to the DB?

With WAL the theory is that it will not lose a committed transaction.  Bugs have plagued previous versions (7.1.2 looks clean), and none of them (Oracle, Informix, Postgres) can protect against coding errors in certain cases, but against a general power failure it's fine.  This assumes adequate hardware too.  Some hard drives claim to have written when they haven't, among other things -- Postgres itself won't lose the information, but your hardware might (and do it silently, at that).
Re: [HACKERS] New SQL Datatype RECURRINGCHAR
This is rather like MySQL's enum.  I still opt for the join, and if you like, make a view for those who don't want to know the data structure.

--
Rod Taylor

Your eyes are weary from staring at the CRT. You feel sleepy. Notice how restful it is to watch the cursor blink. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise.

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 29, 2001 6:05 PM
Subject: [HACKERS] New SQL Datatype RECURRINGCHAR

> Idea for a new SQL data type: RECURRINGCHAR.
>
> The idea with RECURRINGCHAR is that it is treated exactly like a
> VARCHAR in its usage.  However, it's designed for table columns that
> store a small set of repeated values (<= 256 values).  This allows for
> a great deal of savings in the storage of the data.
>
> Example:
>
>   Query:   select count(*) from order
>   Returns: 100,000
>
>   Query:   select distinct status from order
>   Returns: OPEN REWORK PLANNED RELEASED FINISHED SHIPPED
>
> It's apparent that there is a lot of duplicate space used in the
> storage of this information.  The idea is that if order.status were
> stored as a RECURRINGCHAR, then the only data stored for the row would
> be a reference to the value of the column.  The actual values would be
> stored in a separate lookup table.
>
> Advantages:
> - Storage space is optimized.
> - A query like "select distinct {RECURRINGCHAR} from {table}" can be
>   radically optimized.
> - Eliminates use of joins and extended knowledge of data relationships
>   for ad hoc users.
>
> This datatype could be extended to allow for larger sets of repeated
> values:
>
>   RECURRINGCHAR1 (8-bit)  up to 256 unique column values
>   RECURRINGCHAR2 (16-bit) up to 65536 unique column values
>
> Reasoning behind using 'long reference values': it is often an
> advantage to actually store an entire word representing a business
> meaning as the value of a column (as opposed to a reference number or
> mnemonic abbreviation).  This helps to make the system 'self
> documenting' and adds value to users who are performing ad hoc queries
> on the database.
>
> David Bennett
> President - Bensoft
> 912 Baltimore, Suite 200
> Kansas City, MO 64105
Re: [HACKERS] Buffer access rules, and a probable bug
[EMAIL PROTECTED] (Nathan Myers) writes:
> On Mon, Jul 02, 2001 at 09:40:25PM -0400, Tom Lane wrote:
>> 4. It is considered OK to update tuple commit status bits (ie, OR the
>> values HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, or
>> HEAP_XMAX_INVALID into t_infomask) while holding only a shared lock
>> and pin on a buffer.  This is OK because another backend looking at
>> the tuple at about the same time would OR the same bits into the
>> field, so there is little or no risk of conflicting update; what's
>> more, if there did manage to be a conflict it would merely mean that
>> one bit-update would be lost and need to be done again later.
> Without looking at the code, this seems mad.  Are you sure?

Yes.  Those status bits aren't ground truth, only hints.  They cache the results of looking up transaction status in pg_log; if they get dropped, the only consequence is that the next visitor to the tuple has to do the lookup over again.  Changing any other bits in t_infomask requires exclusive lock, however.

regards, tom lane
Re: [HACKERS] funny (cache (?)) bug in postgres (7.x tested)
RISKO Gergely [EMAIL PROTECTED] writes:
> I found a funny bug in postgres with c functions. (or feature??)

Scribbling on your input datum is verboten.  palloc a new value to return.

regards, tom lane
Re: [HACKERS] New SQL Datatype RECURRINGCHAR
Rod Taylor [EMAIL PROTECTED] writes:
> This is rather like MySQL's enum.

Yes.  If we were going to do anything like this, I'd vote for stealing the enum API lock, stock, and barrel --- might as well be compatible.

regards, tom lane
Re: [HACKERS] Help with SI buffer overflow error
Matthew [EMAIL PROTECTED] writes:
> NOTICE:  RegisterSharedInvalid: SI buffer overflow
> NOTICE:  InvalidateSharedInvalid: cache state reset

These are normal; at most they suggest that you've got another backend sitting around doing nothing (but in an open transaction) while VACUUM runs.  I think we finally got around to downgrading them to DEBUG messages for 7.2.

regards, tom lane
Re: [HACKERS] Re: Buffer access rules, and a probable bug
Okay, on to the next concern.  I've been thinking some more about the restrictions needed to make the world safe for concurrent VACUUM.  I previously said:

> 5. To physically remove a tuple or compact free space on a page, one
> must hold a pin and an exclusive lock, *and* observe while holding the
> exclusive lock that the buffer's shared reference count is one (ie, no
> other backend holds a pin).  If these conditions are met then no other
> backend can perform a page scan until the exclusive lock is dropped,
> and no other backend can be holding a reference to an existing tuple
> that it might expect to examine again.  Note that another backend
> might pin the buffer (increment the refcount) while one is performing
> the cleanup, but it won't be able to actually examine the page until
> it acquires shared or exclusive lock.

This is OK when considering a page in isolation, but it does not get the job done when one is deleting related index tuples and heap tuples.  It seems to me that there *must* be some cross-page coupling to make that work safely.  Otherwise you could have this scenario:

1. Indexscanning process visits an index tuple, decides to access the corresponding heap tuple, drops its lock on the index buffer page.

2. VACUUMing process visits the index buffer page and marks the index tuple dead.  (It won't try to delete the tuple yet, since it sees the pin still held on the page by process #1, but it can acquire exclusive lock and mark the tuple dead anyway.)

3. VACUUMing process is the first to acquire pin and lock on the heap buffer page.  It sees no other pin, so it deletes the tuple.

4. Indexscanning process finally acquires pin and lock on the heap page, tries to access what is now a gone tuple.  Oops.  (Even if we made that not an error condition, it could be worse: what if a third process already reused the line pointer for a new tuple?)

It does not help to postpone the actual cleaning of an index or heap page until its own pin count drops to zero --- the problem here is that an indexscanner has acquired a reference into a heap page from the index, but does not yet hold a pin on the heap page to ensure that the reference stays good.  So we can't just postpone the cleaning of the index page till it has pin count zero; we have to make the related heap page(s)' cleanup wait for that to happen too.

I can think of two ways of guaranteeing that this problem cannot happen.  One is for an indexscanning process to retain its shared lock on the index page until it has acquired at least a pin on the heap page.  This is very bad for concurrency --- it means that we'd typically be holding indexpage shared locks for the time needed to read in a randomly-accessed disk page.  And it's very complicated, since we still need all the other rules, plus the mechanism for postponing cleanups until pin count goes to zero.  It'd cause considerable changes to the index access method API, too.

The other is to forget about asynchronous cleaning, and instead have the VACUUM process directly do the wait for pin count zero, then clean the index page.  Then when it does the same for the heap page, we know for sure there are no indexscanners in transit to the heap page.  This would be logically a lot simpler, it seems to me.  Another advantage is that we need only one WAL entry per cleaned page, not two (one for the initial mark-dead step and one for the postponable compaction step), and there's no need for an intermediate "gone but not forgotten" state for index tuples.

We could implement this in pretty nearly the same way as the "mark for cleanup" facility that you partially implemented awhile back: essentially, the cleanup callback would send a signal or semaphore increment to the waiting process, which would then try to acquire pin and exclusive lock on the buffer.  If it succeeded in observing pin count 1 with exclusive lock, it could proceed with cleanup, else loop back and try again.  Eventually it'll get the lock.  (It might take awhile, but for a background VACUUM I think that's OK.)

What I'm wondering is if you had any other intended use for "mark for cleanup" than VACUUM.  The cheapest implementation would allow only one process to be waiting for cleanup on a given buffer, which is OK for VACUUM because we'll only allow one VACUUM at a time on a relation anyway.  But if you had some other uses in mind, maybe the code needs to support multiple waiters.

Comments?

regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Hiroshi Inoue [EMAIL PROTECTED] writes:
>>> DeadLockCheck: real time
>>>  min |  max  |       avg
>>> -----+-------+-----------------
>>>    0 | 87671 | 3463.6996197719
>>>
>>> DeadLockCheck: user time
>>>  min | max |      avg
>>> -----+-----+---------------
>>>    0 | 330 | 14.2205323194
>>>
>>> DeadLockCheck: system time
>>>  min | max |     avg
>>> -----+-----+--------------
>>>    0 | 100 | 2.5095057034
>> Hm.  It doesn't seem that DeadLockCheck is taking very much of the
>> time.
> Isn't the real time big?

Yes, it sure is, but remember that the guy getting useful work done (DeadLockCheck) is having to share the CPU with 999 other processes that are waking up on every clock tick for just long enough to fail to get the spinlock.  I think it's those useless process wakeups that are causing the problem.  If you estimate that a process dispatch cycle is ~10 microseconds, then waking 999 useless processes every 10 msec is just about enough to consume 100% of the CPU doing nothing useful... so what should be a few-millisecond check takes a long time, which makes things worse because the 999 wannabees are spinning for that much more time.

regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Bruce Momjian [EMAIL PROTECTED] writes:
>> If you estimate that a process dispatch cycle is ~10 microseconds,
>> then waking 999 useless processes every 10 msec is just about enough
>> to consume 100% of the CPU doing nothing useful...
> Don't we back off the sleeps, or was that code removed?

Not enough to affect this calculation.

regards, tom lane
Re: [HACKERS] Buffer access rules, and a probable bug
[EMAIL PROTECTED] (Nathan Myers) writes:
> Here, backend B is a good citizen and locks while it makes its change.

No, backend B wasn't a good citizen: it should have been holding exclusive lock on the buffer.

> Also, as hints, would it be Bad(tm) if an attempt to clear one failed?

Clearing hint bits is also an exclusive-lock-only operation.  Notice I specified that *setting* them is the only case allowed to be done with shared lock.

regards, tom lane