[HACKERS] Vacuum dead tuples that are between transactions
The topic of improving vacuum for use in heavy-update environments seems to come up frequently on the list. Has anyone weighed the costs of allowing VACUUM to reclaim tuples that are not older than the oldest transaction but are nonetheless invisible to all running transactions? It seems that it's not that hard.

Currently, a tuple is not eligible to be reclaimed by vacuum unless it was deleted by a transaction that committed before the oldest currently running transaction started. (That is, its xmax is known to have committed before the oldest currently running xid started.) Right?

However, it seems that under certain scenarios (heavy updates to small tables while a long-running transaction is in progress) there might be a lot of tuples that are invisible to all transactions but not able to be vacuumed under the current method. Example: updating a single row over and over again while pg_dump is running.

Suppose that in the system we have a serializable transaction with xid 1000 and a read committed transaction with xid 1001. Other than these two, the oldest running xid is 2000. Suppose we consider a tuple with xmin 1200 and xmax 1201. We will assume that xid 1201 committed before xid 2000 began to run. So:

(A) This tuple is invisible to the serializable transaction, since its snapshot can't ever advance.

(B) The read committed transaction might be able to see it. However, if its current command started AFTER xid 1201 committed, it can't.

Unless I'm missing something, it seems that when vacuuming you can leave serializable transactions (like pg_dump) out of the calculation of the oldest running transaction, so long as you keep a list of them and check each tuple T against each serializable transaction X to make sure that T's xmin is greater than X's xid, or else that T's xmax committed before X started to run. Of course this is a lot of work, but it should mitigate the effect of long-running serializable transactions until such time as processor power becomes your limiting factor.

The read committed ones are a more difficult matter, but I think you can treat a tuple as dead if it was inserted after the read committed transaction started to run AND the tuple was deleted before that transaction's currently running command started to run. I suppose the major difficulty here is that currently a transaction has no way of knowing when another backend's command started to run?

Is this too difficult to do, or is it a good idea that no one has enough 'round tuits for?

Regards,
Paul Tillotson
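For concreteness, a minimal sketch of the pg_dump scenario described above (the table names and session layout are invented for illustration, and the VACUUM VERBOSE message wording is approximate):

    -- Session 1: a long-running serializable transaction, much like what pg_dump holds open
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM some_big_table;          -- snapshot is now fixed; session then sits idle

    -- Session 2: the heavy updater
    UPDATE counters SET n = n + 1 WHERE id = 1;   -- repeat this many times
    VACUUM VERBOSE counters;                      -- reports dead row versions that "cannot be removed yet"
                                                  -- even though no snapshot can still see most of them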
[HACKERS] t_ctid chains
(Prompted by the nearby thread about VACUUM FULL bugs, but not having anything to do with that properly speaking.)

Hackers,

For some time I have wondered: what does postgres use t_ctid chains for? It seems like they are useful for finding the newer version of a tuple. However, wouldn't that newer version eventually get found anyway? A sequential scan scans the whole table, and so it will find the new tuple. Since indexes contain entries for all tuples, an index scan will find it as well.

I infer that there must be some sort of optimization that makes it worth (a) using extra space in the disk pages and (b) causing extra complexity, such as the bugs mentioned in the VACUUM FULL thread.

So: what are the t_ctid chains good for? If this is too long or too elementary to type up, can someone point me to the source code that uses t_ctid chains?

Regards,
Paul Tillotson
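For anyone following along, a small illustration (my own, not from the original mail) of the tuple-version movement the question is about. The ctid system column shows where the currently visible version of a row lives; an UPDATE leaves the old version behind with its on-disk t_ctid field pointing at the new location (the t_ctid field itself is not visible from SQL):

    CREATE TABLE t (id int, v int);
    INSERT INTO t VALUES (1, 0);
    SELECT ctid, xmin, xmax, * FROM t;      -- the row sits at some position, e.g. (0,1)
    UPDATE t SET v = v + 1 WHERE id = 1;
    SELECT ctid, xmin, xmax, * FROM t;      -- the visible version has moved, e.g. to (0,2);
                                            -- the dead version at the old position points here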
Re: [HACKERS] [COMMITTERS] pgsql: Fix NUMERIC modulus to properly
Bruce Momjian wrote:
Tom Lane wrote:
Christopher Kings-Lynne [EMAIL PROTECTED] writes:

No, I don't think so. It doesn't seem to be something that enough people use to risk the change in behavior --- it might break something that was working. But, if folks want it backported we can do it. It is only a change to properly do modulus for numeric.

Well, from my point of view it's an absolute mathematical error - i'd backport it. I can't see anyone relying on it :)

Doesn't this patch break the basic theorem that a = trunc(a / b) * b + (a mod b)? If division rounds and mod doesn't, you've got pretty serious issues.

Well, this is a good question. In the equation above we assume '/' is an integer division. The problem with NUMERIC when used with zero-scale operands is that the result is already _rounded_ to the nearest whole number before it gets to trunc(), and that is why we used to get negative modulus values.

I assume the big point is that we don't offer any way for users to get a NUMERIC division without rounding. With integers, we always round down to the nearest whole number on division; float doesn't offer a modulus operator, and C doesn't support one for it either. We round NUMERICs to the specific scale because we want to give the most accurate value:

    test=> select 100000000000000000000000::numeric(24,0) / 11::numeric(24,0);
            ?column?
    ------------------------
     9090909090909090909091

The actual value is 9090909090909090909090.90... But the problem is that the equation at the top assumes the division is not rounded.

Should we supply a NUMERIC division operator that doesn't round? integer doesn't need it, and float doesn't have the accurate precision needed for modulus operators. The user could supply some digits in the division:

    test=> select 100000000000000000000000::numeric(30,6) / 11::numeric(24,0);
                ?column?
    --------------------------------
     9090909090909090909090.909091
    (1 row)

but there really is no _right_ value to prevent rounding (think 0.999...). A non-rounding NUMERIC division would require duplicating numeric_div() but with a false for 'round', and adding operators.

I would prefer that division didn't round, as with integers. You can always calculate your result to 1 more decimal place and then round, but there is no way to unround a rounded result.

Tom had asked whether PG passes the regression tests if we change the round_var() to a trunc_var() at the end of the function div_var(). It does not pass, but I think that is because the regression test is expecting that division will round up. (Curiously, the regression test for numeric passes, but the regression test for aggregation--sum() I think--is the one that fails.) I have attached the diffs here if anyone is interested.

Regards,
Paul Tillotson

    *** ./expected/aggregates.out  Sun May 29 19:58:43 2005
    --- ./results/aggregates.out   Mon Jun  6 21:01:11 2005
    ***************
    *** 10,16 ****
      SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
           avg_32
      -----------------
    ! 32.6667
      (1 row)
      -- In 7.1, avg(float4) is computed using float8 arithmetic.
    --- 10,16 ----
      SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
           avg_32
      -----------------
    ! 32.
      (1 row)
      -- In 7.1, avg(float4) is computed using float8 arithmetic.
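To make the theorem discussion concrete, here is a small query (my own illustration; the literal values just echo the example above) that checks a = trunc(a / b) * b + (a % b) for zero-scale NUMERIC operands. With a division that rounds, trunc() is applied to an already-rounded quotient, so the right-hand side can disagree with a:

    SELECT a,
           a / b                        AS rounded_quotient,
           a % b                        AS modulus,
           trunc(a / b) * b + (a % b)   AS rhs_of_theorem
    FROM (SELECT 100000000000000000000000::numeric(24,0) AS a,
                 11::numeric(24,0)                        AS b) AS s;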
[HACKERS] tool for incrementally shrinking bloated tables
Goal: on a production server, to gradually shrink a table (no matter how large) back to 10% free space without noticeably interrupting write access to it. (Noticeably = without taking any exclusive locks for more than a few seconds at a time.) I am thinking about making this if it proves to be not too difficult.

To accomplish this, tuples need to be moved into free space at the beginning of the table, and the table must be shrunk using ftruncate(). It seems that I could make these two changes:

(a) Modify the VACUUM command to take an option that means "pack the free space map with the pages that occur earliest in the table" rather than the pages with the most free space.

(b) Create a command that will take an exclusive lock, scan a table backwards until it comes to a tuple that cannot be removed (i.e., a tuple that is not HEAPTUPLE_DEAD; see scan_heap() in src/backend/commands/vacuum.c) or until some preset amount of time has elapsed, and then ftruncate() the table.

To use this system one would do this:

(1) VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable;  -- use item (a) discussed above
(2) UPDATE mybloatedtable SET foo = foo WHERE ctid > '(n,0)';  -- move tuples at the end of the table to the front
(3) SHRINK TABLE mybloatedtable;  -- use item (b) discussed above

Then repeat as many times as necessary to accomplish the desired shrinking.

In defense of the need for this tool: although this is usually preventable by proper vacuuming and FSM configuration, I often see people on the list say that they have a huge multi-gigabyte table that is using up all their drive space, but they cannot afford the interruption that VACUUM FULL would entail. Also, certain maintenance operations (e.g., adding a column and populating it within a transaction) can double the on-disk size of a table, not to mention user error such as running an unconstrained UPDATE command inside a transaction and then rolling it back.

Comments? Am I missing some obvious way of accomplishing this goal? Is anyone working on something like this?

Paul Tillotson
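A sketch of how one shrinking pass might look, picking the cutoff page from pg_class. Everything here is hypothetical: KEEP_EARLY_FREE_PAGES and SHRINK TABLE are the proposed syntax and do not exist, the table and page numbers are made up, and step (2) assumes a usable greater-than comparison on ctid as in the list above:

    -- find the current length of the table in pages
    SELECT relpages FROM pg_class WHERE relname = 'mybloatedtable';
    -- suppose it reports 20000 pages and we want to empty roughly the last 1000

    VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable;    -- proposed option (a): FSM holds the earliest free pages
    UPDATE mybloatedtable SET foo = foo
        WHERE ctid > '(19000,0)';                   -- rewrite tuples living near the end of the table
    SHRINK TABLE mybloatedtable;                    -- proposed command (b): truncate trailing dead pages
    -- repeat until relpages reaches the desired size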
Re: [HACKERS] pg_locks view and user locks
This is the first I have ever heard of user locks, but I have more than once wanted a lock that would persist beyond the end of a transaction. Do these do that?

Paul

Merlin Moncure [EMAIL PROTECTED] writes:
... is there any merit to promoting the user lock wrappers out of contrib

Dunno. Yours is the first message I can recall in quite a long time indicating that anyone was using userlocks. I thought the code was kind of dying on the vine. Of course it's hard to tell whether that's not just *because* it's in contrib and not mainstream. But personally I'd like to see some more evidence of use before we promote it. (And yeah, the API could probably use some cleanup first.)

regards, tom lane
Re: [HACKERS] [pgsql-advocacy] What can we learn from MySQL?
On the other topics...

I think the biggest service PGSQL could provide to the open source community is a resource that teaches people with no database experience the fundamentals of databases. If people had an understanding of what an RDBMS should be capable of and how it should be used, they wouldn't pick MySQL.

I think that this is incredibly important. Many, many developers choose MySQL because MySQL really does make the effort in this regard. This strategy has helped both MySQL and Red Hat become the commercial successes they are today.

I believe that postgres is making an effort here. I learned SQL from the postgres docs found in the first few chapters here:

http://www.postgresql.org/docs/7.4/static/tutorial.html

Those, in my opinion, are excellent, and were way more informative to me than anything on the MySQL website (I tried reading there first). Maybe we are aiming for users with a clue quotient much lower than mine, but those chapters strike an excellent balance between "too short and simple to be useful" and "too long and complicated."

Paul Tillotson
Re: [HACKERS] Usability, MySQL, Postgresql.org, gborg, contrib,
Joshua D. Drake wrote:

I know both. :-). Seriously - I'd like to raise my voice in favor of installing plpgsql in template1 by default. I haven't heard any good reason not to (nor even a bad reason).

If we install plPGSQL by default, we should install any other pl language that was configured at runtime by default as well. This includes plPerl, plTCL, and plPython. Of course only if they were compiled in, but since they are a part of the core distribution we shouldn't favor one over the other. Personally, plpgSQL is only useful to those who are coming from Oracle. People are more likely to be comfortable with plPython or plPerl than plpgSQL.

I disagree. I know python and perl and I have never touched Oracle, but I find plpgsql to be more useful than the others. Mainly because:

(1) It is strongly typed, has all the same types as postgresql itself, consistent support for NULLs, etc.

(2) Upon a casual read of the docs I couldn't figure out how to read any values from the database other than what was passed into the pl{perl|python|php} function. I.e., how do you run a select? What client libraries do you use?

Maybe the others just need better docs.

Paul Tillotson
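A minimal sketch of the two points above (the ledger table, its columns, and the function name are invented for illustration): a plpgsql function declares variables using ordinary postgresql types and reads other tables with a plain embedded SELECT, with no client library involved:

    CREATE FUNCTION account_balance(integer) RETURNS numeric AS '
    DECLARE
        acct ALIAS FOR $1;              -- the account id passed in
        bal  numeric;                   -- same numeric type the table column uses
    BEGIN
        SELECT sum(amount) INTO bal     -- embedded SQL: no client library required
          FROM ledger
         WHERE account_id = acct;

        IF bal IS NULL THEN             -- consistent NULL handling
            RETURN 0;
        END IF;
        RETURN bal;
    END;
    ' LANGUAGE plpgsql;

    SELECT account_balance(42);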
[HACKERS] where can I get the HTML docs
Can anyone show me where to download a zipped tarball of .html files of what exists at the following link?

http://www.postgresql.org/docs/7.4/static/index.html

Thanks.

Paul
Re: [HACKERS] Avoid MVCC using exclusive lock possible?
I use this type of approach when mirroring data from a foxpro database (yuck) to a read-only postgres database. It is quicker and cleaner than deleting all of the rows and inserting them again (TRUNCATE is not transaction safe, which I need). However, for this to be useful, your table must not have any indexes, views, foreign keys, sequences, triggers, etc., or else you must be prepared to re-create all of them using application-level code.

I imagine this would break lots of things, but it would be nice if instead of Shridhar's rename step (see below) one could do this:

    $table1node = query("SELECT relfilenode FROM pg_class WHERE relname = '$old_table';");
    $table2node = query("SELECT relfilenode FROM pg_class WHERE relname = '$new_table';");
    exec("UPDATE pg_class SET relfilenode = $table2node WHERE relname = '$old_table';");
    exec("UPDATE pg_class SET relfilenode = $table1node WHERE relname = '$new_table';");

You would of course need to change the relfilenode for all of the toasted columns and indexes as well in the same atomic step, but it seems like this might be more compatible with postgresql's MVCC model than other ideas suggested.

Regards,
Paul Tillotson

Shridhar Daithankar wrote:

I am sure people have answered the approach you have suggested, so let me suggest a workaround for your problem. You could run the following in a transaction:

- begin
- create another table with exact same structure
- write a procedure that reads from the input table and updates the values in between
- drop the original table
- rename the new table to the old one
- commit
- analyze the new table

Except for increased disk space, this approach has all the good things postgresql offers. Especially using transactable DDLs it is a huge benefit. You certainly do save on vacuum. If the entire table is updated then you can almost certainly get things done faster this way.

HTH

Shridhar
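For reference, Shridhar's rename approach spelled out as plain SQL (the table names and the per-row transformation are placeholders of my own):

    BEGIN;
    CREATE TABLE mytable_new AS
        SELECT id, some_transformation(val) AS val    -- whatever per-row rewrite is needed
          FROM mytable;
    DROP TABLE mytable;
    ALTER TABLE mytable_new RENAME TO mytable;
    COMMIT;
    ANALYZE mytable;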