[ADMIN] storage calculations
i've been browsing the documentation online, and i can't find where the storage layouts for tables and indexes are explained. i'd like to be able to figure out, given a table or two with a simple schema (no inheritance) and some indexes, and the number of rows currently in those tables, how much disk space is being used.

the underlying problem, by the way, is to come up with a task that monitors the space used by the database and issues a warning of some sort when the partition the database resides on is getting full. i had originally used the statfs output from the (unix) os, but for that to accurately reflect the "used" space, i found i had to do a "vacuum full" to compress the database. i'd rather come up with a heuristic: take the number of rows (which i get from the reltuples column in pg_class, and which seems to be accurate after running 'analyze') and multiply it by some number calculated from the table schema, to get a fairly accurate guess at the real space used.

any and all advice welcome, and thanks.

pg
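for what it's worth, a rough on-disk figure is available without any per-row arithmetic: pg_class.relpages records how many blocks each table and index occupied as of its last VACUUM or ANALYZE. a minimal sketch, assuming the default 8 kB block size and using 'mytable' as a placeholder name:

    -- approximate on-disk size, in bytes, of a table plus its indexes,
    -- from the page counts recorded at the last VACUUM/ANALYZE.
    SELECT c.relname,
           c.relkind,                 -- 'r' = table, 'i' = index
           c.relpages,                -- blocks as of last VACUUM/ANALYZE
           c.relpages * 8192 AS approx_bytes
    FROM pg_class c
    WHERE c.relname = 'mytable'
       OR c.oid IN (SELECT indexrelid FROM pg_index
                    WHERE indrelid = (SELECT oid FROM pg_class
                                      WHERE relname = 'mytable'));

note that relpages goes stale between vacuums, so this tracks allocated file size rather than live-row bytes.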
Re: [ADMIN] storage calculations
sorry, when i referred to "disk space used" i meant the actual amount used by live rows. if i insert 1000 rows then delete 999 of them, the disk file will stay the size of the 1000-row container until i do a "vacuum full", which takes a table lock, which is a bad thing.

given that, i'd like to be able to calculate the number of bytes a row uses given its schema. i've seen this kind of documentation for other dbs; i just cannot seem to find it in the postgresql docs.

thanks.

pg
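as a starting point for a per-row estimate, the declared column widths are visible in the system catalogs. a sketch, with 'mytable' as a placeholder; note that typlen = -1 marks variable-length types, which cannot be sized from the schema alone, and that each row also carries a fixed tuple header plus alignment padding (on the order of a few dozen bytes in this era), so the sum of typlen is a floor rather than the full answer:

    -- list each user column of a table with its type and fixed width;
    -- typlen = -1 means variable length (text, varchar, numeric, ...).
    SELECT a.attname, t.typname, t.typlen
    FROM pg_attribute a
    JOIN pg_class c ON a.attrelid = c.oid
    JOIN pg_type  t ON a.atttypid = t.oid
    WHERE c.relname = 'mytable'
      AND a.attnum > 0
    ORDER BY a.attnum;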
[ADMIN] recovery from a bad transaction log
given that someone (else) deleted the pg_xlog directory, or that it has otherwise become corrupted, is there a way to regenerate the database, given that all the data is still safely tucked away in $PGDATA/base?

pg
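the usual answer here is pg_resetxlog, which writes a new, empty transaction log so the postmaster can start again; anything committed but not yet checkpointed before pg_xlog was lost may be inconsistent. a sketch, with $PGDATA standing in for the real data directory:

    # stop the postmaster first, then fabricate a fresh transaction log.
    # WARNING: transactions not yet checkpointed when pg_xlog was lost
    # may be partially applied; dump and restore afterwards if possible.
    pg_resetxlog $PGDATA
    pg_ctl start -D $PGDATA
    pg_dumpall > recovered.sql    # take a dump right away, just in case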
Re: [ADMIN] recovery from a bad transaction log
that's what i was looking for. thanks. and i see what you mean about not really trusting that the data is all still there. still, most is better than none. again, thanks.

pg
Re: [ADMIN] storage calculations
On Thu, 31 Jul 2003, Peter Eisentraut wrote:

> [EMAIL PROTECTED] writes:
> > sorry, when i referred to "disk space used" i meant the actual amount
> > used by live rows. if i insert 1000 rows then delete 999 of them, the
> > disk file will stay the size of the 1000-row container until i do a
> > "vacuum full", which takes a table lock, which is a bad thing.
>
> The task that you originally described was that you want to monitor when
> the disk is getting full. For that task, you need to take into account
> the actual size of the data on disk, not the size after a "vacuum full"
> which you say you do not want to execute. Basing a disk full monitor on
> hypothetical sizes sounds pretty useless.
>
> > given that, i'd like to be able to calculate the number of bytes a row
> > uses given its schema. i've seen this kind of documentation for other
> > dbs; i just cannot seem to find it in the postgresql docs.
>
> There is some information about that in the FAQ, but keep in mind that
> rows may be compressed or moved to secondary storage automatically.

well, i can admit that i am confused. my assumption is that when a row is deleted in pg, that row is merely marked as 'gone' until a vacuum full is done. my further assumption is that if i continue to add rows, those rows are not necessarily appended to the end of the physical data file, but can be written over 'deleted' rows.

given that, a vacuum full is the only way i know of to get an accurate reflection of the number of bytes being used to store the data. without the vacuum, i can tell how big a potentially sparse file is, but i don't want to raise a warning just because the file size is getting large (unless, of course, pg does not overwrite deleted rows, in which case the warning, or a vacuum full, seems appropriate).

i think i agree with you, too, that i cannot really calculate a hypothetical size unless i have all fixed-size fields. in that case, i should be able to accurately calculate the size, yes? if not, what are the variables i could not account for?

uh, any more info on your comment "rows may be compressed or moved to secondary storage automatically"? i'd *love* to know how to do that.

thanks.

pgboy
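the "compressed or moved to secondary storage" remark refers to TOAST: oversized variable-length values are compressed and/or moved out of line into a companion toast table automatically, with nothing to configure. the per-column strategy can be inspected and overridden; a minimal sketch, with table and column names as placeholders:

    -- TOAST runs by itself; SET STORAGE merely changes the strategy.
    --   PLAIN    = never compress or move out of line (fixed-size types)
    --   EXTENDED = compress, then move out of line if still too big (default)
    --   EXTERNAL = move out of line but do not compress
    --   MAIN     = compress, but keep in the main table if at all possible
    ALTER TABLE mytable ALTER COLUMN big_text_col SET STORAGE EXTERNAL;

    -- the out-of-line data lives in a separate relation:
    SELECT reltoastrelid FROM pg_class WHERE relname = 'mytable';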
Re: [ADMIN] storage calculations
On Thu, 31 Jul 2003, Andrew Sullivan wrote:

> On Thu, Jul 31, 2003 at 08:51:09AM -0400, [EMAIL PROTECTED] wrote:
> >
> > well, i can admit that i am confused. my assumption is that when a row
> > is deleted in pg, that row is merely marked as 'gone' until a vacuum
> > full is done. my further assumption is that if i continue to add rows,
> > those rows are not necessarily appended to the end of the physical data
> > file, but can be written over 'deleted' rows.
>
> Your assumption is wrong. VACUUM without FULL will allow you to
> write over the free space now available in your data files, subject
> to the limitations of tracking as determined by your FSM settings.
>
> VACUUM FULL actually shortens the data file. Except for cases having
> to do with large tuples that won't fit in the previous page, VACUUM
> FULL means that you can't fit any more data in that file, once the
> VACUUM FULL is done.
>
> A

well, i was close. i didn't realize that i had to do a (non-full) vacuum to mark deleted space as free. but after that, i am still left with a potentially sparse file, and i don't really have a way to guess how much free space is available until i do a full vacuum, correct? (at which time the file sizes will indicate the amount being used.)

just started looking at fsm. thanks for the pointer there. i hope i'm not being too obtuse here.

thanks.

pg
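a plain VACUUM VERBOSE gives a decent picture of the slack space without the exclusive lock of VACUUM FULL: it reports, per table, the tuples removed and the pages kept. a sketch, with 'mytable' as a placeholder; the FSM settings shown are postgresql.conf parameters whose values must be sized to your own churn:

    -- reclaim dead rows for reuse and report what was found:
    VACUUM VERBOSE ANALYZE mytable;

    -- relpages then shows the (unshortened) file size in blocks,
    -- and reltuples the live-row estimate:
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'mytable';

    -- postgresql.conf (values are placeholders, not recommendations):
    --   max_fsm_pages = 100000       pages with free space to remember
    --   max_fsm_relations = 1000     relations to track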
Re: [ADMIN] counting rows
nice stuff (the whole presentation, full of things i did not know), thanks. of course, a question... the bit that reads:

    If you know the maximum value of an integer column and have an index on it:
    SELECT COUNT(*) FROM t2 WHERE c2 < 100;

made me think that the above would do an index scan instead of a table scan. but when i try it with my table (currently almost 70,000,000 rows) it still does a table scan (at least, EXPLAIN says it will do a table scan) ... well, sometimes.

the table has two integer fields, SECONDS and MICROS, and i have created an index on (SECONDS, MICROS). i ANALYZEd the table so the index would be considered useful, then tested that pg was happy by running:

    EXPLAIN SELECT count(*) FROM tess WHERE SECONDS < 100;

sure enough, the query plan says it'll use an index scan. but...

    EXPLAIN SELECT count(*) FROM tess WHERE SECONDS < 1060183452;

the query plan here says it will use a table scan. the cutoff, the highest value at which it will still do an index scan, is 1060183451. given that this value is typically gotten from the system call gettimeofday, and the current maximum value is 1060188816 (about noon today), this is, er, problematic.

any ideas? thanks.

pg

On Wed, 6 Aug 2003 [EMAIL PROTECTED] wrote:

> > i need to get a row count on a large table. it appears that
> >
> >     select count(*) from <tablename>;
> >
> > always does a table scan, whether i have an index or not, and that is
> > too slow.
>
> http://www.gtsm.com/oscon2003/findrows.html
>
> Greg Sabino Mullane [EMAIL PROTECTED]
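one thing worth trying, as an assumption rather than a known fix: the planner's row estimate for "SECONDS < <big value>" comes from a 10-bucket histogram by default, so a finer histogram may move the crossover point. the queries below use the thread's table name and a value just above the reported cutoff:

    -- give the planner more detail on the column, then re-check the plan:
    ALTER TABLE tess ALTER COLUMN seconds SET STATISTICS 100;  -- default 10
    ANALYZE tess;
    EXPLAIN SELECT count(*) FROM tess WHERE seconds < 1060183452;

    -- to see whether the index scan really is cheaper, compare plans with
    -- the sequential-scan path disabled for the session:
    SET enable_seqscan = off;
    EXPLAIN SELECT count(*) FROM tess WHERE seconds < 1060183452;
    SET enable_seqscan = on;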
[ADMIN] counting rows
i need to get a row count on a large table. it appears that

    select count(*) from <tablename>;

always does a table scan, whether i have an index or not, and that is too slow. so i found this:

    select reltuples from pg_class where relname = '<tablename>';

this is nice and fast, but not all that accurate at any given time. but i *did* notice that it is pretty accurate if i run analyze just before. so, i am wondering whether analyze also does a table scan, in which case i am back where i started. it *seems* to run fairly quickly, even on a large table, but i am just wall-clocking all of this.

any other ideas on counting rows are welcome. thanks.

pg
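if an exact count is needed faster than any scan, the usual workaround is a trigger-maintained counter. a minimal sketch, in which every name (row_counts, bigtable, bump_count, count_trig) is made up for illustration, and which assumes plpgsql has been installed with createlang:

    CREATE TABLE row_counts (tablename text PRIMARY KEY, n bigint);
    INSERT INTO row_counts SELECT 'bigtable', count(*) FROM bigtable;

    CREATE FUNCTION bump_count() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE row_counts SET n = n + 1 WHERE tablename = ''bigtable'';
        ELSE  -- DELETE
            UPDATE row_counts SET n = n - 1 WHERE tablename = ''bigtable'';
        END IF;
        RETURN NULL;   -- return value is ignored for AFTER triggers
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER count_trig AFTER INSERT OR DELETE ON bigtable
        FOR EACH ROW EXECUTE PROCEDURE bump_count();

    -- the count is then a single-row lookup:
    SELECT n FROM row_counts WHERE tablename = 'bigtable';

note the cost: every insert and delete on the big table now updates one counter row, so heavily concurrent writers will serialize on it.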