[ADMIN] storage calculations

2003-07-29 Thread pgboy
i've been browsing the documentation online, and i can't
find where the storage layouts for tables and indexes are
explained.

i'd like to be able to figure out, given a table or two with a simple
schema (no inheritance), some indexes, and the number of rows currently
in those tables, how much disk space is being used.

the underlying problem, by the way, is to come up with a task that
monitors the space used by the database, so as to issue a warning of
some sort when the partition on which the database resides is getting
full. i had originally used the statfs output from the (unix) os, but
for that to accurately reflect the "used" space, i found i had to do a
"vacuum full" to compress the database first. i'd rather come up with
a heuristic: take the number of rows (from the reltuples column in
pg_class, which seems to be accurate after running 'analyze') and
multiply by some per-row size calculated from the table schema, to get
a fairly accurate guess at the real space used.
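
for what it's worth, a minimal sketch (not from the thread) of pulling
sizes out of pg_class instead of statfs. relpages is each relation's
on-disk page count as of its last vacuum or analyze, so it tracks the
real file size without a vacuum full; 'mytable' and 'mytable_pkey' are
placeholder names:

  -- approximate on-disk size per relation; 8192 is the default page size
  SELECT relname, relkind, relpages, relpages::bigint * 8192 AS approx_bytes
  FROM pg_class
  WHERE relname IN ('mytable', 'mytable_pkey');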

any and all advice welcome, and thanks.

pg



Re: [ADMIN] storage calculations

2003-07-30 Thread pgboy
sorry, when i referred to "disk space used" i meant the actual
amount used by live rows. if i insert 1000 rows then delete
999 of them, the disk file will still be the size of a 1000-row
table until i do a "vacuum full" - which takes an exclusive
table lock, which is a bad thing.

given that, i'd like to be able to calculate the number of
bytes a row uses given its schema. i've seen this kind of
documentation for other dbs, i just cannot seem to find it
in the postgresql docs.
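
a hedged sketch of where the raw numbers live: declared column widths
are in the catalogs (pg_type.typlen, where -1 means a variable-length
type), and on top of those each row carries a tuple header of roughly
23-28 bytes in 7.x-era releases, plus alignment padding and a 4-byte
line pointer in its page. 'mytable' is a placeholder:

  -- per-column declared widths; typlen = -1 marks variable-length
  -- types (text, varchar, numeric, ...)
  SELECT a.attname, t.typname, t.typlen
  FROM pg_attribute a
  JOIN pg_type t ON t.oid = a.atttypid
  WHERE a.attrelid = 'mytable'::regclass
    AND a.attnum > 0
    AND NOT a.attisdropped
  ORDER BY a.attnum;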

thanks.
pg



[ADMIN] recovery from a bad transaction log

2003-07-30 Thread pgboy

given that someone (else) deleted the pg_xlog directory, or that it
got otherwise corrupted, is there a way to regenerate the database,
seeing as all the data is still safely tucked away in $PGDATA/base?

pg



Re: [ADMIN] recovery from a bad transaction log

2003-07-30 Thread pgboy
that's what i was looking for. thanks. and i see what you mean
about not really trusting that the data really is all there.
still, most is better than none.

again, thanks.
pg



Re: [ADMIN] storage calculations

2003-07-31 Thread pgboy

On Thu, 31 Jul 2003, Peter Eisentraut wrote:
> [EMAIL PROTECTED] writes:
> > sorry, when i referred to "disk space used" i meant the actual amount
> > used by live rows. if i insert 1000 rows then delete 999 of them, the
> > disk file will still be the size of a 1000-row table until i do a
> > "vacuum full" - which takes an exclusive table lock, which is a bad
> > thing.
>
> The task that you originally described was that you want to monitor when
> the disk is getting full.  For that task, you need to take into account
> the actual size of the data on disk, not the size after a "vacuum full"
> which you say you do not want to execute.  Basing a disk full monitor on
> hypothetical sizes sounds pretty useless.
>
> > given that, i'd like to try to be able to calculate the number of
> > bytes a row uses given its schema. i've seen this kind of
> > documentation for other dbs, i just cannot seem to find it
> > in the postgresql docs.
>
> There is some information about that in the FAQ, but keep in mind that
> rows may be compressed or moved to secondary storage automatically.

well, i can admit that i am confused. my assumption is that when a row
is deleted in pg, that row is merely marked as 'gone' until a vacuum-full
is done. my further assumption is that if i continue to add rows, those
rows are not necessarily appended to the end of the physical data file,
but can be written over 'deleted' rows.

given that, a vacuum full is the only way i know of to get an accurate
reflection of the number of bytes being used to store the data. without
the vacuum, i can tell how big a potentially sparse file is, but i don't
want to raise a warning just because the file size is getting large
(unless, of course, pg does not overwrite deleted rows, in which case
the warning, or a vacuum full, seems appropriate).
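
one hedged option, assuming the contrib/pgstattuple module is
installed: it does scan the table, but it takes no exclusive lock and
separates live, dead, and free space directly. (the output shape
varies by contrib version - older releases return a single percentage
and print the detail as NOTICEs.)

  -- live vs. dead space without VACUUM FULL ('mytable' is a placeholder)
  SELECT * FROM pgstattuple('mytable');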

i think i agree with you, too, that i cannot really calculate a
hypothetical size unless i have all fixed-size fields. in that case,
i should be able to calculate the size accurately, yes? if not, what
are the variables i could not account for?
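
as a worked sketch under explicit assumptions (7.x-era 23-byte tuple
header padded to 24, a 4-byte line pointer per row, 4-byte alignment,
8kb pages with a ~20-byte page header): a table of two int4 columns
costs 24 + 4 + 4 + 4 = 36 bytes per row, so floor((8192 - 20) / 36) =
227 rows fit per page.

  -- estimated size of a hypothetical all-fixed-width table of two
  -- int4 columns ('mytable' is a placeholder)
  SELECT ceil(reltuples / 227) * 8192 AS est_bytes
  FROM pg_class
  WHERE relname = 'mytable';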

uh, any more info on your comment that "rows may be compressed or
moved to secondary storage automatically"? i'd *love* to know how to
do that.
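
for what it's worth, that comment refers to TOAST: values wider than
roughly two kilobytes get compressed and/or moved out to a companion
table automatically - nothing has to be turned on. one way to check
whether a table has such a companion, with 'mytable' as a placeholder:

  -- shows the backing TOAST table, if the table has one
  SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class
  WHERE relname = 'mytable';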

thanks.
pgboy




Re: [ADMIN] storage calculations

2003-07-31 Thread pgboy
On Thu, 31 Jul 2003, Andrew Sullivan wrote:

> On Thu, Jul 31, 2003 at 08:51:09AM -0400, [EMAIL PROTECTED] wrote:
> > 
> > well, i can admit that i am confused. my assumption is that when a row
> > is deleted in pg, that row is merely marked as 'gone' until a vacuum-full
> > is done. my further assumption is that if i continue to add rows, those
> > rows are not necessarily appended to the end of the physical data file,
> > but can be written over 'deleted' rows.
> 
> Your assumption is wrong.  VACUUM without FULL will allow you to
> write over the free space now available in your data files, subject
> to the limitations of tracking as determined by your FSM settings.
> 
> VACUUM FULL actually shortens the data file.  Except for cases having
> to do with large tuples that won't fit in the previous page, VACUUM
> FULL means that you can't fit any more data in that file, once the
> VACUUM FULL is done.  
> 
> A
> 
> 

well, i was close. i didn't realize that i had to do a (non-full)
vacuum to mark deleted space as free.

but after that, i am still left with a potentially sparse file,
and i don't really have a way to guess how much free space is
available until i do a full vacuum, correct? (at which point the
file size(s) will reflect the amount actually being used.)
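
one hedged middle ground: a plain vacuum takes no exclusive lock, and
with VERBOSE it reports page counts and removed/remaining tuple counts
for each table, which could feed the "getting full" warning without a
vacuum full ('mytable' is a placeholder):

  -- prints per-table page and tuple counts as NOTICE output
  VACUUM VERBOSE ANALYZE mytable;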

just started looking at fsm. thanks for the pointer there. i hope
i'm not acting too obtuse here.
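
for reference, the two knobs involved, assuming 7.3-era names (both
are set in postgresql.conf and take effect on server restart):

  SHOW max_fsm_pages;      -- total pages whose free space can be tracked
  SHOW max_fsm_relations;  -- number of relations that can be tracked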

thanks.
pg




Re: [ADMIN] counting rows

2003-08-14 Thread pgboy

nice stuff (the whole presentation, full of stuff i did not know), thanks.

of course, a question...

the bit that reads:
  If you know the maximum value of an integer column and have an index on it:
  SELECT COUNT(*) FROM t2 WHERE c2 < 100;

made me think that the above would do an index scan instead of a table
scan. but when i try it with my table (currently almost 70,000,000
rows) it still does a table scan (at least, EXPLAIN says it will do a
table scan) ... well, sometimes. the table has two integer fields,
SECONDS and MICROS, and i have created an index on (SECONDS, MICROS).
i ANALYZEd the table so the index would be considered useful, and
tested that pg was happy by running:

  EXPLAIN SELECT count(*) FROM tess WHERE SECONDS < 100;

sure enough, query plan says it'll use an index scan. but...

  EXPLAIN SELECT count(*) from tess where SECONDS < 11;

query plan here says it will use a table scan.

the cutoff - the highest value at which it will still do an index
scan - is 1060183451. given that this value typically comes from the
system call gettimeofday, and the current maximum value in the table
is 1060188816 (about noon today), this is, er, problematic.
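
a hedged aside, not from the thread: the flip to a table scan is the
planner's selectivity estimate at work - once the constant covers most
of the column's histogram, it expects to visit most of the table
anyway and prices a sequential scan cheaper. two things worth trying
(statement forms are 7.3-era; 'tess' and 'seconds' are from the post
above):

  -- give the planner a finer histogram near the top of the range,
  -- then refresh the statistics
  ALTER TABLE tess ALTER COLUMN seconds SET STATISTICS 100;
  ANALYZE tess;

  -- for experimentation only: compare what the index-scan plan costs
  SET enable_seqscan TO off;
  EXPLAIN SELECT count(*) FROM tess WHERE seconds < 1060188816;
  SET enable_seqscan TO on;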

any ideas? 

thanks.
pg



On Wed, 6 Aug 2003 [EMAIL PROTECTED] wrote:

> 
> > i need to get a row count on a large table. it appears that
> >   select count(*) from <tablename>;
> > always does a table scan, whether i have an index or not, and that is too 
> > slow.
> 
> 
> http://www.gtsm.com/oscon2003/findrows.html
> 
> 
> --
> Greg Sabino Mullane [EMAIL PROTECTED]
> PGP Key: 0x14964AC8 200308060951
> 
> 



[ADMIN] counting rows

2003-08-14 Thread pgboy
i need to get a row count on a large table. it appears that
  select count(*) from <tablename>;
always does a table scan, whether i have an index or not, and that is
too slow.

so i found this:
  select reltuples from pg_class where relname = '<tablename>';

this is nice and fast, but not all that accurate at any given time. but, i 
*did* notice that it is pretty accurate if i run analyze just before.

so, i am wondering if analyze also does a table scan, in which case i am 
back to where i started. it *seems* to run fairly quickly, even on a large 
table, but i am just wall-clocking all of this.
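
a hedged note on that question: analyze reads only a bounded random
sample of the table (a few thousand rows at the default statistics
target), not every row, which is why it feels fast even on a large
table. so the whole recipe, with 'mytable' as a placeholder:

  -- fast, approximate count: reltuples is the planner's estimate,
  -- refreshed by the sampling pass ANALYZE makes
  ANALYZE mytable;
  SELECT reltuples::bigint AS approx_rows
  FROM pg_class
  WHERE relname = 'mytable';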

or any other ideas on counting rows are welcome.

thanks.
pg

