[PERFORM] WAL log performance/efficiency question

2007-05-17 Thread Keaton Adams
I sent this to pgsql-admin but didn't receive a response.  Would this be
a WAL log performance/efficiency issue?

Thanks,

Keaton


Given these postgresql.conf settings:

#---
# WRITE AHEAD LOG
#---

# - Settings -

fsync = on                              # turns forced synchronization on or off
wal_sync_method = fsync                 # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
full_page_writes = on                   # recover from partial page writes
wal_buffers = 32                        # min 4, 8KB each
commit_delay = 10                       # range 0-10, in microseconds
commit_siblings = 1000                  # range 1-1000

# - Checkpoints -

checkpoint_segments = 500   # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300                # range 30-3600, in seconds
checkpoint_warning = 120# in seconds, 0 is off

# - Archiving -
archive_command = '/mnt/logship/scripts/archivemaster.sh %p %f'
                                        # command to use to archive a logfile segment

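For reference, the values the running server is actually using can be
confirmed from the pg_settings view (shown here only as a quick check,
not part of the original test):

postgres=# SELECT name, setting FROM pg_settings
postgres-#  WHERE name IN ('fsync', 'wal_buffers', 'commit_delay',
postgres(#                 'commit_siblings', 'checkpoint_segments');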


And these tables to load data into:

         List of relations
 Schema |   Name    | Type  |  Owner
--------+-----------+-------+----------
 public | testload  | table | postgres
 public | testload2 | table | postgres
 public | testload3 | table | postgres
(3 rows)

postgres=# \d testload
     Table "public.testload"
 Column |      Type      | Modifiers
--------+----------------+-----------
 name   | character(100) |

postgres=# \d testload2
     Table "public.testload2"
 Column |      Type      | Modifiers
--------+----------------+-----------
 name   | character(100) |

postgres=# \d testload3
     Table "public.testload3"
 Column |      Type      | Modifiers
--------+----------------+-----------
 name   | character(100) |

There are no indexes on the tables.


Using an 8K data page:

8K data page (8192 bytes)
Less page header and row overhead leaves ~8000 bytes
At 100 bytes per row = ~80 rows/page
Rows loaded: 250,000 / 80 = 3125 data pages * 8192 = 25,600,000 bytes /
1048576 = ~ 24.4 MB of data page space.

The test file is shown here (250,000 rows all the same):
-bash-3.1$ more datafile.txt
AABBCCDDEEFFGGHHIIJJ
AABBCCDDEEFFGGHHIIJJ
AABBCCDDEEFFGGHHIIJJ
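
One way a file like this could have been generated (the exact command
is not in the original post, so this is just a sketch using standard
tools):
-bash-3.1$ yes 'AABBCCDDEEFFGGHHIIJJ' | head -n 250000 > datafile.txt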

The load script:
-bash-3.1$ more loaddata.sql
copy testload from '/home/kadams/logship/datafile.txt' delimiter '|';
copy testload2 from '/home/kadams/logship/datafile.txt' delimiter '|';
copy testload3 from '/home/kadams/logship/datafile.txt' delimiter '|';

So the one load process does a COPY into the three tables.  24.4 MB * 3
tables = ~ 73.2 MB of data page space.
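
The actual on-disk sizes after the load can be double-checked directly;
pg_relation_size() and pg_size_pretty() are available in 8.1 (shown
only as a sanity check, this was not part of the run):

postgres=# SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
postgres-#   FROM pg_class WHERE relname IN ('testload', 'testload2', 'testload3');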

This is the only process running on the database.  No other loads/users
are on the system.

psql -f sql/loaddata.sql  >/dev/null 2>&1 &

It seems that 112 MB of WAL file space (16 MB * 7) is required for
73.2 MB of loaded data, an extra 38.8 MB (roughly 35% of the WAL
written) just to log/archive the COPY commands:

First pass:
LOG:  transaction ID wrap limit is 2147484146, limited by database
"postgres"
LOG:  archived transaction log file "00010001005E"
LOG:  archived transaction log file "00010001005F"
LOG:  archived transaction log file "000100010060"
LOG:  archived transaction log file "000100010061"
LOG:  archived transaction log file "000100010062"
LOG:  archived transaction log file "000100010063"
LOG:  archived transaction log file "000100010064"

# of logs in pg_xlog: 9

Second pass:
LOG:  archived transaction log file "000100010065"
LOG:  archived transaction log file "000100010066"
LOG:  archived transaction log file "000100010067"
LOG:  archived transaction log file "000100010068"
LOG:  archived transaction log file "000100010069"
LOG:  archived transaction log file "00010001006A"
LOG:  archived transaction log file "00010001006B"

# of logs in pg_xlog: 15

Third pass:
LOG:  archived transaction log file "00010001006C"
LOG:  archived transaction log file "00010001006D"
LOG:  archived transaction log file "00010001006E"
LOG:  archived transaction log file "00010001006F"
LOG:  archived transaction log file "000100010070"
LOG:  archived transaction log file "000100010071"
LOG:

Re: [PERFORM] WAL log performance/efficiency question

2007-05-17 Thread Keaton Adams
So for every data page there is a 20-byte page header, for every row
there is a 4-byte line pointer (the offset into the page), AND there is
also a 28-byte fixed-size tuple header (27 bytes plus an optional null
bitmap)?  (I did find the section in the 8.1 manual that gives the
physical page layout.)  The other RDBMS platforms I have worked with
have a header in the 28-byte range and a 4-byte row pointer, and that's
it.  I find it a bit surprising that PostgreSQL would need another 28
bytes per row to track its contents.

I'll try the pg_relpages function as you suggest and recalculate from
there.
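
A minimal sketch of that recalculation, assuming contrib/pgstattuple is
installed in this database:

postgres=# SELECT pg_relpages('testload');   -- actual number of 8K pages in the heap file
postgres=# ANALYZE testload;
postgres=# SELECT relpages, reltuples FROM pg_class
postgres-#  WHERE relname = 'testload';      -- page/row counts recorded by the ANALYZE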

Thanks for the info,

-Keaton



On Thu, 2007-05-17 at 15:23 +0100, Heikki Linnakangas wrote:

> Keaton Adams wrote:
> > Using an 8K data page:
> > 
> > 8K data page (8192 bytes)
> > Less page header and row overhead leaves ~8000 bytes
> > At 100 bytes per row = ~80 rows/page
> > Rows loaded: 250,000 / 80 = 3125 data pages * 8192 = 25,600,000 bytes /
> > 1048576 = ~ 24.4 MB of data page space.
> 
> That's not accurate. There's 32 bytes of overhead per row, and that 
> gives you just 61 tuples per page. Anyhow, I'd suggest measuring the 
> real table size with pg_relpages function (from contrib/pgstattuple) or 
> from pg_class.relpages column (after ANALYZE).
> 
> > We are running on PostgreSQL 8.1.4 and are planning to move to 8.3 when
> > it becomes available.  Are there space utilization/performance
> > improvements in WAL logging in the upcoming release?
> 
> One big change in 8.3 is that COPY on a table that's been created or 
> truncated in the same transaction doesn't need to write WAL at all, if 
> WAL archiving isn't enabled.
> 
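
For what it's worth, the 8.3 behaviour described above would apply to a
load written like this (just a sketch; it requires WAL archiving to be
off, so it would not help a log-shipping setup like ours):

BEGIN;
TRUNCATE testload;    -- or CREATE TABLE testload (...) in this same transaction
COPY testload FROM '/home/kadams/logship/datafile.txt' DELIMITER '|';
COMMIT;               -- on 8.3, with archiving disabled, this COPY writes no WAL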


Re: [PERFORM] WAL log performance/efficiency question

2007-05-17 Thread Keaton Adams

OK, I understand.

So one clarifying question on WAL contents:

On an insert of a 100-byte row that is logged, what goes into the WAL
log? Is it 100 bytes, 132 bytes (row plus overhead), or something else?
Is just the row content logged, or the content plus all of the
associated overhead?  I understand that after a checkpoint the first
change to a page requires the entire 8K page to be written to the WAL,
so do subsequent inserts written to WAL follow the same storage pattern
as the layout on the data page, or is the byte count less?

-K



> 
> 8K data page (8192 bytes)
> Less page header and row overhead leaves ~8000 bytes
> At 100 bytes per row = ~80 rows/page
> Rows loaded: 250,000 / 80 = 3125 data pages * 8192 = 25,600,000
> bytes / 1048576 = ~ 24.4 MB of data page space.
> 


