Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Paul Thomas
On 08/08/2003 11:28 mixo wrote:
I have just installed redhat linux 9 which ships with Pg
7.3.2. Pg has to be setup so that data inserts (blobs) should
be able to handle at least 8M at a time. The machine has
two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
What would be the recomended setup for good performance
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who wont be conistenly using the db.


It doesn't sound like a particlarly heavy loading to me. I'd start off 
with something like

shared_buffers = 2000
sort_mem = 1024
max_coonections = 100
and see how it performs under normal business loading.

--
Paul Thomas
+--+-+
| Thomas Micro Systems Limited | Software Solutions for the Smaller 
Business |
| Computer Consultants | 
http://www.thomas-micro-systems-ltd.co.uk   |
+--+-+

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Neil Conway
On Mon, Aug 11, 2003 at 06:59:30PM -0400, Bruce Momjian wrote:
> Uh, the ext2 developers say it isn't 100% reliable --- at least that is
> that was told.  I don't know any personally, but I mentioned it while I
> was visiting Red Hat, and they didn't refute it.

IMHO, if we're going to say "don't use X on production PostgreSQL
systems", we need to have some better evidene than "no one has
said anything to the contrary, and I heard X is bad". If we can't
produce such evidence, we shouldn't say anything at all, and users
can decide what to use for themselves.

(Not that I'm agreeing or disagreeing about ext2 in particular...)

> > My
> > untested interpretation was that the update bookkeeping as well as data
> > update were all getting journalled, the journal space would fill, get
> > sync'd, then repeat. In effect, all blocks were being written TWICE just
> > for the journalling, never mind the overhead for PostgreSQL
> > transactions.

Journalling may or may not have been the culprit, but I doubt everything
was being written to disk twice:

(a) ext3 does metadata-only journalling by default

(b) PostgreSQL only fsyncs WAL records to disk, not the data itself

-Neil


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Mon, 11 Aug 2003, Bruce Momjian wrote:

> scott.marlowe wrote:
> > On Fri, 8 Aug 2003, Andrew Sullivan wrote:
> > 
> > > On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > > > 
> > > > Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
> > > > (ext2?) with the partition that holds your data and WAL.
> > > 
> > > I would give you exactly the opposite advice: _never_ use a
> > > non-journalling fs for your data and WAL.  I suppose if you can
> > > afford to lose some transactions, you can do without journalling. 
> > > Otherwise, you're just borrowing trouble, near as I can tell.
> > 
> > I'd argue that a reliable filesystem (ext2) is still better than a 
> > questionable journaling filesystem (ext3 on kernels <2.4.20).
> > 
> > This isn't saying to not use jounraling, but I would definitely test it 
> > under load first to make sure it's not gonna lose data or get corrupted.
> 
> That _would_ work if ext2 was a reliable file system --- it is not.
> 
> This is the problem of Linux file systems --- they have unreliable, and
> journalled, with nothing in between, except using a journalling file
> system and having it only journal metadata.

Never the less, on LINUX, which is what we use, it is by far more reliable 
than ext3 or reiserfs.  In four years of use I've lost zero files to any 
of its bugs.  Of course, maybe that's RedHat patching the kernel for me or 
something. :-)  they seem to hire some pretty good hackers.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian
scott.marlowe wrote:
> On Fri, 8 Aug 2003, Andrew Sullivan wrote:
> 
> > On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > > 
> > > Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
> > > (ext2?) with the partition that holds your data and WAL.
> > 
> > I would give you exactly the opposite advice: _never_ use a
> > non-journalling fs for your data and WAL.  I suppose if you can
> > afford to lose some transactions, you can do without journalling. 
> > Otherwise, you're just borrowing trouble, near as I can tell.
> 
> I'd argue that a reliable filesystem (ext2) is still better than a 
> questionable journaling filesystem (ext3 on kernels <2.4.20).
> 
> This isn't saying to not use jounraling, but I would definitely test it 
> under load first to make sure it's not gonna lose data or get corrupted.

That _would_ work if ext2 was a reliable file system --- it is not.

This is the problem of Linux file systems --- they have unreliable, and
journalled, with nothing in between, except using a journalling file
system and having it only journal metadata.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Ron Johnson
On Wed, 2003-08-13 at 01:47, Christopher Kings-Lynne wrote:
> > So, group the inserts in transactions with maybe 1000 commands each. It
> > will go much faster. It can then cache the rows and in the end just make
> > sure all 1000 have been written out on disk.
> 
> More than that, he should be using COPY - it's 10x faster than even grouped
> inserts.

I have a table which has a foreign key reference to a properly indexed
table, and needed to load 15GB of uncompressed data into that table.

Since the machine is minimal (60GB 5400RPM IDE HDD, 1GB RAM, 1GHz
Athlon), to save precious disk space, I had the data compressed into
22 files totaling 641GiB.  The records are approximately 275 bytes
in size.

Also, because date transformations needed to be made, I had to 1st
insert into a temp table, and insert from there into the main table.
Thus, in essence, I had to insert each record twice.

So, in 8:45 (not 8 minutes 45 seconds!, decompressed 641MiB worth of
96% compressed files, inserted 30M rows, and inserted 30M rows again,
while doing foreign key checks to another table.  And the data files
plus database are all on the same disk.

Pretty impressive: 1,920 inserts/second.

for f in ltx_*unl.gz;
do
psql test1 -c "truncate table t_lane_tx2;" ;
(zcat $f | sed "s/\"//g" | \
   psql test1 -c "copy t_lane_tx2 from stdin delimiter ',';");
time psql -a -f sel_into_ltx.sql -d test1  ;
done

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| "Man, I'm pretty.  Hoo Hah!"  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Dennis Björklund
On Tue, 12 Aug 2003, mixo wrote:

> that I am currently importing data into Pg which is about 2.9 Gigs.
> Unfortunately, to maintain data intergrity, data is inserted into a table
> one row at a time.'

So you don't put a number of inserts into one transaction?

If you don't do that then postgresql will treat each command as a
transaction and each insert is going to be forced out on disk (returning
when the data is just in some cache is not safe even if other products
might do that). If you don't do this then the server promise the client
that the row have been stored but then the server goes down and the row
that was in the cache is lost. It's much faster but not what you expect
from a real database.

So, group the inserts in transactions with maybe 1000 commands each. It 
will go much faster. It can then cache the rows and in the end just make 
sure all 1000 have been written out on disk.

There is also a configuration variable that can tell postgresql to not 
wait until the insert is out on disk, but that is not recomended if you 
value your data.

And last, why does it help integrity to insert data one row at a time?

-- 
/Dennis


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Tom Lane
Jeff <[EMAIL PROTECTED]> writes:
> On Tue, 12 Aug 2003, Christopher Browne wrote:
>> Are you _certain_ that's still true?  Have you a metric that shows
>> Informix being 10x faster on a modern system?  That would be quite
>> surprising...

> We were forced (for budget reason) to switch from raw disk to cooked files
> on our informix db. We took a huge hit - about 5-6x slower.
> [snip]
> The raw disks were on a hitachi fibre array and the cooked files
> were on a raid5 (scsi). Forget how many spindles in the raid.
> There were 20 raw disks)

Seems like you can't know how much of the performance hit was due to the
filesystem change and how much to the hardware change.  But I'd bet 20
disks on fibre array have way more net throughput than a single RAID
array on scsi.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Jeff
On Tue, 12 Aug 2003, Christopher Browne wrote:

> Are you _certain_ that's still true?  Have you a metric that shows
> Informix being 10x faster on a modern system?  That would be quite
> surprising...
>

We were forced (for budget reason) to switch from raw disk to cooked files
on our informix db. We took a huge hit - about 5-6x slower.  Granted part
of that was because informix takes number of spindles, etc into account
when generating query plans and the fact running UPDATE STATISTICS (think
Vacuum analyze) on the version we run locks the table exclusively. And it
is unacceptable to have our "main table" unavailable for hours and hours
while the update runs. (For the record: its a 8cpu sun e4500 running
sol2.6.  The raw disks were on a hitachi fibre array and the cooked files
were on a raid5 (scsi). Forget how many spindles in the raid.
There were 20 raw disks)

Informix, etc. have spent a lot of time and money working on it.
They also have the advantage of having many paid fulltime
developers who are doing this for a job, not as a weekend hobby
(Compared to the what? 2-3 full time PG developers).

The other advantage (which I hinted to above) with raw disks is being able
to optimize queries to take advantage of it.  Informix is multithreaded
and it will spawn off multiple "readers" to do say, a seq scan (and merge
the results at the end).

So if you have a table across say, 3 disks and you need to do a seq scan
it will spawn three readers to do the read. Result: nice and fast (Yes, It
may not always spawn the three readers, only when it thinks it will be a
good thing to do)

I think for PG the effort would be much better spent on other features...
like replication and whatnot.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian
Ron Johnson wrote:
> On Tue, 2003-08-12 at 13:39, Bruce Momjian wrote:
> > OK, I got some hard evidence.  Here is a discussion on the Linux kernel
> > mailing list with postings from Allen Cox (ac Linux kernels) and Stephen
> > Tweedie (ext3 author).
> > 
> > http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start
> > 
> > Search for "softupdates and ext2".
> > 
> > Here is the original email in the thread:
> > 
> > http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html
> > 
> > Summary is at:
> > 
> > http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html
> > 
> > and conclusion in:
> > 
> > http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html
> > 
> > I now remember the issue --- ext2 makes all disk changes asynchonously
> > (unless you mount it via sync, which is slow).  This means that the file
> > system isn't always consistent on disk.  
> > 
> > UFS has always sync metadata (file/directory creation) to the disk so
> > the disk was always consistent, but doesn't sync the data to the disk,
> > for performance reasons.  With soft updates, the metadata writes are
> > delayed, and written to disk in an order that keeps the file system
> > consistent.
> >
> > Is this enough evidence, or should I keep researching?
> 
> This is all 4 years old, though.  Isn't that why the ext3 "layer" was

Yes, it is four years old, but no one has told me ext2 has changed in
this regard, and seeing that they created ext3 to fix these aspects, I
would think ext2 hasn't changed.

> created, and filesystems like reiserFS, XFS and (kinda) JFS were added
> to Linux?

Yes, it is those ext2 limitations that caused the development of ext3
and the others.  However, they went much father than implementing a
crash-safe file system, but rather enabled a file system that doesn't
need fsck on crash reboot.  This causes fsync of data and metadata (file
creation), which slows down the file system, and PostgreSQL doesn't need
it.

You can mount ext3 and others with data=writeback to fsync only
metadata, but it isn't the default.

I am not sure what the ext3 layer is.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Browne
[EMAIL PROTECTED] (Jeff) writes:
> On Tue, 12 Aug 2003, Christopher Browne wrote:
>> Are you _certain_ that's still true?  Have you a metric that shows
>> Informix being 10x faster on a modern system?  That would be quite
>> surprising...

> We were forced (for budget reason) to switch from raw disk to cooked
> files on our informix db. We took a huge hit - about 5-6x slower.
> Granted part of that was because informix takes number of spindles,
> etc into account when generating query plans and the fact running
> UPDATE STATISTICS (think Vacuum analyze) on the version we run locks
> the table exclusively. And it is unacceptable to have our "main
> table" unavailable for hours and hours while the update runs. (For
> the record: its a 8cpu sun e4500 running sol2.6.  The raw disks were
> on a hitachi fibre array and the cooked files were on a raid5
> (scsi). Forget how many spindles in the raid.  There were 20 raw
> disks)

Sounds like what you were forced to do was to do TWO things:

 1.  Switch from raw disk to cooked files, and
 2.  Switch from a fibre array to a RAID array

You're attributing the 5-6x slowdown to 1., when it seems likely that
2. is a far more significant multiple.

What with there being TWO big changes that took place that might be
expected to affect performance, it seems odd to attribute a
factor-of-many change to just one aspect of that.

> Informix, etc. have spent a lot of time and money working on it.
> They also have the advantage of having many paid fulltime developers
> who are doing this for a job, not as a weekend hobby (Compared to
> the what? 2-3 full time PG developers).


Sure, and I'm sure the PG developers hardly know _anything_ about
implementing databases, either.


Certainly IBM (who bought Informix) has lots of time and money to
devote to enhancements.  But I think you underestimate the time,
skill, and effort involved with PG work.  It's quite typical for
people to imagine free software projects to basically be free-wheeling
efforts mostly involving guys that still have acne that haven't much
knowledge of the area.  Reality, for the projects that are of some
importance, is staggeringly different from that.  The number of people
with graduate degrees tends to surprise everyone.

The developers may not have time to add frivolous things to the
system, like building sophisticated Java-based GUI installers, XML
processors, or such.  That does, however, improve their focus, and so
PostgreSQL does not suffer from the way Oracle has fifty different
bundlings most of which nobody understands.

> The other advantage (which I hinted to above) with raw disks is
> being able to optimize queries to take advantage of it.  Informix is
> multithreaded and it will spawn off multiple "readers" to do say, a
> seq scan (and merge the results at the end).
>
> So if you have a table across say, 3 disks and you need to do a seq
> scan it will spawn three readers to do the read. Result: nice and
> fast (Yes, It may not always spawn the three readers, only when it
> thinks it will be a good thing to do)

Andrew Sullivan's fairly regular response is that he tried (albeit not
VASTLY extensively) to distinguish between disks when working with
fibre arrays, and he couldn't measure an improvement in shifting WAL
(the OBVIOUS thing to shift) to separate disks.

There's a lot of guesswork as to precisely why that result falls out.

One of the better guesses seems to be that if you've got enough
battery-backed memory cache on the array, that lets updates get pushed
to cache so fast that it doesn't too much matter which disk they hit.

If you've got enough spindles, and build much of the array in a
striped manner, you'll get data splitting across disks without having
to specify any "table options" to force it to happen.

You raise a good point vis-a-vis the thought of spawning multiple
readers; that could conceivably be a useful approach to improve
performance for very large queries.  If you could "stripe" the tables
in some manner so they could be doled out to multiple worker
processes, that could indeed provide some benefits.  If there are
three workers, they might round-robin to grab successive pages from
the table to do their work, and then end with a merge step.

That's probably a 7.7 change, mind you :-), but once other simpler
approaches to making the engine faster have been exhausted, that's the
sort of thing to look into next.

> I think for PG the effort would be much better spent on other
> features...  like replication and whatnot.

At this point, sure.
-- 
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/lisp.html
"Using Java  as a general purpose application  development language is
like  going big  game  hunting  armed with  Nerf  weapons." 
-- Author Unknown

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Sean Chittenden
> Andrew Sullivan's fairly regular response is that he tried (albeit
> not VASTLY extensively) to distinguish between disks when working
> with fibre arrays, and he couldn't measure an improvement in
> shifting WAL (the OBVIOUS thing to shift) to separate disks.

Real quick... the faster the drives, the less important it is to move
WAL onto a different drive.  The slower the drives, the more important
this is... which is why this isn't as necessary (if at all) for large
production environments.

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Ron Johnson
On Wed, 2003-08-13 at 10:46, Josh Berkus wrote:
> Jeff,
> 
[snip]
> > The other advantage (which I hinted to above) with raw disks is being able
> > to optimize queries to take advantage of it.  Informix is multithreaded
> > and it will spawn off multiple "readers" to do say, a seq scan (and merge
> > the results at the end).
> 
> I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm  we'd 
> need to see some tests demonstrating that this approach was still a technical 
> advantage given the improvements in RAID  and FS technology since Informix 
> was designed.

Wouldn't PG 1st need horizontal partitioning, and as a precursor to
that, "tablespaces"?

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| "Man, I'm pretty.  Hoo Hah!"  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian
Josh Berkus wrote:
> Jeff,
> 
> > Informix, etc. have spent a lot of time and money working on it.
> > They also have the advantage of having many paid fulltime
> > developers who are doing this for a job, not as a weekend hobby
> > (Compared to the what? 2-3 full time PG developers).
> 
> I think 4-6 full-time, actually, plus about 200 part-time contributors.  Which 
> adds up to a bloody *lot* of code if you monitor pgsql-patches between 
> versions.  The only development advantage the commercials have over us is the 
> ability to engage in large projects (e.g. replication, raw filesystems, etc.) 
> that are difficult for a distributed network of people.

I think Informix's track record for post-Informix 5.0 releases is poor:

6.0 aborted release, pretty much withdrawn
7.0 took 1-2 years to stabalize
8.0 where was that?
9.0 confused customers

How much does Informix improve in 6 months?  In 2 years?  How long does
it take to get a bug fixed?

At this point, only the largest corporations can keep up with our
open-source development model.  The other database vendors have already
closed, as did Informix when purchased by IBM.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread mixo
Thanks to everyone who responded. It's a pity that the discussion has gone
the ext2 vs ext3 route. The main reason I asked my original question is
that I am currently importing data into Pg which is about 2.9 Gigs.
Unfortunately, to maintain data intergrity, data is inserted into a table
one row at a time. This exercise took ~7 days on the same system with
slightly different setup(PIII 1.0GHZ, 512M RAM -- CPU speed was down graded
due to serveral over heating problems which have since been fixed, and RAM
was added for good measure). I have just reloaded the machine, and started
the import. So far ~ 6000 record have been imported, and there is 32000
left.
P.S. Importing the same data on Mysql took ~2 days.

Bjoern Metzdorf wrote:

be able to handle at least 8M at a time. The machine has
two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
What would be the recomended setup for good performance
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who wont be conistenly using the db.
   

For 15 users you won't need great tuning at all. Just make sure, that you
have the right indizes on the tables and that you have good queries (query
plan).
About the 8Meg blobs, I don't know. Other people on this list may be able to
give you hints here.
Regards,
Bjoern
 



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bill Moran
Shridhar Daithankar wrote:
On 11 Aug 2003 at 23:42, Ron Johnson wrote:


On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:

Well, yeah.  But given the Linux propensity for introducing major
features in "minor" releases (and thereby introducing all the
attendant bugs), I'd think twice about using _any_ Linux feature
until it's been through a major version (e.g. things introduced in
2.4.x won't really be stable until 2.6.x) -- and even there one is
taking a risk[1].
Dudes, seriously - switch to FreeBSD :P
But, like, we want a *good* OS... 8-0


Joke aside, I guess since postgresql is pretty much reliant on file system for 
basic file functionality, I guess it's time to test Linux 2.6 and compare it.

And don't forget, for large databases, there is still XFS out there which is 
probably the ruler at upper end..
This is going to push the whole thing a little off-topic, but I'm curious to
know the answer.
Has it ever been proposed or attemped to run PostgreSQL without any filesystem
(or any other database for that matter ...).
Meaning ... just tell it a raw partition to keep the data on and Postgre would
create its own "filesystem" ... obviously, doing that would allow Postgre to
bypass all the failings of all filesystems and rely entirely apon its own
rules.
Or are modern filesystems advanced enough that doing something like that would
lose more than it would gain?
Just thinking out loud.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Andrew Sullivan
On Tue, Aug 12, 2003 at 02:39:19PM -0400, Bill Moran wrote:
> Meaning ... just tell it a raw partition to keep the data on and
> Postgre would create its own "filesystem" ... obviously, doing that
> would allow Postgre to bypass all the failings of all filesystems
> and rely entirely apon its own rules.
> 
> Or are modern filesystems advanced enough that doing something like
> that would lose more than it would gain?

The latter, mostly.  This has been debated repeatedly on -hackers. 
If you want "raw" access, then you have to implement some other kind
of specialised filesystem of your own.  And you have to have all
sorts of nice tools to cope with the things that (for instance) fsck
handles.  I think the reaction of most developers has been, "Why
reinvent the wheel?"

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Tue, 12 Aug 2003, Christopher Kings-Lynne wrote:

> > Well, yeah.  But given the Linux propensity for introducing major
> > features in "minor" releases (and thereby introducing all the
> > attendant bugs), I'd think twice about using _any_ Linux feature
> > until it's been through a major version (e.g. things introduced in
> > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > taking a risk[1].
> 
> Dudes, seriously - switch to FreeBSD :P

Yeah, it's nice to have a BUG FREE OS huh? ;^)

And yes, I've used FreeBSD, it's quite good, but I kept getting the 
feeling it wasn't quite done.  Especially the installation documentation.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Neil Conway
On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
 I don't use Linux and was just repeating what I had heard from others,
> and read in postings.  I don't have any first-hand experience with ext2
> (except for a laptop I borrowed that wouldn't boot after being shut
> off), but others on this mailing list have said the same thing.

Right, and I understand the need to answer users asking about
which filesystem to use, but I'd be cautious of bad-mouthing
another OSS project without any hard evidence to back up our
claim (of course if we have such evidence, then fine -- I
just haven't seen it). It would be like $SOME_LARGE_OSS
project saying "Don't use our project with PostgreSQL, as
[EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
UnixWare" -- kind of annoying, right?

> > (a) ext3 does metadata-only journalling by default
> 
> If that is true, why was I told people have to mount their ext3 file
> systems with metadata-only.  Again, I have no experience myself, but why
> are people telling me this?

Perhaps they were suggesting that people mount ext2 using
data=writeback, rather than the default of data=ordered.

BTW, I've heard from a couple different people that using
ext3 with data=journalled (i.e. enabling journalling of both
data and metadata) actually makes PostgreSQL faster, as
it means that ext3 can skip PostgreSQL's fsync request
since ext3's log is flushed to disk already. I haven't
tested this myself, however.

-Neil


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Ron Johnson
On Tue, 2003-08-12 at 13:39, Bruce Momjian wrote:
> OK, I got some hard evidence.  Here is a discussion on the Linux kernel
> mailing list with postings from Allen Cox (ac Linux kernels) and Stephen
> Tweedie (ext3 author).
> 
>   http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start
> 
> Search for "softupdates and ext2".
> 
> Here is the original email in the thread:
> 
>   http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html
> 
> Summary is at:
> 
>   http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html
> 
> and conclusion in:
> 
>   http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html
> 
> I now remember the issue --- ext2 makes all disk changes asynchonously
> (unless you mount it via sync, which is slow).  This means that the file
> system isn't always consistent on disk.  
> 
> UFS has always sync metadata (file/directory creation) to the disk so
> the disk was always consistent, but doesn't sync the data to the disk,
> for performance reasons.  With soft updates, the metadata writes are
> delayed, and written to disk in an order that keeps the file system
> consistent.
>
> Is this enough evidence, or should I keep researching?

This is all 4 years old, though.  Isn't that why the ext3 "layer" was
created, and filesystems like reiserFS, XFS and (kinda) JFS were added
to Linux?

> ---
> 
> Neil Conway wrote:
> > On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
> >  I don't use Linux and was just repeating what I had heard from others,
> > > and read in postings.  I don't have any first-hand experience with ext2
> > > (except for a laptop I borrowed that wouldn't boot after being shut
> > > off), but others on this mailing list have said the same thing.
> > 
> > Right, and I understand the need to answer users asking about
> > which filesystem to use, but I'd be cautious of bad-mouthing
> > another OSS project without any hard evidence to back up our
> > claim (of course if we have such evidence, then fine -- I
> > just haven't seen it). It would be like $SOME_LARGE_OSS
> > project saying "Don't use our project with PostgreSQL, as
> > [EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
> > UnixWare" -- kind of annoying, right?
> > 
> > > > (a) ext3 does metadata-only journalling by default
> > > 
> > > If that is true, why was I told people have to mount their ext3 file
> > > systems with metadata-only.  Again, I have no experience myself, but why
> > > are people telling me this?
> > 
> > Perhaps they were suggesting that people mount ext2 using
> > data=writeback, rather than the default of data=ordered.
> > 
> > BTW, I've heard from a couple different people that using
> > ext3 with data=journalled (i.e. enabling journalling of both
> > data and metadata) actually makes PostgreSQL faster, as
> > it means that ext3 can skip PostgreSQL's fsync request
> > since ext3's log is flushed to disk already. I haven't
> > tested this myself, however.
> > 
> > -Neil

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| "Man, I'm pretty.  Hoo Hah!"  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian

Uh, the ext2 developers say it isn't 100% reliable --- at least that is
that was told.  I don't know any personally, but I mentioned it while I
was visiting Red Hat, and they didn't refute it.

Now, the failure window might be quite small, but I have seen it happen
myself, and have heard it from others.

---

Reece Hart wrote:
> On Mon, 2003-08-11 at 15:16, Bruce Momjian wrote:
> 
> > That _would_ work if ext2 was a reliable file system --- it is not.
> 
> 
> Bruce-
> 
> I'd like to know your evidence for this. I'm not refuting it, but I'm a
> >7 year linux user (including several clusters, all of which have run
> ext2 or ext3) and keep a fairly close ear to kernel newsgroups,
> announcements, and changelogs. I am aware that there have very
> occasionally been corruption problems, but my understanding is that
> these are fixed (and quickly). In any case, I'd say that your assertion
> is not widely known and I'd appreciate some data or references.
> 
> As for PostgreSQL on ext2 and ext3, I recently switched from ext3 to
> ext2 (Stephen Tweedy was insightful to facilitate this backward
> compatibility). I did this because I had a 45M row update on one table
> that was taking inordinate time (killed after 10 hours), even though
> creating the database from backup takes ~4 hours including indexing (see
> pgsql-perform post on 2003/07/22). CPU usage was ~2% on an otherwise
> unloaded, fast, SCSI160 machine. vmstat io suggested that PostgreSQL was
> writing something on the order of 100x as many blocks as being read. My
> untested interpretation was that the update bookkeeping as well as data
> update were all getting journalled, the journal space would fill, get
> sync'd, then repeat. In effect, all blocks were being written TWICE just
> for the journalling, never mind the overhead for PostgreSQL
> transactions. This emphasizes that journals probably work best with
> short burst writes and syncing during lulls rather than sustained
> writes.
> 
> I ended up solving the update issue without really updating, so ext2
> timings aren't known. So, you may want to test this yourself if you're
> concerned.
> 
> -Reece
> 
> 
> -- 
> Reece Hart, Ph.D.   [EMAIL PROTECTED], http://www.gene.com/
> Genentech, Inc. 650/225-6133 (voice), -5389 (fax)
> Bioinformatics and Protein Engineering
> 1 DNA Way, MS-93http://www.in-machina.com/~reece/
> South San Francisco, CA  94080-4990 [EMAIL PROTECTED], GPG: 0x25EC91A0

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian

OK, I got some hard evidence.  Here is a discussion on the Linux kernel
mailing list with postings from Allen Cox (ac Linux kernels) and Stephen
Tweedie (ext3 author).

http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start

Search for "softupdates and ext2".

Here is the original email in the thread:

http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html

Summary is at:

http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html

and conclusion in:

http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html

I now remember the issue --- ext2 makes all disk changes asynchonously
(unless you mount it via sync, which is slow).  This means that the file
system isn't always consistent on disk.  

UFS has always sync metadata (file/directory creation) to the disk so
the disk was always consistent, but doesn't sync the data to the disk,
for performance reasons.  With soft updates, the metadata writes are
delayed, and written to disk in an order that keeps the file system
consistent.
   
Is this enough evidence, or should I keep researching?

---

Neil Conway wrote:
> On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
>  I don't use Linux and was just repeating what I had heard from others,
> > and read in postings.  I don't have any first-hand experience with ext2
> > (except for a laptop I borrowed that wouldn't boot after being shut
> > off), but others on this mailing list have said the same thing.
> 
> Right, and I understand the need to answer users asking about
> which filesystem to use, but I'd be cautious of bad-mouthing
> another OSS project without any hard evidence to back up our
> claim (of course if we have such evidence, then fine -- I
> just haven't seen it). It would be like $SOME_LARGE_OSS
> project saying "Don't use our project with PostgreSQL, as
> [EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
> UnixWare" -- kind of annoying, right?
> 
> > > (a) ext3 does metadata-only journalling by default
> > 
> > If that is true, why was I told people have to mount their ext3 file
> > systems with metadata-only.  Again, I have no experience myself, but why
> > are people telling me this?
> 
> Perhaps they were suggesting that people mount ext2 using
> data=writeback, rather than the default of data=ordered.
> 
> BTW, I've heard from a couple different people that using
> ext3 with data=journalled (i.e. enabling journalling of both
> data and metadata) actually makes PostgreSQL faster, as
> it means that ext3 can skip PostgreSQL's fsync request
> since ext3's log is flushed to disk already. I haven't
> tested this myself, however.
> 
> -Neil
> 
> 
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Gregory S. Williamson
FWIW, Informix can be run using a "cooked" (Unix) file for storing data or it uses 
"raw" disk space and bypasses the ordinary (high level) UNIX controllers and does its 
own reads/writes. About 10 times faster and safer. Of course, itmay have taken a lot 
of programmer time to make that solid. But the performance gains are significant.

Greg W.


-Original Message-
From:   Bill Moran [mailto:[EMAIL PROTECTED]
Sent:   Tue 8/12/2003 11:39 AM
To: 
Cc: PgSQL Performance ML
Subject:    Re: [PERFORM] Perfomance Tuning

Shridhar Daithankar wrote:
> On 11 Aug 2003 at 23:42, Ron Johnson wrote:
> 
> 
>>On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
>>
>>>>Well, yeah.  But given the Linux propensity for introducing major
>>>>features in "minor" releases (and thereby introducing all the
>>>>attendant bugs), I'd think twice about using _any_ Linux feature
>>>>until it's been through a major version (e.g. things introduced in
>>>>2.4.x won't really be stable until 2.6.x) -- and even there one is
>>>>taking a risk[1].
>>>
>>>Dudes, seriously - switch to FreeBSD :P
>>
>>But, like, we want a *good* OS... 8-0
> 
> 
> Joke aside, I guess since postgresql is pretty much reliant on file system for 
> basic file functionality, I guess it's time to test Linux 2.6 and compare it.
> 
> And don't forget, for large databases, there is still XFS out there which is 
> probably the ruler at upper end..

This is going to push the whole thing a little off-topic, but I'm curious to
know the answer.

Has it ever been proposed or attemped to run PostgreSQL without any filesystem
(or any other database for that matter ...).

Meaning ... just tell it a raw partition to keep the data on and Postgre would
create its own "filesystem" ... obviously, doing that would allow Postgre to
bypass all the failings of all filesystems and rely entirely apon its own
rules.

Or are modern filesystems advanced enough that doing something like that would
lose more than it would gain?

Just thinking out loud.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Tue, 12 Aug 2003, Neil Conway wrote:

> On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
>  I don't use Linux and was just repeating what I had heard from others,
> > and read in postings.  I don't have any first-hand experience with ext2
> > (except for a laptop I borrowed that wouldn't boot after being shut
> > off), but others on this mailing list have said the same thing.
> 
> Right, and I understand the need to answer users asking about
> which filesystem to use, but I'd be cautious of bad-mouthing
> another OSS project without any hard evidence to back up our
> claim (of course if we have such evidence, then fine -- I
> just haven't seen it). It would be like $SOME_LARGE_OSS
> project saying "Don't use our project with PostgreSQL, as
> [EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
> UnixWare" -- kind of annoying, right?

Wow, you put my thoughts exactly into words for me, thanks Neil.

> > > (a) ext3 does metadata-only journalling by default
> > 
> > If that is true, why was I told people have to mount their ext3 file
> > systems with metadata-only.  Again, I have no experience myself, but why
> > are people telling me this?
> 
> Perhaps they were suggesting that people mount ext2 using
> data=writeback, rather than the default of data=ordered.
> 
> BTW, I've heard from a couple different people that using
> ext3 with data=journalled (i.e. enabling journalling of both
> data and metadata) actually makes PostgreSQL faster, as
> it means that ext3 can skip PostgreSQL's fsync request
> since ext3's log is flushed to disk already. I haven't
> tested this myself, however.

Now that you mention it, that makes sense.  I might have to test ext3 now 
that the 2.6 kernel is on the way, i.e. the 2.4 kernel should be settling 
down by now.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Jonathan Gardner
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Friday 08 August 2003 03:28, mixo wrote:
> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recomended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who wont be conistenly using the db.
>

Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
(ext2?) with the partition that holds your data and WAL.

Consider having a seperate partition for the WAL as well.

These are things that are more difficult to change later on. Everything else 
is tweaking.

Is it absolutely necessary to store 8MB files in the database? I find it 
cumbersome. Storing them on a file server has been a better alternative for 
me.

- -- 
Jonathan Gardner <[EMAIL PROTECTED]>
Live Free, Use Linux!
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/M9J0WgwF3QvpWNwRAlT5AJ9EmDourbCiqj7MFOqfBospc2dW7gCfZKz0
JQjn/2KAeh1SPJfN601LoFg=
=PW6k
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Sean Chittenden
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> > 
> > Dudes, seriously - switch to FreeBSD :P
> 
> Yeah, it's nice to have a BUG FREE OS huh? ;^)
> 
> And yes, I've used FreeBSD, it's quite good, but I kept getting the
> feeling it wasn't quite done.  Especially the installation
> documentation.

While the handbook isn't the same as reading the actual source or the
only FreeBSD documentation, it certainly is quite good (to the point
that publishers see small market to publish FreeBSD books because the
documentation provided by the project is so good), IMHO.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/

If anyone on this list has any issues with the documentation, please
take them up with me _privately_ and I will do my best to either
address or correct the problem.

Now, back to our regularly scheduled and on topic programming...  -sc

-- 
Sean Chittenden
"(PostgreSQL|FreeBSD).org - The Power To Serve"

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Fri, 8 Aug 2003, Andrew Sullivan wrote:

> On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > 
> > Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
> > (ext2?) with the partition that holds your data and WAL.
> 
> I would give you exactly the opposite advice: _never_ use a
> non-journalling fs for your data and WAL.  I suppose if you can
> afford to lose some transactions, you can do without journalling. 
> Otherwise, you're just borrowing trouble, near as I can tell.

I'd argue that a reliable filesystem (ext2) is still better than a 
questionable journaling filesystem (ext3 on kernels <2.4.20).

This isn't saying to not use jounraling, but I would definitely test it 
under load first to make sure it's not gonna lose data or get corrupted.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On 11 Aug 2003, Ron Johnson wrote:

> On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> > 
> > Dudes, seriously - switch to FreeBSD :P
> 
> But, like, we want a *good* OS... 8-0

What, like Unixware?  (ducking quickly) (*_*)


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Rod Taylor wrote:
>> On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
>>> I would give you exactly the opposite advice: _never_ use a
>>> non-journalling fs for your data and WAL.  I suppose if you can
>>> afford to lose some transactions, you can do without journalling. 
>>> Otherwise, you're just borrowing trouble, near as I can tell.
>> 
>> Agreed.. WAL cannot recover something when WAL no longer exists due to a
>> filesystem corruption.

> It is true that ext2 isn't good because the file system may not recover,
> but BSD UFS isn't a journalled file system, but does guarantee file
> system recovery after a crash --- it is especially good using soft
> updates.

The main point here is that the filesystem has to be able to take care
of itself; we expect it not to lose any files or forget where the data
is.  If it wants to use journalling to accomplish that, fine.

Journalling file contents updates, as opposed to filesystem metadata,
should be redundant with what we do in WAL.  So I'd recommend
journalling metadata only, if that option is available (and if Postgres
stuff is the only stuff on the disk...)

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Kings-Lynne
> So, group the inserts in transactions with maybe 1000 commands each. It
> will go much faster. It can then cache the rows and in the end just make
> sure all 1000 have been written out on disk.

More than that, he should be using COPY - it's 10x faster than even grouped
inserts.

Chris


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Kings-Lynne
> Well, yeah.  But given the Linux propensity for introducing major
> features in "minor" releases (and thereby introducing all the
> attendant bugs), I'd think twice about using _any_ Linux feature
> until it's been through a major version (e.g. things introduced in
> 2.4.x won't really be stable until 2.6.x) -- and even there one is
> taking a risk[1].

Dudes, seriously - switch to FreeBSD :P

Chris


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PERFORM] Perfomance Tuning

2003-08-14 Thread mixo
I have just installed redhat linux 9 which ships with Pg
7.3.2. Pg has to be setup so that data inserts (blobs) should
be able to handle at least 8M at a time. The machine has
two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
What would be the recomended setup for good performance
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who wont be conistenly using the db.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Andrew Sullivan
On Mon, Aug 11, 2003 at 08:47:07AM -0600, scott.marlowe wrote:
> This isn't saying to not use jounraling, but I would definitely test it 
> under load first to make sure it's not gonna lose data or get corrupted.

Well, yeah.  But given the Linux propensity for introducing major
features in "minor" releases (and thereby introducing all the
attendant bugs), I'd think twice about using _any_ Linux feature
until it's been through a major version (e.g. things introduced in
2.4.x won't really be stable until 2.6.x) -- and even there one is
taking a risk[1].

A

My laptop's PCMCIA network card recently stopped working during a
"minor" version upgrade, even though it's almost 6 years old. 
Someone decided that "cleaning up" the code required complete
redesign, and so all the bugs that had been shaken out during the 2.2
series will now be reimplemented in a new and interesting way.  Sigh.

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Browne
Martha Stewart called it a Good Thing [EMAIL PROTECTED] ("Gregory S. Williamson")wrote:
> FWIW, Informix can be run using a "cooked" (Unix) file for storing
> data or it uses "raw" disk space and bypasses the ordinary (high
> level) UNIX controllers and does its own reads/writes. About 10
> times faster and safer. Of course, itmay have taken a lot of
> programmer time to make that solid. But the performance gains are
> significant.

Are you _certain_ that's still true?  Have you a metric that shows
Informix being 10x faster on a modern system?  That would be quite
surprising...

It may have been true on '80s style UFS implementations, but a couple
of decades have passed, and pretty much any Unix system has new
selections of filesystems that probably aren't so much slower.

It could conceivably be an interesting idea to implement a
block-oriented filesystem where the granularity of files was 8K (or
some such number :-)).

Oracle seems to have done something vaguely like this...
http://otn.oracle.com/tech/linux/open_source.html

But long and short is that the guys implementing OSes have been
putting a LOT of effort into making the potential performance gains of
using "raw" partitions less and less.
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www.ntlug.org/~cbbrowne/sap.html
(eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
  ; will pick up previous value to var set!-ed,
  ; the unassigned object.
-- from BBN-CL's cl-parser.scm

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Ron Johnson
On Wed, 2003-08-13 at 09:37, Tom Lane wrote:
> Jeff <[EMAIL PROTECTED]> writes:
> > On Tue, 12 Aug 2003, Christopher Browne wrote:
> >> Are you _certain_ that's still true?  Have you a metric that shows
> >> Informix being 10x faster on a modern system?  That would be quite
> >> surprising...
> 
> > We were forced (for budget reason) to switch from raw disk to cooked files
> > on our informix db. We took a huge hit - about 5-6x slower.
> > [snip]
> > The raw disks were on a hitachi fibre array and the cooked files
> > were on a raid5 (scsi). Forget how many spindles in the raid.
> > There were 20 raw disks)
> 
> Seems like you can't know how much of the performance hit was due to the
> filesystem change and how much to the hardware change.  But I'd bet 20
> disks on fibre array have way more net throughput than a single RAID
> array on scsi.

I wouldn't be surprised either if the fiber array had more cache
than the SCSI controller.

Was/is the Hitachi device a SAN?

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| "Man, I'm pretty.  Hoo Hah!"  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Browne
[EMAIL PROTECTED] (Josh Berkus) writes:
>> The other advantage (which I hinted to above) with raw disks is being able
>> to optimize queries to take advantage of it.  Informix is multithreaded
>> and it will spawn off multiple "readers" to do say, a seq scan (and merge
>> the results at the end).
>
> I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm
>  we'd need to see some tests demonstrating that this approach
> was still a technical advantage given the improvements in RAID and
> FS technology since Informix was designed.

Ah, but this approach isn't so much an I/O optimization as it is a CPU
optimization.

If you have some sort of join against a big table, and do a lot of
processing on each component, there might be CPU benefits from the
split:

create table customers(
  id customer_id,  name character varying, other fields
);  --- And we're a phone company with 8 millions of them...


create table customer_status (
  customer_id customer_id,
  status status_code
);

create table customer_address (
  customer_id customer_id,
  address_info...
);

And then are doing:

  select c.id, sum(status), address_label(c.id), balance(c.id) from
 customers c, customer_status cs;

We know there's going to be a SEQ SCAN against customers, because
that's the big table.

If I wanted to finish the query as fast as possible, as things stand
now, and had 4 CPUs, I would run 4 concurrent queries, for 4 ranges of
customers.

The Really Cool approach would be for PostgreSQL to dole out customers
across four processors, perhaps throwing a page at a time at each CPU,
where each process would quasi-independently build up their respective
result sets.
-- 
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;

Christopher Browne
(416) 646 3304 x124 (land)

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Josh Berkus
Jeff,

> Informix, etc. have spent a lot of time and money working on it.
> They also have the advantage of having many paid fulltime
> developers who are doing this for a job, not as a weekend hobby
> (Compared to the what? 2-3 full time PG developers).

I think 4-6 full-time, actually, plus about 200 part-time contributors.  Which 
adds up to a bloody *lot* of code if you monitor pgsql-patches between 
versions.  The only development advantage the commercials have over us is the 
ability to engage in large projects (e.g. replication, raw filesystems, etc.) 
that are difficult for a distributed network of people.

> The other advantage (which I hinted to above) with raw disks is being able
> to optimize queries to take advantage of it.  Informix is multithreaded
> and it will spawn off multiple "readers" to do say, a seq scan (and merge
> the results at the end).

I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm  we'd 
need to see some tests demonstrating that this approach was still a technical 
advantage given the improvements in RAID  and FS technology since Informix 
was designed.

As I have said elsewhere, Informix is probably a poor database to emulate 
since they are effectively an old dead-end fork of the Ingres/Postgres code, 
and have already been "mined" for most of the improvements they made.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Shridhar Daithankar
On 11 Aug 2003 at 23:42, Ron Johnson wrote:

> On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> > 
> > Dudes, seriously - switch to FreeBSD :P
> 
> But, like, we want a *good* OS... 8-0

Joke aside, I guess since postgresql is pretty much reliant on file system for 
basic file functionality, I guess it's time to test Linux 2.6 and compare it.

And don't forget, for large databases, there is still XFS out there which is 
probably the ruler at upper end..

Bye
 Shridhar

--
Unfair animal names:-- tsetse fly   -- bullhead-- booby
 -- duck-billed 
platypus-- sapsucker-- Clarence -- Gary Larson


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian
Rod Taylor wrote:
-- Start of PGP signed section.
> On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
> > On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > > 
> > > Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
> > > (ext2?) with the partition that holds your data and WAL.
> > 
> > I would give you exactly the opposite advice: _never_ use a
> > non-journalling fs for your data and WAL.  I suppose if you can
> > afford to lose some transactions, you can do without journalling. 
> > Otherwise, you're just borrowing trouble, near as I can tell.
> 
> Agreed.. WAL cannot recover something when WAL no longer exists due to a
> filesystem corruption.

It is true that ext2 isn't good because the file system may not recover,
but BSD UFS isn't a journalled file system, but does guarantee file
system recovery after a crash --- it is especially good using soft
updates.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Fri, 8 Aug 2003, mixo wrote:

> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time.

Nothing has to be done to tune postgresql to handle this, 8 Meg blobs are 
no problem as far as I know.

> The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recomended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who wont be conistenly using the db.

Seeing as you have only one hard drive, how you arrange things on it 
doesn't really make a big difference.   If you can get another drive and 
mirror your data partition that will help speed up selects as well as 
provide some redundancy should one drive fail.

How many queries per second are you looking at handling?  If it's 1 or 
less, you probably don't have much to worry about with this setup.  We run 
dual PIII-750s at work with 1.5 Gig ram, and while we're going to upgrade 
the servers (they're currently handling apache/php/postgresql & ldap) 
we'll keep the dual PIII-750 machines as the database boxes with nothing 
else on them.  Postgresql is quite snappy on such hardware.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-12 Thread Bjoern Metzdorf
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recomended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who wont be conistenly using the db.

For 15 users you won't need great tuning at all. Just make sure, that you
have the right indizes on the tables and that you have good queries (query
plan).

About the 8Meg blobs, I don't know. Other people on this list may be able to
give you hints here.

Regards,
Bjoern


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Perfomance Tuning

2003-08-12 Thread Ron Johnson
On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
> > Well, yeah.  But given the Linux propensity for introducing major
> > features in "minor" releases (and thereby introducing all the
> > attendant bugs), I'd think twice about using _any_ Linux feature
> > until it's been through a major version (e.g. things introduced in
> > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > taking a risk[1].
> 
> Dudes, seriously - switch to FreeBSD :P

But, like, we want a *good* OS... 8-0

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| "Man, I'm pretty.  Hoo Hah!"  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-11 Thread Bruce Momjian
Neil Conway wrote:
> On Mon, Aug 11, 2003 at 06:59:30PM -0400, Bruce Momjian wrote:
> > Uh, the ext2 developers say it isn't 100% reliable --- at least that is
> > that was told.  I don't know any personally, but I mentioned it while I
> > was visiting Red Hat, and they didn't refute it.
> 
> IMHO, if we're going to say "don't use X on production PostgreSQL
> systems", we need to have some better evidene than "no one has
> said anything to the contrary, and I heard X is bad". If we can't
> produce such evidence, we shouldn't say anything at all, and users
> can decide what to use for themselves.
> 
> (Not that I'm agreeing or disagreeing about ext2 in particular...)

I don't use Linux and was just repeating what I had heard from others,
and read in postings.  I don't have any first-hand experience with ext2
(except for a laptop I borrowed that wouldn't boot after being shut
off), but others on this mailing list have said the same thing.

Here is another email talking about corrupting ext2 file systems:


http://groups.google.com/groups?q=ext2+corrupt+%22power+failure%22&start=10&hl=en&lr=&ie=UTF-8&selm=20021128061318.GE18980%40ursine&rnum=11

>From his wording, I assume he is not talking about fsck-correctable
corrupting.

>From what I remember, the ext2 failure cases were quite small, but known
by the ext2 developers, and considered too large a performance hit to
correct.

> > > My
> > > untested interpretation was that the update bookkeeping as well as data
> > > update were all getting journalled, the journal space would fill, get
> > > sync'd, then repeat. In effect, all blocks were being written TWICE just
> > > for the journalling, never mind the overhead for PostgreSQL
> > > transactions.
> 
> Journalling may or may not have been the culprit, but I doubt everything
> was being written to disk twice:
> 
> (a) ext3 does metadata-only journalling by default

If that is true, why was I told people have to mount their ext3 file
systems with metadata-only.  Again, I have no experience myself, but why
are people telling me this?

> (b) PostgreSQL only fsyncs WAL records to disk, not the data itself

Right.  WAL recovers the data.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Perfomance Tuning

2003-08-11 Thread Reece Hart




On Mon, 2003-08-11 at 15:16, Bruce Momjian wrote:

That _would_ work if ext2 was a reliable file system --- it is not.


Bruce-

I'd like to know your evidence for this. I'm not refuting it, but I'm a >7 year linux user (including several clusters, all of which have run ext2 or ext3) and keep a fairly close ear to kernel newsgroups, announcements, and changelogs. I am aware that there have very occasionally been corruption problems, but my understanding is that these are fixed (and quickly). In any case, I'd say that your assertion is not widely known and I'd appreciate some data or references.

As for PostgreSQL on ext2 and ext3, I recently switched from ext3 to ext2 (Stephen Tweedy was insightful to facilitate this backward compatibility). I did this because I had a 45M row update on one table that was taking inordinate time (killed after 10 hours), even though creating the database from backup takes ~4 hours including indexing (see pgsql-perform post on 2003/07/22). CPU usage was ~2% on an otherwise unloaded, fast, SCSI160 machine. vmstat io suggested that PostgreSQL was writing something on the order of 100x as many blocks as being read. My untested interpretation was that the update bookkeeping as well as data update were all getting journalled, the journal space would fill, get sync'd, then repeat. In effect, all blocks were being written TWICE just for the journalling, never mind the overhead for PostgreSQL transactions. This emphasizes that journals probably work best with short burst writes and syncing during lulls rather than sustained writes.

I ended up solving the update issue without really updating, so ext2 timings aren't known. So, you may want to test this yourself if you're concerned.

-Reece




-- 
Reece Hart, Ph.D.   [EMAIL PROTECTED], http://www.gene.com/
Genentech, Inc. 650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93http://www.in-machina.com/~reece/
South San Francisco, CA  94080-4990 [EMAIL PROTECTED], GPG: 0x25EC91A0








Re: [PERFORM] Perfomance Tuning

2003-08-10 Thread Andrew Sullivan
On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> 
> Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
> (ext2?) with the partition that holds your data and WAL.

I would give you exactly the opposite advice: _never_ use a
non-journalling fs for your data and WAL.  I suppose if you can
afford to lose some transactions, you can do without journalling. 
Otherwise, you're just borrowing trouble, near as I can tell.

A


Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Perfomance Tuning

2003-08-08 Thread Shridhar Daithankar
On 8 Aug 2003 at 12:28, mixo wrote:

> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recomended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who wont be conistenly using the db.

You can look at http://www.varlena.com/GeneralBits/Tidbits/perf.html to start 
with, although that would not take careof anything specifics to BLOB.

I would suggest some pilot benchmark about how system performs after initial 
tuning. We could discuss this in detail after you have a set of initial 
benchmark.

HTH

Bye
 Shridhar

--
wolf, n.:   A man who knows all the ankles.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-08 Thread Andrew Sullivan
On Fri, Aug 08, 2003 at 03:34:44PM -0400, Bruce Momjian wrote:
> 
> It is true that ext2 isn't good because the file system may not recover,
> but BSD UFS isn't a journalled file system, but does guarantee file
> system recovery after a crash --- it is especially good using soft
> updates.

Sorry.  I usually write "journalled or equivalent" for this reason. 
I think UFS with soft updates is a good example of this.  You also
don't need complete journalling in most cases -- metadata is probably
sufficient, given fsync.

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-08 Thread Rod Taylor
> > Agreed.. WAL cannot recover something when WAL no longer exists due to a
> > filesystem corruption.
> 
> It is true that ext2 isn't good because the file system may not recover,
> but BSD UFS isn't a journalled file system, but does guarantee file
> system recovery after a crash --- it is especially good using soft
> updates.

Yes, UFS(2) is an excellent filesystem for PostgreSQL, especially if you
can use background fsck & softupdates.


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] Perfomance Tuning

2003-08-08 Thread Rod Taylor
On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
> On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > 
> > Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
> > (ext2?) with the partition that holds your data and WAL.
> 
> I would give you exactly the opposite advice: _never_ use a
> non-journalling fs for your data and WAL.  I suppose if you can
> afford to lose some transactions, you can do without journalling. 
> Otherwise, you're just borrowing trouble, near as I can tell.

Agreed.. WAL cannot recover something when WAL no longer exists due to a
filesystem corruption.



signature.asc
Description: This is a digitally signed message part