Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-25 Thread Luke Lonergan
Jim, On 9/22/06 7:01 AM, Jim C. Nasby [EMAIL PROTECTED] wrote: There's been talk of adding code that would have a seqscan detect if another seqscan is happening on the table at the same time, and if it is, to start its seqscan wherever the other seqscan is currently running. That would
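The "synchronized seqscan" idea above can be sketched as a circular scan that joins an in-progress scan at its current position and wraps around. This is an illustrative sketch only, not PostgreSQL's actual implementation; all names are made up:

```python
# Illustrative sketch of a synchronized sequential scan: a newly started
# scanner begins at the page an existing scan has reached, wraps past the
# end of the table, and stops where it started. Both scans then read the
# same pages at roughly the same time and share the same physical I/O.
# NOT PostgreSQL code; names are hypothetical.

def synchronized_scan(pages, start_hint):
    """Visit every page exactly once, as a circular list from start_hint."""
    n = len(pages)
    for i in range(n):
        yield pages[(start_hint + i) % n]

table = ["page%d" % i for i in range(6)]
# A second scanner joins while the first scanner is at page 4:
order = list(synchronized_scan(table, 4))
```

The payoff is that a second concurrent scan costs almost no extra disk reads, at the price of returning rows in a different (wrapped) order.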

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-24 Thread Ron Mayer
Luke Lonergan wrote: I think the topic is similar to cache bypass, used in cache capable vector processors (Cray, Convex, Multiflow, etc) in the 90's. When you are scanning through something larger than the cache, it should be marked non-cacheable and bypass caching altogether. This avoids

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-22 Thread Markus Schaber
Hi, Guy, Guy Thornley wrote: Of course you could argue the OS should be able to detect this, and prevent it occurring anyway. I don't know anything about linux's behaviour in this area. Yes, one can argue that way. But a generic algorithm in the OS can never be as smart as the application

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-22 Thread Jim C. Nasby
On Thu, Sep 21, 2006 at 08:46:41PM -0700, Luke Lonergan wrote: Mark, On 9/21/06 8:40 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I'd advise against using this call unless it can be shown that the page will not be used in the future, or at least, that the page is less useful than all

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-22 Thread Jim C. Nasby
On Thu, Sep 21, 2006 at 11:05:39PM -0400, Bruce Momjian wrote: We tried posix_fadvise() during the 8.2 development cycle, but had problems as outlined in a comment in xlog.c: /* * posix_fadvise is problematic on many platforms: on older x86 Linux * it just dumps core, and

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Markus Schaber
Hi, Luke, Luke Lonergan wrote: I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly meant for this purpose? This is a good idea - I wasn't aware that this was possible. This possibility was the reason for me to propose it. :-) We'll do some testing and see if it works as
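The POSIX_FADV_WILLNEED hint discussed here can be exercised with a short sketch. This assumes a Linux system where `os.posix_fadvise` is available (Python 3.3+); the file name is a throwaway temp file created just for the demonstration:

```python
# Minimal sketch of asking the kernel to prefetch a file range with
# POSIX_FADV_WILLNEED, the call proposed in this thread for asynchronous
# block pre-fetch. Assumes Linux; guarded because posix_fadvise is not
# available on every platform.
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 8192)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    if hasattr(os, "posix_fadvise"):
        # Hint: the first 8 KB will be needed soon. The kernel may start
        # reading it in the background; a later read() should then be
        # satisfied from the page cache without blocking on the disk.
        os.posix_fadvise(fd, 0, 8192, os.POSIX_FADV_WILLNEED)
    data = os.read(fd, 8192)
finally:
    os.close(fd)
    os.unlink(path)
```

Note that WILLNEED is purely advisory: the kernel is free to ignore it, which is one reason the thread also discusses measuring whether it "works as advertised" per platform.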

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Bucky Jordan
Do you think that adding some posix_fadvise() calls to the backend to pre-fetch some blocks into the OS cache asynchronously could improve that situation? Nope - this requires true multi-threading of the I/O, there need to be multiple seek operations running simultaneously. The current

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Mark Lewis
So this might be a dumb question, but the above statements apply to the cluster (e.g. postmaster) as a whole, not per postgres process/transaction correct? So each transaction is blocked waiting for the main postmaster to retrieve the data in the order it was requested (i.e. not multiple

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Markus Schaber
Hi, Bucky, Bucky Jordan wrote: We can implement multiple scanners (already present in MPP), or we could implement AIO and fire off a number of simultaneous I/O requests for fulfillment. So this might be a dumb question, but the above statements apply to the cluster (e.g. postmaster) as a

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Luke Lonergan
Bucky, On 9/21/06 2:16 PM, Bucky Jordan [EMAIL PROTECTED] wrote: Does this have anything to do with postgres indexes not storing data, as some previous posts to this list have mentioned? (In other words, having the index in memory doesn't help? Or are we talking about indexes that are too

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Guy Thornley
I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly meant for this purpose? This is a good idea - I wasn't aware that this was possible. This possibility was the reason for me to propose it. :-) posix_fadvise() features in the TODO list already; I'm not sure if any

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread mark
On Fri, Sep 22, 2006 at 02:52:09PM +1200, Guy Thornley wrote: I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly meant for this purpose? This is a good idea - I wasn't aware that this was possible. This possibility was the reason for me to propose it. :-)

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-21 Thread Luke Lonergan
Mark, On 9/21/06 8:40 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I'd advise against using this call unless it can be shown that the page will not be used in the future, or at least, that the page is less useful than all other pages currently in memory. This is what the call really means.

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-20 Thread Markus Schaber
Hi, Luke, Luke Lonergan wrote: Since PG's heap scan is single threaded, the seek rate is equivalent to a single disk (even though RAID arrays may have many spindles), the typical random seek rates are around 100-200 seeks per second from within the backend. That means that as sequential
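The seek-rate figure quoted here translates directly into a random-I/O bandwidth ceiling. A quick worked version, taking the midpoint of the thread's 100-200 seeks/s range and PostgreSQL's default 8 KB page (both assumptions, not measurements):

```python
# A single-threaded heap scan that degrades to random I/O is limited by
# seek rate, not disk bandwidth: at ~150 seeks/s and one 8 KB page per
# seek, effective throughput is only about 1.2 MB/s, regardless of how
# many spindles the RAID array has.
seeks_per_sec = 150            # midpoint of the 100-200 seeks/s quoted above
page_size = 8 * 1024           # PostgreSQL's default 8 KB block size
random_bw = seeks_per_sec * page_size     # bytes/s of random page reads
print(random_bw / 1e6)         # roughly 1.2 MB/s
```

Compare that with the tens of MB/s a single disk sustains sequentially, and it is clear why the thread focuses on keeping scans sequential or overlapping multiple seeks.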

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-20 Thread Luke Lonergan
Markus, On 9/20/06 1:09 AM, Markus Schaber [EMAIL PROTECTED] wrote: Do you think that adding some posix_fadvise() calls to the backend to pre-fetch some blocks into the OS cache asynchronously could improve that situation? Nope - this requires true multi-threading of the I/O, there need to
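The "multiple seek operations running simultaneously" that Luke says a single-threaded backend cannot issue can be sketched with a thread pool: each worker issues an independent positioned read, so several seeks are in flight at once. A hedged illustration (file, offsets, and pool size are all made up):

```python
# Sketch of overlapping several seek+read operations using threads and
# os.pread, the effect being discussed: with one request in flight per
# worker, a multi-spindle array can service several seeks concurrently
# instead of serializing them behind a single scanning thread.
import os
import tempfile
import concurrent.futures

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(64 * 1024))
    path = f.name

fd = os.open(path, os.O_RDONLY)

def read_block(offset, size=4096):
    # os.pread takes an explicit offset, so threads share the fd safely
    # without contending on a single file position.
    return os.pread(fd, size, offset)

offsets = [0, 16384, 32768, 49152]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(read_block, offsets))

os.close(fd)
os.unlink(path)
```

True AIO (as Ron advocates later in the thread) achieves the same overlap without dedicating a thread per outstanding request.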

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-20 Thread Markus Schaber
Hi, Luke, Luke Lonergan wrote: Do you think that adding some posix_fadvise() calls to the backend to pre-fetch some blocks into the OS cache asynchronously could improve that situation? Nope - this requires true multi-threading of the I/O, there need to be multiple seek operations

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-20 Thread Ron
IMHO, AIO is the architecturally cleaner and more elegant solution. We in fact have a project on the boards to do this but funding (as yet) has not been found. My $.02, Ron At 02:02 PM 9/20/2006, Markus Schaber wrote: Hi, Luke, Luke Lonergan wrote: Do you think that adding some

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-20 Thread Luke Lonergan
Markus, On 9/20/06 11:02 AM, Markus Schaber [EMAIL PROTECTED] wrote: I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly meant for this purpose? This is a good idea - I wasn't aware that this was possible. We'll do some testing and see if it works as advertised on Linux and

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-19 Thread Bucky Jordan
Mike, On Mon, Sep 18, 2006 at 07:14:56PM -0400, Alex Turner wrote: If you have a table with 100 million records, each of which is 200 bytes long, that gives you roughly 20 gig of data (assuming it was all written neatly and hasn't been updated much). I'll keep that in mind (minimizing

[PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Bucky Jordan
Yes. What's pretty large? We've had to redefine large recently, now we're talking about systems with between 100TB and 1,000TB. - Luke Well, I said large, not gargantuan :) - Largest would probably be around a few TB, but the problem I'm having to deal with at the moment is large numbers

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Merlin Moncure
On 9/18/06, Bucky Jordan [EMAIL PROTECTED] wrote: My question is at what point do I have to get fancy with those big tables? From your presentation, it looks like PG can handle 1.2 billion records or so as long as you write intelligent queries. (And normal PG should be able to handle that,

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Alan Hodgson
On Monday 18 September 2006 13:56, Merlin Moncure [EMAIL PROTECTED] wrote: just another fyi, if you have a really big database, you can forget about doing pg_dump for backups (unless you really don't care about being x day or days behind)...you simply have to do some type of

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Bucky Jordan
good normalization skills are really important for large databases, along with materialization strategies for 'denormalized sets'. Good points- thanks. I'm especially curious what others have done for the materialization. The matview project on gborg appears dead, and I've only found a

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Alex Turner
Do the basic math: If you have a table with 100 million records, each of which is 200 bytes long, that gives you roughly 20 gig of data (assuming it was all written neatly and hasn't been updated much). If you have to do a full table scan, then it will take roughly 400 seconds with a single 10k RPM
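Alex's back-of-the-envelope math checks out, assuming the ~50 MB/s sustained sequential rate a single 10k RPM disk of that era could deliver (the rate is implied by his 400 s figure, not stated in the post):

```python
# Reproducing the arithmetic from the post: 100 million 200-byte records
# give ~20 GB; a full sequential scan at ~50 MB/s on one disk takes ~400 s.
records = 100_000_000
record_size = 200                        # bytes per record
table_bytes = records * record_size      # total table size
scan_rate = 50 * 1000 * 1000             # assumed 50 MB/s single-disk rate

print(table_bytes / 1e9)                 # 20.0 (GB)
print(table_bytes / scan_rate)           # 400.0 (seconds)
```

The same arithmetic shows why striping helps sequential scans: N disks at 50 MB/s each cut the 400 s proportionally, which is the thread's motivation for RAID layouts and parallel I/O.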

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Michael Stone
On Mon, Sep 18, 2006 at 07:14:56PM -0400, Alex Turner wrote: If you have a table with 100 million records, each of which is 200 bytes long, that gives you roughly 20 gig of data (assuming it was all written neatly and hasn't been updated much). If you're in that range it doesn't even count

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Luke Lonergan
Bucky, On 9/18/06 7:37 AM, Bucky Jordan [EMAIL PROTECTED] wrote: My question is at what point do I have to get fancy with those big tables? From your presentation, it looks like PG can handle 1.2 billion records or so as long as you write intelligent queries. (And normal PG should be able to

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Luke Lonergan
Alex, On 9/18/06 4:14 PM, Alex Turner [EMAIL PROTECTED] wrote: Be warned, the tech specs page: http://www.sun.com/servers/x64/x4500/specs.xml#anchor3 doesn't mention RAID 10 as a possible, and this is probably what most would recommend for fast data access if you are doing both read and

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Alex Turner
Sweet - that's good - RAID 10 support seems like an odd thing to leave out. Alex On 9/18/06, Luke Lonergan [EMAIL PROTECTED] wrote: Alex, On 9/18/06 4:14 PM, Alex Turner [EMAIL PROTECTED] wrote: Be warned, the tech specs page: http://www.sun.com/servers/x64/x4500/specs.xml#anchor3 doesn't mention

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Luke Lonergan
Yep, Solaris ZFS kicks butt. It does RAID10/5/6, etc and implements most of the high end features available on high end SANs... - Luke On 9/18/06 8:40 PM, Alex Turner [EMAIL PROTECTED] wrote: Sweet - that's good - RAID 10 support seems like an odd thing to leave out. Alex On 9/18/06,

Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)

2006-09-18 Thread Luke Lonergan
Mark, On 9/18/06 8:45 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Does a tool exist yet to time this for a particular configuration? We're considering building this into ANALYZE on a per-table basis. The basic approach times sequential access in page rate, then random seeks as page rate