Re: [PERFORM] TB-sized databases

2008-03-18 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes: Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part of this join? What for? That

Re: [PERFORM] TB-sized databases

2008-03-18 Thread Ron Mayer
Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part of this join?

Re: [PERFORM] TB-sized databases

2008-03-18 Thread Ron Mayer
Ron Mayer wrote: Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Bruce Momjian
I have _not_ added a TODO for this item. Let me know if one is needed. --- Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: There's something fishy about this --- given that that plan has a lower

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: I have _not_ added a TODO for this item. Let me know if one is needed. Please do, I think it's an open issue. * Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation
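
For illustration, a minimal sketch of the plan shape this TODO item targets (schema and names are hypothetical, not from the thread): each small relation supplies one column of big's composite index key, so a Cartesian join of the two lets every probe into big use the full index.

    -- hypothetical schema: big's index needs values from both small tables
    CREATE TABLE small1 (a int PRIMARY KEY);
    CREATE TABLE small2 (b int PRIMARY KEY);
    CREATE TABLE big (a int, b int, payload text);
    CREATE INDEX big_ab_idx ON big (a, b);

    -- a Cartesian join of small1 x small2 (both tiny) would let the
    -- planner probe big_ab_idx once per (a, b) pair instead of scanning big
    SELECT big.payload
    FROM small1, small2, big
    WHERE big.a = small1.a
      AND big.b = small2.b;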

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I have _not_ added a TODO for this item. Let me know if one is needed. Please do, I think it's an open issue. * Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Pablo Alcaraz
Anything that helps pgsql perform well in a TB-sized database environment is a Good Thing (r) :D Pablo Bruce Momjian wrote: I have _not_ added a TODO for this item. Let me know if one is needed.

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Ron Mayer
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I have _not_ added a TODO for this item. Let me know if one is needed. Please do, I think it's an open issue. * Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation Would

Re: [PERFORM] TB-sized databases

2007-12-11 Thread Simon Riggs
On Fri, 2007-12-07 at 12:45 -0500, Robert Treat wrote: On Thursday 06 December 2007 04:38, Simon Riggs wrote: I think you're completely overlooking the effect disk latency has on query times. We run queries all the time that can vary from 4 hours to 12 hours in time based solely on

Re: [PERFORM] TB-sized databases

2007-12-07 Thread Robert Treat
On Thursday 06 December 2007 04:38, Simon Riggs wrote: Robert, On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote: If the whole performance of your system depends upon indexed access, then maybe you need a database that gives you a way to force index access at the query level? That

Re: [PERFORM] TB-sized databases

2007-12-07 Thread Ron Mayer
Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: There's something fishy about this --- given that that plan has a lower cost estimate, it should've picked it without any artificial constraints. One final thing I find curious about this is that the estimated number of rows

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Simon Riggs
Robert, On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote: If the whole performance of your system depends upon indexed access, then maybe you need a database that gives you a way to force index access at the query level? That sounds like a request for hints, which is OT here, ISTM.

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Michael Stone
On Thu, Dec 06, 2007 at 09:38:16AM +, Simon Riggs wrote: The issue is that if somebody issues a large query then it will be a problem whichever plan the query takes. Forcing index scans can make a plan more expensive than a seq scan in many cases. OTOH, the planner can really screw up

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Michael Stone [EMAIL PROTECTED] writes: OTOH, the planner can really screw up queries on really large databases. IIRC, the planner can use things like unique constraints to get some idea, e.g., of how many rows will result from a join. Unfortunately, the planner can't apply those techniques
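
A small sketch of the inference under discussion (table names invented): a unique constraint on the inner join key tells the planner each outer row matches at most one inner row, capping the join-size estimate.

    CREATE TABLE customers (
        id   int UNIQUE,  -- at most one match per orders row
        name text
    );
    CREATE TABLE orders (
        id          int,
        customer_id int
    );
    -- the join's row estimate cannot sensibly exceed the estimate for orders
    EXPLAIN SELECT *
    FROM orders o JOIN customers c ON c.id = o.customer_id;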

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Matthew [EMAIL PROTECTED] writes: On Thu, 6 Dec 2007, Tom Lane wrote: Hmm. IIRC, there are smarts in there about whether a mergejoin can terminate early because of disparate ranges of the two join variables. Very cool. Would that be a planner cost estimate fix (so it avoids the merge join),

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Matthew
On Thu, 6 Dec 2007, Tom Lane wrote: Matthew [EMAIL PROTECTED] writes: ... For this query, Postgres would perform a nested loop, iterating over all rows in the small table, and doing a hundred index lookups in the big table. This completed very quickly. However, adding the LIMIT meant that
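
A hedged sketch of the query shape Matthew describes (names invented; assume small has ~100 rows and big has millions):

    CREATE TABLE small (key int);
    CREATE TABLE big (key int, payload text);
    CREATE INDEX big_key_idx ON big (key);

    -- fast: nested loop, one index probe into big per row of small
    SELECT * FROM small s JOIN big b ON b.key = s.key;

    -- adding a LIMIT can flip the plan, because the planner assumes
    -- matches are spread evenly through big and that it can stop early
    SELECT * FROM small s JOIN big b ON b.key = s.key LIMIT 10;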

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Michael Stone
On Thu, Dec 06, 2007 at 11:13:18AM -0500, Tom Lane wrote: Indeed, and if you've got examples where it's that far off, you should report them. Yeah, the trick is to get it to a digestible test case. The basic scenario (there are more tables and columns in the actual case) is a set of tables

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Ron Mayer
Tom Lane wrote: Michael Stone [EMAIL PROTECTED] writes: OTOH, the planner can really screw up queries on really large databases. ... I've got some queries that the planner thinks will return on the order of 10^30 rows for that sort of reason. In practice, the query may return 10^3 rows

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: There's something fishy about this --- given that that plan has a lower cost estimate, it should've picked it without any artificial constraints. I think the reason it's not picking it was discussed back in this thread too.

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Ron Mayer
Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: ...given that that plan has a lower cost estimate, it should've picked it without any artificial constraints. I think the reason it's not picking it was discussed back...

Re: [PERFORM] TB-sized databases

2007-12-05 Thread Robert Treat
On Thursday 29 November 2007 11:14, Simon Riggs wrote: On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Tom's previous concerns were along the lines of How would you know what to set it to?, given that the

Re: [PERFORM] TB-sized databases

2007-12-05 Thread Decibel!
On Nov 28, 2007, at 7:27 AM, Bill Moran wrote: Is there something wrong with: set enable_seqscan = off Note that in cases of very heavy skew, that won't work. It only adds 10M to the cost estimate for a seqscan, and it's definitely possible to have an index scan that looks even more
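
To see the behavior in question (a sketch; the table, query, and cost figures are invented, and the size of the penalty constant varies by PostgreSQL version):

    EXPLAIN SELECT * FROM t WHERE note LIKE '%rare%';
    --   Seq Scan on t  (cost=0.00..12345.67 rows=10 width=64)   [illustrative]

    SET enable_seqscan = off;
    EXPLAIN SELECT * FROM t WHERE note LIKE '%rare%';
    -- with no usable index this is still a Seq Scan, just with a large
    -- constant added to its cost: the scan is discouraged, never forbidden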

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Luke Lonergan
Hi Peter, If you run into a scaling issue with PG (you will at those scales, 1TB+), you can deploy Greenplum DB, which is PG 8.2.5 compatible. A large internet company (look for press soon) is in production with a 150TB database on a system capable of doing 400TB and we have others in production

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Csaba Nagy
Isn't that what statement_timeout is for? Since this is entirely based on estimates, using arbitrary fuzzy numbers for this seems fine to me; precision isn't really the goal. There's an important difference from statement_timeout: this proposal would completely avoid taking any resources if it
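
For contrast, statement_timeout is an ordinary GUC that exists today; a minimal usage sketch:

    -- abort any statement in this session after 5 minutes (value in ms);
    -- unlike a planner-side cost limit, the query still consumes
    -- resources until the timeout fires
    SET statement_timeout = 300000;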

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Trevor Talbot
On 11/29/07, Gregory Stark [EMAIL PROTECTED] wrote: Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Tom's previous

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Simon Riggs
On Fri, 2007-11-30 at 17:41 +1100, Russell Smith wrote: Simon Riggs wrote: On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Tom's previous concerns were along the lines of How would you know what to set it to?, given that the planner costs are mostly arbitrary numbers. Hm, that's only kind of true. The units are not the problem. The
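
For context, planner costs are expressed in units of sequential page fetches (a cost of 1.0 is one sequential page read by definition); a rough sketch of the seq scan arithmetic under default settings:

    -- approximate cost of a plain seq scan:
    --   pages * seq_page_cost + tuples * cpu_tuple_cost
    -- e.g. 1,000,000 pages holding 100,000,000 tuples:
    --   1000000 * 1.0 + 100000000 * 0.01 = 2,000,000
    SHOW seq_page_cost;   -- 1.0 by default (settable in recent versions)
    SHOW cpu_tuple_cost;  -- 0.01 by default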

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Simon Riggs
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Tom's previous concerns were along the lines of How would you know what to set it to?, given that the planner costs are mostly arbitrary numbers. Hm, that's only

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote: On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take. I think you have a point, but the alternative is often

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Csaba Nagy
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take. You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Mark Kirkwood
Simon Riggs wrote: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Russell Smith
Simon Riggs wrote: On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses.

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Matthew
On Tue, 27 Nov 2007, Pablo Alcaraz wrote: it would be nice to do something with selects so we can recover a rowset on huge tables using criteria with indexes without falling back to running a full scan. You mean: Be able to tell Postgres Don't ever do a sequential scan of this table. It's silly. I

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Bill Moran
In response to Matthew [EMAIL PROTECTED]: On Tue, 27 Nov 2007, Pablo Alcaraz wrote: it would be nice to do something with selects so we can recover a rowset on huge tables using criteria with indexes without falling back to running a full scan. You mean: Be able to tell Postgres Don't ever do a

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Csaba Nagy
On Wed, 2007-11-28 at 08:27 -0500, Bill Moran wrote: Is there something wrong with: set enable_seqscan = off ? Nothing wrong with enable_seqscan = off except it is an all-or-nothing type of thing... if you want the big table to never use seqscan, but a medium table which is joined in should use
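
One standard way to narrow the scope, though still per-transaction rather than per-table (which is Csaba's point): SET LOCAL reverts automatically at commit, so other queries keep the default. The query shown is hypothetical.

    BEGIN;
    SET LOCAL enable_seqscan = off;  -- applies only inside this transaction
    SELECT id FROM big_table WHERE id = 42;  -- hypothetical query
    COMMIT;  -- enable_seqscan reverts to its previous value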

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Bill Moran
In response to Gregory Stark [EMAIL PROTECTED]: Bill Moran [EMAIL PROTECTED] writes: In response to Matthew [EMAIL PROTECTED]: On Tue, 27 Nov 2007, Pablo Alcaraz wrote: it would be nice to do something with selects so we can recover a rowset on huge tables using criteria with

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Csaba Nagy
On Wed, 2007-11-28 at 08:54 -0500, Bill Moran wrote: Nothing wrong with enable_seqscan = off except it is all or nothing type of thing... If that's true, then I have a bug report to file: [snip] It looks to me to be session-alterable. I didn't mean that it can't be set per session, I

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Pablo Alcaraz
Matthew wrote: On Tue, 27 Nov 2007, Pablo Alcaraz wrote: it would be nice to do something with selects so we can recover a rowset on huge tables using criteria with indexes without falling back to running a full scan. You mean: Be able to tell Postgres Don't ever do a sequential scan of this

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Matthew
On Wed, 28 Nov 2007, Gregory Stark wrote: Is there something wrong with: set enable_seqscan = off ? This does kind of the opposite of what you would actually want here. What you want is that if you give it a query which would be best satisfied by a sequential scan it should throw an

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Pablo Alcaraz
Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Alvaro Herrera
Pablo Alcaraz wrote: In my opinion there are queries that will need to be tuned for huge databases (huge databases = a database whose relevant tables (indexes) are, or will be, far bigger than all the RAM available): -- example table CREATE TABLE homes ( id bigserial,
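
The CREATE TABLE above is cut off by the archive; a purely hypothetical completion, just to make the shape of the example concrete (the real column list from the thread is not preserved here):

    -- hypothetical reconstruction, not Pablo's actual definition
    CREATE TABLE homes (
        id       bigserial PRIMARY KEY,
        name     text,
        location text
    );
    CREATE INDEX homes_name_idx ON homes (name);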

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Simon Riggs
On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006 Would people like me to
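
As proposed (and only proposed: no such GUC exists in core PostgreSQL), usage would presumably look something like:

    -- hypothetical: refuse to run any plan whose estimated cost
    -- exceeds the limit, raising an error before execution starts
    SET statement_cost_limit = 100000000;
    SELECT * FROM big_table;  -- would error out instead of running
                              -- if its plan cost exceeds the limit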

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Simon Riggs
On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Matthew
On Wed, 28 Nov 2007, Simon Riggs wrote: statement_cost_limit on pgsql-hackers, 1 March 2006 Would people like me to re-write and resubmit this patch for 8.4? Yes please. The more options, the better. Tom's previous concerns were along the lines of How would you know what to set it to?, given

Re: [PERFORM] TB-sized databases

2007-11-28 Thread david
On Wed, 28 Nov 2007, Simon Riggs wrote: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on

Re: [PERFORM] TB-sized databases

2007-11-27 Thread Peter Koczan
Thanks all. This is just what I needed. On Nov 26, 2007 1:16 PM, Stephen Cook [EMAIL PROTECTED] wrote: I think either would work; both PostgreSQL and MS SQL Server have success stories out there running VLDBs. It really depends on what you know and what you have. If you have a lot of

Re: [PERFORM] TB-sized databases

2007-11-27 Thread Simon Riggs
On Tue, 2007-11-27 at 14:18 -0600, Peter Koczan wrote: Thanks all. This is just what I needed. All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The

Re: [PERFORM] TB-sized databases

2007-11-27 Thread Pablo Alcaraz
Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance critical. So it all

[PERFORM] TB-sized databases

2007-11-26 Thread Peter Koczan
Hi all, I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize them (mostly hardware

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Joshua D. Drake
Peter Koczan wrote: Hi all, I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Oleg Bartunov
We have a several-TB database in production and it works well on HP rx1620 dual Itanium2, MSA 20, running Linux. It's read-only storage for astronomical catalogs with about 4 billion objects. We have a custom index for spherical coordinates which provides great performance. Oleg On Mon, 26 Nov 2007,

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Pablo Alcaraz
I had a client that tried to use MS SQL Server to run a 500GB+ database. The database simply collapsed. They switched to Teradata and it is running well. That database is now 1.5TB+. Currently I have clients running huge PostgreSQL databases and they are happy. In one client's database the

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Stephen Cook
I think either would work; both PostgreSQL and MS SQL Server have success stories out there running VLDBs. It really depends on what you know and what you have. If you have a lot of experience with Postgres running on Linux, and not much with SQL Server on Windows, of course the former would