Re: [PERFORM] TB-sized databases

2008-03-18 Thread Ron Mayer
Ron Mayer wrote: Tom Lane wrote: Ron Mayer <[EMAIL PROTECTED]> writes: Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part

Re: [PERFORM] TB-sized databases

2008-03-18 Thread Ron Mayer
Tom Lane wrote: Ron Mayer <[EMAIL PROTECTED]> writes: Would another possible condition for considering Cartesian joins be: * Consider Cartesian joins when a unique constraint can prove that at most one row will be pulled from one of the tables that would be part of this join?

Re: [PERFORM] TB-sized databases

2008-03-18 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes: > Would another possible condition for considering > Cartesian joins be: > * Consider Cartesian joins when a unique constraint can prove > that at most one row will be pulled from one of the tables > that would be part of this join? What for?

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Ron Mayer
Tom Lane wrote: Bruce Momjian <[EMAIL PROTECTED]> writes: I have _not_ added a TODO for this item. Let me know if one is needed. Please do, I think it's an open issue. * Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation Woul

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Pablo Alcaraz
Anything that helps pgsql perform well in a TB-sized database environment is a Good Thing (r) :D Pablo Bruce Momjian wrote: I have _not_ added a TODO for this item. Let me know if one is needed.

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > I have _not_ added a TODO for this item. Let me know if one is needed. > > Please do, I think it's an open issue. > > * Consider Cartesian joins when both relations are needed to form an > indexscan qualification for a third relat

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > I have _not_ added a TODO for this item. Let me know if one is needed. Please do, I think it's an open issue. * Consider Cartesian joins when both relations are needed to form an indexscan qualification for a third relation r
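
A sketch of the plan shape that TODO describes, using hypothetical table names; neither small table alone supplies a full qualification for the big table's composite index, but their Cartesian product does:

    -- Hypothetical schema: fact carries a composite index on (a_id, b_id).
    CREATE TABLE dim_a (id int PRIMARY KEY, name text);
    CREATE TABLE dim_b (id int PRIMARY KEY, name text);
    CREATE TABLE fact  (a_id int, b_id int, payload text);
    CREATE INDEX fact_ab ON fact (a_id, b_id);

    -- There is no join clause between dim_a and dim_b, so joining them
    -- first is a Cartesian product; but each resulting (a.id, b.id) pair
    -- then drives one probe of fact_ab instead of a full scan of fact.
    SELECT f.payload
    FROM dim_a a, dim_b b, fact f
    WHERE a.name = 'x'
      AND b.name = 'y'
      AND f.a_id = a.id
      AND f.b_id = b.id;

Ron's refinement upthread would have the planner consider this only when a unique constraint proves that one side of the Cartesian product contributes at most one row.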

Re: [PERFORM] TB-sized databases

2008-03-17 Thread Bruce Momjian
I have _not_ added a TODO for this item. Let me know if one is needed. --- Tom Lane wrote: > Ron Mayer <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> There's something fishy about this --- given that that plan has a l

Re: [PERFORM] TB-sized databases

2007-12-11 Thread Simon Riggs
On Fri, 2007-12-07 at 12:45 -0500, Robert Treat wrote: > On Thursday 06 December 2007 04:38, Simon Riggs wrote: > > > I think you're completely overlooking the effect disk latency has on > > > query times. We run queries all the time that can vary from 4 hours to > > > 12 hours in time based so

Re: [PERFORM] TB-sized databases

2007-12-07 Thread Ron Mayer
Tom Lane wrote: > Ron Mayer <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> There's something fishy about this --- given that that plan has a lower >>> cost estimate, it should've picked it without any artificial >>> constraints. One final thing I find curious about this is that the estimated n

Re: [PERFORM] TB-sized databases

2007-12-07 Thread Robert Treat
On Thursday 06 December 2007 04:38, Simon Riggs wrote: > Robert, > > On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote: > > If the whole performance of your system depends upon indexed access, then > > maybe you need a database that gives you a way to force index access at > > the query level?

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Ron Mayer
Tom Lane wrote: > Ron Mayer <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> ...given that that plan has a lower cost estimate, it >>> should've picked it without any artificial constraints. > >> I think the reason it's not picking it was discussed back... >> http://archives.postgresql.org/pgsql-

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Ron Mayer
Tom Lane wrote: > Ron Mayer <[EMAIL PROTECTED]> writes: > >> Also shown below it seems that if I use "OFFSET 0" as a "hint" >> I can force a much (10x) better plan. I wonder if there's room for >> a pgfoundry project for a patch set that lets us use more hints >> than OFFSET 0. >> > There's someth
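
The "OFFSET 0" trick works because the planner will not flatten a subquery containing OFFSET into the outer query, so it acts as an optimization fence. A minimal sketch with invented table names:

    -- OFFSET 0 returns every row unchanged, but pins the subquery's plan:
    -- the planner optimizes it independently of the outer join.
    SELECT *
    FROM (SELECT * FROM big_table WHERE status = 'open' OFFSET 0) sub
    JOIN other_table o ON o.id = sub.other_id;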

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> There's something fishy about this --- given that that plan has a lower >> cost estimate, it should've picked it without any artificial >> constraints. > I think the reason it's not picking it was discussed back in this thread > too. > htt

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
I wrote: > Hmm. IIRC, there are smarts in there about whether a mergejoin can > terminate early because of disparate ranges of the two join variables. > Seems like it should be straightforward to fix it to also consider > whether the time-to-return-first-row will be bloated because of > disparate

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Ron Mayer
Tom Lane wrote: > Michael Stone <[EMAIL PROTECTED]> writes: >> OTOH, the planner can really screw up queries on really large databases. >> ... I've got some queries that the >> planner thinks will return on the order of 10^30 rows for that sort of >> reason. In practice, the query may return 10^

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Michael Stone
On Thu, Dec 06, 2007 at 11:13:18AM -0500, Tom Lane wrote: Indeed, and if you've got examples where it's that far off, you should report them. Yeah, the trick is to get it to a digestible test case. The basic scenario (there are more tables & columns in the actual case) is a set of tables part

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Matthew <[EMAIL PROTECTED]> writes: > On Thu, 6 Dec 2007, Tom Lane wrote: >> Hmm. IIRC, there are smarts in there about whether a mergejoin can >> terminate early because of disparate ranges of the two join variables. > Very cool. Would that be a planner cost estimate fix (so it avoids the > merg

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Matthew
On Thu, 6 Dec 2007, Tom Lane wrote: > Matthew <[EMAIL PROTECTED]> writes: > > ... For this query, Postgres would perform a nested loop, > > iterating over all rows in the small table, and doing a hundred index > > lookups in the big table. This completed very quickly. However, adding the > > LIMIT
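
A hedged reconstruction of the query Matthew describes (table and column names invented; the real case had more columns):

    -- small: ~100 rows over a narrow range of integer keys.
    -- big:   millions of rows over a much wider key range.
    SELECT *
    FROM small s
    JOIN big   b ON b.key = s.key
    LIMIT 20;

    -- Without LIMIT: nested loop, one index probe into big per small row.
    -- With LIMIT: the merge join's low startup-cost fraction looks cheap,
    -- but the merge must read far into big's sorted keys before the first
    -- match, so the first row arrives much later than the estimate implies.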

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Matthew <[EMAIL PROTECTED]> writes: > ... For this query, Postgres would perform a nested loop, > iterating over all rows in the small table, and doing a hundred index > lookups in the big table. This completed very quickly. However, adding the > LIMIT meant that suddenly a merge join was very attr

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Matthew
On Thu, 6 Dec 2007, Tom Lane wrote: > Indeed, and if you've got examples where it's that far off, you should > report them. Oo, oo, I have one! So, this query bit us a while back. We had two tables being joined together in a query by a key column. The key column was an integer, and for the first

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Tom Lane
Michael Stone <[EMAIL PROTECTED]> writes: > OTOH, the planner can really screw up queries on really large databases. > IIRC, the planner can use things like unique constraints to get some > idea, e.g., of how many rows will result from a join. Unfortunately, > the planner can't apply those techn

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Michael Stone
On Thu, Dec 06, 2007 at 09:38:16AM +, Simon Riggs wrote: The issue is that if somebody issues a "large query" then it will be a problem whichever plan the query takes. Forcing index scans can make a plan more expensive than a seq scan in many cases. OTOH, the planner can really screw up que

Re: [PERFORM] TB-sized databases

2007-12-06 Thread Simon Riggs
Robert, On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote: > If the whole performance of your system depends upon indexed access, then > maybe you need a database that gives you a way to force index access at the > query level? That sounds like a request for hints, which is OT here, ISTM.

Re: [PERFORM] TB-sized databases

2007-12-05 Thread Decibel!
On Nov 28, 2007, at 7:27 AM, Bill Moran wrote: Is there something wrong with: set enable_seqscan = off Note that in cases of very heavy skew, that won't work. It only adds 10M to the cost estimate for a seqscan, and it's definitely possible to have an index scan that looks even more expens
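
That behavior is easy to confirm with EXPLAIN (schema hypothetical): the setting adds a large constant to every seqscan's estimated cost rather than forbidding seqscans outright.

    SET enable_seqscan = off;   -- session-scoped discouragement, not a ban
    EXPLAIN SELECT * FROM big_table WHERE unindexed_col = 42;
    -- The plan can still show a Seq Scan: with no usable index path, or an
    -- index path that is even more expensive, the penalized seqscan wins anyway.
    RESET enable_seqscan;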

Re: [PERFORM] TB-sized databases

2007-12-05 Thread Robert Treat
On Thursday 29 November 2007 11:14, Simon Riggs wrote: > On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > > Gregory Stark <[EMAIL PROTECTED]> writes: > > > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > >> Tom's previous concerns were along the lines of "How would you know what > > >> to set it to?"

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Luke Lonergan
Hi Peter, If you run into a scaling issue with PG (you will at those scales, 1TB+), you can deploy Greenplum DB, which is PG 8.2.5 compatible. A large internet company (look for press soon) is in production with a 150TB database on a system capable of doing 400TB and we have others in production at

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Csaba Nagy
> Isn't that what statement_timeout is for? Since this is entirely based > on estimates, using arbitrary fuzzy numbers for this seems fine to me; > precision isn't really the goal. There's an important difference to statement_timeout: this proposal would avoid completely taking any resources if it
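
For contrast, the existing knob under discussion; it cancels a statement only after it has already been running (and consuming I/O) for the full window, whereas a plan-time cost check would reject it up front:

    SET statement_timeout = '5min';  -- session-scoped
    -- Any statement still running after five minutes is cancelled with an
    -- error; until then it holds locks and consumes resources normally.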

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Trevor Talbot
On 11/29/07, Gregory Stark <[EMAIL PROTECTED]> wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > >> In fact an even more useful option would be to ask the planner to throw > >> error if the expected cost exceeds a certain threshold... >

Re: [PERFORM] TB-sized databases

2007-11-30 Thread Simon Riggs
On Fri, 2007-11-30 at 17:41 +1100, Russell Smith wrote: > Simon Riggs wrote: > > On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: > > > >> Simon Riggs wrote: > >> > >>> All of those responses have cooked up quite a few topics into one. Large > >>> databases might mean text warehouses

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Russell Smith
Simon Riggs wrote: On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses.

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Mark Kirkwood
Simon Riggs wrote: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Csaba Nagy
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > Given that this list spends all day every day discussing cases where the > planner is wrong, I'd have to think that that's a bet I wouldn't take. > > You could probably avoid this risk by setting the cutoff at something > like 100 or 1000 times

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Stephen Frost
* Simon Riggs ([EMAIL PROTECTED]) wrote: > On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > > Given that this list spends all day every day discussing cases where the > > planner is wrong, I'd have to think that that's a bet I wouldn't take. > > I think you have a point, but the alternative is

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Simon Riggs
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Simon Riggs" <[EMAIL PROTECTED]> writes: > >> Tom's previous concerns were along the lines of "How would you know what to > >> set it to?", given that the planner costs are mostly arbitrary numbers. > >

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > "Simon Riggs" <[EMAIL PROTECTED]> writes: >> Tom's previous concerns were along the lines of "How would you know what to >> set it to?", given that the planner costs are mostly arbitrary numbers. > Hm, that's only kind of true. The units are not the problem

Re: [PERFORM] TB-sized databases

2007-11-29 Thread Gregory Stark
"Simon Riggs" <[EMAIL PROTECTED]> writes: > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > >> In fact an even more useful option would be to ask the planner to throw >> error if the expected cost exceeds a certain threshold... > > Well, I've suggested it before: > > statement_cost_limit o

Re: [PERFORM] TB-sized databases

2007-11-28 Thread david
On Wed, 28 Nov 2007, Simon Riggs wrote: On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: In fact an even more useful option would be to ask the planner to throw error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Matthew
On Wed, 28 Nov 2007, Simon Riggs wrote: > statement_cost_limit on pgsql-hackers, 1 March 2006 > > Would people like me to re-write and resubmit this patch for 8.4? Yes please. The more options, the better. > Tom's previous concerns were along the lines of "How would you know what to > set it to?", given that the planner costs are mostly arbitrary numbers.

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Simon Riggs
On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: > Simon Riggs wrote: > > All of those responses have cooked up quite a few topics into one. Large > > databases might mean text warehouses, XML message stores, relational > > archives and fact-based business data warehouses. > > > > The main t

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Simon Riggs
On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > In fact an even more useful option would be to ask the planner to throw > error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006 Would people like me to re-write and resubmit this patch for 8.4?
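
For reference, a sketch of the proposal's intended use. The patch was not committed, so statement_cost_limit is a hypothetical parameter here, not a stock PostgreSQL GUC:

    -- Hypothetical: refuse to run any plan whose total estimated cost
    -- exceeds the limit, expressed in the same arbitrary units that
    -- EXPLAIN reports.
    SET statement_cost_limit = 100000000;
    SELECT * FROM huge_a CROSS JOIN huge_b;  -- would error out at plan time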

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Alvaro Herrera
Pablo Alcaraz wrote: > In my opinion there are queries that will need to be tuned for > "huge databases" (huge databases = a database whose relevant > tables (indexes) are, or will be, far bigger than all the RAM available): > > -- example table > CREATE TABLE homes ( > id bigserial

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Pablo Alcaraz
Pablo Alcaraz wrote: Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance criti

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Matthew
On Wed, 28 Nov 2007, Gregory Stark wrote: > > Is there something wrong with: > > set enable_seqscan = off > > ? > > This does kind of the opposite of what you would actually want here. What you > want is that if you give it a query which would be best satisfied by a > sequential scan it should throw

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Pablo Alcaraz
Matthew wrote: On Tue, 27 Nov 2007, Pablo Alcaraz wrote: it would be nice to do something with selects so we can recover a rowset on huge tables using criteria with indexes without falling back to running a full scan. You mean: Be able to tell Postgres "Don't ever do a sequential scan of this t

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Csaba Nagy
On Wed, 2007-11-28 at 08:54 -0500, Bill Moran wrote: > > Nothing wrong with enable_seqscan = off except it is all or nothing type > > of thing... > > If that's true, then I have a bug report to file: [snip] > It looks to me to be session-alterable. I didn't mean that it can't be set per session,

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Bill Moran
In response to Gregory Stark <[EMAIL PROTECTED]>: > "Bill Moran" <[EMAIL PROTECTED]> writes: > > > In response to Matthew <[EMAIL PROTECTED]>: > > > >> On Tue, 27 Nov 2007, Pablo Alcaraz wrote: > >> > it would be nice to do something with selects so we can recover a rowset > >> > on huge tables u

Re: [PERFORM] TB-sized databases

2007-11-28 Thread david
On Wed, 28 Nov 2007, Csaba Nagy wrote: On Wed, 2007-11-28 at 08:27 -0500, Bill Moran wrote: Is there something wrong with: set enable_seqscan = off ? Nothing wrong with enable_seqscan = off except it is all or nothing type of thing... if you want the big table to never use seqscan, but a medium table which is joined in should use

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Gregory Stark
"Bill Moran" <[EMAIL PROTECTED]> writes: > In response to Matthew <[EMAIL PROTECTED]>: > >> On Tue, 27 Nov 2007, Pablo Alcaraz wrote: >> > it would be nice to do something with selects so we can recover a rowset >> > on huge tables using a criteria with indexes without fall running a full >> > sca

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Bill Moran
In response to Csaba Nagy <[EMAIL PROTECTED]>: > On Wed, 2007-11-28 at 08:27 -0500, Bill Moran wrote: > > Is there something wrong with: > > set enable_seqscan = off > > ? > > Nothing wrong with enable_seqscan = off except it is all or nothing type > of thing... If that's true, then I have a bug report to file:

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Csaba Nagy
On Wed, 2007-11-28 at 08:27 -0500, Bill Moran wrote: > Is there something wrong with: > set enable_seqscan = off > ? Nothing wrong with enable_seqscan = off except it is all or nothing type of thing... if you want the big table to never use seqscan, but a medium table which is joined in should use
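
There is no per-table switch, but the setting can at least be narrowed from the whole session to a single transaction:

    BEGIN;
    SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
    SELECT * FROM big_table WHERE id = 12345;
    COMMIT;
    -- Within the transaction every table, big or medium, is still affected
    -- alike; other queries in the session keep the default behavior.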

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Bill Moran
In response to Matthew <[EMAIL PROTECTED]>: > On Tue, 27 Nov 2007, Pablo Alcaraz wrote: > > it would be nice to do something with selects so we can recover a rowset > > on huge tables using criteria with indexes without falling back to running a full > > scan. > > You mean: Be able to tell Postgres "Don't

Re: [PERFORM] TB-sized databases

2007-11-28 Thread Matthew
On Tue, 27 Nov 2007, Pablo Alcaraz wrote: > it would be nice to do something with selects so we can recover a rowset > on huge tables using criteria with indexes without falling back to running a full > scan. You mean: Be able to tell Postgres "Don't ever do a sequential scan of this table. It's silly. I w

Re: [PERFORM] TB-sized databases

2007-11-27 Thread Pablo Alcaraz
Simon Riggs wrote: All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance critical. So it all depends

Re: [PERFORM] TB-sized databases

2007-11-27 Thread Simon Riggs
On Tue, 2007-11-27 at 14:18 -0600, Peter Koczan wrote: > Thanks all. This is just what I needed. All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance critical.

Re: [PERFORM] TB-sized databases

2007-11-27 Thread Peter Koczan
Thanks all. This is just what I needed. On Nov 26, 2007 1:16 PM, Stephen Cook <[EMAIL PROTECTED]> wrote: > I think either would work; both PostgreSQL and MS SQL Server have > success stories out there running VLDBs. It really depends on what you > know and what you have. If you have a lot of exp

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Stephen Cook
I think either would work; both PostgreSQL and MS SQL Server have success stories out there running VLDBs. It really depends on what you know and what you have. If you have a lot of experience with Postgres running on Linux, and not much with SQL Server on Windows, of course the former would

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Pablo Alcaraz
I had a client that tried to use MS SQL Server to run a 500GB+ database. The database simply collapsed. They switched to Teradata and it is running well. This database now has 1.5TB+. Currently I have clients using huge PostgreSQL databases and they are happy. In one client's database the biggest

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Oleg Bartunov
We have a several-TB database in production and it works well on HP rx1620 dual Itanium2, MSA 20, running Linux. It's read-only storage for astronomical catalogs with about 4 billion objects. We have a custom index for spherical coordinates which provides great performance. Oleg On Mon, 26 Nov 2007,

Re: [PERFORM] TB-sized databases

2007-11-26 Thread Joshua D. Drake
Peter Koczan wrote: Hi all, I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize the

[PERFORM] TB-sized databases

2007-11-26 Thread Peter Koczan
Hi all, I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize them (mostly hardware and