Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Mischa Sandberg
Quoting Andrew Dunstan <[EMAIL PROTECTED]>: > After some more experimentation, I'm wondering about some sort of > adaptive algorithm, a bit along the lines suggested by Marko Ristola, but limited to 2 rounds. > > The idea would be that we take a sample (either of fixed size, or > some sm

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-26 Thread Kevin Brown
Josh Berkus wrote: > Jim, Kevin, > > > > Hrm... I was about to suggest that for timing just the query (and not > > > output/data transfer time) using explain analyze, but then I remembered > > > that explain analyze can incur some non-trivial overhead with the timing > > > calls. Is there a way to

Re: [PERFORM] Table Partitioning: Will it be supported in Future?

2005-04-26 Thread Mike Rylander
On 4/26/05, Mohan, Ross <[EMAIL PROTECTED]> wrote: > Maybe he needs to spend $7K on performance improvements? > > ;-) > AAARRRGGG! I will forever hate the number 7,000 from this day forth! Seriously, though, I've never seen a thread on any list wander on so aiml

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Dave Held
> -Original Message- > From: Gurmeet Manku [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 26, 2005 5:01 PM > To: Simon Riggs > Cc: Tom Lane; josh@agliodbs.com; Greg Stark; Marko Ristola; > pgsql-perform; pgsql-hackers@postgresql.org; Utkarsh Srivastava; > [EMAIL PROTECTED] > Subject: Re:

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Andrew Dunstan
Simon Riggs wrote: The comment * Every value in the sample appeared more than once. Assume * the column has just these values. doesn't seem to apply when using larger samples, as Josh is using. Looking at Josh's application it does seem likely that when taking a sample, all site

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Andrew Dunstan
Tom Lane wrote: Josh Berkus writes: Overall, our formula is inherently conservative of n_distinct. That is, I believe that it is actually computing the *smallest* number of distinct values which would reasonably produce the given sample, rather than the *median* one. This is contrary to

Re: [PERFORM] What needs to be done for real Partitioning?

2005-04-26 Thread Roger Hand
On March 21, 2005 8:07 AM, Hannu Krosing wrote: > On L, 2005-03-19 at 23:47 -0500, Tom Lane wrote: > > Well, partitioning on the primary key would be Good Enough for 95% or > > 99% of the real problems out there. I'm not excited about adding a > > large chunk of complexity to cover another few per

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Andrew Dunstan
Josh Berkus wrote: Simon, Tom: While it's not possible to get accurate estimates from a fixed size sample, I think it would be possible from a small but scalable sample: say, 0.1% of all data pages on large tables, up to the limit of maintenance_work_mem. Setting up these samples as a % of da

Re: [PERFORM] speed up query with max() and odd estimates

2005-04-26 Thread Matthew Nuzum
On 4/26/05, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote: > On Tue, Apr 26, 2005 at 03:16:57PM -0500, Matthew Nuzum wrote: > > Seq Scan on usage_access (cost=0.00..1183396.40 rows=12713851 > > width=116) (actual time=481796.22..481839.43 rows=3343 loops=1) > > That's a gross misestimation -- f

Re: [PERFORM] speed up query with max() and odd estimates

2005-04-26 Thread John A Meinel
Matthew Nuzum wrote: I have this query that takes a little over 8 min to run: select client,max(atime) as atime from usage_access where atime >= (select atime - '1 hour'::interval from usage_access order by atime desc limit 1) group by client; I think it can go a lot faster. Any suggestions on impr

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Gurmeet Manku
Hi everybody! Perhaps the following papers are relevant to the discussion here (their contact authors have been cc'd): 1. The following proposes effective algorithms for using block-level sampling for n_distinct estimation: "Effective use of block-level sampling in statistics estimat

Re: [PERFORM] speed up query with max() and odd estimates

2005-04-26 Thread Matthew Nuzum
On 4/26/05, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote: > On Tue, Apr 26, 2005 at 03:16:57PM -0500, Matthew Nuzum wrote: > > Seq Scan on usage_access (cost=0.00..1183396.40 rows=12713851 > > width=116) (actual time=481796.22..481839.43 rows=3343 loops=1) > > That's a gross misestimation -- f

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Simon Riggs
On Mon, 2005-04-25 at 17:10 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Mon, 2005-04-25 at 11:23 -0400, Tom Lane wrote: > >> It's not just the scan --- you also have to sort, or something like > >> that, if you want to count distinct values. I doubt anyone is really > >

Re: [PERFORM] speed up query with max() and odd estimates

2005-04-26 Thread Steinar H. Gunderson
On Tue, Apr 26, 2005 at 03:16:57PM -0500, Matthew Nuzum wrote: > Seq Scan on usage_access (cost=0.00..1183396.40 rows=12713851 > width=116) (actual time=481796.22..481839.43 rows=3343 loops=1) That's a gross misestimation -- four orders of magnitude off! Have you considering doing this in two s

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Josh Berkus
Simon, > Could it be that we have overlooked this simple explanation and that the > Haas and Stokes equation is actually quite good, but just not being > applied? That's probably part of it, but I've tried Haas and Stokes on a pure random sample and it's still bad, or more specifically overly co

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Simon Riggs
On Sun, 2005-04-24 at 00:48 -0400, Tom Lane wrote: > Josh Berkus writes: > > Overall, our formula is inherently conservative of n_distinct. That is, I > > believe that it is actually computing the *smallest* number of distinct > > values which would reasonably produce the given sample, rather

[PERFORM] speed up query with max() and odd estimates

2005-04-26 Thread Matthew Nuzum
I have this query that takes a little over 8 min to run: select client,max(atime) as atime from usage_access where atime >= (select atime - '1 hour'::interval from usage_access order by atime desc limit 1) group by client; I think it can go a lot faster. Any suggestions on improving this? DB is 7.

Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-26 Thread Steve Poe
Tom, Honestly, you've got me. It was either comment from Tom Lane or Josh that the os is caching the results (I may not be using the right terms here), so I thought it the database is dropped and recreated, I would see less of a skew (or variation) in the results. Someone which to comment? Stev

Re: [PERFORM] Table Partitioning: Will it be supported in Future?

2005-04-26 Thread Mohan, Ross
Maybe he needs to spend $7K on performance improvements? ;-) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Josh Berkus Sent: Tuesday, April 26, 2005 8:00 PM To: Richard Huxton Cc: [EMAIL PROTECTED]; pgsql-performance@postgresql.org Subject: Re: [PERF

Re: [PERFORM] Table Partitioning: Will it be supported in Future?

2005-04-26 Thread Josh Berkus
Richard, > I believe these are being worked on at the moment. You might want to > search the archives of the hackers mailing list to see if the plans will > suit your needs. Actually, this is being discussed through the Bizgres project: www.bizgres.org. However, I agree that a 1GB table is not

Re: [PERFORM] Table Partitioning: Will it be supported in Future?

2005-04-26 Thread Andreas Pflug
[EMAIL PROTECTED] wrote: Hmm, I have asked some Peoples on the List an some one has posted this links http://archives.postgresql.org/pgsql-performance/2004-12/msg00101.php It is quite usefull to read but iam not sure thadt theese Trick is verry helpfull. I want to splitt my 1GByte Table into some

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-26 Thread John A Meinel
Shoaib Burq (VPAC) wrote: OK ... so just to clearify... (and pardon my ignorance): I need to increase the value of 'default_statistics_target' variable and then run VACUUM ANALYZE, right? If so what should I choose for the 'default_statistics_target'? BTW I only don't do any sub-selection on the V

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-26 Thread Dave Held
> -Original Message- > From: Shoaib Burq (VPAC) [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 26, 2005 9:31 AM > To: Tom Lane > Cc: John A Meinel; Russell Smith; Jeff; > pgsql-performance@postgresql.org > Subject: Re: [PERFORM] two queries and dual cpu (perplexed) > > > OK ... so just

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-26 Thread Shoaib Burq (VPAC)
OK ... so just to clearify... (and pardon my ignorance): I need to increase the value of 'default_statistics_target' variable and then run VACUUM ANALYZE, right? If so what should I choose for the 'default_statistics_target'? BTW I only don't do any sub-selection on the View. I have attached

Re: [PERFORM] Table Partitioning: Will it be supported in Future?

2005-04-26 Thread [EMAIL PROTECTED]
Hmm, I have asked some Peoples on the List an some one has posted this links http://archives.postgresql.org/pgsql-performance/2004-12/msg00101.php It is quite usefull to read but iam not sure thadt theese Trick is verry helpfull. I want to splitt my 1GByte Table into some little Partitions but ho

Re: [PERFORM] BLOB's bypassing the OS Filesystem for better Image

2005-04-26 Thread [EMAIL PROTECTED]
Which filesystems? I know ext2 used to have issues with many-thousands of files in one directory, but that was a directory scanning issue rather than file reading. From my Point of view i think it is better to let one Process do the operation to an Postgres Cluster Filestructure as if i bypass

Re: [PERFORM] BLOB's bypassing the OS Filesystem for better Image

2005-04-26 Thread Richard Huxton
[EMAIL PROTECTED] wrote: Hi all again, My next queststion is dedicated to blobs in my Webapplication (using Tomcat 5 and JDBC integrated a the J2EE Appserver JBoss). Filesystems with many Filesystem Objects can slow down the Performance at opening and reading Data. Which filesystems? I know ext

[PERFORM] BLOB's bypassing the OS Filesystem for better Image loading speed?

2005-04-26 Thread [EMAIL PROTECTED]
Hi all again, My next queststion is dedicated to blobs in my Webapplication (using Tomcat 5 and JDBC integrated a the J2EE Appserver JBoss). Filesystems with many Filesystem Objects can slow down the Performance at opening and reading Data. My Question: Can i speedup my Webapplication if i sto

Re: [PERFORM] Table Partitioning: Will it be supported in Future?

2005-04-26 Thread Richard Huxton
[EMAIL PROTECTED] wrote: Hi all, Ia a Guy from Germany an a strong Postgres believer! It is the best OpenSource Database i have ever have bee tasted and i try to using it in any Database Environments. It is exiting to see thadt Verison 8.0 has Tablespaces like ORACLE and DB/2, but i need Partit

[PERFORM] Table Partitioning: Will it be supported in Future? (splitting large Tables)

2005-04-26 Thread [EMAIL PROTECTED]
Hi all, Ia a Guy from Germany an a strong Postgres believer! It is the best OpenSource Database i have ever have bee tasted and i try to using it in any Database Environments. It is exiting to see thadt Verison 8.0 has Tablespaces like ORACLE and DB/2, but i need Partitioning on a few very large