Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-23 Thread Tom Lane
Josh Berkus writes: > Overall, our formula is inherently conservative of n_distinct. That is, I > believe that it is actually computing the *smallest* number of distinct > values which would reasonably produce the given sample, rather than the > *median* one. This is contrary to the notes in

Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-23 Thread Thomas F. O'Connell
Steve, Per your and Tom's recommendations, I significantly increased the number of transactions used for testing. See my last post. The database will have pretty heavy mixed use, i.e., both reads and writes. I performed 32 iterations per scenario this go-round. I'll look into OSDB for further b

Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-23 Thread Thomas F. O'Connell
Okay. I updated my benchmark page with new numbers, which are the result of extensive pgbench usage over this past week. In fact, I modified pgbench (for both of the latest versions of postgres) to be able to accept multiple iterations as an argument and report the results of each iteration as w
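
For context on what those iterations were exercising: pgbench's default TPC-B-like transaction in the 7.4/8.0 era ran roughly the script below against the unprefixed accounts/branches/tellers/history tables those versions create; the :aid, :bid, :tid, and :delta variables are random values pgbench substitutes for each transaction.

    BEGIN;
    -- pgbench picks a random account, teller, branch and delta per run
    UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid;
    SELECT abalance FROM accounts WHERE aid = :aid;
    UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    INSERT INTO history (tid, bid, aid, delta, mtime)
        VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
    END;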

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-23 Thread Andrew Dunstan
Josh Berkus said: > > > Well, unusual distributions are certainly tough. But I think the > problem exists even for relatively well-distributed populations. > Part of it is, I believe, the formula we are using: > > n*d / (n - f1 + f1*n/N) > [snip] > > This is so broken, in fact, that I'm wonderin
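
To make the formula concrete, here is a toy evaluation in plain SQL with invented inputs, where n is the number of rows sampled, d the distinct values seen in the sample, f1 the values seen exactly once, and total_rows stands in for N, the table's row count:

    -- Toy evaluation of n*d / (n - f1 + f1*n/N) with made-up inputs
    SELECT n * d / (n - f1 + f1 * n / total_rows) AS estimated_n_distinct
    FROM (SELECT 3000.0    AS n,          -- rows sampled
                 500.0     AS d,          -- distinct values in the sample
                 400.0     AS f1,         -- values seen exactly once
                 1000000.0 AS total_rows  -- N, rows in the table
         ) AS s;

With these inputs the estimate comes out around 577, barely above the 500 distinct values the sample itself contained, which illustrates the conservatism being complained about.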

Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-23 Thread Tom Lane
John A Meinel <[EMAIL PROTECTED]> writes: > Actually, you probably don't want enable_seqscan=off, you should try: > SET enable_nestloop TO off. > The problem is that it is estimating there will only be 44 rows, but in > reality there are 13M rows. It almost definitely should be doing a > seqscan wi
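
A sketch of the suggested experiment; the two-table join here is a hypothetical stand-in for the original query:

    -- Disable nested loops for this session only, so the planner falls
    -- back to a merge or hash join, then compare estimated vs. actual rows.
    SET enable_nestloop TO off;
    EXPLAIN ANALYZE
    SELECT *
    FROM big_table b
    JOIN other_table o ON o.id = b.other_id;
    -- Restore the default afterwards
    SET enable_nestloop TO on;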

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-23 Thread Josh Berkus
People, > Can someone whose math is more recent than calculus in 1989 take a look at > that paper, and look at the formula toward the bottom of page 10, and see > if we are correctly interpreting it? I'm particularly confused as to > what "q" and "d-sub-n" represent. Thanks! Actually, I manag

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-23 Thread Josh Berkus
Greg, > I looked into this a while back when we were talking about changing the > sampling method. The conclusions were discouraging. Fundamentally, using > constant sized samples of data for n_distinct is bogus. Constant sized > samples only work for things like the histograms that can be analyze
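
One way to quantify how far off the sampled estimate is (table and column names hypothetical): compare what ANALYZE stored against a full scan.

    -- The sample-based estimate ANALYZE stored; a negative n_distinct
    -- means a fraction of the row count rather than an absolute count.
    SELECT tablename, attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'mytable' AND attname = 'mycol';

    -- The true value, from a full scan (expensive on large tables)
    SELECT count(DISTINCT mycol) AS actual_n_distinct FROM mytable;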

[PERFORM] flattening the file might work for me; here is the analyze.

2005-04-23 Thread Joel Fradkin
"Index Scan using ix_tblviwauditcube_clientnum on tblviwauditcube  (cost=0.00..35895.75 rows=303982 width=708) (actual time=0.145..1320.432 rows=316490 loops=1)" "  Index Cond: ((clientnum)::text = 'MSI'::text)" "Total runtime: 1501.028 ms"   Joel Fradkin   Wazagua, Inc. 2520 Trai

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Tom Lane
Richard Plotkin <[EMAIL PROTECTED]> writes: > Thanks for your responses this morning. I did the select relname, and > it returned 0 rows. I do have one function that creates a temp table > and fills it within the same transaction. I'm pasting it below. > Perhaps the "ON COMMIT DROP" is causi

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Richard Plotkin
Hi Tom, Thanks for your responses this morning. I did the select relname, and it returned 0 rows. I do have one function that creates a temp table and fills it within the same transaction. I'm pasting it below. Perhaps the "ON COMMIT DROP" is causing problems, and I need to drop the table a
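
A minimal sketch of a function of the kind described, with invented names and columns; ON COMMIT DROP removes the temp table automatically when the enclosing transaction commits:

    CREATE OR REPLACE FUNCTION summarize_pages() RETURNS integer AS $$
    DECLARE
        n integer;
    BEGIN
        -- Created and dropped within the same transaction
        CREATE TEMP TABLE page_summary (
            page_id integer,
            hits    integer
        ) ON COMMIT DROP;

        INSERT INTO page_summary
            SELECT page_id, count(*) FROM page_hits GROUP BY page_id;

        SELECT INTO n count(*) FROM page_summary;
        RETURN n;
    END;
    $$ LANGUAGE plpgsql;

One known gotcha on 8.0: plpgsql caches plans per session, so a second call that re-creates the temp table can fail with a stale-OID error; issuing the statements via EXECUTE was the usual workaround.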

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-23 Thread Josh Berkus
Jim, Kevin, > > Hrm... I was about to suggest that for timing just the query (and not > > output/data transfer time) using explain analyze, but then I remembered > > that explain analyze can incur some non-trivial overhead with the timing > > calls. Is there a way to run the query but have psql ig
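
One way to get a wall-clock number from psql without EXPLAIN ANALYZE's per-node timing calls and without paying for terminal rendering (the query below is a placeholder): turn on \timing and discard the rows.

    \timing
    \o /dev/null
    SELECT * FROM mytable WHERE somecol = 42;  -- placeholder query
    \o
    \timing

The Time: line should still appear on the terminal rather than in the \o target, so it survives the redirection; note that server-to-client data transfer is still included in the measurement.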

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-23 Thread Christopher Browne
Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] ("Joel Fradkin") would write: > I am just testing the water so to speak; if it can't handle single > user tests then multiple user tests are kind of a waste of time. I would suggest that if multi-user functionality is needed, then starting

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Tom Lane
Richard Plotkin <[EMAIL PROTECTED]> writes:
> /usr/local/pgsql/data/base/17234/42791
> /usr/local/pgsql/data/base/17234/42791.1
> /usr/local/pgsql/data/base/17234/42791.2
> /usr/local/pgsql/data/base/17234/42791.3
> ...
Well, that is certainly a table or index of some kind. Go into database 17234
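
The lookup being suggested, spelled out:

    -- From any database: which database owns directory 17234?
    SELECT datname FROM pg_database WHERE oid = 17234;

    -- Then, connected to that database: which relation owns file 42791?
    -- relkind is 'r' for a table, 'i' for an index, 't' for toast.
    SELECT relname, relkind, relpages
    FROM pg_class
    WHERE relfilenode = 42791;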

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Richard Plotkin
/usr/local/pgsql/data/base/17234/42791
/usr/local/pgsql/data/base/17234/42791.1
/usr/local/pgsql/data/base/17234/42791.2
/usr/local/pgsql/data/base/17234/42791.3
/usr/local/pgsql/data/base/17234/42791.4
/usr/local/pgsql/data/base/17234/42791.5
/usr/local/pgsql/data/base/17234/42791.6
/usr/local/pgs

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Tom Lane
Richard Plotkin <[EMAIL PROTECTED]> writes: > I updated postgres to 8.0.2, am running vacuumdb -faz every 3 hours, > and 50 minutes after a vacuum the CPU usage still skyrocketed, and the > disk started filling. This time, there is only a single file that is > spanning multiple GB, but running
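
Since the file keeps growing, a quick way to watch which relation is ballooning; relpages is only refreshed by VACUUM and ANALYZE, so rerun this after each vacuum to see what changed:

    -- Ten largest relations by on-disk pages (8 kB each)
    SELECT relname, relkind, relpages, relpages / 128 AS approx_mb
    FROM pg_class
    ORDER BY relpages DESC
    LIMIT 10;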

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Richard Plotkin
I also forgot to mention, vacuumdb fails on the command line now with the following error: vacuumdb: could not connect to database smt: FATAL: sorry, too many clients already On Apr 23, 2005, at 9:57 AM, Richard Plotkin wrote: If anybody has additional advice on this problem, I would really, r
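
The connection failure itself can be diagnosed from any session that does get in, assuming the statistics collector is enabled (the default):

    -- One row per backend; compare against the configured ceiling
    SELECT count(*) AS connections FROM pg_stat_activity;
    SHOW max_connections;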

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-23 Thread Richard Plotkin
If anybody has additional advice on this problem, I would really, really appreciate it... I updated postgres to 8.0.2, am running vacuumdb -faz every 3 hours, and 50 minutes after a vacuum the CPU usage still skyrocketed, and the disk started filling. This time, there is only a single file tha

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-23 Thread brew
> I am just testing the water so to speak; if it can't handle single user > tests then multiple user tests are kind of a waste of time. At the risk of being even more pedantic, let me point out that if you are going to be running your application with multiple users the reverse is even more true,

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-23 Thread Joshua D. Drake
Joel Fradkin wrote: I would very, very strongly encourage you to run multi-user tests before deciding on mysql. Mysql is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over.

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-23 Thread Joel Fradkin
I would very, very strongly encourage you to run multi-user tests before deciding on mysql. Mysql is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over. I can't speak from e

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-23 Thread Greg Stark
Josh Berkus writes: > ... I think the problem is in our heuristic sampling code. I'm not the first > person to have this kind of a problem. Will be following up with tests ... I looked into this a while back when we were talking about changing the sampling method. The conclusions were discou