Hi Robert,

"There is no significant disk activity (read 0), one CPU is pegged, and
that process is consuming 218M Resident memory, 168M Shared (10% available
memory total). All reasonable, except for the fact it doesn't come back..."

Just to let you know, I've observed the identical phenomenon on my RHEL3-WS
server running PostgreSQL V7.3.4: one of the CPUs pegged at 100% (2-way
SMP with hyperthreading, so 4 apparent CPUs), virtually zero disk I/O
activity, high memory usage, etc.  I thought it might be due to a casting
problem in a JOIN's ON clause, but that did not turn out to be the case.  I
*have* recently observed that if I run a vacuum analyze on the entire
database, the amount of time spent in this looping state decreases greatly,
but it has *not* disappeared in all cases.
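
For reference, the statistics refresh itself is a one-liner; the per-table
variants below are just a sketch of a cheaper alternative for very large
tables (the table names are borrowed from Robert's plan, so substitute your
own):

    -- Rebuild planner statistics for every table in the current database:
    VACUUM ANALYZE;

    -- Or target only the largest tables if a full pass takes too long:
    VACUUM ANALYZE obs_v;
    VACUUM ANALYZE obs_i;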

Next week I hope to be able to run some directed tests with stats collection
turned on, to see if I can find out what's causing this to occur.
I'll post the results if I find anything significant.
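
In case anyone wants to try the same thing, these are the collector settings
I plan to flip -- names as I recall them from the 7.3/7.4 documentation, so
double-check against your own version:

    # postgresql.conf
    stats_start_collector = true
    stats_command_string = true    # current query visible in pg_stat_activity
    stats_block_level = true      # per-table block I/O counters
    stats_row_level = true        # per-table tuple counters

Then, while the runaway query spins, something along these lines:

    -- What is each backend doing right now?
    SELECT procpid, current_query FROM pg_stat_activity;

    -- Which tables are being hammered with sequential scans?
    SELECT relname, seq_scan, idx_scan
      FROM pg_stat_user_tables
     ORDER BY seq_scan DESC;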

--- Steve

Steven Rosenstein
Senior IT Architect/Specialist | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt

Robert Creager <[EMAIL PROTECTED]>
To: Tom Lane <[EMAIL PROTECTED]>
cc: PGPerformance <[EMAIL PROTECTED]>
Date: 09/28/2004 11:44
Subject: Re: [PERFORM] This query is still running after 10 hours...

When grilled further on (Tue, 28 Sep 2004 11:04:23 -0400),
Tom Lane <[EMAIL PROTECTED]> confessed:

> Robert Creager <[EMAIL PROTECTED]> writes:
> > Normally, this query takes from 5 minutes to 2 hours to run.  On this
> > update, it's been running for more than 10 hours.
> > ...
> >         ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
> >               ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
> >                     Hash Cond: ("outer".pair_id = "inner".pair_id)
> > ...
> It chose a nested loop here because it was only expecting 67 rows out of
> the next-lower join, and so it thought it would only need 67 repetitions
> of the index probe into obs_v_file_id_index.  I'm suspicious that that
> estimate was way low and so the nestloop is taking forever.  You might
> try "SET enable_nestloop = off" as a crude way of avoiding that trap.
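
For the archives: that setting is session-local and easy to undo.  A minimal
sketch, with the problem query left as a placeholder:

    SET enable_nestloop = off;   -- discourage nested-loop joins by
                                 -- inflating their estimated cost
    EXPLAIN SELECT ...;          -- re-plan the problem query and verify
                                 -- the nested loop is gone
    RESET enable_nestloop;       -- restore the default afterward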

I tried your suggestion.  It did generate a different plan (below), but the
estimate is still blown, and it still used a nested loop.  The query is
currently running (42 minutes so far).  For the query in question, there are
151 pair_id's in the pairs table, which equates to 302 entries in the files
table (part of the query), which moves on to 533592 entries in the obs_v
table and 533699 entries in the obs_i table.

The groups table has 76 total entries, files 9028, zero_pair 2532, 147.
Only the obs_v and obs_i tables have data of any significant size, with 10M
rows apiece.  The trigger hitting the catalog table (875499 entries) is
searching for single entries to match (one fire per obs_v/obs_i update) on
an index (took 54ms on the first query of a random id just now).
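
A quick way to sanity-check that single-row probe -- the column name star_id
is my assumption about the key, and 12345 stands in for any existing id:

    -- Time one trigger-style lookup against the catalog table:
    EXPLAIN ANALYZE
    SELECT * FROM catalog WHERE star_id = 12345;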

There is no significant disk activity (read 0), one CPU is pegged, and that
process is consuming 218M Resident memory, 168M Shared (10% available
memory total). All reasonable, except for the fact it doesn't come back...

Hash Join  (cost=100267870.17..100751247.13 rows=1578889 width=63)
  Hash Cond: ("outer".star_id = "inner".star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=100266886.39..100266886.39 rows=77113 width=59)
        ->  Hash Join  (cost=100000307.51..100266886.39 rows=77113 width=59)
              Hash Cond: ("outer".file_id = "inner".file_id)
              ->  Seq Scan on obs_v  (cost=0.00..213854.50 rows=10390650 width=51)
              ->  Hash  (cost=100000307.34..100000307.34 rows=67 width=12)
                    ->  Hash Join  (cost=100000250.69..100000307.34 rows=67 width=12)
                          Hash Cond: ("outer".pair_id = "inner".pair_id)
                          ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                          ->  Hash  (cost=100000250.40..100000250.40 rows=118 width=12)
                                ->  Hash Join  (cost=100000004.80..100000250.40 rows=118 width=12)
                                      Hash Cond: ("outer".night_id = "inner".night_id)
                                      ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                      ->  Hash  (cost=100000004.80..100000004.80 rows=1 width=8)
                                            ->  Nested Loop  (cost=100000000.00..100000004.80 rows=1 width=8)
                                                  ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                        Filter: (175 = group_id)
                                                  ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                        Filter: (group_id = 175)

 20:48:23 up 5 days, 23:14,  2 users,  load average: 2.56, 2.91, 2.78
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004
(See attached file: color.explain)
(See attached file: attlakjy.dat)
