[PERFORM] Alternative to a temporary table

2005-10-03 Thread Steven Rosenstein




I have a PHP web-based application where a temporary list of servers and
their characteristics (each represented by a unique numeric server_id) is
extracted from a master server list based on a number of dynamic and
user-selected criteria (can the user view the server, is it on-line, is it
a member of a specific group, etc).  Once the user selects the set of
criteria (and servers), it rarely changes during the rest of the browser
session.  The resulting temporary list of servers is then joined against
other tables with different sets of information about each of the servers,
based on the server_id.

I currently create a temporary table to hold the selected server_id's and
characteristics.  I then join this temp table with other data tables to
produce my reports.  My reason for using the temporary table method is that
the SQL for the initial server selection is dynamically created based on
the user's selections, and is complex enough that it does not lend itself
to being easily incorporated into any of the other data extraction queries
(which may also contain dynamic filtering).
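In rough outline, the current approach looks like this (a sketch only; the
table and column names and the WHERE clause are placeholders standing in
for the dynamically generated PHP SQL, not the real schema):

 -- Build the per-session selection once per screen (names hypothetical):
 CREATE TEMPORARY TABLE tmp_selected_servers AS
 SELECT s.server_id, s.name, s.status
 FROM master_server_list s
 WHERE s.viewable AND s.online AND s.group_id = 42;

 -- Each report then joins against it:
 SELECT d.*
 FROM tmp_selected_servers t
 JOIN some_data_table d ON d.server_id = t.server_id;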

Unfortunately, the PHP connection to the database does not survive from
webscreen to webscreen, so I have to re-generate the temporary server_id
table each time it is needed for a report screen.  An idea I had to make
this process more efficient: instead of re-creating the temporary table
each time it is needed, do a one-time extraction of the list of
user-selected server_id's, store the list in a PHP global variable, and
then use the list in a dynamically-created WHERE clause in the rest of the
queries.  The resulting query would look something like

 SELECT *
 FROM some_data_table
 WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)

Simple enough; however, in rare cases the list of server_id's can contain
between 6,000 and 10,000 values.

My question to the group is: can having so many values in a WHERE/IN clause
affect query performance?  Am I being naive about this, and is there a
different, better way?  The server_id field is of course indexed, but it is
possible that the list of selected sid's can contain almost all of the
values in the some_data_table server_id index (in the situation where _all_
of the records are requested I wouldn't use the WHERE clause in the query).
The some_data_table can contain millions of records for thousands of
servers, so every bit of efficiency helps.
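One direction I'm considering instead, only as a sketch (the session_key
handling and the hypothetical table and column names here are mine, not the
real schema), is to persist the selection once per browser session in a
small regular table and join against it, avoiding both the re-created temp
table and the giant IN list:

 CREATE TABLE session_server_list (
 session_key  varchar(64) not null,   -- PHP session id
 server_id    integer     not null,
 primary key (session_key, server_id)
 );

 SELECT d.*
 FROM some_data_table d
 JOIN session_server_list s ON s.server_id = d.server_id
 WHERE s.session_key = 'abc123';      -- bound from PHP

That would keep the per-report queries simple joins regardless of how many
servers were selected.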

If this is not the proper group for this kind of question, please point me
in the right direction.

Thanks!
--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt




[PERFORM] Unable to explain DB error

2005-07-28 Thread Steven Rosenstein




Postgres V7.3.9-2.

While executing a query in psql, the following error was generated:

vsa=# select * from vsa.dtbl_logged_event_20050318 where id=2689472;
PANIC:  open of /vsa/db/pg_clog/0FC0 failed: No such file or directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!#

I checked in the /vsa/db/pg_clog directory, and the files have monotonically
increasing filenames starting with 0000.  The most recent names are:

-rw---1 postgres postgres   262144 Jul 25 21:39 04CA
-rw---1 postgres postgres   262144 Jul 26 01:10 04CB
-rw---1 postgres postgres   262144 Jul 26 05:39 04CC
-rw---1 postgres postgres   262144 Jul 28 00:01 04CD
-rw---1 postgres postgres   237568 Jul 28 11:31 04CE

Any idea why Postgres would be looking for a clog file named 0FC0 when the
most recent filename is 04CE?

Any help and suggestions for recovery are appreciated.

--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt




[PERFORM] Questions about temporary tables and performance

2005-07-16 Thread Steven Rosenstein




Postgres Version:
7.3.9 and 8.0.1 (different sites use different versions depending on when
they first installed Postgres)

Migration Plans:
All sites on 8.n within the next 6-9 months.

Scenario:
A temporary table is created via a SELECT blah INTO TEMPORARY TABLE blah
FROM ...  The SELECT query is composed of a number of joins on small
(thousands of rows) parameter tables.  A view is not usable here because
the temporary table SELECT query is constructed on the fly in PHP with JOIN
parameters and WHERE filters that may change from main query set to main
query set.

After the table is created, the key main query JOIN parameter (device ID)
is indexed.  The resulting temporary table is at most 3000-4000 small (128
byte) records.

The temporary table is then joined in a series of SELECT queries to other
data tables in the database that contain information associated with the
records in the temporary table.  These secondary tables can have tens of
millions of records each.  After the queries are executed, the DB
connection is closed and the temporary table and index automatically
deleted.
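
For concreteness, the pattern is roughly the following (a sketch; the table
and column names are hypothetical placeholders, not the real schema):

 SELECT p1.device_id, p1.attr_a, p2.attr_b
 INTO TEMPORARY TABLE tmp_devices
 FROM param_table_1 p1
 JOIN param_table_2 p2 ON p2.device_id = p1.device_id
 WHERE p1.some_filter = 'x';          -- assembled on the fly in PHP

 CREATE INDEX tmp_devices_device_id ON tmp_devices (device_id);
 ANALYZE tmp_devices;                 -- cheap for 3000-4000 rows

 -- Main queries then join the temp table to the large data tables:
 SELECT t.device_id, b.*
 FROM tmp_devices t
 JOIN big_data_table b ON b.device_id = t.device_id;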

Are there any performance issues or considerations associated with using a
temporary table in this scenario?  Is it worth my trying to develop a
solution that just incorporates all the logic used to create the temporary
table into each of the main queries?  How expensive an operation is
temporary table creation and joining?

Thanks in advance for your advice,
--- Steve




[PERFORM]

2005-05-03 Thread Steven Rosenstein




In our application we have tables that we regularly load with 5-10 million
records daily.  We *were* using INSERT (I know...  Still kicking ourselves
for *that* design decision), and we are now converting over to COPY.  For the
sake of robustness, we are planning on breaking the entire load into chunks
of a couple hundred thousand records each. This is to constrain the amount
of data we'd have to re-process if one of the COPYs fails.

My question is, are there any advantages, drawbacks, or outright
restrictions to using multiple simultaneous COPY commands to load data into
the same table?   One issue that comes to mind is the loss of data
sequencing if we have multiple chunks interleaving records in the table at
the same time.  But from a purely technical point of view, is there any
reason why the backend would not be happy with two or more COPY commands
trying to insert data into the same table at the same time?  Does COPY take
out any locks on a table?
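
For reference, the chunked load amounts to one COPY per transaction (a
sketch; the file paths are placeholders, and COPY FROM a server-side file
requires superuser, psql's \copy being the client-side equivalent):

 -- A failed COPY rolls back only its own chunk, not the whole load:
 BEGIN;
 COPY tbl_daily_load FROM '/data/chunks/chunk_0001.dat' WITH DELIMITER ',';
 COMMIT;

 BEGIN;
 COPY tbl_daily_load FROM '/data/chunks/chunk_0002.dat' WITH DELIMITER ',';
 COMMIT;
 -- ...and so on, possibly from several sessions at once.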

Thanks in advance,
--- Steve




[PERFORM] Bulk COPY end of copy delimiter

2005-04-04 Thread Steven Rosenstein




Today while trying to do a bulk COPY of data into a table, the process
aborted with the following error message:

ERROR: end-of-copy marker corrupt
CONTEXT: COPY tbl_logged_event, line 178519: 606447014,1492,2005-02-24
03:16:14,2005-02-23 20:27:48,win_applog,,error,adsmclientservice,nt
author...

Googling the error, we found reference to the '\.' (backslash-period) being
an end-of-copy marker.  Unfortunately, our data contains the
backslash-period character sequence.  Is there any known fix or workaround
for this condition?

We're using PostgreSQL 7.3.9 and also running tests on an 8.0.1 system.

Thanks in advance,
--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt




Re: [PERFORM] Bulk COPY end of copy delimiter

2005-04-04 Thread Steven Rosenstein




Your assumption is correct.  The data was generated out of a DB2 database,
and uses commas as field delimiters.

Thank you for the workaround,
--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt


   
From: Gavin Sherry [EMAIL PROTECTED]
Sent by: pgsql-performance [EMAIL PROTECTED]
To: Steven Rosenstein/New York/[EMAIL PROTECTED]
cc: pgsql-performance@postgresql.org
Date: 04/04/2005 08:00 PM
Subject: Re: [PERFORM] Bulk COPY end of copy delimiter




Hi,

On Mon, 4 Apr 2005, Steven Rosenstein wrote:





> Today while trying to do a bulk COPY of data into a table, the process
> aborted with the following error message:
>
> ERROR: end-of-copy marker corrupt
> CONTEXT: COPY tbl_logged_event, line 178519: 606447014,1492,2005-02-24
> 03:16:14,2005-02-23 20:27:48,win_applog,,error,adsmclientservice,nt
> author...
>
> Googling the error, we found reference to the '\.' (backslash-period) being
> an end-of-copy marker.  Unfortunately, our data contains the
> backslash-period character sequence.  Is there any known fix or workaround
> for this condition?

Any sequence \. in COPY input data should be escaped as \\. If this data
was generated by pg_dump then it's a problem, but I haven't seen any other
reports of this. Can I assume that you've generated the data for bulk load
yourself? If so, there is discussion of escaping characters here:
http://www.postgresql.org/docs/8.0/static/sql-copy.html.

Gavin
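
As a minimal illustration of the escaping rule (a sketch with a
hypothetical one-column table; the doubled backslash is what the data
stream has to contain):

 CREATE TEMPORARY TABLE t (txt text);
 COPY t FROM stdin;
 contains a \\. sequence
 \.
 -- SELECT txt FROM t now returns: contains a \. sequence
 -- Only a bare \. alone on a line terminates the COPY data.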






[PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Steven Rosenstein




This is probably a very trivial question and I feel foolish in even posting
it, but I cannot seem to get it to work.

SCENARIO (abstracted):

Two tables, summary and detail.  The schema of summary looks like:

  id            int    serial sequential record id
  collect_date  date   date the detail events were collected

The schema of detail looks like:

  id            int    serial sequential record id
  sum_id        int    the id of the parent record in the summary table
  details       text   a particular event's details

The relationship is obvious.  If I want to extract all the detail records
for a particular date (2/5/05), I construct a query as follows:

SELECT * FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

Now I want to *delete* all the detail records for a particular date, so I
tried:

DELETE FROM detail JOIN summary ON (summary.id=detail.sum_id) WHERE
collect_date='2005-02-05';

But I keep getting a parser error.  Am I not allowed to use JOINs in a
DELETE statement, or am I just fat-fingering the SQL text somewhere?  If
I'm *not* allowed to use a JOIN with a DELETE, what is the best workaround?
I want to delete just the records in the detail table, and not its parent
summary record.

Thanks in advance for your help,
--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt




Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Steven Rosenstein
) (actual
time=0.00..0.00 rows=2 loops=752066)
   ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=41
width=4) (actual time=0.02..10.08 rows=2 loops=1)
         Filter: (scan_datetime < '2004-09-02 00:00:00'::timestamp
without time zone)
 Total runtime: 8259.91 msec
(7 rows)

Time: 8263.52 ms

DATE=2004-09-05; SUMMARY ROWS=3; DETAIL ROWS=93; TIME=5.61 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-05 00:00:00');
 QUERY PLAN

 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..380531977.65 rows=376033
width=1142) (actual time=10.11..5616.68 rows=93 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..506.58 rows=152 width=4) (actual
time=0.00..0.00 rows=3 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=152
width=4) (actual time=0.02..10.05 rows=3 loops=1)
                 Filter: (scan_datetime < '2004-09-05 00:00:00'::timestamp
without time zone)
 Total runtime: 5616.81 msec
(7 rows)

Time: 5617.87 ms

DATE=2004-09-15; SUMMARY ROWS=16; DETAIL ROWS=674; TIME=18.03 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-15 00:00:00');
  QUERY PLAN
--
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..381919433.78 rows=376033
width=1142) (actual time=10.18..18032.25 rows=674 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..510.27 rows=521 width=4) (actual
time=0.00..0.01 rows=16 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=521
width=4) (actual time=0.02..10.11 rows=16 loops=1)
                 Filter: (scan_datetime < '2004-09-15 00:00:00'::timestamp
without time zone)
 Total runtime: 18032.72 msec
(7 rows)

Time: 18033.78 ms

DATE=2004-09-16; SUMMARY ROWS=25; DETAIL ROWS=1131; TIME=26.22 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-16 00:00:00');
  QUERY PLAN
--
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382058179.39 rows=376033
width=1142) (actual time=6.14..26218.56 rows=1131 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..510.64 rows=558 width=4) (actual
time=0.00..0.01 rows=25 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=558
width=4) (actual time=0.01..6.09 rows=25 loops=1)
                 Filter: (scan_datetime < '2004-09-16 00:00:00'::timestamp
without time zone)
 Total runtime: 26219.24 msec
(7 rows)

Time: 26220.44 ms

DATE=2004-09-17; SUMMARY ROWS=34; DETAIL ROWS=1588; TIME=34.97 sec
vsa=# explain analyze SELECT * FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-17 00:00:00');
  QUERY PLAN
---
 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382196925.01 rows=376033
width=1142) (actual time=10.25..34965.95 rows=1588 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.01 rows=595 width=4) (actual
time=0.00..0.02 rows=34 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=595
width=4) (actual time=0.02..10.16 rows=34 loops=1)
                 Filter: (scan_datetime < '2004-09-17 00:00:00'::timestamp
without time zone)
 Total runtime: 34966.90 msec
(7 rows)

Time: 34967.95 ms


What I may end up doing is using the scripting language PHP to solve the
issue by running one query just to return the summary table ID's, and then
DELETEing the rows matching each ID individually by looping through the
ID's.  I was looking for something more elegant, but this will work if it's
the only solution.
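
In SQL terms the loop amounts to the following (the literal id stands in
for each value bound from PHP):

 -- Fetch the summary ids once:
 SELECT id FROM vsa.tbl_win_patch_scan
 WHERE scan_datetime < '2004-09-17 00:00:00';

 -- Then, for each id returned, issue a single-key delete that can use
 -- the index on win_patch_scan_id:
 DELETE FROM vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id = 1234;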

Thank you all for your help with this.
--- Steve

___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt

[PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Steven Rosenstein




While working on a previous question I posed to this group, I ran a number
of EXPLAIN ANALYZE's to provide as examples.  After sending my last
email, I ran the same query *without* EXPLAIN ANALYZE.  The runtimes were
vastly different.  In the following example, I ran two identical queries
one right after the other.  The runtimes for both were very close (44.77
sec).  I then immediately ran the exact same query, but without EXPLAIN
ANALYZE.  The same number of rows was returned, but the runtime was only
8.7 sec.  I don't think EXPLAIN ANALYZE puts that much overhead on a query.
Does anyone have any idea what is going on here?

--- Steve

vsa=# explain analyze SELECT id,win_patch_scan_id FROM
vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM
vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
 QUERY PLAN

 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382335670.62 rows=376033
width=8) (actual time=10.18..44773.22 rows=2045 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual
time=0.00..0.02 rows=43 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=632
width=4) (actual time=0.02..10.09 rows=43 loops=1)
                 Filter: (scan_datetime < '2004-09-18 00:00:00'::timestamp
without time zone)
 Total runtime: 44774.49 msec
(7 rows)

Time: 44775.62 ms


vsa=# explain analyze SELECT id,win_patch_scan_id FROM
vsa.tbl_win_patch_scan_item WHERE win_patch_scan_id IN (SELECT id FROM
vsa.tbl_win_patch_scan WHERE scan_datetime < '2004-09-18 00:00:00');
 QUERY PLAN

 Seq Scan on tbl_win_patch_scan_item  (cost=0.00..382335670.62 rows=376033
width=8) (actual time=10.18..44765.36 rows=2045 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual
time=0.00..0.02 rows=43 loops=752066)
           ->  Seq Scan on tbl_win_patch_scan  (cost=0.00..505.06 rows=632
width=4) (actual time=0.02..10.10 rows=43 loops=1)
                 Filter: (scan_datetime < '2004-09-18 00:00:00'::timestamp
without time zone)
 Total runtime: 44766.62 msec
(7 rows)

Time: 44767.71 ms


vsa=# SELECT id,win_patch_scan_id FROM vsa.tbl_win_patch_scan_item WHERE
win_patch_scan_id IN (SELECT id FROM vsa.tbl_win_patch_scan WHERE
scan_datetime < '2004-09-18 00:00:00');
   id   | win_patch_scan_id
--------+-------------------
      1 |                 1
      2 |                 1
      3 |                 1
      4 |                 1
      5 |                 1
--8< SNIP --
    211 |                 7
    212 |                 7
    213 |                 7
    214 |                 7
    215 |                 7
    216 |                 7
    217 |                 7
 692344 |              9276
 692345 |              9276
 692346 |              9276
 692347 |              9276
 692348 |              9276
--8< SNIP --
 694167 |              9311
 694168 |              9311
 694169 |              9311
 694170 |              9311
 694171 |              9311
(2045 rows)

Time: 8703.56 ms
vsa=#
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt




Re: [PERFORM] Are JOINs allowed with DELETE FROM

2005-02-06 Thread Steven Rosenstein




Many thanks to Gaetano Mendola and Tom Lane for the hints about using
fields from other tables in a DELETE's WHERE clause.  That was the magic
bullet I needed, and my application is working as expected.
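
A sketch of that form, using the abstracted summary/detail schema from the
original post (it relies on referencing another table directly in DELETE's
WHERE clause, which 7.3 accepts via its implicit FROM):

 DELETE FROM detail
 WHERE detail.sum_id = summary.id
   AND summary.collect_date = '2005-02-05';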

--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt


   
From: Tom Lane [EMAIL PROTECTED]
Sent by: pgsql-performance [EMAIL PROTECTED]
To: Steven Rosenstein/New York/[EMAIL PROTECTED]
cc: pgsql-performance@postgresql.org
Date: 02/06/2005 02:49 PM
Subject: Re: [PERFORM] Are JOINs allowed with DELETE FROM




Steven Rosenstein [EMAIL PROTECTED] writes:
> Thank you for the link to the documentation page.  I forgot to mention
> that we're still using version 7.3.  When I checked the 7.3 documentation
> for DELETE, there was no mention of being able to use fields from
> different tables in a WHERE clause.  This feature must have been added in
> a subsequent release of PostgreSQL.

No, it's been there all along, if perhaps not well documented.

 regards, tom lane






Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Steven Rosenstein




You're probably right about my being overly optimistic about the load
imposed by EXPLAIN ANALYZE.  It was just that in my previous experience
with it, I'd never seen such a large runtime discrepancy before.  I even
allowed for a caching effect by making sure the server was all but
quiescent, and then running the three queries as quickly after one another
as I could.

The server itself is an IBM x345 with dual Xeon 3GHz CPUs (hyperthreading
turned off) and 2.5GB of RAM.  O/S is RHEL3 Update 4.  Disks are a
ServeRAID of some flavor, I'm not sure which.

Thanks for the heads-up about the performance of IN in 7.3.  We're looking
to migrate to 8.0 or 8.0.1 when they become GA, but some of our databases
are in excess of 200GB-300GB, and we need to make sure we have a good
migration plan in place (space to store the dump out of the 7.3 db) before
we start.
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt


   
From: Tom Lane [EMAIL PROTECTED]
To: Steven Rosenstein/New York/[EMAIL PROTECTED]
cc: pgsql-performance@postgresql.org
Date: 02/06/2005 05:46 PM
Subject: Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?




> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> Steven Rosenstein
> I don't think EXPLAIN ANALYZE puts that much overhead on a query.

I think you're being overly optimistic.  The explain shows that the
Materialize subnode is being entered upwards of 32 million times:

   ->  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual
time=0.00..0.02 rows=43 loops=752066)

43 * 752066 = 32338838.  The instrumentation overhead is basically two
gettimeofday() kernel calls per node entry.  Doing the math shows that
your machine is able to do gettimeofday() in about half a microsecond,
which isn't stellar but it's not all that slow for a kernel call.
(What's the platform here, anyway?)  Nonetheless it's a couple of times
larger than the actual time needed to pull a row from a materialized
array ...

The real answer to your question is IN (subselect) sucks before PG 7.4;
get a newer release.

 regards, tom lane
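
For the record, the usual pre-7.4 rewrite of such an IN is a plain join; a
sketch follows, equivalent here as long as tbl_win_patch_scan.id is unique:

 SELECT i.id, i.win_patch_scan_id
 FROM vsa.tbl_win_patch_scan_item i
 JOIN vsa.tbl_win_patch_scan s ON s.id = i.win_patch_scan_id
 WHERE s.scan_datetime < '2004-09-18 00:00:00';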





Re: [PERFORM] This query is still running after 10 hours...

2004-09-29 Thread Steven Rosenstein




Hi Robert,

> There is no significant disk activity (read 0), one CPU is pegged, and
> that process is consuming 218M Resident memory, 168M Shared (10% available
> memory total). All reasonable, except for the fact it doesn't come back...

Just to let you know, I've observed the identical phenomenon on my RHEL3-WS
server running PostgreSQL V7.3.4: One of the CPU's pegged at 100% (2-way
SMP with hyperthreading, so 4 apparent CPU's), virtually zero disk I/O
activity, high memory usage, etc.  I thought it might be due to a casting
problem in a JOIN's ON clause, but that did not turn out to be the case.  I
*have* recently observed that if I run a vacuum analyze on the entire
database, the amount of time spent in this looping state decreases greatly,
but it has *not* disappeared in all cases.

Next week I hope to be able to run some directed tests with stats collection
turned on, to try to see if I can find out what's causing this to occur.
I'll post the results if I find anything significant.

--- Steve
___

Steven Rosenstein
Senior IT Architect/Specialist | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt


   
From: Robert Creager [EMAIL PROTECTED]
Sent by: pgsql-performance [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
cc: PGPerformance [EMAIL PROTECTED]
Date: 09/28/2004 11:44 PM
Subject: Re: [PERFORM] This query is still running after 10 hours...




When grilled further on (Tue, 28 Sep 2004 11:04:23 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

> Robert Creager [EMAIL PROTECTED] writes:
>> Normally, this query takes from 5 minutes to 2 hours to run.  On this
>> update, it's been running for more than 10 hours.
>
>> ...
>> ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
>>    ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
>>        Hash Cond: ("outer".pair_id = "inner".pair_id)
>> ...
>
> It chose a nested loop here because it was only expecting 67 rows out of
> the next-lower join, and so it thought it would only need 67 repetitions
> of the index probe into obs_v_file_id_index.  I'm suspicious that that
> estimate was way low and so the nestloop is taking forever.  You might
> try "SET enable_nestloop = off" as a crude way of avoiding that trap.

I tried your suggestion.  It did generate a different plan (below), but the
estimation is blown as it still used a nested loop.  The query is currently
running (42 minutes so far).  For the query in question, there are 151
different pair_id's in the pairs table, which equates to 302 entries in the
files table (part of the query), which moves on to 533592 entries in the
obs_v table and 533699 entries in the obs_i table.

The groups table has 76 total entries, files 9028, zero_pair 2532,
color_groups 147.  Only the obs_v and obs_i tables have data of any
significant quantities, with 10M rows apiece.  The trigger hitting the
catalog table (875499 entries) is searching for single entries to match
(one fire per obs_v/obs_i update) on an index (took 54ms on the first
query of a random id just now).

There is no significant disk activity (read 0), one CPU is pegged, and that
process is consuming 218M Resident memory, 168M Shared (10% available
memory total).  All reasonable, except for the fact it doesn't come back...

Hash Join  (cost=100267870.17..100751247.13 rows=1578889 width=63)
  Hash Cond: ("outer".star_id = "inner".star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=100266886.39..100266886.39 rows=77113 width=59)
        ->  Hash Join  (cost=10307.51..100266886.39 rows=77113 width=59)
              Hash Cond: ("outer".file_id = "inner".file_id)
              ->  Seq Scan

[PERFORM] Infinite CPU loop due to field ::type casting

2004-09-22 Thread Steven Rosenstein




Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might
have been answered in a previous thread, but I'm new to PostgreSQL and the
research I did in the archives did not turn up anything addressing this
issue.  Please direct me to the proper forum if this is not the correct
venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6
(stock with the RHEL distribution)

The two tables I used in the example are tbl_device and tbl_sad_event:

vsa=# \d vsa.tbl_device;
   Table vsa.tbl_device
     Column     |           Type           |                Modifiers
----------------+--------------------------+----------------------------------
 id | integer  | not null default
nextval('vsa.tbl_device_id_seq'::text)
 name   | character varying(100)   | not null
 account_id | bigint   | not null
 vss_site_id| bigint   | not null
 org_site_id| bigint   | not null default 0
 device_type_id | integer  | not null default 1
 os_family_id   | integer  | not null default 0
 status_id  | integer  | not null default 0
 timezone   | character varying(80)|
 clientkey  | character varying(2048)  | not null
 record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
 idx_d_uniq_name_site_account_key unique btree (name, vss_site_id,
account_id, clientkey),
 tbl_device_clientkey_key unique btree (clientkey),
 idx_d_account_id btree (account_id),
 idx_d_account_site_name btree (account_id, vss_site_id, name),
 idx_d_device_type_id btree (device_type_id),
 idx_d_name btree (name),
 idx_d_org_site_id btree (org_site_id),
 idx_d_os_family_id btree (os_family_id),
 idx_d_status_id btree (status_id),
 idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES
vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES
vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dof FOREIGN KEY (os_family_id) REFERENCES
vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dsc FOREIGN KEY (status_id) REFERENCES
vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dt FOREIGN KEY (device_type_id) REFERENCES
vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.tbl_sad_event
 Table vsa.tbl_sad_event
     Column     |            Type             |            Modifiers
----------------+-----------------------------+-------------------------------
 id | integer | not null default
nextval('vsa.tbl_sad_event_id_seq'::text)
 device_id  | bigint  | not null
 log_type   | integer |
 severity   | character varying(20)   |
 time_logged| timestamp without time zone |
 user_name  | character varying(50)   |
 remote_user| character varying(50)   |
 remote_host| character varying(100)  |
 source_tag | character varying(30)   |
 event_code | character varying(50)   |
 type   | character varying(6)|
 record_created | timestamp with time zone| default now()
Indexes: pk_tbl_sad_event primary key btree (id),
 idx_se_dev_time_type btree (device_id, time_logged, type),
 idx_se_device_id btree (device_id),
 idx_se_time_logged btree (time_logged),
 idx_se_type btree (type),
 sjr_se_id_time_type btree (device_id, time_logged, type)
Foreign Key constraints: fk_sade_d FOREIGN KEY (device_id) REFERENCES
vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CASCADE


Here is my original query, and the query plan generated by the planner:

vsa=# explain
SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count
FROM vsa.tbl_device AS dev
LEFT OUTER JOIN
  (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*)
   FROM vsa.dtbl_logged_event_20040922 AS stbl
   WHERE stbl.log_type IN (2, 3, 4, 5)
   GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl
  ON (dev.id=tbl.device_id)
ORDER BY dev.name;
  
                                  QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=40893.18..40960.93 rows=27100 width=79)
   Sort Key: dev.name
   ->  Merge Join  (cost=38417.13..38897.77 rows=27100 width=79)
         Merge Cond:

[PERFORM] Fw: Infinite CPU loop due to field ::type casting, Take II :-)

2004-09-22 Thread Steven Rosenstein




I just realized in my haste to send this email out I provided the wrong
table in my example.  Below is the same email, but with
vsa.dtbl_logged_event_20040922 substituted for vsa.tbl_sad_event.

Sorry for the inconvenience.

--- Steve


Gentlefolk,

I'm not sure if this is the proper forum for this question, and it might
have been answered in a previous thread, but I'm new to PostgreSQL and the
research I did in the archives did not turn up anything addressing this
issue.  Please direct me to the proper forum if this is not the correct
venue.

Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6
(stock with the RHEL distribution)

The two tables I used in the example are tbl_device and
dtbl_logged_event_20040922:

vsa=# \d vsa.tbl_device;
   Table vsa.tbl_device
     Column     |           Type           |                Modifiers
----------------+--------------------------+----------------------------------
 id | integer  | not null default
nextval('vsa.tbl_device_id_seq'::text)
 name   | character varying(100)   | not null
 account_id | bigint   | not null
 vss_site_id| bigint   | not null
 org_site_id| bigint   | not null default 0
 device_type_id | integer  | not null default 1
 os_family_id   | integer  | not null default 0
 status_id  | integer  | not null default 0
 timezone   | character varying(80)|
 clientkey  | character varying(2048)  | not null
 record_created | timestamp with time zone | default now()
Indexes: pk_tbl_device primary key btree (id),
 idx_d_uniq_name_site_account_key unique btree (name, vss_site_id,
account_id, clientkey),
 tbl_device_clientkey_key unique btree (clientkey),
 idx_d_account_id btree (account_id),
 idx_d_account_site_name btree (account_id, vss_site_id, name),
 idx_d_device_type_id btree (device_type_id),
 idx_d_name btree (name),
 idx_d_org_site_id btree (org_site_id),
 idx_d_os_family_id btree (os_family_id),
 idx_d_status_id btree (status_id),
 idx_d_vss_site_id btree (vss_site_id)
Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES
vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES
vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dof FOREIGN KEY (os_family_id) REFERENCES
vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dsc FOREIGN KEY (status_id) REFERENCES
vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 fk_d_dt FOREIGN KEY (device_type_id) REFERENCES
vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION
Triggers: trg_clean_device_name

vsa=# \d vsa.dtbl_logged_event_20040922
              Table vsa.dtbl_logged_event_20040922
     Column      |            Type             |            Modifiers
-----------------+-----------------------------+------------------------------
 id  | integer | not null default
nextval('vsa.dtbl_logged_event_20040922_id_seq'::text)
 device_id   | bigint  | not null
 report_datetime | timestamp without time zone |
 time_logged | timestamp without time zone |
 log_type| integer | not null
 type| character varying(50)   |
 severity| character varying(30)   |
 source_tag  | character varying(30)   |
 remote_host | character varying(100)  |
 user_name   | character varying(50)   |
 event_code  | character varying(10)   |
 description | text|
 record_created  | timestamp with time zone| default now()
 event_code_new  | character varying(30)   |
 remote_user | character varying(50)   |
Indexes: pk_dtbl_logged_event_20040922 primary key btree (id),
 idx_le_device_id_20040922 btree (device_id),
 idx_le_id_source_event_20040922 btree (device_id, source_tag,
event_code),
 idx_le_id_src_20040922 btree (device_id, source_tag),
 idx_le_id_type_severity_evtcode_20040922 btree (device_id,
log_type, severity, event_code),
 idx_le_log_type_20040922 btree (log_type),
 idx_le_source_tag_20040922 btree (source_tag),
 idx_le_time_logged_20040922 btree (time_logged),
 idx_le_time_type_20040922 btree (time_logged, log_type)
Foreign Key constraints: fk_le_lelt_20040922 FOREIGN KEY (log_type)
REFERENCES vsa.enum_le_log_type(id) ON UPDATE NO ACTION ON DELETE NO
ACTION,
 fk_le_d_20040922 FOREIGN KEY (device_id)
REFERENCES vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE