Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-07 Thread Markus Schaber
Hi, @all,

Greg Stark schrieb:
 Leeuw van der, Tim [EMAIL PROTECTED] writes:

I don't think EXPLAIN ANALYZE puts that much overhead on a query.

 EXPLAIN ANALYZE does indeed impose a significant overhead.

Additional note:

In some rare cases, you can experience just the opposite effect, explain
analyze can be quicker then the actual query.

This is the case for rather expensive send/output functions, like the
PostGIS ones:

lwgeom=# \timing
Zeitmessung ist an.
lwgeom=# explain analyze select setsrid(geom,4326) from adminbndy1;
QUERY PLAN

---
 Seq Scan on adminbndy1  (cost=0.00..4.04 rows=83 width=89) (actual
time=11.793..2170.184 rows=83 loops=1)
 Total runtime: 2170.834 ms
(2 Zeilen)

Zeit: 2171,688 ms
lwgeom=# \o /dev/null
lwgeom=# select setsrid(geom,4326) from adminbndy1;
Zeit: 9681,001 ms


BTW: I use the cheap setsrid(geom,4326) to force deTOASTing of the
geometry column. Not using it seems to ignore TOASTed columns in
sequential scan simulation.)

lwgeom=# explain analyze select geom from adminbndy1;
  QUERY PLAN

---
 Seq Scan on adminbndy1  (cost=0.00..3.83 rows=83 width=89) (actual
time=0.089..0.499 rows=83 loops=1)
 Total runtime: 0.820 ms
(2 Zeilen)


Markus

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Leeuw van der, Tim


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steven Rosenstein
Sent: Sunday, February 06, 2005 8:51 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?






While working on a previous question I posed to this group, I ran a number
of EXPLAIN ANALYZE's to provide as examples.  After sending up my last
email, I ran the same query *without* EXPLAIN ANALYZE.  The runtimes were
vastly different.  In the following example, I ran two identical queries
one right after the other.  The runtimes for both was very close (44.77
sec).  I then immediately ran the exact same query, but without EXPLAIN
ANALYZE.  The same number of rows was returned, but the runtime was only
8.7 sec.  I don't think EXPLAIN ANALYZE puts that much overhead on a query.
Does anyone have any idea what is going on here?

--- Steve


Caching by the OS?

(Did you try to *first* run the query w/o EXPLAIN ANALYZE, and then with? 
What's the timing if you do that?)

--Tim

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Greg Stark

Leeuw van der, Tim [EMAIL PROTECTED] writes:

 I don't think EXPLAIN ANALYZE puts that much overhead on a query.

EXPLAIN ANALYZE does indeed impose a significant overhead. What percentage of
the time is overhead depends heavily on how much i/o the query is doing.

For queries that are primarily cpu bound because they're processing data from
the cache it can be substantial. If all the data is in the shared buffers then
the gettimeofday calls for explain analyze can be just about the only syscalls
being executed and they're executed a lot.

It would be interesting to try to subtract out the profiling overhead from the
data like most profilers do. But it's not an easy thing to do since the times
are nested.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Tom Lane
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steven Rosenstein
  I don't think EXPLAIN ANALYZE puts that much overhead on a query.

I think you're being overly optimistic.  The explain shows that the
Materialize subnode is being entered upwards of 32 million times:

   -  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual 
time=0.00..0.02 rows=43 loops=752066)

43 * 752066 = 32338838.  The instrumentation overhead is basically two
gettimeofday() kernel calls per node entry.  Doing the math shows that
your machine is able to do gettimeofday() in about half a microsecond,
which isn't stellar but it's not all that slow for a kernel call.
(What's the platform here, anyway?)  Nonetheless it's a couple of times
larger than the actual time needed to pull a row from a materialized
array ...

The real answer to your question is IN (subselect) sucks before PG 7.4;
get a newer release.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Can the V7.3 EXPLAIN ANALYZE be trusted?

2005-02-06 Thread Steven Rosenstein




You're probably right about my being overly optimistic about the load
imposed by EXPLAIN ANALYZE.  It was just that in my previous experience
with it, I'd never seen such a large runtime discrepancy before.  I even
allowed for a caching effect by making sure the server was all but
quiescent, and then running the three queries as quickly after one another
as I could.

The server itself is an IBM x345 with dual Xeon 3ghz CPU's (hyperthreading
turned off) and 2.5gb of RAM.  O/S is RHEL3 Update 4.  Disks are a
ServeRAID of some flavor, I'm not sure what.

Thanks for the heads-up about the performance of IN in 7.3.  We're looking
to migrate to 8.0 or 8.0.1 when they become GA, but some of our databases
are in excess of 200gb-300gb, and we need to make sure we have a good
migration plan in place (space to store the dump out of the 7.3 db) before
we start.
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt


   
 Tom Lane  
 [EMAIL PROTECTED] 
 s To 
   Steven Rosenstein/New   
 02/06/2005 05:46  York/[EMAIL PROTECTED]   
   
 PM cc 
   pgsql-performance@postgresql.org
   Subject 
   Re: [PERFORM] Can the V7.3 EXPLAIN  
   ANALYZE be trusted? 
   
   
   
   
   
   




 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Steven
Rosenstein
  I don't think EXPLAIN ANALYZE puts that much overhead on a query.

I think you're being overly optimistic.  The explain shows that the
Materialize subnode is being entered upwards of 32 million times:

   -  Materialize  (cost=505.06..511.38 rows=632 width=4) (actual
time=0.00..0.02 rows=43 loops=752066)

43 * 752066 = 32338838.  The instrumentation overhead is basically two
gettimeofday() kernel calls per node entry.  Doing the math shows that
your machine is able to do gettimeofday() in about half a microsecond,
which isn't stellar but it's not all that slow for a kernel call.
(What's the platform here, anyway?)  Nonetheless it's a couple of times
larger than the actual time needed to pull a row from a materialized
array ...

The real answer to your question is IN (subselect) sucks before PG 7.4;
get a newer release.

 regards, tom lane



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org