Re: [PERFORM] EXPLAIN ANALYZE time calculations

2007-12-02 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes:
> I took startup time to be the time to return the first row *of the  
> first loop*. But it's actually the average startup time to return the  
> first row *in each loop*, right?

Correct, just as the total time and tuples returned are averages over all
the loops.
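
(For example, the inner indexscan in your example plan reports "actual
time=0.067..0.078 rows=1 loops=100": on average each of the 100 loops took
0.067 ms to return its first row and 0.078 ms in total, i.e. roughly 7.8 ms
spent in that node across all loops.)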

regards, tom lane



Re: [PERFORM] EXPLAIN ANALYZE time calculations

2007-12-02 Thread Michael Glaesemann


On Dec 2, 2007, at 19:56 , Tom Lane wrote:

> IOW the actual time to get in and out of a node is going to be a shade
> more than is reported.

Thanks, Tom. Should be close enough for jazz.

When I was first going over the Using Explain section, I stumbled a bit on
the startup time/total time/loops bit (which is why explain-analyze.info
times and percentages are currently miscalculated). I took startup time to
be the time to return the first row *of the first loop*. But it's actually
the average startup time to return the first row *in each loop*, right?


Michael Glaesemann
grzm seespotcode net





Re: [PERFORM] EXPLAIN ANALYZE time calculations

2007-12-02 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes:
> I'd like to get confirmation that I'm correctly understanding the  
> times given in EXPLAIN ANALYZE.
> ...
> Is this correct?

Looks about right to me.  Note that some of what you are calling
"executor overhead" might also be classed as "gettimeofday overhead".
The measured difference between two successive gettimeofday readings
presumably includes the actual userland runtime plus the equivalent
of one gettimeofday call; but we actually did two calls.  IOW the
actual time to get in and out of a node is going to be a shade more
than is reported.

regards, tom lane



[PERFORM] EXPLAIN ANALYZE time calculations

2007-12-02 Thread Michael Glaesemann
I'd like to get confirmation that I'm correctly understanding the  
times given in EXPLAIN ANALYZE. Taking the example given in the Using  
Explain section of the docs,


http://www.postgresql.org/docs/current/static/using-explain

EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;


                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.37..553.11 rows=106 width=488) (actual time=1.392..12.700 rows=100 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=2.37..232.35 rows=106 width=244) (actual time=0.878..2.367 rows=100 loops=1)
         Recheck Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0) (actual time=0.546..0.546 rows=100 loops=1)
               Index Cond: (unique1 < 100)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=244) (actual time=0.067..0.078 rows=1 loops=100)
         Index Cond: ("outer".unique2 = t2.unique2)
 Total runtime: 14.452 ms

I'm interested in figuring out what percentage of the total runtime  
is spent in each node. Here are my calculations.


Nested loop:
  actual time: 12.700 ms/loop * 1 loop = 12.700 ms
  percent of total runtime: 88%
  percent spent in subnodes: 16% + 54% = 70%
  percent spent in node: 18%

Bitmap Heap Scan on tenk1:
  actual time: 2.367 ms/loop * 1 loop = 2.367 ms
  percent of total runtime: 16%
  time spent in subnodes: 4%
  time spent in node: 12%

Bitmap Index Scan on tenk1_unique1:
  actual time: 0.546 ms/loop * 1 loop = 0.546 ms: 4%
  time spent in subnodes: 0%
  time spent in node: 4%

Index Scan total time:
  actual time: 0.078 ms/loop * 100 loops = 7.80 ms
  percent of total runtime: 54%
  percent spent in subnodes: 0%
  percent spent in node: 54%

executor overhead: 14.452 ms - 12.700 ms = 1.752 ms: 12%
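
(In other words, the arithmetic I'm applying to each node is, schematically:

   node total time      = reported actual time (upper bound) * loops
   time in node itself  = node total time - sum of child nodes' total times
   percent of runtime   = node total time / total runtime

with whatever is left over attributed to executor overhead.)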

Is this correct?

Michael Glaesemann
grzm seespotcode net





Re: [PERFORM] PostgreSQL 8.2.5 slow performance on INSERT on Linux

2007-12-02 Thread Beyers Cronje
>
> Your development system is probably running inexpensive IDE disks that
> cache writes, while the test server is not caching.  If you loop over
> single inserts, PostgreSQL's default configuration will do a physical
> commit to disk after every one of them, which limits performance to how
> fast the disk spins.  If your server has 15K RPM drives, a single client
> can commit at most 250 transactions per second to disk, which means 10,000
> inserts done one at a time must take at least 40 seconds no matter how
> fast the server is.
>
> There's a rambling discussion of this topic at
> http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm that
> should fill in some background here.


This is exactly what is happening. Thank you for the above link, the article
was very informative.

> If you use COPY instead of INSERT, that bypasses the WAL and you don't see
> this.  Also, if you adjust your loop to do multiple inserts as a single
> transaction, that will change the behavior here as well.


I will give COPY a go and see how it performs. For testing we specifically
only did one insert per transaction; we will obviously optimize the actual
application to do multiple inserts per transaction wherever possible.
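
For reference, this is roughly what I plan to try for the COPY test (table,
column and file names here are just placeholders, not our real schema):

CREATE TABLE test_table (id integer, payload text);

-- server-side COPY (needs superuser and a file readable by the server):
COPY test_table (id, payload) FROM '/tmp/test_data.txt';

-- or the client-side equivalent from psql:
-- \copy test_table (id, payload) from 'test_data.txt'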

Kind regards

Beyers Cronje

PS Thank you for the quick responses Greg and Pavel. It is always
encouraging starting off with a new product and seeing there are people
passionate about it.


Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Mark Kirkwood

Mindaugas wrote:
> And I cannot use an index-organized table or a table partitioned by From :)
> because there are at least 2 similar indexes by which queries can be
> executed - From and To.

This makes things a bit tough. One trick is to vertically partition the
table into two new tables - with "From" in one and "To" in the other...
then you can (horizontally) partition or cluster on each of these columns
separately.

You can make it reasonably transparent by using a view to combine the
columns again to get something that looks like the original table.
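
Something along these lines (table and column names invented purely to
illustrate the idea, so adjust to your schema):

CREATE TABLE big_from (id bigint PRIMARY KEY, "From" text);
CREATE TABLE big_to   (id bigint PRIMARY KEY, "To" text, other_cols text);

CREATE INDEX big_from_idx ON big_from ("From");
CREATE INDEX big_to_idx   ON big_to ("To");

-- cluster (or partition) each half on its own column:
CLUSTER big_from_idx ON big_from;
CLUSTER big_to_idx ON big_to;

-- and a view that looks like the original table again:
CREATE VIEW bigtable_v AS
  SELECT f.id, f."From", t."To", t.other_cols
    FROM big_from f JOIN big_to t USING (id);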


Cheers

Mark



Re: [PERFORM] PostgreSQL 8.2.5 slow performance on INSERT on Linux

2007-12-02 Thread Greg Smith

On Sun, 2 Dec 2007, Beyers Cronje wrote:

> Initially I tested this on my development PC, an old P4 system with 2GB
> RAM and 10,000 INSERTs took ~12 secs on average, which I was fairly
> satisfied with. I then moved everything over to our test server, a new
> Dell 1950 server with quad core Xeon processors, 4GB RAM and SCSI hdd
> expecting to see better performance, but instead performance dropped to
> ~44 secs for 10,000 INSERTs.


Your development system is probably running inexpensive IDE disks that 
cache writes, while the test server is not caching.  If you loop over 
single inserts, PostgreSQL's default configuration will do a physical 
commit to disk after every one of them, which limits performance to how 
fast the disk spins.  If your server has 15K RPM drives, a single client 
can commit at most 250 transactions per second to disk, which means 10,000 
inserts done one at a time must take at least 40 seconds no matter how 
fast the server is.
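
(The arithmetic: 15,000 RPM / 60 = 250 platter rotations per second, so a
single client gets at most ~250 synchronous commits per second, and
10,000 inserts / 250 commits per second = 40 seconds.)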


There's a rambling discussion of this topic at 
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm that 
should fill in some background here.


If you use COPY instead of INSERT, that bypasses the WAL and you don't see 
this.  Also, if you adjust your loop to do multiple inserts as a single 
transaction, that will change the behavior here as well.
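
A minimal sketch of the batching idea (hypothetical table, purely for
illustration):

BEGIN;
INSERT INTO test_table (id, payload) VALUES (1, 'row 1');
INSERT INTO test_table (id, payload) VALUES (2, 'row 2');
-- ...the rest of the batch...
COMMIT;  -- one physical commit for the whole batch instead of one per row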


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] PostgreSQL 8.2.5 slow performance on INSERT on Linux

2007-12-02 Thread Pavel Stehule
On 02/12/2007, Beyers Cronje <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm busy evaluating PostgreSQL and I'm having performance problems on one of
> my servers. I have a very simple one table database, and the client using
> Mono 1.2.5.1 is running a loop doing INSERTs on the table. Initially I
> tested this on my development PC, an old P4 system with 2GB RAM and 10,000
> INSERTs took ~12 secs on average, which I was fairly satisfied with. I then
> moved everything over to our test server, a new Dell 1950 server with quad
> core Xeon processors, 4GB RAM and SCSI hdd expecting to see better
> performance, but instead performance dropped to ~44 secs for 10,000 INSERTs.
> This obviously is not acceptable. Both the PC and server are running the
> exact same PostgreSQL version, Mono version, client application and both
> tests were run under very low load and on an empty table. I noticed that CPU
> utilization on the Dell server is very low, 1-2% utilization, so obviously
> it's not a load problem. Only the test application is accessing the
> database.
>
> So my question is, can anyone please give me some tips on what commands or
> tools I can use to try and pin down where exactly the performance drop is
> coming from? I'm obviously new to PostgreSQL so even basic checks can be
> relevant.
>
> Kind regards
>
> Beyers Cronje
>

Hello

a) use COPY instead of INSERT (it's much faster) if possible

b) check your configuration and read this article
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

Regards
Pavel Stehule



[PERFORM] PostgreSQL 8.2.5 slow performance on INSERT on Linux

2007-12-02 Thread Beyers Cronje
Hi all,

I'm busy evaluating PostgreSQL and I'm having performance problems on one of
my servers. I have a very simple one table database, and the client using
Mono 1.2.5.1 is running a loop doing INSERTs on the table. Initially I
tested this on my development PC, an old P4 system with 2GB RAM and 10,000
INSERTs took ~12 secs on average, which I was fairly satisfied with. I then
moved everything over to our test server, a new Dell 1950 server with quad
core Xeon processors, 4GB RAM and SCSI hdd expecting to see better
performance, but instead performance dropped to ~44 secs for 10,000 INSERTs.
This obviously is not acceptable. Both the PC and server are running the
exact same PostgreSQL version, Mono version, client application and both
tests were run under very low load and on an empty table. I noticed that CPU
utilization on the Dell server is very low, 1-2% utilization, so obviously
it's not a load problem. Only the test application is accessing the
database.

So my question is, can anyone please give me some tips on what commands or
tools I can use to try and pin down where exactly the performance drop is
coming from? I'm obviously new to PostgreSQL so even basic checks can be
relevant.

Kind regards

Beyers Cronje


Re: [PERFORM] Training Recommendations

2007-12-02 Thread Usama Munir Dar



Robert Treat wrote:
> On Wednesday 28 November 2007 11:20, Usama Munir Dar wrote:
>> EnterpriseDB (www.enterprisedb.com), ofcourse
>
> lame :-P

Have you or anyone you know tried the training offerings? Or do you think
it's lame because I top-posted, which of course would be a very poor
criterion, not to mention completely unrelated, so I definitely think that's
not the reason. I would love to hear what's wrong with it so we can work on
improving it.

> Campbell, Lance wrote:
>> PostgreSQL: 8.2.4
>>
>> Does anyone have any companies they would recommend using for
>> performance tuning training of PostgreSQL for Linux?  Or general DBA
>> training?
>
> Never take advice from a guy who top posts...  A friend of mine just went
> through an OTG course and had good things to say, and I've heard others
> speak well of it too, so I'd probably recommend them, but there are several
> options, check out the training section on the website:
>
> http://www.postgresql.org/about/eventarchive
>
> Note also some of the more popular pg support companies also offer personal
> training, even if it isn't advertised. HTH.




Re: [PERFORM] Training Recommendations

2007-12-02 Thread Guido Neitzer

On 02.12.2007, at 06:30, Merlin Moncure wrote:

> I've been dying to know if anyone has ever done PostgreSQL training at
> 'the big nerd ranch'.

There are a couple of reviews floating around the web:

http://www.linux.com/articles/48870
http://www.linuxjournal.com/article/7847

I was in the course too (out of interest) but as I'm with Big Nerd  
Ranch, I don't want to say anything here about the course.


cug

--
http://www.event-s.net




Re: [PERFORM] Training Recommendations

2007-12-02 Thread Merlin Moncure
On Nov 30, 2007 4:15 AM, Robert Treat <[EMAIL PROTECTED]> wrote:
> Never take advice from a guy who top posts...  A friend of mine just went
> through an OTG course and had good things to say, and I've heard others speak
> well of it too, so I'd probably recommend them, but there are several
> options, check out the training section on the website:
> http://www.postgresql.org/about/eventarchive
>
> Note also some of the more popular pg support companies also offer personal
> training, even if it isn't advertised. HTH.

I've been dying to know if anyone has ever done PostgreSQL training at
'the big nerd ranch'.

merlin



Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Mindaugas

> What exactly is your goal? Do you need this query to respond in under a
> specific limit? What limit? Do you need to be able to execute many instances
> of this query in less than 5s * the number of executions? Or do you have more
> complex queries that you're really worried about?

  I'd like this query to respond within a specific time limit. 5s now is OK,
but 50s later for 1 rows is too slow.

> Both Greenplum and EnterpriseDB have products in this space which let you
> break the query up over several servers but at least in EnterpriseDB's case
> it's targeted towards running complex queries which take longer than this to
> run. I doubt you would see much benefit for a 5s query after the overhead of
> sending parts of the query out to different machines and then reassembling the
> results. If your real concern is with more complex queries they may make sense
> though. It's also possible that paying someone to come look at your database
> will find other ways to speed it up.

  I see. This query should also benefit a lot even when run in parallel on
one server, since most of its time is spent waiting for storage to respond.

  Also, off-list someone pointed me to covering indexes in MySQL. But those
are not supported in PostgreSQL, are they?

  Mindaugas



Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Gregory Stark

"Mindaugas" <[EMAIL PROTECTED]> writes:

>   I execute a simple query: "select * from bigtable where From='something'".
>   The query returns something like 1000 rows and takes 5++ seconds to complete.

As you pointed out that's not terribly slow for 1000 random accesses. It
sounds like your drive has nearly 5ms seek time which is pretty common.

What exactly is your goal? Do you need this query to respond in under a
specific limit? What limit? Do you need to be able to execute many instances
of this query in less than 5s * the number of executions? Or do you have more
complex queries that you're really worried about?

I do have an idea of how to improve Postgres for this case but it has to wait
until we're done with 8.3 and the tree opens for 8.4.

>   Ideas for improvement? Greenplum or EnterpriseDB? Or have I forgotten
>   something among PostgreSQL's features?

Both Greenplum and EnterpriseDB have products in this space which let you
break the query up over several servers but at least in EnterpriseDB's case
it's targeted towards running complex queries which take longer than this to
run. I doubt you would see much benefit for a 5s query after the overhead of
sending parts of the query out to different machines and then reassembling the
results. If your real concern is with more complex queries they may make sense
though. It's also possible that paying someone to come look at your database
will find other ways to speed it up.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Guillaume Smet
On Dec 2, 2007 11:26 AM, Mindaugas <[EMAIL PROTECTED]> wrote:
>   I execute a simple query: "select * from bigtable where From='something'".
> The query returns something like 1000 rows and takes 5++ seconds to
> complete. As far as I understand, the query is slow because:

Can you post an EXPLAIN ANALYZE? Which version of PostgreSQL do you use?

--
Guillaume



Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Mindaugas

> my answer may be off topic since you might be looking for a
> postgres-only solution... but just in case:

  I'd like to stay with SQL.

> What are you trying to achieve exactly? Is there any way you could
> re-work your algorithms to avoid selects and use a sequential scan
> (consider your postgres data as one big file) to retrieve each of the
> rows, analyze/compute them (possibly in a distributed manner), and
> join the results at the end?

  I'm trying to improve performance - to get the answer from the mentioned
query faster.

  And since the cardinality is high (10+ different values), I doubt it
would be possible to match that select's speed with a reasonable number of
sequential scan nodes.

  Mindaugas



Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Sami Dalouche
Hi,

my answer may be off topic since you might be looking for a
postgres-only solution... but just in case:

What are you trying to achieve exactly? Is there any way you could
re-work your algorithms to avoid selects and use a sequential scan
(consider your postgres data as one big file) to retrieve each of the
rows, analyze/compute them (possibly in a distributed manner), and
join the results at the end?

A few pointers:
http://lucene.apache.org/hadoop/
http://www.gridgain.com/

Regards,
Sami Dalouche



On Sun, 2007-12-02 at 12:26 +0200, Mindaugas wrote:
> Hello,
>
>   I've started to work with big tables (like 300GB) and performance
> problems have started to appear. :(
>
>   To simplify things - the table has an index on the From column and an
> index on the To column. It also has several other non-indexed columns.
> There are 10+ different values for From and the same for To.
>
>   I execute a simple query: "select * from bigtable where From='something'".
> The query returns something like 1000 rows and takes 5++ seconds to
> complete. As far as I understand, the query is slow because:
>   - first it has to retrieve pointers to the rows from the index. That
> goes fast.
>   - then it retrieves all the rows one by one. There we have 100% random
> reads, because rows with the same From are distributed evenly through all
> the 300GB and most probably nothing is cached. So there we are limited by
> the performance of _one_ disk, independently of how many disks we have in
> the storage? And in case the storage consists of 15k rpm Cheetahs with
> 3.5ms average read seek time, we should expect no more than ~285 rows per
> second?
>
>   I feel that I'm overlooking something here. But I'm new to data
> warehousing. :)
>
>   Also, this query should greatly benefit from parallel execution or async
> IO. Storage (seeks/second) scales almost linearly when it has a lot of
> disks. And the query is completely IO bound, so it should scale well on a
> single server.
>
>   And I cannot use an index-organized table or a table partitioned by
> From :) because there are at least 2 similar indexes by which queries can
> be executed - From and To.
>
>   Ideas for improvement? Greenplum or EnterpriseDB? Or have I forgotten
> something among PostgreSQL's features?
>
>   Thanks,
>
>   Mindaugas




[PERFORM] Dealing with big tables

2007-12-02 Thread Mindaugas
  Hello,

  I've started to work with big tables (like 300GB) and performance problems
have started to appear. :(

  To simplify things - the table has an index on the From column and an index
on the To column. It also has several other non-indexed columns. There are
10+ different values for From and the same for To.

  I execute a simple query: "select * from bigtable where From='something'".
The query returns something like 1000 rows and takes 5++ seconds to complete.
As far as I understand, the query is slow because:
  - first it has to retrieve pointers to the rows from the index. That goes
fast.
  - then it retrieves all the rows one by one. There we have 100% random
reads, because rows with the same From are distributed evenly through all the
300GB and most probably nothing is cached. So there we are limited by the
performance of _one_ disk, independently of how many disks we have in the
storage? And in case the storage consists of 15k rpm Cheetahs with 3.5ms
average read seek time, we should expect no more than ~285 rows per second?
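
  (The arithmetic behind that estimate: 1 / 3.5 ms ~= 285 random reads per
second from a single spindle, so ~1000 uncached rows would take on the order
of 3.5 seconds or more from one disk.)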

  I feel that I'm overlooking something here. But I'm new to data
warehousing. :)

  Also, this query should greatly benefit from parallel execution or async IO.
Storage (seeks/second) scales almost linearly when it has a lot of disks. And
the query is completely IO bound, so it should scale well on a single server.

  And I cannot use an index-organized table or a table partitioned by From :)
because there are at least 2 similar indexes by which queries can be executed
- From and To.

  Ideas for improvement? Greenplum or EnterpriseDB? Or have I forgotten
something among PostgreSQL's features?

  Thanks,

  Mindaugas
