Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-18 Thread Brian Hamlin

Hi Kevin, List, others...

On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:


Andy Colson a...@squeakycode.net wrote:


I tried shared_buffers at both 2400M and 18000M, and it took 4.5
hours both times.  ... (weak attempts at humor omitted) 


All else being the same, adjusting shared_buffers affects how much
of your cache is managed by PostgreSQL and how much of your cache is
managed by the OS; it doesn't exactly change how much you have
cached or necessarily affect disk waits.  (There's a lot more that
can be said about the fine points of this, but you don't seem to
have sorted out the big picture yet.)


  Linux caching is aggressive already, so I think this example points out
that Postgres caching is not contributing here. That's why I posted this
short example to the list; I thought it was a useful data point that
might be useful to others, and to the PostgreSQL project devs...




I heard of this program called vmstat that I'll read up on and
post some results for.  -ignore- I don't take advice with
vinegar well...


That's a good way to get a handle on whether your bottleneck is
currently CPU or disk access.


(attempted insults omitted)


If you're looking to make things faster (a fact not yet exactly in
evidence), you might want to start with the query which runs the
longest, or perhaps the one which most surprises you with its run
time, and get the EXPLAIN ANALYZE output for that query.  There is
other information you should include; this page should help:

http://wiki.postgresql.org/wiki/SlowQueryQuestions



  some of the queries have been gone over fairly well, others not..
It's a complex sequence and we are in production mode here,
so I don't get a chance to do everything I might do with regard to
one particular query...





I just learned about http://explain.depesz.com/ and figure it
might help me.


It is a nice way to present EXPLAIN ANALYZE output from complex
queries.



  explain.depesz.com  definitely a good reference, thank you for that..


==
Brian Hamlin
GeoCal
OSGeo California Chapter
415-717-4462 cell




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-17 Thread Andy Colson

On 03/16/2012 05:30 PM, Kevin Grittner wrote:

Brian Hamlin mapl...@light42.com  wrote:

On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:

Andy Colson a...@squeakycode.net  wrote:


I tried shared_buffers at both 2400M and 18000M, and it took 4.5
hours both times.  ... (weak attempts at humor omitted) 


Ah, I didn't pick up on the attempts at humor; perhaps that's why
you mistook something I said as an attempt at an insult.


It wasn't you Kevin, it was me that insulted him.  (Although I was trying to be 
funny, and not mean).

Sorry again Brian.

-Andy



Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-17 Thread Kevin Grittner
Andy Colson  wrote:
On 03/16/2012 05:30 PM, Kevin Grittner wrote:
 
 Ah, I didn't pick up on the attempts at humor; perhaps that's why
 you mistook something I said as an attempt at an insult.

 It wasn't you Kevin, it was me that insulted him. (Although I was
 trying to be funny, and not mean).
 
Adding to the confusion, I think I missed one of the emails/posts. 
Oh, well, it sounds like mostly people need to use more smiley-faces,
since humor can be so easy to miss in this medium.
 
Brian, I hope this doesn't put you off from posting -- we try to be
helpful here.
 
-Kevin



[PERFORM] Shared memory for large PostGIS operations

2012-03-16 Thread maplabs
(from #postgresql IRC on freenode)

darkblue_b I did an interesting experiment the other day davidfetter_vmw .. 
davidfetter_vmw do tell
darkblue_b well you know I do these huge monolithic postGIS queries on an 
otherwise idle linux machine.. and there was a persistent thought in my head 
that Postgresql+PostGIS did not make good use of memory allocation > 2G

darkblue_b so I had this long, python driven analysis.. 15 steps.. some, 
unusual for me, are multiple queries running at once on the same data ... and 
others are just one labor intensive thing then the next  
    (one result table is 1.8M rows for 745M on disk, others are smaller)

darkblue_b I finally got the kinks worked out.. so I ran it twice.. 4.5 hours 
on our hardware.. once with shared_buffers set to 2400M and the second time 
with shared_buffers set to 18000M

darkblue_b work_mem was unchanged at 640M and.. the run times were within 
seconds of each other.. no improvement, no penalty

darkblue_b I have been wondering about this for the last two years

davidfetter_vmw darkblue_b, have you gone over any of this on -performance 
or -hackers?
darkblue_b no - though I think I should start a blog .. I have a couple of 
things like this now
darkblue_b good story though eh ?

davidfetter_vmw darkblue_b, um, it's a story that hasn't really gotten started 
until you've gotten some feedback from -performance
darkblue_b ok - true...

darkblue_b    pg 9.1  PostGIS 1.5.3    Ubuntu Server Oneiric 64bit  Dual Xeons  
one Western Digital black label for pg_default; one 3-disk RAID 5 for the 
database tablespace

==
Brian Hamlin
GeoCal
OSGeo California Chapter
415-717-4462 cell




Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-16 Thread Andy Colson



So let me clean that up for you:

 On 3/14/2012 11:29 PM, mapl...@light42.com wrote:

Hello list, my name is Brian Hamlin, but I prefer to go by darkblue, it's 
mysterious and dangerous!


I run PG 9.1, PostGIS 1.5.3, Linux 64 on Dual Xeons, OS on a single 
drive, and db is on 3-disk raid 5.  I'm the only user.


work_mem = 640M

I do these huge monolithic postGIS queries on an otherwise idle linux 
machine.  python driven analysis.. 15 steps.. some, unusual for me, are 
multiple queries running at once on the same data ... and others are 
just one labor intensive thing then the next (one result table is 1.8M 
rows for 745M on disk, others are smaller)


I tried shared_buffers at both 2400M and 18000M, and it took 4.5 hours 
both times.  I don't know if I am CPU bound or IO bound, but since giving 
PG more ram didn't help much, I'll assume I'm CPU bound.  I heard of this 
program called vmstat that I'll read up on and post some results for.


I don't know how much memory my box has, and I've never run explain 
analyze, but I'll try it out and post some.  I just learned about 
http://explain.depesz.com/ and figure it might help me.


This is the best list ever!  Thanks all!  (especially that poetic Dave 
Fetter, and that somewhat mean, but helpful, Andy Colson)


Shout outs to my friends Garlynn, Nick and Rush (best band ever!). 
Party, my house, next week!



==
(Virtually) Brian Hamlin
GeoCal
OSGeo California Chapter
415-717-4462 cell



-Andy



Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-16 Thread Kevin Grittner
Andy Colson a...@squeakycode.net wrote:
 
 I tried shared_buffers at both 2400M and 18000M, and it took 4.5
 hours both times.  I don't know if I am CPU bound or IO bound, but
 since giving PG more ram didn't help much, I'll assume I'm CPU
 bound.
 
All else being the same, adjusting shared_buffers affects how much
of your cache is managed by PostgreSQL and how much of your cache is
managed by the OS; it doesn't exactly change how much you have
cached or necessarily affect disk waits.  (There's a lot more that
can be said about the fine points of this, but you don't seem to
have sorted out the big picture yet.)
 
 I heard of this program called vmstat that I'll read up on and
 post some results for.
 
That's a good way to get a handle on whether your bottleneck is
currently CPU or disk access.
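 
A generic invocation (not from the thread) that makes the CPU-vs-disk
question concrete:
 
```shell
# Sample once per second, 5 reports; the first line is an average since boot.
# In the cpu columns: high "us"/"sy" with low "wa" suggests a CPU-bound load,
# while a large "wa" (time spent waiting on I/O) points at the disks.
vmstat 1 5
```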
 
 I don't know how much memory my box has
 
That's pretty basic information when it comes to tuning.  What does
`free -m` show?  (Copy/paste is a good thing.)
 
 and I've never run explain analyze
 
If you're looking to make things faster (a fact not yet exactly in
evidence), you might want to start with the query which runs the
longest, or perhaps the one which most surprises you with its run
time, and get the EXPLAIN ANALYZE output for that query.  There is
other information you should include; this page should help:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
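 
As a sketch of the form of the request (the table and spatial filter here
are invented for illustration, not taken from Brian's workload):
 
```sql
-- EXPLAIN ANALYZE actually executes the query and reports real row counts
-- and timings per plan node; BUFFERS adds cache hit/miss information.
EXPLAIN (ANALYZE, BUFFERS)
SELECT way_id, ST_Area(geom)
FROM   parcels
WHERE  ST_Intersects(geom, ST_MakeEnvelope(-122.5, 37.7, -122.3, 37.9, 4326));
```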
 
 I just learned about http://explain.depesz.com/ and figure it
 might help me.
 
It is a nice way to present EXPLAIN ANALYZE output from complex
queries.
 
-Kevin



Re: [PERFORM] Shared memory for large PostGIS operations

2012-03-16 Thread Kevin Grittner
Brian Hamlin mapl...@light42.com wrote:
 On Mar 16, 2012, at 7:17 AM, Kevin Grittner wrote:
 Andy Colson a...@squeakycode.net wrote:

 I tried shared_buffers at both 2400M and 18000M, and it took 4.5
 hours both times.  ... (weak attempts at humor omitted) 
 
Ah, I didn't pick up on the attempts at humor; perhaps that's why
you mistook something I said as an attempt at an insult.  We get
posts here from people at all different levels of experience, and
many people are grateful for pointers on what various utilities can
do for them or how best to formulate a post so they can get help
when they need it.  Attempts to help don't constitute insults, even
if the need is feigned.
 
 All else being the same, adjusting shared_buffers affects how
 much of your cache is managed by PostgreSQL and how much of your
 cache is managed by the OS; it doesn't exactly change how much
 you have cached or necessarily affect disk waits.
 
 Linux caching is aggressive already.. so I think this example
 points out that Postgres caching is not contributing here..  that's
 why I posted this short example to this list.. I thought it was a
 useful data point.. that it might be useful to others... and to
 the PostgreSQL project devs...
 
Yeah, guidelines for shared_buffers in the docs are vague because
the best setting varies so much with the workload.  While the docs
hint at setting it at 25% of the computer's RAM, most benchmarks
posted on this list have found throughput to peak at around 8GB to
10GB on systems where 25% would be more than that.  (Far less on
Windows, as the docs mention.)
 
There can be a point well before that where there are latency
spikes.  In our shop we have a multi-TB database backing a web site,
and to prevent these latency spikes we keep shared_buffers down to
2GB even though the system has 128GB RAM.  Forcing dirty pages out
to the OS cache helps them to be written in a more timely manner by
code which knows something about the hardware and what order of
writes will be most efficient.  PostgreSQL has, as a matter of a
design choice, decided to leave a lot to the OS caching, file
systems, and device drivers, and a key part of tuning is to discover
what balance of that versus the DBMS caching performs best for your
workload.
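 
As a hedged illustration of that balance (the values below are invented
examples echoing the shop described above, not a recommendation):
 
```
# postgresql.conf -- example values only; the best split between
# PostgreSQL's own cache and the OS cache is workload-dependent.
shared_buffers = 2GB            # what PostgreSQL manages itself
effective_cache_size = 96GB     # planner hint about total caching
                                # (OS cache + shared_buffers); not an allocation
```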
 
 some of the queries have been gone over fairly well, others not..
 It's a complex sequence and we are in production mode here,
 so I don't get a chance to do everything I might do with regard to
 one particular query...
 
You may want to take a look at auto_explain:
 
http://www.postgresql.org/docs/current/interactive/auto-explain.html
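 
A minimal way to enable it looks like this (the threshold is just an
example; pick one that catches only your long-running statements):
 
```
# postgresql.conf: log the plan of any statement that runs longer than
# the threshold, without hand-instrumenting each query.
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '30s'   # example threshold
auto_explain.log_analyze = on           # include actual run times (adds overhead)
```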
 
Since you're already in production it may be hard to test the
performance of your disk system, but it's a pretty safe bet that if
you are at all disk-bound you would benefit greatly from adding one
more drive and converting your 3 drive RAID 5 to a 4 drive RAID 10,
preferably with a RAID controller with BBU cache configured for
write-back.
 
-Kevin
