Re: [PERFORM] planner costs in warm cache tests

2010-06-01 Thread Scott Carey
It is still best to keep random_page_cost slightly larger (~50%) than 
seq_page_cost, because even when the data is entirely in RAM, sequential 
reads are faster than random reads: today's CPUs prefetch memory on 
sequential access.  Perhaps try something like 0.3 and 0.2, or half that.  
You still don't want the planner to gratuitously scan a lot of RAM -- 
reading a page is not free and can evict other pages from shared_buffers.
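A sketch of what those settings might look like in postgresql.conf (the values are illustrative only and should be benchmarked against your own workload):

```
# postgresql.conf -- illustrative values for a fully-cached database
seq_page_cost = 0.2
random_page_cost = 0.3   # kept ~50% above seq_page_cost, per the above
```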


On May 31, 2010, at 12:55 PM, Tom Lane wrote:

 Jesper Krogh jes...@krogh.cc writes:
 On 2010-05-30 20:34, Tom Lane wrote:
 Well, hmm, I really doubt that that represents reality either.  A page
 access is by no means free even when the page is already in cache.
 I don't recall anyone suggesting that you set these numbers to less
 than perhaps 0.01.
 
 Thank you for the prompt response. Is it a false assumption that the
 cost should, as a metric for comparing different plans, be a
 measurement of actual run-time in a dead-disk run?
 
 Well, the default cost parameters (seq_page_cost=1, random_page_cost=4)
 are intended to model the non-cached state where most page fetches
 actually do require a disk access.  They are definitely too large
 relative to the cpu_xxx_cost parameters when you have a fully-cached
 database, but what I've seen people recommending for that condition
 is to set them both to the same value in the vicinity of 0.1 or 0.01
 or so.  If it's only mostly cached you might try intermediate settings.
 
   regards, tom lane
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Zeus IOPS

2010-06-01 Thread Mindaugas Riauba
  Hello,

 Anybody on the list have any experience with these drives?  They get
 good numbers but I can't find diddly on them on the internet for the
 last year or so.

 http://www.stec-inc.com/product/zeusiops.php

  Most of the storage vendors (I have confirmation from EMC and HP)
use those in their SAN boxes. I believe that is because they are the
only SLC SSD makers that have supercapacitors on the SSD drive which
allows them to run with write cache enabled. As a side effect - they
are insanely expensive. :)

  Mindaugas



Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Matthew Wakeling

On Sun, 23 May 2010, David Jarvis wrote:

The measurement table indexes (on date and weather station) were not being
used because the date ranges given (e.g., 1900 - 2009) were causing the
planner to do a full table scan, which is correct.


I wonder if you might see some benefit from CLUSTERing the tables on the 
index.


Matthew

--
And the lexer will say "Oh look, there's a null string. Oooh, there's 
another. And another.", and will fall over spectacularly when it realises 
there are actually rather a lot.
- Computer Science Lecturer (edited)



Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Matthew Wakeling

On Fri, 28 May 2010, Merlin Moncure wrote:

At best, if you are a ninja with the marginally documented backend
api, you will create code that goes about as fast as your pl/pgsql
function for 10 times the amount of input work, unless there are heavy
amounts of 'other than sql' code in your function.  The reason to
write C in the backend is:

*) Interface w/3rd party libraries w/C linkage
*) Do things that are illegal in regular SQL (write files, etc)
*) Make custom types


The major case I found when writing pl/pgsql was when trying to build 
arrays row by row. AFAIK when I tried it, adding a row to an array caused 
the whole array to be copied, which put a bit of a damper on performance.


Matthew

--
The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow words;
on occasion, English has pursued other languages down alleyways to beat
them unconscious and rifle their pockets for new vocabulary. - James Nicoll



Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Stephen Frost
* Matthew Wakeling (matt...@flymine.org) wrote:
 The major case I found when writing pl/pgsql was when trying to build  
 arrays row by row. AFAIK when I tried it, adding a row to an array caused 
 the whole array to be copied, which put a bit of a damper on performance.

Using the built-ins now available in 8.4 (array_agg), that copying
doesn't happen any more.
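For illustration, a sketch of the two approaches (the table and column names here are made up, not from the thread):

```sql
-- Row-by-row append: each iteration copies the whole array, O(n^2) total.
CREATE OR REPLACE FUNCTION ids_by_loop() RETURNS integer[] AS $$
DECLARE
    acc integer[] := '{}';
    r record;
BEGIN
    FOR r IN SELECT id FROM items LOOP
        acc := array_append(acc, r.id);
    END LOOP;
    RETURN acc;
END;
$$ LANGUAGE plpgsql;

-- 8.4+: one aggregate call builds the array without the repeated copying.
SELECT array_agg(id) FROM items;
```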

Thanks,

Stephen




Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Matthew Wakeling

On Tue, 1 Jun 2010, Stephen Frost wrote:

* Matthew Wakeling (matt...@flymine.org) wrote:

The major case I found when writing pl/pgsql was when trying to build
arrays row by row. AFAIK when I tried it, adding a row to an array caused
the whole array to be copied, which put a bit of a damper on performance.


Using the built-ins now available in 8.4 (array_agg), that copying
doesn't happen any more.


Thanks. I had wondered if that had been improved.

Matthew

--
Our riverbanks and seashores have a beauty all can share, provided
there's at least one boot, three treadless tyres, a half-eaten pork
pie, some oil drums, an old felt hat, a lorry-load of tar blocks,
and a broken bedstead there.   -- Flanders and Swann



Re: [PERFORM] Zeus IOPS

2010-06-01 Thread Brad Nicholson
On Tue, 2010-06-01 at 10:27 +0300, Mindaugas Riauba wrote:
 Hello,
 
  Anybody on the list have any experience with these drives?  They get
  good numbers but I can't find diddly on them on the internet for the
  last year or so.
 
  http://www.stec-inc.com/product/zeusiops.php
 
   Most of the storage vendors (I have confirmation from EMC and HP)
 use those in their SAN boxes. I believe that is because they are the
 only SLC SSD makers that have supercapacitors on the SSD drive which
 allows them to run with write cache enabled. As a side effect - they
 are insanely expensive. :)

Texas Memory Systems also have these.
-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.





Re: [PERFORM] planner costs in warm cache tests

2010-06-01 Thread Tom Lane
Scott Carey sc...@richrelevance.com writes:
 It is still best to keep random_page_cost slightly larger (~50%)
 than seq_page_cost, because even when the data is entirely in RAM,
 sequential reads are faster than random reads.  Today's CPUs do
 memory prefetching on sequential access.

Do you have any actual evidence of that?  Because I don't believe it.
Neither PG nor any kernel that I've ever heard of makes any effort to
ensure that logically sequential blocks occupy physically sequential
buffers, so even if the CPU tries to do some prefetching, it's not
going to help at all.

Now, if the database isn't entirely cached, then indeed it's a good
idea to keep random_page_cost higher than seq_page_cost.  But that's
because of the actual disk fetches, not anything that happens in RAM.

regards, tom lane



Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of Tue Jun 01 05:55:35 -0400 2010:
 On Sun, 23 May 2010, David Jarvis wrote:
  The measurement table indexes (on date and weather station) were not being
  used because the only given date ranges (e.g., 1900 - 2009) were causing the
  planner to do a full table scan, which is correct.
 
 I wonder if you might see some benefit from CLUSTERing the tables on the 
 index.

Eh, isn't this a GIN or GiST index?  I don't think you can cluster on
those, can you?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from David Jarvis's message of Tue Jun 01 14:01:22 -0400 2010:
 Sorry, Alvaro.
 
 I was contemplating using a GIN or GiST index as a way of optimizing the
 query.

My fault -- I didn't read the whole thread.

 Instead, I found that re-inserting the data in order of station ID (the
 primary look-up column) and then CLUSTER'ing on the station ID, taken date,
 and category index increased the speed by an order of magnitude.

Hmm, that's nice, though I cannot but wonder whether the exclusive lock
required by CLUSTER is going to be a problem in the long run.

 I might be able to drop the station/taken/category index in favour of the
 simple station index and CLUSTER on that, instead (saving plenty of disk
 space). Either way, it's fast right now so I'm not keen to try and make it
 much faster.

Hm, keep in mind that if the station clause alone is not selective
enough, scanning it may be too expensive.  The current three column
index is probably a lot faster to search (though of course it's causing
more work to be kept up to date on insertions).
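For illustration, the trade-off might look like this (table and index names are assumptions, not the poster's actual schema):

```sql
-- Simple single-column index; cheaper to keep up to date on insertions.
CREATE INDEX measurement_station_idx ON measurement (station_id);

-- Rewrite the table in index order; note this takes an exclusive lock
-- on the table for the duration, as mentioned above.
CLUSTER measurement USING measurement_station_idx;
ANALYZE measurement;
```

(CLUSTER only works on btree-style indexes, consistent with the earlier point that you cannot cluster on GIN or GiST.)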

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Mark Kirkwood
I'm helping set up a Red Hat 5.5 system for Postgres. I was going to 
recommend xfs for the filesystem - however it seems that xfs is 
supported only as a "technology preview" layered product in 5.5, which 
apparently means that the xfs tools are only available via special 
channels.


What are people using Red Hat choosing for a good-performing filesystem?

regards

Mark



Re: [PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Tom Lane
Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:
 I'm helping set up a Red Hat 5.5 system for Postgres. I was going to 
 recommend xfs for the filesystem - however it seems that xfs is 
 supported as a technology preview layered product for 5.5. This 
 apparently means that the xfs tools are only available via special 
 channels.

It also means that it's probably not production grade, anyway.

 What are people using Red Hat choosing for a good-performing filesystem?

What's your time horizon?  RHEL6 will have full support for xfs.
On RHEL5 I really wouldn't consider anything except ext3.

regards, tom lane



Re: [PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Mark Kirkwood

On 02/06/10 15:26, Tom Lane wrote:


What's your time horizon?  RHEL6 will have full support for xfs.
On RHEL5 I really wouldn't consider anything except ext3.

Yeah, RHEL6 seems like the version we would prefer - unfortunately the 
time frame is the next few days. Awesome - thanks for the quick reply!


regards

Mark




Re: [PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Devrim GÜNDÜZ
On Wed, 2010-06-02 at 15:06 +1200, Mark Kirkwood wrote:
 What are people using Red Hat choosing for a good-performing
 filesystem?

ext2 (xlogs) and ext3 (data). 
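For illustration, that layout might look like the fstab sketch below (device names and mount points are made up; verify the options against your own kernel):

```
# /etc/fstab -- illustrative only
/dev/sdb1   /var/lib/pgsql/xlog   ext2   noatime   0 0
/dev/sdc1   /var/lib/pgsql/data   ext3   noatime   0 0
```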

For xfs, you may want to read this:

http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html

Regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Mark Kirkwood

On 02/06/10 17:17, Devrim GÜNDÜZ wrote:


For xfs, you may want to read this:

http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html


Thanks - yes RHEL6 is the version we would have liked to use I suspect!

Regards

Mark

