Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Markus Schaber
Hi, Luke,

Luke Lonergan wrote:

 Since PG's heap scan is single threaded, the seek rate is equivalent to a
 single disk (even though RAID arrays may have many spindles), the typical
 random seek rates are around 100-200 seeks per second from within the
 backend.  That means that as sequential scan performance increases, such as
 happens when using large RAID arrays, the random_page_cost will range from
 50 to 300 linearly as the size of the RAID array increases.

Do you think that adding some posix_fadvise() calls to the backend to
pre-fetch some blocks into the OS cache asynchronously could improve
that situation?

I could imagine that e.g. bitmap index scans could profit in the
heap-fetching stage by fadvise()ing the next few blocks.

Maybe asynchronous I/O could be used for the same benefit, but
posix_fadvise() is less intrusive, and can easily be #define'd out on
platforms that don't support it.

Combine this with the Linux Kernel I/O Scheduler patches (readahead
improvements) that were discussed here in summer...

Regards,
Markus

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl.-Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



[PERFORM] Update on high concurrency OLTP application and Postgres 8 tuning

2006-09-20 Thread Cosimo Streppone

Hi all,

I was looking for tips to speed up / reduce the load on a Pg8 app.
Thank you for all your suggestions on the matter.
The thread is archived here:

http://www.mail-archive.com/pgsql-performance@postgresql.org/msg18342.html

After intensive application profiling and database workload analysis,
I managed to reduce CPU load with application-level changes.

For database overload in the presence of many concurrent
transactions, I found that just running ANALYZE on the busiest
relations improves the situation.

I scheduled a cron job every hour or so that runs ANALYZE on the
4-5 most heavily used relations, sleeping 30 seconds between each
run.
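
Something like this, as a sketch -- the database name, table names, and
paths here are invented, only the pattern matters:

  #!/bin/sh
  # analyze-hot-tables.sh -- refresh stats on the busiest relations
  for t in orders order_items sessions stock; do
      psql -d mydb -c "ANALYZE $t;"
      sleep 30
  done

  # illustrative crontab entry, running it at the top of every hour:
  # 0 * * * * /usr/local/bin/analyze-hot-tables.sh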

This has improved db response times when many clients run concurrently.
I wanted to report this; maybe it can be helpful for others
out there... :-)

--
Cosimo




Re: [PERFORM] Update on high concurrency OLTP application and Postgres 8 tuning

2006-09-20 Thread Andrew Sullivan
On Wed, Sep 20, 2006 at 11:09:23AM +0200, Cosimo Streppone wrote:
 
 I scheduled a cron job every hour or so that runs ANALYZE on the
 4-5 most heavily used relations, sleeping 30 seconds between each
 run.
 
 This has improved db response times when many clients run concurrently.
 I wanted to report this; maybe it can be helpful for others
 out there... :-)

This suggests to me that your statistics need a lot of updating.  You
_might_ find that setting the statistics target to a higher number on some
columns of some of your tables will allow you to analyse less
frequently.  That's a good thing just because ANALYSE will impose an
I/O load.
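
For example (table and column names invented here; the per-column
default in this era is 10):

  -- sample more rows for a skewed, heavily-queried column,
  -- then refresh its stats
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
  ANALYZE orders;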

A 

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton



Re: [PERFORM] Update on high concurrency OLTP application and Postgres 8 tuning

2006-09-20 Thread Cosimo Streppone

Andrew wrote:


On Wed, Sep 20, 2006 at 11:09:23AM +0200, Cosimo Streppone wrote:

I scheduled a cron job every hour or so that runs ANALYZE on the
4-5 most heavily used relations, sleeping 30 seconds between each
run.


This suggests to me that your statistics need a lot of updating.


Agreed.


You _might_ find that setting the statistics target to a higher number on some
columns of some of your tables will allow you to analyse less
frequently.


At the moment, my rule of thumb is to check the ANALYZE VERBOSE
output to see whether all table pages are being scanned.

  INFO: mytable: scanned xxx of yyy pages, containing ...

If xxx = yyy, I keep the statistics target at the current level.
When xxx is way less than yyy, I increase it a bit
and retry.

It's probably primitive, but it seems to work well.

 [...] ANALYSE will impose an I/O load.

In my case, analyze execution doesn't impact performance
in any noticeable way. YMMV of course.

--
Cosimo




[PERFORM] running benchmark test on a 50GB database

2006-09-20 Thread Nuno Alves

Hi,

I am running a benchmark test on a 50 GB PostgreSQL database.
My postgresql.conf has all parameters at their defaults.
In this configuration the database is very, very slow.

Could you please tell me which is the best configuration?

My system:
Pentium D 3.0Ghz
RAM: 1GB
HD: 150GB SATA

Thanks in advance,
Nuno



Re: [PERFORM] running benchmark test on a 50GB database

2006-09-20 Thread Dave Dutcher
I would start by reading this web page:

http://powerpostgresql.com/PerfList

There are probably some other web pages out there with similar information,
or you can check the mailing list archives for a lot of info.  If those
places don't help, then you should try to identify which queries are slow
and post an EXPLAIN ANALYZE of the slow queries along with the relevant
schema info (i.e. table definitions and indexes).
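
As a very rough starting point for a 1 GB machine on 8.1 -- these are
illustrative values only, and note that 8.1 takes plain numbers here
(counts of 8 kB buffers, or kB), not the '128MB'-style strings of later
releases:

  # postgresql.conf -- illustrative values for ~1 GB RAM
  shared_buffers = 25000         # ~200 MB, in 8 kB buffers
  effective_cache_size = 87500   # ~700 MB, in 8 kB pages; approx. OS cache
  work_mem = 8192                # 8 MB per sort/hash operation, in kB
  maintenance_work_mem = 65536   # 64 MB for VACUUM/ANALYZE, in kB
  checkpoint_segments = 16       # more WAL segments between checkpoints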



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Nuno Alves
 Sent: Wednesday, September 20, 2006 10:28 AM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] running benchmark test on a 50GB database
 
 
 Hi,
 
 I am running a benchmark test on a 50 GB PostgreSQL database.
 My postgresql.conf has all parameters at their defaults.
 In this configuration the database is very, very slow.
 
 Could you please tell me which is the best configuration?
 
 My system:
 Pentium D 3.0Ghz
 RAM: 1GB
 HD: 150GB SATA
 
 Thanks in advance,
 Nuno
 




Re: [PERFORM] running benchmark test on a 50GB database

2006-09-20 Thread Chris Mair

 I am running a benchmark test on a 50 GB PostgreSQL database.
 My postgresql.conf has all parameters at their defaults.
 In this configuration the database is very, very slow.
 
 Could you please tell me which is the best configuration?
 
 My system:
 Pentium D 3.0Ghz
 RAM: 1GB
 HD: 150GB SATA

We don't know what your database looks like, what queries you're
running, what "very, very slow" means for you, or what version of
PostgreSQL on what OS this is :/

These two links are a good starting point for tuning your DB:
http://www.postgresql.org/docs/8.1/static/performance-tips.html
http://www.powerpostgresql.com/PerfList/


Bye, Chris.





Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Luke Lonergan
Markus,

On 9/20/06 1:09 AM, Markus Schaber [EMAIL PROTECTED] wrote:

 Do you think that adding some posix_fadvise() calls to the backend to
 pre-fetch some blocks into the OS cache asynchronously could improve
 that situation?

Nope - this requires true multi-threading of the I/O; there need to be
multiple seek operations running simultaneously.  The current executor
blocks on each page request, waiting for the I/O to happen before requesting
the next page.  The OS can't predict what random page is to be requested
next.

We can implement multiple scanners (already present in MPP), or we could
implement AIO and fire off a number of simultaneous I/O requests for
fulfillment.

- Luke  





Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Markus Schaber
Hi, Luke,

Luke Lonergan wrote:

 Do you think that adding some posix_fadvise() calls to the backend to
 pre-fetch some blocks into the OS cache asynchronously could improve
 that situation?
 
 Nope - this requires true multi-threading of the I/O; there need to be
 multiple seek operations running simultaneously.  The current executor
 blocks on each page request, waiting for the I/O to happen before requesting
 the next page.  The OS can't predict what random page is to be requested
 next.

I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly
meant for this purpose?

My idea was that the executor could posix_fadvise() the blocks it will
need in the near future, and later, when it actually issues the blocking
read, the block is there already. This could even give speedups in the
single-spindle case, as the I/O scheduler could already fetch the next
blocks while the executor processes the current one.

But there must be some details in the executor that prevent this.
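
As an illustration, here is a minimal standalone sketch of that
two-pass idea -- this is not backend code; the file name and block
numbers are invented, and error handling is minimal:

  /* fadvise-prefetch.c -- sketch: advise the kernel about upcoming
   * random block reads, then issue the blocking reads. */
  #define _XOPEN_SOURCE 600     /* for posix_fadvise() and pread() */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  #define BLCKSZ 8192           /* PostgreSQL's default block size */

  int main(void)
  {
      /* hypothetical block numbers produced by a bitmap index scan */
      unsigned int blocks[] = { 17, 42, 99, 1234 };
      int nblocks = sizeof(blocks) / sizeof(blocks[0]);
      char buf[BLCKSZ];
      int fd, i;

      fd = open("base/16384/16385", O_RDONLY);  /* invented heap file */
      if (fd < 0) { perror("open"); return 1; }

      /* pass 1: tell the kernel which blocks we will need soon */
      for (i = 0; i < nblocks; i++)
          (void) posix_fadvise(fd, (off_t) blocks[i] * BLCKSZ,
                               BLCKSZ, POSIX_FADV_WILLNEED);

      /* pass 2: the blocking reads now (ideally) hit the OS cache,
       * or at least overlap with I/O the kernel already started */
      for (i = 0; i < nblocks; i++)
      {
          if (pread(fd, buf, BLCKSZ, (off_t) blocks[i] * BLCKSZ) != BLCKSZ)
              perror("pread");
          /* ... process the tuples in buf ... */
      }

      close(fd);
      return 0;
  }

If the kernel honors the advice, the second loop's reads find the
blocks already cached instead of seeking one block at a time.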

 We can implement multiple scanners (already present in MPP), or we could
 implement AIO and fire off a number of simultaneous I/O requests for
 fulfillment.

AIO is much more intrusive to implement, so I'd prefer to first look at
whether posix_fadvise() could improve the situation.

Thanks,
Markus



Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Ron

IMHO, AIO is the architecturally cleaner and more elegant solution.

We in fact have a project on the boards to do this, but funding (as
yet) has not been found.


My $.02,
Ron


At 02:02 PM 9/20/2006, Markus Schaber wrote:

Hi, Luke,

Luke Lonergan wrote:

 Do you think that adding some posix_fadvise() calls to the backend to
 pre-fetch some blocks into the OS cache asynchronously could improve
 that situation?

 Nope - this requires true multi-threading of the I/O; there need to be
 multiple seek operations running simultaneously.  The current executor
 blocks on each page request, waiting for the I/O to happen before requesting
 the next page.  The OS can't predict what random page is to be requested
 next.

I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly
meant for this purpose?

My idea was that the executor could posix_fadvise() the blocks it will
need in the near future, and later, when it actually issues the blocking
read, the block is there already. This could even give speedups in the
single-spindle case, as the I/O scheduler could already fetch the next
blocks while the executor processes the current one.

But there must be some details in the executor that prevent this.

 We can implement multiple scanners (already present in MPP), or we could
 implement AIO and fire off a number of simultaneous I/O requests for
 fulfillment.

AIO is much more intrusive to implement, so I'd prefer to first look at
whether posix_fadvise() could improve the situation.

Thanks,
Markus






Re: [PERFORM] running benchmark test on a 50GB database

2006-09-20 Thread Jim C. Nasby
On Wed, Sep 20, 2006 at 05:47:41PM +0200, Chris Mair wrote:
 
  I am running a benchmark test on a 50 GB PostgreSQL database.
  My postgresql.conf has all parameters at their defaults.
  In this configuration the database is very, very slow.
  
  Could you please tell me which is the best configuration?
  
  My system:
  Pentium D 3.0Ghz
  RAM: 1GB
  HD: 150GB SATA
 
 We don't know what your database looks like, what queries you're
 running, what "very, very slow" means for you, or what version of
 PostgreSQL on what OS this is :/
 
 These two links are a good starting point for tuning your DB:
 http://www.postgresql.org/docs/8.1/static/performance-tips.html
 http://www.powerpostgresql.com/PerfList/

Also, 1G is kinda light on memory for a 50G database.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Luke Lonergan
Markus,

On 9/20/06 11:02 AM, Markus Schaber [EMAIL PROTECTED] wrote:

 I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly
 meant for this purpose?

This is a good idea - I wasn't aware that this was possible.

We'll do some testing and see if it works as advertised on Linux and
Solaris.

- Luke


