[PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Bryce Nesbitt
Or, if you want to actually read that query plan, try:
http://explain.depesz.com/s/qYq




Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Pavel Stehule
2010/2/10 Bryce Nesbitt bry...@obviously.com:
 Or, if you want to actually read that query plan, try:
 http://explain.depesz.com/s/qYq


hello,

check your work_mem settings. The hash join is very slow in your case.
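
A minimal way to test that per session before touching postgresql.conf (the
64MB figure below is just an illustrative guess, not a recommendation):

  SHOW work_mem;           -- see what the query currently runs with
  SET work_mem = '64MB';   -- hypothetical value, session-local only
  -- re-run the slow query under EXPLAIN ANALYZE and compare the hash join timings
  RESET work_mem;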

Pavel






Re: [PERFORM] [GENERAL] PostgreSQL - case studies

2010-02-10 Thread Ing. Marcos L. Ortiz Valmaseda

On 10/02/2010 6:49, Scott Marlowe wrote:

Quick note: please stick to text-formatted email for the mailing list;
it's the preferred format.

On Tue, Feb 9, 2010 at 9:09 PM, Jayadevan M
jayadevan.maym...@ibsplc.com  wrote:
   

Hello all,
Apologies for the long mail.
I work for a company that provides solutions mostly on a Java/Oracle
platform. Recently we moved one of our products to PostgreSQL. The main reason
was PostgreSQL's GIS capabilities and the inability of government departments
(especially road/traffic) to spend a lot of money on such projects. This
product is used to record details about accidents and related analysis (type of
road, when/why, etc.) with maps. Fortunately, even in India, an accident
reporting application does not have to handle many tps :).  So, I can't say
PostgreSQL's performance was really tested in this case.
Later, I tested one screen of one of our products - load testing with JMeter.
We tried it with Oracle, DB2, PostgreSQL and Ingres, and PostgreSQL easily
out-performed the rest. We tried a transaction mix with 20+ SELECTs, updates,
deletes and a few inserts.
 

Please note that benchmarking oracle (and a few other commercial dbs)
and then publishing those results without permission of oracle is
considered to be in breach of their contract.  Yeah, another wonderful
aspect of using Oracle.

That said, and as someone who is not an oracle licensee in any way,
this mimics my experience that postgresql is a match for oracle, db2,
and most other databases in the simple, single db on commodity
hardware scenario.

   

After a really good experience with the database, I subscribed to all 
PostgreSQL groups (my previous experience is all-Oracle) and reading these 
mails, I realized that many organizations are using plain, 'not customized'  
PostgreSQL for databases that handle critical applications.  Since there is no 
company trying to 'sell' PostgreSQL, many of us are not aware of such cases.
 

Actually, there are several companies that sell pgsql services, and some
that sell customized versions: RedHat, Command Prompt, EnterpriseDB,
and so on.

   

Could some of you please share some info on such scenarios - where you are 
supporting/designing/developing databases that run into at least a few hundred 
GBs of data (I know, that is small by today's standards)?
 

There are other instances of folks on the list sharing this kind of
info that you can find by searching the archives.  I've used pgsql for
about 10 years for anywhere from a few megabytes to hundreds of
gigabytes, and all kinds of applications.

Where I currently work we have a main data store for a web app that is
about 180Gigabytes and growing, running on three servers with slony
replication. We handle somewhere in the range of 10k to 20k queries
per minute (a mix of 90% or so reads to 10% writes).  Peak load can be
into the 30k or higher reqs / minute.

The two big servers that handle this load are dual quad core opteron
2.1GHz machines with 32Gig RAM and 16 15krpm SAS drives configured as
2 in RAID-1 for OS and pg_xlog, 2 hot spares, and 12 in a RAID-10 for
the main data.  HW Raid controller is the Areca 1680 which is mostly
stable, except for the occasional (once a year or so) hang problem
which has been described, and which Areca has assured me they are
working on.

Our total downtime due to database outages in the last year or so has
been 10 to 20 minutes, and that was due to a RAID card driver bug that
hits us about once every 300 to 400 days.  The majority of the down
time has been waiting for our hosting provider to hit the big red
switch and restart the main server.

Our other pgsql servers provide search facility, with a db size of
around 300Gig, and statistics at around ~1TB.

   

I am sure PostgreSQL has matured a lot more since the days when these case 
studies were posted. I went through the case studies at EnterpriseDB and 
similar vendors too. But those are customized PostgreSQL servers.
 

Not necessarily.  They sell support more than anything, and the
majority of customization is not for stability but for additional
features, such as mpp queries or replication etc.

The real issue you run into is that many people don't want to tip
their hand that they are using pgsql because it is a competitive
advantage.  It's inexpensive, capable, and relatively easy to use.  If
your competitor is convinced that Oracle or MSSQL server with $240k in
licensing each year is the best choice, and you're whipping them with
pgsql, the last thing you want is for them to figure that out and
switch.

   
Following on from that subject, there are many apps around the world that are 
using PostgreSQL for their business.
We are planning the design and deployment of a large PostgreSQL 
cluster for DWH-ODS-BI apps.
We are documenting everything so that the information can later be 
published in the PostgreSQL case studies section.


We are using Slony-I for replication, PgBouncer for pooling ...

Re: [PERFORM] PostgreSQL - case studies

2010-02-10 Thread Kevin Grittner
Jayadevan M jayadevan.maym...@ibsplc.com wrote:
 
 Could some of you please share some info on such scenarios- where
 you are supporting/designing/developing databases that run into at
 least a few hundred GBs of data (I know, that is small by today's
 standards)?
 
I'm a database administrator for the Wisconsin Courts.  We've got
about 200 PostgreSQL database clusters on about 100 servers spread
across the state.  Databases range from tiny (few MB) to 1.3 TB. 
 
Check out this for more info:
 
http://www.pgcon.org/2009/schedule/events/129.en.html
 
I hope that helps.  If you have any particular questions not
answered by the above, just ask.
 
-Kevin



Re: [PERFORM] PostgreSQL - case studies

2010-02-10 Thread Stephen Frost
* Kevin Grittner (kevin.gritt...@wicourts.gov) wrote:
  Could some of you please share some info on such scenarios- where
  you are supporting/designing/developing databases that run into at
  least a few hundred GBs of data (I know, that is small by today's
  standards)?

Just saw this, so figured I'd comment:

tsf= \l+
                                              List of databases
 Name  |  Owner   | Encoding |  Collation  |    Ctype    | Access privileges |  Size   | Tablespace | Description
-------+----------+----------+-------------+-------------+-------------------+---------+------------+-------------
 beac  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres      | 1724 GB | pg_default |

Doesn't look very pretty, but the point is that it's 1.7TB.  There are a
few other smaller databases on that system too.  PG handles it quite
well, though this is primarily for data-mining.
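
For anyone wanting to pull the same number without the full \l+ output, the
stock size functions work too (the table name below is just a placeholder):

  SELECT pg_size_pretty(pg_database_size('beac'));
  SELECT pg_size_pretty(pg_total_relation_size('some_table'));  -- heap + indexes + toast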

Thanks,

Stephen




Re: [PERFORM] How exactly PostgreSQL allocates memory for its needs?

2010-02-10 Thread Justin Graf
On 2/10/2010 12:10 AM, Anton Maksimenkov wrote:
 Can anybody briefly explain to me how each postgres process allocates
 memory for its needs?
 I mean, what is the biggest size of malloc() it may want? How many
 such chunks? What is the average size of allocations?

 I think that at first it allocates a contiguous piece of shared memory
 for shared buffers (rather big, hundreds of megabytes usually, in
 one chunk).
 What next? temp_buffers, work_mem, maintenance_work_mem - are they
 allocated contiguously too?
 What about other needs? In what sizes are they typically allocated?


There is no short answer to this; you should read section 18 of the manual:
http://www.postgresql.org/docs/8.4/interactive/runtime-config.html
specifically section 18.4
http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html

and performance section of the wiki
http://wiki.postgresql.org/wiki/Performance_Optimization

Here is a link to an annotated postgresql.conf:
http://www.pgcon.org/2008/schedule/attachments/44_annotated_gucs_draft1.pdf

Keep in mind that each connection/client connecting to the server 
creates a new process on the server.  Each of the settings you list 
above is the max amount of memory each of those sessions is allowed 
to consume.
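
A quick way to see what those knobs are currently set to on a running server is
a convenience query against the standard pg_settings view:

  SELECT name, setting, unit
    FROM pg_settings
   WHERE name IN ('shared_buffers', 'temp_buffers', 'work_mem', 'maintenance_work_mem');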







Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline

2010-02-10 Thread Jeff


On Feb 10, 2010, at 1:37 AM, Greg Smith wrote:


Jeff wrote:
I'd done some testing a while ago on the schedulers and at the time  
deadline or noop smashed cfq.  Now, it is 100% possible since then  
 that they've made vast improvements to cfq and/or the VM to get 
better or similar performance.  I recall a vintage of 2.6 where  
they severely messed up the VM. Glad I didn't upgrade to that one :)


Here's the old post: 
http://archives.postgresql.org/pgsql-performance/2008-04/msg00155.php


pgiosim doesn't really mix writes into there though, does it?  The  
mixed read/write situations are the ones where the scheduler stuff  
gets messy.




It has the ability to rewrite blocks randomly as well - but I  
honestly don't remember if I did that during my cfq/deadline test.   
I'd wager I didn't.  Maybe I'll get some time to run some more tests  
on it in the next couple of days.



--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [PERFORM] Deferred constraint and delete performance

2010-02-10 Thread Andres Freund
On Wednesday 10 February 2010 15:56:40 Tom Lane wrote:
 Franck Routier franck.rout...@axege.com writes:
  I am wondering if deferring foreign key constraints (instead of
  disableing them) would increase performance, compared to non deferred
  constraints
 
 No, it wouldn't make any noticeable difference AFAICS.  It would
 postpone the work from end-of-statement to end-of-transaction,
 but not make the work happen any more (or less) efficiently.
It could make a difference if the transaction is rather long and updates the 
same row repeatedly, because of better cache usage. But I admit that's a bit of 
a contrived scenario (where one would likely run into trigger-queue size 
problems as well).
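
For reference, a minimal sketch of the deferral mechanics being discussed
(table, column and constraint names are made up):

  -- one-time change: declare the FK as deferrable
  ALTER TABLE child
    ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent (id)
    DEFERRABLE INITIALLY IMMEDIATE;

  -- per transaction: postpone the FK checks from end-of-statement to COMMIT
  BEGIN;
  SET CONSTRAINTS child_parent_fk DEFERRED;
  DELETE FROM child  WHERE parent_id = 42;
  DELETE FROM parent WHERE id = 42;
  COMMIT;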

Andres



Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline

2010-02-10 Thread Scott Carey

On Feb 9, 2010, at 10:37 PM, Greg Smith wrote:

 Jeff wrote:
 I'd done some testing a while ago on the schedulers and at the time 
 deadline or noop smashed cfq.  Now, it is 100% possible since then 
 that they've made vast improvements to cfq and/or the VM to get better 
 or similar performance.  I recall a vintage of 2.6 where they severely 
 messed up the VM. Glad I didn't upgrade to that one :)
 
 Here's the old post: 
 http://archives.postgresql.org/pgsql-performance/2008-04/msg00155.php
 
 pgiosim doesn't really mix writes into there though, does it?  The mixed 
 read/write situations are the ones where the scheduler stuff gets messy.
 

Also, read/write mix performance depends on the file system, not just the 
scheduler.
The block device readahead parameter can have a big impact too.

If you test xfs, make sure you configure the 'allocsize' mount parameter 
properly as well.  If there are any sequential reads or writes in there mixed 
with other reads/writes, that can have a big impact on how fragmented the 
filesystem gets.

Ext3 has several characteristics for writes that might favor cfq that other 
file systems do not.  Features like delayed allocation, extents, and write 
barriers significantly change the pattern of writes seen by the I/O scheduler.

In short, one scheduler may be best for one filesystem, but not a good idea for 
others.

And then on top of that, it all depends on what type of DB you're running.  
Lots of small fast mostly read queries?  Large number of small writes?  Large 
bulk writes?  Large reporting queries?  Different configurations and tuning 
are required to maximize performance on each.

There is no single rule for Postgres on Linux that I can think of other than 
never have ext3 in 'ordered' or 'journal' mode for your WAL on the same 
filesystem as your data.

 -- 
 Greg Smith    2ndQuadrant    Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com  www.2ndQuadrant.com
 
 




Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Robert Haas
On Wed, Feb 10, 2010 at 3:29 AM, Bryce Nesbitt bry...@obviously.com wrote:
 Or, if you want to actually read that query plan, try:
 http://explain.depesz.com/s/qYq

Much better, though I prefer a text attachment...  anyhow, I think the
root of the problem may be that both of the subquery scans under the
append node are seeing hundreds of times more rows than they're
expecting, which is causing the planner to choose nested loops higher
up that it otherwise might have preferred to implement in some other
way.  I'm not quite sure why, though.

...Robert



[PERFORM] perf problem with huge table

2010-02-10 Thread rama


Hi all,

i am trying to move my app from M$sql to PGsql, but i need a bit of help :)


on M$sql, i had certain tables that were made as follows (sorry, pseudo code)

contab_y
   date
   amt
   uid


contab_yd
  date
  amt
  uid

contab_ymd
  date
  amt 
  uid


and so on..

this was used to solidify (aggregate..btw sorry for my terrible english) the 
data on it..

so basically, i get

contab_y
date = 2010
amt = 100
uid = 1

contab_ym
  date = 2010-01
  amt = 10
  uid = 1

  date = 2010-02
  amt = 90
  uid = 1


contab_ymd
   date=2010-01-01
   amt = 1
   uid = 1

blabla


in that way, when i need to do a query for a long range (ie: 1 year) i just 
take the rows that are contained in contab_y
if i need to do a query for a couple of days, i can go to ymd, and if i need to 
get some data for another timeframe, i can do some cool intersections between 
the different tables using some huge (but fast) queries.


Now, the matter is that this design is hard to maintain, and the tables are 
difficult to check

what i have tried is to go for a normal approach, using just a table that 
contains all the data, and some proper indexing.
The issue is that this table can easily contain 100M rows :)
that's why the other guys do all this work to speed up queries, splitting data 
across different tables and precalculating the sums.


I am here to ask the PGsql experts for advice: 
what do you think i can do to better manage this situation? 
are there some other cases i can take a look at? maybe some 
documentation, or some technique that i don't know?
any advice is really appreciated!












Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Justin Graf
On 2/10/2010 5:13 PM, rama wrote:
 in that way, when i need to do a query for a long range (ie: 1 year) i just 
 take the rows that are contained in contab_y
 if i need to do a query for a couple of days, i can go to ymd, and if i need to 
 get some data for another timeframe, i can do some cool intersections between
 the different tables using some huge (but fast) queries.


 Now, the matter is that this design is hard to maintain, and the tables are 
 difficult to check

 what i have tried is to go for a normal approach, using just a table that 
 contains all the data, and some proper indexing.
 The issue is that this table can easily contain 100M rows :)
 that's why the other guys do all this work to speed up queries, splitting data 
 across different tables and precalculating the sums.


 I am here to ask the PGsql experts for advice:
 what do you think i can do to better manage this situation?
 are there some other cases i can take a look at? maybe some 
 documentation, or some technique that i don't know?
 any advice is really appreciated!

Look into table partitioning:
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

It's similar to what you are doing, but it simplifies the queries and the 
logic needed to access large data sets.
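
A rough sketch of what that looks like for a date-keyed table like yours
(the names and the monthly split are assumptions; in 8.x, partitioning is
plain inheritance plus CHECK constraints plus a routing trigger):

  CREATE TABLE contab (
      uid integer NOT NULL,
      dt  date    NOT NULL,
      amt numeric NOT NULL
  );

  CREATE TABLE contab_2010_01 (
      CHECK (dt >= DATE '2010-01-01' AND dt < DATE '2010-02-01')
  ) INHERITS (contab);

  CREATE TABLE contab_2010_02 (
      CHECK (dt >= DATE '2010-02-01' AND dt < DATE '2010-03-01')
  ) INHERITS (contab);

  CREATE INDEX contab_2010_01_dt ON contab_2010_01 (dt);
  CREATE INDEX contab_2010_02_dt ON contab_2010_02 (dt);

  -- routing function + trigger so INSERTs against the parent land in the right child
  CREATE OR REPLACE FUNCTION contab_insert() RETURNS trigger AS $$
  BEGIN
      IF NEW.dt >= DATE '2010-01-01' AND NEW.dt < DATE '2010-02-01' THEN
          INSERT INTO contab_2010_01 VALUES (NEW.*);
      ELSIF NEW.dt >= DATE '2010-02-01' AND NEW.dt < DATE '2010-03-01' THEN
          INSERT INTO contab_2010_02 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for date %', NEW.dt;
      END IF;
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER contab_insert_trg
      BEFORE INSERT ON contab
      FOR EACH ROW EXECUTE PROCEDURE contab_insert();

With constraint_exclusion = on, a date-range query against the parent only
touches the matching children, so you keep one logical table but get roughly
the same effect as your hand-built per-granularity tables.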





Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Dave Crooke
Hi Rama

I'm actually looking at going in the other direction ...

I have an app using PG where we have a single table where we just added a
lot of data, and I'm ending up with many millions of rows, and I'm finding
that the single table schema simply doesn't scale.

In PG, the table partitioning is only handled by the database for reads; for
insert/update you need to do quite a lot of DIY (setting up triggers, etc.),
so I am planning to just use named tables and generate the necessary DDL /
DML in vanilla SQL the same way that your older code does.

My experience is mostly with Oracle, which is not MVCC, so I've had to
relearn some stuff:

- Oracle often answers simple queries (e.g. counts and max / min) using only
the index, which is of course pre-sorted. PG has to go out and fetch the
rows to see if they are still in scope, and if they are stored all over the
place on disk it means an 8K random page fetch for each row. This means that
adding an index to PG is not nearly the silver bullet that it can be with
some non-MVCC databases.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut
feel, I haven't been doing true comparisons ...  however, for my app I have
limited myself to only two indexes on that table, and each index is larger
(in disk space) than the table itself ... I have 60GB of data and 140GB of
indexes :-)

- There is a lot of row turnover in my big table (I age out data) ... a big
delete (millions of rows) in PG seems a bit more expensive to process than
in Oracle; however, PG is not nearly as sensitive to transaction sizes as
Oracle is, so you can cheerfully throw out one big "DELETE from FOO where
..." and let the database chew on it.

I am interested to hear about your progress.

Cheers
Dave

On Wed, Feb 10, 2010 at 4:13 PM, rama rama.r...@tiscali.it wrote:



 Hi all,

 i am trying to move my app from M$sql to PGsql, but i need a bit of help :)


 on M$sql, i had certain tables that were made as follows (sorry, pseudo code)

 contab_y
   date
   amt
   uid


 contab_yd
  date
  amt
  uid

 contab_ymd
  date
  amt
  uid


 and so on..

 this was used to solidify (aggregate..btw sorry for my terrible english)
 the data on it..

 so basically, i get

 contab_y
 date = 2010
 amt = 100
 uid = 1

 contab_ym
  date = 2010-01
  amt = 10
  uid = 1
 
  date = 2010-02
  amt = 90
  uid = 1


 contab_ymd
   date=2010-01-01
   amt = 1
   uid = 1
 
 blabla


 in that way, when i need to do a query for a long range (ie: 1 year) i
 just take the rows that are contained in contab_y
 if i need to do a query for a couple of days, i can go to ymd, and if i need
 to get some data for another timeframe, i can do some cool intersections
 between
 the different tables using some huge (but fast) queries.


 Now, the matter is that this design is hard to maintain, and the tables are
 difficult to check

 what i have tried is to go for a normal approach, using just a table that
 contains all the data, and some proper indexing.
 The issue is that this table can easily contain 100M rows :)
 that's why the other guys do all this work to speed up queries, splitting
 data across different tables and precalculating the sums.


 I am here to ask the PGsql experts for advice:
 what do you think i can do to better manage this situation?
 are there some other cases i can take a look at? maybe some
 documentation, or some technique that i don't know?
 any advice is really appreciated!













Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Jorge Montero
That sure looks like the source of the problem to me too. I've seen similar 
behavior in queries not very different from that. It's hard to guess  what the 
problem is exactly without having more knowledge of the data distribution in 
article_words though.

Given the results of analyze, I'd try to run the deepest subquery and try to 
see if I could get the estimate to match reality, either by altering statistics 
targets, or tweaking the query to give more information to the planner. 

For example, I'd check if the number of expected rows from 

SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 
'insider'

is much less accurate than the estimate for

SELECT context_key FROM article_words WHERE word_key = (whatever the actual 
word_key for insider is)
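
If it turns out the estimate for the plain word_key scan is the bad one, the
usual first lever is the per-column statistics target; a sketch (the 1000 is
just an illustrative value):

  ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 1000;
  ANALYZE article_words;
  -- then re-run EXPLAIN ANALYZE on both forms above and see whether the row estimates move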


 Robert Haas robertmh...@gmail.com 02/10/10 2:31 PM 
On Wed, Feb 10, 2010 at 3:29 AM, Bryce Nesbitt bry...@obviously.com wrote:
 Or, if you want to actually read that query plan, try:
 http://explain.depesz.com/s/qYq 

Much better, though I prefer a text attachment...  anyhow, I think the
root of the problem may be that both of the subquery scans under the
append node are seeing hundreds of times more rows than they're
expecting, which is causing the planner to choose nested loops higher
up that it otherwise might have preferred to implement in some other
way.  I'm not quite sure why, though.

...Robert





Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Jon Lewison
On Wed, Feb 10, 2010 at 4:16 PM, Dave Crooke dcro...@gmail.com wrote:

 Hi Rama

 I'm actually looking at going in the other direction ...

 I have an app using PG where we have a single table where we just added a
 lot of data, and I'm ending up with many millions of rows, and I'm finding
 that the single table schema simply doesn't scale.

 In PG, the table partitioning is only handled by the database for reads,
 for insert/update you need to do quite a lot of DIY (setting up triggers,
 etc.) so I am planning to just use named tables and generate the necessary
 DDL / DML in vanilla SQL the same way that your older code does.

 My experience is mostly with Oracle, which is not MVCC, so I've had to
 relearn some stuff:


Just a nit, but Oracle implements MVCC.  90% of the databases out there do.


 - Oracle often answers simple queries (e.g. counts and max / min) using
 only the index, which is of course pre-sorted. PG has to go out and fetch
 the rows to see if they are still in scope, and if they are stored all over
 the place on disk it means an 8K random page fetch for each row. This means
 that adding an index to PG is not nearly the silver bullet that it can be
 with some non-MVCC databases.

 - PG's indexes seem to be quite a bit larger than Oracle's, but that's gut
 feel, I haven't been doing true comparisons ...  however, for my app I have
 limited myself to only two indexes on that table, and each index is larger
 (in disk space) than the table itself ... I have 60GB of data and 140GB of
 indexes :-)

 - There is a lot of row turnover in my big table (I age out data) ... a
 big delete (millions of rows) in PG seems a bit more expensive to process
 than in Oracle, however PG is not nearly as sensitive to transaction sizes
 as Oracle is, so you can cheerfully throw out one big DELETE from FOO where
 ... and let the database chew on it.


I find partitioning pretty useful in this scenario if the data allows it.
Aging out data just means dropping a partition rather than a delete
statement.
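
In other words, aging out a month becomes something like this (partition and
parent table names are hypothetical):

  DROP TABLE contab_2009_01;                       -- discard the old data in one cheap step
  -- or keep the data around but remove it from the partition set:
  ALTER TABLE contab_2009_01 NO INHERIT contab;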


Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Dave Crooke
On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote:



 Just a nit, but Oracle implements MVCC.  90% of the databases out there do.


Sorry, I spoke imprecisely. What I meant was the difference in how the rows
are stored internally ... in Oracle, the main tablespace contains only the
newest version of a row, which is (where possible) updated in place -
queries in a transaction that can still see an older version have to pull
it from the UNDO tablespace (rollback segments in Oracle 8 and older).

In Postgres, all versions of all rows are in the main table, and have
validity ranges associated with them (this version of this row existed
between transaction ids x and y). Once a version goes out of scope, it has
to be garbage collected by the vacuuming process so the space can be
re-used.

In general, this means Oracle is faster *if* you're only doing lots of small
transactions (consider how these different models handle an update to a
single field in a single row) but it is more sensitive to the scale of
transactions ... doing a really big transaction against a database with an
OLTP workload can upset Oracle's digestion as it causes a lot of UNDO
lookups, PG's performance is a lot more predictable in this regard.

Both models have benefits and drawbacks ... when designing a schema for
performance it's important to understand these differences.


 I find partitioning pretty useful in this scenario if the data allows it.
 Aging out data just means dropping a partition rather than a delete
 statement.


Forgot to say this - yes, absolutely agree ... dropping a table is a lot
cheaper than a transactional delete.

In general, I think partitioning is more important / beneficial with PG's
style of MVCC than with Oracle or SQL-Server (which I think is closer to
Oracle than PG).


Cheers
Dave


Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Jon Lewison
On Wed, Feb 10, 2010 at 4:48 PM, Dave Crooke dcro...@gmail.com wrote:

 On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote:



 Just a nit, but Oracle implements MVCC.  90% of the databases out there
 do.


 Sorry, I spoke imprecisely. What I meant was the difference in how the rows
 are stored internally ... in Oracle, the main tablespace contains only the
 newest version of a row, which is (where possible) updated in place -
 queries in a transaction that can still see an older version have to pull
 it from the UNDO tablespace (rollback segments in Oracle 8 and older).

 In Postgres, all versions of all rows are in the main table, and have
 validity ranges associated with them (this version of this row existed
 between transaction ids x and y). Once a version goes out of scope, it has
 to be garbage collected by the vacuuming process so the space can be
 re-used.

 In general, this means Oracle is faster *if* you're only doing lots of
 small transactions (consider how these different models handle an update to
 a single field in a single row) but it is more sensitive to the scale of
 transactions ... doing a really big transaction against a database with an
 OLTP workload can upset Oracle's digestion as it causes a lot of UNDO
 lookups, PG's performance is a lot more predictable in this regard.

 Both models have benefits and drawbacks ... when designing a schema for
 performance it's important to understand these differences.


Yes, absolutely.  It's not unusual to see the UNDO tablespace increase in
size by several gigs for a large bulk load.

Speaking of rollback segments, I'm assuming that since all storage for
non-visible row versions is in the main table, PostgreSQL has no
equivalent of an ORA-01555.

- Jon


Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Dave Crooke
Actually, in a way it does ... "No space left on device" or similar ;-)

Cheers
Dave

P.S. For those not familiar with Oracle, ORA-01555 translates to "your query
/ transaction is kinda old and I've forgotten the data, so I'm just going to
throw an error at you now."  If you're reading, your SELECT randomly fails;
if you're writing, it forces a rollback of your transaction.

On Wed, Feb 10, 2010 at 6:09 PM, Jon Lewison jlewis...@gmail.com wrote:


 Speaking of rollback segments I'm assuming that since all storage for
 non-visible row versions is in the main table that PostgreSQL has no
 equivalent for an ORA-01555.

 - Jon




Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Greg Smith

rama wrote:

in that way, when i need to do a query for a long range (ie: 1 year) i just 
take the rows that are contained in contab_y
if i need to do a query for a couple of days, i can go to ymd, and if i need to get some data for another timeframe, i can do some cool intersections between 
the different tables using some huge (but fast) queries.


Now, the matter is that this design is hard to maintain, and the tables are 
difficult to check
  


You sound like you're trying to implement something like materialized 
views one at a time; have you considered adopting the more general 
techniques used to maintain those so that you're not doing custom 
development each time for the design?


http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
http://www.pgcon.org/2008/schedule/events/69.en.html

I think that sort of approach is more practical than it would have been 
for you in MySQL, so maybe this wasn't on your list of possibilities before.
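
For the archives, the simplest form of that technique is a summary table kept
current by a trigger on the detail table; a rough sketch, with table and column
names made up to match the earlier pseudo-schema:

  CREATE TABLE contab_ym_summary (
      uid   integer NOT NULL,
      month date    NOT NULL,            -- first day of the month
      amt   numeric NOT NULL DEFAULT 0,
      PRIMARY KEY (uid, month)
  );

  CREATE OR REPLACE FUNCTION contab_summarize() RETURNS trigger AS $$
  BEGIN
      UPDATE contab_ym_summary
         SET amt = amt + NEW.amt
       WHERE uid = NEW.uid AND month = date_trunc('month', NEW.dt)::date;
      IF NOT FOUND THEN
          INSERT INTO contab_ym_summary (uid, month, amt)
          VALUES (NEW.uid, date_trunc('month', NEW.dt)::date, NEW.amt);
      END IF;
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER contab_summarize_trg
      AFTER INSERT ON contab
      FOR EACH ROW EXECUTE PROCEDURE contab_summarize();

Under concurrent inserts the UPDATE-then-INSERT needs retry or locking
handling; the write-ups linked above cover the more robust (and lazier,
batch-refresh) variants.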


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline

2010-02-10 Thread Greg Smith

Scott Marlowe wrote:

I'd love to see someone do a comparison of early to mid 2.6 kernels (2.6.18 
like RHEL5) to very
up to date 2.6 kernels.  On fast hardware.


I'd be happy just to find fast hardware that works on every kernel from 
the RHEL5 2.6.18 up to the latest one without issues.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline

2010-02-10 Thread david

On Wed, 10 Feb 2010, Greg Smith wrote:


Scott Marlowe wrote:
I'd love to see someone do a comparison of early to mid 2.6 kernels (2.6.18 
like RHEL5) to very

up to date 2.6 kernels.  On fast hardware.


I'd be happy just to find fast hardware that works on every kernel from the 
RHEL5 2.6.18 up to the latest one without issues.


it depends on your definition of 'fast hardware'

I have boxes that were very fast at the time that work on all these 
kernels, but they wouldn't be considered fast by today's standards.


remember that there is a point release about every 3 months, 2.6.33 is 
about to be released, so this is a 3 x (33-18) = ~45 month old kernel.


hardware progresses a LOT in 4 years.

most of my new hardware has no problems with the old kernels as well, but 
once in a while I run into something that doesn't work.


David Lang



Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Bryce Nesbitt
If you guys succeed in making this class of query perform, you'll have
beaten the professional consulting firm we hired, which was all but
useless!  The query is usually slow, but particular combinations of
words seem to make it obscenely slow.

The query plans are now attached (sorry I did not start there: many
lists reject attachments). Or you can click on text at the query
planner analysis site http://explain.depesz.com/s/qYq


Here's typical server load:
Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  0.9%sy,  0.0%ni, 86.5%id, 11.2%wa,  0.0%hi,  0.1%si, 
0.0%st
Mem:  32966936k total, 32873860k used,    93076k free,     2080k buffers
Swap: 33554424k total,  472k used, 33553952k free, 30572904k cached

Configurations modified from the Postgres 8.3 defaults are:
listen_addresses = '10.100.2.11, 10.101.2.11'   # what IP address(es) to
listen on;
port = 5432 # (change requires restart)
max_connections = 400   # (change requires restart)
shared_buffers = 4096MB # min 128kB or max_connections*16kB
work_mem = 16MB # min 64kB
max_fsm_pages =  50 # default:2
min:max_fsm_relations*16,6 bytes see:MAIN-5740
max_fsm_relations = 2700# min 100, ~70 bytes each
checkpoint_segments = 20# in logfile segments, min 1,
16MB each
random_page_cost = 2.0  # same scale as above
effective_cache_size = 28672MB
default_statistics_target = 150 # range 1-1000
log_destination = 'syslog'  # Valid values are combinations of
log_min_error_statement = error # values in order of decreasing detail:
log_min_duration_statement = 5000   # -1 is disabled, 0 logs all
statements
log_checkpoints = on# default off
autovacuum_naptime = 5min   # time between autovacuum runs
escape_string_warning = off # default:on  (See bepress
MAIN-4857)
standard_conforming_strings = off   # default:off (See bepress
MAIN-4857)



production=# EXPLAIN ANALYZE SELECT context_key FROM article_words
JOIN words using (word_key) WHERE word = 'insider';
-
 Nested Loop  (cost=0.00..23393.15 rows=23 width=4) (actual
time=0.077..15.637 rows=4003 loops=1)
   ->  Index Scan using words_word on words  (cost=0.00..5.47 rows=1
width=4) (actual time=0.049..0.051 rows=1 loops=1)
         Index Cond: ((word)::text = 'insider'::text)
   ->  Index Scan using article_words_wc on article_words
(cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237
rows=4003 loops=1)
         Index Cond: (article_words.word_key = words.word_key)
 Total runtime: 19.776 ms

production=# EXPLAIN ANALYZE SELECT context_key FROM article_words
WHERE word_key = 3675;
---
 Index Scan using article_words_wc on article_words 
(cost=0.00..21433.53 rows=11309 width=4) (actual time=0.025..7.579
rows=4003 loops=1)
   Index Cond: (word_key = 3675)
 Total runtime: 11.704 ms



production=# explain analyze select count(*) from words;
Aggregate  (cost=263831.63..263831.64 rows=1 width=0) (actual
time=35851.654..35851.655 rows=1 loops=1)
   ->  Seq Scan on words  (cost=0.00..229311.30 rows=13808130 width=0)
(actual time=0.043..21281.124 rows=13808184 loops=1)
 Total runtime: 35851.723 ms

production=# select count(*) from words;
13,808,184


production=# explain analyze select count(*) from article_words;
Aggregate  (cost=5453242.40..5453242.41 rows=1 width=0) (actual
time=776504.017..776504.018 rows=1 loops=1)
   ->  Seq Scan on article_words  (cost=0.00..4653453.52 rows=319915552
width=0) (actual time=0.034..438969.347 rows=319956663 loops=1)
 Total runtime: 776504.177 ms

production=# select count(*) from article_words;
319,956,720

Aggregate  (cost=139279.85..139279.86 rows=1 width=4) (actual 
time=502493.000..502493.001 rows=1 loops=1)
   ->  Hash Join  (cost=131314.31..139279.84 rows=2 width=4) (actual 
time=501787.397..502492.316 rows=622 loops=1)
         Hash Cond: (matview_82034.context_key = articles.context_key)
         ->  Seq Scan on matview_82034  (cost=0.00..6322.20 rows=438220 
width=4) (actual time=0.014..462.312 rows=438220 loops=1)
         ->  Hash  (cost=131314.30..131314.30 rows=1 width=16) (actual 
time=501553.755..501553.755 rows=622 loops=1)
               ->  Nested Loop IN Join  (cost=46291.79..131314.30 rows=1 
width=16) (actual time=467.546..501550.735 rows=622 loops=1)
                     Join Filter: (a.context_key = articles.context_key)
                     ->  Nested Loop  (cost=46291.79..46323.15 rows=2 width=12) 
(actual time=179.760..303.474 rows=1473 loops=1)
                           ->  Nested Loop  (cost=46291.79..46314.35 

Re: [PERFORM] Linux I/O tuning: CFQ vs. deadline

2010-02-10 Thread Greg Smith

da...@lang.hm wrote:
most of my new hardware has no problems with the old kernels as well, 
but once in a while I run into something that doesn't work.


Quick survey just of what's within 20 feet of me:
-Primary desktop:  2 years old, requires 2.6.23 or later for SATA to work
-Server:  3 years old, requires 2.6.22 or later for the Areca card not 
to panic under load
-Laptops:  both about 2 years old, and require 2.6.28 to work at all; 
mostly wireless issues, but some power management ones that impact the 
processor working right too, occasional SATA ones too.


I'm looking into a new primary desktop to step up to 8 HT cores; I fully 
expect it won't boot anything older than 2.6.28 and may take an even 
newer kernel just for the basic processor and disk parts to work.


We're kind of at a worst-case point right now for this sort of thing, on 
the tail side of the almost 3 year old RHEL5 using a 3.5 year old kernel 
as the standard for so many Linux server deployments.  Until RHEL6 is 
ready to go, there's little motivation for the people who make server 
hardware to get all their drivers perfect in the newer kernels.  Just 
after that ships will probably be a good time to do that sort of 
comparison, like it was possible to easily compare RHEL4 using 2.6.9 and 
RHEL5 with 2.6.18 in mid to late 2007 with many bits of 
high-performance hardware known to work well on each.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2010-02-10 Thread Andres Freund
On Monday 08 February 2010 05:53:23 Robert Haas wrote:
 On Sun, Feb 7, 2010 at 10:09 PM, Alvaro Herrera
 
 alvhe...@commandprompt.com wrote:
  Andres Freund wrote:
  I personally think the fsync on the directory should be added to the
  stable branches - other opinions?
  If wanted I can prepare patches for that.
  
  Yeah, it seems there are two patches here -- one is the addition of
  fsync_fname() and the other is the fsync_prepare stuff.
 
 Andres, you want to take a crack at splitting this up?
I hope I didn't duplicate Greg's work, but I didn't hear back from him, so...

Everything < 8.1 is hopeless because cp is used there... I didn't see it as worth 
replacing. The patch applies cleanly for 8.1 to 8.4 and survives the 
regression tests.

Given pg's heavy commit model I didn't see a point in splitting the patch for 9.0 
as well...

Andres
diff --git a/src/port/copydir.c b/src/port/copydir.c
index 72fbf36..b057ffa 100644
*** a/src/port/copydir.c
--- b/src/port/copydir.c
*** copydir(char *fromdir, char *todir, bool
*** 50,55 ****
--- 50,56 ----
  {
  	DIR		   *xldir;
  	struct dirent *xlde;
+ 	int dirfd;
  	char		fromfile[MAXPGPATH];
  	char		tofile[MAXPGPATH];
  
*** copydir(char *fromdir, char *todir, bool
*** 91,96 ****
--- 92,116 ----
  	}
  
  	FreeDir(xldir);
+ 
+ 	/*
+ 	 * fsync the directory to make sure data has reached the
+ 	 * disk. While needed by most filesystems, the window got bigger
+ 	 * with newer ones like ext4.
+ 	 */
+ 	dirfd = BasicOpenFile(todir,
+ 	  O_RDONLY | PG_BINARY,
+ 	  S_IRUSR | S_IWUSR);
+ 	if(dirfd == -1)
+ 		ereport(ERROR,
+ 		(errcode_for_file_access(),
+ 		 errmsg("could not open directory for fsync \"%s\": %m", todir)));
+ 
+ 	if(pg_fsync(dirfd) == -1)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not fsync directory \"%s\": %m", todir)));
+ 	close(dirfd);
  }
  
  /*



Re: [PERFORM] How exactly PostgreSQL allocates memory for its needs?

2010-02-10 Thread Scott Marlowe
On Wed, Feb 10, 2010 at 9:43 AM, Justin Graf jus...@magwerks.com wrote:
 Keep in mind that each connection/client connecting to the server
 creates a new process on the server.  Each of the settings you list
 above is the max amount of memory each of those sessions is allowed
 to consume.

It's even worse for work_mem (formerly sort_mem) in that each
individual hash agg or sort can grab that much memory.  A complex
query with 4 sorts and 2 hash aggregates could chew through 6 x
work_mem if it needed it.  Which is why work_mem can be such a
horrific foot gun.
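
You can actually watch that happen per node: in 8.3+, EXPLAIN ANALYZE reports
for each sort whether it stayed within work_mem ("Sort Method: quicksort
Memory: ...") or spilled ("Sort Method: external merge  Disk: ...").  An
illustrative query against the tables from earlier in the thread:

  EXPLAIN ANALYZE
  SELECT word_key, count(*)
    FROM article_words
   GROUP BY word_key
   ORDER BY count(*) DESC;

which makes it easy to see how many work_mem-sized allocations a single
statement can be holding at once.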



Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Tom Lane
Bryce Nesbitt bry...@obviously.com writes:
 The query plans are now attached (sorry I did not start there: many
 lists reject attachments). Or you can click on text at the query
 planner analysis site http://explain.depesz.com/s/qYq

At least some of the problem is the terrible quality of the rowcount
estimates in the IN subquery, as you extracted here:

  Nested Loop  (cost=0.00..23393.15 rows=23 width=4) (actual 
 time=0.077..15.637 rows=4003 loops=1)
    ->  Index Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) 
 (actual time=0.049..0.051 rows=1 loops=1)
          Index Cond: ((word)::text = 'insider'::text)
    ->  Index Scan using article_words_wc on article_words 
 (cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237 rows=4003 
 loops=1)
          Index Cond: (article_words.word_key = words.word_key)
  Total runtime: 19.776 ms

Given that it estimated 1 row out of words (quite correctly) and 12264
rows out of each scan on article_words, you'd think that the join size
estimate would be 12264, which would be off by only a factor of 3 from
the true result.  Instead it's 23, off by a factor of 200 :-(.

Running a roughly similar test case here, I see that 8.4 gives
significantly saner estimates, which I think is because of this patch:
http://archives.postgresql.org/pgsql-committers/2008-10/msg00191.php

At the time I didn't want to risk back-patching it, because there
were a lot of other changes in the same general area in 8.4.  But
it would be interesting to see what happens with your example if
you patch 8.3 similarly.  (Note: I think only the first diff hunk
is relevant to 8.3.)

regards, tom lane
