Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Yeb Havinga

Greg Smith wrote:

Tom Lane wrote:

So has anyone looked at porting MythTV to PG?
  


Periodically someone hacks together something that works; the last big 
effort I'm aware of was in 2006, and then it bit rots away.  I'm sure 
we'd get some user uptake on the result--MySQL corruption is one of 
the top ten causes of a MythTV system crashing.

It would be the same with PG, unless the pg cluster configuration shipped 
with MythTV came with a properly configured WAL - I had corrupted tables 
(and a corrupted personal wiki entry, the other MySQL database in my 
house) *only* when I sometimes took the risk of not shutting the machine 
down properly, e.g. when the remote was missing.


regards,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Yeb Havinga

Yeb Havinga wrote:

Greg Smith wrote:

Tom Lane wrote:

So has anyone looked at porting MythTV to PG?
  


Periodically someone hacks together something that works; the last big 
effort I'm aware of was in 2006, and then it bit rots away.  I'm sure 
we'd get some user uptake on the result--MySQL corruption is one of 
the top ten causes of a MythTV system crashing.
It would be the same with PG, unless the pg cluster configuration shipped 
with MythTV came with a properly configured WAL - I had corrupted tables 
(and a corrupted personal wiki entry,

forgot to add how to fix the corrupted tables, sorry
the other MySQL database in my house) *only* when I sometimes took 
the risk of not shutting the machine down properly, e.g. when the 
remote was missing.


regards,
Yeb Havinga



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Greg Smith

Yeb Havinga wrote:

Greg Smith wrote:
MySQL corruption is one of the top ten causes of a MythTV system 
crashing.

It would be the same with PG, unless the pg cluster configuration shipped 
with MythTV came with a properly configured WAL - I had corrupted tables 
(and a corrupted personal wiki entry, the other MySQL database in my 
house) *only* when I sometimes took the risk of not shutting the machine 
down properly, e.g. when the remote was missing.


You can shut down a PostgreSQL database improperly and it will come back 
up again just fine, unless a number of things have happened at just the 
wrong time:

1) You've written something to disk
2) The write is sitting in a write cache, usually on the hard drive, 
but the OS believes the data has been written
3) There is a hard crash before that data is actually written to disk

Now, this certainly still happens with PostgreSQL; I was just discussing 
that yesterday with a client who runs an app on desktop hardware in 
countries with intermittent power, and database corruption is a problem 
for them.  However, that's a fairly heavy write volume situation, which 
is not the case with most MythTV servers.  The actual window where the 
WAL will not do what it's supposed to here is pretty narrow; it's easy 
to trigger if you pull the plug while writing constantly, but that's not 
a typical MythTV database load.


Also, moving forward, we'll see the default filesystem on more Linux 
systems shift to ext4, and it's starting to lose even this 
vulnerability--newer kernels will flush the data out to disk in this 
situation using the appropriate drive command.
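
If you want to check where a given install stands on this, the relevant 
crash-safety settings are easy to inspect from psql.  A minimal sanity 
check (the comments describe the usual defaults, shown only as a reminder, 
not as recommendations):

-- Settings that control the behaviour described above; changing them
-- trades durability for speed.
SHOW fsync;              -- must be 'on' or the WAL cannot protect you at all
SHOW synchronous_commit; -- 'off' risks losing recent transactions, not corruption
SHOW full_page_writes;   -- guards against torn pages after a hard crash
SHOW wal_sync_method;    -- how WAL data gets flushed past the drive's write cache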


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Chris Browne
reeds...@rice.edu (Ross J. Reedstrom) writes:
  http://www.mythtv.org/wiki/PostgreSQL_Support 

That's a pretty hostile presentation...

The page has had two states:

 a) In 2008, someone wrote up...

After some bad experiences with MySQL (data loss from a commercial power
failure, very bad performance deleting old records, and more) I would
prefer to have a MythTV application option to use PostgreSQL. I have
never seen such bad database behaviour with any other RDBMS than MySQL.

I'm ready to contribute to any activity going in that direction (I'm a
developer for commercial database applications).

 b) Deleted by GBee in 2009, indicating "Outdated, messy and
unsupported"
-- 
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/spreadsheets.html
A language that doesn't affect the way you think about programming,
is not worth knowing.  -- Alan J. Perlis

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-24 Thread Chris Browne
t...@sss.pgh.pa.us (Tom Lane) writes:
 Ross J. Reedstrom reeds...@rice.edu writes:
 On Sat, Mar 20, 2010 at 10:47:30PM -0500, Andy Colson wrote:
 (I added the "and trust" as an afterthought, because I do have one very 
 important, 100%-uptime-required mysql database that is running.  It's my 
 MythTV box at home, and I have to ask permission from my GF before I take 
 the box down to upgrade anything.  And heaven forbid if it crashes or 
 anything.  So I do have experience with care and feeding of mysql.  And no, 
 I'm not kidding.)

 Andy, you are so me! I have the exact same one-and-only-one mission
 critical mysql DB, but the gatekeeper is my wife. And experience with
 that instance has made me love and trust PostgreSQL even more.

 So has anyone looked at porting MythTV to PG?

It has come up several times on the MythTV list.

http://david.hardeman.nu/files/patches/mythtv/mythletter.txt
http://www.mythtv.org/pipermail/mythtv-dev/2004-August/025385.html
http://www.mythtv.org/pipermail/mythtv-users/2006-July/141191.html

Probably worth asking David Härdeman and Danny Brow, who proposed such
ports to the MythTV community, what happened.  (It's possible that they
will get cc'ed on this.)

If there's a meaningful way to help, that would be cool.  If not, then
we might as well not run slipshod across the same landmines that blew
the idea up before.
-- 
Transported  to a surreal  landscape,  a young  girl kills the  first
woman she  meets and  then teams  up with  three complete strangers to
kill again.  -- Unknown, Marin County newspaper's TV listing for _The
Wizard of Oz_

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-24 Thread Christian Brink

On 03/22/2010 03:21 PM, Tom Lane wrote:

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

   


Tom & Peter,

I thought you might like to know the outcome of this. I was able to get 
the 8.0 and the 8.2 planners to correctly run the query. There were two 
issues. As Tom pointed out, the 'sysstrings' lookup seems to be the main 
culprit, which makes sense: how can the planner know what range the query 
will bracket when it doesn't know the bounds until the query has started?
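
For illustration, the rewrite boils down to something like the sketch below 
(everything except the sysstrings name is invented for the example, not the 
real schema): instead of joining to sysstrings for the time-of-day bounds, 
fetch them first and plug them in the same way the date is already passed in.

-- Before: the bounds come from a lookup table, so the planner has no idea
-- how wide the time range is until execution.
SELECT s.*
  FROM sales s, sysstrings t1, sysstrings t2
 WHERE t1.name = 'start_of_day'
   AND t2.name = 'end_of_day'
   AND s.saletime BETWEEN t1.value::time AND t2.value::time
   AND s.saledate = $1;

-- After: look the bounds up separately (or in the application) and hand
-- them to the query as plain parameters, like the date.
SELECT s.*
  FROM sales s
 WHERE s.saletime BETWEEN $2 AND $3
   AND s.saledate = $1;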


The other part of the solution is a bit concerning. I had to do a 'dump 
and load' (and vacuum analyze) to get the planner to work correctly 
even after I rewrote the query. FYI, I had run 'VACUUM ANALYZE' (and 
sometimes 'REINDEX TABLE x') between each test.



--
Christian Brink



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing index scan on query produces 16x faster

2010-03-24 Thread Robert Haas
On Wed, Mar 17, 2010 at 9:01 PM, Eger, Patrick pe...@automotive.com wrote:
 I'm running 8.4.2 and have noticed a similar heavy preference for
 sequential scans and hash joins over index scans and nested loops.  Our
 database can basically fit in cache 100%, so this may not be
 applicable to your situation, but the following params seemed to help
 us:

 seq_page_cost = 1.0
 random_page_cost = 1.01
 cpu_tuple_cost = 0.0001
 cpu_index_tuple_cost = 0.5
 cpu_operator_cost = 0.25
 effective_cache_size = 1000MB
 shared_buffers = 1000MB


 Might I suggest the Postgres developers reconsider these defaults for
 9.0 release, or perhaps provide a few sets of tuning params for
 different workloads in the default install/docs? The cpu_*_cost in
 particular seem to be way off afaict. I may be dead wrong though, fwiw
 =)

The defaults assume that the database is not cached in RAM.  If it is,
you want to lower seq_page_cost and random_page_cost to something much
smaller, and typically make them equal.  I often recommend 0.005, but
I know others have had success with higher values.
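
For example, per-session so it's easy to experiment before putting 
anything in postgresql.conf (values are just the ones mentioned above):

-- Session-level experiment for a database that fits entirely in RAM.
SET seq_page_cost = 0.005;
SET random_page_cost = 0.005;  -- equal to seq_page_cost, per the advice above
-- then re-run the problem query under EXPLAIN ANALYZE and compare the plans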

Ultimately it would be nice to have a better model of how data gets
cached in shared_buffers and the OS buffer cache, but that is not so
easy.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] memory question

2010-03-24 Thread Campbell, Lance
PostgreSQL 8.4.3

Linux Redhat 5.0

 

Question: How much memory do I really need?

 

From my understanding there are two primary strategies for setting up
PostgreSQL in relationship to memory:

 

1)  Rely on Linux to cache the files.  In this approach you set the
shared_buffers to a relatively low number.  

2)  You can set shared_buffers to a very large percentage of your
memory so that PostgreSQL reserves the memory for the database.
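
As a rough illustration, on my 24 Gig machine the two approaches would look 
something like this in postgresql.conf (numbers invented for the example, 
not recommendations):

# Option 1: small shared_buffers, let the Linux page cache do most of the work
shared_buffers = 2GB
effective_cache_size = 20GB    # planner hint about the OS cache, not an allocation

# Option 2: give PostgreSQL itself the bulk of the memory
shared_buffers = 16GB
effective_cache_size = 20GB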

 

I am currently using option #1.  I have 24 Gig of memory on my server
and the database takes up 17 Gig of disk space.  When I do the Linux
command top I notice that 19 Gig is allocated for cache.  Is there a
way for me to tell how much of that cache is associated with the caching
of database files?

 

I am basically asking how much memory do I really need?  Maybe I have
complete overkill.  Maybe I am getting to a point where I might need
more memory.

 

My thought was I could use option #2 and then set the number to a lower
amount.  If the performance is bad then slowly work the number up.

 

Our server manager seems to think that I have way too much memory.  He
thinks that we only need 5 Gig.  I don't really believe that, but I
want to cover myself.  With money tight I don't want to be the person
who is wasting resources.  We need to replace our database servers, so I
want to do the right thing.

 

Thanks,

 

Lance Campbell

Software Architect/DBA/Project Manager

Web Services at Public Affairs

217-333-0382

 



Re: [PERFORM] Forcing index scan on query produces 16x faster

2010-03-24 Thread Eger, Patrick
Ok, the wording is a bit unclear in the documentation as to whether it is the 
cost for an entire *page* of tuples, or actual tuples. So something like the 
following might give better results for a fully-cached DB?

seq_page_cost = 1.0
random_page_cost = 1.1 #even memory has random access costs, lack of readahead, 
TLB misses, etc
cpu_tuple_cost = 1.0
cpu_index_tuple_cost = 0.5
cpu_operator_cost = 0.25
effective_cache_size = 1000MB
shared_buffers = 1000MB


-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Wednesday, March 24, 2010 5:47 PM
To: Eger, Patrick
Cc: Christian Brink; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Forcing index scan on query produces 16x faster

On Wed, Mar 17, 2010 at 9:01 PM, Eger, Patrick pe...@automotive.com wrote:
 I'm running 8.4.2 and have noticed a similar heavy preference for
 sequential scans and hash joins over index scans and nested loops.  Our
 database can basically fit in cache 100%, so this may not be
 applicable to your situation, but the following params seemed to help
 us:

 seq_page_cost = 1.0
 random_page_cost = 1.01
 cpu_tuple_cost = 0.0001
 cpu_index_tuple_cost = 0.5
 cpu_operator_cost = 0.25
 effective_cache_size = 1000MB
 shared_buffers = 1000MB


 Might I suggest the Postgres developers reconsider these defaults for
 9.0 release, or perhaps provide a few sets of tuning params for
 different workloads in the default install/docs? The cpu_*_cost in
 particular seem to be way off afaict. I may be dead wrong though, fwiw
 =)

The defaults assume that the database is not cached in RAM.  If it is,
you want to lower seq_page_cost and random_page_cost to something much
smaller, and typically make them equal.  I often recommend 0.005, but
I know others have had success with higher values.

Ultimately it would be nice to have a better model of how data gets
cached in shared_buffers and the OS buffer cache, but that is not so
easy.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] memory question

2010-03-24 Thread Scott Marlowe
On Wed, Mar 24, 2010 at 6:49 PM, Campbell, Lance la...@illinois.edu wrote:
 PostgreSQL 8.4.3

 Linux Redhat 5.0

 Question: How much memory do I really need?

The answer is as much as needed to hold your entire database in
memory and a few gig left over for sorts and backends to play in.

 From my understanding there are two primary strategies for setting up
 PostgreSQL in relationship to memory:



 1)  Rely on Linux to cache the files.  In this approach you set the
 shared_buffers to a relatively low number.

 2)  You can set shared_buffers to a very large percentage of your memory
 so that PostgreSQL reserves the memory for the database.

The kernel is better at caching large amounts of memory.  Pg is better
at handling somewhat smaller amounts and not flushing out random
access data for sequential access data.

 I am currently using option #1.  I have 24 Gig of memory on my server and
 the database takes up 17 Gig of disk space.  When I do the Linux command
 “top” I notice that 19 Gig is allocated for cache.  Is there a way for me to
 tell how much of that cache is associated with the caching of database
 files?

Probably nearly all of that 19G for cache is allocated for pg files.
Not sure how to tell off the top of my head though.
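
One thing that might get part of the way there: the pg_buffercache contrib 
module (a separate install in 8.4) shows what is sitting in PostgreSQL's own 
shared_buffers; the kernel's page cache would still need an OS-level tool 
such as fincore or pgfincore, so treat this as a partial answer:

-- Top relations by space currently held in shared_buffers
-- (assumes the default 8 kB block size and the pg_buffercache module).
SELECT c.relname,
       count(*) * 8192 / (1024 * 1024) AS buffered_mb
  FROM pg_buffercache b
  JOIN pg_class c ON c.relfilenode = b.relfilenode
 WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                              WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY buffered_mb DESC
 LIMIT 20;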

 I am basically asking how much memory do I really need?  Maybe I have
 complete over kill.  Maybe I am getting to a point where I might need more
 memory.

Actually, there are three levels of caching that are possible:

1: The entire db, tables and indexes, can fit in RAM.  This is the
fastest method.  Worth the extra $ for RAM if you can afford it / the db
isn't too huge.

2: The indexes can fit in RAM, and some of the tables can.  Still pretty
fast.  Definitely worth paying a little extra for.

3: Neither indexes nor tables can wholly fit in RAM.  At this point the
speed of your large disk array becomes important, and you want a fast
caching RAID controller.  Both of these items (disk array and RAID
controller) are considerably more costly than 16 or 32 Gigs of RAM.

 My thought was I could use option #2 and then set the number to a lower
 amount.  If the performance is bad then slowly work the number up.

I'm not sure what you mean.  Install less RAM and let PG do all the
caching?  Usually a bad idea. Usually.  I'm sure there are use cases
where it might be a good idea.  But keep in mind, a large amount of
shared_buffers doesn't JUST buffer your reads, it also results in a
much larger memory space to keep track of in terms of things that need
to get written out etc.  I'm actually about to reduce the
shared_buffers from 8G on one reporting server down to 1 or 2G because
that's plenty, and it's having a hard time keeping up with the huge
checkpoints it's having to do.

 Our server manager seems to think that I have way to much memory.  He thinks
 that we only need 5 Gig.

How much do you absolutely need to boot up, run postgresql, and not
run out of memory?  That's what you need, and it's probably around
1 Gig.  It's just no less arbitrary than 5G.  Did he show you how he
arrived at this number?  If your DB is 17 Gig on disk, it's foolish to
be cheap on memory.

 I don’t really believe that.  But I want to cover
 myself.  With money tight I don’t want to be the person who is wasting
 resources.  We need to replace our database servers so I want to do the
 right thing.

You can waste your time (valuable, but a sunk cost), other people's time
(more valuable, also a sunk cost), or waste a few dollars on memory.
24 Gig isn't that expensive compared to, say, 10 seconds per
transaction for 100 users, 1000 times a day.  That's 1,000,000 seconds,
or about 11 user-days lost in a single day.  Tens of seconds start to add up.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing index scan on query produces 16x faster

2010-03-24 Thread Robert Haas
On Wed, Mar 24, 2010 at 8:59 PM, Eger, Patrick pe...@automotive.com wrote:
 Ok, the wording is a bit unclear in the documentation as to whether it is the 
 cost for an entire *page* of tuples, or actual tuples. So something like the 
 following might give better results for a fully-cached DB?

 seq_page_cost = 1.0
 random_page_cost = 1.1 #even memory has random access costs, lack of 
 readahead, TLB misses, etc
 cpu_tuple_cost = 1.0
 cpu_index_tuple_cost = 0.5
 cpu_operator_cost = 0.25
 effective_cache_size = 1000MB
 shared_buffers = 1000MB

Yeah, you can do it that way, by jacking up the cpu_tuple costs.  I
prefer to lower the {random/seq}_page_cost values because it keeps the
cost values in the range I'm used to seeing, but it works out to the
same thing.

I am not sure that there is any benefit from making random_page_cost >
seq_page_cost on a fully cached database.  What does readahead mean in
the context of cached data?  The data isn't likely physically
contiguous in RAM, and I'm not sure it would matter much if it were.
Basically, what random_page_cost > seq_page_cost tends to do is
discourage the use of index scans in borderline cases, so you want to
benchmark it and figure out which way is faster and then tune
accordingly.
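
A minimal way to run that benchmark, all in one session (the table and 
filter below are placeholders for whichever query is borderline):

-- Stock default: random reads assumed to cost 4x a sequential read.
SET random_page_cost = 4.0;
EXPLAIN ANALYZE SELECT count(*) FROM orders WHERE customer_id = 42;

-- Fully-cached assumption: random and sequential reads cost the same.
SET random_page_cost = 1.0;
SET seq_page_cost = 1.0;
EXPLAIN ANALYZE SELECT count(*) FROM orders WHERE customer_id = 42;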

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance