Re: [PERFORM] Typecast bug?

2008-06-26 Thread Craig James

Tom Lane wrote:

Craig James [EMAIL PROTECTED] writes:

This seems like a bug to me, but it shows up as a performance problem.



emol_warehouse_1= explain analyze select version_id, parent_id from version 
where version_id = 999;


If you actually *need* so many 9's here as to force it out of the range
of bigint, then why is your id column not declared numeric?

This seems to me to be about on par with complaining that intcol = 4.2e1
won't be indexed.  We have a numeric data type hierarchy, learn to
work with it ...


Your suggestion of "learn to work with it" doesn't fly.  A good design 
separates the database schema details from the application to the greatest extent 
possible.  What you're suggesting is that every application that queries against a 
Postgres database should know the exact range of every numeric data type of every indexed 
column in the schema, simply because Postgres can't recognize an out-of-range numeric 
value.

In this case, the optimizer could have instantly returned zero results with no 
further work, since the query was out of range for that column.

This seems like a pretty simple optimization to me, and it seems like a helpful 
suggestion to make to this forum.

BTW, this query came from throwing lots of junk at a web app in an effort to 
uncover exactly this sort of problem.  It's not a real query, but then, hackers 
don't use real queries.  The app checks that its input is a well-formed integer 
expression, but then assumes Postgres can deal with it from there.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Typecast bug?

2008-06-26 Thread Frank Joerdens
On 6/26/08, Craig James [EMAIL PROTECTED] wrote:
 This seems like a bug to me, but it shows up as a performance problem.
 Since the column being queried is an integer, the second query (see below)
 can't possibly match, yet Postgres uses a typecast, forcing a full table
 scan for a value that can't possibly be in the table.

Which version are you using? 8.3 removes a lot of implicit casts (all?
not sure), so this may already be your fix.

Cheers,

Frank

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Typecast bug?

2008-06-26 Thread Guillaume Smet
On Thu, Jun 26, 2008 at 9:02 AM, Frank Joerdens [EMAIL PROTECTED] wrote:
 Which version are you using? 8.3 removes a lot of implicit casts (all?
 not sure), so this may already be your fix.

8.3 only removed implicit casts from non-text types to text (date -> text,
int -> text, interval -> text...) to avoid unexpected
behaviours.

-- 
Guillaume

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread jay
I know the problem: there are about 35 million rows, which take about 12GB
of disk space, and checkpoint_segments is set to 64. The update runs in one
transaction, which quickly fills up the checkpoint segments and forces
frequent checkpoints, and checkpoints cost a lot of resources, so the update
becomes slower and slower, and the bgwriter won't write because the
transaction hasn't committed yet.
Creating a new table may be a quick solution, but it's not appropriate in
some cases.
If we could commit every 1000 rows per round, it might resolve the
problem.
But PG doesn't support transaction control within functions yet, does it?

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Heikki Linnakangas
Sent: June 25, 2008 18:11
To: jay
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql update op is very very slow

jay wrote:
 I have a table with about 34601755 rows. When I execute 'update msg_table
 set type=0;' it is very, very slow; it takes several hours and still does
 not complete.
 
 Why is PostgreSQL so slow? Is it the PG MVCC problem?

Possibly. Because of MVCC, a full-table update will actually create a 
new version of each row.

I presume that's a one-off query, or a seldom-run batch operation, and 
not something your application needs to do often. In that case, you 
could drop all indexes, and recreate them after the update, which should 
help a lot:

BEGIN;
DROP INDEX index_name1, index_name2, ...; -- for each index on the table
UPDATE msg_table SET type = 0;
CREATE INDEX ... -- Recreate indexes
COMMIT;

Or even better, instead of using UPDATE, do a SELECT INTO a new table, 
drop the old one, and rename the new one in its place. That has the 
advantage that the new table doesn't contain the old row version, so you 
don't need to vacuum right away to reclaim the space.
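For example, something along these lines (the column names here are made up,
and you'll need to recreate indexes, constraints and grants on the new table):

BEGIN;
SELECT id, 0 AS type, payload INTO msg_table_new FROM msg_table;
DROP TABLE msg_table;
ALTER TABLE msg_table_new RENAME TO msg_table;
-- recreate indexes, constraints and grants here
COMMIT;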

Actually, there's an even more clever trick to do roughly the same thing:

ALTER TABLE msg_table ALTER COLUMN type TYPE int4 USING 0;

(assuming type is int4, replace with the actual data type if necessary)

This will rewrite the table, similar to a DROP + CREATE, and rebuild all 
indexes. But all in one command.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Pavan Deolasee
2008/6/26 jay [EMAIL PROTECTED]:

If we could commit every 1000 rows per round, it might resolve the
 problem.
 But PG doesn't support transaction control within functions yet, does it?


Yeah, transaction control is not supported inside functions. There are
some hacks using dblink to do transactions inside functions. You may
want to check that out.
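As a rough illustration only (the connection string and id ranges are made
up; each dblink_exec call runs in its own session and therefore commits
independently of the calling session):

SELECT dblink_connect('batch', 'dbname=mydb');
SELECT dblink_exec('batch',
  'UPDATE msg_table SET type = 0 WHERE id BETWEEN 1 AND 1000');
SELECT dblink_exec('batch',
  'UPDATE msg_table SET type = 0 WHERE id BETWEEN 1001 AND 2000');
-- ...and so on for each batch
SELECT dblink_disconnect('batch');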

I had suggested another hack in the past for very simplistic updates,
when you are sure that the tuple length does not change between
updates and you are ready to handle half updated table if there is a
crash or failure in between. May be for your case, where you are
updating a single column of the entire table and setting it to some
default value for all the rows, it may work fine. But please be aware
of data consistency issues before you try that. And it must be once in
a lifetime kind of hack.

http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Heikki Linnakangas

jay wrote:

I know the problem: there are about 35 million rows, which take about 12GB
of disk space, and checkpoint_segments is set to 64. The update runs in one
transaction, which quickly fills up the checkpoint segments and forces
frequent checkpoints, and checkpoints cost a lot of resources, so the update
becomes slower and slower, and the bgwriter won't write because the
transaction hasn't committed yet.
Creating a new table may be a quick solution, but it's not appropriate in
some cases.
If we could commit every 1000 rows per round, it might resolve the
problem.


Committing more frequently won't help you with checkpoints. The updates 
will generate just as much WAL regardless of how often you commit, so 
you will have to checkpoint just as often. And commits have no effect on 
bgwriter either; bgwriter will write just as much regardless of how 
often you commit.


One idea would be to partition the table vertically, that is, split the 
table into two tables, so that the columns that you need to update like 
that are in one table, together with the primary key, and the rest of 
the columns are in another table. That way the update won't need to scan 
or write the columns that are not changed. You can create a view on top 
of the two tables to make them look like the original table to the 
application.
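As a rough sketch (the table and column names are invented for illustration,
and assume the original table's rows have been migrated into the two new
tables):

CREATE TABLE msg_static (id int PRIMARY KEY, payload text);
CREATE TABLE msg_type (id int PRIMARY KEY REFERENCES msg_static, type int);

CREATE VIEW msg_table AS
  SELECT s.id, s.payload, t.type
  FROM msg_static s JOIN msg_type t ON t.id = s.id;

-- the mass update now only rewrites the narrow table:
UPDATE msg_type SET type = 0;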


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: Postgresql update op is very very slow

2008-06-26 Thread Holger Hoffstaette

Hi -

I have been following this thread and find some of the recommendations
really surprising. I understand that MVCC necessarily creates overhead,
in-place updates would not be safe against crashes etc. but have a hard
time believing that this is such a huge problem for RDBMS in 2008. How do
large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
(maybe a different kind?) as well, but I cannot believe that large updates
still pose such big problems.
Are there no options (algorithms) for adaptively choosing different
update strategies that do not incur the full MVCC overhead?

Holger

(Disclaimer: I'm not a professional DBA, just a curious developer).



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: Postgresql update op is very very slow

2008-06-26 Thread Craig Ringer

Holger Hoffstaette wrote:

Hi -

I have been following this thread and find some of the recommendations
really surprising. I understand that MVCC necessarily creates overhead,
in-place updates would not be safe against crashes etc. but have a hard
time believing that this is such a huge problem for RDBMS in 2008. How do
large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
(maybe a different kind?) as well, but I cannot believe that large updates
still pose such big problems.
Are there no options (algorithms) for adaptively choosing different
update strategies that do not incur the full MVCC overhead?


I think Pg already does in place updates, or close, if the tuples being 
replaced aren't referenced by any in-flight transaction. I noticed a 
while ago that if I'm doing bulk load/update work, if there aren't any 
other transactions no MVCC bloat seems to occur and updates are faster.


I'd be interested to have this confirmed, as I don't think I've seen it 
documented anywhere. Is it a side-effect/benefit of HOT somehow?


--
Craig Ringer


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware suggestions for high performance 8.3

2008-06-26 Thread Henrik
I've seen some concerns about buying database performance hardware  
from DELL. Are there at least some of the RAID cards that work well  
with Linux or should I stay clear of DELL permanently?


Thanks!

//Henke
25 jun 2008 kl. 17.45 skrev Greg Smith:


On Wed, 25 Jun 2008, Henrik wrote:


Would you turn off fsync if you had a controller with BBU? =)


Turning off fsync has some potential to introduce problems even in  
that environment, so better not to do that.  The issue is that you  
might have, say, 1GB of OS-level cache but 256MB of BBU cache, and  
if you turn fsync off it won't force the OS cache out to the  
controller when it's supposed to and that can cause corruption.


Also, if you've got a controller with BBU, the overhead of fsync for  
regular writes is low enough that you don't really need to turn it  
off. If writes are cached the fsync is almost free.
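In postgresql.conf terms that just means leaving the default alone, for 
example:

fsync = on   # keep the durability guarantee; the BBU makes the flushes cheap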


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com  
Baltimore, MD


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Greg Smith

On Wed, 25 Jun 2008, Andrew Sullivan wrote:

the key thing to do is to ensure you have good testing infrastructure in 
place to check that things will work before you deploy to production.


This is true whether you're using Linux or completely closed source 
software.  There are two main differences from my view:


-OSS software lets you look at the code before a typical closed-source 
company would have pushed a product out the door at all.  Downside is that 
you need to recognize that.  Linux kernels for example need significant 
amounts of encounters with the real world after release before they're 
ready for most people.


-If your OSS program doesn't work, you can potentially find the problem 
yourself.  I find that I don't fix issues when I come across them very 
much, but being able to browse the source code for something that isn't 
working frequently makes it easier to understand what's going on as part 
of troubleshooting.


It's not like closed source software doesn't have the same kinds of bugs. 
The way commercial software (and projects like PostgreSQL) get organized 
into a smaller number of official releases tends to focus the QA process a 
bit better though, so that regular customers don't see as many rough 
edges.  Linux used to do a decent job of this with their development vs. 
stable kernels, which I really miss.  Unfortunately there's just not 
enough time for the top-level developers to manage that while still 
keeping up with the pace needed just for new work.  Sorting out which are 
the stable kernel releases seems to have become the job of the 
distributors (RedHat, SuSE, Debian, etc.) instead of the core kernel 
developers.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Vivek Khera


On Jun 25, 2008, at 11:35 AM, Matthew Wakeling wrote:


On Wed, 25 Jun 2008, Greg Smith wrote:

A firewire-attached log device is an extremely bad idea.


Anyone have experience with IDE, SATA, or SAS-connected flash  
devices like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely -  
32GB, at a transfer rate of 100MB/s, and doesn't degrade much in  
performance when writing small random blocks. But what's it actually  
like, and is it reliable?


None of these manufacturers rates these drives for massive amounts of  
writes.  They're sold as suitable for laptop/desktop use, which  
normally is not a heavy wear and tear operation like a DB.  Once they  
claim suitability for this purpose, be sure that I and a lot of others  
will dive into it to see how well it really works.  Until then, it  
will just be an expensive brick-making experiment, I'm sure.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: Postgresql update op is very very slow

2008-06-26 Thread Andrew Sullivan
On Thu, Jun 26, 2008 at 02:40:59PM +0200, Holger Hoffstaette wrote:

 large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
 (maybe a different kind?) as well, but I cannot believe that large updates
 still pose such big problems.

DB2 does not use MVCC.  This is why lock escalation is such a big
problem for them.

Oracle uses a kind of MVCC based on rollback segments: your work goes
into the rollback segment, so that it can be undone, and the update
happens in place.  This causes a different kind of pain: you can run
out of rollback segments (part way through a long-running transaction,
even) and then have to undo everything in order to do any work at
all.  Every system involves trade-offs, and different systems make
different ones.  The bulk update problem is PostgreSQL's weak spot,
and for that cost one gets huge other benefits.  

 Are there no options (algorithms) for adaptively choosing different
 update strategies that do not incur the full MVCC overhead?

How would you pick?  But one thing you could do is create the table
with a non-standard fill factor, which might allow HOT to work its magic.
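Purely as an illustration (the table definition is made up):

CREATE TABLE msg_table (id int PRIMARY KEY, type int, payload text)
  WITH (fillfactor = 70);
-- or, on an existing table (only affects pages written afterwards):
ALTER TABLE msg_table SET (fillfactor = 70);

A lower fillfactor leaves free space on each heap page, which gives HOT room
to put the new row version on the same page as the old one.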

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: Postgresql update op is very very slow

2008-06-26 Thread Andrew Sullivan
On Thu, Jun 26, 2008 at 09:16:25PM +0800, Craig Ringer wrote:

 I think Pg already does in place updates, or close, if the tuples being 
 replaced aren't referenced by any in-flight transaction. I noticed a while 
 ago that if I'm doing bulk load/update work, if there aren't any other 
 transactions no MVCC bloat seems to occur and updates are faster.

Are you on 8.3?  That may be HOT working for you.  MVCC doesn't get
turned off if there are no other transactions (it can't: what if
another transaction starts part way through yours?).

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: Postgresql update op is very very slow

2008-06-26 Thread Mark Mielke

Holger Hoffstaette wrote:

Hi -

I have been following this thread and find some of the recommendations
really surprising. I understand that MVCC necessarily creates overhead,
in-place updates would not be safe against crashes etc. but have a hard
time believing that this is such a huge problem for RDBMS in 2008. How do
large databases treat mass updates? AFAIK both DB2 and Oracle use MVCC
(maybe a different kind?) as well, but I cannot believe that large updates
still pose such big problems.
Are there no options (algorithms) for adaptively choosing different
update strategies that do not incur the full MVCC overhead?
  


My opinion:

Any system that provides cheap UPDATE operations is either not ACID 
compliant, or is not designed for highly concurrent access, possibly 
both. By ACID compliant I mean that both the OLD and NEW versions need to 
take space on the hard disk in order to guarantee that if a failure 
occurs in the middle of the transaction, one can select only the OLD 
versions for future transactions, or if it fails after the end of the 
transaction, one can select only the NEW versions for future 
transactions. If both must be on disk, it follows that updates are 
expensive. Even with Oracle rollback segments, the rollback segments 
need to be written. Perhaps they will be more sequential, and able to be 
written more efficiently, but the data still needs to be written. The 
other option is to make sure that only one person is doing updates at a 
time, and in this case it becomes possible (although not necessarily 
safe unless one implements the ACID compliant behaviour described in the 
previous point) for one operation to complete before the next begins.


The HOT changes introduced recently into PostgreSQL should reduce the 
cost of updates in many cases (but not all - I imagine that updating ALL 
rows is still expensive).


There is a third system I can think of, but I think it's more 
theoretical than practical. That is, remember the list of changes to 
each row/column and replay them on query. The database isn't ever 
stored in a built state, but is only kept as pointers that allow any 
part of the table to be re-built on access. The UPDATE statement could 
be recorded cheaply, but queries against the UPDATE statement might be 
very expensive. :-)


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Tom Lane
jay [EMAIL PROTECTED] writes:
  I know the problem: there are about 35 million rows, which take about 12GB
 of disk space, and checkpoint_segments is set to 64. The update runs in one
 transaction, which quickly fills up the checkpoint segments and forces
 frequent checkpoints, and checkpoints cost a lot of resources, so the update
 becomes slower and slower, and the bgwriter won't write because the
 transaction hasn't committed yet.
 Creating a new table may be a quick solution, but it's not appropriate in
 some cases.
  If we could commit every 1000 rows per round, it might resolve the
 problem.

No, that's utterly unrelated.  Transaction boundaries have nothing to do
with checkpoints.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [PERFORM] Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Scott Marlowe
2008/6/26 Pavan Deolasee [EMAIL PROTECTED]:
 2008/6/26 jay [EMAIL PROTECTED]:

If we could commit every 1000 rows per round, it might resolve the
 problem.
 But PG doesn't support transaction control within functions yet, does it?


 Yeah, transaction control is not supported inside functions. There are
 some hacks using dblink to do transactions inside functions. You may
 want to check that out.

That's only needed if you require autonomous transactions.  For most people,
savepoints and exception catching seem to be an acceptable form of
transaction control.
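Roughly like this, just as a sketch (the table and column names are invented;
the EXCEPTION block uses an implicit savepoint, so one failed batch doesn't
abort the rest):

CREATE OR REPLACE FUNCTION update_type_batch(lo int, hi int) RETURNS void AS $$
BEGIN
  UPDATE msg_table SET type = 0 WHERE id BETWEEN lo AND hi;
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE 'batch % to % failed: %', lo, hi, SQLERRM;
END;
$$ LANGUAGE plpgsql;

-- each call from the client in autocommit mode is its own transaction:
SELECT update_type_batch(1, 10000);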

 I had suggested another hack in the past for very simplistic updates,
 when you are sure that the tuple length does not change between
 updates and you are ready to handle half updated table if there is a
 crash or failure in between. May be for your case, where you are
 updating a single column of the entire table and setting it to some
 default value for all the rows, it may work fine. But please be aware
 of data consistency issues before you try that. And it must be once in
 a lifetime kind of hack.

 http://postgresql-in.blogspot.com/2008/04/postgresql-in-place-update.html

In a way that's what pg_bulkloader does.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [PERFORM] Reply: [PERFORM] Postgresql update op is very very slow

2008-06-26 Thread Scott Marlowe
2008/6/26 Tom Lane [EMAIL PROTECTED]:
 jay [EMAIL PROTECTED] writes:
   I know the problem: there are about 35 million rows, which take about 12GB
 of disk space, and checkpoint_segments is set to 64. The update runs in one
 transaction, which quickly fills up the checkpoint segments and forces
 frequent checkpoints, and checkpoints cost a lot of resources, so the update
 becomes slower and slower, and the bgwriter won't write because the
 transaction hasn't committed yet.
 Creating a new table may be a quick solution, but it's not appropriate in
 some cases.
   If we could commit every 1000 rows per round, it might resolve the
 problem.

 No, that's utterly unrelated.  Transaction boundaries have nothing to do
 with checkpoints.

True.  But if you update a limited number of rows at a time and vacuum
between batches, you can keep the bloat to something reasonable.

On another note, I haven't seen anyone suggest adding the appropriate
where clause to keep from updating rows that already match.  Cheap
compared to updating the whole table even if a large chunk aren't a
match.  i.e.

... set col=0 where col <> 0;

That should  be the first thing you reach for in this situation, if it can help.
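Put together, something like this (the id ranges and names are illustrative):

UPDATE msg_table SET col = 0 WHERE col <> 0 AND id BETWEEN 1 AND 10000;
VACUUM msg_table;
-- repeat for the next id range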

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Matthew Wakeling

On Thu, 26 Jun 2008, Vivek Khera wrote:
Anyone have experience with IDE, SATA, or SAS-connected flash devices like 
the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a transfer 
rate of 100MB/s, and doesn't degrade much in performance when writing small 
random blocks. But what's it actually like, and is it reliable?


None of these manufacturers rates these drives for massive amounts of writes. 
They're sold as suitable for laptop/desktop use, which normally is not a 
heavy wear and tear operation like a DB.  Once they claim suitability for 
this purpose, be sure that I and a lot of others will dive into it to see how 
well it really works.  Until then, it will just be an expensive brick-making 
experiment, I'm sure.


It claims an MTBF of 2,000,000 hours, but no further reliability 
information seems forthcoming. I thought the idea that flash couldn't cope 
with many writes was no longer true these days?


Matthew

--
I work for an investment bank. I have dealt with code written by stock
exchanges. I have seen how the computer systems that store your money are
run. If I ever make a fortune, I will store it in gold bullion under my
bed.  -- Matthew Crosby

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Scott Marlowe
On Thu, Jun 26, 2008 at 10:14 AM, Matthew Wakeling [EMAIL PROTECTED] wrote:
 On Thu, 26 Jun 2008, Vivek Khera wrote:

 Anyone have experience with IDE, SATA, or SAS-connected flash devices
 like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a
 transfer rate of 100MB/s, and doesn't degrade much in performance when
 writing small random blocks. But what's it actually like, and is it
 reliable?

 None of these manufacturers rates these drives for massive amounts of
 writes. They're sold as suitable for laptop/desktop use, which normally is
 not a heavy wear and tear operation like a DB.  Once they claim suitability
 for this purpose, be sure that I and a lot of others will dive into it to
 see how well it really works.  Until then, it will just be an expensive
 brick-making experiment, I'm sure.

 It claims an MTBF of 2,000,000 hours, but no further reliability information
 seems forthcoming. I thought the idea that flash couldn't cope with many
 writes was no longer true these days?

What's mainly happened is a great increase in storage capacity has
allowed flash based devices to spread their writes out over so many
cells that the time it takes to overwrite all the cells enough to get
dead ones is measured in much longer intervals.  Instead of dying in
weeks or months, they'll now die, for most work loads, in years or
more.

However, I've tested a few less expensive solid state storage and for
some transactional loads it was much faster, but then for things like
report queries scanning whole tables they were factors slower than a
sw RAID-10 array of just 4 spinning disks.  But pg_bench was quite
snappy using the solid state storage for pg_xlog.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2008 at 9:49 AM, Peter T. Breuer [EMAIL PROTECTED] wrote:
 Also sprach Merlin Moncure:
 As discussed down thread, software raid still gets benefits of
 write-back caching on the raid controller...but there are a couple of

 (I wish I knew what write-back caching was!)

hardware raid controllers generally have some dedicated memory for
caching.  the controllers can be configured in one of two modes: (the
jargon is so common it's almost standard)
write back: raid controller can lie to host o/s. when o/s asks
controller to sync, controller can hold data in cache (for a time)
write through: raid controller can not lie. all sync requests must
pass through to disk

The thinking is, the BBU on the controller can hold scheduled writes
in memory (for a time) and replay them to disk when the server restarts in
the event of a power failure.  This is a reasonable compromise between data
integrity and performance.  'write back' caching provides insane burst
IOPS (because you are writing to controller cache) and somewhat
improved sustained IOPS because the controller is reorganizing writes
on the fly in (hopefully) optimal fashion.

 This imposes a considerable extra resource burden. It's a mystery to me
 However the lack of extra buffering is really deliberate (double
 buffering is a horrible thing in many ways, not least because of the

snip
completely unconvincing.  the overhead of various cache layers is
completely minute compared to a full fault to disk that requires a
seek which is several orders of magnitude slower.

The linux software raid algorithms are highly optimized, and run on a
presumably (much faster) cpu than what the controller supports.
However, there is still some extra oomph you can get out of letting
the raid controller do what the software raid can't...namely delay
sync for a time.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2008 at 12:14 PM, Matthew Wakeling [EMAIL PROTECTED] wrote:
 None of these manufacturers rates these drives for massive amounts of
 writes. They're sold as suitable for laptop/desktop use, which normally is
 not a heavy wear and tear operation like a DB.  Once they claim suitability
 for this purpose, be sure that I and a lot of others will dive into it to
 see how well it really works.  Until then, it will just be an expensive
 brick-making experiment, I'm sure.

 It claims a MTBF of 2,000,000 hours, but no further reliability information
 seems forthcoming. I thought the idea that flash couldn't cope with many
 writes was no longer true these days?

Flash and disks have completely different failure modes, and you can't
do apples to apples MTBF comparisons.  In addition there are many
different types of flash (MLC/SLC) and the flash cells themselves can
be organized in particular ways involving various trade-offs.

The best flash drives combined with smart wear leveling are
anecdotally believed to provide lifetimes that are good enough to
warrant use in high duty server environments.  The main issue is lousy
random write performance that basically makes them useless for any
kind of OLTP operation.  There are a couple of software (hacks?) out
there which may address this problem if the technology doesn't get
there first.

If the random write problem were solved, a single ssd would provide
the equivalent of a stack of 15k disks in a raid 10.

see:
http://www.bigdbahead.com/?p=44
http://feedblog.org/2008/01/30/24-hours-with-an-ssd-and-mysql/

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread Peter T. Breuer
Also sprach Merlin Moncure:
 write back: raid controller can lie to host o/s. when o/s asks

This is not what the linux software raid controller does, then. It 
does not queue requests internally at all, nor ack requests that have
not already been acked by the components (modulo the fact that one can
deliberately choose to have a slow component not be sync by allowing
write-behind on it, in which case the controller will ack the
incoming request after one of the components has been serviced,
without waiting for both).

 integrity and performance.  'write back' caching provides insane burst
 IOPS (because you are writing to controller cache) and somewhat
 improved sustained IOPS because the controller is reorganizing writes
 on the fly in (hopefully) optimal fashion.

This is what is provided by Linux file system and (ordinary) block
device driver subsystem. It is deliberately eschewed by the soft raid
driver, because any caching will already have been done above and below
the driver, either in the FS or in the components. 

  However the lack of extra buffering is really deliberate (double
  buffering is a horrible thing in many ways, not least because of the
 
 snip
 completely unconvincing. 

But true.  Therefore the problem in attaining conviction must be at your
end.  Double buffering just doubles the resources dedicated to a single
request, without doing anything for it!  It doubles the frequency with
which one runs out of resources, it doubles the frequency of the burst
limit being reached.  It's deadly (deadlockly :) in the situation where
the receiving component device also needs resources in order to service
the request, such as when the transport is network tcp (and I have my
suspicions about scsi too).

 the overhead of various cache layers is
 completely minute compared to a full fault to disk that requires a
 seek which is several orders of magnitude slower.

That's absolutely true when by overhead you mean computation cycles
and absolutely false when by overhead you mean memory resources, as I
do.  Double buffering is a killer.

 The linux software raid algorithms are highly optimized, and run on a

I can confidently tell you that that's balderdash both as a Linux author
and as a software RAID linux author (check the attributions in the
kernel source, or look up something like Raiding the Noosphere on
google).

 presumably (much faster) cpu than what the controller supports.
 However, there is still some extra oomph you can get out of letting
 the raid controller do what the software raid can't...namely delay
 sync for a time.

There are several design problems left in software raid in the linux kernel.
One of them is the need for extra memory to dispatch requests with and
as (i.e. buffer heads and buffers, both). bhs should be OK since the
small cache per device won't be exceeded while the raid driver itself
serialises requests, which is essentially the case (it does not do any
buffering, queuing, whatever .. and tries hard to avoid doing so). The
need for extra buffers for the data is a problem. On different
platforms different aspects of that problem are important (would you
believe that on ARM mere copying takes so much cpu time that one wants
to avoid it at all costs, whereas on intel it's a forgettable trivium).

I also wouldn't absolutely swear that request ordering is maintained
under ordinary circumstances.

But of course we try.


Peter

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2008 at 1:03 AM, Peter T. Breuer [EMAIL PROTECTED] wrote:
 Also sprach Merlin Moncure:
 write back: raid controller can lie to host o/s. when o/s asks

 This is not what the linux software raid controller does, then. It
 does not queue requests internally at all, nor ack requests that have
 not already been acked by the components (modulo the fact that one can
 deliberately choose to have a slow component not be sync by allowing
 write-behind on it, in which case the controller will ack the
 incoming request after one of the components has been serviced,
 without waiting for both).

 integrity and performance.  'write back' caching provides insane burst
 IOPS (because you are writing to controller cache) and somewhat
 improved sustained IOPS because the controller is reorganizing writes
 on the fly in (hopefully) optimal fashion.

 This is what is provided by Linux file system and (ordinary) block
 device driver subsystem. It is deliberately eschewed by the soft raid
 driver, because any caching will already have been done above and below
 the driver, either in the FS or in the components.

  However the lack of extra buffering is really deliberate (double
  buffering is a horrible thing in many ways, not least because of the

 snip
 completely unconvincing.

 But true.  Therefore the problem in attaining conviction must be at your
 end.  Double buffering just doubles the resources dedicated to a single
 request, without doing anything for it!  It doubles the frequency with
 which one runs out of resources, it doubles the frequency of the burst
 limit being reached.  It's deadly (deadlockly :) in the situation where

Only if those resources are drawn from the same pool.  You are
oversimplifying a calculation that has many variables such as cost.
CPUs for example are introducing more cache levels (l1, l2, l3), etc.
 Also, the different levels of cache have different capabilities.
Only the hardware controller cache is (optionally) allowed to delay
acknowledgment of a sync.  In postgresql terms, we get roughly the
same effect with the computers entire working memory with fsync
disabled...so that we are trusting, rightly or wrongly, that all
writes will eventually make it to disk.  In this case, the raid
controller cache is redundant and marginally useful.

 the receiving component device also needs resources in order to service
 the request, such as when the transport is network tcp (and I have my
 suspicions about scsi too).

 the overhead of various cache layers is
 completely minute compared to a full fault to disk that requires a
 seek which is several orders of magnitude slower.

 That's absolutely true when by overhead you mean computation cycles
 and absolutely false when by overhead you mean memory resources, as I
 do.  Double buffering is a killer.

Double buffering is most certainly _not_ a killer (or at least, _the_
killer) in practical terms.  Most database systems that do any amount
of writing (that is, interesting databases) are bound by the ability
to randomly read and write to the storage medium, and only that.

This is why raid controllers come with a relatively small amount of
cache...there are diminishing returns from reorganizing writes.  This
is also why up and coming storage technologies (like flash) are so
interesting.  Disk drives have made only marginal improvements in
speed since the early 80's.

 The linux software raid algorithms are highly optimized, and run on a

 I can confidently tell you that that's balderdash both as a Linux author

I'm just saying here that there is little/no cpu overhead for using
software raid on modern hardware.

 believe that on ARM mere copying takes so much cpu time that one wants
 to avoid it at all costs, whereas on intel it's a forgettable trivium).

This is a database list.  The main area of interest is in dealing with
server class hardware.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread david

On Thu, 26 Jun 2008, Peter T. Breuer wrote:


Also sprach Merlin Moncure:

The linux software raid algorithms are highly optimized, and run on a


I can confidently tell you that that's balderdash both as a Linux author
and as a software RAID linux author (check the attributions in the
kernel source, or look up something like Raiding the Noosphere on
google).


presumably (much faster) cpu than what the controller supports.
However, there is still some extra oomph you can get out of letting
the raid controller do what the software raid can't...namely delay
sync for a time.


There are several design problems left in software raid in the linux kernel.
One of them is the need for extra memory to dispatch requests with and
as (i.e. buffer heads and buffers, both). bhs should be OK since the
small cache per device won't be exceeded while the raid driver itself
serialises requests, which is essentially the case (it does not do any
buffering, queuing, whatever .. and tries hard to avoid doing so). The
need for extra buffers for the data is a problem. On different
platforms different aspects of that problem are important (would you
believe that on ARM mere copying takes so much cpu time that one wants
to avoid it at all costs, whereas on intel it's a forgettable trivium).

I also wouldn't absolutely swear that request ordering is maintained
under ordinary circumstances.


Which flavor of Linux RAID are you talking about (the two main families I 
am aware of are the md and dm ones)?


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Federated Postgresql architecture ?

2008-06-26 Thread kevin kempter

Hi List;

Anyone have any experiences to share per setting up a federated  
architecture with PostgreSQL ? I wonder if the dblink contrib works  
well in a federated scenario, specifically in the setup of the  
federated views which equate to a select * from the same table on each  
federated server ?


Thanks in advance...


/Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Federated Postgresql architecture ?

2008-06-26 Thread Jonah H. Harris
On Thu, Jun 26, 2008 at 4:33 PM, kevin kempter
[EMAIL PROTECTED] wrote:
 Anyone have any experiences to share per setting up a federated architecture
 with PostgreSQL ? I wonder if the dblink contrib works well in a federated
 scenario, specifically in the setup of the federated views which equate to a
 select * from the same table on each federated server ?

Because Postgres currently lacks the ability to push down predicates
to individual nodes over a database link, you have to spend a good
amount of time writing PL set-returning functions capable of adding
appropriate WHERE clauses to queries sent over the link.  There are
other things you can do, but it's mostly hackery at this point in
time.  IIRC, David Fetter is trying to get some of the required
predicate information exposed for use in DBI-Link.
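As a rough example of the limitation (connection strings, table and column
names are invented here):

CREATE VIEW orders_all AS
  SELECT * FROM dblink('dbname=node1', 'SELECT id, total FROM orders')
       AS t(id int, total numeric)
  UNION ALL
  SELECT * FROM dblink('dbname=node2', 'SELECT id, total FROM orders')
       AS t(id int, total numeric);

-- this WHERE clause is applied locally; each node still ships its whole table:
SELECT * FROM orders_all WHERE id = 42;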

Not to self-plug, but if you require it, EnterpriseDB includes
Oracle-style database links (SELECT col FROM [EMAIL PROTECTED]) which support
predicate push-down.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Federated Postgresql architecture ?

2008-06-26 Thread Jonah H. Harris
On Thu, Jun 26, 2008 at 5:41 PM, Josh Berkus [EMAIL PROTECTED] wrote:
 Not to self-plug, but if you require it, EnterpriseDB includes
 Oracle-style database links (SELECT col FROM [EMAIL PROTECTED]) which support
 predicate push-down.

 Also check out Skytools:  http://skytools.projects.postgresql.org/doc/

Hmm, I didn't think the Skype tools could really provide federated
database functionality without a good amount of custom work.  Or, am I
mistaken?

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Reply: Postgresql update op is very very slow

2008-06-26 Thread Greg Smith

On Thu, 26 Jun 2008, Holger Hoffstaette wrote:

How do large databases treat mass updates? AFAIK both DB2 and Oracle use 
MVCC (maybe a different kind?) as well


An intro to the other approaches used by Oracle and DB2 (not MVCC) is at

http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Transaction_Locking_and_Scalability

(a URL which I really need to shorten one day).

Are there no options (algorithms) for adaptively choosing different 
update strategies that do not incur the full MVCC overhead?


If you stare at the big picture of PostgreSQL's design, you might notice 
that it usually aims to do things one way and get that implementation 
right for the database's intended audience.  That intended audience cares 
about data integrity and correctness and is willing to suffer the overhead 
that goes along with operating that way.  There's few I don't care about 
reliability here so long as it's fast switches you can flip, and not 
having duplicate code paths to support them helps keep the code simpler 
and therefore more reliable.


This whole area is one of those good/fast/cheap trios.  If you want good 
transaction guarantees on updates, you either get the hardware and 
settings right to handle that (!cheap), or it's slow.  The idea of 
providing a !good/fast/cheap option for updates might have some 
theoretical value, but I think you'd find it hard to get enough support 
for that idea to get work done on it compared to the other things 
developer time is being spent on right now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Federated Postgresql architecture ?

2008-06-26 Thread Josh Berkus
Jonah,

 Hmm, I didn't think the Skype tools could really provide federated
 database functionality without a good amount of custom work.  Or, am I
 mistaken?

Sure, what do you think pl/proxy is for?
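For example, something like this (the cluster name, table and columns are
placeholders), with the routing done by the CLUSTER and RUN ON clauses:

CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
  CLUSTER 'usercluster';
  RUN ON hashtext(i_username);
  SELECT email FROM users WHERE username = i_username;
$$ LANGUAGE plproxy;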

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread Greg Smith

On Thu, 26 Jun 2008, Peter T. Breuer wrote:


Double buffering is a killer.


No, it isn't; it's a completely trivial bit of overhead.  It only exists 
during the time when blocks are queued to write but haven't been written 
yet.  On any database system, in those cases I/O congestion at the disk 
level (probably things backed up behind seeks) is going to block writes 
way before the memory used or the bit of CPU time making the extra copy 
becomes a factor on anything but minimal platforms.


You seem to know quite a bit about the RAID implementation, but you are a) 
extrapolating from that knowledge into areas of database performance you 
need to spend some more time researching first and b) extrapolating based 
on results from trivial hardware, relative to what the average person on 
this list is running a database server on in 2008.  The weakest platform I 
deploy PostgreSQL on and consider relevant today has two cores and 2GB of 
RAM, for a single-user development system that only has to handle a small 
amount of data relative to what the real servers handle.  If you note the 
kind of hardware people ask about here that's pretty typical.


You have some theories here, Merlin and I have positions that come from 
running benchmarks, and watching theories suffer a brutal smack-down from 
the real world is one of those things that happens every day.  There is 
absolutely some overhead from paths through the Linux software RAID that 
consume resources.  But you can't even measure that in database-oriented 
comparisons against hardware setups that don't use those resources, which 
means that for practical purposes the overhead doesn't exist in this 
context.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Federated Postgresql architecture ?

2008-06-26 Thread Jonah H. Harris
On Thu, Jun 26, 2008 at 6:31 PM, Josh Berkus [EMAIL PROTECTED] wrote:
 Sure, what do you think pl/proxy is for?

Well, considering that an application must be written specifically to
make use of it, and for very specific scenarios, I wouldn't consider
it as making PostgreSQL a federated database.  The pl/proxy
architecture certainly doesn't resemble federation in the sense used by
other database vendors.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Robert Treat
On Wednesday 25 June 2008 11:24:23 Greg Smith wrote:
 What I often do is get a hardware RAID controller, just to accelerate disk
 writes, but configure it in JBOD mode and use Linux or other software RAID
 on that platform.


JBOD + RAIDZ2 FTW ;-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware suggestions for high performance 8.3

2008-06-26 Thread Scott Marlowe
On Thu, Jun 26, 2008 at 10:47 PM, Greg Smith [EMAIL PROTECTED] wrote:
 On Thu, 26 Jun 2008, Henrik wrote:

 I've seen some concerns about buying database performance hardware from
 DELL. Are there at least some of the RAID cards that work well with Linux or
 should I stay clear of DELL permanently?

 People seem to be doing OK if the RAID card is their Perc/6i, which has an
 LSI Logic MegaRAID SAS 1078 chipset under the hood.  There's some helpful
benchmark results and follow-up messages related to one of those at
 http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Yeah, the problems I've had have been with the internal RAID (perc
5???) lsi based controllers.  They kick their drives offline.  Dell
has a firmware update but we haven't had a chance to install it just
yet to see if it fixes the problem with that one.

 That said, I consider the rebranded LSI cards a pain and hate the quality of
 Dell's hardware.

Yeah, I'd just as soon get a regular LSI bios as the remade one Dell
seems intent on pushing.  Also, we just discovered the broadcom
chipsets we have in our Dell 1950s and 1850s will not negotiate to
gigabit with our Nortel switches.  Everything else I've plugged in
just worked.  Went looking at Dell's site, and for the 1950 they
recommend buying a dual port Intel NIC for it.  Why couldn't they just
build in better NICs to start?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance