[PERFORM] testing - ignore

2016-06-28 Thread George Neuner
testing



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Rayson Ho
Thanks Yves for the clarification!

It used to be very important to pre-warm EBS before running benchmarks
in order to get consistent results.

Then at re:Invent 2015, the AWS engineers said that it is not needed
anymore, which IMO makes benchmarking in AWS a lot less work, because
pre-warming a multi-TB EBS volume is very time consuming, and the I/Os
were not free.

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


On Thu, May 26, 2016 at 11:41 AM, Yves Dorfsman  wrote:
> On 2016-05-26 09:03, Artem Tomyuk wrote:
>> Why no? Or you missed something?
>
> I think Rayson is correct, but the double negative makes it hard to read:
>
> "So no EBS pre-warming does not apply to EBS volumes created from snapshots."
>
> Which I interpret as:
> So, "no EBS pre-warming", does not apply to EBS volumes created from 
> snapshots.
>
> Which is correct, you still have to warm your EBS when created from snapshots 
> (to get the data from S3 to the filesystem).
>
>
> --
> http://yves.zioup.com
> gpg: 4096R/32B0F416
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Yves Dorfsman
On 2016-05-26 09:03, Artem Tomyuk wrote:
> Why no? Or you missed something?

I think Rayson is correct, but the double negative makes it hard to read:

"So no EBS pre-warming does not apply to EBS volumes created from snapshots."

Which I interpret as:
So, "no EBS pre-warming", does not apply to EBS volumes created from snapshots.

Which is correct, you still have to warm your EBS when created from snapshots 
(to get the data from S3 to the filesystem).


-- 
http://yves.zioup.com
gpg: 4096R/32B0F416 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Artem Tomyuk
Why no? Or did you miss something?

It should be done on every EBS volume restored from a snapshot.

Is that from your personal experience, and if so, when did you do the test??

Yes, we are using this practice: as part of our production load we use auto
scaling groups to create new instances, which are created from AMIs, which in
turn are backed by snapshots, so...

2016-05-26 17:54 GMT+03:00 Rayson Ho :

> Thanks Artem.
>
> So no EBS pre-warming does not apply to EBS volumes created from snapshots.
>
> Rayson
>
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
>
> On Thu, May 26, 2016 at 10:52 AM, Artem Tomyuk 
> wrote:
> > Please look at the official doc.
> >
> > "New EBS volumes receive their maximum performance the moment that they
> are
> > available and do not require initialization (formerly known as
> pre-warming).
> > However, storage blocks on volumes that were restored from snapshots
> must be
> > initialized (pulled down from Amazon S3 and written to the volume) before
> > you can access the block"
> >
> > Quotation from:
> > http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
> >
> > 2016-05-26 17:47 GMT+03:00 Rayson Ho :
> >>
> >> On Thu, May 26, 2016 at 10:00 AM, Artem Tomyuk 
> >> wrote:
> >>>
> >>>
> >>> 2016-05-26 16:50 GMT+03:00 Rayson Ho :
> 
>  Amazon engineers said that EBS pre-warming is not needed anymore.
> >>>
> >>>
> >>> but still if you will skip this step you wont get much performance on
> ebs
> >>> created from snapshot.
> >>
> >>
> >>
> >> IIRC, that's not what Amazon engineers said. Is that from your personal
> >> experience, and if so, when did you do the test??
> >>
> >> Rayson
> >>
> >> ==
> >> Open Grid Scheduler - The Official Open Source Grid Engine
> >> http://gridscheduler.sourceforge.net/
> >> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
> >>
> >>
> >>
> >
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Rayson Ho
Thanks Artem.

So no EBS pre-warming does not apply to EBS volumes created from snapshots.

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


On Thu, May 26, 2016 at 10:52 AM, Artem Tomyuk  wrote:
> Please look at the official doc.
>
> "New EBS volumes receive their maximum performance the moment that they are
> available and do not require initialization (formerly known as pre-warming).
> However, storage blocks on volumes that were restored from snapshots must be
> initialized (pulled down from Amazon S3 and written to the volume) before
> you can access the block"
>
> Quotation from:
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
>
> 2016-05-26 17:47 GMT+03:00 Rayson Ho :
>>
>> On Thu, May 26, 2016 at 10:00 AM, Artem Tomyuk 
>> wrote:
>>>
>>>
>>> 2016-05-26 16:50 GMT+03:00 Rayson Ho :

 Amazon engineers said that EBS pre-warming is not needed anymore.
>>>
>>>
>>> but still if you will skip this step you wont get much performance on ebs
>>> created from snapshot.
>>
>>
>>
>> IIRC, that's not what Amazon engineers said. Is that from your personal
>> experience, and if so, when did you do the test??
>>
>> Rayson
>>
>> ==
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>>
>>
>>
>


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Artem Tomyuk
Please look at the official doc.

"New EBS volumes receive their maximum performance the moment that they are
available and do not require initialization (formerly known as
pre-warming). However, storage blocks on volumes that were restored from
snapshots must be initialized (pulled down from Amazon S3 and written to
the volume) before you can access the block"

Quotation from:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
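
For reference, the initialization pass that doc describes is just a full
sequential read of every block on the restored volume, so each block gets
pulled down from S3 once. A minimal sketch (the device name /dev/xvdf is a
placeholder, and the fio options are from memory rather than copied from the
doc):

$ sudo dd if=/dev/xvdf of=/dev/null bs=1M
# or, with fio:
$ sudo fio --filename=/dev/xvdf --rw=read --bs=128k --iodepth=32 \
    --ioengine=libaio --direct=1 --name=volume-initialize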

2016-05-26 17:47 GMT+03:00 Rayson Ho :

> On Thu, May 26, 2016 at 10:00 AM, Artem Tomyuk 
> wrote:
>
>>
>> 2016-05-26 16:50 GMT+03:00 Rayson Ho :
>>
>>> Amazon engineers said that EBS pre-warming is not needed anymore.
>>
>>
>> but still if you will skip this step you wont get much performance on ebs
>> created from snapshot.
>>
>
>
> IIRC, that's not what Amazon engineers said. Is that from your personal
> experience, and if so, when did you do the test??
>
> Rayson
>
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
>
>
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Rayson Ho
On Thu, May 26, 2016 at 10:00 AM, Artem Tomyuk  wrote:

>
> 2016-05-26 16:50 GMT+03:00 Rayson Ho :
>
>> Amazon engineers said that EBS pre-warming is not needed anymore.
>
>
> but still if you will skip this step you wont get much performance on ebs
> created from snapshot.
>


IIRC, that's not what Amazon engineers said. Is that from your personal
experience, and if so, when did you do the test??

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Artem Tomyuk
2016-05-26 16:50 GMT+03:00 Rayson Ho :

> Amazon engineers said that EBS pre-warming is not needed anymore.


but still, if you skip this step you won't get much performance on EBS volumes
created from snapshots.


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Rayson Ho
On Thu, May 26, 2016 at 9:00 AM, Artem Tomyuk  wrote:
>
> But still strong recommendation to pre-warm your ebs in any case,
especially if they created from snapshot.

That used to be true. However, at AWS re:Invent 2015, Amazon engineers said
that EBS pre-warming is not needed anymore.

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



> 2016-05-26 15:53 GMT+03:00 Yves Dorfsman :
>>
>> On 2016-05-25 19:08, Rayson Ho wrote:
>> > Actually, when "EBS-Optimized" is on, then the instance gets dedicated
>> > bandwidth to EBS.
>>
>> Hadn't realised that, thanks.
>> Is the EBS bandwidth then somewhat limited depending on the type of
instance too?
>>
>> --
>> http://yves.zioup.com
>> gpg: 4096R/32B0F416
>>
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Artem Tomyuk
Yes, the smaller the instance you choose, the slower EBS will be.
EBS lives separately from EC2; they communicate over the network. So a small
instance = low network bandwidth = poorer disk performance.
But it is still strongly recommended to pre-warm your EBS volumes in any case,
especially if they were created from snapshots.

2016-05-26 15:53 GMT+03:00 Yves Dorfsman :

> On 2016-05-25 19:08, Rayson Ho wrote:
> > Actually, when "EBS-Optimized" is on, then the instance gets dedicated
> > bandwidth to EBS.
>
> Hadn't realised that, thanks.
> Is the EBS bandwidth then somewhat limited depending on the type of
> instance too?
>
> --
> http://yves.zioup.com
> gpg: 4096R/32B0F416
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Yves Dorfsman
On 2016-05-25 19:08, Rayson Ho wrote:
> Actually, when "EBS-Optimized" is on, then the instance gets dedicated
> bandwidth to EBS.

Hadn't realised that, thanks.
Is the EBS bandwidth then somewhat limited depending on the type of instance 
too?

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-26 Thread Artem Tomyuk
Hi.

AWS EBS is a really painful story.
How were the volumes for the RAID created? From snapshots?
If you want to get the best performance from EBS, it needs to be pre-warmed.

Here is the tutorial how to achieve that:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html

You should also read this one if you want to get really great
performance:
http://hatim.eu/2014/05/24/leveraging-ssd-ephemeral-disks-in-ec2-part-1/

Good luck!

2016-05-26 1:34 GMT+03:00 Tory M Blue :

> We are starting some testing in AWS, with EC2, EBS backed setups.
>
> What I found interesting today, was a single EBS 1TB volume, gave me
> something like 108MB/s throughput, however a RAID10 (4 250GB EBS
> volumes), gave me something like 31MB/s (test after test after test).
>
> I'm wondering what you folks are using inside of Amazon (not
> interested in RDS at the moment).
>
> Thanks
> Tory
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-25 Thread Rayson Ho
Actually, when "EBS-Optimized" is on, then the instance gets dedicated
bandwidth to EBS.
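
For what it's worth, a minimal sketch of requesting that at launch time with
the AWS CLI (the AMI ID and instance type below are placeholders):

$ aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m4.xlarge \
    --ebs-optimized

If I remember right, some newer instance families (e.g. m4/c4) are
EBS-optimized by default, in which case the flag is implicit.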

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



On Wed, May 25, 2016 at 7:56 PM, Yves Dorfsman  wrote:

> Indeed, old-style disk EBS vs new-style SSd EBS.
>
> Be aware that EBS traffic is considered as part of the total "network"
> traffic, and each type of instance has different limits on maximum network
> throughput. Those difference are very significant, do tests on the same
> volume
> between two different type of instances, both with enough cpu and memory
> for
> the I/O to be the bottleneck, you will be surprised!
>
>
> On 2016-05-25 17:02, Rayson Ho wrote:
> > There are many factors that can affect EBS performance. For example, the
> type
> > of EBS volume, the instance type, whether EBS-optimized is turned on or
> not, etc.
> >
> > Without the details, then there is no apples to apples comparsion...
> >
> > Rayson
> >
> > ==
> > Open Grid Scheduler - The Official Open Source Grid Engine
> > http://gridscheduler.sourceforge.net/
> > http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
> >
> >
> >
> > On Wed, May 25, 2016 at 6:34 PM, Tory M Blue  > > wrote:
> >>
> >> We are starting some testing in AWS, with EC2, EBS backed setups.
> >>
> >> What I found interesting today, was a single EBS 1TB volume, gave me
> >> something like 108MB/s throughput, however a RAID10 (4 250GB EBS
> >> volumes), gave me something like 31MB/s (test after test after test).
> >>
> >> I'm wondering what you folks are using inside of Amazon (not
> >> interested in RDS at the moment).
> >>
> >> Thanks
> >> Tory
> >>
> >>
> >> --
> >> Sent via pgsql-performance mailing list (
> pgsql-performance@postgresql.org
> > )
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> http://yves.zioup.com
> gpg: 4096R/32B0F416
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-25 Thread Yves Dorfsman
Indeed, old-style disk EBS vs new-style SSD EBS.

Be aware that EBS traffic is counted as part of the total "network"
traffic, and each type of instance has different limits on maximum network
throughput. Those differences are very significant; do tests on the same volume
between two different types of instances, both with enough CPU and memory for
the I/O to be the bottleneck, and you will be surprised!
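
A sketch of the kind of comparison meant here - the same command, against the
same volume, run from each instance type (the device name is a placeholder):

$ sudo fio --name=seqread --filename=/dev/xvdf --rw=read --bs=1M \
    --direct=1 --runtime=60 --time_based --group_reporting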


On 2016-05-25 17:02, Rayson Ho wrote:
> There are many factors that can affect EBS performance. For example, the type
> of EBS volume, the instance type, whether EBS-optimized is turned on or not, 
> etc.
> 
> Without the details, then there is no apples to apples comparsion...
> 
> Rayson
> 
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
> 
> 
> 
> On Wed, May 25, 2016 at 6:34 PM, Tory M Blue  > wrote:
>>
>> We are starting some testing in AWS, with EC2, EBS backed setups.
>>
>> What I found interesting today, was a single EBS 1TB volume, gave me
>> something like 108MB/s throughput, however a RAID10 (4 250GB EBS
>> volumes), gave me something like 31MB/s (test after test after test).
>>
>> I'm wondering what you folks are using inside of Amazon (not
>> interested in RDS at the moment).
>>
>> Thanks
>> Tory
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance


-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-25 Thread Rayson Ho
There are many factors that can affect EBS performance. For example, the
type of EBS volume, the instance type, whether EBS-optimized is turned on
or not, etc.

Without the details, there is no apples-to-apples comparison...
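
A sketch of collecting those details with the AWS CLI so runs can be compared
like-for-like (the volume and instance IDs are placeholders):

$ aws ec2 describe-volumes --volume-ids vol-xxxxxxxx \
    --query 'Volumes[].{Type:VolumeType,SizeGiB:Size,Iops:Iops}'
$ aws ec2 describe-instances --instance-ids i-xxxxxxxx \
    --query 'Reservations[].Instances[].{Type:InstanceType,EbsOptimized:EbsOptimized}'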

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



On Wed, May 25, 2016 at 6:34 PM, Tory M Blue  wrote:
>
> We are starting some testing in AWS, with EC2, EBS backed setups.
>
> What I found interesting today, was a single EBS 1TB volume, gave me
> something like 108MB/s throughput, however a RAID10 (4 250GB EBS
> volumes), gave me something like 31MB/s (test after test after test).
>
> I'm wondering what you folks are using inside of Amazon (not
> interested in RDS at the moment).
>
> Thanks
> Tory
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Testing in AWS, EBS

2016-05-25 Thread Tory M Blue
We are starting some testing in AWS, with EC2, EBS backed setups.

What I found interesting today was that a single 1TB EBS volume gave me
something like 108MB/s throughput, whereas a RAID10 (4 x 250GB EBS
volumes) gave me something like 31MB/s (test after test after test).
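
For context, a rough sketch of the kind of setup and test being compared here,
not necessarily the exact commands used; device names and mount point are
placeholders:

$ sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
$ sudo dd if=/dev/zero of=/mnt/raid10/testfile bs=1M count=16384 oflag=direct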

I'm wondering what you folks are using inside of Amazon (not
interested in RDS at the moment).

Thanks
Tory


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] testing - please ignore

2016-04-27 Thread George Neuner




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing strategies

2014-04-15 Thread Matheus de Oliveira
On Tue, Apr 15, 2014 at 12:57 PM, Dave Cramer  wrote:

> I have a client wanting to test PostgreSQL on ZFS running Linux. Other
> than pg_bench are there any other benchmarks that are easy to test?


Check Gregory Smith's article about testing disks [1].

[1] http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm
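
Roughly the kind of dd test that article walks through: write a file about
twice the size of RAM, sync, then read it back, timing both passes (path and
size here are examples for a 16GB box):

$ time sh -c "dd if=/dev/zero of=/data/testfile bs=8k count=4000000 && sync"
$ time dd if=/data/testfile of=/dev/null bs=8k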

-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


[PERFORM] Testing strategies

2014-04-15 Thread Dave Cramer
I have a client wanting to test PostgreSQL on ZFS running Linux.

Other than pg_bench, are there any other benchmarks that are easy to run?

One of the possible concerns is fragmentation over time. Any ideas on how
to fragment the database before running pg_bench?

Also there is some concern about fragmentation of the WAL logs. I am
looking at testing with and without the WAL logs on ZFS. Any other specific
concerns?
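
A minimal pgbench sketch for this kind of comparison (scale and durations are
arbitrary examples; one crude way to pre-fragment is simply to run a long
read/write workload before the measured run, so tables, indexes and WAL have
been rewritten many times):

$ pgbench -i -s 1000 testdb           # initialize, roughly 15 GB of pgbench tables
$ pgbench -c 16 -j 4 -T 3600 testdb   # long read/write run to age the on-disk layout
$ pgbench -c 16 -j 4 -T 1800 testdb   # measured run; repeat with WAL on and off ZFS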


Dave Cramer
credativ ltd (Canada)

78 Zina St
Orangeville, ON
Canada. L9W 1E8

Office: +1 (905) 766-4091
Mobile: +1 (519) 939-0336

===
Canada:  http://www.credativ.ca
USA: http://www.credativ.us
Germany: http://www.credativ.de
Netherlands: http://www.credativ.nl
UK:  http://www.credativ.co.uk
India:   http://www.credativ.in
===


Re: [PERFORM] Testing Sandforce SSD

2010-08-11 Thread Bruce Momjian
Greg Smith wrote:
> > * How to test for power failure?
> 
> I've had good results using one of the early programs used to 
> investigate this class of problems:  
> http://brad.livejournal.com/2116715.html?page=2

FYI, this tool is mentioned in the Postgres documentation:

http://www.postgresql.org/docs/9.0/static/wal-reliability.html
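
As I recall, the diskchecker.pl workflow is roughly the following (hostname,
path and file size are placeholders):

# on a second machine that will stay up when the power is pulled:
$ diskchecker.pl -l
# on the machine under test:
$ diskchecker.pl -s helperhost create /ssd/test_file 500
#   ... pull the power mid-run, boot the test machine back up, then:
$ diskchecker.pl -s helperhost verify /ssd/test_file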

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-05 Thread Brad Nicholson

 On 10-08-04 03:49 PM, Scott Carey wrote:

On Aug 2, 2010, at 7:26 AM, Merlin Moncure wrote:


On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havinga  wrote:

After a week testing I think I can answer the question above: does it work
like it's supposed to under PostgreSQL?

YES

The drive I have tested is the $435,- 50GB OCZ Vertex 2 Pro,
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534

* it is safe to mount filesystems with barrier off, since it has a 'supercap
backed cache'. That data is not lost is confirmed by a dozen power switch
off tests while running either diskchecker.pl or pgbench.
* the above implies its also safe to use this SSD with barriers, though that
will perform less, since this drive obeys write trough commands.
* the highest pgbench tps number for the TPC-B test for a scale 300 database
(~5GB) I could get was over 6700. Judging from the iostat average util of
~40% on the xlog partition, I believe that this number is limited by other
factors than the SSD, like CPU, core count, core MHz, memory size/speed, 8.4
pgbench without threads. Unfortunately I don't have a faster/more core
machines available for testing right now.
* pgbench numbers for a larger than RAM database, read only was over 25000
tps (details are at the end of this post), during which iostat reported
~18500 read iops and 100% utilization.
* pgbench max reported latencies are 20% of comparable BBWC setups.
* how reliable it is over time, and how it performs over time I cannot say,
since I tested it only for a week.

Thank you very much for posting this analysis.  This has IMNSHO the
potential to be a game changer.  There are still some unanswered
questions in terms of how the drive wears, reliability, errors, and
lifespan but 6700 tps off of a single 400$ device with decent fault
tolerance is amazing (Intel, consider yourself upstaged).  Ever since
the first samsung SSD hit the market I've felt the days of the
spinning disk have been numbered.  Being able to build a 100k tps
server on relatively inexpensive hardware without an entire rack full
of drives is starting to look within reach.

Intel's next gen 'enterprise' SSD's are due out later this year.  I have heard 
from those with access to test samples that they really like them -- these 
people rejected the previous versions because of the data loss on power failure.

So, hopefully there will be some interesting competition later this year in the 
medium price range enterprise ssd market.



I'll be doing some testing on enterprise-grade SSDs this year.  I'll 
also be looking at some hybrid storage products that use SSDs as 
accelerators mixed with lower cost storage.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Chris Browne
j...@commandprompt.com ("Joshua D. Drake") writes:
> On Sat, 2010-07-24 at 16:21 -0400, Greg Smith wrote:
>> Greg Smith wrote:
>> > Note that not all of the Sandforce drives include a capacitor; I hope 
>> > you got one that does!  I wasn't aware any of the SF drives with a 
>> > capacitor on them were even shipping yet, all of the ones I'd seen 
>> > were the chipset that doesn't include one still.  Haven't checked in a 
>> > few weeks though.
>> 
>> Answer my own question here:  the drive Yeb got was the brand spanking 
>> new OCZ Vertex 2 Pro, selling for $649 at Newegg for example:  
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with 
>> the supercacitor listed right in the main production specifications 
>> there.  This is officially the first inexpensive (relatively) SSD with a 
>> battery-backed write cache built into it.  If Yeb's test results prove 
>> it works as it's supposed to under PostgreSQL, I'll be happy to finally 
>> have a moderately priced SSD I can recommend to people for database 
>> use.  And I fear I'll be out of excuses to avoid buying one as a toy for 
>> my home system.
>
> That is quite the toy. I can get 4 SATA-II with RAID Controller, with
> battery backed cache, for the same price or less :P

Sure, but it:
- Fits into a single slot
- Is quiet
- Consumes little power
- Generates little heat
- Is likely to be about as quick as the 4-drive array

It doesn't have the extra 4TB of storage, but if you're building big-ish
databases, metrics have to change anyways.

This is a pretty slick answer for the small OLTP server.
-- 
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/postgresql.html
Chaotic Evil means never having to say you're sorry.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Chris Browne
g...@2ndquadrant.com (Greg Smith) writes:
> Yeb Havinga wrote:
>> * What filesystem to use on the SSD? To minimize writes and maximize
>> chance for seeing errors I'd choose ext2 here. 
>
> I don't consider there to be any reason to deploy any part of a
> PostgreSQL database on ext2.  The potential for downtime if the fsck
> doesn't happen automatically far outweighs the minimal performance
> advantage you'll actually see in real applications.  

Ah, but if the goal is to try to torture the SSD as cruelly as possible,
these aren't necessarily downsides (important or otherwise).

I don't think ext2 helps much in "maximizing chances of seeing errors"
in notably useful ways, as the extra "torture" that takes place as part
of the post-remount fsck isn't notably PG-relevant.  (It's not obvious
that errors encountered would be readily mapped to issues relating to
PostgreSQL.)

I think the WAL-oriented test would be *way* more useful; inducing work
whose "brokenness" can be measured in one series of files in one
directory should be way easier than trying to find changes across a
whole PG cluster.  I don't expect the filesystem choice to be terribly
significant to that.
-- 
"cbbrowne","@","gmail.com"
"Heuristics (from the  French heure, "hour") limit the  amount of time
spent executing something.  [When using heuristics] it shouldn't take
longer than an hour to do something."

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Scott Carey

On Aug 3, 2010, at 9:27 AM, Merlin Moncure wrote:
> 
> 2) I've heard that some SSD have utilities that you can use to query
> the write cycles in order to estimate lifespan.  Does this one, and is
> it possible to publish the output (an approximation of the amount of
> work along with this would be wonderful)?
> 

On the Intel drives, it's available via SMART.  Plenty of hits on how to read 
the data from Google.  Sandforce drives probably have it exposed via SMART as 
well.
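
From memory, reading it looks something like this (the device name is a
placeholder, and the exact attribute name/number varies by drive generation):

$ sudo smartctl -A /dev/sda
# on many Intel drives the wear estimate shows up as the
# "Media_Wearout_Indicator" attribute, counting down from 100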

I have had over 50 X25-M's (80GB G1's) in production for 22 months that write 
~100GB a day, and SMART reports they have 78% of their write cycles left.  Plus, 
when one dies from usage it supposedly enters a read-only state (these only hold 
recoverable data, so data loss on power failure is not a concern for me).

So if Sandforce has low write amplification like Intel (they claim to be 
better), longevity should be fine.

> merlin
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Scott Carey

On Aug 2, 2010, at 7:26 AM, Merlin Moncure wrote:

> On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havinga  wrote:
>> After a week testing I think I can answer the question above: does it work
>> like it's supposed to under PostgreSQL?
>> 
>> YES
>> 
>> The drive I have tested is the $435,- 50GB OCZ Vertex 2 Pro,
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534
>> 
>> * it is safe to mount filesystems with barrier off, since it has a 'supercap
>> backed cache'. That data is not lost is confirmed by a dozen power switch
>> off tests while running either diskchecker.pl or pgbench.
>> * the above implies its also safe to use this SSD with barriers, though that
>> will perform less, since this drive obeys write trough commands.
>> * the highest pgbench tps number for the TPC-B test for a scale 300 database
>> (~5GB) I could get was over 6700. Judging from the iostat average util of
>> ~40% on the xlog partition, I believe that this number is limited by other
>> factors than the SSD, like CPU, core count, core MHz, memory size/speed, 8.4
>> pgbench without threads. Unfortunately I don't have a faster/more core
>> machines available for testing right now.
>> * pgbench numbers for a larger than RAM database, read only was over 25000
>> tps (details are at the end of this post), during which iostat reported
>> ~18500 read iops and 100% utilization.
>> * pgbench max reported latencies are 20% of comparable BBWC setups.
>> * how reliable it is over time, and how it performs over time I cannot say,
>> since I tested it only for a week.
> 
> Thank you very much for posting this analysis.  This has IMNSHO the
> potential to be a game changer.  There are still some unanswered
> questions in terms of how the drive wears, reliability, errors, and
> lifespan but 6700 tps off of a single 400$ device with decent fault
> tolerance is amazing (Intel, consider yourself upstaged).  Ever since
> the first samsung SSD hit the market I've felt the days of the
> spinning disk have been numbered.  Being able to build a 100k tps
> server on relatively inexpensive hardware without an entire rack full
> of drives is starting to look within reach.

Intel's next gen 'enterprise' SSD's are due out later this year.  I have heard 
from those with access to test samples that they really like them -- these 
people rejected the previous versions because of the data loss on power failure.

So, hopefully there will be some interesting competition later this year in the 
medium price range enterprise ssd market.

> 
>> Postgres settings:
>> 8.4.4
>> --with-blocksize=4
>> I saw about 10% increase in performance compared to 8KB blocksizes.
> 
> That's very interesting -- we need more testing in that department...
> 
> regards (and thanks again)
> merlin
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Scott Carey

On Jul 26, 2010, at 12:45 PM, Greg Smith wrote:

> Yeb Havinga wrote:
>> I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory 
>> read/write test. (scale 300) No real winners or losers, though ext2 
>> isn't really faster and the manual need for fix (y) during boot makes 
>> it impractical in its standard configuration.
> 
> That's what happens every time I try it too.  The theoretical benefits 
> of ext2 for hosting PostgreSQL just don't translate into significant 
> performance increases on database oriented tests, certainly not ones 
> that would justify the downside of having fsck issues come back again.  
> Glad to see that holds true on this hardware too.
> 

ext2 is slow for many reasons.  ext4 with no journal is significantly faster 
than ext2.  ext4 with a journal is faster than ext2.

> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-04 Thread Hannu Krosing
On Tue, 2010-08-03 at 10:40 +0200, Yeb Havinga wrote:
> Please note that the 10% was on a slower CPU. On a more recent CPU the 
> difference was 47%, based on tests that ran for an hour.

I am not surprised at all that reading and writing almost twice as much
data from/to disk takes 47% longer. If less time is spent on seeking, the
amount of data starts playing a bigger role.

>  That's why I 
> absolutely agree with Merlin Moncure that more testing in this 
> department is welcome, preferably by others since after all I could be 
> on the pay roll of OCZ :-)

:)


> I looked a bit into Bonnie++ but fail to see how I could do a test that 
> somehow matches the PostgreSQL setup during the pgbench tests (db that 
> fits in memory, 

Did it fit in shared_buffers, or system cache ?

Once we are in high-tps territory, the time it takes to move pages between
userspace and the system cache starts to play a bigger role.

I first noticed this several years ago, when doing a COPY to a large
table with indexes took noticeably longer (2-3 times longer) when the
indexes were in the system cache than when they were in shared_buffers.

> so the test is actually how fast the ssd can capture 
> sequential WAL writes and fsync without barriers, mixed with an 
> occasional checkpoint with random write IO on another partition). Since 
> the WAL writing is the same for both block_size setups, I decided to 
> compare random writes to a file of 5GB with Oracle's Orion tool:

Are you sure that you are not writing full WAL pages ?

Do you have any stats on how much WAL is written for 8kb and 4kb test
cases ?

And for other disk i/o during the tests ?



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-03 Thread Merlin Moncure
On Tue, Aug 3, 2010 at 11:37 AM, Yeb Havinga  wrote:
> Yeb Havinga wrote:
>>
>> Hannu Krosing wrote:
>>>
>>> Did it fit in shared_buffers, or system cache ?
>>>
>>
>> Database was ~5GB, server has 16GB, shared buffers was set to 1920MB.
>>>
>>> I first noticed this several years ago, when doing a COPY to a large
>>> table with indexes took noticably longer (2-3 times longer) when the
>>> indexes were in system cache than when they were in shared_buffers.
>>>
>>
>> I read this as a hint: try increasing shared_buffers. I'll redo the
>> pgbench run with increased shared_buffers.
>
> Shared buffers raised from 1920MB to 3520MB:
>
> pgbench -v -l -c 20 -M prepared -T 1800 test
> starting vacuum...end.
> starting vacuum pgbench_accounts...end.
> transaction type: TPC-B (sort of)
> scaling factor: 300
> query mode: prepared
> number of clients: 20
> duration: 1800 s
> number of transactions actually processed: 12971714
> tps = 7206.244065 (including connections establishing)
> tps = 7206.349947 (excluding connections establishing)
>
> :-)

1) what can we compare this against (changing only the
shared_buffers setting)?

2) I've heard that some SSDs have utilities that you can use to query
the write cycles in order to estimate lifespan.  Does this one, and is
it possible to publish the output (an approximation of the amount of
work along with it would be wonderful)?

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-03 Thread Yeb Havinga

Yeb Havinga wrote:

Hannu Krosing wrote:

Did it fit in shared_buffers, or system cache ?
  

Database was ~5GB, server has 16GB, shared buffers was set to 1920MB.

I first noticed this several years ago, when doing a COPY to a large
table with indexes took noticably longer (2-3 times longer) when the
indexes were in system cache than when they were in shared_buffers.
  
I read this as a hint: try increasing shared_buffers. I'll redo the 
pgbench run with increased shared_buffers.

Shared buffers raised from 1920MB to 3520MB:

pgbench -v -l -c 20 -M prepared -T 1800 test
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 20
duration: 1800 s
number of transactions actually processed: 12971714
tps = 7206.244065 (including connections establishing)
tps = 7206.349947 (excluding connections establishing)

:-)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-03 Thread Greg Smith

Yeb Havinga wrote:

Small IO size: 4 KB
Maximum Small IOPS=86883 @ Small=8 and Large=0

Small IO size: 8 KB
Maximum Small IOPS=48798 @ Small=11 and Large=0


Conclusion:  you can write 4KB blocks almost twice as fast as 8KB ones.  
This is a useful observation about the effectiveness of the write cache 
on the unit, but not really a surprise.  On ideal hardware performance 
should double if you halve the write size.  I already wagered the 
difference in pgbench results is caused by the same math.
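
For reference, working that out from the numbers above: 86883 / 48798 is about
1.78, so close to but not quite the ideal 2.0; in throughput terms that is
roughly 340 MB/s of 4 KB writes versus roughly 380 MB/s of 8 KB writes.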


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-03 Thread Yeb Havinga

Hannu Krosing wrote:

Did it fit in shared_buffers, or system cache ?
  

Database was ~5GB, server has 16GB, shared buffers was set to 1920MB.

I first noticed this several years ago, when doing a COPY to a large
table with indexes took noticably longer (2-3 times longer) when the
indexes were in system cache than when they were in shared_buffers.
  
I read this as a hint: try increasing shared_buffers. I'll redo the 
pgbench run with increased shared_buffers.
so the test is actually how fast the ssd can capture 
sequential WAL writes and fsync without barriers, mixed with an 
occasional checkpoint with random write IO on another partition). Since 
the WAL writing is the same for both block_size setups, I decided to 
compare random writes to a file of 5GB with Oracle's Orion tool:



Are you sure that you are not writing full WAL pages ?
  

I'm not sure I understand this question.

Do you have any stats on how much WAL is written for 8kb and 4kb test
cases ?
  

Would some iostat -xk 1 for each partition suffice?

And for other disk i/o during the tests ?
  

Not existent.

regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-03 Thread Yeb Havinga

Scott Marlowe wrote:

On Mon, Aug 2, 2010 at 6:07 PM, Greg Smith  wrote:
  

Josh Berkus wrote:


That doesn't make much sense unless there's some special advantage to a
4K blocksize with the hardware itself.
  

Given that pgbench is always doing tiny updates to blocks, I wouldn't be
surprised if switching to smaller blocks helps it in a lot of situations if
one went looking for them.  Also, as you point out, pgbench runtime varies
around wildly enough that 10% would need more investigation to really prove
that means something.  But I think Yeb has done plenty of investigation into
the most interesting part here, the durability claims.

Please note that the 10% was on a slower CPU. On a more recent CPU the 
difference was 47%, based on tests that ran for an hour. That's why I 
absolutely agree with Merlin Moncure that more testing in this 
department is welcome, preferably by others, since after all I could be 
on the payroll of OCZ :-)


I looked a bit into Bonnie++ but fail to see how I could do a test that 
somehow matches the PostgreSQL setup during the pgbench tests (db that 
fits in memory, so the test is actually how fast the ssd can capture 
sequential WAL writes and fsync without barriers, mixed with an 
occasional checkpoint with random write IO on another partition). Since 
the WAL writing is the same for both block_size setups, I decided to 
compare random writes to a file of 5GB with Oracle's Orion tool:


=== 4K test summary 
ORION VERSION 11.1.0.7.0

Commandline:
-testname test -run oltp -size_small 4 -size_large 1024 -write 100

This maps to this test:
Test: test
Small IO size: 4 KB
Large IO size: 1024 KB
IO Types: Small Random IOs, Large Random IOs
Simulated Array Type: CONCAT
Write: 100%
Cache Size: Not Entered
Duration for each Data Point: 60 seconds
Small Columns:,  1,  2,  3,  4,  5,  6,  
7,  8,  9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20

Large Columns:,  0
Total Data Points: 21

Name: /mnt/data/5gb Size: 524288
1 FILEs found.

Maximum Small IOPS=86883 @ Small=8 and Large=0
Minimum Small Latency=0.01 @ Small=1 and Large=0

=== 8K test summary 

ORION VERSION 11.1.0.7.0

Commandline:
-testname test -run oltp -size_small 8 -size_large 1024 -write 100

This maps to this test:
Test: test
Small IO size: 8 KB
Large IO size: 1024 KB
IO Types: Small Random IOs, Large Random IOs
Simulated Array Type: CONCAT
Write: 100%
Cache Size: Not Entered
Duration for each Data Point: 60 seconds
Small Columns:,  1,  2,  3,  4,  5,  6,  
7,  8,  9, 10, 11, 12, 13, 14, 15, 
16, 17, 18, 19, 20

Large Columns:,  0
Total Data Points: 21

Name: /mnt/data/5gb Size: 524288
1 FILEs found.

Maximum Small IOPS=48798 @ Small=11 and Large=0
Minimum Small Latency=0.02 @ Small=1 and Large=0

Running the tests for longer helps a lot on reducing the noisy
results.  Also letting them runs longer means that the background
writer and autovacuum start getting involved, so the test becomes
somewhat more realistic.
  
Yes, that's why I did a lot of the TPC-B tests with -T 3600 so they'd 
run for an hour. (also the 4K vs 8K blocksize in postgres).


regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-02 Thread Scott Marlowe
On Mon, Aug 2, 2010 at 6:07 PM, Greg Smith  wrote:
> Josh Berkus wrote:
>>
>> That doesn't make much sense unless there's some special advantage to a
>> 4K blocksize with the hardware itself.
>
> Given that pgbench is always doing tiny updates to blocks, I wouldn't be
> surprised if switching to smaller blocks helps it in a lot of situations if
> one went looking for them.  Also, as you point out, pgbench runtime varies
> around wildly enough that 10% would need more investigation to really prove
> that means something.  But I think Yeb has done plenty of investigation into
> the most interesting part here, the durability claims.

Running the tests for longer helps a lot on reducing the noisy
results.  Also letting them runs longer means that the background
writer and autovacuum start getting involved, so the test becomes
somewhat more realistic.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-02 Thread Greg Smith

Josh Berkus wrote:

That doesn't make much sense unless there's some special advantage to a
4K blocksize with the hardware itself.


Given that pgbench is always doing tiny updates to blocks, I wouldn't be 
surprised if switching to smaller blocks helps it in a lot of situations 
if one went looking for them.  Also, as you point out, pgbench runtime 
varies around wildly enough that 10% would need more investigation to 
really prove that means something.  But I think Yeb has done plenty of 
investigation into the most interesting part here, the durability claims. 


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-02 Thread Josh Berkus

> Definately - that 10% number was on the old-first hardware (the core 2
> E6600). After reading my post and the 185MBps with 18500 reads/s number
> I was a bit suspicious whether I did the tests on the new hardware with
> 4K, because 185MBps / 18500 reads/s is ~10KB / read, so I thought thats
> a lot closer to 8KB than 4KB. I checked with show block_size and it was
> 4K. Then I redid the tests on the new server with the default 8KB
> blocksize and got about 4700 tps (TPC-B/300)... 67/47 = 1.47. So it
> seems that on newer hardware, the difference is larger than 10%.

That doesn't make much sense unless there's some special advantage to a
4K blocksize with the hardware itself.  Can you just do a basic
filesystem test (like Bonnie++) with a 4K vs. 8K blocksize?

Also, are you running your pgbench tests more than once, just to account
for randomizing?
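
For the Bonnie++ question above, something along these lines would do it,
assuming bonnie++'s size:chunk-size syntax and a 16GB box (path and user are
placeholders):

$ bonnie++ -d /mnt/data -s 32768:4k -n 0 -u postgres   # 32 GB file, 4 kB chunks
$ bonnie++ -d /mnt/data -s 32768:8k -n 0 -u postgres   # 32 GB file, 8 kB chunks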

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-02 Thread Yeb Havinga

Merlin Moncure wrote:

On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havinga  wrote:
  

Postgres settings:
8.4.4
--with-blocksize=4
I saw about 10% increase in performance compared to 8KB blocksizes.



That's very interesting -- we need more testing in that department...
  
Definitely - that 10% number was on the older hardware (the Core 2 
E6600). After reading my post and the 185MBps with 18500 reads/s number 
I was a bit suspicious whether I had done the tests on the new hardware with 
4K, because 185MBps / 18500 reads/s is ~10KB per read, so I thought that's 
a lot closer to 8KB than 4KB. I checked with show block_size and it was 
4K. Then I redid the tests on the new server with the default 8KB 
blocksize and got about 4700 tps (TPC-B/300)... 67/47 = 1.47. So it 
seems that on newer hardware, the difference is larger than 10%.
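
For anyone wanting to reproduce the 4K build: the block size is a compile-time
option, so it requires a rebuild. A minimal sketch (any other configure options
you normally use are omitted here):

$ ./configure --with-blocksize=4
$ make && make install
$ psql test -c "SHOW block_size;"   # reports 4096 on such a build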


regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-08-02 Thread Merlin Moncure
On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havinga  wrote:
> After a week testing I think I can answer the question above: does it work
> like it's supposed to under PostgreSQL?
>
> YES
>
> The drive I have tested is the $435,- 50GB OCZ Vertex 2 Pro,
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534
>
> * it is safe to mount filesystems with barrier off, since it has a 'supercap
> backed cache'. That data is not lost is confirmed by a dozen power switch
> off tests while running either diskchecker.pl or pgbench.
> * the above implies its also safe to use this SSD with barriers, though that
> will perform less, since this drive obeys write trough commands.
> * the highest pgbench tps number for the TPC-B test for a scale 300 database
> (~5GB) I could get was over 6700. Judging from the iostat average util of
> ~40% on the xlog partition, I believe that this number is limited by other
> factors than the SSD, like CPU, core count, core MHz, memory size/speed, 8.4
> pgbench without threads. Unfortunately I don't have a faster/more core
> machines available for testing right now.
> * pgbench numbers for a larger than RAM database, read only was over 25000
> tps (details are at the end of this post), during which iostat reported
> ~18500 read iops and 100% utilization.
> * pgbench max reported latencies are 20% of comparable BBWC setups.
> * how reliable it is over time, and how it performs over time I cannot say,
> since I tested it only for a week.

Thank you very much for posting this analysis.  This has IMNSHO the
potential to be a game changer.  There are still some unanswered
questions in terms of how the drive wears, reliability, errors, and
lifespan but 6700 tps off of a single 400$ device with decent fault
tolerance is amazing (Intel, consider yourself upstaged).  Ever since
the first samsung SSD hit the market I've felt the days of the
spinning disk have been numbered.  Being able to build a 100k tps
server on relatively inexpensive hardware without an entire rack full
of drives is starting to look within reach.

> Postgres settings:
> 8.4.4
> --with-blocksize=4
> I saw about 10% increase in performance compared to 8KB blocksizes.

That's very interesting -- we need more testing in that department...

regards (and thanks again)
merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-30 Thread Karl Denninger
6700 tps?!  Wow..

Ok, I'm impressed.  I may wait a bit for prices to come down somewhat, but that
sounds like two of those are going in one of my production machines
(RAID 1, of course)

Yeb Havinga wrote:
> Greg Smith wrote:
>> Greg Smith wrote:
>>> Note that not all of the Sandforce drives include a capacitor; I
>>> hope you got one that does!  I wasn't aware any of the SF drives
>>> with a capacitor on them were even shipping yet, all of the ones I'd
>>> seen were the chipset that doesn't include one still.  Haven't
>>> checked in a few weeks though.
>>
>> Answer my own question here:  the drive Yeb got was the brand
>> spanking new OCZ Vertex 2 Pro, selling for $649 at Newegg for
>> example: 
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and
>> with the supercacitor listed right in the main production
>> specifications there.  This is officially the first inexpensive
>> (relatively) SSD with a battery-backed write cache built into it.  If
>> Yeb's test results prove it works as it's supposed to under
>> PostgreSQL, I'll be happy to finally have a moderately priced SSD I
>> can recommend to people for database use.  And I fear I'll be out of
>> excuses to avoid buying one as a toy for my home system.
>>
> Hello list,
>
> After a week testing I think I can answer the question above: does it
> work like it's supposed to under PostgreSQL?
>
> YES
>
> The drive I have tested is the $435,- 50GB OCZ Vertex 2 Pro,
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534
>
> * it is safe to mount filesystems with barrier off, since it has a
> 'supercap backed cache'. That data is not lost is confirmed by a dozen
> power switch off tests while running either diskchecker.pl or pgbench.
> * the above implies its also safe to use this SSD with barriers,
> though that will perform less, since this drive obeys write trough
> commands.
> * the highest pgbench tps number for the TPC-B test for a scale 300
> database (~5GB) I could get was over 6700. Judging from the iostat
> average util of ~40% on the xlog partition, I believe that this number
> is limited by other factors than the SSD, like CPU, core count, core
> MHz, memory size/speed, 8.4 pgbench without threads. Unfortunately I
> don't have a faster/more core machines available for testing right now.
> * pgbench numbers for a larger than RAM database, read only was over
> 25000 tps (details are at the end of this post), during which iostat
> reported ~18500 read iops and 100% utilization.
> * pgbench max reported latencies are 20% of comparable BBWC setups.
> * how reliable it is over time, and how it performs over time I cannot
> say, since I tested it only for a week.
>
> regards,
> Yeb Havinga
>
> PS: ofcourse all claims I make here are without any warranty. All
> information in this mail is for reference purposes, I do not claim it
> is suitable for your database setup.
>
> Some info on configuration:
> BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-server  elevator=deadline
> quad core AMD Phenom(tm) II X4 940 Processor on 3.0GHz
> 16GB RAM 667MHz DDR2
>
> Disk/ filesystem settings.
> Model Family: OCZ Vertex SSD
> Device Model: OCZ VERTEX2-PRO
> Firmware Version: 1.10
>
> hdparm: did not change standard settings: write cache is on, as well
> as readahead.
> hdparm -AW /dev/sdc
> /dev/sdc:
> look-ahead=  1 (on)
> write-caching =  1 (on)
>
> Untuned ext4 filesystem.
> Mount options
> /dev/sdc2 on /data type ext4
> (rw,noatime,nodiratime,relatime,barrier=0,discard)
> /dev/sdc3 on /xlog type ext4
> (rw,noatime,nodiratime,relatime,barrier=0,discard)
> Note the -o discard: this means use of the automatic SSD trimming on a
> new linux kernel.
> Also, per core per filesystem there now is a [ext4-dio-unwrit] process
> - which suggest something like 'directio'? I haven't investigated this
> any further.
>
> Sysctl:
> (copied from a larger RAM database machine)
> kernel.core_uses_pid = 1
> fs.file-max = 327679
> net.ipv4.ip_local_port_range = 1024 65000
> kernel.msgmni = 2878
> kernel.msgmax = 8192
> kernel.msgmnb = 65536
> kernel.sem = 250 32000 100 142
> kernel.shmmni = 4096
> kernel.sysrq = 1
> kernel.shmmax = 33794121728
> kernel.shmall = 16777216
> net.core.rmem_default = 262144
> net.core.rmem_max = 2097152
> net.core.wmem_default = 262144
> net.core.wmem_max = 262144
> fs.aio-max-nr = 3145728
> vm.swappiness = 0
> vm.dirty_background_ratio = 3
> vm.dirty_expire_centisecs = 500
> vm.dirty_writeback_centisecs = 100
> vm.dirty_ratio = 15
>
> Postgres settings:
> 8.4.4
> --with-blocksize=4
> I saw about 10% increase in performance compared to 8KB blocksizes.
>
> Postgresql.conf:
> changed from default config are:
> maintenance_work_mem = 480MB # pgtune wizard 2010-07-25
> checkpoint_completion_target = 0.9 # pgtune wizard 2010-07-25
> effective_cache_size = 5632MB # pgtune wizard 2010-07-25
> work_mem = 512MB # pgtune wizard 2010-07-25
> wal_buffers = 8MB # pgtune wizard 2010-07-25
> checkpoint_segments = 128 # pgtune said 16 here
> share

Re: [PERFORM] Testing Sandforce SSD

2010-07-30 Thread Yeb Havinga

Greg Smith wrote:

Greg Smith wrote:
Note that not all of the Sandforce drives include a capacitor; I hope 
you got one that does!  I wasn't aware any of the SF drives with a 
capacitor on them were even shipping yet, all of the ones I'd seen 
were the chipset that doesn't include one still.  Haven't checked in 
a few weeks though.


Answer my own question here:  the drive Yeb got was the brand spanking 
new OCZ Vertex 2 Pro, selling for $649 at Newegg for example:  
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and 
with the supercacitor listed right in the main production 
specifications there.  This is officially the first inexpensive 
(relatively) SSD with a battery-backed write cache built into it.  If 
Yeb's test results prove it works as it's supposed to under 
PostgreSQL, I'll be happy to finally have a moderately priced SSD I 
can recommend to people for database use.  And I fear I'll be out of 
excuses to avoid buying one as a toy for my home system.



Hello list,

After a week testing I think I can answer the question above: does it 
work like it's supposed to under PostgreSQL?


YES

The drive I have tested is the $435,- 50GB OCZ Vertex 2 Pro, 
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534


* it is safe to mount filesystems with barrier off, since it has a 
'supercap backed cache'. That no data is lost was confirmed by a dozen 
power-switch-off tests while running either diskchecker.pl or pgbench.
* the above implies it's also safe to use this SSD with barriers, though 
that will perform worse, since this drive obeys write-through commands.
* the highest pgbench tps number for the TPC-B test for a scale 300 
database (~5GB) I could get was over 6700. Judging from the iostat 
average util of ~40% on the xlog partition, I believe that this number 
is limited by factors other than the SSD, like CPU, core count, core 
MHz, memory size/speed, and 8.4 pgbench without threads. Unfortunately I 
don't have a faster machine or one with more cores available for testing right now.
* pgbench numbers for a larger than RAM database, read only was over 
25000 tps (details are at the end of this post), during which iostat 
reported ~18500 read iops and 100% utilization.

* pgbench max reported latencies are 20% of comparable BBWC setups.
* how reliable it is over time, and how it performs over time I cannot 
say, since I tested it only for a week.


regards,
Yeb Havinga

PS: of course all claims I make here are without any warranty. All 
information in this mail is for reference purposes; I do not claim it is 
suitable for your database setup.


Some info on configuration:
BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-server  elevator=deadline
quad core AMD Phenom(tm) II X4 940 Processor at 3.0GHz
16GB RAM 667MHz DDR2

Disk/ filesystem settings.
Model Family: OCZ Vertex SSD
Device Model: OCZ VERTEX2-PRO
Firmware Version: 1.10

hdparm: did not change standard settings: write cache is on, as well as 
readahead.

hdparm -AW /dev/sdc
/dev/sdc:
look-ahead=  1 (on)
write-caching =  1 (on)

Untuned ext4 filesystem.
Mount options
/dev/sdc2 on /data type ext4 
(rw,noatime,nodiratime,relatime,barrier=0,discard)
/dev/sdc3 on /xlog type ext4 
(rw,noatime,nodiratime,relatime,barrier=0,discard)
Note the -o discard: this means use of the automatic SSD trimming on a 
new linux kernel.
Also, per core per filesystem there now is an [ext4-dio-unwrit] process - 
which suggests something like 'directio'? I haven't investigated this any 
further.


Sysctl:
(copied from a larger RAM database machine)
kernel.core_uses_pid = 1
fs.file-max = 327679
net.ipv4.ip_local_port_range = 1024 65000
kernel.msgmni = 2878
kernel.msgmax = 8192
kernel.msgmnb = 65536
kernel.sem = 250 32000 100 142
kernel.shmmni = 4096
kernel.sysrq = 1
kernel.shmmax = 33794121728
kernel.shmall = 16777216
net.core.rmem_default = 262144
net.core.rmem_max = 2097152
net.core.wmem_default = 262144
net.core.wmem_max = 262144
fs.aio-max-nr = 3145728
vm.swappiness = 0
vm.dirty_background_ratio = 3
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_ratio = 15

Postgres settings:
8.4.4
--with-blocksize=4
I saw about 10% increase in performance compared to 8KB blocksizes.

Postgresql.conf:
changed from default config are:
maintenance_work_mem = 480MB # pgtune wizard 2010-07-25
checkpoint_completion_target = 0.9 # pgtune wizard 2010-07-25
effective_cache_size = 5632MB # pgtune wizard 2010-07-25
work_mem = 512MB # pgtune wizard 2010-07-25
wal_buffers = 8MB # pgtune wizard 2010-07-25
checkpoint_segments = 128 # pgtune said 16 here
shared_buffers = 1920MB # pgtune wizard 2010-07-25
max_connections = 100

initdb with data on sda2 and xlog on sda3, C locale

Read write test on ~5GB database:
$ pgbench -v -c 20 -M prepared -T 3600 test
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 20
duration: 3600 s
number of transactions actually pro

Re: [PERFORM] Testing Sandforce SSD

2010-07-29 Thread Michael Stone

On Wed, Jul 28, 2010 at 03:45:23PM +0200, Yeb Havinga wrote:
Due to the LBA remapping of the SSD, I'm not sure of putting files 
that are sequentially written in a different partition (together with 
e.g. tables) would make a difference: in the end the SSD will have a 
set new blocks in it's buffer and somehow arrange them into sets of 
128KB of 256KB writes for the flash chips. See also 
http://www.anandtech.com/show/2899/2


It's not a question of the hardware side, it's the software. The xlog
needs to be synchronized, and the things the filesystem has to do to 
make that happen penalize the non-xlog disk activity. That's why my 
preferred config is xlog on ext2, rest on xfs. That allows the 
synchronous activity to happen with minimal overhead, while the parts 
that benefit from having more data in flight can do that freely.


Mike Stone

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Greg Spiegelberg
On Wed, Jul 28, 2010 at 9:18 AM, Yeb Havinga  wrote:

> Yeb Havinga wrote:
>
>> Due to the LBA remapping of the SSD, I'm not sure of putting files that
>> are sequentially written in a different partition (together with e.g.
>> tables) would make a difference: in the end the SSD will have a set new
>> blocks in it's buffer and somehow arrange them into sets of 128KB of 256KB
>> writes for the flash chips. See also http://www.anandtech.com/show/2899/2
>>
>> But I ran out of ideas to test, so I'm going to test it anyway.
>>
> Same machine config as mentioned before, with data and xlog on separate
> partitions, ext3 with barrier off (save on this SSD).
>
> pgbench -c 10 -M prepared -T 3600 -l test
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 300
> query mode: prepared
> number of clients: 10
> duration: 3600 s
> number of transactions actually processed: 10856359
> tps = 3015.560252 (including connections establishing)
> tps = 3015.575739 (excluding connections establishing)
>
> This is about 25% faster than data and xlog combined on the same
> filesystem.
>
>
The trick may be in kjournald, of which there is one for each ext3 journalled
file system.  I learned back in the Red Hat 4 pre-U4 kernels that there was a problem
with kjournald that would either cause 30 second hangs or lock up my server
completely when pg_xlog and data were on the same file system, plus a few
other "right" things going on.

Given the multicore world we have today, I think it makes sense that
multiple ext3 file systems, and the kjournalds that service them, are faster
than a single combined file system.


Greg


Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Yeb Havinga

Yeb Havinga wrote:

Michael Stone wrote:

On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
I know I'm talking development now but is there a case for a pg_xlog 
block

device to remove the file system overhead and guaranteeing your data is
written sequentially every time?


If you dedicate a partition to xlog, you already get that in practice 
with no extra devlopment.
Due to the LBA remapping of the SSD, I'm not sure of putting files 
that are sequentially written in a different partition (together with 
e.g. tables) would make a difference: in the end the SSD will have a 
set new blocks in it's buffer and somehow arrange them into sets of 
128KB of 256KB writes for the flash chips. See also 
http://www.anandtech.com/show/2899/2


But I ran out of ideas to test, so I'm going to test it anyway.
Same machine config as mentioned before, with data and xlog on separate 
partitions, ext3 with barrier off (safe on this SSD).


pgbench -c 10 -M prepared -T 3600 -l test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 3600 s
number of transactions actually processed: 10856359
tps = 3015.560252 (including connections establishing)
tps = 3015.575739 (excluding connections establishing)

This is about 25% faster than data and xlog combined on the same filesystem.

Below is output from iostat -xk 1 -p /dev/sda, which shows per-partition 
statistics for each second.

sda2 is data, sda3 is xlog. In the third second a checkpoint seems to start.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          63.50    0.00   30.50    2.50    0.00    3.50

Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  6518.00   36.00  2211.00   148.00  35524.00    31.75     0.28   0.12   0.11  25.00
sda1       0.00     2.00    0.00     5.00     0.00    636.00   254.40     0.03   6.00   2.00   1.00
sda2       0.00   218.00   36.00    40.00   148.00   1032.00    31.05     0.00   0.00   0.00   0.00
sda3       0.00  6298.00    0.00  2166.00     0.00  33856.00    31.26     0.25   0.12   0.12  25.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          60.50    0.00   37.50    0.50    0.00    1.50

Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  6514.00   33.00  2283.00   140.00  35188.00    30.51     0.32   0.14   0.13  29.00
sda1       0.00     0.00    0.00     3.00     0.00     12.00     8.00     0.00   0.00   0.00   0.00
sda2       0.00     0.00   33.00     2.00   140.00      8.00     8.46     0.03   0.86   0.29   1.00
sda3       0.00  6514.00    0.00  2278.00     0.00  35168.00    30.88     0.29   0.13   0.13  29.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          33.00    0.00   34.00   18.00    0.00   15.00

Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  3782.00    7.00  7235.00    28.00  44068.00    12.18    69.52   9.46   0.09  62.00
sda1       0.00     0.00    0.00     1.00     0.00      4.00     8.00     0.00   0.00   0.00   0.00
sda2       0.00   322.00    7.00  6018.00    28.00  25360.00     8.43    69.22  11.33   0.08  47.00
sda3       0.00  3460.00    0.00  1222.00     0.00  18728.00    30.65     0.30   0.25   0.25  30.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.00    0.00   36.00   22.50    0.00   32.50

Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  1079.00    3.00     0.00    12.00  49060.00     8.83   120.64  10.95   0.08  86.00
sda1       0.00     2.00    0.00     2.00     0.00    320.00   320.00     0.12  60.00  35.00   7.00
sda2       0.00    30.00    3.00 10739.00    12.00  43076.00     8.02   120.49  11.30   0.08  83.00
sda3       0.00  1047.00    0.00   363.00     0.00   5640.00    31.07     0.03   0.08   0.08   3.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          62.00    0.00   31.00    2.00    0.00    5.00

Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  6267.00   51.00  2493.00   208.00  35040.00    27.71     1.80   0.71   0.12  31.00
sda1       0.00     0.00    0.00     3.00     0.00     12.00     8.00     0.00   0.00   0.00   0.00
sda2       0.00   123.00   51.00   344.00   208.00   1868.00    10.51     1.50   3.80   0.10   4.00
sda3       0.00  6144.00    0.00  2146.00     0.00  33160.00    30.90     0.30   0.14   0.14  30.00



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Yeb Havinga

Michael Stone wrote:

On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
I know I'm talking development now but is there a case for a pg_xlog 
block

device to remove the file system overhead and guaranteeing your data is
written sequentially every time?


If you dedicate a partition to xlog, you already get that in practice 
with no extra devlopment.
Due to the LBA remapping of the SSD, I'm not sure if putting files that 
are sequentially written in a different partition (instead of together with e.g. 
tables) would make a difference: in the end the SSD will have a set of new 
blocks in its buffer and somehow arrange them into sets of 128KB or 
256KB writes for the flash chips. See also 
http://www.anandtech.com/show/2899/2


But I ran out of ideas to test, so I'm going to test it anyway.

regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Michael Stone

On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:

I know I'm talking development now but is there a case for a pg_xlog block
device to remove the file system overhead and guaranteeing your data is
written sequentially every time?


If you dedicate a partition to xlog, you already get that in practice 
with no extra development.


Mike Stone

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Michael Stone

On Mon, Jul 26, 2010 at 01:47:14PM -0600, Scott Marlowe wrote:

Note that SSDs aren't usually real fast at large sequential writes
though, so it might be worth putting pg_xlog on a spinning pair in a
mirror and seeing how much, if any, the SSD drive speeds up when not
having to do pg_xlog.


xlog is also where I use ext2; it does bench faster for me in that 
config, and the fsck issues don't really exist because you're not in a 
situation with a lot of files being created/removed.


Mike Stone

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-27 Thread Hannu Krosing
On Mon, 2010-07-26 at 14:34 -0400, Greg Smith wrote:
> Matthew Wakeling wrote:
> > Yeb also made the point - there are far too many points on that graph 
> > to really tell what the average latency is. It'd be instructive to 
> > have a few figures, like "only x% of requests took longer than y".
> 
> Average latency is the inverse of TPS.  So if the result is, say, 1200 
> TPS, that means the average latency is 1 / (1200 transactions/second) = 
> 0.83 milliseconds/transaction. 

This is probably only true if you run all transactions sequentially in
one connection? 

If you run 10 parallel threads and get 1200 tps, the average transaction
time (latency?) is probably closer to 8.3 ms?
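
(A throwaway sketch of that arithmetic, with the figures from this thread plugged in:)

# Average latency under concurrency is roughly clients / tps (Little's law),
# not simply 1 / tps.
def avg_latency_ms(tps, clients=1):
    return clients / float(tps) * 1000.0

print(avg_latency_ms(1200))               # one stream:  ~0.83 ms
print(avg_latency_ms(1200, clients=10))   # 10 clients:  ~8.3 ms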

>  The average TPS figure is normally on a 
> more useful scale as far as being able to compare them in ways that make 
> sense to people.
> 
> pgbench-tools derives average, worst-case, and 90th percentile figures 
> for latency from the logs.  I have 37MB worth of graphs from a system 
> showing how all this typically works for regular hard drives I've been 
> given permission to publish; just need to find a place to host it at 
> internally and I'll make the whole stack available to the world.  So far 
> Yeb's data is showing that a single SSD is competitive with a small 
> array on average, but with better worst-case behavior than I'm used to 
> seeing.
> 
> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 
> 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Greg Spiegelberg wrote:
I know I'm talking development now but is there a case for a pg_xlog 
block device to remove the file system overhead and guaranteeing your 
data is written sequentially every time?


It's possible to set the PostgreSQL wal_sync_method parameter in the 
database to open_datasync or open_sync, and if you have an operating 
system that supports direct writes it will use those and bypass things 
like the OS write cache.  That's close to what you're suggesting, 
supposedly portable, and it does show some significant benefit when it's 
properly supported.  Problem has been, the synchronous writing code on 
Linux in particular hasn't ever worked right against ext3, and the 
PostgreSQL code doesn't make the right call at all on Solaris.  So 
there's two popular platforms that it just plain doesn't work on, even 
though it should.


We've gotten reports that there are bleeding edge Linux kernel and 
library versions available now that finally fix that issue, and that 
PostgreSQL automatically takes advantage of them when it's compiled on 
one of them.  But I'm not aware of any distribution that makes this easy 
to try out that's available yet, paint is still wet on the code I think.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Andres Freund
On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
> On Mon, Jul 26, 2010 at 1:45 PM, Greg Smith  wrote:
> > Yeb Havinga wrote:
> >> I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory
> >> read/write test. (scale 300) No real winners or losers, though ext2 isn't
> >> really faster and the manual need for fix (y) during boot makes it
> >> impractical in its standard configuration.
> >>
> >
> > That's what happens every time I try it too.  The theoretical benefits of
> > ext2 for hosting PostgreSQL just don't translate into significant
> > performance increases on database oriented tests, certainly not ones that
> > would justify the downside of having fsck issues come back again.  Glad to
> > see that holds true on this hardware too.
> I know I'm talking development now but is there a case for a pg_xlog block
> device to remove the file system overhead and guaranteeing your data is
> written sequentially every time?
For one, I doubt that it's a relevant enough efficiency loss in
comparison with a significantly more complex implementation
(for one you can't grow/shrink, for another you have to do more
complex, hw-dependent things like rounding to hardware boundaries,
page size etc. to stay efficient); for another, my experience is that at
a relatively low point XLogInsert gets to be the bottleneck - so I
don't see much point in improving at that low level (yet at least).

Where I would like to do some hw-dependent measuring (because I see
significant improvements there) would be prefetching for seqscans,
indexscans et al. using blktrace... But I currently don't have the
time. And it's another topic ;-)

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Spiegelberg
On Mon, Jul 26, 2010 at 1:45 PM, Greg Smith  wrote:

> Yeb Havinga wrote:
>
>> I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory
>> read/write test. (scale 300) No real winners or losers, though ext2 isn't
>> really faster and the manual need for fix (y) during boot makes it
>> impractical in its standard configuration.
>>
>
> That's what happens every time I try it too.  The theoretical benefits of
> ext2 for hosting PostgreSQL just don't translate into significant
> performance increases on database oriented tests, certainly not ones that
> would justify the downside of having fsck issues come back again.  Glad to
> see that holds true on this hardware too.
>
>
I know I'm talking development now but is there a case for a pg_xlog block
device to remove the file system overhead and guarantee your data is
written sequentially every time?

Greg


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Scott Marlowe
On Mon, Jul 26, 2010 at 12:40 PM, Greg Smith  wrote:
> Greg Spiegelberg wrote:
>>
>> Speaking of the layers in-between, has this test been done with the ext3
>> journal on a different device?  Maybe the purpose is wrong for the SSD.  Use
>> the SSD for the ext3 journal and the spindled drives for filesystem?
>
> The main disk bottleneck on PostgreSQL databases are the random seeks for
> reading and writing to the main data blocks.  The journal information is
> practically noise in comparison--it barely matters because it's so much less
> difficult to keep up with.  This is why I don't really find ext2 interesting
> either.

Note that SSDs aren't usually real fast at large sequential writes
though, so it might be worth putting pg_xlog on a spinning pair in a
mirror and seeing how much, if any, the SSD drive speeds up when not
having to do pg_xlog.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Yeb Havinga wrote:
I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory 
read/write test. (scale 300) No real winners or losers, though ext2 
isn't really faster and the manual need for fix (y) during boot makes 
it impractical in its standard configuration.


That's what happens every time I try it too.  The theoretical benefits 
of ext2 for hosting PostgreSQL just don't translate into significant 
performance increases on database oriented tests, certainly not ones 
that would justify the downside of having fsck issues come back again.  
Glad to see that holds true on this hardware too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Yeb Havinga wrote:
To get similar *average* performance results you'd need to put about 
4 drives and a BBU into a server.  The 


Please forget this question, I now see it in the mail i'm replying to. 
Sorry for the spam!


-- Yeb


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Greg Smith wrote:

Yeb Havinga wrote:
Please remember that particular graphs are from a read/write pgbench 
run on a bigger than RAM database that ran for some time (so with 
checkpoints), on a *single* $435 50GB drive without BBU raid controller.


To get similar *average* performance results you'd need to put about 4 
drives and a BBU into a server.  The worst-case latency on that 
solution is pretty bad though, when a lot of random writes are queued 
up; I suspect that's where the SSD will look much better.


By the way:  if you want to run a lot more tests in an organized 
fashion, that's what http://github.com/gregs1104/pgbench-tools was 
written to do.  That will spit out graphs by client and by scale 
showing how sensitive the test results are to each.

Got it, running the default config right now.

When you say 'comparable to a small array' - could you give a ballpark 
figure for 'small'?


regards,
Yeb Havinga

PS: Some update on the testing: I did some ext3, ext4, xfs, jfs and also 
ext2 tests on the just-in-memory read/write test (scale 300). No real 
winners or losers, though ext2 isn't really faster and the manual 'fix (y)' 
fsck prompt during boot makes it impractical in its standard 
configuration. I did some poweroff tests with barriers explicitly off 
in ext3, ext4 and xfs; still all recoveries went OK.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Kevin Grittner
Greg Smith  wrote:
 
> Yeb's data is showing that a single SSD is competitive with a
> small array on average, but with better worst-case behavior than
> I'm used to seeing.
 
So, how long before someone benchmarks a small array of SSDs?  :-)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Greg Spiegelberg wrote:
Speaking of the layers in-between, has this test been done with the 
ext3 journal on a different device?  Maybe the purpose is wrong for 
the SSD.  Use the SSD for the ext3 journal and the spindled drives for 
filesystem?  


The main disk bottleneck on PostgreSQL databases is the random seeks 
for reading and writing to the main data blocks.  The journal 
information is practically noise in comparison--it barely matters 
because it's so much less difficult to keep up with.  This is why I 
don't really find ext2 interesting either.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Matthew Wakeling wrote:
Yeb also made the point - there are far too many points on that graph 
to really tell what the average latency is. It'd be instructive to 
have a few figures, like "only x% of requests took longer than y".


Average latency is the inverse of TPS.  So if the result is, say, 1200 
TPS, that means the average latency is 1 / (1200 transactions/second) = 
0.83 milliseconds/transaction.  The average TPS figure is normally on a 
more useful scale as far as being able to compare them in ways that make 
sense to people.


pgbench-tools derives average, worst-case, and 90th percentile figures 
for latency from the logs.  I have 37MB worth of graphs from a system 
showing how all this typically works for regular hard drives I've been 
given permission to publish; just need to find a place to host it 
internally and I'll make the whole stack available to the world.  So far 
Yeb's data is showing that a single SSD is competitive with a small 
array on average, but with better worst-case behavior than I'm used to 
seeing.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Spiegelberg
On Mon, Jul 26, 2010 at 10:26 AM, Yeb Havinga  wrote:

> Matthew Wakeling wrote:
>
>> Apologies, I was interpreting the graph as the latency of the device, not
>> all the layers in-between as well. There isn't any indication in the email
>> with the graph as to what the test conditions or software are.
>>
> That info was in the email preceding the graph mail, but I see now I forgot
> to mention it was a 8.4.4 postgres version.
>
>
Speaking of the layers in-between, has this test been done with the ext3
journal on a different device?  Maybe the purpose is wrong for the SSD.  Use
the SSD for the ext3 journal and the spindled drives for filesystem?
Another possibility is to use ext2 on the SSD.

Greg


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Matthew Wakeling wrote:
Apologies, I was interpreting the graph as the latency of the device, 
not all the layers in-between as well. There isn't any indication in 
the email with the graph as to what the test conditions or software are.
That info was in the email preceding the graph mail, but I see now I 
forgot to mention it was an 8.4.4 postgres version.


regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling

On Mon, 26 Jul 2010, Greg Smith wrote:

Matthew Wakeling wrote:
Does your latency graph really have milliseconds as the y axis? If so, this 
device is really slow - some requests have a latency of more than a second!


Have you tried that yourself?  If you generate one of those with standard 
hard drives and a BBWC under Linux, I expect you'll discover those latencies 
to be >5 seconds long.  I recently saw >100 *seconds* running a large pgbench 
test due to latency flushing things to disk, on a system with 72GB of RAM. 
Takes a long time to flush >3GB of random I/O out to disk when the kernel 
will happily cache that many writes until checkpoint time.


Apologies, I was interpreting the graph as the latency of the device, not 
all the layers in-between as well. There isn't any indication in the email 
with the graph as to what the test conditions or software are. Obviously 
if you factor in checkpoints and the OS writing out everything, then you 
would have to expect some large latency operations. However, if the device 
itself behaved as in the graph, I would be most unhappy and send it back.


Yeb also made the point - there are far too many points on that graph to 
really tell what the average latency is. It'd be instructive to have a few 
figures, like "only x% of requests took longer than y".
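
(A rough sketch of pulling such figures out of a pgbench transaction log, assuming 
the 8.4-style -l format seen elsewhere in this thread (client_id tx_no latency_us 
file_no epoch time_us) and a hypothetical file name:)

import sys

# per-transaction latencies in microseconds, sorted ascending
lat = sorted(int(line.split()[2]) for line in open(sys.argv[1]))

for pct in (50.0, 10.0, 1.0, 0.1):
    idx = min(int(len(lat) * (100.0 - pct) / 100.0), len(lat) - 1)
    print("only %.1f%% of transactions took longer than %.1f ms"
          % (pct, lat[idx] / 1000.0))

Point it at the file pgbench -l produced (e.g. pgbench_log.<pid>).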


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Yeb Havinga wrote:
Please remember that particular graphs are from a read/write pgbench 
run on a bigger than RAM database that ran for some time (so with 
checkpoints), on a *single* $435 50GB drive without BBU raid controller.


To get similar *average* performance results you'd need to put about 4 
drives and a BBU into a server.  The worst-case latency on that solution 
is pretty bad though, when a lot of random writes are queued up; I 
suspect that's where the SSD will look much better.


By the way:  if you want to run a lot more tests in an organized 
fashion, that's what http://github.com/gregs1104/pgbench-tools was 
written to do.  That will spit out graphs by client and by scale showing 
how sensitive the test results are to each.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Matthew Wakeling wrote:
Does your latency graph really have milliseconds as the y axis? If so, 
this device is really slow - some requests have a latency of more than 
a second!


Have you tried that yourself?  If you generate one of those with 
standard hard drives and a BBWC under Linux, I expect you'll discover 
those latencies to be >5 seconds long.  I recently saw >100 *seconds* 
running a large pgbench test due to latency flushing things to disk, on 
a system with 72GB of RAM.  Takes a long time to flush >3GB of random 
I/O out to disk when the kernel will happily cache that many writes 
until checkpoint time.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Matthew Wakeling wrote:

On Sun, 25 Jul 2010, Yeb Havinga wrote:
Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at 
http://tinypic.com/r/x5e846/3


Does your latency graph really have milliseconds as the y axis?

Yes
If so, this device is really slow - some requests have a latency of 
more than a second!
I try to just give the facts. Please remember that those particular graphs are 
from a read/write pgbench run on a bigger-than-RAM database that ran for 
some time (so with checkpoints), on a *single* $435 50GB drive without a 
BBU RAID controller. Also, this is a picture with a few million points: 
the ones above 200ms are perhaps a hundred and hence make up a very 
small fraction.


So far I'm pretty impressed with this drive. Let's be fair to OCZ and the 
SandForce guys and not shoot from the hip with things like "really slow" 
without that being backed by a graphed pgbench run together with its 
cost, so we can compare numbers with numbers.


regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling

On Sun, 25 Jul 2010, Yeb Havinga wrote:
Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at 
http://tinypic.com/r/x5e846/3


Does your latency graph really have milliseconds as the y axis? If so, 
this device is really slow - some requests have a latency of more than a 
second!


Matthew

--
The early bird gets the worm, but the second mouse gets the cheese.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Yeb Havinga wrote:

Greg Smith wrote:
Put it on ext3, toggle on noatime, and move on to testing.  The 
overhead of the metadata writes is the least of the problems when 
doing write-heavy stuff on Linux.
I ran a pgbench run and power failure test during pgbench with a 3 
year old computer



On the same config more tests.

scale 10 read only and read/write tests. note: only 240 s.

starting vacuum...end.
transaction type: SELECT only
scaling factor: 10
query mode: prepared
number of clients: 10
duration: 240 s
number of transactions actually processed: 8208115
tps = 34197.109896 (including connections establishing)
tps = 34200.658720 (excluding connections establishing)

y...@client45:~$ pgbench -c 10 -l -M prepared -T 240 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: prepared
number of clients: 10
duration: 240 s
number of transactions actually processed: 809271
tps = 3371.147020 (including connections establishing)
tps = 3371.518611 (excluding connections establishing)

--
scale 300 (just fits in RAM) read only and read/write tests

pgbench -c 10 -M prepared -T 300 -S test
starting vacuum...end.
transaction type: SELECT only
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 300 s
number of transactions actually processed: 9219279
tps = 30726.931095 (including connections establishing)
tps = 30729.692823 (excluding connections establishing)

The test above doesn't really test the drive but shows the CPU/RAM limit.

pgbench -c 10 -l -M prepared -T 3600 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 3600 s
number of transactions actually processed: 8838200
tps = 2454.994217 (including connections establishing)
tps = 2455.012480 (excluding connections establishing)

--
scale 2000

pgbench -c 10 -M prepared -T 300 -S test
starting vacuum...end.
transaction type: SELECT only
scaling factor: 2000
query mode: prepared
number of clients: 10
duration: 300 s
number of transactions actually processed: 755772
tps = 2518.547576 (including connections establishing)
tps = 2518.762476 (excluding connections establishing)

So the test above tests random seek performance. Iostat on the drive 
showed a steady rate of just over 4000 read IO/s:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.39    0.00   13.37   60.40    0.00   14.85

Device:  rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00     0.00 4171.00     0.00 60624.00      0.00    29.07    11.81   2.83   0.24 100.00
sdb        0.00     0.00    0.00     0.00     0.00      0.00     0.00     0.00   0.00   0.00   0.00


pgbench -c 10 -l -M prepared -T 24000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 2000
query mode: prepared
number of clients: 10
duration: 24000 s
number of transactions actually processed: 30815691
tps = 1283.979098 (including connections establishing)
tps = 1283.980446 (excluding connections establishing)

Note the duration of several hours. No long waits occurred; for this 
last test the latency PNG is at http://yfrog.com/f/0vlatencywp/ and the 
TPS graph at http://yfrog.com/f/b5tpsp/


regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-25 Thread Yeb Havinga

Yeb Havinga wrote:


8GB DDR2 something..

(lots of details removed)

Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at 
http://tinypic.com/r/x5e846/3


Thanks http://www.westnet.com/~gsmith/content/postgresql/pgbench.htm for 
the gnuplot and psql scripts!



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-25 Thread Yeb Havinga

Greg Smith wrote:
Put it on ext3, toggle on noatime, and move on to testing.  The 
overhead of the metadata writes is the least of the problems when 
doing write-heavy stuff on Linux.
I ran a pgbench run and a power-failure test during pgbench on a 
3-year-old computer:


8GB DDR ?
Intel Core 2 duo 6600 @ 2.40GHz
Intel Corporation 82801IB (ICH9) 2 port SATA IDE Controller
64 bit 2.6.31-22-server (Ubuntu karmic), kernel option elevator=deadline
sysctl options besides increasing shm:
fs.file-max=327679
fs.aio-max-nr=3145728
vm.swappiness=0
vm.dirty_background_ratio = 3
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_ratio = 15

Filesystem on SSD with postgresql data: ext3 mounted with 
noatime,nodiratime,relatime
Postgresql cluster: did initdb with C locale. Data and pg_xlog together 
on the same ext3 filesystem.


Changed in postgresql.conf: settings with pgtune for OLTP and 15 connections
maintenance_work_mem = 480MB # pgtune wizard 2010-07-25
checkpoint_completion_target = 0.9 # pgtune wizard 2010-07-25
effective_cache_size = 5632MB # pgtune wizard 2010-07-25
work_mem = 512MB # pgtune wizard 2010-07-25
wal_buffers = 8MB # pgtune wizard 2010-07-25
checkpoint_segments = 31 # pgtune said 16 here
shared_buffers = 1920MB # pgtune wizard 2010-07-25
max_connections = 15 # pgtune wizard 2010-07-25

Initialized with scale 800, which is about 12GB. I deliberately went beyond 
the in-RAM size for this machine (that would be ~5GB), so random reads 
would weigh in the result. Then let pgbench run the TPC-B benchmark with 
-M prepared, 10 clients and -T 3600 (one hour); after 
that I loaded the logfile into a db and did some queries. Then I realized the 
pgbench result page was not in the screen buffer anymore, so I cannot copy it 
here, but hey, those can be edited as well, right ;-)


select count(*),count(*)/3600,avg(time),stddev(time) from log;
  count  | ?column? |          avg           |     stddev
---------+----------+------------------------+-----------------
 4939212 |     1372 | 7282.8581978258880161  | 11253.96967962
(1 row)

Judging from the latencies in the logfiles I did not experience serious 
lagging (time is in microseconds):


select * from log order by time desc limit 3;
 client_id | tx_no |  time   | file_no |   epoch    | time_us
-----------+-------+---------+---------+------------+---------
         3 | 33100 | 1229503 |       0 | 1280060345 |  866650
         9 | 39990 | 1077519 |       0 | 1280060345 |  858702
         2 | 55323 | 1071060 |       0 | 1280060519 |  750861
(3 rows)

select * from log order by time desc limit 3 OFFSET 1000;
 client_id | tx_no  |  time  | file_no |   epoch    | time_us
-----------+--------+--------+---------+------------+---------
         5 | 262466 | 245953 |       0 | 1280062074 |  513789
         1 | 267519 | 245867 |       0 | 1280062074 |  513301
         7 | 273662 | 245532 |       0 | 1280062078 |  378932
(3 rows)

select * from log order by time desc limit 3 OFFSET 1;
 client_id | tx_no  | time  | file_no |   epoch    | time_us
-----------+--------+-------+---------+------------+---------
         5 | 123011 | 82854 |       0 | 1280061036 |  743986
         6 | 348967 | 82853 |       0 | 1280062687 |  776317
         8 | 439789 | 82848 |       0 | 1280063109 |  552928
(3 rows)

Then I started pgbench again with the same settings, let it run for a few 
minutes, and in another console did CHECKPOINT and then turned off power. 
After restarting, the database recovered without a problem.


LOG:  database system was interrupted; last known up at 2010-07-25 
10:14:15 EDT
LOG:  database system was not properly shut down; automatic recovery in 
progress

LOG:  redo starts at F/98008610
LOG:  record with zero length at F/A2BAC040
LOG:  redo done at F/A2BAC010
LOG:  last completed transaction was at log time 2010-07-25 
10:14:16.151037-04


regards,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Greg Smith

Yeb Havinga wrote:
Writes/s start low but quickly converge to a number in the range of 
1200 to 1800. The writes diskchecker does are 16kB writes. Making this 
4kB writes does not increase writes/s. 32kB seems a little less, 64kB 
is about two third of initial writes/s and 128kB is half.


Let's turn that into MB/s numbers:

4kB * 1200 = 4.7 MB/s
8kB * 1200 = 9.4 MB/s
16kB * 1200 = 18.75 MB/s
64kB * 1200 * 2/3 [800] = 50 MB/s
128kB * 1200 / 2 [600] = 75 MB/s

For comparison sake, a 7200 RPM drive running PostgreSQL will do <120 
commits/second without a BBWC, so at an 8K block size that's <1 MB/s.  
If you put a cache in the middle, I'm used to seeing about 5000 8K 
commits/second, which is around 40 MB/s.  So this is sitting right in 
the middle of those two.  Sequential writes with a commit after each one 
like this are basically the worst case for the SSD, so if it can provide 
reasonable performance on that I'd be happy.
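
(The same arithmetic as a throwaway sketch, for plugging in other commit rates 
and block sizes:)

# Throughput implied by a given fsync'd-commit rate and block size.
def mb_per_sec(block_kb, commits_per_sec):
    return block_kb * commits_per_sec / 1024.0

for kb, rate in [(4, 1200), (8, 1200), (16, 1200), (64, 800), (128, 600)]:
    print("%3d kB * %4d/s = %5.1f MB/s" % (kb, rate, mb_per_sec(kb, rate)))

print(mb_per_sec(8, 120))     # 7200 RPM drive, no BBWC: ~0.9 MB/s
print(mb_per_sec(8, 5000))    # with a write cache:      ~39 MB/s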


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Yeb Havinga

Yeb Havinga wrote:

Yeb Havinga wrote:
diskchecker: running 37 sec, 4.47% coverage of 500 MB (1468 writes; 
39/s)

Total errors: 0

:-)

OTOH, I now notice the 39 write /s .. If that means ~ 39 tps... bummer.
When playing with it a bit more, I couldn't get the test_file to be 
created in the right place on the test system. It turns out I had the 
diskchecker config switched and 39 write/s was the speed of the 
not-rebooted system, sorry.


I did several diskchecker.pl tests this time with the testfile on the 
SSD, none of the tests have returned an error :-)


Writes/s start low but quickly converge to a number in the range of 1200 
to 1800. The writes diskchecker does are 16kB writes. Making these 4kB 
writes does not increase writes/s. 32kB seems a little less, 64kB is 
about two thirds of the initial writes/s and 128kB is half.


So no BBU speeds here for writes, but still a ~10x improvement in iops 
over a rotating SATA disk.


regards,
Yeb Havinga

PS: hdparm showed write cache was on. I did tests with both ext2 and 
xfs; the xfs tests I did with both barrier and nobarrier.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Greg Smith

Joshua D. Drake wrote:

That is quite the toy. I can get 4 SATA-II with RAID Controller, with
battery backed cache, for the same price or less :P
  


True, but if you look at tests like 
http://www.anandtech.com/show/2899/12 it suggests there's probably at 
least a 6:1 performance speedup for workloads with a lot of random I/O 
to them.  And I'm really getting sick of the power/noise/heat that the 6 
drives in my home server produce.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Joshua D. Drake
On Sat, 2010-07-24 at 16:21 -0400, Greg Smith wrote:
> Greg Smith wrote:
> > Note that not all of the Sandforce drives include a capacitor; I hope 
> > you got one that does!  I wasn't aware any of the SF drives with a 
> > capacitor on them were even shipping yet, all of the ones I'd seen 
> > were the chipset that doesn't include one still.  Haven't checked in a 
> > few weeks though.
> 
> Answer my own question here:  the drive Yeb got was the brand spanking 
> new OCZ Vertex 2 Pro, selling for $649 at Newegg for example:  
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with 
> the supercacitor listed right in the main production specifications 
> there.  This is officially the first inexpensive (relatively) SSD with a 
> battery-backed write cache built into it.  If Yeb's test results prove 
> it works as it's supposed to under PostgreSQL, I'll be happy to finally 
> have a moderately priced SSD I can recommend to people for database 
> use.  And I fear I'll be out of excuses to avoid buying one as a toy for 
> my home system.

That is quite the toy. I can get 4 SATA-II with RAID Controller, with
battery backed cache, for the same price or less :P

Sincerely,

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Greg Smith

Greg Smith wrote:
Note that not all of the Sandforce drives include a capacitor; I hope 
you got one that does!  I wasn't aware any of the SF drives with a 
capacitor on them were even shipping yet, all of the ones I'd seen 
were the chipset that doesn't include one still.  Haven't checked in a 
few weeks though.


Answer my own question here:  the drive Yeb got was the brand spanking 
new OCZ Vertex 2 Pro, selling for $649 at Newegg for example:  
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with 
the supercapacitor listed right in the main production specifications 
there.  This is officially the first inexpensive (relatively) SSD with a 
battery-backed write cache built into it.  If Yeb's test results prove 
it works as it's supposed to under PostgreSQL, I'll be happy to finally 
have a moderately priced SSD I can recommend to people for database 
use.  And I fear I'll be out of excuses to avoid buying one as a toy for 
my home system.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Yeb Havinga

Yeb Havinga wrote:

diskchecker: running 37 sec, 4.47% coverage of 500 MB (1468 writes; 39/s)
Total errors: 0

:-)

OTOH, I now notice the 39 writes/s... If that means ~39 tps... bummer.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Yeb Havinga

Greg Smith wrote:
Note that not all of the Sandforce drives include a capacitor; I hope 
you got one that does!  I wasn't aware any of the SF drives with a 
capacitor on them were even shipping yet, all of the ones I'd seen 
were the chipset that doesn't include one still.  Haven't checked in a 
few weeks though.
I think I did, it was expensive enough, though while ordering it's very 
easy to order the wrong one; all names on the product category page look 
the same. (OCZ Vertex 2 Pro)

* How to test for power failure?


I've had good results using one of the early programs used to 
investigate this class of problems:  
http://brad.livejournal.com/2116715.html?page=2

A great tool, thanks for the link!

 diskchecker: running 34 sec, 4.10% coverage of 500 MB (1342 writes; 39/s)
 diskchecker: running 35 sec, 4.24% coverage of 500 MB (1390 writes; 39/s)
 diskchecker: running 36 sec, 4.35% coverage of 500 MB (1427 writes; 39/s)
 diskchecker: running 37 sec, 4.47% coverage of 500 MB (1468 writes; 39/s)
didn't get 'ok' from server (11387 316950), msg=[] = Connection reset by 
peer at ./diskchecker.pl line 132.


here's where I removed the power and left it off for about a minute. 
Then started it again and did the verify:


y...@a:~$ ./diskchecker.pl -s client45.eemnes verify test_file
verifying: 0.00%
Total errors: 0

:-)
this was on ext2

* What filesystem to use on the SSD? To minimize writes and maximize 
chance for seeing errors I'd choose ext2 here. 


I don't consider there to be any reason to deploy any part of a 
PostgreSQL database on ext2.  The potential for downtime if the fsck 
doesn't happen automatically far outweighs the minimal performance 
advantage you'll actually see in real applications.
Hmm.. wouldn't that apply to other filesystems as well? I know that JFS 
also won't mount if booted unclean; it somehow needs a marker from 
fsck. Don't know about ext3, xfs, etc.
All of the benchmarks showing large gains for ext2 over ext3 I have 
seen been synthetic, not real database performance; the internal ones 
I've run using things like pgbench do not show a significant 
improvement.  (Yes, I'm already working on finding time to publicly 
release those findings)
The reason I'd choose ext2 on the SSD was mainly to decrease the number 
of writes, not for performance. Maybe I should ultimately do tests for 
both journalled and ext2 filesystems and compare the amount of data per 
x pgbench transactions.
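
(One rough way to do that comparison, assuming Linux's /proc/diskstats and a 
512-byte sector size; the device name is a placeholder:)

# Sample sectors written to one device before and after a pgbench run,
# to compare the amount of data written per x transactions across filesystems.
def sectors_written(dev="sdc2"):
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == dev:
                return int(parts[9])          # 10th field: sectors written
    raise SystemExit("device %s not found" % dev)

before = sectors_written()
raw_input("run pgbench now, then press enter... ")    # input() on Python 3
after = sectors_written()
print("%.1f MB written" % ((after - before) * 512 / 1024.0 / 1024.0))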
Put it on ext3, toggle on noatime, and move on to testing.  The 
overhead of the metadata writes is the least of the problems when 
doing write-heavy stuff on Linux.

Will surely do and post the results.

thanks,
Yeb Havinga

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Merlin Moncure
On Sat, Jul 24, 2010 at 3:20 AM, Yeb Havinga  wrote:
> Hello list,
>
> Probably like many other's I've wondered why no SSD manufacturer puts a
> small BBU on a SSD drive. Triggered by Greg Smith's mail
> http://archives.postgresql.org/pgsql-performance/2010-02/msg00291.php here,
> and also anandtech's review at http://www.anandtech.com/show/2899/1 (see
> page 6 for pictures of the capacitor) I ordered a SandForce drive and this
> week it finally arrived.
>
> And now I have to test it and was wondering about some things like
>
> * How to test for power failure?

I test like this: write a small program that sends an endless series of
inserts like this:
*) on the server:
create table foo (id serial);
*) from the client:
insert into foo default values returning id;
on the client side print the inserted value to the terminal after the
query is reported as complete to the client.

Run the program, wait a bit, then pull the plug on the server.  The
database should recover cleanly and the last reported insert on the
client should be there when it restarts.  Try restarting immediately a
few times; then if that works, try it again and let it simmer overnight.  If
it makes it at least 24-48 hours that's a very promising sign.
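
(A minimal sketch of such a client, assuming psycopg2 and placeholder connection 
settings; the foo table is the one created above:)

import psycopg2

conn = psycopg2.connect("host=server-under-test dbname=test user=postgres")  # placeholder DSN
cur = conn.cursor()

while True:
    cur.execute("insert into foo default values returning id")
    last_id = cur.fetchone()[0]
    conn.commit()
    print(last_id)    # printed only after the server acknowledged the commit

After power-cycling the server, select max(id) from foo should be at least the 
last number that made it to the terminal.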

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Greg Smith

Yeb Havinga wrote:
Probably like many other's I've wondered why no SSD manufacturer puts 
a small BBU on a SSD drive. Triggered by Greg Smith's mail 
http://archives.postgresql.org/pgsql-performance/2010-02/msg00291.php 
here, and also anandtech's review at 
http://www.anandtech.com/show/2899/1 (see page 6 for pictures of the 
capacitor) I ordered a SandForce drive and this week it finally arrived.


Note that not all of the Sandforce drives include a capacitor; I hope 
you got one that does!  I wasn't aware any of the SF drives with a 
capacitor on them were even shipping yet; all of the ones I'd seen still 
used the chipset that doesn't include one.  Haven't checked in a few 
weeks though.



* How to test for power failure?


I've had good results using one of the early programs used to 
investigate this class of problems:  
http://brad.livejournal.com/2116715.html?page=2


You really need a second "witness" server to do this sort of thing 
reliably, which that provides.


* What filesystem to use on the SSD? To minimize writes and maximize 
chance for seeing errors I'd choose ext2 here. 


I don't consider there to be any reason to deploy any part of a 
PostgreSQL database on ext2.  The potential for downtime if the fsck 
doesn't happen automatically far outweighs the minimal performance 
advantage you'll actually see in real applications.  All of the 
benchmarks showing large gains for ext2 over ext3 I have seen been 
synthetic, not real database performance; the internal ones I've run 
using things like pgbench do not show a significant improvement.  (Yes, 
I'm already working on finding time to publicly release those findings)


Put it on ext3, toggle on noatime, and move on to testing.  The overhead 
of the metadata writes is the least of the problems when doing 
write-heavy stuff on Linux.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Ben Chobot
On Jul 24, 2010, at 12:20 AM, Yeb Havinga wrote:

> The problem in this scenario is that even when the SSD would show not data 
> loss and the rotating disk would for a few times, a dozen tests without 
> failure isn't actually proof that the drive can write it's complete buffer to 
> disk after power failure.

Yes, this is always going to be the case with testing like this - you'll never 
be able to prove that it will always be safe. 
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread david

On Sat, 24 Jul 2010, David Boreham wrote:


Do you guys have any more ideas to properly 'feel this disk at its teeth' ?


While an 'end-to-end' test using PG is fine, I think it would be easier to 
determine if the drive is behaving correctly by using a simple test program 
that emulates the storage semantics the WAL expects. Have it write a constant 
stream of records, fsync'ing after each write. Record the highest record 
number flushed so far in some place that won't be lost with the drive under 
test (e.g. send it over the network to another machine).


Kill the power, bring the system back up again and examine what's at the tail 
end of that file. I think this will give you the worst case test with the 
easiest result discrimination.


If you want to you could add concurrent random writes to another file for 
extra realism.


Someone here may already have a suitable test program. I know I've written 
several over the years in order to test I/O performance, prove the existence 
of kernel bugs, and so on.


I doubt it matters much how long the power is turned of. A second should be 
plenty time to flush pending writes if the drive is going to do so.


remember that SATA is designed to be hot-plugged, so you don't have to 
kill the entire system to kill power to the drive.


this is a little more abrupt than the system losing power, but in terms 
of losing data this is about the worst case (while at the same time, it 
eliminates the possibility that the OS continues to perform writes to the 
drive as power dies, which is a completely different class of problems, 
independent of the drive type)


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread David Boreham


Do you guys have any more ideas to properly 'feel this disk at its 
teeth' ?


While an 'end-to-end' test using PG is fine, I think it would be easier 
to determine if the drive is behaving correctly by using a simple test 
program that emulates the storage semantics the WAL expects. Have it 
write a constant stream of records, fsync'ing after each write. Record 
the highest record number flushed so far in some place that won't be 
lost with the drive under test (e.g. send it over the network to another 
machine).


Kill the power, bring the system back up again and examine what's at the 
tail end of that file. I think this will give you the worst case test 
with the easiest result discrimination.


If you want to you could add concurrent random writes to another file 
for extra realism.


Someone here may already have a suitable test program. I know I've 
written several over the years in order to test I/O performance, prove 
the existence of kernel bugs, and so on.


I doubt it matters much how long the power is turned off. A second should 
be plenty of time to flush pending writes if the drive is going to do so.
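
For what it's worth, a minimal sketch of the writer side of such a program 
(in Python; the file path, monitor host/port and record size below are 
assumptions, not anything prescribed above) could look like this:

import os
import socket
import struct

TEST_FILE = "/mnt/ssd_under_test/wal_emulation.dat"  # file on the drive under test
MONITOR = ("192.168.0.10", 9999)  # machine that survives the power pull

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

payload = b"x" * 8192  # roughly WAL-block-sized records
record = 0
try:
    while True:
        record += 1
        os.write(fd, struct.pack("<Q", record) + payload)
        os.fsync(fd)  # the record is only claimed durable once this returns
        # Report the record number only after its fsync has completed.
        sock.sendto(struct.pack("<Q", record), MONITOR)
finally:
    os.close(fd)

After power comes back, compare the highest intact record at the tail of the 
file with the highest record number the monitor machine received; if the 
monitor saw records the file no longer contains, the drive acknowledged 
writes it had not actually made durable.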




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Testing Sandforce SSD

2010-07-24 Thread Yeb Havinga

Hello list,

Probably like many others I've wondered why no SSD manufacturer puts a 
small BBU on an SSD drive. Triggered by Greg Smith's mail 
http://archives.postgresql.org/pgsql-performance/2010-02/msg00291.php 
here, and also anandtech's review at 
http://www.anandtech.com/show/2899/1 (see page 6 for pictures of the 
capacitor) I ordered a SandForce drive and this week it finally arrived.


And now I have to test it and was wondering about some things like

* How to test for power failure? I thought by running on the same 
machine a parallel pgbench setup on two clusters, where one runs with 
data and WAL on a rotating disk and the other on the SSD, both without a 
BBU controller. Then turn off power. Do that a few times. (A rough script 
for such a parallel run is sketched after this list.) The problem in this 
scenario is that even if the SSD showed no data loss while the rotating 
disk did a few times, a dozen tests without failure isn't actually proof 
that the drive can write its complete buffer to disk after power failure.


* How long should the power be turned off? A minute? 15 minutes?

* What filesystem to use on the SSD? To minimize writes and maximize the 
chance of seeing errors I'd choose ext2 here. For the sake of not 
comparing apples with pears I'd have to go with ext2 on the rotating 
data disk as well.
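
A sketch of scripting that parallel run (ports, client count, duration and 
database name are assumptions, and both clusters are assumed to be running 
and already initialized with pgbench -i):

import subprocess

# One cluster with data and WAL on the rotating disk, one on the SSD.
CLUSTERS = {"rotating": "5432", "ssd": "5433"}

procs = []
for name, port in CLUSTERS.items():
    log = open("pgbench_%s.log" % name, "w")
    # Long, write-heavy run; pull the power while both are going.
    procs.append(subprocess.Popen(
        ["pgbench", "-p", port, "-c", "8", "-T", "3600", "pgbench"],
        stdout=log, stderr=subprocess.STDOUT))

for p in procs:
    p.wait()  # normally never reached if the plug really gets pulled

After power is restored, start both clusters, let crash recovery finish, and 
then check each database for corruption and for committed transactions that 
went missing.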


Do you guys have any more ideas to properly 'feel this disk at its teeth' ?

regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Ben Chobot
On Mar 17, 2010, at 7:41 AM, Brad Nicholson wrote:

> As an aside, some folks in our Systems Engineering department here did
> do some testing of FusionIO, and they found that the helper daemons were
> inefficient and placed a fair amount of load on the server.  That might
> be something to watch out for, for those that are testing them.

As another anecdote, we have 4 of the 160GB cards in a 24-core Istanbul server. 
I don't know how efficient the helper daemons are, but they do take up about 
half of one core's cycles, regardless of how busy the box actually is. So that 
sounds "bad" until you take into account how much that one core costs, and 
compare it to how much it would cost to have the same amount of IOPs in a 
different form. 
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread david

On Wed, 17 Mar 2010, Brad Nicholson wrote:


On Wed, 2010-03-17 at 14:11 -0400, Justin Pitts wrote:

On Mar 17, 2010, at 10:41 AM, Brad Nicholson wrote:


On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote:

FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, which 
wear levels across 100GB of actual installed capacity.
http://community.fusionio.com/forums/p/34/258.aspx#258



20% of overall capacity free for levelling doesn't strike me as a lot.


I don't have any idea how to judge what amount would be right.


Some of the Enterprise grade stuff we are looking into (like TMS RamSan)
leaves 40% (with much larger overall capacity).

Also, running that drive at 80GB is the "Maximum Capacity" mode, which
decreases the write performance.


Very fair. In my favor, my proposed use case is probably at half capacity or 
less. I am getting the impression that partitioning/formatting the drive for 
the intended usage, and not the max capacity, is the way to go. Capacity isn't 
an issue with this workload. I cannot fit enough drives into these servers to 
get a tenth of the IOPS that even Tom's Hardware documents the ioDrive as 
capable of at reduced performance levels.



The actual media is only good for a very limited number of write cycles.  The 
way that the drives get around this to stay reliable is to constantly write to 
different areas.  The more you have free, the less you have to re-use, and the 
longer the lifespan.

This is done by the drive's wear-levelling algorithms, not by using
partitioning utilities, btw.


true, but if the drive is partitioned so that parts of it are never 
written to by the OS, the drive knows that those parts don't contain data 
and so can treat them as unallocated.


once the OS writes to a part of the drive, unless the OS issues a TRIM 
command the drive can't know that the data there is worthless and can be 
ignored; it has to try to preserve that data, which makes doing the wear 
leveling harder and slower.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Brad Nicholson
On Wed, 2010-03-17 at 14:11 -0400, Justin Pitts wrote:
> On Mar 17, 2010, at 10:41 AM, Brad Nicholson wrote:
> 
> > On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote:
> >> FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, 
> >> which wear levels across 100GB of actual installed capacity. 
> >> http://community.fusionio.com/forums/p/34/258.aspx#258
> >> 
> > 
> > 20% of overall capacity free for levelling doesn't strike me as a lot.
> 
> I don't have any idea how to judge what amount would be right.
> 
> > Some of the Enterprise grade stuff we are looking into (like TMS RamSan)
> > leaves 40% (with much larger overall capacity).
> > 
> > Also, running that drive at 80GB is the "Maximum Capacity" mode, which
> > decreases the write performance.
> 
> Very fair. In my favor, my proposed use case is probably at half capacity or 
> less. I am getting the impression that partitioning/formatting the drive for 
> the intended usage, and not the max capacity, is the way to go. Capacity 
> isn't an issue with this workload. I cannot fit enough drives into these 
> servers to get a tenth of the IOPS that even Tom's Hardware documents the 
> ioDrive as capable of at reduced performance levels.


The actual media is only good for a very limited number of write cycles.  The 
way that the drives get around this to stay reliable is to constantly write to 
different areas.  The more you have free, the less you have to re-use, and the 
longer the lifespan.

This is done by the drive's wear-levelling algorithms, not by using
partitioning utilities, btw.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Kenny Gorman
Greg,

Did you ever contact them and get your hands on one?

We eventually did see long SSD rebuild times on server crash as well.  But data 
came back uncorrupted, per my blog post.  This is a good case for Slony slaves.  
Anyone in a high-TX, low-downtime environment would have already engineered 
around needing to wait for rebuild/recovery times anyway.  So it's not a deal 
killer in my view.

-kg

On Mar 8, 2010, at 12:50 PM, Greg Smith wrote:

> Ben Chobot wrote:
>> We've enjoyed our FusionIO drives very much. They can do 100k iops without 
>> breaking a sweat. Just make sure you shut them down cleanly - it can take up 
>> to 30 minutes per card to recover from a crash/plug pull test.
> 
> Yeah...I got into an argument with Kenny Gorman over my concerns with how 
> they were handling durability issues on his blog, the reading I did about 
> them never left me satisfied Fusion was being completely straight with 
> everyone about this area:  http://www.kennygorman.com/wordpress/?p=398
> 
> If it takes 30 minutes to recover, but it does recover, I guess that's better 
> than I feared was the case with them.  Thanks for reporting the plug pull 
> tests--I don't trust any report from anyone about new storage hardware that 
> doesn't include that little detail as part of the testing.  You're just 
> asking to have your data get lost without that basic due diligence, and I'm 
> sure not going to even buy eval hardware from a vendor that appears evasive 
> about it.  There's a reason I don't personally own any SSD hardware yet.
> 
> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Justin Pitts
FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, which 
wear levels across 100GB of actual installed capacity. 

http://community.fusionio.com/forums/p/34/258.aspx#258

Max drive performance would be about 41TB/day, which coincidentally works out 
very close to the 3-year warranty they have on the devices.
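
A quick back-of-the-envelope check of that, using only the figures above:

# 24 years at the rated 5 TB/day vs. 3 years at the ~41 TB/day maximum
rated_total_tb    = 24 * 365 * 5   # about 43,800 TB over the rated lifetime
warranty_total_tb = 3 * 365 * 41   # about 44,895 TB over the warranty at full speed
print(rated_total_tb, warranty_total_tb)  # the totals differ by only a few percent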

FusionIO's claim _seems_ credible. I'd love to see some evidence to the 
contrary.


On Mar 17, 2010, at 9:18 AM, Brad Nicholson wrote:

> On Wed, 2010-03-17 at 09:11 -0400, Justin Pitts wrote:
>> On Mar 17, 2010, at 9:03 AM, Brad Nicholson wrote:
>> 
>>> I've been hearing bad things from some folks about the quality of the
>>> FusionIO drives from a durability standpoint.
>> 
>> Can you be more specific about that? Durability over what time frame? How 
>> many devices in the sample set? How did FusionIO deal with the issue?
> 
> I didn't get any specifics - as we are looking at other products.  It
> did center around how FusionIO did wear-leveling though. 
> -- 
> Brad Nicholson  416-673-4106
> Database Administrator, Afilias Canada Corp.
> 
> 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Justin Pitts
On Mar 17, 2010, at 9:03 AM, Brad Nicholson wrote:

> I've been hearing bad things from some folks about the quality of the
> FusionIO drives from a durability standpoint.

Can you be more specific about that? Durability over what time frame? How many 
devices in the sample set? How did FusionIO deal with the issue?
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Justin Pitts

On Mar 17, 2010, at 10:41 AM, Brad Nicholson wrote:

> On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote:
>> FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, 
>> which wear levels across 100GB of actual installed capacity. 
>> http://community.fusionio.com/forums/p/34/258.aspx#258
>> 
> 
> 20% of overall capacity free for levelling doesn't strike me as a lot.

I don't have any idea how to judge what amount would be right.

> Some of the Enterprise grade stuff we are looking into (like TMS RamSan)
> leaves 40% (with much larger overall capacity).
> 
> Also, running that drive at 80GB is the "Maximum Capacity" mode, which
> decreases the write performance.

Very fair. In my favor, my proposed use case is probably at half capacity or 
less. I am getting the impression that partitioning/formatting the drive for 
the intended usage, and not the max capacity, is the way to go. Capacity isn't 
an issue with this workload. I cannot fit enough drives into these servers to 
get a tenth of the IOPS that even Tom's Hardware documents the ioDrive as 
capable of at reduced performance levels.

>> Max drive performance would be about 41TB/day, which coincidentally works out 
>> very close to the 3-year warranty they have on the devices.
>> 
> 
> To counter that:
> 
> http://www.tomshardware.com/reviews/fusioinio-iodrive-flash,2140-2.html
> 
> "Fusion-io’s wear leveling algorithm is based on a cycle of 5 TB
> write/erase volume per day, resulting in 24 years run time for the 80 GB
> model, 48 years for the 160 GB version and 16 years for the MLC-based
> 320 GB type. However, since 5 TB could be written or erased rather
> quickly given the performance level, we recommend not relying on these
> approximations too much."
> 

I'm not sure if that is a counter or a supporting claim :) 

> 
>> FusionIO's claim _seems_ credible. I'd love to see some evidence to the 
>> contrary.
> 
> Vendor claims always seem credible.  The key is to separate the
> marketing hype from the actual details.

I'm hoping to get my hands on a sample in the next few weeks. 

> 
> Again, I'm just passing along what I heard - which was from a
> vendor-neutral, major storage consulting firm that decided to stop
> recommending these drives to clients.  Make of that what you will.
> 
> As an aside, some folks in our Systems Engineering department here did
> do some testing of FusionIO, and they found that the helper daemons were
> inefficient and placed a fair amount of load on the server.  That might
> be something to watch out for, for those that are testing them.
> 

That is a wonderful little nugget of knowledge that I shall put on my test plan.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Brad Nicholson
On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote:
> FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, 
> which wear levels across 100GB of actual installed capacity. 
> http://community.fusionio.com/forums/p/34/258.aspx#258
> 

20% of overall capacity free for levelling doesn't strike me as a lot.
Some of the Enterprise grade stuff we are looking into (like TMS RamSan)
leaves 40% (with much larger overall capacity).

Also, running that drive at 80GB is the "Maximum Capacity" mode, which
decreases the write performance.

> Max drive performance would be about 41TB/day, which coincidentally works out 
> very close to the 3-year warranty they have on the devices.
> 

To counter that:

http://www.tomshardware.com/reviews/fusioinio-iodrive-flash,2140-2.html

"Fusion-io’s wear leveling algorithm is based on a cycle of 5 TB
write/erase volume per day, resulting in 24 years run time for the 80 GB
model, 48 years for the 160 GB version and 16 years for the MLC-based
320 GB type. However, since 5 TB could be written or erased rather
quickly given the performance level, we recommend not relying on these
approximations too much."


> FusionIO's claim _seems_ credible. I'd love to see some evidence to the 
> contrary.

Vendor claims always seem credible.  The key is to separate the
marketing hype from the actual details.

Again, I'm just passing along what I heard - which was from a
vendor-neutral, major storage consulting firm that decided to stop
recommending these drives to clients.  Make of that what you will.

As an aside, some folks in our Systems Engineering department here did
do some testing of FusionIO, and they found that the helper daemons were
inefficient and placed a fair amount of load on the server.  That might
be something to watch out for, for those that are testing them.

> 
> On Mar 17, 2010, at 9:18 AM, Brad Nicholson wrote:
> 
> > On Wed, 2010-03-17 at 09:11 -0400, Justin Pitts wrote:
> >> On Mar 17, 2010, at 9:03 AM, Brad Nicholson wrote:
> >> 
> >>> I've been hearing bad things from some folks about the quality of the
> >>> FusionIO drives from a durability standpoint.
> >> 
> >> Can you be more specific about that? Durability over what time frame? How 
> >> many devices in the sample set? How did FusionIO deal with the issue?
> > 
> > I didn't get any specifics - as we are looking at other products.  It
> > did center around how FusionIO did wear-leveling though. 
> > -- 
> > Brad Nicholson  416-673-4106
> > Database Administrator, Afilias Canada Corp.
> > 
> > 
> 
-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Brad Nicholson
On Wed, 2010-03-17 at 09:11 -0400, Justin Pitts wrote:
> On Mar 17, 2010, at 9:03 AM, Brad Nicholson wrote:
> 
> > I've been hearing bad things from some folks about the quality of the
> > FusionIO drives from a durability standpoint.
> 
> Can you be more specific about that? Durability over what time frame? How 
> many devices in the sample set? How did FusionIO deal with the issue?

I didn't get any specifics - as we are looking at other products.  It
did center around how FusionIO did wear-leveling though. 
-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Brad Nicholson
On Wed, 2010-03-17 at 14:30 +0200, Devrim GÜNDÜZ wrote:
> On Mon, 2010-03-08 at 09:38 -0800, Ben Chobot wrote:
> > We've enjoyed our FusionIO drives very much. They can do 100k iops
> > without breaking a sweat.
> 
> Yeah, performance is excellent. I bet we could get more, but the CPU was the
> bottleneck in our test, since it was just a demo server :(

Did you test the drive in all three modes?  If so, what sort of
differences did you see?

I've been hearing bad things from some folks about the quality of the
FusionIO drives from a durability standpoint. I'm unsure if this is
vendor-specific bias or not, but considering the source (which is not
vendor specific), I don't think so.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Devrim GÜNDÜZ
On Mon, 2010-03-08 at 09:38 -0800, Ben Chobot wrote:
> We've enjoyed our FusionIO drives very much. They can do 100k iops
> without breaking a sweat.

Yeah, performance is excellent. I bet we could get more, but the CPU was the
bottleneck in our test, since it was just a demo server :(
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Ben Chobot
On Mar 8, 2010, at 12:50 PM, Greg Smith wrote:

> Ben Chobot wrote:
>> We've enjoyed our FusionIO drives very much. They can do 100k iops without 
>> breaking a sweat. Just make sure you shut them down cleanly - it can take up 
>> to 30 minutes per card to recover from a crash/plug pull test.
> 
> Yeah...I got into an argument with Kenny Gorman over my concerns with how 
> they were handling durability issues on his blog, the reading I did about 
> them never left me satisfied Fusion was being completely straight with 
> everyone about this area:  http://www.kennygorman.com/wordpress/?p=398
> 
> If it takes 30 minutes to recover, but it does recover, I guess that's better 
> than I feared was the case with them.  Thanks for reporting the plug pull 
> tests--I don't trust any report from anyone about new storage hardware that 
> doesn't include that little detail as part of the testing.  You're just 
> asking to have your data get lost without that basic due diligence, and I'm 
> sure not going to even buy eval hardware from a vendor that appears evasive 
> about it.  There's a reason I don't personally own any SSD hardware yet.

Of course, the plug pull test can never be conclusive, but we never lost any 
data the handful of times we did it. Normally we'd do it more, but with such a 
long reboot cycle it isn't practical to repeat very often.

But from everything we can tell, FusionIO does do reliability right.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Greg Smith

Ben Chobot wrote:
We've enjoyed our FusionIO drives very much. They can do 100k iops without breaking a sweat. Just make sure you shut them down cleanly - it can take up to 30 minutes per card to recover from a crash/plug pull test.
  


Yeah...I got into an argument with Kenny Gorman over my concerns with 
how they were handling durability issues on his blog, the reading I did 
about them never left me satisfied Fusion was being completely straight 
with everyone about this area:  http://www.kennygorman.com/wordpress/?p=398


If it takes 30 minutes to recover, but it does recover, I guess that's 
better than I feared was the case with them.  Thanks for reporting the 
plug pull tests--I don't trust any report from anyone about new storage 
hardware that doesn't include that little detail as part of the 
testing.  You're just asking to have your data get lost without that 
basic due diligence, and I'm sure not going to even buy eval hardware 
from a vendor that appears evasive about it.  There's a reason I don't 
personally own any SSD hardware yet.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Ben Chobot
We've enjoyed our FusionIO drives very much. They can do 100k iops without 
breaking a sweat. Just make sure you shut them down cleanly - it can take up to 
30 minutes per card to recover from a crash/plug pull test.

I also have serious questions about their longevity and failure mode when the 
flash finally burns out. Our hardware guys claim they have overbuilt the amount 
of flash on the card to be able to do their heavy writes for >5 years, but I 
remain skeptical. 

On Mar 8, 2010, at 6:41 AM, Devrim GÜNDÜZ wrote:

> Hi,
> 
> I have a FusionIO drive to test for a few days. I already ran iozone and
> bonnie++ against it. Does anyone have more suggestions for it?
> 
> It is a single drive (unfortunately).
> 
> Regards,
> -- 
> Devrim GÜNDÜZ
> PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
> PostgreSQL RPM Repository: http://yum.pgrpms.org
> Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
> http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Łukasz Jagiełło
2010/3/8 Devrim GÜNDÜZ :
> Hi,
>
> I have a FusionIO drive to test for a few days. I already ran iozone and
> bonnie++ against it. Does anyone have more suggestions for it?
>
> It is a single drive (unfortunately).

vdbench

-- 
Łukasz Jagiełło
System Administrator
G-Forces Web Management Polska sp. z o.o. (www.gforces.pl)

Ul. Kruczkowskiego 12, 80-288 Gdańsk
Spółka wpisana do KRS pod nr 246596 decyzją Sądu Rejonowego Gdańsk-Północ

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Yeb Havinga

Devrim GÜNDÜZ wrote:

Hi,

I have a FusionIO drive

Cool!!

 to test for a few days. I already ran iozone and
bonnie++ against it. Does anyone have more suggestions for it?
  
Oracle has a tool to test drives specifically for database kinds of loads, 
called orion - it's free software and comes with a good manual. Download it 
without registration etc. at 
http://www.oracle.com/technology/software/tech/orion/index.html


Quickstart

Create a file named 'fusion.lun' containing the device name, e.g.
/dev/sda1

Invoke the orion tool with something like

orion -run advanced -testname fusion -num_disks 50 -size_small 4 
-size_large 1024 -type rand -simulate concat -verbose -write 25 
-duration 15 -matrix detailed -cache_size 256


The cache size is in MB but is not so important for random I/O.
num_disks doesn't have to match the number of physical disks; it's used by the 
tool to determine how large the test matrix should be. E.g. 1 disk gives a 
small matrix with a small number of concurrent I/O requests, so I set it to 50.


Another idea: pgbench?

regards,
Yeb Havinga


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Testing FusionIO

2010-03-08 Thread Devrim GÜNDÜZ
Hi,

I have a FusionIO drive to test for a few days. I already ran iozone and
bonnie++ against it. Does anyone have more suggestions for it?

It is a single drive (unfortunately).

Regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




[PERFORM] Testing list access

2005-05-03 Thread Jona
Testing list access
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PERFORM] testing

2003-08-03 Thread Ron Johnson

-- 
+-+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED] |
| Jefferson, LA  USA  |
| |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"|
|unknown  |
+-+



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly