On Tue, 2006-01-31 at 21:53 -0800, Luke Lonergan wrote:
Jeffrey,
On 1/31/06 8:09 PM, Jeffrey W. Baker [EMAIL PROTECTED] wrote:
... Prove it.
I think I've proved my point. Software RAID1 read balancing provides
0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
respectively.
I did a little test on soft raid1 :
I have two 800 Mbytes files, say A and B. (RAM is 512Mbytes).
Test 1 :
1- Read A, then read B :
19 seconds per file
2- Read A and B simultaneously using two threads :
22 seconds total
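(A rough sketch of how one might repeat PFC's two-file test, assuming Python is
available; the file paths are placeholders, and as in PFC's setup the files
should be much larger than RAM, or the page cache dropped between runs, so the
second pass is not served from memory.)

# Sequential read of two large files, first one after the other, then
# concurrently with one thread per file. Paths are placeholders for two
# ~800 MB files sitting on the software RAID1.
import threading, time

FILES = ["/data/A", "/data/B"]    # hypothetical paths
CHUNK = 1 << 20                   # 1 MB per read()

def read_file(path):
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass

# Test 1: read A, then B
start = time.time()
for p in FILES:
    read_file(p)
print("serial:     %.1f s" % (time.time() - start))

# Test 2: read A and B simultaneously, one thread per file
start = time.time()
threads = [threading.Thread(target=read_file, args=(p,)) for p in FILES]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("concurrent: %.1f s" % (time.time() - start))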
On Tue, Jan 31, 2006 at 08:09:40PM -0800, Jeffrey W. Baker wrote:
I think I've proved my point. Software RAID1 read balancing provides
0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
respectively. In the presence of random I/O, the results are even
better.
Umm, the point *was*
On Wed, Feb 01, 2006 at 09:42:12AM -0800, Luke Lonergan wrote:
This is actually interesting overall - I think what this might be showing is
that the Linux SW RAID1 is alternating I/Os to the mirror disks from
different processes (LWP or HWP both maybe?), but not within one process.
Having read
Jeffrey,
On 2/1/06 12:25 AM, Jeffrey W. Baker [EMAIL PROTECTED] wrote:
Ah, but someday Pg will be able to concurrently read from two
datastreams to complete a single query. And that day will be glorious
and fine, and you'll want as much disk concurrency as you can get your
hands on.
Well -
Jim,
On 1/30/06 12:25 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:
Why divide by 2? A good raid controller should be able to send read
requests to both drives out of the mirrored set to fully utilize the
bandwidth. Of course, that probably won't come into play unless the OS
decides that it's
Luke Lonergan wrote:
Jim,
On 1/30/06 12:25 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:
Why divide by 2? A good raid controller should be able to send read
requests to both drives out of the mirrored set to fully utilize the
bandwidth. Of course, that probably won't come into play unless the
On Tue, Jan 31, 2006 at 09:00:30AM -0800, Luke Lonergan wrote:
Jim,
On 1/30/06 12:25 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:
Why divide by 2? A good raid controller should be able to send read
requests to both drives out of the mirrored set to fully utilize the
bandwidth. Of course,
Jeffrey,
On 1/31/06 12:03 PM, Jeffrey W. Baker [EMAIL PROTECTED] wrote:
Then you've not seen Linux.
:-D
Linux does balanced reads on software
mirrors. I'm not sure why you think this can't improve bandwidth. It
does improve streaming bandwidth as long as the platter STR (sustained transfer rate) is more than
the
Linux does balanced reads on software
mirrors. I'm not sure why you think this can't improve bandwidth. It
does improve streaming bandwidth as long as the platter STR is more than
the bus STR.
... Prove it.
(I have a software RAID1 on this desktop machine)
It's a lot faster
On Tue, Jan 31, 2006 at 02:52:57PM -0800, Luke Lonergan wrote:
It's because your alternating reads are skipping in chunks across the
platter. Disks work at their max internal rate when reading sequential
data, and the cache is often built to buffer a track-at-a-time, so
alternating pieces
PFC,
On 1/31/06 3:11 PM, PFC [EMAIL PROTECTED] wrote:
... Prove it.
(I have a software RAID1 on this desktop machine)
It's a lot faster than a single disk for random reads when more than 1
thread hits the disk, because it distributes reads to both disks. Thus,
applications start
Jim,
On 1/31/06 3:12 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:
The alternating technique in mirroring might reduce rotational latency for
random seeks - a trick that Tandem exploited - but it won't improve
bandwidth.
Or just work in multiples of tracks, which would greatly reduce the
On Tue, Jan 31, 2006 at 12:47:10PM -0800, Luke Lonergan wrote:
Linux does balanced reads on software
mirrors. I'm not sure why you think this can't improve bandwidth. It
does improve streaming bandwidth as long as the platter STR is more than
the bus STR.
... Prove it.
FWIW, this is on
On Tue, Jan 31, 2006 at 03:19:38PM -0800, Luke Lonergan wrote:
Well, the only problem with that is if the machine crashes for any
reason you risk having the database corrupted (or at best losing some
committed transactions).
So, do you routinely turn off Linux write caching? If not, then
Jeffrey,
On 1/31/06 8:09 PM, Jeffrey W. Baker [EMAIL PROTECTED] wrote:
... Prove it.
I think I've proved my point. Software RAID1 read balancing provides
0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
respectively. In the presence of random I/O, the results are even
better.
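(For context, a minimal sketch of the kind of 1/2/4/8-thread random-read test
that produces numbers like these; the target path, block size, and read count
are illustrative assumptions, not Jeffrey's actual benchmark.)

# Multi-threaded random reads against a large file on the RAID1. With
# balanced mirrors, reads from concurrent threads can be served by
# different disks, so reads/s should scale going from 1 to 2 threads.
import os, random, threading, time

PATH = "/data/bigfile"        # large test file on the RAID1 (assumed)
BLOCK = 8192                  # 8 kB reads, roughly one Postgres page
READS_PER_THREAD = 2000

SIZE = os.path.getsize(PATH)

def worker():
    fd = os.open(PATH, os.O_RDONLY)
    for _ in range(READS_PER_THREAD):
        os.pread(fd, BLOCK, random.randrange(0, SIZE - BLOCK))
    os.close(fd)

for nthreads in (1, 2, 4, 8):
    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    total = nthreads * READS_PER_THREAD
    print("%d threads: %.0f random reads/s" % (nthreads, total / elapsed))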
On 1/29/06, Luke Lonergan [EMAIL PROTECTED] wrote:
Oh - and about RAID 10 - for large data work it's more often a waste of
disk, performance-wise, compared to RAID 5 these days. RAID5 will almost
double the performance on a reasonable number of drives.
how many is reasonable?
depesz
Depesz,
On 1/30/06 9:53 AM, hubert depesz lubaczewski [EMAIL PROTECTED] wrote:
double the performance on a reasonable number of drives.
how many is reasonable?
What I mean by that is: given a set of N disks, the read performance of the RAID
will be equal to the per-drive read rate A times the
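(The preview cuts off mid-formula here. A hedged reconstruction of the
first-order arithmetic this subthread is arguing about is sketched below; N and
A are example values, and the formulas are the naive model rather than Luke's
own numbers.)

# Back-of-envelope sequential-read bandwidth for N drives, each streaming
# at A MB/s. This is the naive model behind the "divide by 2" argument:
# RAID10 reads from only one member of each mirrored pair unless the
# controller or SW RAID balances reads across both halves.
N = 8        # number of drives (example)
A = 60.0     # MB/s per drive (typical mid-2000s figure, assumed)

raid0_read  = A * N          # stripe across all drives
raid5_read  = A * (N - 1)    # one drive's worth of each stripe is parity
raid10_read = A * (N / 2)    # naive: read one side of each mirror only

print("RAID0  ~ %.0f MB/s" % raid0_read)
print("RAID5  ~ %.0f MB/s" % raid5_read)
print("RAID10 ~ %.0f MB/s" % raid10_read)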
On Fri, Jan 27, 2006 at 07:05:04PM -0800, Luke Lonergan wrote:
Sounds like you are running into the limits of your disk subsystem. You are
scanning all of the data in the transactions table, so you will be limited
by the disk bandwidth you have, and using RAID-10, you should divide the
On 1/28/06, Luke Lonergan [EMAIL PROTECTED] wrote:
You should check your disk performance; I would
expect you'll find it lacking, partly because you are running RAID10, but
mostly because I expect you are using a hardware RAID adapter.
hmm .. do i understand correctly that you're suggesting
On Sun, Jan 29, 2006 at 12:25:23PM +0100, hubert depesz lubaczewski wrote:
hmm .. do i understand correctly that you're suggesting that using
raid 10 and/or hardware raid adapter might hurt disc subsystem
performance? could you elaborate on the reasons, please?
I think it's been fairly well
Depesz,
[mailto:[EMAIL PROTECTED] On Behalf Of
hubert depesz lubaczewski
Sent: Sunday, January 29, 2006 3:25 AM
hmm .. do i understand correctly that you're suggesting that
using raid 10 and/or hardware raid adapter might hurt disc
subsystem performance? could you elaborate on the
On Sun, 2006-01-29 at 13:44 -0500, Luke Lonergan wrote:
Depesz,
[mailto:[EMAIL PROTECTED] On Behalf Of
hubert depesz lubaczewski
Sent: Sunday, January 29, 2006 3:25 AM
hmm .. do i understand correctly that you're suggesting that
using raid 10 and/or hardware raid adapter might
On Fri, Jan 27, 2006 at 08:23:55PM -0500, Mike Biamonte wrote:
This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3 (2.4
kernel), with RAID-10 (15K drives) and 12 GB RAM. I was expecting it
to take about 4 hours - based on some experience with a similar
dataset on a different machine
Mike Biamonte wrote:
Does anyone have any experience with extremely large data sets?
I mean hundreds of millions of rows.
The queries I need to run on my 200 million transactions are relatively
simple:
select month, count(distinct(cardnum)), count(*), sum(amount) from
transactions group by
On Sun, 29 Jan 2006, Luke Lonergan wrote:
In fact, in our testing of various host-based SCSI RAID adapters (LSI,
Dell PERC, Adaptec, HP SmartArray), we find that *all* of them
underperform, most of them severely.
[snip]
The important lesson we've learned is to always test the I/O subsystem
Charles,
On 1/29/06 9:35 PM, Charles Sprickman [EMAIL PROTECTED] wrote:
What are you folks using to measure your arrays?
Bonnie++ measures random I/Os; the numbers we find are typically in the 500/s
range, and the best I've seen is 1500/s on a large Fibre Channel RAID0 (at
Mike Biamonte [EMAIL PROTECTED] writes:
The queries I need to run on my 200 million transactions are relatively
simple:
select month, count(distinct(cardnum)), count(*), sum(amount) from
transactions group by month;
count(distinct) is not relatively simple, and the current
implementation
On Sat, 2006-01-28 at 10:55 -0500, Tom Lane wrote:
Assuming that "month" means what it sounds like, the above would
result
in running twelve parallel sort/uniq operations, one for each month
grouping, to eliminate duplicates before counting. You've got sortmem
set high enough to blow out RAM
Jeffrey W. Baker [EMAIL PROTECTED] writes:
On Sat, 2006-01-28 at 10:55 -0500, Tom Lane wrote:
Assuming that "month" means what it sounds like, the above would result
in running twelve parallel sort/uniq operations, one for each month
grouping, to eliminate duplicates before counting. You've got
Re: [PERFORM] Huge Data sets, simple queries
Sounds like you are running into the limits of your disk subsystem. You are scanning all of the data in the transactions table, so you will be limited by the disk bandwidth you have, and using RAID-10, you should divide the number of disk
I wrote:
(We might need to tweak the planner to discourage selecting
HashAggregate in the presence of DISTINCT aggregates --- I don't
remember whether it accounts for the sortmem usage in deciding
whether the hash will fit in memory or not ...)
Ah, I take that all back after checking the
Does anyone have any experience with extremely large data sets?
I mean hundreds of millions of rows.
The queries I need to run on my 200 million transactions are relatively
simple:
select month, count(distinct(cardnum)), count(*), sum(amount) from
transactions group by month;
This query
On Fri, 2006-01-27 at 20:23 -0500, Mike Biamonte wrote:
Does anyone have any experience with extremely large data sets?
I mean hundreds of millions of rows.
Sure, I think more than a few of us do. Just today I built a summary
table from a 25GB primary table with ~430 million rows. This
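(The preview stops here. For completeness, one plausible way to build that kind
of monthly summary table, driven from Python with psycopg2; the table and
column names follow the query quoted earlier in the thread, while the
connection string and summary-table name are made up.)

# Build a pre-aggregated monthly summary once, then let reports hit the
# small table instead of re-scanning hundreds of millions of rows.
# Names other than "transactions", "month", "cardnum" and "amount" are
# hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=mydb")   # connection string assumed
cur = conn.cursor()
cur.execute("""
    CREATE TABLE transactions_monthly AS
    SELECT month,
           count(DISTINCT cardnum) AS distinct_cards,
           count(*)                AS txn_count,
           sum(amount)             AS total_amount
    FROM   transactions
    GROUP  BY month
""")
conn.commit()
cur.close()
conn.close()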