Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Alan Brown
Mehma Sarja wrote:

 There is one more thing to think about and that is cumulative aging. 
 Starting with all new disks is a false sense of security because as they 
 age, and if they are in any sort of RAID/performance configuration, they 
 will age and wear evenly.

Expanding on that:

It is generally a bad idea to build an array entirely from drives of the 
same manufacturing batch, because their eventual failures are likely to 
be very close together.

Similarly, it is a good idea to use a mix of drives from different 
manufacturers, in order to spread any failures out over a longer period 
of time.

I know of very few hardware suppliers who mix batches in their RAID 
arrays and of none who mix manufacturers. So far this hasn't been a 
problem for our hardware arrays, but I prefer to mix things up when 
building software arrays.

AB





Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Alan Brown
Phil Stracchino wrote:

 Well, a good start is to use something like SMART monitoring set up to
 alert you when any drive enters what it considers a pre-fail state.
 (Which can be simple age, increasing numbers of hard errors, increasing
 variation in spindle speed, increasing slow starts, etc, etc...)

FWIW: Nexsan, Xyratex and Infortrend all have SMART tracking disabled on 
their hardware arrays, because they claim it usually only says a drive is 
on its way out a few hours after it has already died.

(Personally: I use it and find that it does predict imminent drive 
failures, but usually with less than 24 hours to go. That's still better 
than no warning at all.)









Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread John Drescher
 Well, a good start is to use something like SMART monitoring set up to
 alert you when any drive enters what it considers a pre-fail state.
 (Which can be simple age, increasing numbers of hard errors, increasing
 variation in spindle speed, increasing slow starts, etc, etc...)

 FWIW: Nexan, Xyratec and Infortrend all have SMART tracking disabled on
 their hardware arrays because they claim it usually only says a drive is
 on its way out a few hours after it died.


I would say this is true for the SMART PASS/FAIL status, but if you look
at the raw SMART data you can use it to predict failure before the drive
dies completely. At least I have been able to predict this for the 10 to
20 drives that have died here at work since 2009. I usually know a week
or so before a drive is going to die, and I pull the drive from the RAID
for further testing. By further testing I mean a 4-pass badblocks
read/write test, comparing the SMART raw data before and after.
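
For anyone who wants to script that kind of check, here is a rough sketch
of the idea in Python (assuming smartmontools and badblocks are available,
that /dev/sdX is a drive already pulled from the array, and that the short
attribute list below is illustrative rather than exhaustive):

#!/usr/bin/env python3
# Sketch: snapshot a few SMART raw values, run a destructive 4-pass
# badblocks write/read test, then snapshot again and show what changed.
# WARNING: badblocks -w erases the disk; only run it on a pulled drive.
import subprocess, sys

WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "Reported_Uncorrect")

def smart_raw(dev):
    # smartctl uses its exit status as a bit mask, so don't treat a
    # non-zero return as fatal; just parse whatever it printed.
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    vals = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[1] in WATCH:
            vals[parts[1]] = parts[-1]      # RAW_VALUE is the last column
    return vals

dev = sys.argv[1]                           # e.g. /dev/sdX
before = smart_raw(dev)
subprocess.check_call(["badblocks", "-wsv", dev])  # 4 write/read passes
after = smart_raw(dev)
for attr in WATCH:
    print(attr, before.get(attr, "?"), "->", after.get(attr, "?"))

Growth in any of those raw counters across the test is usually a good
reason not to put the drive back into service.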

John



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Alan Brown
John Drescher wrote:

 I would say this is true for smart PASS / FAIL but if you look at the
 raw SMART data you can use this to predict failure before it totally
 fails. 

I agree but they don't do that.

 At least I have been able to predict this for the 10 to 20
 drives that have died here at work since 2009. 

I haven't had as many die as you have (do your users kick their 
computers around the room?), but my experience matches yours when looking 
at changes in the raw data. The problem is I haven't had enough die to 
put 100% certainty on it, so I tend to rely on smartd's output.






Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread John Drescher
 I haven't had as many die as you have (Do your users kick their computers
 around the room?) but my experience matches yours when looking at changes in
 the raw data. The problem is I haven't had enough die to put 100% certainty
 on it so I tend to rely on smartd's output.


I have between 100 and 200 drives at work.

John



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Alan Brown
John Drescher wrote:
 I haven't had as many die as you have (Do your users kick their computers
 around the room?) but my experience matches yours when looking at changes in
 the raw data. The problem is I haven't had enough die to put 100% certainty
 on it so I tend to rely on smartd's output.

 
 I have between 100 and 200 drives at work.

That's less than half the number I have in the server room alone 
(approximately 450 there), plus about 150 managed PCs in various offices. 
Even the user machines have overall disk failure rates well below 1% over 
3 years.

We have a firm policy of replacing server drives at 5 years and handling 
them with kid gloves at all times, so that may be one of the reasons 
we don't see as many failures.

AB





Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Mehma Sarja
On 3/23/11 7:28 AM, Alan Brown wrote:
 Phil Stracchino wrote:

 Well, a good start is to use something like SMART monitoring set up to
 alert you when any drive enters what it considers a pre-fail state.
 (Which can be simple age, increasing numbers of hard errors, increasing
 variation in spindle speed, increasing slow starts, etc, etc...)
 FWIW: Nexan, Xyratec and Infortrend all have SMART tracking disabled on
 their hardware arrays because they claim it usually only says a drive is
 on its way out a few hours after it died.

 (Personally: I use it and find that it does predict imminent drive
 failures, but usually with less than 24 hours to go. That's still better
 than no warning at all.)


Since drives ONLY fail on Friday afternoons local time, an effective 
remedy is to check for SMART messages before the weekend. Foolish as 
that is, I am surprised how many times it has held true for me.

Mehma





Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Alan Brown
Mehma Sarja wrote:

 Since drives ONLY fail on Friday afternoons local time, an effective 
 remedy is to check for SMART messages before the weekend. Foolish as 
 that is, I am surprised how many times it has held true for me.

For similar reasons we only perform work on critical infrastructure on 
Tuesdays (or Thursdays if a follow-up is needed).

Mondays are for picking up any pieces from the weekend, and Fridays are 
best left alone.

Equipment tends to fail the evening or day after it was last worked on...






Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Chris Hoogendyk


On 3/23/11 12:51 PM, Alan Brown wrote:
 Mehma Sarja wrote:
 Since drives ONLY fail on Friday afternoons local time, an effective
 remedy is to check for SMART messages before the weekend. Foolish as
 that is, I am surprised how many times it has held true for me.
 For similar reasons we only perform work on critical infrastructure on
 tuesday (or thursday if a followup is needed).

 Mondays are for picking up any pieces from the weekend and Fridays are
 best left alone.

 Equipment tends to fail the evening or day after it was last worked on...

Worst case scenario: a number of years ago, I had a critical central 
proxy server whose drive failed on Christmas Day. Okay, campus was closed 
and everyone was on vacation, but lots of people doing research from home 
could not access resources. I had no replacement drive on hand, so I had 
to recover onto a completely different machine, set up the proxy 
services, change DNS entries to point to it, set up virtual interfaces 
for the proxy connections, etc. I spent the better part of Christmas Day 
and evening alone in the server room sweating out the details, after I 
had tracked down the cause of the difficulties.

After that, I convinced management to pay for mirrored drives.

-- 
---

Chris Hoogendyk

-
O__   Systems Administrator
   c/ /'_ --- Biology & Geology Departments
  (*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst

hoogen...@bio.umass.edu

---

Erdös 4





Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-23 Thread Alan Brown
Chris Hoogendyk wrote:

 After that, I convinced management to pay for mirrored drives.
 

How much was the overtime bill? ;)






Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-18 Thread Alan Brown
Phil Stracchino wrote:

 With RAID6, you can survive any one or two disk failures, in degraded
 mode.  You'll have a larger working set than RAID10, but performance
 will be slower because of the overhead of parity calculations.  A third
 failure will bring the array down and you will lose the data.

There's always RAID60, but that requires a lot of drives.






Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-18 Thread Steve Costaras
Not really: RAID6+0 only requires 8 drives minimum, since you can create 
two RAID6 groups of 4 drives each and stripe them together. This has a 
benefit in that multi-layer parity RAID increases random write IOPS 
performance. But the main issue is array integrity, mainly with 
large-capacity drives: they have unrecoverable bit error rates in the 
1-in-10^14 or 1-in-10^15 range, and once you consider their capacity, 
this dominates the availability calculation for an array well beyond any 
'hard' failure calculation. Generally, with 1TB+ drives at 10^14 error 
rates I would be hard pressed to use more than 6 drives in a RAID group 
(4D+2P); with 10^15 you may get by with 8D+2P, but you have to choose 
your own risk level.


Personally, I don't like building arrays where the probability of 
failing to read a sector somewhere in a single sub-array is greater than 
5% (ideally it should be less than 1%, but you're not going to get that 
unless you're talking 10^16 rates and small drives of 500GB or less).
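
To put rough numbers on that, the chance of hitting at least one
unrecoverable error while reading a given amount of data can be estimated
with a quick back-of-the-envelope calculation (a sketch only, assuming the
quoted rates mean one unreadable bit per 10^14 or 10^15 bits read and that
errors are independent):

# Probability of at least one unrecoverable read error when reading
# tb_read terabytes from drives rated at 1 error per `ube` bits.
def p_ure(tb_read, ube):
    bits = tb_read * 8e12            # 1 TB = 8 * 10^12 bits (decimal TB)
    return 1 - (1 - 1 / ube) ** bits

# Example: rebuilding a 4D+2P group of 1 TB drives means re-reading
# roughly 5 TB from the surviving members.
for ube in (1e14, 1e15):
    print(f"1 in {ube:.0e}: {p_ure(5, ube):.1%}")
# prints roughly 33% at 10^14 and 4% at 10^15, so the drive's error-rate
# class changes the rebuild risk by close to an order of magnitude.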


This still doesn't address the silent errors that happen, which are the 
main thrust behind file systems like ZFS and/or T10 DIF (fat sectors), 
which do checking to make sure the sector you're requesting is the sector 
you're getting.




On 2011-03-18 08:22, Alan Brown wrote:

Phil Stracchino wrote:


With RAID6, you can survive any one or two disk failures, in degraded
mode.  You'll have a larger working set than RAID10, but performance
will be slower because of the overhead of parity calculations.  A third
failure will bring the array down and you will lose the data.

There's always RAID60, but that requires a lot of drives.








Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-18 Thread Marcello Romani
On 18/03/2011 19:01, Mehma Sarja wrote:
 On 3/17/11 4:57 PM, Phil Stracchino wrote:
 On 03/17/11 18:46, Marcello Romani wrote:
 On 16/03/2011 18:38, Phil Stracchino wrote:
 On 03/16/11 13:08, Mike Hobbs wrote:
  Hello,  I'm currently testing bacula v5.0.3 and so far so good.  One
 of my issues though, I have a 16 bay Promise Technologies VessJBOD.  How
 do I get bacula to use all the disks for writing volumes to?

 I guess the way I envision it working would be, 50gb volumes would be
 used and when disk1 fills up, bacula switches over to disk2 and starts
 writing out volumes until that disk is filled, then on to disk3, etc..
 eventually coming back around and recycling the volumes on disk 1.

 I'm not sure the above scenario is the best way to go about this, I've
 read that some people create a pool for each drive.  What is the most
 common practice when setting up a JBOD unit with bacula?  Any
 suggestions or advice would be appreciated.
 That scheme sounds like a bad and overly complex idea, honestly.
 Depending on your data load, I'd use software RAID to make them into a
 single RAID5 or RAID10 volume.  RAID10 would be faster and, if set up
 correctly[1], more redundant; RAID5 is more space-efficient, but slower.


 [1] There's a right and a wrong way to set up RAID10.  The wrong way is
 to set up two five-disk stripes, then mirror them; lose one disk from
 each stripe, and you're dead in the water.  The right way is to set up
 five mirrored pairs, then stripe the pairs; this will survive multiple
 disk failures as long as you don't lose both disks of any single pair.


 Hi Phil,
that last sentence sounds a little scary to me: this will survive
 multiple disk failures *as long as you don't lose both disks of any
 single pair*.
 Isn't RAID6 a safer bet ?
 That depends.

 With RAID6, you can survive any one or two disk failures, in degraded
 mode.  You'll have a larger working set than RAID10, but performance
 will be slower because of the overhead of parity calculations.  A third
 failure will bring the array down and you will lose the data.

 With RAID10 with sixteen drives, you can survive any one drive failure
 with minimal performance degradation.  There is a 1 in 15 chance that a
 second failure will be the other drive of that pair, and bring the array
 down.  If not, then there is a 1 in 7 chance that a third drive failure
 will be on the same pair as one of the two drives already failed.  If
 not, the array will still continue to operate, with some read
 performance degradation, and there is now a just less than 1 in 4 chance
 (3/13) that if a fourth drive fails, it will be on the same pair as one
 of the three already failed.  ... And so on.  There is a cumulative 39%
 chance that four random failures will fail the entire array, which rises
 to 59% with five failures, and 78% with six.  (91% at seven, 98% at
 eight, and no matter how many leprechauns live in your back yard, at
 nine failures you're screwed of course.  It's like the joke about the
 two men in the airliner.)

 But if the array was RAID6, it already went down for the count when the
 third drive failed.



 Now, granted, multiple failures like that are rare.  But ... I had a
 cascade failure of three drives out of a twelve-drive RAIDZ2 array
 between 4am and 8am one morning.  Each drive that failed pushed the load
 on the remaining drives higher, and after a couple of hours of that, the
 next weakest drive failed, which pushed the load still higher.  And when
 the third drive failed, the entire array went down.  It can happen.

 But ...  I'm running RAIDZ3 right now, and as soon as I can replace the
 rest of the drives with new drives, I'll be going back to RAIDZ2.
 Because RAIDZ3 is a bit too much of a performance hit on my server, and
 - with drives that aren't dying of old age - RAIDZ2 is redundant
 *enough* for me.  There is no data on the array that is crucial *AND*
 irreplaceable *AND* not also stored somewhere else.

 What it comes down to is, you have to decide for yourself what your
 priorities are - redundancy, performance, space efficiency - and how
 much of each you're willing to give up to get as much as you want of the
 others.


 There is one more thing to think about and that is cumulative aging.
 Starting with all new disks is a false sense of security because as they
 age, and if they are in any sort of RAID/performance configuration, they
 will age and wear evenly. Which means they will all start to fail
 together. It is OK to design a system and assume one or two simultaneous
 drive failure - when the drives are relatively young. After 3 years of
 sustained use, like email storage, you are at higher risk no matter
 which RAID scheme you have used.

 Mehma

This is an interesting point. But what parameter should one take into 
account to decide when it's time to replace an aged (but still good) 
disk with a fresh one?

Marcello


Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-18 Thread Phil Stracchino
On 03/18/11 19:41, Marcello Romani wrote:
 On 18/03/2011 19:01, Mehma Sarja wrote:
 There is one more thing to think about and that is cumulative aging.
 Starting with all new disks is a false sense of security because as they
 age, and if they are in any sort of RAID/performance configuration, they
 will age and wear evenly. Which means they will all start to fail
 together. It is OK to design a system and assume one or two simultaneous
 drive failure - when the drives are relatively young. After 3 years of
 sustained use, like email storage, you are at higher risk no matter
 which RAID scheme you have used.

 Mehma
 
 This is an interesting point. But what parameter should one take into 
 account to decide when it's time to replace an aged (but still good) 
 disk with a fresh one ?
 
 Marcello

Well, a good start is to use something like SMART monitoring set up to
alert you when any drive enters what it considers a pre-fail state.
(Which can be simple age, increasing numbers of hard errors, increasing
variation in spindle speed, increasing slow starts, etc, etc...)


-- 
  Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
 It's not the years, it's the mileage.



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-18 Thread Phil Stracchino
On 03/18/11 21:00, Mehma Sarja wrote:
 I can only think of staggering drive age and maintenance. Here's hoping 
 that someone on the list can come up with more creative solutions/practices.

Try to avoid buying a large number of drives from the same batch.  This
is often easily accomplished by spreading purchases across several
vendors.  Four drives here, four there...


-- 
  Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
 It's not the years, it's the mileage.



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-17 Thread Marcello Romani
On 16/03/2011 18:38, Phil Stracchino wrote:
 On 03/16/11 13:08, Mike Hobbs wrote:
Hello,  I'm currently testing bacula v5.0.3 and so far so good.  One
 of my issues though, I have a 16 bay Promise Technologies VessJBOD.  How
 do I get bacula to use all the disks for writing volumes to?

 I guess the way I envision it working would be, 50gb volumes would be
 used and when disk1 fills up, bacula switches over to disk2 and starts
 writing out volumes until that disk is filled, then on to disk3, etc..
 eventually coming back around and recycling the volumes on disk 1.

 I'm not sure the above scenario is the best way to go about this, I've
 read that some people create a pool for each drive.  What is the most
 common practice when setting up a JBOD unit with bacula?  Any
 suggestions or advice would be appreciated.

 That scheme sounds like a bad and overly complex idea, honestly.
 Depending on your data load, I'd use software RAID to make them into a
 single RAID5 or RAID10 volume.  RAID10 would be faster and, if set up
 correctly[1], more redundant; RAID5 is more space-efficient, but slower.


 [1] There's a right and a wrong way to set up RAID10.  The wrong way is
 to set up two five-disk stripes, then mirror them; lose one disk from
 each stripe, and you're dead in the water.  The right way is to set up
 five mirrored pairs, then stripe the pairs; this will survive multiple
 disk failures as long as you don't lose both disks of any single pair.



Hi Phil,
  that last sentence sounds a little scary to me: "this will survive 
multiple disk failures *as long as you don't lose both disks of any 
single pair*".
Isn't RAID6 a safer bet?

Thanks.

Marcello



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-17 Thread Marcello Romani
On 18/03/2011 00:57, Phil Stracchino wrote:
 On 03/17/11 18:46, Marcello Romani wrote:
 On 16/03/2011 18:38, Phil Stracchino wrote:
 On 03/16/11 13:08, Mike Hobbs wrote:
 Hello,  I'm currently testing bacula v5.0.3 and so far so good.  One
 of my issues though, I have a 16 bay Promise Technologies VessJBOD.  How
 do I get bacula to use all the disks for writing volumes to?

 I guess the way I envision it working would be, 50gb volumes would be
 used and when disk1 fills up, bacula switches over to disk2 and starts
 writing out volumes until that disk is filled, then on to disk3, etc..
 eventually coming back around and recycling the volumes on disk 1.

 I'm not sure the above scenario is the best way to go about this, I've
 read that some people create a pool for each drive.  What is the most
 common practice when setting up a JBOD unit with bacula?  Any
 suggestions or advice would be appreciated.

 That scheme sounds like a bad and overly complex idea, honestly.
 Depending on your data load, I'd use software RAID to make them into a
 single RAID5 or RAID10 volume.  RAID10 would be faster and, if set up
 correctly[1], more redundant; RAID5 is more space-efficient, but slower.


 [1] There's a right and a wrong way to set up RAID10.  The wrong way is
 to set up two five-disk stripes, then mirror them; lose one disk from
 each stripe, and you're dead in the water.  The right way is to set up
 five mirrored pairs, then stripe the pairs; this will survive multiple
 disk failures as long as you don't lose both disks of any single pair.



 Hi Phil,
   that last sentence sounds a little scary to me: this will survive
 multiple disk failures *as long as you don't lose both disks of any
 single pair*.
 Isn't RAID6 a safer bet ?

 That depends.

 With RAID6, you can survive any one or two disk failures, in degraded
 mode.  You'll have a larger working set than RAID10, but performance
 will be slower because of the overhead of parity calculations.  A third
 failure will bring the array down and you will lose the data.

 With RAID10 with sixteen drives, you can survive any one drive failure
 with minimal performance degradation.  There is a 1 in 15 chance that a
 second failure will be the other drive of that pair, and bring the array
 down.  If not, then there is a 1 in 7 chance that a third drive failure
 will be on the same pair as one of the two drives already failed.  If
 not, the array will still continue to operate, with some read
 performance degradation, and there is now a just less than 1 in 4 chance
 (3/13) that if a fourth drive fails, it will be on the same pair as one
 of the three already failed.  ... And so on.  There is a cumulative 39%
 chance that four random failures will fail the entire array, which rises
 to 59% with five failures, and 78% with six.  (91% at seven, 98% at
 eight, and no matter how many leprechauns live in your back yard, at
 nine failures you're screwed of course.  It's like the joke about the
 two men in the airliner.)

 But if the array was RAID6, it already went down for the count when the
 third drive failed.



 Now, granted, multiple failures like that are rare.  But ... I had a
 cascade failure of three drives out of a twelve-drive RAIDZ2 array
 between 4am and 8am one morning.  Each drive that failed pushed the load
 on the remaining drives higher, and after a couple of hours of that, the
 next weakest drive failed, which pushed the load still higher.  And when
 the third drive failed, the entire array went down.  It can happen.

 But ...  I'm running RAIDZ3 right now, and as soon as I can replace the
 rest of the drives with new drives, I'll be going back to RAIDZ2.
 Because RAIDZ3 is a bit too much of a performance hit on my server, and
 - with drives that aren't dying of old age - RAIDZ2 is redundant
 *enough* for me.  There is no data on the array that is crucial *AND*
 irreplaceable *AND* not also stored somewhere else.

 What it comes down to is, you have to decide for yourself what your
 priorities are - redundancy, performance, space efficiency - and how
 much of each you're willing to give up to get as much as you want of the
 others.
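
As a cross-check, the cumulative figures quoted above follow from a short
combinatorial calculation; a minimal sketch, assuming sixteen drives
arranged as eight mirrored pairs and failures landing on drives uniformly
at random:

from math import comb

PAIRS, DRIVES = 8, 16

def p_survive(k):
    # The array survives k failures only if the failed drives all come
    # from distinct mirrored pairs.
    if k > PAIRS:
        return 0.0
    return comb(PAIRS, k) * 2**k / comb(DRIVES, k)

for k in range(2, 10):
    print(f"{k} failures: {1 - p_survive(k):.1%} chance the array is down")
# 6.7% (1 in 15), 20.0%, 38.5%, 59.0%, 77.6%, 91.0%, 98.0%, 100.0%,
# which lines up with the ~39% / 59% / 78% / 91% / 98% figures above.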



Phil,
 that was an interesting read. Thanks for your detailed response.
(Your last paragraph is of course the definitive word on the subject.)
Now that I think about it, I realize I didn't fully take into account 
the high number of drives we're talking about. If using RAID6, a hot 
spare should probably be considered. Or, better yet, a mirror machine...
But then we're back to "it depends", I guess :-)

Oh, and BTW, maybe it's time for me to move past these old, limited RAID 
levels and investigate ZFS and those intriguing RAIDZx arrays...

Marcello


Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-16 Thread Mike Hobbs
  On 03/16/2011 01:12 PM, Robison, Dave wrote:
 Just curious, why not put that jbod into a RAID array? I believe you'd
 get far better performance with the additional spools and you'd get
 redundancy as well.

 Personally I'd set that up as a RAIDZ using ZFS on FreeBSD.



I believe the reason why we decided not to use RAID was in case the RAID 
array got corrupted: we would then lose all of our backups. Whereas 
if one disk dies, we only lose what was on that disk. There may have 
been another reason, but I think that was the main reason.

mike



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-16 Thread Bruno Friedmann
On 03/16/2011 06:29 PM, Mike Hobbs wrote:
   On 03/16/2011 01:12 PM, Robison, Dave wrote:
 Just curious, why not put that jbod into a RAID array? I believe you'd
 get far better performance with the additional spools and you'd get
 redundancy as well.

 Personally I'd set that up as a RAIDZ using ZFS on FreeBSD.


 
 I believe the reason why we decided not to use raid was in case the raid 
 array got corrupted.  We would then lose all of our backups.. Where as 
 if one disk dies, we only lose what was on that disk.  There may have 
 been another reason but  I think that was the main reason.
 
 mike
 

Nice, but if the controller does silent corruption, you're down too :-)

-- 

Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch

openSUSE Member & Ambassador
GPG KEY : D5C9B751C4653227
irc: tigerfoot



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-16 Thread Phil Stracchino
On 03/16/11 13:08, Mike Hobbs wrote:
   Hello,  I'm currently testing bacula v5.0.3 and so far so good.  One 
 of my issues though, I have a 16 bay Promise Technologies VessJBOD.  How 
 do I get bacula to use all the disks for writing volumes to?
 
 I guess the way I envision it working would be, 50gb volumes would be 
 used and when disk1 fills up, bacula switches over to disk2 and starts 
 writing out volumes until that disk is filled, then on to disk3, etc.. 
 eventually coming back around and recycling the volumes on disk 1.
 
 I'm not sure the above scenario is the best way to go about this, I've 
 read that some people create a pool for each drive.  What is the most 
 common practice when setting up a JBOD unit with bacula?  Any 
 suggestions or advice would be appreciated.

That scheme sounds like a bad and overly complex idea, honestly.
Depending on your data load, I'd use software RAID to make them into a
single RAID5 or RAID10 volume.  RAID10 would be faster and, if set up
correctly[1], more redundant; RAID5 is more space-efficient, but slower.


[1] There's a right and a wrong way to set up RAID10.  The wrong way is
to set up two five-disk stripes, then mirror them; lose one disk from
each stripe, and you're dead in the water.  The right way is to set up
five mirrored pairs, then stripe the pairs; this will survive multiple
disk failures as long as you don't lose both disks of any single pair.
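
For a concrete feel of the difference between the two layouts, here is a
small illustrative calculation for ten disks and two random failures (a
sketch only, nothing Bacula-specific):

from itertools import combinations

DISKS = list(range(10))
# "Wrong" layout: two five-disk stripes, mirrored against each other.
stripes = [set(DISKS[:5]), set(DISKS[5:])]
# "Right" layout: five mirrored pairs, striped together.
pairs = [set(DISKS[i:i + 2]) for i in range(0, 10, 2)]

def array_dead(failed, groups, mirror_of_stripes):
    if mirror_of_stripes:
        # Dies once *both* stripes have lost at least one disk each.
        return all(g & failed for g in groups)
    # Dies once *any* mirrored pair has lost both of its disks.
    return any(g <= failed for g in groups)

def p_dead(groups, mirror_of_stripes, k=2):
    cases = list(combinations(DISKS, k))
    hits = sum(array_dead(set(c), groups, mirror_of_stripes) for c in cases)
    return hits / len(cases)

print("mirror of stripes, two failures:", p_dead(stripes, True))   # ~0.56
print("stripe of mirrors, two failures:", p_dead(pairs, False))    # ~0.11

With just two dead disks, the wrong layout is already down more than half
the time, while the right one survives almost nine times out of ten.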


-- 
  Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
 It's not the years, it's the mileage.



Re: [Bacula-users] Bacula and 16 bay JBOD

2011-03-16 Thread John Drescher
On Wed, Mar 16, 2011 at 1:29 PM, Mike Hobbs mho...@mtl.mit.edu wrote:
  On 03/16/2011 01:12 PM, Robison, Dave wrote:
 Just curious, why not put that jbod into a RAID array? I believe you'd
 get far better performance with the additional spools and you'd get
 redundancy as well.

 Personally I'd set that up as a RAIDZ using ZFS on FreeBSD.



 I believe the reason why we decided not to use raid was in case the raid
 array got corrupted.  We would then lose all of our backups..

I believe that is a very big danger with RAID. I never recommend a
single RAID array for all your backups. Two RAID arrays on separate
RAID controllers (preferably in separate machines), each containing at
least one backup of everything, are fine, but a single RAID holding your
only backup copy is dangerous.

 Where as
 if one disk dies, we only lose what was on that disk.  There may have
 been another reason but  I think that was the main reason.


I do not have time to explain the details at the moment, but I
recommend you take a look at the Bacula vchanger for what you are
trying to do:

http://sourceforge.net/projects/vchanger/

John
