Re: [zfs-discuss] Petabytes on a budget - blog

2009-12-03 Thread Trevor Pretty

Just thought I would let everybody know I saw one at a local ISP
yesterday. They hadn't started testing; the metal had only arrived the
day before and they were waiting for the drives to arrive. They had
also changed the design to give it more network connectivity. I will try
to find out more as the customer progresses.


Interesting blog:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/


-- 
Trevor Pretty | Technical Account Manager
T: +64 9 639 0652 | M: +64 21 666 161
Eagle Technology Group Ltd.
Gate D, Alexandra Park, Greenlane West, Epsom
Private Bag 93211, Parnell, Auckland
www.eagle.co.nz


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-04 Thread Marc Bevand
Bill Moore Bill.Moore at sun.com writes:
 
 Moving on, modern high-capacity SATA drives are in the 100-120MB/s
 range.  Let's call it 125MB/s for easier math.  A 5-port port multiplier
 (PM) has 5 links to the drives, and 1 uplink.  SATA-II speed is 3Gb/s,
 which after all the framing overhead, can get you 300MB/s on a good day.
 So 3 drives can more than saturate a PM.  45 disks (9 backplanes at 5
 disks + PM each) in the box won't get you more than about 21 drives
 worth of performance, tops.  So you leave at least half the available
 drive bandwidth on the table, in the best of circumstances.  That also
 assumes that the SiI controllers can push 100% of the bandwidth coming
 into them, which would be 300MB/s * 2 ports = 600MB/s, which is getting
 close to a 4x PCIe-gen2 slot.

Wrong. The theoretical bandwidth of an x4 PCI-E v2.0 slot is 2GB/s per
direction (5Gbit/s before 8b-10b encoding per lane, times 0.8, times 4),
amply sufficient to deal with 600MB/s.

However, they don't have this kind of slot; they have x2 PCI-E v1.0
slots (500MB/s per direction). Moreover, the SiI3132 defaults to a
MAX_PAYLOAD_SIZE of 128 bytes, so my guess is that each 2-port
SATA card is only able to provide 60% of the theoretical throughput [1],
or about 300MB/s.

Then they have 3 such cards: total throughput of 900MB/s.

Finally the 4th SATA card (with 4 ports) is in a 32-bit 33MHz PCI slot
(not PCI-E). In practice such a bus can only provide a usable throughput
of about 100MB/s (out of 133MB/s theoretical).

All the bottlenecks are obviously the PCI-E links and the PCI bus.
So in conclusion, my SBNSWAG (scientific but not so wild-ass guess)
is that the max I/O throughput when reading from all the disks in
one of their storage pods is about 1000MB/s. This is poor compared to
a Thumper, for example, but the most important factor for them was
GB/$, not GB/sec. And they did a terrific job at that!
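
For anyone who wants to plug in their own numbers, here is a rough Python
sketch of the arithmetic above; the 0.8 encoding factor, the 60% payload
efficiency and the ~100MB/s PCI figure are assumptions stated in this
thread, not measurements.

# Back-of-the-envelope estimate of the pod's aggregate read throughput,
# under the assumptions above (x2 PCI-E v1.0 links, 60% payload
# efficiency, ~100MB/s usable on the 32-bit/33MHz PCI bus).

def pcie_lane_mbps(raw_gbps=2.5, encoding=0.8):
    # 8b/10b encoding leaves 80% of the raw bit rate; divide by 8 for bytes
    return raw_gbps * encoding * 1000 / 8        # MB/s per lane

link_x2_v1  = 2 * pcie_lane_mbps()               # ~500 MB/s per x2 v1.0 link
payload_eff = 0.6                                # assumed MAX_PAYLOAD_SIZE=128 penalty
per_card    = link_x2_v1 * payload_eff           # ~300 MB/s per SiI3132 card
pci_bus     = 100                                # MB/s, practical PCI throughput

print(3 * per_card + pci_bus)                    # ~1000 MB/s for the whole pod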

 And I'd re-iterate what myself and others have observed about SiI and
 silent data corruption over the years.

Irrelevant, because it seems they have built fault-tolerance higher in
the stack, à la Google. Commodity hardware + reliable software = great
combo.

[1] 
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

-mrb



Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-04 Thread Marc Bevand
Marc Bevand m.bevand at gmail.com writes:
 
 So in conclusion, my SBNSWAG (scientific but not so wild-ass guess)
 is that the max I/O throughput when reading from all the disks on
 1 of their storage pod is about 1000MB/s.

Correction: the SiI3132 are on x1 (not x2) links, so my guess as to
the aggregate throughput when reading from all the disks is:
3*150+100 = 550MB/s.
(150MB/s is 60% of the max theoretical 250MB/s bandwidth of an x1 link)

And if they tuned MAX_PAYLOAD_SIZE to allow the 3 PCI-E SATA cards
to exploit closer to the max theoretical bandwidth of an x1 PCI-E
link, it would be:
3*250+100 = 850MB/s.
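
Re-running the same rough Python sketch with x1 links (the 60%
payload-efficiency figure is still an assumption):

x1_link = 250          # MB/s, theoretical x1 PCI-E v1.0 bandwidth
pci_bus = 100          # MB/s, practical 32-bit/33MHz PCI throughput
for eff in (0.6, 1.0): # default MAX_PAYLOAD_SIZE vs. tuned
    print(3 * x1_link * eff + pci_bus)   # 550.0 and 850.0 MB/s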

-mrb



Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-04 Thread Tim Cook
On Fri, Sep 4, 2009 at 5:36 AM, Marc Bevand m.bev...@gmail.com wrote:

 Marc Bevand m.bevand at gmail.com writes:
 
  So in conclusion, my SBNSWAG (scientific but not so wild-ass guess)
  is that the max I/O throughput when reading from all the disks on
  1 of their storage pod is about 1000MB/s.

 Correction: the SiI3132 are on x1 (not x2) links, so my guess as to
 the aggregate throughput when reading from all the disks is:
 3*150+100 = 550MB/s.
 (150MB/s is 60% of the max theoretical 250MB/s bandwidth of an x1 link)

 And if they tuned MAX_PAYLOAD_SIZE to allow the 3 PCI-E SATA cards
 to exploit closer to the max theoretical bandwidth of an x1 PCI-E
 link, it would be:
 3*250+100 = 850MB/s.

 -mrb



What's the point of arguing what the back-end can do anyway?  This is bulk
data storage.  Their MAX input is ~100MB/sec.  The back-end can more than
satisfy that.  Who cares at that point whether it can push 500MB/s or
5000MB/s?  It's not a database processing transactions.  It only needs to be
able to push as fast as the front-end can go.

--Tim


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-04 Thread Marc Bevand
Tim Cook tim at cook.ms writes:
 
 Whats the point of arguing what the back-end can do anyways?  This is bulk 
data storage.  Their MAX input is ~100MB/sec.  The backend can more than 
satisfy that.  Who cares at that point whether it can push 500MB/s or 
5000MB/s?  It's not a database processing transactions.  It only needs to be 
able to push as fast as the front-end can go.  --Tim

True, what they have is sufficient to match GbE speed. But internal I/O 
throughput matters for resilvering RAID arrays, scrubbing, local data 
analysis/processing, etc. In their case they have 3 15-drive RAID6 arrays per 
pod. If their layout is optimal, they put 5 drives on the PCI bus (to minimize 
this number) and 10 drives behind PCI-E links per array. This means the PCI 
bus's ~100MB/s practical bandwidth is shared by 5 drives, or 20MB/s per 
(1.5TB) drive, so it is going to take a minimum of 20.8 hours to resilver one of 
their arrays.
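
A quick sanity check of that figure in Python; the 1.5TB drive size, the
~100MB/s PCI bandwidth and the 5-drives-on-PCI layout are the assumptions
from the paragraph above:

# Worst-case resilver time if 5 drives of a 15-drive RAID6 array share the PCI bus.
drive_size_tb = 1.5                      # 1.5 TB = 1.5e6 MB (decimal)
pci_bus_mbps  = 100                      # practical shared PCI bandwidth
per_drive     = pci_bus_mbps / 5         # 20 MB/s for each of the 5 drives on PCI
hours = drive_size_tb * 1e6 / per_drive / 3600
print(round(hours, 1))                   # ~20.8 hours to rewrite a whole drive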

-mrb



Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-04 Thread Tim Cook
On Sat, Sep 5, 2009 at 12:30 AM, Marc Bevand m.bev...@gmail.com wrote:

 Tim Cook tim at cook.ms writes:
 
  Whats the point of arguing what the back-end can do anyways?  This is
 bulk
 data storage.  Their MAX input is ~100MB/sec.  The backend can more than
 satisfy that.  Who cares at that point whether it can push 500MB/s or
 5000MB/s?  It's not a database processing transactions.  It only needs to
 be
 able to push as fast as the front-end can go.  --Tim

 True, what they have is sufficient to match GbE speed. But internal I/O
 throughput matters for resilvering RAID arrays, scrubbing, local data
 analysis/processing, etc. In their case they have 3 15-drive RAID6 arrays
 per
 pod. If their layout is optimal they put 5 drives on the PCI bus (to
 minimize
 this number)  10 drives behind PCI-E links per array, so this means the
 PCI
 bus's ~100MB/s practical bandwidth is shared by 5 drives, so 20MB/s per
 (1.5TB-)drive, so it is going to take minimun 20.8 hours to resilver one of
 their arrays.

 -mrb


But none of that matters.  The data is replicated at a higher layer,
combined with RAID-6.  They'd have to see a triple disk failure across
multiple arrays at the same time...  They aren't concerned with performance;
the home users they're backing up aren't ever going to get anything remotely
close to GigE speeds.  The absolute BEST case scenario *MIGHT* push 20Mbit if
the end-user is lucky enough to have FiOS or DOCSIS 3.0 in their area, and
has large files with a clean link.

Even rebuilding two failed disks, that setup will push 2MB/sec all day long.

--Tim


[zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Al Hopper
Interesting blog:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
  Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Michael Shadle
Yeah, I wrote them about it. I said they should sell them and, even
better, pair it with their offsite backup service, kind of like a
massive appliance-and-service option.


They're not selling them, but they did encourage me to just make a copy of
it. It looks like the only questionable piece in it is the port
multipliers (SiI3726, if I recall), which I think are just barely
becoming supported in the most recent snv builds? That's been something I've
been wanting forever anyway.


You could also just design your own case that is optimized for a bunch
of disks, plus a mobo, as long as it has ECC support and enough
PCI/PCI-X/PCIe slots for the number of cards to add. You might be able
to build one without port multipliers and just use a bunch of 8-, 12-,
or 16-port SATA controllers.


I want to design a case that has two layers - an internal layer with  
all the drives and guts and an external layer that pushes air around  
it to exhaust it quietly and has additional noise dampening...


Sent from my iPhone

On Sep 2, 2009, at 11:01 AM, Al Hopper a...@logical-approach.com wrote:


Interesting blog:

http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Regards,

--
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
  Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Torrey McMahon

As some Sun folks pointed out:

1) No redundancy at the power or networking side
2) Getting 2TB drives in an x4540 would make the numbers closer
3) Performance isn't going to be that great with their design but...they 
might not need it.



On 9/2/2009 2:13 PM, Michael Shadle wrote:
Yeah I wrote them about it. I said they should sell them and even 
better pair it with their offsite backup service kind of like a 
massive appliance and service option.


They're not selling them but did encourage me to just make a copy of 
it. It looks like the only questionable piece in it is the port 
multipliers. Sil3726 if I recall. Which I think just barely is 
becoming supported in the most recent snvs? That's been something I've 
been wanting forever anyway.


You could also just design your own case that is optimized for a bunch 
of disks, a mobo as long as it has ECC support and enough 
pci/pci-x/pcie slots for the amount of cards to add. You might be able 
to build one without port multipliers and just use a bunch of 8, 12, 
or 16 port sata controllers.


I want to design a case that has two layers - an internal layer with 
all the drives and guts and an external layer that pushes air around 
it to exhaust it quietly and has additional noise dampening...


Sent from my iPhone

On Sep 2, 2009, at 11:01 AM, Al Hopper a...@logical-approach.com wrote:



Interesting blog:

http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/


Regards,

--
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com

  Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Mario Goebbels

As some Sun folks pointed out

1) No redundancy at the power or networking side
2) Getting 2TB drives in a x4540 would make the numbers closer
3) Performance isn't going to be that great with their design but...they
might not need it.


4) Silicon Image chipsets. Their SATA controller chips used on a variety 
of mainboards are already well known for their unreliability and data 
corruption. I'd not want a whole bunch of SiI chips handling 67TB.


-mg


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread C. Bergström

Mario Goebbels wrote:

As some Sun folks pointed out

1) No redundancy at the power or networking side
2) Getting 2TB drives in a x4540 would make the numbers closer
3) Performance isn't going to be that great with their design but...they
might not need it.


4) Silicon Image chipsets. Their SATA controller chips used on a 
variety of mainboards are already well known for their unreliability 
and data corruption. I'd not want a whole bunch of SiI chips handle 67TB.

5) Where's the ECC RAM?
6) Management interface? Lustre + ZFS... I'm already bouncing around 
ideas with others about an open Fishworks. Maybe this is the boost we 
needed to justify sponsoring some of the development... Anyone interested?



./C

--
CTO PathScale // Open source developer
Follow me - http://www.twitter.com/CTOPathScale
blog: http://www.codestrom.com



Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Jacob Ritorto

Torrey McMahon wrote:

3) Performance isn't going to be that great with their design but...they 
might not need it.



Would you be able to qualify this assertion?  Thinking through it a bit, 
even if the disks are better than average and can achieve 1000Mb/s each, 
each uplink from the multiplier to the controller will still have 
1000Gb/s to spare in the slowest SATA mode out there.  With (5) disks 
per multiplier * (2) multipliers * 1000GB/s each, that's 1Gb/s at 
the PCI-e interface, which approximately coincides with a meager 4x 
PCI-e slot.



Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Michael Shadle
IMHO it depends on the usage model. Mine is for home storage. A couple
of HD streams at most. 40MB/sec over a gigabit network switch is pretty
good for me.

On Wed, Sep 2, 2009 at 11:54 AM, Jacob Ritortojacob.rito...@gmail.com wrote:
 Torrey McMahon wrote:

 3) Performance isn't going to be that great with their design but...they
 might not need it.


 Would you be able to qualify this assertion?  Thinking through it a bit,
 even if the disks are better than average and can achieve 1000Mb/s each,
 each uplink from the multiplier to the controller will still have 1000Gb/s
 to spare in the slowest SATA mode out there.  With (5) disks per multiplier
 * (2) multipliers * 1000GB/s each, that's 1Gb/s at the PCI-e interface,
 which approximately coincides with a meager 4x PCI-e slot.


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Bill Moore
On Wed, Sep 02, 2009 at 02:54:42PM -0400, Jacob Ritorto wrote:
 Torrey McMahon wrote:

 3) Performance isn't going to be that great with their design 
 but...they might not need it.


 Would you be able to qualify this assertion?  Thinking through it a bit,  
 even if the disks are better than average and can achieve 1000Mb/s each,  
 each uplink from the multiplier to the controller will still have  
 1000Gb/s to spare in the slowest SATA mode out there.  With (5) disks  
 per multiplier * (2) multipliers * 1000GB/s each, that's 1Gb/s at  
 the PCI-e interface, which approximately coincides with a meager 4x  
 PCI-e slot.

Let's look at the math.  First, I don't know how 5 * 2 * 1000GB/s equals
1Gb/s, or how a 4x PCIe-gen2 slot, which can't really push a
10Gb/s Ethernet NIC, can do 1000x that.

Moving on, modern high-capacity SATA drives are in the 100-120MB/s
range.  Let's call it 125MB/s for easier math.  A 5-port port multiplier
(PM) has 5 links to the drives, and 1 uplink.  SATA-II speed is 3Gb/s,
which after all the framing overhead, can get you 300MB/s on a good day.
So 3 drives can more than saturate a PM.  45 disks (9 backplanes at 5
disks + PM each) in the box won't get you more than about 21 drives
worth of performance, tops.  So you leave at least half the available
drive bandwidth on the table, in the best of circumstances.  That also
assumes that the SiI controllers can push 100% of the bandwidth coming
into them, which would be 300MB/s * 2 ports = 600MB/s, which is getting
close to a 4x PCIe-gen2 slot.  Frankly, I'd be surprised.  And the card
that uses 3 of the 4 ports has to do more like 900MB/s, which is greater
than 4x PCIe-gen2 can pull off in the real world.
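
For reference, a minimal Python sketch of the port-multiplier arithmetic in
the previous paragraph, using the same round numbers (125MB/s per drive,
300MB/s per PM uplink, 9 backplanes of 5 drives):

drive_mbps     = 125     # "call it 125MB/s" per modern high-capacity SATA drive
pm_uplink_mbps = 300     # SATA-II 3Gb/s uplink after framing overhead
backplanes     = 9       # 9 backplanes x 5 drives = 45 disks

# Each 5-drive backplane is capped by its single uplink.
per_pm_drives = min(5, pm_uplink_mbps / drive_mbps)   # 2.4 drives' worth per PM
print(backplanes * per_pm_drives)                     # ~21.6 of 45 drives' bandwidth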

And I'd re-iterate what myself and others have observed about SiI and
silent data corruption over the years.

Most of your data, most of the time, it would seem.



--Bill


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Roland Rambau

Jacob,

Jacob Ritorto wrote:

Torrey McMahon wrote:

3) Performance isn't going to be that great with their design 
but...they might not need it.



Would you be able to qualify this assertion?  Thinking through it a bit, 
even if the disks are better than average and can achieve 1000Mb/s each, 
each uplink from the multiplier to the controller will still have 
1000Gb/s to spare in the slowest SATA mode out there.  With (5) disks 
per multiplier * (2) multipliers * 1000GB/s each, that's 1Gb/s at 
the PCI-e interface, which approximately coincides with a meager 4x 
PCI-e slot.


they use an $85 PC motherboard - that does not have meager 4x PCI-e slots;
it has one 16x and 3 *1x* PCIe slots, plus 3 PCI slots (remember, from long
ago: 32-bit wide, 33 MHz, probably a shared bus).

Also it seems that all external traffic uses the single GbE motherboard port.

  -- Roland


--

**
Roland Rambau Platform Technology Team
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com
**
Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;  Geschäftsführer:
Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates:   Martin Häring
*** UNIX * /bin/sh  FORTRAN **


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Brent Jones
On Wed, Sep 2, 2009 at 12:12 PM, Roland Rambauroland.ram...@sun.com wrote:
 Jacob,

 Jacob Ritorto schrieb:

 Torrey McMahon wrote:

 3) Performance isn't going to be that great with their design but...they
 might not need it.


 Would you be able to qualify this assertion?  Thinking through it a bit,
 even if the disks are better than average and can achieve 1000Mb/s each,
 each uplink from the multiplier to the controller will still have 1000Gb/s
 to spare in the slowest SATA mode out there.  With (5) disks per multiplier
 * (2) multipliers * 1000GB/s each, that's 1Gb/s at the PCI-e interface,
 which approximately coincides with a meager 4x PCI-e slot.

 they use a 85$ PC motherboard - that does not have meager 4x PCI-e slots,
 it has one 16x and 3 *1x* PCIe slots, plus 3 PCI slots ( remember, long time
 ago: 32-bit wide 33 MHz, probably shared bus ).

 Also it seems that all external traffic uses the single GbE motherboard
 port.

  -- Roland




Probably for their usage patterns, these boxes make sense. But I
concur that the reliability and performance would be very suspect to
any organization that values its data in any fashion.
Personally, I have some old dual P3 systems still running fine at
home, on what were cheap motherboards. But would I advocate such a
system to protect business data? Not a chance.

I'm sure at the price they offer storage, this was the only way they
could be profitable, and it's a pretty creative solution.
For my personal data backups, I'm sure their service would meet all my
needs, but that's about as far as I would trust these systems - MP3s and
backups of photos of which I already maintain a couple of copies.


-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Richard Elling


On Sep 2, 2009, at 11:54 AM, Jacob Ritorto wrote:


Torrey McMahon wrote:

3) Performance isn't going to be that great with their design  
but...they might not need it.



Would you be able to qualify this assertion?  Thinking through it a  
bit, even if the disks are better than average and can achieve  
1000Mb/s each, each uplink from the multiplier to the controller  
will still have 1000Gb/s to spare in the slowest SATA mode out  
there.  With (5) disks per multiplier * (2) multipliers * 1000GB/s  
each, that's 1Gb/s at the PCI-e interface, which approximately  
coincides with a meager 4x PCI-e slot.


That doesn't matter. It does HTTP PUT/GET, so it is completely
limited by the network interface.

The advantage of their model is that they are not required to implement
a POSIX file system. PUT/GET is very easy to implement and tends to
involve large transfers. In other words, they aren't running an OLTP
database: no user-level quotas, no directories with millions of files,
etc. The simple life can be good :-)

I'd be more interested in seeing their field failure rate data :-)

FWIW, bringing such a product to a global market would raise the
list price to be on par with the commercially available products.
Testing, qualifying, service, documentation, warranty, marketing,
distribution, taxes, sales, and all sorts of other costs add up quickly.
 -- richard



Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread David Magda

On Sep 2, 2009, at 14:48, C. Bergström wrote:


Mario Goebbels wrote:

As some Sun folks pointed out

1) No redundancy at the power or networking side
2) Getting 2TB drives in a x4540 would make the numbers closer
3) Performance isn't going to be that great with their design  
but...they

might not need it.


4) Silicon Image chipsets. Their SATA controller chips used on a  
variety of mainboards are already well known for their  
unreliability and data corruption. I'd not want a whole bunch of  
SiI chips handle 67TB.

5) Where's the ECC ram?
6) Management interface? lustre + zfs...   I'm already bouncing  
around ideas with others about an open Fishworks.. Maybe this is  
the boost we needed to justify sponsoring some of the development...  
Anyone interested?


Redundancy is handled on the software side (a la Google). From  
Backblaze's Tim Nufire:


... on redundant power, it’s easy to swap out the 2 PSUs in the  
current design with a 3+1 redundant unit. This adds a couple hundred  
dollars to the cost and since we built redundancy into our software  
layer we don’t need it. Our goal was dumb hardware, smart software.


http://storagemojo.com/2009/09/01/cloud-storage-for-100-a-terabyte/#comment-204892

The design goal was cheap space. The same comment also states that
only one of the six fans actually needs to be running to handle
cooling.


I think a lot of people seem to be critiquing the Backblaze Pod against
criteria that it wasn't meant to handle.  It solved their problem
(oodles of storage) at about an order of magnitude less cost than the
closest alternatives. If you want redundancy and integrity, you do it
higher in the stack.




Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread David Magda

On Sep 2, 2009, at 15:14, Bill Moore wrote:


And I'd re-iterate what myself and others have observed about SiI and
silent data corruption over the years.

Most of your data, most of the time, it would seem.


Unless you have two or three or nine of these things and you spread
data around. For the $1M that they claim a petabyte from Sun costs,
they're able to make nine of their pods.

Just because they don't have redundancy and checksumming on the
box doesn't mean it doesn't exist higher up in their stack. :)




Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Trevor Pretty

Overall, the product is what it is.  There is nothing wrong with it in the 
right situation although they have trimmed some corners that I wouldn't 
have trimmed in their place.  However, comparing it to a NetAPP or an EMC 
is to grossly misrepresent the market.  

I don't think that is what they were doing. I think they were trying
to point out that they had $X budget and wanted to buy Y PB of storage,
and building their own was cheaper than buying it. No surprise there!
However, they don't show their R&D costs. I'm sure the designers
don't work for nothing, although to their credit they do share the H/W
design and have made it open source. They also mention that
www.protocase.com will make them for you, so if you want to build your
own then you have no R&D costs.

I would love to know why they did not use ZFS.


  This is the equivalent of seeing 
how many USB drives you can plug in as a storage solution.  I've seen this 
done.


Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support


-- 
Trevor Pretty | +64 9 639 0652 | +64 21 666 161
Eagle Technology Group Ltd.
Gate D, Alexandra Park, Greenlane West, Epsom
Private Bag 93211, Parnell, Auckland
www.eagle.co.nz


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Michael Shadle
Probably due to the lack of port multiplier support. Or perhaps they
run monitoring software that only works on Linux.

Sent from my iPhone

On Sep 2, 2009, at 4:33 PM, Trevor Pretty trevor_pre...@eagle.co.nz  
wrote:






Overall, the product is what it is.  There is nothing wrong with it  
in the
right situation although they have trimmed some corners that I  
wouldn't
have trimmed in their place.  However, comparing it to a NetAPP or  
an EMC

is to grossly misrepresent the market.
I don't think that is what they where doing. I think they where  
trying to point out they had $X budget and wanted to buy YPB of  
storage and building their own was cheaper than buying it. No  
surprise there! However they don't show their RD costs. I'm sure  
the designers don't work for nothing, although to their credit they  
do share the H/W design and have made is open source. They also  
mention www.protocase.com will make them for you so if you want to  
build your own then you have no RD costs.


I would love to know why they did not use ZFS.


This is the equivalent of seeing
how many USB drives you can plug in as a storage solution.  I've  
seen this

done.


Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support



--
Trevor Pretty | +64 9 639 0652 | +64 21 666 161
Eagle Technology Group Ltd.
Gate D, Alexandra Park, Greenlane West, Epsom
Private Bag 93211, Parnell, Auckland





www.eagle.co.nz


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread David Magda

On Sep 2, 2009, at 19:45, Michael Shadle wrote:

Probably due to the lack of port multiplier support. Or perhaps they  
run software for monitoring that only works on Linux.


Said support was committed only two to three weeks ago:


PSARC/2009/394 SATA Framework Port Multiplier Support
6422924 sata framework has to support port multipliers
6691950 ahci driver needs to support SIL3726/4726 SATA port multiplier


http://mail.opensolaris.org/pipermail/onnv-notify/2009-August/010084.html

If the rest of their stack is also Linux, then it would be natural for
their storage nodes to run it as well.
