Re: how to... accelerate random access to millions of images?

2008-03-16 Thread Michael S. Fischer
On Fri, Mar 14, 2008 at 1:37 PM, Sascha Ottolski [EMAIL PROTECTED] wrote:
  The challenge is to serve 20+ million image files, I guess with up to
  1500 req/sec at peak.

A modern disk drive can service about 100 random IOPS (at 10 ms per
seek, which is a reasonable figure).  Without any caching, you'd need
15 disks to service your peak load, with a bit over 10 ms of I/O
latency (seek + read).
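
The arithmetic behind that estimate can be sketched in a few lines of
Python.  This is only a back-of-the-envelope model: it assumes one
random read per request and one seek per read, and the constant names
are made up for illustration.

```python
import math

PEAK_REQ_PER_SEC = 1500   # peak load quoted in the original question
SEEK_TIME_MS = 10         # per-seek time assumed in the reply

# One random read per seek: a 10 ms seek allows about 100 reads per second.
iops_per_disk = 1000 / SEEK_TIME_MS

# With no caching, each request costs one random read on some disk.
disks_needed = math.ceil(PEAK_REQ_PER_SEC / iops_per_disk)

print(f"{iops_per_disk:.0f} IOPS per disk, {disks_needed} disks at peak")
```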

  The files tend to be small, most of them in a
  range of 5-50 KB. Currently the image store is about 400 GB in size (and
  growing every day). The access pattern is very random, so it will be
  very unlikely that any size of RAM will be big enough...

Are you saying that the hit ratio is likely to be zero?  If so,
consider whether you want to have caching turned on in the first place.
There's little sense in buying extra RAM if it's useless to you.
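
To see how much the hit ratio matters, here is a rough Python sketch.
It assumes every cache miss costs exactly one random disk read and
reuses the 100 IOPS per disk figure from above; the function name and
the sample ratios are illustrative only, not measurements.

```python
import math

def disks_needed(peak_req_per_sec, hit_ratio, iops_per_disk=100):
    """Disks required to absorb the cache misses at peak load."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Only misses reach the disks; hits are served from RAM.
    miss_rate = peak_req_per_sec * (1.0 - hit_ratio)
    return math.ceil(miss_rate / iops_per_disk)

for ratio in (0.0, 0.25, 0.5, 0.75):
    print(f"hit ratio {ratio:.2f}: {disks_needed(1500, ratio)} disks")
```

With a hit ratio of zero this falls back to the 15 disks computed
earlier, which is the case where extra RAM buys nothing.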

  Now my question is: what kind of hardware would I need? Lots of RAM
  seems to be obvious, whatever "a lot" may be... What about the disk
  subsystem? Should I look into something like RAID-0 with many disks to
  push the I/O performance?

You didn't say what your failure tolerance requirements were.  Do you
care if you lose data?   Do you care if you're unable to serve some
requests while a machine is down?

Consider dividing up your image store across multiple machines.  Not
only would you get better performance, but you would be able to
survive hardware failures with fewer catastrophic effects (i.e., you'd
lose only 1/n of service).

If I were designing such a service, my choices would be:

(1) 4 machines, each with 4-disk RAID 1 (fast, but dangerous)
(2) 4 machines, each with 5-disk RAID 5 (safe, fast reads, but slow
writes for your file size - also, RAID 5 should be battery backed,
which adds cost)
(3) 4 machines, each with 4-disk RAID 10 (will meet workload
requirement, but won't handle peak load in degraded mode)
(4) 5 machines, each with 4-disk RAID 10
(5) 9 machines, each with 2-disk RAID 0

Multiply each of these machine counts by 2 if you want to be resilient
to failures other than disk failures.

You can then put a Varnish proxy layer in front of your image storage
servers, and direct incoming requests to the appropriate backend
server.
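
One possible way to decide which backend is "appropriate" is to hash
the URL path, so that a given image always maps to the same storage
server.  The Python sketch below only illustrates that idea; it is not
Varnish configuration, and the backend names and the function are
hypothetical.

```python
import zlib

# Hypothetical backend names; the thread does not name any hosts.
BACKENDS = ["img1", "img2", "img3", "img4"]

def pick_backend(url_path, backends=BACKENDS):
    """Map a URL path to one backend, the same one every time."""
    # crc32 is stable across processes, unlike the built-in hash() for str.
    digest = zlib.crc32(url_path.encode("utf-8"))
    return backends[digest % len(backends)]

print(pick_backend("/images/12345.jpg"))
```

Note that plain modulo hashing remaps most paths whenever the number of
backends changes, so adding a machine later would mean moving files.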

--Michael
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: how to... accelerate random access to millions of images?

2008-03-16 Thread Michael S. Fischer
On Sun, Mar 16, 2008 at 10:02 AM, Michael S. Fischer
[EMAIL PROTECTED] wrote:

I don't know why I'm having such a problem with this.  Sigh!  I think
I got it right this time.

  If I were designing such a service, my choices would be:

Corrections:

(1) 4 machines, each with 4-disk RAID 0 (fast, but dangerous)
(2) 4 machines, each with 5-disk RAID 5 (safe, fast reads, but slow
writes for your file size - also, RAID 5 should be battery backed,
which adds cost)
(3) 4 machines, each with 4-disk RAID 10 (will meet workload
requirement, but won't handle peak load in degraded mode)
(4) 5 machines, each with 4-disk RAID 10
(5) 9 machines, each with 2-disk RAID 1
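
A rough Python sketch for comparing these options against the 1500
req/sec peak.  It uses a deliberately naive model: 100 read IOPS per
disk, requests spread evenly over machines, and a failed disk simply
removing its share of read capacity.  All names are illustrative, and
the degraded figure for RAID 5 is optimistic.

```python
IOPS_PER_DISK = 100      # figure used earlier in the thread
PEAK_REQ_PER_SEC = 1500  # peak load from the original question

# (label, machines, disks per machine, RAID level), from the list above
OPTIONS = [
    ("1", 4, 4, "0"),
    ("2", 4, 5, "5"),
    ("3", 4, 4, "10"),
    ("4", 5, 4, "10"),
    ("5", 9, 2, "1"),
]

def degraded_machine_iops(disks, level):
    """Read IOPS of one machine after losing one disk (naive model)."""
    if level == "0":
        return 0  # RAID 0 has no redundancy: the whole array is gone
    # Optimistic for RAID 5, where reads of the lost disk's data also
    # need reads from every surviving disk to rebuild it from parity.
    return (disks - 1) * IOPS_PER_DISK

for label, machines, disks, level in OPTIONS:
    total = machines * disks * IOPS_PER_DISK
    per_machine_peak = PEAK_REQ_PER_SEC / machines
    degraded = degraded_machine_iops(disks, level)
    print(f"({label}) RAID {level}: {total} IOPS healthy, "
          f"{per_machine_peak:.0f} req/s per machine at peak, "
          f"{degraded} IOPS on a machine with one failed disk")
```

Under this model a degraded machine in option (3) is left with 300 IOPS
against 375 req/s at peak, which matches the note that it won't handle
peak load in degraded mode, while option (4) has 300 IOPS against 300
req/s.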

--Michael