Re: Rv: Why not BerkeleyDB based object store?
On Sat, Nov 29, 2008 at 1:01 AM, Lucas Brasilino <[EMAIL PROTECTED]> wrote:
> Hi Kinkie:
>
>> - workload is mostly writes
>>   a well-tuned forward proxy will have a hit-rate of roughly 30%,
>>   which means 3 writes for every read on average
>
> I don't know if I've misunderstood this statement. Are you saying that
> even if an object is uncacheable, Squid still stores it to disk?

That was an oversimplification on my part, sorry about that.

--
/kinkie
Re: Rv: Why not BerkeleyDB based object store?
Hi Kinkie:

> - workload is mostly writes
>   a well-tuned forward proxy will have a hit-rate of roughly 30%,
>   which means 3 writes for every read on average

I don't know if I've misunderstood this statement. Are you saying that even if an object is uncacheable, Squid still stores it to disk?

regards
Lucas Brasilino
Re: Rv: Why not BerkeleyDB based object store?
I thought about it a while ago, but to be honest I'm just out of time. Writing objects to disk only if they're popular, or if you need the RAM to handle concurrent accesses to large objects for some reason, would probably improve disk performance a great deal, as the amount of writing would drop drastically.

Sponsorship for investigating and developing this is gladly accepted :)

Adrian

2008/11/26 Mark Nottingham <[EMAIL PROTECTED]>:
> Just a tangential thought; has there been any investigation into reducing the
> amount of write traffic with the existing stores?
>
> E.g., establishing a floor for reference count; if it doesn't have n refs,
> don't write to disk? This will impact hit rate, of course, but may mitigate
> in situations where disk caching is desirable, but writing is the
> bottleneck...
>
> On 26/11/2008, at 9:14 AM, Kinkie wrote:
>
>> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Amazon uses BerkeleyDB for several critical parts of its website. The
>>> Chicago Mercantile Exchange uses BerkeleyDB for backup and recovery of its
>>> trading database. And Google uses BerkeleyDB to process Gmail and Google
>>> user accounts. Are you sure BerkeleyDB is not a good idea to replace the
>>> Squid filesystems, even COSS?
>>
>> Squid3 uses a modular storage backend system, so you're more than
>> welcome to try to code it up and see how it compares.
>> Generally speaking, the needs of a data cache such as Squid are very
>> different from those of a general-purpose backend storage.
>> Among the other key differences:
>> - the data in the cache has little or no value.
>>   it's important to know whether a file was corrupted, but it can
>>   always be thrown away and fetched from the origin server at a
>>   relatively low cost
>> - workload is mostly writes
>>   a well-tuned forward proxy will have a hit-rate of roughly 30%,
>>   which means 3 writes for every read on average
>> - data is stored in incremental chunks
>>
>> Given these characteristics, a long list of mechanisms database-like
>> systems have such as journaling, transactions etc. are a waste of
>> resources.
>> COSS is explicitly designed to handle a workload of this kind. I would
>> not trust any valuable data to it, but it's about as fast as it gets
>> for a cache.
>>
>> IMHO BDB might be much more useful as a metadata storage engine, as
>> those have a very different access pattern than a general-purpose
>> cache store.
>> But if I had any time to devote to this, my priority would be in
>> bringing 3.HEAD COSS up to speed with the work Adrian has done in 2.
>>
>> --
>> /kinkie
>
> --
> Mark Nottingham [EMAIL PROTECTED]
Re: Rv: Why not BerkeleyDB based object store?
Just a tangential thought; has there been any investigation into reducing the amount of write traffic with the existing stores?

E.g., establishing a floor for reference count; if it doesn't have n refs, don't write to disk? This will impact hit rate, of course, but may mitigate in situations where disk caching is desirable, but writing is the bottleneck...

On 26/11/2008, at 9:14 AM, Kinkie wrote:

> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti <[EMAIL PROTECTED]> wrote:
>>
>> Amazon uses BerkeleyDB for several critical parts of its website. The
>> Chicago Mercantile Exchange uses BerkeleyDB for backup and recovery of its
>> trading database. And Google uses BerkeleyDB to process Gmail and Google
>> user accounts. Are you sure BerkeleyDB is not a good idea to replace the
>> Squid filesystems, even COSS?
>
> Squid3 uses a modular storage backend system, so you're more than
> welcome to try to code it up and see how it compares.
> Generally speaking, the needs of a data cache such as Squid are very
> different from those of a general-purpose backend storage.
> Among the other key differences:
> - the data in the cache has little or no value.
>   it's important to know whether a file was corrupted, but it can
>   always be thrown away and fetched from the origin server at a
>   relatively low cost
> - workload is mostly writes
>   a well-tuned forward proxy will have a hit-rate of roughly 30%,
>   which means 3 writes for every read on average
> - data is stored in incremental chunks
>
> Given these characteristics, a long list of mechanisms database-like
> systems have such as journaling, transactions etc. are a waste of
> resources.
> COSS is explicitly designed to handle a workload of this kind. I would
> not trust any valuable data to it, but it's about as fast as it gets
> for a cache.
>
> IMHO BDB might be much more useful as a metadata storage engine, as
> those have a very different access pattern than a general-purpose
> cache store.
> But if I had any time to devote to this, my priority would be in
> bringing 3.HEAD COSS up to speed with the work Adrian has done in 2.
>
> --
> /kinkie

--
Mark Nottingham [EMAIL PROTECTED]
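The reference-count floor suggested in this message can be illustrated with a small toy model. This is only a sketch of the idea, not anything taken from Squid: the class name `RefCountFloor`, the `min_refs` parameter and the returned labels are all hypothetical, and the model ignores the memory cache, eviction and bounding of the counter table.

```python
from collections import Counter


class RefCountFloor:
    """Toy admission policy: write an object to disk only once it has
    been requested at least `min_refs` times (hypothetical names)."""

    def __init__(self, min_refs):
        if min_refs < 1:
            raise ValueError("min_refs must be at least 1")
        self.min_refs = min_refs
        self.refs = Counter()   # request counts for objects not yet on disk
        self.on_disk = set()    # keys that have been written to disk

    def request(self, key):
        """Record one request and report what the disk store would do."""
        if key in self.on_disk:
            return "hit"
        self.refs[key] += 1
        if self.refs[key] >= self.min_refs:
            # The floor has been reached: admit the object to disk.
            self.on_disk.add(key)
            del self.refs[key]
            return "miss-written"
        # Below the floor: serve from the origin and skip the disk write.
        return "miss-skipped"


policy = RefCountFloor(min_refs=2)
results = [policy.request(k) for k in ["a", "b", "a", "a", "c"]]
```

With `min_refs=2`, objects requested only once never cause a disk write, at the cost of one extra miss for every object that later turns out to be popular, which is the hit-rate trade-off mentioned above.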
Re: Rv: Why not BerkeleyDB based object store?
On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti <[EMAIL PROTECTED]> wrote:
> Amazon uses BerkeleyDB for several critical parts of its website. The Chicago
> Mercantile Exchange uses BerkeleyDB for backup and recovery of its trading
> database. And Google uses BerkeleyDB to process Gmail and Google user
> accounts. Are you sure BerkeleyDB is not a good idea to replace the Squid
> filesystems, even COSS?

Squid3 uses a modular storage backend system, so you're more than welcome to try to code it up and see how it compares.
Generally speaking, the needs of a data cache such as Squid are very different from those of a general-purpose backend storage.
Among the other key differences:
- the data in the cache has little or no value.
  it's important to know whether a file was corrupted, but it can always be thrown away and fetched from the origin server at a relatively low cost
- workload is mostly writes
  a well-tuned forward proxy will have a hit-rate of roughly 30%, which means 3 writes for every read on average
- data is stored in incremental chunks

Given these characteristics, a long list of mechanisms database-like systems have such as journaling, transactions etc. are a waste of resources.
COSS is explicitly designed to handle a workload of this kind. I would not trust any valuable data to it, but it's about as fast as it gets for a cache.

IMHO BDB might be much more useful as a metadata storage engine, as those have a very different access pattern than a general-purpose cache store.
But if I had any time to devote to this, my priority would be in bringing 3.HEAD COSS up to speed with the work Adrian has done in 2.

--
/kinkie
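As a rough check on the "3 writes for every read" figure in this message, the ratio can be worked out under simple assumptions: each hit costs one disk read, and each cacheable miss costs one disk write. The function name and the `cacheable_miss_fraction` parameter below are hypothetical, and the model is a back-of-the-envelope sketch rather than a description of what Squid actually does.

```python
def disk_writes_per_read(hit_rate, cacheable_miss_fraction=1.0):
    """Estimate disk writes per disk read for a cache.

    Assumes (for illustration only) that every hit is one disk read
    and every cacheable miss is one disk write.
    """
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    if not 0.0 <= cacheable_miss_fraction <= 1.0:
        raise ValueError("cacheable_miss_fraction must be in [0, 1]")
    misses = 1.0 - hit_rate
    return misses * cacheable_miss_fraction / hit_rate


# 30% hit rate, every miss written to disk: 0.7 / 0.3
all_misses_written = disk_writes_per_read(0.3)

# Same hit rate, but only half of the misses are cacheable
half_cacheable = disk_writes_per_read(0.3, cacheable_miss_fraction=0.5)
```

Under these assumptions a 30% hit rate gives about 2.3 writes per read when every miss is stored, and fewer once uncacheable responses are excluded, so "3 writes for every read" is best read as a rounded upper estimate.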
Rv: Why not BerkeleyDB based object store?
Amazon uses BerkeleyDB for several critical parts of its website. The Chicago Mercantile Exchange uses BerkeleyDB for backup and recovery of its trading database. And Google uses BerkeleyDB to process Gmail and Google user accounts. Are you sure BerkeleyDB is not a good idea to replace the Squid filesystems, even COSS?