[web2py:17362] Re: sharding counters

Jim Mon, 02 Mar 2009 16:11:39 -0800

In the first message, I give a reference to google's implementation.
I also found http://highscalability.com/numbers-everyone-should-know
helpful:


"Sharded Counters

We always seem to want to keep count of things. But BigTable doesn't
keep a count of entities because it's a key-value store. It's very
good at getting data by keys, it's not interested in how many you
have. So the job of keeping counts is shifted to you.

The naive counter implementation is to lock-read-increment-write. This
is fine if there a low number of writes. But if there are frequent
updates there's high contention. Given the the number of writes that
can be made per second is so limited, a high write load serializes and
slows down the whole process.

The solution is to shard counters. This means:
# Create N counters in parallel.
# Pick a shard to increment transactionally at random for each item
counted.
# To get the real current count sum up all the sharded counters.
# Contention is reduced by 1/N. Writes have been optimized because
they have been spread over the different shards. A bottleneck around
shared state has been removed.

This approach seems counter-intuitive because we are used to a counter
being a single incrementable variable. Reads are cheap so we replace
having a single easily read counter with having to make multiple reads
to recover the actual count. Frequently updated shared variables are
expensive so we shard and parallelize those writes."



On Mar 2, 3:27 pm, mdipierro <[email protected]> wrote:
> I am lost here. I need to look up what "sharding" means.
>
> Massimo
>
> On Mar 2, 3:25 pm, Jim <[email protected]> wrote:
>
> > Yes, but that gets me an exact match, right?
>
> > If I've sharded my counters the way Google is recommending, I'll have
> > something like this:
>
> > name     count
> > test0            3
> > test1            2
> > test2            3
> > ...
> > test19          2
>
> > So I want to find all the records in my shards table that start with
> > "test" and then total them.
>
> > On Mar 2, 12:20 pm, mdipierro <[email protected]> wrote:
>
> > > Because it is
>
> > > counter = db(db.shards.name==shard_name).select()
>
> > > ;-)
>
> > > On Mar 2, 12:48 pm, Jim <[email protected]> wrote:
>
> > > > Boiled down, my problem is this:
>
> > > > I want to use
>
> > > >         counter = db(db.shards.name=shard_name).select()
>
> > > > but web2py won't allow it.   I get
>
> > > > SyntaxError: keyword can't be an expression
>
> > > > I'm trying to avoid writing a Google Query Language query here but
> > > > maybe that's the right answer.
>
> > > > On Mar 1, 9:55 pm, Jim <[email protected]> wrote:
>
> > > > > I'm working towards implementing on Google App Engine.   Google
> > > > > stresses the importance of sharding counters in their 
> > > > > environment.http://code.google.com/appengine/articles/sharding_counters.html
>
> > > > > Why?   Well, Craigslist keeps a counter for their ads.  Every time
> > > > > someone posts an ad, that counter gets incremented.  As of a few weeks
> > > > > ago, they released Sphinx as their search engine, which means that you
> > > > > can use that 9-digit ad number in any city.
>
> > > > > On GAE, writes are slow - you might have to wait a second or more to
> > > > > write out one record.  Other records will have to wait for the first
> > > > > one to finish.   You can't run Craigslist on GAE.
>
> > > > > What does sharding counters mean?  As I interpret it, it means knowing
> > > > > "about" how many records you have.  Not exactly.   So you get 10 or 20
> > > > > sub-counters and write to one at random.   If you need to know about
> > > > > how many records you have, you total the 10 or 20 sub-counters to get
> > > > > an answer.  It's an approximate answer but if you've got a lot of
> > > > > data, hey, it's going to be close enough.   And you can get 10 or 20
> > > > > or 40 writes/second because each time you're grabbing a different
> > > > > counter.
>
> > > > > At least that's the theory.   I'm trying to translate the Google
> > > > > implementation into web2py.  I've got increment working except for
> > > > > memcache.  It's doing test0, test1, etc.  with the counts properly.
> > > > > get_count is not working and I'm having trouble figuring out
> > > > > why         counters = db(db.shards.name==name).select() is not
> > > > > returning any results.   That seems to be the most accurate way to
> > > > > translate GAE/webapp to web2py but my suspicion is it's failing
> > > > > because test0 is not equal to test.   I'm trying to avoid using two
> > > > > versions of routines - one for web2py w/o GAE, one with - but it might
> > > > > be necessary.
>
> > > > > def test_it():
> > > > >     count=get_count('test')
> > > > >     session.flash = count is + `count`
> > > > >     increment('test')
> > > > >     return
>
> > > > > def get_count(name):
> > > > >     """Retrieve the value for a given sharded counter.
>
> > > > >     Parameters:
> > > > >         name - The name of the counter
> > > > >     """
> > > > >     total = memcache.get(name)
> > > > >     if total is None:
> > > > >         print "none"
> > > > >         total = 0
> > > > >         counters = db(db.shards.name==name).select()
> > > > >         for counter in counters:
> > > > >             total += counter.count
> > > > >             print counter.name
> > > > >         memcache.add(name, str(total), 60)
> > > > >     return total
>
> > > > > def increment(name):
> > > > >     """Increment the value for a given sharded counter.
>
> > > > >     Parameters:
> > > > >     name - The name of the counter
> > > > >     """
>
> > > > >     index = random.randint(0, NUM_SHARDS - 1)
> > > > >     shard_name = name + str(index)
> > > > >     try:
> > > > >         counter = db(db.shards.name==shard_name).select()[0]
> > > > >         temp=counter.count+1
> > > > >         counter.update_record(count=temp)
> > > > >     except:
> > > > >         db.shards.insert(name=shard_name, count=1)
>
> > > > > #    memcache.incr(name)
>
> > > > > Eventually I hope to implement the version that allows for increasing
> > > > > of the number of shards.
>
> > > > > Thanks.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py Web Framework" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---

[web2py:17362] Re: sharding counters

Reply via email to