Hi,

The first part of my answer is not CouchDB specific. All of the big analytics systems that I have built or seen at my clients' have used queues. Analytics workloads have such a high write rate that you would be crazy to try to persist each transaction to disk as it arrives (which is what databases do). Instead, send the events to a queue, where they can sit until you consume them at your own leisure.
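To make that concrete, here is a runnable sketch of the pattern. Everything in it is illustrative, not prescribed: the field names, the batch size, and the in-memory array standing in for a real queue (SQS, RabbitMQ, whatever). In production the consumer would POST each batch to CouchDB's /db/_bulk_docs endpoint.

```javascript
// Each hit becomes its own document: writes only, no updates, so no
// contention. (Field names here are made up for the example.)
function makeEvent(sessionId, action, resource) {
  return {
    type: "hit",
    session_id: sessionId,
    action: action,
    resource: resource,
    ts: new Date().toISOString()
  };
}

// Trivial in-memory stand-in for the queue.
const queue = [];
function enqueue(event) { queue.push(event); }

// Consumer: drain the queue at our own pace, in batches, so CouchDB
// sees a controlled write rate instead of one write per page view.
// Each returned object is shaped like a /db/_bulk_docs request body.
function drainBatches(batchSize) {
  const batches = [];
  while (queue.length > 0) {
    batches.push({ docs: queue.splice(0, batchSize) });
  }
  return batches;
}

// A map function like the one you would put in a design document
// (emit is passed in explicitly only so the sketch runs outside
// CouchDB; inside CouchDB it is a global). Pair it with the built-in
// _count reduce and query with ?stale=ok&group=true for hit counts.
function hitsByResourceAction(doc, emit) {
  if (doc.type === "hit") {
    emit([doc.resource, doc.action], 1);
  }
}

// Five incoming hits become two bulk writes.
["/", "/signup", "/", "/pricing", "/about"].forEach(function (url, i) {
  enqueue(makeEvent("session-" + (i % 2), "view", url));
});
const batches = drainBatches(3);
console.log(batches.length);         // 2
console.log(batches[0].docs.length); // 3
```

The queue also gives you a natural throttle: you can pause or slow the consumer while a scheduled view build is running, instead of letting raw visitor traffic hit the database directly.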
If you don't want to host your own queue, take a look at Amazon Simple Queue Service.

Now for the CouchDB part. Make each transaction its own document -- yes, even if you are tracking the same type of action for the same resource (URL). You no longer live in a locking world, so this is the most straightforward approach. You can then build views over actions, resources, or whatever other piece of data you want. There is more information at http://guide.couchdb.org/draft/recipes.html

Given the write rate of analytics systems, you are right to worry about view build time. That's why you have the queue: it lets you control the write rate into CouchDB. You can also build views just once per night (or on whatever schedule suits you) and ALWAYS query with ?stale=ok so that reads never kick off a view build.

There are a bunch more land mines, but these are the basics and should get you on your way. :)

--
Sam Bisbee

On Thu, Jun 2, 2011 at 5:34 AM, [email protected] <[email protected]> wrote:
> Hi everyone,
>
> I came across couchdb a couple of weeks back & got really excited by
> the fundamental change it brings by simply taking the app-server out
> of the picture.
> Must say, kudos to the dev team!
>
> I am planning to write a quick analytics solution for my website -
> something on the lines of Google analytics - which will measure
> certain properties of the visitors hitting our site.
>
> Since this is my first attempt at a JSON style document store, I
> thought I'll share the architecture & see if I can make it better (or
> correct my mistakes before I do them) :-)
>
> - For each unique visitor, create a document with his session_id as the doc.id
> - For each property i need to track about this visitor, I create a
> key-value pair in the doc created for this visitor
> - If visitor is a returning user, use the session_id to re-open his
> doc & keep on modifying the properties
> - At end of each calculation time period (say 1 hour or 24 hours), I
> run a cron job which fires the map-reduce jobs by requesting the views
> over curl/http.
>
> A couple of questions based on above architecture...
> We see concurrent traffic ranging from 2k users to 5k users.
> - Would a couchdb instance running on a good machine (say High CPU
> EC2, medium instance) work well with simultaneous writes happening...
> (visitors browsing, properties changing or getting created)
> - With a couple of million documents, would I be able to process my
> views without causing any significant impact to write performance?
>
> I think my questions might be biased by the fact that I come from a
> MySQL/Rails background... :-)
>
> Let me know how you guys think about this.
>
> Thanks in advance,
> --
> Mayank
> http://adomado.com
>
