Hi,
You need to decide what you want to do with the data: how will you mine it to turn it into useful information? My guess is you don't need up-to-the-minute stats, so you could write the logs to a database per day/week, run views against that (say, counting the different browsers etc.), and dump the results of those views into another database that you aggregate over a longer period (e.g. to give you monthly/yearly stats).

Depending on your requirements you could drop the daily database after some time (or, better yet, archive it off somewhere). If you keep it around you can add new views against the daily data at a later stage and just run the queries again, adding the results into the aggregation database.

CouchDB's replication would make for a nice backup: e.g. have a live instance on some HA servers that receives the logs, which at the end of the day you replicate to a slower but backed-up instance that just holds the archive, or something along those lines.
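A view along these lines might count requests per day and browser family. The map and reduce functions below are in the usual CouchDB view style; the field names (date, userAgent) come from the sample document later in this thread, the browser-detection regexes are a rough assumption, and the small harness after the functions only simulates what CouchDB does server-side so the sketch can be run stand-alone:

```javascript
// Collected rows; in CouchDB this is done by the view engine.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map: one row per access-log document, keyed by [day, browser].
function map(doc) {
  if (doc.date && doc.userAgent) {
    // Crude browser-family detection -- an assumption, refine as needed.
    var browser = /Firefox/.test(doc.userAgent) ? "Firefox"
                : /MSIE/.test(doc.userAgent)    ? "IE"
                : "Other";
    emit([doc.date, browser], 1);
  }
}

// Reduce: plain-JS equivalent of CouchDB's built-in "_sum".
function reduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// --- stand-alone simulation over two sample documents ---
var docs = [
  { date: "20101021", userAgent: "Mozilla/5.0 ... Firefox/3.6.11" },
  { date: "20101021", userAgent: "Mozilla/4.0 (compatible; MSIE 8.0)" }
];
docs.forEach(map);

// Group rows by key and reduce each group, as a group=true query would.
var grouped = {};
rows.forEach(function (r) {
  var k = JSON.stringify(r.key);
  (grouped[k] = grouped[k] || []).push(r.value);
});
var result = {};
Object.keys(grouped).forEach(function (k) {
  result[k] = reduce(null, grouped[k], false);
});
```

In CouchDB itself you'd put the map function in a design document, use the built-in "_sum" reduce, and query with group=true to get the per-(day, browser) counts; the daily results can then be written into the aggregation database.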

If you're parsing server logs to generate your documents you might be better off skipping the per-access documents and just recording daily high-level stats. It depends on how well you know what information you want to extract from the logs, and how likely that is to change over time.
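In that scheme a single summary document per day might look something like the following (a hypothetical shape with made-up illustrative counts, not something from the original setup):

```json
{
  "_id": "stats-20101021",
  "type": "daily_summary",
  "date": "20101021",
  "totalHits": 70000,
  "browsers": { "Firefox": 41200, "IE": 22300, "Other": 6500 },
  "topScripts": { "sci_arttext": 52000, "sci_serial": 9000 }
}
```

One document per day keeps the database tiny compared with one document per access, at the cost of losing the ability to ask new questions of the raw data later.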

If you're concerned about data volume, make sure you do regular housekeeping: compact the database, clean up stale view indexes, etc.
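For reference, the housekeeping above is exposed through CouchDB's HTTP API; the database name "access_logs" and the design-document name "stats" below are assumptions, substitute your own:

```shell
# Compact the database itself (reclaims space from old revisions)
curl -X POST http://localhost:5984/access_logs/_compact \
     -H "Content-Type: application/json"

# Compact the view indexes of a design document (name assumed)
curl -X POST http://localhost:5984/access_logs/_compact/stats \
     -H "Content-Type: application/json"

# Remove index files left behind by views that no longer exist
curl -X POST http://localhost:5984/access_logs/_view_cleanup \
     -H "Content-Type: application/json"
```

These all run against a live server, so they're shown here just as commands rather than something you can execute in isolation.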
Cheers
Simon


On 22 Oct 2010, at 13:25, Fabio Batalha Cunha dos Santos wrote:

Hello All!

I'm new to CouchDB, and I'm running some experiments to create a tool that stores
access logs in CouchDB. URL:
http://github.com/fabiobatalha/Analytics---CouchDB

I have some doubts, like: what is the best way to store this kind of
information in CouchDB, and is it viable? I have created a database and started to record every access to our website. In almost 14 hours the database reached 0.5 GB with 70,000 records, so I'm estimating that in 24 hours
the database will reach 1.0 GB of stored data.

Given the sample document that I'm storing and CouchDB's
performance, will I be able to generate statistics from this information
using CouchDB map/reduce, considering that in one year the
database will probably reach something around 400 GB of stored data?

This is a sample of one record:

{
  "_id": "0007f561f96d61cf7744b987895e1ef0",
  "_rev": "1-e6b1dcdcbe0b2c6fde0dec5b4bfc41a8",
  "instance": "scielo",
  "date": "20101021",
  "time": "1832",
  "url": "http://www.scielo.br/scielo.php?pid=S0080-62342003000300002&script=sci_arttext",
  "host": "www.scielo.br",
  "urlParams": {
      "pid": "S0080-62342003000300002",
      "script": "sci_arttext"
  },
  "referrer": "http://www.google.com.br/url?sa=t&source=web&cd=1&ved=0CBYQFjAA&url=http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext&rct=j&q=qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F&ei=XqPATLGMFYL-8AawmvXbBg&usg=AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA",
  "referrerParams": {
      "sa": "t",
      "source": "web",
      "cd": "1",
      "ved": "0CBYQFjAA",
      "url": "http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext",
      "rct": "j",
      "q": "qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F",
      "ei": "XqPATLGMFYL-8AawmvXbBg",
      "usg": "AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA"
  },
  "appCodeName": "Mozilla",
  "appVersion": "5.0 (Windows; pt-BR)",
  "language": "pt-BR",
  "platform": "Win32",
  "product": "Gecko",
  "userAgent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11 ( .NET CLR 3.5.30729)",
  "vendor": "",
  "vendorSub": ""
}

Thanks in advance for any guidance, comments and suggestions.
Fabio Batalha
