Hi,
You need to decide what you want to do with the data - how do you
mine it to turn it into useful information? My guess is that you don't
need to see live, up-to-the-minute stats, so you could write the logs
to a database per day/week, run views against that (say, counting the
different browsers etc.) and dump the results of those views into
another database that you aggregate over a longer period (e.g. to give
you monthly/yearly stats). Depending on your requirements you could
drop the daily database after some time (or, better yet, archive it
off somewhere). If you keep it around, you can add new views against
the daily data at a later stage and just run the queries again, adding
the results into the aggregation database. CouchDB's replication would
make for a nice backup: for example, have a live instance on some HA
servers that receives the logs, and at the end of each day replicate
it to a slower but backed-up instance that just holds the archive.
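To make the view idea concrete, here is a minimal sketch of a
map/reduce pair that counts accesses per browser per day. The field
names ("date", "product") are taken from your sample document below;
the emit() stub merely stands in for CouchDB's own so the functions
can be exercised outside the server:

```javascript
// Sketch of a CouchDB view counting accesses per browser per day.
// The emit() stub stands in for CouchDB's built-in emit(); "rows"
// collects what the view index would hold.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map: one row per access document, keyed by [day, browser].
// "date" and "product" are fields from the sample document below.
function map(doc) {
  emit([doc.date, doc.product], 1);
}

// Reduce: sums the 1s per key - equivalent to CouchDB's built-in
// "_count" reduce, which you could use instead.
function reduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// One access shaped like the sample document:
map({ date: "20101021", product: "Gecko" });
```

Uploaded in a design document and queried with ?group=true, a view
like this returns one row per [day, browser] pair with the count as
its value - exactly the kind of result you would copy into the
aggregation database.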
If you're parsing server logs to generate your documents, you might be
better off skipping the per-access documents and just recording daily
high-level stats. It depends on how well you know what information you
want to extract from the logs, and how likely that is to change over
time.
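To illustrate that alternative: a single summary document per day
instead of one document per access. The field names and browser
counts here are purely made up (only the 70,000 total comes from your
message), just to show the shape such a document might take:

```javascript
// Purely illustrative sketch of a per-day summary document that
// could replace ~70,000 per-access documents. All field names and
// the browser breakdown are invented for illustration.
var dailySummary = {
  _id: "day-20101021",   // hypothetical id scheme: one doc per day
  instance: "scielo",
  date: "20101021",
  accesses: 70000,       // total hits that day (your own estimate)
  browsers: {            // per-browser counts - invented numbers
    "Firefox/3.6.11": 40000,
    "MSIE 8.0": 25000,
    "Chrome/7.0": 5000
  }
};
```

A year of documents like that is a few hundred small rows instead of
hundreds of gigabytes, at the cost of having to decide the
interesting breakdowns up front.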
If you're concerned about data volume, make sure you do regular
housekeeping - compact the database, clean up stale view indexes, etc.
Cheers
Simon
On 22 Oct 2010, at 13:25, Fabio Batalha Cunha dos Santos wrote:
Hello All!
I'm new to CouchDB, and I'm running some experiments to build a tool
that stores access logs in CouchDB. URL:
http://github.com/fabiobatalha/Analytics---CouchDB
I have some doubts, such as: what is the best way to store this kind
of information in CouchDB, and is it viable? I created a database and
started to record every access to our website. In almost 14 hours the
database reached 0.5 GB with 70,000 records, and I estimate that in
24 hours it will reach 1.0 GB of stored data.
Given the sample document I'm storing and CouchDB's performance, will
I be able to build statistics from this information using CouchDB
map/reduce, considering that in one year the database will probably
reach something around 400 GB of stored data?
This is a sample of one record:
{
  "_id": "0007f561f96d61cf7744b987895e1ef0",
  "_rev": "1-e6b1dcdcbe0b2c6fde0dec5b4bfc41a8",
  "instance": "scielo",
  "date": "20101021",
  "time": "1832",
  "url": "http://www.scielo.br/scielo.php?pid=S0080-62342003000300002&script=sci_arttext",
  "host": "www.scielo.br",
  "urlParams": {
    "pid": "S0080-62342003000300002",
    "script": "sci_arttext"
  },
  "referrer": "http://www.google.com.br/url?sa=t&source=web&cd=1&ved=0CBYQFjAA&url=http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext&rct=j&q=qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F&ei=XqPATLGMFYL-8AawmvXbBg&usg=AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA",
  "referrerParams": {
    "sa": "t",
    "source": "web",
    "cd": "1",
    "ved": "0CBYQFjAA",
    "url": "http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext",
    "rct": "j",
    "q": "qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F",
    "ei": "XqPATLGMFYL-8AawmvXbBg",
    "usg": "AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA"
  },
  "appCodeName": "Mozilla",
  "appVersion": "5.0 (Windows; pt-BR)",
  "language": "pt-BR",
  "platform": "Win32",
  "product": "Gecko",
  "userAgent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11 ( .NET CLR 3.5.30729)",
  "vendor": "",
  "vendorSub": ""
}
Thanks in advance for any guidance, comments and suggestions.
Fabio Batalha