Hi Viktors, I noticed you mentioned the following two things:
> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour, date,
>   week, month, year)

Could you please explain:
- why multiple CFs on one date/time are good (better than 1)?
- why store different levels of aggregation to separate tables instead of
  just 1 table?

Thanks
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Viktors Rotanovs <[email protected]>
> To: [email protected]
> Sent: Mon, May 24, 2010 7:32:26 PM
> Subject: Re: Using HBase for logging
>
> I'm using HBase for similar stats, some things I've learned:
> - date/time as key is good because that way it's very easy to get last N
>   results (for a chart, for example), and it's much more scalable than
>   timestamps
> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour, date,
>   week, month, year)
> - you can increment long values when you need to know total:
>   http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[], byte[], byte[], long)
> - MR jobs are a good and scalable way of processing this type of data
> - data size is unlimited, so it's fine to write to multiple tables
> - optimize for reads you're going to make, not for writes.
>
> To import some of our logs, I'm using a java program which is called via
> logrotate every 10 minutes (but be careful with that one, because if hbase
> client freezes like happened to me after 0.20.4 upgrade, memory can get
> filled very quickly).
> There's also a Python project for analytical data:
> http://github.com/zohmg/zohmg
>
> Hope that helps,
> -- Viktors
>
> On Tue, May 25, 2010 at 12:44 AM, Alex Thurlow <[email protected]> wrote:
> > Hi list,
> > With HBase's great write speed, I was thinking it would be a good thing
> > to switch an app that logs to a database to logging to HBase. I couldn't
> > really find anyone else who's using it that way though. Are there
> > reasons I shouldn't? If I should, how should I structure my data?
> >
> > It's basically going to be data for an ad server, so the relevant stuff
> > would be the timestamp, the id of the ad placement, and the id of the
> > creative that showed. Some other data would be stored, but I wouldn't
> > need to search on it.
> >
> > I would be wanting to make reports out of that data by date,
> > date/placement id, date/creative id, date/placementid/creativeid
> >
> > Should I just log with the timestamp as the key and then pull the whole
> > range and filter when I need the data or should I log everything three
> > times so I can pull by whichever key I need?
> >
> > I'm fairly new to HBase, although I've used Cassandra some, so I have an
> > idea of how this kind of works. I just can't quite get my head around
> > the right way to use it for this purpose.
> >
> > Thanks,
> >
> > -Alex
>
> --
> http://rotanovs.com - personal blog |
> http://www.hitgeist.com - fastest growing websites
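
To make the layout discussed in this thread a bit more concrete, here is a rough sketch of date/time row keys with one table per aggregation level and incremented counters. It is only one possible reading of the advice, not code from anyone in the thread. It does not talk to HBase at all: nested dicts stand in for tables, and `+= 1` stands in for HTable.incrementColumnValue. The table names (stats_hour, stats_day, ...), the key formats, and the "placement" / "creative" / "pair" column families are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def row_keys(ts):
    """Return {table_name: row_key} for one event timestamp.

    Table names and key formats are hypothetical. Keys are zero-padded so
    that lexicographic order within a table matches chronological order.
    """
    iso_year, iso_week, _ = ts.isocalendar()
    return {
        "stats_hour": ts.strftime("%Y%m%d%H"),
        "stats_day": ts.strftime("%Y%m%d"),
        "stats_week": f"{iso_year}W{iso_week:02d}",
        "stats_month": ts.strftime("%Y%m"),
        "stats_year": ts.strftime("%Y"),
    }


# In-memory stand-in for HBase: table -> row key -> "family:qualifier" -> count.
tables = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))


def record_impression(ts, placement_id, creative_id):
    """Bump one counter per report dimension at every aggregation level.

    A real client would issue one atomic increment per cell instead of
    mutating a dict; the family names below are assumptions.
    """
    for table, key in row_keys(ts).items():
        row = tables[table][key]
        row[f"placement:{placement_id}"] += 1
        row[f"creative:{creative_id}"] += 1
        row[f"pair:{placement_id}/{creative_id}"] += 1


def last_n_rows(table, n):
    """Return the n most recent (row_key, counters) pairs, e.g. for a chart."""
    rows = tables[table]
    return [(key, dict(rows[key])) for key in sorted(rows)[-n:]]
```

With this shape, the date, date/placement, date/creative and date/placement/creative reports Alex asked about each become a read of one row (or a short range of rows) from the table at the wanted granularity, at the cost of several writes per logged event.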
