Hi Eric,

Thanks for sharing the application. I have two questions about your scenario:

1) It looks like you tapped Chukwa's monitoring logs directly into an HBase table. How big is your HBase cell (how many servers), and what is the throughput of your incoming log stream?

2) It looks like you have not done the MapReduce part yet, though you have read the javadoc, right? If so, have you considered the case where a heavy MapReduce analytics job pegs the HBase cell so hard that query serving degrades and the end-user experience suffers (i.e., query latency becomes very high because of the data crunching)?

What I am thinking of is the following scenario:
-- 1) I want to store my hourly web traffic in a fact table, Table A.
-- 2) I want to invoke MapReduce to generate an aggregated table (such as trends or a web-usage summary), Table B.
-- 3) I want to serve end users' queries from Table B.

Thanks,
Sean

On Sat, Jul 3, 2010 at 4:53 PM, Eric Yang <[email protected]> wrote:
> Hi Sean,
>
> I am writing an interface for Chukwa to inject data directly into
> hbase and rely on hbase to index my data by time group/row key. It
> is working fine for me. I could tap into the realtime data sink table
> to monitor the data arrival and create simple visualization. The only
> minor problem is that by default the cell returns only the most recent
> three revisions to me instead of the 60 versions that I put into the
> system. I am sure it's something simple that I missed.
>
> The next step is to use TableInput and TableOutput for mapreduce to
> process analytic computation for my large time series trends. From
> what I gather from hbase javadoc, it looks very promising and simple
> to implement. With hbase managing the file structures, indexing, and
> roll up of files, it brings chukwa one step closer to becoming a real
> time monitoring and reporting application for hadoop. Being a silent
> observer on hbase, I waited 2 years for big table like storage for
> hadoop ecosystem, and hbase is the closest in obtaining this goal.
>
> Running a mapreduce job on hbase is unlikely to give you a real time
> system, since a lot of bytes are transferred between mapreduce and
> hbase. However, if you only need a near real time experience, like
> running a mapreduce job every 5-30 minutes, then it is certainly in
> the realm of possibility.
>
> regards,
> Eric
>
> On Sat, Jul 3, 2010 at 2:42 PM, Sean Bigdatafun
> <[email protected]> wrote:
> > I read a thread "Use cases of Hbase" in the March archive, and several
> > people seemed to suggest that an HBase cell can be used as a mixed cell
> > for data crunching and online serving (i.e., using the Hive Hbase client
> > to do the analytics part while serving live queries, see
> > http://osdir.com/ml/hbase-user-hadoop-apache/2010-03/msg00299.html). Has
> > anyone really had such a success story? I am a little doubtful about
> > that idea.
> >
> > Someone else also implied such a use case: "Since 0.20.0, results of
> > analytic computations over the data can be materialized and served out
> > in real time in response to queries. This is a complete solution."
> >
> > Can someone share their experience with such an option?
> >
> > Thanks,
> > Sean
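
For illustration only, the three-step scenario in Sean's reply (hourly facts in Table A, a batch job that builds an aggregated Table B, queries served from Table B) might be sketched as follows. This is plain in-memory Python, not HBase or Hadoop code. The row key layout "<page>|<YYYYMMDDHH>", the sample data, and the names map_row and aggregate are all assumptions made up for the sketch and do not come from the thread.

```python
from collections import defaultdict

# Hypothetical stand-in for Table A: row key "<page>|<YYYYMMDDHH>" -> hit count.
table_a = {
    "/home|2010070314": 120,
    "/home|2010070315": 95,
    "/docs|2010070314": 40,
    "/home|2010070400": 10,
}


def map_row(row_key, hits):
    """Map step: turn one hourly fact row into a ((page, day), hits) pair."""
    page, hour = row_key.split("|")
    yield (page, hour[:8]), hits


def aggregate(table):
    """Reduce step: sum hourly hits into a daily summary (stand-in for Table B)."""
    summary = defaultdict(int)
    for row_key, hits in table.items():
        for key, value in map_row(row_key, hits):
            summary[key] += value
    # Use the same "<page>|<date>" key style for the summary rows.
    return {f"{page}|{day}": total for (page, day), total in summary.items()}


# End-user queries would read only this precomputed summary, not Table A.
table_b = aggregate(table_a)
```

The sketch only shows the data flow. It says nothing about the concern raised in the thread, namely whether running the batch step on the same cluster hurts query latency.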
