Hi,

first: I think HBase is what you are looking for. If I understand correctly, you want to show each customer his or her data very quickly and let them manipulate it, so you need something like a data warehouse system. HBase is the tool of choice for that (and for your kind of data I think HBase is a better choice than Cassandra or MongoDB). But of course you need a running Hadoop system underneath to run HBase (in a distributed setup it stores its data in Hadoop's HDFS), so it's not an either/or ;)
(My answers are about HBase, since I think it's what you are looking for. If you are not interested, just ignore the following text. Sorry to everyone for writing about HBase on this list ;).)

On 01.10.2014 at 17:24, mani kandan wrote:
> 1) How much web usage data will a typical website like ours collect on a
> daily basis? (I know I can ask our IT department, but I would like to
> gather some background idea before talking to them.)

Well, if you have the option to ask your IT department, you should do that, because everyone here would have to guess, and you would have to explain in great detail what you need before we could even guess. If, for example, you want to track what each user clicks on, perhaps to show personalized ads, then you have to store much more data. So ask the people who have the data right away instead of guessing.

> 3) How many clusters/nodes would I need to run a web usage analytics
> system?

The book "HBase in Action" has recommendations for some "case studies" (Part IV, "Deploying HBase"). It discusses the number of nodes and how to use them, depending on the size of your data.

> 4) What are the ways for me to use our data? (One use case I'm thinking
> of is to analyze the error messages log for each page on quote process
> to redesign the UI. Is this possible?)

Sure, and this should be very easy. I would pump the error log into an HBase table. That way you could read the messages directly from the HBase shell (if there are few enough of them). Or you could use Hive to query your log in a more "SQL-like" way and compute statistics very easily.

> 5) How long would it take for me to set up and start such a system?

For a novice doing it for the first time: perhaps 2 hours for a standalone HBase system, perhaps a day for a complete distributed test cluster, and a little longer for the real production system with all the security features ;).

> I'm sorry if some/all of these questions are unanswerable.
> I just want to discuss my thoughts, and get an idea of what things I can
> achieve by going the way of Hadoop.

Well, I think (but I could be wrong) that you picture Hadoop (or HBase) as something where you just switch the "database backend" from "SQL" to "HBase/Hadoop" and everything runs right away. It won't be that easy. You would have to change the code of your web application in a very fundamental way and rethink all your table designs etc., so this could be more complicated than you think right now.

However, HBase/Hadoop has some advantages which are very interesting for you. First, it is distributed, so you can scale almost without limit by adding machines as your company grows, or collect more data about your customers to get more information (and sell more stuff). And MapReduce is a wonderful tool for computing really fancy statistics, which is very interesting for an insurance company. Your mathematical economists will REALLY love it ;).

Hope this helped.

Best wishes
Wilm
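To make answer 4 (and the MapReduce remark) a bit more concrete, here is a minimal Python sketch that runs locally without Hadoop. Everything in it is an assumption for illustration: the tab-separated log format, the sample lines, and the helper names `row_key`, `mapper`, `reducer` and `errors_per_page` are hypothetical. It shows two ideas: an HBase-style row key of page name plus reversed timestamp, so that one page's errors sit together with the newest first (HBase keeps rows sorted by their key bytes), and a MapReduce-style count of errors per page, where a plain `sorted()` stands in for Hadoop's shuffle/sort step.

```python
"""Hypothetical sketch: error-log lines -> HBase-style row keys + MapReduce-style counts."""
from itertools import groupby
from operator import itemgetter

# Assumed (hypothetical) log format: "<unix_ts>\t<page>\t<message>"
SAMPLE_LOG = [
    "1412170000\tquote/step1\tZIP code invalid",
    "1412170042\tquote/step2\tDate of birth missing",
    "1412170100\tquote/step1\tZIP code invalid",
    "1412170200\tquote/step3\tSession timed out",
]

MAX_TS = 2**31 - 1  # subtracting from this reverses timestamps (newest sorts first)


def row_key(page: str, ts: int) -> bytes:
    """Page prefix groups one page's errors together; the zero-padded,
    reversed timestamp makes byte order match newest-first order."""
    return f"{page}|{MAX_TS - ts:010d}".encode()


def parse(line: str):
    ts, page, message = line.split("\t", 2)
    return int(ts), page, message


def mapper(lines):
    """Map phase: emit (page, 1) for every error line."""
    for line in lines:
        _, page, _ = parse(line)
        yield page, 1


def reducer(sorted_pairs):
    """Reduce phase: input must be sorted by key (Hadoop's shuffle does this)."""
    for page, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield page, sum(count for _, count in group)


def errors_per_page(lines):
    shuffled = sorted(mapper(lines))  # local stand-in for the shuffle/sort step
    return dict(reducer(shuffled))


if __name__ == "__main__":
    for line in SAMPLE_LOG:
        ts, page, msg = parse(line)
        print(row_key(page, ts), msg)
    print(errors_per_page(SAMPLE_LOG))
```

In a real setup you would write the rows with the HBase shell (`put 'table', 'rowkey', 'family:qualifier', 'value'`) or a client library, and run the counting either as a Hadoop Streaming job (mapper and reducer then read tab-separated lines from stdin) or simply as a Hive `GROUP BY` query over the log.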
