Thank you guys very much for your responses. I want to provide a bit more information based on our evaluation of the use case.

What we are looking for:
1) the capability to serve real-time queries with low latency
2) a lot of frequent random reads
3) the capability to build indexes for better read performance
4) a distributed key-value store
5) linear scaling: we hope to serve more traffic by adding more servers
6) data replication
7) availability (24/7)

What we do not really care about:
1) the ability to serve complex arbitrary queries
2) great write throughput (this is for online serving only; no offline computation will be done on this system)
3) great sequential read performance: our queries serve different customers, so there will be a lot of small random queries rather than range scans
4) data consistency: we don't really care whether the data is fully consistent or eventually consistent, and there will not be any writes to the database during the daytime

With all of the above said, we now tend to favor MongoDB over HBase because of two main advantages that MongoDB has:

1) Indexing
MongoDB allows building multiple indexes on one collection. HBase, on the other hand, does not maintain secondary indexes itself; the application has to do it. With our use case, we do not need complex queries, but we would still like the database to maintain one or two indexes for us.

2) Read performance for lots of frequent small reads
As this is to serve an online system, quick response time is key. We feel that we do not have a great need for sequential reads, and we do not really care about write throughput since the system is only updated at night. Random reads, however, are critical.

One more minor advantage of MongoDB over HBase is the learning curve and operational maintainability. Based on some reading, it seems to us that HBase is more difficult to operate due to its configuration complexity.

As Li pointed out, some great recommendation systems have been built on top of HBase, such as the one StumbleUpon has. The main difference with our recommendation database is that it is more like a distributed key-value store that serves real-time traffic. There will be no offline processing done on this particular system. At the beginning, we plan to have only one table in the database.

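To make the access pattern concrete, here is a minimal in-memory sketch in Python of a key-value table with a primary lookup key and one secondary index that the store maintains on every write. It is only an illustration, not MongoDB or HBase code: the class name `RecommendationStore`, the field names, and the assumption that the secondary index maps a purchased product to user ids are all made up for this example.

```python
from collections import defaultdict


class RecommendationStore:
    """Toy single-table store: one primary index plus one secondary index.

    All names are hypothetical; a real deployment would delegate this
    bookkeeping to the database (or to application code on top of it).
    """

    def __init__(self):
        self._by_user = {}                   # primary index: userid -> record
        self._by_product = defaultdict(set)  # secondary index: product -> userids

    def put(self, userid, recommendations, purchased=()):
        # When a record is replaced, drop its stale secondary-index entries.
        old = self._by_user.get(userid)
        if old is not None:
            for product in old["purchased"]:
                self._by_product[product].discard(userid)
        record = {
            "userid": userid,
            "recommendations": list(recommendations),
            "purchased": list(purchased),
        }
        self._by_user[userid] = record
        for product in record["purchased"]:
            self._by_product[product].add(userid)

    def get(self, userid):
        # Random read by primary key: the hot path for online serving.
        return self._by_user.get(userid)

    def users_who_purchased(self, product):
        # Lookup through the secondary index instead of scanning every record.
        return sorted(self._by_product.get(product, ()))


# Nightly bulk load, then read-only lookups during the day.
store = RecommendationStore()
store.put(1234, ["p1", "p2", "p3"], purchased=["p9"])
store.put(2345, ["p3", "p4", "p6"], purchased=["p7", "p9"])
print(store.get(1234)["recommendations"])
print(store.users_who_purchased("p9"))
```

The point of the sketch is the extra work in `put`: every write has to keep the secondary index in step with the primary record. That bookkeeping is what we would like the database to do for us, and what the application would have to do itself on a store that does not maintain secondary indexes.
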
A simple example of what we would like to insert into the database is:

userid:1234, recommendations: p1, p2, p3
userid:2345, recommendations: p3, p4, p6

The table will be indexed on userid. A secondary index based on purchase history will also be built.

The reason we are considering a NoSQL solution is that we do not really need a relational database. What we really need is a low-latency, distributed key-value store with some indexing ability. So we decided to dive into the NoSQL world.

As we are fairly new to the community, we very much appreciate the information that you guys provided.

~bwing

On Fri, Sep 23, 2011 at 10:02 AM, Jean-Daniel Cryans <[email protected]> wrote:
> I think it really depends on how "small" the e-commerce website is and
> what "scale" you think you'll reach. Do you really need to learn about
> a new DB or would MySQL fit your needs for some time? As much as I
> like pushing HBase as a solution, I've witnessed enough people trying
> to learn it while building their product because they thought they
> needed the scale, whereas that time could have been better spent on
> the product itself.
>
> My free advice would be to just design your system in a way that you
> can easily swap-in any non-relational DB.
>
> J-D
>
> On Thu, Sep 22, 2011 at 10:55 PM, bwing <[email protected]> wrote:
> > We are searching for a Non-SQL solution to build a system behind a
> > small e-commerce web site that pushes recommendations to our end
> > users. The recommendations will be computed elsewhere and loaded up
> > to the system on a daily basis. We are comparing with MongoDB and
> > HBase as two alternative solutions. We understand that MongoDB is good
> > for random small reads and writes where each read better covers only a
> > few records. On the other hand, HBase is good at reading a large
> > number of records with few disk-seeks.
> >
> > As it is a recommendation system for a e-commerce web site, a quick
> > response time is critical. We probably only have a few records to
> > recommend at the beginning, but we would like the system to be able to
> > scale. With all these said, we felt that HBase might be a good way to
> > go. But we would like to hear opinions from HBase user groups.
> >
> > We would love to know if any one had similar use cases and what are
> > the experiences. Any suggestions are appreciated.
> >
> > ~bwing
> >
>
