Hi Jeremy,

Pinterest is using it for their feeds:

http://www.slideshare.net/cloudera/case-studies-session-3a
http://www.slideshare.net/cloudera/operations-session-1
Not sure on their dataset size; they are doing cluster-level replication for DR. We based our architecture on their success: a cluster in each AZ, multi-master replication between them for DR, and Flume & our APIs watching ZooKeeper znodes for which cluster to talk to. Clients talk to one cluster at a time, and we control the flips between them for maintenance/DR. Our use case is retrieving social data ingested from Twitter/FB/etc. when customer-facing applications hit our social API.

In terms of team size there are many variables:

- If you are running your own metal there would be more work around networking, rack & stack, cabling, provisioning the OS, etc., unless another dept already provides this.
- Do you have an HBase expert or DBA in house already? Or are your developers going to take on learning schema design and tuning the cluster?
- Do you have sysadmins/devops available to write Puppet/Chef/Ansible for provisioning this cluster (and dev/QA environments) and performing upgrades, etc., moving forward?
- Do you have a NOC & monitoring already in place for other pieces of infra that will take on monitoring cluster health and responding to alerts, failed disks, regionservers, etc.?

You may want to check out previous HBaseCon and Hadoop Summit videos; lots of presentations talk about, or at least mention, their dataset size and use case:

- https://www.youtube.com/user/HadoopSummit
- http://hbasecon.com/archive.html

All the best,

--
Iain Wright
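In case it helps, here is a minimal, ZooKeeper-free sketch of the flip logic described above. The znode path, cluster names, and addresses are made up for illustration; in a real deployment the callback would be registered as a data watch on the coordination znode (e.g. kazoo's DataWatch in Python, or Curator in Java), so that flipping the znode's contents redirects Flume and the API servers without a redeploy.

```python
# Sketch of the "watch a znode to pick the active cluster" pattern.
# All names below (cluster keys, quorum addresses) are illustrative,
# not from a real deployment.

class ActiveClusterSelector:
    """Tracks which HBase cluster clients should currently talk to.

    Every client (Flume agent, API server) holds one of these and wires
    on_znode_change() to a ZooKeeper data watch on a shared znode.
    Operators flip the znode's contents for maintenance or DR.
    """

    def __init__(self, clusters):
        # Map of cluster name -> ZK quorum address for that cluster.
        self.clusters = clusters
        self.active = None

    def on_znode_change(self, data):
        """Callback fired when the coordination znode's data changes."""
        name = data.decode("utf-8").strip()
        if name not in self.clusters:
            raise ValueError("unknown cluster: %s" % name)
        self.active = name

    def connection_string(self):
        """Quorum address clients should use right now."""
        return self.clusters[self.active]


selector = ActiveClusterSelector({"az1": "hbase-az1:2181",
                                  "az2": "hbase-az2:2181"})
selector.on_znode_change(b"az1")     # initial contents of the znode
selector.on_znode_change(b"az2")     # operator flips for maintenance/DR
print(selector.connection_string())  # -> hbase-az2:2181
```

The point of keeping the flip in one znode is that it's atomic from the clients' perspective: everyone sees the same active cluster, and nothing ever writes to both at once.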
On Fri, Dec 5, 2014 at 1:37 PM, jeremy p <[email protected]> wrote:

> Hey all,
>
> So, I'm currently evaluating HBase as a solution for querying a very large
> data set (think 60+ TB). We'd like to use it to directly power a
> customer-facing product. My question is fourfold:
>
> 1) What companies use HBase to serve a customer-facing product? I'm not
> interested in evaluations, experiments, or POCs. I'm also not interested in
> offline BI or analytics. I'm specifically interested in cases where HBase
> serves as the data store for a customer-facing product.
>
> 2) Of the companies that use HBase to serve a customer-facing product,
> which ones use it to query data sets of 60 TB or more?
>
> 3) Of the companies that use HBase to query 60+ TB data sets and serve a
> customer-facing product, how many employees are required to support their
> HBase installation? In other words, if I were to start a team tomorrow,
> and their purpose was to maintain a 60+ TB HBase installation for a
> customer-facing product, how many people should I hire?
>
> 4) Of the companies that use HBase to query 60+ TB data sets and serve a
> customer-facing product, what kind of measures do they take for disaster
> recovery?
>
> If you can, please point me to articles, videos, and other materials.
> Obviously, the larger the company, the better case it will make for HBase.
>
> Thank you!
