Please! There are lots of blogs etc. about the two, but very few head-to-head comparisons for a real use case.

----- Original Message -----
| From: "anil gupta" <[email protected]>
| To: "[email protected]" <[email protected]>
| Sent: Wednesday, November 28, 2012 11:01:55 AM
| Subject: Re: Best practice for storage of data that changes
|
| Hi Jeff,
|
| At my workplace "Intuit", we did some detailed study to evaluate
| HBase and Cassandra for our use case. I will see if I can post the
| comparative study on my public blog or on this mailing list.
|
| BTW, what is your use case? What bottleneck are you hitting with your
| current solutions? If you can share some details, then the HBase
| community will try to help you out.
|
| Thanks,
| Anil Gupta
|
| On Wed, Nov 28, 2012 at 9:55 AM, jeff l <[email protected]> wrote:
|
| | Hi,
| |
| | I have quite a bit of experience with RDBMSs (Oracle, Postgres,
| | MySQL) and MongoDB but don't feel any are quite right for this
| | problem. The amount of data being stored and access requirements
| | just don't match up well.
| |
| | I was hoping to keep the stack as simple as possible and just use
| | HDFS, but everything I was seeing kept pointing to the need for some
| | other datastore. I'll check out both HBase and Cassandra.
| |
| | Thanks for the feedback.
| |
| | On Sun, Nov 25, 2012 at 1:11 PM, anil gupta <[email protected]>
| | wrote:
| |
| | | Hi Jeff,
| | |
| | | My two cents below:
| | |
| | | 1st use case: Append-only data - e.g. weblogs or user logins
| | |
| | | As others have already mentioned, Hadoop is suitable enough to
| | | store append-only data. If you want to do analysis of weblogs or
| | | user logins, then Hadoop is a suitable solution for it.
| | |
| | | 2nd use case: Account/User data
| | |
| | | First of all, I would suggest you have a look at your use case
| | | and then analyze whether it really needs a NoSQL solution or not.
| | |
| | | You were talking about maintaining user data in NoSQL. Why NoSQL
| | | instead of an RDBMS? What is the size of the data? Which NoSQL
| | | features are the selling points for you?
| | |
| | | For real-time reads and writes you can have a look at Cassandra
| | | or HBase. But I would suggest you take a very close look at both
| | | of them, because each has its own advantages. So the choice will
| | | depend on your use case.
| | |
| | | One added advantage of HBase is that it has a deeper integration
| | | with the Hadoop ecosystem, so you can do a lot of stuff on HBase
| | | data using Hadoop tools. HBase has integration with Hive
| | | querying, but AFAIK it has some limitations.
| | |
| | | HTH,
| | | Anil Gupta
| | |
| | | On Sun, Nov 25, 2012 at 4:52 AM, Mahesh Balija
| | | <[email protected]> wrote:
| | |
| | | | Hi Jeff,
| | | |
| | | | As the HDFS paradigm is "write once, read many", you cannot
| | | | update the files on HDFS.
| | | |
| | | | But for your problem, what you can do is keep the
| | | | logs/user data in HDFS with different timestamps.
| | | |
| | | | Run some MapReduce jobs at certain intervals to extract the
| | | | required data from those logs and put it into
| | | | HBase/Cassandra/MongoDB.
| | | |
| | | | MongoDB read performance is quite fast, and it also supports
| | | | ad-hoc querying. You can also use the Hadoop-MongoDB connector
| | | | to read/write the data to MongoDB through Hadoop MapReduce.
| | | |
| | | | If you are very specific about updating the HDFS files
| | | | directly, then you have to use a commercial Hadoop package
| | | | like MapR, which supports updating the HDFS files.
| | | |
| | | | Best,
| | | | Mahesh Balija,
| | | | Calsoft Labs.
| | | |
| | | | On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada
| | | | <[email protected]> wrote:
| | | |
| | | | | Hi Jeff,
| | | | |
| | | | | Please look at [1]. You can store your data in HBase tables
| | | | | and query them normally just by mapping them to Hive tables.
| | | | | Regarding Cassandra support, please follow JIRA [2]; it's not
| | | | | yet in the trunk, I suppose!
| | | | |
| | | | | [1] https://cwiki.apache.org/Hive/hbaseintegration.html
| | | | |
| | | | | [2] https://issues.apache.org/jira/browse/HIVE-1434
| | | | |
| | | | | Thanks,
| | | | |
| | | | | On Sun, Nov 25, 2012 at 2:26 AM, jeff l <[email protected]>
| | | | | wrote:
| | | | |
| | | | | | Hi All,
| | | | | |
| | | | | | I'm coming from the RDBMS world and am looking at HDFS for
| | | | | | long-term data storage and analysis.
| | | | | |
| | | | | | I've done some research and set up some smallish HDFS
| | | | | | clusters with Hive for testing, but I'm having a little
| | | | | | trouble understanding how everything fits together and was
| | | | | | hoping someone could point me in the right direction.
| | | | | |
| | | | | | I'm looking at storing two types of data:
| | | | | |
| | | | | | 1. Append-only data - e.g. weblogs or user logins
| | | | | |
| | | | | | 2. Account/User data
| | | | | |
| | | | | | HDFS seems to be perfect for append-only data like #1, but
| | | | | | I'm having trouble figuring out what to do with data that
| | | | | | may change frequently.
| | | | | |
| | | | | | A simple example would be user data where various bits of
| | | | | | information (email, etc.) may change from day to day. Would
| | | | | | HBase or Cassandra be the better way to go for this type of
| | | | | | data, and can I overlay Hive over all (HDFS, HBase,
| | | | | | Cassandra) so that I can query the data through a single
| | | | | | interface?
| | | | | |
| | | | | | Thanks in advance for any help.
| | | | |
| | | | | --
| | | | | Regards,
| | | | | Bharath .V
| | | | | w: http://researchweb.iiit.ac.in/~bharath.v
| | |
| | | --
| | | Thanks & Regards,
| | | Anil Gupta
|
| --
| Thanks & Regards,
| Anil Gupta
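
For what it's worth, the batch approach Mahesh describes in the quoted thread (append timestamped updates to HDFS, then periodically extract the current state into a serving store) boils down to a "latest value wins" fold per user. Here is a minimal Python sketch of that idea. The record shape (user_id, timestamp, changed fields), the function name `latest_per_user`, and the sample field names are all my own assumptions and not from the thread; a real job would presumably run this logic as the reduce step of a MapReduce job keyed by user rather than in memory.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (user_id, timestamp, fields changed at that time).
Record = Tuple[str, int, Dict[str, str]]


def latest_per_user(records: Iterable[Record]) -> Dict[str, Dict[str, str]]:
    """Fold append-only, timestamped updates into the current state per user."""
    current: Dict[str, Dict[str, str]] = {}
    # Apply updates oldest-first so that newer field values overwrite older ones.
    for user_id, _ts, fields in sorted(records, key=lambda r: r[1]):
        current.setdefault(user_id, {}).update(fields)
    return current


if __name__ == "__main__":
    # Illustrative data only: two updates for u1, one for u2.
    updates = [
        ("u1", 200, {"email": "ann@new.example.com"}),
        ("u1", 100, {"email": "ann@old.example.com", "name": "Ann"}),
        ("u2", 105, {"email": "bob@example.com"}),
    ]
    for user, state in sorted(latest_per_user(updates).items()):
        print(user, state)
```

The append-only history stays untouched in HDFS, and only the folded result would be written to HBase, Cassandra, or MongoDB for real-time reads.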
