Hi, I have quite a bit of experience with RDBMSs (Oracle, Postgres, MySQL) and MongoDB, but I don't feel any of them is quite right for this problem. The amount of data being stored and the access requirements just don't match up well.
I was hoping to keep the stack as simple as possible and just use HDFS, but
everything I was seeing kept pointing to the need for some other datastore.
I'll check out both HBase and Cassandra.

Thanks for the feedback.

On Sun, Nov 25, 2012 at 1:11 PM, anil gupta <[email protected]> wrote:

> Hi Jeff,
>
> My two cents below:
>
> 1st use case: Append-only data - e.g. weblogs or user logins
> As others have already mentioned, Hadoop is well suited to storing
> append-only data. If you want to analyze weblogs or user logins, then
> Hadoop is a suitable solution.
>
>
> 2nd use case: Account/User data
> First of all, I would suggest you look at your use case and analyze
> whether it really needs a NoSQL solution or not.
> You were talking about maintaining user data in NoSQL. Why NoSQL
> instead of an RDBMS? What is the size of the data? Which NoSQL features
> are the selling points for you?
>
> For real-time reads and writes you can have a look at Cassandra or HBase.
> But I would suggest you take a very close look at both of them, because
> each has its own advantages. So the choice will depend on your use case.
>
> One added advantage of HBase is that it has deeper integration with the
> Hadoop ecosystem, so you can do a lot with HBase data using Hadoop
> tools. HBase has integration with Hive querying, but AFAIK it has some
> limitations.
>
> HTH,
> Anil Gupta
>
>
> On Sun, Nov 25, 2012 at 4:52 AM, Mahesh Balija <[email protected]
> > wrote:
>
>> Hi Jeff,
>>
>> As the HDFS paradigm is "write once, read many", you cannot update
>> files on HDFS.
>> But for your problem, what you can do is keep the logs/user data in
>> HDFS with different timestamps.
>> Run some MapReduce jobs at certain intervals to extract the required
>> data from those logs and put it into HBase/Cassandra/MongoDB.
>>
>> MongoDB read performance is quite fast, and it also supports ad-hoc
>> querying.
>> You can also use the Hadoop-MongoDB connector to read/write data to
>> MongoDB through Hadoop MapReduce.
>>
>> If you specifically need to update the HDFS files directly, then you
>> have to use a commercial Hadoop package like MapR, which supports
>> updating the HDFS files.
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>>
>>
>> On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada <
>> [email protected]> wrote:
>>
>>> Hi Jeff,
>>>
>>> Please look at [1]. You can store your data in HBase tables and query
>>> them normally just by mapping them to Hive tables. Regarding Cassandra
>>> support, please follow JIRA [2]; it's not yet in the trunk, I suppose!
>>>
>>> [1] https://cwiki.apache.org/Hive/hbaseintegration.html
>>> [2] https://issues.apache.org/jira/browse/HIVE-1434
>>>
>>> Thanks,
>>>
>>>
>>> On Sun, Nov 25, 2012 at 2:26 AM, jeff l <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm coming from the RDBMS world and am looking at HDFS for long-term
>>>> data storage and analysis.
>>>>
>>>> I've done some research and set up some smallish HDFS clusters with
>>>> Hive for testing, but I'm having a little trouble understanding how
>>>> everything fits together and was hoping someone could point me in the
>>>> right direction.
>>>>
>>>> I'm looking at storing two types of data:
>>>>
>>>> 1. Append-only data - e.g. weblogs or user logins
>>>> 2. Account/User data
>>>>
>>>> HDFS seems to be perfect for append-only data like #1, but I'm having
>>>> trouble figuring out what to do with data that may change frequently.
>>>>
>>>> A simple example would be user data where various bits of information
>>>> (email, etc.) may change from day to day. Would HBase or Cassandra be
>>>> the better way to go for this type of data, and can I overlay Hive over
>>>> all of them (HDFS, HBase, Cassandra) so that I can query the data
>>>> through a single interface?
>>>>
>>>> Thanks in advance for any help.
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Bharath .V
>>> w:http://researchweb.iiit.ac.in/~bharath.v
>>>
>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
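
As a rough illustration of Mahesh's suggestion quoted above (keep timestamped user records in append-only storage, then periodically extract the data you need into a serving store), here is a minimal sketch in plain Python. It is not real Hadoop MapReduce code. The record layout (user_id, timestamp, email) and all function names are made up for illustration, and the shuffle step only imitates in memory what the framework would do across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout, not taken from the thread:
# (user_id, timestamp, email)
Record = Tuple[str, int, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Record]]:
    """Emit each record keyed by user_id, as a mapper would."""
    for rec in records:
        yield rec[0], rec


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Group values by key; in-memory stand-in for the framework's shuffle."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_latest(groups: Dict[str, List[Record]]) -> Dict[str, Record]:
    """For each user, keep only the record with the largest timestamp."""
    return {uid: max(recs, key=lambda r: r[1]) for uid, recs in groups.items()}


def current_state(records: Iterable[Record]) -> Dict[str, Record]:
    """Collapse an append-only log into one current record per user."""
    return reduce_latest(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Append-only log: u1 changes email between two snapshots.
    log = [
        ("u1", 100, "a@old.example"),
        ("u2", 105, "b@example.org"),
        ("u1", 200, "a@new.example"),
    ]
    for user_id, rec in sorted(current_state(log).items()):
        print(user_id, rec)
```

The output of `current_state` is the kind of per-user snapshot a periodic job could then load into HBase, Cassandra, or MongoDB for real-time reads, while the full history stays in the append-only files.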
