(Though if it is only 7 GB, why not just store it in memory?)

On Sunday, August 28, 2016, Dima Spivak <dspi...@cloudera.com> wrote:

> If your data can all fit on one machine, HBase is not the best choice. I
> think you'd be better off using a simpler solution for small data and
> leave HBase for use cases that require proper clusters.
>
> On Sunday, August 28, 2016, Manish Maheshwari <mylogi...@gmail.com> wrote:
>
>> We don't want to invest in another DB like Dynamo or Cassandra; we are
>> already on the Hadoop stack, and managing another DB would be a pain. The
>> reason for HBase over an RDBMS is that we call HBase via Spark Streaming
>> to look up the keys.
>>
>> Manish
>>
>> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak <dspi...@cloudera.com>
>> wrote:
>>
>> > Hey Manish,
>> >
>> > Just to ask the naive question, why use HBase if the data fits into
>> > such a small table?
>> >
>> > On Sunday, August 28, 2016, Manish Maheshwari <mylogi...@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > We have a scenario where HBase is used like a key-value database to
>> > > map keys to regions. We have over 5 million keys, but the table size
>> > > is less than 7 GB. The read volume is pretty high - about 50x the
>> > > put/delete volume. This causes hot spotting on the data node, and
>> > > the region is not split. We cannot change the maxregionsize
>> > > parameter, as that would impact other tables too.
>> > >
>> > > Our idea is to manually inspect the row key ranges, split the region
>> > > manually, and assign the resulting regions to different region
>> > > servers. We will then continue to monitor the rows in each region to
>> > > see if it needs to be split.
>> > >
>> > > Does anyone have experience doing this on HBase? Is it a recommended
>> > > approach?
>> > >
>> > > Thanks,
>> > > Manish
>> > >
>> >
>> > --
>> > -Dima
>>
>
> --
> -Dima

--
-Dima
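[Editor's note: for anyone who does go the manual route described in the thread, the split-and-move idea can be sketched in the HBase shell. The table name 'keytable', the split key, and the region/server identifiers below are placeholders for illustration, not values from the thread.]

```
# Inspect the row key ranges, pick a split point, then split the hot region.
hbase> split 'keytable', 'm'    # split the region of 'keytable' at row key 'm'

# Optionally pin a resulting region to a specific region server
# (arguments: encoded region name, then 'host,port,startcode').
hbase> move 'ENCODED_REGION_NAME', 'host1.example.com,16020,1472400000000'

# If regions are assigned by hand, consider disabling the balancer so it
# does not immediately move them back.
hbase> balance_switch false
```

Note that if the key distribution is known up front, pre-splitting at table creation time (the SPLITS option of the shell's create command) avoids having to split a hot region after the fact.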