I have not looked at it myself, but you can run HBase as a long-running process on YARN (Apache Slider). As far as I understand, you can have an instance of any size, with the flexibility to grow and shrink it.

Artem Ervits
Data Analyst
New York Presbyterian Hospital

----- Original Message -----
From: Arun Allamsetty [mailto:[email protected]]
Sent: Monday, July 07, 2014 08:37 PM
To: [email protected] <[email protected]>
Subject: Re: Using HBase in standalone mode in production

I have never tried MySQL's blob or varbinary. I guess I can look into that.
Thanks for answering my questions.

Arun

On Jul 7, 2014 6:22 PM, "Dima Spivak" <[email protected]> wrote:

> Does MySQL's BLOB or VARBINARY satisfy your use case?
>
> As for converting a pseudo-distributed cluster to a distributed one, unless
> I'm mistaken, you should have no problem doing so. HDFS is quite good with
> scaling, whether it's from 10 machines to 20 or 1 to 10 and I don't know of
> any reason that HBase would cause any problems in this regard.
>
> -Dima
>
> On Mon, Jul 7, 2014 at 5:09 PM, Arun Allamsetty <[email protected]> wrote:
>
> > I understand. But for example, my use case is where even if I don't have
> > a lot of data, what if I would rather store serialized objects. For this
> > traditional RDBMS are not suitable. If I can forego the fail safe
> > capabilities, then what is a good choice (if not HBase).
> >
> > Also, on a different note, if I have a HBase installation in pseudo
> > distributed mode, then can I convert it into a distributed setup by
> > adding more machines without any loss in data?
> >
> > Thanks,
> > Arun
> > On Jul 7, 2014 6:02 PM, "Dima Spivak" <[email protected]> wrote:
> >
> > > In general, production systems run in distributed mode because they
> > > leverage HBase's scalability and reliability; HBase really only shows
> > > its worth when it's charged with managing terabytes of data on a
> > > fault-tolerant file system like HDFS. You lose both of these when you
> > > run in standalone mode, so I'd be a bit worried about using such a
> > > setup for any production use.
> > >
> > > -Dima
> > >
> > > On Mon, Jul 7, 2014 at 4:25 PM, Arun Allamsetty <[email protected]> wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > I have. So the book says there are two types of distributed modes.
> > > > One is pseudo distributed, which is used when we want to test
> > > > HBase's distributed capabilities using a single machine. As far as
> > > > I understood, this is just to verify the use cases and the
> > > > requirements. Then we have the fully distributed mode in which
> > > > HBase can be installed over multiple machines.
> > > >
> > > > I understand both the scenarios. But what if my application is not
> > > > large enough to leverage the distributed mode and the pseudo
> > > > distributed mode is pretty much for a PoC. Since the pseudo
> > > > distributed mode won't be able to provide any fault tolerance, can
> > > > one use the standalone mode in production.
> > > >
> > > > I hope my question is clear even if it does not make much sense.
> > > >
> > > > Thanks,
> > > > Arun
> > > > On Jul 7, 2014 5:17 PM, "Ted Yu" <[email protected]> wrote:
> > > >
> > > > > Have you read http://hbase.apache.org/book.html#standalone_dist ?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Mon, Jul 7, 2014 at 3:55 PM, Arun Allamsetty <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > So this question might be stupid, retarded even, but it has
> > > > > > been bugging me for a while and I cannot think of a better
> > > > > > place to ask this. I am really impressed with the way HBase
> > > > > > works (as a key-value store). Since it stores everything as a
> > > > > > byte array, I find it really convenient to store serialized
> > > > > > objects. Also, I understand that HBase is supposed to be used
> > > > > > when you have too much data to be handled by a single machine,
> > > > > > so we can scale our application by running it in distributed
> > > > > > mode.
> > > > > >
> > > > > > But what if I want to use it because its HashMap kind of
> > > > > > capabilities with an added feature to track versions. Is it
> > > > > > recommended that I use it for a small application (in
> > > > > > standalone mode) with maybe 100K users and storage needs which
> > > > > > probably won't exceed 100G.
> > > > > >
> > > > > > I know it is never recommended to be used as a transactional
> > > > > > database (I have read that in a million places) but I would
> > > > > > like to know more about it.
> > > > > >
> > > > > > Thanks,
> > > > > > Arun

________________________________

This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
