Why hbase need manual split?

2014-08-06 Thread Liu, Ming (HPIT-GADSC)
Hi all, as I understand it, HBase will automatically split a region when the region grows too big. So in what scenario does a user need to do a manual split? Could someone kindly give me some examples where a user needs to split a region explicitly via the HBase Shell or Java API? Thanks very much.

Re: Why hbase need manual split?

2014-08-06 Thread Arun Allamsetty
Hi Ming, The reason we have it is that the user can decide where each key goes. I can think of multiple scenarios off the top of my head where it would be useful, and others can correct me if I am wrong. 1. Cases where you cannot have row keys that are equally lexically distributed, leading

Re: Why hbase need manual split?

2014-08-06 Thread john guthrie
I had a customer with a sequence-based key (yes, he knew all the downsides of that). Being able to split manually meant he could split a region that got too big at the end instead of right down the middle. With a sequentially increasing key, splitting the region in half left one region half the desired
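
A minimal sketch of such a manual split through the 0.94/0.98-era Java admin API (the table name and split point below are invented for illustration; the HBase shell's split command takes the same arguments):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplitExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Split near the hot end of a sequentially increasing key range
            // instead of letting HBase cut the region right down the middle.
            admin.split(Bytes.toBytes("events"), Bytes.toBytes("0000099000"));
        } finally {
            admin.close();
        }
    }
}
```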

RE: Why hbase need manual split?

2014-08-06 Thread Liu, Ming (HPIT-GADSC)
Thanks Arun and John, both of your scenarios make a lot of sense to me. But for the sequence-based key case, I am still confused. It is like an append-only operation, so new data is always written into the same region, but that region will eventually reach hbase.hregion.max.filesize and

Re: Why hbase need manual split?

2014-08-06 Thread john guthrie
To be honest, we were doing manual splits mainly because we wanted to make sure it was done on our schedule. But it also occurred to me that automatic splits, at least by default, split the region in half. Normally the idea is that both new halves continue to grow, but with a

RE: Why hbase need manual split?

2014-08-06 Thread Liu, Ming (HPIT-GADSC)
Thanks John, this is a very good answer; now I understand why you use manual splits. And I had a typo in my previous post: the C is very close to A, not to B-A/2. So every split in the middle of the key range will result in a big region and a small region, which is very bad. So HBase only does auto

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Hi Ted, I have now finished reading the filtering section and the source code of TestJoinedScanners (0.94). Facts learned: - While scanning, an entire row will be read even for rowkey filtering. (Since a rowkey is not a physically separate entity and is stored in the KeyValue object, this is natural. Am I

Problem starting HBase0.98/Hadoop2 minicluster : Metrics source RetryCache/NameNodeRetryCache already exists!

2014-08-06 Thread anil gupta
Hi all, I am trying to run a JUnit test for SortingCoprocessor (HBASE-7474) in HBase 0.98. I am getting this error: 14/08/06 07:06:09 ERROR namenode.FSNamesystem: FSNamesystem initialization failed. org.apache.hadoop.metrics2.MetricsException: Metrics source RetryCache/NameNodeRetryCache already
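
One workaround seen in projects that start HDFS and HBase miniclusters in the same JVM is to switch the Hadoop metrics system into mini-cluster mode before the cluster starts, so duplicate metrics-source registrations are tolerated. A hedged sketch (the test class name is invented; verify the call against your Hadoop 2 version):

```java
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

public class MiniClusterSetupExample {
    private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

    public static void startCluster() throws Exception {
        // Tolerate duplicate metrics sources such as RetryCache/NameNodeRetryCache
        // when the NameNode is (re)started inside a single test JVM.
        DefaultMetricsSystem.setMiniClusterMode(true);
        TEST_UTIL.startMiniCluster(1);
    }

    public static void stopCluster() throws Exception {
        TEST_UTIL.shutdownMiniCluster();
    }
}
```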

Re: hbase memstore size

2014-08-06 Thread yonghu
I did not quite understand your problem. You store your data in HBase, and I guess you will also read data from it later. Generally, HBase will first check whether the data exists in the memstore; if not, it will check the disk. If you set the memstore to 0, it means every read will be directly forwarded to

Re: hbase memstore size

2014-08-06 Thread Ted Yu
bq. HBase will first check if the data exist in memstore, if not, it will check the disk. For the read path, don't forget the block cache / bucket cache. Cheers
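
To make the read path concrete, a small sketch of a single Get against this stack (table and row names are invented; note that setCacheBlocks only controls whether the blocks read for this request get cached, it does not bypass the block cache):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadPathExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");
        try {
            Get get = new Get(Bytes.toBytes("row-0001"));
            // The region server consults the memstore and the block/bucket cache
            // before reading HFile blocks from disk for this row.
            get.setCacheBlocks(false); // do not pollute the cache with this read
            Result result = table.get(get);
            System.out.println(result);
        } finally {
            table.close();
        }
    }
}
```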

Re: hbase attack scenarios?

2014-08-06 Thread Andrew Purtell
We have no known vulnerabilities that equate to a SQL injection attack vulnerability. However, as Esteban says, you'd want to treat HBase like any other datastore underpinning a production service and, out of an abundance of caution, deploy it into a secure enclave behind an internal service API, so

RE: Why hbase need manual split?

2014-08-06 Thread Rendon, Carlos (KBB)
You are just starting up a service and want the load split between multiple region servers from the start, instead of waiting to split manually later. Say you have 5 region servers; one way to create your table via the HBase shell is like this: create 'tablename', 'f', {NUMREGIONS => 5, SPLITALGO =>
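
The Java-API equivalent of that pre-split create, as a sketch (the four split keys are invented; they play the role of NUMREGIONS/SPLITALGO by carving the key space into five regions up front):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitCreateExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("tablename"));
            desc.addFamily(new HColumnDescriptor("f"));
            // Four split keys -> five regions, roughly one per region server.
            byte[][] splitKeys = new byte[][] {
                Bytes.toBytes("2"), Bytes.toBytes("4"),
                Bytes.toBytes("6"), Bytes.toBytes("8")
            };
            admin.createTable(desc, splitKeys);
        } finally {
            admin.close();
        }
    }
}
```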

Re: Question on the number of column families

2014-08-06 Thread Qiang Tian
Hi, the description of HBASE-5416 states why it was introduced: if you only have 1 CF, a dummy CF does not help. It is helpful for the multi-CF case, e.g. putting frequently accessed columns in one column family and non-frequently accessed ones in another. bq. Field name will be included in rowkey. Please read chapter 9

Re: Question on the number of column families

2014-08-06 Thread Ted Yu
bq. While scanning, an entire row will be read even for a rowkey filtering. If you specify an essential column family in your filter, the above would not be true - only the essential column family would be loaded into memory first. Once the filter passes, the other family would be loaded. Cheers
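
A sketch of what that looks like on the client side, assuming a small cf_meta family carries the filtered column and a large cf_data family holds the bulk of the row (both family names are invented):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class EssentialFamilyScanExample {
    public static Scan buildScan() {
        // SingleColumnValueFilter marks cf_meta as its essential family, so the
        // joined scanner reads cf_meta first and loads cf_data only for rows
        // that pass the filter.
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            Bytes.toBytes("cf_meta"), Bytes.toBytes("flag"),
            CompareOp.EQUAL, Bytes.toBytes("Y"));
        filter.setFilterIfMissing(true);

        Scan scan = new Scan();
        scan.setFilter(filter);
        scan.setLoadColumnFamiliesOnDemand(true); // enable HBASE-5416 behaviour
        return scan;
    }
}
```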

Re: hbase attack scenarios?

2014-08-06 Thread Wilm Schumacher
On 2014-08-06 at 19:07, Andrew Purtell wrote: We have no known vulnerabilities that equate to a SQL injection attack vulnerability. However, as Esteban says you'd want to treat HBase like any other datastore underpinning a production service and out of an abundance of caution deploy it into

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Thank you Ted. But the RowFilter class has no method that can be used to set which column family is essential. (Actually, no built-in filter class provides such a method.) So, if I (ever) want to apply the 'dummy' column family technique(?), it seems that I must do as follows: - Write my own filter
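
A minimal sketch of that "write my own filter" idea: a hypothetical row-key regex filter that declares only the small dummy family as essential (the class name, family name, and method bodies are invented; a real custom filter also needs protobuf serialization via toByteArray/parseFrom and must be deployed on the region servers):

```java
import java.util.regex.Pattern;

import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyRegexDummyCfFilter extends FilterBase {
    private static final byte[] DUMMY_CF = Bytes.toBytes("d");
    private final Pattern pattern;

    public RowKeyRegexDummyCfFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean filterRowKey(byte[] buffer, int offset, int length) {
        // Returning true drops the whole row before its cells are scanned.
        String rowKey = Bytes.toString(buffer, offset, length);
        return !pattern.matcher(rowKey).matches();
    }

    @Override
    public boolean isFamilyEssential(byte[] name) {
        // Only the dummy family has to be read to evaluate the row key,
        // so the joined scanner can skip the large families for non-matches.
        return Bytes.equals(name, DUMMY_CF);
    }
}
```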

Re: What is in a HBase block index entry?

2014-08-06 Thread Anoop John
It will be the key of the KeyValue. The key includes rk + cf + qualifier + ts + type, so all of these are part of the key. Your answer #1 is correct (but with the addition of the type as well). Hope this makes it clear for you. -Anoop-
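
For illustration, a sketch that builds a single KeyValue and prints its key portion, which contains exactly those parts (row key, family, qualifier, timestamp, and type; the values are invented):

```java
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueKeyExample {
    public static void main(String[] args) {
        KeyValue kv = new KeyValue(
            Bytes.toBytes("row-0001"),   // rk
            Bytes.toBytes("cf"),         // cf
            Bytes.toBytes("q"),          // qualifier
            1407312000000L,              // ts
            KeyValue.Type.Put,           // type
            Bytes.toBytes("value"));
        // getKey() returns only the key portion, i.e. what a block index entry holds.
        System.out.println(Bytes.toStringBinary(kv.getKey()));
    }
}
```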

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Hi Qiang, thank you for your help. 1. Regarding HBASE-5416, I think its purpose is simple: avoid loading column families that are irrelevant to filtering while scanning. So, it can be applied to my 'dummy CF' case. That is, a dummy CF can act like a 'relevant' CF for filtering, provided that

Re: Question on the number of column families

2014-08-06 Thread Ted Yu
bq. no built-in filter intelligently determines which column family is essential, except for SingleColumnValueFilter. Mostly right - don't forget about SingleColumnValueExcludeFilter, which extends SingleColumnValueFilter. Cheers

RE: What is in a HBase block index entry?

2014-08-06 Thread innowireless TaeYun Kim
Thank you Anoop. Though it's a bit strange to include the CF in the index, since the whole block index is contained in an HFile for a specific CF, I'm sure there is a good reason (maybe for the performance of the comparison). Anyway, it should be almost no issue since the length of the CF should

Re: Question on the number of column families

2014-08-06 Thread Qiang Tian
Hi TaeYun, thanks for explaining.

Guava version incompatible

2014-08-06 Thread Dai, Kevin
Hi all, I am now using Spark to manipulate HBase, but I can't use HBaseTestingUtility to do unit tests, because Spark needs Guava 15.0 and above while HBase needs Guava 14.0.1, and these two versions are incompatible. Is there any way to solve this conflict with Maven? Thanks, Kevin.
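
One common first step with Maven is to pin a single Guava version for the whole build via dependencyManagement; this only helps if one version actually satisfies both Spark and HBase. A hedged pom.xml sketch (the chosen version is illustrative and needs testing):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Force one Guava version across all transitive dependencies.
         Pick whichever version both Spark and HBase tolerate in practice. -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

If no single version works for both, the heavier alternative is to shade and relocate Guava in one of the dependency trees with the maven-shade-plugin.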

Re: Guava version incompatible

2014-08-06 Thread Deepa Jayaveer
Are there any tutorials available on the net for connecting the Spark Java API with HBase? Thanks and Regards, Deepa