Re: How to avoid major compaction during restart?
Er, I made a mistake in the above question; the issue is not so much the major compaction, but rather that during a restart (as nodes go up and down), Hadoop and HBase attempt to rebalance blocks and regions, causing unnecessary movement. So what I'm actually looking for is a way to suspend balancing for the duration of the restart, which would avoid the need for a major compaction afterwards.

Marcell

On Thu, Jun 28, 2018 at 12:55 PM, Marcell Ortutay wrote:

> Hi all,
>
> I'm interested in ways to avoid a major compaction when restarting all the
> HBase region servers in a cluster (for example, for a version upgrade). Are
> there any recommended techniques for achieving this?
>
> Thanks,
> Marcell
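For reference, one way to suspend balancing is to flip the region balancer switch before the restart and restore it afterwards. A minimal sketch against the HBase 1.x Admin API (the connection setup and re-enable step are illustrative, not from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class BalancerToggle {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Turn the region balancer off (synchronously) before the rolling
      // restart; the return value is the previous balancer state.
      boolean wasOn = admin.setBalancerRunning(false, true);
      System.out.println("Balancer previously on: " + wasOn);
      // ... restart the region servers here ...
      // Then restore the balancer:
      admin.setBalancerRunning(true, true);
    }
  }
}

The same toggle is available in the hbase shell as balance_switch false / balance_switch true. Note this covers HBase region balancing only; the HDFS block balancer is a separate daemon.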
How to avoid major compaction during restart?
Hi all,

I'm interested in ways to avoid a major compaction when restarting all the HBase region servers in a cluster (for example, for a version upgrade). Are there any recommended techniques for achieving this?

Thanks,
Marcell
Re: How to improve HBase read performance.
This ticket, https://issues.apache.org/jira/browse/HBASE-20459, was fixed in the latest version of HBase; upgrading to the latest version may help with performance.

On Wed, May 16, 2018 at 3:55 AM, Kang Minwoo wrote:

> Hi, Users.
>
> I store a lot of logs in HBase.
> However, the reading speed of the logs is too slow compared to the Hive ORC
> file.
> I know that HBase is slow compared to the Hive ORC file.
> The problem is that it is too slow.
> HBase is about 6 times slower.
>
> Is there a good way to speed up HBase's reading speed?
> Should I add more servers?
>
> I am using HBase 1.2.6.
>
> Best regards,
> Minwoo Kang
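If it helps to confirm what the cluster is running before and after the upgrade, the server version can be read through the client API. A minimal sketch, assuming an HBase 1.x client (not from the original reply):

import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class VersionCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Report the version the master is running, e.g. to verify an upgrade.
      ClusterStatus status = admin.getClusterStatus();
      System.out.println("HBase version: " + status.getHBaseVersion());
    }
  }
}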
HBase scans seem slow, compute bound. How to improve?
I'm new to HBase and am doing some performance testing for my use case. I've noticed that HBase scans seem "slow" compared to machine capabilities. Here is a bit more detail on the testing I am running.

I have loaded 3 test tables into both HBase and sqlite3 for comparison. I'm using sqlite3 as a stand-in for the "peak" performance achievable for this operation. For HBase, I'm running a 2-node (1 name node / 1 data node) cluster on EMR with m3.2xlarge instances. The test tables each have 1 million rows, with data like this:

(1) 1 bigint column, 1 float column
(2) 1 bigint column, 1 float column, 100 bytes of filler data
(3) 1 bigint column, 1 float column, 1000 bytes of filler data

I randomized the filler data to limit the effects of compression, and also ran tests with compression turned off, but that didn't seem to have much impact.

I ran the following test queries on both HBase and sqlite3:

(a) SELECT count(*) FROM table WHERE val > .5
(b) SELECT count(*) FROM table WHERE filler like '%x%'

In each case I ran the query twice to account for the block cache in HBase. Below are the performance numbers for the 2nd (block-cached) run:

HBase:
1a: 1.373s
2a: 1.538s
2b: 3.582s
3a: 0.98s
3b: 11.354s

sqlite3:
1a: 0.156s
2a: 0.212s
2b: 0.66s
3a: 0.252s
3b: 4.364s

In each case, sqlite3 performs much better (2x-9x) than HBase for an equivalent operation. I ran some rudimentary profiling on HBase and it seems like the bulk of the time is spent in this function: https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6485 so I'm guessing the computation in there is taking a long time.

I have two questions that I'm hoping to get some guidance on:

(1) Is this expected performance for HBase range scans? I'm told that HBase is optimized for random key access, not range scans, so perhaps this performance is a result of that tradeoff?
(2) Is there anything I can do to improve HBase range scan performance, in terms of configuration, data layout, etc.?

A few other notes:
- I'm using Phoenix for the SQL layer on top of HBase, but my profiling revealed that the limiting factor is HBase, specifically the function cited above
- The table sizes are 33MB, 130MB, and 990MB in HBase, and similar in sqlite3.

Thanks,
Marcell
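For what it's worth, query (a) can also be expressed as a raw HBase scan with a server-side filter, which takes Phoenix out of the picture when profiling. A rough sketch against the 1.x client API (table, family, and qualifier names are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class CountScan {
  public static void main(String[] args) throws Exception {
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("val"));
    scan.setCaching(1000); // fetch more rows per RPC round trip
    // Push "val > 0.5" to the region servers. Note: byte-wise comparison of
    // Bytes.toBytes(float) agrees with numeric order only for non-negative
    // floats, which holds if val is in [0, 1).
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("f"), Bytes.toBytes("val"),
        CompareOp.GREATER, Bytes.toBytes(0.5f)));

    long count = 0;
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("test_table"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        count++; // only matching rows come back from the servers
      }
    }
    System.out.println("count = " + count);
  }
}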
HBase range scan seems slow?
Hi all,

I'm fairly new to HBase and was a bit surprised by the performance I am seeing for a range scan. I am running a range scan over ~3 million rows in an HBase cluster with 4 region servers, each a fairly large instance on AWS (24 HDDs). I'm pulling a single float value from each row and computing the average. When I run this range scan, it takes ~0.5 sec to execute, and repeat runs show little improvement. This seems long to me: 3 million floats should take maybe 10-20MB to read from disk and transfer, and it should be much faster the second time around, since the data is supposed to be in memory in the block cache at that point.

Additionally, I tried running a number of these range scans concurrently against HBase. Again, the performance was worse than I expected: the average execution time goes up quite a bit at what seems like low QPS. For example, at 1 QPS the average response time is several seconds.

Are these performance numbers typical? Or is there some user error that is causing them to be worse than normal?

Thanks,
Marcell
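For reference, here is roughly what such a scan looks like with the plain 1.x client API. Scanner caching is worth checking first, since a low caching value means an RPC round trip for every handful of rows (the table and column names below are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AverageScan {
  public static void main(String[] args) throws Exception {
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("f"), Bytes.toBytes("val"));
    scan.setCaching(10000);     // batch many small rows into each RPC
    scan.setCacheBlocks(true);  // keep blocks cached for the repeat run

    double sum = 0;
    long n = 0;
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("metrics"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        // Each row carries a single 4-byte float value.
        sum += Bytes.toFloat(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("val")));
        n++;
      }
    }
    System.out.println("avg = " + (sum / n));
  }
}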
Re: Want to change key structure
Thanks for the info, Anil. I first tried an MR job which did Puts, based on the examples at [1], but this was much too slow, as you said. Switching to writing HFiles directly via HFileOutputFormat solved the issue.

Also, I wanted to post an issue I ran into, in case anyone hits it in the future. For a table re-write, having a reduce phase can be bad, because the MR framework will try to sort the whole table, potentially multiple TB. You can avoid this by calling job.setNumReduceTasks(0). However, if you use HFileOutputFormat.configureIncrementalLoad(), that call will also set up the reducer, which may be a bit surprising (at least it was to me). So the order matters:

// This will have a (potentially long) reduce phase. Bad for large tables.
job.setNumReduceTasks(0);
HFileOutputFormat.configureIncrementalLoad(job, hTable); // Overrides # of reduce tasks

Instead, this works better for large tables:

// This will skip the reduce phase
HFileOutputFormat.configureIncrementalLoad(job, hTable);
job.setNumReduceTasks(0);

Followed by a major compaction that will do the sorting for locality.

[1] http://hbase.apache.org/0.94/book/mapreduce.example.html

On Tue, Feb 20, 2018 at 6:44 AM, anil gupta <anilgupt...@gmail.com> wrote:

> Hi Marcell,
>
> Since the key is changing, you will need to rewrite the entire table. I think
> generating HFiles (rather than doing puts) will be the most efficient approach here.
> IIRC, you will need to use HFileOutputFormat in your MR job.
> For locality, I don't think you should worry that much, because major
> compaction usually takes care of it. If you want very high locality from the
> beginning, then you can run a major compaction on the new table after your
> initial load.
>
> HTH,
> Anil Gupta
>
> On Mon, Feb 19, 2018 at 11:46 PM, Marcell Ortutay <mortu...@23andme.com> wrote:
>
> > I have a large HBase table (~10 TB) that has an existing key structure.
> > Based on some recent analysis, the key structure is causing performance
> > problems for our current query load. I would like to re-write the table
> > with a new key structure that performs substantially better.
> >
> > What is the best way to go about re-writing this table? Since the key
> > structure will change, it will affect locality, so all the data will have
> > to move to a new location. If anyone can point to examples of code that
> > does something like this, that would be very helpful.
> >
> > Thanks,
> > Marcell
>
> --
> Thanks & Regards,
> Anil Gupta
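To make the ordering point above concrete, here is a rough skeleton of the whole job setup. This assumes the 0.94/1.x-era APIs referenced in [1]; the mapper, table names, key transform, and output path are placeholders, not the actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RewriteTableJob {

  // Placeholder mapper: re-emits each cell under its new row key.
  static class RekeyMapper extends TableMapper<ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(ImmutableBytesWritable key, Result row, Context ctx)
        throws java.io.IOException, InterruptedException {
      byte[] newKey = newKeyFor(key.get());
      for (Cell cell : row.rawCells()) {
        ctx.write(new ImmutableBytesWritable(newKey),
            new KeyValue(newKey, CellUtil.cloneFamily(cell),
                CellUtil.cloneQualifier(cell), cell.getTimestamp(),
                CellUtil.cloneValue(cell)));
      }
    }
    private static byte[] newKeyFor(byte[] oldKey) {
      return oldKey; // stand-in; the real transform is application-specific
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "rewrite-key-structure");
    job.setJarByClass(RewriteTableJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false); // don't churn the block cache on a full scan

    TableMapReduceUtil.initTableMapperJob(
        "old_table", scan, RekeyMapper.class,
        ImmutableBytesWritable.class, KeyValue.class, job);

    // Order matters (see above): configureIncrementalLoad() installs a sort
    // reducer and sets the reduce count, so override it afterwards to get a
    // map-only job that writes HFiles directly.
    HTable newTable = new HTable(conf, "new_table");
    HFileOutputFormat.configureIncrementalLoad(job, newTable);
    job.setNumReduceTasks(0);

    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The HFiles can then be loaded with the completebulkload tool, followed by the major compaction mentioned above for locality.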
Want to change key structure
I have a large HBase table (~10 TB) that has an existing key structure. Based on some recent analysis, the key structure is causing performance problems for our current query load. I would like to re-write the table with a new key structure that performs substantially better.

What is the best way to go about re-writing this table? Since the key structure will change, it will affect locality, so all the data will have to move to a new location. If anyone can point to examples of code that does something like this, that would be very helpful.

Thanks,
Marcell