Please find my reply inline.

On Wed, Sep 30, 2015 at 3:29 PM, Konstantinos Kougios <[email protected]> wrote:
> Thanks for the reply and the useful information, Anil.
>
> I am aware of the difficulties of distributed joins and aggregations and
> that Phoenix is a layer on top of HBase. It would be great if it could be
> configured to run the queries, even if it takes a long time for the
> queries to complete.

Anil: I think it is doable, but it might require a bit of trial and error
with the HBase and Phoenix configuration. I would start by increasing the
HBase and Phoenix timeouts (a rough snippet is further down).

> I have mainly 2 tables, of 170GB and 550GB. Aggregation queries on both
> fail and even make region servers crash (there is no info in the logs and
> I still don't know why; the server has been rock stable so far for other
> things, but you never know).

Anil: The RS should not crash. Are you doing heavy writes along with full
table scans at the same time? In one of your emails I saw a stack trace
related to region splits and compactions.

> I am doing full table scans only because so far I was unable to create
> the indexes. I tried async indexes too, with the MapReduce job to build
> them, but it runs extremely slowly.

Anil: That does not sound good. I haven't used async indexes yet, so I
won't be able to help debug that problem. Hopefully someone else will be
able to chime in.

> In theory full table scans are possible with HBase, so even if it was
> slow it shouldn't fail.

Anil: IMO, if you are doing full table scans, then you should turn off the
block cache for those queries. Full table scans cause a lot of block cache
churn, and that churn leads to JVM GCs.

> My setup is a 64GB AMD Opteron server with 16 cores, running 3 LXC
> virtual machines as region servers with Xmx8G, each on its own 3TB
> 7200rpm disk. So I am roughly simulating 3 low-spec servers with enough
> RAM.
>
> The next thing I will try is giving the region servers 16GB of RAM. With
> 8GB they seem to be under some memory pressure and I see some slow GCs in
> the logs.

Anil: 16GB of RAM should help in some cases. Also try disabling the block
cache for those full table scans; a couple of sketches follow below.
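If you do move the region servers to 16GB heaps, something along these
lines in hbase-env.sh is where I would start. The GC flags are only a
sketch and depend on your JDK and HBase version, not a tuned
recommendation:

  # hbase-env.sh (sketch only; adjust for your JDK/HBase version)
  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
    -Xms16g -Xmx16g \
    -XX:+UseG1GC -XX:MaxGCPauseMillis=100 \
    -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/hbase/gc-regionserver.log"

The GC log will at least tell you whether the slow GCs you are seeing are
full GCs or just long young-generation pauses.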
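For turning the block cache off on those scans: with the raw HBase API it
is Scan.setCacheBlocks(false); through Phoenix you can add the NO_CACHE
hint so the scan does not pull its blocks into (and churn) the block
cache. Table and column names here are made up, just to show where the
hint goes:

  -- Hypothetical query; the hint goes right after SELECT
  SELECT /*+ NO_CACHE */ category, COUNT(*)
  FROM big_table
  GROUP BY category;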
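And for the timeout increases I mentioned at the top, these are roughly
the properties I would start with, in hbase-site.xml on the client running
the Phoenix queries (some may need to match on the servers too). The
values below are just examples, not recommendations:

  <!-- hbase-site.xml: example values only (30 minutes) -->
  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>1800000</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>1800000</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>1800000</value>
  </property>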
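To make the range-scan suggestion from my earlier email (quoted below) a
bit more concrete: if the row key leads with the columns you filter on,
Phoenix can run the query as a range scan over one slice of the table
instead of a full scan. The schema and names here are made up, purely to
illustrate:

  -- Hypothetical table: row key leads with the columns used for filtering
  CREATE TABLE metrics (
      tenant_id   VARCHAR NOT NULL,
      event_time  DATE    NOT NULL,
      metric      VARCHAR NOT NULL,
      val         DOUBLE,
      CONSTRAINT pk PRIMARY KEY (tenant_id, event_time, metric)
  );

  -- Filters on the leading key columns -> a range scan, not a full scan
  SELECT metric, SUM(val)
  FROM metrics
  WHERE tenant_id = 'acme'
    AND event_time >= TO_DATE('2015-09-01 00:00:00')
    AND event_time <  TO_DATE('2015-10-01 00:00:00')
  GROUP BY metric;

Running EXPLAIN on the query should show something like "RANGE SCAN OVER
METRICS" rather than "FULL SCAN OVER METRICS" once the filter lines up
with the key.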
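On the Spark write point (also quoted below): you are right that there is
no backpressure from HBase itself, so the client has to pace itself. For
big one-off loads, one way to sidestep the live write path entirely is the
Phoenix CSV bulk load MapReduce tool, which writes HFiles directly instead
of pushing upserts through the region servers. This is just a pointer
rather than something I have tuned for a setup like yours; the table name
and path are made up:

  hadoop jar phoenix-<version>-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table METRICS \
    --input /data/metrics.csv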
> Cheers
>
> On 30/09/15 21:18, anil gupta wrote:
>
> Hi Konstantinos,
> Please find my reply inline.
>
> On Wed, Sep 30, 2015 at 12:10 PM, Konstantinos Kougios
> <[email protected]> wrote:
>
>> Hi all,
>>
>> I had various issues with big tables while experimenting over the last
>> couple of weeks.
>>
>> What comes to my mind is that HBase (+ Phoenix) only works when there
>> is a fairly powerful cluster, where say half the data can fit into the
>> combined servers' memory and the disks are fast (SSD?) as well. It
>> doesn't seem able to cope when tables are twice as large as the memory
>> allocated to the region servers (frankly, I think the limit is even
>> lower).
>
> Anil: Phoenix is just a SQL layer over HBase. From the query in your
> previous emails, it seems like you are doing full table scans with GROUP
> BY clauses. IMO, HBase is not a DB to be used for full table scans. If
> 90% of your use cases are small range scans or gets, then HBase should
> work nicely with terabytes of data. I have a 40 TB table in prod on a
> 60-node cluster where every RS has only 16GB of heap. What kind of
> workload are you trying to run with HBase?
>
>> Things that constantly fail:
>>
>> - non-trivial queries on large tables (with group by, counts, joins),
>> with region server out-of-memory errors or crashes for no apparent
>> reason, with an Xmx of 4G or 8G
>
> Anil: Can you convert these queries into short range scans? If you are
> always going to do full table scans, then maybe you need to use MR or
> Spark for those computations and then tune the cluster for full table
> scans. Cluster tuning varies with a full-table-scan workload.
>
>> - index creation on the same big tables. It always fails, I think
>> around the point when HBase has to flush its in-memory data (memstores)
>> to disk, and I couldn't find a solution
>>
>> - Spark jobs fail unless they are throttled to feed HBase only as much
>> data as it can take. No backpressure?
>>
>> There were no replies to my emails regarding the issues, which makes me
>> think there aren't solutions (or the solutions are pretty hard to find
>> and not many people know them).
>>
>> So after 21 tweaks to the default config, I am still not able to
>> operate it as a normal database.
>
> Anil: HBase is actually not a normal RDBMS. It is a *key-value store*.
> Phoenix provides a SQL layer on top of the HBase API, so the user still
> has to deal with the pros and cons of a key-value store.
>
>> Should I start believing my config is all wrong, or that HBase +
>> Phoenix only works if there is a sufficiently powerful cluster to
>> handle the data?
>
> Anil: *In my experience*, HBase + Phoenix works nicely if you are doing
> key-value lookups and short range scans. I would suggest you evaluate
> the data model of your HBase tables and try to convert the queries into
> small range scans or lookups.
>
>> I believe it is a great project and the functionality is really useful.
>> What's lacking is 3 sample configs for 3 clusters of different
>> strength.
>
> Anil: I agree that guidance on configuring HBase and Phoenix could be
> improved so that people can get going quickly.
>
>> Thanks
>
> --
> Thanks & Regards,
> Anil Gupta

--
Thanks & Regards,
Anil Gupta
