RE: Hbase and Phoenix Performance improvement

Puneet Kumar Ojha Wed, 01 Jul 2015 04:48:33 -0700

Yes, salting will improve the scan performance. Try 5, 10, or 20 salt buckets;
I do not know the cluster details, so I cannot suggest an exact number.
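
As a rough illustration of what salting does to a row key, here is a minimal sketch. It assumes the salt is a single byte computed as a hash of the row key modulo the number of buckets and prepended to the key. The hash used here (CRC32) and the helper name salted_key are illustrative choices only, not Phoenix's actual implementation.

```python
import zlib
from collections import Counter

def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend a one-byte salt derived from the row key (illustrative hash only)."""
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key

# Hypothetical row keys that share a long common prefix, as in the thread.
keys = [b"column1|column2|20150504|id%04d" % i for i in range(1000)]

BUCKETS = 10  # one of the values suggested above
distribution = Counter(salted_key(k, BUCKETS)[0] for k in keys)

# Rows that were adjacent in the unsalted table are now spread over the buckets.
for bucket in sorted(distribution):
    print(bucket, distribution[bucket])
```

The trade-off is that a range scan no longer maps to one contiguous key range: it has to be run once per bucket and the results merged, which is why the bucket count is worth testing against the actual cluster.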


Increase scanner caching to 100000.

Check whether SNAPPY is actually working. I believe you also need to put the jars on the classpath.

Since the cardinality of the col1 and col2 fields is very small, use the date
as the first column of the row key. Also store the date as an integer.
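
To make the "date as integer" suggestion concrete, here is a small sketch. It assumes the date is written as a 4-byte big-endian unsigned integer instead of an 8-character yyyyMMdd string. The helper name encode_date_as_int is hypothetical, and whether this layout matches the column type you declare in Phoenix needs to be checked separately.

```python
import struct

def encode_date_as_int(yyyymmdd: str) -> bytes:
    """Pack a yyyyMMdd date string into a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", int(yyyymmdd))

as_string = "20150504".encode("ascii")       # 8 bytes in the row key
as_integer = encode_date_as_int("20150504")  # 4 bytes in the row key
print(len(as_string), len(as_integer))

# Big-endian unsigned encoding keeps the byte order consistent with date order,
# so range scans on the date part of the row key still work.
dates = ["20150704", "20141231", "20150504"]
encoded_sorted = sorted(encode_date_as_int(d) for d in dates)
decoded = [str(struct.unpack(">I", b)[0]) for b in encoded_sorted]
print(decoded)
```

Because the row key is repeated in every cell, the 4 bytes saved here are saved once per qualifier, not once per row.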

Try tuning the heap-related memory settings in hbase-site.xml.

Try naming the column qualifiers with single letters. Long qualifier names
consume space and take more time to scan.
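
The cost of long qualifier names can be estimated with a back-of-the-envelope sketch. HBase stores the row key, column family, and qualifier with every cell, so the sketch below sums the fields of one uncompressed KeyValue. The row key length (40), family length (1), value length (10), and the 12-character qualifier are assumed numbers for illustration; only the 75 qualifiers and the 9,125,000 rows come from the thread.

```python
def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate size in bytes of one uncompressed HBase KeyValue."""
    # key = 2B row length + row + 1B family length + family + qualifier
    #       + 8B timestamp + 1B key type
    key_len = 2 + row_len + 1 + family_len + qualifier_len + 8 + 1
    # cell = 4B key length + 4B value length + key + value
    return 4 + 4 + key_len + value_len

ROWS = 5 * 5 * 365 * 1000  # 9,125,000 rows, from the thread
QUALIFIERS = 75            # "75+" in the thread, used as a lower bound

# Assumed sizes, for illustration only.
long_cell = keyvalue_size(row_len=40, family_len=1, qualifier_len=12, value_len=10)
short_cell = keyvalue_size(row_len=40, family_len=1, qualifier_len=1, value_len=10)

saved_per_row = (long_cell - short_cell) * QUALIFIERS
saved_total = saved_per_row * ROWS
print(saved_per_row, saved_total)
```

This is the uncompressed figure. SNAPPY compression and data block encoding reduce the on-disk difference, so treat it as an upper bound on the saving.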

Thanks
Puneet.


From: Nishant Patel [mailto:[email protected]]
Sent: Wednesday, July 01, 2015 4:33 PM
To: [email protected]
Subject: Re: Hbase and Phoenix Performance improvement

Hi Puneet/Martin,
Thanks for your responses. Please see my answers below.
I have not specified any salt buckets. I created a Phoenix view on an existing
HBase table. Can I specify salt buckets for a Phoenix view?
After loading the HBase data I altered the table to use SNAPPY compression. Are
you talking about any other compression?
I have set hbase.client.scanner.caching to 500. I also tried 1000 but did not
see any performance improvement.

I am not using a production system. I inserted the data once and am not
deleting anything, so that should not be a problem. There is no load on the
HBase servers, as I am only reading data right now.
A sample query is below.

    SELECT column5, COUNT(1) ttr
    FROM table
    WHERE column1 = 'column1' AND column2 = 'column2'
      AND date >= '20150504' AND date <= '20150704'
    GROUP BY column5;

I am scanning based on the WHERE condition. Column1, column2 and date are part
of my rowkey, so it should not perform a complete table scan. My rowkey design
is as below:

column1|column2|date|unique_identifier
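
For reference, the key range that such a query should translate to can be sketched as follows. This assumes "|" is the one-byte separator and that the date is stored as a yyyyMMdd string, as described in the thread. The function scan_range and the sample row keys are hypothetical, and the code only simulates a scan over a sorted list rather than calling HBase.

```python
from bisect import bisect_left

SEP = b"|"  # assumed one-byte rowkey separator

def scan_range(column1: str, column2: str, start_date: str, end_date: str):
    """Return (start_row, exclusive_stop_row) for an inclusive date range."""
    prefix = column1.encode() + SEP + column2.encode() + SEP
    start_row = prefix + start_date.encode()
    # Stop row: the end-date prefix including its separator, with the last
    # byte incremented, so every row of the end date is still inside the range.
    end_prefix = prefix + end_date.encode() + SEP
    stop_row = end_prefix[:-1] + bytes([end_prefix[-1] + 1])
    return start_row, stop_row

# Hypothetical row keys, sorted the way HBase stores them (byte order).
rows = sorted([
    b"column1|column2|20150503|a1",
    b"column1|column2|20150504|a2",
    b"column1|column2|20150704|a3",
    b"column1|column2|20150705|a4",
    b"column1|column3|20150601|a5",
])

start, stop = scan_range("column1", "column2", "20150504", "20150704")
matched = rows[bisect_left(rows, start):bisect_left(rows, stop)]
print(start, stop)
print(matched)
```

Comparing a range like this with the start and stop keys shown in the query's EXPLAIN plan is one way to confirm that the query is a range scan rather than a full scan with a filter.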
Regards,
Nishant

On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <[email protected]> wrote:
It sounds like you are scanning rather than getting rows based on a known row 
id. Am I wrong?

One thing I am currently trying is to keep indexed columns and "hot" content in
one column family and leave "cold" content in another family. It speeds up
scanning the table when you need to

On Wed, Jul 1, 2015 at 06:56, Nishant Patel <[email protected]> wrote:
Hi,
I am trying to measure performance for HBase and Phoenix.
I generated 1000 records per day for each combination of column1 and column2.
I created 5 different values each for column1 and column2 and generated data
for 365 days, so the total number of records is 5 * 5 * 365 * 1000 = 9,125,000.
I am writing 75+ qualifiers in one Column Family for each record.
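
To put these numbers in perspective, the arithmetic can be worked through as a short sketch. The figures for records per day, combinations, days, and qualifiers are taken from the message above (with "75+" used as a lower bound). The month lengths in days are assumptions, and the sketch assumes a query filters on exactly one column1/column2 combination.

```python
RECORDS_PER_DAY = 1000  # per (column1, column2) combination
COMBINATIONS = 5 * 5
DAYS = 365
QUALIFIERS = 75         # "75+" in the message, used as a lower bound

total_rows = COMBINATIONS * DAYS * RECORDS_PER_DAY
total_cells = total_rows * QUALIFIERS
print("total rows:", total_rows, "total cells:", total_cells)

def rows_in_range(days: int) -> int:
    """Rows covered by a date range for a single column1/column2 combination."""
    return days * RECORDS_PER_DAY

# Assumed day counts for the date ranges mentioned in the thread.
for label, days in [("1 month", 30), ("3 months", 91), ("6 months", 182), ("12 months", 365)]:
    rows = rows_in_range(days)
    print(label, "rows:", rows, "cells in the family:", rows * QUALIFIERS)
```

With all qualifiers in one column family, a 12-month query covers roughly 12 times as many rows and cells as a 1-month query, which is in line with the response time growing with the date range.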

The rowkey design is as follows: column1|column2|date(yyyyMMdd)|unique
identifier. I used a one-byte character as the rowkey separator. I created a
view in Phoenix on top of the HBase table.
All my queries contain column1, column2 and date as filter conditions.
If the date range is less than 1 month, I get a response in less than 1 second.
If the date range is 3/6/12 months, the response takes several seconds.
Sometimes it takes 25+ seconds for a 12-month range.
My question is: is it possible to get a response from Phoenix in less than 1
second for the amount of data I have specified? If yes, what kind of tuning
needs to be done? So far I have not changed anything in HBase or Phoenix apart
from designing the rowkey properly.
I am trying to verify whether Phoenix will suit our requirements.
--
Thanks,
Nishant



--
Regards,
Nishant Patel