Also, try separating your columns into multiple column families to prevent having to scan past your 75+ column qualifiers for every query.
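To put rough numbers on this, here is a back-of-the-envelope sketch using the figures from the thread below (1000 rows per day per column1/column2 combination, 75 qualifiers in one family). HBase keeps each column family in its own store files, so a scan restricted to one family does not read the others. The size of the "hot" family (5 qualifiers) is purely an assumption for illustration, and cell counts are only a proxy for scan cost.

```python
# Rough estimate of how many cells a scan touches for one
# column1/column2 combination, using the figures from the thread.
ROWS_PER_DAY = 1000     # per column1/column2 combination (from the thread)
TOTAL_QUALIFIERS = 75   # "75+" qualifiers in a single family (from the thread)
HOT_QUALIFIERS = 5      # ASSUMPTION: qualifiers the queries actually read


def cells_scanned(days: int, qualifiers_in_family: int) -> int:
    """Cells read when scanning `days` worth of rows in one column family."""
    return days * ROWS_PER_DAY * qualifiers_in_family


if __name__ == "__main__":
    for days in (30, 90, 365):
        single = cells_scanned(days, TOTAL_QUALIFIERS)
        hot = cells_scanned(days, HOT_QUALIFIERS)
        print(f"{days:>3} days: single family {single:,} cells, "
              f"hot family only {hot:,} cells")
```

Under these assumptions a 12 month range drops from about 27 million cells to under 2 million, which is why it helps to keep the filtered and aggregated columns in a small family of their own.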
On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha <[email protected]> wrote:
> Yes …Salting will improve the scan performance. Try with numbers
> 5,10,20. As I do not know about the cluster details.
>
> Increase scanner caching to 100000.
>
> Check if SNAPPY is working …I hope you need to put the jars classpath as
> well.
>
> Since the cardinality of the col1 and col2 fields is very small use date
> as first column. Also put date as integer.
>
> Try modifying the memory settings related to heap in hbase site.xml.
>
> Try naming the Column Qualifiers as single alphabets. They consume space
> and takes more time to scan.
>
> Thanks
> Puneet.
>
> *From:* Nishant Patel [mailto:[email protected]]
> *Sent:* Wednesday, July 01, 2015 4:33 PM
> *To:* [email protected]
> *Subject:* Re: Hbase and Phoenix Performance improvement
>
> HI Puneet/Martin,
>
> Thanks for your response. Please see my answer as below.
>
> I have not specified any salt bucket. I have created Phoenix View on
> existing Hbase Table. Can I specify Salt bucket for Phoenix View?
>
> After loading Hbase data I alter table to use SNAPPY Compression. Are you
> talking about any other compression?
>
> I have set hbase.client.scanner.caching to 500. I tried with 1000 also but
> did not see any performance improvement.
>
> I am not using with production system. I have inserted data once and not
> deleting so there should not be problem. There is no load on Hbase servers
> as I am just reading data right now.
>
> Sample query is as below.
>
> Select column5,count(1) ttr from table where column1='column1' and
> column2='column2' and date>='20150504' and date<='20150704' group by
> column5.
>
> I am doing scan based on where condition. Column1, column2 and date is
> part of my rowkey so it should not perform complete table scan. My rowkey
> design is as below
>
> column1|column2|date|unique_identifier
>
> Regards,
> Nishant
>
> On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <[email protected]>
> wrote:
>
> It sounds like you are scanning rather than getting rows based on a known
> row id. Am I wrong?
>
> One thing I am currently trying is to have indexed columns and "hot"
> content in one column family and let "cold" content in another family. It
> speed up scanning the table when you need to
>
> On Wed, Jul 1, 2015 at 06:56, Nishant Patel <[email protected]>
> wrote:
>
> Hi,
>
> I am trying to measure performance for Hbase and Phoenix.
>
> I have generated 1000 records per day with combination of Column1 and
> Column2.
>
> I have created 5 different combination for column1 and column2 and created
> data for 365 days. Total records I have generated 5 * 5 * 365 * 1000 =
> 9125000
>
> I am writing 75+ qualifiers in one Column Family for each record.
>
> Rowkey Design is as below : column1|column2|date(yyyyMMdd)|unique
> identifier. I have used one byte character as rowkey separator. I have
> create view in Phoenix on top of Hbase table.
>
> My all queries contain column1 , column2 and date as filter condition.
>
> If date range is less than 1 month I get response in less than 1 second.
> if date range is 3/6/12 months then response comes in seconds. Sometime it
> takes 25+ seconds for 12 months range.
>
> My question is, is it possible to get response in phoenix in less than 1
> second for amount of data I have specified. If yes what kind of tuning need
> to be done? As of now I have not done any changes at Hbase and Phoenix
> except proper rowkey design.
>
> I am trying to verify whether phoenix will suit our requirement or not.
>
> --
> Thanks,
> Nishant
>
> --
> Regards,
> Nishant Patel
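
For reference, the rowkey design quoted above (column1|column2|date|unique_identifier) means the sample query corresponds to one contiguous key range rather than a full table scan. A minimal sketch of that range follows. It assumes the separator is the `|` byte (the thread only says a one-byte separator is used), that dates are fixed-width yyyyMMdd strings, and that column1/column2 values never contain the separator. `scan_range` is a hypothetical helper for illustration, not part of HBase or Phoenix, and this is not a description of how Phoenix itself plans the query.

```python
SEP = b"|"  # ASSUMPTION: the thread only states a one-byte separator


def scan_range(column1: str, column2: str, start_date: str, end_date: str):
    """Return (start_row, stop_row) covering every rowkey for the given
    column1/column2 whose date lies in [start_date, end_date].

    start_row is inclusive and stop_row is exclusive, matching the usual
    HBase scan convention.
    """
    prefix = column1.encode() + SEP + column2.encode() + SEP
    start_row = prefix + start_date.encode()
    # Every key for end_date looks like "<prefix><end_date>|<id>". Replacing
    # the trailing separator with the next byte value gives a key that sorts
    # just after all of them and before any later date.
    stop_row = prefix + end_date.encode() + bytes([SEP[0] + 1])
    return start_row, stop_row


if __name__ == "__main__":
    start, stop = scan_range("column1", "column2", "20150504", "20150704")
    print(start, stop)
```

A row such as `column1|column2|20150704|abc` falls inside this range, while `column1|column2|20150705|abc` and any row for a different column1/column2 pair fall outside it.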
