Re: Hbase and Phoenix Performance improvement

James Taylor Wed, 01 Jul 2015 09:23:24 -0700

Also, try separating your columns into multiple column families to prevent
having to scan past your 75+ column qualifiers for every query.


On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha <[email protected]
> wrote:

>  Yes, salting will improve scan performance. Try 5, 10, or 20 salt buckets;
> I cannot be more specific because I do not know the cluster details.
>
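A minimal sketch of what salting does to row keys, in case it helps when choosing a bucket count. Phoenix prepends one salt byte derived from a hash of the row key modulo the bucket count; the hash used here (zlib.crc32) is only a stand-in, and the function names are hypothetical, not Phoenix APIs.

```python
import zlib


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend a one-byte salt computed from the row key.

    zlib.crc32 is an assumption for illustration; Phoenix uses its own hash.
    """
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


def salted_scan_ranges(start: bytes, stop: bytes, buckets: int):
    """A range scan over salted keys becomes one scan per bucket."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(buckets)]


if __name__ == "__main__":
    key = b"column1|column2|20150504|0001"
    print(salted_key(key, 10))
    for lo, hi in salted_scan_ranges(b"column1|column2|20150504",
                                     b"column1|column2|20150704\xff", 5):
        print(lo, hi)
```

The trade-off is visible in salted_scan_ranges: rows spread across buckets (and so across regions), but every range query fans out into as many scans as there are buckets, which is why the bucket count is worth testing against the cluster size.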
>
>
> Increase scanner caching to 100000.
>
>
>
> Check that SNAPPY is actually working. I believe you need to put the jars on
> the classpath as well.
>
>
>
> Since the cardinality of the col1 and col2 fields is very small, use date
> as the first column. Also store the date as an integer.
>
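To illustrate the "date as integer" suggestion: a yyyyMMdd string takes 8 bytes in the row key, while the same value packed as a 4-byte big-endian unsigned integer takes 4 and still sorts in date order. This is a sketch of the idea only; it is not necessarily the byte encoding Phoenix uses for its INTEGER type, and the function name is hypothetical.

```python
import struct


def date_to_key_bytes(yyyymmdd: str) -> bytes:
    """Pack a yyyyMMdd date as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", int(yyyymmdd))


if __name__ == "__main__":
    dates = ["20150704", "20141231", "20150504"]
    packed = sorted(date_to_key_bytes(d) for d in dates)
    # Byte-wise order of the packed keys matches chronological order.
    print([struct.unpack(">I", p)[0] for p in packed])
    print(len("20150504".encode()), len(date_to_key_bytes("20150504")))
```

Big-endian matters here: HBase compares row keys byte by byte, so the most significant byte has to come first for the ordering to hold.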
>
>
> Try modifying the heap-related memory settings in hbase-site.xml.
>
>
>
> Try naming the column qualifiers with single letters. Long qualifier names
> consume space and take more time to scan.
>
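A rough estimate of why qualifier length matters: HBase stores the row key, family, and qualifier inside every cell (KeyValue), so a long qualifier name is repeated once per cell. The sketch below uses the basic KeyValue layout and ignores compression, data block encoding, and tags. The 10-byte "long" qualifier length is an assumption for illustration; the thread does not give the real names.

```python
def keyvalue_size(row_len: int, family_len: int,
                  qualifier_len: int, value_len: int) -> int:
    """Uncompressed size in bytes of one HBase KeyValue (no tags)."""
    key_len = (2 + row_len          # row length prefix + row key
               + 1 + family_len     # family length prefix + family
               + qualifier_len      # qualifier
               + 8 + 1)             # timestamp + key type
    return 4 + 4 + key_len + value_len  # key length + value length prefixes


def bytes_saved(rows: int, qualifiers: int, old_len: int, new_len: int) -> int:
    """Bytes saved by shortening every qualifier from old_len to new_len."""
    return rows * qualifiers * (old_len - new_len)


if __name__ == "__main__":
    # 9,125,000 rows and 75 qualifiers come from the thread;
    # old_len=10 is a hypothetical qualifier name length.
    print(bytes_saved(9_125_000, 75, 10, 1))  # 6159375000
```

With SNAPPY enabled the on-disk difference will be smaller than this raw figure, since repeated names compress well, but the cells still have to be decompressed and walked during a scan.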
>
>
> Thanks
>
> Puneet.
>
>
>
>
>
> *From:* Nishant Patel [mailto:[email protected]]
> *Sent:* Wednesday, July 01, 2015 4:33 PM
> *To:* [email protected]
> *Subject:* Re: Hbase and Phoenix Performance improvement
>
>
>
> Hi Puneet/Martin,
>
> Thanks for your response. Please see my answers below.
>
> I have not specified any salt bucket. I have created Phoenix View on
> existing Hbase Table. Can I specify Salt bucket for Phoenix View?
>
> After loading the HBase data I altered the table to use SNAPPY compression.
> Are you talking about any other compression?
>
> I have set hbase.client.scanner.caching to 500. I tried with 1000 also but
> did not see any performance improvement.
>
> I am not using a production system. I inserted the data once and am not
> deleting anything, so that should not be a problem. There is no load on the
> HBase servers, as I am only reading data right now.
>
> A sample query is below.
>
> Select column5,count(1) ttr from table where column1='column1' and
> column2='column2' and date>='20150504' and date<='20150704' group by
> column5.
>
> I am scanning based on the where condition. Column1, column2 and date are
> part of my rowkey, so it should not perform a complete table scan. My rowkey
> design is as below
>
> column1|column2|date|unique_identifier
>
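With that rowkey layout, the sample query should translate into a bounded range scan rather than a full table scan. A minimal sketch of the start and stop keys, simulated over a sorted list of keys: the "|" separator byte and the helper names are assumptions (the thread only says the separator is one byte), and this models the key ordering, not the HBase client API.

```python
import bisect

SEP = b"|"  # assumed separator byte


def scan_bounds(col1: bytes, col2: bytes, date_from: bytes, date_to: bytes):
    """Return (start, stop) row keys for an inclusive date range."""
    prefix = col1 + SEP + col2 + SEP
    start = prefix + date_from          # inclusive start row
    # Exclusive stop row: 0xff sorts after the separator that follows the
    # date, so every key for date_to falls below it.
    stop = prefix + date_to + b"\xff"
    return start, stop


def range_scan(sorted_keys, start: bytes, stop: bytes):
    """Simulate a scan over keys held in byte-wise sorted order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    keys = sorted([
        b"a|x|20150503|1", b"a|x|20150504|1", b"a|x|20150704|9",
        b"a|x|20150705|1", b"a|y|20150601|1", b"b|x|20150601|1",
    ])
    start, stop = scan_bounds(b"a", b"x", b"20150504", b"20150704")
    print(range_scan(keys, start, stop))
```

The scan is bounded, but it still has to read every row between the two keys, so the cost grows with the width of the date range.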
> Regards,
>
> Nishant
>
>
>
> On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <[email protected]>
> wrote:
>
> It sounds like you are scanning rather than getting rows based on a known
> row id. Am I wrong?
>
> One thing I am currently trying is to keep indexed columns and "hot"
> content in one column family and leave "cold" content in another family. It
> speeds up scanning the table when you need to
>
>
>
> On Wed, Jul 1, 2015 at 06:56, Nishant Patel <[email protected]>
> wrote:
>
>    Hi,
>
> I am trying to measure performance for Hbase and Phoenix.
>
> I have generated 1000 records per day for each combination of column1 and
> column2.
>
> I have used 5 different values each for column1 and column2 and created
> data for 365 days. The total number of records generated is
> 5 * 5 * 365 * 1000 = 9125000.
>
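The record count above checks out, and the same arithmetic shows how many rows a single query has to read. A small sketch, assuming each query fixes one (column1, column2) pair as in the sample query; 75 is used as the lower bound of the "75+" qualifiers mentioned in the thread.

```python
COMBINATIONS = 5 * 5       # values of column1 x values of column2
DAYS = 365
RECORDS_PER_DAY = 1000     # per (column1, column2) combination
QUALIFIERS = 75            # lower bound of "75+" qualifiers per row

total_rows = COMBINATIONS * DAYS * RECORDS_PER_DAY


def rows_in_range(days: int) -> int:
    """Rows read for one (column1, column2) pair over `days` days."""
    return days * RECORDS_PER_DAY


def cells_in_range(days: int) -> int:
    """Cells in those rows when all qualifiers share one column family."""
    return rows_in_range(days) * QUALIFIERS


if __name__ == "__main__":
    print(total_rows)            # 9125000
    print(rows_in_range(30))     # 30000
    print(rows_in_range(365))    # 365000
    print(cells_in_range(365))   # 27375000
```

A 12-month query reads about twelve times as many rows as a 1-month query, and with a single column family every qualifier of every row lies on the scan path.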
> I am writing 75+ qualifiers in one Column Family for each record.
>
>
>
> The rowkey design is: column1|column2|date(yyyyMMdd)|unique
> identifier. I have used a one-byte character as the rowkey separator. I have
> created a view in Phoenix on top of the HBase table.
>
> All my queries contain column1, column2 and date as filter conditions.
>
> If the date range is less than 1 month, I get a response in less than 1 second.
> If the date range is 3/6/12 months, the response takes several seconds.
> Sometimes it takes 25+ seconds for a 12-month range.
>
> My question is: is it possible to get a response from Phoenix in less than 1
> second for the amount of data I have specified? If yes, what kind of tuning
> needs to be done? As of now I have not made any changes to HBase or Phoenix
> except proper rowkey design.
>
> I am trying to verify whether phoenix will suit our requirement or not.
>
> --
>
> Thanks,
> Nishant
>
>
>
>
> --
>
> Regards,
> Nishant Patel
>
