Hi James,

Thanks for your help. I used reverseBytes on the sequence key; the hotspotting 
issue is not completely gone, but it is better than before.
Do you think it will improve further if I reverse the bits instead?
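
For reference, here is the quick comparison I am looking at: reverseBytes only moves the fast-changing low-order byte to the front of the key (so consecutive values spread over at most 256 leading-byte prefixes), while Long.reverse moves the fast-changing low-order bits to the front, so even adjacent sequence values land far apart in the key space. The class below is just an illustration, not our production code:

import java.util.stream.LongStream;

// Compares byte-level vs. bit-level reversal of consecutive sequence values.
// Class name is just for illustration.
public class ReverseCompare {
    public static void main(String[] args) {
        LongStream.rangeClosed(1, 5).forEach(seq -> {
            // reverseBytes moves the fast-changing low-order byte to the front,
            // so consecutive values spread over at most 256 leading-byte prefixes.
            long byByte = Long.reverseBytes(seq);
            // reverse moves the fast-changing low-order bits to the front,
            // so even adjacent values land far apart in the key space.
            long byBit = Long.reverse(seq);
            System.out.printf("seq=%d reverseBytes=%016x reverse=%016x%n", seq, byByte, byBit);
        });
    }
}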

I have one more question. I see a few errors in the logs and I am not sure where 
the issue is. We are debugging it, but I wanted to post the exception in case you 
have encountered this before.

We are trying to insert the value “en_US” into a locale column, which is a VARCHAR 
data type; a birthdate field (DATE) comes right before this locale field, if that helps.


Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "en_US" at line 1, column 515.
        at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1097) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1178) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.jdbc.PhoenixPreparedStatement.<init>(PhoenixPreparedStatement.java:95) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.jdbc.PhoenixConnection.prepareStatement(PhoenixConnection.java:622) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at com.eharmony.datastore.hbase.query.executor.PhoenixHBaseQueryExecutor.save(PhoenixHBaseQueryExecutor.java:83) ~[datastore-hbase-api-0.1.9.jar:na]
        ... 158 common frames omitted
Caused by: org.antlr.runtime.MismatchedTokenException: null
        at org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:346) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.parse.PhoenixSQLParser.upsert_node(PhoenixSQLParser.java:4454) ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]
        at org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:738)
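
Since the parser reports hitting the literal en_US in the statement text (column 515), it looks like the value may be getting concatenated into the SQL rather than bound as a parameter. For comparison while we debug, here is a minimal, hypothetical sketch of the parameterized form of the upsert; the table, column, and host names are placeholders, not our actual schema:

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Simplified, hypothetical sketch; table, column, and host names are placeholders.
public class LocaleUpsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             PreparedStatement ps = conn.prepareStatement(
                 "UPSERT INTO USER_PROFILE (UID, RECORD_ID, BIRTHDATE, LOCALE) VALUES (?, ?, ?, ?)")) {
            ps.setLong(1, 12345L);
            ps.setLong(2, 1L);
            ps.setDate(3, Date.valueOf("1980-01-15"));  // DATE column
            ps.setString(4, "en_US");                   // bound as a parameter, never spliced into the SQL text
            ps.executeUpdate();
            conn.commit();                              // Phoenix connections are not auto-commit by default
        }
    }
}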
 ~[phoenix-4.4.0-HBase-1.1-client-minimal.jar:na]



From: James Taylor <jamestay...@apache.org>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, November 3, 2015 at 6:49 PM
To: user <user@phoenix.apache.org>
Subject: Re: Help with salting

Hi Vijay,
Have you considered generating your IDs in a way that prevents hotspotting? One 
way might be to reverse the bits you get back from the sequence generator. You 
could write a simple UDF that does that: https://phoenix.apache.org/udf.html
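
Roughly, registering and calling such a UDF could look like the sketch below; the function name, implementing class, JAR path, and connection string are all made up for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical sketch only: the function name, implementing class, and JAR path
// are examples, not an existing implementation.
public class ReverseBitsUdfSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement()) {
            // Register the UDF. The implementing class would extend Phoenix's ScalarFunction
            // and essentially return Long.reverse() of its BIGINT argument.
            stmt.execute("CREATE FUNCTION REVERSE_BITS(BIGINT) RETURNS BIGINT "
                + "AS 'com.example.ReverseBitsFunction' "
                + "USING JAR 'hdfs://namenode/phoenix/udf/reverse-bits-udf.jar'");
            // Once registered, REVERSE_BITS(...) can be applied to the sequence value
            // wherever the new row key is generated.
        }
    }
}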

See inline for answers to your questions.

Thanks,
James

On Tue, Nov 3, 2015 at 3:44 PM, Vijay Vangapandu 
<vijayvangapa...@eharmony.com> wrote:
Hi,

I integrated one of the online services in my company with HBase using Apache 
Phoenix. After loading a few million records, I noticed that we have a hotspotting 
problem: all the records are going to one region because the keys are generated 
using a sequence.
The use case is: each user has thousands of records, with the combination of user 
ID and a second record ID as the row key (primary key uid, XXX). When a user logs 
in, we fetch all of that user's records by user ID and render the results. Updates, 
however, always use the combination (userid + XXX). Below are my questions.

 1.  If I salt the table using Apache Phoenix, is there any performance impact on 
reads, since reads have to query all regions?
Yes - for range scans, Phoenix needs to run N scans to find all the data, where N 
is the number of salt buckets. Worst case, that's N times more load on your 
cluster, but in reality the impact will likely be lower. A good way to think of it 
is that you're loading N blocks, whereas in the non-salted case you might only be 
loading one block.

 2.  If I have to salt the table, how many buckets should I use for 8 region 
servers with 272 regions, roughly 34 regions per region server?
Have you seen the Tuning presentation on our Presentations page 
(https://phoenix.apache.org/resources.html)? Maybe start with 10 or 11 salt 
buckets. It looks like your region size is pretty small, so I'm not sure how this 
will impact things. Try using Pherf (https://phoenix.apache.org/pherf.html) with 
different salt bucket counts to get an idea.
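
For example, the bucket count is just a table property set at creation time, so trying a different value with Pherf only means recreating and reloading the table. A DDL sketch with placeholder table and column names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Example DDL only; table and column names are placeholders for your schema.
public class SaltedTableSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement()) {
            // SALT_BUCKETS is fixed at CREATE TABLE time, so comparing bucket
            // counts means recreating the table and reloading (e.g. with Pherf).
            stmt.execute("CREATE TABLE USER_RECORDS ("
                + " UID BIGINT NOT NULL,"
                + " RECORD_ID BIGINT NOT NULL,"
                + " LOCALE VARCHAR,"
                + " BIRTHDATE DATE,"
                + " CONSTRAINT PK PRIMARY KEY (UID, RECORD_ID)"
                + ") SALT_BUCKETS = 10");
        }
    }
}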

 3.  If I salt the table using Phoenix, what would the effort be to move away from 
Phoenix and use the HBase client directly at a later time? (Not that I want to, 
but just checking the options.)
Impossible. :-) The salt byte value calculation is just a few lines of code 
(see SaltingUtil.getSaltingByte()), and you'd need to run scans against all 
salt buckets and merge the results. But assuming you're using WHERE clauses and 
other features, that's going to be a lot of work.
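
To give a feel for what you'd be reimplementing, the idea behind the salt byte is roughly the following. This is a simplified sketch, not the actual Phoenix code; see SaltingUtil for the real hash:

import java.nio.charset.StandardCharsets;

// Simplified sketch of the idea behind the salt byte; see Phoenix's
// SaltingUtil.getSaltingByte() for the actual hash it uses. The prefix byte is
// derived from the row key itself, so it can be recomputed on both writes and reads.
public class SaltSketch {
    static byte saltByte(byte[] rowKey, int saltBuckets) {
        int hash = 0;
        for (byte b : rowKey) {
            hash = 31 * hash + b;  // any stable hash of the key bytes works for the sketch
        }
        return (byte) ((hash & 0x7fffffff) % saltBuckets);
    }

    public static void main(String[] args) {
        byte[] key = "user123|record456".getBytes(StandardCharsets.UTF_8);
        // Writes prepend this byte to the key; reads must scan all saltBuckets
        // prefixes and merge the results.
        System.out.println("salt byte = " + saltByte(key, 10));
    }
}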


Thanks for your help.

--
Vijay Vangapandu
eHarmony, Platform
Principal Software Engineer

