Re: Signed long values in column

Anchal Agrawal Wed, 29 Jul 2015 18:27:34 -0700

Hi James,
Thanks for your reply. I don't understand the issue fully - do HBase's 
Bytes.toBytes() methods not have the same sort order as that of Phoenix? I'd 
really appreciate it if you could give more insight on this. Their 
documentation doesn't mention the sort order. If negative numbers sort ahead of 
the positive numbers, why is that incompatible with Phoenix?


It's interesting, because it seems that we can't have columns in Phoenix 
views/tables that hold negative long values. In my setup, it is 
not feasible to create a new Phoenix table and copy over the data because the 
table is very large and we'd need to recreate the Phoenix table with updated 
data every time we want to run queries.

Is it feasible to write a UDF (for a SELECT statement) that converts the 
bytearray in that column to a long value? If it is, would I use the Tuple 
object that's passed in to the evaluate() method to get the values in that 
column? I tried using Tuple's getValue() method to grab the bytearray in the 
column, but I'm running into issues. I'm looking at ToNumberFunction for 
reference.
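
For reference, the decoding step such a UDF would need can be sketched in plain Java, independent of Phoenix's function API. The class and method names below are hypothetical, and the sketch assumes the column holds the 8 big-endian bytes that Bytes.toBytes(long) writes. How the byte range is obtained from the Tuple is not shown here.

```java
import java.nio.ByteBuffer;

public class DecodeHBaseLong {
    // Hypothetical helper: reads a signed long from 8 big-endian bytes,
    // the layout assumed to be written by HBase's Bytes.toBytes(long).
    static long decode(byte[] buf, int offset, int length) {
        if (length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + length);
        }
        // ByteBuffer is big-endian by default, matching the assumed layout.
        return ByteBuffer.wrap(buf, offset, length).getLong();
    }

    public static void main(String[] args) {
        // Round-trip a negative value through the same big-endian layout.
        byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(-42L).array();
        System.out.println(decode(raw, 0, raw.length)); // -42
    }
}
```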

I really appreciate your help.
- Anchal 

On Wednesday, July 29, 2015 8:43 AM, James Taylor <[email protected]> wrote:

Hi Anchal,
Phoenix depends on the sort order of the serialized bytes to match 
the natural sort order of the column value. The HBase Bytes.toBytes() methods 
do not meet this requirement, as negative numbers will sort ahead of positive 
numbers. About the only option you have in this case is to create a new Phoenix 
table and copy the data over from your old table. If the data is being created 
by some external process, then you'd need to change it to use the PDataType 
toBytes() method instead of the HBase Bytes.toBytes() method.
It's possible that Phoenix could relax this constraint for columns that are not 
part of the primary key constraint - please file a JIRA for this. We'd need to 
define a new PDataType (it could share almost all of its implementation with 
PUnsignedLong) and handle ORDER BY differently for these types.
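
To illustrate the sort-order mismatch, here is a minimal standalone sketch (no HBase or Phoenix dependency; all names are made up for the example). It assumes Bytes.toBytes(long) writes plain big-endian two's complement and that bytes are compared as unsigned values, in which case the bytes for -1 and 1 do not compare in numeric order. The signFlipped() method shows one common sort-preserving encoding, flipping the sign bit; treat it as an approximation of what a signed Phoenix type does rather than the exact PDataType implementation.

```java
import java.nio.ByteBuffer;

public class SignedLongSortOrder {
    // Big-endian two's complement, the layout assumed for Bytes.toBytes(long).
    static byte[] twosComplement(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Assumed sort-preserving variant: same bytes with the sign bit flipped.
    static byte[] signFlipped(long v) {
        byte[] b = twosComplement(v);
        b[0] ^= (byte) 0x80;
        return b;
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long neg = -1L;
        long pos = 1L;
        // Plain two's complement: the bytes for -1 compare greater than the bytes for 1.
        System.out.println(compareUnsigned(twosComplement(neg), twosComplement(pos)) > 0); // true
        // Sign bit flipped: byte order agrees with numeric order.
        System.out.println(compareUnsigned(signFlipped(neg), signFlipped(pos)) < 0); // true
    }
}
```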
Thanks,
James

On Tue, Jul 28, 2015 at 10:40 PM, Anchal Agrawal <[email protected]> wrote:

Hi,
I'm creating a Phoenix view of an existing HBase table on v4.4.0.

Command: CREATE VIEW "table_name" (pk VARBINARY PRIMARY KEY, "cf"."col" 
DATA_TYPE_HERE);
The col column has long values that are serialized by Bytes.toBytes(long) but 
since some values are negative, I can't use UNSIGNED_LONG. I tried BIGINT 
instead since the documentation says that it maps to java.lang.Long, but that 
resulted in incorrect column values. The datatype documentation for 
UNSIGNED_LONG says "use the regular signed type instead" - which datatype is 
this referring to? LONG isn't supported.

I could create the view with the column values as bytearrays and write a UDF to 
extract long values, but I think that will add to the latency. Is there a way 
around this? I really appreciate your help.

Sincerely,
Anchal Agrawal
