Re: Problem in finding the largest value of an indexed column

Yufan Liu Thu, 02 Jul 2015 09:46:26 -0700

I'm using 4.4.0-HBase-0.98

2015-07-01 22:31 GMT-07:00 James Taylor <[email protected]>:


> Yufan,
> What version of Phoenix are you using?
> Thanks,
> James
>
> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <[email protected]> wrote:
>
>> When I made more tests, I find that this problem happens after table got
>> split.
>>
>> Here is the DDL I use to create table and index:
>> CREATE TABLE IF NOT EXISTS t1 (
>> uid BIGINT NOT NULL,
>> timestamp BIGINT NOT NULL,
>> eventName VARCHAR
>> CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';
>>
>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)
>>
>> Attach is the sample data I used for test. It has about 4000 rows, when
>> the timestamp_index table has one region, the query returns correct result:
>> 1433334048443, but when I manually split it into 4 regions (use hbase
>> tool), it returns 1433333024961.
>>
>> Let know if you find anything. Thanks!
>>
>>
>> 2015-07-01 11:27 GMT-07:00 James Taylor <[email protected]>:
>>
>>> If you could put a complete test (including your DDL and upsert of
>>> data), that would be much appreciated.
>>> Thanks,
>>> James
>>>
>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <[email protected]> wrote:
>>>
>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY timestamp
>>>> DESC NULLS LAST LIMIT 1". But it still returns the same unexpected result.
>>>> There seems to be some internal problems related.
>>>>
>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <[email protected]>:
>>>>
>>>>> Yes, reverse scan will be leveraged when possible. Make you use NULLS
>>>>> LAST in your ORDER BY as rows are ordered with nulls first.
>>>>>
>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <[email protected]> wrote:
>>>>>
>>>>>> I used the HBase reverse scan to find the last row on the index
>>>>>> table. It returned the expected result. I would like to know is Phoenix's
>>>>>> "ORDER BY"
>>>>>> and "DESC" implemented based on HBase reverse scan?
>>>>>>
>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <[email protected]>:
>>>>>>
>>>>>>> Thank you anyway, Michael!
>>>>>>>
>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister <
>>>>>>> [email protected]>:
>>>>>>>
>>>>>>>>  OK, I’m a Phoenix newbie, so that was the extent of the advice I
>>>>>>>> could give you. There are people here far more experienced than I am 
>>>>>>>> who
>>>>>>>> should be able to give you deeper advice. Have a great weekend!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Yufan Liu [mailto:[email protected]]
>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM
>>>>>>>> *To:* [email protected]
>>>>>>>> *Subject:* Re: Problem in finding the largest value of an indexed
>>>>>>>> column
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> Thanks for the advice, for the first one, it's "CLIENT 67-CHUNK
>>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST 
>>>>>>>> KEY
>>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which is as expected. For the
>>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER
>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT" 
>>>>>>>> which
>>>>>>>> looks correct, but still returns the unexpected result.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister <
>>>>>>>> [email protected]>:
>>>>>>>>
>>>>>>>> Yufan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you tried using the EXPLAIN command to see what plan is being
>>>>>>>> used to access the data?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Michael McAllister
>>>>>>>>
>>>>>>>> Staff Data Warehouse Engineer | Decision Systems
>>>>>>>>
>>>>>>>> [email protected] | C: 512.423.7447 | skype:
>>>>>>>> michael.mcallister.ha <[email protected]> | webex:
>>>>>>>> https://h.a/mikewebex
>>>>>>>>
>>>>>>>> [image: Description: Description: cid:3410354473_30269081]
>>>>>>>>
>>>>>>>> This electronic communication (including any attachment) is
>>>>>>>> confidential.  If you are not an intended recipient of this 
>>>>>>>> communication,
>>>>>>>> please be advised that any disclosure, dissemination, distribution, 
>>>>>>>> copying
>>>>>>>> or other use of this communication or any attachment is strictly
>>>>>>>> prohibited.  If you have received this communication in error, please
>>>>>>>> notify the sender immediately by reply e-mail and promptly destroy all
>>>>>>>> electronic and printed copies of this communication and any attachment.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Yufan Liu [mailto:[email protected]]
>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM
>>>>>>>> *To:* [email protected]
>>>>>>>> *Subject:* Problem in finding the largest value of an indexed
>>>>>>>> column
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We have created a table (eg, t1), and a global index of one numeric
>>>>>>>> column of t1 (eg, timestamp). Now we want to find the largest value of
>>>>>>>> timestamp, we have tried two approaches:
>>>>>>>>
>>>>>>>>
>>>>>>>> 1. select max(timestamp) from t1; This query takes forever to
>>>>>>>> finish, so I think it maybe doing a full table scan/comparison .
>>>>>>>>
>>>>>>>> 2. select timestamp from t1 order by timestamp desc limit 1; This
>>>>>>>> query finished fast, but the result it returns is far from the largest
>>>>>>>> value. It seems it just return the largest value for a certain range of
>>>>>>>> data.
>>>>>>>>
>>>>>>>> Did anyone else encounter this issue/have any suggestion?
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yufan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> best,
>>>>>>>> Yufan
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> best,
>>>>>>> Yufan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> best,
>>>>>> Yufan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> best,
>>>> Yufan
>>>>
>>>>
>>>
>>
>>
>> --
>> best,
>> Yufan
>>
>>
>


-- 
best,
Yufan

Re: Problem in finding the largest value of an indexed column

Reply via email to