On further investigation, I believe it should have been working before. I did a bit of cleanup and attached a new patch to PHOENIX-2096, but this would only prevent a merge sort when one is not required (basically improving performance).
Maybe your index is invalid? You can try rebuilding with this command: https://phoenix.apache.org/language/index.html#alter_index On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu <[email protected]> wrote: > The query on test dataset is returning the expected result with the patch. > But on the original dataset (10million rows, 6 regions), it still return > the same unexpected result, I will dig more into this. Thank you, James! > > 2015-07-02 9:58 GMT-07:00 Yufan Liu <[email protected]>: > >> Sure, let me have a try >> >> 2015-07-02 9:46 GMT-07:00 James Taylor <[email protected]>: >> >>> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. >>> Would you mind confirming that this fixes the issue you're seeing? >>> >>> James >>> >>> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <[email protected]> wrote: >>> >>>> I'm using 4.4.0-HBase-0.98 >>>> >>>> 2015-07-01 22:31 GMT-07:00 James Taylor <[email protected]>: >>>> >>>>> Yufan, >>>>> What version of Phoenix are you using? >>>>> Thanks, >>>>> James >>>>> >>>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <[email protected]> wrote: >>>>> >>>>>> When I made more tests, I find that this problem happens after table >>>>>> got split. >>>>>> >>>>>> Here is the DDL I use to create table and index: >>>>>> CREATE TABLE IF NOT EXISTS t1 ( >>>>>> uid BIGINT NOT NULL, >>>>>> timestamp BIGINT NOT NULL, >>>>>> eventName VARCHAR >>>>>> CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; >>>>>> >>>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) >>>>>> >>>>>> Attach is the sample data I used for test. It has about 4000 rows, >>>>>> when the timestamp_index table has one region, the query returns correct >>>>>> result: 1433334048443, but when I manually split it into 4 regions (use >>>>>> hbase tool), it returns 1433333024961. >>>>>> >>>>>> Let know if you find anything. Thanks! >>>>>> >>>>>> >>>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <[email protected]>: >>>>>> >>>>>>> If you could put a complete test (including your DDL and upsert of >>>>>>> data), that would be much appreciated. >>>>>>> Thanks, >>>>>>> James >>>>>>> >>>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <[email protected]> wrote: >>>>>>> >>>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY >>>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns the same >>>>>>>> unexpected result. There seems to be some internal problems related. >>>>>>>> >>>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <[email protected]>: >>>>>>>> >>>>>>>>> Yes, reverse scan will be leveraged when possible. Make you use >>>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with nulls first. >>>>>>>>> >>>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I used the HBase reverse scan to find the last row on the index >>>>>>>>>> table. It returned the expected result. I would like to know is >>>>>>>>>> Phoenix's >>>>>>>>>> "ORDER BY" >>>>>>>>>> and "DESC" implemented based on HBase reverse scan? >>>>>>>>>> >>>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <[email protected]>: >>>>>>>>>> >>>>>>>>>>> Thank you anyway, Michael! >>>>>>>>>>> >>>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister < >>>>>>>>>>> [email protected]>: >>>>>>>>>>> >>>>>>>>>>>> OK, I’m a Phoenix newbie, so that was the extent of the >>>>>>>>>>>> advice I could give you. There are people here far more >>>>>>>>>>>> experienced than I >>>>>>>>>>>> am who should be able to give you deeper advice. Have a great >>>>>>>>>>>> weekend! >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Mike >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *From:* Yufan Liu [mailto:[email protected]] >>>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM >>>>>>>>>>>> *To:* [email protected] >>>>>>>>>>>> *Subject:* Re: Problem in finding the largest value of an >>>>>>>>>>>> indexed column >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi Michael, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the advice, for the first one, it's "CLIENT 67-CHUNK >>>>>>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY >>>>>>>>>>>> FIRST KEY >>>>>>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which is as expected. For >>>>>>>>>>>> the >>>>>>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN >>>>>>>>>>>> OVER >>>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW >>>>>>>>>>>> LIMIT" which >>>>>>>>>>>> looks correct, but still returns the unexpected result. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister < >>>>>>>>>>>> [email protected]>: >>>>>>>>>>>> >>>>>>>>>>>> Yufan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Have you tried using the EXPLAIN command to see what plan is >>>>>>>>>>>> being used to access the data? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Michael McAllister >>>>>>>>>>>> >>>>>>>>>>>> Staff Data Warehouse Engineer | Decision Systems >>>>>>>>>>>> >>>>>>>>>>>> [email protected] | C: 512.423.7447 | skype: >>>>>>>>>>>> michael.mcallister.ha <[email protected]> | webex: >>>>>>>>>>>> https://h.a/mikewebex >>>>>>>>>>>> >>>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081] >>>>>>>>>>>> >>>>>>>>>>>> This electronic communication (including any attachment) is >>>>>>>>>>>> confidential. If you are not an intended recipient of this >>>>>>>>>>>> communication, >>>>>>>>>>>> please be advised that any disclosure, dissemination, >>>>>>>>>>>> distribution, copying >>>>>>>>>>>> or other use of this communication or any attachment is strictly >>>>>>>>>>>> prohibited. If you have received this communication in error, >>>>>>>>>>>> please >>>>>>>>>>>> notify the sender immediately by reply e-mail and promptly destroy >>>>>>>>>>>> all >>>>>>>>>>>> electronic and printed copies of this communication and any >>>>>>>>>>>> attachment. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *From:* Yufan Liu [mailto:[email protected]] >>>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM >>>>>>>>>>>> *To:* [email protected] >>>>>>>>>>>> *Subject:* Problem in finding the largest value of an indexed >>>>>>>>>>>> column >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> We have created a table (eg, t1), and a global index of one >>>>>>>>>>>> numeric column of t1 (eg, timestamp). Now we want to find the >>>>>>>>>>>> largest value >>>>>>>>>>>> of timestamp, we have tried two approaches: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 1. select max(timestamp) from t1; This query takes forever to >>>>>>>>>>>> finish, so I think it maybe doing a full table scan/comparison . >>>>>>>>>>>> >>>>>>>>>>>> 2. select timestamp from t1 order by timestamp desc limit 1; >>>>>>>>>>>> This query finished fast, but the result it returns is far from >>>>>>>>>>>> the largest >>>>>>>>>>>> value. It seems it just return the largest value for a certain >>>>>>>>>>>> range of >>>>>>>>>>>> data. >>>>>>>>>>>> >>>>>>>>>>>> Did anyone else encounter this issue/have any suggestion? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Yufan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> >>>>>>>>>>>> best, >>>>>>>>>>>> Yufan >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> best, >>>>>>>>>>> Yufan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> best, >>>>>>>>>> Yufan >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> best, >>>>>>>> Yufan >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> best, >>>>>> Yufan >>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> best, >>>> Yufan >>>> >>>> >>> >> >> >> -- >> best, >> Yufan >> >> > > > -- > best, > Yufan > >
