The query on test dataset is returning the expected result with the patch. But on the original dataset (10million rows, 6 regions), it still return the same unexpected result, I will dig more into this. Thank you, James!
2015-07-02 9:58 GMT-07:00 Yufan Liu <[email protected]>: > Sure, let me have a try > > 2015-07-02 9:46 GMT-07:00 James Taylor <[email protected]>: > >> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. >> Would you mind confirming that this fixes the issue you're seeing? >> >> James >> >> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <[email protected]> wrote: >> >>> I'm using 4.4.0-HBase-0.98 >>> >>> 2015-07-01 22:31 GMT-07:00 James Taylor <[email protected]>: >>> >>>> Yufan, >>>> What version of Phoenix are you using? >>>> Thanks, >>>> James >>>> >>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <[email protected]> wrote: >>>> >>>>> When I made more tests, I find that this problem happens after table >>>>> got split. >>>>> >>>>> Here is the DDL I use to create table and index: >>>>> CREATE TABLE IF NOT EXISTS t1 ( >>>>> uid BIGINT NOT NULL, >>>>> timestamp BIGINT NOT NULL, >>>>> eventName VARCHAR >>>>> CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; >>>>> >>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) >>>>> >>>>> Attach is the sample data I used for test. It has about 4000 rows, >>>>> when the timestamp_index table has one region, the query returns correct >>>>> result: 1433334048443, but when I manually split it into 4 regions (use >>>>> hbase tool), it returns 1433333024961. >>>>> >>>>> Let know if you find anything. Thanks! >>>>> >>>>> >>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <[email protected]>: >>>>> >>>>>> If you could put a complete test (including your DDL and upsert of >>>>>> data), that would be much appreciated. >>>>>> Thanks, >>>>>> James >>>>>> >>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <[email protected]> wrote: >>>>>> >>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY >>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns the same >>>>>>> unexpected result. There seems to be some internal problems related. >>>>>>> >>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <[email protected]>: >>>>>>> >>>>>>>> Yes, reverse scan will be leveraged when possible. Make you use >>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with nulls first. >>>>>>>> >>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <[email protected]> wrote: >>>>>>>> >>>>>>>>> I used the HBase reverse scan to find the last row on the index >>>>>>>>> table. It returned the expected result. I would like to know is >>>>>>>>> Phoenix's >>>>>>>>> "ORDER BY" >>>>>>>>> and "DESC" implemented based on HBase reverse scan? >>>>>>>>> >>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <[email protected]>: >>>>>>>>> >>>>>>>>>> Thank you anyway, Michael! >>>>>>>>>> >>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister < >>>>>>>>>> [email protected]>: >>>>>>>>>> >>>>>>>>>>> OK, I’m a Phoenix newbie, so that was the extent of the advice >>>>>>>>>>> I could give you. There are people here far more experienced than I >>>>>>>>>>> am who >>>>>>>>>>> should be able to give you deeper advice. Have a great weekend! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Mike >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> *From:* Yufan Liu [mailto:[email protected]] >>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM >>>>>>>>>>> *To:* [email protected] >>>>>>>>>>> *Subject:* Re: Problem in finding the largest value of an >>>>>>>>>>> indexed column >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Michael, >>>>>>>>>>> >>>>>>>>>>> Thanks for the advice, for the first one, it's "CLIENT 67-CHUNK >>>>>>>>>>> PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY >>>>>>>>>>> FIRST KEY >>>>>>>>>>> ONLY; SERVER AGGREGATE INTO SINGLE ROW" which is as expected. For >>>>>>>>>>> the >>>>>>>>>>> second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN >>>>>>>>>>> OVER >>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW >>>>>>>>>>> LIMIT" which >>>>>>>>>>> looks correct, but still returns the unexpected result. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister < >>>>>>>>>>> [email protected]>: >>>>>>>>>>> >>>>>>>>>>> Yufan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Have you tried using the EXPLAIN command to see what plan is >>>>>>>>>>> being used to access the data? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Michael McAllister >>>>>>>>>>> >>>>>>>>>>> Staff Data Warehouse Engineer | Decision Systems >>>>>>>>>>> >>>>>>>>>>> [email protected] | C: 512.423.7447 | skype: >>>>>>>>>>> michael.mcallister.ha <[email protected]> | webex: >>>>>>>>>>> https://h.a/mikewebex >>>>>>>>>>> >>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081] >>>>>>>>>>> >>>>>>>>>>> This electronic communication (including any attachment) is >>>>>>>>>>> confidential. If you are not an intended recipient of this >>>>>>>>>>> communication, >>>>>>>>>>> please be advised that any disclosure, dissemination, distribution, >>>>>>>>>>> copying >>>>>>>>>>> or other use of this communication or any attachment is strictly >>>>>>>>>>> prohibited. If you have received this communication in error, >>>>>>>>>>> please >>>>>>>>>>> notify the sender immediately by reply e-mail and promptly destroy >>>>>>>>>>> all >>>>>>>>>>> electronic and printed copies of this communication and any >>>>>>>>>>> attachment. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> *From:* Yufan Liu [mailto:[email protected]] >>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM >>>>>>>>>>> *To:* [email protected] >>>>>>>>>>> *Subject:* Problem in finding the largest value of an indexed >>>>>>>>>>> column >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> We have created a table (eg, t1), and a global index of one >>>>>>>>>>> numeric column of t1 (eg, timestamp). Now we want to find the >>>>>>>>>>> largest value >>>>>>>>>>> of timestamp, we have tried two approaches: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 1. select max(timestamp) from t1; This query takes forever to >>>>>>>>>>> finish, so I think it maybe doing a full table scan/comparison . >>>>>>>>>>> >>>>>>>>>>> 2. select timestamp from t1 order by timestamp desc limit 1; >>>>>>>>>>> This query finished fast, but the result it returns is far from the >>>>>>>>>>> largest >>>>>>>>>>> value. It seems it just return the largest value for a certain >>>>>>>>>>> range of >>>>>>>>>>> data. >>>>>>>>>>> >>>>>>>>>>> Did anyone else encounter this issue/have any suggestion? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Yufan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> best, >>>>>>>>>>> Yufan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> best, >>>>>>>>>> Yufan >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> best, >>>>>>>>> Yufan >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> best, >>>>>>> Yufan >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> best, >>>>> Yufan >>>>> >>>>> >>>> >>> >>> >>> -- >>> best, >>> Yufan >>> >>> >> > > > -- > best, > Yufan > > -- best, Yufan
