Sounds like something else is going wrong. Can you adapt your test by setting the MAX_FILESIZE very low for your table (so that it splits after 4 or 5 rows are added) and package it up as a unit test?
On Thu, Jul 9, 2015 at 1:44 PM, Yufan Liu <[email protected]> wrote: > Just got a chance to revisit this issue: I have rebuilt the index and it > still returns the unexpected result. By using the test case, I tried to > insert enough rows to make it auto-split and it reproduces the problem too. > It seems it still has trouble returning last row sorted by first component > of primary key on split tables. Maybe there is another issue than > PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98 > branch which includes the patch of PHOENIX-2096. > > 2015-07-02 19:55 GMT-07:00 James Taylor <[email protected]>: > >> On further investigation, I believe it should have been working before. I >> did a bit of cleanup and attached a new patch to PHOENIX-2096, but this >> would only prevent a merge sort when one is not required (basically >> improving performance). >> >> Maybe your index is invalid? You can try rebuilding with this command: >> https://phoenix.apache.org/language/index.html#alter_index >> >> On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu <[email protected]> wrote: >> >>> The query on test dataset is returning the expected result with the >>> patch. But on the original dataset (10million rows, 6 regions), it still >>> return the same unexpected result, I will dig more into this. Thank you, >>> James! >>> >>> 2015-07-02 9:58 GMT-07:00 Yufan Liu <[email protected]>: >>> >>>> Sure, let me have a try >>>> >>>> 2015-07-02 9:46 GMT-07:00 James Taylor <[email protected]>: >>>> >>>>> Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. >>>>> Would you mind confirming that this fixes the issue you're seeing? >>>>> >>>>> James >>>>> >>>>> On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu <[email protected]> wrote: >>>>> >>>>>> I'm using 4.4.0-HBase-0.98 >>>>>> >>>>>> 2015-07-01 22:31 GMT-07:00 James Taylor <[email protected]>: >>>>>> >>>>>>> Yufan, >>>>>>> What version of Phoenix are you using? >>>>>>> Thanks, >>>>>>> James >>>>>>> >>>>>>> On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu <[email protected]> wrote: >>>>>>> >>>>>>>> When I made more tests, I find that this problem happens after >>>>>>>> table got split. >>>>>>>> >>>>>>>> Here is the DDL I use to create table and index: >>>>>>>> CREATE TABLE IF NOT EXISTS t1 ( >>>>>>>> uid BIGINT NOT NULL, >>>>>>>> timestamp BIGINT NOT NULL, >>>>>>>> eventName VARCHAR >>>>>>>> CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) >>>>>>>> COMPRESSION='SNAPPY'; >>>>>>>> >>>>>>>> CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) >>>>>>>> >>>>>>>> Attach is the sample data I used for test. It has about 4000 rows, >>>>>>>> when the timestamp_index table has one region, the query returns >>>>>>>> correct >>>>>>>> result: 1433334048443, but when I manually split it into 4 regions (use >>>>>>>> hbase tool), it returns 1433333024961. >>>>>>>> >>>>>>>> Let know if you find anything. Thanks! >>>>>>>> >>>>>>>> >>>>>>>> 2015-07-01 11:27 GMT-07:00 James Taylor <[email protected]>: >>>>>>>> >>>>>>>>> If you could put a complete test (including your DDL and upsert of >>>>>>>>> data), that would be much appreciated. >>>>>>>>> Thanks, >>>>>>>>> James >>>>>>>>> >>>>>>>>> On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I have tried to use query "SELECT timestamp FROM t1 ORDER BY >>>>>>>>>> timestamp DESC NULLS LAST LIMIT 1". But it still returns the same >>>>>>>>>> unexpected result. There seems to be some internal problems related. >>>>>>>>>> >>>>>>>>>> 2015-06-30 18:03 GMT-07:00 James Taylor <[email protected]>: >>>>>>>>>> >>>>>>>>>>> Yes, reverse scan will be leveraged when possible. Make you use >>>>>>>>>>> NULLS LAST in your ORDER BY as rows are ordered with nulls first. >>>>>>>>>>> >>>>>>>>>>> On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I used the HBase reverse scan to find the last row on the index >>>>>>>>>>>> table. It returned the expected result. I would like to know is >>>>>>>>>>>> Phoenix's >>>>>>>>>>>> "ORDER BY" >>>>>>>>>>>> and "DESC" implemented based on HBase reverse scan? >>>>>>>>>>>> >>>>>>>>>>>> 2015-06-26 17:25 GMT-07:00 Yufan Liu <[email protected]>: >>>>>>>>>>>> >>>>>>>>>>>>> Thank you anyway, Michael! >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-06-26 17:21 GMT-07:00 Michael McAllister < >>>>>>>>>>>>> [email protected]>: >>>>>>>>>>>>> >>>>>>>>>>>>>> OK, I’m a Phoenix newbie, so that was the extent of the >>>>>>>>>>>>>> advice I could give you. There are people here far more >>>>>>>>>>>>>> experienced than I >>>>>>>>>>>>>> am who should be able to give you deeper advice. Have a great >>>>>>>>>>>>>> weekend! >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Mike >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *From:* Yufan Liu [mailto:[email protected]] >>>>>>>>>>>>>> *Sent:* Friday, June 26, 2015 7:19 PM >>>>>>>>>>>>>> *To:* [email protected] >>>>>>>>>>>>>> *Subject:* Re: Problem in finding the largest value of an >>>>>>>>>>>>>> indexed column >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Michael, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for the advice, for the first one, it's "CLIENT >>>>>>>>>>>>>> 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER >>>>>>>>>>>>>> FILTER BY >>>>>>>>>>>>>> FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW" which is as >>>>>>>>>>>>>> expected. For >>>>>>>>>>>>>> the second one, it's "CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL >>>>>>>>>>>>>> SCAN OVER >>>>>>>>>>>>>> TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW >>>>>>>>>>>>>> LIMIT" which >>>>>>>>>>>>>> looks correct, but still returns the unexpected result. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2015-06-26 16:59 GMT-07:00 Michael McAllister < >>>>>>>>>>>>>> [email protected]>: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yufan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Have you tried using the EXPLAIN command to see what plan is >>>>>>>>>>>>>> being used to access the data? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Michael McAllister >>>>>>>>>>>>>> >>>>>>>>>>>>>> Staff Data Warehouse Engineer | Decision Systems >>>>>>>>>>>>>> >>>>>>>>>>>>>> [email protected] | C: 512.423.7447 | skype: >>>>>>>>>>>>>> michael.mcallister.ha <[email protected]> | webex: >>>>>>>>>>>>>> https://h.a/mikewebex >>>>>>>>>>>>>> >>>>>>>>>>>>>> [image: Description: Description: cid:3410354473_30269081] >>>>>>>>>>>>>> >>>>>>>>>>>>>> This electronic communication (including any attachment) is >>>>>>>>>>>>>> confidential. If you are not an intended recipient of this >>>>>>>>>>>>>> communication, >>>>>>>>>>>>>> please be advised that any disclosure, dissemination, >>>>>>>>>>>>>> distribution, copying >>>>>>>>>>>>>> or other use of this communication or any attachment is strictly >>>>>>>>>>>>>> prohibited. If you have received this communication in error, >>>>>>>>>>>>>> please >>>>>>>>>>>>>> notify the sender immediately by reply e-mail and promptly >>>>>>>>>>>>>> destroy all >>>>>>>>>>>>>> electronic and printed copies of this communication and any >>>>>>>>>>>>>> attachment. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> *From:* Yufan Liu [mailto:[email protected]] >>>>>>>>>>>>>> *Sent:* Friday, June 26, 2015 6:31 PM >>>>>>>>>>>>>> *To:* [email protected] >>>>>>>>>>>>>> *Subject:* Problem in finding the largest value of an >>>>>>>>>>>>>> indexed column >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> We have created a table (eg, t1), and a global index of one >>>>>>>>>>>>>> numeric column of t1 (eg, timestamp). Now we want to find the >>>>>>>>>>>>>> largest value >>>>>>>>>>>>>> of timestamp, we have tried two approaches: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. select max(timestamp) from t1; This query takes forever to >>>>>>>>>>>>>> finish, so I think it maybe doing a full table scan/comparison . >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2. select timestamp from t1 order by timestamp desc limit 1; >>>>>>>>>>>>>> This query finished fast, but the result it returns is far from >>>>>>>>>>>>>> the largest >>>>>>>>>>>>>> value. It seems it just return the largest value for a certain >>>>>>>>>>>>>> range of >>>>>>>>>>>>>> data. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Did anyone else encounter this issue/have any suggestion? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yufan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> >>>>>>>>>>>>>> best, >>>>>>>>>>>>>> Yufan >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> best, >>>>>>>>>>>>> Yufan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> best, >>>>>>>>>>>> Yufan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> best, >>>>>>>>>> Yufan >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> best, >>>>>>>> Yufan >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> best, >>>>>> Yufan >>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> best, >>>> Yufan >>>> >>>> >>> >>> >>> -- >>> best, >>> Yufan >>> >>> >> > > > -- > best, > Yufan > >
