Re: Problem in finding the largest value of an indexed column
It seems that setting MAX_FILESIZE only won't trigger split automatically (need major compaction?). Also using split on when creating the table to force split seems only works for String type row key. Is there any way else I can use to make the table auto split for the unit test? 2015-07-09 23:36 GMT-07:00 James Taylor jamestay...@apache.org: Sounds like something else is going wrong. Can you adapt your test by setting the MAX_FILESIZE very low for your table (so that it splits after 4 or 5 rows are added) and package it up as a unit test? On Thu, Jul 9, 2015 at 1:44 PM, Yufan Liu yli...@kent.edu wrote: Just got a chance to revisit this issue: I have rebuilt the index and it still returns the unexpected result. By using the test case, I tried to insert enough rows to make it auto-split and it reproduces the problem too. It seems it still has trouble returning last row sorted by first component of primary key on split tables. Maybe there is another issue than PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98 branch which includes the patch of PHOENIX-2096. 2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org: On further investigation, I believe it should have been working before. I did a bit of cleanup and attached a new patch to PHOENIX-2096, but this would only prevent a merge sort when one is not required (basically improving performance). Maybe your index is invalid? You can try rebuilding with this command: https://phoenix.apache.org/language/index.html#alter_index On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote: The query on test dataset is returning the expected result with the patch. But on the original dataset (10million rows, 6 regions), it still return the same unexpected result, I will dig more into this. Thank you, James! 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu: Sure, let me have a try 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org: Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would you mind confirming that this fixes the issue you're seeing? James On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote: I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org : Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT
Re: Problem in finding the largest value of an indexed column
Sounds like something else is going wrong. Can you adapt your test by setting the MAX_FILESIZE very low for your table (so that it splits after 4 or 5 rows are added) and package it up as a unit test? On Thu, Jul 9, 2015 at 1:44 PM, Yufan Liu yli...@kent.edu wrote: Just got a chance to revisit this issue: I have rebuilt the index and it still returns the unexpected result. By using the test case, I tried to insert enough rows to make it auto-split and it reproduces the problem too. It seems it still has trouble returning last row sorted by first component of primary key on split tables. Maybe there is another issue than PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98 branch which includes the patch of PHOENIX-2096. 2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org: On further investigation, I believe it should have been working before. I did a bit of cleanup and attached a new patch to PHOENIX-2096, but this would only prevent a merge sort when one is not required (basically improving performance). Maybe your index is invalid? You can try rebuilding with this command: https://phoenix.apache.org/language/index.html#alter_index On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote: The query on test dataset is returning the expected result with the patch. But on the original dataset (10million rows, 6 regions), it still return the same unexpected result, I will dig more into this. Thank you, James! 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu: Sure, let me have a try 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org: Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would you mind confirming that this fixes the issue you're seeing? James On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote: I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C:
Re: Problem in finding the largest value of an indexed column
Just got a chance to revisit this issue: I have rebuilt the index and it still returns the unexpected result. By using the test case, I tried to insert enough rows to make it auto-split and it reproduces the problem too. It seems it still has trouble returning last row sorted by first component of primary key on split tables. Maybe there is another issue than PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98 branch which includes the patch of PHOENIX-2096. 2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org: On further investigation, I believe it should have been working before. I did a bit of cleanup and attached a new patch to PHOENIX-2096, but this would only prevent a merge sort when one is not required (basically improving performance). Maybe your index is invalid? You can try rebuilding with this command: https://phoenix.apache.org/language/index.html#alter_index On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote: The query on test dataset is returning the expected result with the patch. But on the original dataset (10million rows, 6 regions), it still return the same unexpected result, I will dig more into this. Thank you, James! 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu: Sure, let me have a try 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org: Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would you mind confirming that this fixes the issue you're seeing? James On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote: I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this
Re: Problem in finding the largest value of an indexed column
On further investigation, I believe it should have been working before. I did a bit of cleanup and attached a new patch to PHOENIX-2096, but this would only prevent a merge sort when one is not required (basically improving performance). Maybe your index is invalid? You can try rebuilding with this command: https://phoenix.apache.org/language/index.html#alter_index On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote: The query on test dataset is returning the expected result with the patch. But on the original dataset (10million rows, 6 regions), it still return the same unexpected result, I will dig more into this. Thank you, James! 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu: Sure, let me have a try 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org: Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would you mind confirming that this fixes the issue you're seeing? James On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote: I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed
Re: Problem in finding the largest value of an indexed column
The query on test dataset is returning the expected result with the patch. But on the original dataset (10million rows, 6 regions), it still return the same unexpected result, I will dig more into this. Thank you, James! 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu: Sure, let me have a try 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org: Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would you mind confirming that this fixes the issue you're seeing? James On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote: I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it
Re: Problem in finding the largest value of an indexed column
I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan
Re: Problem in finding the largest value of an indexed column
Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would you mind confirming that this fixes the issue you're seeing? James On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote: I'm using 4.4.0-HBase-0.98 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org: Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan
Re: Problem in finding the largest value of an indexed column
I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com : OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com : Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan
Re: Problem in finding the largest value of an indexed column
When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan 553284,142801056 536488,142801337 560306,142801687 560306,142802063 560306,142802071 547732,142803566 543776,142804848 606972,142805062 535558,142805493 535558,142805496 535558,142805501 542141,142805622 599931,142805996 528159,142806024 577456,142806289 577456,142806303 533248,142806483 533248,142806488 533248,142806562 547732,142806597 574988,142806687 606777,142806763 606777,142806804 606777,142806810 536488,142807053
Re: Problem in finding the largest value of an indexed column
Yufan, What version of Phoenix are you using? Thanks, James On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote: When I made more tests, I find that this problem happens after table got split. Here is the DDL I use to create table and index: CREATE TABLE IF NOT EXISTS t1 ( uid BIGINT NOT NULL, timestamp BIGINT NOT NULL, eventName VARCHAR CONSTRAINT my_pk PRIMARY KEY (uid, timestamp)) COMPRESSION='SNAPPY'; CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName) Attach is the sample data I used for test. It has about 4000 rows, when the timestamp_index table has one region, the query returns correct result: 144048443, but when I manually split it into 4 regions (use hbase tool), it returns 143024961. Let know if you find anything. Thanks! 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org: If you could put a complete test (including your DDL and upsert of data), that would be much appreciated. Thanks, James On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote: I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result. There seems to be some internal problems related. 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org: Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan -- best, Yufan
Re: Problem in finding the largest value of an indexed column
Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST in your ORDER BY as rows are ordered with nulls first. On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote: I used the HBase reverse scan to find the last row on the index table. It returned the expected result. I would like to know is Phoenix's ORDER BY and DESC implemented based on HBase reverse scan? 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu: Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com : Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan -- best, Yufan
RE: Problem in finding the largest value of an indexed column
Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.commailto:mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.hamailto:zimmk...@hotmail.com | webex: https://h.a/mikewebex [Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. From: Yufan Liu [mailto:yli...@kent.edu] Sent: Friday, June 26, 2015 6:31 PM To: user@phoenix.apache.org Subject: Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan
Re: Problem in finding the largest value of an indexed column
Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan
RE: Problem in finding the largest value of an indexed column
OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike From: Yufan Liu [mailto:yli...@kent.edu] Sent: Friday, June 26, 2015 7:19 PM To: user@phoenix.apache.org Subject: Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.commailto:mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.commailto:mmcallis...@homeaway.com | C: 512.423.7447tel:512.423.7447 | skype: michael.mcallister.hamailto:zimmk...@hotmail.com | webex: https://h.a/mikewebex [Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. From: Yufan Liu [mailto:yli...@kent.edumailto:yli...@kent.edu] Sent: Friday, June 26, 2015 6:31 PM To: user@phoenix.apache.orgmailto:user@phoenix.apache.org Subject: Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan
Re: Problem in finding the largest value of an indexed column
Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 7:19 PM *To:* user@phoenix.apache.org *Subject:* Re: Problem in finding the largest value of an indexed column Hi Michael, Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: Yufan Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex [image: Description: Description: cid:3410354473_30269081] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. *From:* Yufan Liu [mailto:yli...@kent.edu] *Sent:* Friday, June 26, 2015 6:31 PM *To:* user@phoenix.apache.org *Subject:* Problem in finding the largest value of an indexed column Hi, We have created a table (eg, t1), and a global index of one numeric column of t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it maybe doing a full table scan/comparison . 2. select timestamp from t1 order by timestamp desc limit 1; This query finished fast, but the result it returns is far from the largest value. It seems it just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestion? -- Thanks, Yufan -- best, Yufan -- best, Yufan