subject:"RE\: Problem in finding the largest value of an indexed column"

Re: Problem in finding the largest value of an indexed column

2015-07-13 Thread Yufan Liu

It seems that setting MAX_FILESIZE only won't trigger split automatically
(need major compaction?). Also using split on when creating the table to
force split seems only works for String type row key. Is there any way else
I can use to make the table auto split for the unit test?

2015-07-09 23:36 GMT-07:00 James Taylor jamestay...@apache.org:

 Sounds like something else is going wrong. Can you adapt your test by
 setting the MAX_FILESIZE very low for your table (so that it splits after 4
 or 5 rows are added) and package it up as a unit test?

 On Thu, Jul 9, 2015 at 1:44 PM, Yufan Liu yli...@kent.edu wrote:

 Just got a chance to revisit this issue: I have rebuilt the index and it
 still returns the unexpected result. By using the test case, I tried to
 insert enough rows to make it auto-split and it reproduces the problem too.
 It seems it still has trouble returning last row sorted by first component
 of primary key on split tables. Maybe there is another issue than
 PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98
 branch which includes the patch of PHOENIX-2096.

 2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org:

 On further investigation, I believe it should have been working before.
 I did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
 would only prevent a merge sort when one is not required (basically
 improving performance).

 Maybe your index is invalid? You can try rebuilding with this command:
 https://phoenix.apache.org/language/index.html#alter_index

 On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote:

 The query on test dataset is returning the expected result with the
 patch. But on the original dataset (10million rows, 6 regions), it still
 return the same unexpected result, I will dig more into this. Thank you,
 James!

 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu:

 Sure, let me have a try

 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org:

 Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
 Would you mind confirming that this fixes the issue you're seeing?

 James

 On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after
 table got split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp))
 COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows,
 when the timestamp_index table has one region, the query returns 
 correct
 result: 144048443, but when I manually split it into 4 regions 
 (use
 hbase tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert
 of data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu
 wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY
 timestamp DESC NULLS LAST LIMIT 1. But it still returns the same
 unexpected result. There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org
 :

 Yes, reverse scan will be leveraged when possible. Make you use
 NULLS LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu
 wrote:

 I used the HBase reverse scan to find the last row on the
 index table. It returned the expected result. I would like to 
 know is
 Phoenix's ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the
 advice I could give you. There are people here far more 
 experienced than I
 am who should be able to give you deeper advice. Have a great 
 weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an
 indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT
 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER 
 FILTER BY
 FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as 
 expected. For
 the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL 
 SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW 
 LIMIT

Re: Problem in finding the largest value of an indexed column

2015-07-10 Thread James Taylor

Sounds like something else is going wrong. Can you adapt your test by
setting the MAX_FILESIZE very low for your table (so that it splits after 4
or 5 rows are added) and package it up as a unit test?

On Thu, Jul 9, 2015 at 1:44 PM, Yufan Liu yli...@kent.edu wrote:

 Just got a chance to revisit this issue: I have rebuilt the index and it
 still returns the unexpected result. By using the test case, I tried to
 insert enough rows to make it auto-split and it reproduces the problem too.
 It seems it still has trouble returning last row sorted by first component
 of primary key on split tables. Maybe there is another issue than
 PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98
 branch which includes the patch of PHOENIX-2096.

 2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org:

 On further investigation, I believe it should have been working before. I
 did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
 would only prevent a merge sort when one is not required (basically
 improving performance).

 Maybe your index is invalid? You can try rebuilding with this command:
 https://phoenix.apache.org/language/index.html#alter_index

 On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote:

 The query on test dataset is returning the expected result with the
 patch. But on the original dataset (10million rows, 6 regions), it still
 return the same unexpected result, I will dig more into this. Thank you,
 James!

 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu:

 Sure, let me have a try

 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org:

 Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
 Would you mind confirming that this fixes the issue you're seeing?

 James

 On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after
 table got split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp))
 COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows,
 when the timestamp_index table has one region, the query returns 
 correct
 result: 144048443, but when I manually split it into 4 regions (use
 hbase tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu
 wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY
 timestamp DESC NULLS LAST LIMIT 1. But it still returns the same
 unexpected result. There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use
 NULLS LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu
 wrote:

 I used the HBase reverse scan to find the last row on the index
 table. It returned the expected result. I would like to know is 
 Phoenix's
 ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the
 advice I could give you. There are people here far more 
 experienced than I
 am who should be able to give you deeper advice. Have a great 
 weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an
 indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT
 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER 
 FILTER BY
 FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as 
 expected. For
 the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL 
 SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW 
 LIMIT which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is
 being used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C:

Re: Problem in finding the largest value of an indexed column

2015-07-09 Thread Yufan Liu

Just got a chance to revisit this issue: I have rebuilt the index and it
still returns the unexpected result. By using the test case, I tried to
insert enough rows to make it auto-split and it reproduces the problem too.
It seems it still has trouble returning last row sorted by first component
of primary key on split tables. Maybe there is another issue than
PHOENIX-2096? The phoenix I am using is pulled from latest 4.x-HBase-0.98
branch which includes the patch of PHOENIX-2096.

2015-07-02 19:55 GMT-07:00 James Taylor jamestay...@apache.org:

 On further investigation, I believe it should have been working before. I
 did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
 would only prevent a merge sort when one is not required (basically
 improving performance).

 Maybe your index is invalid? You can try rebuilding with this command:
 https://phoenix.apache.org/language/index.html#alter_index

 On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote:

 The query on test dataset is returning the expected result with the
 patch. But on the original dataset (10million rows, 6 regions), it still
 return the same unexpected result, I will dig more into this. Thank you,
 James!

 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu:

 Sure, let me have a try

 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org:

 Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
 Would you mind confirming that this fixes the issue you're seeing?

 James

 On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after table
 got split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows,
 when the timestamp_index table has one region, the query returns correct
 result: 144048443, but when I manually split it into 4 regions (use
 hbase tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY
 timestamp DESC NULLS LAST LIMIT 1. But it still returns the same
 unexpected result. There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use
 NULLS LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu
 wrote:

 I used the HBase reverse scan to find the last row on the index
 table. It returned the expected result. I would like to know is 
 Phoenix's
 ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the
 advice I could give you. There are people here far more 
 experienced than I
 am who should be able to give you deeper advice. Have a great 
 weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an
 indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT
 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER 
 FILTER BY
 FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as 
 expected. For
 the second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL 
 SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW 
 LIMIT which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is
 being used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this

Re: Problem in finding the largest value of an indexed column

2015-07-02 Thread James Taylor

On further investigation, I believe it should have been working before. I
did a bit of cleanup and attached a new patch to PHOENIX-2096, but this
would only prevent a merge sort when one is not required (basically
improving performance).

Maybe your index is invalid? You can try rebuilding with this command:
https://phoenix.apache.org/language/index.html#alter_index

On Thu, Jul 2, 2015 at 5:26 PM, Yufan Liu yli...@kent.edu wrote:

 The query on test dataset is returning the expected result with the patch.
 But on the original dataset (10million rows, 6 regions), it still return
 the same unexpected result, I will dig more into this. Thank you, James!

 2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu:

 Sure, let me have a try

 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org:

 Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
 Would you mind confirming that this fixes the issue you're seeing?

 James

 On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after table
 got split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows,
 when the timestamp_index table has one region, the query returns correct
 result: 144048443, but when I manually split it into 4 regions (use
 hbase tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY
 timestamp DESC NULLS LAST LIMIT 1. But it still returns the same
 unexpected result. There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use
 NULLS LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu
 wrote:

 I used the HBase reverse scan to find the last row on the index
 table. It returned the expected result. I would like to know is 
 Phoenix's
 ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the
 advice I could give you. There are people here far more 
 experienced than I
 am who should be able to give you deeper advice. Have a great 
 weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an
 indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY 
 FIRST KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For 
 the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN 
 OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW 
 LIMIT which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is
 being used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this 
 communication,
 please be advised that any disclosure, dissemination, 
 distribution, copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, 
 please
 notify the sender immediately by reply e-mail and promptly destroy 
 all
 electronic and printed copies of this communication and any 
 attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed

Re: Problem in finding the largest value of an indexed column

2015-07-02 Thread Yufan Liu

The query on test dataset is returning the expected result with the patch.
But on the original dataset (10million rows, 6 regions), it still return
the same unexpected result, I will dig more into this. Thank you, James!

2015-07-02 9:58 GMT-07:00 Yufan Liu yli...@kent.edu:

 Sure, let me have a try

 2015-07-02 9:46 GMT-07:00 James Taylor jamestay...@apache.org:

 Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch.
 Would you mind confirming that this fixes the issue you're seeing?

 James

 On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after table
 got split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows,
 when the timestamp_index table has one region, the query returns correct
 result: 144048443, but when I manually split it into 4 regions (use
 hbase tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY
 timestamp DESC NULLS LAST LIMIT 1. But it still returns the same
 unexpected result. There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use
 NULLS LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index
 table. It returned the expected result. I would like to know is 
 Phoenix's
 ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice
 I could give you. There are people here far more experienced than I 
 am who
 should be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an
 indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY 
 FIRST KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For 
 the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN 
 OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW 
 LIMIT which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is
 being used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this 
 communication,
 please be advised that any disclosure, dissemination, distribution, 
 copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, 
 please
 notify the sender immediately by reply e-mail and promptly destroy 
 all
 electronic and printed copies of this communication and any 
 attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed
 column



 Hi,

 We have created a table (eg, t1), and a global index of one
 numeric column of t1 (eg, timestamp). Now we want to find the 
 largest value
 of timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to
 finish, so I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1;
 This query finished fast, but the result it

Re: Problem in finding the largest value of an indexed column

2015-07-02 Thread Yufan Liu

I'm using 4.4.0-HBase-0.98

2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after table got
 split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows, when
 the timestamp_index table has one region, the query returns correct result:
 144048443, but when I manually split it into 4 regions (use hbase
 tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp
 DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result.
 There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use NULLS
 LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index
 table. It returned the expected result. I would like to know is Phoenix's
 ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I
 could give you. There are people here far more experienced than I am 
 who
 should be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed
 column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST 
 KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT 
 which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being
 used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this 
 communication,
 please be advised that any disclosure, dissemination, distribution, 
 copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, please
 notify the sender immediately by reply e-mail and promptly destroy all
 electronic and printed copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed
 column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric
 column of t1 (eg, timestamp). Now we want to find the largest value of
 timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to
 finish, so I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This
 query finished fast, but the result it returns is far from the largest
 value. It seems it just return the largest value for a certain range of
 data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




 --
 best,
 Yufan




 --
 best,
 Yufan





 --
 best,
 Yufan





 --
 best,
 Yufan





-- 
best,
Yufan

Re: Problem in finding the largest value of an indexed column

2015-07-02 Thread James Taylor

Thanks, Yufan. I found an issue and filed PHOENIX-2096 with a patch. Would
you mind confirming that this fixes the issue you're seeing?

James

On Thu, Jul 2, 2015 at 9:45 AM, Yufan Liu yli...@kent.edu wrote:

 I'm using 4.4.0-HBase-0.98

 2015-07-01 22:31 GMT-07:00 James Taylor jamestay...@apache.org:

 Yufan,
 What version of Phoenix are you using?
 Thanks,
 James

 On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after table got
 split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows, when
 the timestamp_index table has one region, the query returns correct result:
 144048443, but when I manually split it into 4 regions (use hbase
 tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of
 data), that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp
 DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result.
 There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use NULLS
 LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index
 table. It returned the expected result. I would like to know is 
 Phoenix's
 ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I
 could give you. There are people here far more experienced than I am 
 who
 should be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed
 column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST 
 KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT 
 which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being
 used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this 
 communication,
 please be advised that any disclosure, dissemination, distribution, 
 copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, please
 notify the sender immediately by reply e-mail and promptly destroy all
 electronic and printed copies of this communication and any 
 attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed
 column



 Hi,

 We have created a table (eg, t1), and a global index of one
 numeric column of t1 (eg, timestamp). Now we want to find the largest 
 value
 of timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to
 finish, so I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This
 query finished fast, but the result it returns is far from the largest
 value. It seems it just return the largest value for a certain range 
 of
 data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




 --
 best,
 Yufan




 --
 best,
 Yufan





 --
 best,
 Yufan





 --
 best,
 Yufan





 --
 best,
 Yufan

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread Yufan Liu

I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp DESC
NULLS LAST LIMIT 1. But it still returns the same unexpected result. There
seems to be some internal problems related.

2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST
 in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index table. It
 returned the expected result. I would like to know is Phoenix's ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com
 :

  OK, I’m a Phoenix newbie, so that was the extent of the advice I
 could give you. There are people here far more experienced than I am who
 should be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed
 column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com
 :

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being used
 to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this communication,
 please be advised that any disclosure, dissemination, distribution, copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, please
 notify the sender immediately by reply e-mail and promptly destroy all
 electronic and printed copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric
 column of t1 (eg, timestamp). Now we want to find the largest value of
 timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to finish,
 so I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This query
 finished fast, but the result it returns is far from the largest value. It
 seems it just return the largest value for a certain range of data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




 --
 best,
 Yufan




 --
 best,
 Yufan





-- 
best,
Yufan

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread Yufan Liu

When I made more tests, I find that this problem happens after table got
split.

Here is the DDL I use to create table and index:
CREATE TABLE IF NOT EXISTS t1 (
uid BIGINT NOT NULL,
timestamp BIGINT NOT NULL,
eventName VARCHAR
CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

Attach is the sample data I used for test. It has about 4000 rows, when the
timestamp_index table has one region, the query returns correct result:
144048443, but when I manually split it into 4 regions (use hbase
tool), it returns 143024961.

Let know if you find anything. Thanks!


2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of data),
 that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp
 DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result.
 There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use NULLS
 LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index table.
 It returned the expected result. I would like to know is Phoenix's ORDER
 BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I
 could give you. There are people here far more experienced than I am who
 should be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed
 column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT 
 which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being
 used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this 
 communication,
 please be advised that any disclosure, dissemination, distribution, 
 copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, please
 notify the sender immediately by reply e-mail and promptly destroy all
 electronic and printed copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric
 column of t1 (eg, timestamp). Now we want to find the largest value of
 timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to finish,
 so I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This
 query finished fast, but the result it returns is far from the largest
 value. It seems it just return the largest value for a certain range of
 data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




 --
 best,
 Yufan




 --
 best,
 Yufan





 --
 best,
 Yufan





-- 
best,
Yufan
553284,142801056
536488,142801337
560306,142801687
560306,142802063
560306,142802071
547732,142803566
543776,142804848
606972,142805062
535558,142805493
535558,142805496
535558,142805501
542141,142805622
599931,142805996
528159,142806024
577456,142806289
577456,142806303
533248,142806483
533248,142806488
533248,142806562
547732,142806597
574988,142806687
606777,142806763
606777,142806804
606777,142806810
536488,142807053

Re: Problem in finding the largest value of an indexed column

2015-07-01 Thread James Taylor

Yufan,
What version of Phoenix are you using?
Thanks,
James

On Wed, Jul 1, 2015 at 2:34 PM, Yufan Liu yli...@kent.edu wrote:

 When I made more tests, I find that this problem happens after table got
 split.

 Here is the DDL I use to create table and index:
 CREATE TABLE IF NOT EXISTS t1 (
 uid BIGINT NOT NULL,
 timestamp BIGINT NOT NULL,
 eventName VARCHAR
 CONSTRAINT my_pk PRIMARY KEY (uid,  timestamp)) COMPRESSION='SNAPPY';

 CREATE INDEX timestamp_index ON t1 (timestamp) INCLUDE (eventName)

 Attach is the sample data I used for test. It has about 4000 rows, when
 the timestamp_index table has one region, the query returns correct result:
 144048443, but when I manually split it into 4 regions (use hbase
 tool), it returns 143024961.

 Let know if you find anything. Thanks!


 2015-07-01 11:27 GMT-07:00 James Taylor jamestay...@apache.org:

 If you could put a complete test (including your DDL and upsert of data),
 that would be much appreciated.
 Thanks,
 James

 On Wed, Jul 1, 2015 at 11:20 AM, Yufan Liu yli...@kent.edu wrote:

 I have tried to use query SELECT timestamp FROM t1 ORDER BY timestamp
 DESC NULLS LAST LIMIT 1. But it still returns the same unexpected result.
 There seems to be some internal problems related.

 2015-06-30 18:03 GMT-07:00 James Taylor jamestay...@apache.org:

 Yes, reverse scan will be leveraged when possible. Make you use NULLS
 LAST in your ORDER BY as rows are ordered with nulls first.

 On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index table.
 It returned the expected result. I would like to know is Phoenix's ORDER
 BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I
 could give you. There are people here far more experienced than I am who
 should be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed
 column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK
 PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST 
 KEY
 ONLY; SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the
 second one, it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER
 TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT 
 which
 looks correct, but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister 
 mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being
 used to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this 
 communication,
 please be advised that any disclosure, dissemination, distribution, 
 copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, please
 notify the sender immediately by reply e-mail and promptly destroy all
 electronic and printed copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric
 column of t1 (eg, timestamp). Now we want to find the largest value of
 timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to
 finish, so I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This
 query finished fast, but the result it returns is far from the largest
 value. It seems it just return the largest value for a certain range of
 data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




 --
 best,
 Yufan




 --
 best,
 Yufan





 --
 best,
 Yufan





 --
 best,
 Yufan

Re: Problem in finding the largest value of an indexed column

2015-06-30 Thread James Taylor

Yes, reverse scan will be leveraged when possible. Make you use NULLS LAST
in your ORDER BY as rows are ordered with nulls first.

On Tue, Jun 30, 2015 at 5:25 PM, Yufan Liu yli...@kent.edu wrote:

 I used the HBase reverse scan to find the last row on the index table. It
 returned the expected result. I would like to know is Phoenix's ORDER BY
 and DESC implemented based on HBase reverse scan?

 2015-06-26 17:25 GMT-07:00 Yufan Liu yli...@kent.edu:

 Thank you anyway, Michael!

 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I could
 give you. There are people here far more experienced than I am who should
 be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL
 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY;
 SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one,
 it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX;
 SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct,
 but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com
 :

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being used
 to access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype:
 michael.mcallister.ha zimmk...@hotmail.com | webex:
 https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is
 confidential.  If you are not an intended recipient of this communication,
 please be advised that any disclosure, dissemination, distribution, copying
 or other use of this communication or any attachment is strictly
 prohibited.  If you have received this communication in error, please
 notify the sender immediately by reply e-mail and promptly destroy all
 electronic and printed copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric
 column of t1 (eg, timestamp). Now we want to find the largest value of
 timestamp, we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to finish, so
 I think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This query
 finished fast, but the result it returns is far from the largest value. It
 seems it just return the largest value for a certain range of data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




 --
 best,
 Yufan




 --
 best,
 Yufan

RE: Problem in finding the largest value of an indexed column

2015-06-26 Thread Michael McAllister

Yufan

Have you tried using the EXPLAIN command to see what plan is being used to 
access the data?

Michael McAllister
Staff Data Warehouse Engineer | Decision Systems
mmcallis...@homeaway.commailto:mmcallis...@homeaway.com | C: 512.423.7447 | 
skype: michael.mcallister.hamailto:zimmk...@hotmail.com | webex: 
https://h.a/mikewebex
[Description: Description: cid:3410354473_30269081]
This electronic communication (including any attachment) is confidential.  If 
you are not an intended recipient of this communication, please be advised that 
any disclosure, dissemination, distribution, copying or other use of this 
communication or any attachment is strictly prohibited.  If you have received 
this communication in error, please notify the sender immediately by reply 
e-mail and promptly destroy all electronic and printed copies of this 
communication and any attachment.

From: Yufan Liu [mailto:yli...@kent.edu]
Sent: Friday, June 26, 2015 6:31 PM
To: user@phoenix.apache.org
Subject: Problem in finding the largest value of an indexed column

Hi,
We have created a table (eg, t1), and a global index of one numeric column of 
t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have 
tried two approaches:

1. select max(timestamp) from t1; This query takes forever to finish, so I 
think it maybe doing a full table scan/comparison .
2. select timestamp from t1 order by timestamp desc limit 1; This query 
finished fast, but the result it returns is far from the largest value. It 
seems it just return the largest value for a certain range of data.
Did anyone else encounter this issue/have any suggestion?

--
Thanks,
Yufan

Re: Problem in finding the largest value of an indexed column

2015-06-26 Thread Yufan Liu

Hi Michael,

Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL
1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY;
SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one,
it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX;
SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct,
but still returns the unexpected result.

2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com:

  Yufan



 Have you tried using the EXPLAIN command to see what plan is being used to
 access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha
 zimmk...@hotmail.com | webex: https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is confidential.
 If you are not an intended recipient of this communication, please be
 advised that any disclosure, dissemination, distribution, copying or other
 use of this communication or any attachment is strictly prohibited.  If you
 have received this communication in error, please notify the sender
 immediately by reply e-mail and promptly destroy all electronic and printed
 copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric column
 of t1 (eg, timestamp). Now we want to find the largest value of timestamp,
 we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to finish, so I
 think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This query
 finished fast, but the result it returns is far from the largest value. It
 seems it just return the largest value for a certain range of data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




-- 
best,
Yufan

RE: Problem in finding the largest value of an indexed column

2015-06-26 Thread Michael McAllister

OK, I’m a Phoenix newbie, so that was the extent of the advice I could give 
you. There are people here far more experienced than I am who should be able to 
give you deeper advice. Have a great weekend!

Mike

From: Yufan Liu [mailto:yli...@kent.edu]
Sent: Friday, June 26, 2015 7:19 PM
To: user@phoenix.apache.org
Subject: Re: Problem in finding the largest value of an indexed column

Hi Michael,
Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL 1-WAY 
FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 
AGGREGATE INTO SINGLE ROW which is as expected. For the second one, it's 
CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER 
FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct, but still 
returns the unexpected result.

2015-06-26 16:59 GMT-07:00 Michael McAllister 
mmcallis...@homeaway.commailto:mmcallis...@homeaway.com:
Yufan

Have you tried using the EXPLAIN command to see what plan is being used to 
access the data?

Michael McAllister
Staff Data Warehouse Engineer | Decision Systems
mmcallis...@homeaway.commailto:mmcallis...@homeaway.com | C: 
512.423.7447tel:512.423.7447 | skype: 
michael.mcallister.hamailto:zimmk...@hotmail.com | webex: 
https://h.a/mikewebex
[Description: Description: cid:3410354473_30269081]
This electronic communication (including any attachment) is confidential.  If 
you are not an intended recipient of this communication, please be advised that 
any disclosure, dissemination, distribution, copying or other use of this 
communication or any attachment is strictly prohibited.  If you have received 
this communication in error, please notify the sender immediately by reply 
e-mail and promptly destroy all electronic and printed copies of this 
communication and any attachment.

From: Yufan Liu [mailto:yli...@kent.edumailto:yli...@kent.edu]
Sent: Friday, June 26, 2015 6:31 PM
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org
Subject: Problem in finding the largest value of an indexed column

Hi,
We have created a table (eg, t1), and a global index of one numeric column of 
t1 (eg, timestamp). Now we want to find the largest value of timestamp, we have 
tried two approaches:

1. select max(timestamp) from t1; This query takes forever to finish, so I 
think it maybe doing a full table scan/comparison .
2. select timestamp from t1 order by timestamp desc limit 1; This query 
finished fast, but the result it returns is far from the largest value. It 
seems it just return the largest value for a certain range of data.
Did anyone else encounter this issue/have any suggestion?

--
Thanks,
Yufan

--
best,
Yufan

Re: Problem in finding the largest value of an indexed column

2015-06-26 Thread Yufan Liu

Thank you anyway, Michael!

2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com:

  OK, I’m a Phoenix newbie, so that was the extent of the advice I could
 give you. There are people here far more experienced than I am who should
 be able to give you deeper advice. Have a great weekend!



 Mike



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 7:19 PM
 *To:* user@phoenix.apache.org
 *Subject:* Re: Problem in finding the largest value of an indexed column



 Hi Michael,

 Thanks for the advice, for the first one, it's CLIENT 67-CHUNK PARALLEL
 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY;
 SERVER AGGREGATE INTO SINGLE ROW which is as expected. For the second one,
 it's CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX;
 SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT which looks correct,
 but still returns the unexpected result.



 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com:

 Yufan



 Have you tried using the EXPLAIN command to see what plan is being used to
 access the data?



 Michael McAllister

 Staff Data Warehouse Engineer | Decision Systems

 mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha
 zimmk...@hotmail.com | webex: https://h.a/mikewebex

 [image: Description: Description: cid:3410354473_30269081]

 This electronic communication (including any attachment) is confidential.
 If you are not an intended recipient of this communication, please be
 advised that any disclosure, dissemination, distribution, copying or other
 use of this communication or any attachment is strictly prohibited.  If you
 have received this communication in error, please notify the sender
 immediately by reply e-mail and promptly destroy all electronic and printed
 copies of this communication and any attachment.



 *From:* Yufan Liu [mailto:yli...@kent.edu]
 *Sent:* Friday, June 26, 2015 6:31 PM
 *To:* user@phoenix.apache.org
 *Subject:* Problem in finding the largest value of an indexed column



 Hi,

 We have created a table (eg, t1), and a global index of one numeric column
 of t1 (eg, timestamp). Now we want to find the largest value of timestamp,
 we have tried two approaches:


 1. select max(timestamp) from t1; This query takes forever to finish, so I
 think it maybe doing a full table scan/comparison .

 2. select timestamp from t1 order by timestamp desc limit 1; This query
 finished fast, but the result it returns is far from the largest value. It
 seems it just return the largest value for a certain range of data.

 Did anyone else encounter this issue/have any suggestion?


 --

 Thanks,

 Yufan




 --

 best,
 Yufan




-- 
best,
Yufan

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

RE: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

RE: Problem in finding the largest value of an indexed column

Re: Problem in finding the largest value of an indexed column

15 matches

Site Navigation

Mail list logo

Footer information