Re: count distinct
Your second query should work with Phoenix 4.3 or later. Thanks, unfortunately at the moment I’m stuck with Phoenix 4.2. I will investigate the problem with the first one and get back to you. Appreciate this. Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.commailto:mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.hamailto:zimmk...@hotmail.com | webex: https://h.a/mikewebex [cid:image002.png@01D080DC.77AD4930] This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. On Jun 23, 2015, at 7:54 PM, Maryann Xue maryann@gmail.commailto:maryann@gmail.com wrote: Sorry, I missed the first line. Your second query should work with Phoenix 4.3 or later. I will investigate the problem with the first one and get back to you. Thanks, Maryann On Tuesday, June 23, 2015, Michael McAllister mmcallis...@homeaway.commailto:mmcallis...@homeaway.com wrote: Hi (This questions relates to Phoenix 4.2 on HDP 2.2) I have a situation where I want to count the distinct combination of a couple of columns. When I try the following:- select count(distinct a.col1, b.col2) from table tab1 a inner join tab2 b on b.joincol = a.joincol where a.col3 = ‘some condition’ and b.col4 = ‘some other condition'; I get the following error:- Error: ERROR 605 (42P00): Syntax error. Unknown function: DISTINCT_COUNT. (state=42P00,code=605) Playing around with this it looks like count(distinct) works with a single column, but not more. So I try this:- SELECT count(*) FROM (SELECT a.col1, b.col2 FROM TABLE tab1 a INNER JOIN tab2 b ON b.joincol = a.joincol WHERE a.col3 = ‘SOME condition’ AND b.col4 = ‘SOME other condition' GROUP BY a.col1, b.col2) ; I get the following error:- Error: Complex nested queries not supported. (state=,code=0) So, my question … is there any way to get what I’m looking for? Regards, Mike
Re: count distinct
Michael, You're correct, count distinct doesn't support multiple arguments currently (I filed PHOENIX-2062 for this). Another workaround is to combine a.col1 and b.col2 into an expression, for example concatenating them. If order matters, you could do this: select count(distinct col1 || col2) ... If order doesn't matter, you could do this: select count(distinct case when col1 col2 then col1 || col2 else col2 || col1 end) ... Thanks, James On Tue, Jun 23, 2015 at 6:56 PM, Michael McAllister mmcallis...@homeaway.com wrote: Your second query should work with Phoenix 4.3 or later. Thanks, unfortunately at the moment I’m stuck with Phoenix 4.2. I will investigate the problem with the first one and get back to you. Appreciate this. Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha zimmk...@hotmail.com | webex: https://h.a/mikewebex This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. On Jun 23, 2015, at 7:54 PM, Maryann Xue maryann@gmail.com wrote: Sorry, I missed the first line. Your second query should work with Phoenix 4.3 or later. I will investigate the problem with the first one and get back to you. Thanks, Maryann On Tuesday, June 23, 2015, Michael McAllister mmcallis...@homeaway.com wrote: Hi (This questions relates to Phoenix 4.2 on HDP 2.2) I have a situation where I want to count the distinct combination of a couple of columns. When I try the following:- select count(distinct a.col1, b.col2) from table tab1 a inner join tab2 b on b.joincol = a.joincol where a.col3 = ‘some condition’ and b.col4 = ‘some other condition'; I get the following error:- Error: ERROR 605 (42P00): Syntax error. Unknown function: DISTINCT_COUNT. (state=42P00,code=605) Playing around with this it looks like count(distinct) works with a single column, but not more. So I try this:- SELECT count(*) FROM (SELECT a.col1, b.col2 FROM TABLE tab1 a INNER JOIN tab2 b ON b.joincol = a.joincol WHERE a.col3 = ‘SOME condition’ AND b.col4 = ‘SOME other condition' GROUP BY a.col1, b.col2) ; I get the following error:- Error: Complex nested queries not supported. (state=,code=0) So, my question … is there any way to get what I’m looking for? Regards, Mike
Re: count distinct
Which version of Phoenix are you using? On Tuesday, June 23, 2015, Michael McAllister mmcallis...@homeaway.com wrote: Hi (This questions relates to Phoenix 4.2 on HDP 2.2) I have a situation where I want to count the distinct combination of a couple of columns. When I try the following:- select count(distinct a.col1, b.col2) from table tab1 a inner join tab2 b on b.joincol = a.joincol where a.col3 = ‘some condition’ and b.col4 = ‘some other condition'; I get the following error:- Error: ERROR 605 (42P00): Syntax error. Unknown function: DISTINCT_COUNT. (state=42P00,code=605) Playing around with this it looks like count(distinct) works with a single column, but not more. So I try this:- SELECT count(*) FROM (SELECT a.col1, b.col2 FROM TABLE tab1 a INNER JOIN tab2 b ON b.joincol = a.joincol WHERE a.col3 = ‘SOME condition’ AND b.col4 = ‘SOME other condition' GROUP BY a.col1, b.col2) ; I get the following error:- Error: Complex nested queries not supported. (state=,code=0) So, my question … is there any way to get what I’m looking for? Regards, Mike
Re: count distinct
Sorry, I missed the first line. Your second query should work with Phoenix 4.3 or later. I will investigate the problem with the first one and get back to you. Thanks, Maryann On Tuesday, June 23, 2015, Michael McAllister mmcallis...@homeaway.com wrote: Hi (This questions relates to Phoenix 4.2 on HDP 2.2) I have a situation where I want to count the distinct combination of a couple of columns. When I try the following:- select count(distinct a.col1, b.col2) from table tab1 a inner join tab2 b on b.joincol = a.joincol where a.col3 = ‘some condition’ and b.col4 = ‘some other condition'; I get the following error:- Error: ERROR 605 (42P00): Syntax error. Unknown function: DISTINCT_COUNT. (state=42P00,code=605) Playing around with this it looks like count(distinct) works with a single column, but not more. So I try this:- SELECT count(*) FROM (SELECT a.col1, b.col2 FROM TABLE tab1 a INNER JOIN tab2 b ON b.joincol = a.joincol WHERE a.col3 = ‘SOME condition’ AND b.col4 = ‘SOME other condition' GROUP BY a.col1, b.col2) ; I get the following error:- Error: Complex nested queries not supported. (state=,code=0) So, my question … is there any way to get what I’m looking for? Regards, Mike
Fwd: Issues w/ count (distinct col) queries via phoenix table with secondary indexes
Hi, Yesterday, we created a secondary index on one our tables to help improve the read speed of performing select count queries like the one below. The query w/ index runs fast but provides the same counts. The query w/o index (/*+ NO_INDEX */) runs slower but provides the right counts for each column. The index is covered and includes all the columns specified below. The bug we are observing seems similar to https://issues.apache.org/jira/browse/PHOENIX-814 and https://issues.apache.org/jira/browse/PHOENIX-1203. Can anyone points us in the right direction to correct this issue (or workaround) as we need fast retrieval and proper counting? We have the following environment: Hbase 0.98.4.2.2.0.0 Phoenix: 4.2.0.2.2.0.0-2041 Core (CPU) - 8 RAM - 29 Gb OS:centos6 (x86_64) QUERY +++ SELECT A.COL1, A.COL2, COUNT(DISTINCT A.COL3), COUNT(DISTINCT A.COL4) FROM A WHERE A.COL5 = 1427864399000 AND A.COL5 = 1427947199000 GROUP BY A.COL1, A.COL2 HAVING COUNT(DISTINCT A.COL3) 10; A.COL1 and A.COL2 are varchars. The rest of the columns are unsigned_longs. QUERY +++
Re: Issues w/ count (distinct col) queries via phoenix table with secondary indexes
Hey Jude, Would you mind trying with 4.3.1 release and letting us know if the issue is resolved? Thanks, James On Tue, Apr 28, 2015 at 6:07 PM, Jude K j2k...@gmail.com wrote: Hi, Yesterday, we created a secondary index on one our tables to help improve the read speed of performing select count queries like the one below. The query w/ index runs fast but provides the same counts. The query w/o index (/*+ NO_INDEX */) runs slower but provides the right counts for each column. The index is covered and includes all the columns specified below. The bug we are observing seems similar to https://issues.apache.org/jira/browse/PHOENIX-814 and https://issues.apache.org/jira/browse/PHOENIX-1203. Can anyone points us in the right direction to correct this issue (or workaround) as we need fast retrieval and proper counting? We have the following environment: Hbase 0.98.4.2.2.0.0 Phoenix: 4.2.0.2.2.0.0-2041 Core (CPU) - 8 RAM - 29 Gb OS:centos6 (x86_64) QUERY +++ SELECT A.COL1, A.COL2, COUNT(DISTINCT A.COL3), COUNT(DISTINCT A.COL4) FROM A WHERE A.COL5 = 1427864399000 AND A.COL5 = 1427947199000 GROUP BY A.COL1, A.COL2 HAVING COUNT(DISTINCT A.COL3) 10; A.COL1 and A.COL2 are varchars. The rest of the columns are unsigned_longs. QUERY +++
Issues w/ count (distinct col) queries via phoenix table with secondary indexes
Yesterday, we created a secondary index on one our tables to help improve the read speed of performing select count queries like the one below. The query w/ index runs fast but provides the same counts. The query w/o index (/*+ NO_INDEX */) runs slower but provides the right counts for each column. The index is covered and includes all the columns specified below. The bug we are observing seems similar to https://issues.apache.org/jira/browse/PHOENIX-814 and https://issues.apache.org/jira/browse/PHOENIX-1203. Can anyone points us in the right direction to correct this issue (or workaround) as we need fast retrieval and proper counting? We have the following environment: Hbase 0.98.4.2.2.0.0 Phoenix: 4.2.0.2.2.0.0-2041 Core (CPU) - 8 RAM - 29 Gb OS:centos6 (x86_64) QUERY +++ SELECT A.COL1, A.COL2, COUNT(DISTINCT A.COL3), COUNT(DISTINCT A.COL4) FROM A WHERE A.COL5 = 1427864399000 AND A.COL5 = 1427947199000 GROUP BY A.COL1, A.COL2 HAVING COUNT(DISTINCT A.COL3) 10; A.COL1 and A.COL2 are varchars. The rest of the columns are unsigned_longs. QUERY +++