I don't think that the bug is with EXISTS. As you can see with the query from my last email, it contains a nested correlated subquery which contains an aggregate within *EXISTS* and it worked after replacing the reference to the outer table (W."rowId") with a reference to the inner subquery table (H."rowId") within the aggregate subquery. In other words, the following query generates an error:
select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS W where EXISTS (select * from "History" AS H where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId") AND H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2 where (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =* * W."rowId"*) AND (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000)) AND NOT EXISTS (select * from "History" AS H3 where (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId") AND (H3."status" = 'pending') AND (TO_NUMBER(H3."value") < 1000)); But if I replace *W."rowId"* with REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'), the query works fine (which is the query from my last email). So I think the problem is not with EXISTS, but with the fact that a reference to a table from the outer query (W in my query) can be done up to a maximum one nested level, and not more than that. In other words, referring to a table from the outer query from within a subquery that is in the second or more nested level will generate an error. Best, Khaleel On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <maryann....@gmail.com> wrote: > Glad that your queries worked. Please do let us know if any further > problems. > > The bug is with EXISTS. Right now Phoenix does not handle aggregate > function calls correctly within an EXISTS subquery or EXISTS nested > subquery. I have opened a JIRA: > https://issues.apache.org/jira/browse/PHOENIX-1923. > > > Thanks, > Maryann > > > On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom...@gmail.com> > wrote: > >> >> Dear Maryann, >> >> I tried your query and it worked. I also executed a more complex query >> which I need in my testing, which is: >> >> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" >> AS W >> where EXISTS (select * from "History" AS H >> where >> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId") >> AND H."timestamp" = (select MAX(H2."timestamp") >> from "History" AS H2 >> where >> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') = >> >> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')) >> AND >> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000)) >> AND NOT EXISTS (select * from "History" AS H3 >> where >> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId") >> AND (H3."status" = 'pending') AND >> (TO_NUMBER(H3."value") < 1000)); >> >> and it also worked after I understood your fix of the original query. >> >> So the trick here is that we can use the reference to the outer query >> within the next subquery level only, and not up to two levels as I was >> doing? Maybe this limitation exists because Phoenix joins the tables from >> the outer and the inner correlated query, but it can perform this join up >> to one level only? >> >> >> >> Best, >> Khaleel >> >> >> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann....@gmail.com> >> wrote: >> >>> Hi Khaleel, >>> >>> Mind if you try the following query? I think it's the same semantics as >>> you meant for your query. >>> >>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" >>> AS W >>> where >>> W."rowId" in (select H"rowId" from "History" AS H >>> where H."timestamp" = >>> (select MAX(H2."timestamp") from >>> "History" AS H2 where H2."rowId" = H."rowId") >>> AND >>> H."status" = 'approved') >>> >>> >>> >>> Thanks, >>> Maryann >>> >>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <khellom...@gmail.com> >>> wrote: >>> >>>> Dear Maryann, >>>> >>>> Thanks for your question. You are right: the query that I was writing >>>> wasn't the correct one for my purpose. The query that will satisfy my >>>> request would be: >>>> >>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>> "Wiki" AS W >>>> where >>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND >>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2 >>>> where (H2."rowId" = W."rowId")) AND >>>> (H."status" = 'approved') ) >>>> >>>> In this query I specify that I need to select the exact version which >>>> has the Maximum timestamp among all versions of the same data item by using >>>> MAX(H2."timestamp") within the inner subquery within EXISTS. >>>> >>>> However I tried such query and it still produces the same error as the >>>> old query, which is: (Aggregate may not contain columns not in GROUP >>>> BY.) >>>> >>>> Thank you for your help. >>>> >>>> >>>> Best, >>>> Khaleel >>>> >>>> >>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann....@gmail.com> >>>> wrote: >>>> >>>>> Hi Khaleel, >>>>> >>>>> Thanks for the explanation! But my question was since this is an >>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved' >>>>> status, the EXISTS will return true regardless of whether you are testing >>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed >>>>> something? >>>>> >>>>> >>>>> Thanks, >>>>> Maryann >>>>> >>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <khellom...@gmail.com >>>>> > wrote: >>>>> >>>>>> >>>>>> Hello Maryann, >>>>>> >>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in the >>>>>> next release so that I can continue working with this part in my research >>>>>> project. Thanks for keeping me posted. >>>>>> >>>>>> With respect to your question, I am using the "History" table as a >>>>>> data store of all versions of a certain data item. When I say Max( >>>>>> H."timestamp") inside the query, I am selecting the latest version >>>>>> (most recent) that is "approved" (which is checked using the condition >>>>>> H."status" >>>>>> = 'approved') >>>>>> >>>>>> >>>>>> >>>>>> Best Regards, >>>>>> Khaleel >>>>>> >>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <maryann....@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Khaleel, >>>>>>> >>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I >>>>>>> will file a JIRA and keep you posted. >>>>>>> >>>>>>> One question though, why would we use MAX(H."timestamp") instead of >>>>>>> H."timestamp"? >>>>>>> What difference would it make? >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Maryann >>>>>>> >>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad < >>>>>>> khellom...@gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> I tried executing the following query in Phoenix: >>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H >>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') ); >>>>>>>> >>>>>>>> I got the following error: >>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP >>>>>>>> BY. (state=42Y27,code=1018) >>>>>>>> >>>>>>>> If I replace W."rowId" (in the where clause) with a constant value >>>>>>>> (for example: '3587'), the query works fine. Also, if I replace the >>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works >>>>>>>> fine. >>>>>>>> >>>>>>>> So it seems that Phoenix generates error when using a reference to >>>>>>>> an outer query while using an aggregate within the inner query. >>>>>>>> >>>>>>>> Any solutions? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> Khaleel Mershad, Ph.D. >>>>>>>> Research Associate >>>>>>>> >>>>>>>> American University of Beirut >>>>>>>> Department of Electrical and Computer Engineering >>>>>>>> Bliss Street, Beirut, Lebanon >>>>>>>> email: kw...@aub.edu.lb >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >