Thanks for the explanation. I see what you mean. I will also try testing a query that contains nested correlated subqueries with multiple level outer reference, and with using IN instead of EXISTS and I will see if it produces an error.
Best, Khaleel On Sun, Apr 26, 2015 at 7:14 PM, Maryann Xue <maryann....@gmail.com> wrote: > Hi Khaleel, > > Thanks for looking into the problem! But there IS a bug with EXISTS having > an aggregate function, say "max(H."timestamp") as used in your very first > query that did not work. Otherwise, if subquery is SELECT * or SELECT > column1 or SELECT substr(column1), it's fine coz there is no aggregate. > > And interestingly the query with W."rowId" failed because it referenced to > the outmost query that had EXISTS. Otherwise if using H. > rowId", it only had to deal with "SELECT * from H" which was a comparison > subquery and so it worked fine. > > Anyway, I will verify if there is any problem with multiple level outer > reference in correlated subqueries, as an independent issue without EXISTS. > > Given that our EXISTS support is currently incomplete, you may want to > rewrite your EXISTS subqueries with equivalent IN subqueries, which would > most likely just work. > > > Thanks, > Maryann > > > On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad <khellom...@gmail.com> > wrote: > >> >> I don't think that the bug is with EXISTS. As you can see with the query >> from my last email, it contains a nested correlated subquery which contains >> an aggregate within *EXISTS* and it worked after replacing the reference >> to the outer table (W."rowId") with a reference to the inner subquery table >> (H."rowId") within the aggregate subquery. In other words, the following >> query generates an error: >> >> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" >> AS W >> where EXISTS (select * from "History" AS H >> where >> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId") >> AND H."timestamp" = (select MAX(H2."timestamp") >> from "History" AS H2 >> where >> (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') >> =* >> * >> W."rowId"*) >> AND >> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000)) >> AND NOT EXISTS (select * from "History" AS H3 >> where >> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId") >> AND (H3."status" = 'pending') AND >> (TO_NUMBER(H3."value") < 1000)); >> >> >> But if I replace *W."rowId"* with >> REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'), >> the query works fine (which is the query from my last email). So I think >> the problem is not with EXISTS, but with the fact that a reference to a >> table from the outer query (W in my query) can be done up to a maximum one >> nested level, and not more than that. In other words, referring to a table >> from the outer query from within a subquery that is in the second or more >> nested level will generate an error. >> >> >> >> Best, >> Khaleel >> >> >> >> On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <maryann....@gmail.com> >> wrote: >> >>> Glad that your queries worked. Please do let us know if any further >>> problems. >>> >>> The bug is with EXISTS. Right now Phoenix does not handle aggregate >>> function calls correctly within an EXISTS subquery or EXISTS nested >>> subquery. I have opened a JIRA: >>> https://issues.apache.org/jira/browse/PHOENIX-1923. >>> >>> >>> Thanks, >>> Maryann >>> >>> >>> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom...@gmail.com> >>> wrote: >>> >>>> >>>> Dear Maryann, >>>> >>>> I tried your query and it worked. I also executed a more complex query >>>> which I need in my testing, which is: >>>> >>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>> "Wiki" AS W >>>> where EXISTS (select * from "History" AS H >>>> where >>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId") >>>> AND H."timestamp" = (select MAX(H2."timestamp") >>>> from "History" AS H2 >>>> where >>>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') = >>>> >>>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')) >>>> AND >>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000)) >>>> AND NOT EXISTS (select * from "History" AS H3 >>>> where >>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId") >>>> AND (H3."status" = 'pending') AND >>>> (TO_NUMBER(H3."value") < 1000)); >>>> >>>> and it also worked after I understood your fix of the original query. >>>> >>>> So the trick here is that we can use the reference to the outer query >>>> within the next subquery level only, and not up to two levels as I was >>>> doing? Maybe this limitation exists because Phoenix joins the tables from >>>> the outer and the inner correlated query, but it can perform this join up >>>> to one level only? >>>> >>>> >>>> >>>> Best, >>>> Khaleel >>>> >>>> >>>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann....@gmail.com> >>>> wrote: >>>> >>>>> Hi Khaleel, >>>>> >>>>> Mind if you try the following query? I think it's the same semantics >>>>> as you meant for your query. >>>>> >>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>> "Wiki" AS W >>>>> where >>>>> W."rowId" in (select H"rowId" from "History" AS H >>>>> where H."timestamp" = >>>>> (select MAX(H2."timestamp") from >>>>> "History" AS H2 where H2."rowId" = H."rowId") >>>>> AND >>>>> H."status" = 'approved') >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> Maryann >>>>> >>>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <khellom...@gmail.com >>>>> > wrote: >>>>> >>>>>> Dear Maryann, >>>>>> >>>>>> Thanks for your question. You are right: the query that I was writing >>>>>> wasn't the correct one for my purpose. The query that will satisfy my >>>>>> request would be: >>>>>> >>>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>>> "Wiki" AS W >>>>>> where >>>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND >>>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2 >>>>>> where (H2."rowId" = W."rowId")) AND >>>>>> (H."status" = 'approved') ) >>>>>> >>>>>> In this query I specify that I need to select the exact version which >>>>>> has the Maximum timestamp among all versions of the same data item by >>>>>> using >>>>>> MAX(H2."timestamp") within the inner subquery within EXISTS. >>>>>> >>>>>> However I tried such query and it still produces the same error as >>>>>> the old query, which is: (Aggregate may not contain columns not in >>>>>> GROUP BY.) >>>>>> >>>>>> Thank you for your help. >>>>>> >>>>>> >>>>>> Best, >>>>>> Khaleel >>>>>> >>>>>> >>>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann....@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Khaleel, >>>>>>> >>>>>>> Thanks for the explanation! But my question was since this is an >>>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved' >>>>>>> status, the EXISTS will return true regardless of whether you are >>>>>>> testing >>>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed >>>>>>> something? >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Maryann >>>>>>> >>>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad < >>>>>>> khellom...@gmail.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> Hello Maryann, >>>>>>>> >>>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in >>>>>>>> the next release so that I can continue working with this part in my >>>>>>>> research project. Thanks for keeping me posted. >>>>>>>> >>>>>>>> With respect to your question, I am using the "History" table as a >>>>>>>> data store of all versions of a certain data item. When I say Max( >>>>>>>> H."timestamp") inside the query, I am selecting the latest version >>>>>>>> (most recent) that is "approved" (which is checked using the condition >>>>>>>> H."status" >>>>>>>> = 'approved') >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Khaleel >>>>>>>> >>>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <maryann....@gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> Hi Khaleel, >>>>>>>>> >>>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I >>>>>>>>> will file a JIRA and keep you posted. >>>>>>>>> >>>>>>>>> One question though, why would we use MAX(H."timestamp") instead >>>>>>>>> of H."timestamp"? What difference would it make? >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Maryann >>>>>>>>> >>>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad < >>>>>>>>> khellom...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> I tried executing the following query in Phoenix: >>>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" >>>>>>>>>> AS H >>>>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') ); >>>>>>>>>> >>>>>>>>>> I got the following error: >>>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in >>>>>>>>>> GROUP BY. (state=42Y27,code=1018) >>>>>>>>>> >>>>>>>>>> If I replace W."rowId" (in the where clause) with a constant >>>>>>>>>> value (for example: '3587'), the query works fine. Also, if I >>>>>>>>>> replace the >>>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also >>>>>>>>>> works fine. >>>>>>>>>> >>>>>>>>>> So it seems that Phoenix generates error when using a reference >>>>>>>>>> to an outer query while using an aggregate within the inner query. >>>>>>>>>> >>>>>>>>>> Any solutions? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Khaleel Mershad, Ph.D. >>>>>>>>>> Research Associate >>>>>>>>>> >>>>>>>>>> American University of Beirut >>>>>>>>>> Department of Electrical and Computer Engineering >>>>>>>>>> Bliss Street, Beirut, Lebanon >>>>>>>>>> email: kw...@aub.edu.lb >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >