Thank you in advance! Just let me know.
Thanks, Maryann On Sunday, April 26, 2015, khaleel mershad <khellom...@gmail.com> wrote: > > Thanks for the explanation. I see what you mean. I will also try testing a > query that contains nested correlated subqueries with multiple level > outer reference, and with using IN instead of EXISTS and I will see if it > produces an error. > > > > Best, > Khaleel > > On Sun, Apr 26, 2015 at 7:14 PM, Maryann Xue <maryann....@gmail.com > <javascript:_e(%7B%7D,'cvml','maryann....@gmail.com');>> wrote: > >> Hi Khaleel, >> >> Thanks for looking into the problem! But there IS a bug with EXISTS >> having an aggregate function, say "max(H."timestamp") as used in your very >> first query that did not work. Otherwise, if subquery is SELECT * or SELECT >> column1 or SELECT substr(column1), it's fine coz there is no aggregate. >> >> And interestingly the query with W."rowId" failed because it referenced >> to the outmost query that had EXISTS. Otherwise if using H. >> rowId", it only had to deal with "SELECT * from H" which was a comparison >> subquery and so it worked fine. >> >> Anyway, I will verify if there is any problem with multiple level outer >> reference in correlated subqueries, as an independent issue without EXISTS. >> >> Given that our EXISTS support is currently incomplete, you may want to >> rewrite your EXISTS subqueries with equivalent IN subqueries, which would >> most likely just work. >> >> >> Thanks, >> Maryann >> >> >> On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad <khellom...@gmail.com >> <javascript:_e(%7B%7D,'cvml','khellom...@gmail.com');>> wrote: >> >>> >>> I don't think that the bug is with EXISTS. As you can see with the query >>> from my last email, it contains a nested correlated subquery which contains >>> an aggregate within *EXISTS* and it worked after replacing the >>> reference to the outer table (W."rowId") with a reference to the inner >>> subquery table (H."rowId") within the aggregate subquery. In other words, >>> the following query generates an error: >>> >>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" >>> AS W >>> where EXISTS (select * from "History" AS H >>> where >>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId") >>> AND H."timestamp" = (select MAX(H2."timestamp") >>> from "History" AS H2 >>> where >>> (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') >>> =* >>> * >>> W."rowId"*) >>> AND >>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000)) >>> AND NOT EXISTS (select * from "History" AS H3 >>> where >>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId") >>> AND (H3."status" = 'pending') AND >>> (TO_NUMBER(H3."value") < 1000)); >>> >>> >>> But if I replace *W."rowId"* with >>> REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'), >>> the query works fine (which is the query from my last email). So I think >>> the problem is not with EXISTS, but with the fact that a reference to a >>> table from the outer query (W in my query) can be done up to a maximum one >>> nested level, and not more than that. In other words, referring to a table >>> from the outer query from within a subquery that is in the second or more >>> nested level will generate an error. >>> >>> >>> >>> Best, >>> Khaleel >>> >>> >>> >>> On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <maryann....@gmail.com >>> <javascript:_e(%7B%7D,'cvml','maryann....@gmail.com');>> wrote: >>> >>>> Glad that your queries worked. Please do let us know if any further >>>> problems. >>>> >>>> The bug is with EXISTS. Right now Phoenix does not handle aggregate >>>> function calls correctly within an EXISTS subquery or EXISTS nested >>>> subquery. I have opened a JIRA: >>>> https://issues.apache.org/jira/browse/PHOENIX-1923. >>>> >>>> >>>> Thanks, >>>> Maryann >>>> >>>> >>>> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom...@gmail.com >>>> <javascript:_e(%7B%7D,'cvml','khellom...@gmail.com');>> wrote: >>>> >>>>> >>>>> Dear Maryann, >>>>> >>>>> I tried your query and it worked. I also executed a more complex query >>>>> which I need in my testing, which is: >>>>> >>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>> "Wiki" AS W >>>>> where EXISTS (select * from "History" AS H >>>>> where >>>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId") >>>>> AND H."timestamp" = (select >>>>> MAX(H2."timestamp") from "History" AS H2 >>>>> where >>>>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') = >>>>> >>>>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')) >>>>> AND >>>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000)) >>>>> AND NOT EXISTS (select * from "History" AS H3 >>>>> where >>>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId") >>>>> AND (H3."status" = 'pending') AND >>>>> (TO_NUMBER(H3."value") < 1000)); >>>>> >>>>> and it also worked after I understood your fix of the original query. >>>>> >>>>> So the trick here is that we can use the reference to the outer query >>>>> within the next subquery level only, and not up to two levels as I was >>>>> doing? Maybe this limitation exists because Phoenix joins the tables from >>>>> the outer and the inner correlated query, but it can perform this join up >>>>> to one level only? >>>>> >>>>> >>>>> >>>>> Best, >>>>> Khaleel >>>>> >>>>> >>>>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann....@gmail.com >>>>> <javascript:_e(%7B%7D,'cvml','maryann....@gmail.com');>> wrote: >>>>> >>>>>> Hi Khaleel, >>>>>> >>>>>> Mind if you try the following query? I think it's the same semantics >>>>>> as you meant for your query. >>>>>> >>>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>>> "Wiki" AS W >>>>>> where >>>>>> W."rowId" in (select H"rowId" from "History" AS H >>>>>> where H."timestamp" = >>>>>> (select MAX(H2."timestamp") from >>>>>> "History" AS H2 where H2."rowId" = H."rowId") >>>>>> AND >>>>>> H."status" = 'approved') >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Maryann >>>>>> >>>>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad < >>>>>> khellom...@gmail.com >>>>>> <javascript:_e(%7B%7D,'cvml','khellom...@gmail.com');>> wrote: >>>>>> >>>>>>> Dear Maryann, >>>>>>> >>>>>>> Thanks for your question. You are right: the query that I was >>>>>>> writing wasn't the correct one for my purpose. The query that will >>>>>>> satisfy >>>>>>> my request would be: >>>>>>> >>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from >>>>>>> "Wiki" AS W >>>>>>> where >>>>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") >>>>>>> AND >>>>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2 >>>>>>> where (H2."rowId" = W."rowId")) AND >>>>>>> (H."status" = 'approved') ) >>>>>>> >>>>>>> In this query I specify that I need to select the exact version >>>>>>> which has the Maximum timestamp among all versions of the same data >>>>>>> item by >>>>>>> using MAX(H2."timestamp") within the inner subquery within EXISTS. >>>>>>> >>>>>>> However I tried such query and it still produces the same error as >>>>>>> the old query, which is: (Aggregate may not contain columns not in >>>>>>> GROUP BY.) >>>>>>> >>>>>>> Thank you for your help. >>>>>>> >>>>>>> >>>>>>> Best, >>>>>>> Khaleel >>>>>>> >>>>>>> >>>>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann....@gmail.com >>>>>>> <javascript:_e(%7B%7D,'cvml','maryann....@gmail.com');>> wrote: >>>>>>> >>>>>>>> Hi Khaleel, >>>>>>>> >>>>>>>> Thanks for the explanation! But my question was since this is an >>>>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved' >>>>>>>> status, the EXISTS will return true regardless of whether you are >>>>>>>> testing >>>>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed >>>>>>>> something? >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Maryann >>>>>>>> >>>>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad < >>>>>>>> khellom...@gmail.com >>>>>>>> <javascript:_e(%7B%7D,'cvml','khellom...@gmail.com');>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Hello Maryann, >>>>>>>>> >>>>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in >>>>>>>>> the next release so that I can continue working with this part in my >>>>>>>>> research project. Thanks for keeping me posted. >>>>>>>>> >>>>>>>>> With respect to your question, I am using the "History" table as a >>>>>>>>> data store of all versions of a certain data item. When I say Max( >>>>>>>>> H."timestamp") inside the query, I am selecting the latest >>>>>>>>> version (most recent) that is "approved" (which is checked using the >>>>>>>>> condition H."status" = 'approved') >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Khaleel >>>>>>>>> >>>>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue < >>>>>>>>> maryann....@gmail.com >>>>>>>>> <javascript:_e(%7B%7D,'cvml','maryann....@gmail.com');>> wrote: >>>>>>>>> >>>>>>>>>> Hi Khaleel, >>>>>>>>>> >>>>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I >>>>>>>>>> will file a JIRA and keep you posted. >>>>>>>>>> >>>>>>>>>> One question though, why would we use MAX(H."timestamp") instead >>>>>>>>>> of H."timestamp"? What difference would it make? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Maryann >>>>>>>>>> >>>>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad < >>>>>>>>>> khellom...@gmail.com >>>>>>>>>> <javascript:_e(%7B%7D,'cvml','khellom...@gmail.com');>> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> I tried executing the following query in Phoenix: >>>>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" >>>>>>>>>>> from "Wiki" AS W where EXISTS( select MAX(H."timestamp") from >>>>>>>>>>> "History" AS >>>>>>>>>>> H where (H."rowId" = W."rowId") AND (H."status" = 'approved') ); >>>>>>>>>>> >>>>>>>>>>> I got the following error: >>>>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in >>>>>>>>>>> GROUP BY. (state=42Y27,code=1018) >>>>>>>>>>> >>>>>>>>>>> If I replace W."rowId" (in the where clause) with a constant >>>>>>>>>>> value (for example: '3587'), the query works fine. Also, if I >>>>>>>>>>> replace the >>>>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also >>>>>>>>>>> works fine. >>>>>>>>>>> >>>>>>>>>>> So it seems that Phoenix generates error when using a reference >>>>>>>>>>> to an outer query while using an aggregate within the inner query. >>>>>>>>>>> >>>>>>>>>>> Any solutions? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> >>>>>>>>>>> Khaleel Mershad, Ph.D. >>>>>>>>>>> Research Associate >>>>>>>>>>> >>>>>>>>>>> American University of Beirut >>>>>>>>>>> Department of Electrical and Computer Engineering >>>>>>>>>>> Bliss Street, Beirut, Lebanon >>>>>>>>>>> email: kw...@aub.edu.lb >>>>>>>>>>> <javascript:_e(%7B%7D,'cvml','kw...@aub.edu.lb');> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >