I don't think that the bug is with EXISTS. As you can see with the query
from my last email, it contains a nested correlated subquery which contains
an aggregate within *EXISTS* and it worked after replacing the reference to
the outer table (W."rowId") with a reference to the inner subquery table
(H."rowId") within the aggregate subquery. In other words, the following
query generates an error:

select distinct W."rowId", W."md5_current", W."size_current" from "Wiki" AS
W
where EXISTS (select * from "History" AS H
                        where (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+')
= W."rowId")
                        AND H."timestamp" = (select MAX(H2."timestamp")
from "History" AS H2
                                                          where
(*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+')
=*
*
W."rowId"*)
                                                          AND (H2."status"
= 'approved')) AND (TO_NUMBER(H."value") > 1000))
AND NOT EXISTS (select * from "History" AS H3
                              where
(REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
                                AND (H3."status" = 'pending') AND
(TO_NUMBER(H3."value") < 1000));


But if I replace *W."rowId"* with
REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'),
the query works fine (which is the query from my last email). So I think
the problem is not with EXISTS, but with the fact that a reference to a
table from the outer query (W in my query) can be done up to a maximum one
nested level, and not more than that. In other words, referring to a table
from the outer query from within a subquery that is in the second or more
nested level will generate an error.



Best,
Khaleel



On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <maryann....@gmail.com> wrote:

> Glad that your queries worked. Please do let us know if any further
> problems.
>
> The bug is with EXISTS. Right now Phoenix does not handle aggregate
> function calls correctly within an EXISTS subquery or EXISTS nested
> subquery. I have opened a JIRA:
> https://issues.apache.org/jira/browse/PHOENIX-1923.
>
>
> Thanks,
> Maryann
>
>
> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom...@gmail.com>
> wrote:
>
>>
>> Dear Maryann,
>>
>> I tried your query and it worked. I also executed a more complex query
>> which I need in my testing, which is:
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where EXISTS (select * from "History" AS H
>>                         where
>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>> from "History" AS H2
>>                                                           where
>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>>
>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>>                                                           AND
>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>> AND NOT EXISTS (select * from "History" AS H3
>>                               where
>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>                                 AND (H3."status" = 'pending') AND
>> (TO_NUMBER(H3."value") < 1000));
>>
>> and it also worked after I understood your fix of the original query.
>>
>> So the trick here is that we can use the reference to the outer query
>> within the next subquery level only, and not up to two levels as I was
>> doing? Maybe this limitation exists because Phoenix joins the tables from
>> the outer and the inner correlated query, but it can perform this join up
>> to one level only?
>>
>>
>>
>> Best,
>> Khaleel
>>
>>
>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann....@gmail.com>
>> wrote:
>>
>>> Hi Khaleel,
>>>
>>> Mind if you try the following query? I think it's the same semantics as
>>> you meant for your query.
>>>
>>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>>> AS W
>>> where
>>> W."rowId" in (select H"rowId" from "History" AS H
>>>                          where H."timestamp" =
>>>                                 (select MAX(H2."timestamp") from
>>> "History" AS H2 where H2."rowId" = H."rowId")
>>>                          AND
>>>  H."status" = 'approved')
>>>
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <khellom...@gmail.com>
>>> wrote:
>>>
>>>> Dear Maryann,
>>>>
>>>> Thanks for your question. You are right: the query that I was writing
>>>> wasn't the correct one for my purpose. The query that will satisfy my
>>>> request would be:
>>>>
>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>> "Wiki" AS W
>>>> where
>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>>> where (H2."rowId" = W."rowId")) AND
>>>>  (H."status" = 'approved') )
>>>>
>>>> In this query I specify that I need to select the exact version which
>>>> has the Maximum timestamp among all versions of the same data item by using
>>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>>
>>>> However I tried such query and it still produces the same error as the
>>>> old query, which is: (Aggregate may not contain columns not in GROUP
>>>> BY.)
>>>>
>>>> Thank you for your help.
>>>>
>>>>
>>>> Best,
>>>> Khaleel
>>>>
>>>>
>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Khaleel,
>>>>>
>>>>> Thanks for the explanation! But my question was since this is an
>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>>> status, the EXISTS will return true regardless of whether you are testing
>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>>> something?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <khellom...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>> Hello Maryann,
>>>>>>
>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in the
>>>>>> next release so that I can continue working with this part in my research
>>>>>> project. Thanks for keeping me posted.
>>>>>>
>>>>>> With respect to your question, I am using the "History" table as a
>>>>>> data store of all versions of a certain data item. When I say Max(
>>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>>> (most recent) that is "approved" (which is checked using the condition 
>>>>>> H."status"
>>>>>> = 'approved')
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Khaleel
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <maryann....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Khaleel,
>>>>>>>
>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>>> will file a JIRA and keep you posted.
>>>>>>>
>>>>>>> One question though, why would we use MAX(H."timestamp") instead of 
>>>>>>> H."timestamp"?
>>>>>>> What difference would it make?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>>> khellom...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I tried executing the following query in Phoenix:
>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" AS H
>>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>>
>>>>>>>> I got the following error:
>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP
>>>>>>>> BY. (state=42Y27,code=1018)
>>>>>>>>
>>>>>>>> If I replace W."rowId" (in the where clause) with a constant value
>>>>>>>> (for example: '3587'), the query works fine. Also, if I replace the
>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also works 
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> So it seems that Phoenix generates error when using a reference to
>>>>>>>> an outer query while using an aggregate within the inner query.
>>>>>>>>
>>>>>>>> Any solutions?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>>> Research Associate
>>>>>>>>
>>>>>>>> American University of Beirut
>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>>> email: kw...@aub.edu.lb
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to