Thanks for the explanation. I see what you mean. I will also try testing a
query that contains nested correlated subqueries with multiple level outer
reference, and with using IN instead of EXISTS and I will see if it
produces an error.



Best,
Khaleel

On Sun, Apr 26, 2015 at 7:14 PM, Maryann Xue <maryann....@gmail.com> wrote:

> Hi Khaleel,
>
> Thanks for looking into the problem! But there IS a bug with EXISTS having
> an aggregate function, say "max(H."timestamp") as used in your very first
> query that did not work. Otherwise, if subquery is SELECT * or SELECT
> column1 or SELECT substr(column1), it's fine coz there is no aggregate.
>
> And interestingly the query with W."rowId" failed because it referenced to
> the outmost query that had EXISTS. Otherwise if using H.
> rowId", it only had to deal with "SELECT * from H" which was a comparison
> subquery and so it worked fine.
>
> Anyway, I will verify if there is any problem with multiple level outer
> reference in correlated subqueries, as an independent issue without EXISTS.
>
> Given that our EXISTS support is currently incomplete, you may want to
> rewrite your EXISTS subqueries with equivalent IN subqueries, which would
> most likely just work.
>
>
> Thanks,
> Maryann
>
>
> On Sun, Apr 26, 2015 at 11:54 AM, khaleel mershad <khellom...@gmail.com>
> wrote:
>
>>
>> I don't think that the bug is with EXISTS. As you can see with the query
>> from my last email, it contains a nested correlated subquery which contains
>> an aggregate within *EXISTS* and it worked after replacing the reference
>> to the outer table (W."rowId") with a reference to the inner subquery table
>> (H."rowId") within the aggregate subquery. In other words, the following
>> query generates an error:
>>
>> select distinct W."rowId", W."md5_current", W."size_current" from "Wiki"
>> AS W
>> where EXISTS (select * from "History" AS H
>>                         where
>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>> from "History" AS H2
>>                                                           where 
>> (*REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+')
>> =*
>> *
>> W."rowId"*)
>>                                                           AND
>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>> AND NOT EXISTS (select * from "History" AS H3
>>                               where
>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>                                 AND (H3."status" = 'pending') AND
>> (TO_NUMBER(H3."value") < 1000));
>>
>>
>> But if I replace *W."rowId"* with 
>> REGEXP_SUBSTR(SUBSTR(*H."rowId"*,19),'[^:]+'),
>> the query works fine (which is the query from my last email). So I think
>> the problem is not with EXISTS, but with the fact that a reference to a
>> table from the outer query (W in my query) can be done up to a maximum one
>> nested level, and not more than that. In other words, referring to a table
>> from the outer query from within a subquery that is in the second or more
>> nested level will generate an error.
>>
>>
>>
>> Best,
>> Khaleel
>>
>>
>>
>> On Sun, Apr 26, 2015 at 6:33 PM, Maryann Xue <maryann....@gmail.com>
>> wrote:
>>
>>> Glad that your queries worked. Please do let us know if any further
>>> problems.
>>>
>>> The bug is with EXISTS. Right now Phoenix does not handle aggregate
>>> function calls correctly within an EXISTS subquery or EXISTS nested
>>> subquery. I have opened a JIRA:
>>> https://issues.apache.org/jira/browse/PHOENIX-1923.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>>
>>> On Sun, Apr 26, 2015 at 8:54 AM, khaleel mershad <khellom...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Dear Maryann,
>>>>
>>>> I tried your query and it worked. I also executed a more complex query
>>>> which I need in my testing, which is:
>>>>
>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>> "Wiki" AS W
>>>> where EXISTS (select * from "History" AS H
>>>>                         where
>>>> (REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+') = W."rowId")
>>>>                         AND H."timestamp" = (select MAX(H2."timestamp")
>>>> from "History" AS H2
>>>>                                                           where
>>>> (REGEXP_SUBSTR(SUBSTR(H2."rowId",19),'[^:]+') =
>>>>
>>>> REGEXP_SUBSTR(SUBSTR(H."rowId",19),'[^:]+'))
>>>>                                                           AND
>>>> (H2."status" = 'approved')) AND (TO_NUMBER(H."value") > 1000))
>>>> AND NOT EXISTS (select * from "History" AS H3
>>>>                               where
>>>> (REGEXP_SUBSTR(SUBSTR(H3."rowId",19),'[^:]+') = W."rowId")
>>>>                                 AND (H3."status" = 'pending') AND
>>>> (TO_NUMBER(H3."value") < 1000));
>>>>
>>>> and it also worked after I understood your fix of the original query.
>>>>
>>>> So the trick here is that we can use the reference to the outer query
>>>> within the next subquery level only, and not up to two levels as I was
>>>> doing? Maybe this limitation exists because Phoenix joins the tables from
>>>> the outer and the inner correlated query, but it can perform this join up
>>>> to one level only?
>>>>
>>>>
>>>>
>>>> Best,
>>>> Khaleel
>>>>
>>>>
>>>> On Sat, Apr 25, 2015 at 8:11 PM, Maryann Xue <maryann....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Khaleel,
>>>>>
>>>>> Mind if you try the following query? I think it's the same semantics
>>>>> as you meant for your query.
>>>>>
>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>> "Wiki" AS W
>>>>> where
>>>>> W."rowId" in (select H"rowId" from "History" AS H
>>>>>                          where H."timestamp" =
>>>>>                                 (select MAX(H2."timestamp") from
>>>>> "History" AS H2 where H2."rowId" = H."rowId")
>>>>>                          AND
>>>>>  H."status" = 'approved')
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Fri, Apr 24, 2015 at 5:25 PM, khaleel mershad <khellom...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Dear Maryann,
>>>>>>
>>>>>> Thanks for your question. You are right: the query that I was writing
>>>>>> wasn't the correct one for my purpose. The query that will satisfy my
>>>>>> request would be:
>>>>>>
>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>> "Wiki" AS W
>>>>>> where
>>>>>> EXISTS (select * from "History" AS H where (H."rowId" = W."rowId") AND
>>>>>> H."timestamp" = (select MAX(H2."timestamp") from "History" AS H2
>>>>>> where (H2."rowId" = W."rowId")) AND
>>>>>>  (H."status" = 'approved') )
>>>>>>
>>>>>> In this query I specify that I need to select the exact version which
>>>>>> has the Maximum timestamp among all versions of the same data item by 
>>>>>> using
>>>>>> MAX(H2."timestamp") within the inner subquery within EXISTS.
>>>>>>
>>>>>> However I tried such query and it still produces the same error as
>>>>>> the old query, which is: (Aggregate may not contain columns not in
>>>>>> GROUP BY.)
>>>>>>
>>>>>> Thank you for your help.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Khaleel
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 5:49 PM, Maryann Xue <maryann....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Khaleel,
>>>>>>>
>>>>>>> Thanks for the explanation! But my question was since this is an
>>>>>>> EXISTS, I assume if there ever is a H."timestamp" with an 'approved'
>>>>>>> status, the EXISTS will return true regardless of whether you are 
>>>>>>> testing
>>>>>>> H."timestamp" or Max(H."timestamp"). Is that correct? or have I missed
>>>>>>> something?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Thu, Apr 23, 2015 at 8:49 AM, khaleel mershad <
>>>>>>> khellom...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hello Maryann,
>>>>>>>>
>>>>>>>> Thanks very much for your reply. Hopefully this bug gets fixed in
>>>>>>>> the next release so that I can continue working with this part in my
>>>>>>>> research project. Thanks for keeping me posted.
>>>>>>>>
>>>>>>>> With respect to your question, I am using the "History" table as a
>>>>>>>> data store of all versions of a certain data item. When I say Max(
>>>>>>>> H."timestamp") inside the query, I am selecting the latest version
>>>>>>>> (most recent) that is "approved" (which is checked using the condition 
>>>>>>>> H."status"
>>>>>>>> = 'approved')
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Khaleel
>>>>>>>>
>>>>>>>> On Thu, Apr 23, 2015 at 6:22 AM, Maryann Xue <maryann....@gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> Hi Khaleel,
>>>>>>>>>
>>>>>>>>> Thanks a lot for reporting the problem, which looks like a bug. I
>>>>>>>>> will file a JIRA and keep you posted.
>>>>>>>>>
>>>>>>>>> One question though, why would we use MAX(H."timestamp") instead
>>>>>>>>> of H."timestamp"? What difference would it make?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Maryann
>>>>>>>>>
>>>>>>>>> On Wed, Apr 22, 2015 at 9:45 AM, khaleel mershad <
>>>>>>>>> khellom...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I tried executing the following query in Phoenix:
>>>>>>>>>> select distinct W."rowId", W."md5_current", W."size_current" from
>>>>>>>>>> "Wiki" AS W where EXISTS( select MAX(H."timestamp") from "History" 
>>>>>>>>>> AS H
>>>>>>>>>> where (H."rowId" = W."rowId") AND (H."status" = 'approved') );
>>>>>>>>>>
>>>>>>>>>> I got the following error:
>>>>>>>>>> ERROR 1018 (42Y27): Aggregate may not contain columns not in
>>>>>>>>>> GROUP BY. (state=42Y27,code=1018)
>>>>>>>>>>
>>>>>>>>>> If I replace W."rowId" (in the where clause) with a constant
>>>>>>>>>> value (for example: '3587'), the query works fine. Also, if I 
>>>>>>>>>> replace the
>>>>>>>>>> aggregate MAX(H."timestamp") with H."timestamp", the query also 
>>>>>>>>>> works fine.
>>>>>>>>>>
>>>>>>>>>> So it seems that Phoenix generates error when using a reference
>>>>>>>>>> to an outer query while using an aggregate within the inner query.
>>>>>>>>>>
>>>>>>>>>> Any solutions?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Khaleel Mershad, Ph.D.
>>>>>>>>>> Research Associate
>>>>>>>>>>
>>>>>>>>>> American University of Beirut
>>>>>>>>>> Department of Electrical and Computer Engineering
>>>>>>>>>> Bliss Street, Beirut, Lebanon
>>>>>>>>>> email: kw...@aub.edu.lb
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to