Re: Error when timestamp IN clause contains more elements

2017-07-17 Thread Rahul Raj
I will add explicit casts for these cases.

Thanks for the suggestions,
Rahul

On Sat, Jul 15, 2017 at 2:12 AM, Jinfeng Ni <j...@apache.org> wrote:

> The error would encounter when IN clause is rewritten into Join, and
> planner chooses to use distributed join (involves a Hash Exchange).  For
> distributed join,  when two join keys do not have same types, only certain
> type of implicit cast will be allowed.  In Khurram's case, I think the
> query is not using hash-distributed join method, and hence did not hit this
> error.
>
>
>
>
>
> On Fri, Jul 14, 2017 at 12:17 PM, Khurram Faraaz <kfar...@mapr.com> wrote:
>
> > I tried the below query using data from your query from the IN clause (as
> > is), I didn't see an error/Exception on sqlline, I am on Drill 1.11.0
> >
> >
> > 0: jdbc:drill:schema=dfs.tmp> select * from (values(137676420)) WHERE
> > EXPR$0 IN (137667780, 137676420, 137685060, 137702340,
> > 137728260, 137736900, 137745540, 137788740,
> 137797380,
> > 137909700, 137918340, 137926980, 137970180,
> 137978820,
> > 138030660, 138039300, 138047940, 138091140,
> 138099780);
> > ++
> > | EXPR$0 |
> > ++
> > | 137676420  |
> > ++
> > 1 row selected (0.509 seconds)
> >
> > Thanks,
> > Khurram
> >
> >
> > ________
> > From: Kunal Khatua <kkha...@mapr.com>
> > Sent: Friday, July 14, 2017 11:00:03 PM
> > To: user@drill.apache.org
> > Subject: RE: Error when timestamp IN clause contains more elements
> >
> > This works for 19 elements because the IN clause will do a filter
> pushdown
> > to the source for up to 19 elements. 20 or more... it resorts to using a
> > JOIN, in which case it finds data type mismatch.
> >
> > It is a good practice (for readability purposes) to do casts in such
> cases.
> >
> > Alternatively (not recommended in IMO) is to raise the threshold for the
> > property  "in_subquery_threshold" in sys.options from 20 to a higher
> value.
> >
> > -Original Message-
> > From: Rahul Raj [mailto:rahul@option3consulting.com]
> > Sent: Friday, July 14, 2017 7:56 AM
> > To: user@drill.apache.org
> > Subject: Error when timestamp IN clause contains more elements
> >
> > I am getting the error below when there are more than 19 elements in the
> IN
> > clause:
> >
> > DrillRuntimeException: Join only supports implicit casts between 1.
> > Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left
> type:
> > TIMESTAMP, Right type: BIGINT. Add explicit casts to avoid this error.
> >
> > However, query succeeds when IN clause contains one element less. Query
> is
> >
> > SELECT
> >   `Date`
> > , `COUNT`(`Date`) `COUNT_Date`
> > , `COUNT`(`a`) `COUNT_a`
> > FROM
> >   dfs.test.`latest`
> > WHERE (`Date` IN (137667780, 137676420, 137685060,
> > 137702340, 137728260, 137736900, 137745540,
> 137788740,
> > 137797380, 137909700, 137918340, 137926980,
> 137970180,
> > 137978820, 138030660, 138039300, 138047940,
> 138091140,
> > 138099780))
> > GROUP BY `Date`
> > ORDER BY `Date` ASC, `COUNT_Date` ASC;
> >
> >
> > Regards,
> > Rahul
> >
> > --
> >  This email and any files transmitted with it are confidential and
> > intended solely for the use of the individual or entity to whom it is
> > addressed. If you are not the named addressee then you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately and delete this e-mail from your system.
> >
>

-- 
 This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom it is 
addressed. If you are not the named addressee then you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately and delete this e-mail from your system.


Re: Error when timestamp IN clause contains more elements

2017-07-14 Thread Jinfeng Ni
The error would encounter when IN clause is rewritten into Join, and
planner chooses to use distributed join (involves a Hash Exchange).  For
distributed join,  when two join keys do not have same types, only certain
type of implicit cast will be allowed.  In Khurram's case, I think the
query is not using hash-distributed join method, and hence did not hit this
error.





On Fri, Jul 14, 2017 at 12:17 PM, Khurram Faraaz <kfar...@mapr.com> wrote:

> I tried the below query using data from your query from the IN clause (as
> is), I didn't see an error/Exception on sqlline, I am on Drill 1.11.0
>
>
> 0: jdbc:drill:schema=dfs.tmp> select * from (values(137676420)) WHERE
> EXPR$0 IN (137667780, 137676420, 137685060, 137702340,
> 137728260, 137736900, 137745540, 137788740, 137797380,
> 137909700, 137918340, 137926980, 137970180, 137978820,
> 138030660, 138039300, 138047940, 138091140, 138099780);
> ++
> | EXPR$0 |
> ++
> | 137676420  |
> ++
> 1 row selected (0.509 seconds)
>
> Thanks,
> Khurram
>
>
> 
> From: Kunal Khatua <kkha...@mapr.com>
> Sent: Friday, July 14, 2017 11:00:03 PM
> To: user@drill.apache.org
> Subject: RE: Error when timestamp IN clause contains more elements
>
> This works for 19 elements because the IN clause will do a filter pushdown
> to the source for up to 19 elements. 20 or more... it resorts to using a
> JOIN, in which case it finds data type mismatch.
>
> It is a good practice (for readability purposes) to do casts in such cases.
>
> Alternatively (not recommended in IMO) is to raise the threshold for the
> property  "in_subquery_threshold" in sys.options from 20 to a higher value.
>
> -Original Message-
> From: Rahul Raj [mailto:rahul@option3consulting.com]
> Sent: Friday, July 14, 2017 7:56 AM
> To: user@drill.apache.org
> Subject: Error when timestamp IN clause contains more elements
>
> I am getting the error below when there are more than 19 elements in the IN
> clause:
>
> DrillRuntimeException: Join only supports implicit casts between 1.
> Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type:
> TIMESTAMP, Right type: BIGINT. Add explicit casts to avoid this error.
>
> However, query succeeds when IN clause contains one element less. Query is
>
> SELECT
>   `Date`
> , `COUNT`(`Date`) `COUNT_Date`
> , `COUNT`(`a`) `COUNT_a`
> FROM
>   dfs.test.`latest`
> WHERE (`Date` IN (137667780, 137676420, 137685060,
> 137702340, 137728260, 137736900, 137745540, 137788740,
> 137797380, 137909700, 137918340, 137926980, 137970180,
> 137978820, 138030660, 138039300, 138047940, 138091140,
> 138099780))
> GROUP BY `Date`
> ORDER BY `Date` ASC, `COUNT_Date` ASC;
>
>
> Regards,
> Rahul
>
> --
>  This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.
>


Re: Error when timestamp IN clause contains more elements

2017-07-14 Thread Khurram Faraaz
I tried the below query using data from your query from the IN clause (as is), 
I didn't see an error/Exception on sqlline, I am on Drill 1.11.0


0: jdbc:drill:schema=dfs.tmp> select * from (values(137676420)) WHERE 
EXPR$0 IN (137667780, 137676420, 137685060, 137702340, 
137728260, 137736900, 137745540, 137788740, 137797380, 
137909700, 137918340, 137926980, 137970180, 137978820, 
138030660, 138039300, 138047940, 138091140, 138099780);
++
| EXPR$0 |
++
| 137676420  |
++
1 row selected (0.509 seconds)

Thanks,
Khurram



From: Kunal Khatua <kkha...@mapr.com>
Sent: Friday, July 14, 2017 11:00:03 PM
To: user@drill.apache.org
Subject: RE: Error when timestamp IN clause contains more elements

This works for 19 elements because the IN clause will do a filter pushdown to 
the source for up to 19 elements. 20 or more... it resorts to using a JOIN, in 
which case it finds data type mismatch.

It is a good practice (for readability purposes) to do casts in such cases.

Alternatively (not recommended in IMO) is to raise the threshold for the 
property  "in_subquery_threshold" in sys.options from 20 to a higher value.

-Original Message-
From: Rahul Raj [mailto:rahul@option3consulting.com]
Sent: Friday, July 14, 2017 7:56 AM
To: user@drill.apache.org
Subject: Error when timestamp IN clause contains more elements

I am getting the error below when there are more than 19 elements in the IN
clause:

DrillRuntimeException: Join only supports implicit casts between 1. Numeric 
data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type:
TIMESTAMP, Right type: BIGINT. Add explicit casts to avoid this error.

However, query succeeds when IN clause contains one element less. Query is

SELECT
  `Date`
, `COUNT`(`Date`) `COUNT_Date`
, `COUNT`(`a`) `COUNT_a`
FROM
  dfs.test.`latest`
WHERE (`Date` IN (137667780, 137676420, 137685060, 137702340, 
137728260, 137736900, 137745540, 137788740, 137797380, 
137909700, 137918340, 137926980, 137970180, 137978820, 
138030660, 138039300, 138047940, 138091140,
138099780))
GROUP BY `Date`
ORDER BY `Date` ASC, `COUNT_Date` ASC;


Regards,
Rahul

--
 This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom it is addressed. If you 
are not the named addressee then you should not disseminate, distribute or copy 
this e-mail. Please notify the sender immediately and delete this e-mail from 
your system.