Okay manually filtering out the header row while casting values seems to
work.

On Mon, Feb 9, 2015 at 1:59 AM, Andries Engelbrecht <
[email protected]> wrote:

> You can can eliminate the header row with a predicate. Also Drill will try
> to guess the data type of columns, but sometimes it may be wrong. In this
> case the header row can confuse Drill as to the data type of the column as
> the first column has a string, and the next ones have a number. In such
> cases it is a good practice to cast the data type of the column.
>
> Try this query as an example for the file you are testing with.
>
> select columns[0], sum(cast(columns[1] as INT)) from `./csv/test.csv`
> where columns[0]<>'Employee' group by columns[0];
>
> If you remove the header row in the file, you may find that Drill will
> more than likely be able to perform the sum without the cast.
>
> --Andries
>
>
>
> > On Feb 8, 2015, at 7:44 PM, Minnow Noir <[email protected]> wrote:
> >
> > That works with the file that has no row header, Sudhakar.
> >
> >> On Sun, Feb 8, 2015 at 10:34 PM, Sudhakar Thota <[email protected]>
> wrote:
> >>
> >> May be you have to cast it to integer.
> >>
> >> Sudhakar Thota
> >> Sent from my iPhone
> >>
> >>> On Feb 8, 2015, at 6:54 PM, Minnow Noir <[email protected]> wrote:
> >>>
> >>> The error actually happens when I remove the header row from the file.
> >>>
> >>> cat test2.csv
> >>> "Ed",100
> >>> "Pete",200
> >>> "Ed",100
> >>> "Pete",400
> >>>
> >>>
> >>> 0: jdbc:drill:zk=local> select columns[0], columns[1] from
> >>> dfs.`/data/test2.csv`;
> >>> +------------+------------+
> >>> |   EXPR$0   |   EXPR$1   |
> >>> +------------+------------+
> >>> | "Ed"       | 100        |
> >>> | "Pete"     | 200        |
> >>> | "Ed"       | 100        |
> >>> | "Pete"     | 400        |
> >>> +------------+------------+
> >>> 4 rows selected (0.837 seconds)
> >>>
> >>> 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) from
> >>> dfs.`/data/test2.csv` group by columns[0];
> >>> Query failed: Query failed: Failure while running fragment., Only COUNT
> >>> aggregate function supported for Boolean type [
> >>> a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010
> ]
> >>> [ a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on
> sandbox.hortonworks.com:31010
> >> ]
> >>>
> >>>
> >>> Error: exception while executing query: Failure while executing query.
> >>> (state=,code=0)
> >>>
> >>> Also, I don't see anything on the Drill wiki about filtering out the
> >> header
> >>> row.  How is that done?
> >>>
> >>> Thanks
> >>>
> >>>
> >>> On Sun, Feb 8, 2015 at 9:23 PM, Andries Engelbrecht <
> >>> [email protected]> wrote:
> >>>
> >>>> You need to filter out the header line on the CSV file as you are
> trying
> >>>> to sum a string in column1.
> >>>>
> >>>>
> >>>> —Andries
> >>>>
> >>>>
> >>>>> On Feb 8, 2015, at 6:12 PM, Minnow Noir <[email protected]>
> wrote:
> >>>>>
> >>>>> I'm trying to perform a basic query in order to learn Drill, but
> >> getting
> >>>> an
> >>>>> the error message in the subject line.
> >>>>>
> >>>>> I created a dead simple CSV file on disk.  Note that Sales values are
> >> not
> >>>>> quoted.
> >>>>>
> >>>>> cat test.csv
> >>>>> Employee,Sales
> >>>>> Ed,100
> >>>>> Pete,200
> >>>>> Ed,100
> >>>>> Pete,400
> >>>>>
> >>>>> When I query it without performing a sum, it returns the expected
> >> values.
> >>>>>
> >>>>> 0: jdbc:drill:zk=local> select columns[0], columns[1] as TotalSales
> >> from
> >>>>> dfs.`/data/test.csv`;
> >>>>> +------------+------------+
> >>>>> |   EXPR$0   | TotalSales |
> >>>>> +------------+------------+
> >>>>> | Employee   | Sales      |
> >>>>> | Ed         | 100        |
> >>>>> | Pete       | 200        |
> >>>>> | Ed         | 100        |
> >>>>> | Pete       | 400        |
> >>>>> +------------+------------+
> >>>>> 5 rows selected (0.11 seconds)
> >>>>>
> >>>>> However, if I throw a sum() in there, I get the confusing error
> message
> >>>> in
> >>>>> the subject line:
> >>>>>
> >>>>> 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) as
> >> TotalSales
> >>>>> from dfs.`
> >>>>> /data/test.csv` group by columns[0];
> >>>>> Query failed: Query failed: Failure while running fragment., Only
> COUNT
> >>>>> aggregate function supported for Boolean type [
> >>>>> bfd34bd1-2fac-4d9e-a9bd-26bced552120 on
> sandbox.hortonworks.com:31010
> >> ]
> >>>>> [ bfd34bd1-2fac-4d9e-a9bd-26bced552120 on
> >> sandbox.hortonworks.com:31010
> >>>> ]
> >>>>>
> >>>>> The message seems to be saying that Drill interpreted the Sales
> column
> >>>> data
> >>>>> as being Boolean somehow, and therefore, the only function that can
> be
> >>>>> called is count().  It's not clear why Drill would interpret the
> Sales
> >>>>> column values as being Boolean.
> >>>>>
> >>>>> Some Googling turned up this thread, which says the error also occurs
> >> for
> >>>>> character data: https://issues.apache.org/jira/browse/DRILL-1998
> >>>>>
> >>>>> Of course, it's not clear why Drill would interpret 100, 200, etc. as
> >>>> being
> >>>>> character data unless it's getting thrown off by the header
> row...which
> >>>> of
> >>>>> course is present in every CSV and TSV file.  However, I created a
> copy
> >>>> of
> >>>>> test.csv *without* the header row, and reran the query, but got the
> >> same
> >>>>> error.
> >>>>>
> >>>>> Any ideas what's causing the issue and how to resolve it?
> >>>>>
> >>>>> Thanks
> >>
>

Reply via email to