You can can eliminate the header row with a predicate. Also Drill will try to guess the data type of columns, but sometimes it may be wrong. In this case the header row can confuse Drill as to the data type of the column as the first column has a string, and the next ones have a number. In such cases it is a good practice to cast the data type of the column.
Try this query as an example for the file you are testing with. select columns[0], sum(cast(columns[1] as INT)) from `./csv/test.csv` where columns[0]<>'Employee' group by columns[0]; If you remove the header row in the file, you may find that Drill will more than likely be able to perform the sum without the cast. --Andries > On Feb 8, 2015, at 7:44 PM, Minnow Noir <[email protected]> wrote: > > That works with the file that has no row header, Sudhakar. > >> On Sun, Feb 8, 2015 at 10:34 PM, Sudhakar Thota <[email protected]> wrote: >> >> May be you have to cast it to integer. >> >> Sudhakar Thota >> Sent from my iPhone >> >>> On Feb 8, 2015, at 6:54 PM, Minnow Noir <[email protected]> wrote: >>> >>> The error actually happens when I remove the header row from the file. >>> >>> cat test2.csv >>> "Ed",100 >>> "Pete",200 >>> "Ed",100 >>> "Pete",400 >>> >>> >>> 0: jdbc:drill:zk=local> select columns[0], columns[1] from >>> dfs.`/data/test2.csv`; >>> +------------+------------+ >>> | EXPR$0 | EXPR$1 | >>> +------------+------------+ >>> | "Ed" | 100 | >>> | "Pete" | 200 | >>> | "Ed" | 100 | >>> | "Pete" | 400 | >>> +------------+------------+ >>> 4 rows selected (0.837 seconds) >>> >>> 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) from >>> dfs.`/data/test2.csv` group by columns[0]; >>> Query failed: Query failed: Failure while running fragment., Only COUNT >>> aggregate function supported for Boolean type [ >>> a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010 ] >>> [ a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010 >> ] >>> >>> >>> Error: exception while executing query: Failure while executing query. >>> (state=,code=0) >>> >>> Also, I don't see anything on the Drill wiki about filtering out the >> header >>> row. How is that done? >>> >>> Thanks >>> >>> >>> On Sun, Feb 8, 2015 at 9:23 PM, Andries Engelbrecht < >>> [email protected]> wrote: >>> >>>> You need to filter out the header line on the CSV file as you are trying >>>> to sum a string in column1. >>>> >>>> >>>> —Andries >>>> >>>> >>>>> On Feb 8, 2015, at 6:12 PM, Minnow Noir <[email protected]> wrote: >>>>> >>>>> I'm trying to perform a basic query in order to learn Drill, but >> getting >>>> an >>>>> the error message in the subject line. >>>>> >>>>> I created a dead simple CSV file on disk. Note that Sales values are >> not >>>>> quoted. >>>>> >>>>> cat test.csv >>>>> Employee,Sales >>>>> Ed,100 >>>>> Pete,200 >>>>> Ed,100 >>>>> Pete,400 >>>>> >>>>> When I query it without performing a sum, it returns the expected >> values. >>>>> >>>>> 0: jdbc:drill:zk=local> select columns[0], columns[1] as TotalSales >> from >>>>> dfs.`/data/test.csv`; >>>>> +------------+------------+ >>>>> | EXPR$0 | TotalSales | >>>>> +------------+------------+ >>>>> | Employee | Sales | >>>>> | Ed | 100 | >>>>> | Pete | 200 | >>>>> | Ed | 100 | >>>>> | Pete | 400 | >>>>> +------------+------------+ >>>>> 5 rows selected (0.11 seconds) >>>>> >>>>> However, if I throw a sum() in there, I get the confusing error message >>>> in >>>>> the subject line: >>>>> >>>>> 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) as >> TotalSales >>>>> from dfs.` >>>>> /data/test.csv` group by columns[0]; >>>>> Query failed: Query failed: Failure while running fragment., Only COUNT >>>>> aggregate function supported for Boolean type [ >>>>> bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010 >> ] >>>>> [ bfd34bd1-2fac-4d9e-a9bd-26bced552120 on >> sandbox.hortonworks.com:31010 >>>> ] >>>>> >>>>> The message seems to be saying that Drill interpreted the Sales column >>>> data >>>>> as being Boolean somehow, and therefore, the only function that can be >>>>> called is count(). It's not clear why Drill would interpret the Sales >>>>> column values as being Boolean. >>>>> >>>>> Some Googling turned up this thread, which says the error also occurs >> for >>>>> character data: https://issues.apache.org/jira/browse/DRILL-1998 >>>>> >>>>> Of course, it's not clear why Drill would interpret 100, 200, etc. as >>>> being >>>>> character data unless it's getting thrown off by the header row...which >>>> of >>>>> course is present in every CSV and TSV file. However, I created a copy >>>> of >>>>> test.csv *without* the header row, and reran the query, but got the >> same >>>>> error. >>>>> >>>>> Any ideas what's causing the issue and how to resolve it? >>>>> >>>>> Thanks >>
