The error actually happens when I remove the header row from the file. cat test2.csv "Ed",100 "Pete",200 "Ed",100 "Pete",400
0: jdbc:drill:zk=local> select columns[0], columns[1] from dfs.`/data/test2.csv`; +------------+------------+ | EXPR$0 | EXPR$1 | +------------+------------+ | "Ed" | 100 | | "Pete" | 200 | | "Ed" | 100 | | "Pete" | 400 | +------------+------------+ 4 rows selected (0.837 seconds) 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) from dfs.`/data/test2.csv` group by columns[0]; Query failed: Query failed: Failure while running fragment., Only COUNT aggregate function supported for Boolean type [ a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010 ] [ a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) Also, I don't see anything on the Drill wiki about filtering out the header row. How is that done? Thanks On Sun, Feb 8, 2015 at 9:23 PM, Andries Engelbrecht < [email protected]> wrote: > You need to filter out the header line on the CSV file as you are trying > to sum a string in column1. > > > —Andries > > > On Feb 8, 2015, at 6:12 PM, Minnow Noir <[email protected]> wrote: > > > I'm trying to perform a basic query in order to learn Drill, but getting > an > > the error message in the subject line. > > > > I created a dead simple CSV file on disk. Note that Sales values are not > > quoted. > > > > cat test.csv > > Employee,Sales > > Ed,100 > > Pete,200 > > Ed,100 > > Pete,400 > > > > When I query it without performing a sum, it returns the expected values. > > > > 0: jdbc:drill:zk=local> select columns[0], columns[1] as TotalSales from > > dfs.`/data/test.csv`; > > +------------+------------+ > > | EXPR$0 | TotalSales | > > +------------+------------+ > > | Employee | Sales | > > | Ed | 100 | > > | Pete | 200 | > > | Ed | 100 | > > | Pete | 400 | > > +------------+------------+ > > 5 rows selected (0.11 seconds) > > > > However, if I throw a sum() in there, I get the confusing error message > in > > the subject line: > > > > 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) as TotalSales > > from dfs.` > > /data/test.csv` group by columns[0]; > > Query failed: Query failed: Failure while running fragment., Only COUNT > > aggregate function supported for Boolean type [ > > bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010 ] > > [ bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010 > ] > > > > The message seems to be saying that Drill interpreted the Sales column > data > > as being Boolean somehow, and therefore, the only function that can be > > called is count(). It's not clear why Drill would interpret the Sales > > column values as being Boolean. > > > > Some Googling turned up this thread, which says the error also occurs for > > character data: https://issues.apache.org/jira/browse/DRILL-1998 > > > > Of course, it's not clear why Drill would interpret 100, 200, etc. as > being > > character data unless it's getting thrown off by the header row...which > of > > course is present in every CSV and TSV file. However, I created a copy > of > > test.csv *without* the header row, and reran the query, but got the same > > error. > > > > Any ideas what's causing the issue and how to resolve it? > > > > Thanks > >
