I'm trying to perform a basic query in order to learn Drill, but getting an
the error message in the subject line.

I created a dead simple CSV file on disk.  Note that Sales values are not
quoted.

 cat test.csv
Employee,Sales
Ed,100
Pete,200
Ed,100
Pete,400

When I query it without performing a sum, it returns the expected values.

0: jdbc:drill:zk=local> select columns[0], columns[1] as TotalSales from
dfs.`/data/test.csv`;
+------------+------------+
|   EXPR$0   | TotalSales |
+------------+------------+
| Employee   | Sales      |
| Ed         | 100        |
| Pete       | 200        |
| Ed         | 100        |
| Pete       | 400        |
+------------+------------+
5 rows selected (0.11 seconds)

However, if I throw a sum() in there, I get the confusing error message in
the subject line:

0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) as TotalSales
from dfs.`
/data/test.csv` group by columns[0];
Query failed: Query failed: Failure while running fragment., Only COUNT
aggregate function supported for Boolean type [
bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010 ]
[ bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010 ]

The message seems to be saying that Drill interpreted the Sales column data
as being Boolean somehow, and therefore, the only function that can be
called is count().  It's not clear why Drill would interpret the Sales
column values as being Boolean.

Some Googling turned up this thread, which says the error also occurs for
character data: https://issues.apache.org/jira/browse/DRILL-1998

Of course, it's not clear why Drill would interpret 100, 200, etc. as being
character data unless it's getting thrown off by the header row...which of
course is present in every CSV and TSV file.  However, I created a copy of
test.csv *without* the header row, and reran the query, but got the same
error.

Any ideas what's causing the issue and how to resolve it?

Thanks

Reply via email to