The error actually happens when I remove the header row from the file.

cat test2.csv
"Ed",100
"Pete",200
"Ed",100
"Pete",400


0: jdbc:drill:zk=local> select columns[0], columns[1] from
dfs.`/data/test2.csv`;
+------------+------------+
|   EXPR$0   |   EXPR$1   |
+------------+------------+
| "Ed"       | 100        |
| "Pete"     | 200        |
| "Ed"       | 100        |
| "Pete"     | 400        |
+------------+------------+
4 rows selected (0.837 seconds)

0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) from
dfs.`/data/test2.csv` group by columns[0];
Query failed: Query failed: Failure while running fragment., Only COUNT
aggregate function supported for Boolean type [
a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010 ]
[ a9344c36-ebb4-4b9d-9c7d-75a0f7e52edd on sandbox.hortonworks.com:31010 ]


Error: exception while executing query: Failure while executing query.
(state=,code=0)

Also, I don't see anything on the Drill wiki about filtering out the header
row.  How is that done?

Thanks


On Sun, Feb 8, 2015 at 9:23 PM, Andries Engelbrecht <
[email protected]> wrote:

> You need to filter out the header line on the CSV file as you are trying
> to sum a string in column1.
>
>
> —Andries
>
>
> On Feb 8, 2015, at 6:12 PM, Minnow Noir <[email protected]> wrote:
>
> > I'm trying to perform a basic query in order to learn Drill, but getting
> an
> > the error message in the subject line.
> >
> > I created a dead simple CSV file on disk.  Note that Sales values are not
> > quoted.
> >
> > cat test.csv
> > Employee,Sales
> > Ed,100
> > Pete,200
> > Ed,100
> > Pete,400
> >
> > When I query it without performing a sum, it returns the expected values.
> >
> > 0: jdbc:drill:zk=local> select columns[0], columns[1] as TotalSales from
> > dfs.`/data/test.csv`;
> > +------------+------------+
> > |   EXPR$0   | TotalSales |
> > +------------+------------+
> > | Employee   | Sales      |
> > | Ed         | 100        |
> > | Pete       | 200        |
> > | Ed         | 100        |
> > | Pete       | 400        |
> > +------------+------------+
> > 5 rows selected (0.11 seconds)
> >
> > However, if I throw a sum() in there, I get the confusing error message
> in
> > the subject line:
> >
> > 0: jdbc:drill:zk=local> select columns[0], sum(columns[1]) as TotalSales
> > from dfs.`
> > /data/test.csv` group by columns[0];
> > Query failed: Query failed: Failure while running fragment., Only COUNT
> > aggregate function supported for Boolean type [
> > bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010 ]
> > [ bfd34bd1-2fac-4d9e-a9bd-26bced552120 on sandbox.hortonworks.com:31010
> ]
> >
> > The message seems to be saying that Drill interpreted the Sales column
> data
> > as being Boolean somehow, and therefore, the only function that can be
> > called is count().  It's not clear why Drill would interpret the Sales
> > column values as being Boolean.
> >
> > Some Googling turned up this thread, which says the error also occurs for
> > character data: https://issues.apache.org/jira/browse/DRILL-1998
> >
> > Of course, it's not clear why Drill would interpret 100, 200, etc. as
> being
> > character data unless it's getting thrown off by the header row...which
> of
> > course is present in every CSV and TSV file.  However, I created a copy
> of
> > test.csv *without* the header row, and reran the query, but got the same
> > error.
> >
> > Any ideas what's causing the issue and how to resolve it?
> >
> > Thanks
>
>

Reply via email to