Hello,

This is my format setting:

    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    }

I was able to extract the header and get expected results:


> select * from mfs.tmp.`abcd.csv`;
+----+----+----+----+
| A  | B  | C  | D  |
+----+----+----+----+
| 1  | 2  | 3  | 4  |
| 2  | 3  | 4  | 5  |
| 3  | 4  | 5  | 6  |
+----+----+----+----+
3 rows selected (0.196 seconds)

> select A from mfs.tmp.`abcd.csv`;
+----+
| A  |
+----+
| 1  |
| 2  |
| 3  |
+----+
3 rows selected (0.16 seconds)

I am using a MapR cluster with Drill 1.6.0. I had also enabled the new text
reader.
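
For reference, a minimal sketch of how the new text reader option can be
checked and enabled from sqlline. It assumes the option name quoted later in
this thread (exec.storage.enable_new_text_reader) and changes it for the
current session only; a system-wide change would use ALTER SYSTEM instead.

```sql
-- Show the current value of the new text reader option.
SELECT * FROM sys.options WHERE name = 'exec.storage.enable_new_text_reader';

-- Enable the new text reader for this session only.
ALTER SESSION SET `exec.storage.enable_new_text_reader` = true;
```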

Note: My initial query failed to extract the header, similar to what you
reported. I had to set the "skipFirstLine" option to true for it to work.
Strangely, subsequent queries work even after removing or disabling
the "skipFirstLine" option. This could be a bug, but I'm not able to
reproduce it right now. I will file a JIRA once I have more clarity.
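
One way to rule out a stale or mismatched storage plugin configuration might
be to pass the format options inline with a table function, bypassing the
stored "csv" format entry. This is only a sketch: it assumes Drill's
table-function syntax for format options is available in this build, and it
reuses the workspace and file name from the queries above.

```sql
-- Supply the text format options inline instead of relying on the plugin config.
SELECT A
FROM table(mfs.tmp.`abcd.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true));
```

If this returns column A while the plain query does not, the stored format
configuration on that cluster is the more likely culprit.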



Regards,
Abhishek

On Fri, Apr 15, 2016 at 10:53 AM, Matt <[email protected]> wrote:

> With files in the local filesystem, and an embedded drill bit from the
> download on drill.apache.org, I can successfully query csv data by column
> name with the extractHeader option on, as in SELECT customer_id FROM `file`;
>
> But in a MapR cluster (v. 5.1.0.37549.GA) with the data in MapR-FS, the
> extractHeader option does not seem to be taking effect. A plain "SELECT *"
> returns rows with the header as a data row, not in the columns list.
>
> I have verified that exec.storage.enable_new_text_reader is true, and in
> both cases csv storage is defined as:
>
> ~~~
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
> ~~~
>
> Of course, with the csv reader not extracting the columns, an attempt to
> reference columns by name results in:
>
> Error: DATA_READ ERROR: Selected column 'customer_id' must have name
> 'columns' or must be plain '*'.
>
> In trying to diagnose the issue, I noted that at times the file header row
> was not part of the SELECT * results, but was also not being used to detect
> column names.
>
> Both cases are Drill v1.6.0, but the MapR installed version has a
> different commit than the standalone copy I am using:
>
> MapR:
>
> ~~~
>
> +----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | version  |                 commit_id                 |                                              commit_message                                              |        commit_time         | build_email  |         build_time         |
> +----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | 1.6.0    | 2d532bd206d7ae9f3cb703ee7f51ae3764374d43  | MD-850: Treat the type of decimal literals as DOUBLE only when planner.enable_decimal_data_type is true  | 31.03.2016 @ 04:47:25 UTC  | Unknown      | 31.03.2016 @ 04:40:54 UTC  |
> +----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> ~~~
>
> Local:
>
> ~~~
>
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+--------------------+----------------------------+
> | version  |                 commit_id                 |                   commit_message                    |        commit_time         |    build_email     |         build_time         |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+--------------------+----------------------------+
> | 1.6.0    | d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb  | [maven-release-plugin] prepare release drill-1.6.0  | 10.03.2016 @ 16:34:37 PST  | [email protected]    | 10.03.2016 @ 17:45:29 PST  |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+--------------------+----------------------------+
> ~~~
