Re: CTAS and save as parquet last column values are shown as null

Abhishek Girish Sun, 23 Jul 2017 23:17:34 -0700

Can you update your csv format plugin as shown below and retry your query?

    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "lineDelimiter": "\r\n",
      "extractHeader": true,
      "delimiter": ","
    }


On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <[email protected]>
wrote:

> 0: jdbc:drill:zk=local> select * FROM
>  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> PublicHoliday/PublicHoliday.csv`
> limit 10 ;
> +-----------------------------------------+
> |                 columns                 |
> +-----------------------------------------+
> | ["Day","Date","Area\r"]                 |
> | ["Friday","15/1/2016","Karnataka\r"]    |
> | ["Tuesday","26/1/2016","Karnataka\r"]   |
> | ["Monday","7/3/2016","Karnataka\r"]     |
> | ["Friday","25/3/2016","Karnataka\r"]    |
> | ["Friday","1/4/2016","Karnataka\r"]     |
> | ["Friday","8/4/2016","Karnataka\r"]     |
> | ["Thursday","14/4/2016","Karnataka\r"]  |
> | ["Tuesday","19/4/2016","Karnataka\r"]   |
> | ["Sunday","1/5/2016","Karnataka\r"]     |
> +-----------------------------------------+
> 10 rows selected (0.122 seconds)
> 0: jdbc:drill:zk=local> select * from
> `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> limit 10 ;
> +-----------+------------+-------+
> |    Day    |    Date    | Area  |
> +-----------+------------+-------+
> | Friday    | 15/1/2016  | null  |
> | Tuesday   | 26/1/2016  | null  |
> | Monday    | 7/3/2016   | null  |
> | Friday    | 25/3/2016  | null  |
> | Friday    | 1/4/2016   | null  |
> | Friday    | 8/4/2016   | null  |
> | Thursday  | 14/4/2016  | null  |
> | Tuesday   | 19/4/2016  | null  |
> | Sunday    | 1/5/2016   | null  |
> | Monday    | 9/5/2016   | null  |
> +-----------+------------+-------+
> 10 rows selected (0.1 seconds)
> 0: jdbc:drill:zk=local>
>
>
> *Drill set up* : Aapche drill is set up on Windows machine in embedded mode
> .
>
> On 24 July 2017 at 13:30, Divya Gehlot <[email protected]> wrote:
>
> >
> > Pasting the result set in text format
> >
> > *Reading parquet file format :*
> >
> >> Day       Date         Area
> >> Friday 15/1/2016 null
> >> Tuesday 26/1/2016 null
> >> Monday 7/3/2016 null
> >> Friday 25/3/2016 null
> >> Friday 1/4/2016 null
> >> Friday 8/4/2016 null
> >
> >
> >
> > *Reading csv file format *
> >
> >> columns
> >> ["Day","Date","Area\r"]
> >> ["Friday","1/4/2016","Karnataka\r"]
> >> ["Friday","15/1/2016","Karnataka\r"]
> >> ["Friday","25/3/2016","Karnataka\r"]
> >> ["Friday","8/4/2016","Karnataka\r"]
> >> ["Monday","7/3/2016","Karnataka\r"]
> >
> >
> >
> >
> >
> > *CTAS query csv to parquet :*
> >
> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> SELECT
> >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> >> VARCHAR(100)) END AS `Day`,
> >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> AS
> >> VARCHAR(100)) END AS `Date`,
> >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> AS
> >> VARCHAR(100)) END AS `Area`
> >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> ',',
> >> extractHeader => true))
> >
> >
> >
> > Thanks,
> > Divya
> >
> > On 24 July 2017 at 13:20, Abhishek Girish <[email protected]> wrote:
> >
> >> Unfortunately, the attachments / pictures haven't come through. Mailing
> >> lists sometimes do not support these. Can you paste as text or share
> links
> >> to it instead?
> >>
> >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <[email protected]>
> >> wrote:
> >>
> >> > yes it shows the proper values when I query the csv file.
> >> > CTAS query csv to parquet :
> >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> > SELECT
> >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> AS
> >> > VARCHAR(100)) END AS `Day`,
> >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> >> AS
> >> > VARCHAR(100)) END AS `Date`,
> >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> >> AS
> >> > VARCHAR(100)) END AS `Area`
> >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> >> ',',
> >> > extractHeader => true))
> >> >
> >> > CSV File
> >> >
> >> > Parquet File
> >> >
> >> >
> >> >
> >> > Appreciate the help !
> >> >
> >> > Thanks,
> >> > Divya 
> >> >
> >> > On 24 July 2017 at 11:52, Abhishek Girish <[email protected]> wrote:
> >> >
> >> >> Can you share a sample row from the CSV and the CTAS query? Also test
> >> if a
> >> >> select columns[n] query on the CSV file works as expected [1] ?
> >> >>
> >> >> It could be an issue with delimiters.
> >> >>
> >> >> [1]
> >> >> https://drill.apache.org/docs/querying-plain-text-files/#col
> >> >> umns[n]-syntax
> >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot <
> [email protected]>
> >> >> wrote:
> >> >>
> >> >> > Hi ,
> >> >> > I am facing as weird issue when I CTAS and save the csv file as
> >> parquet
> >> >> it
> >> >> > displays the last column values as null .
> >> >> > This is not the case with one file .
> >> >> > If I take any csv file with even with any data type and do a
> >> >> > select column1,column2,column3 from table.parquet
> >> >> > it shows the column3 values as null.
> >> >> >
> >> >> > Appreciate the help.
> >> >> >
> >> >> > Thanks,
> >> >> > Divya
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: CTAS and save as parquet last column values are shown as null

Reply via email to