Re: CTAS and save as parquet last column values are shown as null

Abhishek Girish Mon, 24 Jul 2017 08:34:00 -0700

Glad to know that it worked!

As you are using Drill on Windows, the new line delimiter in text files can
be different from that on Linux / Mac. We could see \r\n as the
lineDelimiter (carriage return & new line) and hence when we set the same
in the format plugin, the issue gets resolved. This particular one doesn't
seem to be documented - however ideally you should be able to see this in
[1], [2].


Coming to your question on date formats, you can refer to [3]. It has clear
examples on how to convert to a different date format. Do let us know if
that helped.

[1]
https://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
[2]
https://drill.apache.org/docs/text-files-csv-tsv-psv/#configuring-drill-to-read-text-files
[3] https://drill.apache.org/docs/data-type-conversion/#to_date

On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot <[email protected]>
wrote:

> Thank you so much it worked
>
> Can you please provide me the pointer to the documentation where updation
> for different format type are mentioned .
>
> As I am facing another issue with date type as the data which I receive in
> csv format has the format of 15/1/2016 when I try to cast or convert
> to_date it throws me error
>
>
>
>
> Thanks ,
> Divya
>
> On 24 July 2017 at 14:17, Abhishek Girish <[email protected]> wrote:
>
>> Can you update your csv format plugin as shown below and retry your query?
>>
>>     "csv": {
>>       "type": "text",
>>       "extensions": [
>>         "csv"
>>       ],
>>       "lineDelimiter": "\r\n",
>>       "extractHeader": true,
>>       "delimiter": ","
>>     }
>>
>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <[email protected]>
>> wrote:
>>
>> > 0: jdbc:drill:zk=local> select * FROM
>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>> > PublicHoliday/PublicHoliday.csv`
>> > limit 10 ;
>> > +-----------------------------------------+
>> > |                 columns                 |
>> > +-----------------------------------------+
>> > | ["Day","Date","Area\r"]                 |
>> > | ["Friday","15/1/2016","Karnataka\r"]    |
>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>> > | ["Monday","7/3/2016","Karnataka\r"]     |
>> > | ["Friday","25/3/2016","Karnataka\r"]    |
>> > | ["Friday","1/4/2016","Karnataka\r"]     |
>> > | ["Friday","8/4/2016","Karnataka\r"]     |
>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>> > | ["Sunday","1/5/2016","Karnataka\r"]     |
>> > +-----------------------------------------+
>> > 10 rows selected (0.122 seconds)
>> > 0: jdbc:drill:zk=local> select * from
>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>> > limit 10 ;
>> > +-----------+------------+-------+
>> > |    Day    |    Date    | Area  |
>> > +-----------+------------+-------+
>> > | Friday    | 15/1/2016  | null  |
>> > | Tuesday   | 26/1/2016  | null  |
>> > | Monday    | 7/3/2016   | null  |
>> > | Friday    | 25/3/2016  | null  |
>> > | Friday    | 1/4/2016   | null  |
>> > | Friday    | 8/4/2016   | null  |
>> > | Thursday  | 14/4/2016  | null  |
>> > | Tuesday   | 19/4/2016  | null  |
>> > | Sunday    | 1/5/2016   | null  |
>> > | Monday    | 9/5/2016   | null  |
>> > +-----------+------------+-------+
>> > 10 rows selected (0.1 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> >
>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>> mode
>> > .
>> >
>> > On 24 July 2017 at 13:30, Divya Gehlot <[email protected]> wrote:
>> >
>> > >
>> > > Pasting the result set in text format
>> > >
>> > > *Reading parquet file format :*
>> > >
>> > >> Day       Date         Area
>> > >> Friday 15/1/2016 null
>> > >> Tuesday 26/1/2016 null
>> > >> Monday 7/3/2016 null
>> > >> Friday 25/3/2016 null
>> > >> Friday 1/4/2016 null
>> > >> Friday 8/4/2016 null
>> > >
>> > >
>> > >
>> > > *Reading csv file format *
>> > >
>> > >> columns
>> > >> ["Day","Date","Area\r"]
>> > >> ["Friday","1/4/2016","Karnataka\r"]
>> > >> ["Friday","15/1/2016","Karnataka\r"]
>> > >> ["Friday","25/3/2016","Karnataka\r"]
>> > >> ["Friday","8/4/2016","Karnataka\r"]
>> > >> ["Monday","7/3/2016","Karnataka\r"]
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > *CTAS query csv to parquet :*
>> > >
>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> SELECT
>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
>> AS
>> > >> VARCHAR(100)) END AS `Day`,
>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Date`
>> > AS
>> > >> VARCHAR(100)) END AS `Date`,
>> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Area`
>> > AS
>> > >> VARCHAR(100)) END AS `Area`
>> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> > ',',
>> > >> extractHeader => true))
>> > >
>> > >
>> > >
>> > > Thanks,
>> > > Divya
>> > >
>> > > On 24 July 2017 at 13:20, Abhishek Girish <[email protected]> wrote:
>> > >
>> > >> Unfortunately, the attachments / pictures haven't come through.
>> Mailing
>> > >> lists sometimes do not support these. Can you paste as text or share
>> > links
>> > >> to it instead?
>> > >>
>> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <
>> [email protected]>
>> > >> wrote:
>> > >>
>> > >> > yes it shows the proper values when I query the csv file.
>> > >> > CTAS query csv to parquet :
>> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> > SELECT
>> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Day`
>> > AS
>> > >> > VARCHAR(100)) END AS `Day`,
>> > >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Date`
>> > >> AS
>> > >> > VARCHAR(100)) END AS `Date`,
>> > >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Area`
>> > >> AS
>> > >> > VARCHAR(100)) END AS `Area`
>> > >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter
>> =>
>> > >> ',',
>> > >> > extractHeader => true))
>> > >> >
>> > >> > CSV File
>> > >> >
>> > >> > Parquet File
>> > >> >
>> > >> >
>> > >> >
>> > >> > Appreciate the help !
>> > >> >
>> > >> > Thanks,
>> > >> > Divya 
>> > >> >
>> > >> > On 24 July 2017 at 11:52, Abhishek Girish <[email protected]>
>> wrote:
>> > >> >
>> > >> >> Can you share a sample row from the CSV and the CTAS query? Also
>> test
>> > >> if a
>> > >> >> select columns[n] query on the CSV file works as expected [1] ?
>> > >> >>
>> > >> >> It could be an issue with delimiters.
>> > >> >>
>> > >> >> [1]
>> > >> >> https://drill.apache.org/docs/querying-plain-text-files/#col
>> > >> >> umns[n]-syntax
>> > >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot <
>> > [email protected]>
>> > >> >> wrote:
>> > >> >>
>> > >> >> > Hi ,
>> > >> >> > I am facing as weird issue when I CTAS and save the csv file as
>> > >> parquet
>> > >> >> it
>> > >> >> > displays the last column values as null .
>> > >> >> > This is not the case with one file .
>> > >> >> > If I take any csv file with even with any data type and do a
>> > >> >> > select column1,column2,column3 from table.parquet
>> > >> >> > it shows the column3 values as null.
>> > >> >> >
>> > >> >> > Appreciate the help.
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Divya
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: CTAS and save as parquet last column values are shown as null

Reply via email to