Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Filed DRILL-5684  to
track the doc issue.

On Mon, Jul 24, 2017 at 8:33 AM, Abhishek Girish  wrote:

> Glad to know that it worked!
>
> As you are using Drill on Windows, the new line delimiter in text files
> can be different from that on Linux / Mac. We could see \r\n as the
> lineDelimiter (carriage return & new line) and hence when we set the same
> in the format plugin, the issue gets resolved. This particular one doesn't
> seem to be documented - however ideally you should be able to see this in
> [1], [2].
>
> Coming to your question on date formats, you can refer to [3]. It has
> clear examples on how to convert to a different date format. Do let us know
> if that helped.
>
> [1] https://drill.apache.org/docs/plugin-configuration-
> basics/#list-of-attributes-and-definitions
> [2] https://drill.apache.org/docs/text-files-csv-tsv-psv/#
> configuring-drill-to-read-text-files
> [3] https://drill.apache.org/docs/data-type-conversion/#to_date
>
> On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot 
> wrote:
>
>> Thank you so much it worked
>>
>> Can you please provide me the pointer to the documentation where updation
>> for different format type are mentioned .
>>
>> As I am facing another issue with date type as the data which I receive
>> in csv format has the format of 15/1/2016 when I try to cast or convert
>> to_date it throws me error
>>
>>
>>
>>
>> Thanks ,
>> Divya
>>
>> On 24 July 2017 at 14:17, Abhishek Girish  wrote:
>>
>>> Can you update your csv format plugin as shown below and retry your
>>> query?
>>>
>>> "csv": {
>>>   "type": "text",
>>>   "extensions": [
>>> "csv"
>>>   ],
>>>   "lineDelimiter": "\r\n",
>>>   "extractHeader": true,
>>>   "delimiter": ","
>>> }
>>>
>>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
>>> wrote:
>>>
>>> > 0: jdbc:drill:zk=local> select * FROM
>>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>>> > PublicHoliday/PublicHoliday.csv`
>>> > limit 10 ;
>>> > +-+
>>> > | columns |
>>> > +-+
>>> > | ["Day","Date","Area\r"] |
>>> > | ["Friday","15/1/2016","Karnataka\r"]|
>>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>>> > | ["Monday","7/3/2016","Karnataka\r"] |
>>> > | ["Friday","25/3/2016","Karnataka\r"]|
>>> > | ["Friday","1/4/2016","Karnataka\r"] |
>>> > | ["Friday","8/4/2016","Karnataka\r"] |
>>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>>> > +-+
>>> > 10 rows selected (0.122 seconds)
>>> > 0: jdbc:drill:zk=local> select * from
>>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>>> > limit 10 ;
>>> > +---++---+
>>> > |Day|Date| Area  |
>>> > +---++---+
>>> > | Friday| 15/1/2016  | null  |
>>> > | Tuesday   | 26/1/2016  | null  |
>>> > | Monday| 7/3/2016   | null  |
>>> > | Friday| 25/3/2016  | null  |
>>> > | Friday| 1/4/2016   | null  |
>>> > | Friday| 8/4/2016   | null  |
>>> > | Thursday  | 14/4/2016  | null  |
>>> > | Tuesday   | 19/4/2016  | null  |
>>> > | Sunday| 1/5/2016   | null  |
>>> > | Monday| 9/5/2016   | null  |
>>> > +---++---+
>>> > 10 rows selected (0.1 seconds)
>>> > 0: jdbc:drill:zk=local>
>>> >
>>> >
>>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>>> mode
>>> > .
>>> >
>>> > On 24 July 2017 at 13:30, Divya Gehlot 
>>> wrote:
>>> >
>>> > >
>>> > > Pasting the result set in text format
>>> > >
>>> > > *Reading parquet file format :*
>>> > >
>>> > >> Day   Date Area
>>> > >> Friday 15/1/2016 null
>>> > >> Tuesday 26/1/2016 null
>>> > >> Monday 7/3/2016 null
>>> > >> Friday 25/3/2016 null
>>> > >> Friday 1/4/2016 null
>>> > >> Friday 8/4/2016 null
>>> > >
>>> > >
>>> > >
>>> > > *Reading csv file format *
>>> > >
>>> > >> columns
>>> > >> ["Day","Date","Area\r"]
>>> > >> ["Friday","1/4/2016","Karnataka\r"]
>>> > >> ["Friday","15/1/2016","Karnataka\r"]
>>> > >> ["Friday","25/3/2016","Karnataka\r"]
>>> > >> ["Friday","8/4/2016","Karnataka\r"]
>>> > >> ["Monday","7/3/2016","Karnataka\r"]
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > *CTAS query csv to parquet :*
>>> > >
>>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>>> > >> SELECT
>>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>>> CAST(`Day` AS
>>> > >> VARCHAR(100)) END AS `Day`,
>>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>>> 

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Glad to know that it worked!

As you are using Drill on Windows, the new line delimiter in text files can
be different from that on Linux / Mac. We could see \r\n as the
lineDelimiter (carriage return & new line) and hence when we set the same
in the format plugin, the issue gets resolved. This particular one doesn't
seem to be documented - however ideally you should be able to see this in
[1], [2].

Coming to your question on date formats, you can refer to [3]. It has clear
examples on how to convert to a different date format. Do let us know if
that helped.

[1]
https://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
[2]
https://drill.apache.org/docs/text-files-csv-tsv-psv/#configuring-drill-to-read-text-files
[3] https://drill.apache.org/docs/data-type-conversion/#to_date

On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot 
wrote:

> Thank you so much it worked
>
> Can you please provide me the pointer to the documentation where updation
> for different format type are mentioned .
>
> As I am facing another issue with date type as the data which I receive in
> csv format has the format of 15/1/2016 when I try to cast or convert
> to_date it throws me error
>
>
>
>
> Thanks ,
> Divya
>
> On 24 July 2017 at 14:17, Abhishek Girish  wrote:
>
>> Can you update your csv format plugin as shown below and retry your query?
>>
>> "csv": {
>>   "type": "text",
>>   "extensions": [
>> "csv"
>>   ],
>>   "lineDelimiter": "\r\n",
>>   "extractHeader": true,
>>   "delimiter": ","
>> }
>>
>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
>> wrote:
>>
>> > 0: jdbc:drill:zk=local> select * FROM
>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>> > PublicHoliday/PublicHoliday.csv`
>> > limit 10 ;
>> > +-+
>> > | columns |
>> > +-+
>> > | ["Day","Date","Area\r"] |
>> > | ["Friday","15/1/2016","Karnataka\r"]|
>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>> > | ["Monday","7/3/2016","Karnataka\r"] |
>> > | ["Friday","25/3/2016","Karnataka\r"]|
>> > | ["Friday","1/4/2016","Karnataka\r"] |
>> > | ["Friday","8/4/2016","Karnataka\r"] |
>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>> > +-+
>> > 10 rows selected (0.122 seconds)
>> > 0: jdbc:drill:zk=local> select * from
>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>> > limit 10 ;
>> > +---++---+
>> > |Day|Date| Area  |
>> > +---++---+
>> > | Friday| 15/1/2016  | null  |
>> > | Tuesday   | 26/1/2016  | null  |
>> > | Monday| 7/3/2016   | null  |
>> > | Friday| 25/3/2016  | null  |
>> > | Friday| 1/4/2016   | null  |
>> > | Friday| 8/4/2016   | null  |
>> > | Thursday  | 14/4/2016  | null  |
>> > | Tuesday   | 19/4/2016  | null  |
>> > | Sunday| 1/5/2016   | null  |
>> > | Monday| 9/5/2016   | null  |
>> > +---++---+
>> > 10 rows selected (0.1 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> >
>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>> mode
>> > .
>> >
>> > On 24 July 2017 at 13:30, Divya Gehlot  wrote:
>> >
>> > >
>> > > Pasting the result set in text format
>> > >
>> > > *Reading parquet file format :*
>> > >
>> > >> Day   Date Area
>> > >> Friday 15/1/2016 null
>> > >> Tuesday 26/1/2016 null
>> > >> Monday 7/3/2016 null
>> > >> Friday 25/3/2016 null
>> > >> Friday 1/4/2016 null
>> > >> Friday 8/4/2016 null
>> > >
>> > >
>> > >
>> > > *Reading csv file format *
>> > >
>> > >> columns
>> > >> ["Day","Date","Area\r"]
>> > >> ["Friday","1/4/2016","Karnataka\r"]
>> > >> ["Friday","15/1/2016","Karnataka\r"]
>> > >> ["Friday","25/3/2016","Karnataka\r"]
>> > >> ["Friday","8/4/2016","Karnataka\r"]
>> > >> ["Monday","7/3/2016","Karnataka\r"]
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > *CTAS query csv to parquet :*
>> > >
>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> SELECT
>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
>> AS
>> > >> VARCHAR(100)) END AS `Day`,
>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Date`
>> > AS
>> > >> VARCHAR(100)) END AS `Date`,
>> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Area`
>> > AS
>> > >> VARCHAR(100)) END AS `Area`
>> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> > ',',
>> > >> extractHeader => true))
>> > >
>> > >
>> > >
>> > > Thanks,
>> > 

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Divya Gehlot
Any best practices guide if any body could share would be great



On 24 July 2017 at 14:52, Divya Gehlot  wrote:

> Thank you so much it worked
>
> Can you please provide me the pointer to the documentation where updation
> for different format type are mentioned .
>
> As I am facing another issue with date type as the data which I receive in
> csv format has the format of 15/1/2016 when I try to cast or convert
> to_date it throws me error
>
>
>
>
> Thanks ,
> Divya
>
> On 24 July 2017 at 14:17, Abhishek Girish  wrote:
>
>> Can you update your csv format plugin as shown below and retry your query?
>>
>> "csv": {
>>   "type": "text",
>>   "extensions": [
>> "csv"
>>   ],
>>   "lineDelimiter": "\r\n",
>>   "extractHeader": true,
>>   "delimiter": ","
>> }
>>
>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
>> wrote:
>>
>> > 0: jdbc:drill:zk=local> select * FROM
>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>> > PublicHoliday/PublicHoliday.csv`
>> > limit 10 ;
>> > +-+
>> > | columns |
>> > +-+
>> > | ["Day","Date","Area\r"] |
>> > | ["Friday","15/1/2016","Karnataka\r"]|
>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>> > | ["Monday","7/3/2016","Karnataka\r"] |
>> > | ["Friday","25/3/2016","Karnataka\r"]|
>> > | ["Friday","1/4/2016","Karnataka\r"] |
>> > | ["Friday","8/4/2016","Karnataka\r"] |
>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>> > +-+
>> > 10 rows selected (0.122 seconds)
>> > 0: jdbc:drill:zk=local> select * from
>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>> > limit 10 ;
>> > +---++---+
>> > |Day|Date| Area  |
>> > +---++---+
>> > | Friday| 15/1/2016  | null  |
>> > | Tuesday   | 26/1/2016  | null  |
>> > | Monday| 7/3/2016   | null  |
>> > | Friday| 25/3/2016  | null  |
>> > | Friday| 1/4/2016   | null  |
>> > | Friday| 8/4/2016   | null  |
>> > | Thursday  | 14/4/2016  | null  |
>> > | Tuesday   | 19/4/2016  | null  |
>> > | Sunday| 1/5/2016   | null  |
>> > | Monday| 9/5/2016   | null  |
>> > +---++---+
>> > 10 rows selected (0.1 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> >
>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>> mode
>> > .
>> >
>> > On 24 July 2017 at 13:30, Divya Gehlot  wrote:
>> >
>> > >
>> > > Pasting the result set in text format
>> > >
>> > > *Reading parquet file format :*
>> > >
>> > >> Day   Date Area
>> > >> Friday 15/1/2016 null
>> > >> Tuesday 26/1/2016 null
>> > >> Monday 7/3/2016 null
>> > >> Friday 25/3/2016 null
>> > >> Friday 1/4/2016 null
>> > >> Friday 8/4/2016 null
>> > >
>> > >
>> > >
>> > > *Reading csv file format *
>> > >
>> > >> columns
>> > >> ["Day","Date","Area\r"]
>> > >> ["Friday","1/4/2016","Karnataka\r"]
>> > >> ["Friday","15/1/2016","Karnataka\r"]
>> > >> ["Friday","25/3/2016","Karnataka\r"]
>> > >> ["Friday","8/4/2016","Karnataka\r"]
>> > >> ["Monday","7/3/2016","Karnataka\r"]
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > *CTAS query csv to parquet :*
>> > >
>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> SELECT
>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
>> AS
>> > >> VARCHAR(100)) END AS `Day`,
>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Date`
>> > AS
>> > >> VARCHAR(100)) END AS `Date`,
>> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Area`
>> > AS
>> > >> VARCHAR(100)) END AS `Area`
>> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> > ',',
>> > >> extractHeader => true))
>> > >
>> > >
>> > >
>> > > Thanks,
>> > > Divya
>> > >
>> > > On 24 July 2017 at 13:20, Abhishek Girish  wrote:
>> > >
>> > >> Unfortunately, the attachments / pictures haven't come through.
>> Mailing
>> > >> lists sometimes do not support these. Can you paste as text or share
>> > links
>> > >> to it instead?
>> > >>
>> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <
>> divya.htco...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > yes it shows the proper values when I query the csv file.
>> > >> > CTAS query csv to parquet :
>> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> > SELECT
>> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Day`
>> > AS
>> > >> > VARCHAR(100)) END AS `Day`,
>> > >> > CASE WHEN `Date` = '' THEN 

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Divya Gehlot
Thank you so much it worked

Can you please provide me the pointer to the documentation where updation
for different format type are mentioned .

As I am facing another issue with date type as the data which I receive in
csv format has the format of 15/1/2016 when I try to cast or convert
to_date it throws me error




Thanks ,
Divya

On 24 July 2017 at 14:17, Abhishek Girish  wrote:

> Can you update your csv format plugin as shown below and retry your query?
>
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "lineDelimiter": "\r\n",
>   "extractHeader": true,
>   "delimiter": ","
> }
>
> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
> wrote:
>
> > 0: jdbc:drill:zk=local> select * FROM
> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> > PublicHoliday/PublicHoliday.csv`
> > limit 10 ;
> > +-+
> > | columns |
> > +-+
> > | ["Day","Date","Area\r"] |
> > | ["Friday","15/1/2016","Karnataka\r"]|
> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
> > | ["Monday","7/3/2016","Karnataka\r"] |
> > | ["Friday","25/3/2016","Karnataka\r"]|
> > | ["Friday","1/4/2016","Karnataka\r"] |
> > | ["Friday","8/4/2016","Karnataka\r"] |
> > | ["Thursday","14/4/2016","Karnataka\r"]  |
> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
> > | ["Sunday","1/5/2016","Karnataka\r"] |
> > +-+
> > 10 rows selected (0.122 seconds)
> > 0: jdbc:drill:zk=local> select * from
> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> > limit 10 ;
> > +---++---+
> > |Day|Date| Area  |
> > +---++---+
> > | Friday| 15/1/2016  | null  |
> > | Tuesday   | 26/1/2016  | null  |
> > | Monday| 7/3/2016   | null  |
> > | Friday| 25/3/2016  | null  |
> > | Friday| 1/4/2016   | null  |
> > | Friday| 8/4/2016   | null  |
> > | Thursday  | 14/4/2016  | null  |
> > | Tuesday   | 19/4/2016  | null  |
> > | Sunday| 1/5/2016   | null  |
> > | Monday| 9/5/2016   | null  |
> > +---++---+
> > 10 rows selected (0.1 seconds)
> > 0: jdbc:drill:zk=local>
> >
> >
> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
> mode
> > .
> >
> > On 24 July 2017 at 13:30, Divya Gehlot  wrote:
> >
> > >
> > > Pasting the result set in text format
> > >
> > > *Reading parquet file format :*
> > >
> > >> Day   Date Area
> > >> Friday 15/1/2016 null
> > >> Tuesday 26/1/2016 null
> > >> Monday 7/3/2016 null
> > >> Friday 25/3/2016 null
> > >> Friday 1/4/2016 null
> > >> Friday 8/4/2016 null
> > >
> > >
> > >
> > > *Reading csv file format *
> > >
> > >> columns
> > >> ["Day","Date","Area\r"]
> > >> ["Friday","1/4/2016","Karnataka\r"]
> > >> ["Friday","15/1/2016","Karnataka\r"]
> > >> ["Friday","25/3/2016","Karnataka\r"]
> > >> ["Friday","8/4/2016","Karnataka\r"]
> > >> ["Monday","7/3/2016","Karnataka\r"]
> > >
> > >
> > >
> > >
> > >
> > > *CTAS query csv to parquet :*
> > >
> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > >> SELECT
> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> AS
> > >> VARCHAR(100)) END AS `Day`,
> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> > AS
> > >> VARCHAR(100)) END AS `Date`,
> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> > AS
> > >> VARCHAR(100)) END AS `Area`
> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> > ',',
> > >> extractHeader => true))
> > >
> > >
> > >
> > > Thanks,
> > > Divya
> > >
> > > On 24 July 2017 at 13:20, Abhishek Girish  wrote:
> > >
> > >> Unfortunately, the attachments / pictures haven't come through.
> Mailing
> > >> lists sometimes do not support these. Can you paste as text or share
> > links
> > >> to it instead?
> > >>
> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <
> divya.htco...@gmail.com>
> > >> wrote:
> > >>
> > >> > yes it shows the proper values when I query the csv file.
> > >> > CTAS query csv to parquet :
> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > >> > SELECT
> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> > AS
> > >> > VARCHAR(100)) END AS `Day`,
> > >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
> CAST(`Date`
> > >> AS
> > >> > VARCHAR(100)) END AS `Date`,
> > >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
> CAST(`Area`
> > >> AS
> > >> > VARCHAR(100)) END AS `Area`
> > >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> > >> ',',
> > 

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Can you update your csv format plugin as shown below and retry your query?

"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "lineDelimiter": "\r\n",
  "extractHeader": true,
  "delimiter": ","
}

On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
wrote:

> 0: jdbc:drill:zk=local> select * FROM
>  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> PublicHoliday/PublicHoliday.csv`
> limit 10 ;
> +-+
> | columns |
> +-+
> | ["Day","Date","Area\r"] |
> | ["Friday","15/1/2016","Karnataka\r"]|
> | ["Tuesday","26/1/2016","Karnataka\r"]   |
> | ["Monday","7/3/2016","Karnataka\r"] |
> | ["Friday","25/3/2016","Karnataka\r"]|
> | ["Friday","1/4/2016","Karnataka\r"] |
> | ["Friday","8/4/2016","Karnataka\r"] |
> | ["Thursday","14/4/2016","Karnataka\r"]  |
> | ["Tuesday","19/4/2016","Karnataka\r"]   |
> | ["Sunday","1/5/2016","Karnataka\r"] |
> +-+
> 10 rows selected (0.122 seconds)
> 0: jdbc:drill:zk=local> select * from
> `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> limit 10 ;
> +---++---+
> |Day|Date| Area  |
> +---++---+
> | Friday| 15/1/2016  | null  |
> | Tuesday   | 26/1/2016  | null  |
> | Monday| 7/3/2016   | null  |
> | Friday| 25/3/2016  | null  |
> | Friday| 1/4/2016   | null  |
> | Friday| 8/4/2016   | null  |
> | Thursday  | 14/4/2016  | null  |
> | Tuesday   | 19/4/2016  | null  |
> | Sunday| 1/5/2016   | null  |
> | Monday| 9/5/2016   | null  |
> +---++---+
> 10 rows selected (0.1 seconds)
> 0: jdbc:drill:zk=local>
>
>
> *Drill set up* : Aapche drill is set up on Windows machine in embedded mode
> .
>
> On 24 July 2017 at 13:30, Divya Gehlot  wrote:
>
> >
> > Pasting the result set in text format
> >
> > *Reading parquet file format :*
> >
> >> Day   Date Area
> >> Friday 15/1/2016 null
> >> Tuesday 26/1/2016 null
> >> Monday 7/3/2016 null
> >> Friday 25/3/2016 null
> >> Friday 1/4/2016 null
> >> Friday 8/4/2016 null
> >
> >
> >
> > *Reading csv file format *
> >
> >> columns
> >> ["Day","Date","Area\r"]
> >> ["Friday","1/4/2016","Karnataka\r"]
> >> ["Friday","15/1/2016","Karnataka\r"]
> >> ["Friday","25/3/2016","Karnataka\r"]
> >> ["Friday","8/4/2016","Karnataka\r"]
> >> ["Monday","7/3/2016","Karnataka\r"]
> >
> >
> >
> >
> >
> > *CTAS query csv to parquet :*
> >
> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> SELECT
> >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> >> VARCHAR(100)) END AS `Day`,
> >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> AS
> >> VARCHAR(100)) END AS `Date`,
> >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> AS
> >> VARCHAR(100)) END AS `Area`
> >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> ',',
> >> extractHeader => true))
> >
> >
> >
> > Thanks,
> > Divya
> >
> > On 24 July 2017 at 13:20, Abhishek Girish  wrote:
> >
> >> Unfortunately, the attachments / pictures haven't come through. Mailing
> >> lists sometimes do not support these. Can you paste as text or share
> links
> >> to it instead?
> >>
> >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
> >> wrote:
> >>
> >> > yes it shows the proper values when I query the csv file.
> >> > CTAS query csv to parquet :
> >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> > SELECT
> >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> AS
> >> > VARCHAR(100)) END AS `Day`,
> >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> >> AS
> >> > VARCHAR(100)) END AS `Date`,
> >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> >> AS
> >> > VARCHAR(100)) END AS `Area`
> >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> >> ',',
> >> > extractHeader => true))
> >> >
> >> > CSV File
> >> >
> >> > Parquet File
> >> >
> >> >
> >> >
> >> > Appreciate the help !
> >> >
> >> > Thanks,
> >> > Divya ​
> >> >
> >> > On 24 July 2017 at 11:52, Abhishek Girish  wrote:
> >> >
> >> >> Can you share a sample row from the CSV and the CTAS query? Also test
> >> if a
> >> >> select columns[n] query on the CSV file works as expected [1] ?
> >> >>
> >> >> It could be an issue with delimiters.
> >> >>
> >> >> [1]
> >> >> https://drill.apache.org/docs/querying-plain-text-files/#col
> >> >> umns[n]-syntax
> >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot <
> divya.htco...@gmail.com>
> >> >> wrote:
> >> >>
> >> 

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot
0: jdbc:drill:zk=local> select * FROM
 
dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/PublicHoliday/PublicHoliday.csv`
limit 10 ;
+-+
| columns |
+-+
| ["Day","Date","Area\r"] |
| ["Friday","15/1/2016","Karnataka\r"]|
| ["Tuesday","26/1/2016","Karnataka\r"]   |
| ["Monday","7/3/2016","Karnataka\r"] |
| ["Friday","25/3/2016","Karnataka\r"]|
| ["Friday","1/4/2016","Karnataka\r"] |
| ["Friday","8/4/2016","Karnataka\r"] |
| ["Thursday","14/4/2016","Karnataka\r"]  |
| ["Tuesday","19/4/2016","Karnataka\r"]   |
| ["Sunday","1/5/2016","Karnataka\r"] |
+-+
10 rows selected (0.122 seconds)
0: jdbc:drill:zk=local> select * from
`dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
limit 10 ;
+---++---+
|Day|Date| Area  |
+---++---+
| Friday| 15/1/2016  | null  |
| Tuesday   | 26/1/2016  | null  |
| Monday| 7/3/2016   | null  |
| Friday| 25/3/2016  | null  |
| Friday| 1/4/2016   | null  |
| Friday| 8/4/2016   | null  |
| Thursday  | 14/4/2016  | null  |
| Tuesday   | 19/4/2016  | null  |
| Sunday| 1/5/2016   | null  |
| Monday| 9/5/2016   | null  |
+---++---+
10 rows selected (0.1 seconds)
0: jdbc:drill:zk=local>


*Drill set up* : Aapche drill is set up on Windows machine in embedded mode
.

On 24 July 2017 at 13:30, Divya Gehlot  wrote:

>
> Pasting the result set in text format
>
> *Reading parquet file format :*
>
>> Day   Date Area
>> Friday 15/1/2016 null
>> Tuesday 26/1/2016 null
>> Monday 7/3/2016 null
>> Friday 25/3/2016 null
>> Friday 1/4/2016 null
>> Friday 8/4/2016 null
>
>
>
> *Reading csv file format *
>
>> columns
>> ["Day","Date","Area\r"]
>> ["Friday","1/4/2016","Karnataka\r"]
>> ["Friday","15/1/2016","Karnataka\r"]
>> ["Friday","25/3/2016","Karnataka\r"]
>> ["Friday","8/4/2016","Karnataka\r"]
>> ["Monday","7/3/2016","Karnataka\r"]
>
>
>
>
>
> *CTAS query csv to parquet :*
>
> Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> SELECT
>> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
>> VARCHAR(100)) END AS `Day`,
>> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
>> VARCHAR(100)) END AS `Date`,
>> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
>> VARCHAR(100)) END AS `Area`
>> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
>> extractHeader => true))
>
>
>
> Thanks,
> Divya
>
> On 24 July 2017 at 13:20, Abhishek Girish  wrote:
>
>> Unfortunately, the attachments / pictures haven't come through. Mailing
>> lists sometimes do not support these. Can you paste as text or share links
>> to it instead?
>>
>> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
>> wrote:
>>
>> > yes it shows the proper values when I query the csv file.
>> > CTAS query csv to parquet :
>> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > SELECT
>> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
>> > VARCHAR(100)) END AS `Day`,
>> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
>> AS
>> > VARCHAR(100)) END AS `Date`,
>> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
>> AS
>> > VARCHAR(100)) END AS `Area`
>> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> ',',
>> > extractHeader => true))
>> >
>> > CSV File
>> >
>> > Parquet File
>> >
>> >
>> >
>> > Appreciate the help !
>> >
>> > Thanks,
>> > Divya ​
>> >
>> > On 24 July 2017 at 11:52, Abhishek Girish  wrote:
>> >
>> >> Can you share a sample row from the CSV and the CTAS query? Also test
>> if a
>> >> select columns[n] query on the CSV file works as expected [1] ?
>> >>
>> >> It could be an issue with delimiters.
>> >>
>> >> [1]
>> >> https://drill.apache.org/docs/querying-plain-text-files/#col
>> >> umns[n]-syntax
>> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
>> >> wrote:
>> >>
>> >> > Hi ,
>> >> > I am facing as weird issue when I CTAS and save the csv file as
>> parquet
>> >> it
>> >> > displays the last column values as null .
>> >> > This is not the case with one file .
>> >> > If I take any csv file with even with any data type and do a
>> >> > select column1,column2,column3 from table.parquet
>> >> > it shows the column3 values as null.
>> >> >
>> >> > Appreciate the help.
>> >> >
>> >> > Thanks,
>> >> > Divya
>> >> >
>> >>
>> >
>> >
>>
>
>


Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot
Pasting the result set in text format

*Reading parquet file format :*

> Day   Date Area
> Friday 15/1/2016 null
> Tuesday 26/1/2016 null
> Monday 7/3/2016 null
> Friday 25/3/2016 null
> Friday 1/4/2016 null
> Friday 8/4/2016 null



*Reading csv file format *

> columns
> ["Day","Date","Area\r"]
> ["Friday","1/4/2016","Karnataka\r"]
> ["Friday","15/1/2016","Karnataka\r"]
> ["Friday","25/3/2016","Karnataka\r"]
> ["Friday","8/4/2016","Karnataka\r"]
> ["Monday","7/3/2016","Karnataka\r"]





*CTAS query csv to parquet :*

Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> SELECT
> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> VARCHAR(100)) END AS `Day`,
> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> VARCHAR(100)) END AS `Date`,
> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> VARCHAR(100)) END AS `Area`
> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> extractHeader => true))



Thanks,
Divya

On 24 July 2017 at 13:20, Abhishek Girish  wrote:

> Unfortunately, the attachments / pictures haven't come through. Mailing
> lists sometimes do not support these. Can you paste as text or share links
> to it instead?
>
> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
> wrote:
>
> > yes it shows the proper values when I query the csv file.
> > CTAS query csv to parquet :
> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > SELECT
> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> > VARCHAR(100)) END AS `Day`,
> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> > VARCHAR(100)) END AS `Date`,
> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> > VARCHAR(100)) END AS `Area`
> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> > extractHeader => true))
> >
> > CSV File
> >
> > Parquet File
> >
> >
> >
> > Appreciate the help !
> >
> > Thanks,
> > Divya ​
> >
> > On 24 July 2017 at 11:52, Abhishek Girish  wrote:
> >
> >> Can you share a sample row from the CSV and the CTAS query? Also test
> if a
> >> select columns[n] query on the CSV file works as expected [1] ?
> >>
> >> It could be an issue with delimiters.
> >>
> >> [1]
> >> https://drill.apache.org/docs/querying-plain-text-files/#col
> >> umns[n]-syntax
> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
> >> wrote:
> >>
> >> > Hi ,
> >> > I am facing as weird issue when I CTAS and save the csv file as
> parquet
> >> it
> >> > displays the last column values as null .
> >> > This is not the case with one file .
> >> > If I take any csv file with even with any data type and do a
> >> > select column1,column2,column3 from table.parquet
> >> > it shows the column3 values as null.
> >> >
> >> > Appreciate the help.
> >> >
> >> > Thanks,
> >> > Divya
> >> >
> >>
> >
> >
>


Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Abhishek Girish
Unfortunately, the attachments / pictures haven't come through. Mailing
lists sometimes do not support these. Can you paste as text or share links
to it instead?

On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
wrote:

> yes it shows the proper values when I query the csv file.
> CTAS query csv to parquet :
> Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> SELECT
> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> VARCHAR(100)) END AS `Day`,
> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> VARCHAR(100)) END AS `Date`,
> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> VARCHAR(100)) END AS `Area`
> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> extractHeader => true))
>
> CSV File
>
> Parquet File
>
>
>
> Appreciate the help !
>
> Thanks,
> Divya ​
>
> On 24 July 2017 at 11:52, Abhishek Girish  wrote:
>
>> Can you share a sample row from the CSV and the CTAS query? Also test if a
>> select columns[n] query on the CSV file works as expected [1] ?
>>
>> It could be an issue with delimiters.
>>
>> [1]
>> https://drill.apache.org/docs/querying-plain-text-files/#col
>> umns[n]-syntax
>> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
>> wrote:
>>
>> > Hi ,
>> > I am facing as weird issue when I CTAS and save the csv file as parquet
>> it
>> > displays the last column values as null .
>> > This is not the case with one file .
>> > If I take any csv file with even with any data type and do a
>> > select column1,column2,column3 from table.parquet
>> > it shows the column3 values as null.
>> >
>> > Appreciate the help.
>> >
>> > Thanks,
>> > Divya
>> >
>>
>
>


Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot
yes it shows the proper values when I query the csv file.
CTAS query csv to parquet :
Create table `dfs`.`tmp`.`publicholiday.parquet` AS
SELECT
CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
VARCHAR(100)) END AS `Day`,
CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
VARCHAR(100)) END AS `Date`,
CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
VARCHAR(100)) END AS `Area`
FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
extractHeader => true))

CSV File

Parquet File



Appreciate the help !

Thanks,
Divya ​

On 24 July 2017 at 11:52, Abhishek Girish  wrote:

> Can you share a sample row from the CSV and the CTAS query? Also test if a
> select columns[n] query on the CSV file works as expected [1] ?
>
> It could be an issue with delimiters.
>
> [1]
> https://drill.apache.org/docs/querying-plain-text-files/#columns[n]-syntax
> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
> wrote:
>
> > Hi ,
> > I am facing as weird issue when I CTAS and save the csv file as parquet
> it
> > displays the last column values as null .
> > This is not the case with one file .
> > If I take any csv file with even with any data type and do a
> > select column1,column2,column3 from table.parquet
> > it shows the column3 values as null.
> >
> > Appreciate the help.
> >
> > Thanks,
> > Divya
> >
>


Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Abhishek Girish
Can you share a sample row from the CSV and the CTAS query? Also test if a
select columns[n] query on the CSV file works as expected [1] ?

It could be an issue with delimiters.

[1]
https://drill.apache.org/docs/querying-plain-text-files/#columns[n]-syntax
On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
wrote:

> Hi ,
> I am facing as weird issue when I CTAS and save the csv file as parquet it
> displays the last column values as null .
> This is not the case with one file .
> If I take any csv file with even with any data type and do a
> select column1,column2,column3 from table.parquet
> it shows the column3 values as null.
>
> Appreciate the help.
>
> Thanks,
> Divya
>


CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot
Hi ,
I am facing as weird issue when I CTAS and save the csv file as parquet it
displays the last column values as null .
This is not the case with one file .
If I take any csv file with even with any data type and do a
select column1,column2,column3 from table.parquet
it shows the column3 values as null.

Appreciate the help.

Thanks,
Divya