subject:"CTAS and save as parquet last column values are shown as null"

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish

Filed DRILL-5684  to
track the doc issue.

On Mon, Jul 24, 2017 at 8:33 AM, Abhishek Girish  wrote:

> Glad to know that it worked!
>
> As you are using Drill on Windows, the new line delimiter in text files
> can be different from that on Linux / Mac. We could see \r\n as the
> lineDelimiter (carriage return & new line) and hence when we set the same
> in the format plugin, the issue gets resolved. This particular one doesn't
> seem to be documented - however ideally you should be able to see this in
> [1], [2].
>
> Coming to your question on date formats, you can refer to [3]. It has
> clear examples on how to convert to a different date format. Do let us know
> if that helped.
>
> [1] https://drill.apache.org/docs/plugin-configuration-
> basics/#list-of-attributes-and-definitions
> [2] https://drill.apache.org/docs/text-files-csv-tsv-psv/#
> configuring-drill-to-read-text-files
> [3] https://drill.apache.org/docs/data-type-conversion/#to_date
>
> On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot 
> wrote:
>
>> Thank you so much it worked
>>
>> Can you please provide me the pointer to the documentation where updation
>> for different format type are mentioned .
>>
>> As I am facing another issue with date type as the data which I receive
>> in csv format has the format of 15/1/2016 when I try to cast or convert
>> to_date it throws me error
>>
>>
>>
>>
>> Thanks ,
>> Divya
>>
>> On 24 July 2017 at 14:17, Abhishek Girish  wrote:
>>
>>> Can you update your csv format plugin as shown below and retry your
>>> query?
>>>
>>> "csv": {
>>>   "type": "text",
>>>   "extensions": [
>>> "csv"
>>>   ],
>>>   "lineDelimiter": "\r\n",
>>>   "extractHeader": true,
>>>   "delimiter": ","
>>> }
>>>
>>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
>>> wrote:
>>>
>>> > 0: jdbc:drill:zk=local> select * FROM
>>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>>> > PublicHoliday/PublicHoliday.csv`
>>> > limit 10 ;
>>> > +-+
>>> > | columns |
>>> > +-+
>>> > | ["Day","Date","Area\r"] |
>>> > | ["Friday","15/1/2016","Karnataka\r"]|
>>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>>> > | ["Monday","7/3/2016","Karnataka\r"] |
>>> > | ["Friday","25/3/2016","Karnataka\r"]|
>>> > | ["Friday","1/4/2016","Karnataka\r"] |
>>> > | ["Friday","8/4/2016","Karnataka\r"] |
>>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>>> > +-+
>>> > 10 rows selected (0.122 seconds)
>>> > 0: jdbc:drill:zk=local> select * from
>>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>>> > limit 10 ;
>>> > +---++---+
>>> > |Day|Date| Area  |
>>> > +---++---+
>>> > | Friday| 15/1/2016  | null  |
>>> > | Tuesday   | 26/1/2016  | null  |
>>> > | Monday| 7/3/2016   | null  |
>>> > | Friday| 25/3/2016  | null  |
>>> > | Friday| 1/4/2016   | null  |
>>> > | Friday| 8/4/2016   | null  |
>>> > | Thursday  | 14/4/2016  | null  |
>>> > | Tuesday   | 19/4/2016  | null  |
>>> > | Sunday| 1/5/2016   | null  |
>>> > | Monday| 9/5/2016   | null  |
>>> > +---++---+
>>> > 10 rows selected (0.1 seconds)
>>> > 0: jdbc:drill:zk=local>
>>> >
>>> >
>>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>>> mode
>>> > .
>>> >
>>> > On 24 July 2017 at 13:30, Divya Gehlot 
>>> wrote:
>>> >
>>> > >
>>> > > Pasting the result set in text format
>>> > >
>>> > > *Reading parquet file format :*
>>> > >
>>> > >> Day   Date Area
>>> > >> Friday 15/1/2016 null
>>> > >> Tuesday 26/1/2016 null
>>> > >> Monday 7/3/2016 null
>>> > >> Friday 25/3/2016 null
>>> > >> Friday 1/4/2016 null
>>> > >> Friday 8/4/2016 null
>>> > >
>>> > >
>>> > >
>>> > > *Reading csv file format *
>>> > >
>>> > >> columns
>>> > >> ["Day","Date","Area\r"]
>>> > >> ["Friday","1/4/2016","Karnataka\r"]
>>> > >> ["Friday","15/1/2016","Karnataka\r"]
>>> > >> ["Friday","25/3/2016","Karnataka\r"]
>>> > >> ["Friday","8/4/2016","Karnataka\r"]
>>> > >> ["Monday","7/3/2016","Karnataka\r"]
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > *CTAS query csv to parquet :*
>>> > >
>>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>>> > >> SELECT
>>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>>> CAST(`Day` AS
>>> > >> VARCHAR(100)) END AS `Day`,
>>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>>>

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish

Glad to know that it worked!

As you are using Drill on Windows, the new line delimiter in text files can
be different from that on Linux / Mac. We could see \r\n as the
lineDelimiter (carriage return & new line) and hence when we set the same
in the format plugin, the issue gets resolved. This particular one doesn't
seem to be documented - however ideally you should be able to see this in
[1], [2].

Coming to your question on date formats, you can refer to [3]. It has clear
examples on how to convert to a different date format. Do let us know if
that helped.

[1]
https://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
[2]
https://drill.apache.org/docs/text-files-csv-tsv-psv/#configuring-drill-to-read-text-files
[3] https://drill.apache.org/docs/data-type-conversion/#to_date

On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot 
wrote:

> Thank you so much it worked
>
> Can you please provide me the pointer to the documentation where updation
> for different format type are mentioned .
>
> As I am facing another issue with date type as the data which I receive in
> csv format has the format of 15/1/2016 when I try to cast or convert
> to_date it throws me error
>
>
>
>
> Thanks ,
> Divya
>
> On 24 July 2017 at 14:17, Abhishek Girish  wrote:
>
>> Can you update your csv format plugin as shown below and retry your query?
>>
>> "csv": {
>>   "type": "text",
>>   "extensions": [
>> "csv"
>>   ],
>>   "lineDelimiter": "\r\n",
>>   "extractHeader": true,
>>   "delimiter": ","
>> }
>>
>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
>> wrote:
>>
>> > 0: jdbc:drill:zk=local> select * FROM
>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>> > PublicHoliday/PublicHoliday.csv`
>> > limit 10 ;
>> > +-+
>> > | columns |
>> > +-+
>> > | ["Day","Date","Area\r"] |
>> > | ["Friday","15/1/2016","Karnataka\r"]|
>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>> > | ["Monday","7/3/2016","Karnataka\r"] |
>> > | ["Friday","25/3/2016","Karnataka\r"]|
>> > | ["Friday","1/4/2016","Karnataka\r"] |
>> > | ["Friday","8/4/2016","Karnataka\r"] |
>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>> > +-+
>> > 10 rows selected (0.122 seconds)
>> > 0: jdbc:drill:zk=local> select * from
>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>> > limit 10 ;
>> > +---++---+
>> > |Day|Date| Area  |
>> > +---++---+
>> > | Friday| 15/1/2016  | null  |
>> > | Tuesday   | 26/1/2016  | null  |
>> > | Monday| 7/3/2016   | null  |
>> > | Friday| 25/3/2016  | null  |
>> > | Friday| 1/4/2016   | null  |
>> > | Friday| 8/4/2016   | null  |
>> > | Thursday  | 14/4/2016  | null  |
>> > | Tuesday   | 19/4/2016  | null  |
>> > | Sunday| 1/5/2016   | null  |
>> > | Monday| 9/5/2016   | null  |
>> > +---++---+
>> > 10 rows selected (0.1 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> >
>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>> mode
>> > .
>> >
>> > On 24 July 2017 at 13:30, Divya Gehlot  wrote:
>> >
>> > >
>> > > Pasting the result set in text format
>> > >
>> > > *Reading parquet file format :*
>> > >
>> > >> Day   Date Area
>> > >> Friday 15/1/2016 null
>> > >> Tuesday 26/1/2016 null
>> > >> Monday 7/3/2016 null
>> > >> Friday 25/3/2016 null
>> > >> Friday 1/4/2016 null
>> > >> Friday 8/4/2016 null
>> > >
>> > >
>> > >
>> > > *Reading csv file format *
>> > >
>> > >> columns
>> > >> ["Day","Date","Area\r"]
>> > >> ["Friday","1/4/2016","Karnataka\r"]
>> > >> ["Friday","15/1/2016","Karnataka\r"]
>> > >> ["Friday","25/3/2016","Karnataka\r"]
>> > >> ["Friday","8/4/2016","Karnataka\r"]
>> > >> ["Monday","7/3/2016","Karnataka\r"]
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > *CTAS query csv to parquet :*
>> > >
>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> SELECT
>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
>> AS
>> > >> VARCHAR(100)) END AS `Day`,
>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Date`
>> > AS
>> > >> VARCHAR(100)) END AS `Date`,
>> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Area`
>> > AS
>> > >> VARCHAR(100)) END AS `Area`
>> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> > ',',
>> > >> extractHeader => true))
>> > >
>> > >
>> > >
>> > > Thanks,
>> >

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Divya Gehlot

Any best practices guide if any body could share would be great



On 24 July 2017 at 14:52, Divya Gehlot  wrote:

> Thank you so much it worked
>
> Can you please provide me the pointer to the documentation where updation
> for different format type are mentioned .
>
> As I am facing another issue with date type as the data which I receive in
> csv format has the format of 15/1/2016 when I try to cast or convert
> to_date it throws me error
>
>
>
>
> Thanks ,
> Divya
>
> On 24 July 2017 at 14:17, Abhishek Girish  wrote:
>
>> Can you update your csv format plugin as shown below and retry your query?
>>
>> "csv": {
>>   "type": "text",
>>   "extensions": [
>> "csv"
>>   ],
>>   "lineDelimiter": "\r\n",
>>   "extractHeader": true,
>>   "delimiter": ","
>> }
>>
>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
>> wrote:
>>
>> > 0: jdbc:drill:zk=local> select * FROM
>> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
>> > PublicHoliday/PublicHoliday.csv`
>> > limit 10 ;
>> > +-+
>> > | columns |
>> > +-+
>> > | ["Day","Date","Area\r"] |
>> > | ["Friday","15/1/2016","Karnataka\r"]|
>> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
>> > | ["Monday","7/3/2016","Karnataka\r"] |
>> > | ["Friday","25/3/2016","Karnataka\r"]|
>> > | ["Friday","1/4/2016","Karnataka\r"] |
>> > | ["Friday","8/4/2016","Karnataka\r"] |
>> > | ["Thursday","14/4/2016","Karnataka\r"]  |
>> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
>> > | ["Sunday","1/5/2016","Karnataka\r"] |
>> > +-+
>> > 10 rows selected (0.122 seconds)
>> > 0: jdbc:drill:zk=local> select * from
>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
>> > limit 10 ;
>> > +---++---+
>> > |Day|Date| Area  |
>> > +---++---+
>> > | Friday| 15/1/2016  | null  |
>> > | Tuesday   | 26/1/2016  | null  |
>> > | Monday| 7/3/2016   | null  |
>> > | Friday| 25/3/2016  | null  |
>> > | Friday| 1/4/2016   | null  |
>> > | Friday| 8/4/2016   | null  |
>> > | Thursday  | 14/4/2016  | null  |
>> > | Tuesday   | 19/4/2016  | null  |
>> > | Sunday| 1/5/2016   | null  |
>> > | Monday| 9/5/2016   | null  |
>> > +---++---+
>> > 10 rows selected (0.1 seconds)
>> > 0: jdbc:drill:zk=local>
>> >
>> >
>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
>> mode
>> > .
>> >
>> > On 24 July 2017 at 13:30, Divya Gehlot  wrote:
>> >
>> > >
>> > > Pasting the result set in text format
>> > >
>> > > *Reading parquet file format :*
>> > >
>> > >> Day   Date Area
>> > >> Friday 15/1/2016 null
>> > >> Tuesday 26/1/2016 null
>> > >> Monday 7/3/2016 null
>> > >> Friday 25/3/2016 null
>> > >> Friday 1/4/2016 null
>> > >> Friday 8/4/2016 null
>> > >
>> > >
>> > >
>> > > *Reading csv file format *
>> > >
>> > >> columns
>> > >> ["Day","Date","Area\r"]
>> > >> ["Friday","1/4/2016","Karnataka\r"]
>> > >> ["Friday","15/1/2016","Karnataka\r"]
>> > >> ["Friday","25/3/2016","Karnataka\r"]
>> > >> ["Friday","8/4/2016","Karnataka\r"]
>> > >> ["Monday","7/3/2016","Karnataka\r"]
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > *CTAS query csv to parquet :*
>> > >
>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> SELECT
>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
>> AS
>> > >> VARCHAR(100)) END AS `Day`,
>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Date`
>> > AS
>> > >> VARCHAR(100)) END AS `Date`,
>> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Area`
>> > AS
>> > >> VARCHAR(100)) END AS `Area`
>> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> > ',',
>> > >> extractHeader => true))
>> > >
>> > >
>> > >
>> > > Thanks,
>> > > Divya
>> > >
>> > > On 24 July 2017 at 13:20, Abhishek Girish  wrote:
>> > >
>> > >> Unfortunately, the attachments / pictures haven't come through.
>> Mailing
>> > >> lists sometimes do not support these. Can you paste as text or share
>> > links
>> > >> to it instead?
>> > >>
>> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <
>> divya.htco...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > yes it shows the proper values when I query the csv file.
>> > >> > CTAS query csv to parquet :
>> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > >> > SELECT
>> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
>> CAST(`Day`
>> > AS
>> > >> > VARCHAR(100)) END AS `Day`,
>> > >> > CASE WHEN `Date` = '' THEN

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Divya Gehlot

Thank you so much it worked

Can you please provide me the pointer to the documentation where updation
for different format type are mentioned .

As I am facing another issue with date type as the data which I receive in
csv format has the format of 15/1/2016 when I try to cast or convert
to_date it throws me error




Thanks ,
Divya

On 24 July 2017 at 14:17, Abhishek Girish  wrote:

> Can you update your csv format plugin as shown below and retry your query?
>
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "lineDelimiter": "\r\n",
>   "extractHeader": true,
>   "delimiter": ","
> }
>
> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
> wrote:
>
> > 0: jdbc:drill:zk=local> select * FROM
> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> > PublicHoliday/PublicHoliday.csv`
> > limit 10 ;
> > +-+
> > | columns |
> > +-+
> > | ["Day","Date","Area\r"] |
> > | ["Friday","15/1/2016","Karnataka\r"]|
> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
> > | ["Monday","7/3/2016","Karnataka\r"] |
> > | ["Friday","25/3/2016","Karnataka\r"]|
> > | ["Friday","1/4/2016","Karnataka\r"] |
> > | ["Friday","8/4/2016","Karnataka\r"] |
> > | ["Thursday","14/4/2016","Karnataka\r"]  |
> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
> > | ["Sunday","1/5/2016","Karnataka\r"] |
> > +-+
> > 10 rows selected (0.122 seconds)
> > 0: jdbc:drill:zk=local> select * from
> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> > limit 10 ;
> > +---++---+
> > |Day|Date| Area  |
> > +---++---+
> > | Friday| 15/1/2016  | null  |
> > | Tuesday   | 26/1/2016  | null  |
> > | Monday| 7/3/2016   | null  |
> > | Friday| 25/3/2016  | null  |
> > | Friday| 1/4/2016   | null  |
> > | Friday| 8/4/2016   | null  |
> > | Thursday  | 14/4/2016  | null  |
> > | Tuesday   | 19/4/2016  | null  |
> > | Sunday| 1/5/2016   | null  |
> > | Monday| 9/5/2016   | null  |
> > +---++---+
> > 10 rows selected (0.1 seconds)
> > 0: jdbc:drill:zk=local>
> >
> >
> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
> mode
> > .
> >
> > On 24 July 2017 at 13:30, Divya Gehlot  wrote:
> >
> > >
> > > Pasting the result set in text format
> > >
> > > *Reading parquet file format :*
> > >
> > >> Day   Date Area
> > >> Friday 15/1/2016 null
> > >> Tuesday 26/1/2016 null
> > >> Monday 7/3/2016 null
> > >> Friday 25/3/2016 null
> > >> Friday 1/4/2016 null
> > >> Friday 8/4/2016 null
> > >
> > >
> > >
> > > *Reading csv file format *
> > >
> > >> columns
> > >> ["Day","Date","Area\r"]
> > >> ["Friday","1/4/2016","Karnataka\r"]
> > >> ["Friday","15/1/2016","Karnataka\r"]
> > >> ["Friday","25/3/2016","Karnataka\r"]
> > >> ["Friday","8/4/2016","Karnataka\r"]
> > >> ["Monday","7/3/2016","Karnataka\r"]
> > >
> > >
> > >
> > >
> > >
> > > *CTAS query csv to parquet :*
> > >
> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > >> SELECT
> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> AS
> > >> VARCHAR(100)) END AS `Day`,
> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> > AS
> > >> VARCHAR(100)) END AS `Date`,
> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> > AS
> > >> VARCHAR(100)) END AS `Area`
> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> > ',',
> > >> extractHeader => true))
> > >
> > >
> > >
> > > Thanks,
> > > Divya
> > >
> > > On 24 July 2017 at 13:20, Abhishek Girish  wrote:
> > >
> > >> Unfortunately, the attachments / pictures haven't come through.
> Mailing
> > >> lists sometimes do not support these. Can you paste as text or share
> > links
> > >> to it instead?
> > >>
> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <
> divya.htco...@gmail.com>
> > >> wrote:
> > >>
> > >> > yes it shows the proper values when I query the csv file.
> > >> > CTAS query csv to parquet :
> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > >> > SELECT
> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> > AS
> > >> > VARCHAR(100)) END AS `Day`,
> > >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
> CAST(`Date`
> > >> AS
> > >> > VARCHAR(100)) END AS `Date`,
> > >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
> CAST(`Area`
> > >> AS
> > >> > VARCHAR(100)) END AS `Area`
> > >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> > >> ',',
> >

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish

Can you update your csv format plugin as shown below and retry your query?

"csv": {
  "type": "text",
  "extensions": [
"csv"
  ],
  "lineDelimiter": "\r\n",
  "extractHeader": true,
  "delimiter": ","
}

On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot 
wrote:

> 0: jdbc:drill:zk=local> select * FROM
>  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> PublicHoliday/PublicHoliday.csv`
> limit 10 ;
> +-+
> | columns |
> +-+
> | ["Day","Date","Area\r"] |
> | ["Friday","15/1/2016","Karnataka\r"]|
> | ["Tuesday","26/1/2016","Karnataka\r"]   |
> | ["Monday","7/3/2016","Karnataka\r"] |
> | ["Friday","25/3/2016","Karnataka\r"]|
> | ["Friday","1/4/2016","Karnataka\r"] |
> | ["Friday","8/4/2016","Karnataka\r"] |
> | ["Thursday","14/4/2016","Karnataka\r"]  |
> | ["Tuesday","19/4/2016","Karnataka\r"]   |
> | ["Sunday","1/5/2016","Karnataka\r"] |
> +-+
> 10 rows selected (0.122 seconds)
> 0: jdbc:drill:zk=local> select * from
> `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> limit 10 ;
> +---++---+
> |Day|Date| Area  |
> +---++---+
> | Friday| 15/1/2016  | null  |
> | Tuesday   | 26/1/2016  | null  |
> | Monday| 7/3/2016   | null  |
> | Friday| 25/3/2016  | null  |
> | Friday| 1/4/2016   | null  |
> | Friday| 8/4/2016   | null  |
> | Thursday  | 14/4/2016  | null  |
> | Tuesday   | 19/4/2016  | null  |
> | Sunday| 1/5/2016   | null  |
> | Monday| 9/5/2016   | null  |
> +---++---+
> 10 rows selected (0.1 seconds)
> 0: jdbc:drill:zk=local>
>
>
> *Drill set up* : Aapche drill is set up on Windows machine in embedded mode
> .
>
> On 24 July 2017 at 13:30, Divya Gehlot  wrote:
>
> >
> > Pasting the result set in text format
> >
> > *Reading parquet file format :*
> >
> >> Day   Date Area
> >> Friday 15/1/2016 null
> >> Tuesday 26/1/2016 null
> >> Monday 7/3/2016 null
> >> Friday 25/3/2016 null
> >> Friday 1/4/2016 null
> >> Friday 8/4/2016 null
> >
> >
> >
> > *Reading csv file format *
> >
> >> columns
> >> ["Day","Date","Area\r"]
> >> ["Friday","1/4/2016","Karnataka\r"]
> >> ["Friday","15/1/2016","Karnataka\r"]
> >> ["Friday","25/3/2016","Karnataka\r"]
> >> ["Friday","8/4/2016","Karnataka\r"]
> >> ["Monday","7/3/2016","Karnataka\r"]
> >
> >
> >
> >
> >
> > *CTAS query csv to parquet :*
> >
> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> SELECT
> >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> >> VARCHAR(100)) END AS `Day`,
> >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> AS
> >> VARCHAR(100)) END AS `Date`,
> >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> AS
> >> VARCHAR(100)) END AS `Area`
> >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> ',',
> >> extractHeader => true))
> >
> >
> >
> > Thanks,
> > Divya
> >
> > On 24 July 2017 at 13:20, Abhishek Girish  wrote:
> >
> >> Unfortunately, the attachments / pictures haven't come through. Mailing
> >> lists sometimes do not support these. Can you paste as text or share
> links
> >> to it instead?
> >>
> >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
> >> wrote:
> >>
> >> > yes it shows the proper values when I query the csv file.
> >> > CTAS query csv to parquet :
> >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> >> > SELECT
> >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> AS
> >> > VARCHAR(100)) END AS `Day`,
> >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> >> AS
> >> > VARCHAR(100)) END AS `Date`,
> >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> >> AS
> >> > VARCHAR(100)) END AS `Area`
> >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> >> ',',
> >> > extractHeader => true))
> >> >
> >> > CSV File
> >> >
> >> > Parquet File
> >> >
> >> >
> >> >
> >> > Appreciate the help !
> >> >
> >> > Thanks,
> >> > Divya 
> >> >
> >> > On 24 July 2017 at 11:52, Abhishek Girish  wrote:
> >> >
> >> >> Can you share a sample row from the CSV and the CTAS query? Also test
> >> if a
> >> >> select columns[n] query on the CSV file works as expected [1] ?
> >> >>
> >> >> It could be an issue with delimiters.
> >> >>
> >> >> [1]
> >> >> https://drill.apache.org/docs/querying-plain-text-files/#col
> >> >> umns[n]-syntax
> >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot <
> divya.htco...@gmail.com>
> >> >> wrote:
> >> >>
> >>

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot

0: jdbc:drill:zk=local> select * FROM
 
dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/PublicHoliday/PublicHoliday.csv`
limit 10 ;
+-+
| columns |
+-+
| ["Day","Date","Area\r"] |
| ["Friday","15/1/2016","Karnataka\r"]|
| ["Tuesday","26/1/2016","Karnataka\r"]   |
| ["Monday","7/3/2016","Karnataka\r"] |
| ["Friday","25/3/2016","Karnataka\r"]|
| ["Friday","1/4/2016","Karnataka\r"] |
| ["Friday","8/4/2016","Karnataka\r"] |
| ["Thursday","14/4/2016","Karnataka\r"]  |
| ["Tuesday","19/4/2016","Karnataka\r"]   |
| ["Sunday","1/5/2016","Karnataka\r"] |
+-+
10 rows selected (0.122 seconds)
0: jdbc:drill:zk=local> select * from
`dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
limit 10 ;
+---++---+
|Day|Date| Area  |
+---++---+
| Friday| 15/1/2016  | null  |
| Tuesday   | 26/1/2016  | null  |
| Monday| 7/3/2016   | null  |
| Friday| 25/3/2016  | null  |
| Friday| 1/4/2016   | null  |
| Friday| 8/4/2016   | null  |
| Thursday  | 14/4/2016  | null  |
| Tuesday   | 19/4/2016  | null  |
| Sunday| 1/5/2016   | null  |
| Monday| 9/5/2016   | null  |
+---++---+
10 rows selected (0.1 seconds)
0: jdbc:drill:zk=local>


*Drill set up* : Aapche drill is set up on Windows machine in embedded mode
.

On 24 July 2017 at 13:30, Divya Gehlot  wrote:

>
> Pasting the result set in text format
>
> *Reading parquet file format :*
>
>> Day   Date Area
>> Friday 15/1/2016 null
>> Tuesday 26/1/2016 null
>> Monday 7/3/2016 null
>> Friday 25/3/2016 null
>> Friday 1/4/2016 null
>> Friday 8/4/2016 null
>
>
>
> *Reading csv file format *
>
>> columns
>> ["Day","Date","Area\r"]
>> ["Friday","1/4/2016","Karnataka\r"]
>> ["Friday","15/1/2016","Karnataka\r"]
>> ["Friday","25/3/2016","Karnataka\r"]
>> ["Friday","8/4/2016","Karnataka\r"]
>> ["Monday","7/3/2016","Karnataka\r"]
>
>
>
>
>
> *CTAS query csv to parquet :*
>
> Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> SELECT
>> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
>> VARCHAR(100)) END AS `Day`,
>> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
>> VARCHAR(100)) END AS `Date`,
>> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
>> VARCHAR(100)) END AS `Area`
>> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
>> extractHeader => true))
>
>
>
> Thanks,
> Divya
>
> On 24 July 2017 at 13:20, Abhishek Girish  wrote:
>
>> Unfortunately, the attachments / pictures haven't come through. Mailing
>> lists sometimes do not support these. Can you paste as text or share links
>> to it instead?
>>
>> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
>> wrote:
>>
>> > yes it shows the proper values when I query the csv file.
>> > CTAS query csv to parquet :
>> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
>> > SELECT
>> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
>> > VARCHAR(100)) END AS `Day`,
>> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
>> AS
>> > VARCHAR(100)) END AS `Date`,
>> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
>> AS
>> > VARCHAR(100)) END AS `Area`
>> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
>> ',',
>> > extractHeader => true))
>> >
>> > CSV File
>> >
>> > Parquet File
>> >
>> >
>> >
>> > Appreciate the help !
>> >
>> > Thanks,
>> > Divya 
>> >
>> > On 24 July 2017 at 11:52, Abhishek Girish  wrote:
>> >
>> >> Can you share a sample row from the CSV and the CTAS query? Also test
>> if a
>> >> select columns[n] query on the CSV file works as expected [1] ?
>> >>
>> >> It could be an issue with delimiters.
>> >>
>> >> [1]
>> >> https://drill.apache.org/docs/querying-plain-text-files/#col
>> >> umns[n]-syntax
>> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
>> >> wrote:
>> >>
>> >> > Hi ,
>> >> > I am facing as weird issue when I CTAS and save the csv file as
>> parquet
>> >> it
>> >> > displays the last column values as null .
>> >> > This is not the case with one file .
>> >> > If I take any csv file with even with any data type and do a
>> >> > select column1,column2,column3 from table.parquet
>> >> > it shows the column3 values as null.
>> >> >
>> >> > Appreciate the help.
>> >> >
>> >> > Thanks,
>> >> > Divya
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot

Pasting the result set in text format

*Reading parquet file format :*

> Day   Date Area
> Friday 15/1/2016 null
> Tuesday 26/1/2016 null
> Monday 7/3/2016 null
> Friday 25/3/2016 null
> Friday 1/4/2016 null
> Friday 8/4/2016 null



*Reading csv file format *

> columns
> ["Day","Date","Area\r"]
> ["Friday","1/4/2016","Karnataka\r"]
> ["Friday","15/1/2016","Karnataka\r"]
> ["Friday","25/3/2016","Karnataka\r"]
> ["Friday","8/4/2016","Karnataka\r"]
> ["Monday","7/3/2016","Karnataka\r"]





*CTAS query csv to parquet :*

Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> SELECT
> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> VARCHAR(100)) END AS `Day`,
> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> VARCHAR(100)) END AS `Date`,
> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> VARCHAR(100)) END AS `Area`
> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> extractHeader => true))



Thanks,
Divya

On 24 July 2017 at 13:20, Abhishek Girish  wrote:

> Unfortunately, the attachments / pictures haven't come through. Mailing
> lists sometimes do not support these. Can you paste as text or share links
> to it instead?
>
> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
> wrote:
>
> > yes it shows the proper values when I query the csv file.
> > CTAS query csv to parquet :
> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > SELECT
> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> > VARCHAR(100)) END AS `Day`,
> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> > VARCHAR(100)) END AS `Date`,
> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> > VARCHAR(100)) END AS `Area`
> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> > extractHeader => true))
> >
> > CSV File
> >
> > Parquet File
> >
> >
> >
> > Appreciate the help !
> >
> > Thanks,
> > Divya 
> >
> > On 24 July 2017 at 11:52, Abhishek Girish  wrote:
> >
> >> Can you share a sample row from the CSV and the CTAS query? Also test
> if a
> >> select columns[n] query on the CSV file works as expected [1] ?
> >>
> >> It could be an issue with delimiters.
> >>
> >> [1]
> >> https://drill.apache.org/docs/querying-plain-text-files/#col
> >> umns[n]-syntax
> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
> >> wrote:
> >>
> >> > Hi ,
> >> > I am facing as weird issue when I CTAS and save the csv file as
> parquet
> >> it
> >> > displays the last column values as null .
> >> > This is not the case with one file .
> >> > If I take any csv file with even with any data type and do a
> >> > select column1,column2,column3 from table.parquet
> >> > it shows the column3 values as null.
> >> >
> >> > Appreciate the help.
> >> >
> >> > Thanks,
> >> > Divya
> >> >
> >>
> >
> >
>

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Abhishek Girish

Unfortunately, the attachments / pictures haven't come through. Mailing
lists sometimes do not support these. Can you paste as text or share links
to it instead?

On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot 
wrote:

> yes it shows the proper values when I query the csv file.
> CTAS query csv to parquet :
> Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> SELECT
> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
> VARCHAR(100)) END AS `Day`,
> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
> VARCHAR(100)) END AS `Date`,
> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
> VARCHAR(100)) END AS `Area`
> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
> extractHeader => true))
>
> CSV File
>
> Parquet File
>
>
>
> Appreciate the help !
>
> Thanks,
> Divya 
>
> On 24 July 2017 at 11:52, Abhishek Girish  wrote:
>
>> Can you share a sample row from the CSV and the CTAS query? Also test if a
>> select columns[n] query on the CSV file works as expected [1] ?
>>
>> It could be an issue with delimiters.
>>
>> [1]
>> https://drill.apache.org/docs/querying-plain-text-files/#col
>> umns[n]-syntax
>> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
>> wrote:
>>
>> > Hi ,
>> > I am facing as weird issue when I CTAS and save the csv file as parquet
>> it
>> > displays the last column values as null .
>> > This is not the case with one file .
>> > If I take any csv file with even with any data type and do a
>> > select column1,column2,column3 from table.parquet
>> > it shows the column3 values as null.
>> >
>> > Appreciate the help.
>> >
>> > Thanks,
>> > Divya
>> >
>>
>
>

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot

yes it shows the proper values when I query the csv file.
CTAS query csv to parquet :
Create table `dfs`.`tmp`.`publicholiday.parquet` AS
SELECT
CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` AS
VARCHAR(100)) END AS `Day`,
CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date` AS
VARCHAR(100)) END AS `Date`,
CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area` AS
VARCHAR(100)) END AS `Area`
FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => ',',
extractHeader => true))

CSV File

Parquet File

Appreciate the help !

Thanks,
Divya 

On 24 July 2017 at 11:52, Abhishek Girish  wrote:

> Can you share a sample row from the CSV and the CTAS query? Also test if a
> select columns[n] query on the CSV file works as expected [1] ?
>
> It could be an issue with delimiters.
>
> [1]
> https://drill.apache.org/docs/querying-plain-text-files/#columns[n]-syntax
> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
> wrote:
>
> > Hi ,
> > I am facing as weird issue when I CTAS and save the csv file as parquet
> it
> > displays the last column values as null .
> > This is not the case with one file .
> > If I take any csv file with even with any data type and do a
> > select column1,column2,column3 from table.parquet
> > it shows the column3 values as null.
> >
> > Appreciate the help.
> >
> > Thanks,
> > Divya
> >
>

Re: CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Abhishek Girish

Can you share a sample row from the CSV and the CTAS query? Also test if a
select columns[n] query on the CSV file works as expected [1] ?

It could be an issue with delimiters.

[1]
https://drill.apache.org/docs/querying-plain-text-files/#columns[n]-syntax
On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot 
wrote:

> Hi ,
> I am facing as weird issue when I CTAS and save the csv file as parquet it
> displays the last column values as null .
> This is not the case with one file .
> If I take any csv file with even with any data type and do a
> select column1,column2,column3 from table.parquet
> it shows the column3 values as null.
>
> Appreciate the help.
>
> Thanks,
> Divya
>

CTAS and save as parquet last column values are shown as null

2017-07-23 Thread Divya Gehlot

Hi ,
I am facing as weird issue when I CTAS and save the csv file as parquet it
displays the last column values as null .
This is not the case with one file .
If I take any csv file with even with any data type and do a
select column1,column2,column3 from table.parquet
it shows the column3 values as null.

Appreciate the help.

Thanks,
Divya

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

Re: CTAS and save as parquet last column values are shown as null

CTAS and save as parquet last column values are shown as null

11 matches

Site Navigation

Mail list logo

Footer information