Glad to know that it worked! As you are using Drill on Windows, the new line delimiter in text files can be different from that on Linux / Mac. We could see \r\n as the lineDelimiter (carriage return & new line) and hence when we set the same in the format plugin, the issue gets resolved. This particular one doesn't seem to be documented - however ideally you should be able to see this in [1], [2].
Coming to your question on date formats, you can refer to [3]. It has clear examples on how to convert to a different date format. Do let us know if that helped. [1] https://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions [2] https://drill.apache.org/docs/text-files-csv-tsv-psv/#configuring-drill-to-read-text-files [3] https://drill.apache.org/docs/data-type-conversion/#to_date On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot <[email protected]> wrote: > Thank you so much it worked > > Can you please provide me the pointer to the documentation where updation > for different format type are mentioned . > > As I am facing another issue with date type as the data which I receive in > csv format has the format of 15/1/2016 when I try to cast or convert > to_date it throws me error > > > > > Thanks , > Divya > > On 24 July 2017 at 14:17, Abhishek Girish <[email protected]> wrote: > >> Can you update your csv format plugin as shown below and retry your query? >> >> "csv": { >> "type": "text", >> "extensions": [ >> "csv" >> ], >> "lineDelimiter": "\r\n", >> "extractHeader": true, >> "delimiter": "," >> } >> >> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <[email protected]> >> wrote: >> >> > 0: jdbc:drill:zk=local> select * FROM >> > dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10. >> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/ >> > PublicHoliday/PublicHoliday.csv` >> > limit 10 ; >> > +-----------------------------------------+ >> > | columns | >> > +-----------------------------------------+ >> > | ["Day","Date","Area\r"] | >> > | ["Friday","15/1/2016","Karnataka\r"] | >> > | ["Tuesday","26/1/2016","Karnataka\r"] | >> > | ["Monday","7/3/2016","Karnataka\r"] | >> > | ["Friday","25/3/2016","Karnataka\r"] | >> > | ["Friday","1/4/2016","Karnataka\r"] | >> > | ["Friday","8/4/2016","Karnataka\r"] | >> > | ["Thursday","14/4/2016","Karnataka\r"] | >> > | ["Tuesday","19/4/2016","Karnataka\r"] | >> > | ["Sunday","1/5/2016","Karnataka\r"] | >> > +-----------------------------------------+ >> > 10 rows selected (0.122 seconds) >> > 0: jdbc:drill:zk=local> select * from >> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10. >> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet` >> > limit 10 ; >> > +-----------+------------+-------+ >> > | Day | Date | Area | >> > +-----------+------------+-------+ >> > | Friday | 15/1/2016 | null | >> > | Tuesday | 26/1/2016 | null | >> > | Monday | 7/3/2016 | null | >> > | Friday | 25/3/2016 | null | >> > | Friday | 1/4/2016 | null | >> > | Friday | 8/4/2016 | null | >> > | Thursday | 14/4/2016 | null | >> > | Tuesday | 19/4/2016 | null | >> > | Sunday | 1/5/2016 | null | >> > | Monday | 9/5/2016 | null | >> > +-----------+------------+-------+ >> > 10 rows selected (0.1 seconds) >> > 0: jdbc:drill:zk=local> >> > >> > >> > *Drill set up* : Aapche drill is set up on Windows machine in embedded >> mode >> > . >> > >> > On 24 July 2017 at 13:30, Divya Gehlot <[email protected]> wrote: >> > >> > > >> > > Pasting the result set in text format >> > > >> > > *Reading parquet file format :* >> > > >> > >> Day Date Area >> > >> Friday 15/1/2016 null >> > >> Tuesday 26/1/2016 null >> > >> Monday 7/3/2016 null >> > >> Friday 25/3/2016 null >> > >> Friday 1/4/2016 null >> > >> Friday 8/4/2016 null >> > > >> > > >> > > >> > > *Reading csv file format * >> > > >> > >> columns >> > >> ["Day","Date","Area\r"] >> > >> ["Friday","1/4/2016","Karnataka\r"] >> > >> ["Friday","15/1/2016","Karnataka\r"] >> > >> ["Friday","25/3/2016","Karnataka\r"] >> > >> ["Friday","8/4/2016","Karnataka\r"] >> > >> ["Monday","7/3/2016","Karnataka\r"] >> > > >> > > >> > > >> > > >> > > >> > > *CTAS query csv to parquet :* >> > > >> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS >> > >> SELECT >> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day` >> AS >> > >> VARCHAR(100)) END AS `Day`, >> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >> CAST(`Date` >> > AS >> > >> VARCHAR(100)) END AS `Date`, >> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >> CAST(`Area` >> > AS >> > >> VARCHAR(100)) END AS `Area` >> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => >> > ',', >> > >> extractHeader => true)) >> > > >> > > >> > > >> > > Thanks, >> > > Divya >> > > >> > > On 24 July 2017 at 13:20, Abhishek Girish <[email protected]> wrote: >> > > >> > >> Unfortunately, the attachments / pictures haven't come through. >> Mailing >> > >> lists sometimes do not support these. Can you paste as text or share >> > links >> > >> to it instead? >> > >> >> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot < >> [email protected]> >> > >> wrote: >> > >> >> > >> > yes it shows the proper values when I query the csv file. >> > >> > CTAS query csv to parquet : >> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS >> > >> > SELECT >> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >> CAST(`Day` >> > AS >> > >> > VARCHAR(100)) END AS `Day`, >> > >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >> CAST(`Date` >> > >> AS >> > >> > VARCHAR(100)) END AS `Date`, >> > >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >> CAST(`Area` >> > >> AS >> > >> > VARCHAR(100)) END AS `Area` >> > >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter >> => >> > >> ',', >> > >> > extractHeader => true)) >> > >> > >> > >> > CSV File >> > >> > >> > >> > Parquet File >> > >> > >> > >> > >> > >> > >> > >> > Appreciate the help ! >> > >> > >> > >> > Thanks, >> > >> > Divya >> > >> > >> > >> > On 24 July 2017 at 11:52, Abhishek Girish <[email protected]> >> wrote: >> > >> > >> > >> >> Can you share a sample row from the CSV and the CTAS query? Also >> test >> > >> if a >> > >> >> select columns[n] query on the CSV file works as expected [1] ? >> > >> >> >> > >> >> It could be an issue with delimiters. >> > >> >> >> > >> >> [1] >> > >> >> https://drill.apache.org/docs/querying-plain-text-files/#col >> > >> >> umns[n]-syntax >> > >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot < >> > [email protected]> >> > >> >> wrote: >> > >> >> >> > >> >> > Hi , >> > >> >> > I am facing as weird issue when I CTAS and save the csv file as >> > >> parquet >> > >> >> it >> > >> >> > displays the last column values as null . >> > >> >> > This is not the case with one file . >> > >> >> > If I take any csv file with even with any data type and do a >> > >> >> > select column1,column2,column3 from table.parquet >> > >> >> > it shows the column3 values as null. >> > >> >> > >> > >> >> > Appreciate the help. >> > >> >> > >> > >> >> > Thanks, >> > >> >> > Divya >> > >> >> > >> > >> >> >> > >> > >> > >> > >> > >> >> > > >> > > >> > >> > >
