Filed DRILL-5684 <https://issues.apache.org/jira/browse/DRILL-5684> to track the doc issue.
On Mon, Jul 24, 2017 at 8:33 AM, Abhishek Girish <agir...@apache.org> wrote: > Glad to know that it worked! > > As you are using Drill on Windows, the new line delimiter in text files > can be different from that on Linux / Mac. We could see \r\n as the > lineDelimiter (carriage return & new line) and hence when we set the same > in the format plugin, the issue gets resolved. This particular one doesn't > seem to be documented - however ideally you should be able to see this in > [1], [2]. > > Coming to your question on date formats, you can refer to [3]. It has > clear examples on how to convert to a different date format. Do let us know > if that helped. > > [1] https://drill.apache.org/docs/plugin-configuration- > basics/#list-of-attributes-and-definitions > [2] https://drill.apache.org/docs/text-files-csv-tsv-psv/# > configuring-drill-to-read-text-files > [3] https://drill.apache.org/docs/data-type-conversion/#to_date > > On Sun, Jul 23, 2017 at 11:52 PM, Divya Gehlot <divya.htco...@gmail.com> > wrote: > >> Thank you so much it worked >> >> Can you please provide me the pointer to the documentation where updation >> for different format type are mentioned . >> >> As I am facing another issue with date type as the data which I receive >> in csv format has the format of 15/1/2016 when I try to cast or convert >> to_date it throws me error >> >> >> >> >> Thanks , >> Divya >> >> On 24 July 2017 at 14:17, Abhishek Girish <agir...@apache.org> wrote: >> >>> Can you update your csv format plugin as shown below and retry your >>> query? >>> >>> "csv": { >>> "type": "text", >>> "extensions": [ >>> "csv" >>> ], >>> "lineDelimiter": "\r\n", >>> "extractHeader": true, >>> "delimiter": "," >>> } >>> >>> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <divya.htco...@gmail.com> >>> wrote: >>> >>> > 0: jdbc:drill:zk=local> select * FROM >>> > dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10. >>> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/ >>> > PublicHoliday/PublicHoliday.csv` >>> > limit 10 ; >>> > +-----------------------------------------+ >>> > | columns | >>> > +-----------------------------------------+ >>> > | ["Day","Date","Area\r"] | >>> > | ["Friday","15/1/2016","Karnataka\r"] | >>> > | ["Tuesday","26/1/2016","Karnataka\r"] | >>> > | ["Monday","7/3/2016","Karnataka\r"] | >>> > | ["Friday","25/3/2016","Karnataka\r"] | >>> > | ["Friday","1/4/2016","Karnataka\r"] | >>> > | ["Friday","8/4/2016","Karnataka\r"] | >>> > | ["Thursday","14/4/2016","Karnataka\r"] | >>> > | ["Tuesday","19/4/2016","Karnataka\r"] | >>> > | ["Sunday","1/5/2016","Karnataka\r"] | >>> > +-----------------------------------------+ >>> > 10 rows selected (0.122 seconds) >>> > 0: jdbc:drill:zk=local> select * from >>> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10. >>> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet` >>> > limit 10 ; >>> > +-----------+------------+-------+ >>> > | Day | Date | Area | >>> > +-----------+------------+-------+ >>> > | Friday | 15/1/2016 | null | >>> > | Tuesday | 26/1/2016 | null | >>> > | Monday | 7/3/2016 | null | >>> > | Friday | 25/3/2016 | null | >>> > | Friday | 1/4/2016 | null | >>> > | Friday | 8/4/2016 | null | >>> > | Thursday | 14/4/2016 | null | >>> > | Tuesday | 19/4/2016 | null | >>> > | Sunday | 1/5/2016 | null | >>> > | Monday | 9/5/2016 | null | >>> > +-----------+------------+-------+ >>> > 10 rows selected (0.1 seconds) >>> > 0: jdbc:drill:zk=local> >>> > >>> > >>> > *Drill set up* : Aapche drill is set up on Windows machine in embedded >>> mode >>> > . >>> > >>> > On 24 July 2017 at 13:30, Divya Gehlot <divya.htco...@gmail.com> >>> wrote: >>> > >>> > > >>> > > Pasting the result set in text format >>> > > >>> > > *Reading parquet file format :* >>> > > >>> > >> Day Date Area >>> > >> Friday 15/1/2016 null >>> > >> Tuesday 26/1/2016 null >>> > >> Monday 7/3/2016 null >>> > >> Friday 25/3/2016 null >>> > >> Friday 1/4/2016 null >>> > >> Friday 8/4/2016 null >>> > > >>> > > >>> > > >>> > > *Reading csv file format * >>> > > >>> > >> columns >>> > >> ["Day","Date","Area\r"] >>> > >> ["Friday","1/4/2016","Karnataka\r"] >>> > >> ["Friday","15/1/2016","Karnataka\r"] >>> > >> ["Friday","25/3/2016","Karnataka\r"] >>> > >> ["Friday","8/4/2016","Karnataka\r"] >>> > >> ["Monday","7/3/2016","Karnataka\r"] >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > *CTAS query csv to parquet :* >>> > > >>> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS >>> > >> SELECT >>> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >>> CAST(`Day` AS >>> > >> VARCHAR(100)) END AS `Day`, >>> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >>> CAST(`Date` >>> > AS >>> > >> VARCHAR(100)) END AS `Date`, >>> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >>> CAST(`Area` >>> > AS >>> > >> VARCHAR(100)) END AS `Area` >>> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter => >>> > ',', >>> > >> extractHeader => true)) >>> > > >>> > > >>> > > >>> > > Thanks, >>> > > Divya >>> > > >>> > > On 24 July 2017 at 13:20, Abhishek Girish <agir...@apache.org> >>> wrote: >>> > > >>> > >> Unfortunately, the attachments / pictures haven't come through. >>> Mailing >>> > >> lists sometimes do not support these. Can you paste as text or share >>> > links >>> > >> to it instead? >>> > >> >>> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot < >>> divya.htco...@gmail.com> >>> > >> wrote: >>> > >> >>> > >> > yes it shows the proper values when I query the csv file. >>> > >> > CTAS query csv to parquet : >>> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS >>> > >> > SELECT >>> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >>> CAST(`Day` >>> > AS >>> > >> > VARCHAR(100)) END AS `Day`, >>> > >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >>> CAST(`Date` >>> > >> AS >>> > >> > VARCHAR(100)) END AS `Date`, >>> > >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE >>> CAST(`Area` >>> > >> AS >>> > >> > VARCHAR(100)) END AS `Area` >>> > >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter >>> => >>> > >> ',', >>> > >> > extractHeader => true)) >>> > >> > >>> > >> > CSV File >>> > >> > >>> > >> > Parquet File >>> > >> > >>> > >> > >>> > >> > >>> > >> > Appreciate the help ! >>> > >> > >>> > >> > Thanks, >>> > >> > Divya >>> > >> > >>> > >> > On 24 July 2017 at 11:52, Abhishek Girish <agir...@apache.org> >>> wrote: >>> > >> > >>> > >> >> Can you share a sample row from the CSV and the CTAS query? Also >>> test >>> > >> if a >>> > >> >> select columns[n] query on the CSV file works as expected [1] ? >>> > >> >> >>> > >> >> It could be an issue with delimiters. >>> > >> >> >>> > >> >> [1] >>> > >> >> https://drill.apache.org/docs/querying-plain-text-files/#col >>> > >> >> umns[n]-syntax >>> > >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot < >>> > divya.htco...@gmail.com> >>> > >> >> wrote: >>> > >> >> >>> > >> >> > Hi , >>> > >> >> > I am facing as weird issue when I CTAS and save the csv file as >>> > >> parquet >>> > >> >> it >>> > >> >> > displays the last column values as null . >>> > >> >> > This is not the case with one file . >>> > >> >> > If I take any csv file with even with any data type and do a >>> > >> >> > select column1,column2,column3 from table.parquet >>> > >> >> > it shows the column3 values as null. >>> > >> >> > >>> > >> >> > Appreciate the help. >>> > >> >> > >>> > >> >> > Thanks, >>> > >> >> > Divya >>> > >> >> > >>> > >> >> >>> > >> > >>> > >> > >>> > >> >>> > > >>> > > >>> > >>> >> >> >