I have this already filled in under: https://issues.apache.org/jira/browse/DRILL-3726
On Wed, Sep 30, 2015 at 11:47 AM, rahul challapalli < [email protected]> wrote: > Looks like a bug to me. Can you raise a jira for this if you haven't done > it already > On Sep 30, 2015 8:04 AM, <[email protected]> wrote: > > > I've seen that issue too... ;) > > > > My personal opinion is that Drill (and sqlline) should treat Windows > > end-of-line characters the same as Unix end-of-line characters. It > doesn't > > seem reasonable to expect users (especially in an enterprise setting) to > > use dos2unix on every data file just so they can trust their query > results. > > > > > > Phil > > > > > > Philip A Grim II > > Chief Engineer > > L-3 Data Tactics > > 7901 Jones Branch Dr. > > Suite 700 > > McLean, VA 22102 > > > > > > ________________________________________ > > From: Christopher Matta [[email protected]] > > Sent: Wednesday, September 30, 2015 10:56 AM > > To: [email protected] > > Subject: CSV with windows carriage return causes issues > > > > I’ve created a very simple reproduction of an issue I’ve observed with > > files that have a carriage return (\r) instead of a line feed (\n) > ending. > > > > My CSV file was created using notepad on Windows and looks like this when > > queried directly from drill: > > > > 0: jdbc:drill:zk=local> select * from > > dfs.`Users/cmatta/Downloads/windows_drill_test.csv`; > > +-----------------------------------------------+ > > | columns | > > +-----------------------------------------------+ > > | ["11111","test1","test2","test3","test4\r"] | > > | ["22222","test5","test6","test7","test8\r"] | > > | ["33333","test9","test10","test11","test12"] | > > +-----------------------------------------------+ > > > > As you can see the first two rows have \r at the end, also note that > > column[0] has five digits. > > > > When casting into their own columns the a column gets a digit truncated: > > > > 0: jdbc:drill:zk=local> select cast(columns[0] as integer) as a, > > cast(columns[1] as varchar(32)) as b, cast(columns[2] as varchar(32)) > > as c, cast(columns[3] as varchar(32)) as d, cast(columns[4] as > > varchar(32)) as e from > > dfs.`Users/cmatta/Downloads/windows_drill_test.csv`; > > +--------+--------+---------+---------+---------+ > > | a | b | c | d | e | > > +--------+--------+---------+---------+---------+ > > |1111 | test1 | test2 | test3 | test4 > > |2222 | test5 | test6 | test7 | test8 > > | 33333 | test9 | test10 | test11 | test12 | > > +--------+--------+---------+---------+---------+ > > > > I can get around this by using regexp_replace on the last column: > > > > 0: jdbc:drill:zk=local> select cast(columns[0] as integer) as a, > > cast(columns[1] as varchar(32)) as b, cast(columns[2] as varchar(32)) > > as c, cast(columns[3] as varchar(32)) as d, > > cast(regexp_replace(columns[4], '\r', '') as varchar(32)) as e from > > dfs.`Users/cmatta/Downloads/windows_drill_test.csv`; > > +--------+--------+---------+---------+---------+ > > | a | b | c | d | e | > > +--------+--------+---------+---------+---------+ > > | 11111 | test1 | test2 | test3 | test4 | > > | 22222 | test5 | test6 | test7 | test8 | > > | 33333 | test9 | test10 | test11 | test12 | > > +--------+--------+---------+---------+---------+ > > > > Is this expected, or should Drill treat carriage returns as line feeds? > Is > > this simply sqlline interpreting the \r character? > > Chris [email protected] > > 215-701-3146 > > > > >
