I’ve created a very simple reproduction of an issue I’ve observed with files whose lines end with a carriage return and line feed (\r\n) rather than a bare line feed (\n).
My CSV file was created using Notepad on Windows and looks like this when queried directly from Drill:

0: jdbc:drill:zk=local> select * from dfs.`Users/cmatta/Downloads/windows_drill_test.csv`;
+-----------------------------------------------+
|                    columns                    |
+-----------------------------------------------+
| ["11111","test1","test2","test3","test4\r"]   |
| ["22222","test5","test6","test7","test8\r"]   |
| ["33333","test9","test10","test11","test12"]  |
+-----------------------------------------------+

As you can see, the first two rows have \r at the end. Also note that columns[0] has five digits.

When I cast the values into their own columns, column a is displayed with one digit missing in the first two rows:

0: jdbc:drill:zk=local> select cast(columns[0] as integer) as a, cast(columns[1] as varchar(32)) as b, cast(columns[2] as varchar(32)) as c, cast(columns[3] as varchar(32)) as d, cast(columns[4] as varchar(32)) as e from dfs.`Users/cmatta/Downloads/windows_drill_test.csv`;
+--------+--------+---------+---------+---------+
|   a    |   b    |    c    |    d    |    e    |
+--------+--------+---------+---------+---------+
  |1111  | test1  | test2   | test3   | test4
  |2222  | test5  | test6   | test7   | test8
| 33333  | test9  | test10  | test11  | test12  |
+--------+--------+---------+---------+---------+

I can get around this by using regexp_replace on the last column:

0: jdbc:drill:zk=local> select cast(columns[0] as integer) as a, cast(columns[1] as varchar(32)) as b, cast(columns[2] as varchar(32)) as c, cast(columns[3] as varchar(32)) as d, cast(regexp_replace(columns[4], '\r', '') as varchar(32)) as e from dfs.`Users/cmatta/Downloads/windows_drill_test.csv`;
+--------+--------+---------+---------+---------+
|   a    |   b    |    c    |    d    |    e    |
+--------+--------+---------+---------+---------+
| 11111  | test1  | test2   | test3   | test4   |
| 22222  | test5  | test6   | test7   | test8   |
| 33333  | test9  | test10  | test11  | test12  |
+--------+--------+---------+---------+---------+

Is this expected, or should Drill treat carriage returns as line feeds?
Is this simply sqlline interpreting the \r character?

Chris
215-701-3146
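
P.S. To test the sqlline theory without involving Drill, here is a minimal Python sketch of how a terminal that honors \r would draw one of the affected rows. The function name render_with_carriage_returns and the exact padding in raw_row are my own assumptions, reconstructed from the output above; this is not how sqlline is actually implemented.

```python
def render_with_carriage_returns(text: str) -> str:
    # Simulate one terminal line: a carriage return moves the cursor back
    # to column 0, and characters written afterwards overwrite what is there.
    cells = []
    cursor = 0
    for ch in text:
        if ch == "\r":
            cursor = 0
            continue
        if cursor < len(cells):
            cells[cursor] = ch   # overwrite an already drawn character
        else:
            cells.append(ch)     # extend the line
        cursor += 1
    return "".join(cells)


# Assumed layout of the first row as a table printer might emit it:
# the \r is still attached to the last value, followed by padding and the border.
raw_row = "| 11111  | test1  | test2   | test3   | test4\r  |"

shown = render_with_carriage_returns(raw_row)
print(repr(shown))
# prints '  |1111  | test1  | test2   | test3   | test4'
```

If that is what is happening, the padding and closing border printed after the \r land on top of the start of the row, which would explain both the missing digit and the missing right-hand border. The integer value itself may well be intact, but I have not confirmed that against sqlline.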
