Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9525 )
Change subject: IMPALA-6389: Make '\0' delimited text files work
......................................................................

IMPALA-6389: Make '\0' delimited text files work

Initially I didn't want to fully implement this, as the metadata for these tables can't even be fully stored in Postgres; however, after digging into some older documentation, it appears that the ASCII NUL character has actually been used as a field separator in various vendors' CSV implementations. Therefore, this patch attempts to make things as non-broken as possible and allows \0 as a field or tuple delimiter. Collection column delimiters are not allowed to be \0, as they genuinely may not exist and we don't want to force special escaping on an arbitrary character.

Note that the field delimiter must be distinct from the tuple delimiter when they both exist; if it is not, the effect will be that there is no field delimiter (this is actually possible with single column tables).

Testing: Created a zero delimited table as described in the JIRA, using a MySQL-backed Hive metastore; ran select * from tab_separated on the table, updated the unit test.
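The delimiter rule described above can be illustrated with a minimal sketch. This is not the code in be/src/exec/delimited-text-parser.cc; SplitDelimited and Row are hypothetical names, and escaping, quoting, and collection delimiters are deliberately left out. It only shows that '\0' can serve as a field or tuple delimiter as long as the buffer length is carried explicitly, and that a field delimiter equal to the tuple delimiter acts as if no field delimiter existed.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper for illustration only; not Impala's DelimitedTextParser.
using Row = std::vector<std::string>;

// Splits `data` into rows and fields. `data` is a string_view so that embedded
// NUL bytes are treated as ordinary characters rather than terminators.
std::vector<Row> SplitDelimited(std::string_view data, char field_delim,
                                char tuple_delim) {
  // Assumption taken from the commit message: when both delimiters are the
  // same character, the field delimiter is effectively disabled.
  const bool has_field_delim = field_delim != tuple_delim;
  std::vector<Row> rows;
  Row row;
  std::string field;
  for (char c : data) {
    if (c == tuple_delim) {
      // End of tuple: close the current field and emit the row.
      row.push_back(field);
      rows.push_back(row);
      row.clear();
      field.clear();
    } else if (has_field_delim && c == field_delim) {
      // End of field within the current tuple.
      row.push_back(field);
      field.clear();
    } else {
      field.push_back(c);
    }
  }
  // Emit a trailing row that is not terminated by a tuple delimiter.
  if (!field.empty() || !row.empty()) {
    row.push_back(field);
    rows.push_back(row);
  }
  return rows;
}
```

When calling it, the input has to be built with an explicit length, for example std::string_view("a\0b\n", 4), since constructing the view from a bare string literal would stop at the first NUL.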
Change-Id: I4b6f38cbe3f1036f60efd31a31d82d0cd8f3d2a8
Reviewed-on: http://gerrit.cloudera.org:8080/9525
Reviewed-by: Dan Hecht <[email protected]>
Tested-by: Impala Public Jenkins
---
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/delimited-text-parser.cc
M be/src/exec/delimited-text-parser.h
M be/src/exec/delimited-text-parser.inline.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
8 files changed, 169 insertions(+), 84 deletions(-)

Approvals:
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9525
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4b6f38cbe3f1036f60efd31a31d82d0cd8f3d2a8
Gerrit-Change-Number: 9525
Gerrit-PatchSet: 7
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
