Hello Tim Armstrong, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9857
to look at the new patch set (#2).
Change subject: IMPALA-6389: Make '\0' delimited text files work
......................................................................
IMPALA-6389: Make '\0' delimited text files work
Initially I didn't want to fully implement this, as the metadata
for these tables can't even be fully stored in Postgres; however
after digging into some older documentation, it appears that the
ASCII NUL character actually has been used as a field separator
in various vendors' CSV implementations.
Therefore, this patch attempts to make things as non-broken as
possible and allows \0 as a field or tuple delimiter. Collection
column delimiters are not allowed to be \0, as they genuinely may
not exist and we don't want to force special escaping on an
arbitrary character. Note that the field delimiter must be distinct
from the tuple delimiter when they both exist; if it is not, the
effect will be that there is no field delimiter (this is actually
possible with single column tables).
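The delimiter rules above can be illustrated with a minimal sketch. This is
not the code in delimited-text-parser.cc; SplitTuples is a hypothetical
helper written only for illustration, and it ignores escaping, collection
delimiters and the SSE fast path. It assumes single-byte delimiters and
treats a field delimiter equal to the tuple delimiter as "no field
delimiter".

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Impala): splits `data` into tuples and
// fields using single-byte delimiters, either of which may be '\0'.
std::vector<std::vector<std::string>> SplitTuples(
    std::string_view data, char field_delim, char tuple_delim) {
  // Assumption mirroring the commit message: if both delimiters are the
  // same byte, behave as if there were no field delimiter at all.
  const bool has_field_delim = field_delim != tuple_delim;
  std::vector<std::vector<std::string>> tuples;
  std::vector<std::string> fields;
  std::string current;
  for (char c : data) {
    if (c == tuple_delim) {
      // End of tuple: close the current field and emit the row.
      fields.push_back(current);
      tuples.push_back(fields);
      fields.clear();
      current.clear();
    } else if (has_field_delim && c == field_delim) {
      // End of field within the current tuple.
      fields.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  // Flush a trailing tuple that lacks a final tuple delimiter.
  if (!current.empty() || !fields.empty()) {
    fields.push_back(current);
    tuples.push_back(fields);
  }
  return tuples;
}
```

Note that the input must carry an explicit length, e.g.
std::string_view("a\0b\n", 4), since any API that relies on NUL
termination would truncate the buffer at the first \0 delimiter.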
Testing: Created a zero delimited table as described in the JIRA,
using MySQL backed Hive metastore; ran select * from tab_separated
on the table, updated the unit test. Additionally, built with ASAN
and ran the unit test.
Change-Id: I2190c57681f29f34ee1eb393e30dfdda5839098c
---
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/delimited-text-parser.cc
M be/src/exec/delimited-text-parser.h
M be/src/exec/delimited-text-parser.inline.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
8 files changed, 172 insertions(+), 86 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/9857/2
--
To view, visit http://gerrit.cloudera.org:8080/9857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2190c57681f29f34ee1eb393e30dfdda5839098c
Gerrit-Change-Number: 9857
Gerrit-PatchSet: 2
Gerrit-Owner: Zach Amsden <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>