Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10950 )
Change subject: IMPALA-376: add built-in functions for parsing JSON ...................................................................... Patch Set 9: (12 comments) Zoltan, thank you for taking a look! Since the RapidJSON has been upgraded to 1.1.0, I'll update this patch soon. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc File be/src/exprs/string-functions.cc: http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@67 PS9, Line 67: thread_local FunctionContext* UdfJsonAllocator::ctx_ = nullptr; > nit: please insert empty line after class definition Done. Remove the static thread_local variable by refactoring the Malloc & Free logics as discussed before. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@69 PS9, Line 69: RapidJSON required the input strings be end of '\0'. > I don't think it's grammatically correct. Maybe: Done http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@70 PS9, Line 70: tailing > nit: trailing Done http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@96 PS9, Line 96: src_ > nit: read_position_ would be a better name I think Done. Good name. I think 'pos_ ' is shorter. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@166 PS9, Line 166: values.erase > Maybe it'd be better to have two vectors and juggling the values between th I think I should make the var name more meaningful. I'd like to rename 'values' to 'queue' since it's the queue in BFS. 'push_back' and 'erase' act more like a queue. On the other hand, the depth of the json selector is usually small, so such kind of functions won't be called too many times in parsing a json string. Refactoring these with two vectors might not have significant benifits. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@194 PS9, Line 194: (path[path_idx] != ' ') > I am not sure that there is a test for it. Either for the true and false ca Sure. Will add test coverage for this. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@195 PS9, Line 195: return -1; > Maybe taking the function context and set some error would be helpful for t Good point! I'll add more error messages. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@199 PS9, Line 199: return NULL > This function returns -1, the return NULL statement is in GetJsonObjectImpl Sorry, this comment is stale. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@217 PS9, Line 217: return -1; > Error could be set in function context. We could also use 'path_idx' in the Sure http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@251 PS9, Line 251: vector<UdfValue*> values; > The C++ standard doesn't specify the initial capacity of std::vector, but i OK. But the queue is usually short, the benefits might be limit. http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/exprs/string-functions.cc@267 PS9, Line 267: (end == nullptr) > Set error in function context? Sure http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/util/string-util.h File be/src/util/string-util.h: http://gerrit.cloudera.org:8080/#/c/10950/9/be/src/util/string-util.h@53 PS9, Line 53: AdvanceIdentifier > Maybe 'FindIdentifierEnd' or something like that would be a better name. Sure. This function is a slightly modified version of https://github.com/apache/impala/blob/2e63752858d71cc745534367a686980e060a8180/be/src/gutil/strings/util.cc#L743 -- To view, visit http://gerrit.cloudera.org:8080/10950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f Gerrit-Change-Number: 10950 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 20 Aug 2018 00:53:47 +0000 Gerrit-HasComments: Yes
