Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10950 )
Change subject: IMPALA-376: add built-in functions for parsing JSON ...................................................................... Patch Set 11: (11 comments) http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/expr-test.cc@8804 PS11, Line 8804: "[[1,2,3],[4,5]]"); Hive's output is different here, but I think this behavior is more reasonable, so I filed HIVE-20448. https://issues.apache.org/jira/browse/HIVE-20448 http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/expr-test.cc@8816 PS11, Line 8816: TestError("get_json_object('[1,2]', '$[-2]')"); : TestError("get_json_object('[1,2]', '$[0.1]')"); : TestError("get_json_object('[1,2]', '$[a]')"); : : TestError("get_json_object('{\"key\": {\"a\": 1}}', '$..a')"); : TestError("get_json_object('{\"key\": \"value\"}', '$$')"); : TestError("get_json_object('{\"a\": 1}', '$a')"); : TestError("get_json_object('{\"a\": 1}', 'a')"); : TestError("get_json_object('[1,2,3]', '$[**]')"); Just a note here: Regarding to the json path we are more strict than Hive, but I think it is a good thing. Hive's behavior is quite random on wrong json paths, it either returns all values from the json, or returns null. Since the json path is a constant expression that we want to evaluate on all json data, it is quite important to raise an error when it is not valid. MySQL also does that for its JSON_EXTRACT() function. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc File be/src/exprs/string-functions.cc: http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@49 PS11, Line 49: RAPIDJSON_ASSERT(false) We usually use DCHECK(false) for this in Impala. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@129 PS11, Line 129: if (it->IsNull()) continue; I don't think we need this, because hive also keeps the null values and I think it is the desired behavior. hive> select get_json_object('[0, 1, null, 2, 3]', '$[*]'); [0,1,null,2,3] MySQL's JSON_EXTRACT() and online jsonpath evaluators like https://jsonpath.curiousconcept.com/ also preserve nulls. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@176 PS11, Line 176: if (path[path_idx] != ' ') Please add test case for wildcard + spaces. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@177 PS11, Line 177: " nit: add a space after the colon http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@179 PS11, Line 179: path[path_idx] Please use static_cast<char>() to print it as a character instead of a number. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@254 PS11, Line 254: 64 nit: might be better to use a named constant here, like INITIAL_QUEUE_CAPACITY or something like that. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@266 PS11, Line 266: if (i == path_str.len) return StringVal::null(); if it means an invalid selector then we should set an error. http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@273 PS11, Line 273: json_str should be 'path_str' http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@303 PS11, Line 303: json_str should be 'path_str' -- To view, visit http://gerrit.cloudera.org:8080/10950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f Gerrit-Change-Number: 10950 Gerrit-PatchSet: 11 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 23 Aug 2018 13:53:44 +0000 Gerrit-HasComments: Yes