Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10950 )

Change subject: IMPALA-376: add built-in functions for parsing JSON
......................................................................


Patch Set 11:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/expr-test.cc@8804
PS11, Line 8804:       "[[1,2,3],[4,5]]");
Hive's output is different here, but I think this behavior is more reasonable, 
so I filed HIVE-20448.

https://issues.apache.org/jira/browse/HIVE-20448


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/expr-test.cc@8816
PS11, Line 8816:   TestError("get_json_object('[1,2]', '$[-2]')");
               :   TestError("get_json_object('[1,2]', '$[0.1]')");
               :   TestError("get_json_object('[1,2]', '$[a]')");
               :
               :   TestError("get_json_object('{\"key\": {\"a\": 1}}', 
'$..a')");
               :   TestError("get_json_object('{\"key\": \"value\"}', '$$')");
               :   TestError("get_json_object('{\"a\": 1}', '$a')");
               :   TestError("get_json_object('{\"a\": 1}', 'a')");
               :   TestError("get_json_object('[1,2,3]', '$[**]')");
Just a note here:
Regarding to the json path we are more strict than Hive, but I think it is a 
good thing. Hive's behavior is quite random on wrong json paths, it either 
returns all values from the json, or returns null.

Since the json path is a constant expression that we want to evaluate on all 
json data, it is quite important to raise an error when it is not valid. MySQL 
also does that for its JSON_EXTRACT() function.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc
File be/src/exprs/string-functions.cc:

http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@49
PS11, Line 49: RAPIDJSON_ASSERT(false)
We usually use DCHECK(false) for this in Impala.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@129
PS11, Line 129: if (it->IsNull()) continue;
I don't think we need this, because hive also keeps the null values and I think 
it is the desired behavior.

 hive> select get_json_object('[0, 1, null, 2, 3]', '$[*]');
 [0,1,null,2,3]

MySQL's JSON_EXTRACT() and online jsonpath evaluators like 
https://jsonpath.curiousconcept.com/ also preserve nulls.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@176
PS11, Line 176: if (path[path_idx] != ' ')
Please add test case for wildcard + spaces.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@177
PS11, Line 177: "
nit: add a space after the colon


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@179
PS11, Line 179: path[path_idx]
Please use static_cast<char>() to print it as a character instead of a number.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@254
PS11, Line 254: 64
nit: might be better to use a named constant here, like INITIAL_QUEUE_CAPACITY 
or something like that.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@266
PS11, Line 266: if (i == path_str.len) return StringVal::null();
if it means an invalid selector then we should set an error.


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@273
PS11, Line 273: json_str
should be 'path_str'


http://gerrit.cloudera.org:8080/#/c/10950/11/be/src/exprs/string-functions.cc@303
PS11, Line 303: json_str
should be 'path_str'



--
To view, visit http://gerrit.cloudera.org:8080/10950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
Gerrit-Change-Number: 10950
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Aug 2018 13:53:44 +0000
Gerrit-HasComments: Yes

Reply via email to