Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14905 )

Change subject: IMPALA-8547: get_json_object fails to get value for numeric key
......................................................................


Patch Set 1:

(1 comment)

Thanks for the contribution Euguene!

A few comments:

It would be great if you could include some more information in the commit 
message. Take a look at 
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala for 
an example of an Impala commit message. The commit message should include 
information about how to reproduce the issue, how this patch fixes the issue, 
and how this patch was tested.

For the patch itself, the behavior here seems to be a bit ambiguous between SQL 
engines. While Hive (and Spark SQL) can execute the query you pasted in the 
JIRA, the Hive docs 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) actually 
state that:

 Extracts json object from a json string based on json path specified, and 
returns json string of the extracted json object. It will return null if the 
input json string is invalid. NOTE: The json path can only have the characters 
[0-9a-z_], i.e., no upper-case or special characters. Also, the keys *cannot 
start with numbers.* This is due to restrictions on Hive column names.

I checked MySQL as well, and it actually cannot run the query from the JIRA 
description (JSON_EXTRACT is the equivalent in MySQL). It fails with the error: 
"Query Error: Error: ER_INVALID_JSON_PATH: Invalid JSON path expression. The 
error is around character position 3."

However, if you put quotations around the path identifier, it works:

 select JSON_EXTRACT('{"1": 5}', '$."1"');

https://stackoverflow.com/questions/49557144/how-to-extract-values-from-a-numeric-keyed-nested-json-field-in-mysql
 has more info.

Oddly enough, MS SQL Server and Oracle follows the MySQL behavior, but Postgres 
follows the Hive behavior.

In general, Impala tries to follow MySQL behavior. Given that MS SQL and Oracle 
follow the MySQL behavior as well, I think Impala should too.

http://gerrit.cloudera.org:8080/#/c/14905/1/be/src/util/string-util.cc
File be/src/util/string-util.cc:

http://gerrit.cloudera.org:8080/#/c/14905/1/be/src/util/string-util.cc@76
PS1, Line 76: FindEndOfIdentifier
need to update the docs of this method in string-util.h. The docs currently say 
that this method is to find C style identifiers. A C style identifier cannot 
start with a digit 
(https://docs.microsoft.com/en-us/cpp/c-language/c-identifiers?view=vs-2019).



--
To view, visit http://gerrit.cloudera.org:8080/14905
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4
Gerrit-Change-Number: 14905
Gerrit-PatchSet: 1
Gerrit-Owner: Eugene Zimichev <und...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 18:38:48 +0000
Gerrit-HasComments: Yes

Reply via email to