Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14905 )
Change subject: IMPALA-8547: get_json_object fails to get value for numeric key ...................................................................... Patch Set 1: (1 comment) Thanks for the contribution Euguene! A few comments: It would be great if you could include some more information in the commit message. Take a look at https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala for an example of an Impala commit message. The commit message should include information about how to reproduce the issue, how this patch fixes the issue, and how this patch was tested. For the patch itself, the behavior here seems to be a bit ambiguous between SQL engines. While Hive (and Spark SQL) can execute the query you pasted in the JIRA, the Hive docs (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) actually state that: Extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It will return null if the input json string is invalid. NOTE: The json path can only have the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys *cannot start with numbers.* This is due to restrictions on Hive column names. I checked MySQL as well, and it actually cannot run the query from the JIRA description (JSON_EXTRACT is the equivalent in MySQL). It fails with the error: "Query Error: Error: ER_INVALID_JSON_PATH: Invalid JSON path expression. The error is around character position 3." However, if you put quotations around the path identifier, it works: select JSON_EXTRACT('{"1": 5}', '$."1"'); https://stackoverflow.com/questions/49557144/how-to-extract-values-from-a-numeric-keyed-nested-json-field-in-mysql has more info. Oddly enough, MS SQL Server and Oracle follows the MySQL behavior, but Postgres follows the Hive behavior. In general, Impala tries to follow MySQL behavior. Given that MS SQL and Oracle follow the MySQL behavior as well, I think Impala should too. http://gerrit.cloudera.org:8080/#/c/14905/1/be/src/util/string-util.cc File be/src/util/string-util.cc: http://gerrit.cloudera.org:8080/#/c/14905/1/be/src/util/string-util.cc@76 PS1, Line 76: FindEndOfIdentifier need to update the docs of this method in string-util.h. The docs currently say that this method is to find C style identifiers. A C style identifier cannot start with a digit (https://docs.microsoft.com/en-us/cpp/c-language/c-identifiers?view=vs-2019). -- To view, visit http://gerrit.cloudera.org:8080/14905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4 Gerrit-Change-Number: 14905 Gerrit-PatchSet: 1 Gerrit-Owner: Eugene Zimichev <und...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Comment-Date: Mon, 16 Dec 2019 18:38:48 +0000 Gerrit-HasComments: Yes