Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10950 )

Change subject: IMPALA-376: add built-in functions for parsing JSON
......................................................................

IMPALA-376: add built-in functions for parsing JSON

This patch implements the same function as the Hive UDF get_json_object.
We reuse RapidJson to parse the JSON string. Due to the constraints of
the old RapidJson version, we cannot get detailed errors if parsing fails.

One of the complexities of this patch is memory management. In order to
track the memory used in RapidJson, we have to wrap FunctionContext into
an allocator. However, RapidJson requires the Free function to be static,
so we can only make the FunctionContext pointer in Allocator thread_local.
This is safe since each UDF thread allocates and frees its own memory.
The only drawback is that thread_local can’t be cross-compiled to IR.
This is acceptable: since the parameters are complex strings, we won't
gain a significant performance benefit from JIT compilation.

Test:
* Add unit tests in expr-test
* Add e2e tests in exprs.test

Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
A be/src/exprs/string-functions.cc
M be/src/exprs/string-functions.h
M be/src/util/string-util.cc
M be/src/util/string-util.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
9 files changed, 411 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10950/9
--
To view, visit http://gerrit.cloudera.org:8080/10950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
Gerrit-Change-Number: 10950
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
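
Illustrative note on the memory-management paragraph of the commit message: the arrangement it describes (an allocator with a static Free function that reaches its context through a thread_local pointer) might look roughly like the sketch below. This is not the code from the patch. TrackingContext is a hypothetical stand-in for FunctionContext that only counts bytes, and ContextAllocator, SetContext and the size header are assumptions made for illustration. The sketch follows the shape of RapidJSON's Allocator concept (Malloc, Realloc, static Free, kNeedFree) but does not include RapidJSON, so it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for FunctionContext: it only counts tracked bytes.
struct TrackingContext {
  std::size_t bytes_in_use = 0;
};

// Assumed allocator shape: Malloc/Realloc are members, Free must be static.
class ContextAllocator {
 public:
  static const bool kNeedFree = true;

  // Bind the calling thread's context before any allocation on that thread.
  static void SetContext(TrackingContext* ctx) { ctx_ = ctx; }

  void* Malloc(std::size_t size) {
    if (size == 0) return nullptr;
    // Prefix each block with its size so the static Free() can untrack it.
    Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + size));
    if (h == nullptr) return nullptr;
    h->size = size;
    if (ctx_ != nullptr) ctx_->bytes_in_use += size;
    return h + 1;
  }

  void* Realloc(void* ptr, std::size_t old_size, std::size_t new_size) {
    if (new_size == 0) {
      Free(ptr);
      return nullptr;
    }
    void* fresh = Malloc(new_size);
    if (fresh != nullptr && ptr != nullptr) {
      std::memcpy(fresh, ptr, old_size < new_size ? old_size : new_size);
      Free(ptr);
    }
    return fresh;
  }

  // Static, so it has no 'this'; the context comes from the thread_local.
  static void Free(void* ptr) {
    if (ptr == nullptr) return;
    Header* h = static_cast<Header*>(ptr) - 1;
    if (ctx_ != nullptr) {
      assert(ctx_->bytes_in_use >= h->size);
      ctx_->bytes_in_use -= h->size;
    }
    std::free(h);
  }

 private:
  // The union pads the header so the user block stays maximally aligned.
  union Header {
    std::size_t size;
    std::max_align_t align;
  };
  static thread_local TrackingContext* ctx_;
};

thread_local TrackingContext* ContextAllocator::ctx_ = nullptr;
```

Because the pointer is thread_local, the accounting is only correct when a block is freed on the thread that allocated it, which is the property the commit message relies on.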
