Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10950 )


Change subject: IMPALA-376: add built-in functions for parsing JSON
......................................................................

IMPALA-376: add built-in functions for parsing JSON

This patch implements the same functionality as the Hive UDF
get_json_object. We reuse RapidJson to parse the JSON string. Due to
limitations of the old RapidJson version, we cannot report detailed
errors when parsing fails.

One complexity of this patch is memory management. To track the
memory used by RapidJson, we have to wrap FunctionContext in an
allocator. However, RapidJson requires the allocator's Free function
to be static, so the only option is to make the FunctionContext
pointer in the Allocator thread_local. This is safe since each UDF
thread allocates and frees its own memory. The only drawback is that
thread_local can't be cross-compiled to IR. This is acceptable: the
parameters are complex strings, so we won't gain a significant
performance benefit from JIT compilation.
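
For reviewers unfamiliar with the constraint, the allocator arrangement
described above can be sketched roughly as follows. This is only an
illustration, not the code in this patch: MockContext is a hypothetical
stand-in for FunctionContext that merely counts bytes, and
TrackingAllocator, SetContext and the size header are assumptions made
for the example. Only the member shape (Malloc, Realloc, static Free,
kNeedFree) follows RapidJson's Allocator concept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for FunctionContext: it only tracks usage.
struct MockContext {
  std::size_t bytes_in_use = 0;
  int live_allocations = 0;
};

// Allocator with the member shape RapidJson expects. Free() must be
// static, so it cannot reach a per-instance context; the context
// pointer is kept in a thread_local static instead.
class TrackingAllocator {
 public:
  static const bool kNeedFree = true;

  // Bind the calling thread's context before parsing starts.
  static void SetContext(MockContext* ctx) { ctx_ = ctx; }

  void* Malloc(std::size_t size) {
    if (size == 0) return nullptr;
    char* raw = static_cast<char*>(std::malloc(kHeader + size));
    if (raw == nullptr) return nullptr;
    // Remember the size in a header so Free() can untrack it later.
    std::memcpy(raw, &size, sizeof(size));
    if (ctx_ != nullptr) {
      ctx_->bytes_in_use += size;
      ++ctx_->live_allocations;
    }
    return raw + kHeader;
  }

  void* Realloc(void* ptr, std::size_t old_size, std::size_t new_size) {
    if (new_size == 0) {
      Free(ptr);
      return nullptr;
    }
    void* fresh = Malloc(new_size);
    if (fresh != nullptr && ptr != nullptr) {
      std::memcpy(fresh, ptr, old_size < new_size ? old_size : new_size);
      Free(ptr);
    }
    return fresh;
  }

  static void Free(void* ptr) {
    if (ptr == nullptr) return;
    char* raw = static_cast<char*>(ptr) - kHeader;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    if (ctx_ != nullptr) {
      assert(ctx_->bytes_in_use >= size);
      ctx_->bytes_in_use -= size;
      --ctx_->live_allocations;
    }
    std::free(raw);
  }

 private:
  // Header size keeps the returned pointer suitably aligned.
  static constexpr std::size_t kHeader = alignof(std::max_align_t);
  static_assert(alignof(std::max_align_t) >= sizeof(std::size_t),
                "header must be able to hold the allocation size");

  static thread_local MockContext* ctx_;
};

thread_local MockContext* TrackingAllocator::ctx_ = nullptr;
```

In this sketch each thread would call SetContext once before handing
the allocator to the parser. Because every thread sees its own ctx_,
the static Free never touches another thread's accounting, which is
the safety argument made above.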

Test:
* Add unit tests in expr-test
* Add e2e tests in exprs.test

Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
A be/src/exprs/string-functions.cc
M be/src/exprs/string-functions.h
M be/src/util/string-util.cc
M be/src/util/string-util.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/exprs.test
9 files changed, 411 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10950/9
--
To view, visit http://gerrit.cloudera.org:8080/10950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6a9d3598cb3beca0865a7edb094f3a5b602dbd2f
Gerrit-Change-Number: 10950
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <[email protected]>
