json_tuple fails to parse string with emoji

Andrew Ehrlich Tue, 24 Jan 2017 17:16:06 -0800

On Spark 1.6.0, calling json_tuple() with an emoji character in one of the
values returns nulls:


Input:
"""
"myJsonBody": {
          "field1": "📻"
}
"""

Query:
"""
...
LATERAL VIEW JSON_TUPLE(e.myJsonBody,'field1') k AS field1,
...

"""

This looks like a platform-dependent issue: the parsing works fine on my
local computer (OS X, Spark 1.6.3) and fails on the remote cluster
(CentOS 7, Spark 1.6.0).

I noticed that in 1.6.0, json_tuple was implemented this way:
https://github.com/apache/spark/pull/7946/files

So far I have:

   - Checked all java system properties related to charsets on drivers and
   executors
   - Turned up logging to debug level and checked for relevant messages
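
For reference, here is a minimal sketch of what a charset mismatch would do to
the sample value. It is plain Python run outside Spark, not the json_tuple
code path, and it assumes (unconfirmed) that the failing side decodes UTF-8
bytes with a different default charset. SAMPLE and roundtrip_field are
illustrative names, not anything from Spark.

```python
import json

# Hypothetical stand-in for the value of e.myJsonBody in the report above.
SAMPLE = '{"field1": "\U0001F4FB"}'  # U+1F4FB, the radio emoji


def roundtrip_field(doc, charset):
    """Encode doc as UTF-8, decode it with charset, and return field1.

    Returns None when decoding or JSON parsing fails.
    """
    raw = doc.encode("utf-8")
    try:
        return json.loads(raw.decode(charset))["field1"]
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None


utf8_result = roundtrip_field(SAMPLE, "utf-8")      # emoji survives intact
ascii_result = roundtrip_field(SAMPLE, "ascii")     # decoding fails, so None
latin1_result = roundtrip_field(SAMPLE, "latin-1")  # parses, but as mojibake

# ascii() keeps the output printable even on a terminal with an ASCII locale.
print(ascii(utf8_result), ascii(ascii_result), ascii(latin1_result))
```

If the cluster behaved like the "ascii" case, a null result would be
consistent with that, but this only illustrates the hypothesis; it does not
confirm it.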

Any suggestions for what else to check? Should I try the dev mailing list?
