On Spark 1.6.0, calling json_tuple() with an emoji character in one of the values returns nulls:
Input: """ "myJsonBody": { "field1": "📻" } """ Query: """ ... LATERAL VIEW JSON_TUPLE(e.myJsonBody,'field1') k AS field1, ... """ This looks like a platform-dependent issue; the parsing works fine on my local computer (OSX, 1.6.3) and fails on the remote cluster(Centos7, 1.6.0) I noticed that in 1.6.0, json_tuple was implemented this way: https://github.com/apache/spark/pull/7946/files So far I have: - Checked all java system properties related to charsets on drivers and executors - Turned up logging to debug level and checked for relevant messages Any more input? Should I try the dev mailing list?