This is my use case: another system uploads CSV files to my system. The CSV files contain complex data types such as map. In order to express complex data types, as well as ordinary strings containing special characters, we put URL-encoded strings in the CSV files. So we use URL-encoded JSON strings to express maps, strings and arrays.
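For example, a single CSV cell could be produced like this (a minimal sketch only, assuming the upstream system URL-encodes the serialized JSON as UTF-8; the class name and sample values are made up, not taken from the thread):

###############
import java.net.URLEncoder;

public class EncodeExample {
    public static void main(String[] args) throws Exception {
        // A map serialized as JSON, then URL-encoded before it is written into a CSV cell,
        // so commas and quotes inside the value cannot break the CSV structure.
        String json = "{\"k1\":\"a,b\",\"k2\":\"c\"}";
        String cell = URLEncoder.encode(json, "UTF-8");
        System.out.println(cell);
        // prints: %7B%22k1%22%3A%22a%2Cb%22%2C%22k2%22%3A%22c%22%7D
    }
}
#############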
Second stage: load the CSV files into a Spark text table.

###############
CREATE TABLE `a_text` (parameters string);

LOAD DATA INPATH 'XXX' INTO TABLE a_text;
#############

Third stage: insert into a Spark parquet table by selecting from the text table. In order to take advantage of the complex data types, we use a UDF to transform the JSON string into a map and put the map into the table.

###############
CREATE TABLE `a_parquet` (parameters map<string,string>);

INSERT INTO a_parquet SELECT UDF(parameters) FROM a_text;
#############

So do you have any suggestions?

------------------ Original Message ------------------
From: "Ted Yu" <yuzhih...@gmail.com>
Sent: Monday, May 16, 2016, 00:44
To: <251922...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: spark udf can not change a json string to a map

Can you let us know more about your use case?

I wonder if you can structure your udf by not returning Map.

Cheers

On Sun, May 15, 2016 at 9:18 AM, <251922...@qq.com> wrote:

Hi, all. I want to implement a UDF which changes a JSON string into a map<string,string>, but a problem occurs. My Spark version is 1.5.1.

My UDF code:

####################
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper; // or org.codehaus.jackson.map.ObjectMapper, depending on the Jackson version on the classpath

public Map<String, String> evaluate(final String s) {
    if (s == null) return null;
    return getString(s);
}

@SuppressWarnings("unchecked")
public static Map<String, String> getString(String s) {
    try {
        // URL-decode the cell, then parse the JSON into a Map
        String str = URLDecoder.decode(s, "UTF-8");
        ObjectMapper mapper = new ObjectMapper();
        Map<String, String> map = mapper.readValue(str, Map.class);
        return map;
    } catch (Exception e) {
        // fall back to an empty map on malformed input
        return new HashMap<String, String>();
    }
}
#############

Exception info:

16/05/14 21:05:22 ERROR CliDriver: org.apache.spark.sql.AnalysisException: Map type in java is unsupported because JVM type erasure makes spark fail to catch key and value types in Map<>; line 1 pos 352
        at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:230)
        at org.apache.spark.sql.hive.HiveSimpleUDF.javaClassToDataType(hiveUDFs.scala:107)
        at org.apache.spark.sql.hive.HiveSimpleUDF.<init>(hiveUDFs.scala:136)
################

I have seen that there is a test suite in Spark which says Spark does not support this kind of UDF. But is there a method to implement it?
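One possible workaround, sketched under the assumption that the function can be registered through Spark's own SQLContext.udf() (Spark 1.5 Java API) rather than the Hive simple-UDF path: the return type is passed explicitly as a MapType, so Spark never has to call javaClassToDataType on the erased Java Map signature. The function name json_to_map and the wrapper class below are only examples, and the Jackson import may need adjusting.

###############
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper; // adjust to the Jackson version on the classpath

import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class RegisterJsonToMapUdf {
    public static void register(SQLContext sqlContext) {
        sqlContext.udf().register(
            "json_to_map",                               // example name, not from the thread
            new UDF1<String, Map<String, String>>() {
                @Override
                public Map<String, String> call(String s) throws Exception {
                    if (s == null) return null;
                    try {
                        // Same logic as the Hive UDF: URL-decode, then parse the JSON into a Map
                        String str = URLDecoder.decode(s, "UTF-8");
                        ObjectMapper mapper = new ObjectMapper();
                        @SuppressWarnings("unchecked")
                        Map<String, String> map = mapper.readValue(str, Map.class);
                        return map;
                    } catch (Exception e) {
                        // Fall back to an empty map on malformed input
                        return new HashMap<String, String>();
                    }
                }
            },
            // Declaring the return type explicitly avoids the type-erasure problem
            DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType));
    }
}
#############

After registration the function should be usable the same way as the Hive UDF in the thread, e.g. INSERT INTO a_parquet SELECT json_to_map(parameters) FROM a_text;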