This is my use case: another system uploads CSV files to my system. The CSV files contain complex data types such as map. In order to express complex data types, as well as ordinary strings containing special characters, we put URL-encoded strings in the CSV files. So we use URL-encoded JSON strings to express maps, strings and arrays.
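For example, a single CSV cell could be produced like this (a minimal sketch only, assuming the upstream system URL-encodes the serialized JSON as UTF-8; the class name and sample values are made up, not taken from the thread):

###############
import java.net.URLEncoder;

public class EncodeExample {
    public static void main(String[] args) throws Exception {
        // A map serialized as JSON, then URL-encoded before it is written into a CSV cell,
        // so commas and quotes inside the value cannot break the CSV structure.
        String json = "{\"k1\":\"a,b\",\"k2\":\"c\"}";
        String cell = URLEncoder.encode(json, "UTF-8");
        System.out.println(cell);
        // prints: %7B%22k1%22%3A%22a%2Cb%22%2C%22k2%22%3A%22c%22%7D
    }
}
#############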
Second stage: load the CSV files into a Spark text table.

###############
CREATE TABLE `a_text` (parameters string);

LOAD DATA INPATH 'XXX' INTO TABLE a_text;
#############

Third stage: insert into a Spark parquet table by selecting from the text table. In order to take advantage of the complex data types, we use a UDF to transform the JSON string into a map and put the map into the table.

###############
CREATE TABLE `a_parquet` (parameters map<string,string>);

INSERT INTO a_parquet SELECT UDF(parameters) FROM a_text;
#############

So do you have any suggestions?

------------------ Original Message ------------------
From: "Ted Yu" <yuzhih...@gmail.com>
Sent: Monday, May 16, 2016, 00:44
To: <251922...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: spark udf can not change a json string to a map

Can you let us know more about your use case?

I wonder if you can structure your udf by not returning Map.

Cheers

On Sun, May 15, 2016 at 9:18 AM, <251922...@qq.com> wrote:

Hi, all. I want to implement a UDF which changes a JSON string into a map<string,string>, but a problem occurs. My Spark version is 1.5.1.

My UDF code:

####################
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper; // or org.codehaus.jackson.map.ObjectMapper, depending on the Jackson version on the classpath

public Map<String, String> evaluate(final String s) {
    if (s == null) return null;
    return getString(s);
}

@SuppressWarnings("unchecked")
public static Map<String, String> getString(String s) {
    try {
        // URL-decode the cell, then parse the JSON into a Map
        String str = URLDecoder.decode(s, "UTF-8");
        ObjectMapper mapper = new ObjectMapper();
        Map<String, String> map = mapper.readValue(str, Map.class);
        return map;
    } catch (Exception e) {
        // fall back to an empty map on malformed input
        return new HashMap<String, String>();
    }
}
#############

Exception info:

16/05/14 21:05:22 ERROR CliDriver: org.apache.spark.sql.AnalysisException: Map type in java is unsupported because JVM type erasure makes spark fail to catch key and value types in Map<>; line 1 pos 352
        at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:230)
        at org.apache.spark.sql.hive.HiveSimpleUDF.javaClassToDataType(hiveUDFs.scala:107)
        at org.apache.spark.sql.hive.HiveSimpleUDF.<init>(hiveUDFs.scala:136)
################

I have seen that there is a test suite in Spark which says Spark does not support this kind of UDF. But is there a method to implement it?
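One possible workaround, sketched under the assumption that the function can be registered through Spark's own SQLContext.udf() (Spark 1.5 Java API) rather than the Hive simple-UDF path: the return type is passed explicitly as a MapType, so Spark never has to call javaClassToDataType on the erased Java Map signature. The function name json_to_map and the wrapper class below are only examples, and the Jackson import may need adjusting.

###############
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper; // adjust to the Jackson version on the classpath

import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class RegisterJsonToMapUdf {
    public static void register(SQLContext sqlContext) {
        sqlContext.udf().register(
            "json_to_map",                               // example name, not from the thread
            new UDF1<String, Map<String, String>>() {
                @Override
                public Map<String, String> call(String s) throws Exception {
                    if (s == null) return null;
                    try {
                        // Same logic as the Hive UDF: URL-decode, then parse the JSON into a Map
                        String str = URLDecoder.decode(s, "UTF-8");
                        ObjectMapper mapper = new ObjectMapper();
                        @SuppressWarnings("unchecked")
                        Map<String, String> map = mapper.readValue(str, Map.class);
                        return map;
                    } catch (Exception e) {
                        // Fall back to an empty map on malformed input
                        return new HashMap<String, String>();
                    }
                }
            },
            // Declaring the return type explicitly avoids the type-erasure problem
            DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType));
    }
}
#############

After registration the function should be usable the same way as the Hive UDF in the thread, e.g. INSERT INTO a_parquet SELECT json_to_map(parameters) FROM a_text;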