Hi, I have complex log files (gzip-compressed ".gz", 200 GB) on HDFS.
+ Log file format:

    127.0.0.1 [2012Avg08] "a=abc&b=adf&c=aadfad"

I think the DDL would be something like this:

    CREATE TABLE log_tb (ip STRING, dt STRING, kv MAP<STRING, STRING>)
    ROW FORMAT SERDE "??"
    STORED AS SEQUENCEFILE;

I want to be able to run a query like this:

    SELECT kv['b'] FROM log_tb LIMIT 10;

Questions:

1) How do I parse complex log files like this (compressed ".gz", 200G)?
2) If I have to use a SerDe, which SerDe should I use?
3) Is there an existing SerDe (input/output) for this, or does it have to be a user-defined class?
4) If I partition the table by log file, what should the DDL and DML look like?

Please share sample SQL (DDL, DML) if you can.

Thanks.
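
One possible approach to the questions above, as a rough sketch rather than a definitive answer. It assumes a Hive version that ships the built-in org.apache.hadoop.hive.serde2.RegexSerDe (older releases have the same SerDe in the contrib jar as org.apache.hadoop.hive.contrib.serde2.RegexSerDe) and the str_to_map function. The staging table name log_raw, the partition column log_date, and the /logs/raw paths are hypothetical and only for illustration. RegexSerDe only supports STRING columns, so the key/value part is read as one string and turned into a map at query time.

```sql
-- Hypothetical staging table over the raw .gz text files.
-- Hive decompresses .gz text files transparently when they are read as TEXTFILE.
CREATE EXTERNAL TABLE log_raw (
  ip     STRING,
  dt     STRING,
  kv_raw STRING
)
PARTITIONED BY (log_date STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- group 1: ip, group 2: text inside [...], group 3: text inside "..."
  "input.regex" = "([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\""
)
STORED AS TEXTFILE
LOCATION '/logs/raw';

-- Register one directory of log files as a partition (assumed path layout).
ALTER TABLE log_raw ADD PARTITION (log_date = '2012-08-08')
LOCATION '/logs/raw/2012-08-08';

-- Query a key directly from the raw table.
SELECT str_to_map(kv_raw, '&', '=')['b']
FROM log_raw
WHERE log_date = '2012-08-08'
LIMIT 10;

-- Optionally materialize the map into a partitioned SequenceFile table.
CREATE TABLE log_tb (ip STRING, dt STRING, kv MAP<STRING, STRING>)
PARTITIONED BY (log_date STRING)
STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE log_tb PARTITION (log_date = '2012-08-08')
SELECT ip, dt, str_to_map(kv_raw, '&', '=')
FROM log_raw
WHERE log_date = '2012-08-08';

SELECT kv['b'] FROM log_tb LIMIT 10;
```

Note that gzip files are not splittable, so each .gz file is processed by a single mapper. With 200G of input, many moderately sized files will parallelize better than a few very large ones.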