Hey,

Any leads?

On Tue, Nov 22, 2016 at 5:35 PM, Dana Ram Meghwal <dana...@saavn.com> wrote:
> Hey All,
>
> I am using Hive 2.0 with an external metastore on EMR 5.0.0, with Tez as the
> execution engine. Our data is stored in JSON format, so for serialization and
> deserialization we are planning to use the lazy SerDe
> (class name 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe').
>
> My table definition is:
>
> CREATE EXTERNAL TABLE IF NOT EXISTS
> daily_active_users_summary_json_partition_dt_paths_v1
> (uid string, city string, user string, songcount string, songid_list
> array<string>) PARTITIONED BY (dt string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('paths'='uid,city,user,songcount,songid_list')
> LOCATION 's3://<bucketname removed>/users/daily_active_users_summary_json_partition_dt';
>
> and the data looks like this:
>
> {"uid":"xxxxxxyyyy","listening_user_flag":"non_listening","platform":"android","model":"micromax a110q","aquisition_channel":"organic","state":"delhi","app_version":"3.2:","country":"IN","city":"new delhi","new_listening_user_flag":"non_listening","manufacturer":"Micromax","login_mode":"loggedout","new_user_flag":"returning","digital_channel":"Not Source"}
>
> Note: I have pasted one record from the table here.
>
> Now, when I run the query
>
> select * from daily_active_users_summary_json_partition_dt_paths_v1 limit 5;
>
> the first field of the table takes the complete record and the rest of the
> fields show as NULL.
>
> When I use a different SerDe, 'org.apache.hive.hcatalog.data.JsonSerDe',
> the above query works fine and deserializes the data correctly. We want to
> use the lazy SerDe because our data contains non-UTF-8 characters, and the
> latter SerDe does not support serialization/deserialization of non-UTF-8
> characters.
>
> Can you please help me solve this? We mostly want to use only the lazy
> SerDe, as we have already experimented with other SerDes and none of them
> works for us. Is there any configuration that enables JSON
> serialization/deserialization while using the lazy SerDe?
>
> Or is there any other SerDe that can properly process non-UTF-8 characters
> in Hive 2 with Tez?
>
> Thank you
>
> Best Regards,
> Dana Ram Meghwal
> Software Engineer
> dana...@saavn.com

--
Dana Ram Meghwal
Software Engineer
dana...@saavn.com
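For context on the symptom described in the message: LazySimpleSerDe is a delimited-text SerDe, not a JSON parser. By default it splits each line on the Ctrl-A (\001) field delimiter and returns NULL for any column it cannot find, and as far as I can tell it simply ignores a 'paths' property. A JSON line contains no \001, so the entire record lands in the first column. The Python sketch below is a simplified model of that behavior, not Hive's actual code; the helper names delimited_row and json_row are hypothetical, and the sample is a shortened version of the record from the message. It contrasts the delimiter split with what a JSON SerDe does conceptually (parse the object and look each column up by name).

```python
import json

# Column names taken from the 'paths' property in the DDL above.
COLUMNS = ["uid", "city", "user", "songcount", "songid_list"]


def delimited_row(line, columns, field_delim="\x01"):
    """Simplified model of LazySimpleSerDe defaults (hypothetical helper):
    split on Ctrl-A and pad missing columns with None (i.e. NULL)."""
    fields = line.split(field_delim)
    return [fields[i] if i < len(fields) else None for i in range(len(columns))]


def json_row(line, columns):
    """Conceptual model of a JSON SerDe (hypothetical helper):
    parse the object and look up each column by its name."""
    record = json.loads(line)
    return [record.get(name) for name in columns]


# Shortened version of the sample record from the message.
sample = '{"uid":"xxxxxxyyyy","city":"new delhi","platform":"android","country":"IN"}'

# No \x01 in the line: the first column gets the whole line, the rest are None.
print(delimited_row(sample, COLUMNS))
# Columns are matched by key; keys absent from the record come back as None.
print(json_row(sample, COLUMNS))
```

In this model no delimiter setting makes the split recover named JSON keys, which is consistent with the first-column-only result reported in the message.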