You can simply write a MapReduce job to do the preprocessing for you. Its
output will then be readily usable for the Hive table.

On Nov 20, 2012 2:29 PM, "iwannaplay games" <funnlearnfork...@gmail.com> wrote:
> How do I preprocess the data when there are millions of records, out of
> which only a few thousand contain XML data?
>
>
> On 11/20/12, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> > Hive currently supports only newline as the record separator. If you have
> > newlines in column values, you will need to preprocess your data and
> > remove the newlines from the column values.
> > On Nov 20, 2012 1:30 PM, "iwannaplay games" <funnlearnfork...@gmail.com>
> > wrote:
> >
> >> Hi All,
> >>
> >> I have a csv file (separated by |) where the data looks like
> >>
> >> id data
> >> date
> >> 1 apple
> >> 24-nov-2011
> >> 2 mango
> >> 26-nov-2011
> >> 3 <?xml version="1.0" encoding="utf-8"?>
> >> <a>fruits</a>
> >> 28-nov-2011
> >> 4 papaya
> >> 30-nov-2011
> >>
> >>
> >> Since id=3 has a newline in the data field, Hive takes only the first
> >> line and treats the second line as a different row. I want my full XML
> >> field to be taken inside data in the Hive table.
> >>
> >> It seems Hive doesn't support lines terminated by '|'.
> >>
> >> How do I treat XML data in Hive?
> >>
> >> Thanks & Regards
> >> Prabhjot
> >>
> >
>
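
For reference, here is a minimal sketch of the preprocessing step discussed in this thread, written as a plain Python pass rather than a full MapReduce job. It assumes the file really is pipe-delimited with exactly three fields per record (id|data|date) and that the data field never contains a literal "|"; neither is confirmed by the sample above, and the function name merge_records is only illustrative. It glues physical lines together until a record has all of its delimiters, replacing the embedded newlines with a space.

```python
def merge_records(lines, num_fields=3, delimiter="|", replacement=" "):
    """Join physical lines into logical records.

    Assumption (not confirmed by the thread): every record has exactly
    `num_fields` fields separated by `delimiter`, and no field contains
    the delimiter itself. A record is complete once it has
    num_fields - 1 delimiters.
    """
    expected = num_fields - 1
    buffer = []   # physical lines collected for the current record
    seen = 0      # delimiters seen so far in the current record
    for line in lines:
        part = line.rstrip("\r\n")
        if not part and not buffer:
            continue  # skip blank lines between records
        buffer.append(part)
        seen += part.count(delimiter)
        if seen >= expected:
            # Record is complete: emit it on a single line.
            yield replacement.join(buffer)
            buffer = []
            seen = 0
    if buffer:
        # Trailing incomplete record: emit as-is so no data is silently lost.
        yield replacement.join(buffer)
```

You could run this over the file once (for example, iterating over an open file handle and writing each yielded record to a new file) and then load the output into Hive with '|' as the field delimiter. If you do move this into a MapReduce job, keep in mind that a multi-line record can straddle an input split boundary, so the default line-based input handling would need extra care.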