Hi,

Thank you all for your help. I'll try both ways and get back to you.

On Fri, Sep 7, 2012 at 11:02 AM, Mohammad Tariq <donta...@gmail.com> wrote:

> I said this assuming that a Hadoop cluster is available since Sandeep is
> planning to use Hive. If that is the case then MapReduce would be faster
> for such large files.
>
> Regards,
> Mohammad Tariq
>
> On Fri, Sep 7, 2012 at 8:27 PM, Connell, Chuck <chuck.conn...@nuance.com> wrote:
>
>> I cannot promise which is faster. A lot depends on how clever your
>> scripts are.
>>
>> *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
>> *Sent:* Friday, September 07, 2012 10:42 AM
>> *To:* user@hive.apache.org
>> *Subject:* Re: How to load csv data into HIVE
>>
>> Hi,
>> I wrote a shell script to get the csv data, but when I run that script on
>> a 12GB csv it's taking a lot of time. If I run a Python script, will that
>> be faster?
>>
>> On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <chuck.conn...@nuance.com> wrote:
>>
>> How about a Python script that changes it into plain tab-separated text?
>> So it would look like this…
>>
>> 174969274<tab>14-mar-2006<tab>3522876<tab><tab>14-mar-2006<tab>500000308<tab>65<tab>1<newline>
>> etc…
>>
>> Tab-separated with newlines is easy to read and works perfectly on import.
>>
>> Chuck Connell
>> Nuance R&D Data Team
>> Burlington, MA
>> 781-565-4611
>>
>> *From:* Sandeep Reddy P [mailto:sandeepreddy.3...@gmail.com]
>> *Subject:* How to load csv data into HIVE
>>
>> Hi,
>> Here is the sample data
>> "174969274","14-mar-2006","3522876","","14-mar-2006","500000308","65","1"|
>> "174969275","19-jul-2006","3523154","","19-jul-2006","500000308","65","1"|
>> "174969276","31-dec-2005","3530333","","31-dec-2005","500000308","65","1"|
>> "174969277","14-apr-2005","3531470","","14-apr-2005","500000308","65","1"|
>>
>> How to load this kind of data into HIVE?
>> I'm using a shell script to get rid of the double quotes and '|', but it's
>> taking a very long time to work on each csv, which are 12GB each. What is
>> the best way to do this?
>>
>> --
>> Thanks,
>> sandeep

--
Thanks,
sandeep
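
For reference, here is a minimal sketch of the kind of Python script Chuck suggests in the thread: it streams the quoted csv one row at a time, drops the trailing '|' and writes tab-separated lines. It is not anyone's actual script from this thread. The function name convert and the file name csv_to_tsv.py are made up for illustration, and it assumes every record sits on one line and that fields contain no embedded newlines.

```python
import csv
import sys


def convert(src, dst):
    """Read quoted CSV rows ending in '|' from src; write tab-separated rows to dst."""
    # csv.reader strips the double quotes around each field.
    for row in csv.reader(src):
        if not row:
            continue  # skip blank lines
        # Drop the trailing '|' record terminator seen in the sample data.
        row[-1] = row[-1].rstrip("|")
        # Assumption: a tab inside a field can safely become a space,
        # so that it does not shift the columns on import.
        dst.write("\t".join(field.replace("\t", " ") for field in row) + "\n")


# Hypothetical usage: python csv_to_tsv.py input.csv output.tsv
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], newline="") as src, \
            open(sys.argv[2], "w", newline="") as dst:
        convert(src, dst)
```

Because rows are processed as they are read, memory use stays flat even on a 12GB file. Whether it beats the shell script is not something this sketch can promise. The output should load into a Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'.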