Hi Jarcec,

Thanks for the quick reply.
In fact, I checked that ticket as soon as you directed me to it, but I was skeptical since it was filed as recently as yesterday. Since exporting a gzipped file using Sqoop is a pretty common thing to do, I was wondering whether this is already a known issue, or perhaps fixed in one of the recent versions. If not, I shall keep track of the ticket, try debugging it myself, or wait for your findings.

Thanks.

On Fri, Nov 23, 2012 at 12:17 PM, Jarek Jarcec Cecho <[email protected]> wrote:
> Hi Bhargav,
> I believe that you might be hitting the known Sqoop bug SQOOP-721 [1]. I
> was able to replicate the behaviour in my testing environment today and my
> intention is to continue debugging tomorrow.
>
> As a workaround, you can decompress the files manually prior to Sqoop
> export for now.
>
> Jarcec
>
> Links:
> 1: https://issues.apache.org/jira/browse/SQOOP-721
>
> On Nov 22, 2012, at 9:07 PM, Bhargav Nallapu <[email protected]> wrote:
>
> > Hi,
> >
> > I am seeing a strange issue.
> >
> > Context:
> >
> > Hive writes output to an external table with LZO compression in place,
> > so my HDFS folder contains large_file.lzo.
> >
> > Using Sqoop, when I try to export this file to the MySQL table, the
> > number of rows is doubled.
> >
> > This doesn't happen if I first decompress the file (lzop -d
> > large_file.lzo) and load the plain "large_file" instead; the rows are
> > as expected.
> >
> > Both small_file and small_file.lzo are loaded with the correct number
> > of rows.
> >
> > Sqoop: v1.3.0
> > Number of mappers: 1
> >
> > Observation: any compressed file (gzipped or LZO) of size greater than
> > 60 MB (might be 64 MB), when exported to the DB, produces double the
> > row count, probably exact duplicates.
> >
> > Can anyone please help?
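In the meantime I'll try the workaround; a rough sketch of what I have in mind is below. The paths, table name, and JDBC URL are just placeholders for my setup, and the local gzip round trip only illustrates the decompress step itself:

```shell
# Sketch of the suggested workaround: decompress manually, then export
# the plain file. All paths, the table name, and the JDBC URL below are
# placeholders, not the real cluster layout.

# On the cluster it would be roughly (commented out, needs Hadoop/Sqoop):
#   hadoop fs -copyToLocal /user/hive/warehouse/t/large_file.lzo .
#   lzop -d large_file.lzo                  # or: gunzip large_file.gz
#   hadoop fs -copyFromLocal large_file /user/export_staging/
#   sqoop export --connect jdbc:mysql://dbhost/db --table t \
#       --export-dir /user/export_staging --num-mappers 1

# The decompress step itself, demonstrated locally with gzip:
printf '1,foo\n2,bar\n' > part-00000
gzip part-00000        # creates part-00000.gz
gunzip part-00000.gz   # restores the plain-text part-00000
cat part-00000         # rows come back intact, no duplication
```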
