Hi Bhargav,

I believe that you might be hitting the known Sqoop bug SQOOP-721 [1]. I was able to replicate the behaviour in my testing environment today and my intention is to continue debugging tomorrow.
As a workaround, you can decompress the files manually prior to the Sqoop export for now.

Jarcec

Links:
1: https://issues.apache.org/jira/browse/SQOOP-721

On Nov 22, 2012, at 9:07 PM, Bhargav Nallapu <[email protected]> wrote:

> Hi,
>
> I am seeing a strange issue.
>
> Context:
>
> Hive writes its output to an external table with LZO compression in place,
> so my HDFS folder contains large_file.lzo.
>
> When I use Sqoop to export this file to the MySQL table, the number of
> rows is doubled.
>
> If I first run
>
>   lzop -d large_file.lzo
>
> and export the uncompressed "large_file" instead, the row count is as
> expected. Likewise, both small_file and small_file.lzo are loaded with the
> correct number of rows.
>
> Sqoop: v 1.30
> Number of mappers: 1
>
> Observation: any compressed file (gzipped or LZO) larger than about 60 MB
> (possibly 64 MB), when exported to the DB, produces double the row count,
> probably exact duplicates.
>
> Can anyone please help?
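For anyone hitting the same thing before SQOOP-721 is fixed, a rough sketch of the manual-decompression workaround might look like the following. The HDFS paths, connection string, credentials, table name, and the ^A field delimiter are all placeholder assumptions; adjust them to your own setup.

  # Pull the compressed file out of HDFS and decompress it locally
  # (paths are placeholders).
  hadoop fs -get /user/hive/warehouse/my_table/large_file.lzo .
  lzop -d large_file.lzo            # produces large_file

  # Stage the uncompressed copy back into HDFS for the export.
  hadoop fs -mkdir /user/hive/export_staging
  hadoop fs -put large_file /user/hive/export_staging/

  # Export the uncompressed data. The delimiter below assumes Hive's
  # default ^A (\001) field separator -- change it if your table uses
  # something else.
  sqoop export \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser --password mypass \
    --table my_table \
    --export-dir /user/hive/export_staging \
    --input-fields-terminated-by '\001' \
    -m 1

Decompressing locally is just the simplest route; the only point is that the data handed to sqoop export is uncompressed.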
