That may well be normal, since MySQL uses additional space for indexes,
table definitions, etc.

You should be able to validate this fairly easily by running mysqldump
on the data and comparing the size of the dump to what is stored in
HDFS. Those two numbers should be roughly in the same ballpark.
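
As a rough sketch of that comparison (not a tested recipe): the helper
below is hypothetical, and the factor-of-two threshold is only an
assumed reading of "same ballpark". The database name, table name, and
the way the HDFS total is obtained are placeholders as well.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL, Sqoop, or Hadoop): reports whether
# two byte counts are within a factor of two of each other. The 2x threshold
# is an assumption chosen for illustration.
same_ballpark() {
    a=$1
    b=$2
    # Order the pair so that $small <= $large.
    if [ "$a" -le "$b" ]; then
        small=$a; large=$b
    else
        small=$b; large=$a
    fi
    if [ "$large" -le $((small * 2)) ]; then
        echo "same ballpark"
    else
        echo "differs by more than 2x"
    fi
}

# Assumed usage; mydb and mytable are placeholders:
#   dump_bytes=$(mysqldump --no-create-info mydb mytable | wc -c)
#   hdfs_bytes=...   # total reported by "hadoop fs -du" for the import
#                    # directory; the output layout varies by Hadoop version
#   same_ballpark "$dump_bytes" "$hdfs_bytes"
```

A dump carries INSERT statement syntax that a delimited text import does
not, so expect the two figures to be close rather than identical.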

-Michael

On Nov 29, 2012, at 10:19 AM, "Kartashov, Andy" <[email protected]> wrote:

> I also see a discrepancy when Sqoop'ing data from MySQL. Both MySQL's "select
> count(*) from.." and sqoop eval --query "select count(*).." return the same
> number of rows. But after importing the data into HDFS, hadoop fs -du shows
> the imported data at roughly 1/2 the size of the actual table in the MySQL
> DB. Is that normal?
>
> Cheers.
>
>
> -----Original Message-----
> From: "Christoph Böhm" [mailto:[email protected]]
> Sent: Wednesday, November 28, 2012 3:10 PM
> To: [email protected]
> Subject: Re: discrepancy du in dfs are fs
>
>
> You're right.
> "du -b" returns the expected value.
>
> Thanks.
> Chris
>
> -------- Original Message --------
>> Date: Wed, 28 Nov 2012 20:17:18 +0530
>> From: Mahesh Balija <[email protected]>
>> To: [email protected]
>> Subject: Re: discrepancy du in dfs are fs
>
>> Hi Chris,
>>
>>          Can you try the following on your local machine,
>>
>>               du -b myfile.txt
>>
>>          and compare this with the output of hadoop fs -du myfile.txt.
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>> On Wed, Nov 28, 2012 at 7:43 PM, <[email protected]> wrote:
>>
>>>
>>> Hi all,
>>>
>>> I wonder why there is a difference between "du" on HDFS and "get" + "du" on
>>> my local machine.
>>>
>>> Here is an example:
>>>
>>> hadoop fs -du myfile.txt
>>>> 81355258
>>>
>>> hadoop fs -get myfile.txt .
>>> du myfile.txt
>>>> 34919
>>>
>>> --- nevertheless ---
>>>
>>> hadoop fs -cat  myfile.txt | wc -l
>>>> 4789943
>>>
>>> cat myfile.txt | wc -l
>>>> 4789943
>>>
>>>
>>> Any idea?
>>> Thanks.
>>> Chris
