It's quite possible that it's normal, considering that MySQL uses additional space for indexes, table definitions, etc.
You should be able to validate this fairly easily by doing a mysqldump on the data and comparing the size of the dump to what is stored in HDFS. Those two numbers should be roughly in the same ballpark.

-Michael

On Nov 29, 2012, at 10:19 AM, "Kartashov, Andy" <[email protected]> wrote:

> I also see a discrepancy when Sqoop'ing data from MySQL. Both MySQL
> "select count(*) from .." and sqoop eval --query "select count(*) .."
> return the same number of rows. But after importing the data into HDFS,
> hadoop fs -du shows the imported data at roughly 1/2 the size of the
> actual table in the MySQL DB. Is that normal?
>
> Cheers.
>
>
> -----Original Message-----
> From: "Christoph Böhm" [mailto:[email protected]]
> Sent: Wednesday, November 28, 2012 3:10 PM
> To: [email protected]
> Subject: Re: discrepancy du in dfs are fs
>
>
> You're right.
> "du -b" returns the expected value.
>
> Thanks.
> Chris
>
> -------- Original Message --------
>> Date: Wed, 28 Nov 2012 20:17:18 +0530
>> From: Mahesh Balija <[email protected]>
>> To: [email protected]
>> Subject: Re: discrepancy du in dfs are fs
>
>> Hi Chris,
>>
>> Can you try the following on your local machine:
>>
>> du -b myfile.txt
>>
>> and compare this with the hadoop fs -du myfile.txt.
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>> On Wed, Nov 28, 2012 at 7:43 PM, <[email protected]> wrote:
>>
>>>
>>> Hi all,
>>>
>>> I wonder why there is a difference between "du" on HDFS and "get" + "du"
>>> on my local machine.
>>>
>>> Here is an example:
>>>
>>> hadoop fs -du myfile.txt
>>>> 81355258
>>>
>>> hadoop fs -get myfile.txt .
>>> du myfile.txt
>>>> 34919
>>>
>>> --- nevertheless ---
>>>
>>> hadoop fs -cat myfile.txt | wc -l
>>>> 4789943
>>>
>>> cat myfile.txt | wc -l
>>>> 4789943
>>>
>>>
>>> Any idea?
>>> Thanks.
>>> Chris
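
For anyone finding this thread later: the "du" vs. "du -b" difference discussed above comes down to allocated disk space versus apparent file size. With GNU du, plain "du" reports the space allocated on disk (in 1024-byte blocks by default), while "du -b" reports the apparent size in bytes, which is the kind of number hadoop fs -du prints. Below is a minimal Python sketch of the two measurements. The helper names (apparent_size, disk_usage, make_sample) are made up for illustration, and the st_blocks field is assumed to be available, which holds on Linux and other POSIX systems but not on Windows.

```python
import os
import tempfile


def apparent_size(path):
    """Length of the file content in bytes (what `du -b` reports for a regular file)."""
    return os.stat(path).st_size


def disk_usage(path):
    """Bytes allocated on disk; st_blocks counts 512-byte units on POSIX systems."""
    return os.stat(path).st_blocks * 512


def make_sample(n_bytes):
    """Hypothetical helper: write n_bytes of data to a temp file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * n_bytes)
    return f.name


if __name__ == "__main__":
    path = make_sample(5000)
    print("apparent size (like du -b):", apparent_size(path))
    print("allocated on disk (basis of plain du):", disk_usage(path))
    os.remove(path)
```

The two numbers can differ in either direction depending on the filesystem (block rounding, sparse files, transparent compression), so only the apparent size is a fair comparison against the byte count from hadoop fs -du.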
