Yes, and that is why I wonder why this one specific file causes the problem while the other data files of the Hive table do not.
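To make the discrepancy concrete, here is a minimal sketch that compares two byte counts and prints the gap. The helper name `size_gap` is hypothetical, not part of any Hadoop tool. The two numbers are copied from the listing quoted below: the length shown by `hdfs dfs -ls A` and the size of the local copies.

```shell
#!/bin/sh
# size_gap (hypothetical helper): compare a reported length with an
# observed length and print the difference in bytes if they disagree.
size_gap() {
    reported=$1
    actual=$2
    if [ "$reported" -eq "$actual" ]; then
        echo "match"
    else
        echo "mismatch: $((actual - reported)) bytes"
    fi
}

# Length from `hdfs dfs -ls A` versus the size of AFromGet / AFromCat.
size_gap 868003931 883715443
```

On a cluster, the two inputs could presumably be obtained with `hdfs dfs -stat %b A` (the length the NameNode reports) and `hdfs dfs -cat A | wc -c` (the number of bytes actually streamed). I have not run this against the file in question.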

On Tue, Jan 10, 2017 at 3:42 AM, Ravi Prakash <ravihad...@gmail.com> wrote:
> I have not been able to reproduce this:
>
> [raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
> [raviprak@ravi ~]$ cd /tmp
> [raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
> [raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
> [raviprak@ravi tmp]$ md5sum hck
> 8dc8966178cc1bf4eb95a5b31780269c hck
> [raviprak@ravi tmp]$ md5sum HuckleberryFinn.txt
> 8dc8966178cc1bf4eb95a5b31780269c HuckleberryFinn.txt
> [raviprak@ravi tmp]$ hdfs dfs -put hck /
> [raviprak@ravi tmp]$ hdfs dfs -checksum /HuckleberryFinn.txt
> /HuckleberryFinn.txt MD5-of-0MD5-of-512CRC32C
> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
> [raviprak@ravi tmp]$ hdfs dfs -checksum /hck
> /hck MD5-of-0MD5-of-512CRC32C
> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
>
> This is on trunk.
>
> On Sun, Jan 8, 2017 at 6:52 PM, Mungeol Heo <mungeol....@gmail.com> wrote:
>>
>> "^A" is used as delimiter in the file.
>> However, I don't think this is the reason causing the problem, because
>> there are files also using "^A" as delimiter but with no problem.
>> BTW, the reason using "^A" as delimiter is these files are hive data.
>>
>> On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <ravihad...@gmail.com> wrote:
>> > Is there a carriage return / new line / some other whitespace which
>> > `cat` may be appending?
>> >
>> > On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <mungeol....@gmail.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> Suppose, I name the HDFS file which cause the problem as A.
>> >>
>> >> hdfs dfs -ls A
>> >> -rw-r--r-- 3 web_admin hdfs 868003931 2017-01-04 09:05 A
>> >>
>> >> hdfs dfs -get A AFromGet
>> >> hdfs dfs -cat A > AFromCat
>> >>
>> >> ls -l
>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan 5 18:32 AFromGet
>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan 5 18:32 AFromCat
>> >>
>> >> hdfs dfs -put AFromGet
>> >>
>> >> diff <(hdfs dfs -cat A) <(hdfs dfs -cat AFromGet)
>> >> (no output, which means the contents of two files are same. At least,
>> >> after "cat")
>> >>
>> >> hdfs dfs -checksum A
>> >> A MD5-of-262144MD5-of-512CRC32C
>> >> 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
>> >>
>> >> hdfs dfs -checksum AFromGet
>> >> AFromGet MD5-of-262144MD5-of-512CRC32C
>> >> 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
>> >>
>> >> As I listed some results above. I wonder why is the size of the file
>> >> changed.
>> >> Any help will be GREAT!
>> >>
>> >> Thank you.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>> >> For additional commands, e-mail: user-h...@hadoop.apache.org