Hi Tom,

My hint: your block size should be a multiple of the CRC checksum chunk size. Check your dfs.block.size property, convert it to bytes, then divide it by the checksum size that is set. Usually the dfs.bytes-per-checksum property holds this value, or you can read the checksum size from the error message you are getting.
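As a rough sketch, the divisibility check could look like this. The values below are hypothetical examples, not your cluster's actual settings; on a live cluster you would read them from your configuration (e.g. with `hdfs getconf -confKey`):

```shell
# Hypothetical example values -- substitute the ones from your own config.
BLOCK_SIZE=$((128 * 1024 * 1024))   # dfs.block.size converted to bytes (128 MB here)
BYTES_PER_CHECKSUM=512              # dfs.bytes-per-checksum (512 is the usual default)

# The block size must divide evenly by the checksum chunk size.
if [ $((BLOCK_SIZE % BYTES_PER_CHECKSUM)) -eq 0 ]; then
  echo "OK: block size is a multiple of the checksum chunk size"
else
  echo "Mismatch: set dfs.block.size to a multiple of $BYTES_PER_CHECKSUM"
fi
```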
HDFS uses this checksum value to make sure the data doesn't get corrupted in transfer (due to loss of bytes, etc.). I hope setting your block size to a multiple of your CRC checksum size will solve your problem.

Regards,
Prav

On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <tombrow...@gmail.com> wrote:

> What is the right way to use the "-crc" option with hadoop dfs
> -copyToLocal?
>
> Is this the wrong list?
>
> --Tom
>
>
> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <tombrow...@gmail.com> wrote:
>
>> I am archiving a large amount of data out of my HDFS file system to a
>> separate shared storage solution (there is not much HDFS space left in
>> my cluster, and upgrading it is not an option right now).
>>
>> I understand that HDFS internally manages checksums and won't succeed
>> if the data doesn't match the CRC, so I'm not worried about corruption
>> when reading from HDFS.
>>
>> However, I want to store the HDFS CRC calculations alongside the data
>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>> <hdfs-source> <local-dest>" command would work, but it always gives me
>> the error "-crc option is not valid when source file system does not
>> have crc files".
>>
>> Can someone explain what exactly that option does, and when (if ever)
>> it should be used?
>>
>> Thanks in advance!
>>
>> --Tom