Hi Tom,

My hint is that your block size should be a multiple of the checksum chunk
size. Check your dfs.block.size property, convert it to bytes, and divide it
by the checksum chunk size that is set; the dfs.bytes-per-checksum property
usually holds that value, or you can read it from the error message you are
getting.
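
For example, assuming you are on Hadoop 2.x (where dfs.blocksize is the
newer name for dfs.block.size, and the property names may differ slightly on
older releases), you can read both values with hdfs getconf and do the
division by hand:

    hdfs getconf -confKey dfs.blocksize           # e.g. 134217728 (128 MB)
    hdfs getconf -confKey dfs.bytes-per-checksum  # e.g. 512

    134217728 / 512 = 262144 with no remainder, so that block size is fine.

If you override dfs.block.size in hdfs-site.xml, pick a value that divides
evenly by dfs.bytes-per-checksum.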

HDFS uses this checksum to make sure the data doesn't get corrupted in
transfer (due to lost bytes, etc.).

I hope that setting your block size to a multiple of the checksum chunk size
solves your problem.

Regards
Prav


On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <tombrow...@gmail.com> wrote:

> What is the right way to use the "-crc" option with hadoop dfs
> -copyToLocal?
>
> Is this the wrong list?
>
> --Tom
>
>
> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <tombrow...@gmail.com> wrote:
>
>> I am archiving a large amount of data out of my HDFS file system to a
>> separate shared storage solution (There is not much HDFS space left in my
>> cluster, and upgrading it is not an option right now).
>>
>> I understand that HDFS internally manages checksums and won't succeed if
>> the data doesn't match the CRC, so I'm not worried about corruption when
>> reading from HDFS.
>>
>> However, I want to store the HDFS crc calculations alongside the data
>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>> <hdfs-source> <local-dest>" command would work, but it always gives me the
>> error "-crc option is not valid when source file system does not have crc
>> files"
>>
>> Can someone explain what exactly that option does, and when (if ever) it
>> should be used?
>>
>> Thanks in advance!
>>
>> --Tom
>>
>
>
