Thanks, Gera, for creating the ticket on Jira. I am a bit new to this patch system and could not find a suitable command on the ticket. Could you point me to a command or documentation that I can use to test the checksum after applying the patch on my cluster?
Thanks and Regards,
Shashi

On Sun, Aug 16, 2015 at 2:13 AM, Gera Shegalov <[email protected]> wrote:

> I filed https://issues.apache.org/jira/browse/HADOOP-12326 to do that;
> you can take a look at the patch. Your understanding is correct: MD5 of
> the CRCs in each block, then MD5 of those block MD5s.
>
> On Sun, Aug 9, 2015 at 7:35 AM Shashi Vishwakarma <
> [email protected]> wrote:
>
>> Hi Gera,
>>
>> Thanks for your input. I have a fairly large amount of data, and if I go
>> by the -cat option followed by an md5sum calculation, it becomes a
>> time-consuming process.
>>
>> I understand from the code that the Hadoop checksum is nothing but the
>> MD5 of the MD5 of CRC32C values. I would be curious to know how I could
>> manually create the checksum that Hadoop computes internally.
>>
>> Is there any document or link available that explains how this checksum
>> calculation works behind the scenes?
>>
>> Thanks
>> Shashi
>>
>> On Sat, Aug 8, 2015 at 8:00 AM, Gera Shegalov <[email protected]> wrote:
>>
>>> The fs -checksum output has more info, like bytes per CRC and CRCs per
>>> block. See e.g.:
>>> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/MD5MD5CRC32FileChecksum.java
>>>
>>> To avoid dealing with different formatting or byte order, you could use
>>> md5sum for the remote file as well, if the file is reasonably small:
>>>
>>> hadoop fs -cat /abc.txt | md5sum
>>>
>>> On Fri, Aug 7, 2015 at 3:35 AM Shashi Vishwakarma <
>>> [email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a small confusion regarding checksum verification. Let's say I
>>>> have a file abc.txt and I transferred this file to HDFS. How do I
>>>> ensure data integrity?
>>>>
>>>> I followed the steps below to check that the file was transferred
>>>> correctly.
>>>>
>>>> *On Local File System:*
>>>>
>>>> md5sum abc.txt
>>>>
>>>> 276fb620d097728ba1983928935d6121 TestFile
>>>>
>>>> *On Hadoop Cluster:*
>>>>
>>>> hadoop fs -checksum /abc.txt
>>>>
>>>> /abc.txt MD5-of-0MD5-of-512CRC32C
>>>> 000002000000000000000000911156a9cf0d906c56db7c8141320df0
>>>>
>>>> Both outputs look different to me. Let me know if I am doing anything
>>>> wrong.
>>>>
>>>> How do I verify that my file was transferred properly into HDFS?
>>>>
>>>> Thanks
>>>> Shashi
>>>>
>>>
>>
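Gera's description above (a CRC per chunk, an MD5 over each block's CRCs, then an MD5 over those block MD5s) can be sketched in Python. This is an illustration only, not the exact HDFS implementation: it uses `zlib.crc32` (plain CRC32) in place of HDFS's default CRC32C, which is not in the Python standard library, and the chunk/block sizes are toy values rather than cluster defaults (a real cluster typically uses 512 bytes per CRC and a 128 MB block).

```python
import hashlib
import struct
import zlib

def md5_of_md5_of_crc(data: bytes, bytes_per_crc: int = 512,
                      crcs_per_block: int = 128) -> str:
    """Sketch of the MD5-of-MD5-of-CRC scheme behind `hadoop fs -checksum`.

    Assumptions: plain CRC32 stands in for HDFS's CRC32C, each CRC is
    stored as a 4-byte big-endian integer, and `crcs_per_block` is an
    illustrative value, not a cluster default.
    """
    # 1. One CRC per bytes_per_crc chunk, packed as a 4-byte big-endian int.
    crcs = [struct.pack(">I", zlib.crc32(data[i:i + bytes_per_crc]) & 0xFFFFFFFF)
            for i in range(0, len(data), bytes_per_crc)]
    # 2. One MD5 over the concatenated CRCs of each block.
    block_md5s = [
        hashlib.md5(b"".join(crcs[b:b + crcs_per_block])).digest()
        for b in range(0, len(crcs), crcs_per_block)
    ]
    # 3. Final MD5 over the concatenated per-block MD5s.
    return hashlib.md5(b"".join(block_md5s)).hexdigest()

print(md5_of_md5_of_crc(b"x" * 2048))
```

Because the checksum depends on the CRC polynomial and these layout parameters, the hex digest printed here will not match `hadoop fs -checksum` output; matching it would require a CRC32C implementation and the cluster's actual bytes-per-CRC and block-size settings.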
