[jira] [Updated] (HDDS-3802) Incorrect data returned by reading a FILE_PER_CHUNK block

Sammi Chen (Jira) Mon, 15 Jun 2020 20:30:16 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sammi Chen updated HDDS-3802:
-----------------------------
    Description: 
A summary of S3 large-file download results with the April 22nd master branch code:
1. Download with the AWS S3 SDK: the md5sum differs from the local file.
2. Download with "ozone fs -get o3fs://": the md5sum differs from the local file.
3. Download with "ozone sh key get": the md5sum is the same as the local file.
So the issue appears to be in the read path. The md5sum results of step 1 
and step 2 also differ from each other.

The different behaviors are caused by the different read buffer sizes of the 
interfaces. If the read buffer size equals the chunk size, the data is correct. 
If the read buffer size is smaller than the chunk size, the content returned is 
incorrect, because the datanode-side read ignores the offset in the request and 
uses 0 as the offset to read the data.
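
To illustrate the symptom, here is a minimal, self-contained sketch. It is not Ozone code: the class ChunkReadDemo, its methods, and the 16-byte "chunk" are hypothetical, and the model assumes the simplest form of the behavior described above, where every partial read starts at position 0.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Ozone.
public class ChunkReadDemo {

  // Reads len bytes from the file starting at the given position.
  static byte[] readAt(Path file, long position, int len) throws IOException {
    byte[] buf = new byte[len];
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      raf.seek(position);
      raf.readFully(buf);
    }
    return buf;
  }

  // Reads the whole chunk in pieces of bufSize bytes. When honorOffset is
  // false, every piece is read from position 0 (assumed model of the bug).
  static String readInPieces(Path file, int chunkSize, int bufSize,
      boolean honorOffset) throws IOException {
    StringBuilder out = new StringBuilder();
    for (int requested = 0; requested < chunkSize; requested += bufSize) {
      int len = Math.min(bufSize, chunkSize - requested);
      long position = honorOffset ? requested : 0;
      out.append(new String(readAt(file, position, len),
          StandardCharsets.US_ASCII));
    }
    return out.toString();
  }

  public static void main(String[] args) throws IOException {
    Path chunk = Files.createTempFile("chunk", ".data");
    try {
      Files.write(chunk,
          "ABCDEFGHIJKLMNOP".getBytes(StandardCharsets.US_ASCII));
      // Buffer size equals chunk size: the data is intact.
      System.out.println(readInPieces(chunk, 16, 16, false));
      // Buffer smaller than chunk, offset ignored: first piece repeats.
      System.out.println(readInPieces(chunk, 16, 4, false));
      // Buffer smaller than chunk, offset honored: the data is intact.
      System.out.println(readInPieces(chunk, 16, 4, true));
    } finally {
      Files.deleteIfExists(chunk);
    }
  }
}
```

In this model, a reader whose buffer matches the chunk size never notices the problem, which would be consistent with only some of the download paths producing a different md5sum.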


FilePerChunkStrategy#readChunk 


{code:java}
        // use offset only if file written by old datanode
        long offset;
        if (file.exists() && file.length() == info.getOffset() + len) {
          offset = info.getOffset();
        } else {
          offset = 0;  // this line causes the trouble
        }
{code}
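
The condition in the snippet can be traced with a small standalone sketch. The class OffsetSelectionSketch and the method chooseOffset are hypothetical names, the file.exists() check is omitted, and the sketch assumes that the requested offset is relative to the start of the chunk file and that len is the read buffer size; the real request fields may carry different semantics.

```java
// Hypothetical sketch, not part of Ozone.
public class OffsetSelectionSketch {

  // Mirrors the condition in the snippet above: the requested offset is
  // used only when the file length equals requestedOffset + len.
  static long chooseOffset(long fileLength, long requestedOffset, long len) {
    if (fileLength == requestedOffset + len) {
      return requestedOffset;
    }
    return 0;
  }

  public static void main(String[] args) {
    long fileLength = 16; // assumed chunk file size in bytes
    long len = 4;         // assumed read buffer size in bytes
    for (long requested = 0; requested < fileLength; requested += len) {
      System.out.println("requested=" + requested
          + " chosen=" + chooseOffset(fileLength, requested, len));
    }
  }
}
```

Under these assumptions, the requests at offsets 4 and 8 are both served from offset 0, while only the final piece (12 + 4 == 16) satisfies the condition and keeps its requested offset.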


  was:
FilePerChunkStrategy#readChunk 


{code:java}
// use offset only if file written by old datanode
        long offset;
        if (file.exists() && file.length() == info.getOffset() + len) {
          offset = info.getOffset();
        } else {
          offset = 0;   ---> this line causes the trouble. 
        }
{code}



> Incorrect data returned by reading a FILE_PER_CHUNK block
> ---------------------------------------------------------
>
>                 Key: HDDS-3802
>                 URL: https://issues.apache.org/jira/browse/HDDS-3802
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Critical
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
