[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-7682:
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Charlie, and thanks also to Jing for the 
reviews.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Fix For: 2.7.0

 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch, HDFS-7682.003.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-02-11 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.003.patch

[~jingzhao],

Thanks for the comments. I think the latest patch address them by changing the 
test to a check for the src path being a snapshotted file.

Charles


 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch, HDFS-7682.003.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-02-05 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.002.patch

Rebased.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-27 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.001.patch

Hi [~jingzhao],

Thanks for looking at this.

isLastBlockComplete() covers the case where it's a snapshot path as well as a 
closed non-snapshot path. The file length is correct in both those cases so 
it's ok to use that. In the case of a still-being-written file, then 
isLastBlockComplete() returns false and the code works just same as it does 
today. The particular case that this patch is fixing is that a snapshotted file 
is frozen, so the file length is the limit of what should be checksummed, not 
the block lengths (which include the non-snapshotted portion). I've added more 
assertions in the test to demonstrate this.

In other words, the behavior for non-snapshotted files that are still open (and 
possibly being appended to) is not changed by this patch, only that of 
snapshotted files, for which isLastBlockComplete() is a valid check.

HDFS-5343 took a similar approach.


 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-26 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Status: Patch Available  (was: Open)

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-01-26 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
---
Attachment: HDFS-7682.000.patch

Posting patch for a jenkins run.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 The reason why this happens is because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum of all of the CRCs from the blocks in the 
 file. But, in the case of a snapshotted file, we don't want to include data 
 in the checksum that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)