[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-08-08 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118929#comment-16118929
 ] 

Ashu Pachauri commented on HBASE-18399:
---

[~ram_krish] Yes, this case is also covered in the tests for HBASE-18398. The 
patch for that issue ensures that compacted files for a store are not moved to 
the archive unless they are written to the region manifest on the file system. 
This ensures that the SnapshotFileCache (and in turn the HFileCleaner) never has 
a stale view of the files it is considering for deletion.
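A minimal sketch of the ordering this comment describes. It assumes a single hypothetical per-store guard shared by the snapshot and by the archival of compacted files. `ArchiveAfterManifest`, `addStoreFileToManifest` and `archiveCompactedFile` are illustrative names, not HBase APIs, and this is not the HBASE-18398 patch itself. The snapshot holds the read side while it visits a store file and writes its manifest entry. Archival takes the write side, so a compacted file can only reach the archive after any in-progress manifest write has finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of "don't archive until the manifest has the file".
// Not HBase code and not the actual HBASE-18398 patch.
class ArchiveAfterManifest {

  // Read side: snapshots writing manifest entries. Write side: archival of compacted files.
  private final ReadWriteLock guard = new ReentrantReadWriteLock();
  // Records the order in which the two operations complete.
  private final List<String> events = Collections.synchronizedList(new ArrayList<>());

  // Snapshot side: visit the file and write its manifest entry under one guard hold.
  void addStoreFileToManifest(String file, CountDownLatch holdingGuard) {
    guard.readLock().lock();
    try {
      holdingGuard.countDown();        // the compaction side may start now
      events.add("manifest:" + file);  // stand-in for writing the manifest entry
    } finally {
      guard.readLock().unlock();
    }
  }

  // Compaction side: blocks while any snapshot holds the read side.
  void archiveCompactedFile(String file) {
    guard.writeLock().lock();
    try {
      events.add("archive:" + file);   // stand-in for the move to the archive directory
    } finally {
      guard.writeLock().unlock();
    }
  }

  // Runs both sides concurrently and returns the observed order of events.
  static List<String> run() {
    ArchiveAfterManifest store = new ArchiveAfterManifest();
    CountDownLatch holdingGuard = new CountDownLatch(1);
    Thread snapshot = new Thread(() -> store.addStoreFileToManifest("store_file_A", holdingGuard));
    Thread archiver = new Thread(() -> {
      try {
        holdingGuard.await();          // the snapshot already holds the read lock
        store.archiveCompactedFile("store_file_A");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    snapshot.start();
    archiver.start();
    try {
      snapshot.join();
      archiver.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(store.events);
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [manifest:store_file_A, archive:store_file_A]
  }
}
```

The archiver only starts waiting after the snapshot holds the read side, so the manifest entry is recorded first regardless of thread scheduling. The design point is that the guard has to span both the visit and the manifest write. Holding it only around the write would reopen the window between the two.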

> Files in a snapshot can go missing even after the snapshot is taken 
> successfully
> 
>
> Key: HBASE-18399
> URL: https://issues.apache.org/jira/browse/HBASE-18399
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver, Scanners
>Reporter: Ashu Pachauri
> Fix For: 1.3.2
>
>
> Files missing after the snapshot is taken (only applicable when the TTL for 
> the TimeToLiveHFileCleaner is small, like the default 5 mins)
> * SnapshotManifest#addRegion visits store_file_A, but has yet to write it 
> to the manifest.
> * store_file_A is marked as compacted away, and HFileArchiver moves the 
> file to the archive.
> * HFileCleaner comes in and sees store_file_A in the archive. It adds the 
> file to the list of files that might need to be cleaned up.
> * HFileCleaner's SnapshotHFileCleaner plugin kicks in.
> * SnapshotFileCache#getUnreferencedFiles also says that store_file_A is 
> unreferenced and should be cleaned up (it has not yet been written to the 
> manifest).
> * SnapshotHFileCleaner is still going through the rest of the files in the 
> archive.
> * The store_file_A reference is created and written to the snapshot manifest.
> * Snapshot verification runs and sees that store_file_A is present in the 
> archive, and thus the verification passes.
> * Now the SnapshotHFileCleaner finishes and the TimeToLiveHFileCleaner is 
> triggered. If the TTL has passed since store_file_A was moved to the archive 
> (SnapshotHFileCleaner could easily take several minutes to go through the rest 
> of the files), the TimeToLiveHFileCleaner also marks the file as deletable.
> * Since all cleaner plugins marked the file as deletable, store_file_A is 
> deleted.
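The interleaving above can be replayed deterministically in a simplified, single-threaded model. Everything here is hypothetical: `CleanerChainRace`, `CleanerPlugin`, `snapshotCleaner` and `ttlCleaner` only mirror the roles of SnapshotHFileCleaner and TimeToLiveHFileCleaner as the steps describe them. The model assumes that each plugin's verdict is computed once and is not re-checked before the final delete.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the steps in the report. None of these
// types are HBase classes; they only mirror the roles of the cleaner plugins.
class CleanerChainRace {

  // A file in the archive directory and the time (ms) at which it was archived.
  static final class ArchivedFile {
    final String name;
    final long archivedAtMillis;

    ArchivedFile(String name, long archivedAtMillis) {
      this.name = name;
      this.archivedAtMillis = archivedAtMillis;
    }
  }

  // One cleaner plugin: true means "deletable as far as I am concerned".
  interface CleanerPlugin {
    boolean isDeletable(ArchivedFile file, long nowMillis);
  }

  // Snapshot-style check: deletable if no manifest references the file at check time.
  static CleanerPlugin snapshotCleaner(Set<String> manifest) {
    return (file, now) -> !manifest.contains(file.name);
  }

  // TTL-style check: deletable once the file has sat in the archive for ttlMillis.
  static CleanerPlugin ttlCleaner(long ttlMillis) {
    return (file, now) -> now - file.archivedAtMillis >= ttlMillis;
  }

  // Replays the steps; returns true if a file the manifest references gets deleted.
  static boolean replay() {
    long ttlMillis = 5 * 60_000L;                 // 5 minutes, as in the report
    Set<String> manifest = new HashSet<>();
    Set<String> archive = new HashSet<>();
    List<Boolean> verdicts = new ArrayList<>();

    // store_file_A is visited but not yet written, then compacted away and archived at t=0.
    ArchivedFile fileA = new ArchivedFile("store_file_A", 0L);
    archive.add(fileA.name);

    // The snapshot plugin evaluates the file at t=1s: not in the manifest yet.
    verdicts.add(snapshotCleaner(manifest).isDeletable(fileA, 1_000L));

    // The reference is written and verification finds the file in the archive.
    manifest.add(fileA.name);
    boolean verificationPassed = archive.contains(fileA.name);

    // Minutes later the TTL plugin runs; the snapshot verdict is not re-evaluated.
    verdicts.add(ttlCleaner(ttlMillis).isDeletable(fileA, 6 * 60_000L));

    // Every plugin said "deletable", so the file is removed.
    if (verdicts.stream().allMatch(Boolean::booleanValue)) {
      archive.remove(fileA.name);
    }
    return verificationPassed && manifest.contains(fileA.name) && !archive.contains(fileA.name);
  }

  public static void main(String[] args) {
    System.out.println("snapshot references a deleted file: " + replay());
  }
}
```

The crux is that the snapshot verdict was computed against a manifest that no longer reflects reality by the time the TTL verdict arrives, so "all plugins agree" is satisfied by stale information.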



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-08-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118704#comment-16118704
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


With the patch in HBASE-18398, I hope this issue will also be solved? 



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099817#comment-16099817
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


Thanks for the comment. Looking forward to the patch.



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-24 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098961#comment-16098961
 ] 

Ashu Pachauri commented on HBASE-18399:
---

[~ram_krish] I am sorry, I did not see your previous comment. I actually started 
working on HBASE-18398, which after some investigation seems to have the same 
underlying problem as this issue: the snapshot operation is done under a 
region-level read lock, while the active store file list is updated under the 
store-level lock. This means that, as you suggested, it could very well happen 
prior to 1.3, and I don't have a concrete explanation as to why it did not 
happen (or was not noticeable). One reason could be that in branch-1.3 the 
archival is done asynchronously by the HFileArchiver, whereas prior to 
branch-1.3 it was done on the compaction path.

I am working on a solution to HBASE-18398 which, I believe, should fix this 
too.
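To illustrate why a region-level read lock on the snapshot side does not exclude a store-level update, here is a sketch with two independent `java.util.concurrent` read-write locks. The lock names, and the variant in which the compaction side uses the region lock instead, are illustrative assumptions only, not a description of HBase's internals or of the eventual fix.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the lock mismatch: a snapshot holding a region-level
// read lock versus a store file list update guarded by a different lock.
class LockMismatch {

  // Returns true if the "compaction" thread managed to take its lock, and could
  // therefore change the store file list, while the snapshot held the region read lock.
  static boolean storeUpdateOverlapsSnapshot(boolean compactionUsesRegionLock) {
    ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();
    ReentrantReadWriteLock compactionLock = compactionUsesRegionLock ? regionLock : storeLock;
    AtomicBoolean overlapped = new AtomicBoolean(false);

    regionLock.readLock().lock();               // snapshot in progress
    try {
      Thread compaction = new Thread(() -> {
        // tryLock() does not wait: it succeeds only if the write lock is free now.
        if (compactionLock.writeLock().tryLock()) {
          try {
            overlapped.set(true);               // the store file list could change here
          } finally {
            compactionLock.writeLock().unlock();
          }
        }
      });
      compaction.start();
      compaction.join();                        // still inside the snapshot's read lock
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      regionLock.readLock().unlock();
    }
    return overlapped.get();
  }

  public static void main(String[] args) {
    System.out.println("separate store lock overlaps: " + storeUpdateOverlapsSnapshot(false));
    System.out.println("shared region lock overlaps:  " + storeUpdateOverlapsSnapshot(true));
  }
}
```

With `false`, the store lock is free, so the store file list can change in the middle of the snapshot. With `true`, the write side's `tryLock()` fails while another thread holds the read side, which shows the kind of exclusion that a lock shared by both operations would provide.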



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098206#comment-16098206
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


Reading branch-1.0, which does not have store file accounting, I still feel 
this could have occurred even there, because on compaction the compacted files 
were removed from the StoreFileManager's store file list and moved to the 
archive dir.
Once the snapshot manifest has its list of store files, there is definitely a 
chance that the actual file is moved to archive before the manifest is updated, 
and then the above steps could happen. However, to prove it, let me write a test 
case, and then we can check for the fix here.
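A test along these lines could force the window deterministically with latches instead of relying on timing. This is a hypothetical, self-contained model: plain sets stand in for the store and archive directories, and `ListThenWriteWindow` is not an HBase class. The snapshot thread takes its file list, the compaction thread then archives the file, and only afterwards does the snapshot write its manifest from the stale list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical, test-style reproduction of the list-then-write window.
// Plain sets stand in for the store and archive directories; not HBase code.
class ListThenWriteWindow {

  // Returns true if the manifest ended up referencing a file that was already
  // archived at the moment its entry was written.
  static boolean fileArchivedBeforeManifestWrite() {
    Set<String> storeDir = ConcurrentHashMap.newKeySet();
    Set<String> archiveDir = ConcurrentHashMap.newKeySet();
    List<String> manifest = new ArrayList<>();        // written only by the snapshot thread
    boolean[] archivedAtWrite = new boolean[1];       // written only by the snapshot thread
    storeDir.add("store_file_A");

    CountDownLatch listed = new CountDownLatch(1);    // snapshot has taken its file list
    CountDownLatch archived = new CountDownLatch(1);  // compaction has archived the file

    Thread snapshot = new Thread(() -> {
      List<String> files = new ArrayList<>(storeDir); // 1. take the list of store files
      listed.countDown();
      try {
        archived.await();                             // 2. let the compaction win the race
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      for (String f : files) {                        // 3. write the manifest from the stale list
        archivedAtWrite[0] |= archiveDir.contains(f);
        manifest.add(f);
      }
    });

    Thread compaction = new Thread(() -> {
      try {
        listed.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      storeDir.remove("store_file_A");                // compacted away
      archiveDir.add("store_file_A");                 // moved to the archive directory
      archived.countDown();
    });

    snapshot.start();
    compaction.start();
    try {
      snapshot.join();
      compaction.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return archivedAtWrite[0] && manifest.contains("store_file_A");
  }

  public static void main(String[] args) {
    System.out.println("archived before manifest write: " + fileArchivedBeforeManifestWrite());
  }
}
```

A `true` result means the manifest entry was written for a file that was already sitting in the archive, which is the precondition for the cleaner race in the issue description.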



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097950#comment-16097950
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


Since [~ashu210890] has not replied, I will assign this to myself and provide a 
patch. Let me know if you have already started working on this.



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-19 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094194#comment-16094194
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


[~ashu210890]
If you are not working on this, I can look into it.



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-19 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092823#comment-16092823
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


bq. No, because the sole reason why a store file is moved to archive while in 
the middle of a snapshot operation is the lack of proper locking between the 
snapshot operation and finalizing the compaction results.
:)
I should have been more explicit. My question was whether, before store file 
accounting, there was any synchronization between the snapshot and the archival 
once a compaction finished and its files were moved to the archive. I am not 
sure about this, and hence I asked.



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-19 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092738#comment-16092738
 ] 

Ashu Pachauri commented on HBASE-18399:
---

bq. Ya so the file is in archive. The remaining steps are irrespective of the 
store file accounting feature right?
Yes and no. Yes, because the remaining steps have nothing to do with store file 
accounting. No, because the sole reason why a store file is moved to archive 
while in the middle of a snapshot operation is the lack of proper locking 
between the snapshot operation and finalizing the compaction results.



[jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully

2017-07-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091390#comment-16091390
 ] 

ramkrishna.s.vasudevan commented on HBASE-18399:


bq. store_file_A is marked as compacted away and HFileArchiver moves the file to 
archive.
Ya, so the file is in archive. The remaining steps are irrespective of the store 
file accounting feature, right? Just asking.
