[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566990#comment-15566990
 ] 

Hudson commented on HBASE-16788:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1769 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1769/])
HBASE-16788 addendum Account for HStore archiveLock in heap size (garyh: rev 
bc7e034052dceddd21a8d45ea7b9b131f46a01f9)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Race in compacted file deletion between HStore close() and 
> closeAndArchiveCompactedFiles()
> --
>
> Key: HBASE-16788
> URL: https://issues.apache.org/jira/browse/HBASE-16788
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.3.0
> Reporter: Gary Helmling
> Assignee: Gary Helmling
> Priority: Blocker
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: 16788-suggest.v2, HBASE-16788-addendum.patch, 
> HBASE-16788.001.patch, HBASE-16788.002.patch, HBASE-16788_1.patch
>
>
> HBASE-13082 changed the way that compacted files are archived from being done 
> inline on compaction completion to an async cleanup by the 
> CompactedHFilesDischarger chore.  It looks like the changes to HStore to 
> support this introduced a race condition in the compacted HFile archiving.
> In the following sequence, we can wind up with two separate threads trying to 
> archive the same HFiles, causing a regionserver abort:
> # compaction completes normally and the compacted files are added to 
> {{compactedfiles}} in HStore's DefaultStoreFileManager
> # *threadA*: CompactedHFilesDischargeHandler runs in a RS executor service, 
> calling closeAndArchiveCompactedFiles()
> ## obtains HStore readlock
> ## gets a copy of compactedfiles
> ## releases readlock
> # *threadB*: calls HStore.close() as part of region close
> ## obtains HStore writelock
> ## calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of 
> same compactedfiles
> # *threadA*: calls HStore.removeCompactedfiles(compactedfiles)
> ## archives files in {{compactedfiles}} in HRegionFileSystem.removeStoreFiles()
> ## calls HStore.clearCompactedFiles()
> ## waits on write lock
> # *threadB*: continues with close()
> ## calls removeCompactedfiles(compactedfiles)
> ## calls HRegionFileSystem.removeStoreFiles() -> 
> HFileArchiver.archiveStoreFiles()
> ## receives FileNotFoundException because the files have already been 
> archived by threadA
> ## throws IOException
> # RS aborts
> I think the combination of fetching the compactedfiles list and removing the 
> files needs to be covered by locking.  Options I see are:
> * Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of 
> readlock and move the call to removeCompactedfiles() inside the lock.  This 
> means the read operations will be blocked while the files are being archived, 
> which is bad.
> * Synchronize closeAndArchiveCompactedFiles() and modify close() to call it 
> instead of calling removeCompactedfiles() directly
> * Add a separate lock for compacted files removal and use in 
> closeAndArchiveCompactedFiles() and close()
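
The interleaving in the numbered steps above can be replayed deterministically on a single thread. The following is only a minimal sketch under stated assumptions: the class `CompactedFileRaceReplay` and its methods `snapshot()`, `archive()` and `replay()` are hypothetical stand-ins, not HBase APIs, and the real locking is reduced to comments. It shows why two callers holding copies of the same compactedfiles list end in a FileNotFoundException for whichever one archives second.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Single-threaded replay of the reported interleaving. All names are illustrative. */
class CompactedFileRaceReplay {
  /** Files still present in the store directory. */
  private final Set<String> onDisk = new HashSet<>();
  /** Stand-in for the compactedfiles list kept by the store file manager. */
  private final List<String> compactedFiles = new ArrayList<>();

  CompactedFileRaceReplay(List<String> files) {
    onDisk.addAll(files);
    compactedFiles.addAll(files);
  }

  /** Copy of the list, as each thread would take it while holding the store lock. */
  List<String> snapshot() {
    return new ArrayList<>(compactedFiles);
  }

  /** Moves files out of the store directory; fails if a file is already gone. */
  void archive(List<String> files) throws FileNotFoundException {
    for (String f : files) {
      if (!onDisk.remove(f)) {
        throw new FileNotFoundException(f + " was already archived");
      }
    }
    compactedFiles.removeAll(files);
  }

  /** Returns true if the second archiver hits FileNotFoundException. */
  static boolean replay() {
    CompactedFileRaceReplay store =
        new CompactedFileRaceReplay(Arrays.asList("hfile-1", "hfile-2"));
    List<String> seenByA = store.snapshot(); // threadA: copies the list, releases readlock
    List<String> seenByB = store.snapshot(); // threadB: close() copies the same list
    try {
      store.archive(seenByA);                // threadA archives first
      store.archive(seenByB);                // threadB repeats the same work
      return false;
    } catch (FileNotFoundException e) {
      return true;                           // in the report this leads to the RS abort
    }
  }

  public static void main(String[] args) {
    System.out.println("second archiver failed: " + replay());
  }
}
```

Because neither copy is taken under a lock that also covers the archiving step, the second caller has no way to notice that its snapshot is stale.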



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566979#comment-15566979
 ] 

Hudson commented on HBASE-16788:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #36 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/36/])
HBASE-16788 addendum Account for HStore archiveLock in heap size (garyh: rev 
cd3afa5a0d85751936c54fa2398b63ff2efa128c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566768#comment-15566768
 ] 

Hudson commented on HBASE-16788:


SUCCESS: Integrated in Jenkins build HBase-1.4 #460 (See 
[https://builds.apache.org/job/HBase-1.4/460/])
HBASE-16788 addendum Account for HStore archiveLock in heap size (garyh: rev 
f13a21696f2bbd4f572eb35c15282835998d4b34)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566546#comment-15566546
 ] 

Hudson commented on HBASE-16788:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #41 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/41/])
HBASE-16788 Guard HFile archiving under a separate lock (garyh: rev 
8eea3a5777a25907dcf6486bfeafd8482a072b80)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionArchiveConcurrentClose.java
HBASE-16788 addendum Account for HStore archiveLock in heap size (garyh: rev 
cd3afa5a0d85751936c54fa2398b63ff2efa128c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-11 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564628#comment-15564628
 ] 

ramkrishna.s.vasudevan commented on HBASE-16788:


[~tedyu]
Sorry for not being able to look at the patch on Friday. It was a long 
weekend here and I could not check the patch fully against the code.
Gary's patch is simple, and blocking close() is fine because one of the 
threads has to do that work anyway.
[~ghelmling]
Thanks for the patch and for closing this important bug. Let me raise another 
JIRA to see if we can handle archiving failures better.



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564430#comment-15564430
 ] 

Hudson commented on HBASE-16788:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1764 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1764/])
HBASE-16788 Guard HFile archiving under a separate lock (garyh: rev 
7493e79f15e0a1217dc50ca4431d6ded07df479f)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionArchiveConcurrentClose.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564128#comment-15564128
 ] 

Hudson commented on HBASE-16788:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #35 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/35/])
HBASE-16788 Guard HFile archiving under a separate lock (garyh: rev 
8eea3a5777a25907dcf6486bfeafd8482a072b80)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionArchiveConcurrentClose.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563638#comment-15563638
 ] 

Mikhail Antonov commented on HBASE-16788:

When I first looked at the patch, my only concern was whether it makes RS 
stopping significantly slower. Based on the notes and comments above, it 
looks like it does not.

+1, LGTM.



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563579#comment-15563579
 ] 

Ted Yu commented on HBASE-16788:


Since this is a blocker, I am fine with Gary's patch going in.

In my opinion, not introducing an additional lock would facilitate future 
development. For example, if a feature as complex as HBASE-13082 comes along, 
its developer would have more freedom.

Thanks



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563563#comment-15563563
 ] 

Gary Helmling commented on HBASE-16788:

{quote}
If I understand the code correctly, it would take longer for the close() to 
complete when concurrent CompactedHFilesDischargeHandler operation gets the 
archiveLock first.
If this is not a concern, I am fine with your patch.
{quote}

It's true that close() may be blocked by the discharge chore thread if it is 
holding the archiveLock.  But whether the work for archiving compacted HFiles 
is being done by the discharge thread or by close(), the same work needs to be 
done before close() can complete.  So I don't expect this to appreciably change 
the time taken by close().  It just means that if close() is blocked by the 
discharger, it should be able to skip over the archive step once it gets to run.
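
To illustrate the point about close() skipping the archive step, here is a rough sketch of the separate-lock approach. It is not the committed patch: `ArchiveLockSketch` is a hypothetical stand-in for HStore, the method bodies are assumptions, and archiving is reduced to a counter. The only names taken from this issue are `archiveLock`, `close()` and `closeAndArchiveCompactedFiles()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified stand-in for HStore; method bodies are assumptions, not the real patch. */
class ArchiveLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  /** Covers both fetching the compacted file list and archiving it. */
  private final ReentrantLock archiveLock = new ReentrantLock();
  private final List<String> compactedFiles = new ArrayList<>();
  private int archivedCount = 0; // guarded by archiveLock

  void addCompactedFile(String name) {
    lock.writeLock().lock();
    try {
      compactedFiles.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns how many files this call archived; 0 means there was nothing left to do. */
  int closeAndArchiveCompactedFiles() {
    archiveLock.lock(); // always taken before the store lock, so lock order is consistent
    try {
      List<String> copy;
      lock.readLock().lock();
      try {
        copy = new ArrayList<>(compactedFiles);
      } finally {
        lock.readLock().unlock();
      }
      if (copy.isEmpty()) {
        return 0; // another caller already archived everything
      }
      archivedCount += copy.size(); // stand-in for moving the files to the archive
      lock.writeLock().lock();
      try {
        compactedFiles.removeAll(copy);
      } finally {
        lock.writeLock().unlock();
      }
      return copy.size();
    } finally {
      archiveLock.unlock();
    }
  }

  /** Archives whatever is left, then takes the write lock for the rest of close. */
  int close() {
    int archivedHere = closeAndArchiveCompactedFiles();
    lock.writeLock().lock();
    try {
      // the remaining close work (releasing readers and so on) would go here
    } finally {
      lock.writeLock().unlock();
    }
    return archivedHere;
  }

  int archivedCount() {
    archiveLock.lock();
    try {
      return archivedCount;
    } finally {
      archiveLock.unlock();
    }
  }
}
```

In this sketch, whichever caller takes archiveLock first does the archiving. If the discharger wins, close() waits for it, then sees an empty list and returns 0 from the archive step, so the total work is unchanged and reads are never blocked while files are being archived.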

> Race in compacted file deletion between HStore close() and 
> closeAndArchiveCompactedFiles()
> --
>
> Key: HBASE-16788
> URL: https://issues.apache.org/jira/browse/HBASE-16788
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.3.0
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Blocker
> Attachments: 16788-suggest.v2, HBASE-16788.001.patch, 
> HBASE-16788.002.patch, HBASE-16788_1.patch
>
>
> HBASE-13082 changed the way that compacted files are archived from being done 
> inline on compaction completion to an async cleanup by the 
> CompactedHFilesDischarger chore.  It looks like the changes to HStore to 
> support this introduced a race condition in the compacted HFile archiving.
> In the following sequence, we can wind up with two separate threads trying to 
> archive the same HFiles, causing a regionserver abort:
> # compaction completes normally and the compacted files are added to 
> {{compactedfiles}} in HStore's DefaultStoreFileManager
> # *threadA*: CompactedHFilesDischargeHandler runs in a RS executor service, 
> calling closeAndArchiveCompactedFiles()
> ## obtains HStore readlock
> ## gets a copy of compactedfiles
> ## releases readlock
> # *threadB*: calls HStore.close() as part of region close
> ## obtains HStore writelock
> ## calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of 
> same compactedfiles
> # *threadA*: calls HStore.removeCompactedfiles(compactedfiles)
> ## archives files in {{compactedfiles}} in HRegionFileSystem.removeStoreFiles()
> ## calls HStore.clearCompactedFiles()
> ## waits on write lock
> # *threadB*: continues with close()
> ## calls removeCompactedfiles(compactedfiles)
> ## calls HRegionFileSystem.removeStoreFiles() -> 
> HFileArchiver.archiveStoreFiles()
> ## receives FileNotFoundException because the files have already been 
> archived by threadA
> ## throws IOException
> # RS aborts
> I think the combination of fetching the compactedfiles list and removing the 
> files needs to be covered by locking.  Options I see are:
> * Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of 
> readlock and move the call to removeCompactedfiles() inside the lock.  This 
> means the read operations will be blocked while the files are being archived, 
> which is bad.
> * Synchronize closeAndArchiveCompactedFiles() and modify close() to call it 
> instead of calling removeCompactedfiles() directly
> * Add a separate lock for compacted files removal and use in 
> closeAndArchiveCompactedFiles() and close()
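
The interleaving in the description can be replayed deterministically on one thread. This is only a sketch under loud assumptions: the class, the in-memory "store directory" and the method names are all hypothetical, and it models just the part that matters, namely that both callers hold a copy of the same file list before either one archives.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the race; none of these names are HBase APIs.
class DoubleArchiveRace {
  private final Set<String> storeDir = new HashSet<>();        // files on disk
  private final List<String> compactedFiles = new ArrayList<>(); // pending list

  DoubleArchiveRace(List<String> files) {
    storeDir.addAll(files);
    compactedFiles.addAll(files);
  }

  // Both callers take a copy of the list without a lock spanning the archive.
  List<String> copyCompactedFiles() {
    return new ArrayList<>(compactedFiles);
  }

  // Archiving moves each file out of the store dir; a missing file is an error.
  void archive(List<String> files) throws FileNotFoundException {
    for (String f : files) {
      if (!storeDir.remove(f)) {
        throw new FileNotFoundException(f + " was already archived");
      }
    }
  }

  // Replays the steps from the description; returns true if the second
  // archive attempt fails because the first one already moved the files.
  static boolean replayInterleaving() {
    DoubleArchiveRace store =
        new DoubleArchiveRace(Arrays.asList("hfile-1", "hfile-2"));
    List<String> copyA = store.copyCompactedFiles(); // threadA, under readlock
    List<String> copyB = store.copyCompactedFiles(); // threadB, inside close()
    try {
      store.archive(copyA); // threadA archives first
      store.archive(copyB); // threadB then tries the same files
      return false;
    } catch (FileNotFoundException e) {
      return true; // in the regionserver this surfaces as the abort
    }
  }

  public static void main(String[] args) {
    System.out.println("second archive failed: " + replayInterleaving());
  }
}
```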





[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563243#comment-15563243
 ] 

Ted Yu commented on HBASE-16788:


bq. It seems problematic to modify the contents of the compactedfiles list (by 
setting archiving flags on individual files) in getCompactedfiles()

getCompactedfiles() is only called from closeAndArchiveCompactedFiles(). I 
think it is fine to add the flag.

bq. this code path is only called with the readlock where you don't typically 
expect mutations to happen
The read lock is used in many other places where some mutation happens, e.g.:
{code}
  public long add(final Cell cell) {
    lock.readLock().lock();
    try {
      return this.memstore.add(cell);
    } finally {
      lock.readLock().unlock();
    }
  }
{code}
FYI
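
A rough illustration of that pattern, assuming the read lock is acting as a shared "store is still open" guard rather than as protection for read-only access. The class and its fields are hypothetical; the thread-safe map merely stands in for a memstore that handles its own concurrency.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: many writers mutate a thread-safe structure while
// holding the shared (read) lock; close() takes the exclusive (write) lock.
class SharedLockStoreSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ConcurrentSkipListMap<String, String> memstore =
      new ConcurrentSkipListMap<>();
  private boolean closed = false; // written under writeLock, read under readLock

  // Mutates the memstore under the read lock; returns false once closed.
  boolean add(String key, String value) {
    lock.readLock().lock();
    try {
      if (closed) {
        return false;
      }
      memstore.put(key, value);
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Waits for in-flight add() calls to finish, then blocks new ones.
  void close() {
    lock.writeLock().lock();
    try {
      closed = true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int size() {
    return memstore.size();
  }

  public static void main(String[] args) {
    SharedLockStoreSketch store = new SharedLockStoreSketch();
    System.out.println("add before close: " + store.add("row1", "v1"));
    store.close();
    System.out.println("add after close: " + store.add("row2", "v2"));
  }
}
```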




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563093#comment-15563093
 ] 

Ted Yu commented on HBASE-16788:


Gary:
I did look at your patch (last Friday and today).

I looked at usage around closeAndArchiveCompactedFiles(). Three actions are 
related: merge, compaction and close.

If I understand the code correctly, it would take longer for the close() to 
complete when concurrent CompactedHFilesDischargeHandler operation gets the 
archiveLock first.
If this is not a concern, I am fine with your patch.



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563020#comment-15563020
 ] 

Gary Helmling commented on HBASE-16788:
---

The failing tests reported by Hadoop QA are all passing locally after a rebase 
on to current master.



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563011#comment-15563011
 ] 

Gary Helmling commented on HBASE-16788:
---

To elaborate, one thread holding archiveLock, and then obtaining 
lock.readLock() has no impact on the ability of other threads to obtain 
lock.readLock() outside of archiveLock.
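
This can be checked with plain java.util.concurrent classes. The demo below is a standalone sketch, not HBase code: the class and method names are made up, and the two locks are local objects named after the fields discussed here. One thread holds archiveLock and then the read lock; a second thread is still able to take the read lock, while archiveLock itself stays exclusive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; only the lock names mirror the discussion.
class LockIndependenceDemo {

  // Returns {readLockAcquiredByOtherThread, archiveLockAcquiredByOtherThread}.
  static boolean[] probe() {
    ReentrantLock archiveLock = new ReentrantLock();
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    // Archiver thread: archiveLock first, then the read lock, as in the patch
    // description.
    Thread archiver = new Thread(() -> {
      archiveLock.lock();
      try {
        lock.readLock().lock();
        try {
          held.countDown();
          release.await(); // keep both locks held while the probe runs
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          lock.readLock().unlock();
        }
      } finally {
        archiveLock.unlock();
      }
    });
    archiver.start();

    try {
      held.await();
      // A second reader is not blocked by the archiver's read lock.
      boolean gotRead = lock.readLock().tryLock(1, TimeUnit.SECONDS);
      if (gotRead) {
        lock.readLock().unlock();
      }
      // The mutex is exclusive: a second thread cannot take it right now.
      boolean gotArchive = archiveLock.tryLock();
      if (gotArchive) {
        archiveLock.unlock();
      }
      release.countDown();
      archiver.join();
      return new boolean[] {gotRead, gotArchive};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = probe();
    System.out.println("other thread got readLock: " + r[0]);
    System.out.println("other thread got archiveLock: " + r[1]);
  }
}
```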



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563007#comment-15563007
 ] 

Gary Helmling commented on HBASE-16788:
---

[~ted_yu] did you look at the patch?  archiveLock is a separate lock from 
lock.readLock().  Yes, archiveLock is a mutex.  We do not need read/write lock 
semantics for the archiving case.  That has nothing to do with the existing 
ReentrantReadWriteLock usage, which is unchanged.



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562973#comment-15562973
 ] 

Ted Yu commented on HBASE-16788:


From the javadoc of ReentrantLock:
{code}
 * A reentrant mutual exclusion {@link Lock} with the same basic
 * behavior and semantics as the implicit monitor lock accessed using
 * {@code synchronized} methods and statements, but with extended
 * capabilities.
{code}
Can you elaborate on how the read lock (under the archive lock) can be obtained 
by multiple readers?

Thanks



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-10 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562850#comment-15562850
 ] 

Gary Helmling commented on HBASE-16788:
---

[~tedyu], adding the archiveLock does not change the semantics of the existing 
read/write lock.  Multiple concurrent readers can still obtain the read lock.  
However, only one thread would be able to read the list of compacted files and 
archive them under the archiveLock.

I also looked at your v2 patch.  It seems problematic to modify the contents of 
the compactedfiles list (by setting archiving flags on individual files) in 
getCompactedfiles(), since this code path is only called with the readlock 
where you don't typically expect mutations to happen.  I don't think this 
necessarily matters in this particular case, but I think it would introduce 
confusing semantics going forward.  It also adds a lot more complexity to the 
code than the simple lock I introduced around archiving.

Do you have any objections to my patch?



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557160#comment-15557160
 ] 

Ted Yu commented on HBASE-16788:


Gary:
I have a question about the archiveLock.
Previously, two threads could hold the HStore readlock simultaneously, right?
With the governing archiveLock, is this still true?



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557144#comment-15557144
 ] 

Hadoop QA commented on HBASE-16788:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 41m 37s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
2s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 36m 24s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 119m 43s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 214m 47s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestDispatchMergingRegionsProcedure |
|   | hadoop.hbase.replication.TestMasterReplication |
| Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS |
|   | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12832230/HBASE-16788.002.patch |
| JIRA Issue | HBASE-16788 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux d24953c438de 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 29d701a |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/3887/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/3887/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/3887/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556808#comment-15556808
 ] 

Ted Yu commented on HBASE-16788:


[~ghelmling]:
Did you have a chance to look at 16788-suggest.v2?

Your new test passes with 16788-suggest.v2.

Thanks

> Race in compacted file deletion between HStore close() and 
> closeAndArchiveCompactedFiles()
> --
>
> Key: HBASE-16788
> URL: https://issues.apache.org/jira/browse/HBASE-16788
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.3.0
> Reporter: Gary Helmling
> Assignee: Gary Helmling
> Priority: Blocker
> Attachments: 16788-suggest.v2, HBASE-16788.001.patch, 
> HBASE-16788.002.patch, HBASE-16788_1.patch
>
>
> HBASE-13082 changed the way that compacted files are archived from being done 
> inline on compaction completion to an async cleanup by the 
> CompactedHFilesDischarger chore.  It looks like the changes to HStore to 
> support this introduced a race condition in the compacted HFile archiving.
> In the following sequence, we can wind up with two separate threads trying to 
> archive the same HFiles, causing a regionserver abort:
> # compaction completes normally and the compacted files are added to 
> {{compactedfiles}} in HStore's DefaultStoreFileManager
> # *threadA*: CompactedHFilesDischargeHandler runs in a RS executor service, 
> calling closeAndArchiveCompactedFiles()
> ## obtains HStore readlock
> ## gets a copy of compactedfiles
> ## releases readlock
> # *threadB*: calls HStore.close() as part of region close
> ## obtains HStore writelock
> ## calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of 
> same compactedfiles
> # *threadA*: calls HStore.removeCompactedfiles(compactedfiles)
> ## archives files in {{compactedfiles}} in HRegionFileSystem.removeStoreFiles()
> ## calls HStore.clearCompactedFiles()
> ## waits on write lock
> # *threadB*: continues with close()
> ## calls removeCompactedfiles(compactedfiles)
> ## calls HRegionFileSystem.removeStoreFiles() -> 
> HFileArchiver.archiveStoreFiles()
> ## receives FileNotFoundException because the files have already been 
> archived by threadA
> ## throws IOException
> # RS aborts
> I think the combination of fetching the compactedfiles list and removing the 
> files needs to be covered by locking.  Options I see are:
> * Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of 
> readlock and move the call to removeCompactedfiles() inside the lock.  This 
> means the read operations will be blocked while the files are being archived, 
> which is bad.
> * Synchronize closeAndArchiveCompactedFiles() and modify close() to call it 
> instead of calling removeCompactedfiles() directly
> * Add a separate lock for compacted files removal and use in 
> closeAndArchiveCompactedFiles() and close()
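The third option above (a separate lock for compacted file removal) can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `ArchiveLockSketch`, its `archiveLock` field, the `archived` list, and the use of plain file names in place of StoreFile objects are all assumptions made for the example. The idea shown is only that both callers hold one dedicated lock across "fetch the list" and "archive the files", while the archiving itself runs outside the store's read/write lock so readers are not blocked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for HStore; file names replace StoreFile objects. */
class ArchiveLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  /** Assumed dedicated lock: serializes "fetch list + archive" across callers. */
  private final ReentrantLock archiveLock = new ReentrantLock();
  private final List<String> compactedFiles = new ArrayList<>();
  /** Every archive call is recorded here, so a double archive would show up twice. */
  final List<String> archived = new ArrayList<>();

  void addCompactedFile(String name) {
    lock.writeLock().lock();
    try {
      compactedFiles.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Path taken by the discharger (threadA in the description). */
  void closeAndArchiveCompactedFiles() {
    archiveLock.lock();
    try {
      List<String> copy;
      lock.readLock().lock();
      try {
        copy = new ArrayList<>(compactedFiles);
      } finally {
        lock.readLock().unlock();
      }
      removeCompactedFiles(copy);
    } finally {
      archiveLock.unlock();
    }
  }

  /** Path taken on region close (threadB in the description). */
  void close() {
    archiveLock.lock();
    try {
      List<String> copy;
      lock.writeLock().lock();
      try {
        copy = new ArrayList<>(compactedFiles);
        compactedFiles.clear();
      } finally {
        lock.writeLock().unlock();
      }
      removeCompactedFiles(copy);
    } finally {
      archiveLock.unlock();
    }
  }

  /** Caller must hold archiveLock. Archiving runs outside the read/write lock. */
  private void removeCompactedFiles(List<String> files) {
    archived.addAll(files); // stands in for the real archiver call
    lock.writeLock().lock();
    try {
      compactedFiles.removeAll(files);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Because whichever caller enters second only sees files the first caller has not already removed, no file reaches the archiver twice in this sketch.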



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556045#comment-15556045
 ] 

Hadoop QA commented on HBASE-16788:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 27m 55s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 58s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 3s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRSStatusServlet |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12832162/HBASE-16788.001.patch |
| JIRA Issue | HBASE-16788 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f4b1bd4bbc55 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / bc9a972 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/3877/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/3877/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/3877/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/3877/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1855#comment-1855
 ] 

ramkrishna.s.vasudevan commented on HBASE-16788:


Gary's patch is simple. Mine and his are the same, except that he creates 
another lock to synchronize things, whereas I had a wait/notify mechanism. 
+1 on it.


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1836#comment-1836
 ] 

Ted Yu commented on HBASE-16788:


Gary:
Maybe you haven't looked at 16788-suggest.v2, where the failure condition is 
handled by setting the archiving flag to false for the files that fail archiving.
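As a rough illustration of that failure handling (not the contents of 16788-suggest.v2, which is not reproduced here), the sketch below keeps a per-file "archiving" flag and resets it to false when archiving a file fails, so that a later pass can pick the file up again. The class `RetryableArchiveFlags`, the method `archivePass`, and the `Predicate` used as the archiver are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical holder of compacted files, keyed by name, with an "archiving" flag. */
class RetryableArchiveFlags {
  private final ConcurrentHashMap<String, Boolean> archiving = new ConcurrentHashMap<>();

  void add(String name) {
    archiving.put(name, Boolean.FALSE);
  }

  /**
   * Runs one archive pass. The archiver returns true on success.
   * Returns the names that were archived in this pass.
   */
  List<String> archivePass(Predicate<String> archiver) {
    List<String> done = new ArrayList<>();
    for (String name : archiving.keySet()) {
      // Atomically claim the file; skip it if another caller already did.
      if (!archiving.replace(name, Boolean.FALSE, Boolean.TRUE)) {
        continue;
      }
      boolean ok;
      try {
        ok = archiver.test(name);
      } catch (RuntimeException e) {
        ok = false;
      }
      if (ok) {
        archiving.remove(name);
        done.add(name);
      } else {
        // Reset the flag so the file is eligible again on the next pass.
        archiving.put(name, Boolean.FALSE);
      }
    }
    return done;
  }
}
```

In this sketch a file that fails in one pass is not lost: its flag returns to false and the next call to `archivePass` can claim it again.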


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1821#comment-1821
 ] 

Gary Helmling commented on HBASE-16788:
---

Yes, as I described, the problem here is a race in the combination of obtaining 
the list of compacted files and archiving them.

Not handling rare failure conditions is not an option in my opinion.


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1771#comment-1771
 ] 

Ted Yu commented on HBASE-16788:


With respect to Gary's patch, we obtain both archiveLock and the read/write locks.

I think my proposal introduces almost no additional contention.
Files failing to archive is normally rare, and even in that case the 
accounting is correct.


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1652#comment-1652
 ] 

ramkrishna.s.vasudevan commented on HBASE-16788:


Went through your patch. Thanks, Ted.
I tried a similar one this morning, where once the reader is closed we mark 
it as closed. The problem is that if the archive fails for some reason and the 
same set of files is selected again, we will miss those files because we have 
already marked them as archived.




[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1299#comment-1299
 ] 

Ted Yu commented on HBASE-16788:


http://pastebin.com/UtTbYxj1 shows what I meant (just a draft).

TestCompactedHFilesDischarger passes with Ram's new tests.


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1058#comment-1058
 ] 

Ted Yu commented on HBASE-16788:


compactedfiles is currently a List. closeAndArchiveCompactedFiles() could 
signal that its call to getCompactedfiles() is for archiving.

Maybe we can change compactedfiles to a Map. closeAndArchiveCompactedFiles() 
would obtain the compacted files and set the value for each StoreFile to true 
(meaning it is to be archived) before releasing the read lock.

HStore.close() would check the value for each StoreFile and only choose the 
entries with a value of false.

My two cents.
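One way to picture this proposal is the minimal sketch below. It is an assumption-laden illustration rather than the draft from the thread: `FlaggedCompactedFiles` and `claimForArchiving` are hypothetical names, file names stand in for StoreFile objects, and a `ConcurrentHashMap` is used so that the flag can be flipped atomically even when several callers hold only a read lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical replacement for the compactedfiles List: name -> "being archived". */
class FlaggedCompactedFiles {
  private final ConcurrentHashMap<String, Boolean> files = new ConcurrentHashMap<>();

  void add(String name) {
    files.put(name, Boolean.FALSE);
  }

  /**
   * Returns only the files whose flag was still false, flipping each to true.
   * A second caller (for example close()) therefore never receives a file
   * that an earlier caller has already claimed.
   */
  List<String> claimForArchiving() {
    List<String> claimed = new ArrayList<>();
    for (String name : files.keySet()) {
      // replace(key, old, new) succeeds for exactly one caller per file.
      if (files.replace(name, Boolean.FALSE, Boolean.TRUE)) {
        claimed.add(name);
      }
    }
    return claimed;
  }
}
```

With this shape, the two threads in the race described in the issue would receive disjoint sets of files, so neither would try to archive a file the other had already taken.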

> Race in compacted file deletion between HStore close() and 
> closeAndArchiveCompactedFiles()
> --
>
> Key: HBASE-16788
> URL: https://issues.apache.org/jira/browse/HBASE-16788
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.3.0
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Blocker
> Attachments: HBASE-16788_1.patch
>
>
> HBASE-13082 changed the way that compacted files are archived from being done 
> inline on compaction completion to an async cleanup by the 
> CompactedHFilesDischarger chore.  It looks like the changes to HStore to 
> support this introduced a race condition in the compacted HFile archiving.
> In the following sequence, we can wind up with two separate threads trying to 
> archive the same HFiles, causing a regionserver abort:
> # compaction completes normally and the compacted files are added to 
> {{compactedfiles}} in HStore's DefaultStoreFileManager
> # *threadA*: CompactedHFilesDischargeHandler runs in a RS executor service, 
> calling closeAndArchiveCompactedFiles()
> ## obtains HStore readlock
> ## gets a copy of compactedfiles
> ## releases readlock
> # *threadB*: calls HStore.close() as part of region close
> ## obtains HStore writelock
> ## calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of 
> same compactedfiles
> # *threadA*: calls HStore.removeCompactedfiles(compactedfiles)
> ## archives files in {compactedfiles} in HRegionFileSystem.removeStoreFiles()
> ## call HStore.clearCompactedFiles()
> ## waits on write lock
> # *threadB*: continues with close()
> ## calls removeCompactedfiles(compactedfiles)
> ## calls HRegionFIleSystem.removeStoreFiles() -> 
> HFileArchiver.archiveStoreFiles()
> ## receives FileNotFoundException because the files have already been 
> archived by threadA
> ## throws IOException
> # RS aborts
> I think the combination of fetching the compactedfiles list and removing the 
> files needs to be covered by locking.  Options I see are:
> * Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of 
> readlock and move the call to removeCompactedfiles() inside the lock.  This 
> means the read operations will be blocked while the files are being archived, 
> which is bad.
> * Synchronize closeAndArchiveCompactedFiles() and modify close() to call it 
> instead of calling removeCompactedfiles() directly
> * Add a separate lock for compacted files removal and use in 
> closeAndArchiveCompactedFiles() and close()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554869#comment-15554869
 ] 

Hadoop QA commented on HBASE-16788:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 4s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 27m 55s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 59s {color} | {color:red} hbase-server generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 13s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 119m 42s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Unconditional wait in org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(Collection, boolean)  At HStore.java: At HStore.java:[line 2400] |
|  |  Wait not in loop in org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(Collection, boolean)  At HStore.java:org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(Collection, boolean)  At HStore.java:[line 2400] |
| Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureSchedulerConcurrency |
|   | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure |
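
For context on the two FindBugs warnings above: both concern a call to Object.wait() that is not guarded by a condition rechecked in a loop, so a spurious wakeup or an unrelated notify could let the thread proceed too early. The usual remedy is the guarded-wait idiom, sketched below. The class {{ArchiveGate}} and its members are hypothetical and are not taken from HBASE-16788_1.patch; this only illustrates the pattern the warnings ask for.

```java
// Hypothetical gate that lets one thread at a time into an archiving section.
class ArchiveGate {
    private final Object monitor = new Object();
    private boolean busy = false; // the condition guarded by the monitor

    // Blocks until the gate is free. Returns false if interrupted while waiting.
    boolean enter() {
        synchronized (monitor) {
            // Recheck the condition in a loop: wait() may return spuriously.
            while (busy) {
                try {
                    monitor.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            busy = true;
            return true;
        }
    }

    // Frees the gate and wakes every waiter so each can recheck the condition.
    void leave() {
        synchronized (monitor) {
            busy = false;
            monitor.notifyAll();
        }
    }

    boolean isBusy() {
        synchronized (monitor) {
            return busy;
        }
    }
}
```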
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12832110/HBASE-16788_1.patch |
| JIRA Issue | HBASE-16788 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 8fa05032b93e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 96d34f2 |
| Default 

[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554643#comment-15554643
 ] 

ramkrishna.s.vasudevan commented on HBASE-16788:


[~ghelmling]
I just saw that you have assigned this to yourself, so feel free to come up 
with your own patch if you have a better one. Thank you.



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554188#comment-15554188
 ] 

ramkrishna.s.vasudevan commented on HBASE-16788:


bq. Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of 
readlock and move the call to removeCompactedfiles() inside the lock. This 
means the read operations will be blocked while the files are being archived, 
which is bad.
Yes. That is why I went ahead with the readLock.

bq. Synchronize closeAndArchiveCompactedFiles() and modify close() to call it 
instead of calling removeCompactedfiles() directly
If we do this, then the clearCompactedfiles inside removeCompactedfiles should 
not take the write lock once again, or else it may lead to a deadlock. But 
maybe we need to handle it better. Will see.

bq. Add a separate lock for compacted files removal and use in 
closeAndArchiveCompactedFiles() and close()
Yes. This seems to be a good solution. Let me check if there is some other way 
out of it.
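
One possible reading of the deadlock concern raised for the synchronized option: the discharger thread would hold the method's monitor and then wait for the store write lock, while close() already holds the write lock and then waits for the monitor. The sketch below reproduces that opposite lock ordering under stated assumptions. A plain ReentrantLock stands in for the synchronized monitor so that tryLock with a timeout can be used instead of hanging, and every name here is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderSketch {
    // Returns how many of the two threads failed to obtain their second lock.
    static int run() {
        Lock monitor = new ReentrantLock(); // stand-in for the synchronized monitor
        Lock writeLock = new ReentrantReadWriteLock().writeLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger failures = new AtomicInteger();

        // Discharger: monitor first, then write lock.
        Thread discharger = new Thread(
            () -> attempt(monitor, writeLock, bothHoldFirst, bothTried, failures));
        // Closer: write lock first, then monitor (the opposite order).
        Thread closer = new Thread(
            () -> attempt(writeLock, monitor, bothHoldFirst, bothTried, failures));
        discharger.start();
        closer.start();
        try {
            discharger.join();
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failures.get();
    }

    private static void attempt(Lock first, Lock second, CountDownLatch bothHoldFirst,
                                CountDownLatch bothTried, AtomicInteger failures) {
        first.lock();
        try {
            // Wait until each thread holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With a blocking lock() this is where both threads would hang forever.
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                failures.incrementAndGet();
            }
            // Keep the first lock until both attempts have finished.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

Acquiring the two locks in the same order on both paths, or replacing the pair with a single dedicated lock, avoids this cycle.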



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554110#comment-15554110
 ] 

ramkrishna.s.vasudevan commented on HBASE-16788:


Checking this now. Thanks for reporting, [~ghelmling].



[jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()

2016-10-06 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553536#comment-15553536
 ] 

Gary Helmling commented on HBASE-16788:
---

[~ramkrishna.s.vasude...@gmail.com], do you have any thoughts on this?
