subject:"\[jira\] \[Commented\] \(HBASE\-18771\) Incorrect StoreFileRefresh leading to split and compaction failures"

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164249#comment-16164249
 ] 

Hudson commented on HBASE-18771:


FAILURE: Integrated in Jenkins build HBase-2.0 #506 (See 
[https://builds.apache.org/job/HBase-2.0/506/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev a797fe0daa18ad2105762f0cc89c4f5c43537a6f)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-13 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164219#comment-16164219
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


Thanks everyone for reviewing and committing :)

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164198#comment-16164198
 ] 

Hudson commented on HBASE-18771:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3707 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3707/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev 3df0351f2280bb914093d6fe3a69e246ae821617)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164176#comment-16164176
 ] 

Hudson commented on HBASE-18771:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #286 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/286/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev 660427ce2ac071bcdca79c9e70a7390e14317eab)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164131#comment-16164131
 ] 

Hudson commented on HBASE-18771:


FAILURE: Integrated in Jenkins build HBase-1.4 #911 (See 
[https://builds.apache.org/job/HBase-1.4/911/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev b6aa23eefa02ca1be7a43b2ba85bd4327db40abc)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164118#comment-16164118
 ] 

Hudson commented on HBASE-18771:


FAILURE: Integrated in Jenkins build HBase-1.5 #55 (See 
[https://builds.apache.org/job/HBase-1.5/55/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev 432ca7e3fba92bf3a7c65bfff2b4b120b3238385)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164041#comment-16164041
 ] 

Hudson commented on HBASE-18771:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #205 (See 
[https://builds.apache.org/job/HBase-1.3-IT/205/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev 660427ce2ac071bcdca79c9e70a7390e14317eab)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164031#comment-16164031
 ] 

Hudson commented on HBASE-18771:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #276 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/276/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction 
(apurtell: rev 660427ce2ac071bcdca79c9e70a7390e14317eab)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163864#comment-16163864
 ] 

Andrew Purtell commented on HBASE-18771:


If tests look good here, I will commit from master back to branch-1.3. 

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163863#comment-16163863
 ] 

Andrew Purtell commented on HBASE-18771:


The changes look good to me but there are a lot of failing tests in the 
precommit report. Let me try this out locally. 


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Ashu Pachauri (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163498#comment-16163498
 ] 

Ashu Pachauri commented on HBASE-18771:
---

+1 on the latest patch.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch, 
> HBASE-18771.master.003.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162895#comment-16162895
 ] 

Hadoop QA commented on HBASE-18771:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
59s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
33s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
20s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
41m 24s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 25s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.client.TestScanWithoutFetchingData |
|   | org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence |
|   | org.apache.hadoop.hbase.client.TestAsyncReplicationAdminApi |
|   | org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries |
|   | org.apache.hadoop.hbase.client.TestAsyncTableBatch |
|   | org.apache.hadoop.hbase.client.TestAsyncNamespaceAdminApi |
|   | org.apache.hadoop.hbase.client.TestAsyncProcedureAdminApi |
|   | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | org.apache.hadoop.hbase.TestMultiVersions |
|   | org.apache.hadoop.hbase.client.TestAsyncSnapshotAdminApi |
|   | org.apache.hadoop.hbase.client.TestAsyncNonMetaRegionLocator |
|   | org.apache.hadoop.hbase.quotas.TestQuotaStatusRPCs |
|   | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLablesWithGroups |
|   | org.apache.hadoop.hbase.client.TestAsyncClusterAdminApi |
|   | org.apache.hadoop.hbase.security.access.TestAccessController2 |
|   | org.apache.hadoop.hbase.client.TestAsyncTableScanRenewLease |
|   |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162834#comment-16162834
 ] 

Hadoop QA commented on HBASE-18771:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
29s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
26s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 50s{color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
46s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12886601/HBASE-18771.branch-1.3.005.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux d0c4a79a38d6 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / ae6ff50 |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162746#comment-16162746
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


Thanks for the patch. I am +1. Lets see what [~ashu210890] says. 

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch, 
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162723#comment-16162723
 ] 

Anoop Sam John commented on HBASE-18771:


bq.MetaTableAccessor#splitRegion although the description of the method says 
"Does not add the location information to the daughter regions since they are 
not open yet.". 
So that looks like a bug..  May be some later jiras  by mistake did this?  U 
can try check. +1 to open a jira and discuss there. 

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch, 
> HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162661#comment-16162661
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


[~abhishek.chouhan]
Thanks for the update patch based on [~ashu210890]'s comments.
I think if the assert has to be exact then probably get access to the region 
server from the hbase testing utility
{code}
HTU.getMiniHBaseCluster().getRegionServer(0);
{code}
since you have 3 region servers you may have to check across all 3 and then 
find if the number of online region is equal to what you expect. This would be 
better. 

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch, 
> HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-12 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162624#comment-16162624
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


Actually 
bq. return (numRegionsAfterSplit == numRegionsBeforeSplit + 1 && 
admin.isTableAvailable(TEST_TABLE));

results in a false positive. Just before the daughter regions are opened, 
admin.getTableRegions returns the daughter regions(which is 
numRegionsBeforeSplit +1 since parent is offline and excluded) and 
admin.isTableAvailable returns true even before the daughter regions are 
actually opened. This is because isTableAvailable checks if it can 
getServerName for the meta entries, which its able to for the daughter regions. 
Server location are added to the meta entries in MetaTableAccessor#splitRegion 
although the description of the method says "Does not add the location 
information to the daughter regions since they are not open yet.". So basically 
isTableAvailable gives us a false positive when the parent is offline and the 
daughters are not yet open. I can open a bug for this, wdyt [~ashu210890] 
[~apurtell] [~anoop.hbase]

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch, 
> HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
>

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-11 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162523#comment-16162523
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


I see. I was under the wrong impression that Admin#getTableRegions returns 
online regions. Thanks [~ashu210890] , let me correct the test as per your 
suggestion.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch, 
> HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-11 Thread Ashu Pachauri (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161860#comment-16161860
 ] 

Ashu Pachauri commented on HBASE-18771:
---

[~abhishek.chouhan] Thanks for the updated patch. However, just checking the 
number of regions is not enough. This check will still pass even if the 
daughter regions get stuck in FAILED_OPEN due to a FNFE because 
Admin#getTableRegions returns all the regions. The correct check is that the 
number of ONLINE regions after the split is one larger than the number of 
ONLINE regions before the split.
{code}
+  int numRegionsBeforeSplit = admin.getTableRegions(TEST_TABLE).size();
+  // Check if we can successfully split after compaction
+  
admin.splitRegion(admin.getTableRegions(TEST_TABLE).get(0).getEncodedNameAsBytes(),
 ROW_C);
+  Thread.sleep(1);
+  int numRegionsAfterSplit = admin.getTableRegions(TEST_TABLE).size();
+  assertEquals(numRegionsAfterSplit, numRegionsBeforeSplit + 1);
{code}

I think the following check will suffice:
{code}
assertTrue(admin.isTableAvailable(TEST_TABLE));
final int numRegionsBeforeSplit = admin.getTableRegions(TEST_TABLE).size();
admin.splitRegion(admin.getTableRegions(TEST_TABLE).get(0).getEncodedNameAsBytes(),
 ROW_C);
util.waitFor(2, new Waiter.Predicate() {
  @Override
  public boolean evaluate() throws Exception {
int numRegionsAfterSplit = admin.getTableRegions(TEST_TABLE).size();
// Make sure that the split went through and all the regions are assigned
return (numRegionsAfterSplit == numRegionsBeforeSplit + 1 && 
admin.isTableAvailable(TEST_TABLE));
  }
});
{code} 

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch, 
> HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
>

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161492#comment-16161492
 ] 

Hadoop QA commented on HBASE-18771:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
13s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
18s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
17m 55s{color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}393m 13s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  3m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}431m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAdmin2 |
|   | hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort |
|   | hadoop.hbase.snapshot.TestRegionSnapshotTask |
|   | hadoop.hbase.client.TestAdmin1 |
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
|   | org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS |
|   | org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat |
|   | org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile |
|   | org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy 
|
|   |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161122#comment-16161122
 ] 

Hadoop QA commented on HBASE-18771:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
57s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
55s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
36m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m  
2s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:5d60123 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12886375/HBASE-18771.master.002.patch
 |
| Optional Tests |  asflicense  shadedjars  javac  javadoc  unit  findbugs  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 56935e5fc0cc 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 45ba696 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/8557/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/8557/console |
| Powered by | Apache Yetus 0.4.0   http://yetus.apache.org |


This message was automatically generated.



> Incorrect StoreFileRefresh leading to split and compaction failures
>

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-11 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160849#comment-16160849
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


Thanks for reviewing [~ashu210890]. Yep, check based on number of regions would 
be better, let me add that and put up a patch.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.master.001.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Ashu Pachauri (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159633#comment-16159633
 ] 

Ashu Pachauri commented on HBASE-18771:
---

[~yuzhih...@gmail.com]  Yes, we should do this as a follow up of  HBASE-18186, 
i.e. once we get rid of FNFEs that have silently crept in over time due to 
other changes. I have filed HBASE-18786 to figure out what's the right way to 
approach this.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.master.001.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159586#comment-16159586
 ] 

Ted Yu commented on HBASE-18771:


bq. that HStore#refreshStoreFiles should never be triggered for primary region 
replicas

How about adding check in HStore#refreshStoreFiles() for the above (can be done 
in another JIRA) ?

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.master.001.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Ashu Pachauri (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159582#comment-16159582
 ] 

Ashu Pachauri commented on HBASE-18771:
---

On a parallel note, echoing everyone else's opinion here that 
HStore#refreshStoreFiles should never be triggered for primary region replicas. 
This not only indicates an underlying problem (FNFE on scan path) but also may 
cause one to silently overlook some nasty dataloss scenarios.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.master.001.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Ashu Pachauri (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159571#comment-16159571
 ] 

Ashu Pachauri commented on HBASE-18771:
---

Nice find! The patch looks generally good. There is one nitpick in the tests 
though:

{code:java}
+  // Split at this point should not result in the RS being aborted
+  
assertEquals(util.getMiniHBaseCluster().getLiveRegionServerThreads().size(), 3);
{code}
The RS abort may not happen every time a split fails depending on where in the 
process the split transaction failed (or it rolled back). Also, the RS may come 
back up after aborting at the exact same time when this check happens and abort 
again. A more robust check is on the number of online regions for the table, 
which should be exactly one more than the previously online regions.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch, 
> HBASE-18771.master.001.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158614#comment-16158614
 ] 

Hadoop QA commented on HBASE-18771:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
34m 49s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 17s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.regionserver.TestSplitLogWorker |
|   | org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy 
|
|   | org.apache.hadoop.hbase.regionserver.TestCompaction |
|   | org.apache.hadoop.hbase.master.TestGetLastFlushedSequenceId |
|   | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
|   | org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook |
|   | org.apache.hadoop.hbase.regionserver.TestTimestampFilterSeekHint |
|   | org.apache.hadoop.hbase.wal.TestWALFiltering |
|   | org.apache.hadoop.hbase.master.TestGetInfoPort |
|   | org.apache.hadoop.hbase.regionserver.TestColumnSeeking |
|   | org.apache.hadoop.hbase.regionserver.TestRegionServerAbort |
|   | org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort |
|   | org.apache.hadoop.hbase.regionserver.TestRegionIncrement |
|   | org.apache.hadoop.hbase.regionserver.TestWALLockup |
|   | org.apache.hadoop.hbase.master.TestTableStateManager |
|   | org.apache.hadoop.hbase.regionserver.TestWalAndCompactingMemStoreFlush |
|   | org.apache.hadoop.hbase.master.assignment.TestAssignmentOnRSCrash |
|   | org.apache.hadoop.hbase.master.TestSplitLogManager |
|   | org.apache.hadoop.hbase.master.balancer.TestFavoredNodeTableImport |
|   | org.apache.hadoop.hbase.regionserver.TestSplitWalDataLoss |
|   | org.apache.hadoop.hbase.master.TestRollingRestart |
|   | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer |
|   |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158367#comment-16158367
 ] 

Hadoop QA commented on HBASE-18771:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
15m 47s{color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 
23s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}117m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12886008/HBASE-18771.branch-1.3.003.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 413198e531bd 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 
11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / 6fcb15f |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158217#comment-16158217
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


Thanks [~ram_krish]
bq. One quesiton on the test case - does this test case fail always without 
patch or you need to introduce some locking / new threads to introduce the 
actual bug? Because all are sequential in the tests now and what if 
closeAndArchiveCompactedFiles() gets completed and then the hr.compact(false) 
gets executed?
Test case always fails on branch-1.3 without the fix. In the test case 
closeAndArchiveCompactedFiles() explicitly gets completed before 
he.compact(false) gets called which does not find the file since its already 
archived.

bq. The refreshStoreFiles() API how ever needs the fix in 2.0 and trunk also 
right? So lets get it in branch-2 and above also?
Yes. I checked master and refreshStoreFiles() there is also the same, so i 
think we should fix there also. The test case for split failure passes on 
master due to differences in the way splits are done in the new assignment 
manager.

[~lhofhansl] had some good suggestions on using Collections.emptySet() inplace 
of new ArrayList. Putting up a new patch with those in.


> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-08 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158180#comment-16158180
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


I was thinking once again - The refreshStoreFiles() API how ever needs the fix 
in 2.0 and trunk also right? So lets get it in branch-2 and above also?

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158128#comment-16158128
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


+1 for patch. Thanks for the great find. Probably there is Phoenix CP here in 
this stack trace which could have resulted in this case of FNFE and also the 
code expects FNFE to occur in these cases so fixing refreshStoreFiles should 
solve multiple JIRAs raised related to this. One quesiton on the test case - 
does this test case fail always without patch or you need to introduce some 
locking / new threads to introduce the actual bug? Because all are sequential 
in the tests now and what if closeAndArchiveCompactedFiles() gets completed and 
then the hr.compact(false) gets executed? 
You can commit this to branch-1.3 and probably for branch-2 and above too if 
you can produce a test case.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158119#comment-16158119
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


[~mantonov], [~ashu210890]
This patch solves a major issue. Pls have a look. How ever the reason for FNFE 
is yet to be explored but this could help.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157804#comment-16157804
 ] 

stack commented on HBASE-18771:
---

Removed from hbase2 for now until demonstrated this issue is in hbase2. Thanks 
lads.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157175#comment-16157175
 ] 

Hadoop QA commented on HBASE-18771:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 24m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
10s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
26s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
20m  6s{color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}371m 15s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  4m 
33s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}442m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionServerHostname |
|   | hadoop.hbase.regionserver.TestRecoveredEdits |
|   | hadoop.hbase.mapreduce.TestLoadIncrementalHFilesUseSecurityEndPoint |
|   | hadoop.hbase.mapreduce.TestLoadIncrementalHFiles |
|   | hadoop.hbase.client.TestAdmin2 |
|   | hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFiles |
|   | hadoop.hbase.replication.TestReplicationChangingPeerRegionservers |
| Timed out junit tests | 
org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS |
|   | org.apache.hadoop.hbase.master.TestMasterMetricsWrapper |
|   | org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat |
|   |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156973#comment-16156973
 ] 

Hadoop QA commented on HBASE-18771:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
51s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
15m 56s{color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 
53s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}112m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12885798/HBASE-18771.branch-1.3.002.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux fddd83692a8a 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 
11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / 6fcb15f |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156891#comment-16156891
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


Thanks for the update. I think if with your logs if we can drill down why scan 
is failing then I think more or less all cases will get handled across all 
versions :). I can help in the analysis too because 1.3 is not considered 
stable because of these bugs and these are lingering around without proper 
fixes.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156885#comment-16156885
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


bq. Are you using read replica feature? Then all this is possible.
No boss. This is without read replicas.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156880#comment-16156880
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


Didn't see your comment [~ram_krish] before i submitted mine :)
I didn't look much into how the scanner got FNFE, but it was definitely there 
in the logs. The way i reproduced the scenario was by running a load and also 
running a chaos monkey that did splits,merges,compactions,snapshot and also 
full table scans in parallel. I'll try to dig deeper into that too.
> So who is doing this incorrect refresh? Again is that due to scan failure?
Yes in the logs i had 
RpcServer.FifoWFPBQ.default.handler=87,queue=7,port=16020: callId: 12290 
service: ClientService methodName: Scan size: 47 connection: 10.231.90.15:50148 
deadline: 1504528119024
org.apache.hadoop.hbase.UnknownScannerException: Throwing 
UnknownScannerException to reset the client scanner state for clients older 
than 1.3.
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3030)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35072)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2370)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: java.io.IOException: unable to read store file
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6394)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6021)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6164)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5938)
at 
org.apache.phoenix.iterate.RegionScannerFactory$1.nextRaw(RegionScannerFactory.java:223)
at 
org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at 
org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at 
org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:259)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2767)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2967)
... 5 more

Looks like with HBASE-17712 we no longer call refreshstorefiles in 
handlefilenot found, so this might not happen.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
>

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156866#comment-16156866
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


Are you using read replica feature? Then all this is possible.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156858#comment-16156858
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


Am just trying to understand the case better so that if there is any other 
route cause we should also fix that. Nothing wrong in the patch or suggestion. 
Two things
-> Some times scanners get FNFE. So why is this? Is it due to incorrect store 
file accounting ? Then we should fix it.
-> Split case failures. When we actually close the store we split the files 
that are from the present set of store files. The already compacted files are 
not included (in the normal flow). So the references files are created for the 
present list of files only. I checked this with 1.3 latest code. 
bq.(which also includes already compacted files due to incorrect refresh)
So who is doing this incorrect refresh? Again is that due to scan failure?  ( I 
agree the refresh method has to checked but am more concerned about who is 
calling it and why?).

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156849#comment-16156849
 ] 

ramkrishna.s.vasudevan commented on HBASE-18771:


Thanks [~abhishek.chouhan]. I had been trying hard to reproduce this case based 
on HBASE-18186 but with no success because I was not able to get the actual 
case and there were no actual logs to trace it out except for the ones in the 
JIRA description. 
bq.Now before the files are archived we get a FNFE in a scanner. 
Why is this happening? If you can tell me how this is happening then I can 
think the rest makes sense. Because when any scanner is ongoing we are not 
updating the store files nor we are creating a new scanner from the files that 
were compacted.
Also https://issues.apache.org/jira/browse/HBASE-17712 this seems not in 1.3. 

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156851#comment-16156851
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


The split scenario failure wont happen in master due to the changes related to 
AMv2. In 1.3 when we close the parent region we use the list of store files 
returned by the close method during splitting which is the list of open store 
files(which also includes already compacted files due to incorrect refresh), 
however in case of master we get the list of files from hdfs which will have 
correct files due to the cleanup that happens during close of parent region.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch, 
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156815#comment-16156815
 ] 

Ted Yu commented on HBASE-18771:


Nice finding.

Please add license to TestCompactionFileNotFound class.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures

2017-09-07 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156699#comment-16156699
 ] 

Abhishek Singh Chouhan commented on HBASE-18771:


Have created a new issue since this might be one of the reasons we're getting 
the other issues. We can mark them dup later if this satisfactorily resolves 
the others too.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
>
> We ran into issues of compaction and split failures with 1.3 similar to 
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this 
> point we now have 5 store files, however only 1(the newly formed) is open now 
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in 
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe) 
> being called which results in region.refreshStoreFiles(true) -> 
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously 
> compacted files back to the store, however these files are also present in 
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs, 
> checks compactedFiles list and moves these files into the archive directory. 
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] 
> regionserver.CompactSplitThread - Compaction selection failed regionName = 
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at 
> org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at 
> org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly if a split happens after archival we fail after PONR while opening 
> daughter regions due to FNFE. This results in parent offline and daughters 
> also in a limbo since they're unable to open. Since we get the error after 
> PONR we also end up aborting the RS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

44 matches

Mail list logo