[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164249#comment-16164249 ]

Hudson commented on HBASE-18771:

FAILURE: Integrated in Jenkins build HBase-2.0 #506 (See [https://builds.apache.org/job/HBase-2.0/506/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev a797fe0daa18ad2105762f0cc89c4f5c43537a6f)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch,
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.branch-1.3.005.patch,
> HBASE-18771.master.001.patch, HBASE-18771.master.002.patch,
> HBASE-18771.master.003.patch
>
>
> We ran into compaction and split failures with 1.3, similar to
> HBASE-18186 and HBASE-17406. Here's what I believe is happening:
>
> Let's say we have 4 store files that are compacted to form a new one. At
> this point we have 5 store files on disk; however, only one (the newly
> formed file) is open for the store, and the rest are waiting to be
> archived by HFileArchiver.
>
> Now, before the files are archived, we get a FileNotFoundException (FNFE)
> in a scanner. This results in
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe)
> being called, which results in region.refreshStoreFiles(true) ->
> HStore.refreshStoreFiles().
>
> HStore.refreshStoreFiles now checks the HDFS directory and adds the
> previously compacted files back to the store, even though these files are
> also present in StoreFileManager's compactedFiles list. At this point
> HFileArchiver runs, checks the compactedFiles list, and moves these files
> into the archive directory. The store now references files that are no
> longer in its directory, so when compaction runs it gets:
>
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] regionserver.CompactSplitThread - Compaction selection failed regionName = , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
>
> Similarly, if a split happens after archival, we fail after the point of
> no return (PONR) while opening the daughter regions, due to FNFE. This
> leaves the parent offline and the daughters in limbo, since they are
> unable to open. Because the error occurs after the PONR, we also end up
> aborting the region server (RS).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
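The sequence in the issue description (compact, refresh, archive, then select for compaction) can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyStore`, its three sets, and the `skipCompactedOnRefresh` flag are hypothetical stand-ins invented for illustration, not HBase classes, and skipping already-compacted files during refresh is shown as one plausible remedy rather than as the content of the committed patch.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy replay of the race described in the issue; nothing here is HBase code. */
class StoreFileRefreshSketch {

    /** Hypothetical stand-in for a store and its directory on HDFS. */
    static class ToyStore {
        final Set<String> onDisk = new TreeSet<>();    // files present in the store directory
        final Set<String> active = new TreeSet<>();    // files the store believes it can read
        final Set<String> compacted = new TreeSet<>(); // compacted away, awaiting archival
        final boolean skipCompactedOnRefresh;          // assumed remedy, toggled per scenario

        ToyStore(boolean skipCompactedOnRefresh, String... files) {
            this.skipCompactedOnRefresh = skipCompactedOnRefresh;
            for (String f : files) {
                onDisk.add(f);
                active.add(f);
            }
        }

        /** Compaction: the inputs move to the compacted list, the output becomes active. */
        void compact(String output) {
            compacted.addAll(active);
            active.clear();
            active.add(output);
            onDisk.add(output);
        }

        /** Refresh: re-read the directory listing into the active set. */
        void refreshStoreFiles() {
            for (String f : onDisk) {
                if (skipCompactedOnRefresh && compacted.contains(f)) {
                    continue; // do not resurrect files that are about to be archived
                }
                active.add(f);
            }
            active.retainAll(onDisk); // drop entries that vanished from the directory
        }

        /** Archiver: move every compacted file out of the store directory. */
        void archive() {
            onDisk.removeAll(compacted);
            compacted.clear();
        }

        /** Files the store still references but can no longer find on disk. */
        Set<String> danglingFiles() {
            Set<String> missing = new TreeSet<>(active);
            missing.removeAll(onDisk);
            return missing;
        }
    }

    /** Runs compact -> refresh -> archive and reports the dangling references. */
    static Set<String> runScenario(boolean skipCompactedOnRefresh) {
        ToyStore store = new ToyStore(skipCompactedOnRefresh, "f1", "f2", "f3", "f4");
        store.compact("f5");
        store.refreshStoreFiles();
        store.archive();
        return store.danglingFiles();
    }

    public static void main(String[] args) {
        System.out.println("naive refresh: " + runScenario(false));
        System.out.println("refresh skipping compacted files: " + runScenario(true));
    }
}
```

In the naive scenario the four compacted inputs end up referenced by the store after they have left the directory, which is the condition under which a later file-status lookup would throw FileNotFoundException. With the flag set, only the compaction output remains active and nothing dangles.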
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164219#comment-16164219 ]

Abhishek Singh Chouhan commented on HBASE-18771:

Thanks everyone for reviewing and committing :)
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164198#comment-16164198 ]

Hudson commented on HBASE-18771:

FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3707 (See [https://builds.apache.org/job/HBase-Trunk_matrix/3707/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev 3df0351f2280bb914093d6fe3a69e246ae821617)
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164176#comment-16164176 ]

Hudson commented on HBASE-18771:

SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #286 (See [https://builds.apache.org/job/HBase-1.3-JDK8/286/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev 660427ce2ac071bcdca79c9e70a7390e14317eab)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164131#comment-16164131 ]

Hudson commented on HBASE-18771:

FAILURE: Integrated in Jenkins build HBase-1.4 #911 (See [https://builds.apache.org/job/HBase-1.4/911/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev b6aa23eefa02ca1be7a43b2ba85bd4327db40abc)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164118#comment-16164118 ]

Hudson commented on HBASE-18771:

FAILURE: Integrated in Jenkins build HBase-1.5 #55 (See [https://builds.apache.org/job/HBase-1.5/55/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev 432ca7e3fba92bf3a7c65bfff2b4b120b3238385)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164041#comment-16164041 ]

Hudson commented on HBASE-18771:

SUCCESS: Integrated in Jenkins build HBase-1.3-IT #205 (See [https://builds.apache.org/job/HBase-1.3-IT/205/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev 660427ce2ac071bcdca79c9e70a7390e14317eab)
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164031#comment-16164031 ]

Hudson commented on HBASE-18771:

FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #276 (See [https://builds.apache.org/job/HBase-1.3-JDK7/276/])
HBASE-18771 Incorrect StoreFileRefresh leading to split and compaction (apurtell: rev 660427ce2ac071bcdca79c9e70a7390e14317eab)
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionFileNotFound.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163864#comment-16163864 ]

Andrew Purtell commented on HBASE-18771:

If tests look good here, I will commit from master back to branch-1.3.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163863#comment-16163863 ]

Andrew Purtell commented on HBASE-18771:

The changes look good to me but there are a lot of failing tests in the precommit report. Let me try this out locally.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163498#comment-16163498 ]

Ashu Pachauri commented on HBASE-18771:

+1 on the latest patch.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162895#comment-16162895 ]

Hadoop QA commented on HBASE-18771:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 3m 59s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 4m 55s | master passed |
| +1 | compile | 0m 57s | master passed |
| +1 | checkstyle | 0m 52s | master passed |
| +1 | mvneclipse | 0m 19s | master passed |
| +1 | shadedjars | 5m 33s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 42s | master passed |
| +1 | javadoc | 0m 33s | master passed |
| +1 | mvninstall | 0m 50s | the patch passed |
| +1 | compile | 0m 50s | the patch passed |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 20s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 41m 24s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 2m 55s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
| -1 | unit | 76m 25s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 143m 37s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestScanWithoutFetchingData |
| | org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence |
| | org.apache.hadoop.hbase.client.TestAsyncReplicationAdminApi |
| | org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries |
| | org.apache.hadoop.hbase.client.TestAsyncTableBatch |
| | org.apache.hadoop.hbase.client.TestAsyncNamespaceAdminApi |
| | org.apache.hadoop.hbase.client.TestAsyncProcedureAdminApi |
| | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
| | org.apache.hadoop.hbase.TestMultiVersions |
| | org.apache.hadoop.hbase.client.TestAsyncSnapshotAdminApi |
| | org.apache.hadoop.hbase.client.TestAsyncNonMetaRegionLocator |
| | org.apache.hadoop.hbase.quotas.TestQuotaStatusRPCs |
| | org.apache.hadoop.hbase.security.visibility.TestVisibilityLablesWithGroups |
| | org.apache.hadoop.hbase.client.TestAsyncClusterAdminApi |
| | org.apache.hadoop.hbase.security.access.TestAccessController2 |
| | org.apache.hadoop.hbase.client.TestAsyncTableScanRenewLease |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162834#comment-16162834 ]

Hadoop QA commented on HBASE-18771:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 29s | branch-1.3 passed |
| +1 | compile | 0m 44s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | compile | 0m 43s | branch-1.3 passed with JDK v1.7.0_131 |
| +1 | checkstyle | 1m 7s | branch-1.3 passed |
| +1 | mvneclipse | 0m 26s | branch-1.3 passed |
| +1 | findbugs | 2m 8s | branch-1.3 passed |
| +1 | javadoc | 0m 42s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 39s | branch-1.3 passed with JDK v1.7.0_131 |
| +1 | mvninstall | 0m 52s | the patch passed |
| +1 | compile | 0m 40s | the patch passed with JDK v1.8.0_144 |
| +1 | javac | 0m 40s | the patch passed |
| +1 | compile | 0m 41s | the patch passed with JDK v1.7.0_131 |
| +1 | javac | 0m 41s | the patch passed |
| +1 | checkstyle | 0m 58s | the patch passed |
| +1 | mvneclipse | 0m 20s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 18m 50s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 2m 28s | the patch passed |
| +1 | javadoc | 0m 33s | the patch passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 37s | the patch passed with JDK v1.7.0_131 |
| +1 | unit | 86m 46s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 128m 1s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886601/HBASE-18771.branch-1.3.005.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux d0c4a79a38d6 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / ae6ff50 |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162746#comment-16162746 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

Thanks for the patch. I am +1. Let's see what [~ashu210890] says.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162723#comment-16162723 ]

Anoop Sam John commented on HBASE-18771:

bq. MetaTableAccessor#splitRegion although the description of the method says "Does not add the location information to the daughter regions since they are not open yet.".

So that looks like a bug. Maybe some later JIRAs did this by mistake? You can try checking. +1 to opening a JIRA and discussing it there.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch, HBASE-18771.branch-1.3.003.patch,
> HBASE-18771.branch-1.3.004.patch, HBASE-18771.master.001.patch,
> HBASE-18771.master.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to
> HBASE-18186 and HBASE-17406. Here's what I believe is happening:
> Let's say we have 4 store files that are compacted to form a new one. At this
> point we now have 5 store files, however only 1 (the newly formed one) is open
> for the store and the rest are waiting to get archived by HFileArchiver.
> Now before the files are archived we get a FNFE in a scanner. This results in
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe)
> being called, which results in region.refreshStoreFiles(true) ->
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously
> compacted files back to the store, however these files are also present in
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs,
> checks the compactedFiles list and moves these files into the archive directory.
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609]
> regionserver.CompactSplitThread - Compaction selection failed regionName =
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
> at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
> at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
> at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
> at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
> at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
> at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
> at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly, if a split happens after archival we fail after PONR while opening
> daughter regions due to FNFE. This results in the parent being offline and the
> daughters also in limbo since they're unable to open. Since we get the error after
> PONR we also end up aborting the RS.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
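The sequence in the issue description can be replayed with a small in-memory model. The sketch below is not HBase code: `ToyStore`, its three sets, and the `skipCompacted` flag are hypothetical names chosen for illustration. It assumes that one plausible guard is for a refresh to ignore files already recorded in the compacted-files list; this message does not say whether the attached patches do exactly that.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of one store; not HBase code. All names here are hypothetical. */
final class ToyStore {
    final Set<String> onDisk = new LinkedHashSet<>();         // files in the store directory
    final Set<String> active = new LinkedHashSet<>();         // files the store reads from
    final Set<String> compactedFiles = new LinkedHashSet<>(); // compacted away, awaiting archival

    void addFlushedFile(String name) {
        onDisk.add(name);
        active.add(name);
    }

    /** Compacts every active file into one new file; the inputs stay on disk until archived. */
    void compact(String output) {
        compactedFiles.addAll(active);
        active.clear();
        onDisk.add(output);
        active.add(output);
    }

    /** Re-syncs the active set with the directory listing. */
    void refresh(boolean skipCompacted) {
        for (String f : onDisk) {
            if (skipCompacted && compactedFiles.contains(f)) {
                continue; // assumed guard: do not resurrect already-compacted files
            }
            active.add(f);
        }
    }

    /** Archiver: moves every compacted file out of the store directory. */
    void archive() {
        onDisk.removeAll(compactedFiles);
        compactedFiles.clear();
    }

    /** Active files that no longer exist on disk; reading one would raise an FNFE. */
    List<String> danglingFiles() {
        List<String> missing = new ArrayList<>();
        for (String f : active) {
            if (!onDisk.contains(f)) {
                missing.add(f);
            }
        }
        return missing;
    }

    /** Replays: 4 files, compaction, refresh (triggered by a scanner FNFE), then archival. */
    static ToyStore scenario(boolean skipCompacted) {
        ToyStore s = new ToyStore();
        for (int i = 1; i <= 4; i++) {
            s.addFlushedFile("f" + i);
        }
        s.compact("f5");
        s.refresh(skipCompacted);
        s.archive();
        return s;
    }

    public static void main(String[] args) {
        System.out.println("naive refresh   -> dangling: " + scenario(false).danglingFiles());
        System.out.println("guarded refresh -> dangling: " + scenario(true).danglingFiles());
    }
}
```

With the naive refresh the model ends up with f1 to f4 still listed as active after the archiver has moved them, which is the state in which compaction selection or a daughter-region open would hit the FNFE. With the guard, only f5 remains active.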
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162661#comment-16162661 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

[~abhishek.chouhan] Thanks for the updated patch based on [~ashu210890]'s comments. If the assert has to be exact, then you can probably get access to a region server from the HBase testing utility:
{code}
HTU.getMiniHBaseCluster().getRegionServer(0);
{code}
Since you have 3 region servers, you may have to check across all 3 and then verify that the number of online regions is equal to what you expect. This would be better.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162624#comment-16162624 ]

Abhishek Singh Chouhan commented on HBASE-18771:

Actually

bq. return (numRegionsAfterSplit == numRegionsBeforeSplit + 1 && admin.isTableAvailable(TEST_TABLE));

results in a false positive. Just before the daughter regions are opened, admin.getTableRegions returns the daughter regions (which is numRegionsBeforeSplit + 1, since the parent is offline and excluded) and admin.isTableAvailable returns true even before the daughter regions are actually opened. This is because isTableAvailable checks whether it can getServerName for the meta entries, which it is able to do for the daughter regions. Server locations are added to the meta entries in MetaTableAccessor#splitRegion, although the description of the method says "Does not add the location information to the daughter regions since they are not open yet.". So basically isTableAvailable gives us a false positive when the parent is offline and the daughters are not yet open. I can open a bug for this, what do you think [~ashu210890] [~apurtell] [~anoop.hbase]?
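The false positive described in this comment can be illustrated with a toy model of the meta entries during the split window. This is a sketch, not the HBase client API: `SplitCheckModel`, `MetaRow`, `availableByLocation`, and the two check methods are hypothetical names, and the model simply assumes, as the comment reports, that availability is decided by whether each entry has a server location.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the split window; not HBase code. All names are hypothetical. */
final class SplitCheckModel {
    /** Stand-in for one region's meta entry. */
    static final class MetaRow {
        final String region;
        final String server; // server location recorded in meta, or null
        final boolean open;  // whether a region server is actually serving the region

        MetaRow(String region, String server, boolean open) {
            this.region = region;
            this.server = server;
            this.open = open;
        }
    }

    /** Location-based availability: true if every entry has a server location. */
    static boolean availableByLocation(List<MetaRow> rows) {
        for (MetaRow r : rows) {
            if (r.server == null) {
                return false;
            }
        }
        return true;
    }

    /** Number of regions that are really being served. */
    static int onlineCount(List<MetaRow> rows) {
        int n = 0;
        for (MetaRow r : rows) {
            if (r.open) {
                n++;
            }
        }
        return n;
    }

    /** The check quoted in the comment: region count plus location-based availability. */
    static boolean looseCheck(int regionsBefore, List<MetaRow> rows) {
        return rows.size() == regionsBefore + 1 && availableByLocation(rows);
    }

    /** Stricter check: compare the number of online regions instead. */
    static boolean strictCheck(int onlineBefore, List<MetaRow> rows) {
        return onlineCount(rows) == onlineBefore + 1;
    }

    /** Parent offline and excluded; daughters have locations but are not open yet. */
    static List<MetaRow> midSplit() {
        return Arrays.asList(
            new MetaRow("daughterA", "rs1", false),
            new MetaRow("daughterB", "rs1", false));
    }

    /** Both daughters opened successfully. */
    static List<MetaRow> afterSplit() {
        return Arrays.asList(
            new MetaRow("daughterA", "rs1", true),
            new MetaRow("daughterB", "rs1", true));
    }

    public static void main(String[] args) {
        System.out.println("mid-split   loose=" + looseCheck(1, midSplit())
            + " strict=" + strictCheck(1, midSplit()));
        System.out.println("after-split loose=" + looseCheck(1, afterSplit())
            + " strict=" + strictCheck(1, afterSplit()));
    }
}
```

In the mid-split state the loose check already passes while the strict check does not, which is why counting online regions is the safer assertion for the test.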
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162523#comment-16162523 ]

Abhishek Singh Chouhan commented on HBASE-18771:

I see. I was under the wrong impression that Admin#getTableRegions returns only online regions. Thanks [~ashu210890], let me correct the test as per your suggestion.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161860#comment-16161860 ]

Ashu Pachauri commented on HBASE-18771:
---

[~abhishek.chouhan] Thanks for the updated patch. However, just checking the number of regions is not enough. This check will still pass even if the daughter regions get stuck in FAILED_OPEN due to a FNFE because Admin#getTableRegions returns all the regions. The correct check is that the number of ONLINE regions after the split is one larger than the number of ONLINE regions before the split.
{code}
+ int numRegionsBeforeSplit = admin.getTableRegions(TEST_TABLE).size();
+ // Check if we can successfully split after compaction
+ admin.splitRegion(admin.getTableRegions(TEST_TABLE).get(0).getEncodedNameAsBytes(), ROW_C);
+ Thread.sleep(1);
+ int numRegionsAfterSplit = admin.getTableRegions(TEST_TABLE).size();
+ assertEquals(numRegionsAfterSplit, numRegionsBeforeSplit + 1);
{code}
I think the following check will suffice:
{code}
assertTrue(admin.isTableAvailable(TEST_TABLE));
final int numRegionsBeforeSplit = admin.getTableRegions(TEST_TABLE).size();
admin.splitRegion(admin.getTableRegions(TEST_TABLE).get(0).getEncodedNameAsBytes(), ROW_C);
util.waitFor(2, new Waiter.Predicate() {
  @Override
  public boolean evaluate() throws Exception {
    int numRegionsAfterSplit = admin.getTableRegions(TEST_TABLE).size();
    // Make sure that the split went through and all the regions are assigned
    return (numRegionsAfterSplit == numRegionsBeforeSplit + 1
        && admin.isTableAvailable(TEST_TABLE));
  }
});
{code}
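The suggested check replaces a fixed sleep with polling a predicate until it holds or a timeout expires. The general pattern can be sketched in plain Java with no HBase dependency. `PollingWait` and this `waitFor` signature are hypothetical and are not the API of the `Waiter` utility used in the snippet above; the timeout and interval values are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal polling helper; a hypothetical stand-in, not the HBase test utility. */
final class PollingWait {
    /**
     * Evaluates the condition immediately and then every intervalMs milliseconds.
     * Returns true as soon as the condition holds, or false once timeoutMs has elapsed.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, Callable<Boolean> condition)
            throws Exception {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.call()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated condition that becomes true on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean met = waitFor(1000, 10, () -> polls.incrementAndGet() >= 3);
        System.out.println("condition met: " + met + " after " + polls.get() + " polls");

        // A condition that never holds makes the helper give up at the deadline.
        boolean never = waitFor(50, 10, () -> false);
        System.out.println("never-true condition met: " + never);
    }
}
```

Polling keeps the test fast when the split completes quickly and still bounds the wait when the daughters never open, which a single fixed sleep cannot do.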
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161492#comment-16161492 ]

Hadoop QA commented on HBASE-18771:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 2m 13s | branch-1.3 passed |
| +1 | compile | 0m 41s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | compile | 0m 39s | branch-1.3 passed with JDK v1.7.0_151 |
| +1 | checkstyle | 0m 59s | branch-1.3 passed |
| +1 | mvneclipse | 0m 18s | branch-1.3 passed |
| +1 | findbugs | 2m 18s | branch-1.3 passed |
| +1 | javadoc | 0m 40s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 43s | branch-1.3 passed with JDK v1.7.0_151 |
| +1 | mvninstall | 0m 56s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.8.0_144 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.7.0_151 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 59s | the patch passed |
| +1 | mvneclipse | 0m 18s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 17m 55s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 2m 25s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_151 |
| -1 | unit | 393m 13s | hbase-server in the patch failed. |
| +1 | asflicense | 3m 39s | The patch does not generate ASF License warnings. |
| | | 431m 42s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAdmin2 |
| | hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort |
| | hadoop.hbase.snapshot.TestRegionSnapshotTask |
| | hadoop.hbase.client.TestAdmin1 |
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
| | org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS |
| | org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat |
| | org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile |
| | org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161122#comment-16161122 ]

Hadoop QA commented on HBASE-18771:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 9s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 3m 33s | master passed |
| +1 | compile | 0m 37s | master passed |
| +1 | checkstyle | 0m 44s | master passed |
| +1 | mvneclipse | 0m 16s | master passed |
| +1 | shadedjars | 4m 57s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 9s | master passed |
| +1 | javadoc | 0m 27s | master passed |
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 46s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 3m 55s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 36m 29s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 2m 16s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed |
| +1 | unit | 89m 2s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 143m 5s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:5d60123 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886375/HBASE-18771.master.002.patch |
| Optional Tests | asflicense shadedjars javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 56935e5fc0cc 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 45ba696 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/8557/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/8557/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160849#comment-16160849 ]

Abhishek Singh Chouhan commented on HBASE-18771:

Thanks for reviewing [~ashu210890]. Yep, a check based on the number of regions would be better; let me add that and put up a patch.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159633#comment-16159633 ]

Ashu Pachauri commented on HBASE-18771:
---
[~yuzhih...@gmail.com] Yes, we should do this as a follow up of HBASE-18186, i.e. once we get rid of FNFEs that have silently crept in over time due to other changes. I have filed HBASE-18786 to figure out what's the right way to approach this.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159586#comment-16159586 ]

Ted Yu commented on HBASE-18771:
---
bq. that HStore#refreshStoreFiles should never be triggered for primary region replicas

How about adding a check in HStore#refreshStoreFiles() for the above (can be done in another JIRA)?
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159582#comment-16159582 ]

Ashu Pachauri commented on HBASE-18771:
---
On a parallel note, echoing everyone else's opinion here that HStore#refreshStoreFiles should never be triggered for primary region replicas. Triggering it not only indicates an underlying problem (FNFE on the scan path) but may also cause one to silently overlook some nasty data loss scenarios.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159571#comment-16159571 ]

Ashu Pachauri commented on HBASE-18771:
---
Nice find! The patch looks generally good. There is one nitpick in the tests though:
{code:java}
+ // Split at this point should not result in the RS being aborted
+ assertEquals(util.getMiniHBaseCluster().getLiveRegionServerThreads().size(), 3);
{code}
The RS abort may not happen every time a split fails, depending on where in the process the split transaction failed (or whether it rolled back). Also, the RS may come back up after aborting at the exact moment this check happens and then abort again. A more robust check is on the number of online regions for the table, which should be exactly one more than the number of previously online regions.
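The region-count check suggested in the review could be sketched roughly as below. This is only an illustration under stated assumptions: `RegionCountCheck` and `waitForCount` are hypothetical names, the count is supplied by the caller as an `IntSupplier` rather than through any real HBase call, and polling with a timeout is an added assumption (a split may not have finished by the time the assertion runs), not something the review asks for.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of HBase or of the patch under review.
class RegionCountCheck {
    // Polls the supplied count until it equals `expected` or the timeout expires.
    // Returns true if the expected count was observed, false otherwise.
    static boolean waitForCount(IntSupplier onlineRegions, int expected, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (onlineRegions.getAsInt() == expected) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "regions online before the split"; a real test would
        // read this from the mini cluster before requesting the split.
        int before = 1;
        int[] simulated = {before};
        // Simulate the split finishing after a few polls.
        IntSupplier count = () -> simulated[0] < before + 1 ? simulated[0]++ : simulated[0];
        System.out.println(waitForCount(count, before + 1, 1000));
    }
}
```

In a test, the supplier would wrap whatever call returns the table's online regions, and the expected value would be the pre-split count plus one, so the assertion no longer depends on whether a region server happened to abort or restart.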
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158614#comment-16158614 ]

Hadoop QA commented on HBASE-18771:
---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 3m 42s | master passed |
| +1 | compile | 0m 39s | master passed |
| +1 | checkstyle | 0m 44s | master passed |
| +1 | mvneclipse | 0m 15s | master passed |
| +1 | findbugs | 2m 10s | master passed |
| +1 | javadoc | 0m 29s | master passed |
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| +1 | checkstyle | 0m 45s | the patch passed |
| +1 | mvneclipse | 0m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 34m 49s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 2m 29s | the patch passed |
| +1 | javadoc | 0m 32s | the patch passed |
| -1 | unit | 81m 17s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 56s | The patch does not generate ASF License warnings. |
| | | 131m 4s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.regionserver.TestSplitLogWorker |
| | org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy |
| | org.apache.hadoop.hbase.regionserver.TestCompaction |
| | org.apache.hadoop.hbase.master.TestGetLastFlushedSequenceId |
| | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
| | org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook |
| | org.apache.hadoop.hbase.regionserver.TestTimestampFilterSeekHint |
| | org.apache.hadoop.hbase.wal.TestWALFiltering |
| | org.apache.hadoop.hbase.master.TestGetInfoPort |
| | org.apache.hadoop.hbase.regionserver.TestColumnSeeking |
| | org.apache.hadoop.hbase.regionserver.TestRegionServerAbort |
| | org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort |
| | org.apache.hadoop.hbase.regionserver.TestRegionIncrement |
| | org.apache.hadoop.hbase.regionserver.TestWALLockup |
| | org.apache.hadoop.hbase.master.TestTableStateManager |
| | org.apache.hadoop.hbase.regionserver.TestWalAndCompactingMemStoreFlush |
| | org.apache.hadoop.hbase.master.assignment.TestAssignmentOnRSCrash |
| | org.apache.hadoop.hbase.master.TestSplitLogManager |
| | org.apache.hadoop.hbase.master.balancer.TestFavoredNodeTableImport |
| | org.apache.hadoop.hbase.regionserver.TestSplitWalDataLoss |
| | org.apache.hadoop.hbase.master.TestRollingRestart |
| | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158367#comment-16158367 ]

Hadoop QA commented on HBASE-18771:
---
| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 1m 43s | branch-1.3 passed |
| +1 | compile | 0m 31s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | compile | 0m 33s | branch-1.3 passed with JDK v1.7.0_131 |
| +1 | checkstyle | 0m 54s | branch-1.3 passed |
| +1 | mvneclipse | 0m 17s | branch-1.3 passed |
| +1 | findbugs | 1m 50s | branch-1.3 passed |
| +1 | javadoc | 0m 24s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 33s | branch-1.3 passed with JDK v1.7.0_131 |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 31s | the patch passed with JDK v1.8.0_144 |
| +1 | javac | 0m 31s | the patch passed |
| +1 | compile | 0m 32s | the patch passed with JDK v1.7.0_131 |
| +1 | javac | 0m 32s | the patch passed |
| +1 | checkstyle | 0m 54s | the patch passed |
| +1 | mvneclipse | 0m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 15m 47s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 2m 4s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 33s | the patch passed with JDK v1.7.0_131 |
| +1 | unit | 87m 23s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 117m 3s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886008/HBASE-18771.branch-1.3.003.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 413198e531bd 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / 6fcb15f |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158217#comment-16158217 ]

Abhishek Singh Chouhan commented on HBASE-18771:
---
Thanks [~ram_krish]

bq. One question on the test case - does this test case fail always without patch or you need to introduce some locking / new threads to introduce the actual bug? Because all are sequential in the tests now and what if closeAndArchiveCompactedFiles() gets completed and then the hr.compact(false) gets executed?

The test case always fails on branch-1.3 without the fix. In the test case, closeAndArchiveCompactedFiles() explicitly completes before hr.compact(false) is called, and the compaction then does not find the file since it's already archived.

bq. The refreshStoreFiles() API however needs the fix in 2.0 and trunk also right? So lets get it in branch-2 and above also?

Yes. I checked master and refreshStoreFiles() is the same there, so I think we should fix it there as well. The test case for split failure passes on master due to differences in the way splits are done in the new assignment manager.

[~lhofhansl] had some good suggestions on using Collections.emptySet() in place of new ArrayList. Putting up a new patch with those in.
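The sequence described in this issue (compact, refresh, archive, then select files for compaction) can be illustrated with a small toy model. This is a sketch under stated assumptions, not the HBase code or the actual patch: `ToyStore` and all of its members are hypothetical names, file sets are plain strings, and skipping already-compacted files during refresh is one plausible way to model the fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a single store; none of these names are HBase classes.
class ToyStore {
    final Set<String> onDisk = new TreeSet<>();    // files present in the store directory
    final Set<String> active = new TreeSet<>();    // files the store currently reads from
    final Set<String> compacted = new TreeSet<>(); // compacted away, awaiting archival

    void addFile(String f) {
        onDisk.add(f);
        active.add(f);
    }

    // Compacts all active files into one output; inputs stay on disk until archived.
    void compact(String output) {
        compacted.addAll(active);
        active.clear();
        onDisk.add(output);
        active.add(output);
    }

    // Rebuilds the active set from the directory listing.
    // With skipCompacted == false, compacted files are added back (the reported bug).
    void refresh(boolean skipCompacted) {
        for (String f : onDisk) {
            if (skipCompacted && compacted.contains(f)) {
                continue;
            }
            active.add(f);
        }
        active.retainAll(onDisk);
    }

    // Archiver: moves compacted files out of the store directory.
    void archiveCompacted() {
        onDisk.removeAll(compacted);
        compacted.clear();
    }

    // Active files that no longer exist on disk; touching these would fail with FNFE.
    List<String> danglingFiles() {
        List<String> missing = new ArrayList<>(active);
        missing.removeAll(onDisk);
        return missing;
    }

    // Replays the scenario: 4 files compacted into 1, refresh, archive.
    static List<String> run(boolean skipCompacted) {
        ToyStore s = new ToyStore();
        for (int i = 1; i <= 4; i++) {
            s.addFile("f" + i);
        }
        s.compact("f5");
        s.refresh(skipCompacted);
        s.archiveCompacted();
        return s.danglingFiles();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [f1, f2, f3, f4]
        System.out.println(run(true));  // prints []
    }
}
```

In the first run the refresh re-adds the four compacted files, the archiver then moves them away, and the store is left referencing files that no longer exist, which corresponds to the FileNotFoundException seen during compaction selection. In the second run the refresh leaves the compacted files alone and nothing dangles.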
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158180#comment-16158180 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

Thinking about this again: the refreshStoreFiles() API still needs the fix in 2.0 and trunk as well, right? So let's get it into branch-2 and above too.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
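The refresh behaviour discussed in this thread can be illustrated with a small, self-contained sketch. It is only a toy model under stated assumptions: the class `StoreFileRefreshSketch`, the method `filesToAdd`, and the file names are hypothetical and are not the HStore code or the contents of the attached patches. It shows the idea that a refresh should not reopen files that are already tracked as compacted and merely waiting for archival.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a store-file refresh; hypothetical names, not the real HStore code. */
class StoreFileRefreshSketch {

  /**
   * Returns the files a refresh should newly open: present in the store
   * directory, not already open, and not already marked as compacted.
   */
  static Set<String> filesToAdd(Collection<String> onDisk,
                                Collection<String> open,
                                Collection<String> compacted) {
    Set<String> toAdd = new TreeSet<>(onDisk);
    toAdd.removeAll(open);
    // Assumption: this exclusion is the step that the problematic refresh lacks.
    toAdd.removeAll(compacted);
    return toAdd;
  }

  public static void main(String[] args) {
    // Four files were compacted into f5; the old files still sit in the directory.
    Set<String> onDisk = new HashSet<>(Arrays.asList("f1", "f2", "f3", "f4", "f5"));
    Set<String> open = new HashSet<>(Arrays.asList("f5"));
    Set<String> compacted = new HashSet<>(Arrays.asList("f1", "f2", "f3", "f4"));
    System.out.println(filesToAdd(onDisk, open, compacted)); // prints []
  }
}
```

Without the second `removeAll`, the same call would return f1 through f4, which is the situation the issue description attributes to HStore.refreshStoreFiles().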
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158128#comment-16158128 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

+1 for the patch. Thanks for the great find. There is probably a Phoenix coprocessor (CP) in this stack trace that could have caused this FNFE, and the code already expects FNFE to occur in these cases, so fixing refreshStoreFiles should resolve several of the JIRAs raised around this.

One question on the test case: does it always fail without the patch, or do you need to introduce some locking or new threads to trigger the actual bug? Everything in the test is sequential right now, so what happens if closeAndArchiveCompactedFiles() completes and only then hr.compact(false) is executed?

You can commit this to branch-1.3, and probably to branch-2 and above too if you can produce a test case.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
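The ordering question raised about the test can be explored with a deterministic toy simulation. This is a sketch under assumptions, not the test in the patch: `CompactionRaceSketch` and all of its methods are hypothetical stand-ins for the store's open-file list, its compactedFiles list, and the store directory. It suggests that, in this simplified model, no extra threads are needed as long as the refresh is placed between the compaction and the archival; if archival completes first, the refresh finds nothing stale to re-add.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of one store; hypothetical names, not HBase APIs. */
class CompactionRaceSketch {
  final Set<String> onDisk = new TreeSet<>();    // files in the store directory
  final Set<String> open = new TreeSet<>();      // files the store believes it serves
  final Set<String> compacted = new TreeSet<>(); // compacted files awaiting archival

  CompactionRaceSketch(List<String> initial) {
    onDisk.addAll(initial);
    open.addAll(initial);
  }

  /** Replaces all open files with one new file; old files stay on disk for now. */
  void compact(String output) {
    compacted.addAll(open);
    open.clear();
    open.add(output);
    onDisk.add(output);
  }

  /** Re-reads the directory; skipCompacted=false models the problematic refresh. */
  void refresh(boolean skipCompacted) {
    for (String f : onDisk) {
      if (skipCompacted && compacted.contains(f)) {
        continue;
      }
      open.add(f);
    }
  }

  /** Moves compacted files out of the store directory. */
  void archive() {
    onDisk.removeAll(compacted);
    compacted.clear();
  }

  /** Open files that no longer exist on disk; a later compaction would hit FNFE on these. */
  List<String> danglingFiles() {
    List<String> missing = new ArrayList<>(open);
    missing.removeAll(onDisk);
    return missing;
  }

  static List<String> run(boolean skipCompacted, boolean archiveFirst) {
    CompactionRaceSketch s = new CompactionRaceSketch(Arrays.asList("f1", "f2", "f3", "f4"));
    s.compact("f5");
    if (archiveFirst) {
      s.archive();
      s.refresh(skipCompacted);
    } else {
      s.refresh(skipCompacted);
      s.archive();
    }
    return s.danglingFiles();
  }

  public static void main(String[] args) {
    System.out.println(run(false, false)); // prints [f1, f2, f3, f4]
    System.out.println(run(false, true));  // prints []
    System.out.println(run(true, false));  // prints []
  }
}
```

In this model only the first ordering leaves dangling entries, so a sequential test would have to force the refresh before the archival step to exercise the bug.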
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158119#comment-16158119 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

[~mantonov], [~ashu210890] This patch solves a major issue; please have a look. The reason for the FNFE is yet to be explored, but this could help.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157804#comment-16157804 ]

stack commented on HBASE-18771:
---

Removed from hbase2 for now, until it is demonstrated that this issue is in hbase2. Thanks lads.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157175#comment-16157175 ]

Hadoop QA commented on HBASE-18771:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 24m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 10s | branch-1.3 passed |
| +1 | compile | 0m 45s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | compile | 0m 45s | branch-1.3 passed with JDK v1.7.0_151 |
| +1 | checkstyle | 1m 9s | branch-1.3 passed |
| +1 | mvneclipse | 0m 26s | branch-1.3 passed |
| +1 | findbugs | 2m 5s | branch-1.3 passed |
| +1 | javadoc | 0m 39s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 40s | branch-1.3 passed with JDK v1.7.0_151 |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.8.0_144 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.7.0_151 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 1m 3s | the patch passed |
| +1 | mvneclipse | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 20m 6s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 2m 11s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_151 |
| -1 | unit | 371m 15s | hbase-server in the patch failed. |
| -1 | asflicense | 4m 33s | The patch generated 1 ASF License warnings. |
| | | 442m 39s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionServerHostname |
| | hadoop.hbase.regionserver.TestRecoveredEdits |
| | hadoop.hbase.mapreduce.TestLoadIncrementalHFilesUseSecurityEndPoint |
| | hadoop.hbase.mapreduce.TestLoadIncrementalHFiles |
| | hadoop.hbase.client.TestAdmin2 |
| | hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFiles |
| | hadoop.hbase.replication.TestReplicationChangingPeerRegionservers |
| Timed out junit tests | org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS |
| | org.apache.hadoop.hbase.master.TestMasterMetricsWrapper |
| | org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156973#comment-16156973 ]

Hadoop QA commented on HBASE-18771:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 1m 51s | branch-1.3 passed |
| +1 | compile | 0m 31s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | compile | 0m 34s | branch-1.3 passed with JDK v1.7.0_131 |
| +1 | checkstyle | 0m 54s | branch-1.3 passed |
| +1 | mvneclipse | 0m 17s | branch-1.3 passed |
| +1 | findbugs | 1m 51s | branch-1.3 passed |
| +1 | javadoc | 0m 25s | branch-1.3 passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 33s | branch-1.3 passed with JDK v1.7.0_131 |
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.8.0_144 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.7.0_131 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 54s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 15m 56s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | findbugs | 2m 3s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed with JDK v1.8.0_144 |
| +1 | javadoc | 0m 32s | the patch passed with JDK v1.7.0_131 |
| +1 | unit | 82m 53s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 112m 47s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:b3a2a00 |
| JIRA Issue | HBASE-18771 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12885798/HBASE-18771.branch-1.3.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux fddd83692a8a 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.3 / 6fcb15f |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156891#comment-16156891 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

Thanks for the update. If we can use your logs to drill down into why the scan is failing, then I think more or less all cases will be handled across all versions :). I can help with the analysis too, because 1.3 is not considered stable due to these bugs, which have been lingering without proper fixes.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156885#comment-16156885 ]

Abhishek Singh Chouhan commented on HBASE-18771:

> Are you using read replica feature? Then all this is possible.

No boss. This is without read replicas.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156880#comment-16156880 ]

Abhishek Singh Chouhan commented on HBASE-18771:

Didn't see your comment [~ram_krish] before I submitted mine :) I didn't look much into how the scanner got the FNFE, but it was definitely there in the logs. I reproduced the scenario by running a load together with a chaos monkey that did splits, merges, compactions, snapshots and full table scans in parallel. I'll try to dig deeper into that too.

> So who is doing this incorrect refresh? Again is that due to scan failure?

Yes, in the logs I had:

RpcServer.FifoWFPBQ.default.handler=87,queue=7,port=16020: callId: 12290 service: ClientService methodName: Scan size: 47 connection: 10.231.90.15:50148 deadline: 1504528119024
org.apache.hadoop.hbase.UnknownScannerException: Throwing UnknownScannerException to reset the client scanner state for clients older than 1.3.
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3030)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35072)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2370)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: java.io.IOException: unable to read store file
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6394)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6021)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6164)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5938)
    at org.apache.phoenix.iterate.RegionScannerFactory$1.nextRaw(RegionScannerFactory.java:223)
    at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
    at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
    at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:259)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2767)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2967)
    ... 5 more

Looks like with HBASE-17712 we no longer call refreshStoreFiles in handleFileNotFound, so this might not happen.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
> Key: HBASE-18771
> URL: https://issues.apache.org/jira/browse/HBASE-18771
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Blocker
> Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18771.branch-1.3.001.patch,
> HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to
> HBASE-18186 and HBASE-17406. Here's what i believe is happening -
> Lets say we have 4 store files that are compacted to form a new one. At this
> point we now have 5 store files, however only 1(the newly formed) is open now
> for the store and rest are waiting to get archived by HFileArchiver
> Now before the files are archived we get a FNFE in a scanner. This results in
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe)
> being called which results in region.refreshStoreFiles(true) ->
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously
> compacted files back to the store, however these files are also present in
> StoreFileManager's compactedFiles list.
Now at this point HFileArchiver runs, > checks compactedFiles list and moves these files into the archive directory. > Now when compaction runs it gets: > 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] > regionserver.CompactSplitThread - Compaction selection failed regionName = > , storeName = 0, priority = 26, time = 1504528213899 > java.io.FileNotFoundException: File does not exist: hdfs:// > at > org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337) > at > org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329) > at >
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156866#comment-16156866 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

Are you using the read replica feature? Then all of this is possible.

> Incorrect StoreFileRefresh leading to split and compaction failures
> ---
>
>                 Key: HBASE-18771
>                 URL: https://issues.apache.org/jira/browse/HBASE-18771
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.1
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Abhishek Singh Chouhan
>            Priority: Blocker
>             Fix For: 3.0.0, 1.4.0, 1.3.2, 1.5.0, 2.0.0-alpha-3
>
>         Attachments: HBASE-18771.branch-1.3.001.patch, HBASE-18771.branch-1.3.002.patch
>
>
> We ran into issues of compaction and split failures with 1.3 similar to
> HBASE-18186 and HBASE-17406. Here's what I believe is happening:
> Let's say we have 4 store files that are compacted to form a new one. At this
> point we now have 5 store files, however only 1 (the newly formed) is open now
> for the store and the rest are waiting to get archived by HFileArchiver.
> Now before the files are archived we get a FNFE in a scanner. This results in
> HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe)
> being called, which results in region.refreshStoreFiles(true) ->
> HStore.refreshStoreFiles()
> HStore.refreshStoreFiles now checks the hdfs dir and adds the previously
> compacted files back to the store, however these files are also present in
> StoreFileManager's compactedFiles list. Now at this point HFileArchiver runs,
> checks the compactedFiles list and moves these files into the archive directory.
> Now when compaction runs it gets:
> 2017-09-04 12:30:13,899 ERROR [ctions-1504505399609]
> regionserver.CompactSplitThread - Compaction selection failed regionName =
> , storeName = 0, priority = 26, time = 1504528213899
> java.io.FileNotFoundException: File does not exist: hdfs://
>     at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
>     at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
>     at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
>     at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
>     at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
>     at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
>     at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
> Similarly, if a split happens after archival we fail after PONR while opening
> daughter regions due to FNFE. This leaves the parent offline and the daughters
> in limbo since they're unable to open. Since we get the error after
> PONR we also end up aborting the RS.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
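
The race in the quoted description can be modeled with a small, self-contained sketch. This is a toy model under stated assumptions, not HBase code and not the attached patch: the class `StoreFileRefreshSketch`, its two methods, and the file names `f1`..`f5` are all hypothetical, and store files are reduced to plain strings. It only illustrates why a refresh that compares the directory listing against the open files alone re-adds files that were already compacted, and how also consulting the compacted-files list avoids that.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the store-file refresh race; hypothetical names, not HBase code. */
final class StoreFileRefreshSketch {

    /** Naive refresh: every file on disk that is not currently open gets re-added. */
    static Set<String> filesToAddNaive(Collection<String> onDisk, Collection<String> open) {
        Set<String> toAdd = new LinkedHashSet<>(onDisk);
        toAdd.removeAll(open);
        return toAdd;
    }

    /** Guarded refresh: files that were compacted away and await archival are skipped too. */
    static Set<String> filesToAddGuarded(Collection<String> onDisk,
                                         Collection<String> open,
                                         Collection<String> compacted) {
        Set<String> toAdd = filesToAddNaive(onDisk, open);
        toAdd.removeAll(compacted);
        return toAdd;
    }

    public static void main(String[] args) {
        // State right after compacting f1..f4 into f5, before the archiver has run.
        List<String> compacted = Arrays.asList("f1", "f2", "f3", "f4");
        List<String> open = Arrays.asList("f5");
        List<String> onDisk = Arrays.asList("f1", "f2", "f3", "f4", "f5");

        // The naive refresh reopens f1..f4, which the archiver will later move away.
        System.out.println("naive re-adds:   " + filesToAddNaive(onDisk, open));
        // The guarded refresh re-adds nothing.
        System.out.println("guarded re-adds: " + filesToAddGuarded(onDisk, open, compacted));
    }
}
```

In the naive case the store ends up holding f1..f4 again; once the archiver moves them, any later compaction selection or split that touches those entries would hit the FileNotFoundException shown in the trace.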
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156858#comment-16156858 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

I am just trying to understand the case better, so that if there is any other root cause we can fix that as well. Nothing wrong with the patch or the suggestion. Two things:
-> Sometimes scanners get FNFE. Why is this? Is it due to incorrect store file accounting? Then we should fix it.
-> Split case failures. When we actually close the store, we split the files that are in the present set of store files. The already compacted files are not included (in the normal flow), so the reference files are created for the present list of files only. I checked this with the latest 1.3 code.

> (which also includes already compacted files due to incorrect refresh)

So who is doing this incorrect refresh? Again, is that due to the scan failure? (I agree the refresh method has to be checked, but I am more concerned about who is calling it and why.)
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156849#comment-16156849 ]

ramkrishna.s.vasudevan commented on HBASE-18771:

Thanks [~abhishek.chouhan]. I had been trying hard to reproduce this case based on HBASE-18186, but with no success, because I could not get the actual scenario and there were no logs to trace it with except the ones in the JIRA description.

> Now before the files are archived we get a FNFE in a scanner.

Why is this happening? If you can tell me how this happens, then I think the rest makes sense. While a scanner is in progress we neither update the store files nor create a new scanner from the files that were compacted.
Also, HBASE-17712 (https://issues.apache.org/jira/browse/HBASE-17712) does not seem to be in 1.3.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156851#comment-16156851 ]

Abhishek Singh Chouhan commented on HBASE-18771:

The split failure scenario won't happen in master due to the changes related to AMv2. In 1.3, when we close the parent region during a split, we use the list of store files returned by the close method, which is the list of open store files (and which also includes already compacted files due to the incorrect refresh). In master, however, we get the list of files from hdfs, which will have the correct files due to the cleanup that happens during close of the parent region.
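
The difference between the two split paths described above can be illustrated with a toy model. This is a sketch under stated assumptions, not HBase code: `SplitFileSourceSketch`, `danglingReferences`, and the file names are hypothetical, and store files are reduced to plain strings. It shows that daughter references built from a stale in-memory list can point at files that are no longer on disk, whereas references built from a fresh directory listing cannot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Toy model of where a split takes its file list from; hypothetical names, not HBase code. */
final class SplitFileSourceSketch {

    /** Returns the referenced files that are missing on disk, i.e. references that would dangle. */
    static List<String> danglingReferences(Collection<String> referenced, Collection<String> onDisk) {
        List<String> missing = new ArrayList<>(referenced);
        missing.removeAll(onDisk);
        return missing;
    }

    public static void main(String[] args) {
        // After archival only the compaction output f5 is left in the store directory.
        List<String> onDiskAfterArchival = Arrays.asList("f5");
        // In-memory open list after the incorrect refresh re-added the compacted files.
        List<String> openListAfterBadRefresh = Arrays.asList("f1", "f2", "f3", "f4", "f5");

        // 1.3-style: references are created from the in-memory list returned by close.
        System.out.println("from open list:  "
            + danglingReferences(openListAfterBadRefresh, onDiskAfterArchival));
        // master-style: references are created from a directory listing taken after cleanup.
        System.out.println("from fs listing: "
            + danglingReferences(onDiskAfterArchival, onDiskAfterArchival));
    }
}
```

In this model the first call reports f1..f4 as dangling, which corresponds to daughters failing to open with FNFE, while the second reports none.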
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156815#comment-16156815 ]

Ted Yu commented on HBASE-18771:

Nice finding.
Please add the license header to the TestCompactionFileNotFound class.
[jira] [Commented] (HBASE-18771) Incorrect StoreFileRefresh leading to split and compaction failures
[ https://issues.apache.org/jira/browse/HBASE-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156699#comment-16156699 ]

Abhishek Singh Chouhan commented on HBASE-18771:

I have created a new issue since this might be one of the reasons we're getting the other issues. We can mark them as duplicates later if this satisfactorily resolves the others too.