[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813745#comment-13813745 ]

Hudson commented on HBASE-8942:
---
SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #826 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/826/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538867)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

DFS errors during a read operation (get/scan), may cause write outliers
---
Key: HBASE-8942
URL: https://issues.apache.org/jira/browse/HBASE-8942
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb, 0.95.2
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.094.txt, 8942.096.txt, HBase-8942.txt

This is similar to the issue discussed in HBASE-8228.
1) A scanner holds the Store.ReadLock() while opening the store files ... and encounters errors, so it takes a long time to finish.
2) A flush completes in the meantime. It needs the write lock to commit() and update scanners, so it ends up waiting.
3+) All Puts (and also Gets) to the CF, which need a read lock, have to wait for 1) and 2) to complete, blocking updates to the system for the duration of the DFS timeout.

Fix: Open Store files outside the read lock. getScanners() already tries to do this optimisation. However, Store.getScanner(), which calls this function through the StoreScanner constructor, redundantly grabs the readLock, causing the readLock to be held while the storeFiles are being opened and seeked.
We should get rid of the readLock() in Store.getScanner(). It is not required: the constructor for StoreScanner calls getScanners(xxx, xxx, xxx), which already has the required locking.

-- This message was sent by Atlassian JIRA (v6.1#6144)
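The locking problem in the issue description above can be illustrated with a small, self-contained sketch. This is not the HStore code and not the attached patch: the class LockScopeSketch, its methods, and the file names are hypothetical stand-ins, and the "flush" is probed from the same thread with tryLock() purely to keep the example deterministic (in the real scenario the scanner, the flush, and the Puts run on different threads). It only shows that a write lock cannot be taken while a read lock is held across the file opens, and that it can be taken once the opens happen outside the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical stand-in for a store; NOT the real HStore class. */
final class LockScopeSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storeFiles = new ArrayList<>(List.of("file-1", "file-2"));

    /** Problem shape: every (possibly slow) open runs while the read lock is held. */
    List<String> getScannerHoldingLock(Function<String, String> opener) {
        lock.readLock().lock();
        try {
            List<String> scanners = new ArrayList<>();
            for (String file : storeFiles) {
                // A DFS timeout inside opener would keep the read lock held.
                scanners.add(opener.apply(file));
            }
            return scanners;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Proposed shape: snapshot the file list under the lock, open the files outside it. */
    List<String> getScannerOutsideLock(Function<String, String> opener) {
        List<String> snapshot;
        lock.readLock().lock();
        try {
            snapshot = new ArrayList<>(storeFiles);
        } finally {
            lock.readLock().unlock();
        }
        List<String> scanners = new ArrayList<>();
        for (String file : snapshot) {
            scanners.add(opener.apply(file)); // slow opens no longer hold the lock
        }
        return scanners;
    }

    /** Stand-in for a flush commit: needs the write lock; gives up instead of waiting. */
    boolean tryCommitFlush(String newFile) {
        if (!lock.writeLock().tryLock()) {
            return false; // some reader still holds the read lock
        }
        try {
            storeFiles.add(newFile);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockScopeSketch store = new LockScopeSketch();
        store.getScannerHoldingLock(f -> {
            System.out.println("open under lock, flush commits: " + store.tryCommitFlush("flush-a"));
            return "scanner:" + f;
        });
        store.getScannerOutsideLock(f -> {
            System.out.println("open outside lock, flush commits: " + store.tryCommitFlush("flush-b"));
            return "scanner:" + f;
        });
    }
}
```

In the first call the commit attempt is refused for as long as the opens are in progress; in the second it goes through, because only the cheap snapshot of the file list is taken under the read lock. Note that opening from a snapshot also means a file can disappear (for example through a compaction) between the snapshot and the open, which the sketch does not handle.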
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813757#comment-13813757 ]

Hudson commented on HBASE-8942:
---
FAILURE: Integrated in hbase-0.96-hadoop2 #113 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/113/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538868)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813843#comment-13813843 ]

Hudson commented on HBASE-8942:
---
SUCCESS: Integrated in hbase-0.96 #180 (See [https://builds.apache.org/job/hbase-0.96/180/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538868)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813852#comment-13813852 ]

Hudson commented on HBASE-8942:
---
SUCCESS: Integrated in HBase-TRUNK #4668 (See [https://builds.apache.org/job/HBase-TRUNK/4668/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538867)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814207#comment-13814207 ]

Hudson commented on HBASE-8942:
---
SUCCESS: Integrated in HBase-0.94-security #329 (See [https://builds.apache.org/job/HBase-0.94-security/329/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers; REVERT (stack: rev 1538869)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814230#comment-13814230 ]

Hudson commented on HBASE-8942:
---
FAILURE: Integrated in HBase-0.94 #1195 (See [https://builds.apache.org/job/HBase-0.94/1195/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers; REVERT (stack: rev 1538869)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812669#comment-13812669 ]

Liang Xie commented on HBASE-8942:
---
It seems the diff makes the TestHRegion case a bit unstable. Running

for i in {1..10}; do mvn clean test -P localTests -Dtest=TestHRegion#testParallelAppendWithMemStoreFlush > /tmp/${i}; done

all runs pass on my desktop. But running

for i in {1..10}; do mvn clean test -P localTests -Dtest=TestHRegion > /tmp/${i}; done

3 of 10 runs fail.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812714#comment-13812714 ]

Liang Xie commented on HBASE-8942:
---
The testParallelAppendWithMemStoreFlush case was introduced by HBASE-6210; its failure probably means data will be lost.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813119#comment-13813119 ]

Amitanand Aiyer commented on HBASE-8942:
---
Hey Liang, thanks for pointing the issue out. We will try to port the test and run through it.
Another issue that we have seen recently: this diff exposes some DFS errors during RegionScanner creation if a compaction deletes one of the files while the scanner is being created, before the scanner is registered.
https://issues.apache.org/jira/browse/HBASE-9889 should fix that issue.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813157#comment-13813157 ]

Amitanand Aiyer commented on HBASE-8942:
---
It seems the test case does some append operations. Append is not available on 0.89, so I am unable to port the test back. Will probably just focus on the open source trunk failures.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813330#comment-13813330 ]

Ted Yu commented on HBASE-8942:
---
From the following call stack (trunk), I don't see where readLock is grabbed:
{code}
HStore.getScanner(Scan, NavigableSet<byte[]>, long) line: 1683
HRegion$RegionScannerImpl.<init>(Scan, List<KeyValueScanner>, HRegion) line: 3427
HRegion.instantiateRegionScanner(Scan, List<KeyValueScanner>) line: 1746
HRegion.getScanner(Scan, List<KeyValueScanner>) line: 1738
HRegion.getScanner(Scan) line: 1715
TestHRegionBusyWait(TestHRegion).testWritesWhileScanning() line: 2914
{code}
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813506#comment-13813506 ]

Lars Hofhansl commented on HBASE-8942:
---
The Store's readlock would be acquired inside. I do not think this is the issue.
[~xieliang007], do you still see the issue with this patch reverted?
From inspecting the code and the patch I do not see anything wrong with this.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813516#comment-13813516 ]

Lars Hofhansl commented on HBASE-8942:
---
[~stack], FYI. Might have to revert.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813542#comment-13813542 ]

Liang Xie commented on HBASE-8942:
---
[~lhofhansl], I could not repro the failure after reverting. I am sure the test failure was caused by this jira/diff.
Let's revert it now, [~lhofhansl], [~saint@gmail.com]. I'll dig into it per [~amitanand]'s comment.
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813658#comment-13813658 ]

stack commented on HBASE-8942:
---
Reverted for now from trunk, 0.96, and 0.94. Thanks for figuring out that this was the culprit, lads (though looking at it, it looks good to me -- and a nice fix to have).
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813670#comment-13813670 ]

Lars Hofhansl commented on HBASE-8942:
---
Thanks Stack. You beat me to it :)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813704#comment-13813704 ]

Liang Xie commented on HBASE-8942:
---
Reran this diff combined with HBASE-9889's; the above failure still reproduces. I need to dive into the code in detail to find the root cause now :)
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812292#comment-13812292 ]

Hudson commented on HBASE-8942:
-------------------------------

FAILURE: Integrated in hbase-0.96 #178 (See [https://builds.apache.org/job/hbase-0.96/178/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538318)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

Fix For: 0.89-fb, 0.98.0, 0.96.1
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812319#comment-13812319 ]

Hudson commented on HBASE-8942:
-------------------------------

SUCCESS: Integrated in HBase-TRUNK #4665 (See [https://builds.apache.org/job/HBase-TRUNK/4665/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538317)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

Fix For: 0.89-fb, 0.98.0, 0.96.1
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812340#comment-13812340 ]

Hudson commented on HBASE-8942:
-------------------------------

FAILURE: Integrated in hbase-0.96-hadoop2 #112 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/112/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538318)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

Fix For: 0.89-fb, 0.98.0, 0.96.1
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812355#comment-13812355 ]

Hudson commented on HBASE-8942:
-------------------------------

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #824 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/824/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (stack: rev 1538317)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

Fix For: 0.89-fb, 0.98.0, 0.96.1
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812556#comment-13812556 ]

Lars Hofhansl commented on HBASE-8942:
--------------------------------------

Checked the 0.94 code. Should be safe there as well. Good find.

Fix For: 0.89-fb, 0.98.0, 0.96.1
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812557#comment-13812557 ]

Lars Hofhansl commented on HBASE-8942:
--------------------------------------

Committed to 0.94 as well.

Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812610#comment-13812610 ]

Hudson commented on HBASE-8942:
-------------------------------

FAILURE: Integrated in HBase-0.94-security #328 (See [https://builds.apache.org/job/HBase-0.94-security/328/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (Amitanand Aiyer) (larsh: rev 1538484)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812621#comment-13812621 ]

Hudson commented on HBASE-8942:
-------------------------------

SUCCESS: Integrated in HBase-0.94 #1194 (See [https://builds.apache.org/job/HBase-0.94/1194/])
HBASE-8942 DFS errors during a read operation (get/scan), may cause write outliers (Amitanand Aiyer) (larsh: rev 1538484)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java

Fix For: 0.89-fb, 0.98.0, 0.96.1, 0.94.14
Attachments: 8942.096.txt, HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811131#comment-13811131 ]

Liang Xie commented on HBASE-8942:
----------------------------------

thanks [~amitanand]

Fix For: 0.89-fb
Attachments: HBase-8942.txt
[jira] [Commented] (HBASE-8942) DFS errors during a read operation (get/scan), may cause write outliers
[ https://issues.apache.org/jira/browse/HBASE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13811132#comment-13811132 ]

Liang Xie commented on HBASE-8942:
----------------------------------

[~lhofhansl], would you like to bring it into the 0.94 branch as well? It seems a low-risk improvement :)

Fix For: 0.89-fb
Attachments: HBase-8942.txt