[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990932#comment-16990932 ] Viraj Jasani commented on HBASE-22457: -- {quote} # detect run-away refCount by comparing any reader's refCount with the actual number of open scanners (which we track for each HRegion) if the refCount is larger we know we have a problem. # (a variation) when we attempt to archive an HFile that has refCount, check if there're any open scanners, if not archive anyway. For #1 at least we could enhance the logging and include the number of currently scanners in the log (where we say that we cannot archive an HFile) {quote} For #1, since open scanners are tracked at HRegion(RegionScannerImpl) and not at Store level, we might not be able to compare refCount with open scanners? Also, #2 might also not be true due to open scanners at region level (non-compacted store files) and not at store level? I was thinking if we can also track no of open scanners at store level. I was just going though comments here while looking into HBASE-23349 (refCount 1 preventing archival of compacted store files) > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847806#comment-16847806 ] Andrew Purtell commented on HBASE-22457: [~lhofhansl] do you want to really submit that patch? put this in PA status etc? > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846265#comment-16846265 ] Andrew Purtell commented on HBASE-22457: I opened subtask HBASE-22460 (Reopen a region if store reader references may have leaked) > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846262#comment-16846262 ] Andrew Purtell commented on HBASE-22457: Java offers Closable and AutoClosable interfaces that will trigger some checking behavior in the compiler, but offhand I think the static check is only useful if the resource is acquired and should be closed in the same method block. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846258#comment-16846258 ] Lars Hofhansl commented on HBASE-22457: --- > No, but this is no more or less fast then the close then open we do for > 'alter' processing. It would be implemented the same way, ideally. We came to that conclusion as well in a discussion in the office. Just alter some minor thing on the Table/ColumnDescriptor so that all regions are closed/reopened resulting "Wouldn't it be nice if we had a tool that could do that without forcing us to change something." :) > Scanner wrapping is a key thing. Without it I don't think Phoenix works. Oh totally agree. Though perhaps it is structurally somehow possible to ensure that passed scanner is either wrapped or closed. I can't think of anything, though. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846253#comment-16846253 ] Andrew Purtell commented on HBASE-22457: bq. I do like your idea. Can we do a fast close without flushing the memstore? (Otherwise it might not be "fast" ) No, but this is no more or less fast then the close then open we do for 'alter' processing. It would be implemented the same way, ideally. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846252#comment-16846252 ] Andrew Purtell commented on HBASE-22457: bq. What I'm really looking for is a structural fix where a coprocessor cannot mess things. Perhaps that's not possible without severely limiting what coprocessors are allowed to do. Scanner wrapping is a key thing. Without it I don't think Phoenix works. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846249#comment-16846249 ] Andrew Purtell commented on HBASE-22457: Random patch lgtm One thought: {code} + if (rsh.s != scanner) { +LOG.warn("Scanner name " + scannerName + " does not match the actual scanner used!"); +rsh.s.close(); + } {code} Worth implementing toString on scanner impl classes so we have potentially more than scannerName to go on when debugging this? > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846172#comment-16846172 ] Lars Hofhansl commented on HBASE-22457: --- Some random things that didn't quite look right. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > Attachments: 22457-random-1.5.txt > > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846161#comment-16846161 ] Lars Hofhansl commented on HBASE-22457: --- Oh, it's a hack for sure and, as you point out, might hide the actual problem. I do like your idea. Can we do a fast close without flushing the memstore? (Otherwise it might not be "fast" :)) Another hacks/ideas: # detect run-away refCount by comparing any reader's refCount with the actual number of open scanners (which we track for each HRegion) if the refCount is larger we know we have a problem. # (a variation) when we attempt to archive an HFile that has refCount, check if there're any open scanners, if not archive anyway. For #1 at least we could enhance the logging and include the number of currently scanners in the log (where we say that we cannot archive an HFile) What I'm really looking for is a structural fix where a coprocessor cannot mess things. Perhaps that's not possible without severely limiting what coprocessors are allowed to do. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846109#comment-16846109 ] Andrew Purtell commented on HBASE-22457: [~lhofhansl] bq. Are there any "safepoints"? I.e. points where we know what the reference could should be, so we can reset it? An obvious one is when there are no scanners running at all; in that case we could reset all refCounts to 0. Interesting idea. Debating if it's a bit of a hack. In a refcounting system the number of resource acquisitions should equal the number of releases or else it's a bug. By hiding this class of bugs by behind the scenes twiddling with the counter would we encourage sloppy code? If considering hacks, a decent mitigation for leaks would be a fast close and reopen of the region on the same server (initiated by the RS) will release all resources, like the refcount, leases, etc. The clients should gracefully ride over this like any other region transition. If the refcount is over some ridiculous threshold this mitigation could be triggered along with a fat WARN in the logs. This is orthogonal to hardening though so if we want to pursue the idea I could open a subtask. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846100#comment-16846100 ] Andrew Purtell commented on HBASE-22457: bq. The Phoenix ones should be rare. In our case we are seeing readers with refCount > 1000. This seems to be happening in conjunction with exceptions thrown when clients disconnect during the scan. So one action item here is to review exception handling in all scanner related paths in HBase and Phoenix and ensure references are released at one point in every exception chain. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting
[ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846075#comment-16846075 ] Lars Hofhansl commented on HBASE-22457: --- One idea is: Are there any "safepoints"? I.e. points where we know what the reference could should be, so we can reset it? An obvious one is when there are no scanners running at all; in that case we could reset all refCounts to 0. > Harden the HBase HFile reader reference counting > > > Key: HBASE-22457 > URL: https://issues.apache.org/jira/browse/HBASE-22457 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Priority: Major > > The problem that any coprocessor hook that replaces a passed scanner without > closing it can cause an incorrect reference count. > This was bad and wrong before of course, but now it has pretty bad > consequences, since an incorrect reference could will prevent HFiles from > being archived indefinitely. > All hooks that are passed a scanner and return a scanner are suspect, since > the returned scanner may or may not close the passed scanner: > * preCompact > * preCompactScannerOpen > * preFlush > * preFlushScannerOpen > * preScannerOpen > * preStoreScannerOpen > * preStoreFileReaderOpen...? (not sure about this one, it could mess with the > reader) > I sampled the Phoenix and also Tephra code, and found a few instances where > this is happening. > And for those I filed issued: TEPHRA-300, PHOENIX-5291 > (We're not using Tephra) > The Phoenix ones should be rare. In our case we are seeing readers with > refCount > 1000. > Perhaps there are other issues, a path where not all exceptions are caught > and scanner is left open that way perhaps. (Generally I am not a fan of > reference counting in complex systems - it's too easy to miss something. But > that's a different discussion. :) ). > Let's brainstorm some way in which we can harden this. > [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)