[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459484#comment-15459484 ]

Josh Elser commented on ACCUMULO-4428:

Doing this now.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Adam J Shook
> Priority: Blocker
> Fix For: 1.6.6, 1.7.3
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> The GarbageCollectWriteAheadLogs uses a Map to track when
> it had first seen a dead tserver, waiting an hour before deleting the files.
> However, a new instance of this class is re-created during each run of the
> SimpleGarbageCollector, causing the state of the dead tservers to be lost.
> All of the WAL files belonging to a dead tserver will never be removed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453263#comment-15453263 ]

Michael Wall commented on ACCUMULO-4428:

Yeah, thanks for cleaning this up [~adamjshook]

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Adam J Shook
> Priority: Blocker
> Fix For: 1.7.3
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453258#comment-15453258 ]

Josh Elser commented on ACCUMULO-4428:

Done, thanks again, dude.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Adam J Shook
> Priority: Blocker
> Fix For: 1.7.3
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453169#comment-15453169 ]

Adam J Shook commented on ACCUMULO-4428:

Sure, you can add me to your contributors page. I work at [Datacatessen|https://datacatessen.com/] and am in the Eastern time zone. Thanks!

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Adam J Shook
> Priority: Blocker
> Fix For: 1.7.3
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447082#comment-15447082 ]

Adam J Shook commented on ACCUMULO-4428:

Opened https://github.com/apache/accumulo/pull/143

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Michael Wall
> Priority: Blocker
> Fix For: 1.7.3
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446760#comment-15446760 ]

Josh Elser commented on ACCUMULO-4428:

bq. Happy to work on this and get you a PR -- I need to patch the GC anyway and run it manually to clean up the files. I was thinking of just making the firstSeenDead map static. I don't see a change like that having any adverse side effects.

That would be awesome, [~adamjshook]!

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Michael Wall
> Priority: Blocker
> Fix For: 1.7.3
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446412#comment-15446412 ]

Josh Elser commented on ACCUMULO-4428:

bq. I was thinking of just making the firstSeenDead map static. I don't see a change like that having any adverse side effects.

I had talked to Mike in IRC and we had the same idea. We also thought that it might be better to just make the GarbageCollectWriteAheadLogs instance a singleton inside of SimpleGarbageCollector to avoid unnecessary static state.

bq. ok with me moving this ticket back to 1.8.1 and proceeding with the rc for 1.8.0

Great. I am OK with just dropping the 1.8.x completely and dealing with this in ACCUMULO-4333 instead.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Michael Wall
> Priority: Blocker
> Fix For: 1.7.3, 1.8.0
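
The two options weighed in the comment above (a static map versus one long-lived collector instance held by the GC) can be compared in a minimal Java sketch. All class and method names below are hypothetical stand-ins, not the Accumulo classes, and servers are keyed by plain strings purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CollectorLifetimeSketch {

  /** Option A (hypothetical): the map is static, so every instance shares it. */
  static class StaticStateCollector {
    private static final Map<String, Long> FIRST_SEEN_DEAD = new HashMap<>();

    /** Records the first sighting if absent and returns the recorded time. */
    long firstSeen(String server, long nowMillis) {
      FIRST_SEEN_DEAD.putIfAbsent(server, nowMillis);
      return FIRST_SEEN_DEAD.get(server);
    }
  }

  /** Option B (hypothetical): the map is instance state. */
  static class InstanceStateCollector {
    private final Map<String, Long> firstSeenDead = new HashMap<>();

    long firstSeen(String server, long nowMillis) {
      firstSeenDead.putIfAbsent(server, nowMillis);
      return firstSeenDead.get(server);
    }
  }

  /** Hypothetical stand-in for the GC: it keeps one collector in a field across cycles. */
  static class Gc {
    private final InstanceStateCollector walCollector = new InstanceStateCollector();

    long runCycle(String deadServer, long nowMillis) {
      return walCollector.firstSeen(deadServer, nowMillis);
    }
  }

  public static void main(String[] args) {
    // Option A: state survives even though a new collector is built each cycle.
    System.out.println(new StaticStateCollector().firstSeen("tserver1:9997", 100L));
    System.out.println(new StaticStateCollector().firstSeen("tserver1:9997", 500L));

    // Option B: state survives because the same Gc object reuses its collector.
    Gc gc = new Gc();
    System.out.println(gc.runCycle("tserver1:9997", 100L));
    System.out.println(gc.runCycle("tserver1:9997", 500L));
  }
}
```

In this sketch both options remember the first sighting across cycles. The difference is scope: the static map is shared by everything in the JVM, while the field-held instance keeps its state local to one owner, which is the "avoid unnecessary static state" point made above.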
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446312#comment-15446312 ]

Michael Wall commented on ACCUMULO-4428:

When I merged ACCUMULO-4157 up from 1.7 to 1.8, the code for the GarbageCollectWriteAheadLogs had changed dramatically and I could not reproduce the error in the ticket. So I never implemented the firstSeenDead Map in the 1.8.0 codebase. I did however write a ticket to look at it more in 1.8, https://issues.apache.org/jira/browse/ACCUMULO-4333.

[~elserj] you ok with me moving this ticket back to 1.8.1 and proceeding with the rc for 1.8.0?

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Michael Wall
> Priority: Blocker
> Fix For: 1.7.3, 1.8.0
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446304#comment-15446304 ]

Adam J Shook commented on ACCUMULO-4428:

Yeah, we are seeing this in action. I enabled the trace logs and every period it re-discovers the dead tservers. Not sure how to write any test cases to cover it, but we're hoping to get this into 1.7.3 and will manually clean up the WALs in the meantime.

Happy to work on this and get you a PR -- I need to patch the GC anyway and run it manually to clean up the files. I was thinking of just making the firstSeenDead map static. I don't see a change like that having any adverse side effects.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Assignee: Michael Wall
> Priority: Blocker
> Fix For: 1.7.3, 1.8.0
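
On the open question above of how a test might cover the one-hour wait: one possible approach, sketched here under assumptions and not taken from the Accumulo code or the eventual patch, is to inject the time source so a test can advance a fake clock instead of sleeping. `DeadServerTracker`, `waitElapsed`, and the string server key are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class InjectableClockSketch {

  /** Hypothetical tracker: remembers when each dead server was first seen. */
  static class DeadServerTracker {
    private final Map<String, Long> firstSeenDead = new HashMap<>();
    private final LongSupplier clock; // injected time source, in milliseconds
    private final long waitMillis;

    DeadServerTracker(LongSupplier clock, long waitMillis) {
      this.clock = clock;
      this.waitMillis = waitMillis;
    }

    /** True once the server has been known dead for at least waitMillis. */
    boolean waitElapsed(String server) {
      long now = clock.getAsLong();
      long first = firstSeenDead.computeIfAbsent(server, s -> now);
      return now - first >= waitMillis;
    }
  }

  /** Drives one tracker with a fake clock; returns the answer at 0 ms, just under an hour, and one hour. */
  static boolean[] simulate() {
    AtomicLong fakeNow = new AtomicLong(0L);
    DeadServerTracker tracker = new DeadServerTracker(fakeNow::get, 3_600_000L);

    boolean atStart = tracker.waitElapsed("tserver1:9997");
    fakeNow.addAndGet(3_599_999L);
    boolean justBefore = tracker.waitElapsed("tserver1:9997");
    fakeNow.addAndGet(1L);
    boolean atOneHour = tracker.waitElapsed("tserver1:9997");

    return new boolean[] {atStart, justBefore, atOneHour};
  }

  public static void main(String[] args) {
    boolean[] r = simulate();
    System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "false false true"
  }
}
```

A production caller would pass `System::currentTimeMillis` as the clock; only the test substitutes the fake one.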
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446240#comment-15446240 ]

Michael Wall commented on ACCUMULO-4428:

Yeah, I see it now. Thanks for the report [~adamjshook]

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Priority: Blocker
> Fix For: 1.7.3, 1.8.1
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446234#comment-15446234 ]

Josh Elser commented on ACCUMULO-4428:

He means each cycle the GC runs (its "period"):

{code}
    Span waLogs = Trace.start("walogs");
    try {
      GarbageCollectWriteAheadLogs walogCollector = new GarbageCollectWriteAheadLogs(this, fs, isUsingTrash());
      log.info("Beginning garbage collection of write-ahead logs");
      walogCollector.collect(status);
    } catch (Exception e) {
      log.error("{}", e.getMessage(), e);
    } finally {
      waLogs.stop();
    }
    gcSpan.stop();
{code}

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Priority: Blocker
> Fix For: 1.7.3, 1.8.1
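
To illustrate why constructing the collector inside each cycle defeats the wait, here is a small self-contained Java simulation. It is a sketch under assumptions, not the Accumulo implementation: `WalCollector`, `canRemoveLogs`, the ten-minute cycle spacing, and the string server key are all hypothetical. Only the one-hour wait and the per-cycle `new` come from the ticket.

```java
import java.util.HashMap;
import java.util.Map;

public class DeadServerTrackingSketch {

  /** Hypothetical stand-in for the WAL collector; not the Accumulo class. */
  static class WalCollector {
    private final Map<String, Long> firstSeenDead = new HashMap<>();
    private final long waitMillis;

    WalCollector(long waitMillis) {
      this.waitMillis = waitMillis;
    }

    /** Returns true once the server has been seen dead for at least waitMillis. */
    boolean canRemoveLogs(String server, long nowMillis) {
      // putIfAbsent returns the previously recorded time, or null on first sighting.
      Long first = firstSeenDead.putIfAbsent(server, nowMillis);
      return first != null && nowMillis - first >= waitMillis;
    }
  }

  /**
   * Simulates GC cycles ten minutes apart and returns the index of the first
   * cycle in which removal is allowed, or -1 if it never is.
   */
  static int firstRemovalCycle(boolean newInstancePerCycle, int cycles) {
    long hour = 60 * 60 * 1000L;
    long step = 10 * 60 * 1000L;
    WalCollector shared = new WalCollector(hour);
    for (int i = 0; i < cycles; i++) {
      WalCollector collector = newInstancePerCycle ? new WalCollector(hour) : shared;
      if (collector.canRemoveLogs("tserver1:9997", i * step)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println("fresh instance per cycle: " + firstRemovalCycle(true, 20));  // -1
    System.out.println("one long-lived instance:  " + firstRemovalCycle(false, 20)); // 6
  }
}
```

With a fresh instance every cycle, each run sees the dead server for the "first" time, so the wait never elapses; with a long-lived instance the seventh cycle (index 6, sixty minutes after the first sighting) permits removal.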
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446221#comment-15446221 ]

Michael Wall commented on ACCUMULO-4428:

[~adamjshook] when you say "each run of the SimpleGarbageCollector", do you mean stops and restarts of the GC process? I thought I tested it by starting and stopping the GC process, and it added the dead tserver again.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Priority: Blocker
> Fix For: 1.7.3, 1.8.1
[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers
[ https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446198#comment-15446198 ]

Josh Elser commented on ACCUMULO-4428:

FYI [~mjwall], this sounds like the changes you introduced in ACCUMULO-4157.

[~adamjshook], thanks for filing this! Are we to assume that you have also seen this happening in a real cluster? I'm looking back at the changes now to refresh myself.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: Adam J Shook
> Fix For: 1.7.3, 1.8.1