[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-09-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459484#comment-15459484
 ] 

Josh Elser commented on ACCUMULO-4428:
--

Doing this now.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Adam J Shook
>Priority: Blocker
> Fix For: 1.6.6, 1.7.3
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> GarbageCollectWriteAheadLogs uses a Map to track when it first saw each 
> dead tserver, and waits an hour after that before deleting the tserver's WAL files.  
> However, a new instance of this class is created on each run of the 
> SimpleGarbageCollector, so the first-seen state for dead tservers is lost every cycle.  
> As a result, the WAL files belonging to a dead tserver are never removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-31 Thread Michael Wall (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453263#comment-15453263
 ] 

Michael Wall commented on ACCUMULO-4428:


Yeah, thanks for cleaning this up [~adamjshook]

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Adam J Shook
>Priority: Blocker
> Fix For: 1.7.3
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453258#comment-15453258
 ] 

Josh Elser commented on ACCUMULO-4428:
--

Done, thanks again, dude.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Adam J Shook
>Priority: Blocker
> Fix For: 1.7.3
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-31 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453169#comment-15453169
 ] 

Adam J Shook commented on ACCUMULO-4428:


Sure, you can add me to your contributors page.  I work at 
[Datacatessen|https://datacatessen.com/] and am in the Eastern time zone.  Thanks!

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Adam J Shook
>Priority: Blocker
> Fix For: 1.7.3
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447082#comment-15447082
 ] 

Adam J Shook commented on ACCUMULO-4428:


Opened https://github.com/apache/accumulo/pull/143

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Michael Wall
>Priority: Blocker
> Fix For: 1.7.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446760#comment-15446760
 ] 

Josh Elser commented on ACCUMULO-4428:
--

bq. Happy to work on this and get you a PR – I need to patch the GC anyway and 
run it manually to clean up the files. I was thinking of just making the 
firstSeenDead map static. I don't see a change like that having any adverse 
side effects.

That would be awesome, [~adamjshook]!

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Michael Wall
>Priority: Blocker
> Fix For: 1.7.3
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446412#comment-15446412
 ] 

Josh Elser commented on ACCUMULO-4428:
--

bq. I was thinking of just making the firstSeenDead map static. I don't see a 
change like that having any adverse side effects.

I had talked to Mike in IRC and we had the same idea. We also thought that it 
might be better to just make the GarbageCollectWriteAheadLogs instance a 
singleton inside of SimpleGarbageCollector to avoid unnecessary static state.
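
A minimal sketch of the "one collector instance per GC" idea, assuming a simplified model of the classes involved. `PerGcWalCollector` and `GcWithSingleCollector` are hypothetical stand-ins for illustration only, not the Accumulo classes, and the one-hour wait is taken from the ticket description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for GarbageCollectWriteAheadLogs (not the real class).
class PerGcWalCollector {
  static final long WAIT_MS = 60L * 60L * 1000L; // one hour, per the ticket description

  // Plain instance state: it survives only as long as this object does.
  private final Map<String, Long> firstSeenDead = new HashMap<>();

  // Records the first sighting of a dead server and reports whether
  // the wait has elapsed, i.e. whether its WALs may be deleted now.
  boolean mayDelete(String deadServer, long nowMs) {
    Long firstSeen = firstSeenDead.putIfAbsent(deadServer, nowMs);
    return firstSeen != null && nowMs - firstSeen >= WAIT_MS;
  }
}

// Hypothetical stand-in for SimpleGarbageCollector (not the real class).
class GcWithSingleCollector {
  // Created lazily once and then reused by every cycle, so no static state is needed.
  private PerGcWalCollector walogCollector;

  private PerGcWalCollector getWalogCollector() {
    if (walogCollector == null) {
      walogCollector = new PerGcWalCollector();
    }
    return walogCollector;
  }

  // One GC cycle: delegates to the long-lived collector.
  boolean runCycle(String deadServer, long nowMs) {
    return getWalogCollector().mayDelete(deadServer, nowMs);
  }

  public static void main(String[] args) {
    GcWithSingleCollector gc = new GcWithSingleCollector();
    System.out.println(gc.runCycle("tserver1:9997", 0L));
    System.out.println(gc.runCycle("tserver1:9997", PerGcWalCollector.WAIT_MS));
  }
}
```

In this model the first-seen state is scoped to one GC object, so two GC objects in the same JVM (for example in unit tests) do not share it.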

bq. ok with me moving this ticket back to 1.8.1 and proceeding with the rc for 
1.8.0

Great. I am OK with just dropping the 1.8.x completely and dealing with this in 
ACCUMULO-4333 instead.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Michael Wall
>Priority: Blocker
> Fix For: 1.7.3, 1.8.0
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Michael Wall (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446312#comment-15446312
 ] 

Michael Wall commented on ACCUMULO-4428:


When I merged ACCUMULO-4157 up from 1.7 to 1.8, the code for 
GarbageCollectWriteAheadLogs had changed dramatically and I could not reproduce 
the error in the ticket.  So I never implemented the firstSeenDead Map in the 
1.8.0 codebase.  I did, however, write a ticket to look at it more in 1.8: 
https://issues.apache.org/jira/browse/ACCUMULO-4333.

[~elserj] you ok with me moving this ticket back to 1.8.1 and proceeding with 
the rc for 1.8.0?

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Michael Wall
>Priority: Blocker
> Fix For: 1.7.3, 1.8.0
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Adam J Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446304#comment-15446304
 ] 

Adam J Shook commented on ACCUMULO-4428:


Yeah, we are seeing this in action.  I enabled the trace logs and every period 
it re-discovers the dead tservers.  Not sure how to write any test cases to 
cover it, but we're hoping to get this into 1.7.3 and will manually clean up 
the WALs in the meantime.  Happy to work on this and get you a PR -- I need to 
patch the GC anyway and run it manually to clean up the files.  I was thinking 
of just making the firstSeenDead map static.  I don't see a change like that 
having any adverse side effects.
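
A rough sketch of the static-map idea, under simplifying assumptions. `StaticStateCollectorSketch` is a hypothetical stand-in, not the Accumulo class; the one-hour wait comes from the ticket description, and the use of `ConcurrentHashMap` is an assumption made here for safety, not something stated in the ticket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for GarbageCollectWriteAheadLogs (not the real class).
class StaticStateCollectorSketch {
  static final long WAIT_MS = 60L * 60L * 1000L; // one hour, per the ticket description

  // static: shared by every instance in the JVM, so it outlives any
  // single collector object that is created and thrown away per cycle.
  private static final Map<String, Long> firstSeenDead = new ConcurrentHashMap<>();

  // Records the first sighting of a dead server and reports whether
  // the wait has elapsed, i.e. whether its WALs may be deleted now.
  boolean mayDelete(String deadServer, long nowMs) {
    Long firstSeen = firstSeenDead.putIfAbsent(deadServer, nowMs);
    return firstSeen != null && nowMs - firstSeen >= WAIT_MS;
  }

  public static void main(String[] args) {
    // A new instance per cycle, as the GC does today; the static map keeps the state.
    System.out.println(new StaticStateCollectorSketch().mayDelete("tserver1:9997", 0L));
    System.out.println(new StaticStateCollectorSketch().mayDelete("tserver1:9997", WAIT_MS));
  }
}
```

One trade-off of this approach is that static state is shared by everything in the same JVM, so tests that construct several collectors would see each other's entries.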

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Assignee: Michael Wall
>Priority: Blocker
> Fix For: 1.7.3, 1.8.0
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Michael Wall (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446240#comment-15446240
 ] 

Michael Wall commented on ACCUMULO-4428:


Yeah, I see it now.  Thanks for the report [~adamjshook]

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Priority: Blocker
> Fix For: 1.7.3, 1.8.1
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446234#comment-15446234
 ] 

Josh Elser commented on ACCUMULO-4428:
--

He means each cycle the GC runs (its "period").

{code}
Span waLogs = Trace.start("walogs");
try {
  GarbageCollectWriteAheadLogs walogCollector =
      new GarbageCollectWriteAheadLogs(this, fs, isUsingTrash());
  log.info("Beginning garbage collection of write-ahead logs");
  walogCollector.collect(status);
} catch (Exception e) {
  log.error("{}", e.getMessage(), e);
} finally {
  waLogs.stop();
}
gcSpan.stop();
{code}
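
To make the effect concrete, here is a small self-contained simulation. It is only a sketch under assumptions: `WalCollectorSketch` and `GcCycleDemo` are hypothetical names, the collector is reduced to a first-seen map with a one-hour wait (as in the ticket description), and the five-minute cycle period is an arbitrary value chosen for the demo.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for GarbageCollectWriteAheadLogs (not the real class).
class WalCollectorSketch {
  static final long WAIT_MS = 60L * 60L * 1000L; // one hour, per the ticket description

  // Instance state: discarded together with the collector object.
  private final Map<String, Long> firstSeenDead = new HashMap<>();

  // Returns true if the dead server's WALs may be deleted at time nowMs.
  boolean collect(String deadServer, long nowMs) {
    Long firstSeen = firstSeenDead.putIfAbsent(deadServer, nowMs);
    return firstSeen != null && nowMs - firstSeen >= WAIT_MS;
  }
}

class GcCycleDemo {
  // Mirrors the snippet above: a new collector is built on every cycle,
  // so every cycle looks like the first sighting of the dead server.
  static boolean everDeletedWithFreshInstances(int cycles, long periodMs) {
    for (int i = 0; i < cycles; i++) {
      WalCollectorSketch collector = new WalCollectorSketch();
      if (collector.collect("tserver1:9997", i * periodMs)) {
        return true;
      }
    }
    return false;
  }

  // Same loop, but one collector is kept across cycles.
  static boolean everDeletedWithSharedInstance(int cycles, long periodMs) {
    WalCollectorSketch collector = new WalCollectorSketch();
    for (int i = 0; i < cycles; i++) {
      if (collector.collect("tserver1:9997", i * periodMs)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long fiveMinutes = 5L * 60L * 1000L; // arbitrary demo period
    System.out.println(everDeletedWithFreshInstances(24, fiveMinutes)); // prints false
    System.out.println(everDeletedWithSharedInstance(24, fiveMinutes)); // prints true
  }
}
```

With a fresh collector per cycle the wait can never elapse, no matter how many cycles run; with a reused collector the deletion becomes possible once an hour of simulated time has passed.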

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Priority: Blocker
> Fix For: 1.7.3, 1.8.1
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Michael Wall (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446221#comment-15446221
 ] 

Michael Wall commented on ACCUMULO-4428:


[~adamjshook] when you say "each run of the SimpleGarbageCollector", do you 
mean stops and restarts of the GC process?  I thought I had tested starting and 
stopping the GC process, and it added the dead tserver again.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
>Priority: Blocker
> Fix For: 1.7.3, 1.8.1
>


[jira] [Commented] (ACCUMULO-4428) GC does not delete WAL files belonging to dead tservers

2016-08-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446198#comment-15446198
 ] 

Josh Elser commented on ACCUMULO-4428:
--

FYI [~mjwall], this sounds like the changes you introduced in ACCUMULO-4157.

[~adamjshook], thanks for filing this! Are we to assume that you have also seen 
this happening in a real cluster? I'm looking back at the changes now to 
refresh myself.

> GC does not delete WAL files belonging to dead tservers
> ---
>
> Key: ACCUMULO-4428
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4428
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.7.2
>Reporter: Adam J Shook
> Fix For: 1.7.3, 1.8.1
>