[jira] [Commented] (HBASE-22078) corrupted procs in proc WAL

2019-03-28 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804306#comment-16804306
 ] 

Sergey Shelukhin commented on HBASE-22078:
--

No, the logs from that far back were not available.

> corrupted procs in proc WAL
> ---
>
> Key: HBASE-22078
> URL: https://issues.apache.org/jira/browse/HBASE-22078
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Priority: Major
>
> Not sure what the root cause is... there are ~500 proc WAL files. (I actually
> wonder if cleanup is also blocked by this, since I see these lines on master
> restart; do WALs with abandoned procedures like that get deleted?)
> {noformat}
> 2019-03-20 07:37:53,212 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7571, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7600, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7610, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7631, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7650, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7651, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7657, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> 2019-03-20 07:37:53,213 ERROR [master/...:17000:becomeActiveMaster] 
> wal.WALProcedureTree: Missing stack id 7683, max stack id is 7754, root 
> procedure is Procedure(pid=66829, ppid=-1, 
> class=org.apache.hadoop.hbase.master.procedure.DisableTableProcedure)
> {noformat}
> Followed by 
> {noformat}
> 2019-03-20 07:37:53,751 ERROR [master/...:17000:becomeActiveMaster] 
> procedure2.ProcedureExecutor: Corrupt pid=66829, 
> state=WAITING:DISABLE_TABLE_ADD_REPLICATION_BARRIER, hasLock=false; 
> DisableTableProcedure table=...
> {noformat}
> And the same for 1000s of child and grandchild procedures of this procedure.
> I think this area needs general review... we should have a record for the
> procedure durably persisted before we create any child procedures, so I'm not
> sure how this could happen. Actually, I also wonder why we even have a separate
> proc WAL when HBase already has a working WAL that's more or less time
> tested...
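
The "Missing stack id" errors quoted above read like the output of a contiguity check: every stack id up to the maximum seen for a root procedure is expected to be present, and each gap is reported. A minimal sketch of that kind of check follows. It is not the actual WALProcedureTree code; the class and method names are hypothetical, and it assumes stack ids start at 0 and should be contiguous up to the maximum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not taken from HBase. */
final class StackIdGapCheck {
  private StackIdGapCheck() {}

  /**
   * Returns every stack id in [0, maxStackId] that is absent from {@code seen},
   * in ascending order. Assumes ids are expected to be contiguous from 0.
   */
  static List<Integer> findMissing(Set<Integer> seen, int maxStackId) {
    List<Integer> missing = new ArrayList<>();
    for (int id = 0; id <= maxStackId; id++) {
      if (!seen.contains(id)) {
        missing.add(id);
      }
    }
    return missing;
  }
}
```

Under these assumptions, a non-empty result for a root procedure would correspond to that procedure tree being treated as corrupt.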



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22078) corrupted procs in proc WAL

2019-03-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802929#comment-16802929
 ] 

stack commented on HBASE-22078:
---

bq. there are ~500 proc wal files.

This is a problem. The cause has probably rolled away? Can you see where it 
went bad?






[jira] [Commented] (HBASE-22078) corrupted procs in proc WAL

2019-03-21 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798448#comment-16798448
 ] 

Sean Busbey commented on HBASE-22078:
-

To map this onto the pre-existing WAL subsystem, we'd make up some Key-Value
structure to represent a procedure and then treat procedure completion as
"flushed and safe to discard"?
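
A rough sketch of what that mapping might look like, purely as an illustration of the idea and not as existing HBase code: the key is the procedure id, the value is its serialized state, and a log segment becomes discardable once every procedure with a record in it has completed. The class name, the method names, and the use of a String for the state are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names exist in HBase. */
final class ProcedureKvLogSketch {
  /** Latest state per procedure: key = pid, value = serialized state. */
  private final Map<Long, String> latestState = new HashMap<>();
  /** Procedure ids that have at least one record in each log segment. */
  private final TreeMap<Integer, Set<Long>> pidsBySegment = new TreeMap<>();
  /** Procedures that have finished, the analogue of "flushed". */
  private final Set<Long> completed = new HashSet<>();

  /** Records a state update for a procedure in the given log segment. */
  void append(int segment, long pid, String state) {
    latestState.put(pid, state);
    pidsBySegment.computeIfAbsent(segment, s -> new HashSet<>()).add(pid);
  }

  /** Marks a procedure as done; its records are no longer needed for recovery. */
  void complete(long pid) {
    completed.add(pid);
    latestState.remove(pid);
  }

  /** Segments whose records all belong to completed procedures, in ascending order. */
  List<Integer> discardableSegments() {
    List<Integer> result = new ArrayList<>();
    for (Map.Entry<Integer, Set<Long>> entry : pidsBySegment.entrySet()) {
      if (completed.containsAll(entry.getValue())) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```

In this model, a procedure that never completes keeps every segment it has written to from being discarded, which is one way a large number of log files could pile up.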



