Hey Adam,

Thanks for sharing this one.

Adam J. Shook wrote:
Hello folks,

One of our clusters has been throwing a handful of replication errors
from the status maker -- see below.  The WAL files in question do not
belong to an active tserver -- some investigation in the code shows that
the createdTime could not be written and these WALs will sit here until
a created time is added.

Does that mean you saw an exception when the mutation carrying the createdTime was written to accumulo.metadata and that write failed? Or is the reason that WAL never got this 'attribute' still unknown?

I think the kind of fix to make depends on the cause here. If this is just a bug, a standalone tool to fix this case would be good. However, if there's an inherent issue where this case can happen and we can't guarantee the record was written (e.g. a server failure), it might be best to add some process to the master/gc that eventually adds one (e.g. if we see the WAL has been hanging out in this state, add a createdTime after ~12hrs).
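
To make the second option a bit more concrete, here is a rough sketch of the check such a periodic pass might run. Everything in it is hypothetical: the class name, the method, and the idea of tracking a "first seen" timestamp are illustrations, not existing Accumulo code. It also assumes we would only backfill for WALs that are already closed, which matches the status record in the report but may not be the right rule in general.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: none of these names come from the Accumulo code base.
public final class CreatedTimeBackfill {

    // Grace period before a WAL that lacks a createdTime gets one assigned (~12hrs).
    public static final Duration GRACE = Duration.ofHours(12);

    /**
     * Decide whether a periodic master/gc pass should backfill a createdTime.
     *
     * @param hasCreatedTime whether the status record already carries a createdTime
     * @param closed         whether the WAL is closed (assumption: only closed WALs qualify)
     * @param firstSeen      when the pass first saw this record without a createdTime
     * @param now            the current time
     */
    public static boolean shouldBackfill(boolean hasCreatedTime, boolean closed,
                                         Instant firstSeen, Instant now) {
        if (hasCreatedTime || !closed) {
            return false; // nothing to repair, or the WAL may still be written to
        }
        // Backfill only once the record has been stuck for the full grace period.
        return Duration.between(firstSeen, now).compareTo(GRACE) >= 0;
    }

    public static void main(String[] args) {
        // Arbitrary timestamp, used only to exercise the check.
        Instant firstSeen = Instant.parse("2017-01-01T00:00:00Z");
        System.out.println(shouldBackfill(false, true, firstSeen, firstSeen.plus(Duration.ofHours(1))));
        System.out.println(shouldBackfill(false, true, firstSeen, firstSeen.plus(Duration.ofHours(13))));
    }
}
```

The first call is still inside the grace period and the second is past it. What value to actually write as the createdTime (the WAL file's modification time, the time of the pass, something else) is a separate question this sketch doesn't try to answer.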

I wanted to bring some attention to this -- I think my immediate course
of action here is to manually add a createdTime so the files will be
replicated, then address this within the Accumulo source code itself.
Thoughts?

Status record ([begin: 0 end: 0 infiniteEnd: true closed:true]) for
hdfs://foo:8020/accumulo/wal/blah/blah in table k was written to
metadata table which lacked createtime

Thank you,
--Adam
