OK, that helps a bit. A few things:
> "Could not create ServerSocket.." error as it can't connect to the
> tserver.
Note that this is a server socket. The error means that the server (master
or tabletserver) failed to bind the socket it was going to use for its
Thrift server. Without that socket, Accumulo will not work, because the
processes can't communicate with each other or with clients. The error
message should make it fairly obvious why the exception was thrown.
Hopefully, the process killed itself too.
> Hadoop 1.2.1
Hadoop 1 doesn't have the best track record when it comes to ensuring that
a file is actually written to disk when we ask it to be (a big part
of the reason we suggest moving to Hadoop 2 when you can). A hard
power-off can result in bad Accumulo files in HDFS.
You can try adding dfs.datanode.synconclose=true to your hdfs-site.xml,
which might help protect against this, but I'm not sure how errors are
handled when the local disk actually runs out of space. HDFS's
reserved-space configuration can help remove this worry by making HDFS
refuse writes as it nears full, before the underlying file system
actually fills up.
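As a rough sketch, the two settings might look like this in hdfs-site.xml. dfs.datanode.synconclose is the property named above. dfs.datanode.du.reserved is an assumption about which "reserved space" setting is meant, and the 1 GiB value is an arbitrary illustration, not a recommendation.

```xml
<!-- hdfs-site.xml (fragment): place inside the existing <configuration> element -->
<property>
  <!-- Ask the DataNode to sync block files to disk when they are closed -->
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
<property>
  <!-- Assumed reserved-space setting: bytes per volume left free for non-HDFS use -->
  <!-- 1073741824 bytes = 1 GiB, chosen only as an example -->
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>
</property>
```

The DataNode needs a restart to pick up changes to hdfs-site.xml.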
> I deleted the wal logs, hoping that it would revert to what was in
> /accumulo/tables
Deleting the WALs also isn't doing what you expect it to :). The WALs,
especially for the metadata table, are extremely important and are
needed to ensure that data is not lost (if WALs for the metadata table
are lost, the table might be in an inconsistent state that Accumulo
can't automatically recover from).
This is probably why your tables are not coming online.
Recovering your existing instance might not be worth the hassle. It's
likely easier to just move the RFiles in HDFS out of the way, and then
reimport them into a reinitialized Accumulo.
An outline of how to do this can be found at
http://accumulo.apache.org/1.6/accumulo_user_manual.html#_hdfs_failure
under the *Q* "The metadata (or root) table has references to a corrupt
WAL". If you need some more guidance than what is listed there, please
feel free to ask!
Kina Winoto wrote:
Hi Josh,
> Versions of Hadoop and Accumulo:
Hadoop 1.2.1
Accumulo 1.6.1
> Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
Nope.. I tried to scan the tables -- it just hangs
> Have you checked the logs of the Master and/or TabletServer for any
> exceptions?
The master log is locked for read operation (an info message). I tried
to shutdown the master with accumulo admin -f stopMaster, but it's still
unhappy.
The tserver log doesn't have any exceptions. However, if I run accumulo
tserver -a localhost, then I'll get a "Could not create ServerSocket.."
error as it can't connect to the tserver.
For more context, I ran into all of this because I'm running this on a
vm and I ran out of disk space so Accumulo could no longer write to the
wal reliably and then checksums weren't matching up. After I created
more space on my vm, I deleted the wal logs, hoping that it would revert
to what was in /accumulo/tables, but then ran into this error where I
have zero tablets.
Thanks for any suggestions on what to do next!
- Kina
On Tue, Feb 24, 2015 at 11:13 AM, Josh Elser wrote:
Hi Kina,
Can you share some more information?
* Versions of Hadoop and Accumulo
* Are the accumulo.metadata/!METADATA and/or accumulo.root tables
online?
* Have you checked the logs of the Master and/or TabletServer for
any exceptions?
- Josh
Kina Winoto wrote:
Hi,
I'm running a local instance of accumulo with just one tablet server. I
got into a rut and now I don't have any tablets. There is data still in
hdfs but I assume the data is corrupted so the tablets aren't being
assigned to the tablet server. Is there a way I can force a tablet to be
assigned? I don't mind giving up a portion of my data (or all of it) at
this point. I'd just rather not have to reinitialize accumulo and
recreate all the users and set up all my tables again. Maybe I can force
a tablet assignment and then delete the tables that are corrupted?
I've encountered a similar issue on a many-node cluster and would like
to know if my only option is to reinitialize accumulo.
Thanks!
- Kina