Thanks Josh. That was helpful; yes, a migration to Hadoop 2 is in our future!

In the end, I decided to start a new instance, as you ended up suggesting, and bulk import the old data. Thanks for the help!
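For anyone who finds this thread later, the move-and-reimport recovery boiled down to roughly the following. This is a sketch, not a transcript: the table name, table id, and paths below are illustrative, not the exact ones from my instance.

    # move the old RFiles out from under Accumulo before reinitializing
    hadoop fs -mkdir /recovered /recovered-failures
    hadoop fs -mv '/accumulo/tables/2/*/*.rf' /recovered

    # reinitialize Accumulo (this wipes the old instance's metadata,
    # so users and table configuration have to be recreated by hand)
    accumulo init

Then, in the Accumulo shell, recreate the table and bulk import the files:

    createtable mytable
    table mytable
    importdirectory /recovered /recovered-failures true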
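And for the disk-full root cause Josh points at below: the synconclose setting goes in hdfs-site.xml, and the reserved-space knob he mentions is presumably dfs.datanode.du.reserved. A minimal sketch (the one-GB value is just an example):

    <property>
      <name>dfs.datanode.synconclose</name>
      <value>true</value>
    </property>
    <property>
      <!-- reserve ~1 GB per volume so HDFS stops accepting writes
           before the underlying disk is actually full -->
      <name>dfs.datanode.du.reserved</name>
      <value>1073741824</value>
    </property>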
On Tue, Feb 24, 2015 at 1:33 PM, Josh Elser <[email protected]> wrote:

> Ok, that helps a bit. A few things:
>
>> "Could not create ServerSocket.." error as it can't connect to the tserver.
>
> Note that this is a Server socket. This means that the server (master or tabletserver) failed to bind the socket it was going to use for the Thrift server, which means Accumulo will not work, as the processes can't communicate with each other or with clients. The error message should make it fairly obvious why the exception was thrown. Hopefully, the process killed itself too.
>
>> Hadoop 1.2.1
>
> Hadoop 1 doesn't have the best track record when it comes to ensuring that a file is actually written to disk when we request it to be (a big part of the reason we suggest moving to Hadoop 2 when you can). A hard poweroff can result in bad Accumulo files in HDFS.
>
> You can try adding dfs.datanode.synconclose=true to your hdfs-site.xml, which might help protect against this, but I'm not sure of the error handling when the local disk actually runs out of space. HDFS' reserved-space configuration can help remove this worry by preventing writes when HDFS is nearing full, instead of the actual file system.
>
>> I deleted the wal logs, hoping that it would revert to what was in /accumulo/tables
>
> Deleting the WALs also isn't doing what you expect it to :). The WALs, especially for the metadata table, are extremely important and are needed to ensure that data is not lost (if WALs for the metadata table are lost, the table might be in an inconsistent state that Accumulo can't automatically recover from).
>
> This is probably why your tables are not coming online.
>
> Recovering your existing instance might not be worth the hassle. It's likely easier to just move the RFiles in HDFS out of the way and then reimport them into a reinitialized Accumulo.
>
> An outline of how to do this can be found at http://accumulo.apache.org/1.6/accumulo_user_manual.html#_hdfs_failure under the *Q* "The metadata (or root) table has references to a corrupt WAL". If you need some more guidance than what is listed there, please feel free to ask!
>
> Kina Winoto wrote:
>
>> Hi Josh,
>>
>>> Versions of Hadoop and Accumulo:
>> Hadoop 1.2.1
>> Accumulo 1.6.1
>>
>>> Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
>> Nope, I tried to scan the tables -- it just hangs.
>>
>>> Have you checked the logs of the Master and/or TabletServer for any exceptions?
>> The master log is locked for a read operation (an info message). I tried to shut down the master with accumulo admin -f stopMaster, but it's still unhappy.
>> The tserver log doesn't have any exceptions. However, if I run accumulo tserver -a localhost, then I'll get a "Could not create ServerSocket.." error as it can't connect to the tserver.
>>
>> For more context, I ran into all of this because I'm running on a VM and I ran out of disk space, so Accumulo could no longer write to the WAL reliably and then checksums weren't matching up. After I created more space on my VM, I deleted the WAL logs, hoping that it would revert to what was in /accumulo/tables, but then ran into this error where I have zero tablets.
>>
>> Thanks for any suggestions on what to do next!
>>
>> - Kina
>>
>> On Tue, Feb 24, 2015 at 11:13 AM, Josh Elser <[email protected]> wrote:
>>
>>> Hi Kina,
>>>
>>> Can you share some more information?
>>>
>>> * Versions of Hadoop and Accumulo
>>> * Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
>>> * Have you checked the logs of the Master and/or TabletServer for any exceptions?
>>>
>>> - Josh
>>>
>>> Kina Winoto wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm running a local instance of Accumulo with just one tablet server. I got into a rut and now I don't have any tablets. There is data still in HDFS, but I assume the data is corrupted, so the tablets aren't being assigned to the tablet server. Is there a way I can force a tablet to be assigned? I don't mind giving up a portion of my data (or all of it) at this point; I'd just rather not have to reinitialize Accumulo, recreate all the users, and set up all my tables again. Maybe I can force a tablet assignment and then delete the tables that are corrupted?
>>>>
>>>> I've encountered a similar issue on a many-node cluster and would like to know if my only option is to reinitialize Accumulo.
>>>>
>>>> Thanks!
>>>>
>>>> - Kina
>>>>
>>>> — Sent from Mailbox <https://www.dropbox.com/mailbox>
