It begins by showing various txid lines. Then the errors begin:

Thread "org.apache.accumulo.server.fate.Admin" died null

It then goes into a series of errors. At the bottom of each one, they all stem from a deserialize error in a section starting with a "caused by:java.io.EOFException" line.
I will attempt to get a printout of the errors to hand type in if need be.

On Mon, Nov 4, 2013 at 1:51 PM, Eric Newton <[email protected]> wrote:
> These symptoms would appear to be caused by problems with table
> operations, which are heavily dependent on the master being able to
> use data in zookeeper.
>
> So, try to find the first errors, especially those related to
> serialization or deserialization closest to when the master first
> started.
>
> What do you get when you run:
>
> $ ./bin/accumulo org.apache.accumulo.server.fate.Admin print
>
> ?
>
> -Eric
>
>
> On Fri, Nov 1, 2013 at 4:17 PM, Dave Mullins <[email protected]> wrote:
> > Hadoop version 0.20.2-cdh3u5
> > This was installed from the cdh rpms but is not controlled by a
> > cloudera manager.
> >
> > I read what documentation I could find on the upgrade.
> > I installed from the tarball version of 1.5.0.
> > I made sure to include the commons collection in the accumulo
> > library path.
> > I made sure to add the dfs.support.append true to the hdfs-site files.
> > I did a complete restart (including a reboot) of the system.
> >
> > All of the tablet servers come online.
> > All the master's services come online and seem to be working. (The
> > monitor does show the correct number of tablets, tablet servers, and
> > so forth.)
> >
> > I am able to use some of the features of the accumulo shell.
> > I can display the contents of a table.
> > I can't create or delete a table without getting the following error:
> > [impl.ThriftTransportPool] WARN: Thread "shell" stuck on io to
> > x.x.x.x:9999:9999 (0) for at least 120040 ms
> >
> > When I go digging in the logs I find very few errors. (These systems
> > are not on a net I can cut and paste from, so I am trying to
> > represent the issue as best I can.)
> >
> > There are 4 errors saying that the Repo runner [0-3] threads died.
> >
> > Another error that springs up occasionally is: WARN: Thread "GC"
> > stuck on io to x.x.x.x:9999:9999 (0) for at least 120040 ms
> >
> > A netstat run before I start the master up shows nothing running on
> > port 9999 nor any connections to that port.
> > A netstat run after accumulo starts shows about 16 connections in a
> > TIME_WAIT state in the 35k-36k port range from the master. It also
> > shows an established state for 1 in both directions (36783) and
> > inbound from port 9999 to port 47636, also from the master.
> >
> > It seems that after this point anything that tries to connect to
> > port 9999 goes into a TIME_WAIT and never does anything.
> >
> > I have checked all the permissions I can think of and everything
> > seems to be correct.
> > HDFS is running correctly and jobs not associated with accumulo all
> > seem to be working.
>
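
The suggestion in the thread to find the first serialization or deserialization error nearest the master's startup can be sketched as a small log scan. This is only a rough illustration: the function names are hypothetical, the match strings are taken from the wording in this thread ("deserialize", "java.io.EOFException"), and the real log wording and file location may differ.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed patterns, based only on the strings mentioned in this thread.
# "serializ" also matches "deserialize" and "deserialization".
PATTERN = re.compile(r"serializ|java\.io\.EOFException", re.IGNORECASE)


def first_serialization_error(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, text) of the first matching line, or None."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            return number, line.rstrip("\n")
    return None


def scan_file(path: str) -> Optional[Tuple[int, str]]:
    """Scan a log file at a caller-supplied path (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        return first_serialization_error(log)
```

Calling `scan_file` on the master's log file returns the earliest matching line, which is a reasonable place to start reading the surrounding errors.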
