Yeah, so it seems our number one mistake is taking the Master down in response to having issues. I guess you get so comfortable bringing the cluster up and down when you are first starting out that it seems like a natural knee-jerk reaction. This most recent time there was something in yellow/red, but I don't recall what it said, and it didn't make sense to me. Since I was having problems with the web console and wasn't sure of the actual state of the Master, I just tried to stop it. When it pushed back on shutting down (running stop-all.sh) with something about access denied, I cancelled out of the shutdown script, so who knows where it ended up.
Could you explain a little more about the Master's monitoring console? It runs an embedded Jetty instance and renders data from JMX MBeans in the running Master, correct? I know there is an XML representation, and I thought I saw something about embedding it in a separate JMX console (or maybe I am blurring it with my reading on ZooKeeper and Hadoop), but is there a data store that holds that data? Is it accessible by some other means if the web console isn't responding?

On Sat, Jul 14, 2012 at 8:09 PM, Eric Newton <[email protected]> wrote:
> Is there anything red or yellow on the monitor pages?
>
> There's a layering to availability:
>
> Most of the monitoring is done via the master, so if it has recently
> restarted, you will see almost no useful information.
>
> The first tablet of the METADATA table needs to be assigned, recovered
> and functional. If you see only one tablet assigned... it needs to be
> healthy before anything else can happen.
>
> Next, the rest of the METADATA table needs to be assigned, recovered
> and functional.
>
> If you are seeing "-" then the METADATA table is not available for some
> reason.
>
> Ensure that hadoop & zookeeper are not using /tmp for storage.
>
> -Eric
>
> On Sat, Jul 14, 2012 at 7:18 PM, Roger Lloyd <[email protected]> wrote:
> > I was looking for some insights in regards to a couple of issues I have
> > seen, and the likely cause/solution.
> >
> > 1) Tables go blank
> >
> > So, everything was kicking along fine: I was loading data, and it worked
> > beautifully for days, even weeks, adding hundreds of millions of entries,
> > splitting tablets, etc. Suddenly, I ran into an issue where, in the web
> > console, all the tables just go to "-" for their values (except the
> > !METADATA table).
> >
> > What could/would cause this?
> >
> > What is the smart way to react? Our previous attempts have been 1) re-init
> > and reload through the Client API and 2) re-init and recover the tables
> > using the bulk loading scheme mentioned on this mailing list. I am not sure
> > we haven't taken more rash action than necessary, simply because we could
> > afford to reload, etc. When we increase our deployment, that will be less
> > of an option. I am also not sure whether we are doing something wrong overall.
> >
> > 2) Client connections to Zookeeper
> >
> > When I am writing a client in Eclipse, we seem to have this issue where it
> > cycles connections, creating and closing sessions (with no errors at all),
> > but if I suspend the thread in Eclipse and start it again, then the session
> > opens and stays open. I realize this is probably a Zookeeper problem, but
> > can someone give me a quick rundown of what is happening under the
> > hood, so I could try running some zkCli commands to simulate the issue?
> >
> > We are running version 1.4.0-incubating-SNAPSHOT and Zookeeper 3.4.3. If
> > we wanted to upgrade to 1.4.1, how involved would that be? Just replace the
> > jar files and the config files? Or would we need to migrate data?
> >
> > Thanks for your help.
> >
> > Roger
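On the question of reading the monitor's numbers when the web console is down: since the data ultimately comes from JMX MBeans inside the Master JVM, any generic JMX client (jconsole, or a few lines of Java) can read them directly, provided remote JMX is enabled on that JVM. Below is a minimal sketch; it queries the local platform MBean server so it runs anywhere, and the commented-out remote connection (the `master-host` name and port are placeholders, not values from this thread) shows how you'd point it at the Master instead. The exact MBean names the Master registers depend on the Accumulo version, so the sketch just lists whatever it finds and reads one standard attribute.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxPeek {
    public static void main(String[] args) throws Exception {
        // For a remote Master JVM started with JMX remoting enabled
        // (e.g. -Dcom.sun.management.jmxremote.port=<port>), connect with:
        //   JMXServiceURL url = new JMXServiceURL(
        //       "service:jmx:rmi:///jndi/rmi://master-host:<port>/jmxrmi");
        //   MBeanServerConnection conn =
        //       JMXConnectorFactory.connect(url).getMBeanServerConnection();
        // Here we use the local platform server so the sketch is self-contained.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();

        // List every registered MBean; a running Master registers its own
        // beans alongside the standard java.lang ones.
        Set<ObjectName> names = conn.queryNames(null, null);
        for (ObjectName name : names) {
            System.out.println(name);
        }

        // Read one standard attribute to show how values are fetched.
        Object uptime = conn.getAttribute(
                new ObjectName("java.lang:type=Runtime"), "Uptime");
        System.out.println("Uptime ms: " + uptime);
    }
}
```

The same attribute reads work against the remote connection, so this gives you a path to the Master's stats that does not depend on its embedded Jetty console being responsive.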
