Hmm. Anything on the one that reported assignment failed?

Billie
On Fri, May 31, 2013 at 9:53 AM, Ott, Charles H. <[email protected]> wrote:

    2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Got unloadTablet message from user: !SYSTEM
    2013-05-31 09:49:53,471 [tabletserver.Tablet] DEBUG: initiateClose(saveState=true queueMinC=false disableWrites=false) !0;!0<<
    2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Failed to unload tablet !0;!0<<... it was alread closing or closed : Tablet !0;!0<< already closing

The timestamp is 12 minutes off, since the clocks are out of sync, but there seems to be the same number of debug statements above as there were errors in the master.


On Fri, May 31, 2013 at 12:47 PM, Billie Rinaldi wrote:

Can you go to one of those servers that is reporting unload / assignment failed and check its tserver log to see why it failed?

Billie


On Fri, May 31, 2013 at 9:39 AM, Ott, Charles H. <[email protected]> wrote:

I am not sure if I am using one of the balancers that comes with Accumulo. There are some errors in my logs for the master since I did the clean shutdown/startup this morning:

    2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 reports unload failed for tablet !0;!0<<  (a lot of these errors showed up)
    2013-05-31 09:37:57,795 [master.Master] ERROR: 10.35.58.81:9997 reports assignment failed for tablet !0;!0<<  (only one of these)
    2013-05-31 09:37:05,784 [master.Master] ERROR: master: 1620-accumulo.dhcp.saic.com 10.35.56.92:9997 reports unload failed for tablet !0;!0<<  (a lot of these)

The entire batch of errors occurred within one minute; then they stopped appearing.


On Fri, May 31, 2013 at 12:14 PM, Billie Rinaldi wrote:

So (at the risk of stating the obvious) it seems like your cluster is in a funny state. I would expect the counts in the "Hosted Tablets" column to all be roughly the same, especially after restarting the master, assuming you're using one of the balancers that comes with Accumulo. It's possible the cluster has gotten into this state due to the clock differences. Accumulo has a mechanism called "logical time" to deal with clock differences, but it is not enabled by default; you can enable it when you create a table. If you don't enable it, it is recommended that you use NTP to synchronize the clocks on your cluster. The !METADATA table has logical time by default, but your other tables might not contain what you expect them to if you haven't enabled logical time.

That said, I'm not sure why the clock issue would be affecting the balancing. You mentioned the new warnings you saw on the monitor page after you restarted the system. Could you see if there are any older errors in your log files?

Billie
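For reference, logical time is chosen when a table is created; as far as I know (in 1.4) the time type cannot be changed afterward. A minimal sketch in the Accumulo shell, with a placeholder table name:

    root@1620-instance> createtable mytable --time-logical
    root@1620-instance mytable>

The short form of the flag is -tl; leaving it off gives the default millisecond time (-tm), which is the "not enabled by default" behavior Billie describes.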
<[email protected]> > wrote:**** > > -bash-4.1$ ssh 1620-accumulo**** > > -bash-4.1$ date**** > > Fri May 31 *10:52:49 *EDT 2013**** > > **** > > -bash-4.1$ ssh 1620-Node1**** > > -bash-4.1$ date**** > > Fri May 31 *11:05:48* EDT 2013**** > > **** > > -bash-4.1$ ssh 1620-Node2**** > > -bash-4.1$ date**** > > Fri May 31 *11:05:58* EDT 2013**** > > **** > > -bash-4.1$ ssh 1620-Node3**** > > -bash-4.1$ date**** > > Fri May 31 *11:05:58* EDT 2013**** > > **** > > Looks like the master(1620-accumulo) and it’s tablet server are 12-13 > minutes behind the nodes. I’m not sure my > zookeeper+Hadoop+Accumulo+storm+Kafka stack will appreciate moving forward > in time 12 minutes. **** > > **** > > *From:* [email protected][mailto: > [email protected]] *On Behalf > Of *Billie Rinaldi > *Sent:* Friday, May 31, 2013 11:02 AM > *To:* [email protected]**** > > > *Subject:* Re: Uneven distribute of Hosted Tablets?**** > > **** > > Those last contact times are concerning as well. Have they always looked > like that? I notice they were roughly the same on your first screenshot. > Are your server clocks not in sync?**** > > Billie**** > > **** > > On Fri, May 31, 2013 at 7:00 AM, Ott, Charles H. <[email protected]> > wrote:**** > > I performed a clean shutdown and startup of all the processes using the > start-all.sh/stop-all.sh scripts.**** > > **** > > The systems have only been online for about 5 minutes and everything is > working. But I see the following Recent WARN in the Logs:**** > > **** > > time > application count level message**** > > 31 09:37:57,0774 tserver:1620-accumulo 1 > WARN Future location is not to this server for the root tablet**** > > **** > > Hosted tablet distribution seems to be worse:**** > > **** > > (Image Below Here)**** > > > (Image Above Here)**** > > **** > > I am able to login and scans seems to be responsive. I noticed that when > we had our entries ~20 M count, our batch scans were taking much longer. I > was hoping that by distributing the tablets evenly, and splitting some of > the bigger tables, we could get better performance.**** > > As for splitting the bigger table, I received a message from a peer. He > mentioned that I could create a new table and split it on the values I > want. Then use Map reduce job to move the data from the single tablet > table to split table. **** > > **** > > *From:* [email protected][mailto: > [email protected]] *On Behalf > Of *John Vines > *Sent:* Thursday, May 30, 2013 5:30 PM > *To:* [email protected] > *Cc:* Lahr-Vivaz, Emilio F.**** > > > *Subject:* Re: Uneven distribute of Hosted Tablets?**** > > **** > > Your distribution is cause for concern. I thought we had resolved a lot of > the balancer issues in 1.4.1 or 1.4.2. Are you seeing any errors from the > master in your logs? Worst case scenario is you just have to kill the > master process and start it back up and you should see things balancing out. > **** > > **** > > On Thu, May 30, 2013 at 4:40 PM, Ott, Charles H. <[email protected]> > wrote:**** > > Thanks for the feedback. I will keep what you said in mind.**** > > **** > > *From:* [email protected][mailto: > [email protected]] *On Behalf > Of *David Medinets > *Sent:* Thursday, May 30, 2013 4:34 PM > *To:* accumulo-user > *Subject:* Re: Uneven distribute of Hosted Tablets?**** > > **** > > Don't worry about splits until you have a few billion entries and a lot > more servers. What you're seeing now is just a bad signal to noise ratio.* > *** > > **** > > On Thu, May 30, 2013 at 11:22 AM, Ott, Charles H. 
<[email protected]> > wrote:**** > > First I want to say thanks to the you all. The information provided by > this mailing list has been invaluable to me and I appreciate it.**** > > **** > > My newest concern is the uneven allocation of hosted tablets across my > tablet servers:**** > > **** > > (Image Pasted below here)**** > > **** > > (Image Pasted above here)**** > > **** > > I have been reading about pre-splitting tables in the Accumulo guide. But > I am not sure if that would be the ‘fix’ for this. (Or even if this needs > fixing.)**** > > **** > > I have 3 tables that could potentially grow to *n* number of records. > Currently of those tables (and there single tablet) reside on the > 1620-accumulo server (Hosting 24 tablets).**** > > **** > > Since there is already several entries on those tables, would splitting > them be appropriate? Does splitting guarantee that the new tablets will be > allocated to Node1 instead of Node 3? Or perhaps could I “re-balance” the > cluster so that all of the tablet servers host an approximately equal > number of tablets?**** > > **** > > These tablet servers were all brought up at separate times and I have not > performed any optimizations or custom operations on them.**** > > **** > > **** > > Thanks,**** > > Charles**** > > **** > > **** > > **** > > **** > > **** > > **** > > ** ** >
