2013-05-31 09:37:03,549 [tabletserver.TabletServer] DEBUG: Unassigning 12<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,667 [tabletserver.TabletServer] DEBUG: Unassigning 14<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,697 [tabletserver.TabletServer] DEBUG: Unassigning 16<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,751 [tabletserver.TabletServer] DEBUG: Unassigning 18<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,785 [tabletserver.TabletServer] DEBUG: Unassigning 1<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,824 [tabletserver.TabletServer] DEBUG: Unassigning 1b<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,868 [tabletserver.TabletServer] DEBUG: Unassigning 1c<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,893 [tabletserver.TabletServer] DEBUG: Unassigning 1d<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,919 [tabletserver.TabletServer] DEBUG: Unassigning 2<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,940 [tabletserver.TabletServer] DEBUG: Unassigning 4<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,969 [tabletserver.TabletServer] DEBUG: Unassigning 7<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:03,997 [tabletserver.TabletServer] DEBUG: Unassigning 9<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,014 [tabletserver.TabletServer] DEBUG: Unassigning a<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,049 [tabletserver.TabletServer] DEBUG: Unassigning d<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,071 [tabletserver.TabletServer] DEBUG: Unassigning g<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,119 [tabletserver.TabletServer] DEBUG: Unassigning i<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,145 [tabletserver.TabletServer] DEBUG: Unassigning j<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,183 [tabletserver.TabletServer] DEBUG: Unassigning k<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,210 [tabletserver.TabletServer] DEBUG: Unassigning l<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,235 [tabletserver.TabletServer] DEBUG: Unassigning o<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,260 [tabletserver.TabletServer] DEBUG: Unassigning p<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,284 [tabletserver.TabletServer] DEBUG: Unassigning u<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,306 [tabletserver.TabletServer] DEBUG: Unassigning z<<@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:04,686 [tabletserver.TabletServer] DEBUG: Unassigning !0<;~@(null,10.35.58.81:9997[13ec2e209c79745],null)
2013-05-31 09:37:48,675 [server.Accumulo] INFO : tserver.bulk.assign.threads = 1
2013-05-31 09:37:57,765 [tabletserver.TabletServer] INFO : 1620-accumulo.dhcp.saic.com/10.35.58.81:9997: got assignment from master: !0;!0<<
2013-05-31 09:37:57,775 [tabletserver.TabletServer] INFO : Reporting tablet !0;!0<< assignment failure: unable to verify Tablet Information
2013-05-31 09:37:57,961 [tabletserver.TabletServer] INFO : 1620-accumulo.dhcp.saic.com/10.35.58.81:9997: got assignment from master: !0;!0<<

From: [email protected] [mailto:[email protected]] On Behalf Of Billie Rinaldi
Sent: Friday, May 31, 2013 1:32 PM
To: [email protected]
Subject: Re: Uneven distribute of Hosted Tablets?

Hmm.
Anything on the one that reported assignment failed?

Billie

On Fri, May 31, 2013 at 9:53 AM, Ott, Charles H. <[email protected]> wrote:

2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Got unloadTablet message from user: !SYSTEM
2013-05-31 09:49:53,471 [tabletserver.Tablet] DEBUG: initiateClose(saveState=true queueMinC=false disableWrites=false) !0;!0<<
2013-05-31 09:49:53,471 [tabletserver.TabletServer] DEBUG: Failed to unload tablet !0;!0<<... it was alread closing or closed : Tablet !0;!0<< already closing

The timestamp is 12 minutes off, since the clocks are out of sync, but there seems to be the same number of debug statements above as there were errors in the master.

From: [email protected] [mailto:[email protected]] On Behalf Of Billie Rinaldi
Sent: Friday, May 31, 2013 12:47 PM
To: [email protected]
Subject: Re: Uneven distribute of Hosted Tablets?

Can you go to one of those servers that is reporting unload / assignment failed and check its tserver log to see why it failed?

Billie

On Fri, May 31, 2013 at 9:39 AM, Ott, Charles H. <[email protected]> wrote:

I am not sure if I am using one of the balancers that comes with Accumulo. There are some errors in my logs for the master since I did the clean shutdown/startup this morning:

2013-05-31 09:37:57,592 [master.Master] ERROR: 10.35.56.92:9997 reports unload failed for tablet !0;!0<< (A lot of these errors showed up)
2013-05-31 09:37:57,795 [master.Master] ERROR: 10.35.58.81:9997 reports assignment failed for tablet !0;!0<< (only one of these)
2013-05-31 09:37:05,784 [master.Master] ERROR: master:1620-accumulo.dhcp.saic.com 10.35.56.92:9997 reports unload failed for tablet !0;!0<< (a lot of these)

The entire batch of errors all occurred within 1 minute. Then they don't occur anymore.

From: [email protected] [mailto:[email protected]] On Behalf Of Billie Rinaldi
Sent: Friday, May 31, 2013 12:14 PM
To: [email protected]
Subject: Re: Uneven distribute of Hosted Tablets?
So (at the risk of stating the obvious) it seems like your cluster is in a funny state. I would expect the counts in the "Hosted Tablets" column to all be roughly the same, especially after restarting the master, assuming you're using one of the balancers that comes with Accumulo.

It's possible the cluster has gotten into this state due to the clock differences. Accumulo has a mechanism called "logical time" to deal with clock differences, but it is not enabled by default. You can enable it when you create a table. If you don't enable this, it is recommended that you use NTP to synchronize the clocks on your cluster. The !METADATA table has logical time by default, but your other tables might not contain what you expect them to if you haven't enabled logical time.

That said, I'm not sure why the clock issue would be affecting the balancing. You mentioned the new warnings you saw on the monitor page after you restarted the system. Could you see if there are any older errors in your log files?

Billie

On Fri, May 31, 2013 at 8:10 AM, Ott, Charles H. <[email protected]> wrote:

-bash-4.1$ ssh 1620-accumulo
-bash-4.1$ date
Fri May 31 10:52:49 EDT 2013
-bash-4.1$ ssh 1620-Node1
-bash-4.1$ date
Fri May 31 11:05:48 EDT 2013
-bash-4.1$ ssh 1620-Node2
-bash-4.1$ date
Fri May 31 11:05:58 EDT 2013
-bash-4.1$ ssh 1620-Node3
-bash-4.1$ date
Fri May 31 11:05:58 EDT 2013

Looks like the master (1620-accumulo) and its tablet server are 12-13 minutes behind the nodes. I'm not sure my zookeeper+Hadoop+Accumulo+storm+Kafka stack will appreciate moving forward in time 12 minutes.

From: [email protected] [mailto:[email protected]] On Behalf Of Billie Rinaldi
Sent: Friday, May 31, 2013 11:02 AM
To: [email protected]
Subject: Re: Uneven distribute of Hosted Tablets?

Those last contact times are concerning as well. Have they always looked like that? I notice they were roughly the same on your first screenshot. Are your server clocks not in sync?
Billie

On Fri, May 31, 2013 at 7:00 AM, Ott, Charles H. <[email protected]> wrote:

I performed a clean shutdown and startup of all the processes using the start-all.sh/stop-all.sh scripts. The systems have only been online for about 5 minutes and everything is working. But I see the following Recent WARN in the Logs:

time              application            count  level  message
31 09:37:57,0774  tserver:1620-accumulo  1      WARN   Future location is not to this server for the root tablet

Hosted tablet distribution seems to be worse:

(Image Below Here)
(Image Above Here)

I am able to log in, and scans seem to be responsive. I noticed that when our entry count was ~20 M, our batch scans were taking much longer. I was hoping that by distributing the tablets evenly, and splitting some of the bigger tables, we could get better performance.

As for splitting the bigger table, I received a message from a peer. He mentioned that I could create a new table, split it on the values I want, and then use a MapReduce job to move the data from the single-tablet table to the split table.

From: [email protected] [mailto:[email protected]] On Behalf Of John Vines
Sent: Thursday, May 30, 2013 5:30 PM
To: [email protected]
Cc: Lahr-Vivaz, Emilio F.
Subject: Re: Uneven distribute of Hosted Tablets?

Your distribution is cause for concern. I thought we had resolved a lot of the balancer issues in 1.4.1 or 1.4.2. Are you seeing any errors from the master in your logs? Worst case scenario is you just have to kill the master process and start it back up and you should see things balancing out.

On Thu, May 30, 2013 at 4:40 PM, Ott, Charles H. <[email protected]> wrote:

Thanks for the feedback. I will keep what you said in mind.

From: [email protected] [mailto:[email protected]] On Behalf Of David Medinets
Sent: Thursday, May 30, 2013 4:34 PM
To: accumulo-user
Subject: Re: Uneven distribute of Hosted Tablets?

Don't worry about splits until you have a few billion entries and a lot more servers.
What you're seeing now is just a bad signal-to-noise ratio.

On Thu, May 30, 2013 at 11:22 AM, Ott, Charles H. <[email protected]> wrote:

First I want to say thanks to you all. The information provided by this mailing list has been invaluable to me and I appreciate it.

My newest concern is the uneven allocation of hosted tablets across my tablet servers:

(Image Pasted below here)
(Image Pasted above here)

I have been reading about pre-splitting tables in the Accumulo guide. But I am not sure if that would be the 'fix' for this. (Or even if this needs fixing.)

I have 3 tables that could potentially grow to n number of records. Currently all of those tables (and their single tablets) reside on the 1620-accumulo server (hosting 24 tablets). Since there are already several entries in those tables, would splitting them be appropriate? Does splitting guarantee that the new tablets will be allocated to Node1 instead of Node3? Or perhaps could I "re-balance" the cluster so that all of the tablet servers host an approximately equal number of tablets?

These tablet servers were all brought up at separate times and I have not performed any optimizations or custom operations on them.

Thanks,
Charles
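
Editorial note: the clock offset discussed earlier in the thread (the master being "12-13 minutes behind the nodes") can be estimated directly from the pasted `date` output. The following is only a rough sketch, not something posted by the participants. The function names `parse_date_output` and `offsets_from` are made up for illustration, and the readings are copied from the shell session above. Because the `date` commands were typed one after another, each offset also includes the time spent between commands, so the numbers are approximate.

```python
from datetime import datetime

# `date` output copied from the shell session in the thread, keyed by host.
DATE_OUTPUT = {
    "1620-accumulo": "Fri May 31 10:52:49 EDT 2013",
    "1620-Node1": "Fri May 31 11:05:48 EDT 2013",
    "1620-Node2": "Fri May 31 11:05:58 EDT 2013",
    "1620-Node3": "Fri May 31 11:05:58 EDT 2013",
}


def parse_date_output(text):
    """Parse one line of `date` output, ignoring the time zone token.

    All hosts in the thread report the same zone (EDT), so dropping it
    does not change the differences between readings.
    """
    parts = text.split()
    del parts[4]  # remove the zone abbreviation, e.g. "EDT"
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def offsets_from(reference, readings):
    """Return each host's reading minus the reference host's, in seconds."""
    ref = parse_date_output(readings[reference])
    return {
        host: int((parse_date_output(text) - ref).total_seconds())
        for host, text in readings.items()
    }


if __name__ == "__main__":
    for host, seconds in offsets_from("1620-accumulo", DATE_OUTPUT).items():
        print(f"{host}: {seconds:+d} s relative to 1620-accumulo")
```

An offset of several hundred seconds, as here, is far beyond what NTP-synchronized hosts would normally show, which is consistent with the advice in the thread to synchronize the clocks or enable logical time on new tables.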
