Matt,

This is going to be a long one, sorry. I will attempt to replicate your issue and show you how I accomplished what I think you are trying to do.

I'll be using a single-jar mini Accumulo cluster I created at https://github.com/mjwall/standalone-mac/tree/1.6.6. Note that it is 1.6.6.
I built and ran it with

    mvn clean package && java -jar target/standalone-1.6.6-mac-shaded-0.0.1-SNAPSHOT.jar

Once it starts up, here is what you get

    Starting a Mini Accumulo Cluster:
    InstanceName: smac
    Root user password: secret
    Temp dir is: /var/folders/cd/l8dpphgn3j1gfpr2gs6yb9vjjpd1pt/T/1487858319075-0
    Zookeeper is: localhost:2181
    Monitor: http://localhost:56202
    Starting a shell

    Shell - Apache Accumulo Interactive Shell
    -
    - version: 1.6.6
    - instance name: smac
    - instance id: 2a19ecc3-dd7f-4a8e-9505-5577f9eff2c7
    -
    - type 'help' for a list of available commands
    -
    root@smac>

The monitor URL is shown above, so I hit that to look around. It shows 2 metadata tablets at this point, which I can confirm with the following scan

    root@smac blerp> scan -t accumulo.root -c ~tab
    !0;~ ~tab:~pr [] \x00
    !0< ~tab:~pr [] \x01~

Now create a couple of tables with some splits

    root@smac> createtable blah
    root@smac blah> addsplits 1 2 3 4 5 6 7 8 9 0 a b c d e f g h i j k l m n o p q r s t u v w x y z
    root@smac blah> createtable blerp
    root@smac blerp> addsplits 1 2 3 4 5 6 7 8 9 0 a b c d e f g h i j k l m n o p q r s t u v w x y z

Just for reference, here are the tables

    root@smac blerp> tables -l
    accumulo.metadata => !0
    accumulo.root => +r
    blah => 1
    blerp => 2

So let's create some additional metadata tablets before we delete any tables

    addsplits -t accumulo.metadata 1;1 1;3 1;5 1;7 1;9 1;a 1;c 1;e 1;g 1;i 1;k 1;m 1;o 1;q 1;s 1;u 1;w 1;y
    addsplits -t accumulo.metadata 2;1 2;3 2;5 2;7 2;9 2;a 2;c 2;e 2;g 2;i 2;k 2;m 2;o 2;q 2;s 2;u 2;w 2;y

So there are now 37 metadata tablets in the monitor.
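For reference, rows in accumulo.metadata have the form <tableId>;<endRow>, which is why the split points above look like 1;a or 2;y. The two addsplits commands can be generated rather than typed by hand. Here is a minimal Python sketch, assuming you want the same every-other-split layout used above. The helper names metadata_splits and addsplits_command are made up for illustration and are not part of Accumulo.

```python
import string

# Assumption: reuse every other split point of the user tables, i.e. the odd
# digits plus a, c, e, ... y, as in the commands typed by hand above.
SUFFIXES = "13579" + string.ascii_lowercase[::2]


def metadata_splits(table_id):
    """Return metadata split rows of the form '<tableId>;<endRow>'."""
    return [f"{table_id};{suffix}" for suffix in SUFFIXES]


def addsplits_command(table_id):
    """Build the Accumulo shell command that adds those splits."""
    return "addsplits -t accumulo.metadata " + " ".join(metadata_splits(table_id))


if __name__ == "__main__":
    # Table ids 1 and 2 are the ids of blah and blerp in this walkthrough.
    for table_id in ("1", "2"):
        print(addsplits_command(table_id))
```

The printed lines can be pasted into the shell. For your own tables you would swap in the real table ids and whatever end rows make sense for your data.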
Scanning accumulo.root shows that

    root@smac blerg> scan -t accumulo.root -c ~tab
    !0;1;1 ~tab:~pr [] \x00
    !0;1;3 ~tab:~pr [] \x011;1
    !0;1;5 ~tab:~pr [] \x011;3
    !0;1;7 ~tab:~pr [] \x011;5
    !0;1;9 ~tab:~pr [] \x011;7
    !0;1;a ~tab:~pr [] \x011;9
    !0;1;c ~tab:~pr [] \x011;a
    !0;1;e ~tab:~pr [] \x011;c
    !0;1;g ~tab:~pr [] \x011;e
    !0;1;i ~tab:~pr [] \x011;g
    !0;1;k ~tab:~pr [] \x011;i
    !0;1;m ~tab:~pr [] \x011;k
    !0;1;o ~tab:~pr [] \x011;m
    !0;1;q ~tab:~pr [] \x011;o
    !0;1;s ~tab:~pr [] \x011;q
    !0;1;u ~tab:~pr [] \x011;s
    !0;1;w ~tab:~pr [] \x011;u
    !0;1;y ~tab:~pr [] \x011;w
    !0;2;1 ~tab:~pr [] \x011;y
    !0;2;3 ~tab:~pr [] \x012;1
    !0;2;5 ~tab:~pr [] \x012;3
    !0;2;7 ~tab:~pr [] \x012;5
    !0;2;9 ~tab:~pr [] \x012;7
    !0;2;a ~tab:~pr [] \x012;9
    !0;2;c ~tab:~pr [] \x012;a
    !0;2;e ~tab:~pr [] \x012;c
    !0;2;g ~tab:~pr [] \x012;e
    !0;2;i ~tab:~pr [] \x012;g
    !0;2;k ~tab:~pr [] \x012;i
    !0;2;m ~tab:~pr [] \x012;k
    !0;2;o ~tab:~pr [] \x012;m
    !0;2;q ~tab:~pr [] \x012;o
    !0;2;s ~tab:~pr [] \x012;q
    !0;2;u ~tab:~pr [] \x012;s
    !0;2;w ~tab:~pr [] \x012;u
    !0;2;y ~tab:~pr [] \x012;w
    !0;~ ~tab:~pr [] \x012;y
    !0< ~tab:~pr [] \x01~

There are associated metadata entries as well

    root@smac blerg> scan -t accumulo.metadata -b 1; -e 2; -c ~tab
    1;0 ~tab:~pr [] \x00
    1;1 ~tab:~pr [] \x010
    1;2 ~tab:~pr [] \x011
    1;3 ~tab:~pr [] \x012
    1;4 ~tab:~pr [] \x013
    1;5 ~tab:~pr [] \x014
    1;6 ~tab:~pr [] \x015
    1;7 ~tab:~pr [] \x016
    1;8 ~tab:~pr [] \x017
    1;9 ~tab:~pr [] \x018
    1;a ~tab:~pr [] \x019
    1;b ~tab:~pr [] \x01a
    1;c ~tab:~pr [] \x01b
    1;d ~tab:~pr [] \x01c
    1;e ~tab:~pr [] \x01d
    1;f ~tab:~pr [] \x01e
    1;g ~tab:~pr [] \x01f
    1;h ~tab:~pr [] \x01g
    1;i ~tab:~pr [] \x01h
    1;j ~tab:~pr [] \x01i
    1;k ~tab:~pr [] \x01j
    1;l ~tab:~pr [] \x01k
    1;m ~tab:~pr [] \x01l
    1;n ~tab:~pr [] \x01m
    1;o ~tab:~pr [] \x01n
    1;p ~tab:~pr [] \x01o
    1;q ~tab:~pr [] \x01p
    1;r ~tab:~pr [] \x01q
    1;s ~tab:~pr [] \x01r
    1;t ~tab:~pr [] \x01s
    1;u ~tab:~pr [] \x01t
    1;v ~tab:~pr [] \x01u
    1;w ~tab:~pr [] \x01v
    1;x ~tab:~pr [] \x01w
    1;y ~tab:~pr [] \x01x
    1;z ~tab:~pr [] \x01y
    1< ~tab:~pr [] \x01z

Let's delete the 2 tables

    root@smac blerg> deletetable blerg
    deletetable { blerg } (yes|no)? yes
    Table: [blerg] has been deleted.
    root@smac> deletetable blah
    deletetable { blah } (yes|no)? yes
    Table: [blah] has been deleted.

The metadata table is clean

    root@smac> scan -t accumulo.metadata -b 1; -e 2; -c ~tab
    root@smac> scan -t accumulo.metadata -b 2; -c ~tab

The root table now has empty tablets

    root@smac> scan -t accumulo.root -c ~tab
    !0;1;1 ~tab:~pr [] \x00
    !0;1;3 ~tab:~pr [] \x011;1
    !0;1;5 ~tab:~pr [] \x011;3
    !0;1;7 ~tab:~pr [] \x011;5
    !0;1;9 ~tab:~pr [] \x011;7
    !0;1;a ~tab:~pr [] \x011;9
    !0;1;c ~tab:~pr [] \x011;a
    !0;1;e ~tab:~pr [] \x011;c
    !0;1;g ~tab:~pr [] \x011;e
    !0;1;i ~tab:~pr [] \x011;g
    !0;1;k ~tab:~pr [] \x011;i
    !0;1;m ~tab:~pr [] \x011;k
    !0;1;o ~tab:~pr [] \x011;m
    !0;1;q ~tab:~pr [] \x011;o
    !0;1;s ~tab:~pr [] \x011;q
    !0;1;u ~tab:~pr [] \x011;s
    !0;1;w ~tab:~pr [] \x011;u
    !0;1;y ~tab:~pr [] \x011;w
    !0;2;1 ~tab:~pr [] \x011;y
    !0;2;3 ~tab:~pr [] \x012;1
    !0;2;5 ~tab:~pr [] \x012;3
    !0;2;7 ~tab:~pr [] \x012;5
    !0;2;9 ~tab:~pr [] \x012;7
    !0;2;a ~tab:~pr [] \x012;9
    !0;2;c ~tab:~pr [] \x012;a
    !0;2;e ~tab:~pr [] \x012;c
    !0;2;g ~tab:~pr [] \x012;e
    !0;2;i ~tab:~pr [] \x012;g
    !0;2;k ~tab:~pr [] \x012;i
    !0;2;m ~tab:~pr [] \x012;k
    !0;2;o ~tab:~pr [] \x012;m
    !0;2;q ~tab:~pr [] \x012;o
    !0;2;s ~tab:~pr [] \x012;q
    !0;2;u ~tab:~pr [] \x012;s
    !0;2;w ~tab:~pr [] \x012;u
    !0;2;y ~tab:~pr [] \x012;w
    !0;~ ~tab:~pr [] \x012;y
    !0< ~tab:~pr [] \x01~

I believe this to be the situation you are in. Is that correct?

So let's merge away the splits for table 1 and 2 into the last split for table 2, 2;y.

    root@smac> merge -?
    2017-02-23 09:44:21,860 [shell.Shell.audit] INFO : root@smac> merge -?
    usage: merge [-] [-?] [-b <begin-row>] [-e <end-row>] [-f] [-s <arg>] [-t <table>] [-v]
    description: merges tablets in a table
      -,--all                      allow an entire table to be merged into one tablet without prompting the user for confirmation
      -?,--help                    display this help
      -b,--begin-row <begin-row>   begin row (exclusive)
      -e,--end-row <end-row>       end row (inclusive)
      -f,--force                   merge small tablets to large tablets, even if it goes over the given size
      -s,--size <arg>              merge tablets to the given size over the entire table
      -t,--table <table>           table to be merged
      -v,--verbose                 verbose output during merge
    root@smac> merge -t accumulo.metadata -b 1;0 -e 2;y -v

There are now 3 metadata tablets in the monitor and with this scan

    root@smac> scan -t accumulo.root -c ~tab
    2017-02-23 09:45:48,988 [shell.Shell.audit] INFO : root@smac> scan -t accumulo.root -c ~tab
    !0;2;y ~tab:~pr [] \x00
    !0;~ ~tab:~pr [] \x012;y
    !0< ~tab:~pr [] \x01~

Can you provide more details on what is different from this walkthrough for you?

On Wed, Feb 22, 2017 at 9:18 PM Dickson, Matt MR <matt.dick...@defence.gov.au> wrote:

> *UNOFFICIAL*
> We are on 1.6.5, could it be that the merge is not available in this
> version.
>
>
> ------------------------------
> *From:* Christopher [mailto:ctubb...@apache.org]
> *Sent:* Thursday, 23 February 2017 12:46
>
> *To:* user@accumulo.apache.org
> *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> On Wed, Feb 22, 2017 at 8:18 PM Dickson, Matt MR <
> matt.dick...@defence.gov.au> wrote:
>
> UNOFFICIAL
>
> I ran the compaction with no luck.
>
> I've had a close look at the split points on the metadata table and
> confirmed that due to the initial large table we now have 90% of the
> metadata for existing tables hosted on one tablet which creates a hotspot.
> I've now manually added better split points to the metadata table that has
> created tablets with only 4-5M entries rather than 12M+.
> > The split points I created isolate the metadata for large tables to > separate tablets but ideally I'd like to split these further which raises 3 > questions. > > 1. If I have table 1xo, is there a smart way to determine the mid point of > the data in the metadata table eg 1xo;xxxx to allow me to create a split > based on that? > > 2. I tried to merge tablets on the metadata table where the size was > smaller than 1M but was met with a warning stating merge on the metadata > table was not allowed. Due to the deletion of the large table we have > several tablets with zero entries and they will never be populate. > > > Hmm. That seems to ring a bell. It was a goal of moving the root tablet > into its own table, that users would be able to merge the metadata table. > However, we may still have an unnecessary constraint on that in the > interface, which is no longer needed. If merging on the metadata table > doesn't work, please file a JIRA at > https://issues.apache.org/browse/ACCUMULO with any error messages you > saw, so we can track it as a bug. > > > 3. How Accumulo should deal with the deletion of a massive table? Should > the metadata table redistribute the tablets to avoid hotspotting on a > single tserver which appears to be whats happening? > > Thanks for all the help so far. > > -----Original Message----- > From: Josh Elser [mailto:josh.el...@gmail.com] > Sent: Thursday, 23 February 2017 10:00 > To: user@accumulo.apache.org > Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL] > > There's likely a delete "tombstone" in another file referenced by that > tablet which is masking those entries. If you compact the tablet, you > should see them all disappear. > > Yes, you should be able to split/merge the metatdata table just like any > other table. 
Beware, the implications of this are system wide instead of > localized to a single user table :) > > Dickson, Matt MR wrote: > > *UNOFFICIAL* > > > > When I inspect the rfiles associated with the metadata table using the > > rfile-info there are a lot of entries for the old deleted table, 1vm. > > Querying the metadata table returns nothing for the deleted table. > > When a table is deleted should the rfiles have any records referencing > > the old table? > > Also, am I able to manually create new split point on the metadata > > table to force it to break up the large tablet? > > ---------------------------------------------------------------------- > > -- > > *From:* Christopher [mailto:ctubb...@apache.org] > > *Sent:* Wednesday, 22 February 2017 15:46 > > *To:* user@accumulo.apache.org > > *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL] > > > > It should be safe to merge on the metadata table. That was one of the > > goals of moving the root tablet into its own table. I'm pretty sure we > > have a build test to ensure it works. > > > > On Tue, Feb 21, 2017, 18:22 Dickson, Matt MR > > <matt.dick...@defence.gov.au <mailto:matt.dick...@defence.gov.au>> > wrote: > > > > __ > > > > *UNOFFICIAL* > > > > Firstly, thankyou for your advice its been very helpful. > > Increasing the tablet server memory has allowed the metadata table > > to come online. From using the rfile-info and looking at the splits > > for the metadata table it appears that all the metadata table > > entries are in one tablet. All tablet servers then query the one > > node hosting that tablet. > > I suspect the cause of this was a poorly designed table that at one > > point the Accumulo gui reported 1.02T tablets for. We've > > subsequently deleted that table but it might be that there were so > > many entries in the metadata table that all splits on it were due to > > this massive table that had the table id 1vm. 
> > To rectify this, is it safe to run a merge on the metadata table to > > force it to redistribute? > > > > > ------------------------------------------------------------------------ > > *From:* Michael Wall [mailto:mjw...@gmail.com > > <mailto:mjw...@gmail.com>] > > *Sent:* Wednesday, 22 February 2017 02:44 > > > > *To:* user@accumulo.apache.org <mailto:user@accumulo.apache.org> > > *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL] > > Matt, > > > > If I am reading this correctly, you have a tablet that is being > > loading onto a tserver. That tserver dies, so the tablet is then > > assigned to another tablet. While the tablet is being loading, that > > tserver dies and so on. Is that correct? > > > > Can you identify the tablet that is bouncing around? If so, try > > using rfile-info -d to inspect the rfiles associated with that > > tablet. Also look at the rfiles that compose that tablet to see if > > anything sticks out. > > > > Any logs that would help explain why the tablet server is dying? Can > > you increase the memory of the tserver? > > > > Mike > > > > On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <josh.el...@gmail.com > > <mailto:josh.el...@gmail.com>> wrote: > > > > ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception > > communicating with ZooKeeper, will retry > > SessionExpiredException: KeeperErrorCode = Session expired for > > > > /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.me > > mory > > > > There can be a number of causes for this, but here are the most > > likely ones. > > > > * JVM gc pauses > > * ZooKeeper max client connections > > * Operating System/Hardware-level pauses > > > > The former should be noticeable by the Accumulo log. There is a > > daemon > > running which watches for pauses that happen and then reports > > them. If > > this is happening, you might have to give the process some more > Java > > heap, tweak your CMS/G1 parameters, etc. 
> > > > For maxClientConnections, see > > > > https://community.hortonworks.com/articles/51191/understanding-apache- > > zookeeper-connection-rate-lim.html > > > > For the latter, swappiness is the most likely candidate > > (assuming this > > is hopping across different physical nodes), as are "transparent > > huge > > pages". If it is limited to a single host, things like bad NICs, > > hard > > drives, and other hardware issues might be a source of slowness. > > > > On Mon, Feb 20, 2017 at 10:18 PM, Dickson, Matt MR > > <matt.dick...@defence.gov.au > > <mailto:matt.dick...@defence.gov.au>> wrote: > > > UNOFFICIAL > > > > > > It looks like an issue with one of the metadata table > > tablets. On startup > > > the server that hosts a particular metadata tablet gets > > scanned by all other > > > tablet servers in the cluster. This then crashes that tablet > > server with an > > > error in the tserver log; > > > > > > ... [zookeeper.ZooCache] WARN: Saw (possibly) transient > exception > > > communicating with ZooKeeper, will retry > > > SessionExpiredException: KeeperErrorCode = Session expired for > > > > > > /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory > > > > > > That metadata table tablet is then transferred to another > > host which then > > > fails also, and so on. > > > > > > While the server is hosting this metadata tablet, we see the > > following log > > > statement from all tserver.logs in the cluster: > > > > > > .... [impl.ThriftScanner] DEBUG: Scan failed, thrift error > > > org.apache.thrift.transport.TTransportException null > > > (!0;1vm\\;125.323.233.23::2016103<,server.com.org:9997 > > <http://server.com.org:9997>,2342423df12341d) > > > Hope that helps complete the picture. 
> > > > > > > > > ________________________________ > > > From: Christopher [mailto:ctubb...@apache.org > > <mailto:ctubb...@apache.org>] > > > Sent: Tuesday, 21 February 2017 13:17 > > > > > > To: user@accumulo.apache.org <mailto:user@accumulo.apache.org > > > > > Subject: Re: accumulo.root invalid table reference > > [SEC=UNOFFICIAL] > > > > > > Removing them is probably a bad idea. The root table entries > > correspond to > > > split points in the metadata table. There is no need for the > > tables which > > > existed when the metadata table split to still exist for this > > to continue to > > > act as a valid split point. > > > > > > Would need to see the exception stack trace, or at least an > > error message, > > > to troubleshoot the shell scanning error you saw. > > > > > > > > > On Mon, Feb 20, 2017, 20:00 Dickson, Matt MR > > <matt.dick...@defence.gov.au <mailto:matt.dick...@defence.gov.au > >> > > > wrote: > > >> > > >> UNOFFICIAL > > >> > > >> In case it is ok to remove these from the root table, how > > can I scan the > > >> root table for rows with a rowid starting with !0;1vm? > > >> > > >> Running "scan -b !0;1vm" throws an exception and exits the > > shell. > > >> > > >> > > >> -----Original Message----- > > >> From: Dickson, Matt MR [mailto:matt.dick...@defence.gov.au > > <mailto:matt.dick...@defence.gov.au>] > > >> Sent: Tuesday, 21 February 2017 09:30 > > >> To: 'user@accumulo.apache.org <mailto: > user@accumulo.apache.org>' > > >> Subject: RE: accumulo.root invalid table reference > > [SEC=UNOFFICIAL] > > >> > > >> UNOFFICIAL > > >> > > >> > > >> Does that mean I should have entries for 1vm in the metadata > > table > > >> corresponding to the root table? 
> > >> > > >> We are running 1.6.5 > > >> > > >> > > >> -----Original Message----- > > >> From: Josh Elser [mailto:josh.el...@gmail.com > > <mailto:josh.el...@gmail.com>] > > >> Sent: Tuesday, 21 February 2017 09:22 > > >> To: user@accumulo.apache.org <mailto: > user@accumulo.apache.org> > > >> Subject: Re: accumulo.root invalid table reference > > [SEC=UNOFFICIAL] > > >> > > >> The root table should only reference the tablets in the > > metadata table. > > >> It's a hierarchy: like metadata is for the user tables, root > > is for the > > >> metadata table. > > >> > > >> What version are ya running, Matt? > > >> > > >> Dickson, Matt MR wrote: > > >> > *UNOFFICIAL* > > >> > > > >> > I have a situation where all tablet servers are > > progressively being > > >> > declared dead. From the logs the tservers report errors > like: > > >> > 2017-02-.... DEBUG: Scan failed thrift error > > >> > org.apache.thrift.trasport.TTransportException null > > >> > (!0;1vm\\125.323.233.23::2016103<,server.com.org:9997 > > <http://server.com.org:9997>,2342423df12341d) > > >> > 1vm was a table id that was deleted several months ago so > > it appears > > >> > there is some invalid reference somewhere. > > >> > Scanning the metadata table "scan -b 1vm" returns no rows > > returned for > > >> > 1vm. > > >> > A scan of the accumulo.root table returns approximately 15 > > rows that > > >> > start with; !0:1vm;<i/p addr>/::2016103 /blah/ // How are > > the root > > >> > table entries used and would it be safe to remove these > > entries since > > >> > they reference a deleted table? > > >> > Thanks in advance, > > >> > Matt > > >> > // > > > > > > -- > > > Christopher > > > > -- > > Christopher > > -- > Christopher >