Matt,

This is going to be a long one, sorry.  I will attempt to replicate your
issue and show you how I accomplished what I think you are trying to do.
I'll be using a single-jar Mini Accumulo Cluster I created at
https://github.com/mjwall/standalone-mac/tree/1.6.6. Note it is 1.6.6.

Built and ran with

mvn clean package && java -jar
target/standalone-1.6.6-mac-shaded-0.0.1-SNAPSHOT.jar

Once it starts up, here is what you get:

Starting a Mini Accumulo Cluster:
InstanceName:       smac
Root user password: secret
Temp dir is:
 /var/folders/cd/l8dpphgn3j1gfpr2gs6yb9vjjpd1pt/T/1487858319075-0
Zookeeper is:       localhost:2181
Monitor:            http://localhost:56202
Starting a shell

Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.6
- instance name: smac
- instance id: 2a19ecc3-dd7f-4a8e-9505-5577f9eff2c7
-
- type 'help' for a list of available commands
-
root@smac>

The monitor URL is shown above, so I hit that to look around.

It shows 2 metadata tablets at this point, which I can confirm with the
following scan:

root@smac> scan -t accumulo.root -c ~tab
!0;~ ~tab:~pr []    \x00
!0< ~tab:~pr []    \x01~
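
A note on reading that output: the ~tab:~pr column stores each tablet's
previous end row, where a first byte of \x00 means null (the first tablet)
and \x01 is followed by the actual previous end row.  If you'd rather count
metadata tablets programmatically, here is a minimal, untested sketch against
the 1.6 client API, using the instance name, zookeepers, and root password
from the banner above (class name is arbitrary; adjust the connection details
for your cluster):

import java.util.Map.Entry;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class CountMetadataTablets {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("smac", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));
    // each ~tab:~pr entry in accumulo.root is one metadata tablet
    Scanner scanner = conn.createScanner("accumulo.root", Authorizations.EMPTY);
    scanner.fetchColumn(new Text("~tab"), new Text("~pr"));
    int count = 0;
    for (Entry<Key,Value> entry : scanner) {
      System.out.println(entry.getKey().getRow() + " -> " + entry.getValue());
      count++;
    }
    System.out.println(count + " metadata tablets");
  }
}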

Now create a couple of tables with some splits

root@smac> createtable blah
root@smac blah> addsplits 1 2 3 4 5 6 7 8 9 0 a b c d e f g h i j k l m n o
p q r s t u v w x y z
root@smac blah> createtable blerp
root@smac blerp> addsplits 1 2 3 4 5 6 7 8 9 0 a b c d e f g h i j k l m n
o p q r s t u v w x y z
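
If you script this rather than typing it into the shell, the equivalent with
the Java client API is roughly the following (same connection assumptions as
the sketch above, same split points as the addsplits commands):

import java.util.SortedSet;
import java.util.TreeSet;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.io.Text;

public class CreateTablesWithSplits {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("smac", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));
    // one split per character: 0-9 then a-z
    SortedSet<Text> splits = new TreeSet<Text>();
    for (char c : "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray()) {
      splits.add(new Text(String.valueOf(c)));
    }
    for (String table : new String[] {"blah", "blerp"}) {
      conn.tableOperations().create(table);
      conn.tableOperations().addSplits(table, splits);
    }
  }
}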

Just for reference, here are the tables

root@smac blerp> tables -l
accumulo.metadata    =>        !0
accumulo.root        =>        +r
blah                 =>         1
blerp                =>         2

So let's create some additional metadata tablets before we delete any
tables.  Metadata rows are keyed as <tableId>;<endRow>, so the split points
below land inside the ranges for table ids 1 and 2.

addsplits -t accumulo.metadata 1;1 1;3 1;5 1;7 1;9 1;a 1;c 1;e 1;g 1;i 1;k
1;m 1;o 1;q 1;s 1;u 1;w 1;y
addsplits -t accumulo.metadata 2;1 2;3 2;5 2;7 2;9 2;a 2;c 2;e 2;g 2;i 2;k
2;m 2;o 2;q 2;s 2;u 2;w 2;y

There are now 38 metadata tablets in the monitor (36 new splits plus the
original 2 tablets).  Scanning accumulo.root shows them all:

root@smac blerp> scan -t accumulo.root -c ~tab
!0;1;1 ~tab:~pr []    \x00
!0;1;3 ~tab:~pr []    \x011;1
!0;1;5 ~tab:~pr []    \x011;3
!0;1;7 ~tab:~pr []    \x011;5
!0;1;9 ~tab:~pr []    \x011;7
!0;1;a ~tab:~pr []    \x011;9
!0;1;c ~tab:~pr []    \x011;a
!0;1;e ~tab:~pr []    \x011;c
!0;1;g ~tab:~pr []    \x011;e
!0;1;i ~tab:~pr []    \x011;g
!0;1;k ~tab:~pr []    \x011;i
!0;1;m ~tab:~pr []    \x011;k
!0;1;o ~tab:~pr []    \x011;m
!0;1;q ~tab:~pr []    \x011;o
!0;1;s ~tab:~pr []    \x011;q
!0;1;u ~tab:~pr []    \x011;s
!0;1;w ~tab:~pr []    \x011;u
!0;1;y ~tab:~pr []    \x011;w
!0;2;1 ~tab:~pr []    \x011;y
!0;2;3 ~tab:~pr []    \x012;1
!0;2;5 ~tab:~pr []    \x012;3
!0;2;7 ~tab:~pr []    \x012;5
!0;2;9 ~tab:~pr []    \x012;7
!0;2;a ~tab:~pr []    \x012;9
!0;2;c ~tab:~pr []    \x012;a
!0;2;e ~tab:~pr []    \x012;c
!0;2;g ~tab:~pr []    \x012;e
!0;2;i ~tab:~pr []    \x012;g
!0;2;k ~tab:~pr []    \x012;i
!0;2;m ~tab:~pr []    \x012;k
!0;2;o ~tab:~pr []    \x012;m
!0;2;q ~tab:~pr []    \x012;o
!0;2;s ~tab:~pr []    \x012;q
!0;2;u ~tab:~pr []    \x012;s
!0;2;w ~tab:~pr []    \x012;u
!0;2;y ~tab:~pr []    \x012;w
!0;~ ~tab:~pr []    \x012;y
!0< ~tab:~pr []    \x01~

There are associated entries in the metadata table as well; here are table 1's:

root@smac blerp> scan -t accumulo.metadata -b 1; -e 2; -c ~tab
1;0 ~tab:~pr []    \x00
1;1 ~tab:~pr []    \x010
1;2 ~tab:~pr []    \x011
1;3 ~tab:~pr []    \x012
1;4 ~tab:~pr []    \x013
1;5 ~tab:~pr []    \x014
1;6 ~tab:~pr []    \x015
1;7 ~tab:~pr []    \x016
1;8 ~tab:~pr []    \x017
1;9 ~tab:~pr []    \x018
1;a ~tab:~pr []    \x019
1;b ~tab:~pr []    \x01a
1;c ~tab:~pr []    \x01b
1;d ~tab:~pr []    \x01c
1;e ~tab:~pr []    \x01d
1;f ~tab:~pr []    \x01e
1;g ~tab:~pr []    \x01f
1;h ~tab:~pr []    \x01g
1;i ~tab:~pr []    \x01h
1;j ~tab:~pr []    \x01i
1;k ~tab:~pr []    \x01j
1;l ~tab:~pr []    \x01k
1;m ~tab:~pr []    \x01l
1;n ~tab:~pr []    \x01m
1;o ~tab:~pr []    \x01n
1;p ~tab:~pr []    \x01o
1;q ~tab:~pr []    \x01p
1;r ~tab:~pr []    \x01q
1;s ~tab:~pr []    \x01r
1;t ~tab:~pr []    \x01s
1;u ~tab:~pr []    \x01t
1;v ~tab:~pr []    \x01u
1;w ~tab:~pr []    \x01v
1;x ~tab:~pr []    \x01w
1;y ~tab:~pr []    \x01x
1;z ~tab:~pr []    \x01y
1< ~tab:~pr []    \x01z

Let's delete the 2 tables

root@smac blerp> deletetable blerp
deletetable { blerp } (yes|no)? yes
Table: [blerp] has been deleted.
root@smac> deletetable blah
deletetable { blah } (yes|no)? yes
Table: [blah] has been deleted.
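
For completeness, the same deletes through the client API are one call each,
reusing the conn object from the sketches above:

// deleting a table removes it; its metadata entries are cleaned up for you
conn.tableOperations().delete("blerp");
conn.tableOperations().delete("blah");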

The metadata table is clean; both of these scans return nothing

root@smac> scan -t accumulo.metadata -b 1; -e 2; -c ~tab
root@smac> scan -t accumulo.metadata -b 2; -c ~tab

The root table, though, still shows all of the metadata tablets, which are now empty

root@smac> scan -t accumulo.root -c ~tab
!0;1;1 ~tab:~pr []    \x00
!0;1;3 ~tab:~pr []    \x011;1
!0;1;5 ~tab:~pr []    \x011;3
!0;1;7 ~tab:~pr []    \x011;5
!0;1;9 ~tab:~pr []    \x011;7
!0;1;a ~tab:~pr []    \x011;9
!0;1;c ~tab:~pr []    \x011;a
!0;1;e ~tab:~pr []    \x011;c
!0;1;g ~tab:~pr []    \x011;e
!0;1;i ~tab:~pr []    \x011;g
!0;1;k ~tab:~pr []    \x011;i
!0;1;m ~tab:~pr []    \x011;k
!0;1;o ~tab:~pr []    \x011;m
!0;1;q ~tab:~pr []    \x011;o
!0;1;s ~tab:~pr []    \x011;q
!0;1;u ~tab:~pr []    \x011;s
!0;1;w ~tab:~pr []    \x011;u
!0;1;y ~tab:~pr []    \x011;w
!0;2;1 ~tab:~pr []    \x011;y
!0;2;3 ~tab:~pr []    \x012;1
!0;2;5 ~tab:~pr []    \x012;3
!0;2;7 ~tab:~pr []    \x012;5
!0;2;9 ~tab:~pr []    \x012;7
!0;2;a ~tab:~pr []    \x012;9
!0;2;c ~tab:~pr []    \x012;a
!0;2;e ~tab:~pr []    \x012;c
!0;2;g ~tab:~pr []    \x012;e
!0;2;i ~tab:~pr []    \x012;g
!0;2;k ~tab:~pr []    \x012;i
!0;2;m ~tab:~pr []    \x012;k
!0;2;o ~tab:~pr []    \x012;m
!0;2;q ~tab:~pr []    \x012;o
!0;2;s ~tab:~pr []    \x012;q
!0;2;u ~tab:~pr []    \x012;s
!0;2;w ~tab:~pr []    \x012;u
!0;2;y ~tab:~pr []    \x012;w
!0;~ ~tab:~pr []    \x012;y
!0< ~tab:~pr []    \x01~

I believe this to be the situation you are in.  Is that correct?

So let's merge away the splits for tables 1 and 2 into a single tablet ending
at the last split for table 2, 2;y.

root@smac> merge -?
2017-02-23 09:44:21,860 [shell.Shell.audit] INFO : root@smac> merge -?
usage: merge [-] [-?] [-b <begin-row>] [-e <end-row>] [-f] [-s <arg>] [-t
<table>] [-v]
description: merges tablets in a table
  -,--all                        allow an entire table to be merged into
one tablet without prompting the user for confirmation
  -?,--help                      display this help
  -b,--begin-row <begin-row>     begin row (exclusive)
  -e,--end-row <end-row>         end row (inclusive)
  -f,--force                     merge small tablets to large tablets, even
if it goes over the given size
  -s,--size <arg>                merge tablets to the given size over the
entire table
  -t,--table <table>             table to be merged
  -v,--verbose                   verbose output during merge
root@smac> merge -t accumulo.metadata -b 1;0 -e 2;y -v

There are now 3 metadata tablets in the monitor, which this scan confirms

root@smac> scan -t accumulo.root -c ~tab
2017-02-23 09:45:48,988 [shell.Shell.audit] INFO : root@smac> scan -t
accumulo.root -c ~tab
!0;2;y ~tab:~pr []    \x00
!0;~ ~tab:~pr []    \x012;y
!0< ~tab:~pr []    \x01~
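
The equivalent merge through the client API is a one-liner with the conn
object from the sketches above:

// begin row 1;0 is exclusive, end row 2;y is inclusive, matching -b and -e
conn.tableOperations().merge("accumulo.metadata", new Text("1;0"), new Text("2;y"));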

Can you provide more details on where your situation differs from this
walkthrough?

On Wed, Feb 22, 2017 at 9:18 PM Dickson, Matt MR <
matt.dick...@defence.gov.au> wrote:

> *UNOFFICIAL*
> We are on 1.6.5. Could it be that the merge is not available in this
> version?
>
>
> ------------------------------
> *From:* Christopher [mailto:ctubb...@apache.org]
> *Sent:* Thursday, 23 February 2017 12:46
>
> *To:* user@accumulo.apache.org
> *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> On Wed, Feb 22, 2017 at 8:18 PM Dickson, Matt MR <
> matt.dick...@defence.gov.au> wrote:
>
> UNOFFICIAL
>
> I ran the compaction with no luck.
>
> I've had a close look at the split points on the metadata table and
> confirmed that due to the initial large table we now have 90% of the
> metadata for existing tables hosted on one tablet which creates a hotspot.
> I've now manually added better split points to the metadata table that has
> created tablets with only 4-5M entries rather than 12M+.
>
> The split points I created isolate the metadata for large tables to
> separate tablets but ideally I'd like to split these further which raises 3
> questions.
>
> 1. If I have table 1xo, is there a smart way to determine the mid point of
> the data in the metadata table eg 1xo;xxxx to allow me to create a split
> based on that?
>
> 2. I tried to merge tablets on the metadata table where the size was
> smaller than 1M but was met with a warning stating merge on the metadata
> table was not allowed. Due to the deletion of the large table we have
> several tablets with zero entries and they will never be populated.
>
>
> Hmm. That seems to ring a bell. It was a goal of moving the root tablet
> into its own table, that users would be able to merge the metadata table.
> However, we may still have an unnecessary constraint on that in the
> interface, which is no longer needed. If merging on the metadata table
> doesn't work, please file a JIRA at
> https://issues.apache.org/browse/ACCUMULO with any error messages you
> saw, so we can track it as a bug.
>
>
> 3. How should Accumulo deal with the deletion of a massive table? Should
> the metadata table redistribute the tablets to avoid hotspotting on a
> single tserver, which appears to be what's happening?
>
> Thanks for all the help so far.
>
> -----Original Message-----
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Thursday, 23 February 2017 10:00
> To: user@accumulo.apache.org
> Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>
> There's likely a delete "tombstone" in another file referenced by that
> tablet which is masking those entries. If you compact the tablet, you
> should see them all disappear.
>
> Yes, you should be able to split/merge the metadata table just like any
> other table. Beware, the implications of this are system wide instead of
> localized to a single user table :)
>
> Dickson, Matt MR wrote:
> > *UNOFFICIAL*
> >
> > When I inspect the rfiles associated with the metadata table using the
> > rfile-info there are a lot of entries for the old deleted table, 1vm.
> > Querying the metadata table returns nothing for the deleted table.
> > When a table is deleted should the rfiles have any records referencing
> > the old table?
> > Also, am I able to manually create new split point on the metadata
> > table to force it to break up the large tablet?
> > ----------------------------------------------------------------------
> > --
> > *From:* Christopher [mailto:ctubb...@apache.org]
> > *Sent:* Wednesday, 22 February 2017 15:46
> > *To:* user@accumulo.apache.org
> > *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> >
> > It should be safe to merge on the metadata table. That was one of the
> > goals of moving the root tablet into its own table. I'm pretty sure we
> > have a build test to ensure it works.
> >
> > On Tue, Feb 21, 2017, 18:22 Dickson, Matt MR
> > <matt.dick...@defence.gov.au <mailto:matt.dick...@defence.gov.au>>
> wrote:
> >
> >
> >     *UNOFFICIAL*
> >
> >     Firstly, thank you for your advice, it's been very helpful.
> >     Increasing the tablet server memory has allowed the metadata table
> >     to come online. From using the rfile-info and looking at the splits
> >     for the metadata table it appears that all the metadata table
> >     entries are in one tablet. All tablet servers then query the one
> >     node hosting that tablet.
> >     I suspect the cause of this was a poorly designed table that at one
> >     point the Accumulo GUI reported 1.02T tablets for. We've
> >     subsequently deleted that table but it might be that there were so
> >     many entries in the metadata table that all splits on it were due to
> >     this massive table that had the table id 1vm.
> >     To rectify this, is it safe to run a merge on the metadata table to
> >     force it to redistribute?
> >
> >
>  ------------------------------------------------------------------------
> >     *From:* Michael Wall [mailto:mjw...@gmail.com
> >     <mailto:mjw...@gmail.com>]
> >     *Sent:* Wednesday, 22 February 2017 02:44
> >
> >     *To:* user@accumulo.apache.org <mailto:user@accumulo.apache.org>
> >     *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
> >     Matt,
> >
> >     If I am reading this correctly, you have a tablet that is being
> >     loaded onto a tserver. That tserver dies, so the tablet is then
> >     assigned to another tserver. While the tablet is being loaded, that
> >     tserver dies, and so on. Is that correct?
> >
> >     Can you identify the tablet that is bouncing around? If so, try
> >     using rfile-info -d to inspect the rfiles associated with that
> >     tablet. Also look at the rfiles that compose that tablet to see if
> >     anything sticks out.
> >
> >     Any logs that would help explain why the tablet server is dying? Can
> >     you increase the memory of the tserver?
> >
> >     Mike
> >
> >     On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <josh.el...@gmail.com
> >     <mailto:josh.el...@gmail.com>> wrote:
> >
> >         ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
> >         communicating with ZooKeeper, will retry
> >         SessionExpiredException: KeeperErrorCode = Session expired for
> >
> > /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.me
> > mory
> >
> >         There can be a number of causes for this, but here are the most
> >         likely ones.
> >
> >         * JVM gc pauses
> >         * ZooKeeper max client connections
> >         * Operating System/Hardware-level pauses
> >
> >         The former should be noticeable in the Accumulo log. There is a
> >         daemon running which watches for pauses and then reports them. If
> >         this is happening, you might have to give the process some more
> >         Java heap, tweak your CMS/G1 parameters, etc.
> >
> >         For maxClientConnections, see
> >
> > https://community.hortonworks.com/articles/51191/understanding-apache-
> > zookeeper-connection-rate-lim.html
> >
> >         For the latter, swappiness is the most likely candidate (assuming
> >         this is hopping across different physical nodes), as are
> >         "transparent huge pages". If it is limited to a single host,
> >         things like bad NICs, hard drives, and other hardware issues
> >         might be a source of slowness.
> >
> >         On Mon, Feb 20, 2017 at 10:18 PM, Dickson, Matt MR
> >         <matt.dick...@defence.gov.au
> >         <mailto:matt.dick...@defence.gov.au>> wrote:
> >          > UNOFFICIAL
> >          >
> >          > It looks like an issue with one of the metadata table
> >         tablets. On startup
> >          > the server that hosts a particular metadata tablet gets
> >         scanned by all other
> >          > tablet servers in the cluster. This then crashes that tablet
> >         server with an
> >          > error in the tserver log;
> >          >
> >          > ... [zookeeper.ZooCache] WARN: Saw (possibly) transient
> exception
> >          > communicating with ZooKeeper, will retry
> >          > SessionExpiredException: KeeperErrorCode = Session expired for
> >          >
> >
>  /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
> >          >
> >          > That metadata table tablet is then transferred to another
> >         host which then
> >          > fails also, and so on.
> >          >
> >          > While the server is hosting this metadata tablet, we see the
> >         following log
> >          > statement from all tserver.logs in the cluster:
> >          >
> >          > .... [impl.ThriftScanner] DEBUG: Scan failed, thrift error
> >          > org.apache.thrift.transport.TTransportException null
> >          > (!0;1vm\\;125.323.233.23::2016103<,server.com.org:9997
> >         <http://server.com.org:9997>,2342423df12341d)
> >          > Hope that helps complete the picture.
> >          >
> >          >
> >          > ________________________________
> >          > From: Christopher [mailto:ctubb...@apache.org
> >         <mailto:ctubb...@apache.org>]
> >          > Sent: Tuesday, 21 February 2017 13:17
> >          >
> >          > To: user@accumulo.apache.org <mailto:user@accumulo.apache.org
> >
> >          > Subject: Re: accumulo.root invalid table reference
> >         [SEC=UNOFFICIAL]
> >          >
> >          > Removing them is probably a bad idea. The root table entries
> >         correspond to
> >          > split points in the metadata table. There is no need for the
> >         tables which
> >          > existed when the metadata table split to still exist for this
> >         to continue to
> >          > act as a valid split point.
> >          >
> >          > Would need to see the exception stack trace, or at least an
> >         error message,
> >          > to troubleshoot the shell scanning error you saw.
> >          >
> >          >
> >          > On Mon, Feb 20, 2017, 20:00 Dickson, Matt MR
> >         <matt.dick...@defence.gov.au <mailto:matt.dick...@defence.gov.au
> >>
> >          > wrote:
> >          >>
> >          >> UNOFFICIAL
> >          >>
> >          >> In case it is ok to remove these from the root table, how
> >         can I scan the
> >          >> root table for rows with a rowid starting with !0;1vm?
> >          >>
> >          >> Running "scan -b !0;1vm" throws an exception and exits the
> >         shell.
> >          >>
> >          >>
> >          >> -----Original Message-----
> >          >> From: Dickson, Matt MR [mailto:matt.dick...@defence.gov.au
> >         <mailto:matt.dick...@defence.gov.au>]
> >          >> Sent: Tuesday, 21 February 2017 09:30
> >          >> To: 'user@accumulo.apache.org <mailto:
> user@accumulo.apache.org>'
> >          >> Subject: RE: accumulo.root invalid table reference
> >         [SEC=UNOFFICIAL]
> >          >>
> >          >> UNOFFICIAL
> >          >>
> >          >>
> >          >> Does that mean I should have entries for 1vm in the metadata
> >         table
> >          >> corresponding to the root table?
> >          >>
> >          >> We are running 1.6.5
> >          >>
> >          >>
> >          >> -----Original Message-----
> >          >> From: Josh Elser [mailto:josh.el...@gmail.com
> >         <mailto:josh.el...@gmail.com>]
> >          >> Sent: Tuesday, 21 February 2017 09:22
> >          >> To: user@accumulo.apache.org <mailto:
> user@accumulo.apache.org>
> >          >> Subject: Re: accumulo.root invalid table reference
> >         [SEC=UNOFFICIAL]
> >          >>
> >          >> The root table should only reference the tablets in the
> >         metadata table.
> >          >> It's a hierarchy: like metadata is for the user tables, root
> >         is for the
> >          >> metadata table.
> >          >>
> >          >> What version are ya running, Matt?
> >          >>
> >          >> Dickson, Matt MR wrote:
> >          >> > *UNOFFICIAL*
> >          >> >
> >          >> > I have a situation where all tablet servers are
> >         progressively being
> >          >> > declared dead. From the logs the tservers report errors
> like:
> >          >> > 2017-02-.... DEBUG: Scan failed thrift error
> >          >> > org.apache.thrift.transport.TTransportException null
> >          >> > (!0;1vm\\125.323.233.23::2016103<,server.com.org:9997
> >         <http://server.com.org:9997>,2342423df12341d)
> >          >> > 1vm was a table id that was deleted several months ago so
> >         it appears
> >          >> > there is some invalid reference somewhere.
> >          >> > Scanning the metadata table "scan -b 1vm" returns no rows
> >         returned for
> >          >> > 1vm.
> >          >> > A scan of the accumulo.root table returns approximately 15
> >         rows that
> >          >> > start with !0;1vm;<i/p addr>::2016103 blah. How are
> >         the root
> >          >> > table entries used and would it be safe to remove these
> >         entries since
> >          >> > they reference a deleted table?
> >          >> > Thanks in advance,
> >          >> > Matt
> >          >
> >          > --
> >          > Christopher
> >
> > --
> > Christopher
>
> --
> Christopher
>
