Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

Josh Elser Wed, 22 Feb 2017 15:00:33 -0800

There's likely a delete "tombstone" in another file referenced by thattablet which is masking those entries. If you compact the tablet, youshould see them all disappear.

Yes, you should be able to split/merge the metatdata table just like anyother table. Beware, the implications of this are system wide instead oflocalized to a single user table :)


Dickson, Matt MR wrote:

*UNOFFICIAL*

When I inspect the rfiles associated with the metadata table using the
rfile-info there are a lot of entries for the old deleted table, 1vm.
Querying the metadata table returns nothing for the deleted table.
When a table is deleted should the rfiles have any records referencing
the old table?
Also, am I able to manually create new split point on the metadata table
to force it to break up the large tablet?
------------------------------------------------------------------------
*From:* Christopher [mailto:ctubb...@apache.org]
*Sent:* Wednesday, 22 February 2017 15:46
*To:* user@accumulo.apache.org
*Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

It should be safe to merge on the metadata table. That was one of the
goals of moving the root tablet into its own table. I'm pretty sure we
have a build test to ensure it works.

On Tue, Feb 21, 2017, 18:22 Dickson, Matt MR
<matt.dick...@defence.gov.au <mailto:matt.dick...@defence.gov.au>> wrote:

    __

    *UNOFFICIAL*

    Firstly, thankyou for your advice its been very helpful.
    Increasing the tablet server memory has allowed the metadata table
    to come online. From using the rfile-info and looking at the splits
    for the metadata table it appears that all the metadata table
    entries are in one tablet. All tablet servers then query the one
    node hosting that tablet.
    I suspect the cause of this was a poorly designed table that at one
    point the Accumulo gui reported 1.02T tablets for. We've
    subsequently deleted that table but it might be that there were so
    many entries in the metadata table that all splits on it were due to
    this massive table that had the table id 1vm.
    To rectify this, is it safe to run a merge on the metadata table to
    force it to redistribute?

    ------------------------------------------------------------------------
    *From:* Michael Wall [mailto:mjw...@gmail.com
    <mailto:mjw...@gmail.com>]
    *Sent:* Wednesday, 22 February 2017 02:44

    *To:* user@accumulo.apache.org <mailto:user@accumulo.apache.org>
    *Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
    Matt,

    If I am reading this correctly, you have a tablet that is being
    loading onto a tserver. That tserver dies, so the tablet is then
    assigned to another tablet. While the tablet is being loading, that
    tserver dies and so on. Is that correct?

    Can you identify the tablet that is bouncing around? If so, try
    using rfile-info -d to inspect the rfiles associated with that
    tablet. Also look at the rfiles that compose that tablet to see if
    anything sticks out.

    Any logs that would help explain why the tablet server is dying? Can
    you increase the memory of the tserver?

    Mike

    On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <josh.el...@gmail.com
    <mailto:josh.el...@gmail.com>> wrote:

        ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
        communicating with ZooKeeper, will retry
        SessionExpiredException: KeeperErrorCode = Session expired for
        
/accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory

        There can be a number of causes for this, but here are the most
        likely ones.

        * JVM gc pauses
        * ZooKeeper max client connections
        * Operating System/Hardware-level pauses

        The former should be noticeable by the Accumulo log. There is a
        daemon
        running which watches for pauses that happen and then reports
        them. If
        this is happening, you might have to give the process some more Java
        heap, tweak your CMS/G1 parameters, etc.

        For maxClientConnections, see
        
https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html

        For the latter, swappiness is the most likely candidate
        (assuming this
        is hopping across different physical nodes), as are "transparent
        huge
        pages". If it is limited to a single host, things like bad NICs,
        hard
        drives, and other hardware issues might be a source of slowness.

        On Mon, Feb 20, 2017 at 10:18 PM, Dickson, Matt MR
        <matt.dick...@defence.gov.au
        <mailto:matt.dick...@defence.gov.au>> wrote:
         > UNOFFICIAL
         >
         > It looks like an issue with one of the metadata table
        tablets. On startup
         > the server that hosts a particular metadata tablet gets
        scanned by all other
         > tablet servers in the cluster. This then crashes that tablet
        server with an
         > error in the tserver log;
         >
         > ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
         > communicating with ZooKeeper, will retry
         > SessionExpiredException: KeeperErrorCode = Session expired for
         >
        
/accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
         >
         > That metadata table tablet is then transferred to another
        host which then
         > fails also, and so on.
         >
         > While the server is hosting this metadata tablet, we see the
        following log
         > statement from all tserver.logs in the cluster:
         >
         > .... [impl.ThriftScanner] DEBUG: Scan failed, thrift error
         > org.apache.thrift.transport.TTransportException null
         > (!0;1vm\\;125.323.233.23::2016103<,server.com.org:9997
        <http://server.com.org:9997>,2342423df12341d)
         > Hope that helps complete the picture.
         >
         >
         > ________________________________
         > From: Christopher [mailto:ctubb...@apache.org
        <mailto:ctubb...@apache.org>]
         > Sent: Tuesday, 21 February 2017 13:17
         >
         > To: user@accumulo.apache.org <mailto:user@accumulo.apache.org>
         > Subject: Re: accumulo.root invalid table reference
        [SEC=UNOFFICIAL]
         >
         > Removing them is probably a bad idea. The root table entries
        correspond to
         > split points in the metadata table. There is no need for the
        tables which
         > existed when the metadata table split to still exist for this
        to continue to
         > act as a valid split point.
         >
         > Would need to see the exception stack trace, or at least an
        error message,
         > to troubleshoot the shell scanning error you saw.
         >
         >
         > On Mon, Feb 20, 2017, 20:00 Dickson, Matt MR
        <matt.dick...@defence.gov.au <mailto:matt.dick...@defence.gov.au>>
         > wrote:
         >>
         >> UNOFFICIAL
         >>
         >> In case it is ok to remove these from the root table, how
        can I scan the
         >> root table for rows with a rowid starting with !0;1vm?
         >>
         >> Running "scan -b !0;1vm" throws an exception and exits the
        shell.
         >>
         >>
         >> -----Original Message-----
         >> From: Dickson, Matt MR [mailto:matt.dick...@defence.gov.au
        <mailto:matt.dick...@defence.gov.au>]
         >> Sent: Tuesday, 21 February 2017 09:30
         >> To: 'user@accumulo.apache.org <mailto:user@accumulo.apache.org>'
         >> Subject: RE: accumulo.root invalid table reference
        [SEC=UNOFFICIAL]
         >>
         >> UNOFFICIAL
         >>
         >>
         >> Does that mean I should have entries for 1vm in the metadata
        table
         >> corresponding to the root table?
         >>
         >> We are running 1.6.5
         >>
         >>
         >> -----Original Message-----
         >> From: Josh Elser [mailto:josh.el...@gmail.com
        <mailto:josh.el...@gmail.com>]
         >> Sent: Tuesday, 21 February 2017 09:22
         >> To: user@accumulo.apache.org <mailto:user@accumulo.apache.org>
         >> Subject: Re: accumulo.root invalid table reference
        [SEC=UNOFFICIAL]
         >>
         >> The root table should only reference the tablets in the
        metadata table.
         >> It's a hierarchy: like metadata is for the user tables, root
        is for the
         >> metadata table.
         >>
         >> What version are ya running, Matt?
         >>
         >> Dickson, Matt MR wrote:
         >> > *UNOFFICIAL*
         >> >
         >> > I have a situation where all tablet servers are
        progressively being
         >> > declared dead. From the logs the tservers report errors like:
         >> > 2017-02-.... DEBUG: Scan failed thrift error
         >> > org.apache.thrift.trasport.TTransportException null
         >> > (!0;1vm\\125.323.233.23::2016103<,server.com.org:9997
        <http://server.com.org:9997>,2342423df12341d)
         >> > 1vm was a table id that was deleted several months ago so
        it appears
         >> > there is some invalid reference somewhere.
         >> > Scanning the metadata table "scan -b 1vm" returns no rows
        returned for
         >> > 1vm.
         >> > A scan of the accumulo.root table returns approximately 15
        rows that
         >> > start with; !0:1vm;<i/p addr>/::2016103 /blah/ // How are
        the root
         >> > table entries used and would it be safe to remove these
        entries since
         >> > they reference a deleted table?
         >> > Thanks in advance,
         >> > Matt
         >> > //
         >
         > --
         > Christopher

--
Christopher

Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]

Reply via email to