*UNOFFICIAL*
When I inspect the rfiles associated with the metadata table using
rfile-info, there are a lot of entries for the old deleted table, 1vm.
Querying the metadata table returns nothing for the deleted table.
When a table is deleted, should the rfiles have any records referencing
the old table?
Also, am I able to manually create new split points on the metadata
table to force it to break up the large tablet?
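What I have in mind would look something like this (a sketch only; the
split point shown is made up, and a real one would be chosen from our
actual metadata row keys, which have the form tableId;endRow):

```
# Accumulo shell, as a user with permission to alter accumulo.metadata.
# '2;m' is a hypothetical split point, for illustration only.
root@instance> addsplits -t accumulo.metadata '2;m'
```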
------------------------------------------------------------------------
*From:* Christopher [mailto:ctubb...@apache.org]
*Sent:* Wednesday, 22 February 2017 15:46
*To:* user@accumulo.apache.org
*Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
It should be safe to merge on the metadata table. That was one of the
goals of moving the root tablet into its own table. I'm pretty sure we
have a build test to ensure it works.
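If it helps, the merge from the shell would look something like this (a
sketch; the range shown is illustrative, covering the metadata rows for
the deleted table id 1vm):

```
# Accumulo shell: merge the metadata tablets spanning the old table's rows.
# Omitting -b/-e would merge across the whole table instead.
root@instance> merge -t accumulo.metadata -b '1vm;' -e '1vm<'
```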
On Tue, Feb 21, 2017, 18:22 Dickson, Matt MR
<matt.dick...@defence.gov.au> wrote:
*UNOFFICIAL*
Firstly, thank you for your advice; it's been very helpful.
Increasing the tablet server memory has allowed the metadata table
to come online. Using rfile-info and looking at the splits for the
metadata table, it appears that all the metadata table entries are
in one tablet, so all tablet servers query the one node hosting
that tablet.
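For reference, this is how I checked the splits (a sketch; the prompt
is illustrative):

```
# Accumulo shell: list the metadata table's current split points.
# No output means the whole table is a single tablet.
root@instance> getsplits -t accumulo.metadata
```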
I suspect the cause of this was a poorly designed table for which
the Accumulo GUI at one point reported 1.02T tablets. We've since
deleted that table, but it may be that there were so many entries
in the metadata table that all of its splits were due to this
massive table, which had the table id 1vm.
To rectify this, is it safe to run a merge on the metadata table to
force it to redistribute?
------------------------------------------------------------------------
*From:* Michael Wall [mailto:mjw...@gmail.com]
*Sent:* Wednesday, 22 February 2017 02:44
*To:* user@accumulo.apache.org
*Subject:* Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
Matt,
If I am reading this correctly, you have a tablet that is being
loaded onto a tserver. That tserver dies, so the tablet is then
assigned to another tserver. While the tablet is being loaded, that
tserver dies, and so on. Is that correct?
Can you identify the tablet that is bouncing around? If so, try
using rfile-info -d to inspect the rfiles associated with that
tablet. Also look at the rfiles that compose that tablet to see if
anything sticks out.
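Something like this (a sketch; the HDFS path is a placeholder, so use
the actual file entries listed for that tablet in the metadata table):

```
# Print rfile metadata, plus every key/value in the file (-d), for one
# of the rfiles belonging to the problem tablet.
accumulo rfile-info -d hdfs://namenode/accumulo/tables/1vm/default_tablet/F0000abc.rf
```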
Any logs that would help explain why the tablet server is dying? Can
you increase the memory of the tserver?
Mike
On Tue, Feb 21, 2017 at 10:35 AM Josh Elser <josh.el...@gmail.com> wrote:
... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
communicating with ZooKeeper, will retry
SessionExpiredException: KeeperErrorCode = Session expired for
/accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
There can be a number of causes for this, but here are the most
likely ones.
* JVM gc pauses
* ZooKeeper max client connections
* Operating System/Hardware-level pauses
The first should be noticeable in the Accumulo log. There is a
daemon running which watches for pauses and then reports them. If
this is happening, you might have to give the process some more Java
heap, tweak your CMS/G1 parameters, etc.
For maxClientConnections, see
https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html
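If that limit is the culprit, it's raised in ZooKeeper's zoo.cfg (the
value below is only an example; ZooKeeper's default per-client-IP cap
is 60):

```
# conf/zoo.cfg on each ZooKeeper server; restart ZooKeeper to apply.
maxClientCnxns=250
```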
For the last, swappiness is the most likely candidate (assuming this
is hopping across different physical nodes), as are "transparent
huge pages". If it is limited to a single host, things like bad
NICs, hard drives, and other hardware issues might be a source of
slowness.
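The quick checks for those look like this (a Linux-specific sketch;
the recommended values are general guidance, not specific to this
cluster):

```shell
# Check the two settings most often behind OS-level pauses.
# For tserver hosts, low swappiness (e.g. 0-10) and transparent
# hugepages disabled ("[never]") are the usual recommendations.
cat /proc/sys/vm/swappiness
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || echo "THP setting not exposed"
```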
On Mon, Feb 20, 2017 at 10:18 PM, Dickson, Matt MR
<matt.dick...@defence.gov.au> wrote:
> UNOFFICIAL
>
> It looks like an issue with one of the metadata table tablets. On
> startup the server that hosts a particular metadata tablet gets
> scanned by all other tablet servers in the cluster. This then
> crashes that tablet server with an error in the tserver log;
>
> ... [zookeeper.ZooCache] WARN: Saw (possibly) transient exception
> communicating with ZooKeeper, will retry
> SessionExpiredException: KeeperErrorCode = Session expired for
> /accumulo/4234234234234234/namespaces/+accumulo/conf/table.scan.max.memory
>
> That metadata table tablet is then transferred to another host
> which then fails also, and so on.
>
> While the server is hosting this metadata tablet, we see the
> following log statement from all tserver.logs in the cluster:
>
> .... [impl.ThriftScanner] DEBUG: Scan failed, thrift error
> org.apache.thrift.transport.TTransportException null
> (!0;1vm\\;125.323.233.23::2016103<,server.com.org:9997,2342423df12341d)
>
> Hope that helps complete the picture.
>
>
> ________________________________
> From: Christopher [mailto:ctubb...@apache.org]
> Sent: Tuesday, 21 February 2017 13:17
>
> To: user@accumulo.apache.org
> Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>
> Removing them is probably a bad idea. The root table entries
> correspond to split points in the metadata table. The tables which
> existed when the metadata table split do not need to still exist
> for these to continue to act as valid split points.
>
> Would need to see the exception stack trace, or at least an error
> message, to troubleshoot the shell scanning error you saw.
>
>
> On Mon, Feb 20, 2017, 20:00 Dickson, Matt MR
> <matt.dick...@defence.gov.au>
> wrote:
>>
>> UNOFFICIAL
>>
>> In case it is ok to remove these from the root table, how can I
>> scan the root table for rows with a rowid starting with !0;1vm?
>>
>> Running "scan -b !0;1vm" throws an exception and exits the shell.
>>
>>
>> -----Original Message-----
>> From: Dickson, Matt MR [mailto:matt.dick...@defence.gov.au]
>> Sent: Tuesday, 21 February 2017 09:30
>> To: 'user@accumulo.apache.org'
>> Subject: RE: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>>
>> UNOFFICIAL
>>
>>
>> Does that mean I should have entries for 1vm in the metadata table
>> corresponding to the root table?
>>
>> We are running 1.6.5
>>
>>
>> -----Original Message-----
>> From: Josh Elser [mailto:josh.el...@gmail.com]
>> Sent: Tuesday, 21 February 2017 09:22
>> To: user@accumulo.apache.org
>> Subject: Re: accumulo.root invalid table reference [SEC=UNOFFICIAL]
>>
>> The root table should only reference the tablets in the metadata
>> table. It's a hierarchy: just as the metadata table tracks the user
>> tables' tablets, the root table tracks the metadata table's tablets.
>>
>> What version are ya running, Matt?
>>
>> Dickson, Matt MR wrote:
>> > *UNOFFICIAL*
>> >
>> > I have a situation where all tablet servers are progressively
>> > being declared dead. From the logs the tservers report errors like:
>> >
>> > 2017-02-.... DEBUG: Scan failed thrift error
>> > org.apache.thrift.transport.TTransportException null
>> > (!0;1vm\\125.323.233.23::2016103<,server.com.org:9997,2342423df12341d)
>> >
>> > 1vm was a table id that was deleted several months ago, so it
>> > appears there is some invalid reference somewhere.
>> >
>> > Scanning the metadata table with "scan -b 1vm" returns no rows
>> > for 1vm.
>> >
>> > A scan of the accumulo.root table returns approximately 15 rows
>> > that start with !0;1vm;<i/p addr>::2016103 blah. How are the root
>> > table entries used, and would it be safe to remove these entries
>> > since they reference a deleted table?
>> >
>> > Thanks in advance,
>> > Matt
>
> --
> Christopher
--
Christopher