Thanks for your suggestions

I tried restarting the tserver that the "assigned to dead server" tablets 
pointed to, but it did not appear to do anything to those tablets, presumably 
because they are not part of any table.

Scanning for missing loc entries – the command you suggested produced no output 
other than a zootraceclient was loaded statement.

Restarting the master gives one round of balancing only, and then it goes back 
to reporting "1 tablets unloaded".  This has been my workaround for the last 
few weeks.

I assume the tables are old and deleted, since their IDs in the metadata are 
lower than those of recently created tables and the IDs don't appear in 
tables -l.
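
The comparison described above (table IDs seen in the metadata versus IDs 
listed by tables -l) could be automated roughly as follows. This is only a 
sketch: it assumes the listing lines look like "name => id" and that tablet 
rows look like "id;endrow" or "id<"; the function names and the sample data 
are hypothetical.

```python
from collections import Counter


def table_id_of(row):
    """Extract the table ID from a tablet row such as 'n;9' or 'n<'."""
    if ";" in row:
        return row.split(";", 1)[0]
    if row.endswith("<"):
        return row[:-1]
    return None  # not a tablet row (for example a '~del' marker)


def known_table_ids(tables_listing):
    """Collect table IDs from lines assumed to be shaped like 'name => id'."""
    ids = set()
    for line in tables_listing:
        _name, sep, table_id = line.partition("=>")
        if sep:
            ids.add(table_id.strip())
    return ids


def orphaned_tablets(metadata_rows, tables_listing):
    """Count tablet rows per table ID that is missing from the listing."""
    known = known_table_ids(tables_listing)
    counts = Counter()
    for row in metadata_rows:
        table_id = table_id_of(row)
        if table_id is not None and table_id not in known:
            counts[table_id] += 1
    return dict(counts)


# Hypothetical sample data: tables '4' and '7' no longer exist.
listing = ["accumulo.metadata => !0", "mytable => n"]
rows = ["n;9", "n<", "4;a", "4;b", "4<", "7<"]
print(orphaned_tablets(rows, listing))
```

Any ID reported this way would still need checking by hand before anything is 
deleted.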

I like your GC idea; I will look into that.  I may have cloned tables in the 
past to fix some other problem, but it is not something I would normally do.

Thanks again for your ideas.

From: Mike Miller <[email protected]>
Sent: 06 October 2020 19:53
To: [email protected]
Subject: Re: Continuous tablets unloaded and fails to balance from accumulo 
master

EXTERNAL SENDER: Do not click any links or open any attachments unless you 
trust the sender and know the content is safe.
EXPÉDITEUR EXTERNE: Ne cliquez sur aucun lien et n’ouvrez aucune pièce jointe à 
moins qu’ils ne proviennent d’un expéditeur fiable, ou que vous ayez 
l'assurance que le contenu provient d'une source sûre.


It would help if you provided what commands you are running and some of the 
output (if possible) - or at least more detail of what you are seeing.  It's 
hard to provide specifics, because it's hard to understand how you got into 
this state, what you have done, and what the current state is.

If tablets are assigned to a dead server, but you think that server is ok, did 
you try taking that server down?  Once the server is detected as down, that 
should trigger reassignments - at that point you can restart the server.

Scanning the accumulo.metadata table - does every extent have a loc entry? 
Something like:

accumulo shell -u root -p secret -e 'scan -t accumulo.metadata -np -c loc' | 
grep -v loc

Have you tried restarting the master?

If the tables are "old" and deleted - what are you onlining?  Have you tried to 
delete an offline table?

Is your GC running to completion? Do you clone tables?  One issue may be that 
the Accumulo GC needs to check that a file is not shared between tables; maybe 
it's running into issues completing that check?

On Tue, Oct 6, 2020 at 12:57 PM Christopher 
<[email protected]> wrote:
I'm not sure CheckForMetadataProblems can check for all that many different 
types of problems. It is limited.
If you have tablets still in the metadata table for tables that no longer 
exist, that indicates you probably had some sort of crash and possible 
corruption of your metadata.
The only option would be to manually delete those entries.
A command to automatically prune these would probably be dangerous... running 
it when there's a transient ZooKeeper problem, for example, could end up 
deleting all your tables... which would be bad. Although it is dangerous, 
manual surgery on the metadata table to remove these entries, as you suggested, 
is probably the best option.

On Tue, Oct 6, 2020 at 12:03 PM Hart, Andrew 
<[email protected]> wrote:
I am still trying to find the one “unloaded tablet” that is preventing the 
cluster from balancing.  However, there are a lot of unassigned tablets.

I have been getting rid of them by onlining tables and completing failed table 
deletes but I am still left with many tablets that are unassigned.  They seem 
to be mostly from old deleted tables and so I am not sure why they are there at 
all.
The unassigned tablets are shown by accumulo 
org.apache.accumulo.server.util.FindOfflineTablets and by accumulo admin 
checkTablets.
And as I said, some are assigned to a dead server, but the server isn’t 
actually dead at all.

CheckForMetadataProblems reports “All is well”

I thought that if I could clear up this mess I could then eventually get to 
just one unassigned tablet which would be the “1 tablets are unloaded” one.  (I 
would then clone the table or copy the data out or something)

So the problem remains.  The cluster doesn’t balance due to migrations.  I 
don’t find a tablet with a future entry and I can’t find it in unassigned or 
offline tablets due to the large number of other (presumably defunct) tablets 
with unassigned problems in tables that no longer exist.

There are warnings in the documentation about manually editing the accumulo 
metadata table, but it seems that the only option is to go in with a deletemany 
on any rows that start with the ID of an old deleted table.  There does not 
seem to be an “accumulo admin pruneDefunctTablets -t tid” command! :D
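
One caveat with "rows that start with" a table ID: a bare prefix match on ID 1 
would also catch the tablets of a table with ID 1a. The sketch below shows a 
stricter match and the row range it corresponds to, assuming tablet rows have 
the form "id;endrow" with "id<" for the last tablet. The function names and 
sample rows are hypothetical, and nothing here touches Accumulo; it only 
selects rows for review.

```python
def metadata_range(table_id):
    """Inclusive (start, end) row range holding every tablet of one table.

    ';' sorts immediately before '<', so 'id;...' rows all fall between
    'id;' and 'id<'.
    """
    return table_id + ";", table_id + "<"


def belongs_to_table(row, table_id):
    """True only if the row is a tablet of exactly this table ID."""
    return row.startswith(table_id + ";") or row == table_id + "<"


def rows_for_table(metadata_rows, table_id):
    """Select the rows that would be candidates for manual removal."""
    return [row for row in metadata_rows if belongs_to_table(row, table_id)]


# Hypothetical rows: table '1' is defunct, table '1a' is still live.
rows = ["1;m", "1<", "1a;k", "1a<", "n;9"]
start, end = metadata_range("1")
selected = rows_for_table(rows, "1")
print(start, end, selected)
```

Reviewing the selected rows before running any delete keeps the manual surgery 
limited to the table that is actually gone.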



From: Mike Miller <[email protected]>
Sent: 06 October 2020 16:27
To: [email protected]
Subject: Re: Continuous tablets unloaded and fails to balance from accumulo 
master


Do you want to merge old tablets that don't exist anymore?  I am not sure what 
you are asking... you might have better luck if you provide some more info and 
ask on Slack: 
https://accumulo.apache.org/contact-us/#slack

On Tue, Oct 6, 2020 at 7:25 AM Hart, Andrew 
<[email protected]> wrote:
What is the way to remove tablets that still exist in accumulo but do not have 
an online, offline or deleting table?

Some of these tablets say ASSIGNED TO DEAD SERVER but the tserver they refer to 
is up and working properly.

From: Hart, Andrew <[email protected]>
Sent: 25 September 2020 13:52
To: [email protected]
Subject: RE: Continuous tablets unloaded and fails to balance from accumulo 
master


Thanks for your help.  In looking for this I think I have found that there are 
deleted tables that still have a lot of tablets in the metadata table.
I need to solve that before coming back to find the 1 unloaded tablet.

Cheers And.

From: Mike Miller <[email protected]>
Sent: 24 September 2020 16:08
To: [email protected]
Subject: Re: Continuous tablets unloaded and fails to balance from accumulo 
master


That might be OK; it could just mean it hasn't been assigned yet.  The only way 
I can think of is to build a list of all tablets from the metadata table and 
find the one without a "loc" column family.

On Thu, Sep 24, 2020 at 10:55 AM Hart, Andrew 
<[email protected]> wrote:
No, no future entries in the table.

From: Mike Miller <[email protected]>
Sent: 24 September 2020 15:10
To: [email protected]
Subject: Re: Continuous tablets unloaded and fails to balance from accumulo 
master


You should be able to figure out the unloaded tablet from the 
"accumulo.metadata" table.  The metadata table will list the tablet location 
using the "loc" column family to indicate it has loaded a tablet that it was 
assigned.
For example the tablet "n;9" will have an entry like:
n;9 loc:1000041fbf00006 []    ip-172-31-87-51.ec2.internal:9997

From my understanding, the unloaded tablet should have a "future" column 
family, meaning it has been assigned a new location but not loaded yet.  If the 
tablet doesn't have a "loc" or "future" column family then that is a problem.
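
The loc/future check described above can be sketched as a small Python filter 
over scan output. This is only an illustration, assuming each entry is printed 
on one line as "row family:qualifier [visibility] value" (as in the n;9 
example) and that tablet rows contain a ';' or end in '<'. The function name is 
made up, and every sample line except the n;9 loc entry is hypothetical.

```python
from collections import defaultdict


def tablets_without_location(scan_lines):
    """Return tablet rows that have neither a 'loc' nor a 'future' entry."""
    families = defaultdict(set)
    for line in scan_lines:
        parts = line.split()
        if len(parts) < 2 or ":" not in parts[1]:
            continue  # blank lines or anything not shaped like an entry
        row = parts[0]
        if ";" not in row and not row.endswith("<"):
            continue  # log banners, '~del' markers and other non-tablet rows
        families[row].add(parts[1].split(":", 1)[0])
    return sorted(
        row for row, seen in families.items() if not seen & {"loc", "future"}
    )


# First line is the example from this thread; the rest are made up.
sample = [
    "n;9 loc:1000041fbf00006 []    ip-172-31-87-51.ec2.internal:9997",
    "n;9 srv:dir []    /t-hypothetical",
    "n< srv:dir []    /default_tablet",
    "p< future:1000041fbf00006 []    ip-172-31-87-51.ec2.internal:9997",
]
print(tablets_without_location(sample))
```

Note that the input has to be a scan of all columns: if the scan is restricted 
to the loc column, tablets without a location produce no lines at all and so 
cannot be detected. End rows containing spaces would also confuse this simple 
parser.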

On Thu, Sep 24, 2020 at 6:32 AM Hart, Andrew 
<[email protected]> wrote:
Hi,

I am getting “Not balancing due to 1 outstanding migrations” and “[Normal 
tablets]: 1 tablets unloaded”.
This means that the cluster never balances unless I restart the master, after 
which I get a one-off balance and then it returns to the above messages.

How do I identify the tablet that is unloaded?  It isn’t in the logs that I can 
see.  Is it possible to tell from the contents of the accumulo.metadata table?

Is there a way to use FindOfflineTablets?

And.
