FYI the Master has a debug log that will print up to 10 tablets that have outstanding migrations. https://github.com/apache/accumulo/blob/0a9837f3f8395d89c5cd7bab7805c4aae28919be/server/base/src/main/java/org/apache/accumulo/server/master/balancer/TabletBalancer.java#L172
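
For anyone who finds this thread later, here is a minimal sketch of the "list every tablet and find the ones without a loc" idea suggested further down. It is not an Accumulo tool, just a hypothetical Python filter (the function name `rows_without_location` and the file name `find_unlocated.py` are my own). It assumes the shell prints each entry as `row family:qualifier [visibility] value`, as in the `n;9 loc:...` example quoted below, and that rows contain no spaces. It works on a scan of all columns, because a scan restricted with `-c loc` can only ever return loc entries, so piping it through `grep -v loc` cannot reveal a row that lacks one.

```python
import re
import sys
from collections import defaultdict

# Assumed shell output format: "<row> <family>:<qualifier> [<visibility>] <value>".
# The family must start with a letter or "~", which keeps timestamped log
# lines (e.g. the trace client banner) from being mistaken for entries.
ENTRY = re.compile(r"^(\S+) ([A-Za-z~]\w*):(\S*) \[")


def rows_without_location(lines):
    """Return the sorted metadata rows that have neither a loc nor a future entry."""
    families = defaultdict(set)
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # not a metadata entry (banner, blank line, ...)
        row, family = match.group(1), match.group(2)
        if row.startswith("~"):
            continue  # skip non-tablet rows such as delete markers
        families[row].add(family)
    return sorted(row for row, seen in families.items()
                  if not seen & {"loc", "future"})


# Hypothetical usage: pass the path of a saved scan as the only argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as scan_output:
        for row in rows_without_location(scan_output):
            print(row)
```

Possible usage, reusing the credentials from the command quoted below:

accumulo shell -u root -p secret -e 'scan -t accumulo.metadata -np' > metadata.txt
python3 find_unlocated.py metadata.txt

Tablets of offline tables legitimately have no loc entry, so expect those rows in the output too.
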
On Thu, Nov 26, 2020 at 3:30 AM Hart, Andrew <[email protected]> wrote:

> Just for completeness, the solution in the end was to stop then start the
> tservers one at a time until the error cleared.  I never found a way to
> work out which tserver was causing the issue.
>
> *From:* Hart, Andrew [mailto:[email protected]]
> *Sent:* 07 October 2020 13:54
> *To:* [email protected]
> *Subject:* RE: Continuous tablets unloaded and fails to balance from
> accumulo master
>
> EXTERNAL SENDER:   Do not click any links or open any attachments unless
> you trust the sender and know the content is safe.
> EXPÉDITEUR EXTERNE:    Ne cliquez sur aucun lien et n’ouvrez aucune pièce
> jointe à moins qu’ils ne proviennent d’un expéditeur fiable, ou que vous
> ayez l'assurance que le contenu provient d'une source sûre.
>
> Thanks for your suggestions.
>
> Restarting the tserver that had the "assigned to dead server" tablets was
> tried, but nothing happened to the tablets because they were not part of
> any table, so the restart did not appear to do anything.
>
> Scanning for missing loc entries: the command you suggested produced no
> output other than a "zootraceclient was loaded" statement.
>
> Restarting the master works for 1 balance only and then it returns to "1
> tablets are unloaded".  This has been my workaround for the last few weeks.
>
> I assume the tables are old and deleted since their IDs in the metadata are
> lower than currently created ones and the ID doesn’t appear in tables -l
>
> I like your GC idea; I will look into that.  I may have cloned tables in
> the past to fix some other problem but it is not something I would normally
> do.
>
> Thanks again for your ideas.
>
> *From:* Mike Miller <[email protected]>
> *Sent:* 06 October 2020 19:53
> *To:* [email protected]
> *Subject:* Re: Continuous tablets unloaded and fails to balance from
> accumulo master
>
> It would help if you provided what commands you are running and some of
> the output (if possible) - or at least more detail of what you are seeing.
> It's hard to provide specifics, because it's hard to understand how you got
> into this state, what you have done, and what the current state is.
>
> If tablets are assigned to a dead server, but you think that server is ok,
> did you try taking that server down?  Once the server is detected as down,
> that should trigger reassignments - at that point you can restart the
> server.
>
> Scanning the accumulo.metadata table - does every extent have a loc entry?
> Something like:
>
> accumulo shell -u root -p secret -e 'scan -t accumulo.metadata -np -c loc'
> | grep -v loc
>
> Have you tried restarting the master?
>
> If the tables are "old" and deleted - what are you onlining?  Have you
> tried to delete an offline table?
>
> Is your GC running to completion?  Do you clone tables?  One issue may be
> that Accumulo gc needs to check that a file is not shared between tables;
> maybe it's running into issues completing that check?
>
> On Tue, Oct 6, 2020 at 12:57 PM Christopher <[email protected]> wrote:
>
> I'm not sure CheckForMetadataProblems can check for all that many
> different types of problems. It is limited.
> If you have tablets still in the metadata table for tables that no longer
> exist, that indicates you probably had some sort of crash and possible
> corruption of your metadata.
> The only option would be to manually delete those entries.
> A command to automatically prune these would probably be dangerous...
> running it when there's a transient ZooKeeper problem, for example, could
> end up deleting all your tables... which would be bad. Although it is
> dangerous, manual surgery on the metadata table to remove these entries, as
> you suggested, is probably the best option.
>
> On Tue, Oct 6, 2020 at 12:03 PM Hart, Andrew <[email protected]> wrote:
>
> I am still trying to find the one “unloaded tablet” that is preventing the
> cluster balancing; however, there are a lot of unassigned tablets.
>
> I have been getting rid of them by onlining tables and completing failed
> table deletes but I am still left with many tablets that are unassigned.
> They seem to be mostly from old deleted tables and so I am not sure why
> they are there at all.
>
> The unassigned tablets are shown in accumulo
> org.apache.accumulo.server.util.FindOfflineTablets and in accumulo admin
> checkTablets
>
> And as I said, some are assigned to a dead server but actually the server
> isn’t dead at all.
>
> CheckForMetadataProblems reports “All is well”
>
> I thought that if I could clear up this mess I could then eventually get
> to just one unassigned tablet which would be the “1 tablets are unloaded”
> one.  (I would then clone the table or copy the data out or something)
>
> So the problem remains.  The cluster doesn’t balance due to migrations.  I
> don’t find a tablet with a future entry and I can’t find it in unassigned
> or offline tablets due to the large number of other (presumably defunct)
> tablets with unassigned problems in tables that no longer exist.
>
> There are warnings in the documentation about manually editing the
> accumulo metadata table but it seems that the only option is to go in with
> a deletemany on any rows that start with an old deleted table ID.  There
> does not seem to be an “accumulo admin pruneDefunctTablets -t tid”
> command! :D
>
> *From:* Mike Miller <[email protected]>
> *Sent:* 06 October 2020 16:27
> *To:* [email protected]
> *Subject:* Re: Continuous tablets unloaded and fails to balance from
> accumulo master
>
> Do you want to merge old tablets that don't exist anymore?  I am not sure
> what you are asking... you might have better luck if you provide some more
> info and ask on Slack: https://accumulo.apache.org/contact-us/#slack
>
> On Tue, Oct 6, 2020 at 7:25 AM Hart, Andrew <[email protected]> wrote:
>
> What is the way to remove tablets that still exist in accumulo but do not
> have an online, offline or deleting table?
>
> Some of these tablets say ASSIGNED TO DEAD SERVER but the tserver they
> refer to is up and working properly.
>
> *From:* Hart, Andrew <[email protected]>
> *Sent:* 25 September 2020 13:52
> *To:* [email protected]
> *Subject:* RE: Continuous tablets unloaded and fails to balance from
> accumulo master
>
> Thanks for your help.  In looking for this I think I have found that there
> are deleted tables that still have a lot of tablets in the metadata table.
>
> I need to solve that before coming back to find the 1 unloaded tablet.
>
> Cheers And.
>
> *From:* Mike Miller <[email protected]>
> *Sent:* 24 September 2020 16:08
> *To:* [email protected]
> *Subject:* Re: Continuous tablets unloaded and fails to balance from
> accumulo master
>
> That might be OK, could just mean it hasn't been assigned yet.  The only
> way I can think of is to populate a list of all tablets from the metadata
> table and find the one without a "loc" column family.
>
> On Thu, Sep 24, 2020 at 10:55 AM Hart, Andrew <[email protected]> wrote:
>
> No, no future entries in the table.
>
> *From:* Mike Miller <[email protected]>
> *Sent:* 24 September 2020 15:10
> *To:* [email protected]
> *Subject:* Re: Continuous tablets unloaded and fails to balance from
> accumulo master
>
> You should be able to figure out the unloaded tablet from the
> "accumulo.metadata" table.
> The metadata table will list the tablet
> location using the "loc" column family to indicate it has loaded a tablet
> that it was assigned.
>
> For example the tablet "n;9" will have an entry like:
>
> n;9 loc:1000041fbf00006 []    ip-172-31-87-51.ec2.internal:9997
>
> From my understanding, the unloaded tablet should have a "future" column
> family, meaning it has been assigned a new location but not loaded yet.  If
> the tablet doesn't have a "loc" or "future" column family then that is a
> problem.
>
> On Thu, Sep 24, 2020 at 6:32 AM Hart, Andrew <[email protected]> wrote:
>
> Hi,
>
> I am getting “Not balancing due to 1 outstanding migrations” and “[Normal
> tablets]: 1 tablets unloaded”.
>
> This means that the cluster never balances unless I restart the master,
> after which I get a 1 off balance and then it returns to the above messages.
>
> How do I identify the tablet that is unloaded?  It isn’t in the logs that
> I can see.  Is it possible to tell from the contents of the
> accumulo.metadata table?
>
> Is there a way to use FindOfflineTablets?
>
> And.
