RE: [EXTERNAL EMAIL] - Re: accumulo tserver rolling restart

dev1 Tue, 30 Nov 2021 07:15:20 -0800

All of this is hard to quantify because it really depends on your usage, but 
in general:


The larger the number of tablets per server, the more “work” the tserver, and 
then the system, needs to do to keep track of everything.  It seems to be very 
rare that hosting a large number of splits for a single table on every tserver 
improves performance. There may be some cases where it does, but unless you can 
measure that it is really helping in your specific usage, you may want to try 
to bring the count down.

If you adjust the split threshold so that you have larger tablets with more 
entries, you would need fewer splits (you may need to merge to combine existing 
tablets). The indexing used in RFiles lets scans skip quickly to the relevant 
data, which minimizes the impact of larger files on scans. The largest drawback 
is that compactions will take longer to read and write the larger files.

Are you using the default split size of 3G?  Even setting that to 6G would 
reduce the tablet count by 50%, and something larger, say 9G, should still be 
feasible.
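
As a rough back-of-the-envelope check of those numbers, here is a minimal 
sketch. It assumes tablets are currently close to the split threshold in size 
and that data is spread evenly, so the tablet count scales inversely with the 
threshold. The 3G starting point and the 4k tablets per tserver are the figures 
quoted in this thread; the function name is only illustrative.

```python
def estimated_tablets(current_tablets, current_threshold_gb, new_threshold_gb):
    """Rough tablet count after raising the split threshold and merging.

    Assumes tablet count is inversely proportional to the split threshold,
    which only holds if tablets are near the threshold and evenly filled.
    """
    return current_tablets * current_threshold_gb / new_threshold_gb


current = 4000  # "roughly 4k+ tablets per tserver", as stated in the thread
for new_gb in (6, 9):
    after = estimated_tablets(current, 3, new_gb)
    reduction = 1 - after / current
    print(f"{new_gb}G: ~{after:.0f} tablets per tserver ({reduction:.0%} fewer)")
```

Under those assumptions, 6G halves the count and 9G cuts it to about a third. 
Real tables will deviate from this, since tablets are rarely all at the 
threshold.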

If you try this, one strategy for merging would be to set a size larger than 
your target, do the merge and then when that is complete, set the threshold to 
your target and allow the data to pick new split points that should be 
relatively balanced across the tablets.  The merge will take a long time and 
can hammer the namenode – so you might want to consider doing it in stages.
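
One possible reading of that sequence, sketched as a small helper that only 
prints the Accumulo shell commands rather than running them. The table name 
"mytable" and the 12G overshoot value are hypothetical; 9G is the example 
target mentioned above. The `config -t ... -s` and `merge -t ... -s` forms are 
written from memory of the 1.x shell, so treat them as an assumption and check 
`merge -?` on your version before using them. Staging the merge is not covered 
here.

```python
def merge_plan(table, target, overshoot):
    """Return shell commands for a merge-then-resplit pass (illustrative only)."""
    return [
        # 1. Raise the threshold above the target so merged tablets
        #    do not immediately split again.
        f"config -t {table} -s table.split.threshold={overshoot}",
        # 2. Merge existing tablets up to roughly the overshoot size.
        f"merge -t {table} -s {overshoot}",
        # 3. Once the merge is complete, lower the threshold to the real
        #    target and let the table pick new split points.
        f"config -t {table} -s table.split.threshold={target}",
    ]


# Hypothetical values: "mytable" and 12G are placeholders, not recommendations.
for cmd in merge_plan("mytable", target="9G", overshoot="12G"):
    print(cmd)
```

Step 3 should only be issued after the merge in step 2 has finished.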

From: Shailesh Ligade <slig...@fbi.gov>
Sent: Tuesday, November 30, 2021 9:37 AM
To: user@accumulo.apache.org
Subject: RE: [EXTERNAL EMAIL] - Re: accumulo tserver rolling restart

There are not that many tables, just a large number of splits. There are 25b 
entries, but each entry is large.
Is there an optimal tserver memory/heap usage to number of tablets 
relationship? I saw some references like 
https://www.oreilly.com/library/view/accumulo/9781491947098/ch10.html that 
state that you should keep 1k tablets per server, but I think that is 
overkill in our situation. Each tserver is quite large: 16 cores, 128GB.

On the tablet.suspend.duration setting, once I update that setting, do I need 
to restart the master? After updating the setting, I saw that the master log 
still had the old value (0s), but if I restart the master it shows the correct 
value. In my testing it didn’t make any difference, but I am just curious.

-S

From: dev1 <d...@etcoleman.com>
Sent: Tuesday, November 30, 2021 9:17 AM
To: user@accumulo.apache.org
Subject: RE: [EXTERNAL EMAIL] - Re: accumulo tserver rolling restart

One thing that you might be able to optimize is the number of tablets per 
server – you stated that you have “roughly 4k+ tablets per tserver”

Is that driven by the number of tables, or do you have lots of splits for a 
much smaller number of tables?

From: Shailesh Ligade <slig...@fbi.gov>
Sent: Monday, November 29, 2021 11:17 AM
To: user@accumulo.apache.org
Subject: Re: [EXTERNAL EMAIL] - Re: accumulo tserver rolling restart


Uhmm updated the setting tablet.suspended.duration to 5m

config -s tablet.suspended.duration=5m

but when I issued a tserver restart (one at a time, without waiting for the 
first to come up), I still got all tablets unassigned 🙁 Maybe I need to bring 
the masters down first?

btw this is for accumulo 1.10.0

am I missing anything?

-S
________________________________
From: Shailesh Ligade <slig...@fbi.gov>
Sent: Monday, November 29, 2021 10:35 AM
To: user@accumulo.apache.org
Subject: Re: [EXTERNAL EMAIL] - Re: accumulo tserver rolling restart

Thanks Michael,

Stop the cluster using admin stop? The issue is that, since we are using 
systemd with Restart=always, it interferes with any of those stop commands and 
scripts (stop-all, stop-here, etc.). So either we have to modify the systemd 
settings or maybe just do a shut-down-the-VM type of operation (I think that is 
a little brutal).

-S
________________________________
From: Michael Wall <mjw...@gmail.com>
Sent: Monday, November 29, 2021 9:54 AM
To: user@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - Re: accumulo tserver rolling restart

Is there a reason to not just stop the cluster, reset the heap and restart the 
cluster?  That is simpler.

On Mon, Nov 29, 2021 at 9:37 AM dev1 <d...@etcoleman.com> wrote:

Yes – and don’t forget to reset it back when you are done.



From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Monday, November 29, 2021 9:36 AM
To: user@accumulo.apache.org
Subject: RE: accumulo tserver rolling restart



Thanks,



I am assuming I can set that property using shell and it will take effect 
immediately?



Thanks



-S



From: dev1 <d...@etcoleman.com>
Sent: Monday, November 29, 2021 9:25 AM
To: user@accumulo.apache.org
Subject: [External] RE: accumulo tserver rolling restart



See 
https://accumulo.apache.org/1.10/accumulo_user_manual.html#_restarting_process_on_a_node 
for a note on rolling restarts.



There is a property that can be set (table.suspend.duration) that delays 
reassignment while a tserver is restarting. The trade-off is that the data on 
its tablets is unavailable in the meantime, so try to minimize the time the 
tserver is offline.



From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Monday, November 29, 2021 9:19 AM
To: user@accumulo.apache.org
Subject: accumulo tserver rolling restart



Hello,



I want to restart all the tservers, say because I updated the tserver heap 
size. Since we are using systemd, I can issue a restart command on a tserver. 
This causes all sorts of tablet movement even though accumulo is down for maybe 
a second. If I wait for the number of unassigned tablets to reach 0 before 
restarting the next tserver, then completely restarting a small cluster (6-8 
nodes) takes hours (roughly 4k+ tablets per tserver).



What is the right way to perform such a routine maintenance operation? Is there 
a delay setting we can change so that it will not move tablets around? What 
would be a safe delay value?



-S
