You can get the hdfs size using standard hdfs commands - count or ls. As long as you have not cloned the table, the size of the hdfs files and the space occupied by the table are equivalent.
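For example, assuming a hypothetical table id of 2f (you can look up the id with `tables -l` in the Accumulo shell) and the default HDFS layout:

```
# list table name -> table id mappings (user "root" is just an example)
accumulo shell -u root -e "tables -l"

# total bytes under the table's directory
hdfs dfs -du -s -h /accumulo/tables/2f

# directory count, file count, and content size
hdfs dfs -count /accumulo/tables/2f
```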
You can also get a sense of the referenced files by examining the metadata table - the column qualifier file: will give you just the referenced files. Directories named b-xxxxxx are from a bulk import and t-xxxxxx directories are assigned to tablets. The file names also encode their origin: bulk import files start with I-xxxxxx, files from a full compaction start with A-xxxxxx, files from a (partial) major compaction with C-xxxxxx, and F-xxxxxx files are the result of a flush. If you look at the metadata entries for the files, the value carries the file size and the number of entries.

How do you ingest - bulk or continuous? On a bulk ingest, the imported files end up in /accumulo/tables/x/b-xxxxx and are then assigned to tablets - the directories for the tablets will be created, but will be "empty" until a compaction occurs. A compaction will copy the data referenced by a tablet into a new file placed in the corresponding /accumulo/tables/x/t-xxxxxx directory. When a bulk imported file is no longer referenced by any tablet, it will be garbage collected; until then the file will exist and inflate the apparent space used by the table. A compaction will also remove any data that is past the TTL for the records. Do you ever run a compaction? With a very large number of tablets, you may want to run the compaction in parts so that you don't occupy all of the compaction slots for a long time.

Are you using keys (row ids) that are always increasing? A typical example is a date. Say some of your row ids are yyyy-mm-dd-hh and there is a 10 day TTL. What will happen is that new data will continue to create new tablets, and on compaction the old tablets will age off and shrink to 0 size. You can remove the "unused" splits by running a merge. Anything that creates new row ids that are ordered can do this - new splits become necessary and the old splits eventually become unnecessary. If the row ids are distributed across the splits, this will not happen.
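The metadata inspection mentioned above can be done from the Accumulo shell; a sketch, again using a hypothetical table id of 2f (a table's metadata rows fall between `2f;` and `2f<`):

```
# show the files referenced by each tablet of table 2f;
# the value of each file entry encodes the file size and number of entries
scan -t accumulo.metadata -b "2f;" -e "2f<" -c file
```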
It is not necessarily a problem if this is what your data looks like - just something that you may want to manage with merges.

There is usually not much benefit to having a large number of tablets for a single table on a server. You can reduce the number of tablets required by setting the split threshold to a larger value and then running a merge. This can be done in sections, and you should run a compaction on each section first. If you have recently compacted, you can estimate the rough number of tablets necessary as hdfs size / split threshold = number of tablets. If you increase the split threshold you will need fewer tablets. You may also consider setting a split threshold that is larger than your target - say you decided that 5G was a good target: setting the threshold to 8G during the merge and then back to 5G when it completes will cause the table to split again, and it can give you a better distribution of data across the splits. This can be done while things are running, but it will be a heavy IO load (on the files and on the HDFS namenode) and can take a very long time. What can be useful is to use the getsplits command with the max splits option and create a script that compacts, then merges a section - using the splits as the start / end rows for the compact and merge commands.

Ed Coleman

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Monday, January 31, 2022 11:16 AM
To: user@accumulo.apache.org
Subject: tablets per tablet server for accumulo 1.10.0

Hello,

table.split.threshold is set to the default 1G (except for metadata and root, which are set to 64M). What can cause the tablets per tablet server count to go high? Within a week, that count jumped from 5k/tablet server to 23k/tablet server, even though the total size in hdfs has not changed. Is a high count a cause for concern? We didn't apply any splits. I did a dumpConfig and checked all my tables and didn't see splits either. Is there a way to find tablet size in hdfs?
When I look at hdfs /accumulo/table/x/ I see some empty folders, meaning not all folders have rf files. Is that normal?

Thanks in advance!

-S
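Coming back to the sectioned compact-then-merge approach mentioned above: a sketch (not a drop-in tool) that, given split points taken from getsplits, prints the Accumulo shell commands you would run for each section. The table name and split points here are hypothetical.

```shell
#!/bin/sh
# Print "compact" then "merge" shell commands for each section of a table,
# where sections are delimited by the supplied (sorted) split points.
print_section_commands() {
  table="$1"; shift
  prev=""            # empty start row = beginning of table
  for split in "$@"; do
    echo "compact -t $table -b '$prev' -e '$split' -w"
    echo "merge -t $table -b '$prev' -e '$split'"
    prev="$split"
  done
  # final section: from the last split point to the end of the table
  echo "compact -t $table -b '$prev' -w"
  echo "merge -t $table -b '$prev'"
}

# example with two date-style split points (yields three sections)
print_section_commands mytable 2022-01-10 2022-01-20
```

Reviewing the printed commands before feeding them to the shell (e.g. via `accumulo shell -u root -f commands.txt`) keeps the operation easy to pause between sections.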