Are the files building up in the archive "directory" in S3 or in the actual 
data directory? In the past we've had issues where the cleaner chore thread on 
the master runs into EMRFS inconsistencies and eventually seems to give up, 
causing a buildup of files in the archive directory over time. Restarting the 
master gets the cleaner thread going again, which surfaces the inconsistencies 
in the log, and you can then clean them up in a targeted way (using "emrfs 
diff", "emrfs sync", and "emrfs delete"). Or, if you have time during a 
maintenance window, you can try running "emrfs sync" on the whole hbase 
directory, but that can take a long time depending on the size of your data, 
and Amazon has warned us against doing it while the cluster is running.
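
Roughly what that targeted cleanup looks like for us - the bucket and path 
below are placeholders, substitute your own hbase root:

    # see which entries are inconsistent under the archive path
    emrfs diff s3://your-bucket/hbase/archive/

    # drop the stale metadata entries for that path, then bring the
    # metadata back in line with what is actually in S3
    emrfs delete s3://your-bucket/hbase/archive/
    emrfs sync s3://your-bucket/hbase/archive/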

In terms of why EMRFS inconsistencies occur, that might be a question for AWS 
support. We used to have a lot of problems with them because of DynamoDB 
throttling on the EmrFsMetadata table. We saw fewer inconsistencies once we 
went from provisioned capacity to on-demand capacity (the demand seems to be 
very spiky) and also after upgrading to newer versions (I wish Amazon 
published EMRFS bug fixes, but oh well - again, AWS support might be able to 
help here). But even with these changes we still see inconsistencies from time 
to time.
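
For what it's worth, the capacity change on our side was just a billing-mode 
switch on that DynamoDB table, something along these lines (assuming your 
metadata table uses the name above):

    aws dynamodb update-table \
        --table-name EmrFsMetadata \
        --billing-mode PAY_PER_REQUEST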

If you are seeing split parents hanging around in the actual data directory, 
that's something I've never seen before.

As for your regions stuck in transition, we also see this occasionally. Most 
of the time it can be fixed just by running "assign '<encoded_region>'" in the 
hbase shell. Or, if you prefer, you can run "hbase hbck -fixAssignments", 
which does basically the same thing: it tries to assign any regions still 
stuck in transition. Both can be done with the cluster running - no need to 
roll the master.
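
Concretely, something like this, where the encoded region name is whatever 
shows up for your stuck regions in the master UI or logs:

    # assign a single stuck region from the hbase shell
    echo "assign '<encoded_region>'" | hbase shell

    # or sweep up everything still in transition with hbck
    hbase hbck -fixAssignments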

Hope this helps.

--Jacob

-----Original Message-----
From: Austin Heyne [mailto:[email protected]] 
Sent: Thursday, May 21, 2020 10:28 AM
To: [email protected]
Subject: HBase not cleaning up split parents

We're running HBase 1.4.8 on S3 (EMR 5.20) and we're seeing that after a series 
of splits and a major compaction the split parents are not getting removed. The 
on-disk size of some of our tables is 6x what HBase reports in the table 
details. The RS_COMPACTED_FILES_DISCHARGER threads are all parked waiting, and 
we haven't seen a reduction in size in well over a week. The only thing of note 
on the cluster is that we have two regions stuck in a transition state until we 
have a maintenance window to roll the master.

Has anyone experienced this, or does anyone have a way to encourage the 
regionservers to start the cleanup process?

Thanks,
Austin
