adoroszlai opened a new pull request #271: HDDS-1812. Du while calculating used disk space reports that chunk files are file not found
URL: https://github.com/apache/hadoop-ozone/pull/271

## What changes were proposed in this pull request?

1. Copy `DU` from `hadoop-common` and customize it to ignore temporary chunk files. Unfortunately the original implementation could not simply be extended. Note that this means temp chunk files will not be counted towards HDDS space usage. One possible alternative is ignoring errors from `du` (which exits with an error due to the missing files), but that might make it more difficult to spot other problems.
2. Introduce an interface for creating various "space usage" implementations. `hadoop-common` allows a single implementation to be configured, which is used for all volumes. The factory interface allows creating different implementations, or using different parameters, for different volumes. Currently 3 factories are implemented:
   1. pure `du`-based
   2. pure `df`-based (this is actually based on `java.io` in both Hadoop Common and HDDS)
   3. mock for testing -- this also helps avoid test failures due to "disk out of space" errors when allocating containers

   A hybrid or other more intelligent implementation could be a future improvement, e.g. one that would use `du` for shared volumes and `df` for dedicated ones.
3. Introduce an interface for the object that persists space usage info (to a file named `scmUsed`) to be able to:
   1. skip persisting info from cheaper sources (i.e. `df`)
   2. use in-memory persistence for unit testing
4. Decouple caching and periodic refresh of space usage from its source. Use a `ScheduledExecutorService` for refreshing disk usage info. Use a "same thread executor" for unit tests.
5. Let `container-service` tests use code from HDDS `common` tests, which provide mock implementations of the new interfaces. This also required:
   1. Avoid using real config properties (e.g. `ozone.scm.client.bind.host`) for testing config file generation. The generated config file is picked up by other tests and causes failures.
   2. Rename the Log4J2 config file used by the audit logger test to avoid creating an untracked `audit.log` (similar to HDDS-2063). This would happen if some test starts components which use the audit logger, and it picks up `log4j.properties` by default.

https://issues.apache.org/jira/browse/HDDS-1812

## How was this patch tested?

Added several unit tests, even for logic extracted from existing classes.

On a compose-based cluster, tested configuring each of the real factory implementations and configuring different refresh periods. Tested the `du`-based implementation by continuously creating/renaming files from a shell.
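For readers who have not opened the diff, the separation described in items 2 to 4 of the change list (source, persistence, and cached periodic refresh) could look roughly like the sketch below. This is only an illustration under assumptions: the names `SpaceUsageSketch`, `SpaceUsageSource`, `SpaceUsagePersistence`, `InMemoryPersistence`, and `CachingSpaceUsage` are hypothetical and are not taken from the patch, and the `df`-like source simply uses `java.io.File`.

```java
import java.io.File;
import java.util.OptionalLong;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class SpaceUsageSketch {

  /** Hypothetical source of "used space", e.g. backed by du or df. */
  interface SpaceUsageSource {
    long getUsedSpace();
  }

  /** Hypothetical store for the last known value (the PR mentions a file named scmUsed). */
  interface SpaceUsagePersistence {
    OptionalLong load();
    void save(long usedBytes);
  }

  /** In-memory persistence, as one might use in unit tests. */
  static final class InMemoryPersistence implements SpaceUsagePersistence {
    private Long value; // null until the first save

    @Override
    public synchronized OptionalLong load() {
      return value == null ? OptionalLong.empty() : OptionalLong.of(value);
    }

    @Override
    public synchronized void save(long usedBytes) {
      value = usedBytes;
    }
  }

  /** Serves a cached value; the refresh schedule is supplied from outside. */
  static final class CachingSpaceUsage implements SpaceUsageSource {
    private final SpaceUsageSource source;
    private final SpaceUsagePersistence persistence;
    private final AtomicLong cached;

    CachingSpaceUsage(SpaceUsageSource source, SpaceUsagePersistence persistence) {
      this.source = source;
      this.persistence = persistence;
      // Start from the persisted value if present, otherwise 0 until the first refresh.
      this.cached = new AtomicLong(persistence.load().orElse(0L));
    }

    @Override
    public long getUsedSpace() {
      return cached.get();
    }

    /** Re-reads the (possibly expensive) source and updates the cache. */
    void refresh() {
      cached.set(source.getUsedSpace());
    }

    /** Schedules periodic refreshes on the given executor. */
    void start(ScheduledExecutorService executor, long periodMillis) {
      executor.scheduleWithFixedDelay(this::refresh, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Saves the last cached value so a later instance can start from it. */
    void shutdown() {
      persistence.save(cached.get());
    }
  }

  public static void main(String[] args) throws InterruptedException {
    File dir = new File(".");
    // df-like source: total minus free space of the volume holding dir.
    SpaceUsageSource dfLike = () -> dir.getTotalSpace() - dir.getFreeSpace();

    CachingSpaceUsage usage = new CachingSpaceUsage(dfLike, new InMemoryPersistence());
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    usage.start(executor, 1000);

    Thread.sleep(200); // give the first refresh a chance to run
    System.out.println("used bytes: " + usage.getUsedSpace());

    executor.shutdownNow();
    usage.shutdown();
  }
}
```

In a unit test, `refresh()` can be called directly (or via an executor that runs tasks on the calling thread) so that no timing is involved. A no-op `SpaceUsagePersistence` would be one way to skip persisting values from cheap sources.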
