adoroszlai opened a new pull request #271: HDDS-1812. Du while calculating used disk space reports that chunk files are file not found
URL: https://github.com/apache/hadoop-ozone/pull/271
 
 
   ## What changes were proposed in this pull request?
   
   1. Copy `DU` from `hadoop-common` and customize it to ignore temporary chunk files. Unfortunately, the original implementation could not simply be extended. Note that this means temporary chunk files will not be counted towards HDDS space usage. One possible alternative is ignoring errors from `du` (which exits with an error because of the missing files), but that might make it more difficult to spot other problems.
   2. Introduce an interface for creating various "space usage" implementations. `hadoop-common` allows only a single implementation to be configured, and it is used for all volumes. The factory interface allows creating different implementations, or the same implementation with different parameters, for different volumes. Currently 3 factories are implemented:
      1. pure `du`-based
      2. pure `df`-based (this is actually based on `java.io` in both Hadoop Common and HDDS)
      3. mock for testing; this also helps avoid test failures due to "disk out of space" errors when allocating containers

      A hybrid or other, more intelligent implementation could be a future improvement, e.g. one that would use `du` for shared volumes and `df` for dedicated ones.
   3. Introduce an interface for the object that persists space usage info (to a file named `scmUsed`), to be able to:
      1. skip persisting info from cheaper sources (i.e. `df`)
      2. use in-memory persistence for unit testing
   4. Decouple caching and periodic refresh of space usage from its source. Use a `ScheduledExecutorService` for refreshing disk usage info, and a "same thread executor" for unit tests.
   5. Let `container-service` tests use code from HDDS `common` tests, which provide mock implementations of the new interfaces. This also required:
      1. Avoiding real config properties (e.g. `ozone.scm.client.bind.host`) when testing config file generation. The generated config file is picked up by other tests and causes failures.
      2. Renaming the Log4J2 config file used by the audit logger test, to avoid creating an untracked `audit.log` (similar to HDDS-2063). This would happen if a test starts components that use the audit logger, and the logger picks up `log4j.properties` by default.
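
The `DU` copy in item 1 still shells out to the `du` command. Purely to illustrate the filtering idea, here is a minimal, hypothetical sketch that swaps in a Java NIO directory walk (not the PR's code). It skips files whose names end with an assumed `.tmp` suffix *before* stat-ing them, so a temp chunk renamed mid-scan cannot cause a "file not found". It also sums apparent file sizes, whereas `du` reports allocated blocks. I/O errors for other files still propagate, which matches the concern about hiding other problems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical sketch: sums file sizes under a volume, ignoring temp chunk files. */
final class TempAwareDiskUsage {
  /** Assumed suffix for temporary chunk files; the real HDDS naming may differ. */
  static final String TEMP_SUFFIX = ".tmp";

  private TempAwareDiskUsage() { }

  static long usedBytes(Path root) throws IOException {
    long total = 0;
    try (Stream<Path> paths = Files.walk(root)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        Path name = p.getFileName(); // null only for a file system root
        if (name != null && name.toString().endsWith(TEMP_SUFFIX)) {
          continue; // never stat temp files: they may be renamed at any moment
        }
        if (Files.isRegularFile(p)) {
          total += Files.size(p); // other I/O errors are still reported
        }
      }
    }
    return total;
  }
}
```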
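
The factory idea from item 2 can be illustrated with the following hypothetical sketch. The names `SpaceUsageSource`, `SpaceUsageSourceFactory`, `DfLikeFactory`, `FixedUsageFactory` and `PerVolumeFactory` are invented for illustration and are not the PR's API. The df-like source assumes that used space is total minus free space as reported by `java.io.File`. `PerVolumeFactory` sketches the possible future hybrid that chooses a factory per volume.

```java
import java.io.File;
import java.util.function.Predicate;

/** Hypothetical: reports used space for one volume. */
interface SpaceUsageSource {
  long getUsedSpace();
}

/** Hypothetical factory: can create a different source for each volume. */
interface SpaceUsageSourceFactory {
  SpaceUsageSource create(File volumeDir);
}

/** df-like: cheap, but counts everything on the file system, not only HDDS data. */
final class DfLikeFactory implements SpaceUsageSourceFactory {
  @Override
  public SpaceUsageSource create(File volumeDir) {
    return () -> volumeDir.getTotalSpace() - volumeDir.getFreeSpace();
  }
}

/** Mock for tests: a fixed value, independent of the real disk. */
final class FixedUsageFactory implements SpaceUsageSourceFactory {
  private final long used;

  FixedUsageFactory(long used) {
    this.used = used;
  }

  @Override
  public SpaceUsageSource create(File volumeDir) {
    return () -> used;
  }
}

/** Possible hybrid: one factory for shared volumes, another for dedicated ones. */
final class PerVolumeFactory implements SpaceUsageSourceFactory {
  private final Predicate<File> isShared;
  private final SpaceUsageSourceFactory shared;
  private final SpaceUsageSourceFactory dedicated;

  PerVolumeFactory(Predicate<File> isShared,
      SpaceUsageSourceFactory shared, SpaceUsageSourceFactory dedicated) {
    this.isShared = isShared;
    this.shared = shared;
    this.dedicated = dedicated;
  }

  @Override
  public SpaceUsageSource create(File volumeDir) {
    return (isShared.test(volumeDir) ? shared : dedicated).create(volumeDir);
  }
}
```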
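
For item 3, a persistence interface could look roughly like the sketch below. `UsagePersistence` and its implementations are hypothetical names. Storing a bare decimal number in `scmUsed` is an assumed file format, not necessarily the one HDDS writes. The no-op variant corresponds to skipping persistence for cheap sources such as `df`, and the in-memory one to unit testing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

/** Hypothetical abstraction for saving and restoring cached space usage. */
interface UsagePersistence {
  OptionalLong load();

  void save(long usedBytes);
}

/** File-backed (e.g. a file named scmUsed); the text format is an assumption. */
final class FileUsagePersistence implements UsagePersistence {
  private final Path file;

  FileUsagePersistence(Path file) {
    this.file = file;
  }

  @Override
  public OptionalLong load() {
    try {
      String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
      return OptionalLong.of(Long.parseLong(text));
    } catch (IOException | NumberFormatException e) {
      return OptionalLong.empty(); // missing or unreadable: caller recomputes
    }
  }

  @Override
  public void save(long usedBytes) {
    try {
      Files.write(file, Long.toString(usedBytes).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      // best effort: losing the saved value only costs a recomputation
    }
  }
}

/** For cheap sources like df: nothing worth persisting. */
final class NoPersistence implements UsagePersistence {
  @Override
  public OptionalLong load() {
    return OptionalLong.empty();
  }

  @Override
  public void save(long usedBytes) { }
}

/** For unit tests: keeps the value in memory only. */
final class InMemoryPersistence implements UsagePersistence {
  private volatile Long value;

  @Override
  public OptionalLong load() {
    Long v = value;
    return v == null ? OptionalLong.empty() : OptionalLong.of(v);
  }

  @Override
  public void save(long usedBytes) {
    value = usedBytes;
  }
}
```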
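
Item 4 (decoupling caching and periodic refresh from the usage source) can be sketched as follows. The names are hypothetical. Periodic refresh uses the standard `ScheduledExecutorService.scheduleWithFixedDelay`. One-off refreshes go through a plain `java.util.concurrent.Executor`, so a unit test can pass `Runnable::run` as a same-thread executor and see the new value immediately.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical: caches a value from any usage source and refreshes it on demand or periodically. */
final class CachedUsage {
  private final LongSupplier source; // e.g. a du- or df-based check
  private final AtomicLong cached = new AtomicLong();

  CachedUsage(LongSupplier source, long initialValue) {
    this.source = source;
    this.cached.set(initialValue); // e.g. a value loaded from persistence
  }

  /** Cheap read of the last computed value. */
  long get() {
    return cached.get();
  }

  /** Recomputes on the calling thread. */
  void refresh() {
    cached.set(source.getAsLong());
  }

  /** One-off refresh; tests can pass Runnable::run to stay on the same thread. */
  void refreshOn(Executor executor) {
    executor.execute(this::refresh);
  }

  /** Periodic refresh; the next run starts only after the previous one finishes. */
  ScheduledFuture<?> start(ScheduledExecutorService scheduler, long period, TimeUnit unit) {
    return scheduler.scheduleWithFixedDelay(this::refresh, period, period, unit);
  }
}
```

A fixed delay, rather than a fixed rate, keeps refreshes from piling up when a slow `du` run takes longer than the configured period.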
   
   https://issues.apache.org/jira/browse/HDDS-1812
   
   ## How was this patch tested?
   
   Added several unit tests, including for logic extracted from existing classes. On a compose-based cluster, tested configuring each of the real factory implementations with different refresh periods. Tested the `du`-based implementation by continuously creating and renaming files from the shell.
