Hi Craig,

[[@Thomas, I left a question for you at the bottom.]]

On 27/10/2019 02:40, Prescott, Craig P wrote:
In the Robinhood v4 era, is it known whether the "1KB/entry" rule-of-thumb could still be accurate?  I am trying to estimate flash storage capacity needs for this purpose.

I do not know. You can have a look at what the MongoDB schema for filesystem entries will look like in v4 in this patch <https://review.gerrithub.io/c/cea-hpc/robinhood/+/471515/1/src/backends/mongo.c> (the comment at line 50). Roughly, the size of each value is:

 * _id: 132B (or less);
 * ns: 132B (or less) + length of the filename (per hardlink);
 * symlink: length of the symlink's string (if any);
 * statx: 117B.

The following is entirely subjective, but assuming:

 * filenames are ~64B;
 * there are not many symlinks or hardlinks;
 * and IDs are closer to ~38B than 132B.

RobinHood v4 (4.0.0, rather) will store roughly 257B of information per filesystem entry; the sketch below makes the arithmetic explicit.
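For illustration, here is the estimate as a back-of-the-envelope Python sketch. The constant names, the function, and the 38B/64B defaults are mine, taken from the assumptions above, not from RobinHood's code:

    # Rough storage estimate per filesystem entry in RobinHood v4,
    # counting only the values (keys and indexes excluded).
    ID_SIZE = 38        # assumed typical _id size (the upper bound is 132B)
    FILENAME_SIZE = 64  # assumed average filename length
    STATX_SIZE = 117    # size of the statx value

    def bytes_per_entry(hardlinks=1, symlink_target=0):
        """One ns record per hardlink; symlink_target is the target's length."""
        ns = hardlinks * (ID_SIZE + FILENAME_SIZE)
        return ID_SIZE + ns + symlink_target + STATX_SIZE

    print(bytes_per_entry())  # -> 257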


Now, all this calculation is pretty useless, because MongoDB stores more than just the values of a document: I believe each document contains a copy of its keys, and indexes certainly take up storage space as well. Moreover, documents are likely to be extended with filesystem-specific information (e.g. the file layout for Lustre), and we have plans to support arbitrary user-driven "tags", similar to extended attributes (xattrs).
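If you want to measure rather than guess, pymongo's bson module can encode a sample document and report its exact size; this also shows that key names are stored inside every document. The documents below are placeholders I made up, not RobinHood's actual schema:

    # Shows that BSON embeds field names in each document, so shorter
    # keys make smaller documents. Requires pymongo (pip install pymongo).
    import bson

    verbose = {"identifier": b"x" * 38, "namespace": b"y" * 102}
    terse = {"_id": b"x" * 38, "ns": b"y" * 102}

    print(len(bson.encode(verbose)))  # larger: the long key names are stored too
    print(len(bson.encode(terse)))    # same values, smaller document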


If you really need an answer, my personal opinion is that if the 1KB/entry rule worked for you before, it should keep working: the amount of information to store is the same, and I have not read anywhere that MongoDB needs significantly more space than MariaDB.


For a more reliable answer, I think the best would be to wait until v4 is used in production somewhere and get real numbers from there.


Our current implementation places the changelog reader and mariadb server on the same hardware resource.  We have a relatively large CPU and RAM capacity on this resource (CPU for both changelog reading and mariadb, RAM largely for the InnoDB buffer pool size).  I would also like to know whether CPU and/or RAM needs are expected to change with Robinhood v4. Is there any guidance in this area?  I can scale our current Robinhood utilization by our anticipated client and core count changes (and thus changelog entry processing needs), but wonder if there is more to it than that.

Once again, until it is deployed anywhere in production, I am merely taking shots in the dark.


I know for a fact that RobinHood v4 is more performant than v3 for a number of use cases. My guess is that, for the same performance, v4 needs less CPU/RAM than v3.


Note this will be a Lustre 2.12/2.14/later environment by that time, and we will be using DNE/DoM/PFL features (not in use today).  I am not sure if this makes any difference in the capacity planning, but I thought I should mention it in case it matters for Robinhood.

DNEv2 phase b or later (cross-MDT renames) is not (fully) compatible with either RobinHood v3 or v4. This is tracked in LU-12574 <https://jira.whamcloud.com/browse/LU-12574>. It should still mostly work, but hardlinks and moves across directories managed by different MDTs may not be processed correctly by RobinHood, resulting in a divergence between the metadata in the filesystem and RobinHood's copy of it. Other than that, DNE is a nice feature, and since each MDT has its own changelog, it helps distribute the load of changelog processing across multiple clients.


I do not expect DoM or PFL to have too much of an impact on RobinHood v3. And v4 does not handle file layouts for now.


Regards,

Quentin


PS: v3 might not be PFL-ready/PFL-aware; @Thomas will have to advise on that.
