Hi Craig,

[[@Thomas, I left a question for you at the bottom.]]

On 27/10/2019 02:40, Prescott, Craig P wrote:
In the Robinhood v4 era, is it known whether the "1KB/entry" rule-of-thumb could still be accurate?  I am trying to estimate flash storage capacity needs for this purpose.

I do not know. You can have a look at what the MongoDB schema for filesystem entries will look like in v4 in this patch <https://review.gerrithub.io/c/cea-hpc/robinhood/+/471515/1/src/backends/mongo.c> (the comment at line 50). Roughly, the size of each value is:

 * _id: 132B (or less);
 * ns: 132B (or less) + length of the filename (per hardlink);
 * symlink: length of the symlink's string (if any);
 * statx: 117B.

The following is entirely subjective, but assuming:

 * filenames are ~64B;
 * there are not many symlinks or hardlinks;
 * and IDs are closer to ~38B than 132B.

RobinHood v4 (4.0.0, rather) will store roughly 257B of information per filesystem entry; the sketch below makes the arithmetic explicit.
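For illustration, here is the estimate as a back-of-the-envelope Python sketch. The constant names, the function, and the 38B/64B defaults are mine, taken from the assumptions above, not from RobinHood's code:

    # Rough storage estimate per filesystem entry in RobinHood v4,
    # counting only the values (keys and indexes excluded).
    ID_SIZE = 38        # assumed typical _id size (the upper bound is 132B)
    FILENAME_SIZE = 64  # assumed average filename length
    STATX_SIZE = 117    # size of the statx value

    def bytes_per_entry(hardlinks=1, symlink_target=0):
        """One ns record per hardlink; symlink_target is the target's length."""
        ns = hardlinks * (ID_SIZE + FILENAME_SIZE)
        return ID_SIZE + ns + symlink_target + STATX_SIZE

    print(bytes_per_entry())  # -> 257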


Now, all this calculation is pretty useless, because MongoDB stores more than just the values of a document: I believe each document contains a copy of its keys, and indexes certainly take up storage space as well. Moreover, documents are likely to be extended with filesystem-specific information (e.g. the file layout for Lustre), and we have plans to support arbitrary user-driven "tags", similar to extended attributes (xattrs).
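If you want to measure rather than guess, pymongo's bson module can encode a sample document and report its exact size; this also shows that key names are stored inside every document. The documents below are placeholders I made up, not RobinHood's actual schema:

    # Shows that BSON embeds field names in each document, so shorter
    # keys make smaller documents. Requires pymongo (pip install pymongo).
    import bson

    verbose = {"identifier": b"x" * 38, "namespace": b"y" * 102}
    terse = {"_id": b"x" * 38, "ns": b"y" * 102}

    print(len(bson.encode(verbose)))  # larger: the long key names are stored too
    print(len(bson.encode(terse)))    # same values, smaller document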


If you really need an answer, my personal opinion is that if the 1KB/entry rule worked for you before, it should keep working: the amount of information to store is the same, and I have not read anywhere that MongoDB needs significantly more space than MariaDB.


For a more reliable answer, I think the best would be to wait until v4 is used in production somewhere and get real numbers from there.


Our current implementation places the changelog reader and mariadb server on the same hardware resource.  We have a relatively large CPU and RAM capacity on this resource (CPU for both changelog reading and mariadb, RAM largely for the InnoDB buffer pool size).  I would also like to know whether CPU and/or RAM needs are expected to change with Robinhood v4. Is there any guidance in this area?  I can scale our current Robinhood utilization by our anticipated client and core count changes (and thus changelog entry processing needs), but wonder if there is more to it than that.

Once again, until it is deployed anywhere in production, I am merely taking shots in the dark.


I know for a fact that RobinHood v4 is more performant than v3 for a number of use cases. My guess is that, for the same performance, v4 needs less CPU/RAM than v3.


Note this will be a Lustre 2.12/2.14/later environment by that time, and we will be using DNE/DoM/PFL features (not in use today).  I am not sure if this makes any difference in the capacity planning, but I thought I should mention it in case it matters for Robinhood.

DNEv2 phase b or later (cross-MDT renames) is not (fully) compatible with either RobinHood v3 or v4. This is tracked in LU-12574 <https://jira.whamcloud.com/browse/LU-12574>. It should still mostly work, but hardlinks and moves across directories managed by different MDTs may not be processed correctly by RobinHood, resulting in a divergence between the metadata in the filesystem and RobinHood's copy of it. Other than that, DNE is a nice feature, and since each MDT has its own changelog, it helps distribute the load of changelog processing across multiple clients.


I do not expect DoM or PFL to have too much of an impact on RobinHood v3. And v4 does not handle file layouts for now.


Regards,

Quentin


PS: v3 might not be PFL-ready/PFL-aware; @Thomas will have to advise on that.
