On Thu, Dec 1, 2016 at 1:21 AM, Andriy Gapon <a...@freebsd.org> wrote:

>
> get_clones_stat() could be very slow if a snapshot has many (thousands)
> clones.
> There seems to be two factors that contribute to that.
>
> Clone names are added to an nvlist that's created with NV_UNIQUE_NAME.
> So, each time a new name is appended to the list, the whole list is
> searched
> linearly to see if that name is not already in the list.  That results in
> the
> quadratic complexity.
> That should be easy to fix as we know in advance that we should not get any
> duplicate names, so we can drop NV_UNIQUE_NAME when creating the list.
>


Sounds good.



>
> Also, for each clone we call dsl_dataset_hold_obj() ->
> dsl_dir_hold_obj().  If
> the clone dataset is not yet instantiated, for example, because the clones
> are
> not mounted, then we need to call zap_value_search(), which is expensive.
> And
> if all the clones have the same parent, then the cost of
> zap_value_search() is
> proportional to the number of clones.
>
> I wonder if it would make sense to extend the on disk information with
> reverse-lookup ZAPs, so that we could avoid having to use
> zap_value_search() in
> the hot paths.  That could also help with issues like
> https://www.illumos.org/issues/7606
>
>
Yeah, it would make sense to have a more efficient way to map from dataset
object to its name.  Of course the cost is added complexity of keeping this
in sync with the name->object mapping (e.g. across rename).  I can imagine
several ways of doing this:

1. for each name->object ZAP, add an object->name ZAP

2. For the entire storage pool, add a single object->name ZAP

3. For each object, store its name (e.g. in the dsl_dataset_phys_t
[impractical due to backwards compatibility and name length], or in the
zapified dataset)

4. Read the existing name->object ZAP once and cache the reverse mapping,
e.g.
4a. as a separate data structure e.g. AVL or hash table mapping object->name
4b. by forcing all the dsl_dataset_t's it references to be loaded and
forcing them to stay in the cache until the pool is unloaded

I think that option 3 is the cleanest architecturally, but option 2 might
be more efficient.  Option 4 has the benefit of not requiring an on-disk
format change, which is a big plus.  4a would be pretty straightforward to
implement, it could even be done entirely in the ZAP (e.g. first call to
zap_value_search() caches the reverse mapping, hung off the zap_t, and
future calls to zap_value_search() use the cached mapping).

--matt

--
> Andriy Gapon
> 



-------------------------------------------
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now
RSS Feed: https://www.listbox.com/member/archive/rss/274414/28015062-cce53afa
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=28015062&id_secret=28015062-f966d51c
Powered by Listbox: http://www.listbox.com

Reply via email to