On Fri, Jan 28, 2022 at 6:56 AM Jason Wang <jasow...@redhat.com> wrote:
>
>
> On 2022/1/28 11:57 AM, Peter Xu wrote:
> > On Thu, Jan 27, 2022 at 10:24:27AM +0100, Eugenio Perez Martin wrote:
> >> On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <pet...@redhat.com> wrote:
> >>> On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> >>>> So I think that the first step to remove complexity from the old one
> >>>> is to remove iova_begin and iova_end.
> >>>>
> >>>> As Jason points out, removing iova_end is easier. It has the drawback
> >>>> of having to traverse all the list beyond iova_end, but a well formed
> >>>> iova tree should contain none. If the guest can manipulate it, it's
> >>>> only hurting itself adding nodes to it.
> >>>>
> >>>> It's possible to extract the check for hole_right (or this in Jason's
> >>>> proposal) as a special case too.
> >>>>
> >>>> But removing the iova_begin parameter is more complicated. We cannot
> >>>> know if it's a valid hole without knowing iova_begin, and we cannot
> >>>> resume traversing. Could we assume iova_begin will always be 0? I
> >>>> think not, the vdpa device can return anything through syscall.
> >>> Frankly I don't know what's the syscall you're talking about,
> >> I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
> >> range of iova addresses. We get a pair of uint64_t from it, that
> >> indicates the minimum and maximum iova address the device (or iommu)
> >> supports.
> >>
> >> We must allocate iova ranges within that address range, which
> >> complicates this algorithm a little bit. Since the SVQ iova addresses
> >> are not GPA, qemu needs extra code to be able to allocate and free
> >> them, creating a new custom iova as.
> >>
> >> Please let me know if you want more details or if you prefer me to
> >> give more context in the patch message.
> > That's good enough, thanks.
> >
> >>> I mean this one:
> >>>
> >>> https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-epere...@redhat.com/
> >>>
> >>> Though this time I have some comments on the details.
> >>>
> >>> Personally I like that one (probably with some amendment upon the old
> >>> version) more than the current list-based approach. But I'd like to
> >>> know your thoughts too (including Jason). I'll further comment in that
> >>> thread soon.
> >>>
> >> Sure, I'm fine with whatever solution we choose, but I'm just running
> >> out of ideas to simplify it. Reading your suggestions on the old RFC now.
> >>
> >> Overall I feel the list-based one is both more convenient and easier to
> >> delete when qemu raises the minimal glib version, but it adds a lot
> >> more code.
> >>
> >> It could add less code with these less elegant changes:
> >> * If we just put the list entry in the DMAMap itself, although it
> >> exposes unneeded implementation details.
> >> * We force the iova tree either to be allocation-based or
> >> insertion-based, but not both. In other words, you can only either use
> >> iova_tree_alloc or iova_tree_insert on the same tree.
>
>
> This seems an odd API, I must say :(
>
>
> > Yeah, I just noticed it yesterday that there's no easy choice on it.
> > Let's go with either way; it shouldn't block the rest of the code. It'll
> > be good if Jason or Michael share their preferences too.
>
>
> (Haven't gone through the code deeply)
>
> I wonder how about just copy-pasting gtree_node_first|last()? A quick
> google told me it's not complicated.
>
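As a quick reference for anyone following along, the IOVA range query
quoted above boils down to a single ioctl. This is a rough sketch against
the Linux vhost uapi as I remember it (header names and error handling are
approximate, not copied from the actual QEMU code):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>        /* VHOST_VDPA_GET_IOVA_RANGE */
#include <linux/vhost_types.h>  /* struct vhost_vdpa_iova_range */

/* Ask the vhost-vdpa device for the usable IOVA window (sketch only). */
static int get_iova_range(int vdpa_fd, uint64_t *first, uint64_t *last)
{
    struct vhost_vdpa_iova_range range;

    if (ioctl(vdpa_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) < 0) {
        return -1;
    }

    /*
     * Every SVQ mapping must land inside [range.first, range.last];
     * neither end is guaranteed to be 0 or UINT64_MAX, which is why the
     * allocator needs iova_begin (and possibly iova_end).
     */
    *first = range.first;
    *last = range.last;
    return 0;
}

Everything iova_tree_alloc hands out has to fall inside that window, so
iova_begin cannot simply be assumed to be 0.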
Both GTree and GTreeNode definitions are not part of the ABI of glib. I
think the upstream code has not changed its layout for a long time, but it
is still allowed to do so in the future.

Having said that, they use a list internally to traverse the nodes, with
node->left and node->right instead of TAILQ entries. These patches
duplicate that intrusive list in DMAMap entries, and make them private so
other parts of qemu are not affected.

Thanks!

> Thanks
>
>
> >
> >> I have a few tests to check the algorithms, but they are not in the
> >> qemu test format. I will post them so we all can understand better
> >> what is expected from this.
> > Sure. Thanks.
> >
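PS: to make the "duplicate that intrusive list in DMAMap entries" part a
bit more concrete, this is roughly the shape I mean. The struct and field
names below are illustrative only, not the actual patch:

#include <glib.h>
#include "qemu/queue.h"
#include "exec/hwaddr.h"

/*
 * Illustrative only: a DMAMap-like entry that carries its own list
 * linkage, so mappings can be walked in iova order without touching
 * GTree/GTreeNode internals (which are not part of glib's ABI).
 */
typedef struct DMAMapSketch {
    hwaddr iova;
    hwaddr translated_addr;
    hwaddr size;                          /* inclusive, as in DMAMap */
    QTAILQ_ENTRY(DMAMapSketch) entry;     /* private: kept sorted by iova */
} DMAMapSketch;

typedef struct IOVATreeSketch {
    GTree *tree;                          /* lookups, as today */
    QTAILQ_HEAD(, DMAMapSketch) list;     /* same entries, in iova order */
} IOVATreeSketch;

/* Stand-in for the missing gtree_node_first(): O(1) via the list. */
static DMAMapSketch *iova_tree_first_sketch(IOVATreeSketch *t)
{
    return QTAILQ_FIRST(&t->list);
}

/* Walking the mappings (e.g. to find a free hole) is a plain list walk. */
static DMAMapSketch *iova_tree_next_sketch(DMAMapSketch *m)
{
    return QTAILQ_NEXT(m, entry);
}

The list entry field would stay out of the public header (or live in a
wrapper kept private to the iova-tree implementation), so the rest of qemu
keeps seeing the plain DMAMap.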