On Mon, Jan 23, 2017 at 09:48:48AM +0800, Jason Wang wrote: > > > On 2017年01月22日 16:51, Peter Xu wrote: > >On Sun, Jan 22, 2017 at 03:56:10PM +0800, Jason Wang wrote: > > > >[...] > > > >>>+/** > >>>+ * vtd_page_walk_level - walk over specific level for IOVA range > >>>+ * > >>>+ * @addr: base GPA addr to start the walk > >>>+ * @start: IOVA range start address > >>>+ * @end: IOVA range end address (start <= addr < end) > >>>+ * @hook_fn: hook func to be called when detected page > >>>+ * @private: private data to be passed into hook func > >>>+ * @read: whether parent level has read permission > >>>+ * @write: whether parent level has write permission > >>>+ * @skipped: accumulated skipped ranges > >>What's the usage for this parameter? Looks like it was never used in this > >>series. > >This was for debugging purpose before, and I kept it in case one day > >it can be used again, considering that will not affect much on the > >overall performance. > > I think we usually do not keep debugging codes outside debug macros.
I'll remove it. > > > > >>>+ * @notify_unmap: whether we should notify invalid entries > >>>+ */ > >>>+static int vtd_page_walk_level(dma_addr_t addr, uint64_t start, > >>>+ uint64_t end, vtd_page_walk_hook hook_fn, > >>>+ void *private, uint32_t level, > >>>+ bool read, bool write, uint64_t *skipped, > >>>+ bool notify_unmap) > >>>+{ > >>>+ bool read_cur, write_cur, entry_valid; > >>>+ uint32_t offset; > >>>+ uint64_t slpte; > >>>+ uint64_t subpage_size, subpage_mask; > >>>+ IOMMUTLBEntry entry; > >>>+ uint64_t iova = start; > >>>+ uint64_t iova_next; > >>>+ uint64_t skipped_local = 0; > >>>+ int ret = 0; > >>>+ > >>>+ trace_vtd_page_walk_level(addr, level, start, end); > >>>+ > >>>+ subpage_size = 1ULL << vtd_slpt_level_shift(level); > >>>+ subpage_mask = vtd_slpt_level_page_mask(level); > >>>+ > >>>+ while (iova < end) { > >>>+ iova_next = (iova & subpage_mask) + subpage_size; > >>>+ > >>>+ offset = vtd_iova_level_offset(iova, level); > >>>+ slpte = vtd_get_slpte(addr, offset); > >>>+ > >>>+ /* > >>>+ * When one of the following case happens, we assume the whole > >>>+ * range is invalid: > >>>+ * > >>>+ * 1. read block failed > >>Don't get the meaning (and don't see any code relate to this comment). > >I took above vtd_get_slpte() a "read", so I was trying to mean that we > >failed to read the SLPTE due to some reason, we assume the range is > >invalid. > > I see, so we'd better move the comment above of vtd_get_slpte(). Let me remove the whole comment block... I think the codes explained it clearly even without any comment. (when people see the code check slpte against -1, it'll think about above function naturally) > > > > >>>+ * 3. reserved area non-zero > >>>+ * 2. both read & write flag are not set > >>Should be 1,2,3? And the above comment is conflict with the code at least > >>when notify_unmap is true. > >Yes, okay I don't know why 3 jumped. :( > > > >And yes, I should mention that "both read & write flag not set" only > >suites for page tables here. > > > >>>+ */ > >>>+ > >>>+ if (slpte == (uint64_t)-1) { > >>If this is true, vtd_slpte_nonzero_rsvd(slpte) should be true too I think? > >Yes, but we are doing two checks here: > > > >- checking against -1 to make sure slpte read success > >- checking against nonzero reserved fields to make sure it follows spec > > > >Imho we should not skip the first check here, only if one day removing > >this may really matter (e.g., for performance reason? I cannot think > >of one yet). > > Ok. (return -1 seems not good, but we can address this in the future). Thanks. > > > > >>>+ trace_vtd_page_walk_skip_read(iova, iova_next); > >>>+ skipped_local++; > >>>+ goto next; > >>>+ } > >>>+ > >>>+ if (vtd_slpte_nonzero_rsvd(slpte, level)) { > >>>+ trace_vtd_page_walk_skip_reserve(iova, iova_next); > >>>+ skipped_local++; > >>>+ goto next; > >>>+ } > >>>+ > >>>+ /* Permissions are stacked with parents' */ > >>>+ read_cur = read && (slpte & VTD_SL_R); > >>>+ write_cur = write && (slpte & VTD_SL_W); > >>>+ > >>>+ /* > >>>+ * As long as we have either read/write permission, this is > >>>+ * a valid entry. The rule works for both page or page tables. > >>>+ */ > >>>+ entry_valid = read_cur | write_cur; > >>>+ > >>>+ if (vtd_is_last_slpte(slpte, level)) { > >>>+ entry.target_as = &address_space_memory; > >>>+ entry.iova = iova & subpage_mask; > >>>+ /* > >>>+ * This might be meaningless addr if (!read_cur && > >>>+ * !write_cur), but after all this field will be > >>>+ * meaningless in that case, so let's share the code to > >>>+ * generate the IOTLBs no matter it's an MAP or UNMAP > >>>+ */ > >>>+ entry.translated_addr = vtd_get_slpte_addr(slpte); > >>>+ entry.addr_mask = ~subpage_mask; > >>>+ entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur); > >>>+ if (!entry_valid && !notify_unmap) { > >>>+ trace_vtd_page_walk_skip_perm(iova, iova_next); > >>>+ skipped_local++; > >>>+ goto next; > >>>+ } > >>Under which case should we care about unmap here (better with a comment I > >>think)? > >When PSIs are for invalidation, rather than newly mapped entries. In > >that case, notify_unmap will be true, and here we need to notify > >IOMMU_NONE to do the cache flush or unmap. > > > >(this page walk is not only for replaying, it is for cache flushing as > > well) > > > >Do you have suggestion on the comments? > > I think then we'd better move this to patch 18 which will use notify_unmap. Do you mean to add some more comment on patch 18? > > > > >>>+ trace_vtd_page_walk_one(level, entry.iova, > >>>entry.translated_addr, > >>>+ entry.addr_mask, entry.perm); > >>>+ if (hook_fn) { > >>>+ ret = hook_fn(&entry, private); > >>For better performance, we could try to merge adjacent mappings here. I > >>think both vfio and vhost support this and it can save a lot of ioctls. > >Looks so, and this is in my todo list. > > > >Do you mind I do it later after this series merged? I would really > >appreciate if we can have this codes settled down first (considering > >that this series has been dangling for half a year, or more, startint > >from Aviv's series), and I am just afraid this will led to > >unconvergence of this series (and I believe there are other places > >that can be enhanced in the future as well). > > Yes, let's do it on top. Thanks. > > > > >>>+ if (ret < 0) { > >>>+ error_report("Detected error in page walk hook " > >>>+ "function, stop walk."); > >>>+ return ret; > >>>+ } > >>>+ } > >>>+ } else { > >>>+ if (!entry_valid) { > >>>+ trace_vtd_page_walk_skip_perm(iova, iova_next); > >>>+ skipped_local++; > >>>+ goto next; > >>>+ } > >>>+ trace_vtd_page_walk_level(vtd_get_slpte_addr(slpte), level - > >>>1, > >>>+ iova, MIN(iova_next, end)); > >>This looks duplicated? > >I suppose not. The level is different. > > Seem not? level - 1 was passed to vtd_page_walk_level() as level which did: > > trace_vtd_page_walk_level(addr, level, start, end); Hmm yes I didn't notice that I had one at the entry. :( Let me only keep that one. Thanks, -- peterx