On Fri, 2 Dec 2016, Dario Faggioli wrote:
> On Thu, 2016-12-01 at 15:14 -0800, Stefano Stabellini wrote:
> > On Thu, 1 Dec 2016, Dario Faggioli wrote:
> > >
> > > On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> > > >
> > > > ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
> > > >
> > > blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash
> > > before
> > > the index). Not a big deal, I guess, but I thought it could be nice
> > > to
> > > be a bit more uniform.
> >
> > Sure, but in this case each ring-ref-%u is used to map a different
> > ring.
> >
> Yeah, right. So it may even be a good thing to differentiate, indeed...
>
> > That said, I can make the change.
> >
> I don't know. I, FWIW, thought it would be good, now I'm not so sure
> any longer. Yours and maintainers' call, I guess. :-)
>
> > > If it is, what's the typical envisioned use of these multiple
> > > rings, if
> > > I can ask?
> >
> > They are used to handle multiple read/write requests in parallel.
> > Let's
> > assume that we configure the rings to be 8K each. Given that the data
> > is
> > transmitted over the ring, each ring can hold only one outstanding 4K
> > write request (there is an header for each write request).
> >
> Ok.
>
> > With two 8K rings, we can have two outstanding 4K write requests,
> > each
> > of them processed in parallel on a different vcpu.
> >
> > The system is completely configurable in terms of number and size of
> > rings, so a user can configure it to only export one 4K ring for
> > example or go as far as several 2MB rings.
> >
> Right. So, it is indeed similar to blkif multiqueueing, with which it
> also shares the idea/objective of exploiting parallelism at the (v)CPU
> level, but without (quite obviously, in this case) any direct link to
> hardware queues in disk controllers, and without the protocol itself
> giving any direction or indication of how to actually use all this.
>
> Got it. Nice.
>
> FWIW, I think a few words --just a shorter version of what you just
> said-- may be useful if present in this document.
I'll do.
> > > > /* not actually C compliant (ring_order changes from socket
> > > > to
> > > > socket) */
> > > > struct ring_data {
> > > > char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > > char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > > };
> > > >
> > > Sorry, what does "ring_order changes from socket to socket" mean?
> >
> > Woops, copy/paste error from PVCalls. I meant "ring_order changes
> > from
> > ring to ring".
> >
> Ah, yes, now it makes sense. :-)
>
> BTW, what's the reason for putting ring_order inside xen_9pfs_intf,
> instead of having a ring-page-order (well, actually, a
> ring-%u-page-order) xenstore key?
So that each ring can have a different size.
> > > > The binary layout of `struct xen_9pfs_intf` follows:
> > > >
> > > > 0 4 8 12 16 20
> > > > +---------+---------+---------+---------+---------+
> > > > | in_cons | in_prod |out_cons |out_prod |ring_orde|
> > > > +---------+---------+---------+---------+---------+
> > > >
> > > > 20 24 26 4092 4096
> > > > +---------+---------+----//---+---------+
> > > > | ref[0] | ref[1] | | ref[N] |
> > > > +---------+---------+----//---+---------+
> > > >
> > > > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given
> > > > that
> > > > N
> > > > needs to be a power of two, actually max N is 512.
> > > >
> > > It may again be me being still too naive, but I'd quickly add at
> > > least
> > > another example, with the value of N computed for a multiple pages
> > > ring. Something like: "For 4 pages (i.e., ring_orfer=2), N is..."
> >
> > For 4 pages, N is 4. N is the number of pages that make up the ring.
> >
> > Maybe there is a misunderstanding, let me try to explain it again:
> > each
> > page shared via xenstore contains information to handle one new ring,
> > including grant references for the pages making up the multipage ring
> > itself. I'll repeat: pages shared via xenstore are not used as a
> > ring,
> > they are used to setup the rings, each page has the info to setup a
> > new
> > ring.
> >
> Right, I got this. And indeed I expressed myself very badly above.
>
> So, the descriptor of 1 ring is one page. Such page contains, in signle
> page rings, the reference to another page, which is the actual ring. If
> the ring is multi-page, the descriptor page contains an array of page
> references which, together, are the actual ring.
Right
> Such array --of which N is, in the diagram above, the last index-- can
> be, as you say, up to 1019 elements big (the available space in a ring
> descriptor page). Therefore, the math I was asking about is really the
> relationship between N and max-ring-page-order. That is, a ring can
> have at most 2^max-ring-page-order pages, and N can be at most 1019
> (well, I think it's 1018 if, as in diagram above, you count from 0, but
> that does not matter much); so:
>
> 2^max-ring-page-order <= N
> lb(2^max-ring-page-order) <= lb(N) //lb(): base 2 logarithm
> max-ring-page-order <= lb(N)
>
> and, considering that max-ring-page-order must be a natural number:
>
> max-ring-page-order <= floor(lb(N))
> max-ring-page-order <= floor(lb(1018))
> max-ring-page-order <= floor(9.9915)
> max-ring-page-order <= 9
>
> so a ring can be at most 2^9 pages big, which indeed matches with your
> own calculations, and bring us to the fact that the maximum size of a
> ring is 512*4Kb=2Mb
>
> So, to recap (sorry for being so long!), I think that saying:
>
> "**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
> N needs to be a power of two, actually max N is 512."
>
> is indeed correct, and probably makes it enough clear that the maximum
> ring size is 2MB. It's not equally easy, IMO, to map this back to the
> fact that this also mean max-ring-page-order must be at most 9, and
> that is not spelled out anywhere else, AFAICT.
>
> Therefore, an example of how things look with a couple of different
> values of ring_order, or some shorter and less boring version of this
> reasoning and calculations may help with that. That's what I'm trying
> to say. :-)
I'll expand the N.B. to cover this.
> > The structure of these "setup pages" is `struct xen_9pfs_intf`. Each
> > page is completely separate and independent from the others. Given
> > that
> > one page is just 4096 bytes, it can contain max 512 grant refs (see
> > calculation above). So the max size of one multipage ring is 512
> > pages =
> > 2MB.
> >
> > Does it make sense?
> >
> It does, and this is probably me being a mix of, not too used to this,
> and too picky... If that's the case, sorry for the noise. :-D
No worries, thanks for the review!
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel