Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Thu, 2008-01-10 at 13:01 +1100, Rusty Russell wrote: On Thursday 10 January 2008 09:10:37 James Bottomley wrote: On Tue, 2008-01-08 at 11:39 +1100, Rusty Russell wrote: On Tuesday 08 January 2008 02:48:23 James Bottomley wrote: We're always open to new APIs (or more powerful and expanded old ones). The way we've been doing the sg_chain conversion is to slide API layers into the drivers so sg_chain becomes a simple API flip when we turn it on. Unfortunately sg_ring doesn't quite fit nicely into this. Hi James, Well, it didn't touch any drivers. The only ones which needed altering were those which wanted to use large scatter-gather lists. You think of the subtlety of sg-chain conversion as a feature; it's a bug :) No, I think of the side effect of hiding scatterlist manipulations inside accessors as a feature because it insulates the drivers from the details of the implementation. I completely disagree. Abstractions add complexity, and that costs. They steal information from their users, and as they build up like sediment they make debugging and understanding harder. Don't be silly ... abstractions are the core of how we program something as complex as platform hardware. By design, abstractions hide information. Good abstractions hide information you don't need to know. Take the DMA API for instance: dma_map_sg() hides all of the platform specifics of whether or not you have an IOMMU and how it is programmed. That's a good abstraction ... drivers would be a complete mess if they had to do it themselves. For example, an array is simple and well understood. An abstraction would just buy confusion and YA thing to learn. When a single array was no longer sufficient, I figured a linked-list of arrays was the simplest replacement. Easy to understand and accepted practice (tho rings are a bit exotic). Because the implementation is the interface, authors can see what is legal. Really, I don't buy this at all. The first thing you complained about when you came up with sg_ring was how painful it was to update the 30 odd SCSI drivers. That was also my reaction when the scatterlist chaining idea came along. My second was supposing this isn't quite right and we modify the way chaining is done ... do I have to update all the drivers all over again? Hence one of the input requirements to the update was a consumer abstraction to insulate drivers from the implementation. Thinking about information that the drivers don't need to know, the obvious thing is how the actual scatterlist is implemented, whether its a list based ring, a pointer/mark based chain or an as yet to be invented annulus. All I think most consumers need is: 1. A way to get the first scatterlist element 2. A way to get the next scatterlist element 3. An iterator over the entire scatterlist 4. A way to tell if they've reached the last scatterlist element. Actually 1,2,4 satisfy 3 but it's such a common operation in most drivers that it was a good simplification. Other things are sizing and element count. More concretely, you're already regarding your abstractions as too expensive, which is why sg_chain() doesn't handle chained sgs. sg_chain isn't part of the consumer interface it's part of the producer/manipulator interface. I care a lot less about getting that one right first time because it only occurs in one place in the SCSI mid-layer and is easy to update and fix. My principal concern at the moment is the consumer interface ... if there's a fault in that we need to know now. Result: you've got a complex implementation and a complex interface with a complex abstraction. sg_chains suck for manipulation, and AFAICT that's inherent. Here, take a look at the sg_ring conversion of scsi_alloc_sgtable and scsi_free_sgtable and you can see why I'm unhappy with the sg_chain code: [...] Hope that clarifies, Actually, not really. If I want to continue the competition, I can just point out that your sg_ring code is bigger than those corresponding segments are after the sg_table conversion of scsi_lib.c ... However, this is pointless. No, it's exactly the point. These details *matter*. The implementation *matters*. sg_table moves this code out of scsi_lib (good!), but still demonstrates how much of a PITA they are to manipulate. As for being able to make arbitrary changes in future without hitting drivers: this is the Kernel ABI pipe dream writ small. No, the kernel ABI is about preserving binary semantics for proprietary drivers. We're talking about an API here which is a totally different thing. I don't believe I said anywhere that it should be a fixed API for all time ... that would also be silly. However, that doesn't mean you don't take the best care you can to get an API as right as you can. If it's wrong, it will be altered, but it is much nicer to get it right first time so it
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Tue, 2008-01-08 at 11:39 +1100, Rusty Russell wrote: On Tuesday 08 January 2008 02:48:23 James Bottomley wrote: We're always open to new APIs (or more powerful and expanded old ones). The way we've been doing the sg_chain conversion is to slide API layers into the drivers so sg_chain becomes a simple API flip when we turn it on. Unfortunately sg_ring doesn't quite fit nicely into this. Hi James, Well, it didn't touch any drivers. The only ones which needed altering were those which wanted to use large scatter-gather lists. You think of the subtlety of sg-chain conversion as a feature; it's a bug :) No, I think of the side effect of hiding scatterlist manipulations inside accessors as a feature because it insulates the drivers from the details of the implementation. The other thing I note is that the problem you're claiming to solve with sg_ring (the ability to add extra scatterlists to the front or the back of an existing one) is already solved with sg_chain, so the only real advantage of sg_ring was that it contains explicit counts, which sg_table (in -mm) also introduces. I just converted virtio using latest Linus for fair comparison Erm, but that's not really a fair comparison; you need the sg_table code in git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git branch sg as well. Actually, I think it's a wash. Now callers need to set up an sg_table as well. It will save the count_sg() calls though. , and it's still pretty ugly. sg_ring needs more work to de-scsi it. sg_table is almost sg_ring except it retains all the chaining warts. But we hit the same problems: 1) sg_chain loses information. The clever chain packaging makes reading easy, but manipulation is severely limited. You can append to your own chains by padding, but not someone elses. This works for SCSI, but what about the rest of us? And don't even think of joining mapped chains: it will almost work. OK, but realistically some of your criticisms are way outside of the design intent. Scatterlists (and now sg_chains) are the way the block subsystem hands pages to its underlying block devices. James, scatterlists are used for more than the block subsystem. I know you're designing for that, but we can do better. Because a single sg_ring is trivially convertable to and from a scatterlist *, the compatibility story is nice. You can implement a subsystem (say, the block layer) with sg_ring, and still hand out struct scatterlist arrays for backwards compat: old code won't ask for v. long scatterlist arrays anyway. Now we have sg_chaining, we can actually convert complex sg_rings into sg chains when someone asks, as my most recent patch does. I think that's one abstraction too many, but it provides a transition path. There have never until now been any requirements to join already dma_map_sg() converted scatterlists ... that would wreak havoc with the way we reuse the list plus damage the idempotence of map/unmap. What is the use case you have for this? (This was, admittedly, a made-up example). 2) sg_chain's end marker is only for reading non-dma elements, not for mapped chains, nor for writing into chains. Plus it's a duplicate of the num arg, which is still passed around for efficiency. (virtio had to implement count_sg() because I didn't want those two redundant num args). 3) By overloading struct scatterlist, it's invisible what code has been converted to chains, and which hasn't. Too clever by half! No it's not ... that's the whole point. Code not yet converted to use the API accessors is by definition unable to use chaining. Code converted to use the accessors by design doesn't care (and so is converted to chains). But you've overloaded the type: what's the difference between: /* Before conversion */ int foo(struct scatterlist *, int); And: /* After conversion */ int foo(struct scatterlist *, int); You have to wade through the implementation to see the difference: does this function take a chained scatterlist or an array scatterlist? Then you add insult to injury by implementing sg_chain() *which doesn't handle chained scatterlists!*. sg_ring would not have required any change to existing drivers, just those that want to use long sg lists. And then it's damn obvious which are which. Which, by definition, would have been pretty much all of them. But it would have started with none of them, and it would have been done over time. Eventually we might have had a flag day to remove raw scatterlist arrays. 4) sg_chain missed a chance to combine all the relevent information (max, num, num_dma and the sg array). sg_table comes close, but you still can't join them (no max information, and even if there were, what if it's full?). Unlike
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Thursday 10 January 2008 09:10:37 James Bottomley wrote: On Tue, 2008-01-08 at 11:39 +1100, Rusty Russell wrote: On Tuesday 08 January 2008 02:48:23 James Bottomley wrote: We're always open to new APIs (or more powerful and expanded old ones). The way we've been doing the sg_chain conversion is to slide API layers into the drivers so sg_chain becomes a simple API flip when we turn it on. Unfortunately sg_ring doesn't quite fit nicely into this. Hi James, Well, it didn't touch any drivers. The only ones which needed altering were those which wanted to use large scatter-gather lists. You think of the subtlety of sg-chain conversion as a feature; it's a bug :) No, I think of the side effect of hiding scatterlist manipulations inside accessors as a feature because it insulates the drivers from the details of the implementation. I completely disagree. Abstractions add complexity, and that costs. They steal information from their users, and as they build up like sediment they make debugging and understanding harder. For example, an array is simple and well understood. An abstraction would just buy confusion and YA thing to learn. When a single array was no longer sufficient, I figured a linked-list of arrays was the simplest replacement. Easy to understand and accepted practice (tho rings are a bit exotic). Because the implementation is the interface, authors can see what is legal. More concretely, you're already regarding your abstractions as too expensive, which is why sg_chain() doesn't handle chained sgs. Result: you've got a complex implementation and a complex interface with a complex abstraction. sg_chains suck for manipulation, and AFAICT that's inherent. Here, take a look at the sg_ring conversion of scsi_alloc_sgtable and scsi_free_sgtable and you can see why I'm unhappy with the sg_chain code: [...] Hope that clarifies, Actually, not really. If I want to continue the competition, I can just point out that your sg_ring code is bigger than those corresponding segments are after the sg_table conversion of scsi_lib.c ... However, this is pointless. No, it's exactly the point. These details *matter*. The implementation *matters*. sg_table moves this code out of scsi_lib (good!), but still demonstrates how much of a PITA they are to manipulate. As for being able to make arbitrary changes in future without hitting drivers: this is the Kernel ABI pipe dream writ small. OK, I give in with -ETIMEDOUT. I'll go away now and do something productive :) Cheers, Rusty. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
Rusty Russell wrote: On Monday 07 January 2008 17:37:41 Tejun Heo wrote: Rusty Russell wrote: Hi Tejun, Nice try! Even ignoring the ugliness of undoing such an operation if the caller doesn't expect you to mangle their chains, consider a one-element sg array. :( Heh heh, that can be dealt with by skipping the first chain if the first chain is empty after chaining. Please take a look at ata_sg_setup_extra() in the following. http://git.kernel.org/?p=linux/kernel/git/tj/libata-dev.git;a=blob;f=driver s/ata/libata-core.c;h=32dde5bbc75ed53e89ac17040da2cd0621a37161;hb=c8847e473a 4a2844244784226eb362be10d52ce9 That said, yeah, it's seriously ugly. Restoring the original sg is ugly too. I definitely agree that we need some improvements here. Erk, that's beyond ugly, into actual evil. /me agrees. To make this general you need to find the last N 1-element chains (but SCSI doesn't do this of course). Oh the horror... /me agrees. I'd be remiss if I didn't mention that the sg_ring ata patches were straightforward, and indescribably beautiful if compared to this! /me agrees. As long as this can be made sane using one unified interface, I don't care whether it's sg_ring, table or dish. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
Tejun Heo [EMAIL PROTECTED] wrote: /me agrees. As long as this can be made sane using one unified interface, I don't care whether it's sg_ring, table or dish. Give us sg_annulus :) On a more serious note, having played with SG chaining for a couple of years in the crypto layer, I must say Rusty's proposal is a great improvement for me. The reason is that if the crypto layer gets an SG list from its user, it is not at liberty to just modify it. As such chaining at the end is cumbersome with the current interface because we have to duplicate the whole list in order to add just a single block at the end. While with sg_ring we could just put the original list along with the new one in a new list. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Mon, 2008-01-07 at 15:38 +1100, Rusty Russell wrote: On Sunday 06 January 2008 02:31:12 James Bottomley wrote: On Wed, 2007-12-19 at 17:31 +1100, Rusty Russell wrote: This patch series is the start of my attempt to simplify and make explicit the chained scatterlist logic. It's not complete: my SATA box boots and seems happy, but all the other users of SCSI need to be updated and checked. But I've gotten far enough to believe it's worth persuing. Sorry for the delay in looking at this, I was busy with Holidays and things. Thankyou for your consideration. When I compare sg_ring with the current sg_chain (and later sg_table) implementations, I'm actually struck by how similar they are. I agree, they're solving the same problem. It is possible that the pain of change is no longer worthwhile, but I hate to see us give up on this. We're adding complexity without making it harder to misuse. We're always open to new APIs (or more powerful and expanded old ones). The way we've been doing the sg_chain conversion is to slide API layers into the drivers so sg_chain becomes a simple API flip when we turn it on. Unfortunately sg_ring doesn't quite fit nicely into this. The other thing I note is that the problem you're claiming to solve with sg_ring (the ability to add extra scatterlists to the front or the back of an existing one) is already solved with sg_chain, so the only real advantage of sg_ring was that it contains explicit counts, which sg_table (in -mm) also introduces. I just converted virtio using latest Linus for fair comparison Erm, but that's not really a fair comparison; you need the sg_table code in git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git branch sg as well. , and it's still pretty ugly. sg_ring needs more work to de-scsi it. sg_table is almost sg_ring except it retains all the chaining warts. But we hit the same problems: 1) sg_chain loses information. The clever chain packaging makes reading easy, but manipulation is severely limited. You can append to your own chains by padding, but not someone elses. This works for SCSI, but what about the rest of us? And don't even think of joining mapped chains: it will almost work. OK, but realistically some of your criticisms are way outside of the design intent. Scatterlists (and now sg_chains) are the way the block subsystem hands pages to its underlying block devices. We do this through two APIs: blk_rq_map_sg() to take a request and place all the pages into a scatterlist respecting the merging parameters and dma_map_sg() which takes the scatterlist and produces an arch specific DMA mapped scatterlist reusing the old list. The requirement is that dma_unmap_sg return the chain to its former state (so block device may map, and unmap between congestion waits to avoid running systems out of scarce IOMMU mappings), but there's not even a parallel blk_rq_unmap_sg() beacuse the scatterlist is assumed disposable by the block driver after the request completes. By someone elses I'm assuming you mean a chain in a stacked block driver? You are actually allowed to transform these ... depending on who does the dma mapping, of course, if the lower driver (yours) does DMA mapping, you can do anything you want. If the upper driver does, then you have to preserve idempotence. There have never until now been any requirements to join already dma_map_sg() converted scatterlists ... that would wreak havoc with the way we reuse the list plus damage the idempotence of map/unmap. What is the use case you have for this? 2) sg_chain's end marker is only for reading non-dma elements, not for mapped chains, nor for writing into chains. Plus it's a duplicate of the num arg, which is still passed around for efficiency. (virtio had to implement count_sg() because I didn't want those two redundant num args). 3) By overloading struct scatterlist, it's invisible what code has been converted to chains, and which hasn't. Too clever by half! No it's not ... that's the whole point. Code not yet converted to use the API accessors is by definition unable to use chaining. Code converted to use the accessors by design doesn't care (and so is converted to chains). Look at sg_chain(): it claims to join two scatterlists, but it doesn't. It assumes that prv is an array, not a chain. Because you've overloaded an existing type, this happens everywhere. Try passing skb_to_sgvec a chained skb. sg_ring would not have required any change to existing drivers, just those that want to use long sg lists. And then it's damn obvious which are which. Which, by definition, would have been pretty much all of them. Another driving factor is that the mempool allocation of scatterlists is suboptimal in terms of their bucket size, so we were looking to tune that once the chaining code was in. The guess is that we'd probalby end up with a uniform size
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Sunday 06 January 2008 02:31:12 James Bottomley wrote: On Wed, 2007-12-19 at 17:31 +1100, Rusty Russell wrote: This patch series is the start of my attempt to simplify and make explicit the chained scatterlist logic. It's not complete: my SATA box boots and seems happy, but all the other users of SCSI need to be updated and checked. But I've gotten far enough to believe it's worth persuing. Sorry for the delay in looking at this, I was busy with Holidays and things. Thankyou for your consideration. When I compare sg_ring with the current sg_chain (and later sg_table) implementations, I'm actually struck by how similar they are. I agree, they're solving the same problem. It is possible that the pain of change is no longer worthwhile, but I hate to see us give up on this. We're adding complexity without making it harder to misuse. The other thing I note is that the problem you're claiming to solve with sg_ring (the ability to add extra scatterlists to the front or the back of an existing one) is already solved with sg_chain, so the only real advantage of sg_ring was that it contains explicit counts, which sg_table (in -mm) also introduces. I just converted virtio using latest Linus for fair comparison, and it's still pretty ugly. sg_ring needs more work to de-scsi it. sg_table is almost sg_ring except it retains all the chaining warts. But we hit the same problems: 1) sg_chain loses information. The clever chain packaging makes reading easy, but manipulation is severely limited. You can append to your own chains by padding, but not someone elses. This works for SCSI, but what about the rest of us? And don't even think of joining mapped chains: it will almost work. 2) sg_chain's end marker is only for reading non-dma elements, not for mapped chains, nor for writing into chains. Plus it's a duplicate of the num arg, which is still passed around for efficiency. (virtio had to implement count_sg() because I didn't want those two redundant num args). 3) By overloading struct scatterlist, it's invisible what code has been converted to chains, and which hasn't. Too clever by half! Look at sg_chain(): it claims to join two scatterlists, but it doesn't. It assumes that prv is an array, not a chain. Because you've overloaded an existing type, this happens everywhere. Try passing skb_to_sgvec a chained skb. sg_ring would not have required any change to existing drivers, just those that want to use long sg lists. And then it's damn obvious which are which. 4) sg_chain missed a chance to combine all the relevent information (max, num, num_dma and the sg array). sg_table comes close, but you still can't join them (no max information, and even if there were, what if it's full?). Unlike sg_ring, it's unlikely to reduce bugs. 5) (A little moot now) sg_ring didn't require arch changes. The other differences are that sg_ring only allows adding at the front or back of an existing sg_ring, it doesn't allow splicing at any point like sg_chain does, so I'd say it's less functional (not that I actually want anyone ever to do this, of course ...) Well it's just as possible, but you might have to copy more elements (with sg chaining you need only copy 1 sg element to make room for the chain elem). Agreed that it's a little out there... The final point is that sg_ring requires a two level traversal: ring list then scatterlist, whereas sg_chain only requires a single level traversal. I grant that we can abstract out the traversal into something that would make users think they're only doing a single level, but I don't see what the extra level really buys us. We hide the real complexity from users and it makes it less safe for non-trivial cases. Hence the introduction of YA datastructure: sg_table. This is getting close: just hang the array off it and you'll have sg_ring and no requirement for dynamic alloc all the time. And once you have a header, no need for chaining tricks... The only thing missing from sg_chain perhaps is an accessor function that does the splicing, which I can easily construct if you want to try it out in virtio. I don't need that (I prepend and append), but it'd be nice if sg_next took a const struct scatterlist *. Cheers, Rusty. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
Hello, Rusty Russell wrote: The other thing I note is that the problem you're claiming to solve with sg_ring (the ability to add extra scatterlists to the front or the back of an existing one) is already solved with sg_chain, so the only real advantage of sg_ring was that it contains explicit counts, which sg_table (in -mm) also introduces. I just converted virtio using latest Linus for fair comparison, and it's still pretty ugly. sg_ring needs more work to de-scsi it. sg_table is almost sg_ring except it retains all the chaining warts. I agree it's ugly. But we hit the same problems: 1) sg_chain loses information. The clever chain packaging makes reading easy, but manipulation is severely limited. You can append to your own chains by padding, but not someone elses. This works for SCSI, but what about the rest of us? And don't even think of joining mapped chains: it will almost work. You can append by allocating one more element on the chain to be appended and moving the last element of the first chain to it while using the last element for chaining. Join can be done by similar technique using three element extra chain in the middle. But, yeah, it's ugly as hell. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Monday 07 January 2008 16:01:40 Tejun Heo wrote: But we hit the same problems: 1) sg_chain loses information. The clever chain packaging makes reading easy, but manipulation is severely limited. You can append to your own chains by padding, but not someone elses. This works for SCSI, but what about the rest of us? And don't even think of joining mapped chains: it will almost work. You can append by allocating one more element on the chain to be appended and moving the last element of the first chain to it while using the last element for chaining. Hi Tejun, Nice try! Even ignoring the ugliness of undoing such an operation if the caller doesn't expect you to mangle their chains, consider a one-element sg array. :( Cheers, Rusty. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] sg_ring: a ring of scatterlist arrays
On Wed, 2007-12-19 at 17:31 +1100, Rusty Russell wrote: This patch series is the start of my attempt to simplify and make explicit the chained scatterlist logic. It's not complete: my SATA box boots and seems happy, but all the other users of SCSI need to be updated and checked. But I've gotten far enough to believe it's worth persuing. Sorry for the delay in looking at this, I was busy with Holidays and things. When I compare sg_ring with the current sg_chain (and later sg_table) implementations, I'm actually struck by how similar they are. The other thing I note is that the problem you're claiming to solve with sg_ring (the ability to add extra scatterlists to the front or the back of an existing one) is already solved with sg_chain, so the only real advantage of sg_ring was that it contains explicit counts, which sg_table (in -mm) also introduces. The other differences are that sg_ring only allows adding at the front or back of an existing sg_ring, it doesn't allow splicing at any point like sg_chain does, so I'd say it's less functional (not that I actually want anyone ever to do this, of course ...) The final point is that sg_ring requires a two level traversal: ring list then scatterlist, whereas sg_chain only requires a single level traversal. I grant that we can abstract out the traversal into something that would make users think they're only doing a single level, but I don't see what the extra level really buys us. The above analysis seems to suggest that sg_chain is simpler and has more functionality than sg_ring, unless I've missed anything? The only thing missing from sg_chain perhaps is an accessor function that does the splicing, which I can easily construct if you want to try it out in virtio. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html