Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-23 Thread Halil Pasic
On Wed, 8 May 2019 15:18:10 +0200 (CEST)
Sebastian Ott  wrote:

> > @@ -224,6 +228,9 @@ struct subchannel *css_alloc_subchannel(struct subchannel_id schid,
> > 	INIT_WORK(&sch->todo_work, css_sch_todo);
> > 	sch->dev.release = &css_subchannel_release;
> > 	device_initialize(&sch->dev);
> > +	sch->dma_mask = css_dev_dma_mask;
> > +	sch->dev.dma_mask = &sch->dma_mask;
> > +	sch->dev.coherent_dma_mask = sch->dma_mask;
> 
> Could we do:
>   sch->dev.dma_mask = &sch->dev.coherent_dma_mask;
>   sch->dev.coherent_dma_mask = css_dev_dma_mask;
> ?

Looks like a good idea to me. We will do it for all 3 (sch, ccw and
css). Thanks!
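
For illustration, the agreed direction would look roughly like the sketch
below (not the final v2 code; the css part is extrapolated from the
suggestion above):

	/* subchannel: let dma_mask point at the coherent mask, set it once */
	sch->dev.coherent_dma_mask = css_dev_dma_mask;
	sch->dev.dma_mask = &sch->dev.coherent_dma_mask;

	/* css device: same pattern; cio DMA memory must be 31 bit addressable */
	css->device.coherent_dma_mask = css_dev_dma_mask;
	css->device.dma_mask = &css->device.coherent_dma_mask;

Presumably the subchannel's own dma_mask member could then be dropped as well.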

Regards,
Halil



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-22 Thread Halil Pasic
On Wed, 22 May 2019 14:07:05 +0200 (CEST)
Sebastian Ott  wrote:

> On Mon, 20 May 2019, Halil Pasic wrote:
> > On Thu, 16 May 2019 15:59:22 +0200 (CEST)
> > Sebastian Ott  wrote:
> > > We only have a couple of users for airq_iv:
> > > 
> > > virtio_ccw.c: 2K bits
> > 
> > You mean a single allocation is 2k bits (VIRTIO_IV_BITS = 256 * 8)? My
> > understanding is that the upper bound is more like:
> > MAX_AIRQ_AREAS * VIRTIO_IV_BITS = 20 * 256 * 8 = 40960 bits.
> > 
> > In practice it is most likely just 2k.
> > 
> > > 
> > > pci with floating IRQs: <= 2K (for the per-function bit vectors)
> > > 1..4K (for the summary bit vector)
> > >
> > 
> > As far as I can tell with virtio_pci arch_setup_msi_irqs() gets called
> > once per device and allocates a small number of bits (2 and 3 in my
> > test, it may depend on #virtqueues, but I did not check).
> > 
> > So for an upper bound we would have to multiply with the upper bound of
> > pci devices/functions. What is the upper bound on the number of
> > functions?
> > 
> > > pci with CPU directed IRQs: 2K (for the per-CPU bit vectors)
> > > 1..nr_cpu (for the summary bit vector)
> > > 
> > 
> > I guess this is the same.
> > 
> > > 
> > > The options are:
> > > * page allocations for everything
> > 
> > Worst case we need 20 + #max_pci_dev pages. At the moment we allocate
> > from ZONE_DMA (!) and waste a lot.
> > 
> > > * dma_pool for AIRQ_IV_CACHELINE, gen_pool for others
> > 
> > I prefer this. Explanation follows.
> > 
> > > * dma_pool for everything
> > > 
> > 
> > Less waste, by a factor of 16.
> > 
> > > I think we should do option 3 and use a dma_pool with cachesize
> > > alignment for everything (as a prerequisite we have to limit
> > > config PCI_NR_FUNCTIONS to 2K - but that is not a real constraint).
> > 
> > I prefer option 3 because it is conceptually the smallest change, and
>   ^
>   2
> > provides the behavior which is closest to the current one.
> 
> I can see that this is the smallest change on top of the current
> implementation. I'm good with doing that and looking for further
> simplification/unification later.
> 

Nod. I will go with that for v2.
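
For concreteness, "dma_pool for everything" (what the proposal above calls
option 3) could look roughly like the sketch below. The pool name, the
256-byte block size (2K bits, assuming PCI_NR_FUNCTIONS is capped at 2K) and
the cache-line alignment are illustrative assumptions, not the actual v2 code:

	/* sketch: one cache line aligned DMA pool backing all airq bit vectors */
	struct dma_pool *airq_iv_dma_pool;
	dma_addr_t dma;

	airq_iv_dma_pool = dma_pool_create("airq_iv", cio_get_dma_css_dev(),
					   256 /* 2K bits */, L1_CACHE_BYTES, 0);
	/* error handling omitted in this sketch */
	iv->vector = dma_pool_zalloc(airq_iv_dma_pool, GFP_KERNEL, &dma);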

> 
> > Commit 414cbd1e3d14 "s390/airq: provide cacheline aligned ivs"
> > (Sebastian Ott, 2019-02-27) could have been smaller had you implemented
> > 'kmem_cache for everything' (and I would have had just to replace
> > kmem_cache with dma_cache to achieve option 3). For some reason you
> > decided to keep the iv->vector = kzalloc(size, GFP_KERNEL) code-path
> > and make the client code request
> > iv->vector = kmem_cache_zalloc(airq_iv_cache, GFP_KERNEL) explicitly,
> > using a flag which you only decided to use for directed pci irqs AFAICT.
> > 
> > My understanding of these decisions, and especially of the rationale
> > behind commit 414cbd1e3d14 is limited.
> 
> I introduced per cpu interrupt vectors and wanted to prevent 2 CPUs from
> sharing data from the same cacheline. No other user of the airq stuff had
> this need. If I had been aware of the additional complexity we would add
> on top of that maybe I would have made a different decision.

I understand. Thanks for the explanation!

Regards,
Halil



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-22 Thread Sebastian Ott
On Mon, 20 May 2019, Halil Pasic wrote:
> On Thu, 16 May 2019 15:59:22 +0200 (CEST)
> Sebastian Ott  wrote:
> > We only have a couple of users for airq_iv:
> > 
> > virtio_ccw.c: 2K bits
> 
> You mean a single allocation is 2k bits (VIRTIO_IV_BITS = 256 * 8)? My
> understanding is that the upper bound is more like:
> MAX_AIRQ_AREAS * VIRTIO_IV_BITS = 20 * 256 * 8 = 40960 bits.
> 
> In practice it is most likely just 2k.
> 
> > 
> > pci with floating IRQs: <= 2K (for the per-function bit vectors)
> > 1..4K (for the summary bit vector)
> >
> 
> As far as I can tell with virtio_pci arch_setup_msi_irqs() gets called
> once per device and allocates a small number of bits (2 and 3 in my
> test, it may depend on #virtqueues, but I did not check).
> 
> So for an upper bound we would have to multiply with the upper bound of
> pci devices/functions. What is the upper bound on the number of
> functions?
> 
> > pci with CPU directed IRQs: 2K (for the per-CPU bit vectors)
> > 1..nr_cpu (for the summary bit vector)
> > 
> 
> I guess this is the same.
> 
> > 
> > The options are:
> > * page allocations for everything
> 
> Worst case we need 20 + #max_pci_dev pages. At the moment we allocate
> from ZONE_DMA (!) and waste a lot.
> 
> > * dma_pool for AIRQ_IV_CACHELINE, gen_pool for others
> 
> I prefer this. Explanation follows.
> 
> > * dma_pool for everything
> > 
> 
> Less waste, by a factor of 16.
> 
> > I think we should do option 3 and use a dma_pool with cachesize
> > alignment for everything (as a prerequisite we have to limit
> > config PCI_NR_FUNCTIONS to 2K - but that is not a real constraint).
> 
> I prefer option 3 because it is conceptually the smallest change, and
  ^
  2
> provides the behavior which is closest to the current one.

I can see that this is the smallest change on top of the current
implementation. I'm good with doing that and looking for further
simplification/unification later.


> Commit 414cbd1e3d14 "s390/airq: provide cacheline aligned ivs"
> (Sebastian Ott, 2019-02-27) could have been smaller had you implemented
> 'kmem_cache for everything' (and I would have had just to replace
> kmem_cache with dma_cache to achieve option 3). For some reason you
> decided to keep the iv->vector = kzalloc(size, GFP_KERNEL) code-path
> and make the client code request
> iv->vector = kmem_cache_zalloc(airq_iv_cache, GFP_KERNEL) explicitly,
> using a flag which you only decided to use for directed pci irqs AFAICT.
> 
> My understanding of these decisions, and especially of the rationale
> behind commit 414cbd1e3d14 is limited.

I introduced per cpu interrupt vectors and wanted to prevent 2 CPUs from
sharing data from the same cacheline. No other user of the airq stuff had
this need. If I had been aware of the additional complexity we would add
on top of that maybe I would have made a different decision.



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-21 Thread Michael Mueller




On 20.05.19 14:13, Halil Pasic wrote:

On Thu, 16 May 2019 15:59:22 +0200 (CEST)
Sebastian Ott  wrote:


On Sun, 12 May 2019, Halil Pasic wrote:

I've also got code that deals with AIRQ_IV_CACHELINE by turning the
kmem_cache into a dma_pool.

Cornelia, Sebastian which approach do you prefer:
1) get rid of cio_dma_pool and AIRQ_IV_CACHELINE, and waste a page per
vector, or
2) go with the approach taken by the patch below?


We only have a couple of users for airq_iv:

virtio_ccw.c: 2K bits



You mean a single allocation is 2k bits (VIRTIO_IV_BITS = 256 * 8)? My
understanding is that the upper bound is more like:
MAX_AIRQ_AREAS * VIRTIO_IV_BITS = 20 * 256 * 8 = 40960 bits.

In practice it is most likely just 2k.



pci with floating IRQs: <= 2K (for the per-function bit vectors)
 1..4K (for the summary bit vector)



As far as I can tell with virtio_pci arch_setup_msi_irqs() gets called
once per device and allocates a small number of bits (2 and 3 in my
test, it may depend on #virtqueues, but I did not check).

So for an upper bound we would have to multiply with the upper bound of
pci devices/functions. What is the upper bound on the number of
functions?


pci with CPU directed IRQs: 2K (for the per-CPU bit vectors)
 1..nr_cpu (for the summary bit vector)



I guess this is the same.



The options are:
* page allocations for everything


Worst case we need 20 + #max_pci_dev pages. At the moment we allocate
from ZONE_DMA (!) and waste a lot.


* dma_pool for AIRQ_IV_CACHELINE, gen_pool for others


I prefer this. Explanation follows.


* dma_pool for everything



Less waste, by a factor of 16.


I think we should do option 3 and use a dma_pool with cachesize
alignment for everything (as a prerequisite we have to limit
config PCI_NR_FUNCTIONS to 2K - but that is not a real constraint).



I prefer option 3 because it is conceptually the smallest change, and
provides the behavior which is closest to the current one.

Commit 414cbd1e3d14 "s390/airq: provide cacheline aligned ivs"
(Sebastian Ott, 2019-02-27) could have been smaller had you implemented
'kmem_cache for everything' (and I would have had just to replace
kmem_cache with dma_cache to achieve option 3). For some reason you
decided to keep the iv->vector = kzalloc(size, GFP_KERNEL) code-path
and make the client code request
iv->vector = kmem_cache_zalloc(airq_iv_cache, GFP_KERNEL) explicitly,
using a flag which you only decided to use for directed pci irqs AFAICT.

My understanding of these decisions, and especially of the rationale
behind commit 414cbd1e3d14 is limited. Thus if option 3 is the way to
go, and the choices made by 414cbd1e3d14 were sub-optimal, I would feel
much more comfortable if you provided a patch that revises and switches
everything to kmem_cache. I would then just swap kmem_cache out with a
dma_cache and my change would end up a straightforward and relatively
clean one.

So Sebastian, what shall we do?

Regards,
Halil

Sebastian

Folks, I had a version running with slight changes to the initial
v1 patch set together with a revert of 414cbd1e3d14 "s390/airq: provide
cacheline aligned ivs". That of course has the deficit of the memory
usage pattern.

Now you are discussing some substantial changes. The exercise was to
get an initial working code through the door. We really need a decision!


Michael



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-20 Thread Halil Pasic
On Thu, 16 May 2019 15:59:22 +0200 (CEST)
Sebastian Ott  wrote:

> On Sun, 12 May 2019, Halil Pasic wrote:
> > I've also got code that deals with AIRQ_IV_CACHELINE by turning the
> > kmem_cache into a dma_pool.
> > 
> > Cornelia, Sebastian which approach do you prefer:
> > 1) get rid of cio_dma_pool and AIRQ_IV_CACHELINE, and waste a page per
> > vector, or
> > 2) go with the approach taken by the patch below?
> 
> We only have a couple of users for airq_iv:
> 
> virtio_ccw.c: 2K bits


You mean a single allocation is 2k bits (VIRTIO_IV_BITS = 256 * 8)? My
understanding is that the upper bound is more like:
MAX_AIRQ_AREAS * VIRTIO_IV_BITS = 20 * 256 * 8 = 40960 bits.

In practice it is most likely just 2k.

> 
> pci with floating IRQs: <= 2K (for the per-function bit vectors)
> 1..4K (for the summary bit vector)
>

As far as I can tell with virtio_pci arch_setup_msi_irqs() gets called
once per device and allocates a small number of bits (2 and 3 in my
test, it may depend on #virtqueues, but I did not check).

So for an upper bound we would have to multiply with the upper bound of
pci devices/functions. What is the upper bound on the number of
functions?

> pci with CPU directed IRQs: 2K (for the per-CPU bit vectors)
> 1..nr_cpu (for the summary bit vector)
> 

I guess this is the same.

> 
> The options are:
> * page allocations for everything

Worst case we need 20 + #max_pci_dev pages. At the moment we allocate
from ZONE_DMA (!) and waste a lot.

> * dma_pool for AIRQ_IV_CACHELINE, gen_pool for others

I prefer this. Explanation follows.

> * dma_pool for everything
> 

Less waste, by a factor of 16.

> I think we should do option 3 and use a dma_pool with cachesize
> alignment for everything (as a prerequisite we have to limit
> config PCI_NR_FUNCTIONS to 2K - but that is not a real constraint).
>

I prefer option 3 because it is conceptually the smallest change, and
provides the behavior which is closest to the current one.

Commit 414cbd1e3d14 "s390/airq: provide cacheline aligned ivs"
(Sebastian Ott, 2019-02-27) could have been smaller had you implemented
'kmem_cache for everything' (and I would have had just to replace
kmem_cache with dma_cache to achieve option 3). For some reason you
decided to keep the iv->vector = kzalloc(size, GFP_KERNEL) code-path
and make the client code request
iv->vector = kmem_cache_zalloc(airq_iv_cache, GFP_KERNEL) explicitly,
using a flag which you only decided to use for directed pci irqs AFAICT.
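
For readers not looking at airq.c: the split being described is roughly the
following (simplified, not a verbatim quote of 414cbd1e3d14):

	/* airq_iv_create(), existing allocation split, simplified */
	if (flags & AIRQ_IV_CACHELINE)
		iv->vector = kmem_cache_zalloc(airq_iv_cache, GFP_KERNEL);
	else
		iv->vector = kzalloc(size, GFP_KERNEL);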

My understanding of these decisions, and especially of the rationale
behind commit 414cbd1e3d14 is limited. Thus if option 3 is the way to
go, and the choices made by 414cbd1e3d14 were sub-optimal, I would feel
much more comfortable if you provided a patch that revises and switches
everything to kmem_cache. I would then just swap kmem_cache out with a
dma_cache and my change would end up a straightforward and relatively
clean one.

So Sebastian, what shall we do?

Regards,
Halil




> Sebastian
> 



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-16 Thread Sebastian Ott
On Sun, 12 May 2019, Halil Pasic wrote:
> I've also got code that deals with AIRQ_IV_CACHELINE by turning the
> kmem_cache into a dma_pool.
> 
> Cornelia, Sebastian which approach do you prefer:
> 1) get rid of cio_dma_pool and AIRQ_IV_CACHELINE, and waste a page per
> vector, or
> 2) go with the approach taken by the patch below?

We only have a couple of users for airq_iv:

virtio_ccw.c: 2K bits

pci with floating IRQs: <= 2K (for the per-function bit vectors)
1..4K (for the summary bit vector)

pci with CPU directed IRQs: 2K (for the per-CPU bit vectors)
1..nr_cpu (for the summary bit vector)


The options are:
* page allocations for everything
* dma_pool for AIRQ_IV_CACHELINE, gen_pool for others
* dma_pool for everything

I think we should do option 3 and use a dma_pool with cachesize
alignment for everything (as a prerequisite we have to limit
config PCI_NR_FUNCTIONS to 2K - but that is not a real constraint).

Sebastian



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-16 Thread Cornelia Huck
On Wed, 15 May 2019 19:12:57 +0200
Halil Pasic  wrote:

> On Mon, 13 May 2019 15:29:24 +0200
> Cornelia Huck  wrote:
> 
> > On Sun, 12 May 2019 20:22:56 +0200
> > Halil Pasic  wrote:
> >   
> > > On Fri, 10 May 2019 16:10:13 +0200
> > > Cornelia Huck  wrote:
> > >   
> > > > On Fri, 10 May 2019 00:11:12 +0200
> > > > Halil Pasic  wrote:
> > > > 
> > > > > On Thu, 9 May 2019 12:11:06 +0200
> > > > > Cornelia Huck  wrote:
> > > > > 
> > > > > > On Wed, 8 May 2019 23:22:10 +0200
> > > > > > Halil Pasic  wrote:
> > > > > >   
> > > > > > > On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> > > > > > > Sebastian Ott  wrote:  
> > > > > >   
> > > > > > > > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > > > > > > > >   
> > > > > > > > > unregister_reboot_notifier(&css_reboot_notifier);
> > > > > > > > >   goto out_unregister;
> > > > > > > > >   }
> > > > > > > > > + cio_dma_pool_init();  
> > > > > > > > 
> > > > > > > > This is too late for early devices (ccw console!).
> > > > > > > 
> > > > > > > You have already raised concern about this last time (thanks). I think,
> > > > > > > I've addressed this issue: the cio_dma_pool is only used by the airq
> > > > > > > stuff. I don't think the ccw console needs it. Please have another look
> > > > > > > at patch #6, and explain your concern in more detail if it persists.
> > > > > > 
> > > > > > What about changing the naming/adding comments here, so that (1) 
> > > > > > folks
> > > > > > aren't confused by the same thing in the future and (2) folks don't 
> > > > > > try
> > > > > > to use that pool for something needed for the early ccw consoles?
> > > > > >   
> > > > > 
> > > > > I'm all for clarity! Suggestions for better names?
> > > > 
> > > > css_aiv_dma_pool, maybe? Or is there other cross-device stuff that may
> > > > need it?
> > > > 
> > > 
> > > Ouch! I was considering to use cio_dma_zalloc() for the adapter
> > > interruption vectors but I ended up between the two chairs in the end.
> > > So with this series there are no uses for cio_dma pool.
> > > 
> > > I don't feel strongly about this going one way the other.
> > > 
> > > What speaks against getting rid of the cio_dma_pool and sticking with
> > > dma_alloc_coherent() is that we waste a DMA page per vector, which is a
> > > non-obvious side effect.
> > 
> > That would basically mean one DMA page per virtio-ccw device, right?  
> 
> Not quite: virtio-ccw shares airq vectors among multiple devices. We
> alloc 64 bytes a time and use that as long as we don't run out of bits.
>  
> > For single queue devices, this seems like quite a bit of overhead.
> >  
> 
> Nod.
>  
> > Are we expecting many devices in use per guest?  
> 
> This affects Linux in general, not only guest 2 stuff (i.e. we
> also waste in vanilla lpar mode). And zpci seems to do at least one
> airq_iv_create() per pci function.

Hm, I thought that guests with virtio-ccw devices were the most heavy
user of this.

On LPAR (which would be the environment where I'd expect lots of
devices), how many users of this infrastructure do we typically have?
DASD do not use adapter interrupts, and QDIO devices (qeth/zfcp) use
their own indicator handling (that pre-dates this infrastructure) IIRC,
which basically only leaves the PCI functions you mention above.

> 
> >   
> > > 
> > > What speaks against cio_dma_pool is that it is slightly more code, and
> > > this currently can not be used for very early stuff, which I don't
> > > think is relevant.   
> > 
> > Unless properly documented, it feels like something you can easily trip
> > over, however.
> > 
> > I assume that the "very early stuff" is basically only ccw consoles.
> > Not sure if we can use virtio-serial as an early ccw console -- IIRC
> > that was only about 3215/3270? While QEMU guests are able to use a 3270
> > console, this is experimental and I would not count that as a use case.
> > Anyway, 3215/3270 don't use adapter interrupts, and probably not
> > anything cross-device, either; so unless early virtio-serial is a
> > thing, this restriction is fine if properly documented.
> >   
> 
> Mimu can you dig into this a bit?
> 
> We could also aim for getting rid of this limitation. One idea would be
> some sort of lazy initialization (i.e. triggered by first usage).
> Another idea would be simply doing the initialization earlier.
> Unfortunately I'm not all that familiar with the early stuff. Is there
> some doc you can recommend?

I'd probably look at the code for that.

> 
> Anyway before investing much more into this, I think we should have a
> fair idea which option we prefer...

Agreed.

> 
> > > What also used to speak against it is that
> > > allocations asking for more than a page would just fail, but I addressed
> > > that in the patch I've hacked up on top of the series, and I'm going to
> > > paste below. While at it I addressed some other issues as well.

Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-15 Thread Halil Pasic
On Mon, 13 May 2019 15:29:24 +0200
Cornelia Huck  wrote:

> On Sun, 12 May 2019 20:22:56 +0200
> Halil Pasic  wrote:
> 
> > On Fri, 10 May 2019 16:10:13 +0200
> > Cornelia Huck  wrote:
> > 
> > > On Fri, 10 May 2019 00:11:12 +0200
> > > Halil Pasic  wrote:
> > >   
> > > > On Thu, 9 May 2019 12:11:06 +0200
> > > > Cornelia Huck  wrote:
> > > >   
> > > > > On Wed, 8 May 2019 23:22:10 +0200
> > > > > Halil Pasic  wrote:
> > > > > 
> > > > > > On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> > > > > > Sebastian Ott  wrote:
> > > > > 
> > > > > > > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > > > > > > > 
> > > > > > > > unregister_reboot_notifier(&css_reboot_notifier);
> > > > > > > > goto out_unregister;
> > > > > > > > }
> > > > > > > > +   cio_dma_pool_init();
> > > > > > > 
> > > > > > > This is too late for early devices (ccw console!).  
> > > > > > 
> > > > > > You have already raised concern about this last time (thanks). I think,
> > > > > > I've addressed this issue: the cio_dma_pool is only used by the airq
> > > > > > stuff. I don't think the ccw console needs it. Please have another look
> > > > > > at patch #6, and explain your concern in more detail if it persists.
> > > > > 
> > > > > What about changing the naming/adding comments here, so that (1) folks
> > > > > aren't confused by the same thing in the future and (2) folks don't 
> > > > > try
> > > > > to use that pool for something needed for the early ccw consoles?
> > > > > 
> > > > 
> > > > I'm all for clarity! Suggestions for better names?  
> > > 
> > > css_aiv_dma_pool, maybe? Or is there other cross-device stuff that may
> > > need it?
> > >   
> > 
> > Ouch! I was considering to use cio_dma_zalloc() for the adapter
> > interruption vectors but I ended up between the two chairs in the end.
> > So with this series there are no uses for cio_dma pool.
> > 
> > I don't feel strongly about this going one way the other.
> > 
> > What speaks against getting rid of the cio_dma_pool and sticking with
> > dma_alloc_coherent() is that we waste a DMA page per vector, which is a
> > non-obvious side effect.
> 
> That would basically mean one DMA page per virtio-ccw device, right?

Not quite: virtio-ccw shares airq vectors among multiple devices. We
alloc 64 bytes a time and use that as long as we don't run out of bits.
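
In other words, the sharing works roughly like this on the virtio-ccw side
(a simplified sketch; the flags are from memory, and the real code keeps an
array of up to MAX_AIRQ_AREAS such vectors and opens a new one when the bits
run out):

	/* one shared 2K-bit vector; each device grabs a few bits for its queues */
	struct airq_iv *iv = airq_iv_create(VIRTIO_IV_BITS,
					    AIRQ_IV_ALLOC | AIRQ_IV_PTR);
	unsigned long bit = airq_iv_alloc(iv, nvqs);	/* nvqs bits per device */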
 
> For single queue devices, this seems like quite a bit of overhead.
>

Nod.
 
> Are we expecting many devices in use per guest?

This affects Linux in general, not only guest 2 stuff (i.e. we
also waste in vanilla lpar mode). And zpci seems to do at least one
airq_iv_create() per pci function.

> 
> > 
> > What speaks against cio_dma_pool is that it is slightly more code, and
> > this currently can not be used for very early stuff, which I don't
> > think is relevant. 
> 
> Unless properly documented, it feels like something you can easily trip
> over, however.
> 
> I assume that the "very early stuff" is basically only ccw consoles.
> Not sure if we can use virtio-serial as an early ccw console -- IIRC
> that was only about 3215/3270? While QEMU guests are able to use a 3270
> console, this is experimental and I would not count that as a use case.
> Anyway, 3215/3270 don't use adapter interrupts, and probably not
> anything cross-device, either; so unless early virtio-serial is a
> thing, this restriction is fine if properly documented.
> 

Mimu can you dig into this a bit?

We could also aim for getting rid of this limitation. One idea would be
some sort of lazy initialization (i.e. triggered by first usage).
Another idea would be simply doing the initialization earlier.
Unfortunately I'm not all that familiar with the early stuff. Is there
some doc you can recommend?

Anyway before investing much more into this, I think we should have a
fair idea which option we prefer...

> > What also used to speak against it is that
> > allocations asking for more than a page would just fail, but I addressed
> > that in the patch I've hacked up on top of the series, and I'm going to
> > paste below. While at it I addressed some other issues as well.
> 
> Hm, which "other issues"?
> 

The kfree() I've forgotten to get rid of, and this 'document does not
work early' (pun intended) business.

> > 
> > I've also got code that deals with AIRQ_IV_CACHELINE by turning the
> > kmem_cache into a dma_pool.
> 
> Isn't that percolating to other airq users again? Or maybe I just don't
> understand what you're proposing here...

Absolutely.

> 
> > 
> > Cornelia, Sebastian which approach do you prefer:
> > 1) get rid of cio_dma_pool and AIRQ_IV_CACHELINE, and waste a page per
> > vector, or
> > 2) go with the approach taken by the patch below?
> 
> I'm not sure that I properly understand this (yeah, you probably
> guessed); so I'm not sure I can make a good call here.
> 

I hope I managed to clarify some of the 

Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-13 Thread Cornelia Huck
On Sun, 12 May 2019 20:22:56 +0200
Halil Pasic  wrote:

> On Fri, 10 May 2019 16:10:13 +0200
> Cornelia Huck  wrote:
> 
> > On Fri, 10 May 2019 00:11:12 +0200
> > Halil Pasic  wrote:
> >   
> > > On Thu, 9 May 2019 12:11:06 +0200
> > > Cornelia Huck  wrote:
> > >   
> > > > On Wed, 8 May 2019 23:22:10 +0200
> > > > Halil Pasic  wrote:
> > > > 
> > > > > On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> > > > > Sebastian Ott  wrote:
> > > > 
> > > > > > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > > > > > >   unregister_reboot_notifier(&css_reboot_notifier);
> > > > > > >   goto out_unregister;
> > > > > > >   }
> > > > > > > + cio_dma_pool_init();
> > > > > > 
> > > > > > This is too late for early devices (ccw console!).  
> > > > > 
> > > > > You have already raised concern about this last time (thanks). I think,
> > > > > I've addressed this issue: the cio_dma_pool is only used by the airq
> > > > > stuff. I don't think the ccw console needs it. Please have another look
> > > > > at patch #6, and explain your concern in more detail if it persists.
> > > > >   
> > > > 
> > > > What about changing the naming/adding comments here, so that (1) folks
> > > > aren't confused by the same thing in the future and (2) folks don't try
> > > > to use that pool for something needed for the early ccw consoles?
> > > > 
> > > 
> > > I'm all for clarity! Suggestions for better names?  
> > 
> > css_aiv_dma_pool, maybe? Or is there other cross-device stuff that may
> > need it?
> >   
> 
> Ouch! I was considering to use cio_dma_zalloc() for the adapter
> interruption vectors but I ended up between the two chairs in the end.
> So with this series there are no uses for cio_dma pool.
> 
> I don't feel strongly about this going one way the other.
> 
> What speaks against getting rid of the cio_dma_pool and sticking with
> dma_alloc_coherent() is that we waste a DMA page per vector, which is a
> non-obvious side effect.

That would basically mean one DMA page per virtio-ccw device, right?
For single queue devices, this seems like quite a bit of overhead.

Are we expecting many devices in use per guest?

> 
> What speaks against cio_dma_pool is that it is slightly more code, and
> this currently can not be used for very early stuff, which I don't
> think is relevant. 

Unless properly documented, it feels like something you can easily trip
over, however.

I assume that the "very early stuff" is basically only ccw consoles.
Not sure if we can use virtio-serial as an early ccw console -- IIRC
that was only about 3215/3270? While QEMU guests are able to use a 3270
console, this is experimental and I would not count that as a use case.
Anyway, 3215/3270 don't use adapter interrupts, and probably not
anything cross-device, either; so unless early virtio-serial is a
thing, this restriction is fine if properly documented.

> What also used to speak against it is that
> allocations asking for more than a page would just fail, but I addressed
> that in the patch I've hacked up on top of the series, and I'm going to
> paste below. While at it I addressed some other issues as well.

Hm, which "other issues"?

> 
> I've also got code that deals with AIRQ_IV_CACHELINE by turning the
> kmem_cache into a dma_pool.

Isn't that percolating to other airq users again? Or maybe I just don't
understand what you're proposing here...

> 
> Cornelia, Sebastian which approach do you prefer:
> 1) get rid of cio_dma_pool and AIRQ_IV_CACHELINE, and waste a page per
> vector, or
> 2) go with the approach taken by the patch below?

I'm not sure that I properly understand this (yeah, you probably
guessed); so I'm not sure I can make a good call here.

> 
> 
> Regards,
> Halil
> ---8<-
> From: Halil Pasic 
> Date: Sun, 12 May 2019 18:08:05 +0200
> Subject: [PATCH] WIP: use cio dma pool for airqs
> 
> Let's not waste a DMA page per adapter interrupt bit vector.
> ---
> Lightly tested...
> ---
>  arch/s390/include/asm/airq.h |  1 -
>  drivers/s390/cio/airq.c  | 10 +++---
>  drivers/s390/cio/css.c   | 18 +++---
>  3 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/s390/include/asm/airq.h b/arch/s390/include/asm/airq.h
> index 1492d48..981a3eb 100644
> --- a/arch/s390/include/asm/airq.h
> +++ b/arch/s390/include/asm/airq.h
> @@ -30,7 +30,6 @@ void unregister_adapter_interrupt(struct airq_struct *airq);
>  /* Adapter interrupt bit vector */
>  struct airq_iv {
>   unsigned long *vector;  /* Adapter interrupt bit vector */
> - dma_addr_t vector_dma; /* Adapter interrupt bit vector dma */
>   unsigned long *avail;   /* Allocation bit mask for the bit vector */
>   unsigned long *bitlock; /* Lock bit mask for the bit vector */
>   unsigned long *ptr; /* Pointer associated with each bit */
> diff --git a/drivers/s390/cio/airq.c b/drivers/s390/cio/airq.c
> 

Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-12 Thread Halil Pasic
On Fri, 10 May 2019 16:10:13 +0200
Cornelia Huck  wrote:

> On Fri, 10 May 2019 00:11:12 +0200
> Halil Pasic  wrote:
> 
> > On Thu, 9 May 2019 12:11:06 +0200
> > Cornelia Huck  wrote:
> > 
> > > On Wed, 8 May 2019 23:22:10 +0200
> > > Halil Pasic  wrote:
> > >   
> > > > On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> > > > Sebastian Ott  wrote:  
> > >   
> > > > > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > > > > > unregister_reboot_notifier(&css_reboot_notifier);
> > > > > > goto out_unregister;
> > > > > > }
> > > > > > +   cio_dma_pool_init();  
> > > > > 
> > > > > This is too late for early devices (ccw console!).
> > > > 
> > > > You have already raised concern about this last time (thanks). I think,
> > > > I've addressed this issue: the cio_dma_pool is only used by the airq
> > > > stuff. I don't think the ccw console needs it. Please have another look
> > > > at patch #6, and explain your concern in more detail if it persists.  
> > > 
> > > What about changing the naming/adding comments here, so that (1) folks
> > > aren't confused by the same thing in the future and (2) folks don't try
> > > to use that pool for something needed for the early ccw consoles?
> > >   
> > 
> > I'm all for clarity! Suggestions for better names?
> 
> css_aiv_dma_pool, maybe? Or is there other cross-device stuff that may
> need it?
> 

Ouch! I was considering to use cio_dma_zalloc() for the adapter
interruption vectors but I ended up between the two chairs in the end.
So with this series there are no uses for cio_dma pool.

I don't feel strongly about this going one way the other.

What speaks against getting rid of the cio_dma_pool and sticking with
dma_alloc_coherent() is that we waste a DMA page per vector, which is a
non-obvious side effect.

What speaks against cio_dma_pool is that it is slightly more code, and
this currently can not be used for very early stuff, which I don't
think is relevant. What also used to speak against it is that
allocations asking for more than a page would just fail, but I addressed
that in the patch I've hacked up on top of the series, and I'm going to
paste below. While at it I addressed some other issues as well.

I've also got code that deals with AIRQ_IV_CACHELINE by turning the
kmem_cache into a dma_pool.

Cornelia, Sebastian which approach do you prefer:
1) get rid of cio_dma_pool and AIRQ_IV_CACHELINE, and waste a page per
vector, or
2) go with the approach taken by the patch below?


Regards,
Halil
---8<-
From: Halil Pasic 
Date: Sun, 12 May 2019 18:08:05 +0200
Subject: [PATCH] WIP: use cio dma pool for airqs

Let's not waste a DMA page per adapter interrupt bit vector.
---
Lightly tested...
---
 arch/s390/include/asm/airq.h |  1 -
 drivers/s390/cio/airq.c  | 10 +++---
 drivers/s390/cio/css.c   | 18 +++---
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/airq.h b/arch/s390/include/asm/airq.h
index 1492d48..981a3eb 100644
--- a/arch/s390/include/asm/airq.h
+++ b/arch/s390/include/asm/airq.h
@@ -30,7 +30,6 @@ void unregister_adapter_interrupt(struct airq_struct *airq);
 /* Adapter interrupt bit vector */
 struct airq_iv {
unsigned long *vector;  /* Adapter interrupt bit vector */
-   dma_addr_t vector_dma; /* Adapter interrupt bit vector dma */
unsigned long *avail;   /* Allocation bit mask for the bit vector */
unsigned long *bitlock; /* Lock bit mask for the bit vector */
unsigned long *ptr; /* Pointer associated with each bit */
diff --git a/drivers/s390/cio/airq.c b/drivers/s390/cio/airq.c
index 7a5c0a0..f11f437 100644
--- a/drivers/s390/cio/airq.c
+++ b/drivers/s390/cio/airq.c
@@ -136,8 +136,7 @@ struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
goto out;
iv->bits = bits;
size = iv_size(bits);
-   iv->vector = dma_alloc_coherent(cio_get_dma_css_dev(), size,
-					&iv->vector_dma, GFP_KERNEL);
+   iv->vector = cio_dma_zalloc(size);
if (!iv->vector)
goto out_free;
if (flags & AIRQ_IV_ALLOC) {
@@ -172,8 +171,7 @@ struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
kfree(iv->ptr);
kfree(iv->bitlock);
kfree(iv->avail);
-   dma_free_coherent(cio_get_dma_css_dev(), size, iv->vector,
- iv->vector_dma);
+   cio_dma_free(iv->vector, size);
kfree(iv);
 out:
return NULL;
@@ -189,9 +187,7 @@ void airq_iv_release(struct airq_iv *iv)
kfree(iv->data);
kfree(iv->ptr);
kfree(iv->bitlock);
-   kfree(iv->vector);
-   dma_free_coherent(cio_get_dma_css_dev(), iv_size(iv->bits),
- iv->vector, iv->vector_dma);
+   cio_dma_free(iv->vector, iv_size(iv->bits));
kfree(iv->avail);
kfree(iv);
 }

Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-10 Thread Cornelia Huck
On Fri, 10 May 2019 00:11:12 +0200
Halil Pasic  wrote:

> On Thu, 9 May 2019 12:11:06 +0200
> Cornelia Huck  wrote:
> 
> > On Wed, 8 May 2019 23:22:10 +0200
> > Halil Pasic  wrote:
> >   
> > > On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> > > Sebastian Ott  wrote:  
> >   
> > > > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > > > >   unregister_reboot_notifier(&css_reboot_notifier);
> > > > >   goto out_unregister;
> > > > >   }
> > > > > + cio_dma_pool_init();  
> > > > 
> > > > This is too late for early devices (ccw console!).
> > > 
> > > You have already raised concern about this last time (thanks). I think,
> > > I've addressed this issue: the cio_dma_pool is only used by the airq
> > > stuff. I don't think the ccw console needs it. Please have another look
> > > at patch #6, and explain your concern in more detail if it persists.  
> > 
> > What about changing the naming/adding comments here, so that (1) folks
> > aren't confused by the same thing in the future and (2) folks don't try
> > to use that pool for something needed for the early ccw consoles?
> >   
> 
> I'm all for clarity! Suggestions for better names?

css_aiv_dma_pool, maybe? Or is there other cross-device stuff that may
need it?


Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-09 Thread Halil Pasic
On Thu, 9 May 2019 12:11:06 +0200
Cornelia Huck  wrote:

> On Wed, 8 May 2019 23:22:10 +0200
> Halil Pasic  wrote:
> 
> > On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> > Sebastian Ott  wrote:
> 
> > > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > > > unregister_reboot_notifier(&css_reboot_notifier);
> > > > goto out_unregister;
> > > > }
> > > > +   cio_dma_pool_init();
> > > 
> > > This is too late for early devices (ccw console!).  
> > 
> > You have already raised concern about this last time (thanks). I think,
> > I've addressed this issue: the cio_dma_pool is only used by the airq
> > stuff. I don't think the ccw console needs it. Please have another look
> > at patch #6, and explain your concern in more detail if it persists.
> 
> What about changing the naming/adding comments here, so that (1) folks
> aren't confused by the same thing in the future and (2) folks don't try
> to use that pool for something needed for the early ccw consoles?
> 

I'm all for clarity! Suggestions for better names?

Regards,
Halil



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-09 Thread Cornelia Huck
On Wed, 8 May 2019 23:22:10 +0200
Halil Pasic  wrote:

> On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> Sebastian Ott  wrote:

> > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > >   unregister_reboot_notifier(&css_reboot_notifier);
> > >   goto out_unregister;
> > >   }
> > > + cio_dma_pool_init();
> > 
> > This is too late for early devices (ccw console!).  
> 
> You have already raised concern about this last time (thanks). I think,
> I've addressed this issue: the cio_dma_pool is only used by the airq
> stuff. I don't think the ccw console needs it. Please have another look
> at patch #6, and explain your concern in more detail if it persists.

What about changing the naming/adding comments here, so that (1) folks
aren't confused by the same thing in the future and (2) folks don't try
to use that pool for something needed for the early ccw consoles?


Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-09 Thread Sebastian Ott


On Wed, 8 May 2019, Halil Pasic wrote:
> On Wed, 8 May 2019 15:18:10 +0200 (CEST)
> Sebastian Ott  wrote:
> > > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > >   unregister_reboot_notifier(&css_reboot_notifier);
> > >   goto out_unregister;
> > >   }
> > > + cio_dma_pool_init();  
> > 
> > This is too late for early devices (ccw console!).
> 
> You have already raised concern about this last time (thanks). I think,
> I've addressed this issue: the cio_dma_pool is only used by the airq
> stuff. I don't think the ccw console needs it. Please have another look
> at patch #6, and explain your concern in more detail if it persists.

Oh, you don't use the global pool for that - yes that should work.



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-08 Thread Halil Pasic
On Wed, 8 May 2019 15:18:10 +0200 (CEST)
Sebastian Ott  wrote:

> > +void cio_gp_dma_free(struct gen_pool *gp_dma, void *cpu_addr, size_t size)
> > +{
> > +   if (!cpu_addr)
> > +   return;
> > +   memset(cpu_addr, 0, size);  
> 
> Hm, normally I'd do the memset during alloc not during free - but maybe
> this makes more sense here with your usecase in mind.

I allocate the backing as zeroed, and zero the memory before putting it
back to the pool. So the stuff in the pool is always zeroed.
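
The free path therefore looks roughly like this (a sketch matching the
description; essentially what cio_gp_dma_free() in the patch does, with the
chunk presumably returned via gen_pool_free()):

	/* scrub before returning the chunk, so pool memory stays zeroed */
	memset(cpu_addr, 0, size);
	gen_pool_free(gp_dma, (unsigned long) cpu_addr, size);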

> 
> > @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
> > unregister_reboot_notifier(&css_reboot_notifier);
> > goto out_unregister;
> > }
> > +   cio_dma_pool_init();  
> 
> This is too late for early devices (ccw console!).

You have already raised concern about this last time (thanks). I think,
I've addressed this issue: the cio_dma_pool is only used by the airq
stuff. I don't think the ccw console needs it. Please have another look
at patch #6, and explain your concern in more detail if it persists.

(@Mimu: What do you think, am I wrong here?)

Thanks for having a look!

Regards,
Halil



Re: [PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-05-08 Thread Sebastian Ott


On Fri, 26 Apr 2019, Halil Pasic wrote:
> @@ -224,6 +228,9 @@ struct subchannel *css_alloc_subchannel(struct subchannel_id schid,
> 	INIT_WORK(&sch->todo_work, css_sch_todo);
> 	sch->dev.release = &css_subchannel_release;
> 	device_initialize(&sch->dev);
> +	sch->dma_mask = css_dev_dma_mask;
> +	sch->dev.dma_mask = &sch->dma_mask;
> +	sch->dev.coherent_dma_mask = sch->dma_mask;

Could we do:
sch->dev.dma_mask = &sch->dev.coherent_dma_mask;
sch->dev.coherent_dma_mask = css_dev_dma_mask;
?

> +#define POOL_INIT_PAGES 1
> +static struct gen_pool *cio_dma_pool;
> +/* Currently cio supports only a single css */
> +#define  CIO_DMA_GFP (GFP_KERNEL | __GFP_ZERO)

__GFP_ZERO has no meaning with the dma api (since all implementations do
an implicit zero initialization) but let's keep it for the sake of
documentation.

We need GFP_DMA here (which will return addresses < 2G on s390)!

> +void cio_gp_dma_free(struct gen_pool *gp_dma, void *cpu_addr, size_t size)
> +{
> + if (!cpu_addr)
> + return;
> + memset(cpu_addr, 0, size);

Hm, normally I'd do the memset during alloc not during free - but maybe
this makes more sense here with your usecase in mind.

> @@ -1063,6 +1163,7 @@ static int __init css_bus_init(void)
>   unregister_reboot_notifier(&css_reboot_notifier);
>   goto out_unregister;
>   }
> + cio_dma_pool_init();

This is too late for early devices (ccw console!).



[PATCH 05/10] s390/cio: introduce DMA pools to cio

2019-04-26 Thread Halil Pasic
To support protected virtualization cio will need to make sure the
memory used for communication with the hypervisor is DMA memory.

Let us introduce one global pool for cio, and some tools for pools seated
at individual devices.

Our DMA pools are implemented as a gen_pool backed with DMA pages. The
idea is to avoid each allocation effectively wasting a page, as we
typically allocate much less than PAGE_SIZE.
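
To illustrate the intended usage of the new interfaces (a hypothetical
caller, not part of this patch):

	/* hypothetical driver carving small DMA buffers out of its own pool */
	struct gen_pool *pool = cio_gp_dma_create(cio_get_dma_css_dev(), 1);
	void *buf = cio_gp_dma_zalloc(pool, cio_get_dma_css_dev(), 64);

	/* ... use buf for communication with the hypervisor ... */

	cio_gp_dma_free(pool, buf, 64);
	cio_gp_dma_destroy(pool, cio_get_dma_css_dev());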

Signed-off-by: Halil Pasic 
---
 arch/s390/Kconfig   |   1 +
 arch/s390/include/asm/cio.h |  11 +
 drivers/s390/cio/cio.h  |   1 +
 drivers/s390/cio/css.c  | 101 
 4 files changed, 114 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 5500d05d4d53..5861311d95d9 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -195,6 +195,7 @@ config S390
select VIRT_TO_BUS
select HAVE_NMI
select SWIOTLB
+   select GENERIC_ALLOCATOR
 
 
 config SCHED_OMIT_FRAME_POINTER
diff --git a/arch/s390/include/asm/cio.h b/arch/s390/include/asm/cio.h
index 1727180e8ca1..43c007d2775a 100644
--- a/arch/s390/include/asm/cio.h
+++ b/arch/s390/include/asm/cio.h
@@ -328,6 +328,17 @@ static inline u8 pathmask_to_pos(u8 mask)
 void channel_subsystem_reinit(void);
 extern void css_schedule_reprobe(void);
 
+extern void *cio_dma_zalloc(size_t size);
+extern void cio_dma_free(void *cpu_addr, size_t size);
+extern struct device *cio_get_dma_css_dev(void);
+
+struct gen_pool;
+void *cio_gp_dma_zalloc(struct gen_pool *gp_dma, struct device *dma_dev,
+   size_t size);
+void cio_gp_dma_free(struct gen_pool *gp_dma, void *cpu_addr, size_t size);
+void cio_gp_dma_destroy(struct gen_pool *gp_dma, struct device *dma_dev);
+struct gen_pool *cio_gp_dma_create(struct device *dma_dev, int nr_pages);
+
 /* Function from drivers/s390/cio/chsc.c */
 int chsc_sstpc(void *page, unsigned int op, u16 ctrl, u64 *clock_delta);
 int chsc_sstpi(void *page, void *result, size_t size);
diff --git a/drivers/s390/cio/cio.h b/drivers/s390/cio/cio.h
index 92eabbb5f18d..f23f7e2c33f7 100644
--- a/drivers/s390/cio/cio.h
+++ b/drivers/s390/cio/cio.h
@@ -113,6 +113,7 @@ struct subchannel {
enum sch_todo todo;
struct work_struct todo_work;
struct schib_config config;
+   u64 dma_mask;
 } __attribute__ ((aligned(8)));
 
 DECLARE_PER_CPU_ALIGNED(struct irb, cio_irb);
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c
index aea502922646..7087cc314fe9 100644
--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 #include 
+#include <linux/genalloc.h>
+#include <linux/dma-mapping.h>
 #include 
 #include 
 
@@ -199,6 +201,8 @@ static int css_validate_subchannel(struct subchannel_id schid,
return err;
 }
 
+static u64 css_dev_dma_mask = DMA_BIT_MASK(31);
+
 struct subchannel *css_alloc_subchannel(struct subchannel_id schid,
struct schib *schib)
 {
@@ -224,6 +228,9 @@ struct subchannel *css_alloc_subchannel(struct subchannel_id schid,
	INIT_WORK(&sch->todo_work, css_sch_todo);
	sch->dev.release = &css_subchannel_release;
	device_initialize(&sch->dev);
+	sch->dma_mask = css_dev_dma_mask;
+	sch->dev.dma_mask = &sch->dma_mask;
+	sch->dev.coherent_dma_mask = sch->dma_mask;
return sch;
 
 err:
@@ -899,6 +906,9 @@ static int __init setup_css(int nr)
dev_set_name(&css->device, "css%x", nr);
css->device.groups = cssdev_attr_groups;
css->device.release = channel_subsystem_release;
+   /* some cio DMA memory needs to be 31 bit addressable */
+   css->device.coherent_dma_mask = css_dev_dma_mask,
+   css->device.dma_mask = &css_dev_dma_mask;
 
mutex_init(>mutex);
css->cssid = chsc_get_cssid(nr);
@@ -1018,6 +1028,96 @@ static struct notifier_block css_power_notifier = {
.notifier_call = css_power_event,
 };
 
+#define POOL_INIT_PAGES 1
+static struct gen_pool *cio_dma_pool;
+/* Currently cio supports only a single css */
+#define  CIO_DMA_GFP (GFP_KERNEL | __GFP_ZERO)
+
+
+struct device *cio_get_dma_css_dev(void)
+{
+   return &channel_subsystems[0]->device;
+}
+
+struct gen_pool *cio_gp_dma_create(struct device *dma_dev, int nr_pages)
+{
+   struct gen_pool *gp_dma;
+   void *cpu_addr;
+   dma_addr_t dma_addr;
+   int i;
+
+   gp_dma = gen_pool_create(3, -1);
+   if (!gp_dma)
+   return NULL;
+   for (i = 0; i < nr_pages; ++i) {
+   cpu_addr = dma_alloc_coherent(dma_dev, PAGE_SIZE, &dma_addr,
+ CIO_DMA_GFP);
+   if (!cpu_addr)
+   return gp_dma;
+   gen_pool_add_virt(gp_dma, (unsigned long) cpu_addr,
+ dma_addr, PAGE_SIZE, -1);
+   }
+   return gp_dma;
+}
+
+static void __gp_dma_free_dma(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk, void *data)
+{
+   dma_free_coherent((struct device