Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-27 Thread Eugeniy Paltsev
On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> On Tue, 2017-04-25 at 15:16 +, Eugeniy Paltsev wrote:
> > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:
> > > > Hi,
> > > > On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > > > > On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> > > > > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > This IP can be (and is) configured with a small block size.
> > > > (Note that I am not talking about runtime HW configuration.)
> > > >
> > > > And there is a possibility that we can't use the sg_list
> > > > directly and need to split it into smaller chunks.
> > >
> > > That's what I referred to a while ago. The driver should provide
> > > an interface to tell a potential caller what maximum block
> > > (number of items with a given bus width) it supports.
> > >
> > > We have struct dma_parms in struct device, but what we actually
> > > need is to support something similar on a per-channel basis in
> > > the DMAengine framework.
> > >
> > > So, instead of working around this I recommend either
> > > implementing it properly or relying on the fact that in the
> > > future someone eventually does that for you.
> > >
> > > Each driver which has this re-splitting mechanism should be
> > > cleaned up and refactored.
> >
> > I still can't see any pros of this implementation.
> > There is no performance benefit: we need to re-split the sg_list
> > anyway (but now in the DMA user's driver instead of the DMA driver)
--->---
> > If we want to use the same descriptors several times we can just
> > use the DMA_CTRL_REUSE option - the descriptors will be created
> > only once and the re-splitting will be done only once.
>
> There are two types of descriptors: SW and HW. That flag is about
> the SW descriptor, so in most cases it has nothing to do with the
> actual entry size.

Hmm, I checked all the virt-dma code carefully and I don't see this
limitation.
The only existing limitation I can see is that we can use
DMA_CTRL_REUSE only for channels supporting slave transfers. (But that
is irrelevant to our discussion.)

So, we can use DMA_CTRL_REUSE for both HW and SW descriptor types.
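
That is, something like this from the DMA user's side (a minimal
sketch, assuming a slave transfer and omitting most error handling):

	struct dma_async_tx_descriptor *desc;

	desc = dmaengine_prep_slave_sg(chan, sgl, sg_len,
				       DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
	if (!desc || dmaengine_desc_set_reuse(desc))
		return -EIO;

	/* the sg-list is split into HW descriptors here, only once */
	dmaengine_submit(desc);
	dma_async_issue_pending(chan);

	/* later, after completion: resubmit the very same descriptor,
	 * with no re-splitting */
	dmaengine_submit(desc);
	dma_async_issue_pending(chan);

	/* when done with it for good */
	dmaengine_desc_free(desc);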

--
 Eugeniy Paltsev

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-27 Thread Andy Shevchenko
On Thu, 2017-04-27 at 13:21 +, Eugeniy Paltsev wrote:
> On Wed, 2017-04-26 at 18:04 +0300, Andy Shevchenko wrote:
> > On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> > > On Tue, 2017-04-25 at 15:16 +, Eugeniy Paltsev wrote:
> > > > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > > > On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:

> > > > Hmm, this looks OK. (In my example (the hsu/hsu.c driver) error
> > > > descriptors are not freed even after terminate_all is called.)
> > > 
> > > If it's active it will be freed.
> > > Otherwise the caller should check somewhere that the descriptor
> > > failed.
> > > 
> > > But actually this is fragile and we need to monitor failed
> > > descriptors.
> > > Thanks for reporting.
> > > 
> > > > 
> > > > > Of course, if you want to keep erred descriptors for some
> > > > > reason (the reason should be stated in a comment), you can do
> > > > > that.
> > > > 
> > > > So, I'll create a desc_error list and store failed descriptors
> > > > in this list until terminate_all() is called.
> > > > Is that an OK implementation?
> > > 
> > > Nope, we need to amend the virt-chan API for that. I'm on it.
> > > Will send a series soon.
> > 
> > I have to correct what I wrote before.
> > 
> > We have two options:
> > a) the one I proposed above;
> > b) move the descriptor to the complete list and call the complete
> > callback with a result.
> > 
> > So, it looks like the b) variant is what is done already in 4 (did
> > I count correctly?) drivers and respective users.
> 
> In my opinion we should call the error descriptor's complete
> callback.
> But I don't think we should move the error descriptor to the
> desc_completed list.
> 
> When the descriptor following our error descriptor completes
> successfully, the vchan tasklet will be called.
> So all descriptors from the desc_completed list will be freed
> (including our error descriptor).
> We'll lose information about error descriptors and we won't be able
> to return the correct status from dma_async_is_tx_complete().

While I rather agree on the point that we'd better have an explicit
list of failed descriptors, there is another point: a user which is
interested in error checking has to provide callback_result instead of
callback.
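
A minimal sketch of that user side (names here are illustrative, not
from any real driver):

	static void my_dma_done(void *param, const struct dmaengine_result *res)
	{
		/* res->result tells whether this transfer failed */
		if (res->result != DMA_TRANS_NOERROR)
			pr_err("%s: DMA failed, residue %u\n",
			       (const char *)param, res->residue);
	}

	...
	desc->callback_result = my_dma_done;
	desc->callback_param = "my-device";
	dmaengine_submit(desc);

A plain callback never sees the dmaengine_result, so such a user can't
tell a failed transfer from a successful one.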

> In my opinion, we should create a desc_error list.
> When we get an error we'll move the descriptor to the desc_error
> list and call the complete callback.

Vinod, Lars, others, what is(are) your opinion(s)?

-- 
Andy Shevchenko 
Intel Finland Oy


Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-27 Thread Eugeniy Paltsev
On Wed, 2017-04-26 at 18:04 +0300, Andy Shevchenko wrote:
> On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> > On Tue, 2017-04-25 at 15:16 +, Eugeniy Paltsev wrote:
> > > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > > On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:
> > > > A descriptor is active until terminate_all() is called or a
> > > > new descriptor is supplied. So, the caller has quite some time
> > > > to check on it.
> > > >
> > > > So, what's wrong with it, in your opinion?
> > >
> > > Hmm, this looks OK. (In my example (the hsu/hsu.c driver) error
> > > descriptors are not freed even after terminate_all is called.)
> >
> > If it's active it will be freed.
> > Otherwise the caller should check somewhere that the descriptor
> > failed.
> >
> > But actually this is fragile and we need to monitor failed
> > descriptors.
> > Thanks for reporting.
> >
> > >
> > > > Of course, if you want to keep erred descriptors for some
> > > > reason (the reason should be stated in a comment), you can do
> > > > that.
> > >
> > > So, I'll create a desc_error list and store failed descriptors in
> > > this list until terminate_all() is called.
> > > Is that an OK implementation?
> >
> > Nope, we need to amend the virt-chan API for that. I'm on it. Will
> > send a series soon.
>
> I have to correct what I wrote before.
>
> We have two options:
> a) the one I proposed above;
> b) move the descriptor to the complete list and call the complete
> callback with a result.
>
> So, it looks like the b) variant is what is done already in 4 (did I
> count correctly?) drivers and respective users.

In my opinion we should call the error descriptor's complete callback.
But I don't think we should move the error descriptor to the
desc_completed list.

When the descriptor following our error descriptor completes
successfully, the vchan tasklet will be called.
So all descriptors from the desc_completed list will be freed
(including our error descriptor).
We'll lose information about error descriptors and we won't be able to
return the correct status from dma_async_is_tx_complete().

In my opinion, we should create a desc_error list.
When we get an error we'll move the descriptor to the desc_error list
and call the complete callback.
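
Something like this in the error interrupt path (a sketch; desc_error
is the proposed driver-private list, and the callback helpers are the
ones from drivers/dma/dmaengine.h):

	struct virt_dma_desc *vd;
	struct dmaengine_desc_callback cb;
	unsigned long flags;

	spin_lock_irqsave(&chan->vc.lock, flags);
	vd = vchan_next_desc(&chan->vc);	/* the failed descriptor */
	if (!vd) {
		spin_unlock_irqrestore(&chan->vc.lock, flags);
		return;
	}
	/* off desc_issued, onto the proposed error list */
	list_move_tail(&vd->node, &chan->desc_error);
	dmaengine_desc_get_callback(&vd->tx, &cb);
	spin_unlock_irqrestore(&chan->vc.lock, flags);

	/* still tell the user this transfer is over */
	dmaengine_desc_callback_invoke(&cb, NULL);

tx_status() can then scan desc_error and return DMA_ERROR for cookies
found there, and terminate_all() frees the list.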

--
 Eugeniy Paltsev

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-26 Thread Andy Shevchenko
On Tue, 2017-04-25 at 21:12 +0300, Andy Shevchenko wrote:
> On Tue, 2017-04-25 at 15:16 +, Eugeniy Paltsev wrote:
> > On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > > On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:

> > > A descriptor is active until terminate_all() is called or a new
> > > descriptor is supplied. So, the caller has quite some time to
> > > check on it.
> > > 
> > > So, what's wrong with it, in your opinion?
> > 
> > Hmm, this looks OK. (In my example (the hsu/hsu.c driver) error
> > descriptors are not freed even after terminate_all is called.)
> 
> If it's active it will be freed.
> Otherwise the caller should check somewhere that the descriptor
> failed.
> 
> But actually this is fragile and we need to monitor failed
> descriptors.
> Thanks for reporting.
> 
> > 
> > > Of course, if you want to keep erred descriptors for some reason
> > > (the reason should be stated in a comment), you can do that.
> > 
> > So, I'll create a desc_error list and store failed descriptors in
> > this list until terminate_all() is called.
> > Is that an OK implementation?
> 
> Nope, we need to amend the virt-chan API for that. I'm on it. Will
> send a series soon.

I have to correct what I wrote before.

We have two options:
a) the one I proposed above;
b) move the descriptor to the complete list and call the complete
callback with a result.

So, it looks like the b) variant is what is done already in 4 (did I
count correctly?) drivers and respective users.

-- 
Andy Shevchenko 
Intel Finland Oy


Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-25 Thread Andy Shevchenko
On Tue, 2017-04-25 at 15:16 +, Eugeniy Paltsev wrote:
> On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> > On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:
> > > Hi,
> > > On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > > > On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> > > > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:

> > > This IP can be (and is) configured with a small block size.
> > > (Note that I am not talking about runtime HW configuration.)
> > > 
> > > And there is a possibility that we can't use the sg_list directly
> > > and need to split it into smaller chunks.
> > 
> > That's what I referred to a while ago. The driver should provide
> > an interface to tell a potential caller what maximum block (number
> > of items with a given bus width) it supports.
> > 
> > We have struct dma_parms in struct device, but what we actually
> > need is to support something similar on a per-channel basis in the
> > DMAengine framework.
> > 
> > So, instead of working around this I recommend either implementing
> > it properly or relying on the fact that in the future someone
> > eventually does that for you.
> > 
> > Each driver which has this re-splitting mechanism should be cleaned
> > up and refactored.
> 
> I still can't see any pros of this implementation.
> There is no performance benefit: we need to re-split the sg_list
> anyway (but now in the DMA user's driver instead of the DMA driver)

There is; you just don't seem to see it.

Currently:
 User:
  prepares sg-list

 DMA driver:
  a) iterates over it, and
  b) if sg_len is bigger than the block size, it re-splits it.

New approach:

 User:
  a) gets the DMA channel and its properties
  b) prepares the sg-list taking the block size into consideration

 DMA driver:
  a) uses the given sg-list and, for the sake of simplicity, bails out
if something is wrong with it

So, it means that in some cases (where an entry is big enough) we
currently have to prepare the list *and* re-split it. That is really
performance consuming (like the UART case, where the buffer is small
enough and latency like the above matters).
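
From the user's side that would look roughly like this (a sketch;
max_block is the proposed per-channel capability, nothing like it
exists in struct dma_slave_caps today, and MAX_SEGS is just an
illustrative bound):

	struct dma_slave_caps caps;
	struct scatterlist sgl[MAX_SEGS];
	size_t max_block, len, offset;
	unsigned int i = 0;

	if (dma_get_slave_caps(chan, &caps))
		return -ENODEV;

	/* proposed field: largest block one HW descriptor can move */
	max_block = caps.max_block;

	sg_init_table(sgl, MAX_SEGS);
	for (offset = 0; offset < buf_len; offset += len) {
		len = min_t(size_t, buf_len - offset, max_block);
		sg_set_buf(&sgl[i++], buf + offset, len);
	}

The DMA driver then maps one sg entry to one HW descriptor, with no
inner splitting loop.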

> If we want to use the same descriptors several times we can just use
> the DMA_CTRL_REUSE option - the descriptors will be created only once
> and the re-splitting will be done only once.

There are two types of descriptors: SW and HW. That flag is about the
SW descriptor, so in most cases it has nothing to do with the actual
entry size.

> But there are cons to this implementation:
> we need to implement the re-splitting mechanism in each place we use
> DMA instead of in one DMA driver. So there are more places for bugs,
> etc.

No, you completely missed the point.

With the new approach we do the preparation only once per descriptor /
transfer, and in the one place where the sg-list is created.

> > > > Better not to pretend that it has been processed successfully.
> > > > Don't call the callback on it, and set its status to DMA_ERROR
> > > > (that's why descriptors in many drivers have a dma_status
> > > > field). When the user asks for the status (using the cookie)
> > > > the saved value would be returned while the descriptor is
> > > > active.
> > > > 
> > > > Do you have some other workflow in mind?
> > > 
> > > Hmm...
> > > Do you mean I should leave error descriptors in the desc_issued
> > > list, or should I create another list (like desc_error) in my
> > > driver and move error descriptors to the desc_error list?
> > > 
> > > And when exactly should I free error descriptors?
> > 
> > See below.
> > 
> > > I checked the hsu/hsu.c dma driver implementation:
> > >    the vdma descriptor is deleted from the desc_issued list when
> > >    the transfer starts. When a descriptor is marked as an error
> > >    descriptor, vchan_cookie_complete isn't called for it. And
> > >    this descriptor isn't placed in any list. So error descriptors
> > >    will *never* be freed.
> > >    I don't actually like this approach.
> > 
> > A descriptor is active until terminate_all() is called or a new
> > descriptor is supplied. So, the caller has quite some time to check
> > on it.
> > 
> > So, what's wrong with it, in your opinion?
> 
> Hmm, this looks OK. (In my example (the hsu/hsu.c driver) error
> descriptors are not freed even after terminate_all is called.)

If it's active it will be freed.
Otherwise the caller should check somewhere that the descriptor failed.

But actually this is fragile and we need to monitor failed descriptors.
Thanks for reporting.

> 
> > Of course, if you want to keep erred descriptors for some reason
> > (the reason should be stated in a comment), you can do that.
> 
> So, I'll create a desc_error list and store failed descriptors in
> this list until terminate_all() is called.
> Is that an OK implementation?

Nope, we need to amend the virt-chan API for that. I'm on it. Will
send a series soon.

> There is a simple reason:
> I had to do the same actions in probe/remove as in
> runtime_resume/runtime_suspend
> (like enabling the clock, enabling the DMA).

> 

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-25 Thread Eugeniy Paltsev
On Mon, 2017-04-24 at 19:56 +0300, Andy Shevchenko wrote:
> On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:
> > Hi,
> > On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > > On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> > > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > > > This patch adds support for the DW AXI DMAC controller.
> > > > > > +static inline void
> > > > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > > > > +{
> > > > > > +   iowrite32(val, chip->regs + reg);
> > > > >
> > > > > Are you going to use IO ports for this IP? I don't think so.
> > > > > Wouldn't it be better to call readl()/writel() instead?
> > > >
> > > > As I understand, it's better to use ioread/iowrite as a more
> > > > universal IO access way. Am I wrong?
> > >
> > > As I said above, ioreadX()/iowriteX() only makes sense when your
> > > IP could be accessed either via an IO region or via MMIO. I'm
> > > pretty sure IO is not the case at all for this IP.
> >
> > MMIO? This IP works exactly via memory-mapped I/O.
>
> Yes, and why do you need to check this on each IO read/write?
> Please, switch to plain readX()/writeX() instead.

Ok, I'll switch to readX()/writeX().
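
The helpers then become simply (a sketch of the readl()/writel()
variant of the code quoted above):

	static inline void
	axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
	{
		writel(val, chip->regs + reg);
	}

	static inline u32 axi_dma_ioread32(struct axi_dma_chip *chip, u32 reg)
	{
		return readl(chip->regs + reg);
	}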

> > > > > > +	val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > > > > > +	val &= ~irq_mask;
> > > > > > +	axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > > > > > +	}
> > > > > > +
> > > > > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > > > > +}
> > > > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > > > +{
> > > > > > +	struct axi_dma_chan *chan = desc->chan;
> > > > > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > > > > +	struct axi_dma_desc *child, *_next;
> > > > > > +	unsigned int descs_put = 0;
> > > > > > +	list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > > > > > +				 xfer_list) {
> > > > >
> > > > > xfer_list looks redundant.
> > > > > Can you elaborate why virtual channel management is not
> > > > > working for you?
> > > >
> > > > Each virtual descriptor encapsulates several hardware
> > > > descriptors, which belong to the same transfer.
> > > > This list (xfer_list) is used only for allocating/freeing these
> > > > descriptors and it doesn't affect the virtual DMA logic.
> > > > I can see this approach in several drivers using VirtDMA (but
> > > > they mostly use an array instead of a list)
> > >
> > > You described how most of the DMA drivers are implemented, though
> > > they are using just the sg_list directly. I would recommend doing
> > > the same and getting rid of this list.
> >
> > This IP can be (and is) configured with a small block size.
> > (Note that I am not talking about runtime HW configuration.)
> >
> > And there is a possibility that we can't use the sg_list directly
> > and need to split it into smaller chunks.
>
> That's what I referred to a while ago. The driver should provide an
> interface to tell a potential caller what maximum block (number of
> items with a given bus width) it supports.
>
> We have struct dma_parms in struct device, but what we actually need
> is to support something similar on a per-channel basis in the
> DMAengine framework.
>
> So, instead of working around this I recommend either implementing
> it properly or relying on the fact that in the future someone
> eventually does that for you.
>
> Each driver which has this re-splitting mechanism should be cleaned
> up and refactored.
I still can't see any pros of this implementation.
There is no performance benefit: we need to re-split the sg_list
anyway (but now in the DMA user's driver instead of the DMA driver).

If we want to use the same descriptors several times we can just use
the DMA_CTRL_REUSE option - the descriptors will be created only once
and the re-splitting will be done only once.

But there are cons to this implementation:
we need to implement the re-splitting mechanism in each place we use
DMA instead of in one DMA driver. So there are more places for bugs,
etc.

> > > > > Btw, are you planning to use priority at all? For now I
> > > > > didn't see a single driver (from the set I have checked, like
> > > > > 4-5 of them) that uses priority anyhow. It makes the driver
> > > > > more complex for nothing.
> > > >
> > > > Only for dma slave operations.
> > >
> > > So, in other words you *have* two or more actual users that
> > > *need* prioritization?
> >
> > As I remember there was an idea to give higher priority to audio
> > DMA channels.
>
> I don't see cyclic transfers support in the driver. So, I would
> suggest just dropping the entire prioritization for now. When there
> is an actual user one may start thinking of it.
> Just a rule of common sense: do not implement something which will
> have no user or 

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-24 Thread Andy Shevchenko
On Mon, 2017-04-24 at 15:55 +, Eugeniy Paltsev wrote:
> Hi,
> On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> > On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> > > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > > This patch adds support for the DW AXI DMAC controller.
> > > > > +#define AXI_DMA_BUSWIDTHS  \
> > > > > + (DMA_SLAVE_BUSWIDTH_1_BYTE  | \
> > > > > + DMA_SLAVE_BUSWIDTH_2_BYTES  | \
> > > > > + DMA_SLAVE_BUSWIDTH_4_BYTES  | \
> > > > > + DMA_SLAVE_BUSWIDTH_8_BYTES  | \
> > > > > + DMA_SLAVE_BUSWIDTH_16_BYTES | \
> > > > > + DMA_SLAVE_BUSWIDTH_32_BYTES | \
> > > > > + DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > > > +/* TODO: check: do we need to use BIT() macro here? */
> > > > 
> > > > Still TODO? I remember I answered this on the first round.
> > > 
> > > Yes, I remember it.
> > > I left this TODO as a reminder because src_addr_widths and
> > > dst_addr_widths are not used anywhere and they are set
> > > differently in different drivers (with or without the BIT()
> > > macro).
> > 
> > Strange. AFAIK they are representing bits (which is not the best
> > idea) in the resulting u64 field. So, anything bigger than 63
> > doesn't make sense.
> 
> They are u32 fields!

Even "better"!

> From dmaengine.h:
> struct dma_device {
> ...
> u32 src_addr_widths;
> u32 dst_addr_widths;
> };
> 
> > In drivers where they are not bits, quite likely a bug is hidden.
> 
> For example (from pxa_dma.c):
> const enum dma_slave_buswidth widths = DMA_SLAVE_BUSWIDTH_1_BYTE |
> DMA_SLAVE_BUSWIDTH_2_BYTES | DMA_SLAVE_BUSWIDTH_4_BYTES;
> 
> And there are a lot of drivers which don't use BIT() for these
> fields:
> sh/shdmac.c
> sh/rcar-dmac.c
> qcom/bam_dma.c
> mmp_pdma.c
> ste_dma40.c
> And many others...

Definitely the concept of that interface was never thought through
enough and is broken by design.
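
For reference, the common convention is BIT() of the buswidth value;
since the enum values are the widths in bytes, the two styles produce
different, incompatible masks (a sketch):

	/* bitmask convention: bit position == width in bytes */
	dw->dma.src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
				  BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |
				  BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);

	/* and a user tests a width the same way */
	if (caps.src_addr_widths & BIT(DMA_SLAVE_BUSWIDTH_4_BYTES))
		/* 4-byte accesses are supported */;

	/* the pxa_dma.c style ORs the raw enum values instead:
	 * 1 | 2 | 4 == 0x7, while the BIT() style gives 0x16.
	 */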

> > > > > +static inline void
> > > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > > > +{
> > > > > + iowrite32(val, chip->regs + reg);
> > > > 
> > > > Are you going to use IO ports for this IP? I don't think so.
> > > > Wouldn't it be better to call readl()/writel() instead?
> > > 
> > > As I understand, it's better to use ioread/iowrite as a more
> > > universal IO access way. Am I wrong?
> > 
> > As I said above, ioreadX()/iowriteX() only makes sense when your IP
> > could be accessed either via an IO region or via MMIO. I'm pretty
> > sure IO is not the case at all for this IP.
> 
> MMIO? This IP works exactly via memory-mapped I/O.

Yes, and why do you need to check this on each IO read/write?
Please, switch to plain readX()/writeX() instead.

> > > > > +	val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > > > > +	val &= ~irq_mask;
> > > > > +	axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > > > > +	}
> > > > > +
> > > > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > > > +}
> > > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > > +{
> > > > > +	struct axi_dma_chan *chan = desc->chan;
> > > > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > > > +	struct axi_dma_desc *child, *_next;
> > > > > +	unsigned int descs_put = 0;
> > > > > +	list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > > > > +				 xfer_list) {
> > > > 
> > > > xfer_list looks redundant.
> > > > Can you elaborate why virtual channel management is not working
> > > > for you?
> > > 
> > > Each virtual descriptor encapsulates several hardware
> > > descriptors, which belong to the same transfer.
> > > This list (xfer_list) is used only for allocating/freeing these
> > > descriptors and it doesn't affect the virtual DMA logic.
> > > I can see this approach in several drivers using VirtDMA (but
> > > they mostly use an array instead of a list)
> > 
> > You described how most of the DMA drivers are implemented, though
> > they are using just the sg_list directly. I would recommend doing
> > the same and getting rid of this list.
> 
> This IP can be (and is) configured with a small block size.
> (Note that I am not talking about runtime HW configuration.)
> 
> And there is a possibility that we can't use the sg_list directly
> and need to split it into smaller chunks.

That's what I referred to a while ago. The driver should provide an
interface to tell a potential caller what maximum block (number of
items with a given bus width) it supports.

We have struct dma_parms in struct device, but what we actually need
is to support something similar on a per-channel basis in the
DMAengine framework.

So, instead of working around this I recommend either implementing it
properly or relying on the fact that in the future someone eventually
does that for you.

Each driver which has this re-splitting mechanism should be cleaned up
and refactored.

> > > > Btw, are 

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-24 Thread Eugeniy Paltsev
Hi,
On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > This patch adds support for the DW AXI DMAC controller.
> > > > +#define AXI_DMA_BUSWIDTHS    \
> > > > +   (DMA_SLAVE_BUSWIDTH_1_BYTE  | \
> > > > +   DMA_SLAVE_BUSWIDTH_2_BYTES  | \
> > > > +   DMA_SLAVE_BUSWIDTH_4_BYTES  | \
> > > > +   DMA_SLAVE_BUSWIDTH_8_BYTES  | \
> > > > +   DMA_SLAVE_BUSWIDTH_16_BYTES | \
> > > > +   DMA_SLAVE_BUSWIDTH_32_BYTES | \
> > > > +   DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > > +/* TODO: check: do we need to use BIT() macro here? */
> > >
> > > Still TODO? I remember I answered this on the first round.
> >
> > Yes, I remember it.
> > I left this TODO as a reminder because src_addr_widths and
> > dst_addr_widths are not used anywhere and they are set differently
> > in different drivers (with or without the BIT() macro).
>
> Strange. AFAIK they are representing bits (which is not the best
> idea) in the resulting u64 field. So, anything bigger than 63 doesn't
> make sense.
They are u32 fields!
From dmaengine.h:
struct dma_device {
...
u32 src_addr_widths;
u32 dst_addr_widths;
};

> In drivers where they are not bits, quite likely a bug is hidden.
For example (from pxa_dma.c):
const enum dma_slave_buswidth widths = DMA_SLAVE_BUSWIDTH_1_BYTE |
DMA_SLAVE_BUSWIDTH_2_BYTES | DMA_SLAVE_BUSWIDTH_4_BYTES;

And there are a lot of drivers which don't use BIT() for these fields:
sh/shdmac.c
sh/rcar-dmac.c
qcom/bam_dma.c
mmp_pdma.c
ste_dma40.c
And many others...

> >
> > > > +
> > > > +static inline void
> > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > > +{
> > > > +   iowrite32(val, chip->regs + reg);
> > >
> > > Are you going to use IO ports for this IP? I don't think so.
> > > Wouldn't it be better to call readl()/writel() instead?
> >
> > As I understand, it's better to use ioread/iowrite as a more
> > universal IO access way. Am I wrong?
>
> As I said above, ioreadX()/iowriteX() only makes sense when your IP
> could be accessed either via an IO region or via MMIO. I'm pretty
> sure IO is not the case at all for this IP.
MMIO? This IP works exactly via memory-mapped I/O.

> > > > +static inline void axi_chan_irq_disable(struct axi_dma_chan
> > > > *chan,
> > > > u32 irq_mask)
> > > > +{
> > > > +   u32 val;
> > > > +
> > > > +   if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > > > +   axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > DWAXIDMAC_IRQ_NONE);
> > > > +   } else {
> > >
> > > I don't see the benefit. (Yes, I see one read-less path; I think
> > > it adds perplexity for nothing here.)
> >
> > This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
> > Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until
> > I add DMA_SLAVE support.
> > But I can cut off this 'if' statement, if it is necessary.
>
> It's not necessary, but from my point of view it increases the
> readability of the code a lot for the price of an additional readl().
Ok.
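
Cutting the 'if' off would leave just the read-modify-write path - a
sketch based on the code quoted above; it costs one extra readl() in
the DWAXIDMAC_IRQ_ALL case but reads more simply:

	static inline void axi_chan_irq_disable(struct axi_dma_chan *chan,
						u32 irq_mask)
	{
		u32 val;

		val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
		val &= ~irq_mask;
		axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
	}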

> >
> > > > +	val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > > > +	val &= ~irq_mask;
> > > > +	axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > > > +	}
> > > > +
> > > > +	return min_t(size_t, __ffs(sdl), max_width);
> > > > +}
> > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > +{
> > > > +	struct axi_dma_chan *chan = desc->chan;
> > > > +	struct dw_axi_dma *dw = chan->chip->dw;
> > > > +	struct axi_dma_desc *child, *_next;
> > > > +	unsigned int descs_put = 0;
> > > > +	list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > > > +				 xfer_list) {
> > >
> > > xfer_list looks redundant.
> > > Can you elaborate why virtual channel management is not working
> > > for you?
> >
> > Each virtual descriptor encapsulates several hardware descriptors,
> > which belong to the same transfer.
> > This list (xfer_list) is used only for allocating/freeing these
> > descriptors and it doesn't affect the virtual DMA logic.
> > I can see this approach in several drivers using VirtDMA (but they
> > mostly use an array instead of a list)
>
> You described how most of the DMA drivers are implemented, though
> they are using just the sg_list directly. I would recommend doing the
> same and getting rid of this list.
This IP can be (and is) configured with a small block size.
(Note that I am not talking about runtime HW configuration.)

And there is a possibility that we can't use the sg_list directly and
need to split it into smaller chunks, as in the sketch below.
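
The inner split loop in prep_slave_sg would look roughly like this (a
sketch; max_block_len is this driver's per-channel limit, names are
illustrative):

	struct scatterlist *sg;
	unsigned int i;

	for_each_sg(sgl, sg, sg_len, i) {
		dma_addr_t addr = sg_dma_address(sg);
		size_t len = sg_dma_len(sg);

		/* one sg entry may need several HW descriptors */
		while (len) {
			size_t xfer = min_t(size_t, len, chan->max_block_len);

			/* allocate and chain one HW descriptor
			 * covering [addr, addr + xfer) here */
			addr += xfer;
			len -= xfer;
		}
	}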

> > > Btw, are you planning to use priority at all? For now I didn't
> > > see a single driver (from the set I have checked, like 4-5 of
> > > them) that uses priority anyhow. It makes the driver more complex
> > > for 

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-24 Thread Eugeniy Paltsev
Hi,
On Fri, 2017-04-21 at 18:13 +0300, Andy Shevchenko wrote:
> On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> > On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > > This patch adds support for the DW AXI DMAC controller.
> > > > +#define AXI_DMA_BUSWIDTHS    \
> > > > +   (DMA_SLAVE_BUSWIDTH_1_BYTE  | \
> > > > +   DMA_SLAVE_BUSWIDTH_2_BYTES  | \
> > > > +   DMA_SLAVE_BUSWIDTH_4_BYTES  | \
> > > > +   DMA_SLAVE_BUSWIDTH_8_BYTES  | \
> > > > +   DMA_SLAVE_BUSWIDTH_16_BYTES | \
> > > > +   DMA_SLAVE_BUSWIDTH_32_BYTES | \
> > > > +   DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > > +/* TODO: check: do we need to use BIT() macro here? */
> > >
> > > Still TODO? I remember I answered to this on the first round.
> >
> > Yes, I remember it.
> > I left this TODO as a reminder because src_addr_widths and
> > dst_addr_widths are
> > not used anywhere and they are set differently in different drivers
> > (with or without BIT macro).
>
> Strange. AFAIK they are representing bits (which is not the best
> idea) in the resulting u64 field. So, anything bigger than 63 doesn't
> make sense.
They are u32 fields!
From dmaengine.h :
struct dma_device {
...
u32 src_addr_widths;
u32 dst_addr_widths;
};

> In drivers where they are not bits quite likely a bug is hidden.
For example (from pxa_dma.c):
const enum dma_slave_buswidth widths = DMA_SLAVE_BUSWIDTH_1_BYTE |
DMA_SLAVE_BUSWIDTH_2_BYTES | DMA_SLAVE_BUSWIDTH_4_BYTES;

And there are a lot of drivers, which don't use BIT for this fields.
sh/shdmac.c
sh/rcar-dmac.c
qcom/bam_dma.c
mmp_pdma.c
ste_dma40.c
And many others...

> >
> > > > +
> > > > +static inline void
> > > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > > +{
> > > > +   iowrite32(val, chip->regs + reg);
> > >
> > > Are you going to use IO ports for this IP? I don't think so.
> > > Wouldn't be better to call readl()/writel() instead?
> >
> > As I understand, it's better to use ioread/iowrite as more
> > universal
> > IO
> > access way. Am I wrong?
>
> As I said above the ioreadX/iowriteX makes only sense when your IP
> would be accessed via IO region or MMIO. I'm pretty sure IO is not
> the case at all for this IP.
MMIO? This IP works exactly via memory-mapped I/O.

> > > > +static inline void axi_chan_irq_disable(struct axi_dma_chan
> > > > *chan,
> > > > u32 irq_mask)
> > > > +{
> > > > +   u32 val;
> > > > +
> > > > +   if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > > > +   axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > DWAXIDMAC_IRQ_NONE);
> > > > +   } else {
> > >
> > > I don't see the benefit. (Yes, I see one read-less path, I think
> > > it
> > > makes perplexity for nothing here)
> >
> > This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
> > Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until
> > I add DMA_SLAVE support.
> > But I can cut off this 'if' statment, if it is necessery.
>
> It's not necessary, but from my point of view increases readability
> of the code a lot for the price of an additional readl().
Ok.

> >
> > > > +   val = axi_chan_ioread32(chan,
> > > > CH_INTSTATUS_ENA);
> > > > +   val &= ~irq_mask;
> > > > +   axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > > val);
> > > > +   }
> > > > +
> > > > +   return min_t(size_t, __ffs(sdl), max_width);
> > > > +}
> > > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > > +{
> > > > +   struct axi_dma_chan *chan = desc->chan;
> > > > +   struct dw_axi_dma *dw = chan->chip->dw;
> > > > +   struct axi_dma_desc *child, *_next;
> > > > +   unsigned int descs_put = 0;
> > > > +   list_for_each_entry_safe(child, _next, 
> > > > >xfer_list,
> > > > xfer_list) {
> > >
> > > xfer_list looks redundant.
> > > Can you elaborate why virtual channel management is not working
> > > for
> > > you?
> >
> > Each virtual descriptor encapsulates several hardware descriptors
> > which belong to the same transfer.
> > This list (xfer_list) is used only for allocating/freeing these
> > descriptors and it doesn't affect the virtual DMA logic.
> > I can see this approach in several drivers that use virt-dma (but
> > they mostly use an array instead of a list).
>
> You described how most of the DMA drivers are implemented, though
> they
> are using just sg_list directly. I would recommend to do the same and
> get rid of this list.
This IP can be (and is) configured with a small block size.
(Note that I am not talking about runtime HW configuration.)

So there is the possibility that we can't use the sg_list directly and
need to split it into smaller chunks.
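
Roughly what the re-splitting looks like (a sketch only: max_block_ts,
reg_width and axi_desc_fill() are illustrative stand-ins, not the
driver's actual names):

	#include <linux/errno.h>
	#include <linux/kernel.h>
	#include <linux/scatterlist.h>

	static int axi_dma_split_sg(struct axi_dma_chan *chan,
				    struct scatterlist *sgl, unsigned int sg_len,
				    u32 reg_width, u32 max_block_ts)
	{
		struct scatterlist *sg;
		unsigned int i;

		for_each_sg(sgl, sg, sg_len, i) {
			dma_addr_t addr = sg_dma_address(sg);
			size_t left = sg_dma_len(sg);

			while (left) {
				/*
				 * One HW descriptor per chunk of at most
				 * max_block_ts items of (1 << reg_width) bytes.
				 */
				size_t xfer = min_t(size_t, left,
						(size_t)max_block_ts << reg_width);

				if (axi_desc_fill(chan, addr, xfer))
					return -ENOMEM;

				addr += xfer;
				left -= xfer;
			}
		}
		return 0;
	}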

> > > Btw, are you planning to use priority at all? So far I haven't
> > > seen a single driver (from the set I have checked, like 4-5 of
> > > them) that uses priority anyhow. It makes the driver more complex
> > > for nothing.

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-21 Thread Andy Shevchenko
On Fri, 2017-04-21 at 14:29 +, Eugeniy Paltsev wrote:
> On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> > On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > > This patch adds support for the DW AXI DMAC controller.
> > > 

> > > +#include 
> > 
> > Are you sure you still need of.h along with depends on OF?
> 
> "of_match_ptr" used from of.h

It is safe not to use it and always have the table. In that case the
driver would even be available for ACPI-enabled platforms (I suppose
some ARM64 users might find this useful).
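
I.e., something like this (a sketch; the compatible string and the
probe/remove names are illustrative, not taken from the patch):

	#include <linux/mod_devicetable.h>
	#include <linux/platform_device.h>

	static const struct of_device_id dw_dma_of_id_table[] = {
		{ .compatible = "snps,axi-dma" },	/* illustrative */
		{ /* sentinel */ }
	};
	MODULE_DEVICE_TABLE(of, dw_dma_of_id_table);

	static struct platform_driver dw_driver = {
		.probe		= dw_probe,
		.remove		= dw_remove,
		.driver = {
			.name		= "axi-dma-platform",
			/*
			 * No of_match_ptr(): the table is always referenced,
			 * so the driver still builds without CONFIG_OF and
			 * of.h is no longer needed just for the macro.
			 */
			.of_match_table	= dw_dma_of_id_table,
		},
	};
	module_platform_driver(dw_driver);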

> > > +#define AXI_DMA_BUSWIDTHS  \
> > > + (DMA_SLAVE_BUSWIDTH_1_BYTE  | \
> > > + DMA_SLAVE_BUSWIDTH_2_BYTES  | \
> > > + DMA_SLAVE_BUSWIDTH_4_BYTES  | \
> > > + DMA_SLAVE_BUSWIDTH_8_BYTES  | \
> > > + DMA_SLAVE_BUSWIDTH_16_BYTES | \
> > > + DMA_SLAVE_BUSWIDTH_32_BYTES | \
> > > + DMA_SLAVE_BUSWIDTH_64_BYTES)
> > > +/* TODO: check: do we need to use BIT() macro here? */
> > 
> > Still TODO? I remember I answered this on the first round.
> 
> Yes, I remember it.
> I left this TODO as a reminder because src_addr_widths and
> dst_addr_widths are
> not used anywhere and they are set differently in different drivers
> (with or without BIT macro).

Strange. AFAIK they are representing bits (which is not the best idea)
in the resulting u64 field. So, anything bigger than 63 doesn't make
sense. In drivers where they are not bits quite likely a bug is hidden.

> 
> > > +
> > > +static inline void
> > > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > > +{
> > > + iowrite32(val, chip->regs + reg);
> > 
> > Are you going to use IO ports for this IP? I don't think so.
> > Wouldn't it be better to call readl()/writel() instead?
> 
> As I understand it, it's better to use ioread/iowrite as a more
> universal I/O access method. Am I wrong?

As I said above, the ioreadX()/iowriteX() helpers only make sense when
your IP might be accessed via an I/O port region as well as MMIO. I'm
pretty sure port I/O is not the case at all for this IP.

> > > +static inline void axi_chan_irq_disable(struct axi_dma_chan
> > > *chan,
> > > u32 irq_mask)
> > > +{
> > > + u32 val;
> > > +
> > > + if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > > + axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > > DWAXIDMAC_IRQ_NONE);
> > > + } else {
> > 
> > I don't see the benefit. (Yes, I see one read-less path; I think it
> > adds complexity for nothing here.)
> 
> This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
> Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until I
> add DMA_SLAVE support.
> But I can drop this 'if' statement if necessary.

It's not necessary, but from my point of view it increases the
readability of the code a lot for the price of an additional readl().

> 
> > > + val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > > + val &= ~irq_mask;
> > > + axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > > + }

> > > +
> > > + return min_t(size_t, __ffs(sdl), max_width);
> > > +}
> > > +static void axi_desc_put(struct axi_dma_desc *desc)
> > > +{
> > > + struct axi_dma_chan *chan = desc->chan;
> > > + struct dw_axi_dma *dw = chan->chip->dw;
> > > + struct axi_dma_desc *child, *_next;
> > > + unsigned int descs_put = 0;
> > > + list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > > xfer_list) {
> > 
> > xfer_list looks redundant.
> > Can you elaborate why virtual channel management is not working for
> > you?
> 
> Each virtual descriptor encapsulates several hardware descriptors
> which belong to the same transfer.
> This list (xfer_list) is used only for allocating/freeing these
> descriptors and it doesn't affect the virtual DMA logic.
> I can see this approach in several drivers that use virt-dma (but
> they mostly use an array instead of a list).

You described how most of the DMA drivers are implemented, though they
are using just sg_list directly. I would recommend to do the same and
get rid of this list.

> > Btw, are you planning to use priority at all? So far I haven't seen
> > a single driver (from the set I have checked, like 4-5 of them) that
> > uses priority anyhow. It makes the driver more complex for nothing.
> 
> Only for dma slave operations.

So, in other words, do you actually *have* two or more users that
*need* prioritization?

> > > + if (unlikely(dst_nents == 0 || src_nents == 0))
> > > + return NULL;
> > > +
> > > + if (unlikely(dst_sg == NULL || src_sg == NULL))
> > > + return NULL;
> > 
> > If we need those checks they should go to dmaengine.h/dmaengine.c.
> 
> I checked several drivers which implement device_prep_dma_sg and
> they implement these checks.
> Should I add something like a "dma_sg_desc_invalid" function to
> dmaengine.h?

I suppose it's better to implement them in the corresponding helpers in
dmaengine.h.
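
Something along these lines, presumably (purely hypothetical; no such
helper exists in dmaengine.h as of this thread):

	#include <linux/scatterlist.h>

	/* Hypothetical common validity check for device_prep_dma_sg() args */
	static inline bool dma_sg_args_valid(struct scatterlist *dst_sg,
					     unsigned int dst_nents,
					     struct scatterlist *src_sg,
					     unsigned int src_nents)
	{
		return dst_sg && src_sg && dst_nents && src_nents;
	}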

> > > +static void axi_chan_list_dump_lli(struct axi_dma_chan *chan,
> > > +    struct axi_dma_desc
> > > *desc_head)
> > > +{
> > > + struct 

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-21 Thread Eugeniy Paltsev
Hi Andy,
thanks for the response.
My comments are inline below.

On Tue, 2017-04-18 at 15:31 +0300, Andy Shevchenko wrote:
> On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> > This patch adds support for the DW AXI DMAC controller.
> >
> > DW AXI DMAC is a part of an upcoming development board from Synopsys.
> >
> > In this driver implementation only DMA_MEMCPY and DMA_SG transfers
> > are supported.
> >
> > +++ b/drivers/dma/axi_dma_platform.c
> > @@ -0,0 +1,1044 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
>
> Are you sure you still need of.h along with depends on OF?
"of_match_ptr" used from of.h

> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "axi_dma_platform.h"
> > +#include "axi_dma_platform_reg.h"
>
> Can't you have this in one header?
Sure.

> > +#include "dmaengine.h"
> > +#include "virt-dma.h"
> > +#define AXI_DMA_BUSWIDTHS    \
> > +   (DMA_SLAVE_BUSWIDTH_1_BYTE  | \
> > +   DMA_SLAVE_BUSWIDTH_2_BYTES  | \
> > +   DMA_SLAVE_BUSWIDTH_4_BYTES  | \
> > +   DMA_SLAVE_BUSWIDTH_8_BYTES  | \
> > +   DMA_SLAVE_BUSWIDTH_16_BYTES | \
> > +   DMA_SLAVE_BUSWIDTH_32_BYTES | \
> > +   DMA_SLAVE_BUSWIDTH_64_BYTES)
> > +/* TODO: check: do we need to use BIT() macro here? */
>
> Still TODO? I remember I answered this on the first round.
Yes, I remember it.
I left this TODO as a reminder because src_addr_widths and
dst_addr_widths are
not used anywhere and they are set differently in different drivers
(with or without BIT macro).

> > +
> > +static inline void
> > +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> > +{
> > +   iowrite32(val, chip->regs + reg);
>
> Are you going to use IO ports for this IP? I don't think so.
> Wouldn't it be better to call readl()/writel() instead?
As I understand it, it's better to use ioread/iowrite as a more
universal I/O access method. Am I wrong?

> > +}
> > +static inline void
> > +axi_chan_iowrite64(struct axi_dma_chan *chan, u32 reg, u64 val)
> > +{
> > +   iowrite32(val & 0xffffffff, chan->chan_regs + reg);
> Useless conjunction.
>
> > +   iowrite32(val >> 32, chan->chan_regs + reg + 4);
> > +}
>
> Can your hardware get 8 bytes at once?
> For such cases we have iowrite64() for 64-bit kernels
>
> But with readq()/writeq() we have specific helpers to make this
> pretty,
> i.e. lo_hi_readq() / lo_hi_writeq() (or hi_lo_*() variants).
OK, I can probably use lo_hi_writeq() here.
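
A sketch of that variant:

	#include <linux/io-64-nonatomic-lo-hi.h>

	static inline void
	axi_chan_iowrite64(struct axi_dma_chan *chan, u32 reg, u64 val)
	{
		/*
		 * Two 32-bit writes, low word first, on 32-bit kernels;
		 * a single writeq() where the architecture provides one.
		 */
		lo_hi_writeq(val, chan->chan_regs + reg);
	}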

> > +static inline void axi_chan_irq_disable(struct axi_dma_chan *chan,
> > u32 irq_mask)
> > +{
> > +   u32 val;
> > +
> > +   if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> > +   axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> > DWAXIDMAC_IRQ_NONE);
> > +   } else {
>
> I don't see the benefit. (Yes, I see one read-less path; I think it
> adds complexity for nothing here.)
This function is called mostly with irq_mask = DWAXIDMAC_IRQ_ALL.
Actually it is called only with irq_mask = DWAXIDMAC_IRQ_ALL until I
add DMA_SLAVE support.
But I can drop this 'if' statement if necessary.

> > +   val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> > +   val &= ~irq_mask;
> > +   axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> > +   }
> > +}
> > +static inline void axi_chan_disable(struct axi_dma_chan *chan)
> > +{
> > +}
> > +
> > +static inline void axi_chan_enable(struct axi_dma_chan *chan)
> > +{
> > +}
> > +static u32 axi_chan_get_xfer_width(struct axi_dma_chan *chan,
> > dma_addr_t src,
> > +      dma_addr_t dst, size_t len)
> > +{
> > +   u32 max_width = chan->chip->dw->hdata->m_data_width;
> > +   size_t sdl = (src | dst | len);
>
> Redundant parens, redundant temporary variable (you may do this in
> place).
Ok.
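
I.e., computed in place (sketch):

	static u32 axi_chan_get_xfer_width(struct axi_dma_chan *chan,
					   dma_addr_t src, dma_addr_t dst,
					   size_t len)
	{
		u32 max_width = chan->chip->dw->hdata->m_data_width;

		/*
		 * The lowest set bit across src, dst and len bounds the
		 * widest naturally aligned access.
		 */
		return min_t(size_t, __ffs(src | dst | len), max_width);
	}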

> > +
> > +   return min_t(size_t, __ffs(sdl), max_width);
> > +}
> > +static void axi_desc_put(struct axi_dma_desc *desc)
> > +{
> > +   struct axi_dma_chan *chan = desc->chan;
> > +   struct dw_axi_dma *dw = chan->chip->dw;
> > +   struct axi_dma_desc *child, *_next;
> > +   unsigned int descs_put = 0;
> > +   list_for_each_entry_safe(child, _next, &desc->xfer_list,
> > xfer_list) {
>
> xfer_list looks redundant.
> Can you elaborate why virtual channel management is not working for
> you?
Each virtual descriptor encapsulates several hardware descriptors
which belong to the same transfer.
This list (xfer_list) is used only for allocating/freeing these
descriptors and it doesn't affect the virtual DMA logic.
I can see this approach in several drivers that use virt-dma (but they
mostly use an array instead of a list).
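
For reference, the layout being described is roughly (a sketch; field
names are approximate, not necessarily the patch's):

	#include <linux/list.h>
	#include "virt-dma.h"

	struct axi_dma_desc {
		struct axi_dma_lli	lli;	/* one HW link-list item */
		struct virt_dma_desc	vd;	/* virt-dma bookkeeping; only the
						 * first desc of a transfer is
						 * submitted to virt-dma */
		struct axi_dma_chan	*chan;
		struct list_head	xfer_list;	/* sibling HW descs of the same
							 * transfer, alloc/free only */
	};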

> > +   list_del(&child->xfer_list);
> > +   dma_pool_free(dw->desc_pool, child, child->vd.tx.phys);
> > +   descs_put++;
> > +   }
> > +}
> > +/* Called in chan locked context */
> > +static void axi_chan_block_xfer_start(struct axi_dma_chan *chan,
> > +     struct axi_dma_desc *first)
> 

Re: [PATCH v2 2/2] dmaengine: Add DW AXI DMAC driver

2017-04-18 Thread Andy Shevchenko
On Fri, 2017-04-07 at 17:04 +0300, Eugeniy Paltsev wrote:
> This patch adds support for the DW AXI DMAC controller.
> 
> DW AXI DMAC is a part of an upcoming development board from Synopsys.
> 
> In this driver implementation only DMA_MEMCPY and DMA_SG transfers
> are supported.

> 

> +++ b/drivers/dma/axi_dma_platform.c
> @@ -0,0 +1,1044 @@
> +/*
> + * Synopsys DesignWare AXI DMA Controller driver.
> + *
> + * Copyright (C) 2017 Synopsys, Inc. (www.synopsys.com)
> + *
> + * This program is free software; you can redistribute it and/or
> modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

> +#include 

Are you sure you still need of.h along with depends on OF?

> +#include 
> +#include 
> +#include 
> +#include 
> +

> +#include "axi_dma_platform.h"
> +#include "axi_dma_platform_reg.h"

Can't you have this in one header?

> +#include "dmaengine.h"
> +#include "virt-dma.h"

> +#define AXI_DMA_BUSWIDTHS  \
> + (DMA_SLAVE_BUSWIDTH_1_BYTE  | \
> + DMA_SLAVE_BUSWIDTH_2_BYTES  | \
> + DMA_SLAVE_BUSWIDTH_4_BYTES  | \
> + DMA_SLAVE_BUSWIDTH_8_BYTES  | \
> + DMA_SLAVE_BUSWIDTH_16_BYTES | \
> + DMA_SLAVE_BUSWIDTH_32_BYTES | \
> + DMA_SLAVE_BUSWIDTH_64_BYTES)
> +/* TODO: check: do we need to use BIT() macro here? */

Still TODO? I remember I answered this on the first round.

> +
> +static inline void
> +axi_dma_iowrite32(struct axi_dma_chip *chip, u32 reg, u32 val)
> +{
> + iowrite32(val, chip->regs + reg);

Are you going to use IO ports for this IP? I don't think so.
Wouldn't it be better to call readl()/writel() instead?

> +}

> +static inline void
> +axi_chan_iowrite64(struct axi_dma_chan *chan, u32 reg, u64 val)
> +{
> + iowrite32(val & 0xffffffff, chan->chan_regs + reg);

Useless conjunction.

> + iowrite32(val >> 32, chan->chan_regs + reg + 4);
> +}

Can your hardware get 8 bytes at once?

For such cases we have iowrite64() for 64-bit kernels

But with readq()/writeq() we have specific helpers to make this pretty,
i.e. lo_hi_readq() / lo_hi_writeq() (or hi_lo_*() variants).

> +static inline void axi_chan_irq_disable(struct axi_dma_chan *chan,
> u32 irq_mask)
> +{
> + u32 val;
> +

> + if (likely(irq_mask == DWAXIDMAC_IRQ_ALL)) {
> + axi_chan_iowrite32(chan, CH_INTSTATUS_ENA,
> DWAXIDMAC_IRQ_NONE);
> + } else {

I don't see the benefit. (Yes, I see one read-less path; I think it
adds complexity for nothing here.)

> + val = axi_chan_ioread32(chan, CH_INTSTATUS_ENA);
> + val &= ~irq_mask;
> + axi_chan_iowrite32(chan, CH_INTSTATUS_ENA, val);
> + }
> +}

> +static inline void axi_chan_disable(struct axi_dma_chan *chan)
> +{

> +}
> +
> +static inline void axi_chan_enable(struct axi_dma_chan *chan)
> +{

> +}

> +static u32 axi_chan_get_xfer_width(struct axi_dma_chan *chan,
> dma_addr_t src,
> +    dma_addr_t dst, size_t len)
> +{
> + u32 max_width = chan->chip->dw->hdata->m_data_width;

> + size_t sdl = (src | dst | len);

Redundant parens, redundant temporary variable (you may do this in
place).

> +
> + return min_t(size_t, __ffs(sdl), max_width);
> +}

> +static void axi_desc_put(struct axi_dma_desc *desc)
> +{
> + struct axi_dma_chan *chan = desc->chan;
> + struct dw_axi_dma *dw = chan->chip->dw;
> + struct axi_dma_desc *child, *_next;
> + unsigned int descs_put = 0;

> + list_for_each_entry_safe(child, _next, &desc->xfer_list,
> xfer_list) {

xfer_list looks redundant.
Can you elaborate why virtual channel management is not working for you?

> + list_del(&child->xfer_list);
> + dma_pool_free(dw->desc_pool, child, child->vd.tx.phys);
> + descs_put++;
> + }

> +}

> +/* Called in chan locked context */
> +static void axi_chan_block_xfer_start(struct axi_dma_chan *chan,
> +   struct axi_dma_desc *first)
> +{

> + u32 reg, irq_mask;
> + u8 lms = 0;

Does it make any sense? It looks like lms is always 0.
Or did I miss the source of its value?

> + u32 priority = chan->chip->dw->hdata->priority[chan->id];

Reversed xmas tree, please.

Btw, are you planning to use priority at all? So far I haven't seen a
single driver (from the set I have checked, like 4-5 of them) that uses
priority anyhow. It makes the driver more complex for nothing.

> +
> + if (unlikely(axi_chan_is_hw_enable(chan))) {
> + dev_err(chan2dev(chan), "%s is non-idle!\n",
> +
