Re: [PATCH 4/9] dma: edma: Find missed events and issue them
On 08/01/2013 01:13 AM, Sekhar Nori wrote:
> On Thursday 01 August 2013 07:57 AM, Joel Fernandes wrote:
>> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>>> Hi Sekhar,
>>>>
>>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>>> In an effort to move to using scatter-gather lists of any size with
>>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>>> events and issue them.
>>>>>>
>>>>>> The sequence of events that requires this is:
>>>>>>
>>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>>
>>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>>
>>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>>
>>>>>> After (1) is successfully transferred, the events from the MMC controller
>>>>>> do not stop coming and are missed by the time we have set up the transfer
>>>>>> for (2). So here, we catch the missed events as an error condition and
>>>>>> issue them manually.
>>>>>
>>>>> Are you sure there won't be any effect of these missed events on the
>>>>> peripheral side? For example, won't McASP get into an underrun condition
>>>>> when it encounters a null PaRAM set? Even UART has to transmit at a
>>>>
>>>> But it will not encounter a null PaRAM set, because McASP uses contiguous
>>>> buffers for transfer which are not scattered across physical memory.
>>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>>> series leaves it linked to the Dummy set and does not link to the Null
>>>> set. The Null set is only used for SG lists that are > MAX_NR_SG in size,
>>>> such as those created, for example, by MMC and Crypto.
>>>>
>>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>>
>>>> Existing drivers have to wait anyway if they hit the MAX SG limit today.
>>>> If they don't want to wait, they would have allocated a contiguous block
>>>> of memory and DMA'd that in one stretch so they don't lose any events,
>>>> and in such cases we are not linking to Null.
>>>
>>> As long as the DMA driver can advertise its MAX SG limit, peripherals can
>>> always work around that by limiting the number of sync events they
>>> generate so as to not have any of the events getting missed. With this
>>> series, I am worried that the EDMA driver is advertising that it can
>>> handle any length SG list while not taking care of missing any events
>>> while doing so. This will break the assumptions that driver writers make.
>>
>> This is already being done by some other DMA engine drivers ;). We can
>> advertise more than we can handle at a time; that's the basis of this
>> whole idea.
>>
>> I understand what you're saying, but events are not something that have
>> to be serviced immediately; they can be queued etc. and the actual
>> transfer from the DMA controller can be delayed. As long as we don't
>> miss the event we are fine, which my series takes care of.
>>
>> So far I have tested this series on the following modules in various
>> configurations and have seen no issues:
>> - Crypto AES
>> - MMC/SD
>> - SPI (128x160 display)
>
> Notice how in each of these cases the peripheral is in control of when
> data is driven out? Please test with McASP in a configuration where the
> codec drives the frame-sync/bit-clock or with UART under a high baud rate.

McASP allocates a contiguous buffer. For this case there is always an SG of
size 1 and this patch series doesn't affect it at all; there is no stalling.
Further, the McASP audio driver is still awaiting conversion to use the DMA
engine, so there's no way yet to test it.

>>>>> Also, won't this lead to under-utilization of the peripheral bandwidth?
>>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>>> is waiting to be set up.
>>>>
>>>> But it is waiting anyway even today. Currently based on MAX segs, MMC >>
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
Just some corrections here..

On 08/01/2013 03:28 PM, Joel Fernandes wrote:
>>> 2. If the interrupt handler for some reason doesn't complete or get
>>> serviced in time, we will end up DMA'ing incorrect data as events
>>> wouldn't stop coming in even if the interrupt is not yet handled (in your
>>> example linked sets P1 or P2 would be old ones being repeated). Whereas
>>> with my method, we are not doing any DMA once we finish the current
>>> MAX_NR_SG set even if events continue to come.
>>
>> Where is the repetition and possibility of wrong data being transferred?
>> We have a linear list of PaRAM sets - not a loop. You would link the end
>> of the PaRAM set chain to the dummy PaRAM set which BTW will not cause
>> missed events. The more PaRAM sets you add to the chain, the more
>
> There would have to be a loop, how else would you ensure continuity and
> uninterrupted DMA?
>
> Consider if you have 2 sets of linked sets:
> L1 is the first set of linked sets and L2 is the second.
>
> When L1 is done, EDMA continues with L2 (due to the link) while the
> interrupt handler prepares L1. The continuity depends on L1 being linked
> to L2. Only the absolute last break-up of the MAX_NR_SG linked set will
> be linked to Dummy.
>
> So consider MAX_NR_SG=10, and sg_len = 35
>
> L1 - L2 - L1 - L1 - Dummy

Should be, L1 - L2 - L1 - L2 - Dummy

>
> The split would be in number of slots,
> 10 - 10 - 10 - 5 - Dummy
>
>> time CPU gets to intervene before DMA eventually stalls. This is a
>> tradeoff system designers can manage.
>
> Consider what happens in the case where MAX_SG_NR=1 or 2. In that case,
> there's a chance we might not get enough time for the interrupt handler
> to set up the next series of linked sets.
>
> Somehow this limitation has to be overcome by advising in comments than
> MAX_SG_NR should always be greater than a certain number to ensure
> proper operation.
s/than/that/

Thanks,

-Joel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
Hi Sekhar,

Thanks for your detailed illustrations.

On 08/02/2013 08:26 AM, Sekhar Nori wrote:
[..]
>> This can be used only for buffers that are contiguous in memory, not
>> those that are scattered across memory.
>
> I was hinting at using the linking facility of EDMA to achieve this.
> Each PaRAM set has full 32-bit source and destination pointers so I see
> no reason why the non-contiguous case cannot be handled.
>
> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
> typically 4 times the number of channels. In this case we use one DMA
> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
> and P1 and P2 are the Link sets.
>
> Initial setup:
>
> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^      ^
>  |      |      |
>  P0 ->  P1 ->  P2 -> NULL
>
> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
> completion. On each completion interrupt, hardware automatically copies
> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
> out, the state of hardware is:
>
> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^
>  |      |
>  P0,1   P2 -> NULL
>  |      ^
>  |      |
>  --------
>
> SG1 transfer has already started by the time the TC interrupt is
> handled. As you can see P1 is now redundant and ready to be recycled. So
> in the interrupt handler, software recycles P1. Thus:
>
> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^      ^
>  |      |      |
>  P0 ->  P2 ->  P1 -> NULL
>
> Now, on the next interrupt, P2 gets copied and thus can get recycled.
> Hardware state:
>
> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^
>  |      |
>  P0,2   P1 -> NULL
>  |      ^
>  |      |
>  --------
>
> As part of TC completion interrupt handling:
>
> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^      ^
>  |      |      |
>  P0 ->  P1 ->  P2 -> NULL
>
> This goes on until the SG list is exhausted. If you use more PaRAM sets,
> the interrupt handler gets more time to recycle the PaRAM set. At no
> point do we touch P0 as it is always under active transfer. Thus the
> peripheral is always kept busy.
>
> Do you see any reason why such a mechanism cannot be implemented?

This is possible and looks like another way to do it, but there are 2
problems I can see with it.

1. It's inefficient because of too many interrupts:

Imagine a case where we have an SG list of size 30 and MAX_NR_SG size is
10. This method will always trigger 30 interrupts, whereas with my patch
series you'd get only 3 interrupts. If you increase MAX_SG_NR, you'd get
even fewer interrupts.

>>>
>>> Yes, but you are seeing only one side of the inefficiency. In your
>>> design DMA *always* stalls waiting for the CPU to intervene. The whole
>>> point of DMA is to keep it going while the CPU does bookkeeping in the
>>> background. This is simply not going to scale with fast peripherals.
>>
>> Agreed. So far though, I've no way to reproduce a fast peripheral that
>> scatters data across physical memory and suffers from any stall.
>>
>>> Besides, missed events are error conditions as far as EDMA and the
>>> peripheral are concerned. You are handling an error interrupt to
>>> support a successful transaction. Think about why EDMA considers missed
>>> events an error condition.
>>
>> I agree with this, it's not the best way to do it. I have been working
>> on a different approach.
>>
>> However, in support of the series:
>> 1. It doesn't break any existing code
>> 2. It works for all current DMA users (performance and correctness)
>> 3. It removes the SG limitations on DMA users.
>
> Right, all of this should be true even with the approach I am suggesting.
>
>> So what you suggested would be more of a feature addition than a
>> limitation of this series. It is at least better than what's being done
>> now - forcing the limit to the total number of SGs, so it is a step in
>> the right direction.
>
> No, I do not see my approach as a feature addition to what you are
> doing. They are both very contrasting ways. For example, you would not
> need the manual (re)trigger in the CC error condition in what I am
> proposing.
>
>> 2.
If the interrupt handler for some reason doesn't complete or get serviced
in time, we will end up DMA'ing incorrect data, as events wouldn't stop
coming in even if the interrupt is not yet handled (in your example linked
sets P1 or P2 would be old ones being repeated). Whereas with my method, we
are not doing any DMA once we finish the current MAX_NR_SG set even if
events continue to come.
>>>
>>> Where is the repetition and possibility of wrong data being
>>> transferred? We have a linear list of PaRAM sets - not a loop. You
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
Hi Sekhar,

Considering you agree with my understanding of the approach you proposed, I
worked on some code to quickly try the different approach (ping-pong
between sets); here is a hack patch:
https://github.com/joelagnel/linux-kernel/commits/dma/edma-no-sg-limits-interleaved

As I suspected, it also has problems with missing interrupts, coming back
to my other point about getting errors if the ISR doesn't get enough time
to set up the next transfer. If you use < 5 MAX_NR slots you start seeing
EDMA errors. For > 5 slots, I don't see errors, but there is stalling
because of missed interrupts.

I observe that for an SG-list of size 10, it takes at least 7 ms before the
interrupt handler (ISR) gets a chance to execute. This I feel is quite
long; what is your opinion about this?

Describing my approach here: if MAX slots is 10, for example, we split it
into 2 cyclically linked sets of size 5 each. Interrupts are set up to
trigger for every 5 PaRAM set transfers. After the first 5 transfers, the
ISR recycles them for the next 5 entries in the SG-list. This happens in
parallel/simultaneously as the second set of 5 is being transferred.

Thanks,

-Joel

On 08/02/2013 01:15 PM, Joel Fernandes wrote:
[..]
> Even in your diagrams you are actually showing such a cyclic link
>
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>>  P0 ->  P2 ->  P1 -> NULL
>
> Comparing this..
>
>>>>>>
>>>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>>>> Hardware state:
>>>>>>
>>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^
>>>>>>  |      |
>>>>>>  P0,2   P1 -> NULL
>>>>>>  |      ^
>>>>>>  |      |
>>>>>>  --------
>>>>>>
>>>>>> As part of TC completion interrupt handling:
>>>>>>
>>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>>  P0 ->  P1 ->  P2 -> NULL
>
> .. with this.
Notice that P2 -> P1 became P1 -> P2.
>
> The next logical diagram would look like:
>
>>>>>>
>>>>>> Now, on next interrupt, P1 gets copied and thus can get recycled.
>>>>>> Hardware state:
>>>>>>
>>>>>> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^
>>>>>>  |      |
>>>>>>  P0,1   P2 -> NULL
>>>>>>  |      ^
>>>>>>  |      |
>>>>>>  --------
>>>>>>
>>>>>> As part of TC completion interrupt handling:
>>>>>>
>>>>>> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>>  P0 ->  P2 ->  P1 -> NULL
>
> "P1 gets copied" happens only because of the cyclic link from P2 to P1;
> it wouldn't have happened if P2 was linked to Dummy as you described.
>
> Now coming to 2 linked sets vs 1, I meant the same thing: to give the
> interrupt handler more time, we could have something like:
>
>>>>>> As part of TC completion interrupt handling:
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>>  P0 ->  P1 ->  P2 -> P3 -> P4 -> Null
>
> So what I was describing as 2 sets of linked sets is P1 and P2 being 1
> set, and P3 and P4 being another set. We would then recycle a complete
> set at the same time. That way the interrupt handler could do more at
> once and get more time to recycle. So we would set up TC interrupts only
> for P2 and P4 in the above diagrams.
>
> Thanks,
>
> -Joel
[PATCH v3 08/12] dma: edma: Add one more required slot to MAX slots
We'd now need a separate slot just for the channel and separate ones for
the 2 linked sets, so we make adjustments to allocate an extra slot
accordingly.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index a242269..df50a04 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -48,7 +48,7 @@

 /* Max of 16 segments per channel to conserve PaRAM slots */
 #define MAX_NR_SG		16
-#define EDMA_MAX_SLOTS		MAX_NR_SG
+#define EDMA_MAX_SLOTS		(MAX_NR_SG+1)
 #define EDMA_DESCRIPTORS	16

 struct edma_desc {
@@ -311,6 +311,9 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(

 	num_slots_needed = sg_len > MAX_NR_SG ? MAX_NR_SG : sg_len;

+	/* Allocate one extra to account for the channel itself */
+	num_slots_needed++;
+
 	for (i = 0; i < num_slots_needed; i++) {
 		if (echan->slot[i] < 0) {
 			echan->slot[i] =
--
1.7.9.5
[PATCH v3 05/12] ARM: edma: Add function to enable interrupt for a PaRAM slot
To prevent common programming errors, add a function to enable interrupts
correctly. It also keeps calling code more readable.

Signed-off-by: Joel Fernandes
---
 arch/arm/common/edma.c             | 15 +++++++++++++++
 include/linux/platform_data/edma.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 6433b6c..aa43c49 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1095,6 +1095,21 @@ void edma_set_transfer_params(unsigned slot,
 EXPORT_SYMBOL(edma_set_transfer_params);

 /**
+ * edma_enable_interrupt - enable completion interrupt on a parameter RAM slot
+ * @slot: parameter RAM slot to enable the interrupt for
+ *
+ * Sets TCINTEN in the slot's OPT field.
+ */
+void edma_enable_interrupt(unsigned slot)
+{
+	pr_debug("Enabling interrupt on slot %d\n", slot);
+
+	edma_parm_modify(EDMA_CTLR(slot), PARM_OPT, EDMA_CHAN_SLOT(slot),
+			 ~TCINTEN, TCINTEN);
+}
+EXPORT_SYMBOL(edma_enable_interrupt);
+
+/**
  * edma_link - link one parameter RAM slot to another
  * @from: parameter RAM slot originating the link
  * @to: parameter RAM slot which is the link target
diff --git a/include/linux/platform_data/edma.h b/include/linux/platform_data/edma.h
index 57300fd..184ff5f 100644
--- a/include/linux/platform_data/edma.h
+++ b/include/linux/platform_data/edma.h
@@ -136,6 +136,7 @@ void edma_set_dest_index(unsigned slot, s16 dest_bidx, s16 dest_cidx);
 void edma_set_transfer_params(unsigned slot, u16 acnt, u16 bcnt, u16 ccnt,
 		u16 bcnt_rld, enum sync_dimension sync_mode);
 void edma_link(unsigned from, unsigned to);
+void edma_enable_interrupt(unsigned slot);
 void edma_unlink(unsigned from);

 /* calls that operate on an entire parameter RAM slot */
--
1.7.9.5
[PATCH v3 06/12] ARM: edma: Add pr_debug in edma_link
Useful for visualizing linking of PaRAM slots.

Signed-off-by: Joel Fernandes
---
 arch/arm/common/edma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index aa43c49..34d3fc9 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1125,6 +1125,8 @@ void edma_link(unsigned from, unsigned to)
 	ctlr_to = EDMA_CTLR(to);
 	to = EDMA_CHAN_SLOT(to);

+	pr_debug("Setting up link %d -> %d\n", from, to);
+
 	if (from >= edma_cc[ctlr_from]->num_slots)
 		return;
 	if (to >= edma_cc[ctlr_to]->num_slots)
--
1.7.9.5
[PATCH v3 03/12] dma: edma: remove limits on number of slots
With this series, this check is no longer required and we shouldn't need to
reject drivers DMA'ing more than the MAX number of slots.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 7b0853c..b6d609c 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -247,12 +247,6 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 		return NULL;
 	}

-	if (sg_len > MAX_NR_SG) {
-		dev_err(dev, "Exceeded max SG segments %d > %d\n",
-			sg_len, MAX_NR_SG);
-		return NULL;
-	}
-
 	edesc = kzalloc(sizeof(*edesc) + sg_len * sizeof(edesc->pset[0]),
 			GFP_ATOMIC);
 	if (!edesc) {
--
1.7.9.5
[PATCH v3 10/12] dma: edma: Check if MAX_NR_SG is even in prep function
Splitting the MAX available slots into 2 sets of size MAX_NR_LS requires
MAX_NR_SG to be even. We ensure the same in the prep function.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 70923a2..061f0cf 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -354,6 +354,9 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 	int src_bidx, dst_bidx, src_cidx, dst_cidx;
 	int i, num_slots_needed;

+	if (MAX_NR_SG & 1)
+		dev_err(dev, "%s: MAX_NR_SG must be even\n", __func__);
+
 	if (unlikely(!echan || !sgl || !sg_len))
 		return NULL;
--
1.7.9.5
[PATCH v3 04/12] dma: edma: Write out and handle MAX_NR_SG at a given time
Process SG-elements in batches of MAX_NR_SG if they are greater than
MAX_NR_SG. Due to this, at any given time only those many slots will be
used in the given channel no matter how long the scatter list is. We keep
track of how much has been written in order to process the next batch of
elements in the scatter-list and detect completion.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 76 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 50 insertions(+), 26 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index b6d609c..080d669 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -56,6 +56,7 @@ struct edma_desc {
 	struct list_head		node;
 	int				absync;
 	int				pset_nr;
+	int				total_processed;
 	struct edmacc_param		pset[0];
 };

@@ -104,22 +105,36 @@ static void edma_desc_free(struct virt_dma_desc *vdesc)
 /* Dispatch a queued descriptor to the controller (caller holds lock) */
 static void edma_execute(struct edma_chan *echan)
 {
-	struct virt_dma_desc *vdesc = vchan_next_desc(&echan->vchan);
+	struct virt_dma_desc *vdesc;
 	struct edma_desc *edesc;
-	int i;
+	struct device *dev = echan->vchan.chan.device->dev;
+	int i, j, total_left, total_process;

-	if (!vdesc) {
-		echan->edesc = NULL;
-		return;
+	/* If either we processed all psets or we're still not started */
+	if (!echan->edesc ||
+	    echan->edesc->pset_nr == echan->edesc->total_processed) {
+		/* Get next vdesc */
+		vdesc = vchan_next_desc(&echan->vchan);
+		if (!vdesc) {
+			echan->edesc = NULL;
+			return;
+		}
+		list_del(&vdesc->node);
+		echan->edesc = to_edma_desc(&vdesc->tx);
 	}
-	list_del(&vdesc->node);

+	edesc = echan->edesc;
+
+	/* Find out how many left */
+	total_left = edesc->pset_nr - edesc->total_processed;
+	total_process = total_left > MAX_NR_SG ? MAX_NR_SG : total_left;

-	echan->edesc = edesc = to_edma_desc(&vdesc->tx);
 	/* Write descriptor PaRAM set(s) */
-	for (i = 0; i < edesc->pset_nr; i++) {
-		edma_write_slot(echan->slot[i], &edesc->pset[i]);
+	for (i = 0; i < total_process; i++) {
+		j = i + edesc->total_processed;
+		edma_write_slot(echan->slot[i], &edesc->pset[j]);
 		dev_dbg(echan->vchan.chan.device->dev,
			"\n pset[%d]:\n"
			"  chnum\t%d\n"
			"  slot\t%d\n"
			"  opt\t%08x\n"
@@ -132,24 +147,29 @@ static void edma_execute(struct edma_chan *echan)
			"  bidx\t%08x\n"
			"  cidx\t%08x\n"
			"  lkrld\t%08x\n",
-			i, echan->ch_num, echan->slot[i],
-			edesc->pset[i].opt,
-			edesc->pset[i].src,
-			edesc->pset[i].dst,
-			edesc->pset[i].a_b_cnt,
-			edesc->pset[i].ccnt,
-			edesc->pset[i].src_dst_bidx,
-			edesc->pset[i].src_dst_cidx,
-			edesc->pset[i].link_bcntrld);
+			j, echan->ch_num, echan->slot[i],
+			edesc->pset[j].opt,
+			edesc->pset[j].src,
+			edesc->pset[j].dst,
+			edesc->pset[j].a_b_cnt,
+			edesc->pset[j].ccnt,
+			edesc->pset[j].src_dst_bidx,
+			edesc->pset[j].src_dst_cidx,
+			edesc->pset[j].link_bcntrld);
 		/* Link to the previous slot if not the last set */
-		if (i != (edesc->pset_nr - 1))
+		if (i != (total_process - 1))
 			edma_link(echan->slot[i], echan->slot[i+1]);
 		/* Final pset links to the dummy pset */
 		else
 			edma_link(echan->slot[i], echan->ecc->dummy_slot);
 	}

-	edma_start(echan->ch_num);
+	edesc->total_processed += total_process;
+
+	if (edesc->total_processed <= MAX_NR_SG) {
+		dev_dbg(dev, "first transfer starting %d\n", echan->ch_num);
+		edma_start(echan->ch_num);
+	}
 }

 static int edma_terminate_all(struct edma_chan *echan)
@@ -358,19 +378,23 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)
 	struct edma_desc *edesc;
 	unsigned long flags;

-	/* Stop the channel */
-	edma_stop(echan->ch_num);
-
[PATCH v3 11/12] dma: edma: Keep track of pending interrupts (pending_acks)
The total_processed variable was being used to keep track of when DMA is
completed for a particular descriptor. This doesn't work anymore, as the
completion interrupt can come in much later than when the total_processed
variable is updated to reflect that all SG entries have been issued.
Introduce another variable that reflects the actual value. This will serve
a purpose different from what total_processed was required to help with.

Also add code that gives special treatment to the first linked set
interrupt, as pending_acks is handled a bit differently for that case.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 061f0cf..eca1b47 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -59,6 +59,10 @@ struct edma_desc {
 	int				pset_nr;
 	int				total_processed;
 	int				next_setup_linkset;
+
+	int				no_first_ls_ack;
+	int				pending_acks;
+
 	struct edmacc_param		pset[0];
 };

@@ -147,8 +151,7 @@ static void edma_execute(struct edma_chan *echan)
 	struct edmacc_param tmp_param;

 	/* If either we processed all psets or we're still not started */
-	if (!echan->edesc ||
-	    echan->edesc->pset_nr == echan->edesc->total_processed) {
+	if (!echan->edesc || !echan->edesc->pending_acks) {
 		/* Get next vdesc */
 		vdesc = vchan_next_desc(&echan->vchan);
 		if (!vdesc) {
@@ -180,6 +183,8 @@ static void edma_execute(struct edma_chan *echan)

 	edesc->total_processed += total_link_set;

+	edesc->pending_acks += total_link_set;
+
 	total_left = edesc->pset_nr - edesc->total_processed;
 	total_link_set = total_left > MAX_NR_LS ?
@@ -190,6 +195,8 @@ static void edma_execute(struct edma_chan *echan)
 	   where total pset_nr is strictly within MAX_NR size */
 	if (total_left > total_link_set)
 		edma_enable_interrupt(echan->slot[i]);
+	else
+		edesc->no_first_ls_ack = 1;

 	/* Setup link between linked set 0 to set 1 */
 	edma_link(echan->slot[i], echan->slot[i+1]);
@@ -210,6 +217,9 @@ static void edma_execute(struct edma_chan *echan)
 	}

 	edesc->total_processed += total_link_set;
+
+	edesc->pending_acks += total_link_set;
+
 	total_left = edesc->pset_nr - edesc->total_processed;

 	if (total_left)
@@ -267,6 +277,8 @@ static void edma_execute(struct edma_chan *echan)

 	edesc->total_processed += total_link_set;

+	edesc->pending_acks += total_link_set;
+
 	slot_off = total_link_set + ls_cur_off - 1;

 	if (edesc->total_processed == edesc->pset_nr)
@@ -498,8 +510,22 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)

 	edesc = echan->edesc;

+	if (edesc->pending_acks < MAX_NR_LS)
+		edesc->pending_acks = 0;
+	else
+		edesc->pending_acks -= MAX_NR_LS;
+
+	if (edesc->no_first_ls_ack) {
+		if (edesc->pending_acks < MAX_NR_LS)
+			edesc->pending_acks = 0;
+		else
+			edesc->pending_acks -= MAX_NR_LS;
+
+		edesc->no_first_ls_ack = 0;
+	}
+
 	if (edesc) {
-		if (edesc->total_processed == edesc->pset_nr) {
+		if (!edesc->pending_acks) {
 			dev_dbg(dev, "Transfer complete,"
				" stopping channel %d\n", ch_num);
 			edma_stop(echan->ch_num);
--
1.7.9.5
[PATCH v3 12/12] dma: edma: Return if nothing left to do in edma_execute
It is possible that edma_execute is called even when all the SG elements
have been submitted for transmission; add a check for this and avoid
executing the rest of the function.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index eca1b47..62987fc 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -168,6 +168,9 @@ static void edma_execute(struct edma_chan *echan)
 	total_left = edesc->pset_nr - edesc->total_processed;
 	total_link_set = total_left > MAX_NR_LS ? MAX_NR_LS : total_left;

+	if (!total_left)
+		return;
+
 	/* First time, setup 2 cyclically linked sets, each containing
 	   half the slots allocated for this channel */
 	if (edesc->total_processed == 0) {
--
1.7.9.5
[PATCH v3 02/12] ARM: edma: Don't clear EMR of channel in edma_stop
We certainly don't want error conditions to be cleared anywhere but the
EDMA error handler, as clearing them elsewhere makes us 'forget' missed
events that we need to know about to handle the errors that have occurred.
This fixes a race condition where the EMR was being cleared by the transfer
completion interrupt handler.

Basically, what was happening was:

             Missed event
                  |
                  |
                  V
       SG1 - SG2 - SG3 - Null
                 \
                  \__ TC Interrupt

(At almost the same time as the ARM is executing the TC interrupt handler,
an event got missed and was also forgotten by clearing the EMR.)

This causes the following problems:

1. If an error interrupt is also pending and the TC interrupt clears the
EMR by calling edma_stop, as has been observed in the edma_callback
function, the ARM will execute the error interrupt handler even though the
EMR is clear. As a result, dma_ccerr_handler returns IRQ_NONE. If this
happens enough times, the IRQ subsystem disables the interrupt thinking
it's spurious, which makes the error handler never execute again.

2. Even if the error handler doesn't return IRQ_NONE, clearing the EMR
removes the knowledge of which channel had a missed event, and thus a
manual trigger on such channels cannot be performed.

The EMR is ultimately cleared by the error interrupt handler once the error
is handled, so we remove the code that does it in edma_stop and allow it to
happen there.

Signed-off-by: Joel Fernandes
---
 arch/arm/common/edma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 3567ba1..6433b6c 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1307,7 +1307,6 @@ void edma_stop(unsigned channel)
 		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
-		edma_write_array(ctlr, EDMA_EMCR, j, mask);

 		pr_debug("EDMA: EER%d %08x\n", j,
			 edma_shadow0_read_array(ctlr, SH_EER, j));
--
1.7.9.5
[PATCH v3 07/12] dma: edma: Add function to dump a PaRAM set from PaRAM
Previously such a dump function was used, but it wasn't reading from the
PaRAM - rather just from an edmacc_param structure. We add a helpful
function for debugging that reads directly from the PaRAM and gives the
current state correctly (including links and interrupt information).

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 080d669..a242269 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -102,6 +102,37 @@ static void edma_desc_free(struct virt_dma_desc *vdesc)
 	kfree(container_of(vdesc, struct edma_desc, vdesc));
 }

+static inline void dump_pset(struct edma_chan *echan, int slot,
+			     struct edmacc_param *pset_unused, int pset_idx)
+{
+	struct edmacc_param pset_local, *pset;
+	pset = &pset_local;
+
+	edma_read_slot(slot, pset);
+
+	dev_dbg(echan->vchan.chan.device->dev,
+		"\n pset[%d]:\n"
+		"  chnum\t%d\n"
+		"  slot\t%d\n"
+		"  opt\t%08x\n"
+		"  src\t%08x\n"
+		"  dst\t%08x\n"
+		"  abcnt\t%08x\n"
+		"  ccnt\t%08x\n"
+		"  bidx\t%08x\n"
+		"  cidx\t%08x\n"
+		"  lkrld\t%08x\n",
+		pset_idx, echan->ch_num, slot,
+		pset[0].opt,
+		pset[0].src,
+		pset[0].dst,
+		pset[0].a_b_cnt,
+		pset[0].ccnt,
+		pset[0].src_dst_bidx,
+		pset[0].src_dst_cidx,
+		pset[0].link_bcntrld);
+}
+
 /* Dispatch a queued descriptor to the controller (caller holds lock) */
 static void edma_execute(struct edma_chan *echan)
 {
--
1.7.9.5
[PATCH v3 00/12] edma: Add support for SG lists of any length
Here is an improved approach for DMA support of SG lists of any length in the EDMA DMA Engine driver. In the previous approach [1] we depended on error interrupts to detect missed events and manually retrigger them; however, as discussed in [2], there are concerns that this can be troublesome for high-speed peripherals which may need a more real-time response from the DMA controller. In this approach, we divide the total number of MAX slots per channel into 2 linked sets which are cyclically linked to each other (the cyclic link between the 2 sets makes sure that the DMA is continuous until the whole SG list has been exhausted). We then enable completion interrupts on both linked sets, which results in recycling/preparing the respective linked set with the next set of SG entries. The interrupt handler executes in parallel while the EDMA controller DMA's the next list. This results in no interruption. Special handling is done for the first linked set (as we set up both linked sets initially before starting the DMA), and for the last one, where the cyclic link has to be broken and a link to the Dummy slot has to be created. We also keep track of whether all pending DMA operations have completed before we can mark the transfer as complete. 
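The two-linked-set scheme described above can be modeled in plain C. This is an illustrative sketch only: MAX_NR_SG mirrors the driver's constant, but count_linked_set_refills() is a hypothetical helper (not driver code) showing how many interrupt-driven refills a given SG list needs when each completion recycles one half-sized linked set.

```c
#include <assert.h>

/* Mirrors the driver's per-channel slot cap; MAX_NR_LS is half of it,
 * one linked set's worth, as in the series. Illustration only. */
#define MAX_NR_SG 16
#define MAX_NR_LS (MAX_NR_SG >> 1)

/* Walk an SG list of sg_len entries in chunks of at most MAX_NR_LS,
 * the way each completion interrupt recycles one linked set.
 * Returns the number of linked-set setups/refills needed. */
static int count_linked_set_refills(int sg_len)
{
	int processed = 0, refills = 0;

	while (processed < sg_len) {
		int left = sg_len - processed;
		int batch = left > MAX_NR_LS ? MAX_NR_LS : left;

		processed += batch;
		refills++;
	}
	return refills;
}
```

So a 40-entry list is consumed in 5 batches of 8 slots each, with the two sets ping-ponging and no gap between batches.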
[1] https://lkml.org/lkml/2013/7/29/312 [2] https://lkml.org/lkml/2013/7/30/54 Joel Fernandes (12): dma: edma: Setup parameters to DMA MAX_NR_SG at a time ARM: edma: Don't clear EMR of channel in edma_stop dma: edma: remove limits on number of slots dma: edma: Write out and handle MAX_NR_SG at a given time ARM: edma: Add function to enable interrupt for a PaRAM slot ARM: edma: Add pr_debug in edma_link dma: edma: Add function to dump a PaRAM set from PaRAM dma: edma: Add one more required slot to MAX slots dma: edma: Implement multiple linked sets for continuity dma: edma: Check if MAX_NR_SG is even in prep function dma: edma: Keep tracking of Pending interrupts (pending_acks) dma: edma: Return if nothing left todo in edma_execute arch/arm/common/edma.c | 18 ++- drivers/dma/edma.c | 279 +--- include/linux/platform_data/edma.h |1 + 3 files changed, 243 insertions(+), 55 deletions(-) -- 1.7.9.5
[PATCH v3 09/12] dma: edma: Implement multiple linked sets for continuity
Here we split the total MAX number of slots available for a channel into 2 cyclically linked sets. Transfer completion interrupts are enabled on both linked sets, and the respective handler recycles each set on completion to process the next linked set. Both linked sets are cyclically linked to each other to ensure continuity of DMA operations. The interrupt handlers execute asynchronously to the EDMA events and recycle the linked sets at the right time; as a result, the EDMA is not blocked on or dependent on interrupts, and DMA continues till the end of the SG lists without any interruption. Suggested-by: Sekhar Nori Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 157 +++- 1 file changed, 118 insertions(+), 39 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index df50a04..70923a2 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -48,6 +48,7 @@ /* Max of 16 segments per channel to conserve PaRAM slots */ #define MAX_NR_SG 16 +#define MAX_NR_LS (MAX_NR_SG >> 1) #define EDMA_MAX_SLOTS (MAX_NR_SG+1) #define EDMA_DESCRIPTORS 16 @@ -57,6 +58,7 @@ struct edma_desc { int absync; int pset_nr; int total_processed; + int next_setup_linkset; struct edmacc_param pset[0]; }; @@ -140,7 +142,9 @@ static void edma_execute(struct edma_chan *echan) struct edma_desc *edesc; struct device *dev = echan->vchan.chan.device->dev; - int i, j, total_left, total_process; + int i, total_left, total_link_set; + int ls_cur_off, ls_next_off, slot_off; + struct edmacc_param tmp_param; /* If either we processed all psets or we're still not started */ if (!echan->edesc || @@ -159,48 +163,121 @@ static void edma_execute(struct edma_chan *echan) /* Find out how many left */ total_left = edesc->pset_nr - edesc->total_processed; - total_process = total_left > 
MAX_NR_SG : total_left; - - - /* Write descriptor PaRAM set(s) */ - for (i = 0; i < total_process; i++) { - j = i + edesc->total_processed; - edma_write_slot(echan->slot[i], &edesc->pset[j]); - dev_dbg(echan->vchan.chan.device->dev, - "\n pset[%d]:\n" - " chnum\t%d\n" - " slot\t%d\n" - " opt\t%08x\n" - " src\t%08x\n" - " dst\t%08x\n" - " abcnt\t%08x\n" - " ccnt\t%08x\n" - " bidx\t%08x\n" - " cidx\t%08x\n" - " lkrld\t%08x\n", - j, echan->ch_num, echan->slot[i], - edesc->pset[j].opt, - edesc->pset[j].src, - edesc->pset[j].dst, - edesc->pset[j].a_b_cnt, - edesc->pset[j].ccnt, - edesc->pset[j].src_dst_bidx, - edesc->pset[j].src_dst_cidx, - edesc->pset[j].link_bcntrld); - /* Link to the previous slot if not the last set */ - if (i != (total_process - 1)) + total_link_set = total_left > MAX_NR_LS ? MAX_NR_LS : total_left; + + /* First time, setup 2 cyclically linked sets, each containing half + the slots allocated for this channel */ + if (edesc->total_processed == 0) { + for (i = 0; i < total_link_set; i++) { + edma_write_slot(echan->slot[i+1], &edesc->pset[i]); + + if (i != total_link_set - 1) { + edma_link(echan->slot[i+1], echan->slot[i+2]); + dump_pset(echan, echan->slot[i+1], + edesc->pset, i); + } + } + + edesc->total_processed += total_link_set; + + total_left = edesc->pset_nr - edesc->total_processed; + + total_link_set = total_left > MAX_NR_LS ? +MAX_NR_LS : total_left; + + if (total_link_set) { + /* Don't setup interrupt for first linked set for cases + where total pset_nr is strictly within MAX_NR size */ + if (total_left > total_link_set) + edma_enable_interrupt(echan->slot[i]); + + /* Setup link between linked set 0 to set
[PATCH v3 01/12] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
Changes are made here to configure the existing parameters to support DMA'ing them out in batches as needed. Also, allocate only as many slots as needed by the SG list, but not more than MAX_NR_SG; these slots will then be reused accordingly. For example, if MAX_NR_SG=10 and the number of SG entries is 40, still only 10 slots will be allocated to DMA the entire SG list of size 40. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index 5f3e532..7b0853c 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -222,9 +222,9 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( enum dma_slave_buswidth dev_width; u32 burst; struct scatterlist *sg; - int i; int acnt, bcnt, ccnt, src, dst, cidx; int src_bidx, dst_bidx, src_cidx, dst_cidx; + int i, num_slots_needed; if (unlikely(!echan || !sgl || !sg_len)) return NULL; @@ -262,8 +262,11 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( edesc->pset_nr = sg_len; - for_each_sg(sgl, sg, sg_len, i) { - /* Allocate a PaRAM slot, if needed */ + /* Allocate a PaRAM slot, if needed */ + + num_slots_needed = sg_len > MAX_NR_SG ? 
MAX_NR_SG : sg_len; + + for (i = 0; i < num_slots_needed; i++) { if (echan->slot[i] < 0) { echan->slot[i] = edma_alloc_slot(EDMA_CTLR(echan->ch_num), @@ -273,6 +276,10 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( return NULL; } } + } + + /* Configure PaRAM sets for each SG */ + for_each_sg(sgl, sg, sg_len, i) { acnt = dev_width; @@ -330,6 +337,7 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( /* Configure A or AB synchronized transfers */ if (edesc->absync) edesc->pset[i].opt |= SYNCDIM; + /* If this is the last set, enable completion interrupt flag */ if (i == sg_len - 1) edesc->pset[i].opt |= TCINTEN; -- 1.7.9.5
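The slot-reuse rule in the commit message above reduces to a one-line clamp. The helper below is a hypothetical stand-alone version for illustration (the driver open-codes the same ternary with its own MAX_NR_SG constant):

```c
#include <assert.h>

/* Slots actually allocated for an SG list: never more than the
 * per-channel cap. With a cap of 10 and 40 SG entries, only 10 slots
 * are allocated and reused for every batch, as described above. */
static int num_slots_needed(int sg_len, int max_nr_sg)
{
	return sg_len > max_nr_sg ? max_nr_sg : sg_len;
}
```

The allocation loop then runs over num_slots_needed() instead of the full SG length, so memory cost stays bounded regardless of list size.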
Re: [PATCH v3 00/12] edma: Add support for SG lists of any length
In offline discussions, Sekhar and I went over some ideas for changing a few things in this patch series to make it fail-safe. As such, the only functional change for v4 will be to not link the sets cyclically up front, but to do so only once the ISR has finished its setup (apart from other style cleanups). Any case where we cannot modify the link in time (on a heavily loaded system) will be detected and reported through a link to a NULL set. The new approach will be fast because there is no requirement to stall or wait, and any DMA issues on heavily loaded systems can be detected as error conditions. This should architecturally be the final version of the patch series to add DMA support for SG lists of any length. Thanks, -Joel On 08/05/2013 11:14 AM, Joel Fernandes wrote: > Here is a more improved approach for DMA support of SG lists of any length > in the EDMA DMA Engine driver. > > In the previous approach [1] we depended on error interrupts to detect > missed events and manually retrigger them, however as discussed in [2], > there are concerns this can be trouble some for high speed peripherals > which may need a more real-time response from the DMA controller. > > In this approach, we divide the total no of MAX slots per channel, into > 2 linked sets which are cyclically linked to each other (the cyclic > link between the 2 sets make sure that the DMA is continuous till the whole > SG list has exhausted). We then enable completion interrupts on both linked > sets which results in recyling/preparing respective linked set with the > next set of SG entries. The interrupt handler executes in parallel while > the EDMA controller DMA's the next list. This results in no interruption. > > Special handling is done for first linked set (as we set up both linked > sets initially before starting with the DMA), and last one where the cyclic > link has to be broken and a link to the Dummy slot has to be created. 
> Also we keep track of whether all pending DMA operations have completed > before we can mark it as complete. > > [1] https://lkml.org/lkml/2013/7/29/312 > [2] https://lkml.org/lkml/2013/7/30/54 > > Joel Fernandes (12): > dma: edma: Setup parameters to DMA MAX_NR_SG at a time > ARM: edma: Don't clear EMR of channel in edma_stop > dma: edma: remove limits on number of slots > dma: edma: Write out and handle MAX_NR_SG at a given time > ARM: edma: Add function to enable interrupt for a PaRAM slot > ARM: edma: Add pr_debug in edma_link > dma: edma: Add function to dump a PaRAM set from PaRAM > dma: edma: Add one more required slot to MAX slots > dma: edma: Implement multiple linked sets for continuity > dma: edma: Check if MAX_NR_SG is even in prep function > dma: edma: Keep tracking of Pending interrupts (pending_acks) > dma: edma: Return if nothing left todo in edma_execute > > arch/arm/common/edma.c | 18 ++- > drivers/dma/edma.c | 279 > +--- > include/linux/platform_data/edma.h |1 + > 3 files changed, 243 insertions(+), 55 deletions(-) >
[PATCH v2 06/14] crypto: omap-aes: Remove previously used intermediate buffers
Intermediate buffers were allocated, mapped and used for DMA. These are no longer required, as previous commits in the series made us use the SGs from the crypto layer directly. Along with this, remove the now-unused logic for copying SGs and all the associated variables in omap_aes_dev. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 90 - 1 file changed, 90 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 5e034a1..bbdd1c3 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -149,25 +149,13 @@ struct omap_aes_dev { struct ablkcipher_request *req; size_t total; struct scatterlist *in_sg; - struct scatterlist in_sgl; - size_t in_offset; struct scatterlist *out_sg; - struct scatterlist out_sgl; - size_t out_offset; - - size_t buflen; - void*buf_in; - size_t dma_size; int dma_in; struct dma_chan *dma_lch_in; - dma_addr_t dma_addr_in; - void*buf_out; int dma_out; struct dma_chan *dma_lch_out; int in_sg_len; int out_sg_len; - dma_addr_t dma_addr_out; - const struct omap_aes_pdata *pdata; }; @@ -352,33 +340,6 @@ static int omap_aes_dma_init(struct omap_aes_dev *dd) dd->dma_lch_out = NULL; dd->dma_lch_in = NULL; - dd->buf_in = (void *)__get_free_pages(GFP_KERNEL, OMAP_AES_CACHE_SIZE); - dd->buf_out = (void *)__get_free_pages(GFP_KERNEL, OMAP_AES_CACHE_SIZE); - dd->buflen = PAGE_SIZE << OMAP_AES_CACHE_SIZE; - dd->buflen &= ~(AES_BLOCK_SIZE - 1); - - if (!dd->buf_in || !dd->buf_out) { - dev_err(dd->dev, "unable to alloc pages.\n"); - goto err_alloc; - } - - /* MAP here */ - dd->dma_addr_in = dma_map_single(dd->dev, dd->buf_in, dd->buflen, -DMA_TO_DEVICE); - if (dma_mapping_error(dd->dev, dd->dma_addr_in)) { - dev_err(dd->dev, "dma %d bytes error\n", dd->buflen); - err = -EINVAL; - goto err_map_in; - } - - dd->dma_addr_out = dma_map_single(dd->dev, dd->buf_out, dd->buflen, - DMA_FROM_DEVICE); - if (dma_mapping_error(dd->dev, dd->dma_addr_out)) { - dev_err(dd->dev, "dma %d bytes 
error\n", dd->buflen); - err = -EINVAL; - goto err_map_out; - } - dma_cap_zero(mask); dma_cap_set(DMA_SLAVE, mask); @@ -405,14 +366,6 @@ static int omap_aes_dma_init(struct omap_aes_dev *dd) err_dma_out: dma_release_channel(dd->dma_lch_in); err_dma_in: - dma_unmap_single(dd->dev, dd->dma_addr_out, dd->buflen, -DMA_FROM_DEVICE); -err_map_out: - dma_unmap_single(dd->dev, dd->dma_addr_in, dd->buflen, DMA_TO_DEVICE); -err_map_in: - free_pages((unsigned long)dd->buf_out, OMAP_AES_CACHE_SIZE); - free_pages((unsigned long)dd->buf_in, OMAP_AES_CACHE_SIZE); -err_alloc: if (err) pr_err("error: %d\n", err); return err; @@ -422,11 +375,6 @@ static void omap_aes_dma_cleanup(struct omap_aes_dev *dd) { dma_release_channel(dd->dma_lch_out); dma_release_channel(dd->dma_lch_in); - dma_unmap_single(dd->dev, dd->dma_addr_out, dd->buflen, -DMA_FROM_DEVICE); - dma_unmap_single(dd->dev, dd->dma_addr_in, dd->buflen, DMA_TO_DEVICE); - free_pages((unsigned long)dd->buf_out, OMAP_AES_CACHE_SIZE); - free_pages((unsigned long)dd->buf_in, OMAP_AES_CACHE_SIZE); } static void sg_copy_buf(void *buf, struct scatterlist *sg, @@ -443,42 +391,6 @@ static void sg_copy_buf(void *buf, struct scatterlist *sg, scatterwalk_done(&walk, out, 0); } -static int sg_copy(struct scatterlist **sg, size_t *offset, void *buf, - size_t buflen, size_t total, int out) - { - unsigned int count, off = 0; - - while (buflen && total) { - count = min((*sg)->length - *offset, total); - count = min(count, buflen); - - if (!count) - return off; - - /* -* buflen and total are AES_BLOCK_SIZE size aligned, -* so count should be also aligned -*/ - - sg_copy_buf(buf + off, *s
[PATCH v2 00/14] crypto: omap-aes: Improve DMA, add PIO mode and support for AM437x
This patch series rewrites the DMA code to be cleaner and faster. Earlier, only a single SG was used for DMA, and the SG-list passed from the crypto layer was copied and DMA'd one entry at a time. This turned out to be quite inefficient and a lot of code; we replace it with a much simpler approach that directly passes the SG-list from the crypto layer to the DMA layer wherever possible. For the cases where such direct passing of the SG list is not possible, we create a new SG-list and do the copying. This is still better than before, as we create an SG list as big as needed rather than a 1-element list. We also add PIO mode support to the driver, and switch to it whenever DMA channel allocation is not available. This has also been shown to give good performance for small blocks, as seen below. Tests have been performed on AM335x, OMAP4 and AM437x SoCs. Below is a sample run on an AM335x SoC (beaglebone board), showing the performance improvement (20% for 8K blocks): With DMA rewrite (key size = 128-bit) 16 byte blocks: 4318 operations in 1 seconds (69088 bytes) 64 byte blocks: 4360 operations in 1 seconds (279040 bytes) 256 byte blocks: 3609 operations in 1 seconds (923904 bytes) 1024 byte blocks: 3418 operations in 1 seconds (3500032 bytes) 8192 byte blocks: 1766 operations in 1 seconds (14467072 bytes) Without DMA rewrite: 16 byte blocks: 4417 operations in 1 seconds (70672 bytes) 64 byte blocks: 4221 operations in 1 seconds (270144 bytes) 256 byte blocks: 3528 operations in 1 seconds (903168 bytes) 1024 byte blocks: 3281 operations in 1 seconds (3359744 bytes) 8192 byte blocks: 1460 operations in 1 seconds (11960320 bytes) With PIO mode, good performance is observed for small blocks: 16 byte blocks: 20585 operations in 1 seconds (329360 bytes) 64 byte blocks: 8106 operations in 1 seconds (518784 bytes) 256 byte blocks: 2359 operations in 1 seconds (603904 bytes) 1024 byte blocks: 605 operations in 1 seconds (619520 bytes) 8192 byte blocks: 79 operations in 1 
seconds (647168 bytes) Future work in this direction would be to dynamically change between PIO/DMA mode based on the block size. Changes since last series: * Unaligned cases for omap-aes are handled with patch: "Add support for cases of unaligned lengths" * Support for am437x SoC is added and tested. * Changes following review comments on debug patch Note: The debug patch: "crypto: omap-aes: Add useful debug macros" will generate a checkpatch error, which cannot be fixed. Refer to the patch for the error message and the reasons why it cannot be fixed, thanks. Joel Fernandes (14): crypto: scatterwalk: Add support for calculating number of SG elements crypto: omap-aes: Add useful debug macros crypto: omap-aes: Populate number of SG elements crypto: omap-aes: Simplify DMA usage by using direct SGs crypto: omap-aes: Sync SG before DMA operation crypto: omap-aes: Remove previously used intermediate buffers crypto: omap-aes: Add IRQ info and helper macros crypto: omap-aes: PIO mode: Add IRQ handler and walk SGs crypto: omap-aes: PIO mode: platform data for OMAP4/AM437x and trigger crypto: omap-aes: Switch to PIO mode during probe crypto: omap-aes: Add support for cases of unaligned lengths crypto: omap-aes: Convert kzalloc to devm_kzalloc crypto: omap-aes: Convert request_irq to devm_request_irq crypto: omap-aes: Kconfig: Add build support for AM437x crypto/scatterwalk.c | 22 ++ drivers/crypto/Kconfig |2 +- drivers/crypto/omap-aes.c| 466 +++--- include/crypto/scatterwalk.h |2 + 4 files changed, 284 insertions(+), 208 deletions(-) -- 1.7.9.5
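The throughput figures quoted above are internally consistent: bytes per second is simply operations per second times block size. A quick sanity check of that arithmetic (the helper is purely illustrative):

```c
#include <assert.h>

/* Benchmark arithmetic from the cover letter: ops/sec * block size
 * reproduces the quoted bytes/sec figures, e.g. 1766 ops of 8192-byte
 * blocks = 14467072 bytes in one second. */
static long bytes_per_second(long ops, long block_size)
{
	return ops * block_size;
}
```

This also makes the trade-off visible: PIO wins at 16-byte blocks (329360 vs 69088 bytes/sec) but collapses at 8K, which is why switching modes by block size is listed as future work.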
[PATCH v2 08/14] crypto: omap-aes: PIO mode: Add IRQ handler and walk SGs
We add an IRQ handler that implements a state machine for PIO mode, along with data structures for walking the scatter-gather list. The IRQ handler is called in succession both when data is available to read and when the next data can be sent for processing. This process continues till the entire in/out SG lists have been walked. Once the SG list has been completely walked, the IRQ handler schedules the done_task tasklet. Also add a useful macro that is used throughout the IRQ code for a common pattern: calculating how far an SG list has been walked. This improves code readability and avoids checkpatch errors. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 90 + 1 file changed, 90 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 9a964e8..889dc99 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -39,6 +39,8 @@ #define DST_MAXBURST 4 #define DMA_MIN(DST_MAXBURST * sizeof(u32)) +#define _calc_walked(inout) (dd->inout##_walk.offset - dd->inout##_sg->offset) + /* OMAP TRM gives bitfields as start:end, where start is the higher bit number. 
For example 7:0 */ #define FLD_MASK(start, end) (((1 << ((start) - (end) + 1)) - 1) << (end)) @@ -91,6 +93,8 @@ #define FLAGS_FAST BIT(5) #define FLAGS_BUSY BIT(6) +#define AES_BLOCK_WORDS(AES_BLOCK_SIZE >> 2) + struct omap_aes_ctx { struct omap_aes_dev *dd; @@ -156,6 +160,8 @@ struct omap_aes_dev { size_t total; struct scatterlist *in_sg; struct scatterlist *out_sg; + struct scatter_walk in_walk; + struct scatter_walk out_walk; int dma_in; struct dma_chan *dma_lch_in; int dma_out; @@ -852,6 +858,90 @@ static const struct omap_aes_pdata omap_aes_pdata_omap4 = { .minor_shift= 0, }; +static irqreturn_t omap_aes_irq(int irq, void *dev_id) +{ + struct omap_aes_dev *dd = dev_id; + u32 status, i; + u32 *src, *dst; + + status = omap_aes_read(dd, AES_REG_IRQ_STATUS(dd)); + if (status & AES_REG_IRQ_DATA_IN) { + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x0); + + BUG_ON(!dd->in_sg); + + BUG_ON(_calc_walked(in) > dd->in_sg->length); + + src = sg_virt(dd->in_sg) + _calc_walked(in); + + for (i = 0; i < AES_BLOCK_WORDS; i++) { + omap_aes_write(dd, AES_REG_DATA_N(dd, i), *src); + + scatterwalk_advance(&dd->in_walk, 4); + if (dd->in_sg->length == _calc_walked(in)) { + dd->in_sg = scatterwalk_sg_next(dd->in_sg); + if (dd->in_sg) { + scatterwalk_start(&dd->in_walk, + dd->in_sg); + src = sg_virt(dd->in_sg) + + _calc_walked(in); + } + } else { + src++; + } + } + + /* Clear IRQ status */ + status &= ~AES_REG_IRQ_DATA_IN; + omap_aes_write(dd, AES_REG_IRQ_STATUS(dd), status); + + /* Enable DATA_OUT interrupt */ + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x4); + + } else if (status & AES_REG_IRQ_DATA_OUT) { + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x0); + + BUG_ON(!dd->out_sg); + + BUG_ON(_calc_walked(out) > dd->out_sg->length); + + dst = sg_virt(dd->out_sg) + _calc_walked(out); + + for (i = 0; i < AES_BLOCK_WORDS; i++) { + *dst = omap_aes_read(dd, AES_REG_DATA_N(dd, i)); + scatterwalk_advance(&dd->out_walk, 4); + if (dd->out_sg->length == _calc_walked(out)) { + dd->out_sg = 
scatterwalk_sg_next(dd->out_sg); + if (dd->out_sg) { + scatterwalk_start(&dd->out_walk, + dd->out_sg); + dst = sg_virt(dd->out_sg) + + _calc_walked(out); + } + } else { + dst++; + } + } + + dd->total -= AES_BLOCK_SIZE; + + BUG_ON(dd->total < 0); +
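The word-by-word walk in the IRQ handler above can be modeled without kernel infrastructure. The sketch below is an assumption-laden stand-in: fake_sg and walk_copy() are hypothetical analogues of the scatterlist plus scatterwalk_advance()/scatterwalk_sg_next() pattern, showing how a 4-word AES block is gathered across segment boundaries.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE	16
#define AES_BLOCK_WORDS	(AES_BLOCK_SIZE >> 2)

/* Toy "scatterlist": an array of word buffers. Illustration only. */
struct fake_sg {
	const unsigned int *buf;
	size_t words;	/* segment length in 32-bit words */
};

/* Copy nwords words out of the segment list starting at *seg/*off,
 * crossing segment boundaries the way the handler advances its
 * scatter_walk and moves to the next sg entry. */
static void walk_copy(const struct fake_sg *sgl, size_t *seg, size_t *off,
		      unsigned int *dst, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++) {
		dst[i] = sgl[*seg].buf[*off];
		(*off)++;
		if (*off == sgl[*seg].words) {	/* segment exhausted */
			(*seg)++;
			*off = 0;
		}
	}
}

/* Demo: two 2-word segments read out as one 4-word AES block;
 * packs the result digits into one int for easy checking. */
static unsigned int walk_demo(void)
{
	static const unsigned int a[2] = { 1, 2 }, b[2] = { 3, 4 };
	const struct fake_sg sgl[2] = { { a, 2 }, { b, 2 } };
	unsigned int out[AES_BLOCK_WORDS];
	size_t seg = 0, off = 0;

	walk_copy(sgl, &seg, &off, out, AES_BLOCK_WORDS);
	return out[0] * 1000 + out[1] * 100 + out[2] * 10 + out[3];
}
```

Keeping *seg/*off as persistent state mirrors why the real handler can resume mid-segment on the next interrupt.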
[PATCH v2 07/14] crypto: omap-aes: Add IRQ info and helper macros
Add IRQ information to pdata and helper macros. These are required for PIO-mode support. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c |8 1 file changed, 8 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index bbdd1c3..9a964e8 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -75,6 +75,10 @@ #define AES_REG_LENGTH_N(x)(0x54 + ((x) * 0x04)) +#define AES_REG_IRQ_STATUS(dd) ((dd)->pdata->irq_status_ofs) +#define AES_REG_IRQ_ENABLE(dd) ((dd)->pdata->irq_enable_ofs) +#define AES_REG_IRQ_DATA_INBIT(1) +#define AES_REG_IRQ_DATA_OUT BIT(2) #define DEFAULT_TIMEOUT(5*HZ) #define FLAGS_MODE_MASK0x000f @@ -120,6 +124,8 @@ struct omap_aes_pdata { u32 data_ofs; u32 rev_ofs; u32 mask_ofs; + u32 irq_enable_ofs; + u32 irq_status_ofs; u32 dma_enable_in; u32 dma_enable_out; @@ -836,6 +842,8 @@ static const struct omap_aes_pdata omap_aes_pdata_omap4 = { .data_ofs = 0x60, .rev_ofs= 0x80, .mask_ofs = 0x84, + .irq_status_ofs = 0x8c, + .irq_enable_ofs = 0x90, .dma_enable_in = BIT(5), .dma_enable_out = BIT(6), .major_mask = 0x0700, -- 1.7.9.5
[PATCH v2 12/14] crypto: omap-aes: Convert kzalloc to devm_kzalloc
Use devm_kzalloc instead of kzalloc. With this change, there is no need to call kfree in error/exit paths. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index cbeaaf4..6a4ac4a 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -1147,7 +1147,7 @@ static int omap_aes_probe(struct platform_device *pdev) int err = -ENOMEM, i, j, irq = -1; u32 reg; - dd = kzalloc(sizeof(struct omap_aes_dev), GFP_KERNEL); + dd = devm_kzalloc(dev, sizeof(struct omap_aes_dev), GFP_KERNEL); if (dd == NULL) { dev_err(dev, "unable to alloc data struct.\n"); goto err_data; @@ -1241,7 +1241,6 @@ err_irq: tasklet_kill(&dd->queue_task); pm_runtime_disable(dev); err_res: - kfree(dd); dd = NULL; err_data: dev_err(dev, "initialization failed.\n"); @@ -1269,7 +1268,6 @@ static int omap_aes_remove(struct platform_device *pdev) tasklet_kill(&dd->queue_task); omap_aes_dma_cleanup(dd); pm_runtime_disable(dd->dev); - kfree(dd); dd = NULL; return 0; -- 1.7.9.5
[PATCH v2 02/14] crypto: omap-aes: Add useful debug macros
When DEBUG is enabled, these macros can be used to print variables in integer and hex format, and to clearly display which registers, offsets and values are being read/written, including the names of the offsets and their values. Note: This patch results in a checkpatch error that cannot be fixed. ERROR: Macros with multiple statements should be enclosed in a do - while loop +#define omap_aes_read(dd, offset) \ + __raw_readl(dd->io_base + offset); \ + pr_debug("omap_aes_read(" #offset ")\n"); Using a do-while loop would break a lot of code such as: ret = omap_aes_read(..); On the other hand, not using a do-while loop will only result in a spurious debug print message when DEBUG is enabled; any other issues would be caught at compile time. As such, there is no code in the driver as of now that requires a do-while loop, but there is code that will break if a do-while loop is used in the macro, so we ignore the checkpatch error in this case. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index ee15b0f..26b802b 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -13,7 +13,8 @@ * */ -#define pr_fmt(fmt) "%s: " fmt, __func__ +#define prn(num) pr_debug(#num "=%d\n", num) +#define prx(num) pr_debug(#num "=%x\n", num) #include #include @@ -172,16 +173,35 @@ struct omap_aes_dev { static LIST_HEAD(dev_list); static DEFINE_SPINLOCK(list_lock); +#ifdef DEBUG +/* + * Note: In DEBUG mode, when using conditionals, omap_aes_read _must_ + * be surrounded by braces, otherwise you may see spurious prints. 
+ */ +#define omap_aes_read(dd, offset) \ + __raw_readl(dd->io_base + offset); \ + pr_debug("omap_aes_read(" #offset ")\n"); +#else static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) { return __raw_readl(dd->io_base + offset); } +#endif +#ifdef DEBUG +#define omap_aes_write(dd, offset, value) \ + do {\ + pr_debug("omap_aes_write(" #offset "=%x) value=%d\n", \ +offset, value);\ + __raw_writel(value, dd->io_base + offset); \ + } while (0) +#else static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, u32 value) { __raw_writel(value, dd->io_base + offset); } +#endif static inline void omap_aes_write_mask(struct omap_aes_dev *dd, u32 offset, u32 value, u32 mask) -- 1.7.9.5
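The value-returning problem described above (why do-while cannot be used for omap_aes_read) is exactly what GCC statement expressions solve, and the kernel uses them elsewhere. A user-space sketch with a fake register file, as an alternative the patch could have taken; all names here are hypothetical:

```c
#include <assert.h>

/* Stand-in for dd->io_base MMIO in this user-space sketch. */
static unsigned int fake_regs[4] = { 0u, 10u, 20u, 30u };

/* A GNU C statement expression groups the read with the (elided)
 * debug print while still yielding a value, so `ret = fake_read(i);`
 * works -- the property a do { ... } while (0) wrapper would break. */
#define fake_read(idx) ({				\
	unsigned int __v = fake_regs[idx];		\
	/* a pr_debug-style print would go here */	\
	__v;						\
})
```

Statement expressions are a GCC extension, but the kernel already requires GNU C, so this would not reduce portability there.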
[PATCH v2 04/14] crypto: omap-aes: Simplify DMA usage by using direct SGs
In early versions of this driver, assumptions were made such as that the DMA layer requires contiguous buffers. Because of this, new buffers were allocated, mapped and used for DMA. These assumptions are no longer true, and DMAEngine scatter-gather DMA has no such requirements. We simplify the DMA operations by directly using the scatter-gather buffers provided by the crypto layer instead of creating our own. A lot of logic that handled DMA'ing only X number of bytes of the total, or as much as fitted into a third-party buffer, is removed as it is no longer required. Also, a good performance improvement of at least ~20% is seen when encrypting a buffer of size 8K (1800 ops/sec vs 1400 ops/sec). The improvement will be higher for much larger blocks, though such benchmarking is left as an exercise for the reader. DMA usage is also much more simplified and coherent with the rest of the code. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 147 - 1 file changed, 25 insertions(+), 122 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index e369e6e..64dd5c1 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -480,22 +480,14 @@ static int sg_copy(struct scatterlist **sg, size_t *offset, void *buf, } static int omap_aes_crypt_dma(struct crypto_tfm *tfm, - struct scatterlist *in_sg, struct scatterlist *out_sg) + struct scatterlist *in_sg, struct scatterlist *out_sg, + int in_sg_len, int out_sg_len) { struct omap_aes_ctx *ctx = crypto_tfm_ctx(tfm); struct omap_aes_dev *dd = ctx->dd; struct dma_async_tx_descriptor *tx_in, *tx_out; struct dma_slave_config cfg; - dma_addr_t dma_addr_in = sg_dma_address(in_sg); - int ret, length = sg_dma_len(in_sg); - - pr_debug("len: %d\n", length); - - dd->dma_size = length; - - if (!(dd->flags & FLAGS_FAST)) - dma_sync_single_for_device(dd->dev, dma_addr_in, length, - DMA_TO_DEVICE); + int ret; memset(&cfg, 0, sizeof(cfg)); @@ -514,7 +506,7 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, return ret; } 
- tx_in = dmaengine_prep_slave_sg(dd->dma_lch_in, in_sg, 1, + tx_in = dmaengine_prep_slave_sg(dd->dma_lch_in, in_sg, in_sg_len, DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!tx_in) { @@ -533,7 +525,7 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, return ret; } - tx_out = dmaengine_prep_slave_sg(dd->dma_lch_out, out_sg, 1, + tx_out = dmaengine_prep_slave_sg(dd->dma_lch_out, out_sg, out_sg_len, DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!tx_out) { @@ -551,7 +543,7 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, dma_async_issue_pending(dd->dma_lch_out); /* start DMA */ - dd->pdata->trigger(dd, length); + dd->pdata->trigger(dd, dd->total); return 0; } @@ -560,93 +552,28 @@ static int omap_aes_crypt_dma_start(struct omap_aes_dev *dd) { struct crypto_tfm *tfm = crypto_ablkcipher_tfm( crypto_ablkcipher_reqtfm(dd->req)); - int err, fast = 0, in, out; - size_t count; - dma_addr_t addr_in, addr_out; - struct scatterlist *in_sg, *out_sg; - int len32; + int err; pr_debug("total: %d\n", dd->total); - if (sg_is_last(dd->in_sg) && sg_is_last(dd->out_sg)) { - /* check for alignment */ - in = IS_ALIGNED((u32)dd->in_sg->offset, sizeof(u32)); - out = IS_ALIGNED((u32)dd->out_sg->offset, sizeof(u32)); - - fast = in && out; + err = dma_map_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); + if (!err) { + dev_err(dd->dev, "dma_map_sg() error\n"); + return -EINVAL; } - if (fast) { - count = min(dd->total, sg_dma_len(dd->in_sg)); - count = min(count, sg_dma_len(dd->out_sg)); - - if (count != dd->total) { - pr_err("request length != buffer length\n"); - return -EINVAL; - } - - pr_debug("fast\n"); - - err = dma_map_sg(dd->dev, dd->in_sg, 1, DMA_TO_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() error\n"); - return -EINVAL; - } - - err = dma_map_sg(dd->dev, dd->out_sg, 1,
[PATCH v2 10/14] crypto: omap-aes: Switch to PIO mode during probe
In cases where requesting DMA channels fails for some reason, or channel numbers are not provided in DT or platform data, we switch to PIO-only mode, provided the platform supplies IRQ numbers and interrupt register offsets in DT or platform data. All DMA-only paths are avoided in this mode. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 28 +++- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 62b3260..d214dbf 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -1064,7 +1064,7 @@ static int omap_aes_probe(struct platform_device *pdev) struct omap_aes_dev *dd; struct crypto_alg *algp; struct resource res; - int err = -ENOMEM, i, j; + int err = -ENOMEM, i, j, irq = -1; u32 reg; dd = kzalloc(sizeof(struct omap_aes_dev), GFP_KERNEL); @@ -1108,8 +1108,23 @@ static int omap_aes_probe(struct platform_device *pdev) tasklet_init(&dd->queue_task, omap_aes_queue_task, (unsigned long)dd); err = omap_aes_dma_init(dd); - if (err) - goto err_dma; + if (err && AES_REG_IRQ_STATUS(dd) && AES_REG_IRQ_ENABLE(dd)) { + dd->pio_only = 1; + + irq = platform_get_irq(pdev, 0); + if (irq < 0) { + dev_err(dev, "can't get IRQ resource\n"); + goto err_irq; + } + + err = request_irq(irq, omap_aes_irq, 0, + dev_name(dev), dd); + if (err) { + dev_err(dev, "Unable to grab omap-aes IRQ\n"); + goto err_irq; + } + } + INIT_LIST_HEAD(&dd->list); spin_lock(&list_lock); @@ -1137,8 +1152,11 @@ err_algs: for (j = dd->pdata->algs_info[i].registered - 1; j >= 0; j--) crypto_unregister_alg( &dd->pdata->algs_info[i].algs_list[j]); - omap_aes_dma_cleanup(dd); -err_dma: + if (dd->pio_only) + free_irq(irq, dd); + else + omap_aes_dma_cleanup(dd); +err_irq: tasklet_kill(&dd->done_task); tasklet_kill(&dd->queue_task); pm_runtime_disable(dev); -- 1.7.9.5
http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
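The DMA-or-PIO decision in the probe path above can be sketched as a toy model. Note that `probe_mode()`, `dma_init_ok` and `irq_avail` are illustrative stand-ins for `omap_aes_dma_init()` and `platform_get_irq()`, not the driver's actual code:

```c
#include <assert.h>

/* Illustrative stand-ins: nonzero means the corresponding driver call
 * (omap_aes_dma_init() / platform_get_irq()) would have succeeded. */
static int dma_init_ok;
static int irq_avail;

/* Toy model of the probe logic: prefer DMA, fall back to PIO-only mode
 * when DMA setup fails but an IRQ resource is available, and fail the
 * probe when neither is usable. */
static int probe_mode(int *pio_only)
{
	*pio_only = 0;
	if (dma_init_ok)
		return 0;	/* DMA mode */
	if (!irq_avail)
		return -1;	/* no DMA channels and no IRQ: probe fails */
	*pio_only = 1;		/* PIO-only mode */
	return 0;
}
```

This mirrors the shape of the hunk above: the error path only falls back when the platform actually exposes the interrupt registers.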
[PATCH v2 09/14] crypto: omap-aes: PIO mode: platform data for OMAP4/AM437x and trigger
We initialize the scatter-gather walk lists needed for PIO mode and avoid all DMA paths, such as mapping/unmapping buffers, by checking the pio_only flag. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 43 ++- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 889dc99..62b3260 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -168,6 +168,7 @@ struct omap_aes_dev { struct dma_chan *dma_lch_out; int in_sg_len; int out_sg_len; + int pio_only; const struct omap_aes_pdata *pdata; }; @@ -413,6 +414,16 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, struct dma_slave_config cfg; int ret; + if (dd->pio_only) { + scatterwalk_start(&dd->in_walk, dd->in_sg); + scatterwalk_start(&dd->out_walk, dd->out_sg); + + /* Enable DATAIN interrupt and let it take care of the rest */ + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x2); + return 0; + } + dma_sync_sg_for_device(dd->dev, dd->in_sg, in_sg_len, DMA_TO_DEVICE); memset(&cfg, 0, sizeof(cfg)); @@ -482,21 +493,25 @@ static int omap_aes_crypt_dma_start(struct omap_aes_dev *dd) pr_debug("total: %d\n", dd->total); - err = dma_map_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() error\n"); - return -EINVAL; - } + if (!dd->pio_only) { + err = dma_map_sg(dd->dev, dd->in_sg, dd->in_sg_len, + DMA_TO_DEVICE); + if (!err) { + dev_err(dd->dev, "dma_map_sg() error\n"); + return -EINVAL; + } - err = dma_map_sg(dd->dev, dd->out_sg, dd->out_sg_len, DMA_FROM_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() error\n"); - return -EINVAL; + err = dma_map_sg(dd->dev, dd->out_sg, dd->out_sg_len, + DMA_FROM_DEVICE); + if (!err) { + dev_err(dd->dev, "dma_map_sg() error\n"); + return -EINVAL; + } } err = omap_aes_crypt_dma(tfm, dd->in_sg, dd->out_sg, dd->in_sg_len, dd->out_sg_len); - if (err) { + if (err && !dd->pio_only) { dma_unmap_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); 
dma_unmap_sg(dd->dev, dd->out_sg, dd->out_sg_len, DMA_FROM_DEVICE); @@ -600,9 +615,11 @@ static void omap_aes_done_task(unsigned long data) pr_debug("enter done_task\n"); - dma_sync_sg_for_cpu(dd->dev, dd->in_sg, dd->in_sg_len, DMA_FROM_DEVICE); - - omap_aes_crypt_dma_stop(dd); + if (!dd->pio_only) { + dma_sync_sg_for_device(dd->dev, dd->out_sg, dd->out_sg_len, + DMA_FROM_DEVICE); + omap_aes_crypt_dma_stop(dd); + } omap_aes_finish_req(dd, 0); omap_aes_handle_queue(dd, NULL); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 05/14] crypto: omap-aes: Sync SG before DMA operation
Earlier functions that did a similar sync are replaced by dma_sync_sg_*, which can operate on an entire SG list. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 64dd5c1..5e034a1 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -489,6 +489,8 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, struct dma_slave_config cfg; int ret; + dma_sync_sg_for_device(dd->dev, dd->in_sg, in_sg_len, DMA_TO_DEVICE); + memset(&cfg, 0, sizeof(cfg)); @@ -676,6 +678,8 @@ static void omap_aes_done_task(unsigned long data) pr_debug("enter done_task\n"); + dma_sync_sg_for_cpu(dd->dev, dd->in_sg, dd->in_sg_len, DMA_FROM_DEVICE); + omap_aes_crypt_dma_stop(dd); omap_aes_finish_req(dd, 0); omap_aes_handle_queue(dd, NULL); -- 1.7.9.5
[PATCH v2 13/14] crypto: omap-aes: Convert request_irq to devm_request_irq
Keeps the request_irq exit/error code paths simpler. Suggested-by: Lokesh Vutla Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 6a4ac4a..ab449b5 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -1197,7 +1197,7 @@ static int omap_aes_probe(struct platform_device *pdev) goto err_irq; } - err = request_irq(irq, omap_aes_irq, 0, + err = devm_request_irq(dev, irq, omap_aes_irq, 0, dev_name(dev), dd); if (err) { dev_err(dev, "Unable to grab omap-aes IRQ\n"); @@ -1232,9 +1232,7 @@ err_algs: for (j = dd->pdata->algs_info[i].registered - 1; j >= 0; j--) crypto_unregister_alg(&dd->pdata->algs_info[i].algs_list[j]); - if (dd->pio_only) - free_irq(irq, dd); - else + if (!dd->pio_only) omap_aes_dma_cleanup(dd); err_irq: tasklet_kill(&dd->done_task); -- 1.7.9.5
[PATCH v2 11/14] crypto: omap-aes: Add support for cases of unaligned lengths
For cases where the offset/length on any page of the input SG is not aligned to AES_BLOCK_SIZE, we copy all the pages from the input SG list into a contiguous buffer and prepare a single-element SG list for this buffer with length as the total bytes to crypt. This is required for cases such as when an SG list of 16 bytes total size contains 16 pages each containing 1 byte. DMA using the direct buffers of such instances is not possible. For this purpose, we first detect the unaligned case and allocate enough pages to satisfy the request and prepare SG lists. We then copy data into the buffer, and copy data out of it on completion. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 86 +++-- 1 file changed, 83 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index d214dbf..cbeaaf4 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -157,9 +157,23 @@ struct omap_aes_dev { struct tasklet_struct queue_task; struct ablkcipher_request *req; + + /* + * total is used by PIO mode for book keeping, so introduce + * total_save as we need it to calculate page_order + */ size_t total; + size_t total_save; + struct scatterlist *in_sg; struct scatterlist *out_sg; + + /* Buffers for copying for unaligned cases */ + struct scatterlist in_sgl; + struct scatterlist out_sgl; + struct scatterlist *orig_out; + int sgs_copied; + struct scatter_walk in_walk; struct scatter_walk out_walk; int dma_in; @@ -543,12 +557,51 @@ static int omap_aes_crypt_dma_stop(struct omap_aes_dev *dd) dmaengine_terminate_all(dd->dma_lch_in); dmaengine_terminate_all(dd->dma_lch_out); - dma_unmap_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); - dma_unmap_sg(dd->dev, dd->out_sg, dd->out_sg_len, DMA_FROM_DEVICE); - return err; } +int omap_aes_check_aligned(struct scatterlist *sg) +{ + while (sg) { + if (!IS_ALIGNED(sg->offset, 4)) + return -1; + if (!IS_ALIGNED(sg->length, AES_BLOCK_SIZE)) + return -1; + sg = 
sg_next(sg); + } + return 0; +} + +int omap_aes_copy_sgs(struct omap_aes_dev *dd) +{ + void *buf_in, *buf_out; + int pages; + + pages = get_order(dd->total); + + buf_in = (void *)__get_free_pages(GFP_ATOMIC, pages); + buf_out = (void *)__get_free_pages(GFP_ATOMIC, pages); + + if (!buf_in || !buf_out) { + pr_err("Couldn't allocated pages for unaligned cases.\n"); + return -1; + } + + dd->orig_out = dd->out_sg; + + sg_copy_buf(buf_in, dd->in_sg, 0, dd->total, 0); + + sg_init_table(>in_sgl, 1); + sg_set_buf(>in_sgl, buf_in, dd->total); + dd->in_sg = >in_sgl; + + sg_init_table(>out_sgl, 1); + sg_set_buf(>out_sgl, buf_out, dd->total); + dd->out_sg = >out_sgl; + + return 0; +} + static int omap_aes_handle_queue(struct omap_aes_dev *dd, struct ablkcipher_request *req) { @@ -582,9 +635,19 @@ static int omap_aes_handle_queue(struct omap_aes_dev *dd, /* assign new request to device */ dd->req = req; dd->total = req->nbytes; + dd->total_save = req->nbytes; dd->in_sg = req->src; dd->out_sg = req->dst; + if (omap_aes_check_aligned(dd->in_sg) || + omap_aes_check_aligned(dd->out_sg)) { + if (omap_aes_copy_sgs(dd)) + pr_err("Failed to copy SGs for unaligned cases\n"); + dd->sgs_copied = 1; + } else { + dd->sgs_copied = 0; + } + dd->in_sg_len = scatterwalk_bytes_sglen(dd->in_sg, dd->total); dd->out_sg_len = scatterwalk_bytes_sglen(dd->out_sg, dd->total); BUG_ON(dd->in_sg_len < 0 || dd->out_sg_len < 0); @@ -612,14 +675,31 @@ static int omap_aes_handle_queue(struct omap_aes_dev *dd, static void omap_aes_done_task(unsigned long data) { struct omap_aes_dev *dd = (struct omap_aes_dev *)data; + void *buf_in, *buf_out; + int pages; pr_debug("enter done_task\n"); if (!dd->pio_only) { dma_sync_sg_for_device(dd->dev, dd->out_sg, dd->out_sg_len, DMA_FROM_DEVICE); + dma_unmap_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); +
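The alignment rule that omap_aes_check_aligned() enforces can be exercised outside the kernel with a small model. `struct seg` and `segs_aligned()` are hypothetical names, and `IS_ALIGNED` is re-defined locally to match the kernel macro's behavior for power-of-two alignments:

```c
#include <assert.h>

#define AES_BLOCK_SIZE	16
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/* A (offset, length) pair standing in for one scatterlist entry. */
struct seg {
	unsigned int offset;
	unsigned int length;
};

/* Mirrors the check above: every segment must start on a 4-byte
 * boundary and span a whole number of AES blocks; otherwise the
 * driver falls back to copying through a contiguous bounce buffer. */
static int segs_aligned(const struct seg *s, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!IS_ALIGNED(s[i].offset, 4))
			return -1;
		if (!IS_ALIGNED(s[i].length, AES_BLOCK_SIZE))
			return -1;
	}
	return 0;
}
```

The 16-pages-of-1-byte example from the commit message fails this check on length, which is exactly what routes it through omap_aes_copy_sgs().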
[PATCH v2 01/14] crypto: scatterwalk: Add support for calculating number of SG elements
The crypto layer only passes nbytes to encrypt, but in the omap-aes driver we need to know the number of SG elements to pass to the dmaengine slave API. We add a function for this to the scatterwalk library. Signed-off-by: Joel Fernandes --- crypto/scatterwalk.c | 22 ++ include/crypto/scatterwalk.h | 2 ++ 2 files changed, 24 insertions(+) diff --git a/crypto/scatterwalk.c b/crypto/scatterwalk.c index 7281b8a..79ca227 100644 --- a/crypto/scatterwalk.c +++ b/crypto/scatterwalk.c @@ -124,3 +124,25 @@ void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg, scatterwalk_done(&walk, out, 0); } EXPORT_SYMBOL_GPL(scatterwalk_map_and_copy); + +int scatterwalk_bytes_sglen(struct scatterlist *sg, int num_bytes) +{ + int offset = 0, n = 0; + + /* num_bytes is too small */ + if (num_bytes < sg->length) + return -1; + + do { + offset += sg->length; + n++; + sg = scatterwalk_sg_next(sg); + + /* num_bytes is too large */ + if (unlikely(!sg && (num_bytes > offset))) + return -1; + } while (sg && (num_bytes > offset)); + + return n; +} +EXPORT_SYMBOL_GPL(scatterwalk_bytes_sglen); diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h index 3744d2a..13621cc 100644 --- a/include/crypto/scatterwalk.h +++ b/include/crypto/scatterwalk.h @@ -113,4 +113,6 @@ void scatterwalk_done(struct scatter_walk *walk, int out, int more); void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg, unsigned int start, unsigned int nbytes, int out); +int scatterwalk_bytes_sglen(struct scatterlist *sg, int num_bytes); + #endif /* _CRYPTO_SCATTERWALK_H */ -- 1.7.9.5
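The walk in scatterwalk_bytes_sglen() can be modeled with an array of segment lengths standing in for the chained scatterlist. `sg_elems_for_bytes()` is a hypothetical user-space analogue, simplified so that a zero length terminates the list:

```c
#include <assert.h>

/* User-space analogue of scatterwalk_bytes_sglen(): lens[] holds the
 * byte length of each SG entry, terminated by a 0. Returns the number
 * of entries needed to cover num_bytes, or -1 when num_bytes is
 * smaller than the first entry or larger than the whole list. */
static int sg_elems_for_bytes(const int *lens, int num_bytes)
{
	int offset = 0, n = 0;

	if (num_bytes < lens[0])
		return -1;	/* num_bytes is too small */

	do {
		offset += lens[n];
		n++;
		if (!lens[n] && num_bytes > offset)
			return -1;	/* num_bytes is too large */
	} while (lens[n] && num_bytes > offset);

	return n;
}
```

An entry that is only partially covered still counts, matching how the driver maps whole SG entries for DMA.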
[PATCH v2 03/14] crypto: omap-aes: Populate number of SG elements
Crypto layer only passes nbytes but number of SG elements is needed for mapping or unmapping SGs at one time using dma_map* API and also needed to pass in for dmaengine prep function. We call function added to scatterwalk for this purpose in omap_aes_handle_queue to populate the values which are used later. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 26b802b..e369e6e 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -164,6 +164,8 @@ struct omap_aes_dev { void*buf_out; int dma_out; struct dma_chan *dma_lch_out; + int in_sg_len; + int out_sg_len; dma_addr_t dma_addr_out; const struct omap_aes_pdata *pdata; @@ -731,6 +733,10 @@ static int omap_aes_handle_queue(struct omap_aes_dev *dd, dd->out_offset = 0; dd->out_sg = req->dst; + dd->in_sg_len = scatterwalk_bytes_sglen(dd->in_sg, dd->total); + dd->out_sg_len = scatterwalk_bytes_sglen(dd->out_sg, dd->total); + BUG_ON(dd->in_sg_len < 0 || dd->out_sg_len < 0); + rctx = ablkcipher_request_ctx(req); ctx = crypto_ablkcipher_ctx(crypto_ablkcipher_reqtfm(req)); rctx->mode &= FLAGS_MODE_MASK; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2 14/14] crypto: omap-aes: Kconfig: Add build support for AM437x
For the AM437x SoC, ARCH_OMAP2 and ARCH_OMAP3 are not enabled in the defconfig. We follow the SHA driver and add a dependency on ARCH_OMAP2PLUS so that the config is selectable for the AES driver on AM437x SoC builds. Signed-off-by: Joel Fernandes --- drivers/crypto/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index dffb855..e289afa 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -252,7 +252,7 @@ config CRYPTO_DEV_OMAP_SHAM config CRYPTO_DEV_OMAP_AES tristate "Support for OMAP AES hw engine" - depends on ARCH_OMAP2 || ARCH_OMAP3 + depends on ARCH_OMAP2 || ARCH_OMAP3 || ARCH_OMAP2PLUS select CRYPTO_AES select CRYPTO_BLKCIPHER2 help -- 1.7.9.5
Re: [PATCH v2 02/14] crypto: omap-aes: Add useful debug macros
On 08/17/2013 11:22 PM, Joe Perches wrote: > On Sat, 2013-08-17 at 21:42 -0500, Joel Fernandes wrote: >> When DEBUG is enabled, these macros can be used to print variables in integer >> and hex format, and clearly display which registers, offsets and values are >> being read/written, including printing the names of the offsets and their >> values. >> >> Note: >> This patch results in a checkpatch error that cannot be fixed. >> ERROR: Macros with multiple statements should be enclosed in a do - while >> loop >> +#define omap_aes_read(dd, offset) \ >> + __raw_readl(dd->io_base + offset); \ >> + pr_debug("omap_aes_read(" #offset ")\n"); >> >> Using a do-while loop will break a lot of code such as: >> ret = omap_aes_read(..); > > That's where you use a statement expression macro > > #define omap_aes_read(dd, offset) \ > ({\ > pr_debug("omap_aes_read(" #offset ")\n");\ > __raw_readl((dd)->io_base + offset); \ > }) > That made things a lot simpler, thanks. Re-spinning just this patch as below: --->8--- From: Joel Fernandes Subject: [PATCH] crypto: omap-aes: Add useful debug macros When DEBUG is enabled, these macros can be used to print variables in integer and hex format, and clearly display which registers, offsets and values are being read/written, including printing the names of the offsets and their values. 
Using statement expression macros in read path as, Suggested-by: Joe Perches Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 24 +++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index ee15b0f..e26d4d4 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -13,7 +13,9 @@ * */ -#define pr_fmt(fmt) "%s: " fmt, __func__ +#define pr_fmt(fmt) "%20s: " fmt, __func__ +#define prn(num) pr_debug(#num "=%d\n", num) +#define prx(num) pr_debug(#num "=%x\n", num) #include #include @@ -172,16 +174,36 @@ struct omap_aes_dev { static LIST_HEAD(dev_list); static DEFINE_SPINLOCK(list_lock); +#ifdef DEBUG +#define omap_aes_read(dd, offset) \ +({ \ + int _read_ret; \ + _read_ret = __raw_readl(dd->io_base + offset); \ + pr_debug("omap_aes_read(" #offset "=%#x)= %#x\n", \ +offset, _read_ret);\ + _read_ret; \ +}) +#else static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) { return __raw_readl(dd->io_base + offset); } +#endif +#ifdef DEBUG +#define omap_aes_write(dd, offset, value) \ + do {\ + pr_debug("omap_aes_write(" #offset "=%#x) value=%#x\n", \ +offset, value);\ + __raw_writel(value, dd->io_base + offset); \ + } while (0) +#else static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, u32 value) { __raw_writel(value, dd->io_base + offset); } +#endif static inline void omap_aes_write_mask(struct omap_aes_dev *dd, u32 offset, u32 value, u32 mask) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
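Joe's suggestion relies on GCC/Clang statement expressions: the `({ ... })` block evaluates to its last expression, so a debug print can be folded into the read path without breaking callers that do `ret = omap_aes_read(...)`. A minimal stand-alone sketch, with `regs[]` and `reg_read()` as hypothetical stand-ins for the MMIO access:

```c
#include <assert.h>
#include <stdio.h>

/* Fake register file standing in for dd->io_base MMIO space. */
static unsigned int regs[4] = { 0x0, 0xdeadbeef, 0x0, 0x0 };

/* Statement-expression macro (GNU extension): the final expression _v
 * is the value of the whole macro, so it works on the right-hand side
 * of an assignment while still logging every access. */
#define reg_read(idx)						\
({								\
	unsigned int _v = regs[idx];				\
	fprintf(stderr, "reg_read(" #idx ") = %#x\n", _v);	\
	_v;							\
})
```

This is the same shape as the DEBUG variant of omap_aes_read() in the re-spun patch above, minus the kernel plumbing.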
Re: [PATCH v2 04/14] crypto: omap-aes: Simplify DMA usage by using direct SGs
On 08/20/2013 07:57 AM, Lokesh Vutla wrote: > Hi Joel, > > On Sunday 18 August 2013 08:12 AM, Joel Fernandes wrote: >> In early versions of this driver, assumptions were made such as the DMA layer >> requiring contiguous buffers, etc. Due to this, new buffers were allocated, >> mapped and used for DMA. These assumptions are no longer true and DMAEngine >> scatter-gather DMA doesn't have such requirements. We simplify the DMA operations >> by directly using the scatter-gather buffers provided by the crypto layer >> instead of creating our own. >> >> A lot of logic that handled DMA'ing only X number of bytes of the total, or as >> much as fitted into a 3rd party buffer, is removed and is no longer required. >> >> Also, a good performance improvement of at least ~20% is seen with encrypting a >> buffer size of 8K (1800 ops/sec vs 1400 ops/sec). Improvement will be higher >> for much larger blocks, though such benchmarking is left as an exercise for the >> reader. Also DMA usage is much more simplified and coherent with the rest of the >> code. 
>> >> Signed-off-by: Joel Fernandes >> --- >> drivers/crypto/omap-aes.c | 147 >> - >> 1 file changed, 25 insertions(+), 122 deletions(-) >> >> diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c >> index e369e6e..64dd5c1 100644 >> --- a/drivers/crypto/omap-aes.c >> +++ b/drivers/crypto/omap-aes.c >> @@ -480,22 +480,14 @@ static int sg_copy(struct scatterlist **sg, size_t >> *offset, void *buf, >> } >> >> static int omap_aes_crypt_dma(struct crypto_tfm *tfm, >> -struct scatterlist *in_sg, struct scatterlist *out_sg) >> +struct scatterlist *in_sg, struct scatterlist *out_sg, >> +int in_sg_len, int out_sg_len) >> { >> struct omap_aes_ctx *ctx = crypto_tfm_ctx(tfm); >> struct omap_aes_dev *dd = ctx->dd; >> struct dma_async_tx_descriptor *tx_in, *tx_out; >> struct dma_slave_config cfg; >> -dma_addr_t dma_addr_in = sg_dma_address(in_sg); >> -int ret, length = sg_dma_len(in_sg); >> - >> -pr_debug("len: %d\n", length); >> - >> -dd->dma_size = length; >> - >> -if (!(dd->flags & FLAGS_FAST)) >> -dma_sync_single_for_device(dd->dev, dma_addr_in, length, >> - DMA_TO_DEVICE); >> +int ret; > By this change FLAGS_FAST is unsed, it can be cleaned right? > or Am I missing something? Yes, FLAGS_FAST would be unused now and can go away. Since it is very trivial change, I will make this change in the not-immediate future and submit. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 02/12] ARM: edma: Don't clear EMR of channel in edma_stop
On Sun, Aug 11, 2013 at 11:25 PM, Sekhar Nori wrote: > On 8/8/2013 5:19 PM, Sekhar Nori wrote: >> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote: >>> We certainly don't want error conditions to be cleared any other >>> place but the EDMA error handler, as this will make us 'forget' >>> about missed events we might need to know errors have occurred. >>> >>> This fixes a race condition where the EMR was being cleared >>> by the transfer completion interrupt handler. >>> >>> Basically, what was happening was: >>> >>> Missed event >>> | >>> | >>> V >>> SG1-SG2-SG3-Null >>> \ >>> \__TC Interrupt (Almost same time as ARM is executing >>> TC interrupt handler, an event got missed and also forgotten >>> by clearing the EMR). >>> >>> This causes the following problems: >>> >>> 1. >>> If error interrupt is also pending and TC interrupt clears the EMR >>> by calling edma_stop as has been observed in the edma_callback function, >>> the ARM will execute the error interrupt even though the EMR is clear. >>> As a result, the dma_ccerr_handler returns IRQ_NONE. If this happens >>> enough number of times, IRQ subsystem disables the interrupt thinking >>> its spurious which makes error handler never execute again. >>> >>> 2. >>> Also even if error handler doesn't return IRQ_NONE, the removing of EMR >>> removes the knowledge about which channel had a missed event, and thus >>> a manual trigger on such channels cannot be performed. >>> >>> The EMR is ultimately being cleared by the Error interrupt handler >>> once it is handled so we remove code that does it in edma_stop and >>> allow it to happen there. >>> >>> Signed-off-by: Joel Fernandes >> >> Queuing this for v3.11 fixes. While committing, I changed the headline >> to remove capitalization and made it more readable by removing register >> level details. The new headline is: >> >> ARM: edma: don't clear missed events in edma_stop() > > Forgot to ask, should this be tagged for stable? 
IOW, how serious is > this race in current kernel (without the entire series applied)? I have > never observed it myself - so please provide details how easy/difficult > it is to hit this condition. The race was uncovered by recent EDMA patch series, So this patch can go in for next kernel release as such, I am not aware of any other DMA user that maybe uncovering the race condition. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 08/12] dma: edma: Add one more required slot to MAX slots
On 08/12/2013 08:31 AM, Mark Brown wrote: > On Mon, Aug 12, 2013 at 11:26:49AM +0530, Sekhar Nori wrote: > >> No need for a separate patch for this, just do this in the patch where >> you include the two linked sets. > > Can you guys please think about the CC lists you're using for these > patch serieses? I've certainly no interest in random patches to the > DaVinci DMA controllers, and I suspect the same is true for most of the > people on the CC list. > Sorry. I was actually getting a sense of this. What I did initially was I copied the CC list from last years work on the same series hoping that it would be a complete mail chain. Bad judgement on my part. I should have paid more attention to Mark's rants about his inbox getting spammed with this series in those old threads ;) I guess moving forward I will use get_maintainer.pl and refine my git-send scripts with only interested folks. Any words of advice on when to and when not to CC someone remotely connected to a series is welcome. @Sekhar, let's keep only linux-omap, linux-davinci and few other interested folks on the CC chain for the rest of review on the series. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 01/12] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
Dropped quite a few from the CC list... On 08/12/2013 02:15 AM, Sekhar Nori wrote: > On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote: >> Changes are made here for configuring existing parameters to support >> DMA'ing them out in batches as needed. >> >> Also allocate as many as slots as needed by the SG list, but not more >> than MAX_NR_SG. Then these slots will be reused accordingly. >> For ex, if MAX_NR_SG=10, and number of SG entries is 40, still only >> 10 slots will be allocated to DMA the entire SG list of size 40. >> >> Signed-off-by: Joel Fernandes >> --- >> drivers/dma/edma.c | 14 +++--- >> 1 file changed, 11 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c >> index 5f3e532..7b0853c 100644 >> --- a/drivers/dma/edma.c >> +++ b/drivers/dma/edma.c >> @@ -222,9 +222,9 @@ static struct dma_async_tx_descriptor >> *edma_prep_slave_sg( >> enum dma_slave_buswidth dev_width; >> u32 burst; >> struct scatterlist *sg; >> -int i; >> int acnt, bcnt, ccnt, src, dst, cidx; >> int src_bidx, dst_bidx, src_cidx, dst_cidx; >> +int i, num_slots_needed; > > 'nslots' is more to my liking. Better keep variable names short. > >> >> if (unlikely(!echan || !sgl || !sg_len)) >> return NULL; >> @@ -262,8 +262,11 @@ static struct dma_async_tx_descriptor >> *edma_prep_slave_sg( >> >> edesc->pset_nr = sg_len; >> >> -for_each_sg(sgl, sg, sg_len, i) { >> -/* Allocate a PaRAM slot, if needed */ >> +/* Allocate a PaRAM slot, if needed */ >> + >> +num_slots_needed = sg_len > MAX_NR_SG ? MAX_NR_SG : sg_len; > > nslots = min(MAX_NR_SG, sg_len); I agree the original naming was quite long. I would rather using something more descriptive though than nslots. How does slots_needed sound? Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 01/12] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
Responding to other comments in this post, On 08/12/2013 02:15 AM, Sekhar Nori wrote: [..] >> if (unlikely(!echan || !sgl || !sg_len)) >> return NULL; >> @@ -262,8 +262,11 @@ static struct dma_async_tx_descriptor >> *edma_prep_slave_sg( >> >> edesc->pset_nr = sg_len; >> >> -for_each_sg(sgl, sg, sg_len, i) { >> -/* Allocate a PaRAM slot, if needed */ >> +/* Allocate a PaRAM slot, if needed */ >> + >> +num_slots_needed = sg_len > MAX_NR_SG ? MAX_NR_SG : sg_len; > > nslots = min(MAX_NR_SG, sg_len); Changed to this, with the +1 > >> + >> +for (i = 0; i < num_slots_needed; i++) { >> if (echan->slot[i] < 0) { >> echan->slot[i] = >> edma_alloc_slot(EDMA_CTLR(echan->ch_num), >> @@ -273,6 +276,10 @@ static struct dma_async_tx_descriptor >> *edma_prep_slave_sg( >> return NULL; >> } >> } >> +} >> + >> +/* Configure PaRAM sets for each SG */ >> +for_each_sg(sgl, sg, sg_len, i) { >> >> acnt = dev_width; >> >> @@ -330,6 +337,7 @@ static struct dma_async_tx_descriptor >> *edma_prep_slave_sg( >> /* Configure A or AB synchronized transfers */ >> if (edesc->absync) >> edesc->pset[i].opt |= SYNCDIM; >> + > > Random extra newline. Removing.. > > The patch as such is fine, but I dont think it makes lot of sense > standalone. This needs to be merged into the patch where you actually > handle the entire SG list in batches. I think it does actually, this patch just takes care of preparing the param set list correctly and allocating slots. It doesn't the actual DMA or take part in the algorithm. As a result, the patch can be reused incase in future the main algorithm is rewritten in a subsequent series. Further this patch was reused straight from old implementation so it proved to be useful being a separate patch last time. I also plan to rewrite just this functionality in the future. 
Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 01/12] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
On 08/12/2013 06:55 PM, Joel Fernandes wrote: > Dropped quite a few from the CC list... > > On 08/12/2013 02:15 AM, Sekhar Nori wrote: >> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote: >>> Changes are made here for configuring existing parameters to support >>> DMA'ing them out in batches as needed. >>> >>> Also allocate as many as slots as needed by the SG list, but not more >>> than MAX_NR_SG. Then these slots will be reused accordingly. >>> For ex, if MAX_NR_SG=10, and number of SG entries is 40, still only >>> 10 slots will be allocated to DMA the entire SG list of size 40. >>> >>> Signed-off-by: Joel Fernandes >>> --- >>> drivers/dma/edma.c | 14 +++--- >>> 1 file changed, 11 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c >>> index 5f3e532..7b0853c 100644 >>> --- a/drivers/dma/edma.c >>> +++ b/drivers/dma/edma.c >>> @@ -222,9 +222,9 @@ static struct dma_async_tx_descriptor >>> *edma_prep_slave_sg( >>> enum dma_slave_buswidth dev_width; >>> u32 burst; >>> struct scatterlist *sg; >>> - int i; >>> int acnt, bcnt, ccnt, src, dst, cidx; >>> int src_bidx, dst_bidx, src_cidx, dst_cidx; >>> + int i, num_slots_needed; >> >> 'nslots' is more to my liking. Better keep variable names short. >> >>> >>> if (unlikely(!echan || !sgl || !sg_len)) >>> return NULL; >>> @@ -262,8 +262,11 @@ static struct dma_async_tx_descriptor >>> *edma_prep_slave_sg( >>> >>> edesc->pset_nr = sg_len; >>> >>> - for_each_sg(sgl, sg, sg_len, i) { >>> - /* Allocate a PaRAM slot, if needed */ >>> + /* Allocate a PaRAM slot, if needed */ >>> + >>> + num_slots_needed = sg_len > MAX_NR_SG ? MAX_NR_SG : sg_len; >> >> nslots = min(MAX_NR_SG, sg_len); > > I agree the original naming was quite long. I would rather using > something more descriptive though than nslots. How does slots_needed sound? Sorry for the noise, nslots is fine and I've changed it to the same. 
-Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
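Whatever the variable ends up being called, the arithmetic under discussion is: allocate only min(MAX_NR_SG, sg_len) PaRAM slots and push the whole list through them in batches. A sketch, where `batches_needed()` is an illustrative helper and not code from the patch:

```c
#include <assert.h>

#define MAX_NR_SG 10

/* Only min(MAX_NR_SG, sg_len) PaRAM slots are allocated; a longer SG
 * list is recycled through those same slots in ceil(sg_len / nslots)
 * batches, as described in the commit message. */
static int batches_needed(int sg_len, int *nslots)
{
	*nslots = sg_len > MAX_NR_SG ? MAX_NR_SG : sg_len;
	return (sg_len + *nslots - 1) / *nslots;
}
```

With the commit-message example of MAX_NR_SG=10 and 40 SG entries, 10 slots carry the list in 4 batches.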
Re: [PATCH v3 09/12] dma: edma: Implement multiple linked sets for continuity
On 08/12/2013 01:56 PM, Sekhar Nori wrote: > On Monday 05 August 2013 04:14 PM, Joel Fernandes wrote: >> Here we implement splitting up of the total MAX number of slots >> available for a channel into 2 cyclically linked sets. Transfer >> completion Interrupts are enabled on both linked sets and respective >> handler recycles them on completion to process the next linked set. >> Both linked sets are cyclically linked to each other to ensure >> continuity of DMA operations. Interrupt handlers execute asynchronously >> to the EDMA events and recycles the linked sets at the right time, >> as a result EDMA is not blocked or dependent on interrupts and DMA >> continues till the end of the SG-lists without any interruption. >> >> Suggested-by: Sekhar Nori >> Signed-off-by: Joel Fernandes >> --- >> drivers/dma/edma.c | 157 >> +++- >> 1 file changed, 118 insertions(+), 39 deletions(-) >> >> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c >> index df50a04..70923a2 100644 >> --- a/drivers/dma/edma.c >> +++ b/drivers/dma/edma.c >> @@ -48,6 +48,7 @@ >> >> /* Max of 16 segments per channel to conserve PaRAM slots */ >> #define MAX_NR_SG 16 >> +#define MAX_NR_LS (MAX_NR_SG >> 1) >> #define EDMA_MAX_SLOTS (MAX_NR_SG+1) >> #define EDMA_DESCRIPTORS16 >> >> @@ -57,6 +58,7 @@ struct edma_desc { >> int absync; >> int pset_nr; >> int total_processed; >> +int next_setup_linkset; >> struct edmacc_param pset[0]; >> }; >> >> @@ -140,7 +142,9 @@ static void edma_execute(struct edma_chan *echan) >> struct edma_desc *edesc; >> struct device *dev = echan->vchan.chan.device->dev; >> >> -int i, j, total_left, total_process; >> +int i, total_left, total_link_set; >> +int ls_cur_off, ls_next_off, slot_off; >> +struct edmacc_param tmp_param; >> >> /* If either we processed all psets or we're still not started */ >> if (!echan->edesc || >> @@ -159,48 +163,121 @@ static void edma_execute(struct edma_chan *echan) >> >> /* Find out how many left */ >> total_left = edesc->pset_nr - 
edesc->total_processed; >> -total_process = total_left > MAX_NR_SG ? MAX_NR_SG : total_left; >> - >> - >> -/* Write descriptor PaRAM set(s) */ >> -for (i = 0; i < total_process; i++) { >> -j = i + edesc->total_processed; >> -edma_write_slot(echan->slot[i], &edesc->pset[j]); >> -dev_dbg(echan->vchan.chan.device->dev, >> -"\n pset[%d]:\n" >> -" chnum\t%d\n" >> -" slot\t%d\n" >> -" opt\t%08x\n" >> -" src\t%08x\n" >> -" dst\t%08x\n" >> -" abcnt\t%08x\n" >> -" ccnt\t%08x\n" >> -" bidx\t%08x\n" >> -" cidx\t%08x\n" >> -" lkrld\t%08x\n", >> -j, echan->ch_num, echan->slot[i], >> -edesc->pset[j].opt, >> -edesc->pset[j].src, >> -edesc->pset[j].dst, >> -edesc->pset[j].a_b_cnt, >> -edesc->pset[j].ccnt, >> -edesc->pset[j].src_dst_bidx, >> -edesc->pset[j].src_dst_cidx, >> -edesc->pset[j].link_bcntrld); >> -/* Link to the previous slot if not the last set */ >> -if (i != (total_process - 1)) > >> +total_link_set = total_left > MAX_NR_LS ? MAX_NR_LS : total_left; > > The name you gave here sounds like this is defining total number of > linked PaRAM sets. Rather this is actually tracking the number of PaRAM > sets (slots) in current linked set, correct? Then maybe just call it > 'nslots' or even 'num_slots'? There are just too many variables with > "total" prefix to keep track of in this function! I would rather just leave this naming alone. The code is quite self-documenting: total_link_set means "Calculate what's the total size of a Linkset, or total no. of slots in a linkset we n
Re: [PATCH v3 07/12] dma: edma: Add function to dump a PaRAM set from PaRAM
On 08/12/2013 12:52 AM, Sekhar Nori wrote: > On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote: >> Previously, such a dump function was used but it wasn't reading >> from the PaRAM, rather just from an edmacc_param structure, we >> add a helpful function for debugging that directly reads from >> the PaRAM and gives the current state correctly (including links >> and interrupt information). >> >> Signed-off-by: Joel Fernandes > > You should convert existing instances of PaRAM set dump to use this new > function along with introducing it. Well, the old dump callers were thrown out with completely rewritten code. This rewritten code already contains the dump calls. Are you saying pull out those dump pset calls into a separate patch? Thanks, -Joel > Sekhar > >> --- >> drivers/dma/edma.c | 31 +++ >> 1 file changed, 31 insertions(+) >> >> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c >> index 080d669..a242269 100644 >> --- a/drivers/dma/edma.c >> +++ b/drivers/dma/edma.c >> @@ -102,6 +102,37 @@ static void edma_desc_free(struct virt_dma_desc *vdesc) >> kfree(container_of(vdesc, struct edma_desc, vdesc)); >> } >> >> +static inline void dump_pset(struct edma_chan *echan, int slot, >> + struct edmacc_param *pset_unused, int pset_idx) >> +{ >> +struct edmacc_param pset_local, *pset; >> +pset = &pset_local; >> + >> +edma_read_slot(slot, pset); >> + >> +dev_dbg(echan->vchan.chan.device->dev, >> +"\n pset[%d]:\n" >> +" chnum\t%d\n" >> +" slot\t%d\n" >> +" opt\t%08x\n" >> +" src\t%08x\n" >> +" dst\t%08x\n" >> +" abcnt\t%08x\n" >> +" ccnt\t%08x\n" >> +" bidx\t%08x\n" >> +" cidx\t%08x\n" >> +" lkrld\t%08x\n", >> +pset_idx, echan->ch_num, slot, >> +pset[0].opt, >> +pset[0].src, >> +pset[0].dst, >> +pset[0].a_b_cnt, >> +pset[0].ccnt, >> +pset[0].src_dst_bidx, >> +pset[0].src_dst_cidx, >> +pset[0].link_bcntrld); >> +} >> + >> /* Dispatch a queued descriptor to the controller (caller holds lock) */ >> static void edma_execute(struct edma_chan *echan) >> { >> > > -- > To 
unsubscribe from this list: send the line "unsubscribe linux-omap" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 08/12] dma: edma: Add one more required slot to MAX slots
On 08/12/2013 12:56 AM, Sekhar Nori wrote: > On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote: >> We'd now need a separate slot just for the channel and separate >> ones for the 2 linked sets, so we make adjustments to allocate >> an extra channel accordingly. >> >> Signed-off-by: Joel Fernandes > > No need for a separate patch for this, just do this in the patch where > you include the two linked sets. Ok, I dropped this patch. Anyway, most of this patch has gone away now because of other changes you suggested. Thanks, -Joel
[PATCH 00/10] crypto: omap-aes: DMA and PIO mode improvements
This patch series is a rewrite of the DMA portion of the omap-aes driver and also adds support for PIO mode. Both these modes give better performance than before. Earlier, only a single SG was used for DMA purpose, and the SG-list passed from the crypto layer was being copied and DMA'd one entry at a time. This turns out to be quite inefficient, so we replace it with much simpler code that directly passes the SG-list from crypto to the DMA layer. We also add PIO mode support to the driver, and switch to PIO mode whenever the DMA channel allocation is not available. This is currently enabled only for the OMAP4 platform, but will work on any platform on which IRQ information is populated. Tests performed on am33xx and omap4 SoCs; notice the 50% perf improvement for large 8K blocks: Sample run on am33xx (beaglebone): With DMA rewrite: [ 26.410052] test 0 (128 bit key, 16 byte blocks): 4318 operations in 1 seconds (69088 bytes) [ 27.414314] test 1 (128 bit key, 64 byte blocks): 4360 operations in 1 seconds (279040 bytes) [ 28.414406] test 2 (128 bit key, 256 byte blocks): 3609 operations in 1 seconds (923904 bytes) [ 29.414410] test 3 (128 bit key, 1024 byte blocks): 3418 operations in 1 seconds (3500032 bytes) [ 30.414510] test 4 (128 bit key, 8192 byte blocks): 1766 operations in 1 seconds (14467072 bytes) Without DMA rewrite: [ 31.920519] test 0 (128 bit key, 16 byte blocks): 4417 operations in 1 seconds (70672 bytes) [ 32.925997] test 1 (128 bit key, 64 byte blocks): 4221 operations in 1 seconds (270144 bytes) [ 33.926194] test 2 (128 bit key, 256 byte blocks): 3528 operations in 1 seconds (903168 bytes) [ 34.926225] test 3 (128 bit key, 1024 byte blocks): 3281 operations in 1 seconds (3359744 bytes) [ 35.926385] test 4 (128 bit key, 8192 byte blocks): 1460 operations in 1 seconds (11960320 bytes) With PIO mode, note the tremendous boost in performance for small blocks there: [ 27.294905] test 0 (128 bit key, 16 byte blocks): 20585 operations in 1 seconds (329360 bytes) [ 28.302282] test 1 (128 
bit key, 64 byte blocks): 8106 operations in 1 seconds (518784 bytes) [ 29.302374] test 2 (128 bit key, 256 byte blocks): 2359 operations in 1 seconds (603904 bytes) [ 30.302575] test 3 (128 bit key, 1024 byte blocks): 605 operations in 1 seconds (619520 bytes) [ 31.303781] test 4 (128 bit key, 8192 byte blocks): 79 operations in 1 seconds (647168 bytes) Future work in this direction would be to dynamically change between PIO/DMA mode based on the block size. Joel Fernandes (10): crypto: scatterwalk: Add support for calculating number of SG elements crypto: omap-aes: Add useful debug macros crypto: omap-aes: Populate number of SG elements crypto: omap-aes: Simplify DMA usage by using direct SGs crypto: omap-aes: Sync SG before DMA operation crypto: omap-aes: Remove previously used intermediate buffers crypto: omap-aes: Add IRQ info and helper macros crypto: omap-aes: PIO mode: Add IRQ handler and walk SGs crypto: omap-aes: PIO mode: platform data for OMAP4 and trigger it crypto: omap-aes: Switch to PIO mode in probe function crypto/scatterwalk.c | 22 +++ drivers/crypto/omap-aes.c| 400 -- include/crypto/scatterwalk.h |2 + 3 files changed, 217 insertions(+), 207 deletions(-) -- 1.7.9.5
[PATCH 01/10] crypto: scatterwalk: Add support for calculating number of SG elements
Crypto layer only passes nbytes to encrypt but in the omap-aes driver we need to know the number of SG elements to pass to the dmaengine slave API. We add a function for the same to the scatterwalk library. Signed-off-by: Joel Fernandes --- crypto/scatterwalk.c | 22 ++ include/crypto/scatterwalk.h |2 ++ 2 files changed, 24 insertions(+) diff --git a/crypto/scatterwalk.c b/crypto/scatterwalk.c index 7281b8a..79ca227 100644 --- a/crypto/scatterwalk.c +++ b/crypto/scatterwalk.c @@ -124,3 +124,25 @@ void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg, scatterwalk_done(&walk, out, 0); } EXPORT_SYMBOL_GPL(scatterwalk_map_and_copy); + +int scatterwalk_bytes_sglen(struct scatterlist *sg, int num_bytes) +{ + int offset = 0, n = 0; + + /* num_bytes is too small */ + if (num_bytes < sg->length) + return -1; + + do { + offset += sg->length; + n++; + sg = scatterwalk_sg_next(sg); + + /* num_bytes is too large */ + if (unlikely(!sg && (num_bytes < offset))) + return -1; + } while (sg && (num_bytes > offset)); + + return n; +} +EXPORT_SYMBOL_GPL(scatterwalk_bytes_sglen); diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h index 3744d2a..13621cc 100644 --- a/include/crypto/scatterwalk.h +++ b/include/crypto/scatterwalk.h @@ -113,4 +113,6 @@ void scatterwalk_done(struct scatter_walk *walk, int out, int more); void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg, unsigned int start, unsigned int nbytes, int out); +int scatterwalk_bytes_sglen(struct scatterlist *sg, int num_bytes); + #endif /* _CRYPTO_SCATTERWALK_H */ -- 1.7.9.5
[PATCH 06/10] crypto: omap-aes: Remove previously used intermediate buffers
Intermediate buffers were allocated, mapped and used for DMA. These are no longer required as we use the SGs from crypto layer directly in previous commits in the series. Also along with it, remove the logic for copying SGs etc as they are no longer used, and all the associated variables in omap_aes_device. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 90 - 1 file changed, 90 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index df2f867..71da52b 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -156,25 +156,13 @@ struct omap_aes_dev { struct ablkcipher_request *req; size_t total; struct scatterlist *in_sg; - struct scatterlist in_sgl; - size_t in_offset; struct scatterlist *out_sg; - struct scatterlist out_sgl; - size_t out_offset; - - size_t buflen; - void*buf_in; - size_t dma_size; int dma_in; struct dma_chan *dma_lch_in; - dma_addr_t dma_addr_in; - void*buf_out; int dma_out; struct dma_chan *dma_lch_out; int in_sg_len; int out_sg_len; - dma_addr_t dma_addr_out; - const struct omap_aes_pdata *pdata; }; @@ -362,33 +350,6 @@ static int omap_aes_dma_init(struct omap_aes_dev *dd) dd->dma_lch_out = NULL; dd->dma_lch_in = NULL; - dd->buf_in = (void *)__get_free_pages(GFP_KERNEL, OMAP_AES_CACHE_SIZE); - dd->buf_out = (void *)__get_free_pages(GFP_KERNEL, OMAP_AES_CACHE_SIZE); - dd->buflen = PAGE_SIZE << OMAP_AES_CACHE_SIZE; - dd->buflen &= ~(AES_BLOCK_SIZE - 1); - - if (!dd->buf_in || !dd->buf_out) { - dev_err(dd->dev, "unable to alloc pages.\n"); - goto err_alloc; - } - - /* MAP here */ - dd->dma_addr_in = dma_map_single(dd->dev, dd->buf_in, dd->buflen, -DMA_TO_DEVICE); - if (dma_mapping_error(dd->dev, dd->dma_addr_in)) { - dev_err(dd->dev, "dma %d bytes error\n", dd->buflen); - err = -EINVAL; - goto err_map_in; - } - - dd->dma_addr_out = dma_map_single(dd->dev, dd->buf_out, dd->buflen, - DMA_FROM_DEVICE); - if (dma_mapping_error(dd->dev, dd->dma_addr_out)) { - dev_err(dd->dev, "dma %d bytes 
error\n", dd->buflen); - err = -EINVAL; - goto err_map_out; - } - dma_cap_zero(mask); dma_cap_set(DMA_SLAVE, mask); @@ -415,14 +376,6 @@ static int omap_aes_dma_init(struct omap_aes_dev *dd) err_dma_out: dma_release_channel(dd->dma_lch_in); err_dma_in: - dma_unmap_single(dd->dev, dd->dma_addr_out, dd->buflen, -DMA_FROM_DEVICE); -err_map_out: - dma_unmap_single(dd->dev, dd->dma_addr_in, dd->buflen, DMA_TO_DEVICE); -err_map_in: - free_pages((unsigned long)dd->buf_out, OMAP_AES_CACHE_SIZE); - free_pages((unsigned long)dd->buf_in, OMAP_AES_CACHE_SIZE); -err_alloc: if (err) pr_err("error: %d\n", err); return err; @@ -432,11 +385,6 @@ static void omap_aes_dma_cleanup(struct omap_aes_dev *dd) { dma_release_channel(dd->dma_lch_out); dma_release_channel(dd->dma_lch_in); - dma_unmap_single(dd->dev, dd->dma_addr_out, dd->buflen, -DMA_FROM_DEVICE); - dma_unmap_single(dd->dev, dd->dma_addr_in, dd->buflen, DMA_TO_DEVICE); - free_pages((unsigned long)dd->buf_out, OMAP_AES_CACHE_SIZE); - free_pages((unsigned long)dd->buf_in, OMAP_AES_CACHE_SIZE); } static void sg_copy_buf(void *buf, struct scatterlist *sg, @@ -453,42 +401,6 @@ static void sg_copy_buf(void *buf, struct scatterlist *sg, scatterwalk_done(&walk, out, 0); } -static int sg_copy(struct scatterlist **sg, size_t *offset, void *buf, - size_t buflen, size_t total, int out) -{ - unsigned int count, off = 0; - - while (buflen && total) { - count = min((*sg)->length - *offset, total); - count = min(count, buflen); - - if (!count) - return off; - - /* -* buflen and total are AES_BLOCK_SIZE size aligned, -* so count should be also aligned -*/ - - sg_copy_buf(buf + off, *s
[PATCH 03/10] crypto: omap-aes: Populate number of SG elements
Crypto layer only passes nbytes but number of SG elements is needed for mapping/unmapping SGs at one time using dma_map* API and also needed to pass in for dmaengine prep function. We call function added to scatterwalk for this purpose in omap_aes_handle_queue to populate the values which are used later. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 3838e0a..b7189a8 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -171,6 +171,8 @@ struct omap_aes_dev { void*buf_out; int dma_out; struct dma_chan *dma_lch_out; + int in_sg_len; + int out_sg_len; dma_addr_t dma_addr_out; const struct omap_aes_pdata *pdata; @@ -741,6 +743,10 @@ static int omap_aes_handle_queue(struct omap_aes_dev *dd, dd->out_offset = 0; dd->out_sg = req->dst; + dd->in_sg_len = scatterwalk_bytes_sglen(dd->in_sg, dd->total); + dd->out_sg_len = scatterwalk_bytes_sglen(dd->out_sg, dd->total); + BUG_ON(dd->in_sg_len < 0 || dd->out_sg_len < 0); + rctx = ablkcipher_request_ctx(req); ctx = crypto_ablkcipher_ctx(crypto_ablkcipher_reqtfm(req)); rctx->mode &= FLAGS_MODE_MASK; -- 1.7.9.5
[PATCH 05/10] crypto: omap-aes: Sync SG before DMA operation
Earlier functions that did a similar sync are replaced by the dma_sync_sg_* API which can operate on an entire SG list. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c |4 1 file changed, 4 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 9104a7f..df2f867 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -499,6 +499,8 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, struct dma_slave_config cfg; int ret; + dma_sync_sg_for_device(dd->dev, dd->in_sg, in_sg_len, DMA_TO_DEVICE); + memset(&cfg, 0, sizeof(cfg)); cfg.src_addr = dd->phys_base + AES_REG_DATA_N(dd, 0); @@ -686,6 +688,8 @@ static void omap_aes_done_task(unsigned long data) pr_debug("enter done_task\n"); + dma_sync_sg_for_cpu(dd->dev, dd->in_sg, dd->in_sg_len, DMA_FROM_DEVICE); + omap_aes_crypt_dma_stop(dd); omap_aes_finish_req(dd, 0); omap_aes_handle_queue(dd, NULL); -- 1.7.9.5
[PATCH 04/10] crypto: omap-aes: Simplify DMA usage by using direct SGs
In early versions of this driver, assumptions were made such as the DMA layer requiring contiguous buffers etc. Due to this, new buffers were allocated, mapped and used for DMA. These assumptions are no longer true and DMAEngine scatter-gather DMA doesn't have such requirements. We simplify the DMA operations by directly using the scatter-gather buffers provided by the crypto layer instead of creating our own. A lot of logic that handled DMA'ing only X number of bytes of the total, or as much as fitted into a 3rd party buffer, is removed and is no longer required. Also, this drastically improves performance (at least 35% seen when encrypting a buffer size of 8K: 2800 ops/sec vs 1800 ops/sec). Also DMA usage is much more simplified and coherent with the rest of the code. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 147 - 1 file changed, 25 insertions(+), 122 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index b7189a8..9104a7f 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -490,22 +490,14 @@ static int sg_copy(struct scatterlist **sg, size_t *offset, void *buf, } static int omap_aes_crypt_dma(struct crypto_tfm *tfm, - struct scatterlist *in_sg, struct scatterlist *out_sg) + struct scatterlist *in_sg, struct scatterlist *out_sg, + int in_sg_len, int out_sg_len) { struct omap_aes_ctx *ctx = crypto_tfm_ctx(tfm); struct omap_aes_dev *dd = ctx->dd; struct dma_async_tx_descriptor *tx_in, *tx_out; struct dma_slave_config cfg; - dma_addr_t dma_addr_in = sg_dma_address(in_sg); - int ret, length = sg_dma_len(in_sg); - - pr_debug("len: %d\n", length); - - dd->dma_size = length; - - if (!(dd->flags & FLAGS_FAST)) - dma_sync_single_for_device(dd->dev, dma_addr_in, length, - DMA_TO_DEVICE); + int ret; memset(&cfg, 0, sizeof(cfg)); @@ -524,7 +516,7 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, return ret; } - tx_in = dmaengine_prep_slave_sg(dd->dma_lch_in, in_sg, 1, + tx_in = dmaengine_prep_slave_sg(dd->dma_lch_in, 
in_sg, in_sg_len, DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!tx_in) { @@ -543,7 +535,7 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, return ret; } - tx_out = dmaengine_prep_slave_sg(dd->dma_lch_out, out_sg, 1, + tx_out = dmaengine_prep_slave_sg(dd->dma_lch_out, out_sg, out_sg_len, DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!tx_out) { @@ -561,7 +553,7 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, dma_async_issue_pending(dd->dma_lch_out); /* start DMA */ - dd->pdata->trigger(dd, length); + dd->pdata->trigger(dd, dd->total); return 0; } @@ -570,93 +562,28 @@ static int omap_aes_crypt_dma_start(struct omap_aes_dev *dd) { struct crypto_tfm *tfm = crypto_ablkcipher_tfm( crypto_ablkcipher_reqtfm(dd->req)); - int err, fast = 0, in, out; - size_t count; - dma_addr_t addr_in, addr_out; - struct scatterlist *in_sg, *out_sg; - int len32; + int err; pr_debug("total: %d\n", dd->total); - if (sg_is_last(dd->in_sg) && sg_is_last(dd->out_sg)) { - /* check for alignment */ - in = IS_ALIGNED((u32)dd->in_sg->offset, sizeof(u32)); - out = IS_ALIGNED((u32)dd->out_sg->offset, sizeof(u32)); - - fast = in && out; + err = dma_map_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); + if (!err) { + dev_err(dd->dev, "dma_map_sg() error\n"); + return -EINVAL; } - if (fast) { - count = min(dd->total, sg_dma_len(dd->in_sg)); - count = min(count, sg_dma_len(dd->out_sg)); - - if (count != dd->total) { - pr_err("request length != buffer length\n"); - return -EINVAL; - } - - pr_debug("fast\n"); - - err = dma_map_sg(dd->dev, dd->in_sg, 1, DMA_TO_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() error\n"); - return -EINVAL; - } - - err = dma_map_sg(dd->dev, dd->out_sg, 1, DMA_FROM_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() erro
[PATCH 08/10] crypto: omap-aes: PIO mode: Add IRQ handler and walk SGs
We add an IRQ handler that implements a state-machine for PIO-mode and data structures for walking the scatter-gather list. The IRQ handler is called in succession both when data is available to read and when the next data can be sent for processing. This process continues till the entire in/out SG lists have been walked. Once the SG-list has been completely walked, the IRQ handler schedules the done_task tasklet. Also add a useful macro that is used throughout the IRQ code for a common pattern of calculating how much an SG list has been walked. This improves code readability and avoids checkpatch errors. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 90 + 1 file changed, 90 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index c057eac..891455b 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -46,6 +46,8 @@ #define DST_MAXBURST 4 #define DMA_MIN(DST_MAXBURST * sizeof(u32)) +#define _calc_walked(inout) (dd->inout##_walk.offset - dd->inout##_sg->offset) + /* OMAP TRM gives bitfields as start:end, where start is the higher bit number. 
For example 7:0 */ #define FLD_MASK(start, end) (((1 << ((start) - (end) + 1)) - 1) << (end)) @@ -98,6 +100,8 @@ #define FLAGS_FAST BIT(5) #define FLAGS_BUSY BIT(6) +#define AES_BLOCK_WORDS(AES_BLOCK_SIZE >> 2) + struct omap_aes_ctx { struct omap_aes_dev *dd; @@ -163,6 +167,8 @@ struct omap_aes_dev { size_t total; struct scatterlist *in_sg; struct scatterlist *out_sg; + struct scatter_walk in_walk; + struct scatter_walk out_walk; int dma_in; struct dma_chan *dma_lch_in; int dma_out; @@ -862,6 +868,90 @@ static const struct omap_aes_pdata omap_aes_pdata_omap4 = { .minor_shift= 0, }; +static irqreturn_t omap_aes_irq(int irq, void *dev_id) +{ + struct omap_aes_dev *dd = dev_id; + u32 status, i; + u32 *src, *dst; + + status = omap_aes_read(dd, AES_REG_IRQ_STATUS(dd)); + if (status & AES_REG_IRQ_DATA_IN) { + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x0); + + BUG_ON(!dd->in_sg); + + BUG_ON(_calc_walked(in) > dd->in_sg->length); + + src = sg_virt(dd->in_sg) + _calc_walked(in); + + for (i = 0; i < AES_BLOCK_WORDS; i++) { + omap_aes_write(dd, AES_REG_DATA_N(dd, i), *src); + + scatterwalk_advance(&dd->in_walk, 4); + if (dd->in_sg->length == _calc_walked(in)) { + dd->in_sg = scatterwalk_sg_next(dd->in_sg); + if (dd->in_sg) { + scatterwalk_start(&dd->in_walk, + dd->in_sg); + src = sg_virt(dd->in_sg) + + _calc_walked(in); + } + } else { + src++; + } + } + + /* Clear IRQ status */ + status &= ~AES_REG_IRQ_DATA_IN; + omap_aes_write(dd, AES_REG_IRQ_STATUS(dd), status); + + /* Enable DATA_OUT interrupt */ + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x4); + + } else if (status & AES_REG_IRQ_DATA_OUT) { + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x0); + + BUG_ON(!dd->out_sg); + + BUG_ON(_calc_walked(out) > dd->out_sg->length); + + dst = sg_virt(dd->out_sg) + _calc_walked(out); + + for (i = 0; i < AES_BLOCK_WORDS; i++) { + *dst = omap_aes_read(dd, AES_REG_DATA_N(dd, i)); + scatterwalk_advance(&dd->out_walk, 4); + if (dd->out_sg->length == _calc_walked(out)) { + dd->out_sg = 
scatterwalk_sg_next(dd->out_sg); + if (dd->out_sg) { + scatterwalk_start(&dd->out_walk, + dd->out_sg); + dst = sg_virt(dd->out_sg) + + _calc_walked(out); + } + } else { + dst++; + } + } + + dd->total -= AES_BLOCK_SIZE; + + BUG_ON(dd->total < 0); + +
[PATCH 10/10] crypto: omap-aes: Switch to PIO mode in probe function
In cases where requesting for DMA channels fails for some reason, or channel numbers are not provided in DT or platform data, we switch to PIO-only mode, also checking if the platform provides IRQ numbers and interrupt register offsets in DT and platform data. All dma-only paths are avoided in this mode. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 28 +++- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 54f2729..34e3b77 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -1074,7 +1074,7 @@ static int omap_aes_probe(struct platform_device *pdev) struct omap_aes_dev *dd; struct crypto_alg *algp; struct resource res; - int err = -ENOMEM, i, j; + int err = -ENOMEM, i, j, irq = -1; u32 reg; dd = kzalloc(sizeof(struct omap_aes_dev), GFP_KERNEL); @@ -1118,8 +1118,23 @@ static int omap_aes_probe(struct platform_device *pdev) tasklet_init(&dd->queue_task, omap_aes_queue_task, (unsigned long)dd); err = omap_aes_dma_init(dd); - if (err) - goto err_dma; + if (err && AES_REG_IRQ_STATUS(dd) && AES_REG_IRQ_ENABLE(dd)) { + dd->pio_only = 1; + + irq = platform_get_irq(pdev, 0); + if (irq < 0) { + dev_err(dev, "can't get IRQ resource\n"); + goto err_irq; + } + + err = request_irq(irq, omap_aes_irq, 0, + dev_name(dev), dd); + if (err) { + dev_err(dev, "Unable to grab omap-aes IRQ\n"); + goto err_irq; + } + } + INIT_LIST_HEAD(&dd->list); spin_lock(&list_lock); @@ -1147,8 +1162,11 @@ err_algs: for (j = dd->pdata->algs_info[i].registered - 1; j >= 0; j--) crypto_unregister_alg( &dd->pdata->algs_info[i].algs_list[j]); - omap_aes_dma_cleanup(dd); -err_dma: + if (dd->pio_only) + free_irq(irq, dd); + else + omap_aes_dma_cleanup(dd); +err_irq: tasklet_kill(&dd->done_task); tasklet_kill(&dd->queue_task); pm_runtime_disable(dev); -- 1.7.9.5
[PATCH 09/10] crypto: omap-aes: PIO mode: platform data for OMAP4 and trigger it
We initialize the scatter gather walk lists needed for PIO mode and avoid all DMA paths such as mapping/unmapping buffers by checking for the pio_only flag. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 43 ++- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 891455b..54f2729 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -175,6 +175,7 @@ struct omap_aes_dev { struct dma_chan *dma_lch_out; int in_sg_len; int out_sg_len; + int pio_only; const struct omap_aes_pdata *pdata; }; @@ -423,6 +424,16 @@ static int omap_aes_crypt_dma(struct crypto_tfm *tfm, struct dma_slave_config cfg; int ret; + if (dd->pio_only) { + scatterwalk_start(&dd->in_walk, dd->in_sg); + scatterwalk_start(&dd->out_walk, dd->out_sg); + + /* Enable DATAIN interrupt and let it take + care of the rest */ + omap_aes_write(dd, AES_REG_IRQ_ENABLE(dd), 0x2); + return 0; + } + dma_sync_sg_for_device(dd->dev, dd->in_sg, in_sg_len, DMA_TO_DEVICE); memset(&cfg, 0, sizeof(cfg)); @@ -492,21 +503,25 @@ static int omap_aes_crypt_dma_start(struct omap_aes_dev *dd) pr_debug("total: %d\n", dd->total); - err = dma_map_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() error\n"); - return -EINVAL; - } + if (!dd->pio_only) { + err = dma_map_sg(dd->dev, dd->in_sg, dd->in_sg_len, +DMA_TO_DEVICE); + if (!err) { + dev_err(dd->dev, "dma_map_sg() error\n"); + return -EINVAL; + } - err = dma_map_sg(dd->dev, dd->out_sg, dd->out_sg_len, DMA_FROM_DEVICE); - if (!err) { - dev_err(dd->dev, "dma_map_sg() error\n"); - return -EINVAL; + err = dma_map_sg(dd->dev, dd->out_sg, dd->out_sg_len, +DMA_FROM_DEVICE); + if (!err) { + dev_err(dd->dev, "dma_map_sg() error\n"); + return -EINVAL; + } } err = omap_aes_crypt_dma(tfm, dd->in_sg, dd->out_sg, dd->in_sg_len, dd->out_sg_len); - if (err) { + if (err && !dd->pio_only) { dma_unmap_sg(dd->dev, dd->in_sg, dd->in_sg_len, DMA_TO_DEVICE); 
dma_unmap_sg(dd->dev, dd->out_sg, dd->out_sg_len, DMA_FROM_DEVICE); @@ -610,9 +625,11 @@ static void omap_aes_done_task(unsigned long data) pr_debug("enter done_task\n"); - dma_sync_sg_for_cpu(dd->dev, dd->in_sg, dd->in_sg_len, DMA_FROM_DEVICE); - - omap_aes_crypt_dma_stop(dd); + if (!dd->pio_only) { + dma_sync_sg_for_device(dd->dev, dd->out_sg, dd->out_sg_len, + DMA_FROM_DEVICE); + omap_aes_crypt_dma_stop(dd); + } omap_aes_finish_req(dd, 0); omap_aes_handle_queue(dd, NULL); -- 1.7.9.5
[PATCH 07/10] crypto: omap-aes: Add IRQ info and helper macros
Add IRQ information to pdata and helper macros. These are required for PIO-mode support. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c |8 1 file changed, 8 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index 71da52b..c057eac 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -82,6 +82,10 @@ #define AES_REG_LENGTH_N(x)(0x54 + ((x) * 0x04)) +#define AES_REG_IRQ_STATUS(dd) ((dd)->pdata->irq_status_ofs) +#define AES_REG_IRQ_ENABLE(dd) ((dd)->pdata->irq_enable_ofs) +#define AES_REG_IRQ_DATA_INBIT(1) +#define AES_REG_IRQ_DATA_OUT BIT(2) #define DEFAULT_TIMEOUT(5*HZ) #define FLAGS_MODE_MASK0x000f @@ -127,6 +131,8 @@ struct omap_aes_pdata { u32 data_ofs; u32 rev_ofs; u32 mask_ofs; + u32 irq_enable_ofs; + u32 irq_status_ofs; u32 dma_enable_in; u32 dma_enable_out; @@ -846,6 +852,8 @@ static const struct omap_aes_pdata omap_aes_pdata_omap4 = { .data_ofs = 0x60, .rev_ofs= 0x80, .mask_ofs = 0x84, + .irq_status_ofs = 0x8c, + .irq_enable_ofs = 0x90, .dma_enable_in = BIT(5), .dma_enable_out = BIT(6), .major_mask = 0x0700, -- 1.7.9.5
[PATCH 02/10] crypto: omap-aes: Add useful debug macros
When DEBUG is enabled, these macros can be used to print variables in integer and hex format, and clearly display which registers, offsets and values are being read/written, including printing the names of the offsets and their values. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-aes.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index ee15b0f..3838e0a 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -15,6 +15,14 @@ #define pr_fmt(fmt) "%s: " fmt, __func__ +#ifdef DEBUG +#define prn(num) printk(#num "=%d\n", num) +#define prx(num) printk(#num "=%x\n", num) +#else +#define prn(num) do { } while (0) +#define prx(num) do { } while (0) +#endif + #include #include #include @@ -172,13 +180,35 @@ struct omap_aes_dev { static LIST_HEAD(dev_list); static DEFINE_SPINLOCK(list_lock); +#ifdef DEBUG +#define omap_aes_read(dd, offset) \ + do {\ + omap_aes_read_1(dd, offset);\ + pr_debug("omap_aes_read(" #offset ")\n"); \ + } while (0) + +static inline u32 omap_aes_read_1(struct omap_aes_dev *dd, u32 offset) +#else static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) +#endif { return __raw_readl(dd->io_base + offset); } +#ifdef DEBUG +#define omap_aes_write(dd, offset, value) \ + do {\ + pr_debug("omap_aes_write(" #offset "=%x) value=%d\n", \ +offset, value);\ + omap_aes_write_1(dd, offset, value);\ + } while (0) + +static inline void omap_aes_write_1(struct omap_aes_dev *dd, u32 offset, + u32 value) +#else static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, u32 value) +#endif { __raw_writel(value, dd->io_base + offset); } -- 1.7.9.5
Re: [PATCH 00/10] crypto: omap-aes: DMA and PIO mode improvements
On 08/14/2013 06:12 PM, Joel Fernandes wrote:
> This patch series is a rewrite of the DMA portion of the omap-aes driver
> and also adds support for PIO mode. Both these modes give better
> performance than before.
>
> Earlier, only a single SG was used for DMA purposes, and the SG-list
> passed from the crypto layer was being copied and DMA'd one entry at
> a time. This turns out to be quite inefficient, so we replace it with
> much simpler code that directly passes the SG-list from crypto to the
> DMA layer.
>
> We also add PIO mode support to the driver, and switch to PIO mode
> whenever DMA channel allocation is not available. This is not only for
> the OMAP4 platform; it will work on any platform on which IRQ
> information is populated.
>
> Tests performed on am33xx and omap4 SoCs; notice the 50% perf improvement

Just correcting, this is more like 35%, not 50%, when using DMA.

Thanks,
-Joel

> for large 8K blocks:
>
> Sample run on am33xx (beaglebone):
> With DMA rewrite:
> [   26.410052] test 0 (128 bit key, 16 byte blocks): 4318 operations in 1
> seconds (69088 bytes)
> [   27.414314] test 1 (128 bit key, 64 byte blocks): 4360 operations in 1
> seconds (279040 bytes)
> [   28.414406] test 2 (128 bit key, 256 byte blocks): 3609 operations in 1
> seconds (923904 bytes)
> [   29.414410] test 3 (128 bit key, 1024 byte blocks): 3418 operations in 1
> seconds (3500032 bytes)
> [   30.414510] test 4 (128 bit key, 8192 byte blocks): 1766 operations in 1
> seconds (14467072 bytes)
>
> Without DMA rewrite:
> [   31.920519] test 0 (128 bit key, 16 byte blocks): 4417 operations in 1
> seconds (70672 bytes)
> [   32.925997] test 1 (128 bit key, 64 byte blocks): 4221 operations in 1
> seconds (270144 bytes)
> [   33.926194] test 2 (128 bit key, 256 byte blocks): 3528 operations in 1
> seconds (903168 bytes)
> [   34.926225] test 3 (128 bit key, 1024 byte blocks): 3281 operations in 1
> seconds (3359744 bytes)
> [   35.926385] test 4 (128 bit key, 8192 byte blocks): 1460 operations in 1
> seconds (11960320 bytes)
>
> With PIO mode, note the tremendous boost in performance for small blocks there:
> [   27.294905] test 0 (128 bit key, 16 byte blocks): 20585 operations in 1
> seconds (329360 bytes)
> [   28.302282] test 1 (128 bit key, 64 byte blocks): 8106 operations in 1
> seconds (518784 bytes)
> [   29.302374] test 2 (128 bit key, 256 byte blocks): 2359 operations in 1
> seconds (603904 bytes)
> [   30.302575] test 3 (128 bit key, 1024 byte blocks): 605 operations in 1
> seconds (619520 bytes)
> [   31.303781] test 4 (128 bit key, 8192 byte blocks): 79 operations in 1
> seconds (647168 bytes)
>
> Future work in this direction would be to dynamically change between
> PIO/DMA mode based on the block size.
>
> Joel Fernandes (10):
>   crypto: scatterwalk: Add support for calculating number of SG elements
>   crypto: omap-aes: Add useful debug macros
>   crypto: omap-aes: Populate number of SG elements
>   crypto: omap-aes: Simplify DMA usage by using direct SGs
>   crypto: omap-aes: Sync SG before DMA operation
>   crypto: omap-aes: Remove previously used intermediate buffers
>   crypto: omap-aes: Add IRQ info and helper macros
>   crypto: omap-aes: PIO mode: Add IRQ handler and walk SGs
>   crypto: omap-aes: PIO mode: platform data for OMAP4 and trigger it
>   crypto: omap-aes: Switch to PIO mode in probe function
>
>  crypto/scatterwalk.c         | 22 +++
>  drivers/crypto/omap-aes.c    | 400 --
>  include/crypto/scatterwalk.h |  2 +
>  3 files changed, 217 insertions(+), 207 deletions(-)
Re: [PATCH 02/10] crypto: omap-aes: Add useful debug macros
On 08/14/2013 06:29 PM, Joe Perches wrote: > On Wed, 2013-08-14 at 18:12 -0500, Joel Fernandes wrote: >> When DEBUG is enabled, these macros can be used to print variables >> in integer and hex format, and clearly display which registers, >> offsets and values are being read/written , including printing the >> names of the offsets and their values. > [] >> diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c > [] >> @@ -15,6 +15,14 @@ >> >> #define pr_fmt(fmt) "%s: " fmt, __func__ >> >> +#ifdef DEBUG >> +#define prn(num) printk(#num "=%d\n", num) >> +#define prx(num) printk(#num "=%x\n", num) > > pr_debug? Sure, can change that. >> +#else >> +#define prn(num) do { } while (0) >> +#define prx(num) do { } while (0) >> +#endif > [] >> @@ -172,13 +180,35 @@ struct omap_aes_dev { >> static LIST_HEAD(dev_list); >> static DEFINE_SPINLOCK(list_lock); >> >> +#ifdef DEBUG >> +#define omap_aes_read(dd, offset) \ >> +do {\ >> +omap_aes_read_1(dd, offset);\ >> +pr_debug("omap_aes_read(" #offset ")\n"); \ >> +} while (0) >> + >> +static inline u32 omap_aes_read_1(struct omap_aes_dev *dd, u32 offset) >> +#else >> static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) >> +#endif >> { >> return __raw_readl(dd->io_base + offset); >> } >> >> +#ifdef DEBUG >> +#define omap_aes_write(dd, offset, value) \ >> +do {\ >> +pr_debug("omap_aes_write(" #offset "=%x) value=%d\n", \ >> + offset, value);\ >> +omap_aes_write_1(dd, offset, value);\ >> +} while (0) >> + >> +static inline void omap_aes_write_1(struct omap_aes_dev *dd, u32 offset, >> + u32 value) >> +#else >> static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, >>u32 value) >> +#endif >> { >> __raw_writel(value, dd->io_base + offset); >> } > > Umm, yuck? > > Is there any real value in read_1 and write_1? Can you be more descriptive? There is a lot of value in them for debug to show clearly sequence of read/writes. Moreover, they are no-op'd when DEBUG is disabled. 
Other than the obvious use, I guess it could be rewritten to be a bit
cleaner. I can do that.

-Joel
Re: [PATCH 02/10] crypto: omap-aes: Add useful debug macros
On 08/14/2013 07:47 PM, Joe Perches wrote: > On Wed, 2013-08-14 at 18:40 -0500, Joel Fernandes wrote: >> On 08/14/2013 06:29 PM, Joe Perches wrote: >>> On Wed, 2013-08-14 at 18:12 -0500, Joel Fernandes wrote: >>>> When DEBUG is enabled, these macros can be used to print variables >>>> in integer and hex format, and clearly display which registers, >>>> offsets and values are being read/written , including printing the >>>> names of the offsets and their values. >>> [] >>>> diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c >>> [] >>>> @@ -15,6 +15,14 @@ >>>> >>>> #define pr_fmt(fmt) "%s: " fmt, __func__ >>>> >>>> +#ifdef DEBUG >>>> +#define prn(num) printk(#num "=%d\n", num) >>>> +#define prx(num) printk(#num "=%x\n", num) >>> >>> pr_debug? >> >> Sure, can change that. >> >>>> +#else >>>> +#define prn(num) do { } while (0) >>>> +#define prx(num) do { } while (0) >>>> +#endif >>> [] >>>> @@ -172,13 +180,35 @@ struct omap_aes_dev { >>>> static LIST_HEAD(dev_list); >>>> static DEFINE_SPINLOCK(list_lock); >>>> >>>> +#ifdef DEBUG >>>> +#define omap_aes_read(dd, offset) \ >>>> + do {\ >>>> + omap_aes_read_1(dd, offset);\ >>>> + pr_debug("omap_aes_read(" #offset ")\n"); \ >>>> + } while (0) >>>> + >>>> +static inline u32 omap_aes_read_1(struct omap_aes_dev *dd, u32 offset) >>>> +#else >>>> static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) >>>> +#endif >>>> { >>>>return __raw_readl(dd->io_base + offset); >>>> } >>>> >>>> +#ifdef DEBUG >>>> +#define omap_aes_write(dd, offset, value) \ >>>> + do {\ >>>> + pr_debug("omap_aes_write(" #offset "=%x) value=%d\n", \ >>>> + offset, value);\ >>>> + omap_aes_write_1(dd, offset, value);\ >>>> + } while (0) >>>> + >>>> +static inline void omap_aes_write_1(struct omap_aes_dev *dd, u32 offset, >>>> +u32 value) >>>> +#else >>>> static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, >>>> u32 value) >>>> +#endif >>>> { >>>>__raw_writel(value, dd->io_base + offset); >>>> } >>> >>> Umm, yuck? 
>>> >>> Is there any real value in read_1 and write_1? >> >> Can you be more descriptive? There is a lot of value in them for debug >> to show clearly sequence of read/writes. Moreover, they are no-op'd when >> DEBUG is disabled. > > pr_debug is no-op'd when DEBUG is not #defined. > so just use a single > > omap_aes_write(...) > { > pr_debug(...) > __raw_writel(...); > } Actually this doesn't work, as the pr_debug cannot print the name of the offset as my original patch set does using "#offset". There are many places where named offsets are used to pass to read/write, and this macro helps to visually see which offset is being written to by name. So the original patch would stand in its current form except for a small rewrite of the write debug part of it as follows to be cleaner getting rid of the _1. For the read , we still need it as we need to return the value from a function which cannot be done in a macro. So the new patch would look something like this: #ifdef DEBUG #define omap_aes_read(dd, offset) \ omap_aes_read_1(dd, offset); \ pr_debug("omap_aes_read(" #offset ")\n"); static inline u32 omap_aes_read_1(struct omap_aes_dev *dd, u32 offset) #else static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) #endif { return __raw_readl(dd->io_base + offset); } #ifdef DEBUG #define omap_aes_write(dd, offset, value) \ do { \ pr_debug("omap_aes_write(" #offset "=%x) value=%d\n", \ offset, value); \ __raw_writel(value, dd->io_base + offset); \ } while (0) #else static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, u32 value) { __raw_writel(value, dd->io_base + offset); } #endif Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 00/10] crypto: omap-aes: DMA and PIO mode improvements
On 08/15/2013 12:58 AM, Dmitry Kasatkin wrote: > On 15/08/13 02:30, Joel Fernandes wrote: >> On 08/14/2013 06:12 PM, Joel Fernandes wrote: >>> This patch series is a rewrite of the DMA portion of omap-aes driver >>> and also adds support for PIO mode. Both these modes, give better >>> performance than before. >>> >>> Earlier, only a single SG was used for DMA purpose, and the SG-list >>> passed from the crypto layer was being copied and DMA'd one entry at >>> a time. This turns out to be quite inefficient, so we replace it with >>> much simpler code that directly passes the SG-list from crypto to the >>> DMA layer. >>> >>> We also add PIO mode support to the driver, and switch to PIO mode >>> whenever the DMA channel allocation is not available. This is only for >>> OMAP4 platform will work on any platform on which IRQ information is >>> populated. >>> >>> Tests performed on am33xx and omap4 SoCs , notice the 50% perf improvement >> Just correcting, this is more like 35% not 50% when using DMA. > > Hmm :) > > 1766/1460 = ~20% Yes sorry, I messed the cover letter up. If I resend the series again, I'll update this number. On OMAP4 though, I saw 2800 ops/sec vs 1800 ops/sec which is around 50% so it depends on SoC and DMA controller. OMAP4 uses SDMA while AM335x uses EDMA. Also with very large blocks, this improvement will be much higher as we will not be doing all the intermediate copy but I haven't tested with such large blocks. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 02/10] crypto: omap-aes: Add useful debug macros
On 08/15/2013 01:23 AM, Dmitry Kasatkin wrote: > On 15/08/13 06:12, Joel Fernandes wrote: >> On 08/14/2013 07:47 PM, Joe Perches wrote: >>> On Wed, 2013-08-14 at 18:40 -0500, Joel Fernandes wrote: >>>> On 08/14/2013 06:29 PM, Joe Perches wrote: >>>>> On Wed, 2013-08-14 at 18:12 -0500, Joel Fernandes wrote: >>>>>> When DEBUG is enabled, these macros can be used to print variables >>>>>> in integer and hex format, and clearly display which registers, >>>>>> offsets and values are being read/written , including printing the >>>>>> names of the offsets and their values. >>>>> [] >>>>>> diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c >>>>> [] >>>>>> @@ -15,6 +15,14 @@ >>>>>> >>>>>> #define pr_fmt(fmt) "%s: " fmt, __func__ >>>>>> >>>>>> +#ifdef DEBUG >>>>>> +#define prn(num) printk(#num "=%d\n", num) >>>>>> +#define prx(num) printk(#num "=%x\n", num) >>>>> pr_debug? >>>> Sure, can change that. >>>> >>>>>> +#else >>>>>> +#define prn(num) do { } while (0) >>>>>> +#define prx(num) do { } while (0) >>>>>> +#endif >>>>> [] >>>>>> @@ -172,13 +180,35 @@ struct omap_aes_dev { >>>>>> static LIST_HEAD(dev_list); >>>>>> static DEFINE_SPINLOCK(list_lock); >>>>>> >>>>>> +#ifdef DEBUG >>>>>> +#define omap_aes_read(dd, offset) >>>>>> \ >>>>>> +do { >>>>>> \ >>>>>> +omap_aes_read_1(dd, offset); >>>>>> \ >>>>>> +pr_debug("omap_aes_read(" #offset ")\n"); >>>>>> \ >>>>>> +} while (0) >>>>>> + >>>>>> +static inline u32 omap_aes_read_1(struct omap_aes_dev *dd, u32 offset) >>>>>> +#else >>>>>> static inline u32 omap_aes_read(struct omap_aes_dev *dd, u32 offset) >>>>>> +#endif >>>>>> { >>>>>> return __raw_readl(dd->io_base + offset); >>>>>> } >>>>>> >>>>>> +#ifdef DEBUG >>>>>> +#define omap_aes_write(dd, offset, value) >>>>>> \ >>>>>> +do { >>>>>> \ >>>>>> +pr_debug("omap_aes_write(" #offset "=%x) value=%d\n", >>>>>> \ >>>>>> + offset, value); >>>>>> \ >>>>>> +omap_aes_write_1(dd, offset, value); >>>>>> \ >>>>>> +} while (0) >>>>>> + >>>>>> +static inline void 
omap_aes_write_1(struct omap_aes_dev *dd, u32 offset, >>>>>> + u32 value) >>>>>> +#else >>>>>> static inline void omap_aes_write(struct omap_aes_dev *dd, u32 offset, >>>>>>u32 value) >>>>>> +#endif >>>>>> { >>>>>> __raw_writel(value, dd->io_base + offset); >>>>>> } >>>>> Umm, yuck? >>>>> >>>>> Is there any real value in read_1 and write_1? >>>> Can you be more descriptive? There is a lot of value in them for debug >>>> to show clearly sequence of read/writes. Moreover, they are no-op'd when >>>> DEBUG is disabled. >>> pr_debug is no-op'd when DEBUG is not #defined. >>> so just use a single >>> >>> omap_aes_write(...) >>> { >>> pr_debug(...) >>> __raw_writel(...); >>> } >> Actually this doesn't work, as the pr_debug cannot print the name of the >> offset as my original patch set does using "#offset". >> >> There are many places where named offsets are used to pass to >> read/write, and this macro helps to visually see which offset is being >> written to by name. >> >
[PATCH v4 5/6] dma: edma: Leave linked to Null slot instead of DUMMY slot
The Dummy slot has been used as a way for missed events not to be
reported as missing. This has been particularly troublesome for cases
where we might want to temporarily pause all incoming events. For the
EDMA DMAC, there is no way to do any such pausing of events, as the
occurrence of the "next" event is not software controlled. Using
"edma_pause" in IRQ handlers doesn't help, as by then the event in
question from the slave has already been missed. Linking a dummy slot
is seen to absorb these events, which we didn't want to miss. So we
don't link to the dummy slot, but instead leave the channel linked to
the NULL set, allow an error condition, and detect the channel that
missed it.

Signed-off-by: Joel Fernandes

dma: edma: Link to dummy slot only for last SG list split

Consider the case where we have a scatter-list like:

SG1->SG2->SG3->SG4->SG5->SG6->Null

For example, for a MAX_NR_SG of 2, earlier we were splitting this as:

SG1->SG2->Null
SG3->SG4->Null
SG5->SG6->Null

Now we split it as:

SG1->SG2->Null
SG3->SG4->Null
SG5->SG6->Dummy

This approach results in fewer unwanted interrupts occurring for the
last list split. The Dummy slot has the property of not raising an
error condition if events are missed, unlike the Null slot. We are OK
with this, as we're done with processing the whole list once we reach
Dummy.
Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 0e77b57..25e47d4 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -158,13 +158,18 @@ static void edma_execute(struct edma_chan *echan)
 		/* Link to the previous slot if not the last set */
 		if (i != (nslots - 1))
 			edma_link(echan->slot[i], echan->slot[i+1]);
-		/* Final pset links to the dummy pset */
-		else
-			edma_link(echan->slot[i], echan->ecc->dummy_slot);
 	}

 	edesc->processed += nslots;

+	/*
+	 * If this is the last set in the SG-list transaction, then set up
+	 * a link to the dummy slot; this results in all future events
+	 * being absorbed, and that's OK because we're done
+	 */
+	if (edesc->processed == edesc->pset_nr)
+		edma_link(echan->slot[nslots-1], echan->ecc->dummy_slot);
+
 	edma_resume(echan->ch_num);

 	if (edesc->processed <= MAX_NR_SG) {
--
1.8.1.2
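The link-target decision the patch makes can be stated in one line: intermediate batches stay linked to the NULL set (so a missed event raises a catchable error), and only the final batch links to the Dummy set (events are silently absorbed, which is fine once the list is done). A user-space sketch of that decision, with illustrative names:

```c
#include <stdbool.h>

/* Possible link targets for the last slot of a programmed batch */
enum link_target { LINK_NULL_SET, LINK_DUMMY_SET };

/*
 * Mirrors the "edesc->processed == edesc->pset_nr" test in the patch:
 * only when every pset of the descriptor has been programmed do we
 * link to the Dummy set; otherwise we stay on the NULL set so a
 * missed event is reported.
 */
static enum link_target last_slot_link(int processed, int pset_nr)
{
	return processed == pset_nr ? LINK_DUMMY_SET : LINK_NULL_SET;
}
```

So for the SG1..SG6 example with MAX_NR_SG=2, the first two batches get `LINK_NULL_SET` and the third gets `LINK_DUMMY_SET`.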
[PATCH v4 3/6] ARM: edma: Add function to manually trigger an EDMA channel
Manual trigger for events missed as a result of splitting a scatter
gather list and DMA'ing it in batches. Add a helper function to trigger
a channel in case any such events are missed.

Signed-off-by: Joel Fernandes
---
 arch/arm/common/edma.c             | 17 +++++++++++++++++
 include/linux/platform_data/edma.h |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 3567ba1..67e8cf5 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1236,6 +1236,23 @@ void edma_resume(unsigned channel)
 }
 EXPORT_SYMBOL(edma_resume);

+int edma_trigger_channel(unsigned channel)
+{
+	unsigned ctlr;
+	unsigned int mask;
+
+	ctlr = EDMA_CTLR(channel);
+	channel = EDMA_CHAN_SLOT(channel);
+	mask = BIT(channel & 0x1f);
+
+	edma_shadow0_write_array(ctlr, SH_ESR, (channel >> 5), mask);
+
+	pr_debug("EDMA: ESR%d %08x\n", (channel >> 5),
+		 edma_shadow0_read_array(ctlr, SH_ESR, (channel >> 5)));
+	return 0;
+}
+EXPORT_SYMBOL(edma_trigger_channel);
+
 /**
  * edma_start - start dma on a channel
  * @channel: channel being activated
diff --git a/include/linux/platform_data/edma.h b/include/linux/platform_data/edma.h
index 57300fd..179fb91 100644
--- a/include/linux/platform_data/edma.h
+++ b/include/linux/platform_data/edma.h
@@ -180,4 +180,6 @@ struct edma_soc_info {
 	const s16	(*xbar_chans)[2];
 };

+int edma_trigger_channel(unsigned);
+
 #endif
--
1.8.1.2
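The index/mask arithmetic in `edma_trigger_channel()` is worth spelling out: the low five bits of the channel number select a bit within a 32-bit ESR word, and the remaining bits (`channel >> 5`) select which ESR word (ESR vs ESRH) to write. A runnable user-space sketch of just that split, with names that are illustrative rather than the kernel's:

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Which event-set register word and which bit a channel maps to */
struct esr_slot {
	unsigned word;   /* ESR register index: channel >> 5 */
	uint32_t mask;   /* bit within that register */
};

/* Mirror of the split used when writing SH_ESR in the patch above:
 * channels 0..31 land in word 0 (ESR), channels 32..63 in word 1 (ESRH). */
static struct esr_slot esr_slot_for(unsigned channel)
{
	struct esr_slot s;

	s.word = channel >> 5;
	s.mask = BIT(channel & 0x1f);
	return s;
}
```

For instance, channel 37 maps to word 1, bit 5, which is exactly what the `edma_shadow0_write_array(ctlr, SH_ESR, channel >> 5, mask)` call expresses.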
[PATCH v4 1/6] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
Changes are made here for configuring existing parameters to support
DMA'ing them out in batches as needed.

Also allocate as many slots as needed by the SG list, but not more than
MAX_NR_SG. These slots will then be reused accordingly. For example, if
MAX_NR_SG=10 and the number of SG entries is 40, still only 10 slots
will be allocated to DMA the entire SG list of size 40.

Also enable TC interrupts for slots that are last in the current
iteration, or that fall on a MAX_NR_SG boundary.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 5f3e532..e522ad5 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -222,9 +222,9 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 	enum dma_slave_buswidth dev_width;
 	u32 burst;
 	struct scatterlist *sg;
-	int i;
 	int acnt, bcnt, ccnt, src, dst, cidx;
 	int src_bidx, dst_bidx, src_cidx, dst_cidx;
+	int i, nslots;

 	if (unlikely(!echan || !sgl || !sg_len))
 		return NULL;
@@ -262,8 +262,10 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(

 	edesc->pset_nr = sg_len;

-	for_each_sg(sgl, sg, sg_len, i) {
-		/* Allocate a PaRAM slot, if needed */
+	/* Allocate a PaRAM slot, if needed */
+	nslots = min_t(unsigned, MAX_NR_SG, sg_len);
+
+	for (i = 0; i < nslots; i++) {
 		if (echan->slot[i] < 0) {
 			echan->slot[i] =
 				edma_alloc_slot(EDMA_CTLR(echan->ch_num),
@@ -273,6 +275,10 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 				return NULL;
 			}
 		}
+	}
+
+	/* Configure PaRAM sets for each SG */
+	for_each_sg(sgl, sg, sg_len, i) {

 		acnt = dev_width;
@@ -330,6 +336,12 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 		/* Configure A or AB synchronized transfers */
 		if (edesc->absync)
 			edesc->pset[i].opt |= SYNCDIM;
+
+		/* If this is the last in the current set of MAX_NR_SG
+		   transactions, enable interrupts so that the next set
+		   is processed */
+		if (!((i+1) % MAX_NR_SG))
+			edesc->pset[i].opt |= TCINTEN;
+
 		/* If this is the last set, enable completion interrupt flag */
 		if (i == sg_len - 1)
 			edesc->pset[i].opt |= TCINTEN;
--
1.8.1.2
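The two TCINTEN conditions above pick out exactly the PaRAM sets where an interrupt is needed: every MAX_NR_SG'th set (a batch boundary, so the next batch can be programmed) and the final set (overall completion). A small user-space sketch of that predicate — `max_nr_sg` is a parameter here, whereas in the driver it is a compile-time constant:

```c
#include <stdbool.h>

/*
 * True when pset i of an sg_len-entry list should have TCINTEN set,
 * following the two conditions in the patch: the end of a MAX_NR_SG
 * batch, or the very last entry of the list.
 */
static bool wants_tc_interrupt(int i, int sg_len, int max_nr_sg)
{
	if (((i + 1) % max_nr_sg) == 0)   /* batch boundary */
		return true;
	if (i == sg_len - 1)              /* completion interrupt */
		return true;
	return false;
}
```

For sg_len=7 and max_nr_sg=3, only entries 2, 5 and 6 would get TCINTEN, i.e. one interrupt per batch plus the completion interrupt.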
[PATCH v4 0/6] dma: edma: Support scatter-lists of any length
The following series adds support to EDMA driver to enable DMA of scatter-gather lists of arbitrary length, but still make use of only a certain MAX number of slots at a time for a given channel. Thus free-ing up the rest of the slots to other slaves/channels. With this there is no need for slave drivers to query the EDMA driver about how much is the MAX it can send at a time as done in [1]. Drivers can send SG lists of any number of entries to DMA. Reference discussion at [2]. With this, all the patches for MMC and EDMA related to "sg limits" can be dropped. Tested omap-aes and omap_hsmmc drivers with different MAX number of slots, even just 1. In the case where it is 1, only 1-slot is used to DMA an entire scatter list of arbitrary length. Since this series touches EDMA private API code also shared with davinci-pcm, playback of a 16-bit 44.1KHz audio file with davinci-pcm has been tested. Sample test run with 1 vs 16 (MAX number of slots/SG) in omap-aes driver: MAX slots = 1: (128 bit key, 8192 byte blocks): 1266 operations in 1 seconds (10371072 bytes) MAX slots = 16: (128 bit key, 8192 byte blocks): 1601 operations in 1 seconds (13115392 bytes) Note: For the above test, 8K buffer is mapped into SG list of size 2 so only 2 slots are required. So beyond size 2, there will not be any noticeable performance improvement. But above experiment just shows as proof of concept that even using 1 slot is managed by just DMA'ing 1 SG entry at a time. Patch series history: v1: The initial patch series v2: Several style related cleanups. v3: Rewrote major portions of the series to avoid usage of error interrupts. v4: Go back to v1: After performing tests, it is determined that the v1 series is a good enough approach and that v3 triggers problems in certain cases. Tests have shown that there is not any noticeable performance difference between using vs not using error interrupts. v4 also squashes some patches in v1. 
At this point, we are freezing the overall architecture of the patch series to v4. [1] https://lkml.org/lkml/2013/7/18/432 [2] http://marc.info/?l=linux-omap=137416733628831=2 Joel Fernandes (6): dma: edma: Setup parameters to DMA MAX_NR_SG at a time dma: edma: Write out and handle MAX_NR_SG at a given time ARM: edma: Add function to manually trigger an EDMA channel dma: edma: Find missed events and issue them dma: edma: Leave linked to Null slot instead of DUMMY slot dma: edma: Remove limits on number of slots arch/arm/common/edma.c | 17 drivers/dma/edma.c | 164 - include/linux/platform_data/edma.h | 2 + 3 files changed, 144 insertions(+), 39 deletions(-) -- 1.8.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 6/6] dma: edma: Remove limits on number of slots
With this series, this check is no longer required and we shouldn't
need to reject drivers DMA'ing more than the MAX number of slots.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 25e47d4..d966413 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -287,12 +287,6 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 		return NULL;
 	}

-	if (sg_len > MAX_NR_SG) {
-		dev_err(dev, "Exceeded max SG segments %d > %d\n",
-			sg_len, MAX_NR_SG);
-		return NULL;
-	}
-
 	edesc = kzalloc(sizeof(*edesc) + sg_len * sizeof(edesc->pset[0]),
 			GFP_ATOMIC);
 	if (!edesc) {
--
1.8.1.2
[PATCH v4 2/6] dma: edma: Write out and handle MAX_NR_SG at a given time
Process SG elements in batches of MAX_NR_SG if they are greater than
MAX_NR_SG. Due to this, at any given time only that many slots will be
used in the given channel no matter how long the scatter list is. We
keep track of how much has been written in order to process the next
batch of elements in the scatter-list and to detect completion.

For such intermediate transfer completions (one batch of MAX_NR_SG),
make use of the pause and resume functions instead of start and stop
when such an intermediate transfer is in progress or completed, as we
do not want to clear any pending events.

Signed-off-by: Joel Fernandes
---
 drivers/dma/edma.c | 79 ++++++++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 26 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index e522ad5..732829b 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -56,6 +56,7 @@ struct edma_desc {
 	struct list_head		node;
 	int				absync;
 	int				pset_nr;
+	int				processed;
 	struct edmacc_param		pset[0];
 };

@@ -104,22 +105,34 @@ static void edma_desc_free(struct virt_dma_desc *vdesc)
 /* Dispatch a queued descriptor to the controller (caller holds lock) */
 static void edma_execute(struct edma_chan *echan)
 {
-	struct virt_dma_desc *vdesc = vchan_next_desc(&echan->vchan);
+	struct virt_dma_desc *vdesc;
 	struct edma_desc *edesc;
-	int i;
-
-	if (!vdesc) {
-		echan->edesc = NULL;
-		return;
+	struct device *dev = echan->vchan.chan.device->dev;
+	int i, j, left, nslots;
+
+	/* If either we processed all psets or we're still not started */
+	if (!echan->edesc ||
+	    echan->edesc->pset_nr == echan->edesc->processed) {
+		/* Get next vdesc */
+		vdesc = vchan_next_desc(&echan->vchan);
+		if (!vdesc) {
+			echan->edesc = NULL;
+			return;
+		}
+		list_del(&vdesc->node);
+		echan->edesc = to_edma_desc(&vdesc->tx);
 	}
-	list_del(&vdesc->node);
+	edesc = echan->edesc;
-	echan->edesc = edesc = to_edma_desc(&vdesc->tx);
+	/* Find out how many left */
+	left = edesc->pset_nr - edesc->processed;
+	nslots = min(MAX_NR_SG, left);

 	/* Write descriptor PaRAM set(s) */
-	for (i = 0; i < edesc->pset_nr; i++) {
- edma_write_slot(echan->slot[i], >pset[i]); + for (i = 0; i < nslots; i++) { + j = i + edesc->processed; + edma_write_slot(echan->slot[i], >pset[j]); dev_dbg(echan->vchan.chan.device->dev, "\n pset[%d]:\n" " chnum\t%d\n" @@ -132,24 +145,31 @@ static void edma_execute(struct edma_chan *echan) " bidx\t%08x\n" " cidx\t%08x\n" " lkrld\t%08x\n", - i, echan->ch_num, echan->slot[i], - edesc->pset[i].opt, - edesc->pset[i].src, - edesc->pset[i].dst, - edesc->pset[i].a_b_cnt, - edesc->pset[i].ccnt, - edesc->pset[i].src_dst_bidx, - edesc->pset[i].src_dst_cidx, - edesc->pset[i].link_bcntrld); + j, echan->ch_num, echan->slot[i], + edesc->pset[j].opt, + edesc->pset[j].src, + edesc->pset[j].dst, + edesc->pset[j].a_b_cnt, + edesc->pset[j].ccnt, + edesc->pset[j].src_dst_bidx, + edesc->pset[j].src_dst_cidx, + edesc->pset[j].link_bcntrld); /* Link to the previous slot if not the last set */ - if (i != (edesc->pset_nr - 1)) + if (i != (nslots - 1)) edma_link(echan->slot[i], echan->slot[i+1]); /* Final pset links to the dummy pset */ else edma_link(echan->slot[i], echan->ecc->dummy_slot); } - edma_start(echan->ch_num); + edesc->processed += nslots; + + edma_resume(echan->ch_num); + + if (edesc->processed <= MAX_NR_SG) { + dev_dbg(dev, "first transfer starting %d\n", echan->ch_num); + edma_start(echan->ch_num); + } } static int edma_terminate_all(struct edma_chan *echan) @@ -368,19 +388,26 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)
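The batching arithmetic in the rewritten `edma_execute()` — `left = pset_nr - processed; nslots = min(MAX_NR_SG, left)` — is easy to check in isolation. Below is a user-space simulation of just that bookkeeping (struct and function names are illustrative, not the driver's), showing that an arbitrarily long list is consumed in ceil(sg_len / MAX_NR_SG) batches:

```c
/* Stand-in for the processed/pset_nr fields of struct edma_desc */
struct fake_desc {
	int pset_nr;    /* total PaRAM sets for this descriptor */
	int processed;  /* how many have been written out so far */
};

/* One pass of the batching logic: consume min(max_nr_sg, remaining) */
static int execute_one_batch(struct fake_desc *d, int max_nr_sg)
{
	int left = d->pset_nr - d->processed;
	int nslots = left < max_nr_sg ? left : max_nr_sg;

	d->processed += nslots;
	return nslots;
}

/* Count how many batches a list of sg_len entries needs in total */
static int batches_needed(int sg_len, int max_nr_sg)
{
	struct fake_desc d = { .pset_nr = sg_len, .processed = 0 };
	int n = 0;

	while (d.processed < d.pset_nr) {
		execute_one_batch(&d, max_nr_sg);
		n++;
	}
	return n;
}
```

This matches the cover-letter example: 40 SG entries with MAX_NR_SG=10 are transferred in 4 batches while only 10 slots are ever allocated.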
[PATCH v4 4/6] dma: edma: Find missed events and issue them
In an effort to move to using scatter-gather lists of any size with
EDMA, as discussed at [1], instead of placing limitations on the
driver, we work through the limitations of the EDMAC hardware to find
missed events and issue them.

The sequence of events that requires this is:

For the scenario where MAX slots for an EDMA channel is 3:

SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null

The above SG list will have to be DMA'd in 2 sets:

(1) SG1 -> SG2 -> SG3 -> Null
(2) SG4 -> SG5 -> SG6 -> Null

After (1) is successfully transferred, the events from the MMC
controller do not stop coming and are missed by the time we have set up
the transfer for (2). So here, we catch the missed events as an error
condition and issue them manually.

In the second part of the patch, we handle the NULL slot cases: for the
crypto IP, we continue to receive events continuously even in the NULL
slot, and the setup of the next set of SG elements happens after the
error handler executes. This results in recursion problems. Due to
this, we continuously receive error interrupts when we manually trigger
an event from the error handler.

We fix this by first detecting whether the channel is currently
transferring from a NULL slot or not; that's where the edma_read_slot
in the error callback from the interrupt handler comes in. With this we
can determine whether the setup of the next SG list has completed, and
we manually trigger only in this case. If the setup has _not_
completed, we are still in NULL, so we just set a missed flag and allow
the manual triggering to happen in edma_execute, which will eventually
be called. This fixes the above-mentioned race conditions seen with the
crypto drivers.
[1] http://marc.info/?l=linux-omap=137416733628831=2 Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 50 +- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index 732829b..0e77b57 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -70,6 +70,7 @@ struct edma_chan { int ch_num; boolalloced; int slot[EDMA_MAX_SLOTS]; + int missed; struct dma_slave_config cfg; }; @@ -170,6 +171,20 @@ static void edma_execute(struct edma_chan *echan) dev_dbg(dev, "first transfer starting %d\n", echan->ch_num); edma_start(echan->ch_num); } + + /* +* This happens due to setup times between intermediate transfers +* in long SG lists which have to be broken up into transfers of +* MAX_NR_SG +*/ + if (echan->missed) { + dev_dbg(dev, "missed event in execute detected\n"); + edma_clean_channel(echan->ch_num); + edma_stop(echan->ch_num); + edma_start(echan->ch_num); + edma_trigger_channel(echan->ch_num); + echan->missed = 0; + } } static int edma_terminate_all(struct edma_chan *echan) @@ -387,6 +402,7 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data) struct device *dev = echan->vchan.chan.device->dev; struct edma_desc *edesc; unsigned long flags; + struct edmacc_param p; /* Pause the channel */ edma_pause(echan->ch_num); @@ -414,7 +430,39 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data) break; case DMA_CC_ERROR: - dev_dbg(dev, "transfer error on channel %d\n", ch_num); + spin_lock_irqsave(>vchan.lock, flags); + + edma_read_slot(EDMA_CHAN_SLOT(echan->slot[0]), ); + + /* +* Issue later based on missed flag which will be sure +* to happen as: +* (1) we finished transmitting an intermediate slot and +* edma_execute is coming up. +* (2) or we finished current transfer and issue will +* call edma_execute. +* +* Important note: issuing can be dangerous here and +* lead to some nasty recursion when we are in a NULL +* slot. So we avoid doing so and set the missed flag. 
+ */ + if (p.a_b_cnt == 0 && p.ccnt == 0) { + dev_dbg(dev, "Error occurred, looks like slot is null, just setting miss\n"); + echan->missed = 1; + } else { + /* + * The slot is already programmed but the event got + * missed, so it's safe to issue it here. + */ + dev_dbg(dev, "Error occurred but slot is non-null, triggering\n");
[PATCH 2/3] crypto: omap-des: Add config and build options
Add config and build options for the newly added omap-des driver. Signed-off-by: Joel Fernandes --- drivers/crypto/Kconfig | 11 +++ drivers/crypto/Makefile | 1 + 2 files changed, 12 insertions(+) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index e289afa..119a8e5 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -259,6 +259,17 @@ config CRYPTO_DEV_OMAP_AES OMAP processors have AES module accelerator. Select this if you want to use the OMAP module for AES algorithms. +config CRYPTO_DEV_OMAP_DES + tristate "Support for OMAP DES3DES hw engine" + depends on ARCH_OMAP2PLUS + select CRYPTO_DES + select CRYPTO_BLKCIPHER2 + help + OMAP processors have DES/3DES module accelerator. Select this if you + want to use the OMAP module for DES and 3DES algorithms. Currently + the ECB and CBC modes of operation are supported by the driver. + Accesses made on unaligned boundaries are also supported. + config CRYPTO_DEV_PICOXCELL tristate "Support for picoXcell IPSEC and Layer2 crypto engines" depends on ARCH_PICOXCELL && HAVE_CLK diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile index 38ce13d..ada440f 100644 --- a/drivers/crypto/Makefile +++ b/drivers/crypto/Makefile @@ -11,6 +11,7 @@ obj-$(CONFIG_CRYPTO_DEV_IXP4XX) += ixp4xx_crypto.o obj-$(CONFIG_CRYPTO_DEV_PPC4XX) += amcc/ obj-$(CONFIG_CRYPTO_DEV_OMAP_SHAM) += omap-sham.o obj-$(CONFIG_CRYPTO_DEV_OMAP_AES) += omap-aes.o +obj-$(CONFIG_CRYPTO_DEV_OMAP_DES) += omap-des.o obj-$(CONFIG_CRYPTO_DEV_PICOXCELL) += picoxcell_crypto.o obj-$(CONFIG_CRYPTO_DEV_SAHARA) += sahara.o obj-$(CONFIG_CRYPTO_DEV_S5P) += s5p-sss.o -- 1.8.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3] crypto: omap-des: Add driver for OMAP DES module
OMAP4 and AM437x SoCs have a DES3DES module which is capable of performing DES encrypt/decrypt and 3DES ede encrypt/decrypt operations. Following patch series adds support for the same. Tests have been performed with crypto/testmgr and all tests are successful. Joel Fernandes (3): crypto: omap-des: Add omap-des driver for OMAP4/AM43xx crypto: omap-des: Add config and build options crypto: omap-des: Add triple DES (des3_ede) support to driver drivers/crypto/Kconfig| 11 + drivers/crypto/Makefile |1 + drivers/crypto/omap-des.c | 1239 + 3 files changed, 1251 insertions(+) create mode 100644 drivers/crypto/omap-des.c -- 1.8.1.2
[PATCH 3/3] crypto: omap-des: Add triple DES (des3_ede) support to driver
OMAP DES module supports 3DES operation where 3 64-bit keys are used to perform a DES encrypt-decrypt-encrypt (ede) operation on a buffer. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-des.c | 53 --- 1 file changed, 50 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c index 6a9a25f..0df60cb 100644 --- a/drivers/crypto/omap-des.c +++ b/drivers/crypto/omap-des.c @@ -83,7 +83,7 @@ struct omap_des_ctx { struct omap_des_dev *dd; int keylen; - u32 key[DES_KEY_SIZE / sizeof(u32)]; + u32 key[(3 * DES_KEY_SIZE) / sizeof(u32)]; unsigned long flags; }; @@ -265,8 +265,10 @@ static int omap_des_write_ctrl(struct omap_des_dev *dd) val |= DES_REG_CTRL_CBC; if (dd->flags & FLAGS_ENCRYPT) val |= DES_REG_CTRL_DIRECTION; + if (key32 == 6) + val |= DES_REG_CTRL_TDES; - mask |= DES_REG_CTRL_CBC | DES_REG_CTRL_DIRECTION; + mask |= DES_REG_CTRL_CBC | DES_REG_CTRL_DIRECTION | DES_REG_CTRL_TDES; omap_des_write_mask(dd, DES_REG_CTRL(dd), val, mask); @@ -725,7 +727,7 @@ static int omap_des_setkey(struct crypto_ablkcipher *tfm, const u8 *key, { struct omap_des_ctx *ctx = crypto_ablkcipher_ctx(tfm); - if (keylen != DES_KEY_SIZE) + if (keylen != DES_KEY_SIZE && keylen != (3*DES_KEY_SIZE)) return -EINVAL; pr_debug("enter, keylen: %d\n", keylen); @@ -817,6 +819,51 @@ static struct crypto_alg algs_ecb_cbc[] = { .encrypt= omap_des_cbc_encrypt, .decrypt= omap_des_cbc_decrypt, } +}, +{ + .cra_name = "ecb(des3_ede)", + .cra_driver_name= "ecb-des3-omap", + .cra_priority = 100, + .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | + CRYPTO_ALG_KERN_DRIVER_ONLY | + CRYPTO_ALG_ASYNC, + .cra_blocksize = DES_BLOCK_SIZE, + .cra_ctxsize= sizeof(struct omap_des_ctx), + .cra_alignmask = 0, + .cra_type = &crypto_ablkcipher_type, + .cra_module = THIS_MODULE, + .cra_init = omap_des_cra_init, + .cra_exit = omap_des_cra_exit, + .cra_u.ablkcipher = { + .min_keysize= 3*DES_KEY_SIZE, + .max_keysize= 3*DES_KEY_SIZE, + .setkey = omap_des_setkey, + .encrypt= omap_des_ecb_encrypt, + 
.decrypt= omap_des_ecb_decrypt, + } +}, +{ + .cra_name = "cbc(des3_ede)", + .cra_driver_name= "cbc-des3-omap", + .cra_priority = 100, + .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | + CRYPTO_ALG_KERN_DRIVER_ONLY | + CRYPTO_ALG_ASYNC, + .cra_blocksize = DES_BLOCK_SIZE, + .cra_ctxsize= sizeof(struct omap_des_ctx), + .cra_alignmask = 0, + .cra_type = &crypto_ablkcipher_type, + .cra_module = THIS_MODULE, + .cra_init = omap_des_cra_init, + .cra_exit = omap_des_cra_exit, + .cra_u.ablkcipher = { + .min_keysize= 3*DES_KEY_SIZE, + .max_keysize= 3*DES_KEY_SIZE, + .ivsize = DES_BLOCK_SIZE, + .setkey = omap_des_setkey, + .encrypt= omap_des_cbc_encrypt, + .decrypt= omap_des_cbc_decrypt, + } } }; -- 1.8.1.2
[PATCH 1/3] crypto: omap-des: Add omap-des driver for OMAP4/AM43xx
Add omap-des driver with platform data for OMAP4. Support added for DES ECB and CBC modes. Where possible, code is reused from omap-aes driver with changes made for adjusting key size, block size, removing non-existent encryption modes and adding support for OMAP4 platform data and offsets. Tests have been conducted with the CRYPTO test manager, and functionality is verified at different page length alignments. Signed-off-by: Joel Fernandes --- drivers/crypto/omap-des.c | 1192 + 1 file changed, 1192 insertions(+) create mode 100644 drivers/crypto/omap-des.c diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c new file mode 100644 index 000..6a9a25f --- /dev/null +++ b/drivers/crypto/omap-des.c @@ -0,0 +1,1192 @@ +/* + * Cryptographic API. + * + * Support for OMAP DES and Triple DES HW acceleration. + * + * Copyright (c) 2012 Texas Instruments Incorporated + * Author: Joel Fernandes + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. 
+ * + */ + +#define pr_fmt(fmt) "%s: " fmt, __func__ + +#ifdef DEBUG +#define prn(num) printk(#num "=%d\n", num) +#define prx(num) printk(#num "=%x\n", num) +#else +#define prn(num) do { } while (0) +#define prx(num) do { } while (0) +#endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define DST_MAXBURST 2 + +#define DES_BLOCK_WORDS (DES_BLOCK_SIZE >> 2) + +#define _calc_walked(inout) (dd->inout##_walk.offset - dd->inout##_sg->offset) + +#define DES_REG_KEY(dd, x) ((dd)->pdata->key_ofs - \ + ((x ^ 0x01) * 0x04)) + +#define DES_REG_IV(dd, x) ((dd)->pdata->iv_ofs + ((x) * 0x04)) + +#define DES_REG_CTRL(dd) ((dd)->pdata->ctrl_ofs) +#define DES_REG_CTRL_CBC (1 << 4) +#define DES_REG_CTRL_TDES (1 << 3) +#define DES_REG_CTRL_DIRECTION (1 << 2) +#define DES_REG_CTRL_INPUT_READY (1 << 1) +#define DES_REG_CTRL_OUTPUT_READY (1 << 0) + +#define DES_REG_DATA_N(dd, x) ((dd)->pdata->data_ofs + ((x) * 0x04)) + +#define DES_REG_REV(dd) ((dd)->pdata->rev_ofs) + +#define DES_REG_MASK(dd) ((dd)->pdata->mask_ofs) + +#define DES_REG_LENGTH_N(x) (0x24 + ((x) * 0x04)) + +#define DES_REG_IRQ_STATUS(dd) ((dd)->pdata->irq_status_ofs) +#define DES_REG_IRQ_ENABLE(dd) ((dd)->pdata->irq_enable_ofs) +#define DES_REG_IRQ_DATA_IN BIT(1) +#define DES_REG_IRQ_DATA_OUT BIT(2) + +#define FLAGS_MODE_MASK 0x000f +#define FLAGS_ENCRYPT BIT(0) +#define FLAGS_CBC BIT(1) +#define FLAGS_INIT BIT(4) +#define FLAGS_BUSY BIT(6) + +struct omap_des_ctx { + struct omap_des_dev *dd; + + int keylen; + u32 key[DES_KEY_SIZE / sizeof(u32)]; + unsigned long flags; +}; + +struct omap_des_reqctx { + unsigned long mode; +}; + +#define OMAP_DES_QUEUE_LENGTH 1 +#define OMAP_DES_CACHE_SIZE 0 + +struct omap_des_algs_info { + struct crypto_alg *algs_list; + unsigned int size; + unsigned int registered; +}; + +struct omap_des_pdata { + struct omap_des_algs_info *algs_info; + unsigned 
int algs_info_size; + + void (*trigger)(struct omap_des_dev *dd, int length); + + u32 key_ofs; + u32 iv_ofs; + u32 ctrl_ofs; + u32 data_ofs; + u32 rev_ofs; + u32 mask_ofs; + u32 irq_enable_ofs; + u32 irq_status_ofs; + + u32 dma_enable_in; + u32 dma_enable_out; + u32 dma_start; + + u32 major_mask; + u32 major_shift; + u32 minor_mask; + u32 minor_shift; +}; + +struct omap_des_dev { + struct list_head list; + unsigned long phys_base; + void __iomem *io_base; + struct omap_des_ctx *ctx; + struct device *dev; + unsigned long flags; + int err; + + /* spinlock used for queues */ + spinlock_t lock; + struct crypto_queue queue; + + struct tasklet_struct done_task; + struct tasklet_struct queue_task; + + struct ablkcipher_request *req; + /* + * total is used by PIO
Re: [PATCH 1/3] crypto: omap-des: Add omap-des driver for OMAP4/AM43xx
On 08/29/2013 06:27 PM, Joel Fernandes wrote: > Add omap-des driver with platform data for OMAP4. Support added for DES > ECB and CBC modes. > > Where possible, code is reused from omap-aes driver with changes made for > adjusting key size, block size, removing non-existent encryption modes > and adding support for OMAP4 platform data and offsets. > > Tests have been conducted with the CRYPTO test manager, and functionality > is verified at different page length alignments. > > Signed-off-by: Joel Fernandes > --- > drivers/crypto/omap-des.c | 1192 > + > 1 file changed, 1192 insertions(+) > create mode 100644 drivers/crypto/omap-des.c > > diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c > new file mode 100644 > index 000..6a9a25f > --- /dev/null > +++ b/drivers/crypto/omap-des.c > @@ -0,0 +1,1192 @@ > +/* > + * Cryptographic API. > + * > + * Support for OMAP DES and Triple DES HW acceleration. > + * > + * Copyright (c) 2012 Texas Instruments Incorporated > + * Author: Joel Fernandes Have to change this to 2013. Will do so if there are no other comments. Regards, -Joel
Re: [PATCH v2] ARM: dts: add AM33XX EDMA support
Hi Benoit, On 08/26/2013 03:36 AM, Benoit Cousson wrote: > - minus all the TI emails which are not working anymore :-( > > I've just sent my previous email too soon... > > Now the patch is different :-) I'll take that one. Unfortunately this patch is still missing from your latest pull request: Subject "[GIT PULL] ARM: OMAP: Device Tree for 3.12 take #2" git://git.kernel.org/pub/scm/linux/kernel/git/bcousson/linux-omap-dt.git tags/for_3.12/dts_signed (commit 4843be165c10f9886c87eeb20acf19a3ddec6653) Below is a scissor patch that cleanly applies on above branch. Thanks, -Joel --->8 From: Matt Porter Subject: [PATCH] ARM: dts: add AM33XX EDMA support Adds AM33XX EDMA support to the am33xx.dtsi as documented in Documentation/devicetree/bindings/dma/ti-edma.txt v2 changes: Drop DT entries that are non-hardware-description Joel Fernandes Discussion in [1]. v3 changes: Changed node name from "edma: edma@" to "edma: dma-controller@" by Sebastian Andrzej Siewior [1] https://patchwork.kernel.org/patch/2226761/ Signed-off-by: Matt Porter Signed-off-by: Joel A Fernandes --- arch/arm/boot/dts/am33xx.dtsi | 12 1 file changed, 12 insertions(+) diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi index 5996d63..f5869ed 100644 --- a/arch/arm/boot/dts/am33xx.dtsi +++ b/arch/arm/boot/dts/am33xx.dtsi @@ -101,6 +101,18 @@ reg = <0x4820 0x1000>; }; + edma: dma-controller@4900 { + compatible = "ti,edma3"; + ti,hwmods = "tpcc", "tptc0", "tptc1", "tptc2"; + reg = <0x4900 0x1>, + <0x44e10f90 0x10>; + interrupts = <12 13 14>; + #dma-cells = <1>; + dma-channels = <64>; + ti,edma-regions = <4>; + ti,edma-slots = <256>; + }; + gpio0: gpio@44e07000 { compatible = "ti,omap4-gpio"; ti,hwmods = "gpio1"; -- 1.8.1.2
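For reference, a consumer node would then request channels with the controller's one-cell specifier (per `#dma-cells = <1>` above). This fragment is illustrative only and not part of the patch — the node name, unit address, and channel numbers are assumptions:

```dts
mmc1: mmc@48060000 {
	/* ... */
	dmas = <&edma 24	/* TX event (hypothetical) */
		&edma 25>;	/* RX event (hypothetical) */
	dma-names = "tx", "rx";
};
```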
Re: [PATCH v2] ARM: EDMA: Fix clearing of unused list for DT DMA resources
Hi Tony or Sekhar, If this patch looks ok, could you pick it up for the -rc cycle? It fixes DMA breakages after the merge window for devices for which DMA resources are being populated in the device tree instead of in pdev resources. Thanks, -Joel On 07/22/2013 12:59 PM, Joel Fernandes wrote: > HWMOD removal for MMC is breaking edma_start as the events are being manually > triggered due to the unused channel list not being cleared. Thanks to Balaji TK for > finding this issue. > > This patch fixes the issue, by reading the "dmas" property from the DT node if > it exists and clearing the bits in the unused channel list. > > Cc: Balaji T K > Cc: Pantel Antoniou > Signed-off-by: Joel Fernandes > --- > v2 changes: Fixed compiler warning > > arch/arm/common/edma.c | 31 +++ > 1 file changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c > index a432e6c..765d578 100644 > --- a/arch/arm/common/edma.c > +++ b/arch/arm/common/edma.c > @@ -561,14 +561,29 @@ static int reserve_contiguous_slots(int ctlr, unsigned > int id, > static int prepare_unused_channel_list(struct device *dev, void *data) > { > struct platform_device *pdev = to_platform_device(dev); > - int i, ctlr; > - > - for (i = 0; i < pdev->num_resources; i++) { > - if ((pdev->resource[i].flags & IORESOURCE_DMA) && > - (int)pdev->resource[i].start >= 0) { > - ctlr = EDMA_CTLR(pdev->resource[i].start); > - clear_bit(EDMA_CHAN_SLOT(pdev->resource[i].start), > - edma_cc[ctlr]->edma_unused); > + int i = 0, ctlr; > + u32 dma_chan; > + const __be32 *dma_chan_p; > + struct property *prop; > + > + if (dev->of_node) { > + of_property_for_each_u32(dev->of_node, "dmas", prop, \ > + dma_chan_p, dma_chan) { > + if (i++ & 1) { > + ctlr = EDMA_CTLR(dma_chan); > + clear_bit(EDMA_CHAN_SLOT(dma_chan), > + edma_cc[ctlr]->edma_unused); > + } > + } > + } else { > + for (; i < pdev->num_resources; i++) { > + if ((pdev->resource[i].flags & IORESOURCE_DMA) && > + (int)pdev->resource[i].start >= 0) { > + ctlr = 
EDMA_CTLR(pdev->resource[i].start); > + clear_bit(EDMA_CHAN_SLOT( > + pdev->resource[i].start), > + edma_cc[ctlr]->edma_unused); > + } > } > } > >
[PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
Manual trigger for events missed as a result of splitting a scatter-gather list and DMA'ing it in batches. Add a helper function to trigger a channel in case any such events are missed. Signed-off-by: Joel Fernandes --- arch/arm/common/edma.c | 21 + include/linux/platform_data/edma.h |2 ++ 2 files changed, 23 insertions(+) diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c index 3567ba1..10995b2 100644 --- a/arch/arm/common/edma.c +++ b/arch/arm/common/edma.c @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel) } EXPORT_SYMBOL(edma_resume); +int edma_manual_trigger(unsigned channel) +{ + unsigned ctlr; + int j; + unsigned int mask; + + ctlr = EDMA_CTLR(channel); + channel = EDMA_CHAN_SLOT(channel); + mask = BIT(channel & 0x1f); + + j = channel >> 5; + + /* EDMA channels without event association */ + edma_shadow0_write_array(ctlr, SH_ESR, j, mask); + + pr_debug("EDMA: ESR%d %08x\n", j, +edma_shadow0_read_array(ctlr, SH_ESR, j)); + return 0; +} +EXPORT_SYMBOL(edma_manual_trigger); + /** * edma_start - start dma on a channel * @channel: channel being activated diff --git a/include/linux/platform_data/edma.h b/include/linux/platform_data/edma.h index 57300fd..0e11eca 100644 --- a/include/linux/platform_data/edma.h +++ b/include/linux/platform_data/edma.h @@ -180,4 +180,6 @@ struct edma_soc_info { const s16 (*xbar_chans)[2]; }; +int edma_manual_trigger(unsigned); + #endif -- 1.7.9.5
[PATCH 5/9] dma: edma: Leave linked to Null slot instead of DUMMY slot
Dummy slot has been used as a way for missed events not to be reported as missing. This has been particularly troublesome for cases where we might want to temporarily pause all incoming events. For the EDMA DMAC, there is no way to do any such pausing of events as the occurrence of the "next" event is not software controlled. Using "edma_pause" in IRQ handlers doesn't help as by then the event of concern from the slave is already missed. Linking a dummy slot is seen to absorb these events, which we didn't want to miss. So we don't link to dummy, but instead leave it linked to the NULL set, allow an error condition and detect the channel that missed it. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c |3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index aa4989f..1eda5cc 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -159,9 +159,6 @@ static void edma_execute(struct edma_chan *echan) /* Link to the previous slot if not the last set */ if (i != (total_process - 1)) edma_link(echan->slot[i], echan->slot[i+1]); - /* Final pset links to the dummy pset */ - else - edma_link(echan->slot[i], echan->ecc->dummy_slot); } edesc->total_processed += total_process; -- 1.7.9.5
[PATCH 8/9] dma: edma: Link to dummy slot only for last SG list split
Consider the case where we have a scatter-list like: SG1->SG2->SG3->SG4->SG5->SG6->Null For example, for a MAX_NR_SG of 2, earlier we were splitting this as: SG1->SG2->Null SG3->SG4->Null SG5->SG6->Null Now we split it as: SG1->SG2->Null SG3->SG4->Null SG5->SG6->Dummy This approach results in fewer unwanted interrupts occurring for the last list split. The Dummy slot has the property of not raising an error condition if events are missed, unlike the Null slot. We are OK with this as we're done with processing the whole list once we reach Dummy. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index c72e8c9..0d3ebde 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -164,6 +164,12 @@ static void edma_execute(struct edma_chan *echan) edesc->total_processed += total_process; + /* If this is either the last set in a set of SG-list transactions + then setup a link to the dummy slot, this results in all future + events being absorbed and that's OK because we're done */ + if (edesc->total_processed == edesc->pset_nr) + edma_link(echan->slot[total_process-1], echan->ecc->dummy_slot); + edma_resume(echan->ch_num); if (edesc->total_processed <= MAX_NR_SG) { -- 1.7.9.5
[PATCH 4/9] dma: edma: Find missed events and issue them
In an effort to move to using scatter-gather lists of any size with EDMA as discussed at [1] instead of placing limitations on the driver, we work through the limitations of the EDMAC hardware to find missed events and issue them. The sequence of events that requires this is as follows. For the scenario where the MAX number of slots for an EDMA channel is 3: SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null The above SG list will have to be DMA'd in 2 sets: (1) SG1 -> SG2 -> SG3 -> Null (2) SG4 -> SG5 -> SG6 -> Null After (1) is successfully transferred, the events from the MMC controller do not stop coming and are missed by the time we have set up the transfer for (2). So here, we catch the missed events as an error condition and issue them manually. [1] http://marc.info/?l=linux-omap=137416733628831=2 Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index d9a151b..aa4989f 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -417,7 +417,15 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data) break; case DMA_CC_ERROR: - dev_dbg(dev, "transfer error on channel %d\n", ch_num); + if (echan->edesc) { + dev_dbg(dev, "Missed event on %d, retrying\n", + ch_num); + edma_clean_channel(echan->ch_num); + edma_stop(echan->ch_num); + edma_start(echan->ch_num); + edma_manual_trigger(echan->ch_num); + } + dev_dbg(dev, "handled error on channel %d\n", ch_num); break; default: break; -- 1.7.9.5
[PATCH 1/9] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
Changes are made here for configuring existing parameters to support DMA'ing them out in batches as needed. Also allocate as many slots as needed by the SG list, but not more than MAX_NR_SG. These slots will then be reused accordingly. For example, if MAX_NR_SG=10 and the number of SG entries is 40, still only 10 slots will be allocated to DMA the entire SG list of size 40. Also enable TC interrupts for slots that are the last in a current iteration, or that fall on a MAX_NR_SG boundary. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index 5f3e532..0b68f94 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -222,9 +222,9 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( enum dma_slave_buswidth dev_width; u32 burst; struct scatterlist *sg; - int i; int acnt, bcnt, ccnt, src, dst, cidx; int src_bidx, dst_bidx, src_cidx, dst_cidx; + int i, num_slots_needed; if (unlikely(!echan || !sgl || !sg_len)) return NULL; @@ -262,8 +262,11 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( edesc->pset_nr = sg_len; - for_each_sg(sgl, sg, sg_len, i) { - /* Allocate a PaRAM slot, if needed */ + /* Allocate a PaRAM slot, if needed */ + + num_slots_needed = sg_len > MAX_NR_SG ? 
MAX_NR_SG : sg_len; + + for (i = 0; i < num_slots_needed; i++) { if (echan->slot[i] < 0) { echan->slot[i] = edma_alloc_slot(EDMA_CTLR(echan->ch_num), @@ -273,6 +276,10 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( return NULL; } } + } + + /* Configure PaRAM sets for each SG */ + for_each_sg(sgl, sg, sg_len, i) { acnt = dev_width; @@ -330,6 +337,12 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( /* Configure A or AB synchronized transfers */ if (edesc->absync) edesc->pset[i].opt |= SYNCDIM; + + /* If this is the last in a current SG set of transactions, + enable interrupts so that next set is processed */ + if (!((i+1) % MAX_NR_SG)) + edesc->pset[i].opt |= TCINTEN; + /* If this is the last set, enable completion interrupt flag */ if (i == sg_len - 1) edesc->pset[i].opt |= TCINTEN; -- 1.7.9.5
[PATCH 9/9] dma: edma: remove limits on number of slots
With this series, this check is no longer required and we shouldn't need to reject drivers DMA'ing more than the MAX number of slots. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c |6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index 0d3ebde..abf2e87 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -285,12 +285,6 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg( return NULL; } - if (sg_len > MAX_NR_SG) { - dev_err(dev, "Exceeded max SG segments %d > %d\n", - sg_len, MAX_NR_SG); - return NULL; - } - edesc = kzalloc(sizeof(*edesc) + sg_len * sizeof(edesc->pset[0]), GFP_ATOMIC); if (!edesc) { -- 1.7.9.5
[PATCH 0/9] dma: edma: Support scatter-lists of any length
The following series adds support to the EDMA driver to enable DMA of scatter-gather lists of arbitrary length, while still making use of only a certain MAX number of slots at a time for a given channel, thus freeing up the rest of the slots for other slaves/channels. With this there is no need for slave drivers to query the EDMA driver about the MAX it can send at a time, as done in [1]. Drivers can send SG lists of any number of entries to DMA. Reference discussion at [2]. Tested omap-aes and omap_hsmmc drivers with different MAX numbers of slots, even just 1. In the case where it is 1, only 1 slot is used to DMA an entire scatter list of arbitrary length. Since this series touches EDMA private API code also shared with davinci-pcm, playback of a 16-bit 44.1KHz audio file with davinci-pcm has been tested. Sample test run with 1 vs 16 (MAX number of slots/SG) in the omap-aes driver: MAX slots = 1: (128 bit key, 8192 byte blocks): 1266 operations in 1 seconds (10371072 bytes) MAX slots = 16: (128 bit key, 8192 byte blocks): 1601 operations in 1 seconds (13115392 bytes) Note: For the above test, the 8K buffer is mapped into an SG list of size 2, so only 2 slots are required. Beyond size 2, there will not be any noticeable performance improvement. But even with 1 slot, the transfer is managed by just DMA'ing 1 SG entry at a time.
[1] https://lkml.org/lkml/2013/7/18/432 [2] http://marc.info/?l=linux-omap=137416733628831=2 Joel Fernandes (9): dma: edma: Setup parameters to DMA MAX_NR_SG at a time dma: edma: Write out and handle MAX_NR_SG at a given time ARM: edma: Add function to manually trigger an EDMA channel dma: edma: Find missed events and issue them dma: edma: Leave linked to Null slot instead of DUMMY slot dma: edma: Detect null slot errors and handle them correctly ARM: edma: Don't clear EMR of channel in edma_stop dma: edma: Link to dummy slot only for last SG list split dma: edma: remove limits on number of slots arch/arm/common/edma.c | 22 - drivers/dma/edma.c | 157 +++- include/linux/platform_data/edma.h |2 + 3 files changed, 142 insertions(+), 39 deletions(-) -- 1.7.9.5
[PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
We certainly don't want error conditions to be cleared anywhere as this will make us 'forget' about missed events. We depend on knowing which events were missed in order to be able to reissue them. This fixes a race condition where the EMR was being cleared by the transfer completion interrupt handler. Basically, what was happening was:

     Missed event
          |
          |
          V
SG1 -> SG2 -> SG3 -> Null
          \
           \__ TC Interrupt

(At almost the same time as the ARM is executing the TC interrupt handler, an event got missed and also forgotten by clearing the EMR.)

The EMR is ultimately cleared by the error interrupt handler once it is handled, so we don't have to do it in edma_stop. Signed-off-by: Joel Fernandes --- arch/arm/common/edma.c |1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c index 10995b2..dec772e 100644 --- a/arch/arm/common/edma.c +++ b/arch/arm/common/edma.c @@ -1328,7 +1328,6 @@ void edma_stop(unsigned channel) edma_shadow0_write_array(ctlr, SH_EECR, j, mask); edma_shadow0_write_array(ctlr, SH_ECR, j, mask); edma_shadow0_write_array(ctlr, SH_SECR, j, mask); - edma_write_array(ctlr, EDMA_EMCR, j, mask); pr_debug("EDMA: EER%d %08x\n", j, edma_shadow0_read_array(ctlr, SH_EER, j)); -- 1.7.9.5
[PATCH 2/9] dma: edma: Write out and handle MAX_NR_SG at a given time
Process SG elements in batches of MAX_NR_SG if they are greater than MAX_NR_SG. Due to this, at any given time only that many slots will be used in the given channel no matter how long the scatter list is. We keep track of how much has been written in order to process the next batch of elements in the scatter-list and detect completion. For such intermediate transfer completions (one batch of MAX_NR_SG), make use of the pause and resume functions instead of start and stop when such an intermediate transfer is in progress or completed, as we do not want to clear any pending events. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 79 +++- 1 file changed, 54 insertions(+), 25 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index 0b68f94..d9a151b 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -56,6 +56,7 @@ struct edma_desc { struct list_head node; int absync; int pset_nr; + int total_processed; struct edmacc_param pset[0]; }; @@ -104,22 +105,36 @@ static void edma_desc_free(struct virt_dma_desc *vdesc) /* Dispatch a queued descriptor to the controller (caller holds lock) */ static void edma_execute(struct edma_chan *echan) { - struct virt_dma_desc *vdesc = vchan_next_desc(&echan->vchan); + struct virt_dma_desc *vdesc; struct edma_desc *edesc; - int i; + struct device *dev = echan->vchan.chan.device->dev; - if (!vdesc) { - echan->edesc = NULL; - return; + int i, j, total_left, total_process; + + /* If either we processed all psets or we're still not started */ + if (!echan->edesc || + echan->edesc->pset_nr == echan->edesc->total_processed) { + /* Get next vdesc */ + vdesc = vchan_next_desc(&echan->vchan); + if (!vdesc) { + echan->edesc = NULL; + return; + } + list_del(&vdesc->node); + echan->edesc = to_edma_desc(&vdesc->tx); } - list_del(&vdesc->node); + edesc = echan->edesc; + + /* Find out how many left */ + total_left = edesc->pset_nr - edesc->total_processed; + total_process = total_left > MAX_NR_SG ? 
MAX_NR_SG : total_left; - echan->edesc = edesc = to_edma_desc(>tx); /* Write descriptor PaRAM set(s) */ - for (i = 0; i < edesc->pset_nr; i++) { - edma_write_slot(echan->slot[i], >pset[i]); + for (i = 0; i < total_process; i++) { + j = i + edesc->total_processed; + edma_write_slot(echan->slot[i], >pset[j]); dev_dbg(echan->vchan.chan.device->dev, "\n pset[%d]:\n" " chnum\t%d\n" @@ -132,24 +147,31 @@ static void edma_execute(struct edma_chan *echan) " bidx\t%08x\n" " cidx\t%08x\n" " lkrld\t%08x\n", - i, echan->ch_num, echan->slot[i], - edesc->pset[i].opt, - edesc->pset[i].src, - edesc->pset[i].dst, - edesc->pset[i].a_b_cnt, - edesc->pset[i].ccnt, - edesc->pset[i].src_dst_bidx, - edesc->pset[i].src_dst_cidx, - edesc->pset[i].link_bcntrld); + j, echan->ch_num, echan->slot[i], + edesc->pset[j].opt, + edesc->pset[j].src, + edesc->pset[j].dst, + edesc->pset[j].a_b_cnt, + edesc->pset[j].ccnt, + edesc->pset[j].src_dst_bidx, + edesc->pset[j].src_dst_cidx, + edesc->pset[j].link_bcntrld); /* Link to the previous slot if not the last set */ - if (i != (edesc->pset_nr - 1)) + if (i != (total_process - 1)) edma_link(echan->slot[i], echan->slot[i+1]); /* Final pset links to the dummy pset */ else edma_link(echan->slot[i], echan->ecc->dummy_slot); } - edma_start(echan->ch_num); + edesc->total_processed += total_process; + + edma_resume(echan->ch_num); + + if (edesc->total_processed <= MAX_NR_SG) { + dev_dbg(dev, "first transfer starting %d\n", echan->ch_num); + edma_start(echan->ch_num); + } } static int edma_termin
[PATCH 6/9] dma: edma: Detect null slot errors and handle them correctly
For crypto IP, we continue to receive events even continuously in NULL slot, and request lines don't get de-asserted unlike omap_hsmmc. Due to this, we continously receive error interrupts when we manually trigger an event. We fix this, by first detecting if the Channel is currently transferring from a NULL slot or not, that's where the edma_read_slot in the error callback from interrupt handler comes in. Second thing we do is, if we detect if we are on a NULL slot, we don't forcefully trigger as this will only result in more error conditions. Instead we set a missed flag and allow the manual triggerring to happen in edma_execute which will eventually be called. This fixes the issue where we are on a NULL slot and continue to receive events from modules like crypto that don't stop their request events after a transfer is completed. Signed-off-by: Joel Fernandes --- drivers/dma/edma.c | 42 ++ 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index 1eda5cc..c72e8c9 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -70,6 +70,7 @@ struct edma_chan { int ch_num; boolalloced; int slot[EDMA_MAX_SLOTS]; + int missed; struct dma_slave_config cfg; }; @@ -169,6 +170,18 @@ static void edma_execute(struct edma_chan *echan) dev_dbg(dev, "first transfer starting %d\n", echan->ch_num); edma_start(echan->ch_num); } + + /* This happens due to setup times between intermediate transfers + in long SG lists which have to be broken up into transfers of + MAX_NR_SG */ + if (echan->missed) { + dev_dbg(dev, "missed event in execute detected\n"); + edma_clean_channel(echan->ch_num); + edma_stop(echan->ch_num); + edma_start(echan->ch_num); + edma_manual_trigger(echan->ch_num); + echan->missed = 0; + } } static int edma_terminate_all(struct edma_chan *echan) @@ -387,6 +400,7 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data) struct device *dev = echan->vchan.chan.device->dev; struct edma_desc *edesc; unsigned 
long flags; + struct edmacc_param p; /* Pause the channel */ edma_pause(echan->ch_num); @@ -414,15 +428,35 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data) break; case DMA_CC_ERROR: - if (echan->edesc) { - dev_dbg(dev, "Missed event on %d, retrying\n", - ch_num); + spin_lock_irqsave(>vchan.lock, flags); + + edma_read_slot(EDMA_CHAN_SLOT(echan->slot[0]), ); + + if (p.a_b_cnt == 0 && p.ccnt == 0) { + dev_dbg(dev, "Error occurred, looks like slot is null, just setting miss\n"); + /* + Issue later based on missed flag which will be sure + to happen as: + (1) we finished transmitting an intermediate slot and + edma_execute is coming up. + (2) or we finished current transfer and issue will + call edma_execute. + + Important note: issuing can be dangerous here and + lead to some nasty recursion as this is a NULL slot + at this point. + */ + echan->missed = 1; + } else { + dev_dbg(dev, "Error occurred but slot is non-null, TRIGGERING\n"); edma_clean_channel(echan->ch_num); edma_stop(echan->ch_num); edma_start(echan->ch_num); edma_manual_trigger(echan->ch_num); } - dev_dbg(dev, "handled error on channel %d\n", ch_num); + + spin_unlock_irqrestore(>vchan.lock, flags); + break; default: break; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: EDMA: Fix clearing of unused list for DT DMA resources
On 07/29/2013 02:01 AM, Sekhar Nori wrote:> On Monday 22 July 2013 11:29 PM, Joel Fernandes wrote: >> HWMOD removal for MMC is breaking edma_start as the events are being manually >> triggered due to unused channel list not being clear, Thanks to Balaji TK for >> finding this issue. > > So, Reported-by: Balaji T K > > ? Sure, I'll add that as well. > >> >> This patch fixes the issue, by reading the "dmas" property from the DT node if >> it exists and clearing the bits in the unused channel list. >> >> Cc: Balaji T K >> Cc: Pantel Antoniou >> Signed-off-by: Joel Fernandes >> --- >> v2 changes: Fixed compiler warning >> >> arch/arm/common/edma.c | 31 +++ >> 1 file changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c >> index a432e6c..765d578 100644 >> --- a/arch/arm/common/edma.c >> +++ b/arch/arm/common/edma.c >> @@ -561,14 +561,29 @@ static int reserve_contiguous_slots(int ctlr, unsigned int id, >> static int prepare_unused_channel_list(struct device *dev, void *data) >> { >> struct platform_device *pdev = to_platform_device(dev); >> -int i, ctlr; >> - >> -for (i = 0; i < pdev->num_resources; i++) { >> -if ((pdev->resource[i].flags & IORESOURCE_DMA) && >> -(int)pdev->resource[i].start >= 0) { >> -ctlr = EDMA_CTLR(pdev->resource[i].start); >> -clear_bit(EDMA_CHAN_SLOT(pdev->resource[i].start), >> -edma_cc[ctlr]->edma_unused); >> +int i = 0, ctlr; >> +u32 dma_chan; >> +const __be32 *dma_chan_p; >> +struct property *prop; >> + >> +if (dev->of_node) { >> +of_property_for_each_u32(dev->of_node, "dmas", prop, \ >> + dma_chan_p, dma_chan) { >> +if (i++ & 1) { >> +ctlr = EDMA_CTLR(dma_chan); >> +clear_bit(EDMA_CHAN_SLOT(dma_chan), >> +edma_cc[ctlr]->edma_unused); >> +} > > I think it will be cleaner if you use of_property_count_strings() on > dma-names to get the count of DMA channels for this device and then use > of_parse_phandle_with_args() to get access to the args. 
Honestly, I have > not used these APIs myself, so if this doesn’t work the way I think it > should then let me know and I will try to do some experiments myself. > > The current way of skipping over even fields really depends on > #dma-cells for EDMA to be 1. That in itself is not such a bad > assumption, but if you have APIs which already take care of this for > you, why not use them? Sure, looks like of_dma_request_slave_channel does same thing in drive/dma/of-dma.c. Let me follow up with a patch that does it this way. I agree its cleaner. >> +} >> +} else { >> +for (; i < pdev->num_resources; i++) { >> +if ((pdev->resource[i].flags & IORESOURCE_DMA) && >> +(int)pdev->resource[i].start >= 0) { >> +ctlr = EDMA_CTLR(pdev->resource[i].start); >> +clear_bit(EDMA_CHAN_SLOT( >> + pdev->resource[i].start), >> + edma_cc[ctlr]->edma_unused); >> +} > > So there is very little in common between OF and non-OF versions of this > function. Why not have two different versions of this function for the > two cases? The OF version can reside under the CONFIG_OF conditional > already in use in the file. This will also save you the ugly line breaks > you had to resort to due to too deep indentation. Actually those line breaks are not necessary and wouldn't result in compilation errors. I was planning to drop them. I'll go ahead and split it out anyway, now that also the OF version of the function is going to be bit longer if we use the of_parse functions. Thanks for your review, Regards, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: EDMA: Fix clearing of unused list for DT DMA resources
On 07/29/2013 02:04 AM, Sekhar Nori wrote:
> On Sunday 28 July 2013 05:02 AM, Joel Fernandes wrote:
>> Hi Tony or Sekhar,
>>
>> If this patch looks ok, could you pick it up for the -rc cycle?
>>
>> It fixes DMA breakage after the merge window for devices whose DMA
>> resources are populated from the device tree instead of pdev.
>
> But which DT-enabled platform in the kernel is using EDMA? A grep for
> edma over arch/arm/boot/dts/* brings up nothing.
> Does this really have to go into the -rc cycle or can it wait till
> v3.12?

But the unused channel list is also populated for private EDMA callers like davinci-pcm, no? IIRC, the HWMOD data has also been removed for McASP, so EDMA for those may break.

Thanks,
-Joel
Re: [PATCH] dma: edma: add device_slave_caps() support
Hi Vinod, On 07/29/2013 04:45 AM, Vinod Koul wrote: > On Thu, Jul 25, 2013 at 12:53:51PM +0530, Vinod Koul wrote: >> On Wed, Jul 24, 2013 at 02:36:26PM -0500, Joel Fernandes wrote: >>>> Also another point worth considering is the approach Russell suggested, I >>>> havent >>>> gotten a chance to dig deeper but if I understood it correctly then >>>> programming >>>> the device_dma_parameters should be the right thing to do. Again I need to >>>> look >>>> deeper and esp wrt edma >>> >>> OK. I have some patches sitting in my tree too that I'm working on. With >>> that I don't need to know about maximum number of allowed segments and >>> can send along any number of segment. I will rework them and post them. >>> fwiw, I will also implement caps API incase like Lars did populating the >>> other fields though these will not be unused. >>> >>> For segment size, at this time I don't know any driver that uses it >>> other than davinci-pcm. For this reason the calculations can be done as >>> Lars suggested (for minimum of maximum). Do you know in advance if >>> you're going to amend to drop segment size if we go with what Russell >>> suggested, or are you going to leave the seg-size in the caps API anyway. >> I am just back and havent really done my work on this. Let me check and as I >> said if my understanding is right I would be inclined to remove these >> fields... > Okay so the max sg_len should be done by setting the device_dma_parameters. > > See the example in MXS DMA and MMC drivers Ah, I see those examples. Thanks. > > Now for second parameter why do you need that to be passed, I thought in edma > the > clients needs this value somehow to program the channel. It would be good that > if we can delink from this and let edma derive. Do you mean the burst width and device width parameters? On second thought, I am thinking do we really need to set this particular parameter if we can rearchitect the driver a bit. EDMA theoretically can transmit very large segment sizes. 
There are 3 internal counters (ACNT, BCNT, CCNT) that are 16-bit, total size is quite large (product of counters). EDMA driver however currently assumes that ACNT is device width and BCNT is max burst in units of device width. These parameters are again client configured so these have to be passed to the EDMA driver by configuration or passed through API to calculate maximum segment size. This is required because client may be unaware that CCNT is only 16 bit, so following calculation is being done in the SG segment size patch: +*edma_get_slave_sg_limits(struct dma_chan *chan, + enum dma_slave_buswidth addr_width, + u32 maxburst) +{ [..] + echan->sg_limits.max_seg_len = + (SZ_64K - 1) * addr_width * maxburst; The other down side of assigning ACNT device width and BCNT as burst size it seems to be wasteful of the range allowed by 16-bit width. I am thinking if we can break free from these assumptions, then we can transmit segments of literally any length and not have to set any segment size limits at all. I'll try to work on some patches to see if we can overcome segment size limits. > Also there should be no such thing as max_sg, the driver should be able to > perform any size of sg_list. Internally it can be breaking the dma > into multiple trasnactions. Sure, I just submitted a patch series that does exactly that: http://marc.info/?l=linux-omap=137510483230005=2 Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: EDMA: Fix clearing of unused list for DT DMA resources
On 07/29/2013 11:52 PM, Sekhar Nori wrote:
> On Tuesday 30 July 2013 09:23 AM, Joel Fernandes wrote:
>> On 07/29/2013 02:04 AM, Sekhar Nori wrote:
>>> On Sunday 28 July 2013 05:02 AM, Joel Fernandes wrote:
>>>> Hi Tony or Sekhar,
>>>>
>>>> If this patch looks ok, could you pick it up for the -rc cycle?
>>>>
>>>> It fixes DMA breakage after the merge window for devices whose DMA
>>>> resources are populated from the device tree instead of pdev.
>>>
>>> But which DT-enabled platform in the kernel is using EDMA? A grep for
>>> edma over arch/arm/boot/dts/* brings up nothing.
>>> Does this really have to go into the -rc cycle or can it wait till
>>> v3.12?
>>
>> But the unused channel list is also populated for private EDMA callers
>> like davinci-pcm, no? IIRC, the HWMOD data has also been removed for
>> McASP, so EDMA for those may break.
>
> Yes, but which DT-enabled platform has davinci-pcm working today?
> AFAICS, this bug affects DT-enabled platforms only.

Yes, you're right. I didn't realize nothing else is working yet. I guess it's fine then for the next kernel release.

Thanks,
-Joel
Re: [PATCH v2] ARM: EDMA: Fix clearing of unused list for DT DMA resources
On 07/30/2013 11:29 AM, Sekhar Nori wrote: > On 7/30/2013 9:17 AM, Joel Fernandes wrote: > >>>> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c >>>> index a432e6c..765d578 100644 >>>> --- a/arch/arm/common/edma.c >>>> +++ b/arch/arm/common/edma.c > >>>> + } else { >>>> + for (; i < pdev->num_resources; i++) { >>>> + if ((pdev->resource[i].flags & IORESOURCE_DMA) && >>>> + (int)pdev->resource[i].start >= 0) { >>>> + ctlr = EDMA_CTLR(pdev->resource[i].start); >>>> + clear_bit(EDMA_CHAN_SLOT( >>>> +pdev->resource[i].start), >>>> +edma_cc[ctlr]->edma_unused); >>>> + } >>> >>> So there is very little in common between OF and non-OF versions of this >>> function. Why not have two different versions of this function for the >>> two cases? The OF version can reside under the CONFIG_OF conditional >>> already in use in the file. This will also save you the ugly line breaks >>> you had to resort to due to too deep indentation. >> >> Actually those line breaks are not necessary and wouldn't result in >> compilation errors. I was planning to drop them. I'll go ahead and split >> it out anyway, now that also the OF version of the function is going to >> be bit longer if we use the of_parse functions. >> >> Thanks for your review, > > It turns out, I gave a bad idea. What I suggested will break the case of > non-DT boot with CONFIG_OF enabled. So what you had was fine. May be > just return from "if (dev->of_node)" so you don't need to have an else > block and can save on the indentation.> Ok, sure. I will go ahead and return from the if block. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
Hi Sekhar, On 07/30/2013 02:05 AM, Sekhar Nori wrote: > On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >> In an effort to move to using Scatter gather lists of any size with >> EDMA as discussed at [1] instead of placing limitations on the driver, >> we work through the limitations of the EDMAC hardware to find missed >> events and issue them. >> >> The sequence of events that require this are: >> >> For the scenario where MAX slots for an EDMA channel is 3: >> >> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null >> >> The above SG list will have to be DMA'd in 2 sets: >> >> (1) SG1 -> SG2 -> SG3 -> Null >> (2) SG4 -> SG5 -> SG6 -> Null >> >> After (1) is succesfully transferred, the events from the MMC controller >> donot stop coming and are missed by the time we have setup the transfer >> for (2). So here, we catch the events missed as an error condition and >> issue them manually. > > Are you sure there wont be any effect of these missed events on the > peripheral side. For example, wont McASP get into an underrun condition > when it encounters a null PaRAM set? Even UART has to transmit to a But it will not encounter null PaRAM set because McASP uses contiguous buffers for transfer which are not scattered across physical memory. This can be accomplished with an SG of size 1. For such SGs, this patch series leaves it linked Dummy and does not link to Null set. Null set is only used for SG lists that are > MAX_NR_SG in size such as those created for example by MMC and Crypto. > particular baud so I guess it cannot wait like the way MMC/SD can. Existing driver have to wait anyway if they hit MAX SG limit today. If they don't want to wait, they would have allocated a contiguous block of memory and DMA that in one stretch so they don't lose any events, and in such cases we are not linking to Null. > Also, wont this lead to under-utilization of the peripheral bandwith? 
> Meaning, MMC/SD is ready with data but cannot transfer because the DMA > is waiting to be set-up. But it is waiting anyway even today. Currently based on MAX segs, MMC driver/subsystem will make SG list of size max_segs. Between these sessions of creating such smaller SG-lists, if for some reason the MMC controller is sending events, these will be lost anyway. What will happen now with this patch series is we are simply accepting a bigger list than this, and handling all the max_segs stuff within the EDMA driver itself without outside world knowing. This is actually more efficient as for long transfers, we are not going back and forth much between the client and EDMA driver. > Did you consider a ping-pong scheme with say three PaRAM sets per > channel? That way you can keep a continuous transfer going on from the > peripheral over the complete SG list. Do you mean ping-pong scheme as used in the davinci-pcm driver today? This can be used only for buffers that are contiguous in memory, not those that are scattered across memory. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
On 07/30/2013 03:29 AM, Sekhar Nori wrote: > On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >> We certainly don't want error conditions to be cleared anywhere > > 'anywhere' is a really loaded term. > >> as this will make us 'forget' about missed events. We depend on >> knowing which events were missed in order to be able to reissue them. > >> This fixes a race condition where the EMR was being cleared >> by the transfer completion interrupt handler. >> >> Basically, what was happening was: >> >> Missed event >> | >> | >> V >> SG1-SG2-SG3-Null >> \ >> \__TC Interrupt (Almost same time as ARM is executing >> TC interrupt handler, an event got missed and also forgotten >> by clearing the EMR). > > Sorry, but I dont see how edma_stop() is coming into picture in the race > you describe? In edma_callback function, for the case of DMA_COMPLETE (Transfer completion interrupt), edma_stop() is called when all sets have been processed. This had the effect of clearing the EMR. This has 2 problems: 1. If error interrupt is also pending and TC interrupt clears the EMR. Due to this the ARM will execute the error interrupt even though the EMR is clear. As a result, the following if condition in dma_ccerr_handler will be true and IRQ_NONE is returned. if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) && (edma_read_array(ctlr, EDMA_EMR, 1) == 0) && (edma_read(ctlr, EDMA_QEMR) == 0) && (edma_read(ctlr, EDMA_CCERR) == 0)) return IRQ_NONE; If this happens enough number of times, IRQ subsystem disables the interrupt thinking its spurious which creates serious problems. 2. If the above if statement condition is removed, then EMR is 0 so the callback function will not be called in dma_ccerr_handler thus the event is forgotten, never triggered manually or never sets missed flag of the channel. So about the race: TC interrupt handler executing before the error interrupt handler can result in clearing the EMR and creates these problems. 
>> The EMR is ultimately being cleared by the Error interrupt
>> handler once it is handled, so we don't have to do it in edma_stop.
>
> This, I agree with. edma_clean_channel() is also there to re-initialize
> the channel, so doing it in edma_stop() certainly seems superfluous.

Sure.

Thanks,
-Joel
Re: [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
On 07/30/2013 12:18 AM, Sekhar Nori wrote: > On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >> Manual trigger for events missed as a result of splitting a >> scatter gather list and DMA'ing it in batches. Add a helper >> function to trigger a channel incase any such events are missed. >> >> Signed-off-by: Joel Fernandes >> --- >> arch/arm/common/edma.c | 21 + >> include/linux/platform_data/edma.h |2 ++ >> 2 files changed, 23 insertions(+) >> >> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c >> index 3567ba1..10995b2 100644 >> --- a/arch/arm/common/edma.c >> +++ b/arch/arm/common/edma.c >> @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel) >> } >> EXPORT_SYMBOL(edma_resume); >> >> +int edma_manual_trigger(unsigned channel) > > edma_trigger_channel() maybe? Brings consistency with > edma_alloc_channel() edma_free_channel() etc. Ok, sure. > >> +{ >> +unsigned ctlr; >> +int j; >> +unsigned int mask; >> + >> +ctlr = EDMA_CTLR(channel); >> +channel = EDMA_CHAN_SLOT(channel); >> +mask = BIT(channel & 0x1f); >> + >> +j = channel >> 5; >> + >> +/* EDMA channels without event association */ > > May be actually check for no-event association before you trigger in > software? You can do that by looking at unused channel list, no? But, we want to trigger whether there is event association or not in this function. For ex, MMC has event associated but still this function is used to trigger event for it. > >> +edma_shadow0_write_array(ctlr, SH_ESR, j, mask); > > edma_shadow0_write_array(ctlr, SH_ESR, channel >> 5, mask) is no less > readable, but I leave it to you. Sure that's more readable, will changed it to that. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
On 07/31/2013 04:35 AM, Sekhar Nori wrote: > On Wednesday 31 July 2013 10:35 AM, Joel Fernandes wrote: >> On 07/30/2013 03:29 AM, Sekhar Nori wrote: >>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >>>> We certainly don't want error conditions to be cleared anywhere >>> >>> 'anywhere' is a really loaded term. >>> >>>> as this will make us 'forget' about missed events. We depend on >>>> knowing which events were missed in order to be able to reissue them. >>> >>>> This fixes a race condition where the EMR was being cleared >>>> by the transfer completion interrupt handler. >>>> >>>> Basically, what was happening was: >>>> >>>> Missed event >>>> | >>>> | >>>> V >>>> SG1-SG2-SG3-Null >>>> \ >>>> \__TC Interrupt (Almost same time as ARM is executing >>>> TC interrupt handler, an event got missed and also forgotten >>>> by clearing the EMR). >>> >>> Sorry, but I dont see how edma_stop() is coming into picture in the race >>> you describe? >> >> In edma_callback function, for the case of DMA_COMPLETE (Transfer >> completion interrupt), edma_stop() is called when all sets have been >> processed. This had the effect of clearing the EMR. > > Ah, thanks. I was missing the fact that the race comes into picture only > when using the DMA engine driver. I guess that should be mentioned > somewhere since it is not immediately obvious. > > The patch looks good to me. So if you respin just this one with some > updated explanation based on what you wrote below, I will take it. Sure I'll do that. Also the trigger_channel patch, will you be taking that one too? 
I can send these 2 in a series as they touch arch/arm/common/edma.c.

Thanks,
-Joel

> Thanks,
> Sekhar
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
On 07/31/2013 04:18 AM, Sekhar Nori wrote: > On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote: >> Hi Sekhar, >> >> On 07/30/2013 02:05 AM, Sekhar Nori wrote: >>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >>>> In an effort to move to using Scatter gather lists of any size with >>>> EDMA as discussed at [1] instead of placing limitations on the driver, >>>> we work through the limitations of the EDMAC hardware to find missed >>>> events and issue them. >>>> >>>> The sequence of events that require this are: >>>> >>>> For the scenario where MAX slots for an EDMA channel is 3: >>>> >>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null >>>> >>>> The above SG list will have to be DMA'd in 2 sets: >>>> >>>> (1) SG1 -> SG2 -> SG3 -> Null >>>> (2) SG4 -> SG5 -> SG6 -> Null >>>> >>>> After (1) is succesfully transferred, the events from the MMC controller >>>> donot stop coming and are missed by the time we have setup the transfer >>>> for (2). So here, we catch the events missed as an error condition and >>>> issue them manually. >>> >>> Are you sure there wont be any effect of these missed events on the >>> peripheral side. For example, wont McASP get into an underrun condition >>> when it encounters a null PaRAM set? Even UART has to transmit to a >> >> But it will not encounter null PaRAM set because McASP uses contiguous >> buffers for transfer which are not scattered across physical memory. >> This can be accomplished with an SG of size 1. For such SGs, this patch >> series leaves it linked Dummy and does not link to Null set. Null set is >> only used for SG lists that are > MAX_NR_SG in size such as those >> created for example by MMC and Crypto. >> >>> particular baud so I guess it cannot wait like the way MMC/SD can. >> >> Existing driver have to wait anyway if they hit MAX SG limit today. 
If >> they don't want to wait, they would have allocated a contiguous block of >> memory and DMA that in one stretch so they don't lose any events, and in >> such cases we are not linking to Null. > > As long as DMA driver can advertize its MAX SG limit, peripherals can > always work around that by limiting the number of sync events they > generate so as to not having any of the events getting missed. With this > series, I am worried that EDMA drivers is advertizing that it can handle > any length SG list while not taking care of missing any events while > doing so. This will break the assumptions that driver writers make. This is already being done by some other DMA engine drivers ;). We can advertise more than we can handle at a time, that's the basis of this whole idea. I understand what you're saying but events are not something that have be serviced immediately, they can be queued etc and the actually transfer from the DMA controller can be delayed. As long as we don't miss the event we are fine which my series takes care off. So far I have tested this series on following modules in various configurations and have seen no issues: - Crypto AES - MMC/SD - SPI (128x160 display) >>> Also, wont this lead to under-utilization of the peripheral bandwith? >>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA >>> is waiting to be set-up. >> >> But it is waiting anyway even today. Currently based on MAX segs, MMC >> driver/subsystem will make SG list of size max_segs. Between these >> sessions of creating such smaller SG-lists, if for some reason the MMC >> controller is sending events, these will be lost anyway. > > But if MMC/SD driver knows how many events it should generate if it > knows the MAX SG limit. So there should not be any missed events in > current code. And I am not claiming that your solution is making matters > worse. But its not making it much better as well. 
This is not true for crypto, the events are not deasserted and crypto continues to send events. This is what led to the "don't trigger in Null" patch where I'm setting the missed flag to avoid recursion. >> This can be used only for buffers that are contiguous in memory, not >> those that are scattered across memory. > > I was hinting at using the linking facility of EDMA to achieve this. > Each PaRAM set has full 32-bit source and destination pointers so I see > no reason why non-contiguous case cannot be handled. > > Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are > typically 4 times the number of channels. In this
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
On 07/31/2013 09:27 PM, Joel Fernandes wrote: > On 07/31/2013 04:18 AM, Sekhar Nori wrote: >> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote: >>> Hi Sekhar, >>> >>> On 07/30/2013 02:05 AM, Sekhar Nori wrote: >>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >>>>> In an effort to move to using Scatter gather lists of any size with >>>>> EDMA as discussed at [1] instead of placing limitations on the driver, >>>>> we work through the limitations of the EDMAC hardware to find missed >>>>> events and issue them. >>>>> >>>>> The sequence of events that require this are: >>>>> >>>>> For the scenario where MAX slots for an EDMA channel is 3: >>>>> >>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null >>>>> >>>>> The above SG list will have to be DMA'd in 2 sets: >>>>> >>>>> (1) SG1 -> SG2 -> SG3 -> Null >>>>> (2) SG4 -> SG5 -> SG6 -> Null >>>>> >>>>> After (1) is succesfully transferred, the events from the MMC controller >>>>> donot stop coming and are missed by the time we have setup the transfer >>>>> for (2). So here, we catch the events missed as an error condition and >>>>> issue them manually. >>>> >>>> Are you sure there wont be any effect of these missed events on the >>>> peripheral side. For example, wont McASP get into an underrun condition >>>> when it encounters a null PaRAM set? Even UART has to transmit to a >>> >>> But it will not encounter null PaRAM set because McASP uses contiguous >>> buffers for transfer which are not scattered across physical memory. >>> This can be accomplished with an SG of size 1. For such SGs, this patch >>> series leaves it linked Dummy and does not link to Null set. Null set is >>> only used for SG lists that are > MAX_NR_SG in size such as those >>> created for example by MMC and Crypto. >>> >>>> particular baud so I guess it cannot wait like the way MMC/SD can. >>> >>> Existing driver have to wait anyway if they hit MAX SG limit today. 
If >>> they don't want to wait, they would have allocated a contiguous block of >>> memory and DMA'd that in one stretch so they don't lose any events, and in >>> such cases we are not linking to Null. >> >> As long as the DMA driver can advertise its MAX SG limit, peripherals can >> always work around that by limiting the number of sync events they >> generate so as to not have any of the events getting missed. With this >> series, I am worried that the EDMA driver is advertising that it can handle >> any length SG list while not taking care of missing any events while >> doing so. This will break the assumptions that driver writers make. Sorry, I just forgot to respond to "not taking care of missing any events while doing so". Can you clarify this? The DMA engine driver is taking care of missed events. Also, missing events doesn't result in feedback to the peripheral. The peripheral sends an event to the DMA controller and the event is missed. The peripheral doesn't know anything about what happened and is waiting for a transfer from the DMA controller. Thanks, -Joel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/9] dma: edma: Find missed events and issue them
On 07/31/2013 09:27 PM, Joel Fernandes wrote: > On 07/31/2013 04:18 AM, Sekhar Nori wrote: >> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote: >>> Hi Sekhar, >>> >>> On 07/30/2013 02:05 AM, Sekhar Nori wrote: >>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote: >>>>> In an effort to move to using scatter-gather lists of any size with >>>>> EDMA as discussed at [1] instead of placing limitations on the driver, >>>>> we work through the limitations of the EDMAC hardware to find missed >>>>> events and issue them. >>>>> >>>>> The sequence of events that requires this is: >>>>> >>>>> For the scenario where MAX slots for an EDMA channel is 3: >>>>> >>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null >>>>> >>>>> The above SG list will have to be DMA'd in 2 sets: >>>>> >>>>> (1) SG1 -> SG2 -> SG3 -> Null >>>>> (2) SG4 -> SG5 -> SG6 -> Null >>>>> >>>>> After (1) is successfully transferred, the events from the MMC controller >>>>> do not stop coming and are missed by the time we have set up the transfer >>>>> for (2). So here, we catch the missed events as an error condition and >>>>> issue them manually. >>>> >>>> Are you sure there won't be any effect of these missed events on the >>>> peripheral side. For example, won't McASP get into an underrun condition >>>> when it encounters a null PaRAM set? Even UART has to transmit at a >>> >>> But it will not encounter a null PaRAM set because McASP uses contiguous >>> buffers for transfer which are not scattered across physical memory. >>> This can be accomplished with an SG of size 1. For such SGs, this patch >>> series leaves it linked to the Dummy set and does not link to the Null set. The Null set is >>> only used for SG lists that are > MAX_NR_SG in size such as those >>> created for example by MMC and Crypto. >>> >>>> particular baud so I guess it cannot wait like the way MMC/SD can. >>> >>> Existing drivers have to wait anyway if they hit the MAX SG limit today. 
If >>> they don't want to wait, they would have allocated a contiguous block of >>> memory and DMA'd that in one stretch so they don't lose any events, and in >>> such cases we are not linking to Null. >> >> As long as the DMA driver can advertise its MAX SG limit, peripherals can >> always work around that by limiting the number of sync events they >> generate so as to not have any of the events getting missed. With this >> series, I am worried that the EDMA driver is advertising that it can handle >> any length SG list while not taking care of missing any events while >> doing so. This will break the assumptions that driver writers make. > > This is already being done by some other DMA engine drivers ;). We can > advertise more than we can handle at a time; that's the basis of this > whole idea. > > I understand what you're saying, but events are not something that have > to be serviced immediately; they can be queued etc. and the actual > transfer from the DMA controller can be delayed. As long as we don't > miss the event we are fine, which my series takes care of. > > So far I have tested this series on the following modules in various > configurations and have seen no issues: > - Crypto AES > - MMC/SD > - SPI (128x160 display) > >>>> Also, won't this lead to under-utilization of the peripheral bandwidth? >>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA >>>> is waiting to be set up. >>> >>> But it is waiting anyway even today. Currently, based on MAX segs, the MMC >>> driver/subsystem will make an SG list of size max_segs. Between these >>> sessions of creating such smaller SG lists, if for some reason the MMC >>> controller is sending events, these will be lost anyway. >> >> But the MMC/SD driver knows how many events it should generate if it >> knows the MAX SG limit. So there should not be any missed events in >> current code. And I am not claiming that your solution is making matters >> worse. But it's not making it much better either. 
> > This is not true for crypto; the events are not deasserted and crypto > continues to send events. This is what led to the "don't trigger in > Null" patch where I'm setting the missed flag to avoid recursion. > >>> This can be used only for buffers that are contiguous
Re: [PATCH] dma: edma: add device_slave_caps() support
On 07/24/2013 03:03 AM, Lars-Peter Clausen wrote: > On 07/23/2013 06:43 PM, Joel Fernandes wrote: >> Implement device_slave_caps(). EDMA has a limited number of slots. >> Slave drivers such as omap_hsmmc will query the driver to make >> sure they don't pass in more than this many scatter segments. >> >> Signed-off-by: Joel Fernandes >> --- >> Vinod, or Dan- If this patch looks ok, can you please merge it in for the >> -rc cycle? This patch is required to fix MMC support on AM33xx. This >> patch is blocking 3 other patches which fix various MMC things. Thanks! >> >> Notes: >> (1) this approach is temporary and only for the -rc cycle to fix MMC on >> AM335x. It will be replaced by the RFC series in future kernels: >> http://www.spinics.net/lists/arm-kernel/msg260094.html >> >> (2) Patch depends on Vinod's patch at: >> http://permalink.gmane.org/gmane.linux.kernel/1525112 >> >> drivers/dma/edma.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c >> index 7222cbe..81d5429 100644 >> --- a/drivers/dma/edma.c >> +++ b/drivers/dma/edma.c >> @@ -517,6 +517,14 @@ static void edma_issue_pending(struct dma_chan *chan) >> spin_unlock_irqrestore(&echan->vchan.lock, flags); >> } >> >> +static inline int edma_slave_caps(struct dma_chan *chan, >> +struct dma_slave_caps *caps) >> +{ >> +caps->max_sg_nr = MAX_NR_SG; > > Hm, what about the other fields? > Other fields are unused; the max segment size is supposed to be calculated "given" the address width and burst size. Since these can't be provided to get_caps, I have left it out for now. See: https://lkml.org/lkml/2013/3/6/464 Even if it did, the "segment size" itself is unused in the MMC driver that this is supposed to fix, unlike the "number of segments" which I'm populating above. If you know of a better way to populate max segment size, let me know. 
Thanks, -Joel
Re: [PATCH] dma: edma: add device_slave_caps() support
On 07/24/2013 03:40 AM, Lars-Peter Clausen wrote: > On 07/24/2013 10:28 AM, Fernandes, Joel wrote: >> >> On Jul 24, 2013, at 3:23 AM, "Lars-Peter Clausen" wrote: >> >>> On 07/24/2013 10:11 AM, Joel Fernandes wrote: >>>> On 07/24/2013 03:03 AM, Lars-Peter Clausen wrote: >>>>> On 07/23/2013 06:43 PM, Joel Fernandes wrote: >>>>>> Implement device_slave_caps(). EDMA has a limited number of slots. >>>>>> Slave drivers such as omap_hsmmc will query the driver to make >>>>>> sure they don't pass in more than this many scatter segments. >>>>>> >>>>>> Signed-off-by: Joel Fernandes >>>>>> --- >>>>>> Vinod, or Dan- If this patch looks ok, can you please merge it in for the >>>>>> -rc cycle? This patch is required to fix MMC support on AM33xx. This >>>>>> patch is blocking 3 other patches which fix various MMC things. Thanks! >>>>>> >>>>>> Notes: >>>>>> (1) this approach is temporary and only for the -rc cycle to fix MMC on >>>>>> AM335x. It will be replaced by the RFC series in future kernels: >>>>>> http://www.spinics.net/lists/arm-kernel/msg260094.html >>>>>> >>>>>> (2) Patch depends on Vinod's patch at: >>>>>> http://permalink.gmane.org/gmane.linux.kernel/1525112 >>>>>> >>>>>> drivers/dma/edma.c | 9 +++++++++ >>>>>> 1 file changed, 9 insertions(+) >>>>>> >>>>>> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c >>>>>> index 7222cbe..81d5429 100644 >>>>>> --- a/drivers/dma/edma.c >>>>>> +++ b/drivers/dma/edma.c >>>>>> @@ -517,6 +517,14 @@ static void edma_issue_pending(struct dma_chan >>>>>> *chan) >>>>>> spin_unlock_irqrestore(&echan->vchan.lock, flags); >>>>>> } >>>>>> >>>>>> +static inline int edma_slave_caps(struct dma_chan *chan, >>>>>> +struct dma_slave_caps *caps) >>>>>> +{ >>>>>> +caps->max_sg_nr = MAX_NR_SG; >>>>> >>>>> Hm, what about the other fields? >>>> >>>> Other fields are unused; the max segment size is supposed to be >>>> calculated "given" the address width and burst size. Since these >>>> can't be provided to get_caps, I have left it out for now. 
>>>> See: https://lkml.org/lkml/2013/3/6/464 >>> >>> The PL330 driver is similar in this regard; the maximum segment size also >>> depends on address width and burst width. What I did for the get_slave_caps >>> implementation is to set it to the minimum maximum size. E.g. in your case >>> that should be SZ_64K - 1 (burst size and addr width both set to 1). >> >> So you're setting max to the minimum maximum size? Isn't that like telling the >> driver that its segments can't be bigger than this... Unless I'm missing >> something.. > > Yes. This is a limitation of the current slave_caps API. The maximum needs > to be the maximum for all possible configurations. A specific configuration > may allow a larger maximum. So we may have to extend the API to be able to > query the limits for a certain configuration. Not sure what the best way > would be to do that: either adding a config parameter to get_slave_caps or > breaking it into two functions like you proposed, one for the static > capabilities and one for the SG limits. I am OK with either approach as long as a decision can be made quickly by the maintainers. Right now a lot of back and forth has happened and 3 different versions of the same thing have been posted since January. Since this is such a trivial change, it doesn't make sense to spend so much time on it, IMO. The sad part is that though this change is trivial, other drivers such as MMC are broken and cannot be enabled due to this. We cannot afford to leave them broken. If Vinod is not available, can Dan please respond on how to proceed on this? We really need this trivial change to go into this -rc cycle and not delay it by another kernel release. Thank you. -Joel
Re: [PATCH] dma: edma: add device_slave_caps() support
On 07/24/2013 01:33 PM, Vinod Koul wrote: > On Wed, Jul 24, 2013 at 01:55:24PM -0500, Joel Fernandes wrote: >> On 07/24/2013 03:40 AM, Lars-Peter Clausen wrote: >>> On 07/24/2013 10:28 AM, Fernandes, Joel wrote: >>>>>>>> Vinod, or Dan- If this patch looks ok, can you please merge it in for the >>>>>>>> -rc cycle? This patch is required to fix MMC support on AM33xx. This >>>>>>>> patch is blocking 3 other patches which fix various MMC things. Thanks! > Sorry, I was travelling so not really on top of email for the last few days... > > Now I am not sure we can send this to -rc. OK. > If it were just this one, we could have pushed it, but it also depends on a new > API > which is sitting in my -next. I am not super comfortable sending that to Linus > for the -rcs. Sure, he would scream at me! OK. > Also, another point worth considering is the approach Russell suggested. I haven't > gotten a chance to dig deeper, but if I understood it correctly then > programming > the device_dma_parameters should be the right thing to do. Again I need to > look > deeper, esp. wrt EDMA OK. I have some patches sitting in my tree too that I'm working on. With those I don't need to know about the maximum number of allowed segments and can send along any number of segments. I will rework them and post them. FWIW, I will also implement the caps API like Lars did, populating the other fields, though these will be unused. For segment size, at this time I don't know of any driver that uses it other than davinci-pcm. For this reason the calculations can be done as Lars suggested (for the minimum maximum). Do you know in advance if you're going to amend the API to drop segment size if we go with what Russell suggested, or are you going to leave the seg-size in the caps API anyway? 
Thanks, -Joel
Re: Regression 3.11-rc1: omap4panda: no usb and consequently no ethernet
Hi Arend, On 07/25/2013 08:06 AM, Arend van Spriel wrote: > On 07/18/2013 02:42 PM, Roger Quadros wrote: >> On 07/18/2013 03:38 PM, Arend van Spriel wrote: >>> On 07/18/2013 01:30 PM, Roger Quadros wrote: On 07/18/2013 02:24 PM, Arend van Spriel wrote: > On 07/18/2013 01:18 PM, Roger Quadros wrote: >> Hi Arend, >> >> On 07/18/2013 11:41 AM, Arend van Spriel wrote: >>> Hi Tony, >>> >>> >>> We are using the panda board (es variant) for testing our SDIO >>> based chips. For this we have an adapter card connection to >>> expansion connector A. As this adapter is not publicly available >>> we had internally patched board-omap4panda.c. Also we follow the >>> -rc releases and use TFTP to boot the kernel image which requires >>> USB. >>> >>> Moving to 3.11-rc1 I found that the mentioned board file was >>> removed by your commit: >>> >>> commit b42b918194c4791510ac049e3d507169a7de8544 >>> Author: Tony Lindgren >>> Date: Thu May 30 12:53:05 2013 -0700 >>> >>>ARM: OMAP2+: Remove board-omap4panda.c >>> >>> As our patches on that file are internal I do not hold it against >>> you. This is no regression and we need to fix it. >>> >>> So my first step was to follow the recipe given in that commit. >>> Beside that I noticed a thread about USB issue on LKML so I also >>> applied the following commit: >>> >>> commit 352f573e59050c7a604c35c58b4bbfc51edebbee >>> Author: Roger Quadros >>> Date: Tue Jun 18 19:04:47 2013 +0300 >>> >>>ARM: OMAP2+: Provide alias to USB PHY clock >>> >>> The attached kernel log seems to suggest that the device tree is >>> picked up by the kernel, but the USB does not seem very active. >>> No ethernet connectivity. This does seem a regression to me. Is >>> there some other patch that I need to get it going again? >>> >> >> I tried with your config and 3.11-rc1 kernel with the above commit >> on top and ethernet seems to >> work for me. My boot log is attached. 
>> >> Are you sure that you are building the DTB for panda-es and the >> bootloader is picking up the right file and >> not an outdated one? > I appended the DTB to the kernel image thus avoiding the need to > update u-boot (at least that is what I understood from Tony's commit). > I understand this can be a little tedious at first. This is my u-boot boot.txt: fatload mmc 0:1 0x825f omap4-panda-es.dtb fatload mmc 0:1 0x8030 uImage set fdt_high 0x setenv bootargs console=ttyO2,115200n8 mem=1G@0x8000 root=/dev/mmcblk0p2 rootwait bootm 0x8030 - 0x825f You need to generate boot.scr from the above boot.txt and place it in the SD card boot partition: mkimage -A arm -T script -C none -n "Boot Image" -d boot.txt boot.scr And of course copy the omap4-panda-es.dtb to the SD card boot partition. Hope this helps. >>> >>> Thanks for sharing this. >>> >> Why is the version of the SPL loader and u-boot different in your log? >> You need to use the MLO file generated by the u-boot build along >> with the uImage. >> >> Just to be sure we are on the same setup, could you please try with the >> latest u-boot release (2013-04). Thanks. > I recall having difficulty with TFTP boot using a 2013 u-boot > release, but that hurdle is for later. I will try. > OK. We can figure this out as well. >>> >>> I tried with the same SPL and u-boot version, but that did not work out. >>> So I moved to v2013.04 and the log looks better. I was especially >>> interested in this: >>> >>> [2.807434] usb 1-1.1: new high-speed USB device number 3 using >>> ehci-omap >>> [2.932495] usb 1-1.1: New USB device found, idVendor=0424, >>> idProduct=ec00 >>> [2.932495] usb 1-1.1: New USB device strings: Mfr=0, Product=0, >>> SerialNumber=0 >>> [2.958770] smsc95xx v1.0.4 >>> Starting logging: OK >>> Initializing random number generator... [3.045806] smsc95xx >>> 1-1.1:1.0 eth0 >> >> Cool! :). >> >> FYI. I also tested tftpboot from u-boot and NFS root file system and >> it works fine. > > Hi Roger, > > Can I get back on this topic. 
When USB and ethernet was working for me > as stated above, I was not doing tftpboot. When I use tftpboot the > images are obtained from the tftp server, but after kernel has started > there is nothing in /sys/bus/usb/devices/. I quickly tried 3.11-rc2 + Roger's USB PHY clock patch on omap4-panda-es and enabled following USB options: CONFIG_USB_MUSB_HDRC=y CONFIG_USB_PHY=y CONFIG_OMAP_USB2=y CONFIG_OMAP_CONTROL_USB=y CONFIG_USB_MUSB_OMAP2PLUS=y CONFIG_MFD_OMAP_USB_HOST=y CONFIG_USB_EHCI_HCD=y CONFIG_USB_OHCI_HCD=y CONFIG_USB_EHCI_HCD_OMAP=y With this I see both USB ports, and ethernet: bash-4.2# ifconfig eth0 eth0 Link encap:Ethernet HWaddr
Re: Regression 3.11-rc1: omap4panda: no usb and consequently no ethernet
On 07/25/2013 09:49 PM, Joel Fernandes wrote: [..] >> Can I get back on this topic. When USB and ethernet was working for me >> as stated above, I was not doing tftpboot. When I use tftpboot the >> images are obtained from the tftp server, but after the kernel has started >> there is nothing in /sys/bus/usb/devices/. > > I quickly tried 3.11-rc2 + Roger's USB PHY clock patch on omap4-panda-es > and enabled the following USB options: > > CONFIG_USB_MUSB_HDRC=y > CONFIG_USB_PHY=y > CONFIG_OMAP_USB2=y > CONFIG_OMAP_CONTROL_USB=y > CONFIG_USB_MUSB_OMAP2PLUS=y > CONFIG_MFD_OMAP_USB_HOST=y > CONFIG_USB_EHCI_HCD=y > CONFIG_USB_OHCI_HCD=y > CONFIG_USB_EHCI_HCD_OMAP=y Sorry, I missed saying I also had to set CONFIG_NOP_USB_XCEIV=y Thanks, -Joel