Re: [PATCH 2/2] uacce: add uacce module
On Mon, Aug 26, 2019 at 06:29:10AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Aug 26, 2019 at 12:10:42PM +0800, Kenneth Lee wrote:
> > On Wed, Aug 21, 2019 at 09:05:42AM -0700, Greg Kroah-Hartman wrote:
> > > On Wed, Aug 21, 2019 at 10:30:22PM +0800, zhangfei wrote:
> > > >
> > > > On 2019/8/21 5:17 PM, Greg Kroah-Hartman wrote:
> > > > > On Wed, Aug 21, 2019 at 03:21:18PM +0800, zhangfei@foxmail.com wrote:
> > > > > > Hi, Greg
> > > > > >
> > > > > > On 2019/8/21 12:59 AM, Greg Kroah-Hartman wrote:
> > > > > > > On Tue, Aug 20, 2019 at 09:08:55PM +0800, zhangfei wrote:
> > > > > > > > On 2019/8/15 10:13 PM, Greg Kroah-Hartman wrote:
> > > > > > > > > On Wed, Aug 14, 2019 at 05:34:25PM +0800, Zhangfei Gao wrote:
> > > > > > > > > > +int uacce_register(struct uacce *uacce)
> > > > > > > > > > +{
> > > > > > > > > > +	int ret;
> > > > > > > > > > +
> > > > > > > > > > +	if (!uacce->pdev) {
> > > > > > > > > > +		pr_debug("uacce parent device not set\n");
> > > > > > > > > > +		return -ENODEV;
> > > > > > > > > > +	}
> > > > > > > > > > +
> > > > > > > > > > +	if (uacce->flags & UACCE_DEV_NOIOMMU) {
> > > > > > > > > > +		add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);
> > > > > > > > > > +		dev_warn(uacce->pdev,
> > > > > > > > > > +"Register to noiommu mode, which export kernel data to user space and may vulnerable to attack");
> > > > > > > > > > +	}
> > > > > > > > > THat is odd, why even offer this feature then if it is a major
> > > > > > > > > issue?
> > > > > > > > UACCE_DEV_NOIOMMU maybe confusing here.
> > > > > > > >
> > > > > > > > In this mode, app use ioctl to get dma_handle from dma_alloc_coherent.
> > > > > > > That's odd, why not use the other default apis to do that?
> > > > > > >
> > > > > > > > It does not matter iommu is enabled or not.
> > > > > > > > In case iommu is disabled, it maybe dangerous to kernel, so we
> > > > > > > > added warning here, is it required?
> > > > > > > You should use the other documentated apis for this, don't create your
> > > > > > > own.
> > > > > > I am sorry, not understand here.
> > > > > > Do you mean there is a standard ioctl or standard api in user space, it can
> > > > > > get dma_handle from dma_alloc_coherent from kernel?
> > > > > There should be a standard way to get such a handle from userspace
> > > > > today. Isn't that what the ion interface does? DRM also does this, as
> > > > > does UIO I think.
> > > > Thanks Greg,
> > > > Still not find it, will do more search.
> > > > But this may introduce dependency in our lib, like depend on ion?
> > > > > Do you have a spec somewhere that shows exactly what you are trying to
> > > > > do here, along with example userspace
Re: [PATCH 0/2] A General Accelerator Framework, WarpDrive
On Thu, Aug 15, 2019 at 01:04:24PM -0400, Jerome Glisse wrote:
> On Wed, Aug 14, 2019 at 05:34:23PM +0800, Zhangfei Gao wrote:
> > *WarpDrive* is a general accelerator framework for the user application to
> > access the hardware without going through the kernel in data path.
> >
> > WarpDrive is the name for the whole framework. The component in kernel
> > is called uacce, meaning "Unified/User-space-access-intended Accelerator
> > Framework". It makes use of the capability of IOMMU to maintain a
> > unified virtual address space between the hardware and the process.
> >
> > WarpDrive is intended to be used with Jean Philippe Brucker's SVA
> > patchset[1], which enables IO side page fault and PASID support.
> > We have keep verifying with Jean's sva/current [2]
> > We also keep verifying with Eric's SMMUv3 Nested Stage patch [3]
> >
> > This series and related zip & qm driver as well as dummy driver for qemu
> > test:
> > https://github.com/Linaro/linux-kernel-warpdrive/tree/5.3-rc1-warpdrive-v1
> > zip driver already been upstreamed.
> > zip supporting uacce will be the next step.
> >
> > The library and user application:
> > https://github.com/Linaro/warpdrive/tree/wdprd-v1-current
>
> Do we want a new framework ? I think that is the first question that
> should be answer here. Accelerator are in many forms and so far they
> never have been enough commonality to create a framework, even GPUs
> with the drm is an example of that, drm only offer share framework
> for the modesetting part of the GPU (as thankfully monitor connector
> are not specific to GPU brands :))
>
> FPGA is another example the only common code expose to userspace is
> about bitstream management AFAIK.
>
> I would argue that a framework should only be created once there is
> enough devices with same userspace API. Meanwhile you can provide
> in kernel helper that allow driver to expose same API. If after a
> while we have enough device driver which all use that same in kernel
> helpers API then it will a good time to introduce a new framework.
> Meanwhile this will allow individual device driver to tinker with
> their API and maybe get to something useful to more devices in the
> end.
>
> Note that what i propose also allow userspace code sharing for all
> driver that use the same in kernel helper.
>
> Cheers,
> Jérôme

Hi, Jerome,

I have explained the idea here: https://zhuanlan.zhihu.com/p/79680889.
We think this is a common requirement for everybody. I hope this helps the
discussion.

Thanks
--
-Kenneth(Hisilicon)
Re: [PATCH 2/2] uacce: add uacce module
On Wed, Aug 21, 2019 at 09:05:42AM -0700, Greg Kroah-Hartman wrote:
> On Wed, Aug 21, 2019 at 10:30:22PM +0800, zhangfei wrote:
> >
> >
> > On 2019/8/21 5:17 PM, Greg Kroah-Hartman wrote:
> > > On Wed, Aug 21, 2019 at 03:21:18PM +0800, zhangfei@foxmail.com wrote:
> > > > Hi, Greg
> > > >
> > > > On 2019/8/21 12:59 AM, Greg Kroah-Hartman wrote:
> > > > > On Tue, Aug 20, 2019 at 09:08:55PM +0800, zhangfei wrote:
> > > > > > On 2019/8/15 10:13 PM, Greg Kroah-Hartman wrote:
> > > > > > > On Wed, Aug 14, 2019 at 05:34:25PM +0800, Zhangfei Gao wrote:
> > > > > > > > +int uacce_register(struct uacce *uacce)
> > > > > > > > +{
> > > > > > > > +	int ret;
> > > > > > > > +
> > > > > > > > +	if (!uacce->pdev) {
> > > > > > > > +		pr_debug("uacce parent device not set\n");
> > > > > > > > +		return -ENODEV;
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	if (uacce->flags & UACCE_DEV_NOIOMMU) {
> > > > > > > > +		add_taint(TAINT_CRAP, LOCKDEP_STILL_OK);
> > > > > > > > +		dev_warn(uacce->pdev,
> > > > > > > > +"Register to noiommu mode, which export kernel data to user space and may vulnerable to attack");
> > > > > > > > +	}
> > > > > > > THat is odd, why even offer this feature then if it is a major
> > > > > > > issue?
> > > > > > UACCE_DEV_NOIOMMU maybe confusing here.
> > > > > >
> > > > > > In this mode, app use ioctl to get dma_handle from dma_alloc_coherent.
> > > > > That's odd, why not use the other default apis to do that?
> > > > >
> > > > > > It does not matter iommu is enabled or not.
> > > > > > In case iommu is disabled, it maybe dangerous to kernel, so we
> > > > > > added warning here, is it required?
> > > > > You should use the other documentated apis for this, don't create your
> > > > > own.
> > > > I am sorry, not understand here.
> > > > Do you mean there is a standard ioctl or standard api in user space, it can
> > > > get dma_handle from dma_alloc_coherent from kernel?
> > > There should be a standard way to get such a handle from userspace
> > > today. Isn't that what the ion interface does? DRM also does this, as
> > > does UIO I think.
> > Thanks Greg,
> > Still not find it, will do more search.
> > But this may introduce dependency in our lib, like depend on ion?
> > > Do you have a spec somewhere that shows exactly what you are trying to
> > > do here, along with example userspace code? It's hard to determine it
> > > given you only have one "half" of the code here and no users of the apis
> > > you are creating.
> > >
> > The purpose is doing dma in user space.
>
> Oh no, please no. Are you _SURE_ you want to do this?
>
> Again, look at how ION does this and how the DMAbuff stuff is replacing
> it. Use that api please instead, otherwise you will get it wrong and we
> don't want to duplicate efforts.
>
> thanks,
>
> greg k-h

Dear Greg,

I wrote a blog post explaining the intention of WarpDrive here:
https://zhuanlan.zhihu.com/p/79680889.

Sharing data is not our intention; sharing addresses is. NOIOMMU mode is just
a temporary solution that lets hardware whose users are not concerned about
the security issue try WarpDrive as a first step. In embedded scenarios, some
users do not care much about this. We saw that VFIO uses the same model, so we
wanted to try it as well. If you insist that this is risky, we can remove it.

Thanks.
--
-Kenneth(Hisilicon)
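Greg's objection and Kenneth's reply both turn on who translates the addresses a device is given. The toy model below is only an illustration under stated assumptions, not uacce code: `struct toy_iommu`, `toy_iommu_map()` and `toy_dma_translate()` are hypothetical names, pages are 4 KiB, and the "IOMMU" is a flat table. With translation enabled, the device can reach only pages mapped for the owning process and any other address faults; with translation disabled (the situation `UACCE_DEV_NOIOMMU` covers), whatever address user space writes into a request is used directly as a physical address, which is the risk the quoted `uacce_register()` flags by tainting the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model only (hypothetical names): a device issuing DMA to 4 KiB pages.
 * With an IOMMU, every device address goes through a per-process table;
 * without one, the address is used as a raw physical address.
 */
#define PAGE_SHIFT 12
#define TOY_PAGES  8            /* size of the toy I/O page table */
#define NO_MAPPING UINT64_MAX

struct toy_iommu {
	bool     enabled;           /* false models noiommu-style operation */
	uint64_t pfn[TOY_PAGES];    /* I/O page number -> physical frame, or NO_MAPPING */
};

static void toy_iommu_init(struct toy_iommu *mmu, bool enabled)
{
	mmu->enabled = enabled;
	for (size_t i = 0; i < TOY_PAGES; i++)
		mmu->pfn[i] = NO_MAPPING;
}

/* Map one I/O page to a physical frame on behalf of the owning process. */
static void toy_iommu_map(struct toy_iommu *mmu, uint64_t iopage, uint64_t pfn)
{
	assert(iopage < TOY_PAGES);
	mmu->pfn[iopage] = pfn;
}

/*
 * Translate a device-visible address. Returns true and stores the physical
 * address on success; returns false where a real IOMMU would raise a fault.
 */
static bool toy_dma_translate(const struct toy_iommu *mmu, uint64_t addr,
			      uint64_t *phys)
{
	uint64_t page = addr >> PAGE_SHIFT;
	uint64_t off = addr & ((1ULL << PAGE_SHIFT) - 1);

	if (!mmu->enabled) {
		/* No isolation: the address user space programmed is used as-is. */
		*phys = addr;
		return true;
	}
	if (page >= TOY_PAGES || mmu->pfn[page] == NO_MAPPING)
		return false;
	*phys = (mmu->pfn[page] << PAGE_SHIFT) | off;
	return true;
}
```

In this sketch, mapping I/O page 1 to frame 0x42 makes address 0x1010 resolve to 0x42010 while 0x2000 faults; after re-initialising with translation disabled, an arbitrary address such as 0xdead000 passes straight through.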
Re: [PATCH/RFC 0/5] HW accel subsystem
On 2019/2/1 6:07 PM, Greg Kroah-Hartman wrote:
> On Fri, Feb 01, 2019 at 05:10:40PM +0800, Kenneth Lee wrote:
> > After the RFCv2 was sent to the lkml, we do not get much feedback. But the
> > Infini-band guys said they did not like it. They think the solution is
> > re-invention of ib-verbs.
>
> No one needs to re-invent a monstrosity that is ib-verbs. If anything,
> that is a model that should never be recreated again, showing that we
> can learn from past mistakes :)
>
> > But we do not think so. ib-verbs maintains semantics of "REMOTE memory". But
> > UACCE maintains semantics of "LOCAL memory". We don't need to send, or sync
> > memory with other parties. We share those memory with all processes who share
> > the local bus.
>
> I agree, don't try to duplicate the mess that people moved away from
> (hint, everyone sane wraps ib-verbs in another model that can actually
> be used and understood...)
>
> > But we know we need more "complete" solution to let people understand and
> > accept our idea. So now we are working on it with our Compression and RSA
> > accelerator on Hi1620 Server SoC. We are also planning to port our AI framework
> > on it.
> >
> > Do you think we can cooperate to create an framework in Linux together? Please
> > feel free to ask for more information. We are happy to answer it.
>
> Sure, that sounds like a great goal!

Thank you very much for your encouragement :)

Kenneth Lee

> thanks,
>
> greg k-h
Re: [PATCH/RFC 0/5] HW accel subsystem
On Fri, Jan 25, 2019 at 10:16:11AM -0800, Olof Johansson wrote:
> Per discussion in on the Habana Labs driver submission
> (https://lore.kernel.org/lkml/2019012357.31477-1-oded.gab...@gmail.com/),
> there seems to be time to create a separate subsystem for hw accellerators
> instead of letting them proliferate around the tree (and/or in misc).
>
> There's difference in opinion on how stringent the requirements are for
> a fully open stack for these kind of drivers. I've documented the middle
> road approach in the first patch (requiring some sort of open low-level
> userspace for the kernel interaction, and a way to use/test it).
>
> Comments and suggestions for better approaches are definitely welcome.

Dear Olof,

How are you? Let me introduce myself. My name is Kenneth Lee, and I work for
Hisilicon. Our company provides server, AI, networking and terminal SoCs to
the market.

We started building an accelerator framework a year ago, and we are now
working on it in this branch (the documentation is in the
Documentation/warpdrive directory):
https://github.com/Kenneth-Lee/linux-kernel-warpdrive/tree/wdprd-v1

The user space framework is here:
https://github.com/Kenneth-Lee/warpdrive/tree/wdprd-v1

At the very beginning we tried to build it on VFIO. RFCv1 is here:
https://lwn.net/Articles/763990/

But VFIO does not seem to fit. There are two major issues:

1. The VFIO framework requires the resource to be split into devices before
   it is used. That is not how an accelerator works: an accelerator is
   another CPU, shared by everyone else.

2. The way VFIO pins memory in place has a flaw. In the current kernel, if
   you fork a child process after pinning the DMA memory, you may lose the
   physical pages. (You can find more detail in the threads.)

So in RFCv2 we built the solution directly on the IOMMU. We call the solution
WarpDrive, and the kernel module is called uacce. Our assumptions are:

1. Most users of the accelerator are in user space.

2. An accelerator is always another heterogeneous processor. It waits for
   and processes the workload sent from the CPU.

3. The data structures on the CPU side may be complex. It is not good to wrap
   the data and send it to the hardware again and again. The better way is
   to keep the data in place, give a pointer to the accelerator, and let it
   finish the job.

So we create a pipe (we call it a queue) directly between the user process
and the hardware. It is presented to user space as a file. The user process
mmaps the queue file to access the hardware's MMIO space, shared memory and
so on. With the IOMMU, we can share the whole process address space, or part
of it, with the hardware. This makes the software solution simpler.

After RFCv2 was sent to LKML, we did not get much feedback. But the
InfiniBand people said they did not like it. They think the solution is a
re-invention of ib-verbs. We do not think so: ib-verbs maintains the
semantics of "REMOTE memory", while UACCE maintains the semantics of "LOCAL
memory". We don't need to send or sync memory with other parties. We share
that memory with all processes that share the local bus.

But we know we need a more "complete" solution to help people understand and
accept our idea. So we are now working on it with our compression and RSA
accelerators on the Hi1620 server SoC. We are also planning to port our AI
framework to it.

Do you think we can cooperate to create a framework in Linux together? Please
feel free to ask for more information; we are happy to answer.

Cheers
--
-Kenneth(Hisilicon)
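Assumption 3 and the queue description can be made concrete with a toy model. It is a sketch under stated assumptions, not the WarpDrive or uacce interface: the descriptor layout, `toy_submit()` and `toy_accel_run()` are hypothetical, and the "accelerator" is an ordinary function in the same address space, standing in for hardware that shares the process address space through the IOMMU. What it shows is that a request carries a pointer to data the process leaves in place, not a copy of that data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model (hypothetical names) of a user-space queue whose descriptors
 * point into the caller's own data, so nothing is copied. The consumer is a
 * plain function standing in for a device that shares the address space.
 */
#define TOY_QDEPTH 4   /* must be a power of two for the index mask below */

struct toy_desc {
	const uint8_t *src;     /* data left in place by the process */
	size_t         len;
	uint32_t       result;
	int            done;
};

struct toy_queue {
	struct toy_desc ring[TOY_QDEPTH];
	unsigned int    head;   /* next slot the process fills */
	unsigned int    tail;   /* next slot the accelerator consumes */
};

/* Producer side: post a pointer, not a copy. Returns -1 if the ring is full. */
static int toy_submit(struct toy_queue *q, const void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (q->head - q->tail == TOY_QDEPTH)
		return -1;
	struct toy_desc *d = &q->ring[q->head & (TOY_QDEPTH - 1)];
	d->src = buf;
	d->len = len;
	d->done = 0;
	q->head++;
	return 0;
}

/* Consumer side: the stand-in accelerator reads through the shared pointer. */
static void toy_accel_run(struct toy_queue *q)
{
	while (q->tail != q->head) {
		struct toy_desc *d = &q->ring[q->tail & (TOY_QDEPTH - 1)];
		uint32_t sum = 0;

		for (size_t i = 0; i < d->len; i++)
			sum += d->src[i];   /* a trivial checksum as the "job" */
		d->result = sum;
		d->done = 1;
		q->tail++;
	}
}
```

A real queue would presumably live in the memory obtained by mmap()ing the queue file, and the device rather than a C function would advance the consumer index; the doorbell writes and memory barriers real hardware needs are omitted here.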
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 08:29:39PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happen in ODP. It
> > > > invalidates the page table in device upon mmu_notifier call back. But this cannot
> > > > solve the COW problem: if the user process A share a page P with device, and A
> > > > forks a new process B, and it continue to write to the page. By COW, the
> > > > process B will keep the page P, while A will get a new page P'. But you have
> > > > no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mm_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implemenation has nothing to
> > do with mm_notifer.
>
> Well, that sure sounds like it would be a bug in mmu_notifiers..

Yes, it can be taken that way :) But it is going to be a tough bug.

>
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.

As Jean has explained, his solution is based on page table sharing. I think
ODP should also consider this new feature.

>
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it support
> > > > SVM/SVA. Everything will be fine just like ODP implicit mode. And you don't need
> > > > to write any code for that. Because it has been done by IOMMU framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know if that is different is the new-fangled CAPI
> > > stuff
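The COW scenario in this exchange can be reduced to a toy model that shows why it matters whether the device hears about the new page. This is an illustration, not kernel code, and it does not settle whether the real fault path fires mmu_notifiers: `struct toy_sys`, `toy_write()` and the `notify` flag are hypothetical, physical pages are array slots, and `notify` stands in for whatever invalidation callback re-synchronizes the device's cached translation. After the fork, A and B share page P (slot 0); a write by A gives A a new page P' (slot 1).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical names) of the fork/COW case: physical pages are
 * array slots, each process maps one virtual page to a slot, and the device
 * bound to process A keeps its own cached translation.
 */
#define TOY_NPHYS 4

struct toy_sys {
	int phys[TOY_NPHYS];      /* contents of each physical page */
	int refcount[TOY_NPHYS];  /* number of processes mapping each page */
	int next_free;            /* next unused physical page */
};

struct toy_proc {
	int pfn;                  /* physical page backing the shared virtual page */
};

struct toy_dev {
	int cached_pfn;           /* device's private copy of A's translation */
	int notify;               /* 1: device receives invalidations (notifier-like) */
};

/* A write by process p: break COW if the page is shared, then store the value. */
static void toy_write(struct toy_sys *s, struct toy_proc *p, struct toy_dev *dev,
		      int value)
{
	if (s->refcount[p->pfn] > 1) {
		int newpfn = s->next_free++;

		assert(newpfn < TOY_NPHYS);
		s->phys[newpfn] = s->phys[p->pfn];   /* copy P into P' */
		s->refcount[p->pfn]--;
		s->refcount[newpfn] = 1;
		p->pfn = newpfn;
		if (dev && dev->notify)
			dev->cached_pfn = p->pfn;    /* invalidate and re-sync the device */
	}
	s->phys[p->pfn] = value;
}

/* What the device sees when it reads "process A's" page. */
static int toy_dev_read(const struct toy_sys *s, const struct toy_dev *dev)
{
	return s->phys[dev->cached_pfn];
}
```

With `notify` cleared, the device keeps reading P and sees the stale value 7 after A has written 42 to P'; with `notify` set, the device follows A to P' and reads 42.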
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > >
> > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > From: Kenneth Lee
> > > > > > >
> > > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > > application to access the hardware without going through the
> > > > > > > kernel in data path.
> > > > > > >
> > > > > > > The kernel component to provide kernel facility to driver for
> > > > > > > expose the user interface is called uacce. It a short name for
> > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > >
> > > > > > > This patch add document to explain how it works.
> > > > > > + RDMA and netdev folks
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote: > Date: Thu, 15 Nov 2018 16:54:55 +0200 > From: Leon Romanovsky > To: Kenneth Lee > CC: Kenneth Lee , Tim Sell , > linux-...@vger.kernel.org, Alexander Shishkin > , Zaibo Xu , > zhangfei@foxmail.com, linux...@huawei.com, haojian.zhu...@linaro.org, > Christoph Lameter , Hao Fang , Gavin > Schenk , RDMA mailing list > , Zhou Wang , Jason > Gunthorpe , Doug Ledford , Uwe > Kleine-König , David Kershner > , Johan Hovold , Cyrille > Pitchen , Sagar Dharia > , Jens Axboe , > guodong...@linaro.org, linux-netdev , Randy Dunlap > , linux-kernel@vger.kernel.org, Vinod Koul > , linux-cry...@vger.kernel.org, Philippe Ombredanne > , Sanyog Kale , "David S. > Miller" , linux-accelerat...@lists.ozlabs.org > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce > User-Agent: Mutt/1.10.1 (2018-07-13) > Message-ID: <20181115145455.gn3...@mtr-leonro.mtl.com> > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote: > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote: > > > Date: Wed, 14 Nov 2018 18:00:17 +0200 > > > From: Leon Romanovsky > > > To: Kenneth Lee > > > CC: Tim Sell , linux-...@vger.kernel.org, > > > Alexander Shishkin , Zaibo Xu > > > , zhangfei@foxmail.com, linux...@huawei.com, > > > haojian.zhu...@linaro.org, Christoph Lameter , Hao Fang > > > , Gavin Schenk , RDMA > > > mailing > > > list , Zhou Wang , > > > Jason Gunthorpe , Doug Ledford , Uwe > > > Kleine-König , David Kershner > > > , Johan Hovold , Cyrille > > > Pitchen , Sagar Dharia > > > , Jens Axboe , > > > guodong...@linaro.org, linux-netdev , Randy > > > Dunlap > > > , linux-kernel@vger.kernel.org, Vinod Koul > > > , linux-cry...@vger.kernel.org, Philippe Ombredanne > > > , Sanyog Kale , Kenneth > > > Lee > > > , "David S. 
Miller" , > > > linux-accelerat...@lists.ozlabs.org > > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce > > > User-Agent: Mutt/1.10.1 (2018-07-13) > > > Message-ID: <20181114160017.gi3...@mtr-leonro.mtl.com> > > > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote: > > > > > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote: > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote: > > > > > > From: Kenneth Lee > > > > > > > > > > > > WarpDrive is a general accelerator framework for the user > > > > > > application to > > > > > > access the hardware without going through the kernel in the data path. > > > > > > > > > > > > The kernel component that provides the kernel facility for drivers to > > > > > > expose the > > > > > > user interface is called uacce. It is a short name for > > > > > > "Unified/User-space-access-intended Accelerator Framework". > > > > > > > > > > > > This patch adds a document to explain how it works. > > > > > + RDMA and netdev folks > > > > > > > > > > Sorry to be late to the game. I don't see the other patches, but from > > > > > the description below it seems like you are reinventing the RDMA verbs > > > > > model. I have a hard time seeing the differences between the proposed > > > > > framework and what is already implemented in drivers/infiniband/* for the > > > > > kernel > > > > > space and in https://github.com/linux-rdma/rdma-core/ for the > > > > > user > > > > > space parts. > > > > > > > > Thanks Leon, > > > > > > > > Yes, we tried to solve a problem similar to RDMA's. We also learned a lot > > > > from > > > > the existing code of RDMA. But we have to make a new one because we > > > > cannot > > > > register accelerators such as AI operations, encryption or compression > > > > to the > > > > RDMA framework:) > > > > > > Assuming that you did everything right and still failed to use the RDMA > > > framework, you were supposed to fix it, not reinvent an identical > > > new one.
This is how we develop the kernel, by reusing existing code. > > > > Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we? > > You are not introducing a new NIC or GPU, but proposing another interfac
[RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
From: Kenneth Lee

WarpDrive is a general accelerator framework for user applications to access
the hardware without going through the kernel in the data path.

The kernel component that provides the kernel facility for drivers to expose
the user interface is called uacce. It is a short name for
"Unified/User-space-access-intended Accelerator Framework".

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee
---
 Documentation/warpdrive/warpdrive.rst       | 260 +++
 Documentation/warpdrive/wd-arch.svg         | 764
 Documentation/warpdrive/wd.svg              | 526 ++
 Documentation/warpdrive/wd_q_addr_space.svg | 359 +
 4 files changed, 1909 insertions(+)
 create mode 100644 Documentation/warpdrive/warpdrive.rst
 create mode 100644 Documentation/warpdrive/wd-arch.svg
 create mode 100644 Documentation/warpdrive/wd.svg
 create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg

diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
new file mode 100644
index ..ef84d3a2d462
--- /dev/null
+++ b/Documentation/warpdrive/warpdrive.rst
@@ -0,0 +1,260 @@
+Introduction of WarpDrive
+=========================
+
+*WarpDrive* is a general accelerator framework for user applications to
+access the hardware without going through the kernel in the data path.
+
+It can be used as a fast channel for accelerators, network adaptors or
+other hardware used by applications in user space.
+
+This may make some implementations simpler. For example, you can reuse most
+of the *netdev* driver in the kernel and just share some ring buffers with the
+user-space driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA
+accelerator with the *netdev* in user space as an HTTPS reverse proxy, etc.
+
+*WarpDrive* treats the hardware accelerator as a heterogeneous processor that
+can take over particular workloads from the CPU:
+
+.. image:: wd.svg
+   :alt: WarpDrive Concept
+
+A virtual concept, the queue, is used to manage the requests sent to the
+accelerator. The application sends requests to the queue by writing to some
+particular address, while the hardware takes the requests directly from that
+address and sends feedback accordingly.
+
+The format of the queue may differ from hardware to hardware, but the
+application does not need to make any system call for the communication.
+
+*WarpDrive* tries to create a shared virtual address space for all involved
+accelerators. Within this space, the requests sent to the queue can refer to
+any virtual address, which will be valid for the application and all involved
+accelerators.
+
+The name *WarpDrive* is simply a cool and general name meaning the framework
+makes the application faster. It includes a general user library, a kernel
+management module and drivers for the hardware. In the kernel, the management
+module is called *uacce*, meaning "Unified/User-space-access-intended
+Accelerator Framework".
+
+
+How does it work
+----------------
+
+*WarpDrive* uses *mmap* and *IOMMU* to do the trick.
+
+*Uacce* creates a chrdev for the device registered to it. A "queue" will be
+created when the chrdev is opened. The application accesses the queue by
+mmapping different address regions of the queue file.
+
+The following figure demonstrates the queue file address space:
+
+.. image:: wd_q_addr_space.svg
+   :alt: WarpDrive Queue Address Space
+
+The first region of the space, the device region, is used by the application
+to write requests to, or read answers from, the hardware.
+
+Normally, there can be three types of device regions: mmio and memory regions.
+It is recommended to use common memory for request/answer descriptors and use
+the mmio space for device notification, such as a doorbell. But of course,
+this is all up to the interface designer.
+
+There can be two types of device memory regions, kernel-only and user-shared.
+This will be explained in the "kernel APIs" section.
+
+The Static Shared Virtual Memory region is necessary only when the device
+IOMMU does not support "Shared Virtual Memory". This will be explained after
+the *IOMMU* idea is introduced.
+
+
+Architecture
+------------
+
+The full *WarpDrive* architecture is represented in the following class
+diagram:
+
+.. image:: wd-arch.svg
+   :alt: WarpDrive Architecture
+
+
+The user API
+------------
+
+We adopt a polling style interface in user space: ::
+
+        int wd_request_queue(struct wd_queue *q);
+        void wd_release_queue(struct wd_queue *q);
+
+        int wd_send(struct wd_queue *q, void *req);
+        int wd_recv(struct wd_queue *q, void **req);
+        int wd_recv_sync(struct wd_queue *q, void **req);
+        void wd_flush(struct wd_queue *q);
+
+wd_recv_sync() is a wrapper around its non-sync version. It traps into the
+kernel and waits until the queue becomes available.
+
+If the queue does not support SVA/SVM, the following helper function
+can be used to crea
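The queue model in this document (the application writes requests into a region it has mmap()ed from the queue file, then polls for results without any system call) can be sketched in plain user space as below. This is a minimal simulation under stated assumptions, not the WarpDrive library: only the names `wd_request_queue()`, `wd_release_queue()`, `wd_send()` and `wd_recv()` come from the patch. The ring layout, the `-EBUSY`/`-EAGAIN` return conventions and `fake_hw_process()` are hypothetical, and a temporary file stands in for the uacce chrdev. `wd_recv_sync()` and `wd_flush()` are left out because they involve the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define WD_QDEPTH 8 /* hypothetical ring depth (power of two) */

/* Hypothetical descriptor ring living in the mmap()ed queue region. */
struct wd_ring {
	uint32_t head;          /* next slot the application fills    */
	uint32_t done;          /* slots the "hardware" has completed */
	uint32_t tail;          /* next completed slot to hand back   */
	void *slot[WD_QDEPTH];  /* request pointers (shared VA space) */
};

struct wd_queue {
	int fd;               /* stands in for the opened uacce chrdev */
	struct wd_ring *ring; /* shared region mapped from that file   */
};

int wd_request_queue(struct wd_queue *q)
{
	char path[] = "/tmp/wd_queue_XXXXXX";
	size_t len = sizeof(struct wd_ring);
	int ret;

	/* A temporary file plays the role of the queue file. */
	q->fd = mkstemp(path);
	if (q->fd < 0)
		return -errno;
	unlink(path);
	if (ftruncate(q->fd, (off_t)len) < 0)
		goto err;
	q->ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, q->fd, 0);
	if (q->ring == MAP_FAILED)
		goto err;
	return 0; /* file is zero-filled, so head == done == tail == 0 */
err:
	ret = -errno; /* save errno before close() can clobber it */
	close(q->fd);
	return ret;
}

void wd_release_queue(struct wd_queue *q)
{
	munmap(q->ring, sizeof(struct wd_ring));
	close(q->fd);
}

/* Data path: plain memory writes, no system call. */
int wd_send(struct wd_queue *q, void *req)
{
	struct wd_ring *r = q->ring;

	assert(r->head - r->tail <= WD_QDEPTH);
	if (r->head - r->tail == WD_QDEPTH)
		return -EBUSY; /* ring full */
	r->slot[r->head % WD_QDEPTH] = req;
	r->head++; /* real hardware: barrier + doorbell write here */
	return 0;
}

/* Polling receive: -EAGAIN means nothing has completed yet. */
int wd_recv(struct wd_queue *q, void **req)
{
	struct wd_ring *r = q->ring;

	if (r->tail == r->done)
		return -EAGAIN;
	*req = r->slot[r->tail % WD_QDEPTH];
	r->tail++;
	return 0;
}

/* Hypothetical stand-in for the device: completes every submitted slot. */
void fake_hw_process(struct wd_queue *q)
{
	q->ring->done = q->ring->head;
}
```

A real device would also need memory barriers between filling a descriptor and publishing it, plus a doorbell write to the mmio region; the sketch omits both because the simulated "hardware" runs in the same thread as the application.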
Re: [PATCH 4/7] crypto: add hisilicon Queue Manager driver
On Sun, Sep 02, 2018 at 07:15:07PM -0700, Randy Dunlap wrote:
> Date: Sun, 2 Sep 2018 19:15:07 -0700
> From: Randy Dunlap
> To: Kenneth Lee , Jonathan Corbet ,
> Herbert Xu , "David S . Miller"
> , Joerg Roedel , Alex Williamson
> , Kenneth Lee , Hao
> Fang , Zhou Wang , Zaibo Xu
> , Philippe Ombredanne , Greg
> Kroah-Hartman , Thomas Gleixner
> , linux-...@vger.kernel.org,
> linux-kernel@vger.kernel.org, linux-cry...@vger.kernel.org,
> io...@lists.linux-foundation.org, k...@vger.kernel.org,
> linux-accelerat...@lists.ozlabs.org, Lu Baolu ,
> Sanjay Kumar
> CC: linux...@huawei.com
> Subject: Re: [PATCH 4/7] crypto: add hisilicon Queue Manager driver
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
> Thunderbird/52.9.1
> Message-ID: <4e46a451-d1cd-ac68-84b4-20792fdbc...@infradead.org>
>
> On 09/02/2018 05:52 PM, Kenneth Lee wrote:
> > diff --git a/drivers/crypto/hisilicon/Kconfig
> > b/drivers/crypto/hisilicon/Kconfig
> > index 8ca9c503bcb0..02a6eef84101 100644
> > --- a/drivers/crypto/hisilicon/Kconfig
> > +++ b/drivers/crypto/hisilicon/Kconfig
> > @@ -1,4 +1,8 @@
> >  # SPDX-License-Identifier: GPL-2.0
> > +config CRYPTO_DEV_HISILICON
> > +	tristate "Support for HISILICON CRYPTO ACCELERATOR"
> > +	help
> > +	  Enable this to use Hisilicon Hardware Accelerators
>
> Accelerators.

Thanks, I will change it in the next version.

>
> --
> ~Randy

--
-Kenneth(Hisilicon)

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
Re: [RFC PATCH 3/7] vfio: add spimdev support
On Thu, Aug 02, 2018 at 12:43:27PM -0600, Alex Williamson wrote: > Date: Thu, 2 Aug 2018 12:43:27 -0600 > From: Alex Williamson > To: Cornelia Huck > CC: Kenneth Lee , "Tian, Kevin" > , Kenneth Lee , Jonathan Corbet > , Herbert Xu , "David S . > Miller" , Joerg Roedel , Hao Fang > , Zhou Wang , Zaibo Xu > , Philippe Ombredanne , "Greg > Kroah-Hartman" , Thomas Gleixner > , "linux-...@vger.kernel.org" > , "linux-kernel@vger.kernel.org" > , "linux-cry...@vger.kernel.org" > , "io...@lists.linux-foundation.org" > , "k...@vger.kernel.org" > , "linux-accelerat...@lists.ozlabs.org" > , Lu Baolu > , "Kumar, Sanjay K" , "linux...@huawei.com" > Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support > Message-ID: <20180802124327.403b1...@t450s.home> > > On Thu, 2 Aug 2018 10:35:28 +0200 > Cornelia Huck wrote: > > > On Thu, 2 Aug 2018 15:34:40 +0800 > > Kenneth Lee wrote: > > > > > On Thu, Aug 02, 2018 at 04:24:22AM +, Tian, Kevin wrote: > > > > > > > From: Kenneth Lee [mailto:liguo...@hisilicon.com] > > > > > Sent: Thursday, August 2, 2018 11:47 AM > > > > > > > > > > > > > > > > > > From: Kenneth Lee > > > > > > > Sent: Wednesday, August 1, 2018 6:22 PM > > > > > > > > > > > > > > From: Kenneth Lee > > > > > > > > > > > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev, but it > > > > > > > differs > > > > > from > > > > > > > the general vfio-mdev: > > > > > > > > > > > > > > 1. It shares its parent's IOMMU. > > > > > > > 2. No hardware resource is attached to the mdev when it is created. > > > > > > > The > > > > > > > hardware resource (a `queue') is allocated only when the mdev is > > > > > > > opened. > > > > > > > > > > > > Alex has concerns about doing so, as pointed out in: > > > > > > > > > > > > https://www.spinics.net/lists/kvm/msg172652.html > > > > > > > > > > > > resource allocation should be reserved at creation time. > > > > > > > > > > Yes.
That is why I keep saying that SPIMDEV is not for "VM", it is > > > > > for "many > > > > > processes"; it is just an access point for the process, not a device > > > > > for a VM. I > > > > > hope > > > > > Alex can accept it:) > > > > > > > > > > > > > VFIO is just about assigning device resources to user space. It doesn't > > > > care > > > > whether it's native processes or a VM using the device so far. Along the > > > > direction > > > > which you described, it looks like VFIO needs to support a configuration where > > > > some mdevs are used for native processes only, while others can be used > > > > for both native processes and VMs. I'm not sure whether there is a clean way to > > > > enforce it... > > > > > > I had the same idea at the beginning. But finally I found that the life > > > cycle > > > of the virtual device for a VM and for a process is different. Suppose you > > > create > > > some mdevs for VM use; you will give all those mdevs to lib-virt, which > > > distributes those mdevs to VMs or containers. If the VM or container exits, > > > the > > > mdev is returned to lib-virt and used for the next allocation. It is the > > > administrator who controls every mdev's allocation. > > Libvirt currently does no management of mdev devices, so I believe > this example is fictitious. The extent of libvirt's interaction with > mdev is that XML may specify an mdev UUID as the source for a hostdev > and set the permissions on the device files appropriately. Whether > mdevs are created in advance and re-used or created and destroyed > around a VM instance (for example via qemu hooks scripts) is not a > policy that libvirt imposes. > > > > But for processes, it is different. There is no lib-virt in control. The > > > administrator's intention is to grant some types of application access to > > > the > > > hardware. The application can get a handle of the hardware, send request > > > and ge
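The disagreement above is about when a hardware queue gets bound to an mdev: Kevin and Alex want the resource reserved at creation time, while SPIMDEV allocates it only when the device is opened. The hypothetical user-space model below (none of these names are vfio-mdev APIs; `HW_QUEUES`, `ap_create()`, `ap_open()` and `ap_release()` are invented for illustration) sketches the allocate-on-open policy. Any number of access points can be created, but `ap_open()` fails with `-EBUSY` once the pool of hardware queues is exhausted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define HW_QUEUES 2 /* hypothetical number of hardware queues on the parent */

/* Pool of hardware queues owned by the parent device. */
static bool hw_queue_busy[HW_QUEUES];

/* An access point (e.g. an mdev) created by the administrator.
 * Under allocate-on-open it holds no hardware until it is opened. */
struct access_point {
	int queue; /* index into hw_queue_busy, or -1 when not opened */
};

void ap_create(struct access_point *ap)
{
	ap->queue = -1; /* nothing is reserved at creation time */
}

/* open(): grab any free hardware queue, or fail with -EBUSY. */
int ap_open(struct access_point *ap)
{
	for (int i = 0; i < HW_QUEUES; i++) {
		if (!hw_queue_busy[i]) {
			hw_queue_busy[i] = true;
			ap->queue = i;
			return 0;
		}
	}
	return -EBUSY;
}

/* release(): hand the queue back to the pool for the next opener. */
void ap_release(struct access_point *ap)
{
	assert(ap->queue >= 0);
	hw_queue_busy[ap->queue] = false;
	ap->queue = -1;
}
```

Under the reserve-at-create policy the same exhaustion would surface at the third create instead of the third open, which is why it suits a VM manager that hands out pre-built devices, while allocate-on-open suits many short-lived processes sharing one access point.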
Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
On Fri, Aug 03, 2018 at 03:20:43PM +0100, Alan Cox wrote:
> Date: Fri, 3 Aug 2018 15:20:43 +0100
> From: Alan Cox
> To: Jerome Glisse
> CC: "Tian, Kevin" , Kenneth Lee
> , Hao Fang , Herbert Xu
> , "k...@vger.kernel.org"
> , Jonathan Corbet , Greg
> Kroah-Hartman , "linux-...@vger.kernel.org"
> , "Kumar, Sanjay K" ,
> "io...@lists.linux-foundation.org" ,
> "linux-kernel@vger.kernel.org" ,
> "linux...@huawei.com" , Alex Williamson
> , Thomas Gleixner ,
> "linux-cry...@vger.kernel.org" , Philippe
> Ombredanne , Zaibo Xu , Kenneth
> Lee , "David S . Miller" ,
> Ross Zwisler
> Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
> Organization: Intel Corporation
> X-Mailer: Claws Mail 3.16.0 (GTK+ 2.24.32; x86_64-redhat-linux-gnu)
> Message-ID: <20180803152043.40f88947@alans-desktop>
>
> > > If we are going to have any kind of general purpose accelerator API then
> > > it has to be able to implement things like
> >
> > Why is the existing driver model not good enough? So you want
> > a device with function X, you look into /dev/X (for instance
> > for GPU you look in /dev/dri)
>
> Except when my GPU is in an FPGA in which case it might be somewhere else
> or it's a general purpose accelerator that happens to be usable as a GPU.
> Unusual today in big computer space but you'll find it in
> microcontrollers.
>
> > Each of those devices needs a userspace driver and thus this
> > user space driver can easily know where to look. I do not
> > expect that every application will reimplement those drivers
> > but instead use some kind of library that provides a high
> > level API for each of those devices.
>
> Think about it from the user level. You have a pipeline of things you
> wish to execute, you need to get the right accelerator combinations and
> they need to fit together to meet system constraints like number of
> IOMMU ids the accelerator supports, where they are connected.
>
> > Now you have a hierarchy of memory for the CPU (HBM, local
> > node main memory aka your DDR DIMM, persistent memory) each
>
> It's not a hierarchy, it's a graph. There's no fundamental reason two
> accelerators can't be close to two different CPU cores but have shared
> HBM that is far from each processor. There are physical reasons it tends
> to look more like a hierarchy today.
>
> > Anyway I think finding devices and finding the relation between
> > devices and memory are 2 separate problems and as such should
> > be handled separately.
>
> At a certain level they are deeply intertwined because you need a common
> API. It's not good if I want a particular accelerator and need to then
> see which API it's under on this machine and which interface I have to
> use, and maybe have a mix of FPGA, WarpDrive and Google ASIC interfaces
> all different.
>
> The job of the kernel is to impose some kind of sanity and unity on this
> lot.
>
> All of it in the end comes down to
>
> 'Somehow glue some chunk of memory into my address space and find any
> supporting driver I need'
>

Agreed. This is also our intention with WarpDrive, and VFIO looks like the
best place to fulfill this requirement.

> plus virtualization of the above.
>
> That bit's easy - but making it usable is a different story.
>
> Alan

--
-Kenneth(Hisilicon)
[PATCH v4] IB/umem: Release pid in error and ODP flow
1. Release the pid before entering the ODP flow
2. Release the pid when memory allocation fails

Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions")
Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
Reviewed-by: Haggai Eran <hagg...@mellanox.com>
---
Change from v1 to v2:
	Correct the patch title and description
Change from v2 to v3:
	Update the title and add "Fixes" fields in the description
Change from v3 to v4:
	Keep the Fixes tag at the end of the description

 drivers/infiniband/core/umem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1e62a5f0cb28..4609b921f899 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
+		put_pid(umem->pid);
 		ret = ib_umem_odp_get(context, umem);
 		if (ret) {
 			kfree(umem);
@@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
+		put_pid(umem->pid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
1.9.1
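The bug fixed above is an unbalanced reference count: a reference is taken with get_task_pid(), and every path that does not leave it for ib_umem_release() to drop must call put_pid() itself. The userspace sketch below illustrates that pairing. It is only a model under stated assumptions: struct fake_pid, fake_get_pid(), fake_put_pid(), umem_get_sketch() and umem_release_sketch() are hypothetical stand-ins, not the kernel API; the ODP branch is modelled simply as a path that does not keep the pid; and the error path uses a goto cleanup label (a common kernel idiom) where the actual patch adds put_pid() inline.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's refcounted struct pid. */
struct fake_pid {
	int refcount;
};

/* Hypothetical stand-in for struct ib_umem: may hold one pid reference. */
struct fake_umem {
	struct fake_pid *pid;
	void *page_list;
};

static struct fake_pid *fake_get_pid(struct fake_pid *p)
{
	p->refcount++;
	return p;
}

static void fake_put_pid(struct fake_pid *p)
{
	p->refcount--;
}

/*
 * Take a reference up front, then make sure every path that does not keep
 * it (the on-demand path and the allocation-failure path) drops it once.
 */
static struct fake_umem *umem_get_sketch(struct fake_pid *task_pid,
					 int on_demand, int fail_alloc)
{
	struct fake_umem *umem = calloc(1, sizeof(*umem));

	if (!umem)
		return NULL;

	umem->pid = fake_get_pid(task_pid);

	if (on_demand) {
		/* This path does not use umem->pid, so release it first. */
		fake_put_pid(umem->pid);
		umem->pid = NULL;
		return umem;
	}

	/* fail_alloc simulates __get_free_page() returning NULL. */
	umem->page_list = fail_alloc ? NULL : malloc(64);
	if (!umem->page_list)
		goto err_put_pid;

	return umem; /* success: the reference now belongs to umem */

err_put_pid:
	fake_put_pid(umem->pid);
	free(umem);
	return NULL;
}

/* Counterpart of the release function: drops whatever umem still holds. */
static void umem_release_sketch(struct fake_umem *umem)
{
	if (umem->pid)
		fake_put_pid(umem->pid);
	free(umem->page_list);
	free(umem);
}
```

Whichever path is taken, the reference count returns to its starting value once the object (if any) is released, which is the invariant the two added put_pid() lines restore.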
Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
On Tue, Jan 03, 2017 at 12:12:24PM +0200, Leon Romanovsky wrote:
> Date: Tue, 3 Jan 2017 12:12:24 +0200
> From: Leon Romanovsky <l...@kernel.org>
> To: Kenneth Lee <liguo...@hisilicon.com>
> CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
> robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
> vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
> k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
> linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
> User-Agent: Mutt/1.7.2 (2016-11-26)
> Message-ID: <20170103101224.GH12077@mtr-leonro.local>
>
> On Tue, Jan 03, 2017 at 10:32:50AM +0800, Kenneth Lee wrote:
> > On Sun, Jan 01, 2017 at 08:47:12AM +0200, Leon Romanovsky wrote:
> > > Date: Sun, 1 Jan 2017 08:47:12 +0200
> > > From: Leon Romanovsky <l...@kernel.org>
> > > To: Kenneth Lee <liguo...@hisilicon.com>
> > > CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
> > > robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
> > > vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
> > > k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
> > > linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
> > > User-Agent: Mutt/1.7.2 (2016-11-26)
> > > Message-ID: <20170101064712.GQ26885@mtr-leonro.local>
> > >
> > > On Fri, Dec 30, 2016 at 06:18:29PM +0800, Kenneth Lee wrote:
> > > > There are two bugfixes in this patch:
> > > >
> > > > Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement
> > > > mm->pinned_vm from ib_umem_get")
> > > > This patch introduce the get_task_pid but not put it back on
> > > > all error path
> > > >
> > > > Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging
> > > > regions")
> > > > This patch introduce a ODP flow without release pid before
> > > > enter it
> > > >
> > > >
> > > > Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
> > > > Reviewed-by: Haggai Eran <hagg...@mellanox.com>
> > > > ---
> > > > Change from v1 to v2:
> > > > Correcting the patch title and description
> > > > Change from v2 to v3:
> > > > Update the title and add "Fixes" fields in the description
> > >
> > > OK,
> > >
> > > I see that you still didn't read Documentation/SubmittingPatches. You
> > > must read that document before you are sending patches.
> > >
> > > But I'll stop here, the code is correct (it fixes bugs) and commit message
> > > more usefull than before.
> > >
> > > >
> > > > drivers/infiniband/core/umem.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > > > index 1e62a5f..4609b92 100644
> > > > --- a/drivers/infiniband/core/umem.c
> > > > +++ b/drivers/infiniband/core/umem.c
> > > > @@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >  		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
> > > >  
> > > >  	if (access & IB_ACCESS_ON_DEMAND) {
> > > > +		put_pid(umem->pid);
> > > >  		ret = ib_umem_odp_get(context, umem);
> > > >  		if (ret) {
> > > >  			kfree(umem);
> > > > @@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> > > >  
> > > >  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
> > > >  	if (!page_list) {
> > > > +		put_pid(umem->pid);
> > > >  		kfree(umem);
> > > >  		return ERR_PTR(-ENOMEM);
> > > >  	}
> > > > --
> > > > 1.9.1
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > > the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> > Thanks,
> >
> > I did read the doc, but maybe I mis-understant some points. Could you please
> > point it out?
>
> Fixes line should be placed above bottom signatures.
>
> As an example of properly written patch, you can take a look on the
> following patch [1] from Steve.
>
> [1] http://marc.info/?l=linux-rdma&m=148244272205411&w=2

Thank you. A sample helps a lot.

But please allow me to argue a little: Documentation/process/submitting-patches.rst really does not mention where Fixes tags should be put :)

> >
> > And sorry. please ignore the last message. I forget to use a bottom-post
> > style.
> >
> > --
> > -Kenneth(Hisilicon)

--
-Kenneth(Hisilicon)
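Leon's rule that the Fixes: line goes above the sign-offs (the layout the v4 patch adopts) is easy to check mechanically. The sketch below is a hypothetical helper, not part of checkpatch.pl or any kernel tool: fixes_above_signoffs() only reports whether any line starting with "Fixes:" appears after the first line starting with "Signed-off-by:" in a commit message that has already been split into lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if line begins with prefix. */
static int starts_with(const char *line, const char *prefix)
{
	return strncmp(line, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical checker: returns 1 when no "Fixes:" line follows the first
 * "Signed-off-by:" line, i.e. the Fixes tags sit above the sign-offs.
 */
static int fixes_above_signoffs(const char *const lines[], size_t n)
{
	int seen_signoff = 0;

	for (size_t i = 0; i < n; i++) {
		if (starts_with(lines[i], "Signed-off-by:"))
			seen_signoff = 1;
		else if (seen_signoff && starts_with(lines[i], "Fixes:"))
			return 0;
	}
	return 1;
}
```

Applied to the v3 and v4 trailers in this thread, both pass, because their Fixes: lines precede Signed-off-by:; the v4 change only moved them to the end of the description.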
Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
On Sun, Jan 01, 2017 at 08:47:12AM +0200, Leon Romanovsky wrote:
> Date: Sun, 1 Jan 2017 08:47:12 +0200
> From: Leon Romanovsky <l...@kernel.org>
> To: Kenneth Lee <liguo...@hisilicon.com>
> CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
> robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
> vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
> k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
> linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
> User-Agent: Mutt/1.7.2 (2016-11-26)
> Message-ID: <20170101064712.GQ26885@mtr-leonro.local>
>
> On Fri, Dec 30, 2016 at 06:18:29PM +0800, Kenneth Lee wrote:
> > There are two bugfixes in this patch:
> >
> > Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm
> > from ib_umem_get")
> > This patch introduce the get_task_pid but not put it back on all error
> > path
> >
> > Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions")
> > This patch introduce a ODP flow without release pid before enter it
> >
> >
> > Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
> > Reviewed-by: Haggai Eran <hagg...@mellanox.com>
> > ---
> > Change from v1 to v2:
> > Correcting the patch title and description
> > Change from v2 to v3:
> > Update the title and add "Fixes" fields in the description
>
> OK,
>
> I see that you still didn't read Documentation/SubmittingPatches. You
> must read that document before you are sending patches.
>
> But I'll stop here, the code is correct (it fixes bugs) and commit message
> more usefull than before.
>
> >
> > drivers/infiniband/core/umem.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > index 1e62a5f..4609b92 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
> >  
> >  	if (access & IB_ACCESS_ON_DEMAND) {
> > +		put_pid(umem->pid);
> >  		ret = ib_umem_odp_get(context, umem);
> >  		if (ret) {
> >  			kfree(umem);
> > @@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  
> >  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
> >  	if (!page_list) {
> > +		put_pid(umem->pid);
> >  		kfree(umem);
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> > --
> > 1.9.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

Thanks,

I did read the doc, but maybe I misunderstood some points. Could you please point them out?

And sorry, please ignore the last message. I forgot to use the bottom-posting style.

--
-Kenneth(Hisilicon)
Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
Thanks, I did read the doc, but maybe I misunderstood some points. Could you please point them out?

On Sun, Jan 01, 2017 at 08:47:12AM +0200, Leon Romanovsky wrote:
> Date: Sun, 1 Jan 2017 08:47:12 +0200
> From: Leon Romanovsky <l...@kernel.org>
> To: Kenneth Lee <liguo...@hisilicon.com>
> CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
> robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
> vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
> k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
> linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3] IB/umem: Release pid in error and ODP flow
> User-Agent: Mutt/1.7.2 (2016-11-26)
> Message-ID: <20170101064712.GQ26885@mtr-leonro.local>
>
> On Fri, Dec 30, 2016 at 06:18:29PM +0800, Kenneth Lee wrote:
> > There are two bugfixes in this patch:
> >
> > Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm
> > from ib_umem_get")
> > This patch introduce the get_task_pid but not put it back on all error
> > path
> >
> > Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions")
> > This patch introduce a ODP flow without release pid before enter it
> >
> >
> > Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
> > Reviewed-by: Haggai Eran <hagg...@mellanox.com>
> > ---
> > Change from v1 to v2:
> > Correcting the patch title and description
> > Change from v2 to v3:
> > Update the title and add "Fixes" fields in the description
>
> OK,
>
> I see that you still didn't read Documentation/SubmittingPatches. You
> must read that document before you are sending patches.
>
> But I'll stop here, the code is correct (it fixes bugs) and commit message
> more usefull than before.
>
> >
> > drivers/infiniband/core/umem.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > index 1e62a5f..4609b92 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
> >  
> >  	if (access & IB_ACCESS_ON_DEMAND) {
> > +		put_pid(umem->pid);
> >  		ret = ib_umem_odp_get(context, umem);
> >  		if (ret) {
> >  			kfree(umem);
> > @@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  
> >  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
> >  	if (!page_list) {
> > +		put_pid(umem->pid);
> >  		kfree(umem);
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> > --
> > 1.9.1
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html

--
-Kenneth(Hisilicon)
[PATCH v3] IB/umem: Release pid in error and ODP flow
There are two bugfixes in this patch:

Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
This commit added the get_task_pid() call but did not put the pid back on all error paths.

Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions")
This commit added the ODP flow without releasing the pid before entering it.

Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
Reviewed-by: Haggai Eran <hagg...@mellanox.com>
---
Change from v1 to v2:
	Correct the patch title and description
Change from v2 to v3:
	Update the title and add "Fixes" fields in the description

 drivers/infiniband/core/umem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1e62a5f..4609b92 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
+		put_pid(umem->pid);
 		ret = ib_umem_odp_get(context, umem);
 		if (ret) {
 			kfree(umem);
@@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
+		put_pid(umem->pid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
1.9.1
Re: [PATCH v2] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
On Fri, Dec 30, 2016 at 08:55:10AM +0200, Leon Romanovsky wrote:
> Date: Fri, 30 Dec 2016 08:55:10 +0200
> From: Leon Romanovsky <l...@kernel.org>
> To: Kenneth Lee <liguo...@hisilicon.com>
> CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
> robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
> vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
> k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
> linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
> User-Agent: Mutt/1.7.2 (2016-11-26)
> Message-ID: <20161230065510.GL26885@mtr-leonro.local>
>
> On Fri, Dec 30, 2016 at 12:50:11PM +0800, Kenneth Lee wrote:
> > Hi, Leon,
> >
> > 1. I do change the title except for the version number itself:) But my English
> > is quite bad, maybe the title is still quite stupid. I can update it according
> > to your advice.
>
> Yes, please
> The main points are:
> 1. Remove "bugifix", it is not needed.
> 2. Use description in the title and not function names.
>
> >
> > 2. I catched the bug by reading the final code, not by bisect-ing the old
> > commit. Do you means I should find out which commit introducing the bug? It will
> > not be easily to say which it is because it is a "missing bug", rather than a
> > "introduced bug". Indicate the commit may not help to remove a patch/commit from
> > the stable tree.
>
> The fixes line won't cause for removal of commit, but to addition of
> yours on top of their code base.
>
> git blame is your friend.
>
> one fixes line is:
> Fixes: 8ada2c1c0c1d ("IB/core: Add support for on demand paging regions")
>
> and the second line is ! NOT !, you need to go deeper in the logs
> !!
> Fixes: f7c6a7b5d599 ("IB/uverbs: Export ib_umem_get()/ib_umem_release() to
> modules")
>
> >
> > Could you please give more suggestion? Thanks.
>
> Please, don't use top-posting for this mailing list.
> It is really-really annoying.
>
> >
> > On Thu, Dec 29, 2016 at 10:17:56AM +0200, Leon Romanovsky wrote:
> > > Date: Thu, 29 Dec 2016 10:17:56 +0200
> > > From: Leon Romanovsky <l...@kernel.org>
> > > To: Kenneth Lee <liguo...@hisilicon.com>
> > > CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
> > > robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
> > > vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
> > > k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
> > > linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH v2] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
> > > User-Agent: Mutt/1.7.2 (2016-11-26)
> > > Message-ID: <20161229081756.GI26885@mtr-leonro.local>
> > >
> > > On Thu, Dec 29, 2016 at 04:27:28PM +0800, Kenneth Lee wrote:
> > > > There are two bugfixes in this patch:
> > > >
> > > > 1. When the execution go to the ib_umem_odp_get() path, pid should be put
> > > >    back.
> > > > 2. When the memory allocation fail, the pid also should be put back before
> > > >    exit.
> > > >
> > > > Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
> > > > Reviewed-by: Haggai Eran <hagg...@mellanox.com>
> > > > ---
> > > > Change from v1 to v2:
> > > > Correcting the patch title and description
> > >
> > > I don't see any changes except version in the title.
> > > What about anything like this?
> > > [PATCH v3] IB/umem: Release pid in error and ODP flows
> > >
> > > And Fixes line please, it will help to forward it to stable trees.
> > >
> > > Thanks
> > >
> >
> > --
> > -Kenneth(Hisilicon)

Very helpful, thank you. I will send patch v3 soon.

--
-Kenneth(Hisilicon)
Re: [PATCH v2] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
Hi, Leon,

1. I did change the title, not only the version number :) But my English is quite bad, so maybe the title is still poor. I can update it according to your advice.

2. I caught the bug by reading the final code, not by bisecting old commits. Do you mean I should find out which commit introduced the bug? It is not easy to say which one it is, because this is a "missing" bug rather than an "introduced" bug. Indicating the commit may not help to remove a patch/commit from the stable tree.

Could you please give more suggestions? Thanks.

On Thu, Dec 29, 2016 at 10:17:56AM +0200, Leon Romanovsky wrote:
> Date: Thu, 29 Dec 2016 10:17:56 +0200
> From: Leon Romanovsky <l...@kernel.org>
> To: Kenneth Lee <liguo...@hisilicon.com>
> CC: dledf...@redhat.com, sean.he...@intel.com, hal.rosenst...@gmail.com,
>  robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
>  vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
>  k...@kernel.org, seb...@linux.vnet.ibm.com, ma...@mellanox.com,
>  linux-r...@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
> User-Agent: Mutt/1.7.2 (2016-11-26)
> Message-ID: <20161229081756.GI26885@mtr-leonro.local>
>
> On Thu, Dec 29, 2016 at 04:27:28PM +0800, Kenneth Lee wrote:
> > There are two bugfixes in this patch:
> >
> > 1. When execution goes to the ib_umem_odp_get() path, the pid should be
> >    put back.
> > 2. When the memory allocation fails, the pid should also be put back
> >    before exit.
> >
> > Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
> > Reviewed-by: Haggai Eran <hagg...@mellanox.com>
> > ---
> > Change from v1 to v2:
> > Correcting the patch title and description
>
> I don't see any changes except the version in the title.
> What about something like this?
> [PATCH v3] IB/umem: Release pid in error and ODP flows
>
> And a Fixes line please, it will help to forward it to stable trees.
>
> Thanks

--
-Kenneth(Hisilicon)
[PATCH v2] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
There are two bugfixes in this patch:

1. When execution goes to the ib_umem_odp_get() path, the pid should be put back.
2. When the memory allocation fails, the pid should also be put back before exit.

Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
Reviewed-by: Haggai Eran <hagg...@mellanox.com>
---
Change from v1 to v2:
Correcting the patch title and description

 drivers/infiniband/core/umem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1e62a5f..4609b92 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
+		put_pid(umem->pid);
 		ret = ib_umem_odp_get(context, umem);
 		if (ret) {
 			kfree(umem);
@@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
+		put_pid(umem->pid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
1.9.1
Re: [PATCH] ib umem: bug: put pid back before return from error path
Hi,

Sorry for the delay (I had a problem with my procmailrc file and missed this mail).

The new patch, titled "[PATCH] ib umem: bugfix: mixed put_pid()s in ib_umem_get()", has been sent.

On Thu, Dec 22, 2016 at 10:00:57AM +0200, Mark Bloch wrote:
> Date: Thu, 22 Dec 2016 10:00:57 +0200
> From: Mark Bloch <ma...@mellanox.com>
> To: Kenneth Lee <liguo...@hisilicon.com>, dledf...@redhat.com,
>  sean.he...@intel.com, hal.rosenst...@gmail.com
> CC: robin.mur...@arm.com, jroe...@suse.de, egtv...@samfundet.no,
>  vgu...@synopsys.com, dave.han...@linux.intel.com, lstoa...@gmail.com,
>  k...@kernel.org, seb...@linux.vnet.ibm.com, linux-r...@vger.kernel.org,
>  linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ib umem: bug: put pid back before return from error
>  path
> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>  Thunderbird/45.5.1
> Message-ID: <e470fbab-a330-b9c4-f2b3-65fb45f24...@mellanox.com>
>
> Hi,
>
> You have two bugs here:
> 1) When using ODP, ib_umem_release() checks for umem->odp_data != NULL,
>    calls ib_umem_odp_release() and returns immediately without calling
>    put_pid().
>    This one isn't in the error path, so the title doesn't fit.
>
> 2) In case the allocation fails, we return -ENOMEM without calling
>    put_pid().
>
> Can you please resend this with a proper Fixes line and a better description
> of what is going on.
>
> On 22/12/2016 09:11, Kenneth Lee wrote:
> > I caught this bug when reading the code. I'm sorry I have no hardware to
> > test it. But it is obviously a bug.
> >
> > Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
> > ---
> >  drivers/infiniband/core/umem.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > index 1e62a5f..4609b92 100644
> > --- a/drivers/infiniband/core/umem.c
> > +++ b/drivers/infiniband/core/umem.c
> > @@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  				 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
> >  
> >  	if (access & IB_ACCESS_ON_DEMAND) {
> > +		put_pid(umem->pid);
> >  		ret = ib_umem_odp_get(context, umem);
> >  		if (ret) {
> >  			kfree(umem);
> > @@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> >  
> >  	page_list = (struct page **) __get_free_page(GFP_KERNEL);
> >  	if (!page_list) {
> > +		put_pid(umem->pid);
> >  		kfree(umem);
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> >
>
> Mark.

--
-Kenneth(Hisilicon)
[PATCH] ib umem: bugfix: mixed put_pid()s in ib_umem_get()
There are two bugfixes in this patch:

1. When execution goes to the ib_umem_odp_get() path, the pid should be put back.
2. When the memory allocation fails, the pid should also be put back before exit.

Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
---
 drivers/infiniband/core/umem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1e62a5f..4609b92 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
+		put_pid(umem->pid);
 		ret = ib_umem_odp_get(context, umem);
 		if (ret) {
 			kfree(umem);
@@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
+		put_pid(umem->pid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
1.9.1
[PATCH] ib umem: bug: put pid back before return from error path
I caught this bug when reading the code. I'm sorry I have no hardware to test it. But it is obviously a bug.

Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
---
 drivers/infiniband/core/umem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1e62a5f..4609b92 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 				 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
 	if (access & IB_ACCESS_ON_DEMAND) {
+		put_pid(umem->pid);
 		ret = ib_umem_odp_get(context, umem);
 		if (ret) {
 			kfree(umem);
@@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
+		put_pid(umem->pid);
 		kfree(umem);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
1.9.1
[PATCH] [bugfix] replace unnecessary ldax with common ldr
(Added a commit message to the previous mail; sorry for the duplication.)

There is no store_ex pairing with this load_ex. It is not necessary and gives a wrong hint to the cache system.

Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
---
 arch/arm64/include/asm/spinlock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index c85e96d..3334c4f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -63,7 +63,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 	 */
 "	sevl\n"
 "2:	wfe\n"
-"	ldaxrh	%w2, %4\n"
+"	ldrh	%w2, %4\n"
 "	eor	%w1, %w2, %w0, lsr #16\n"
 "	cbnz	%w1, 2b\n"
 	/* We got the lock. Critical section starts here. */
-- 
1.9.1
[PATCH] [bugfix] replace unnecessary ldax with common ldr
Signed-off-by: Kenneth Lee <liguo...@hisilicon.com>
---
 arch/arm64/include/asm/spinlock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index c85e96d..3334c4f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -63,7 +63,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 	 */
 "	sevl\n"
 "2:	wfe\n"
-"	ldaxrh	%w2, %4\n"
+"	ldrh	%w2, %4\n"
 "	eor	%w1, %w2, %w0, lsr #16\n"
 "	cbnz	%w1, 2b\n"
 	/* We got the lock. Critical section starts here. */
-- 
1.9.1
Re: Fwd: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once HANDEL_TX_MSG macro
On Tue, Oct 13, 2015 at 04:18:23PM +0200, Arnd Bergmann wrote:
> Date: Tue, 13 Oct 2015 16:18:23 +0200
> From: Arnd Bergmann
> To: Kenneth Lee
> Cc: da...@davemloft.net, j...@perches.com, liguo...@hisilicon.com,
>  yisen.zhu...@huawei.com, net...@vger.kernel.org, linux...@huawei.com,
>  salil.me...@huawei.com, kenneth-lee-2...@foxmail.com,
>  xuw...@hisilicon.com, lisheng...@huawei.com, linux-kernel@vger.kernel.org,
>  huangdaode
> Subject: Re: Fwd: Re: [PATCH net-next v2 1/2] hisilicon net: removes the
>  once HANDEL_TX_MSG macro
> Message-ID: <6914069.XLdT9Eli48@wuerfel>
>
> On Tuesday 13 October 2015 21:27:12 Kenneth Lee wrote:
> >
> > Hi, Arnd,
> >
> > Thank you for the comment. Yes, the io_base is a security problem; we
> > will fix it in a coming patch soon.
> >
> > But can we keep the sysfs interface? The interface from hnae is used not
> > only by the ethernet driver but also by the Open Data Plane driver. If we
> > move it to the upper layers, both drivers will need the same logic.
> >
> > So how about we just add documentation to Documentation/ABI?
>
> Hi Kenneth,
>
> In the end this is up to David Miller of course, but I'd say we are
> better off not introducing any ABIs for ODP prematurely.
>
> We are talking about very generic statistics data, and you should
> already provide them for the ethernet driver using the standard
> interfaces.
>
> I have not seen any discussion about adding an ODP subsystem for
> the Linux kernel, or what the API will be, but I think we should
> not export any interfaces from a particular device driver directly
> but always go through a common layer here and use an extensible
> interface that can be implemented by everyone.
>
> The API has not been part of a release yet, so I'd say we should
> remove it for now. Once we have a net/odp/ directory, we can
> add a driver-independent implementation there and call it from
> the hisi driver.
>
> Arnd

Hi, Arnd,

Agreed. We will remove the interface for the time being. Thank you.

--
-Kenneth Lee (Hisilicon)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: Fwd: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once HANDEL_TX_MSG macro
On Tue, Oct 13, 2015 at 03:06:21PM +0800, huangdaode wrote:
> Date: Tue, 13 Oct 2015 15:06:21 +0800
> From: huangdaode
> To: Kenneth Lee
> Subject: Fwd: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once
>  HANDEL_TX_MSG macro
> Message-ID: <561cad6d.2060...@hisilicon.com>
>
> Forwarded Message
>
> Subject: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once
>  HANDEL_TX_MSG macro
> Date: Mon, 12 Oct 2015 13:59:39 +0200
> From: Arnd Bergmann
> To: huangdaode
> CC: da...@davemloft.net, j...@perches.com, liguo...@hisilicon.com,
>  yisen.zhu...@huawei.com, net...@vger.kernel.org,
>  linux...@huawei.com, salil.me...@huawei.com,
>  kenneth-lee-2...@foxmail.com, xuw...@hisilicon.com,
>  lisheng...@huawei.com, linux-kernel@vger.kernel.org
>
> On Monday 12 October 2015 11:23:44 huangdaode wrote:
> > +			s += sprintf(s,
> > +				     "\t\ttx_ring on %p:%u,%u,%u,%u,%u,%llu,%llu\n",
> > +				     h->qs[i]->tx_ring.io_base,
> > +				     h->qs[i]->tx_ring.buf_size,
> > +				     h->qs[i]->tx_ring.desc_num,
> > +				     h->qs[i]->tx_ring.max_desc_num_per_pkt,
> > +				     h->qs[i]->tx_ring.max_raw_data_sz_per_desc,
> > +				     h->qs[i]->tx_ring.max_pkt_size,
> > +				     h->qs[i]->tx_ring.stats.sw_err_cnt,
> > +				     h->qs[i]->tx_ring.stats.io_err_cnt);
>
> There is actually a more significant problem with this code, which I
> failed to notice when doing the original bugfix:
>
> You have a sysfs interface here that exports internal data of the
> device that should not be visible like this. One problem is that
> the io_base is a kernel pointer that must not be visible to non-root
> users (so we don't easily create an attack surface for exploits).
> Another problem is that the format is not documented in Documentation/ABI/
> and that you have multiple values in one sysfs file here.
>
> It would probably be better to completely remove that sysfs interface, and
> to use the ethtool netlink interface to export them.
>
> Arnd
>
> .

Hi, Arnd,

Thank you for the comment. Yes, the io_base is a security problem; we will fix it in a coming patch soon.

But can we keep the sysfs interface? The interface from hnae is used not only by the ethernet driver but also by the Open Data Plane driver. If we move it to the upper layers, both drivers will need the same logic.

So how about we just add documentation to Documentation/ABI?

Thanks

--
-Kenneth Lee (Hisilicon)
Re: Fwd: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once HANDEL_TX_MSG macro
On Tue, Oct 13, 2015 at 03:06:21PM +0800, huangdaode wrote: > Date: Tue, 13 Oct 2015 15:06:21 +0800 > From: huangdaode <huangda...@hisilicon.com> > To: Kenneth Lee <kenneth_lee_2...@126.com> > Subject: Fwd: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once > HANDEL_TX_MSG macro > Message-ID: <561cad6d.2060...@hisilicon.com> > > Forwarded Message > > Subject: Re: [PATCH net-next v2 1/2] hisilicon net: removes the once > HANDEL_TX_MSG macro >Date: Mon, 12 Oct 2015 13:59:39 +0200 >From: Arnd Bergmann <a...@arndb.de> > To: huangdaode <huangda...@hisilicon.com> > CC: da...@davemloft.net, j...@perches.com, liguo...@hisilicon.com, > > yisen.zhu...@huawei.com, net...@vger.kernel.org, > linux...@huawei.com, salil.me...@huawei.com, > kenneth-lee-2...@foxmail.com, xuw...@hisilicon.com, > lisheng...@huawei.com, linux-kernel@vger.kernel.org > > On Monday 12 October 2015 11:23:44 huangdaode wrote: > > + s += sprintf(s, > > + "\t\ttx_ring on > %p:%u,%u,%u,%u,%u,%llu,%llu\n", > > + h->qs[i]->tx_ring.io_base, > > + h->qs[i]->tx_ring.buf_size, > > + h->qs[i]->tx_ring.desc_num, > > + h->qs[i]->tx_ring.max_desc_num_per_pkt, > > + > h->qs[i]->tx_ring.max_raw_data_sz_per_desc, > > + h->qs[i]->tx_ring.max_pkt_size, > > + h->qs[i]->tx_ring.stats.sw_err_cnt, > > + h->qs[i]->tx_ring.stats.io_err_cnt); > > There is actually a more significant problem with this code, which I > failed to notice when doing the original bugfix: > > You have a sysfs interface here that exports internal data of the > device that should not be visible like this. One problem is that > the io_base is a kernel pointer that must not be visible to non-root > users (so we don't easily create an attack surface for exploits). > Another problem is that the format is not documented in Documentation/ABI/ > and that you have multiple values in one sysfs file here. > > It would probably be better to completely remove that sysfs interface, and > to use the ethtool netlink interface to export them. > > Arnd > > . 
Hi, Arnd,

Thank you for the comment. Yes, the io_base is a security problem; we will
fix it in an upcoming patch soon.

But can we keep the sysfs interface? The hnae interface is used not only by
the ethernet driver but also by the Open Data Plane driver. If we move it to
the upper layers, both drivers will have to duplicate the same logic. So how
about we just add documentation to Documentation/ABI?

Thanks
--
-Kenneth Lee (Hisilicon)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
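Arnd's objection that one sysfs file holds many values can be illustrated with a small userspace sketch of sysfs-style "show" callbacks that each emit exactly one value. The struct ring_info, the *_show names and the PAGE_SIZE value are hypothetical stand-ins, not the hnae code; in the kernel these would be struct device_attribute show methods, io_base would simply not be exported, and each file would get its own entry under Documentation/ABI.

```c
#include <stdio.h>

/* Hypothetical stand-in: sysfs show buffers are one page in the kernel. */
#define PAGE_SIZE 4096

/* Hypothetical subset of the tx_ring fields printed by the patch. */
struct ring_info {
	unsigned int buf_size;
	unsigned int desc_num;
	unsigned long long sw_err_cnt;
};

/* One attribute file, one newline-terminated value (sysfs convention). */
static int buf_size_show(const struct ring_info *r, char *buf)
{
	return snprintf(buf, PAGE_SIZE, "%u\n", r->buf_size);
}

static int desc_num_show(const struct ring_info *r, char *buf)
{
	return snprintf(buf, PAGE_SIZE, "%u\n", r->desc_num);
}

static int sw_err_cnt_show(const struct ring_info *r, char *buf)
{
	return snprintf(buf, PAGE_SIZE, "%llu\n", r->sw_err_cnt);
}
```

With this layout a tool reads, for example, a buf_size file and gets "2048\n" without parsing a comma-separated record, and no kernel pointer is ever formatted.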
Re: RE: [PATCH 1/5] net: add Hisilicon Network Subsystem support (config and documents)
On Fri, Aug 21, 2015 at 04:00:35PM +0200, Arnd Bergmann wrote:
> Date: Fri, 21 Aug 2015 16:00:35 +0200
> From: Arnd Bergmann
> To: "Liguozhu (Kenneth)"
> CC: "mark.rutl...@arm.com", "devicet...@vger.kernel.org",
>  "pawel.m...@arm.com", "ijc+devicet...@hellion.org.uk",
>  "catalin.mari...@arm.com", "will.dea...@arm.com",
>  "linux-kernel@vger.kernel.org", Linuxarm,
>  "paul.gortma...@windriver.com", "robh...@kernel.org",
>  "ga...@codeaurora.org", "zhangfei@linaro.org",
>  "net...@vger.kernel.org", "da...@davemloft.net",
>  "linux-arm-ker...@lists.infradead.org"
> Subject: Re: RE: [PATCH 1/5] net: add Hisilicon Network Subsystem
>  support (config and documents)
> User-Agent: KMail/4.11.5 (Linux/3.16.0-10-generic; KDE/4.11.5; x86_64; ; )
> Message-ID: <2543796.7JthO5WCfI@wuerfel>
>
> On Monday 17 August 2015 01:28:07 Liguozhu wrote:
> > Thanks, Arnd.
> >
> > Regarding the ae-name: it is the name of the Acceleration Engine. It is
> > provided by the BIOS according to the position and the feature enabled
> > of the IP. So "soc0" means it is on SoC No. 0, while "n4" means it is
> > running on "Non-dsaf mode 4". Ideally, we should setup the rule to name
> > it. But as I said in the patchset, the IP is original designed for a bare
> > metal solution, it is worthless to export all modes and we are planning
> > to add more mode for Linux itself in the IP in future version. So I think
> > the better way is to leave it as a "name" but add more meaning in the
> > future.
>
> The name property is a bit awkward. The position is normally implied by
> the location of the parent device in the DT, so you should not need that
> at all and instead derive it elsewhere. You can also add strings to the
> compatible property instead of this, to signify differences in the
> programming that are based on how the IP block is used.
>
> > Regarding the ae-opts: it is the initial value for the AE's runtime options,
> > Currently, we have only "port number" (there are 6XGE+2GE port for a DSAF AE)
> > as option. But for future version, we will add other options such as "enable
> > Spanning Tree Protocol algorithm)" and so on.
>
> I think these can easily be converted into an index property and boolean
> flags (present if true, absent otherwise) for additional features.
>
> > Should I add these background to somewhere?
>
> The binding document needs to list all supported configurations, if you
> have a string property, describe specifically what strings are allowed
> and what they mean, but better try to avoid strings altogether.
>
> 	Arnd
> ___
> linuxarm mailing list
> linux...@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm

Dear Arnd,

We are working on the new patchset. I describe the new design here so that
you can tell us if we are doing something wrong.

We will now keep only a few attributes in the ethernet node, like this:

	ethernet@0 {
		compatible = "hisilicon,hns-nic";
		ae-name = "dsaf1";
		port-id = <0>;
	};

ae-name is simply a name referring to the dsa_name in the SAF node.
port-id is the index of the port provided by the DSAF (the accelerator).

The DSAF can connect to 8 PHYs. Ports 0 and 1 are used for administration
purposes; they are called debug ports. The remaining 6 PHYs are assigned
according to the mode of the DSAF.

In NIC mode, all 6 PHYs are exposed to the CPU as ethernet ports, so the
port-id can be 2 to 7. Here is the diagram:

	+-------------------------+
	|           CPU           |
	+--+--+--+--+--+--+--+--+-+
	   |  |  |  |  |  |  |  |
	  debug     service port
	  port         (2-7)
	  (0,1)

In Switch mode, all 6 PHYs are used as the physical ports of a LAN switch,
while the CPU side sees itself as having a single NIC connected to that
switch. In this case, the port-id can only be 2.
	+-------------------------+
	|           CPU           |
	+--+--+--------+----------+
	   |  |        |
	   |  |        | service port (2)
	  debug    +---+--------+
	  port     |   switch   |
	  (0,1)    +-+-+-+-+-+-++
	             | | | | | |
	           external ports

--
-Kenneth (Hisilicon)
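The port-id rules described in this message (debug ports 0 and 1 in both modes, service ports 2 to 7 in NIC mode, only port 2 in Switch mode) can be captured in a small validation helper that a driver might run when parsing the ethernet node. This is a sketch only: the enum, the macro names and the helpers are hypothetical and are not part of the HNS patchset or its binding.

```c
#include <stdbool.h>

/* Hypothetical names for the two DSAF modes described above. */
enum dsaf_mode {
	DSAF_MODE_NIC,    /* all 6 service PHYs exposed to the CPU */
	DSAF_MODE_SWITCH, /* 6 PHYs behind a switch; CPU sees one NIC */
};

#define DSAF_DEBUG_PORT_MAX       1 /* ports 0 and 1 are debug ports */
#define DSAF_SERVICE_PORT_MIN     2
#define DSAF_NIC_SERVICE_PORT_MAX 7

static bool dsaf_port_is_debug(unsigned int port_id)
{
	return port_id <= DSAF_DEBUG_PORT_MAX;
}

/* Return true if port_id is a legal "port-id" value for the given mode. */
static bool dsaf_port_id_valid(enum dsaf_mode mode, unsigned int port_id)
{
	if (dsaf_port_is_debug(port_id))
		return true; /* debug ports exist in both modes */

	switch (mode) {
	case DSAF_MODE_NIC:
		return port_id >= DSAF_SERVICE_PORT_MIN &&
		       port_id <= DSAF_NIC_SERVICE_PORT_MAX;
	case DSAF_MODE_SWITCH:
		return port_id == DSAF_SERVICE_PORT_MIN;
	}
	return false;
}
```

Keeping the check keyed on an enum rather than on the ae-name string follows Arnd's advice to avoid string properties where an index or flag will do.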
Re: [PATCH 2/5] net: add Hisilicon Network Subsystem hnae framework support
Thanks, Klimov,

You are right. I will fix it in the next version of the patches.

On Tue, Aug 18, 2015 at 03:12:02AM +0300, Alexey Klimov wrote:
> Date: Tue, 18 Aug 2015 03:12:02 +0300
> From: Alexey Klimov
> To: Kenneth Lee
> CC: robh...@kernel.org, pawel.m...@arm.com, Mark Rutland,
>  ijc+devicet...@hellion.org.uk, Kumar Gala, Catalin Marinas,
>  Will Deacon, yisen.zhu...@huawei.com, "David S. Miller",
>  paul.gortma...@windriver.com, dingtianh...@huawei.com,
>  zhangfei@linaro.org, devicet...@vger.kernel.org,
>  Linux Kernel Mailing List, linux-arm-ker...@lists.infradead.org,
>  net...@vger.kernel.org, linux...@huawei.com, salil.me...@huawei.com,
>  huangda...@hisilicon.com, Kenneth Lee, Yury Norov
> Subject: Re: [PATCH 2/5] net: add Hisilicon Network Subsystem hnae
>  framework support
> Message-ID:
>
> Hi Kenneth,
>
> just small minor question.
>
> On Fri, Aug 14, 2015 at 1:30 PM, Kenneth Lee wrote:
> > HNAE (Hisilicon Network Acceleration Engine) is a framework to provide a
> > unified ring buffer interface for Hisilicon Network Acceleration Engines.
> >
> > With the interface, upper layer can work as ethernet driver, ODP driver or
> > other service driver on purpose.
> > > > Signed-off-by: Kenneth Lee > > Signed-off-by: Yisen Zhuang > > --- > > drivers/net/ethernet/hisilicon/Kconfig | 33 +- > > drivers/net/ethernet/hisilicon/Makefile | 1 + > > drivers/net/ethernet/hisilicon/hns/Makefile | 15 + > > drivers/net/ethernet/hisilicon/hns/hnae.c | 494 +++ > > drivers/net/ethernet/hisilicon/hns/hnae.h | 582 > > > > 5 files changed, 1124 insertions(+), 1 deletion(-) > > create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile > > create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c > > create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h > > > > diff --git a/drivers/net/ethernet/hisilicon/Kconfig > > b/drivers/net/ethernet/hisilicon/Kconfig > > index dead17b..1e4f5a7 100644 > > --- a/drivers/net/ethernet/hisilicon/Kconfig > > +++ b/drivers/net/ethernet/hisilicon/Kconfig > > @@ -5,7 +5,7 @@ > > config NET_VENDOR_HISILICON > > bool "Hisilicon devices" > > default y > > - depends on ARM > > + depends on ARM || ARM64 > > ---help--- > > If you have a network (Ethernet) card belonging to this class, > > say Y. > > > > @@ -31,4 +31,35 @@ config HIP04_ETH > > If you wish to compile a kernel for a hardware with hisilicon p04 > > SoC and > > want to use the internal ethernet then you should answer Y to > > this. > > > > +config HNS > > + tristate "Hisilicon Network Subsystem Support (Framework)" > > + ---help--- > > + This selects the framework support for Hisilicon Network > > Subsystem. It > > + is needed by any driver which provides HNS acceleration engine or > > make > > + use of the engine > > + > > +config HNS_DSAF > > + tristate "Hisilicon HNS DSAF device Support" > > + select HNS > > + select HNS_MDIO > > + ---help--- > > + This selects the DSAF (Distributed System Area Frabric) network > > + acceleration engine support. 
The engine is used in Hisilicon P660, > > + Hi1610 and further ICT SoC > > + > > +config HNS_MDIO > > + tristate "Hisilicon HNS MDIO device Support" > > + select MDIO > > + ---help--- > > + This selects the HNS MDIO support. It is needed by HNS_DSAF to > > access > > + the PHY > > + > > +config HNS_ENET > > + tristate "Hisilicon HNS Ethernet Device Support" > > + select PHYLIB > > + select HNS > > + ---help--- > > + This selects the general ethernet driver for HNS. This module > > make > > + use of any HNS AE driver, such as HNS_DSAF > > + > > endif # NET_VENDOR_HISILICON > > diff --git a/drivers/net/ethernet/hisilicon/Makefile > > b/drivers/net/ethernet/hisilicon/Makefile > > index 6c14540..2503a9b 100644 > > --- a/drivers/net/ethernet/hisilicon/Makefile > > +++ b/drivers/net/ethernet/hisilicon/Makefile > > @@ -4,3 +4,4 @@ > > > > obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o > > obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o > > +obj-$(CONFIG_HNS) += hns/ > > diff --git a/drivers/net/ethernet/
Re: [PATCH 3/5] net: add Hisilicon Network Subsystem MDIO support
Thanks, Arnd,

You are right. This is the same IP as the one driven by hip04_mdio.c; we
misunderstood the hardware design. We will merge the two drivers and
re-submit the patches.

On Fri, Aug 14, 2015 at 10:57:28PM +0200, Arnd Bergmann wrote:
> On Friday 14 August 2015 18:30:20 Kenneth Lee wrote:
>
> > +#define MDIO_BASE_ADDR 0x403C
>
> Does not belong in here (and is not used)
>
> > +#define MDIO_COMMAND_REG 0x0
> > +#define MDIO_ADDR_REG 0x4
> > +#define MDIO_WDATA_REG 0x8
> > +#define MDIO_RDATA_REG 0xc
> > +#define MDIO_STA_REG 0x10
>
> These look suspiciously similar to definitions from
> drivers/net/ethernet/hisilicon/hip04_mdio.c.
>
> Could the hardware be related? If so, please try to share
> the common parts.
>
> > +static inline void mdio_write_reg(void *base, u32 reg, u32 value)
> > +{
> > +	u8 __iomem *reg_addr = ACCESS_ONCE(base);
> > +
> > +	writel(value, reg_addr + reg);
> > +}
> > +
> > +#define MDIO_WRITE_REG(a, reg, value) \
> > +	mdio_write_reg((a)->vbase, (reg), (value))
>
> Something seems wrong here: why do you have an ACCESS_ONCE() on a
> local variable? Doesn't this just make the code less efficient
> without providing lockless access to shared variables?
>
> The types are inconsistent here, you should get a warning from
> running this through 'make C=1' because of the missing __iomem
> annotation of the pointer.
>
> Also, why both a macro and an inline function? Just use an inline
> function.
>
> 	Arnd
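A minimal sketch of the accessors as Arnd's review suggests they should look: a properly annotated `u8 __iomem *` base, no ACCESS_ONCE() on a local, and plain inline functions instead of a wrapper macro. To make it compile and run in userspace, `__iomem`, `writel()` and `readl()` are modeled with an empty macro and volatile accesses (assumptions; in the kernel they come from the compiler attributes and `<linux/io.h>`). The struct keeps only the field the sketch needs.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint8_t u8;

/* Userspace stand-ins for kernel-only pieces (assumption for this sketch). */
#define __iomem

static inline void writel(u32 value, volatile void __iomem *addr)
{
	*(volatile u32 *)addr = value;
}

static inline u32 readl(const volatile void __iomem *addr)
{
	return *(const volatile u32 *)addr;
}

#define MDIO_WDATA_REG 0x8
#define MDIO_RDATA_REG 0xc

struct hns_mdio_device {
	u8 __iomem *vbase; /* MMIO base, annotated so sparse can check it */
};

/* Plain inline helpers: the type carries __iomem, no macro wrapper needed. */
static inline void mdio_write_reg(struct hns_mdio_device *dev, u32 reg,
				  u32 value)
{
	writel(value, dev->vbase + reg);
}

static inline u32 mdio_read_reg(struct hns_mdio_device *dev, u32 reg)
{
	return readl(dev->vbase + reg);
}
```

Passing the device struct rather than a bare `void *` also removes the need for the MDIO_WRITE_REG()/MDIO_READ_REG() macros entirely.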
[PATCH 5/5] net: add Hisilicon Network Subsystem basic ethernet support
This adds basic ethernet support for HNS. It is one of the ways to use the
HNS acceleration engine, although most of the decoding/encoding capability
of the AE cannot be used this way.

This submission contains the basic features of an ethernet driver. More will
be added later.

Signed-off-by: Kenneth Lee
Signed-off-by: Yisen Zhuang
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c    | 1552 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.h    |   81 ++
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 1174
 3 files changed, 2807 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
new file mode 100644
index 000..b58d5ab
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -0,0 +1,1552 @@
+/*
+ * Copyright (c) 2014-2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "hnae.h"
+#include "hns_enet.h"
+
+#define NIC_MAX_Q_PER_VF 16
+#define HNS_NIC_TX_TIMEOUT (5 * HZ)
+
+#define SERVICE_TIMER_HZ (1 * HZ)
+
+#define NIC_TX_CLEAN_MAX_NUM 256
+#define NIC_RX_CLEAN_MAX_NUM 64
+
+#define RCB_ERR_PRINT_CYCLE 1000
+
+static inline void fill_desc(struct hnae_ring *ring, void *priv,
+			     int size, dma_addr_t dma, int frag_end,
+			     int buf_num, enum hns_desc_type type)
+{
+	struct hnae_desc *desc = &ring->desc[ring->next_to_use];
+	struct hnae_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
+	struct sk_buff *skb;
+	__be16 protocol;
+	u32 ip_offset;
+	u32 asid_bufnum_pid = 0;
+	u32 flag_ipoffset = 0;
+
+	desc_cb->priv = priv;
+	desc_cb->length = size;
+	desc_cb->dma = dma;
+	desc_cb->type = type;
+
+	desc->addr = cpu_to_le64(dma);
+	desc->tx.send_size = cpu_to_le16((u16)size);
+
+	/*config bd buffer end */
+	flag_ipoffset |= 1 << HNS_TXD_VLD_B;
+
+	asid_bufnum_pid |= buf_num << HNS_TXD_BUFNUM_S;
+
+	if (type == DESC_TYPE_SKB) {
+		skb = (struct sk_buff *)priv;
+
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			protocol = skb->protocol;
+			ip_offset = ETH_HLEN;
+
+			/*if it is a SW VLAN check the next protocol*/
+			if (protocol == htons(ETH_P_8021Q)) {
+				ip_offset += VLAN_HLEN;
+				protocol = vlan_get_protocol(skb);
+				skb->protocol = protocol;
+			}
+
+			if (skb->protocol == ntohs(ETH_P_IP)) {
+				flag_ipoffset |= 1 << HNS_TXD_L3CS_B;
+				/* check for tcp/udp header */
+				flag_ipoffset |= 1 << HNS_TXD_L4CS_B;
+
+			} else if (skb->protocol == ntohs(ETH_P_IPV6)) {
+				/* ipv6 has not l3 cs, check for L4 header */
+				flag_ipoffset |= 1 << HNS_TXD_L4CS_B;
+			}
+
+			flag_ipoffset |= ip_offset << HNS_TXD_IPOFFSET_S;
+		}
+	}
+
+	flag_ipoffset |= frag_end << HNS_TXD_FE_B;
+
+	desc->tx.asid_bufnum_pid = cpu_to_le16(asid_bufnum_pid);
+	desc->tx.flag_ipoffset = cpu_to_le32(flag_ipoffset);
+
+	ring_ptr_move_fw(ring, next_to_use);
+}
+
+static inline void
unfill_desc(struct hnae_ring *ring)
+{
+	ring_ptr_move_bw(ring, next_to_use);
+}
+
+int hns_nic_net_xmit_hw(struct net_device *ndev,
+			struct sk_buff *skb,
+			struct hns_nic_ring_data *ring_data)
+{
+	struct hns_nic_priv *priv = netdev_priv(ndev);
+	struct device *dev = priv->dev;
+	struct hnae_ring *ring = ring_data->ring;
+	struct netdev_queue *dev_queue;
+	struct skb_frag_struct *frag;
+	int buf_num;
+	dma_addr_t dma;
+	int size, next_to_use;
+	int i, j;
+	struct sk_buff *new_skb;
+
+	assert(ring->max_desc_num_per_pkt <= ring->desc_num);
+
+	/* no. of segments (plus a header) */
+	buf_num = skb_shinfo(skb)->nr_frags + 1;
+
+	if (unlikely(buf_num > ring->max_desc_num_per_pkt)) {
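fill_desc() finishes by calling ring_ptr_move_fw(ring, next_to_use), and unfill_desc() undoes it with ring_ptr_move_bw(). Those macros live in hnae.h, which is not shown in this excerpt, so the sketch below assumes they advance or step back an index modulo desc_num. struct demo_ring and the helper names are hypothetical, and the "one slot kept empty" free-space rule is an assumption used only to illustrate why a producer checks space before filling buf_num descriptors.

```c
/* Hypothetical minimal ring; the real struct hnae_ring has many more fields. */
struct demo_ring {
	int desc_num;      /* number of descriptors in the ring */
	int next_to_use;   /* producer index: next descriptor to fill */
	int next_to_clean; /* consumer index: next descriptor to reclaim */
};

/* Advance an index by one, wrapping to 0 after desc_num - 1. */
static inline int ring_idx_fw(const struct demo_ring *r, int idx)
{
	return (idx + 1) % r->desc_num;
}

/* Step an index back by one, wrapping from 0 to desc_num - 1. */
static inline int ring_idx_bw(const struct demo_ring *r, int idx)
{
	return (idx - 1 + r->desc_num) % r->desc_num;
}

/* Free descriptors, assuming one slot stays empty so full != empty. */
static inline int ring_space(const struct demo_ring *r)
{
	int used = (r->next_to_use - r->next_to_clean + r->desc_num) %
		   r->desc_num;

	return r->desc_num - used - 1;
}
```

Stepping backwards in this way is what lets a transmit path roll back descriptors it has already filled if a later fragment cannot be set up.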
[PATCH 3/5] net: add Hisilicon Network Subsystem MDIO support
This adds MDIO support for the Hisilicon Network Subsystem. It is used in
the Hisilicon P660 and Hi1610 SoCs to control the external PHY.

Signed-off-by: Yisen Zhuang
Signed-off-by: Kenneth Lee
---
 drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c | 597 +
 1 file changed, 597 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c b/drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c
new file mode 100644
index 000..7113fa8
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c
@@ -0,0 +1,597 @@
+/*
+ * Copyright (c) 2014-2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define MDIO_DRV_NAME "hi-mdio"
+#define MDIO_BUS_NAME "Hisilicon MII Bus"
+#define MDIO_DRV_VERSION "1.1.0"
+#define MDIO_COPYRIGHT "Copyright(c) 2015 Huawei Corporation."
+#define MDIO_DRV_STRING MDIO_BUS_NAME +#define MDIO_DEFAULT_DEVICE_DESCR MDIO_BUS_NAME + +#define MDIO_CTL_DEV_ADDR(x) (x & 0x1f) +#define MDIO_CTL_PORT_ADDR(x) ((x & 0x1f) << 5) + +#define MDIO_BASE_ADDR 0x403C +#define MDIO_REG_ADDR_LEN 0x1000 +#define MDIO_PHY_GRP_LEN 0x100 +#define MDIO_REG_LEN 0x10 +#define MDIO_PHY_ADDR_NUM 5 +#define MDIO_MAX_PHY_ADDR 0x1F +#define MDIO_MAX_PHY_REG_ADDR 0x + +#define MDIO_TIMEOUT 100 + +struct hns_mdio_device { + struct device *dev; + void *vbase;/* mdio reg base address */ + u8 phy_class[PHY_MAX_ADDR]; + u8 index; + u8 chip_id; + u8 gidx;/* global index */ +}; + +#define MDIO_COMMAND_REG 0x0 +#define MDIO_ADDR_REG 0x4 +#define MDIO_WDATA_REG 0x8 +#define MDIO_RDATA_REG 0xc +#define MDIO_STA_REG 0x10 + +#define MDIO_CMD_DEVAD_M 0x1f +#define MDIO_CMD_DEVAD_S 0 +#define MDIO_CMD_PRTAD_M 0x1f +#define MDIO_CMD_PRTAD_S 5 +#define MDIO_CMD_OP_M 0x3 +#define MDIO_CMD_OP_S 10 +#define MDIO_CMD_ST_M 0x3 +#define MDIO_CMD_ST_S 12 +#define MDIO_CMD_START_B 14 + +#define MDIO_ADDR_DATA_M 0x +#define MDIO_ADDR_DATA_S 0 + +#define MDIO_WDATA_DATA_M 0x +#define MDIO_WDATA_DATA_S 0 + +#define MDIO_RDATA_DATA_M 0x +#define MDIO_RDATA_DATA_S 0 + +#define MDIO_STATE_STA_B 0 + +enum mdio_st_clause { + MDIO_ST_CLAUSE_45 = 0, + MDIO_ST_CLAUSE_22 +}; + +enum mdio_c22_op_seq { + MDIO_C22_WRITE = 1, + MDIO_C22_READ = 2 +}; + +enum mdio_c45_op_seq { + MDIO_C45_WRITE_ADDR = 0, + MDIO_C45_WRITE_DATA, + MDIO_C45_READ_INCREMENT, + MDIO_C45_READ +}; + +static inline void mdio_write_reg(void *base, u32 reg, u32 value) +{ + u8 __iomem *reg_addr = ACCESS_ONCE(base); + + writel(value, reg_addr + reg); +} + +#define MDIO_WRITE_REG(a, reg, value) \ + mdio_write_reg((a)->vbase, (reg), (value)) + +static inline u32 mdio_read_reg(void *base, u32 reg) +{ + u8 __iomem *reg_addr = ACCESS_ONCE(base); + + return readl(reg_addr + reg); +} + +#define MDIO_READ_REG(a, reg) \ + mdio_read_reg((a)->vbase, (reg)) + +#define mdio_set_field(origin, mask, shift, val) \ + do { 
\ + (origin) &= (~((mask) << (shift))); \ + (origin) |= (((val) & (mask)) << (shift)); \ + } while (0) + +#define mdio_get_field(origin, mask, shift) (((origin) >> (shift)) & (mask)) + +static void mdio_set_reg_field(void *base, u32 reg, u32 mask, u32 shift, + u32 val) +{ + u32 origin = mdio_read_reg(base, reg); + + mdio_set_field(origin, mask, shift, val); + mdio_write_reg(base, reg, origin); +} + +#define MDIO_SET_REG_FIELD(dev, reg, mask, shift, val) \ + mdio_set_reg_field((dev)->vbase, (reg), (mask), (shift), (val)) + +static u32 mdio_get_reg_field(void *base, u32 reg, u32 mask, u32 shift) +{ + u32 origin; + + origin = mdio_read_reg(base, reg); + return mdio_get_field(origin, mask, shift); +} + +#define MDIO_GET_REG_FIELD(dev, reg, mask, shift) \ + mdio_get_reg_field((dev)->vbase, (reg), (mask), (shift)) + +#define MDIO_SET_REG_BIT(dev, reg, bit, val) \ + mdio_set_reg_field((dev)->vbase, (reg), 0x1ull, (bit), (val)) + +#define MDIO_GET_REG_BIT(dev, reg, bit) \ + mdio_get_reg_fiel
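The MDIO_CMD_* masks and shifts above, together with mdio_set_field(), describe how a command word for MDIO_COMMAND_REG is assembled. The sketch below builds such a word in userspace using the field layout, enums and macros copied from the patch; mdio_build_cmd() is a hypothetical helper, and which MDIO address (PHY or register) goes into PRTAD versus DEVAD for a Clause 22 access is not shown in this excerpt, so the helper simply takes both as parameters.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Field layout copied from the patch's MDIO_COMMAND_REG definitions. */
#define MDIO_CMD_DEVAD_M 0x1f
#define MDIO_CMD_DEVAD_S 0
#define MDIO_CMD_PRTAD_M 0x1f
#define MDIO_CMD_PRTAD_S 5
#define MDIO_CMD_OP_M    0x3
#define MDIO_CMD_OP_S    10
#define MDIO_CMD_ST_M    0x3
#define MDIO_CMD_ST_S    12
#define MDIO_CMD_START_B 14

enum mdio_st_clause { MDIO_ST_CLAUSE_45 = 0, MDIO_ST_CLAUSE_22 };
enum mdio_c22_op_seq { MDIO_C22_WRITE = 1, MDIO_C22_READ = 2 };

/* Same semantics as the patch's mdio_set_field()/mdio_get_field(). */
#define mdio_set_field(origin, mask, shift, val)			\
	do {								\
		(origin) &= (~((mask) << (shift)));			\
		(origin) |= (((val) & (mask)) << (shift));		\
	} while (0)

#define mdio_get_field(origin, mask, shift) (((origin) >> (shift)) & (mask))

/* Hypothetical helper: compose a command word, start bit included. */
static u32 mdio_build_cmd(u32 st, u32 op, u32 prtad, u32 devad)
{
	u32 cmd = 0;

	mdio_set_field(cmd, MDIO_CMD_DEVAD_M, MDIO_CMD_DEVAD_S, devad);
	mdio_set_field(cmd, MDIO_CMD_PRTAD_M, MDIO_CMD_PRTAD_S, prtad);
	mdio_set_field(cmd, MDIO_CMD_OP_M, MDIO_CMD_OP_S, op);
	mdio_set_field(cmd, MDIO_CMD_ST_M, MDIO_CMD_ST_S, st);
	mdio_set_field(cmd, 0x1u, MDIO_CMD_START_B, 1u);
	return cmd;
}
```

For example, a Clause 22 read with PRTAD 1 and DEVAD 2 yields 0x2 | (1 << 5) | (2 << 10) | (1 << 12) | (1 << 14) = 0x5822.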
[PATCH 2/5] net: add Hisilicon Network Subsystem hnae framework support
HNAE (Hisilicon Network Acceleration Engine) is a framework that provides a
unified ring buffer interface for Hisilicon Network Acceleration Engines.
With this interface, the upper layer can work as an ethernet driver, an ODP
driver, or another service driver as needed.

Signed-off-by: Kenneth Lee
Signed-off-by: Yisen Zhuang
---
 drivers/net/ethernet/hisilicon/Kconfig      |  33 +-
 drivers/net/ethernet/hisilicon/Makefile     |   1 +
 drivers/net/ethernet/hisilicon/hns/Makefile |  15 +
 drivers/net/ethernet/hisilicon/hns/hnae.c   | 494 +++
 drivers/net/ethernet/hisilicon/hns/hnae.h   | 582
 5 files changed, 1124 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h

diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index dead17b..1e4f5a7 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_HISILICON
 	bool "Hisilicon devices"
 	default y
-	depends on ARM
+	depends on ARM || ARM64
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -31,4 +31,35 @@ config HIP04_ETH
 	  If you wish to compile a kernel for a hardware with hisilicon p04 SoC and
 	  want to use the internal ethernet then you should answer Y to this.
 
+config HNS
+	tristate "Hisilicon Network Subsystem Support (Framework)"
+	---help---
+	  This selects the framework support for Hisilicon Network Subsystem. It
+	  is needed by any driver which provides HNS acceleration engine or make
+	  use of the engine
+
+config HNS_DSAF
+	tristate "Hisilicon HNS DSAF device Support"
+	select HNS
+	select HNS_MDIO
+	---help---
+	  This selects the DSAF (Distributed System Area Frabric) network
+	  acceleration engine support.
The engine is used in Hisilicon P660, + Hi1610 and further ICT SoC + +config HNS_MDIO + tristate "Hisilicon HNS MDIO device Support" + select MDIO + ---help--- + This selects the HNS MDIO support. It is needed by HNS_DSAF to access + the PHY + +config HNS_ENET + tristate "Hisilicon HNS Ethernet Device Support" + select PHYLIB + select HNS + ---help--- + This selects the general ethernet driver for HNS. This module make + use of any HNS AE driver, such as HNS_DSAF + endif # NET_VENDOR_HISILICON diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile index 6c14540..2503a9b 100644 --- a/drivers/net/ethernet/hisilicon/Makefile +++ b/drivers/net/ethernet/hisilicon/Makefile @@ -4,3 +4,4 @@ obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o +obj-$(CONFIG_HNS) += hns/ diff --git a/drivers/net/ethernet/hisilicon/hns/Makefile b/drivers/net/ethernet/hisilicon/hns/Makefile new file mode 100644 index 000..6680602 --- /dev/null +++ b/drivers/net/ethernet/hisilicon/hns/Makefile @@ -0,0 +1,15 @@ +# +# Makefile for the HISILICON network device drivers. +# + +obj-$(CONFIG_HNS) += hnae.o + +obj-$(CONFIG_HNS_DSAF) += hns_dsaf.o +hns_dsaf-objs = hns_ae_adapt.o hns_dsaf_gmac.o hns_dsaf_mac.o hns_dsaf_misc.o \ + hns_dsaf_main.o hns_dsaf_ppe.o hns_dsaf_rcb.o hns_dsaf_xgmac.o + +obj-$(CONFIG_HNS_MDIO) += hns_mdio.o +hns_mdio-objs = hns_mdio_main.o + +obj-$(CONFIG_HNS_ENET) += hns_enet_drv.o +hns_enet_drv-objs = hns_enet.o hns_ethtool.o diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c new file mode 100644 index 000..fd09768 --- /dev/null +++ b/drivers/net/ethernet/hisilicon/hns/hnae.c @@ -0,0 +1,494 @@ +/* + * Copyright (c) 2014-2015 Hisilicon Limited. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include + +#include "hnae.h" + +#define cls_to_ae_dev(dev) container_of(dev, struct hnae_ae_dev, cls_dev) + +static struct class *hnae_class; + +static inline void hnae_list_add(spinlock_t *lock, struct list_head *node, +struct list_head *head) +{ + unsigned long flags; + + spin_lock_irqsave(lock, flags); + list_add_tail_rcu(node, head); + spin_unlock_irqrestore(lock, flags); +} + +static inline void hnae_list_del(spinlock_t *lock, struct list_head *node) +{ + unsigned long flags; + + spin_lock_irqsave(lock, flags); + list_del_rcu(node); + spin_unlock_irqrestore(lock, flags
[PATCH 0/5] net: Hisilicon Network Subsystem support
This patchset adds Hisilicon Network Subsystem support. The subsystem
provides a network acceleration engine, under long-term development, with a
ring buffer interface. The network interface can be used as a standard
ethernet network interface card, or by a network application that consumes
the decoded L2 to L4 data.

The patchset is ported from some internal-use drivers. It has been tested and
works fine with the hardware, but some details of the design are not yet
ideal. We would like to know whether the community can accept the
structure/architecture before we refine it.

Thank you.

Kenneth Lee (5):
  net: add Hisilicon Network Subsystem support (config and documents)
  net: add Hisilicon Network Subsystem hnae framework support
  net: add Hisilicon Network Subsystem MDIO support
  net: add Hisilicon Network Subsystem DSAF support
  net: add Hisilicon Network Subsystem basic ethernet support

 .../devicetree/bindings/net/hisilicon-hns-dsaf.txt |   40 +
 .../devicetree/bindings/net/hisilicon-hns-mdio.txt |   22 +
 .../devicetree/bindings/net/hisilicon-hns-nic.txt  |   14 +
 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi       |  197 ++
 drivers/net/ethernet/hisilicon/Kconfig             |   33 +-
 drivers/net/ethernet/hisilicon/Makefile            |    1 +
 drivers/net/ethernet/hisilicon/hns/Makefile        |   15 +
 drivers/net/ethernet/hisilicon/hns/hnae.c          |  494 
 drivers/net/ethernet/hisilicon/hns/hnae.h          |  582 +
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |  766 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |  705 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.h |   45 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |  942 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |  462 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 2681 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  438 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c |  311 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h |   45 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |  582 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h  |  105 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |  972 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h  |  136 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |  958 +++
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c    |  826 ++
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.h    |   15 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      | 1552 +++
 drivers/net/ethernet/hisilicon/hns/hns_enet.h      |   81 +
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   | 1174 +
 drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c |  597 +
 29 files changed, 14790 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
 create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi
 create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c

--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH 1/5] net: add Hisilicon Network Subsystem support (config and documents)
The Hisilicon Network Subsystem is a long-term evolution IP which is intended
for use in Hisilicon ICT SoCs. The IP, called hns for short, is a TCP/IP
acceleration engine that can directly decode TCP/IP streams and distribute
them to different ring buffers. HNS can be configured to work in different
modes for different scenarios. This patch uses only some of the modes, to
make it work as a standard ethernet NIC. The other modes will be added soon.

The whole function has 4 kernel sub-modules:
hnae: the HNS acceleration engine framework. It provides an abstract
      interface between the engine and the upper layers, which make use of
      the engine through ring buffers.
hns_enet_drv: a standard ethernet driver based on the ring buffer.
hns_dsaf: one implementation of the HNS acceleration engine, which is applied
      on Hisilicon P660, Hi1610 and later SoCs.
hns_mdio: the MDIO control of the PHY, used by the acceleration engine.

This submission adds the basic config and documents.

Signed-off-by: Kenneth Lee
Signed-off-by: Yisen Zhuang
---
 .../devicetree/bindings/net/hisilicon-hns-dsaf.txt |  40 +
 .../devicetree/bindings/net/hisilicon-hns-mdio.txt |  22 +++
 .../devicetree/bindings/net/hisilicon-hns-nic.txt  |  14 ++
 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi       | 197 +
 4 files changed, 273 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
 create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
new file mode 100644
index 000..038c03d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
@@ -0,0 +1,40 @@
+Hisilicon DSA Fabric device controller
+
+Required properties:
+- compatible: should be "hisilicon,dsaf".
+- dsa-name: name of the dsa fabric which provides this interface.
+- interrupt-parent: the interrupt parent of this device.
+- interrupts: should contain the DSA Fabric and rcb interrupts.
+- reg: specifies base physical address(es) and size of the device registers.
+  The first region is the external interface control register base and size.
+  The second region is the SerDes register base and size.
+  The third region is the PPE register base and size.
+  The fourth region is the dsa fabric register base and size.
+- phy-handle: phy handle of the physical port, 0 if there is no phy device.
+  See ethernet.txt [1].
+- buf-size: rx buffer size, should be 16-1024.
+- desc-num: number of descriptors in the TX and RX queues, should be 512,
+  1024, 2048 or 4096.
+
+[1] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+
+dsa: dsa@c700 {
+	compatible = "hisilicon,dsaf";
+	dsa_name = "soc0-n4";
+	interrupt-parent = <&mbigen_dsa>;
+	reg = <0x0 0xC000 0x0 0x42
+	       0x0 0xC200 0x0 0x30
+	       0x0 0xc500 0x0 0x89
+	       0x0 0xc700 0x0 0x6>;
+	phy-handle = <0 0 0 0 &soc0_phy4 &soc0_phy5 0 0>;
+	interrupts = <131 4>,<132 4>, <133 4>,<134 4>,
+		     <135 4>,<136 4>, <137 4>,<138 4>,
+		     <139 4>,<140 4>, <141 4>,<142 4>,
+		     <143 4>,<144 4>, <145 4>,<146 4>,
+		     <147 4>,<148 4>, <384 1>,<385 1>,
+		     <386 1>,<387 1>, <388 1>,<389 1>,
+		     <390 1>,<391 1>;
+	buf-size = <4096>;
+	desc-num = <1024>;
+	dma-coherent;
+};
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
new file mode 100644
index 000..205e803
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
@@ -0,0 +1,22 @@
+Hisilicon MDIO bus controller
+
+Properties:
+- compatible: "hisilicon,mdio"
+- reg: The base address of the MDIO bus controller register bank.
+- #address-cells: Must be <1>.
+- #size-cells: Must be <0>. MDIO addresses have no size component.
+
+Typically an MDIO bus might have several children.
+
+Example:
+	mdio@803c {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "hisilicon,mdio";
+		reg = <0x0 0x803c 0x0 0x1>;
+
+		ethernet-phy@0 {
+			...
+			reg = <0>;
+		};
+	};
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
ne
[PATCH 5/5] net: add Hisilicon Network Subsystem basic ethernet support
This adds basic ethernet support for HNS. It is one of the ways to use the HNS
acceleration engine, but most of the decoding/encoding capability of the AE
cannot be used this way. This submission contains the basic features of an
ethernet driver. More will be added later.

Signed-off-by: Kenneth Lee <liguo...@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhu...@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c    | 1552 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.h    |   81 ++
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 1174 
 3 files changed, 2807 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
new file mode 100644
index 000..b58d5ab
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -0,0 +1,1552 @@
+/*
+ * Copyright (c) 2014-2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/clk.h>
+#include <linux/skbuff.h>
+#include <linux/phy.h>
+#include <linux/io.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/if_vlan.h>
+#include "hnae.h"
+#include "hns_enet.h"
+
+#define NIC_MAX_Q_PER_VF 16
+#define HNS_NIC_TX_TIMEOUT (5 * HZ)
+
+#define SERVICE_TIMER_HZ (1 * HZ)
+
+#define NIC_TX_CLEAN_MAX_NUM 256
+#define NIC_RX_CLEAN_MAX_NUM 64
+
+#define RCB_ERR_PRINT_CYCLE 1000
+
+static inline void fill_desc(struct hnae_ring *ring, void *priv,
+			     int size, dma_addr_t dma, int frag_end,
+			     int buf_num, enum hns_desc_type type)
+{
+	struct hnae_desc *desc = &ring->desc[ring->next_to_use];
+	struct hnae_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
+	struct sk_buff *skb;
+	__be16 protocol;
+	u32 ip_offset;
+	u32 asid_bufnum_pid = 0;
+	u32 flag_ipoffset = 0;
+
+	desc_cb->priv = priv;
+	desc_cb->length = size;
+	desc_cb->dma = dma;
+	desc_cb->type = type;
+
+	desc->addr = cpu_to_le64(dma);
+	desc->tx.send_size = cpu_to_le16((u16)size);
+
+	/*config bd buffer end */
+	flag_ipoffset |= 1 << HNS_TXD_VLD_B;
+
+	asid_bufnum_pid |= buf_num << HNS_TXD_BUFNUM_S;
+
+	if (type == DESC_TYPE_SKB) {
+		skb = (struct sk_buff *)priv;
+
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			protocol = skb->protocol;
+			ip_offset = ETH_HLEN;
+
+			/*if it is a SW VLAN check the next protocol*/
+			if (protocol == htons(ETH_P_8021Q)) {
+				ip_offset += VLAN_HLEN;
+				protocol = vlan_get_protocol(skb);
+				skb->protocol = protocol;
+			}
+
+			if (skb->protocol == ntohs(ETH_P_IP)) {
+				flag_ipoffset |= 1 << HNS_TXD_L3CS_B;
+				/* check for tcp/udp header */
+				flag_ipoffset |= 1 << HNS_TXD_L4CS_B;
+
+			} else if (skb->protocol == ntohs(ETH_P_IPV6)) {
+				/* ipv6 has not l3 cs, check for L4 header */
+				flag_ipoffset |= 1 << HNS_TXD_L4CS_B;
+			}
+
+			flag_ipoffset |= ip_offset << HNS_TXD_IPOFFSET_S;
+		}
+	}
+
+	flag_ipoffset |= frag_end << HNS_TXD_FE_B;
+
+	desc->tx.asid_bufnum_pid = cpu_to_le16(asid_bufnum_pid);
+	desc->tx.flag_ipoffset = cpu_to_le32(flag_ipoffset);
+
+	ring_ptr_move_fw(ring, next_to_use);
+}
+
+static inline void unfill_desc(struct hnae_ring *ring)
+{
+	ring_ptr_move_bw(ring, next_to_use);
+}
+
+int hns_nic_net_xmit_hw(struct net_device *ndev,
+			struct sk_buff *skb,
+			struct hns_nic_ring_data *ring_data)
+{
+	struct hns_nic_priv *priv = netdev_priv(ndev);
+	struct device *dev = priv->dev;
+	struct hnae_ring *ring = ring_data->ring;
+	struct netdev_queue *dev_queue;
+	struct skb_frag_struct *frag;
+	int buf_num;
+	dma_addr_t dma;
+	int size, next_to_use;
+	int i, j;
+	struct sk_buff *new_skb;
+
+	assert(ring->max_desc_num_per_pkt <= ring->desc_num);
+
+	/* no. of segments (plus a header) */
+	buf_num = skb_shinfo(skb)->nr_frags + 1;
+
+	if (unlikely(buf_num > ring->max_desc_num_per_pkt
[PATCH 1/5] net: add Hisilicon Network Subsystem support (config and documents)
The Hisilicon Network Subsystem is a long-term evolution IP which is intended
for use in Hisilicon ICT SoCs. The IP, called hns for short, is a TCP/IP
acceleration engine that can directly decode TCP/IP streams and distribute
them to different ring buffers. HNS can be configured to work in different
modes for different scenarios. This patch uses only some of the modes, to
make it work as a standard ethernet NIC. The other modes will be added soon.

The whole function has 4 kernel sub-modules:
hnae: the HNS acceleration engine framework. It provides an abstract
      interface between the engine and the upper layers, which make use of
      the engine through ring buffers.
hns_enet_drv: a standard ethernet driver based on the ring buffer.
hns_dsaf: one implementation of the HNS acceleration engine, which is applied
      on Hisilicon P660, Hi1610 and later SoCs.
hns_mdio: the MDIO control of the PHY, used by the acceleration engine.

This submission adds the basic config and documents.

Signed-off-by: Kenneth Lee <liguo...@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhu...@huawei.com>
---
 .../devicetree/bindings/net/hisilicon-hns-dsaf.txt |  40 +
 .../devicetree/bindings/net/hisilicon-hns-mdio.txt |  22 +++
 .../devicetree/bindings/net/hisilicon-hns-nic.txt  |  14 ++
 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi       | 197 +
 4 files changed, 273 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
 create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
new file mode 100644
index 000..038c03d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
@@ -0,0 +1,40 @@
+Hisilicon DSA Fabric device controller
+
+Required properties:
+- compatible: should be "hisilicon,dsaf".
+- dsa-name: name of the dsa fabric which provides this interface.
+- interrupt-parent: the interrupt parent of this device.
+- interrupts: should contain the DSA Fabric and rcb interrupts.
+- reg: specifies base physical address(es) and size of the device registers.
+  The first region is the external interface control register base and size.
+  The second region is the SerDes register base and size.
+  The third region is the PPE register base and size.
+  The fourth region is the dsa fabric register base and size.
+- phy-handle: phy handle of the physical port, 0 if there is no phy device.
+  See ethernet.txt [1].
+- buf-size: rx buffer size, should be 16-1024.
+- desc-num: number of descriptors in the TX and RX queues, should be 512,
+  1024, 2048 or 4096.
+
+[1] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+
+dsa: dsa@c700 {
+	compatible = "hisilicon,dsaf";
+	dsa_name = "soc0-n4";
+	interrupt-parent = <&mbigen_dsa>;
+	reg = <0x0 0xC000 0x0 0x42
+	       0x0 0xC200 0x0 0x30
+	       0x0 0xc500 0x0 0x89
+	       0x0 0xc700 0x0 0x6>;
+	phy-handle = <0 0 0 0 &soc0_phy4 &soc0_phy5 0 0>;
+	interrupts = <131 4>,<132 4>, <133 4>,<134 4>,
+		     <135 4>,<136 4>, <137 4>,<138 4>,
+		     <139 4>,<140 4>, <141 4>,<142 4>,
+		     <143 4>,<144 4>, <145 4>,<146 4>,
+		     <147 4>,<148 4>, <384 1>,<385 1>,
+		     <386 1>,<387 1>, <388 1>,<389 1>,
+		     <390 1>,<391 1>;
+	buf-size = <4096>;
+	desc-num = <1024>;
+	dma-coherent;
+};
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
new file mode 100644
index 000..205e803
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
@@ -0,0 +1,22 @@
+Hisilicon MDIO bus controller
+
+Properties:
+- compatible: "hisilicon,mdio"
+- reg: The base address of the MDIO bus controller register bank.
+- #address-cells: Must be <1>.
+- #size-cells: Must be <0>. MDIO addresses have no size component.
+
+Typically an MDIO bus might have several children.
+
+Example:
+	mdio@803c {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "hisilicon,mdio";
+		reg = <0x0 0x803c 0x0 0x1>;
+
+		ethernet-phy@0 {
+			...
+			reg = <0>;
+		};
+	};
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
new file mode 100644
index 000..5ab6969
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
@@ -0,0 +1,14 @@
+Hisilicon Network Subsystem NIC controller
+
+Required properties:
+- compatible: "hisilicon,hns-nic"
+- ae-name: name of the accelerator which provides this interface
[PATCH 3/5] net: add Hisilicon Network Subsystem MDIO support
The MDIO support for the Hisilicon Network Subsystem. It is used in the
Hisilicon P660 and Hi1610 SoCs to control the external PHY.

Signed-off-by: Yisen Zhuang <yisen.zhu...@huawei.com>
Signed-off-by: Kenneth Lee <liguo...@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c | 597 +
 1 file changed, 597 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c b/drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c
new file mode 100644
index 000..7113fa8
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/hns_mdio_main.c
@@ -0,0 +1,597 @@
+/*
+ * Copyright (c) 2014-2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/errno.h>
+#include <linux/etherdevice.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/netdevice.h>
+#include <linux/of_address.h>
+#include <linux/of.h>
+#include <linux/of_mdio.h>
+#include <linux/of_platform.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock_types.h>
+
+#define MDIO_DRV_NAME "hi-mdio"
+#define MDIO_BUS_NAME "Hisilicon MII Bus"
+#define MDIO_DRV_VERSION "1.1.0"
+#define MDIO_COPYRIGHT "Copyright(c) 2015 Huawei Corporation."
+#define MDIO_DRV_STRING MDIO_BUS_NAME
+#define MDIO_DEFAULT_DEVICE_DESCR MDIO_BUS_NAME
+
+#define MDIO_CTL_DEV_ADDR(x) ((x) & 0x1f)
+#define MDIO_CTL_PORT_ADDR(x) (((x) & 0x1f) << 5)
+
+#define MDIO_BASE_ADDR 0x403C
+#define MDIO_REG_ADDR_LEN 0x1000
+#define MDIO_PHY_GRP_LEN 0x100
+#define MDIO_REG_LEN 0x10
+#define MDIO_PHY_ADDR_NUM 5
+#define MDIO_MAX_PHY_ADDR 0x1F
+#define MDIO_MAX_PHY_REG_ADDR 0x
+
+#define MDIO_TIMEOUT 100
+
+struct hns_mdio_device {
+	struct device *dev;
+	void *vbase;		/* mdio reg base address */
+	u8 phy_class[PHY_MAX_ADDR];
+	u8 index;
+	u8 chip_id;
+	u8 gidx;		/* global index */
+};
+
+#define MDIO_COMMAND_REG 0x0
+#define MDIO_ADDR_REG 0x4
+#define MDIO_WDATA_REG 0x8
+#define MDIO_RDATA_REG 0xc
+#define MDIO_STA_REG 0x10
+
+#define MDIO_CMD_DEVAD_M 0x1f
+#define MDIO_CMD_DEVAD_S 0
+#define MDIO_CMD_PRTAD_M 0x1f
+#define MDIO_CMD_PRTAD_S 5
+#define MDIO_CMD_OP_M 0x3
+#define MDIO_CMD_OP_S 10
+#define MDIO_CMD_ST_M 0x3
+#define MDIO_CMD_ST_S 12
+#define MDIO_CMD_START_B 14
+
+#define MDIO_ADDR_DATA_M 0x
+#define MDIO_ADDR_DATA_S 0
+
+#define MDIO_WDATA_DATA_M 0x
+#define MDIO_WDATA_DATA_S 0
+
+#define MDIO_RDATA_DATA_M 0x
+#define MDIO_RDATA_DATA_S 0
+
+#define MDIO_STATE_STA_B 0
+
+enum mdio_st_clause {
+	MDIO_ST_CLAUSE_45 = 0,
+	MDIO_ST_CLAUSE_22
+};
+
+enum mdio_c22_op_seq {
+	MDIO_C22_WRITE = 1,
+	MDIO_C22_READ = 2
+};
+
+enum mdio_c45_op_seq {
+	MDIO_C45_WRITE_ADDR = 0,
+	MDIO_C45_WRITE_DATA,
+	MDIO_C45_READ_INCREMENT,
+	MDIO_C45_READ
+};
+
+static inline void mdio_write_reg(void *base, u32 reg, u32 value)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(base);
+
+	writel(value, reg_addr + reg);
+}
+
+#define MDIO_WRITE_REG(a, reg, value) \
+	mdio_write_reg((a)->vbase, (reg), (value))
+
+static inline u32 mdio_read_reg(void *base, u32 reg)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(base);
+
+	return readl(reg_addr + reg);
+}
+
+#define MDIO_READ_REG(a, reg) \
+	mdio_read_reg((a)->vbase, (reg))
+
+#define mdio_set_field(origin, mask, shift, val) \
+	do { \
+		(origin) &= (~((mask) << (shift))); \
+		(origin) |= (((val) & (mask)) << (shift)); \
+	} while (0)
+
+#define mdio_get_field(origin, mask, shift) (((origin) >> (shift)) & (mask))
+
+static void mdio_set_reg_field(void *base, u32 reg, u32 mask, u32 shift,
+			       u32 val)
+{
+	u32 origin = mdio_read_reg(base, reg);
+
+	mdio_set_field(origin, mask, shift, val);
+	mdio_write_reg(base, reg, origin);
+}
+
+#define MDIO_SET_REG_FIELD(dev, reg, mask, shift, val) \
+	mdio_set_reg_field((dev)->vbase, (reg), (mask), (shift), (val))
+
+static u32 mdio_get_reg_field(void *base, u32 reg, u32 mask, u32 shift)
+{
+	u32 origin;
+
+	origin = mdio_read_reg(base, reg);
+	return mdio_get_field(origin, mask, shift);
+}
+
+#define MDIO_GET_REG_FIELD(dev, reg, mask, shift) \
+	mdio_get_reg_field((dev)->vbase, (reg), (mask), (shift))
+
+#define MDIO_SET_REG_BIT(dev, reg, bit, val
Re: A small ftrace enhancement idea
Thank you, Steve,

Yes, with a separate instance I can measure the latency for a stimulus while still capturing the other sched events I am interested in. This is a better solution. I didn't know about this "instance" feature before, so I don't need to build another tool. Sorry for my ignorance.

Thanks and regards.
--Kenneth

On Oct 31, 2013, at 1:50 PM, Steven Rostedt wrote:

> On Wed, 30 Oct 2013 15:39:50 -0700
> Kenneth Lee wrote:
>
>> Dear Steven,
>>
>> I want to add a new function to the ftrace subsystem. Sometimes we face
>> a problem like this: the system fails to respond to input in time, once or
>> twice a day. It is hard to capture because it happens so rarely. So I want
>> to add a function to the kernel. When I hit such a problem, I insert a kernel
>> module that adds one hook at the point where the input is received and another
>> at the point where the response is made (with a session id if necessary).
>> I can then compare the time between them, and if the interval is longer than a
>> preset threshold, signal a user helper application (perhaps a script waiting
>> on a file), which then saves the trace events to a file for later inspection.
>
> I'm a little confused about what you want.
>
>> The user helper script may look like this:
>>
>> #!/bin/sh
>> echo 'sched:*' > /sys/kernel/debug/tracing/set_event
>> modprobe delay_inspector.ko threshold=500
>> cat /sys/kernel/debug/tracing/waiter #wait for signal
>> cp /sys/kernel/debug/tracing/trace /var/log/delay_infomation
>>
>> It looks like a standalone function, but I don't have a place to put it. Do
>> you think I can implement it in ftrace? And do you think there is a better
>> solution?
>
> You want something to wake up if it takes too long before an event
> happens?
>
> If so, why not just use a select() on the trace_pipe and if it times
> out, then dump the trace. You can even set up a separate instance.
>
> (this is waiting for a schedule switch to pid 1)
>
> cd /sys/kernel/debug/tracing
> mkdir instances/mine
> echo 'next_pid == 1' > instances/mine/events/sched/sched_switch/filter
> echo 1 > instances/mine/events/sched/sched_switch/enable
>
> Then in a userspace program, I open "instances/mine/trace_pipe"
> and run a select() on that file descriptor with a given timeout. If the
> event does not happen within the expected time frame, the select
> returns zero, and this userspace program can deal with it.
>
> Is that the functionality you are trying to achieve?
>
> -- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
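Steve's suggestion boils down to calling select() with a timeout on the instance's trace_pipe. Below is a minimal userspace sketch, assuming the "mine" instance has already been set up with the commands above. The names wait_readable() and watch_for_stall() are hypothetical. The policy of treating any gap longer than the timeout as "the event came too late" is an illustrative choice of mine and is not specified in the thread. In real use the descriptor would come from something like open("/sys/kernel/debug/tracing/instances/mine/trace_pipe", O_RDONLY).

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * An interrupted select() (EINTR) restarts the full timeout.
 */
static int wait_readable(int fd, long timeout_ms)
{
	fd_set rfds;
	struct timeval tv;
	int ret;

	assert(fd >= 0 && fd < FD_SETSIZE);	/* FD_SET() is undefined beyond this */
	do {
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		/* Linux select() may modify tv, so reinitialise it every time. */
		tv.tv_sec = timeout_ms / 1000;
		tv.tv_usec = (timeout_ms % 1000) * 1000;
		ret = select(fd + 1, &rfds, NULL, NULL, &tv);
	} while (ret < 0 && errno == EINTR);

	return ret > 0 ? 1 : ret;
}

/*
 * Hypothetical watcher: keep consuming events from an already opened
 * trace_pipe for as long as each one arrives within timeout_ms.
 * Returns 0 when a wait times out (the caller can then save the main
 * trace buffer), or -1 on a read error or EOF.
 */
static int watch_for_stall(int fd, long timeout_ms)
{
	char buf[4096];

	for (;;) {
		int r = wait_readable(fd, timeout_ms);

		if (r <= 0)
			return r;	/* 0: timed out, -1: select() failed */

		ssize_t n = read(fd, buf, sizeof(buf));	/* drain pending events */
		if (n < 0 && (errno == EINTR || errno == EAGAIN))
			continue;
		if (n <= 0)
			return -1;	/* read error, or the writer went away */
	}
}
```

When watch_for_stall() returns 0, the caller can copy /sys/kernel/debug/tracing/trace elsewhere, much as the cp line in Kenneth's helper script does. Because the filtered event goes into its own instance, the main buffer keeps recording the other sched events.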
A small ftrace enhancement idea
Dear Steven,

I want to add a new function to the ftrace subsystem. Sometimes we face a problem like this: the system fails to respond to input in time, once or twice a day. It is hard to capture because it happens so rarely. So I want to add a function to the kernel. When I hit such a problem, I insert a kernel module that adds one hook at the point where the input is received and another at the point where the response is made (with a session id if necessary). I can then compare the time between them, and if the interval is longer than a preset threshold, signal a user helper application (perhaps a script waiting on a file), which then saves the trace events to a file for later inspection.

The user helper script may look like this:

#!/bin/sh
echo 'sched:*' > /sys/kernel/debug/tracing/set_event
modprobe delay_inspector.ko threshold=500
cat /sys/kernel/debug/tracing/waiter #wait for signal
cp /sys/kernel/debug/tracing/trace /var/log/delay_infomation

It looks like a standalone function, but I don't have a place to put it. Do you think I can implement it in ftrace? And do you think there is a better solution?

Thank you.
Kenneth Lee
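At its core, the proposed in-kernel check pairs an input timestamp with a response timestamp by session id and compares the difference against a threshold. The sketch below models only that bookkeeping, in plain C. Timestamps are passed in as nanoseconds; in a module they would presumably come from a monotonic clock. Several details are assumptions made for illustration and are not defined in the message: the fixed-size table, the function names on_input() and on_response(), and reading threshold=500 as milliseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 16	/* hypothetical table size */

struct pending_input {
	uint32_t session;	/* id tying an input to its response */
	uint64_t start_ns;	/* when the input was received */
	int in_use;
};

static struct pending_input pending[MAX_SESSIONS];

/* Hook 1: record the arrival of an input. Returns 0, or -1 if the table is full. */
static int on_input(uint32_t session, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (!pending[i].in_use) {
			pending[i] = (struct pending_input){ session, now_ns, 1 };
			return 0;
		}
	}
	return -1;
}

/*
 * Hook 2: the response for 'session' has happened.
 * Returns 1 if the latency exceeded threshold_ns (time to signal the
 * user helper), 0 if it stayed within the threshold, or -1 if no
 * matching input was recorded.
 */
static int on_response(uint32_t session, uint64_t now_ns, uint64_t threshold_ns)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (pending[i].in_use && pending[i].session == session) {
			/* Timestamps are assumed to come from a monotonic clock. */
			assert(now_ns >= pending[i].start_ns);
			uint64_t latency = now_ns - pending[i].start_ns;

			pending[i].in_use = 0;	/* free the slot */
			return latency > threshold_ns;
		}
	}
	return -1;
}
```

A return value of 1 from on_response() marks the point where the proposal would signal the user helper. As the reply in this thread concludes, a separate tracing instance watched with select() gives a similar result without a new module.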