Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
On Thu, Jun 12, 2014 at 02:25:43PM +0900, Minchan Kim wrote:
> On Thu, Jun 12, 2014 at 12:21:39PM +0900, Joonsoo Kim wrote:
> > We should free memory for bitmap when we find zone mis-match,
> > otherwise this memory will leak.
>
> Then, -stable stuff?

I don't think so. This is just a possible leak candidate, so we don't need to push this to the stable tree.

> >
> > Additionally, I copy code comment from ppc kvm's cma code to notify
> > why we need to check zone mis-match.
> >
> > Signed-off-by: Joonsoo Kim
> >
> > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> > index bd0bb81..fb0cdce 100644
> > --- a/drivers/base/dma-contiguous.c
> > +++ b/drivers/base/dma-contiguous.c
> > @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
> >  		base_pfn = pfn;
> >  		for (j = pageblock_nr_pages; j; --j, pfn++) {
> >  			WARN_ON_ONCE(!pfn_valid(pfn));
> > +			/*
> > +			 * alloc_contig_range requires the pfn range
> > +			 * specified to be in the same zone. Make this
> > +			 * simple by forcing the entire CMA resv range
> > +			 * to be in the same zone.
> > +			 */
> >  			if (page_zone(pfn_to_page(pfn)) != zone)
> > -				return -EINVAL;
> > +				goto err;
>
> At a first glance, I thought it would be better to handle such an error
> before activating.
> So when I looked at the registration code (ie, dma_contiguous_reserve_area),
> I realized it is impossible because we didn't set up the zone yet. :(
>
> If so, when we detect a failure here, wouldn't it be better to report a more
> meaningful error message, like what the successful zone was, what the new
> zone is, and the failing pfn number?

What I want to do in the early phase of this patchset is to make the cma code behind the DMA APIs similar to ppc kvm's cma code. ppc kvm's cma code already has this error handling logic, so I made this patch. If we think we need more, we can do that in the generic cma code after merging this patchset.

Thanks.
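The leak being fixed is the classic early-return-before-cleanup bug: the bitmap is allocated, and a later `return -EINVAL` skips freeing it. A minimal userspace sketch of the goto-based cleanup pattern the patch adopts (names like `fake_zone_of` and `activate_area` are illustrative, not the kernel API):

```c
#include <stdlib.h>

/* Illustrative stand-in for page_zone(pfn_to_page(pfn)). */
static int fake_zone_of(unsigned long pfn) { return pfn < 100 ? 0 : 1; }

static int activate_area(unsigned long base_pfn, unsigned long count)
{
	unsigned long *bitmap = calloc((count + 63) / 64, sizeof(*bitmap));
	unsigned long pfn;
	int zone;

	if (!bitmap)
		return -12; /* -ENOMEM */
	zone = fake_zone_of(base_pfn);
	for (pfn = base_pfn; pfn < base_pfn + count; pfn++) {
		if (fake_zone_of(pfn) != zone)
			goto err;	/* was "return -EINVAL": leaked bitmap */
	}
	/* in the kernel the bitmap is kept on success; freed here so the
	 * userspace sketch itself is leak-free */
	free(bitmap);
	return 0;
err:
	free(bitmap);	/* the point of the patch: free on the error path */
	return -22;	/* -EINVAL */
}
```

With the fake zone boundary at pfn 100, a range crossing it takes the error path and still frees the bitmap.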
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: Tree for Jun 12
Hi all,

The powerpc allyesconfig is again broken more than usual.

Changes since 20140611:

Dropped tree: drm-intel-fixes (build problems)

The drm-intel-fixes tree still had its build failure so I dropped it at the maintainer's request.

The samsung tree gained a conflict against Linus' tree.

The pci tree lost its build failure.

The net-next tree gained a conflict against Linus' tree and a build failure for which I reverted a commit.

The virtio tree gained a conflict against Linus' tree.

The target-updates tree gained a conflict against the virtio tree.

Non-merge commits (relative to Linus' tree): 3656
 2925 files changed, 115781 insertions(+), 54892 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory.

Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a multi_v7_defconfig for arm. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm defconfig.

Below is a summary of the state of the merge. I am currently merging 219 trees (counting Linus' and 29 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next .
If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (4251c2a67011 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux)
Merging fixes/master (4b660a7f5c80 Linux 3.15-rc6)
Merging kbuild-current/rc-fixes (38dbfb59d117 Linus 3.14-rc1)
Merging arc-current/for-curr (89ca3b881987 Linux 3.15-rc4)
Merging arm-current/fixes (3f8517e7937d ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs)
Merging m68k-current/for-linus (e8d6dc5ad26e m68k/hp300: Convert printk to pr_foo())
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging powerpc-merge/merge (8212f58a9b15 powerpc: Wire renameat2() syscall)
Merging sparc/master (8ecc1bad4c9b sparc64: fix format string mismatch in arch/sparc/kernel/sysfs.c)
Merging net/master (c5b46160877a net/core: Add VF link state control policy)
Merging ipsec/master (6d004d6cc739 vti: Use the tunnel mark for lookup in the error handlers.)
Merging sound-current/for-linus (6538de03a98f ALSA: hda - Add quirk for ABit AA8XE)
Merging pci-current/for-linus (d0b4cc4e3270 PCI: Wrong register used to check pending traffic)
Merging wireless/master (2c316e699fa4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging driver-core.current/driver-core-linus (4b660a7f5c80 Linux 3.15-rc6)
Merging tty.current/tty-linus (d6d211db37e7 Linux 3.15-rc5)
Merging usb.current/usb-linus (5dc2808c4729 xhci: delete endpoints from bandwidth list before freeing whole device)
Merging usb-gadget-fixes/fixes (886c7c426d46 usb: gadget: at91-udc: fix irq and iomem resource retrieval)
Merging staging.current/staging-linus (9326c5ca0982 staging: r8192e_pci: fix htons error)
Merging char-misc.current/char-misc-linus (d1db0eea8524 Linux 3.15-rc3)
Merging input-current/for-linus (a292241cccb7 Merge branch 'next' into for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (3901c1124ec5 crypto: s390 - fix aes,des ctr mode concurrency finding.)
Merging ide/master (5b40dd30bbfa ide: Fix SC1200 dependencies)
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (4b660a7f5c80 Linux 3.15-rc6)
Merging rr-fixes/fixes (79465d2fd48e module: remove warning about waiting module removal.)
Merging mfd-fixes/master (73beb63d290f mfd: rtsx_pcr: Disable interrupts before cancelling delayed works)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 'for-joerg/arm-smmu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into for-linus)
Merging drm-in
Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
On Thu, 12 Jun 2014 12:29:09 +0900, Masami Hiramatsu wrote:
> NO, ftrace_lookup_ip() returns NULL if the hash is empty, so adding
> !ftrace_hash_empty() is meaningless :)
>
> Actually, here I intended to have 3 meanings for the new/old_hash arguments,
> - If it is NULL, it hits all
> - If it is EMPTY_HASH, it hits nothing
> - If it has some entries, it hits those entries.
>
> And in ftrace.c (__ftrace_hash_rec_update), AFAICS, ops->filter_hash has only
> 2 meanings,
> - If it is EMPTY_HASH or NULL, it hits all
> - If it has some entries, it hits those entries.

Then I found an unrelated issue during review. It seems that checking only for a NULL other_hash in the 'all' case of __ftrace_hash_rec_update() is not sufficient; it should check the EMPTY_HASH case too. But then the check can be dropped entirely, since it is already covered by ftrace_lookup_ip().

Thanks,
Namhyung

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 13885590a184..8bd7aa69a479 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1545,7 +1545,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 		 * Only the filter_hash affects all records.
 		 * Update if the record is not in the notrace hash.
 		 */
-		if (!other_hash || !ftrace_lookup_ip(other_hash, rec->ip))
+		if (!ftrace_lookup_ip(other_hash, rec->ip))
 			match = 1;
 	} else {
 		in_hash = !!ftrace_lookup_ip(hash, rec->ip);
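The three-state convention Masami describes (NULL hits all, EMPTY_HASH hits nothing, a populated hash hits its entries) can be sketched with a toy hash in userspace. `tiny_hash`, `hash_lookup`, and `hash_hits` are illustrative stand-ins, not the kernel's `ftrace_hash` API:

```c
#include <stdbool.h>
#include <stddef.h>

struct tiny_hash { size_t count; unsigned long ips[8]; };

/* Like ftrace_lookup_ip(): a NULL or empty hash finds nothing. */
static bool hash_lookup(const struct tiny_hash *h, unsigned long ip)
{
	size_t i;

	if (!h)
		return false;
	for (i = 0; i < h->count; i++)
		if (h->ips[i] == ip)
			return true;
	return false;
}

/* The "hits" semantics used when updating records: NULL means match all,
 * an empty hash matches nothing, a populated hash matches its entries. */
static bool hash_hits(const struct tiny_hash *h, unsigned long ip)
{
	return !h || hash_lookup(h, ip);
}
```

This also shows why prefixing the lookup with an emptiness check is redundant, as noted above: `hash_lookup` already returns false for an empty hash.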
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
On Thu, Jun 12, 2014 at 02:18:53PM +0900, Minchan Kim wrote:
> Hi Joonsoo,
>
> On Thu, Jun 12, 2014 at 12:21:38PM +0900, Joonsoo Kim wrote:
> > We don't need explicit 'CMA:' prefix, since we already define prefix
> > 'cma:' in pr_fmt. So remove it.
> >
> > And, some logs print function name and others doesn't. This looks
> > bad to me, so I unify log format to print function name consistently.
> >
> > Lastly, I add one more debug log on cma_activate_area().
>
> When I take a look, it just indicates cma_activate_area was called or not,
> without what range for the area was reserved successfully, so I couldn't see
> the intention for the new message. The description should explain it so that
> everybody can agree on your claim.

Hello, I pasted the answer in the other thread. This pr_debug() comes from ppc kvm's kvm_cma_init_reserved_areas(). I want to keep all log messages as much as possible to reduce confusion from this generalization. If I need to respin this patchset, I will explain more about it.

Thanks.
Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
On Thu, Jun 12, 2014 at 12:21:41PM +0900, Joonsoo Kim wrote: > ppc kvm's cma area management needs alignment constraint on > cma region. So support it to prepare generalization of cma area > management functionality. > > Additionally, add some comments which tell us why alignment > constraint is needed on cma region. > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 8a44c82..bc4c171 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > > struct cma { > unsigned long base_pfn; > @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); > * @size: Size of the reserved area (in bytes), > * @base: Base address of the reserved area optional, use 0 for any > * @limit: End address of the reserved memory (optional, 0 for any). > + * @alignment: Alignment for the contiguous memory area, should be power of 2 > * @res_cma: Pointer to store the created cma region. > * @fixed: hint about where to place the reserved area > * Pz, move the all description to new API function rather than internal one. > @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); > */ > static int __init __dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, phys_addr_t limit, > + phys_addr_t alignment, > struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > - phys_addr_t alignment; > int ret = 0; > > - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, > - (unsigned long)size, (unsigned long)base, > - (unsigned long)limit); > + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", Why is it called by "align_order"? 
> + __func__, (unsigned long)size, (unsigned long)base, > + (unsigned long)limit, (unsigned long)alignment); > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > @@ -253,8 +255,17 @@ static int __init > __dma_contiguous_reserve_area(phys_addr_t size, > if (!size) > return -EINVAL; > > - /* Sanitise input arguments */ > - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); > + if (alignment && !is_power_of_2(alignment)) > + return -EINVAL; > + > + /* > + * Sanitise input arguments. > + * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, > + * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism I'm not a native but try for clear documenation. Pages both ends in CMA area could be merged into adjacent unmovable migratetype page by page allocator's buddy algorithm. In the case, you couldn't get a contiguous memory, which is not what we want. > + * and CMA property will be broken. > + */ > + alignment = max(alignment, > + (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); > base = ALIGN(base, alignment); > size = ALIGN(size, alignment); > limit &= ~(alignment - 1); > @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, > { > int ret; > > - ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); > + ret = __dma_contiguous_reserve_area(size, base, limit, 0, > + res_cma, fixed); > if (ret) > return ret; > > -- > 1.7.9.5 -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
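The sanitisation step in the patch has two parts: reject a caller-supplied alignment that is not a power of two, then raise it to at least the minimum CMA alignment before rounding base/size. A userspace sketch (`MIN_CMA_ALIGN` is an illustrative stand-in for `PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)`; `sanitise_alignment` is not a kernel function):

```c
#include <stdint.h>

#define MIN_CMA_ALIGN (1ULL << 22)	/* stand-in minimum CMA alignment */

static int is_power_of_2(uint64_t n) { return n != 0 && (n & (n - 1)) == 0; }

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/* Returns the effective alignment, or 0 for an invalid request
 * (the kernel code returns -EINVAL in that case). */
static uint64_t sanitise_alignment(uint64_t requested)
{
	if (requested && !is_power_of_2(requested))
		return 0;
	return requested > MIN_CMA_ALIGN ? requested : MIN_CMA_ALIGN;
}
```

A requested alignment of 0 means "use the default", mirroring how `dma_contiguous_reserve_area()` passes 0 through to the internal helper in the patch.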
Re: [PATCH v3] usb: host: uhci-grlib.c : use devm_ functions
On 2014-06-11 20:38, Himangi Saraogi wrote: The various devm_ functions allocate memory that is released when a driver detaches. This patch uses devm_ioremap_resource for data that is allocated in the probe function of a platform device and is only freed in the remove function. The corresponding free functions are removed and two labels are done away with. Also, linux/device.h is added to make sure the devm_*() routine declarations are unambiguously available. Signed-off-by: Himangi Saraogi Looks and works fine now! Acked-by: Andreas Larsson Best regards, Andreas Larsson --- Not compile tested due to incompatible architecture. v3: pass correct arguments to devm_ioremap_resource drivers/usb/host/uhci-grlib.c | 31 +-- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/drivers/usb/host/uhci-grlib.c b/drivers/usb/host/uhci-grlib.c index ab25dc3..05f57ff 100644 --- a/drivers/usb/host/uhci-grlib.c +++ b/drivers/usb/host/uhci-grlib.c @@ -17,6 +17,7 @@ * (C) Copyright 2004-2007 Alan Stern, st...@rowland.harvard.edu */ +#include #include #include #include @@ -113,24 +114,17 @@ static int uhci_hcd_grlib_probe(struct platform_device *op) hcd->rsrc_start = res.start; hcd->rsrc_len = resource_size(); - if (!request_mem_region(hcd->rsrc_start, hcd->rsrc_len, hcd_name)) { - printk(KERN_ERR "%s: request_mem_region failed\n", __FILE__); - rv = -EBUSY; - goto err_rmr; - } - irq = irq_of_parse_and_map(dn, 0); if (irq == NO_IRQ) { printk(KERN_ERR "%s: irq_of_parse_and_map failed\n", __FILE__); rv = -EBUSY; - goto err_irq; + goto err_usb; } - hcd->regs = ioremap(hcd->rsrc_start, hcd->rsrc_len); - if (!hcd->regs) { - printk(KERN_ERR "%s: ioremap failed\n", __FILE__); - rv = -ENOMEM; - goto err_ioremap; + hcd->regs = devm_ioremap_resource(>dev, ); + if (IS_ERR(hcd->regs)) { + rv = PTR_ERR(hcd->regs); + goto err_irq; } uhci = hcd_to_uhci(hcd); @@ -139,18 +133,14 @@ static int uhci_hcd_grlib_probe(struct platform_device *op) rv = usb_add_hcd(hcd, irq, 0); if (rv) - goto 
err_uhci; + goto err_irq; device_wakeup_enable(hcd->self.controller); return 0; -err_uhci: - iounmap(hcd->regs); -err_ioremap: - irq_dispose_mapping(irq); err_irq: - release_mem_region(hcd->rsrc_start, hcd->rsrc_len); -err_rmr: + irq_dispose_mapping(irq); +err_usb: usb_put_hcd(hcd); return rv; @@ -164,10 +154,7 @@ static int uhci_hcd_grlib_remove(struct platform_device *op) usb_remove_hcd(hcd); - iounmap(hcd->regs); irq_dispose_mapping(hcd->irq); - release_mem_region(hcd->rsrc_start, hcd->rsrc_len); - usb_put_hcd(hcd); return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
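The point of the devm_ conversion above is that resources tied to a device are released centrally when the driver detaches, so probe-time error paths need fewer unwinding labels. A toy userspace analogue of that managed-resource idea (the `devres_*` names and `toy_dev` struct are illustrative, not the kernel devres API):

```c
#include <stdlib.h>

struct devres { void *ptr; struct devres *next; };
struct toy_dev { struct devres *res; };

/* Allocate memory tied to the device; it is tracked on a list instead of
 * being freed by hand on every error path. */
static void *devres_alloc(struct toy_dev *d, size_t n)
{
	struct devres *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->ptr = calloc(1, n);
	if (!r->ptr) {
		free(r);
		return NULL;
	}
	r->next = d->res;
	d->res = r;
	return r->ptr;
}

/* One central release, analogous to what the driver core does on detach. */
static void devres_release_all(struct toy_dev *d)
{
	while (d->res) {
		struct devres *r = d->res;

		d->res = r->next;
		free(r->ptr);
		free(r);
	}
}
```

This is why the patch can delete `err_rmr`/`err_ioremap` and the explicit `iounmap`/`release_mem_region` calls: the managed variants are released for the driver.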
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On Wed, Jun 11, 2014 at 09:37:55PM -0400, Long, Wai Man wrote: > > On 6/11/2014 6:54 AM, Peter Zijlstra wrote: > >On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: > >>Enabling this configuration feature causes a slight decrease the > >>performance of an uncontended lock-unlock operation by about 1-2% > >>mainly due to the use of a static key. However, uncontended lock-unlock > >>operation are really just a tiny percentage of a real workload. So > >>there should no noticeable change in application performance. > >No, entirely unacceptable. > > > >>+#ifdef CONFIG_VIRT_UNFAIR_LOCKS > >>+/** > >>+ * queue_spin_trylock_unfair - try to acquire the queue spinlock unfairly > >>+ * @lock : Pointer to queue spinlock structure > >>+ * Return: 1 if lock acquired, 0 if failed > >>+ */ > >>+static __always_inline int queue_spin_trylock_unfair(struct qspinlock > >>*lock) > >>+{ > >>+ union arch_qspinlock *qlock = (union arch_qspinlock *)lock; > >>+ > >>+ if (!qlock->locked && (cmpxchg(>locked, 0, _Q_LOCKED_VAL) == 0)) > >>+ return 1; > >>+ return 0; > >>+} > >>+ > >>+/** > >>+ * queue_spin_lock_unfair - acquire a queue spinlock unfairly > >>+ * @lock: Pointer to queue spinlock structure > >>+ */ > >>+static __always_inline void queue_spin_lock_unfair(struct qspinlock *lock) > >>+{ > >>+ union arch_qspinlock *qlock = (union arch_qspinlock *)lock; > >>+ > >>+ if (likely(cmpxchg(>locked, 0, _Q_LOCKED_VAL) == 0)) > >>+ return; > >>+ /* > >>+* Since the lock is now unfair, we should not activate the 2-task > >>+* pending bit spinning code path which disallows lock stealing. > >>+*/ > >>+ queue_spin_lock_slowpath(lock, -1); > >>+} > >Why is this needed? > > I added the unfair version of lock and trylock as my original version isn't > a simple test-and-set lock. Now I changed the core part to use the simple > test-and-set lock. 
However, I still think that an unfair version in the fast > path can be helpful to performance when both the unfair lock and paravirt > spinlock are enabled. In this case, paravirt spinlock code will disable the > unfair lock code in the slowpath, but still allow the unfair version in the > fast path to get the best possible performance in a virtual guest. > > Yes, I could take that out to allow either unfair or paravirt spinlock, but > not both. I do think that a little bit of unfairness will help in the > virtual environment. When will you learn to like simplicity and stop this massive over engineering effort? There's no sane reason to have the test-and-set virt and paravirt locks enabled at the same bloody time. There's 3 distinct cases: - native - virt - paravirt And they do not overlap. Furthermore, if there is any possibility at all of not polluting the native code, grab it with both hands. Native performance is king, try your very utmost bestest to preserve that, paravirt is a distant second and nobody sane should care about the virt case at all. If you want extra lock stealing in the paravirt case, put it in the slowpath code before you start queueing. pgpji3CPU64HJ.pgp Description: PGP signature
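The "unfair" fast path being debated is just a plain test-and-set acquired via compare-and-swap: there is no queue, so a late arrival can steal the lock from earlier waiters. A userspace sketch using C11 atomics (this mirrors the shape of the cmpxchg fast path only; it is not the kernel's qspinlock API):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int locked; } ts_lock;

static bool ts_trylock(ts_lock *l)
{
	int expected = 0;

	/* Succeeds only if the lock word was 0. A thread that arrives
	 * later can win this race against a spinning earlier waiter,
	 * which is exactly what makes the lock unfair (no FIFO order). */
	return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

static void ts_unlock(ts_lock *l)
{
	atomic_store(&l->locked, 0);
}
```

Peter's objection below is about keeping this test-and-set path strictly separate from the native queued path, not about the mechanism itself.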
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
On Thu, Jun 12, 2014 at 10:11:19AM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > We don't need explicit 'CMA:' prefix, since we already define prefix > > 'cma:' in pr_fmt. So remove it. > > > > And, some logs print function name and others doesn't. This looks > > bad to me, so I unify log format to print function name consistently. > > > > Lastly, I add one more debug log on cma_activate_area(). > > > > Signed-off-by: Joonsoo Kim > > > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > > index 83969f8..bd0bb81 100644 > > --- a/drivers/base/dma-contiguous.c > > +++ b/drivers/base/dma-contiguous.c > > @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) > > } > > > > if (selected_size && !dma_contiguous_default_area) { > > - pr_debug("%s: reserving %ld MiB for global area\n", __func__, > > + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, > > (unsigned long)selected_size / SZ_1M); > > Do we need to do function(), or just function:. I have seen the later > usage in other parts of the kernel. Hello, I also haven't seen this format in other kernel code, but, in cma, they use this format as following. function(arg1, arg2, ...): some message If we all dislike this format, we can change it after merging this patchset. Until then, it seems better to me to leave it as is. > > > > > dma_contiguous_reserve_area(selected_size, selected_base, > > @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) > > unsigned i = cma->count >> pageblock_order; > > struct zone *zone; > > > > - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > > + pr_debug("%s()\n", __func__); > > why ? > This pr_debug() comes from ppc kvm's kvm_cma_init_reserved_areas(). I want to maintain all log messages as much as possible to reduce confusion with this generalization. Thanks. 
Re: [PATCH 1/2] mm: mark remap_file_pages() syscall as deprecated
Hi Kirill, On Thu, May 8, 2014 at 2:41 PM, Kirill A. Shutemov wrote: > The remap_file_pages() system call is used to create a nonlinear mapping, > that is, a mapping in which the pages of the file are mapped into a > nonsequential order in memory. The advantage of using remap_file_pages() > over using repeated calls to mmap(2) is that the former approach does not > require the kernel to create additional VMA (Virtual Memory Area) data > structures. > > Supporting of nonlinear mapping requires significant amount of non-trivial > code in kernel virtual memory subsystem including hot paths. Also to get > nonlinear mapping work kernel need a way to distinguish normal page table > entries from entries with file offset (pte_file). Kernel reserves flag in > PTE for this purpose. PTE flags are scarce resource especially on some CPU > architectures. It would be nice to free up the flag for other usage. > > Fortunately, there are not many users of remap_file_pages() in the wild. > It's only known that one enterprise RDBMS implementation uses the syscall > on 32-bit systems to map files bigger than can linearly fit into 32-bit > virtual address space. This use-case is not critical anymore since 64-bit > systems are widely available. > > The plan is to deprecate the syscall and replace it with an emulation. > The emulation will create new VMAs instead of nonlinear mappings. It's > going to work slower for rare users of remap_file_pages() but ABI is > preserved. > > One side effect of emulation (apart from performance) is that user can hit > vm.max_map_count limit more easily due to additional VMAs. See comment for > DEFAULT_MAX_MAP_COUNT for more details on the limit. Best to CC linux-api@ (https://www.kernel.org/doc/man-pages/linux-api-ml.html) on patches like this, as well as the man-pages maintainer, so that something goes into the man page. 
I added the following into the man page:

       Note: this system call is (since Linux 3.16) deprecated and
       will eventually be replaced by a slower in-kernel emulation.
       Those few applications that use this system call should
       consider migrating to alternatives.

Okay?

Cheers,
Michael

> Signed-off-by: Kirill A. Shutemov
> ---
>  Documentation/vm/remap_file_pages.txt | 28
>  mm/fremap.c                           |  4
>  2 files changed, 32 insertions(+)
>  create mode 100644 Documentation/vm/remap_file_pages.txt
>
> diff --git a/Documentation/vm/remap_file_pages.txt b/Documentation/vm/remap_file_pages.txt
> new file mode 100644
> index ..560e4363a55d
> --- /dev/null
> +++ b/Documentation/vm/remap_file_pages.txt
> @@ -0,0 +1,28 @@
> +The remap_file_pages() system call is used to create a nonlinear mapping,
> +that is, a mapping in which the pages of the file are mapped into a
> +nonsequential order in memory. The advantage of using remap_file_pages()
> +over using repeated calls to mmap(2) is that the former approach does not
> +require the kernel to create additional VMA (Virtual Memory Area) data
> +structures.
> +
> +Supporting of nonlinear mapping requires significant amount of non-trivial
> +code in kernel virtual memory subsystem including hot paths. Also to get
> +nonlinear mapping work kernel need a way to distinguish normal page table
> +entries from entries with file offset (pte_file). Kernel reserves flag in
> +PTE for this purpose. PTE flags are scarce resource especially on some CPU
> +architectures. It would be nice to free up the flag for other usage.
> +
> +Fortunately, there are not many users of remap_file_pages() in the wild.
> +It's only known that one enterprise RDBMS implementation uses the syscall
> +on 32-bit systems to map files bigger than can linearly fit into 32-bit
> +virtual address space. This use-case is not critical anymore since 64-bit
> +systems are widely available.
> +
> +The plan is to deprecate the syscall and replace it with an emulation.
> +The emulation will create new VMAs instead of nonlinear mappings. It's > +going to work slower for rare users of remap_file_pages() but ABI is > +preserved. > + > +One side effect of emulation (apart from performance) is that user can hit > +vm.max_map_count limit more easily due to additional VMAs. See comment for > +DEFAULT_MAX_MAP_COUNT for more details on the limit. > diff --git a/mm/fremap.c b/mm/fremap.c > index 34feba60a17e..12c3bb63b7f9 100644 > --- a/mm/fremap.c > +++ b/mm/fremap.c > @@ -152,6 +152,10 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, > unsigned long, size, > int has_write_lock = 0; > vm_flags_t vm_flags = 0; > > + pr_warn_once("%s (%d) uses depricated remap_file_pages() syscall. " > + "See Documentation/vm/remap_file_pages.txt.\n", > + current->comm, current->pid); > + > if (prot) > return err; > /* > -- > 2.0.0.rc2 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm'
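The alternative the deprecation text points applications at is to build the nonlinear layout themselves: reserve a region, then place each file page with a separate `MAP_FIXED` mmap() over it. Each mmap creates its own VMA, which is exactly the `vm.max_map_count` cost the patch description mentions. A minimal sketch mapping two file pages in swapped order (`map_pages_swapped` is an illustrative helper, not a library function):

```c
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

/* Map file page 1 at offset 0 of the region and file page 0 after it,
 * i.e. the pages appear in nonsequential (swapped) order in memory. */
static char *map_pages_swapped(int fd, long pagesz)
{
	/* Reserve two pages of contiguous address space. */
	char *base = mmap(NULL, 2 * pagesz, PROT_READ, MAP_PRIVATE, fd, 0);

	if (base == MAP_FAILED)
		return NULL;
	/* MAP_FIXED atomically replaces the existing mapping at that
	 * address, so each call pins one file page where we want it. */
	if (mmap(base, pagesz, PROT_READ, MAP_PRIVATE | MAP_FIXED,
		 fd, pagesz) == MAP_FAILED)
		return NULL;
	if (mmap(base + pagesz, pagesz, PROT_READ, MAP_PRIVATE | MAP_FIXED,
		 fd, 0) == MAP_FAILED)
		return NULL;
	return base;
}
```

The trade-off matches the text above: this uses three VMAs (later merged to two) where remap_file_pages() needed one, but it works on any kernel with no nonlinear-mapping support at all.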
Re: Re: [PATCH ftrace/core 0/2] ftrace, kprobes: Introduce IPMODIFY flag for ftrace_ops to detect conflicts
(2014/06/12 1:58), Josh Poimboeuf wrote: > On Tue, Jun 10, 2014 at 10:50:01AM +, Masami Hiramatsu wrote: >> Hi, >> >> Here is a pair of patches which introduces IPMODIFY flag for >> ftrace_ops to detect conflicts of ftrace users who can modify >> regs->ip in their handler. >> Currently, only kprobes can change the regs->ip in the handler, >> but recently kpatch is also want to change it. Moreover, since >> the ftrace itself exported to modules, it might be considerable >> senario. >> >> Here we talked on github. >> https://github.com/dynup/kpatch/issues/47 >> >> To protect modified regs-ip from each other, this series >> introduces FTRACE_OPS_FL_IPMODIFY flag and ftrace now ensures >> the flag can be set on each function entry location. If there >> is someone who already reserve regs->ip on target function >> entry, ftrace_set_filter_ip or register_ftrace_function will >> return -EBUSY. Users must handle that. >> >> At this point, all kprobes will reserve regs->ip, since jprobe >> requires it. > > Masami, thanks very much for this! > > One issue with this approach is that it _always_ makes kprobes and > kpatch incompatible when probing/patching the same function, even when > kprobes doesn't need to touch regs->ip. Right. > Is it possible to add a kprobes flag (KPROBE_FLAG_IPMODIFY), which is > only set by those kprobes users (just jprobes?) which need to modify IP? > Then kprobes could only set the corresponding ftrace flag when it's > really needed. And I think kprobes could even enforce the fact that > !KPROBE_FLAG_IPMODIFY users don't change regs->ip. No, actually we don't need that additional flag, we can slightly change the kprobes behavior(spec) that requires setting kprobe->break_handler a function if it modifies regs->ip. (this doesn't break jprobe) The problem is that we need a separate ftrace_ops for jprobe and other probes which can change the regs->ip. But current kprobes don't expected that such case... 
> BTW, I've done some testing with this patch set by patching/probing the > same function with FTRACE_OPS_FL_IPMODIFY, and got some warnings. I saw > the following warning when attempting to kpatch a kprobed function: Ah, thanks for testing! I think it needs more work on failure path. > > WARNING: CPU: 2 PID: 18351 at kernel/trace/ftrace.c:419 > __unregister_ftrace_function+0x1be/0x1d0() > Modules linked in: kpatch_meminfo_string(OE+) kpatch(OE) > stap_8d70d6e041605bd1e144cba4801652_14636(OE) rfcomm fuse ipt_MASQUERADE ccm > xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack bnep ebtable_nat > ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat > nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle > ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack > iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic > x86_pkg_temp_thermal coretemp kvm_intel snd_hda_intel iTCO_wdt > iTCO_vendor_support snd_hda_controller kvm snd_hda_codec iwlwifi snd_hwdep > uvcvideo snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device >videobuf2_core btusb v4l2_common snd_pcm videodev nfsd cfg80211 microcode > e1000e bluetooth media thinkpad_acpi joydev sdhci_pci sdhci pcspkr serio_raw > snd_timer i2c_i801 snd mmc_core auth_rpcgss mei_me mei lpc_ich mfd_core > shpchp ptp pps_core wmi tpm_tis soundcore tpm rfkill nfs_acl lockd sunrpc > dm_crypt i915 i2c_algo_bit drm_kms_helper drm crct10dif_pclmul crc32_pclmul > crc32c_intel ghash_clmulni_intel i2c_core video > CPU: 2 PID: 18351 Comm: insmod Tainted: GW OE 3.15.0-IPMODIFY+ #1 > Hardware name: LENOVO 2356BH8/2356BH8, BIOS G7ET63WW (2.05 ) 11/12/2012 > b39bd289 8803b78d7bc0 816f31ed > 8803b78d7bf8 8108914d a07f9040 >fff0 0001 8803e7ac4200 > Call Trace: >[] dump_stack+0x45/0x56 >[] warn_slowpath_common+0x7d/0xa0 >[] warn_slowpath_null+0x1a/0x20 >[] 
__unregister_ftrace_function+0x1be/0x1d0 >[] ftrace_startup+0x1e4/0x220 >[] register_ftrace_function+0x43/0x60 >[] kpatch_register+0x664/0x830 [kpatch] >[] ? 0xa080 >[] ? 0xa080 >[] patch_init+0x194/0x1000 [kpatch_meminfo_string] >[] ? 0xa0045fff >[] do_one_initcall+0xd4/0x210 >[] ? set_memory_nx+0x43/0x50 >[] load_module+0x1d92/0x25e0 >[] ? store_uevent+0x70/0x70 >[] ? kernel_read+0x50/0x80 >[] SyS_finit_module+0xa6/0xd0 >[] system_call_fastpath+0x16/0x1b > > > That warning happened because __unregister_ftrace_function() doesn't > expect FTRACE_OPS_FL_ENABLED to be cleared in the ftrace_startup error > path. Ah, right! I'll fix that. > I tried removing the FTRACE_OPS_FL_ENABLED clearing line in > ftrace_startup, but I saw more warnings. This one
Re: [RFC PATCH 00/13][V3] kexec: A new system call to allow in kernel loading
On 06/03/14 at 09:06am, Vivek Goyal wrote:
> Hi,
>
> This is V3 of the patchset. Previous versions were posted here.
>
> V1: https://lkml.org/lkml/2013/11/20/540
> V2: https://lkml.org/lkml/2014/1/27/331
>
> Changes since v2:
>
> - Took care of most of the review comments from V2.
> - Added support for kexec/kdump on EFI systems.
> - Dropped support for loading ELF vmlinux.
>
> This patch series is generated on top of 3.15.0-rc8. It also requires a
> two patch cleanup series which is sitting in -tip tree here.
>
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/log/?h=x86/boot
>
> This patch series does not do kernel signature verification yet. I plan
> to post another patch series for that. Now bzImage is already signed
> with PKCS7 signature I plan to parse and verify those signatures.
>
> Primary goal of this patchset is to prepare groundwork so that kernel
> image can be signed and signatures be verified during kexec load. This
> should help with two things.
>
> - It should allow kexec/kdump on secureboot enabled machines.
>
> - In general it can help even without secureboot. By being able to verify
>   kernel image signature in kexec, it should help with avoiding module
>   signing restrictions. Matthew Garret showed how to boot into a custom
>   kernel, modify first kernel's memory and then jump back to old kernel and
>   bypass any policy one wants to.
>
> Any feedback is welcome.

Hi, Vivek

For the efi ioremapping case, in the 3.15 kernel efi runtime maps will not be saved if efi=old_map is used. So you need to detect this and fail the kexec file load. Otherwise the patchset works for me.

Thanks
Dave
Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
Hi Masami, On Thu, 12 Jun 2014 12:29:09 +0900, Masami Hiramatsu wrote: > (2014/06/11 16:41), Namhyung Kim wrote: >> Hi Masami, >> >> On Wed, 11 Jun 2014 10:28:01 +0900, Masami Hiramatsu wrote: >>> (2014/06/10 22:53), Namhyung Kim wrote: Hi Masami, 2014-06-10 (ํ), 10:50 +, Masami Hiramatsu: > Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among > + /* Update rec->flags */ > + do_for_each_ftrace_rec(pg, rec) { > + /* We need to update only differences of filter_hash */ > + in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip); > + in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip); Why not use ftrace_hash_empty() here instead of checking NULL? >>> >>> Ah, a trick is here. Since an empty filter_hash must hit all, we can not >>> enable/disable filter_hash if we use ftrace_hash_empty() here. >>> >>> To enabling the new_hash, old_hash must be EMPTY_HASH which means in_old >>> always be false. To disabling, new_hash is EMPTY_HASH too. >>> Please see ftrace_hash_ipmodify_enable/disable/update(). >> >> I'm confused. 8-p I guess what you want to do is checking records in >> either of the filter_hash, right? If so, what about this? >> >> in_old = !ftrace_hash_empty(old_hash) && ftrace_lookup_ip(old_hash, >> rec->ip); >> in_new = !ftrace_hash_empty(new_hash) && ftrace_lookup_ip(new_hash, >> rec->ip); > > NO, ftrace_lookup_ip() returns NULL if the hash is empty, so adding > !ftrace_hash_empty() is meaningless :) Ah, you're right! > > Actually, here I intended to have 3 meanings for the new/old_hash arguments, > - If it is NULL, it hits all > - If it is EMPTY_HASH, it hits nothing > - If it has some entries, it hits those entries. > > And in ftrace.c(__ftrace_hash_rec_update), AFAICS, ops->filter_hash has only > 2 meanings, > - If it is EMPTY_HASH or NULL, it hits all > - If it has some entries, it hits those entries. > > So I had to do above change... 
Then I propose to use a different value/symbol instead of EMPTY_HASH in order to prevent future confusion and add some comments there. [SNIP] > +static int ftrace_hash_ipmodify_enable(struct ftrace_ops *ops) > +{ > + struct ftrace_hash *hash = ops->filter_hash; > + > + if (ftrace_hash_empty(hash)) > + hash = NULL; > + > + return __ftrace_hash_update_ipmodify(ops, EMPTY_HASH, hash); > +} Please see above comment. You can pass an empty hash as is, or pass NULL as second arg. The same goes to below... >>> >>> As I said above, that is by design :). EMPTY_HASH means it hits nothing, >>> NULL means it hits all. >> >> But doesn't it make unrelated records also get the flag updated? I'm >> curious when new_hash can be empty on _enable() case.. > > NO, _enable() is called right before ftrace_hash_rec_enable(ops,1) which > always enables filter_hash (since the 2nd arg is 1). If the filter_hash > is empty, ftrace_hash_rec_enable() enables ftrace_ops on all ftrace_recs. But AFAICS both of kprobes and kpatch call ftrace_set_filter_ip() before calling register_ftrace_function(). That means there's no case when ops->filter_hash can be empty, right? > > Ah, but I found I made a redundant mistake (different one) in > ftrace_hash_move(), > ftrace_hash_ipmodify_update() should be done only if "enable" is set (that > means ftrace_hash_move() updates filter_hash, not notrace_hash). > I'll update this patch. Right. Thanks, Namhyung
Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
On Thu, Jun 12, 2014 at 12:21:40PM +0900, Joonsoo Kim wrote: > To prepare future generalization work on cma area management code, > we need to separate core cma management codes from DMA APIs. > We will extend these core functions to cover requirements of > ppc kvm's cma area management functionality in following patches. > This separation helps us not to touch DMA APIs while extending > core functions. > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index fb0cdce..8a44c82 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); > * If @fixed is true, reserve contiguous area at exactly @base. If false, > * reserve in range from @base to @limit. > */ > -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > -phys_addr_t limit, struct cma **res_cma, > -bool fixed) > +static int __init __dma_contiguous_reserve_area(phys_addr_t size, > + phys_addr_t base, phys_addr_t limit, > + struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > phys_addr_t alignment; > @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > > pr_info("%s(): reserved %ld MiB at %08lx\n", > __func__, (unsigned long)size / SZ_1M, (unsigned long)base); > - > - /* Architecture specific contiguous memory fixup. */ > - dma_contiguous_early_fixup(base, size); > return 0; > + > err: > pr_err("%s(): failed to reserve %ld MiB\n", > __func__, (unsigned long)size / SZ_1M); > return ret; > } > > +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > +phys_addr_t limit, struct cma **res_cma, > +bool fixed) > +{ > + int ret; > + > + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); > + if (ret) > + return ret; > + > + /* Architecture specific contiguous memory fixup. 
*/ > + dma_contiguous_early_fixup(base, size); In old, base and size are aligned with alignment and passed into arch fixup but your patch is changing it. I didn't look at what kinds of side effect it makes but just want to confirm. > + > + return 0; > +} > + > static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) > { > mutex_lock(>lock); > @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned > long pfn, int count) > * global one. Requires architecture specific dev_get_cma_area() helper > * function. > */ > -struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, > unsigned int align) > { > unsigned long mask, pfn, pageno, start = 0; > - struct cma *cma = dev_get_cma_area(dev); > struct page *page = NULL; > int ret; > > if (!cma || !cma->count) > return NULL; > > - if (align > CONFIG_CMA_ALIGNMENT) > - align = CONFIG_CMA_ALIGNMENT; > - > pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, >count, align); > > @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > return page; > } > Please move the description in __dma_alloc_from_contiguous to here exported API. > +struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +unsigned int align) > +{ > + struct cma *cma = dev_get_cma_area(dev); > + > + if (align > CONFIG_CMA_ALIGNMENT) > + align = CONFIG_CMA_ALIGNMENT; > + > + return __dma_alloc_from_contiguous(cma, count, align); > +} > + > /** > * dma_release_from_contiguous() - release allocated pages > * @dev: Pointer to device for which the pages were allocated. > @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > * It returns false when provided pages do not belong to contiguous area and > * true otherwise. 
> */ > -bool dma_release_from_contiguous(struct device *dev, struct page *pages, > +static bool __dma_release_from_contiguous(struct cma *cma, struct page > *pages, >int count) > { > - struct cma *cma = dev_get_cma_area(dev); > unsigned long pfn; > > if (!cma || !pages) > @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, > struct page *pages, > > return true; > } > + Ditto. > +bool dma_release_from_contiguous(struct device *dev, struct page *pages, > + int count) > +{ > + struct cma *cma =
[Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
I've been seeing some ext4 corruption with recent kernels under qemu-system-arm. This issue seems to crop up after shutting down uncleanly (terminating qemu), shortly after booting, about 50% of the time. ext4/mmc related dmesg details are:

[3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at 0x10005000 irq 41,42 (pio)
[3.268316] mmc0: new SDHC card at address 4567
[3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
[3.315699] mmcblk0: p1 p2 p3 p4 < p5 p6 >
...
[ 11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
[ 11.904714] EXT4-fs (mmcblk0p5): recovery complete
[ 11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: nomblk_io_submit,errors=panic
...
[ 91.558824] EXT4-fs error (device mmcblk0p5): ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in gd; block bitmap corrupt.
[ 91.560641] Aborting journal on device mmcblk0p5-8.
[ 91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5): panic forced after error
[ 91.562589]
[ 91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
[ 91.564616] [] (unwind_backtrace) from [] (show_stack+0x11/0x14)
[ 91.565154] [] (show_stack) from [] (dump_stack+0x59/0x7c)
[ 91.565666] [] (dump_stack) from [] (panic+0x67/0x178)
[ 91.566147] [] (panic) from [] (ext4_handle_error+0x69/0x74)
[ 91.566659] [] (ext4_handle_error) from [] (__ext4_grp_locked_error+0x6b/0x160)
[ 91.567223] [] (__ext4_grp_locked_error) from [] (ext4_mb_generate_buddy+0x1b1/0x29c)
[ 91.567860] [] (ext4_mb_generate_buddy) from [] (ext4_mb_init_cache+0x219/0x4e0)
[ 91.568473] [] (ext4_mb_init_cache) from [] (ext4_mb_init_group+0xbb/0x138)
[ 91.569021] [] (ext4_mb_init_group) from [] (ext4_mb_good_group+0xf3/0xfc)
[ 91.569659] [] (ext4_mb_good_group) from [] (ext4_mb_regular_allocator+0x153/0x2c4)
[ 91.570250] [] (ext4_mb_regular_allocator) from [] (ext4_mb_new_blocks+0x2fd/0x4e4)
[ 91.570868] [] (ext4_mb_new_blocks) from [] (ext4_ext_map_blocks+0x965/0x10bc)
[ 91.571444] []
(ext4_ext_map_blocks) from [] (ext4_map_blocks+0xfb/0x36c)
[ 91.571992] [] (ext4_map_blocks) from [] (mpage_map_and_submit_extent+0x99/0x5f0)
[ 91.572614] [] (mpage_map_and_submit_extent) from [] (ext4_writepages+0x2b9/0x4e8)
[ 91.573201] [] (ext4_writepages) from [] (do_writepages+0x19/0x28)
[ 91.573709] [] (do_writepages) from [] (__filemap_fdatawrite_range+0x3d/0x44)
[ 91.574265] [] (__filemap_fdatawrite_range) from [] (filemap_flush+0x23/0x28)
[ 91.574854] [] (filemap_flush) from [] (ext4_rename+0x2f9/0x3e4)
[ 91.575360] [] (ext4_rename) from [] (vfs_rename+0x183/0x45c)
[ 91.575911] [] (vfs_rename) from [] (SyS_renameat2+0x22b/0x26c)
[ 91.576460] [] (SyS_renameat2) from [] (SyS_rename+0x1f/0x24)
[ 91.576961] [] (SyS_rename) from [] (ret_fast_syscall+0x1/0x5c)

Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq before DATA irq).

Which I guess shouldn't be surprising, as I saw problems with that patch earlier in the 3.15-rc cycle: https://lkml.org/lkml/2014/4/14/824

However that discussion petered out (possibly my fault for not following up) as to whether it was an issue with the patch or an issue with qemu. Then the original issue disappeared for me, which I figured was due to a fix upstream, but now I'm guessing it coincided with me updating my system and getting qemu v2.0 (whereas previously I was on 1.5).

$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright (c) 2003-2008 Fabrice Bellard

While the previous behavior was annoying and kept my emulated environments from booting, this one, while a bit more rare and subtle, eats the disks, which is much more painful for my testing. Unfortunately reverting the change (manually, as it doesn't revert cleanly anymore) doesn't seem to completely avoid the issue, so the bisection may have gone slightly astray (though it is interesting it landed on the same commit I earlier had trouble with).
So I'll back-track and double check some of the last few "good" results to validate I didn't just luck into 3 good boots accidentally. I'll also review my revert in case I missed something subtle in doing it manually.

Anyway, if there are any thoughts on how to better chase this down and debug it, I'd appreciate it! I can also provide reproduction instructions with a pre-built Linaro android disk image and hand-built kernel if anyone wants to debug this themselves.

thanks
-john
[f2fs-dev][PATCH 3/3] f2fs: avoid to truncate non-updated page partially
After we call find_data_page in truncate_partial_data_page, we cannot guarantee the page is up-to-date, as an error may have occurred in a lower layer. We'd better check the status of the page so that a non-updated page is not written back to the device.

Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9c49c59..fc569ca 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -380,13 +380,15 @@ static void truncate_partial_data_page(struct inode *inode, u64 from)
 		return;
 	lock_page(page);
-	if (unlikely(page->mapping != inode->i_mapping)) {
-		f2fs_put_page(page, 1);
-		return;
-	}
+	if (unlikely(!PageUptodate(page) ||
+			page->mapping != inode->i_mapping))
+		goto out;
+
 	f2fs_wait_on_page_writeback(page, DATA);
 	zero_user(page, offset, PAGE_CACHE_SIZE - offset);
 	set_page_dirty(page);
+
+out:
 	f2fs_put_page(page, 1);
 }
--
1.7.9.5
[f2fs-dev][PATCH 2/3] f2fs: avoid unneeded SetPageUptodate in f2fs_write_end
We have already set the page up-to-date in ->write_begin, so we should remove the redundant SetPageUptodate in ->write_end.

Signed-off-by: Chao Yu
---
 fs/f2fs/data.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c1fb6dd..fd133cf 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1003,7 +1003,6 @@ static int f2fs_write_end(struct file *file,
 	trace_f2fs_write_end(inode, pos, len, copied);
-	SetPageUptodate(page);
 	set_page_dirty(page);
 	if (pos + copied > i_size_read(inode)) {
--
1.7.9.5
Re: [GIT PULL] MMC updates for 3.16-rc1
Hi Linus,

On Wed, 11 Jun 2014, Linus Torvalds wrote:
> On Tue, Jun 10, 2014 at 2:50 PM, Linus Torvalds wrote:
> >
> > Also, that new drivers/mmc/host/usdhi6rol0.c driver is one f*cking
> > noisy compile, and certainly has never been tested in a 64-bit
> > environment. Please either fix it, or make it depend on BROKEN.
>
> Guys? Seriously, if that driver isn't fixed, I'm going to mark it
> broken myself. It pretty much generates as many lines of warnings as
> the rest of my "allmodconfig" build combined.
>
> It's extremely annoying, and the crazy warnings are likely to hide
> potential real problems elsewhere, so right now that driver has
> negative value. I do a lot of allmodconfig builds during the merge
> window, and I am not going to look at that warning much longer.
>
> Fix it promptly, or it gets disabled.

I sent a patch a few hours ago: https://patchwork.kernel.org/patch/4338531/

Since it's only changing print format strings, it should be a trivial one to review, so, just waiting for Chris to pick it up and push it to you. Sorry about the trouble.

Thanks
Guennadi
Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
On Thu, Jun 12, 2014 at 12:21:39PM +0900, Joonsoo Kim wrote:
> We should free memory for bitmap when we find zone mis-match,
> otherwise this memory will leak.

Then, -stable stuff?

> Additionally, I copy code comment from ppc kvm's cma code to notify
> why we need to check zone mis-match.
>
> Signed-off-by: Joonsoo Kim
>
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index bd0bb81..fb0cdce 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
> 		base_pfn = pfn;
> 		for (j = pageblock_nr_pages; j; --j, pfn++) {
> 			WARN_ON_ONCE(!pfn_valid(pfn));
> +			/*
> +			 * alloc_contig_range requires the pfn range
> +			 * specified to be in the same zone. Make this
> +			 * simple by forcing the entire CMA resv range
> +			 * to be in the same zone.
> +			 */
> 			if (page_zone(pfn_to_page(pfn)) != zone)
> -				return -EINVAL;
> +				goto err;

At a first glance, I thought it would be better to handle such an error before activating. But when I looked at the registration code (ie, dma_contiguous_reserve_area), I realized it is impossible because we haven't set up the zone yet. :(

If so, when we detect the failure here, wouldn't it be better to report a more meaningful error message, like which zone was expected, which zone was found, and the failing pfn number?

> 		}
> 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> 	} while (--i);
>
> 	mutex_init(&cma->lock);
> 	return 0;
> +
> +err:
> +	kfree(cma->bitmap);
> +	return -EINVAL;
> }
>
> static struct cma cma_areas[MAX_CMA_AREAS];
> --
> 1.7.9.5

--
Kind regards,
Minchan Kim
[f2fs-dev][PATCH 1/3] f2fs: check lower bound nid value in check_nid_range
This patch adds lower-bound verification for nid in check_nid_range, so reserved nids (0, node, meta) passed by the caller can be checked there. check_nid_range can then be used in f2fs_nfs_get_inode to simplify the code.

Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h  | 3 ++-
 fs/f2fs/inode.c | 1 +
 fs/f2fs/super.c | 4 +---
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7ef7acd..58df97e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -641,7 +641,8 @@ static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
  */
 static inline int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
 {
-	WARN_ON((nid >= NM_I(sbi)->max_nid));
+	if (unlikely(nid < F2FS_ROOT_INO(sbi)))
+		return -EINVAL;
 	if (unlikely(nid >= NM_I(sbi)->max_nid))
 		return -EINVAL;
 	return 0;

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index adc622c..2cf6962 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -78,6 +78,7 @@ static int do_read_inode(struct inode *inode)
 	if (check_nid_range(sbi, inode->i_ino)) {
 		f2fs_msg(inode->i_sb, KERN_ERR, "bad inode number: %lu",
 				(unsigned long) inode->i_ino);
+		WARN_ON(1);
 		return -EINVAL;
 	}

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b2b1863..8f96d93 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -689,9 +689,7 @@ static struct inode *f2fs_nfs_get_inode(struct super_block *sb,
 	struct f2fs_sb_info *sbi = F2FS_SB(sb);
 	struct inode *inode;
-	if (unlikely(ino < F2FS_ROOT_INO(sbi)))
-		return ERR_PTR(-ESTALE);
-	if (unlikely(ino >= NM_I(sbi)->max_nid))
+	if (check_nid_range(sbi, ino))
 		return ERR_PTR(-ESTALE);
 	/*
--
1.7.9.5
Re: [PATCH] drm/cirrus: bind also to qemu-xen-traditional
Ping?

On Fri, Apr 11, Olaf Hering wrote:
> qemu as used by xend/xm toolstack uses a different subvendor id.
> Bind the drm driver also to this emulated card.
>
> Signed-off-by: Olaf Hering
> ---
>  drivers/gpu/drm/cirrus/cirrus_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/cirrus/cirrus_drv.c b/drivers/gpu/drm/cirrus/cirrus_drv.c
> index 953fc8a..848 100644
> --- a/drivers/gpu/drm/cirrus/cirrus_drv.c
> +++ b/drivers/gpu/drm/cirrus/cirrus_drv.c
> @@ -31,6 +31,8 @@ static struct drm_driver driver;
>  static DEFINE_PCI_DEVICE_TABLE(pciidlist) = {
>  	{ PCI_VENDOR_ID_CIRRUS, PCI_DEVICE_ID_CIRRUS_5446, 0x1af4, 0x1100, 0,
>  	  0, 0 },
> +	{ PCI_VENDOR_ID_CIRRUS, PCI_DEVICE_ID_CIRRUS_5446, PCI_VENDOR_ID_XEN,
> +	  0x0001, 0, 0, 0 },
>  	{0,}
>  };
Re: PATCH[[vme/bridges/vme_ca91cx42.c:1382: Bad if test Bug Fix]
On Thu, Jun 12, 2014 at 04:17:36AM +, Nick Krause wrote:
> Here is the fixed patch as per Greg's recommendations. Unfortunately my email
> client removes tabs so I will have to be sending it as a patch file if that's
> Ok.
> Nick

HTML is rejected by the mailing lists, and we can't take a base64 attachment either :(

Take a look at Documentation/email_clients.txt for ideas on how to fix this up on your end.

thanks,

greg k-h
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
Hi Joonsoo, On Thu, Jun 12, 2014 at 12:21:38PM +0900, Joonsoo Kim wrote: > We don't need explicit 'CMA:' prefix, since we already define prefix > 'cma:' in pr_fmt. So remove it. > > And, some logs print function name and others doesn't. This looks > bad to me, so I unify log format to print function name consistently. > > Lastly, I add one more debug log on cma_activate_area(). When I take a look, it just indicates cma_activate_area was called or not, without what range for the area was reserved successfully so I couldn't see the intention for new message. Description should explain it so that everybody can agree on your claim. > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 83969f8..bd0bb81 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) > } > > if (selected_size && !dma_contiguous_default_area) { > - pr_debug("%s: reserving %ld MiB for global area\n", __func__, > + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, >(unsigned long)selected_size / SZ_1M); > > dma_contiguous_reserve_area(selected_size, selected_base, > @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) > unsigned i = cma->count >> pageblock_order; > struct zone *zone; > > - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > + pr_debug("%s()\n", __func__); > > + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > if (!cma->bitmap) > return -ENOMEM; > > @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > - pr_err("Not enough slots for CMA reserved regions!\n"); > + pr_err("%s(): Not enough slots for CMA reserved regions!\n", > + __func__); > return -ENOSPC; > } > > @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > *res_cma 
= cma;
> 	cma_area_count++;
>
> -	pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M,
> -		(unsigned long)base);
> +	pr_info("%s(): reserved %ld MiB at %08lx\n",
> +		__func__, (unsigned long)size / SZ_1M, (unsigned long)base);
>
> 	/* Architecture specific contiguous memory fixup. */
> 	dma_contiguous_early_fixup(base, size);
> 	return 0;
> err:
> -	pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
> +	pr_err("%s(): failed to reserve %ld MiB\n",
> +		__func__, (unsigned long)size / SZ_1M);
> 	return ret;
> }
>
> --
> 1.7.9.5

--
Kind regards,
Minchan Kim
[PATCH v1] fs2dt: Refine kdump device_tree sort
From: Yang Wei

Commit b02d735bf rearranged the device-tree entries, assuming these entries are sorted in ascending order. But actually, while validating kexec and kdump, I found that the order of the serial nodes was still changed. We should compare not only the length of the directory name but also the name itself; this ensures that the device nodes really sort in ascending order.

Signed-off-by: Yang Wei
---
 kexec/fs2dt.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

It is validated on Freescale t4240qds.

diff --git a/kexec/fs2dt.c b/kexec/fs2dt.c
index 1e5f074..0bffaf5 100644
--- a/kexec/fs2dt.c
+++ b/kexec/fs2dt.c
@@ -479,6 +479,9 @@ static int comparefunc(const struct dirent **dentry1,
 {
 	char *str1 = (*(struct dirent **)dentry1)->d_name;
 	char *str2 = (*(struct dirent **)dentry2)->d_name;
+	char* ptr1 = strchr(str1, '@');
+	char* ptr2 = strchr(str2, '@');
+	int len1, len2;
 	/*
 	 * strcmp scans from left to right and fails to idetify for some
@@ -486,9 +489,13 @@ static int comparefunc(const struct dirent **dentry1,
 	 * Therefore, we get the wrong sorted order like memory@1000 and
 	 * memory@f00.
 	 */
-	if (strchr(str1, '@') && strchr(str2, '@') &&
-		(strlen(str1) > strlen(str2)))
-		return 1;
+	if (ptr1 && ptr2) {
+		len1 = ptr1 - str1;
+		len2 = ptr2 - str2;
+		if (!strncmp(str1, str2, len1 > len2 ? len1 : len2) &&
+			(strlen(str1) > strlen(str2)))
+			return 1;
+	}
 	return strcmp(str1, str2);
 }
--
1.7.9.5
Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
Joonsoo Kim writes: > ppc kvm's cma area management needs alignment constraint on > cma region. So support it to prepare generalization of cma area > management functionality. > > Additionally, add some comments which tell us why alignment > constraint is needed on cma region. > > Signed-off-by: Joonsoo Kim Reviewed-by: Aneesh Kumar K.V > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 8a44c82..bc4c171 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > > struct cma { > unsigned long base_pfn; > @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); > * @size: Size of the reserved area (in bytes), > * @base: Base address of the reserved area optional, use 0 for any > * @limit: End address of the reserved memory (optional, 0 for any). > + * @alignment: Alignment for the contiguous memory area, should be power of 2 > * @res_cma: Pointer to store the created cma region. 
> * @fixed: hint about where to place the reserved area > * > @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); > */ > static int __init __dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, phys_addr_t limit, > + phys_addr_t alignment, > struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > - phys_addr_t alignment; > int ret = 0; > > - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, > - (unsigned long)size, (unsigned long)base, > - (unsigned long)limit); > + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", > + __func__, (unsigned long)size, (unsigned long)base, > + (unsigned long)limit, (unsigned long)alignment); > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > @@ -253,8 +255,17 @@ static int __init > __dma_contiguous_reserve_area(phys_addr_t size, > if (!size) > return -EINVAL; > > - /* Sanitise input arguments */ > - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); > + if (alignment && !is_power_of_2(alignment)) > + return -EINVAL; > + > + /* > + * Sanitise input arguments. > + * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, > + * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism > + * and CMA property will be broken. 
> +	 */
> +	alignment = max(alignment,
> +		(phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
> 	base = ALIGN(base, alignment);
> 	size = ALIGN(size, alignment);
> 	limit &= ~(alignment - 1);
> @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
> {
> 	int ret;
>
> -	ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
> +	ret = __dma_contiguous_reserve_area(size, base, limit, 0,
> +					res_cma, fixed);
> 	if (ret)
> 		return ret;
>
> --
> 1.7.9.5
Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
Joonsoo Kim writes: > To prepare future generalization work on cma area management code, > we need to separate core cma management codes from DMA APIs. > We will extend these core functions to cover requirements of > ppc kvm's cma area management functionality in following patches. > This separation helps us not to touch DMA APIs while extending > core functions. > > Signed-off-by: Joonsoo Kim Reviewed-by: Aneesh Kumar K.V > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index fb0cdce..8a44c82 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); > * If @fixed is true, reserve contiguous area at exactly @base. If false, > * reserve in range from @base to @limit. > */ > -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > -phys_addr_t limit, struct cma **res_cma, > -bool fixed) > +static int __init __dma_contiguous_reserve_area(phys_addr_t size, > + phys_addr_t base, phys_addr_t limit, > + struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > phys_addr_t alignment; > @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > > pr_info("%s(): reserved %ld MiB at %08lx\n", > __func__, (unsigned long)size / SZ_1M, (unsigned long)base); > - > - /* Architecture specific contiguous memory fixup. */ > - dma_contiguous_early_fixup(base, size); > return 0; > + > err: > pr_err("%s(): failed to reserve %ld MiB\n", > __func__, (unsigned long)size / SZ_1M); > return ret; > } > > +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > +phys_addr_t limit, struct cma **res_cma, > +bool fixed) > +{ > + int ret; > + > + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); > + if (ret) > + return ret; > + > + /* Architecture specific contiguous memory fixup. 
*/ > + dma_contiguous_early_fixup(base, size); > + > + return 0; > +} > + > static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) > { > mutex_lock(>lock); > @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned > long pfn, int count) > * global one. Requires architecture specific dev_get_cma_area() helper > * function. > */ > -struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, > unsigned int align) > { > unsigned long mask, pfn, pageno, start = 0; > - struct cma *cma = dev_get_cma_area(dev); > struct page *page = NULL; > int ret; > > if (!cma || !cma->count) > return NULL; > > - if (align > CONFIG_CMA_ALIGNMENT) > - align = CONFIG_CMA_ALIGNMENT; > - > pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, >count, align); > > @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > return page; > } > > +struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +unsigned int align) > +{ > + struct cma *cma = dev_get_cma_area(dev); > + > + if (align > CONFIG_CMA_ALIGNMENT) > + align = CONFIG_CMA_ALIGNMENT; > + > + return __dma_alloc_from_contiguous(cma, count, align); > +} > + > /** > * dma_release_from_contiguous() - release allocated pages > * @dev: Pointer to device for which the pages were allocated. > @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > * It returns false when provided pages do not belong to contiguous area and > * true otherwise. 
> */ > -bool dma_release_from_contiguous(struct device *dev, struct page *pages, > +static bool __dma_release_from_contiguous(struct cma *cma, struct page > *pages, > int count) > { > - struct cma *cma = dev_get_cma_area(dev); > unsigned long pfn; > > if (!cma || !pages) > @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, > struct page *pages, > > return true; > } > + > +bool dma_release_from_contiguous(struct device *dev, struct page *pages, > + int count) > +{ > + struct cma *cma = dev_get_cma_area(dev); > + > + return __dma_release_from_contiguous(cma, pages, count); > +} > -- > 1.7.9.5
Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
Joonsoo Kim writes: > We should free memory for bitmap when we find zone mis-match, > otherwise this memory will leak. > > Additionally, I copy code comment from ppc kvm's cma code to notify > why we need to check zone mis-match. > > Signed-off-by: Joonsoo Kim Reviewed-by: Aneesh Kumar K.V > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index bd0bb81..fb0cdce 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma) > base_pfn = pfn; > for (j = pageblock_nr_pages; j; --j, pfn++) { > WARN_ON_ONCE(!pfn_valid(pfn)); > + /* > + * alloc_contig_range requires the pfn range > + * specified to be in the same zone. Make this > + * simple by forcing the entire CMA resv range > + * to be in the same zone. > + */ > if (page_zone(pfn_to_page(pfn)) != zone) > - return -EINVAL; > + goto err; > } > init_cma_reserved_pageblock(pfn_to_page(base_pfn)); > } while (--i); > > mutex_init(&cma->lock); > return 0; > + > +err: > + kfree(cma->bitmap); > + return -EINVAL; > } > > static struct cma cma_areas[MAX_CMA_AREAS]; > -- > 1.7.9.5
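The leak being fixed follows a common error-handling shape: an early return inside a loop skipped freeing an allocation made earlier in the function, and the fix routes the failure through a single error label. A minimal user-space sketch of the before/after (hypothetical names; error codes inlined as plain integers):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the pattern fixed by the patch: on a zone mismatch the old
 * code returned -EINVAL directly, leaking the bitmap allocated above.
 * Routing the failure through one error label frees it first. */
static int activate(int zones_match, char **out_bitmap)
{
    char *bitmap = calloc(1, 32);
    if (!bitmap)
        return -12;                 /* -ENOMEM */

    if (!zones_match)
        goto err;                   /* was: bare "return -EINVAL" (leak) */

    *out_bitmap = bitmap;           /* success: caller now owns it */
    return 0;

err:
    free(bitmap);                   /* the fix: release before bailing */
    return -22;                     /* -EINVAL */
}
```

The same shape is why the patch moves the failure path below the function body instead of returning from inside the pageblock loop.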
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
Joonsoo Kim writes: > We don't need explicit 'CMA:' prefix, since we already define prefix > 'cma:' in pr_fmt. So remove it. > > And, some logs print function name and others don't. This looks > bad to me, so I unify log format to print function name consistently. > > Lastly, I add one more debug log on cma_activate_area(). > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 83969f8..bd0bb81 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) > } > > if (selected_size && !dma_contiguous_default_area) { > - pr_debug("%s: reserving %ld MiB for global area\n", __func__, > + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, > (unsigned long)selected_size / SZ_1M); Do we need to do function(), or just function:? I have seen the latter usage in other parts of the kernel. > > dma_contiguous_reserve_area(selected_size, selected_base, > @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) > unsigned i = cma->count >> pageblock_order; > struct zone *zone; > > - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > + pr_debug("%s()\n", __func__); why?
> > + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > if (!cma->bitmap) > return -ENOMEM; > > @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > - pr_err("Not enough slots for CMA reserved regions!\n"); > + pr_err("%s(): Not enough slots for CMA reserved regions!\n", > + __func__); > return -ENOSPC; > } > > @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > *res_cma = cma; > cma_area_count++; > > - pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M, > - (unsigned long)base); > + pr_info("%s(): reserved %ld MiB at %08lx\n", > + __func__, (unsigned long)size / SZ_1M, (unsigned long)base); > > /* Architecture specific contiguous memory fixup. */ > dma_contiguous_early_fixup(base, size); > return 0; > err: > - pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M); > + pr_err("%s(): failed to reserve %ld MiB\n", > + __func__, (unsigned long)size / SZ_1M); > return ret; > } > > -- > 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
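For readers following the pr_fmt point: the prefix comes from a macro the source file defines before including the printk headers, so every pr_* call picks it up automatically and explicit "CMA:" strings become redundant. A user-space sketch of the mechanism, with snprintf into a buffer standing in for printk:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_msg[128];

/* Stand-in for the kernel's pr_fmt convention: a pr_fmt macro defined
 * before the printk helpers prepends a subsystem prefix to every
 * message emitted through pr_debug/pr_info/pr_err. */
#define pr_fmt(fmt) "cma: " fmt
#define pr_debug(fmt, ...) \
    snprintf(last_msg, sizeof(last_msg), pr_fmt(fmt), ##__VA_ARGS__)

static const char *log_reserve(const char *func, unsigned long mib)
{
    /* Same shape as the patched call sites: function name + size */
    pr_debug("%s(): reserving %lu MiB for global area\n", func, mib);
    return last_msg;
}
```

This is why the patch can drop the hand-written prefixes without changing the logged output.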
Re: [GIT PULL] MMC updates for 3.16-rc1
On Tue, Jun 10, 2014 at 2:50 PM, Linus Torvalds wrote: > > Also, that new drivers/mmc/host/usdhi6rol0.c driver is one f*cking > noisy compile, and certainly has never been tested in a 64-bit > environment. Please either fix it, or make it depend on BROKEN. Guys? Seriously, if that driver isn't fixed, I'm going to mark it broken myself. It pretty much generates as many lines of warnings as the rest of my "allmodconfig" build combined. It's extremely annoying, and the crazy warnings are likely to hide potential real problems elsewhere, so right now that driver has negative value. I do a lot of allmodconfig builds during the merge window, and I am not going to look at that warning much longer. Fix it promptly, or it gets disabled. Linus
Re: [PATCH v2 1/2] usb: ehci-exynos: Make provision for vdd regulators
On Thursday, June 12, 2014 12:39 AM, Alan Stern wrote: > On Fri, 6 Jun 2014, Vivek Gautam wrote: > > > Facilitate getting required 3.3V and 1.0V VDD supply for > > EHCI controller on Exynos. > > > > With patches for regulators' nodes merged in 3.15: > > c8c253f ARM: dts: Add regulator entries to smdk5420 > > 275dcd2 ARM: dts: add max77686 pmic node for smdk5250, > > > > certain peripherals will now need to ensure that > > they request VDD regulators in their drivers, and enable > > them so as to make them work. > > "Certain peripherals"? Don't you mean "certain controllers"? > > Does this mean some controllers don't need to use the VDD regulators? > > > @@ -193,7 +196,31 @@ static int exynos_ehci_probe(struct platform_device > > *pdev) > > > > err = exynos_ehci_get_phy(&pdev->dev, exynos_ehci); > > if (err) > > - goto fail_clk; > > + goto fail_regulator1; > > + > > + exynos_ehci->vdd33 = devm_regulator_get(&pdev->dev, "vdd33"); > > + if (!IS_ERR(exynos_ehci->vdd33)) { > > + err = regulator_enable(exynos_ehci->vdd33); > > + if (err) { > > + dev_err(&pdev->dev, > > + "Failed to enable 3.3V Vdd supply\n"); > > + goto fail_regulator1; > > + } > > + } else { > > + dev_warn(&pdev->dev, "Regulator 3.3V Vdd supply not found\n"); > > + } > > What if this is one of the controllers that don't need to use a VDD > regulator? Do you really want to print out a warning in that case? > Should you call devm_regulator_get_optional() instead? I agree with Alan's suggestion. This warning message is not appropriate when USB controllers that don't need a VDD regulator are used. The devm_regulator_get_optional() looks better. Best regards, Jingoo Han
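The devm_regulator_get_optional() pattern the reviewers suggest treats an absent supply as a normal condition rather than something to warn about. A user-space sketch of that control flow, with stub stand-ins for the kernel's ERR_PTR machinery (the stubs are hypothetical, not the regulator API):

```c
#include <assert.h>
#include <stddef.h>

/* Stubs mimicking the kernel's error-pointer convention: an optional
 * resource getter returns an error pointer when the supply is absent. */
#define ENODEV 19
#define IS_ERR(p) ((unsigned long)(p) >= (unsigned long)-4095L)
static void *ERR_PTR(long err) { return (void *)err; }

static int have_vdd;            /* simulated: is the supply in the DT? */
static int dummy_regulator;     /* stands in for a struct regulator */

static void *stub_get_optional(void)
{
    return have_vdd ? (void *)&dummy_regulator : ERR_PTR(-ENODEV);
}

/* Returns 1 when the supply exists (and would be enabled), 0 when it
 * is simply absent -- the case that should not print a warning. */
static int probe_vdd(void)
{
    void *reg = stub_get_optional();
    if (IS_ERR(reg))
        return 0;               /* optional supply absent: not an error */
    return 1;                   /* real driver: regulator_enable(reg) */
}
```

The design point is that "optional" moves the absence decision out of the driver's warning path entirely.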
linux-next: manual merge of the target-updates tree with the virtio tree
Hi Nicholas, Today's linux-next merge of the target-updates tree got a conflict in drivers/scsi/virtio_scsi.c between commit c77fba9ab058 ("virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock") from the virtio tree and commit e6dc783a38ec ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD") from the target-updates tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/scsi/virtio_scsi.c index 99fdb9403944,1c326b63ca55.. --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@@ -396,10 -438,11 +398,10 @@@ static void virtscsi_event_done(struct */ static int virtscsi_add_cmd(struct virtqueue *vq, struct virtio_scsi_cmd *cmd, - size_t req_size, size_t resp_size, gfp_t gfp) + size_t req_size, size_t resp_size) { struct scsi_cmnd *sc = cmd->sc; - struct scatterlist *sgs[4], req, resp; + struct scatterlist *sgs[6], req, resp; struct sg_table *out, *in; unsigned out_num = 0, in_num = 0; @@@ -425,10 -472,14 +431,14 @@@ sgs[out_num + in_num++] = /* Data-in buffer */ - if (in) + if (in) { + /* Place READ protection SGLs before Data IN payload */ + if (scsi_prot_sg_count(sc)) + sgs[out_num + in_num++] = scsi_prot_sglist(sc); sgs[out_num + in_num++] = in->sgl; + } - return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp); + return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC); } static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq, @@@ -455,9 -538,10 +497,10 @@@ static int virtscsi_queuecommand(struc struct virtio_scsi_vq *req_vq, struct scsi_cmnd *sc) { - struct virtio_scsi_cmd *cmd; - int ret, req_size; - struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev); + struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc); ++ int req_size; + BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize); /* TODO: check feature bit and fail if unsupported? 
*/ @@@ -466,26 -550,34 +509,24 @@@ dev_dbg(>device->sdev_gendev, "cmd %p CDB: %#02x\n", sc, sc->cmnd[0]); - ret = SCSI_MLQUEUE_HOST_BUSY; - cmd = mempool_alloc(virtscsi_cmd_pool, GFP_ATOMIC); - if (!cmd) - goto out; - memset(cmd, 0, sizeof(*cmd)); cmd->sc = sc; - cmd->req.cmd = (struct virtio_scsi_cmd_req){ - .lun[0] = 1, - .lun[1] = sc->device->id, - .lun[2] = (sc->device->lun >> 8) | 0x40, - .lun[3] = sc->device->lun & 0xff, - .tag = (unsigned long)sc, - .task_attr = VIRTIO_SCSI_S_SIMPLE, - .prio = 0, - .crn = 0, - }; BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE); - memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); - if (virtscsi_kick_cmd(req_vq, cmd, - sizeof cmd->req.cmd, sizeof cmd->resp.cmd) != 0) + if (virtio_has_feature(vscsi->vdev, VIRTIO_SCSI_F_T10_PI)) { + virtio_scsi_init_hdr_pi(>req.cmd_pi, sc); + memcpy(cmd->req.cmd_pi.cdb, sc->cmnd, sc->cmd_len); + req_size = sizeof(cmd->req.cmd_pi); + } else { + virtio_scsi_init_hdr(>req.cmd, sc); + memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); + req_size = sizeof(cmd->req.cmd); + } + - if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd), -GFP_ATOMIC) == 0) - ret = 0; - else - mempool_free(cmd, virtscsi_cmd_pool); - -out: - return ret; ++ if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof cmd->resp.cmd) != 0) + return SCSI_MLQUEUE_HOST_BUSY; + return 0; } static int virtscsi_queuecommand_single(struct Scsi_Host *sh, signature.asc Description: PGP signature
random: Benchmarking fast_mix2
> I redid my numbers, and I can no longer reproduce the 7x slowdown. I > do see that if you compile w/o -O2, fast_mix2 is twice as slow. But > it's not 7x slower. For my single-round, I needed to drop to 2 loops rather than 3 to match the speed. That's in the source I posted, but I didn't point it out. (It wasn't an attempt to be deceptive, that's just how I happened to have left the file when I was experimenting with various options. I figured if we were looking for 7x, 1.5x wasn't all that important.) That explains some of the residual difference between our figures. When developing, I was using a many-iteration benchmark, and I suspect it fitted in the Ivy Bridge uop cache, which let it saturate the execution resources. Sorry for the premature alarm; I'll go back to work and find something better. I still get comparable speed for 2 loops and -O2: $ cc -W -Wall -m32 -O2 -march=native random.c -o random32 # ./perftest ../spooky/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:148124 (-24) 1: 48 36 (-12) 2: 40 36 (-4) 3: 44 40 (-4) 4: 44 40 (-4) 5: 36 36 (+0) 6: 52 36 (-16) 7: 44 32 (-12) 8: 44 36 (-8) 9: 48 36 (-12) $ cc -W -Wall -m64 -O2 -march=native random.c -o random64 # ./perftest ../spooky/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:132104 (-28) 1: 40 40 (+0) 2: 36 44 (+8) 3: 32 40 (+8) 4: 40 36 (-4) 5: 32 40 (+8) 6: 36 44 (+8) 7: 40 40 (+0) 8: 36 44 (+8) 9: 40 36 (-4) $ cc -W -Wall -m32 -O3 -march=native random.c -o random32 # ./perftest ./random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 88 48 (-40) 1: 36 40 (+4) 2: 36 44 (+8) 3: 32 40 (+8) 4: 36 40 (+4) 5: 96 40 (-56) 6: 40 40 (+0) 7: 36 40 (+4) 8: 28 48 (+20) 9: 28 40 (+12) $ cc -W -Wall -m64 -O3 -march=native random.c -o random64 # ./perftest ./random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 
72 80 (+8) 1: 36 52 (+16) 2: 32 36 (+4) 3: 32 36 (+4) 4: 28 40 (+12) 5: 32 40 (+8) 6: 32 40 (+8) 7: 32 36 (+4) 8: 28 44 (+16) 9: 36 36 (+0) $ cc -W -Wall -m32 -Os -march=native random.c -o random32 # ./perftest ./random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:108132 (+24) 1: 44 44 (+0) 2: 76 40 (-36) 3: 44 48 (+4) 4: 36 40 (+4) 5: 32 44 (+12) 6: 40 56 (+16) 7: 44 36 (-8) 8: 44 40 (-4) 9: 32 40 (+8) $ $ cc -W -Wall -m64 -Os -march=native random.c -o random64 # ./perftest ./random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 96108 (+12) 1: 44 52 (+8) 2: 40 40 (+0) 3: 40 36 (-4) 4: 40 32 (-8) 5: 36 36 (+0) 6: 44 32 (-12) 7: 36 36 (+0) 8: 40 36 (-4) 9: 40 36 (-4) Yours looks much more careful about the timing. A few GCC warnings I ended up fixing: 1) "volatile" on rdtsc is meaningless and ignore (with a warning) 2) fast_mix2() needs a void return type; it defaults to int. 3) int main() needs a "return 0" Here's what I got running *your* program, unmodified except for the above (meaning 3 inner loop iterations). Compiled with GCC 4.9.0 (Devian 4.9.0-6), -O2. i7-4940K# ./perftest ./ted32 fast_mix: 430 fast_mix2: 431 fast_mix: 442 fast_mix2: 464 fast_mix: 442 fast_mix2: 465 fast_mix: 442 fast_mix2: 431 fast_mix: 442 fast_mix2: 465 fast_mix: 431 fast_mix2: 430 fast_mix: 442 fast_mix2: 431 fast_mix: 431 fast_mix2: 465 fast_mix: 431 fast_mix2: 465 fast_mix: 431 fast_mix2: 431 i7-4940K# ./perftest ./ted64 fast_mix: 454 fast_mix2: 465 fast_mix: 453 fast_mix2: 465 fast_mix: 442 fast_mix2: 464 fast_mix: 453 fast_mix2: 464 fast_mix: 454 fast_mix2: 465 fast_mix: 453 fast_mix2: 465 fast_mix: 442 fast_mix2: 464 fast_mix: 453 fast_mix2: 464 fast_mix: 453 fast_mix2: 464 fast_mix: 453 fast_mix2: 465 In other words, pretty damn near the same speed (with 3 loops). So we still have some discrepancy to track
Re: [PATCH 1/4] spi: qup: Remove chip select function
On Mon, May 19, 2014 at 11:07:38AM +0300, Ivan T. Ivanov wrote: > > +- num-cs: total number of chipselects > > My understanding is that "num-cs" have to be parsed by > master driver, not by core SPI driver. Right. I need to parse it and check vs the max cs and use that value to set the master->num_chipselect > > > - > > - /* Disable auto CS toggle and use manual */ > > - iocontol &= ~SPI_IO_C_MX_CS_MODE; > > Probably we should keep this? Actually this is cleared in the probe during the initial settings of IO_CONTROL. So this isn't necessary. -- sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
Re: PATCH[vme/bridges/vme_ca91cx42.c:1382: Bad if test Bug Fix]
On Thu, Jun 12, 2014 at 03:44:34AM +0000, Nick Krause wrote: > Hey Fellow Developers, > This is my first patch so if there are any errors please reply as i will > fix them. Below is the patch. > --- drivers/vme/bridges/vme_ca91cx42.h.orig 2014-06-11 22:50:29.339671939 > -0400 > +++ drivers/vme/bridges/vme_ca91cx42.h 2014-06-11 23:15:36.027685173 -0400 > @@ -526,7 +526,7 @@ static const int CA91CX42_LINT_LM[] = { > #define CA91CX42_VSI_CTL_SUPER_SUPR (1<<21) > > #define CA91CX42_VSI_CTL_VAS_M (7<<16) > -#define CA91CX42_VSI_CTL_VAS_A16 0 > +#define CA91CX42_VSI_CTL_VAS_A16 (3<<16) > #define CA91CX42_VSI_CTL_VAS_A24 (1<<16) > #define CA91CX42_VSI_CTL_VAS_A32 (1<<17) > #define CA91CX42_VSI_CTL_VAS_USER1 (3<<17) > @@ -549,7 +549,7 @@ static const int CA91CX42_LINT_LM[] = { > #define CA91CX42_LM_CTL_SUPR (1<<21) > #define CA91CX42_LM_CTL_NPRIV (1<<20) > #define CA91CX42_LM_CTL_AS_M (5<<16) > -#define CA91CX42_LM_CTL_AS_A16 0 > +#define CA91CX42_LM_CTL_AS_A16 (3<<16) > #define CA91CX42_LM_CTL_AS_A24 (1<<16) > #define CA91CX42_LM_CTL_AS_A32 (1<<17) > Signed-off-by: Nicholas Krause Always run your patch through scripts/checkpatch.pl first to catch the issues that are 'obvious'. After that, the signed-off-by: needs to be up in the changelog area, there needs to be a changelog explaining why this patch is needed, and the tabs need to be put back in the patch (your email client ate them.) Can you try again? thanks, greg k-h
PATCH[vme/bridges/vme_ca91cx42.c:1382: Bad if test Bug Fix]
Hey Fellow Developers, This is my first patch so if there are any errors please reply as i will fix them. Below is the patch. --- drivers/vme/bridges/vme_ca91cx42.h.orig 2014-06-11 22:50:29.339671939 -0400 +++ drivers/vme/bridges/vme_ca91cx42.h 2014-06-11 23:15:36.027685173 -0400 @@ -526,7 +526,7 @@ static const int CA91CX42_LINT_LM[] = { #define CA91CX42_VSI_CTL_SUPER_SUPR (1<<21) #define CA91CX42_VSI_CTL_VAS_M (7<<16) -#define CA91CX42_VSI_CTL_VAS_A16 0 +#define CA91CX42_VSI_CTL_VAS_A16 (3<<16) #define CA91CX42_VSI_CTL_VAS_A24 (1<<16) #define CA91CX42_VSI_CTL_VAS_A32 (1<<17) #define CA91CX42_VSI_CTL_VAS_USER1 (3<<17) @@ -549,7 +549,7 @@ static const int CA91CX42_LINT_LM[] = { #define CA91CX42_LM_CTL_SUPR (1<<21) #define CA91CX42_LM_CTL_NPRIV (1<<20) #define CA91CX42_LM_CTL_AS_M (5<<16) -#define CA91CX42_LM_CTL_AS_A16 0 +#define CA91CX42_LM_CTL_AS_A16 (3<<16) #define CA91CX42_LM_CTL_AS_A24 (1<<16) #define CA91CX42_LM_CTL_AS_A32 (1<<17) Signed-off-by: Nicholas Krause Nick
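The "bad if test" in the subject stems from a multi-bit field constant defined as 0: a plain bitwise-AND test against such a constant can never succeed, so the field has to be masked out and compared as a whole. A small sketch of the distinction (the field layout below is illustrative only; whether (3<<16) is the right encoding for this hardware is a question for the ca91cx42 datasheet):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout: a 3-bit "address space" field at bit 16.
 * A value of 0 is a valid field encoding, but it cannot be detected
 * with "reg & FLAG" -- that expression is always false for FLAG == 0. */
#define VAS_MASK (7u << 16)
#define VAS_A16  (0u << 16)          /* zero-valued encoding */
#define VAS_A24  (1u << 16)

static int field_is(uint32_t ctl, uint32_t val)
{
    return (ctl & VAS_MASK) == val;  /* correct: compare the whole field */
}

static int bad_flag_test(uint32_t ctl, uint32_t flag)
{
    return (ctl & flag) != 0;        /* degenerates when flag == 0 */
}
```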
Re: [net/ipvs] BUG: unable to handle kernel NULL pointer dereference at 00000004
On Wed, Jun 11, 2014 at 04:34:19PM +0800, Jet Chen wrote: > On 06/11/2014 01:59 PM, Julian Anastasov wrote: > > > > Hello, > > > > On Wed, 11 Jun 2014, Jet Chen wrote: > > > >> Hi Wensong, > >> > >> 0day kernel testing robot got the below dmesg. > >> > >> +-------------------------------------------------------+----+ > >> | boot_successes | 26 | > >> | boot_failures | 4 | > >> | BUG:unable_to_handle_kernel_NULL_pointer_dereference | 4 | > >> | Oops | 4 | > >> | EIP_is_at_ip_vs_stop_estimator | 4 | > >> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 4 | > >> | backtrace:cleanup_net | 4 | > >> +-------------------------------------------------------+----+ > >> > >> > >> [child0:2725] process_vm_readv (347) returned ENOSYS, marking as inactive. > >> [child0:2725] uid changed! Was: 0, now -788547075 > >> Bailing main loop. Exit reason: UID changed. > >> [ 12.182233] BUG: unable to handle kernel NULL pointer dereference at > >> 00000004 > >> [ 12.183011] IP: [<4c2f6567>] ip_vs_stop_estimator+0x20/0x3e > >> [ 12.183011] *pdpt = *pde = f000ff53f000ff53 [ > >> 12.183011] Oops: 0002 [#1] DEBUG_PAGEALLOC > >> [ 12.183011] Modules linked in: > >> [ 12.183011] CPU: 0 PID: 57 Comm: kworker/u2:1 Not tainted 3.15.0-rc8 #1 > >> [ 12.183011] Workqueue: netns cleanup_net > >> [ 12.183011] task: 528773f0 ti: 52878000 task.ti: 52878000 > >> [ 12.183011] EIP: 0060:[<4c2f6567>] EFLAGS: 00010206 CPU: 0 > >> [ 12.183011] EIP is at ip_vs_stop_estimator+0x20/0x3e > >> [ 12.183011] EAX: EBX: 51c39a54 ECX: EDX: > > > > ip_vs_stop_estimator fails at list_del(&est->list) > > on mov %eax,0x4(%edx) instruction and EDX is 0. It means, > > this estimator was never started (initialized with > > INIT_LIST_HEAD in ip_vs_start_estimator) or stopped > > before with the same list_del. > > > > At first look, it is strange but I think the reason > > is the missing CONFIG_SYSCTL. ip_vs_control_net_cleanup > > fails at ip_vs_stop_estimator(net, &ipvs->tot_stats) > > because it is called not depending on CONFIG_SYSCTL but > > without CONFIG_SYSCTL ip_vs_start_estimator was never > > called. > > > > Can you test such a patch?
> > Julian, your patch works. Thanks. > > Tested-by: Jet Chen Thanks, Julian, should I take this one? I'm assuming this problem has been present for quite a number of releases. > > ipvs: stop tot_stats estimator only under CONFIG_SYSCTL > > > > The tot_stats estimator is started only when CONFIG_SYSCTL > > is defined. But it is stopped without checking CONFIG_SYSCTL. > > Fix the crash by moving ip_vs_stop_estimator into > > ip_vs_control_net_cleanup_sysctl. > > > > Signed-off-by: Julian Anastasov > > --- > > net/netfilter/ipvs/ip_vs_ctl.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c > > index c42e83d..581a658 100644 > > --- a/net/netfilter/ipvs/ip_vs_ctl.c > > +++ b/net/netfilter/ipvs/ip_vs_ctl.c > > @@ -3778,6 +3778,7 @@ static void __net_exit > > ip_vs_control_net_cleanup_sysctl(struct net *net) > > cancel_delayed_work_sync(&ipvs->defense_work); > > cancel_work_sync(&ipvs->defense_work.work); > > unregister_net_sysctl_table(ipvs->sysctl_hdr); > > + ip_vs_stop_estimator(net, &ipvs->tot_stats); > > } > > > > #else > > @@ -3840,7 +3841,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net > > *net) > > struct netns_ipvs *ipvs = net_ipvs(net); > > > > ip_vs_trash_cleanup(net); > > - ip_vs_stop_estimator(net, &ipvs->tot_stats); > > ip_vs_control_net_cleanup_sysctl(net); > > remove_proc_entry("ip_vs_stats_percpu", net->proc_net); > > remove_proc_entry("ip_vs_stats", net->proc_net); > >
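The failure mode is easy to model in isolation: the estimator is started only when sysctl support is compiled in, but the old cleanup path stopped it unconditionally, tearing down state that was never initialized. A toy sketch (names are stand-ins, not the ipvs code):

```c
#include <assert.h>

static int est_started;

static void start_estimator(void) { est_started = 1; }

/* Returns -1 for the "never started" case that oopsed in the report. */
static int stop_estimator(void)
{
    if (!est_started)
        return -1;
    est_started = 0;
    return 0;
}

#ifdef WITH_SYSCTL                 /* stand-in for CONFIG_SYSCTL */
static void ctl_init(void)    { start_estimator(); }
static int  ctl_cleanup(void) { return stop_estimator(); }
#else
static void ctl_init(void)    { }
static int  ctl_cleanup(void) { return 0; }  /* patched: no stop here */
#endif

/* Pre-patch shape: the generic cleanup called stop unconditionally. */
static int buggy_cleanup(void)
{
    return stop_estimator();
}
```

Keeping the start/stop pair inside the same conditional is the whole fix: every teardown mirrors an init taken under identical configuration.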
Re: drivers/char/random.c: More futzing about
Just to add to my total confusion about the totally disparate performance numbers we're seeing, I did some benchmarks on other machines. The speedup isn't as good one-pass as it is iterated, and as I mentioned it's slower on a P4, but it's not 7 times slower by any stretch. There are all 1-iteration numbers, run immediately after scp-ing the binary to the machine so there's no possibility if anything being cached. (The "64" and "32" versions are compiled -m32 and -m64, of course.) 2.5 GHz Phenom 9850: $ /tmp/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:199142 (-57) 1:104 95 (-9) 2:104110 (+6) 3:103109 (+6) 4:105 89 (-16) 5:103 88 (-15) 6:104 89 (-15) 7:104 95 (-9) 8:105 85 (-20) 9:105 85 (-20) $ /tmp/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:324147 (-177) 1:100 86 (-14) 2:100 99 (-1) 3:100 88 (-12) 4:100 86 (-14) 5:100 86 (-14) 6:100 89 (-11) 7:100111 (+11) 8:100111 (+11) 9:100 88 (-12) $ /tmp/random64 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 554788 220327 (-334461) 1: 554825 220176 (-334649) 2: 553505 220148 (-57) 3: 554661 220064 (-334597) 4: 569559 220064 (-349495) 5: 612798 220065 (-392733) 6: 570287 220064 (-350223) 7: 554790 220064 (-334726) 8: 554715 220065 (-334650) 9: 569840 220064 (-349776) $ /tmp/random32 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 520117 280225 (-239892) 1: 520125 280154 (-239971) 2: 520104 280094 (-240010) 3: 520079 280060 (-240019) 4: 520069 280060 (-240009) 5: 520060 280060 (-24) 6: 558971 280060 (-278911) 7: 520102 280060 (-240042) 8: 520082 280060 (-240022) 9: 520058 280060 (-239998) 3 GHz i5-3330: $ /tmp/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 78 75 (-3) 1: 36 33 (-3) 2: 33 39 (+6) 3: 36 30 (-6) 4: 36 33 (-3) 5: 30 33 (+3) 6: 30 54 (+24) 7: 24 48 (+24) 
8: 27 33 (+6) 9: 30 33 (+3) $ /tmp/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 66 78 (+12) 1: 39 39 (+0) 2: 36 39 (+3) 3: 45 33 (-12) 4: 42 33 (-9) 5: 33 42 (+9) 6: 45 33 (-12) 7: 39 36 (-3) 8:105 48 (-57) 9: 42 39 (-3) $ /tmp/random64 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 406188 218104 (-188084) 1: 402620 246968 (-155652) 2: 402652 239840 (-162812) 3: 402720 200312 (-202408) 4: 402584 200080 (-202504) 5: 447488 200228 (-247260) 6: 402788 200312 (-202476) 7: 402688 200080 (-202608) 8: 427140 224320 (-202820) 9: 402576 200080 (-202496) $ /tmp/random32 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 406485 266670 (-139815) 1: 392694 266463 (-126231) 2: 392496 266763 (-125733) 3: 426003 266145 (-159858) 4: 392688 27 (-126021) 5: 432231 266589 (-165642) 6: 392754 298734 (-94020) 7: 392883 284994 (-107889) 8: 392637 266694 (-125943) 9: 392985 267024 (-125961) 3.5 GHz i7-2700: # /tmp/perftest /tmp/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 82 90 (+8) 1: 38 41 (+3) 2: 46 38 (-8) 3: 35 41 (+6) 4: 46 41 (-5) 5: 38 38 (+0) 6: 41 55 (+14) 7: 41 35 (-6) 8: 46 24 (-22) 9: 35 38 (+3) # /tmp/perftest /tmp/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 82 76 (-6) 1: 32 53 (+21) 2: 49 44 (-5) 3: 35 41 (+6) 4: 46 35 (-11) 5: 35 44 (+9) 6: 49 50 (+1) 7: 41 41 (+0) 8: 32 44 (+12) 9: 49 44 (-5) #
linux-next: manual merge of the virtio tree with Linus' tree
Hi Rusty, Today's linux-next merge of the virtio tree got a conflict in drivers/scsi/virtio_scsi.c between commit b54197c43db8 ("virtio_scsi: use cmd_size") from Linus' tree and commit c77fba9ab058 ("virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock") from the virtio tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell s...@canb.auug.org.au diff --cc drivers/scsi/virtio_scsi.c index d4727b339474,e2a68aece3da.. --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@@ -484,10 -529,13 +483,9 @@@ static int virtscsi_queuecommand(struc memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); if (virtscsi_kick_cmd(req_vq, cmd, - sizeof cmd->req.cmd, sizeof cmd->resp.cmd, - GFP_ATOMIC) != 0) -sizeof cmd->req.cmd, sizeof cmd->resp.cmd) == 0) - ret = 0; - else - mempool_free(cmd, virtscsi_cmd_pool); - -out: - return ret; ++sizeof cmd->req.cmd, sizeof cmd->resp.cmd) != 0) + return SCSI_MLQUEUE_HOST_BUSY; + return 0; } static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
Re: [PATCH ftrace/core 0/2] ftrace, kprobes: Introduce IPMODIFY flag for ftrace_ops to detect conflicts
Hi Josh, On Wed, 11 Jun 2014 11:58:26 -0500, Josh Poimboeuf wrote: > On Tue, Jun 10, 2014 at 10:50:01AM +0000, Masami Hiramatsu wrote: >> Hi, >> >> Here is a pair of patches which introduce an IPMODIFY flag for >> ftrace_ops to detect conflicts of ftrace users who can modify >> regs->ip in their handler. >> Currently, only kprobes can change the regs->ip in the handler, >> but recently kpatch also wants to change it. Moreover, since >> ftrace itself is exported to modules, it is a conceivable >> scenario. >> >> Here we talked on github. >> https://github.com/dynup/kpatch/issues/47 >> >> To protect modified regs->ip from each other, this series >> introduces the FTRACE_OPS_FL_IPMODIFY flag and ftrace now ensures >> the flag can be set on each function entry location. If there >> is someone who already reserves regs->ip on the target function >> entry, ftrace_set_filter_ip or register_ftrace_function will >> return -EBUSY. Users must handle that. >> >> At this point, all kprobes will reserve regs->ip, since jprobe >> requires it. > > Masami, thanks very much for this! > > One issue with this approach is that it _always_ makes kprobes and > kpatch incompatible when probing/patching the same function, even when > kprobes doesn't need to touch regs->ip. > > Is it possible to add a kprobes flag (KPROBE_FLAG_IPMODIFY), which is > only set by those kprobes users (just jprobes?) which need to modify IP? > Then kprobes could only set the corresponding ftrace flag when it's > really needed. And I think kprobes could even enforce the fact that > !KPROBE_FLAG_IPMODIFY users don't change regs->ip. > > > BTW, I've done some testing with this patch set by patching/probing the > same function with FTRACE_OPS_FL_IPMODIFY, and got some warnings.
I saw > the following warning when attempting to kpatch a kprobed function: > > > WARNING: CPU: 2 PID: 18351 at kernel/trace/ftrace.c:419 > __unregister_ftrace_function+0x1be/0x1d0() > Modules linked in: kpatch_meminfo_string(OE+) kpatch(OE) > stap_8d70d6e041605bd1e144cba4801652_14636(OE) rfcomm fuse ipt_MASQUERADE ccm > xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack bnep ebtable_nat > ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat > nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle > ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack > iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic > x86_pkg_temp_thermal coretemp kvm_intel snd_hda_intel iTCO_wdt > iTCO_vendor_support snd_hda_controller kvm snd_hda_codec iwlwifi snd_hwdep > uvcvideo snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_dev > ice >videobuf2_core btusb v4l2_common snd_pcm videodev nfsd cfg80211 microcode > e1000e bluetooth media thinkpad_acpi joydev sdhci_pci sdhci pcspkr serio_raw > snd_timer i2c_i801 snd mmc_core auth_rpcgss mei_me mei lpc_ich mfd_core > shpchp ptp pps_core wmi tpm_tis soundcore tpm rfkill nfs_acl lockd sunrpc > dm_crypt i915 i2c_algo_bit drm_kms_helper drm crct10dif_pclmul crc32_pclmul > crc32c_intel ghash_clmulni_intel i2c_core video > CPU: 2 PID: 18351 Comm: insmod Tainted: GW OE 3.15.0-IPMODIFY+ #1 > Hardware name: LENOVO 2356BH8/2356BH8, BIOS G7ET63WW (2.05 ) 11/12/2012 > b39bd289 8803b78d7bc0 816f31ed > 8803b78d7bf8 8108914d a07f9040 >fff0 0001 8803e7ac4200 > Call Trace: >[] dump_stack+0x45/0x56 >[] warn_slowpath_common+0x7d/0xa0 >[] warn_slowpath_null+0x1a/0x20 >[] __unregister_ftrace_function+0x1be/0x1d0 >[] ftrace_startup+0x1e4/0x220 >[] register_ftrace_function+0x43/0x60 >[] kpatch_register+0x664/0x830 [kpatch] >[] ? 0xa080 >[] ? 
0xa080 >[] patch_init+0x194/0x1000 [kpatch_meminfo_string] >[] ? 0xa0045fff >[] do_one_initcall+0xd4/0x210 >[] ? set_memory_nx+0x43/0x50 >[] load_module+0x1d92/0x25e0 >[] ? store_uevent+0x70/0x70 >[] ? kernel_read+0x50/0x80 >[] SyS_finit_module+0xa6/0xd0 >[] system_call_fastpath+0x16/0x1b > > > That warning happened because __unregister_ftrace_function() doesn't > expect FTRACE_OPS_FL_ENABLED to be cleared in the ftrace_startup error > path. I tried removing the FTRACE_OPS_FL_ENABLED clearing line in > ftrace_startup, but I saw more warnings. Did you just remove the clearing line or actually clear the flag after __unregister_ftrace_function() was called? > This one happened when attempting to kprobe a kpatched function: > > > WARNING: CPU: 3 PID: at kernel/kprobes.c:953 arm_kprobe+0xa7/0xe0() > Failed to init kprobe-ftrace (-16) > Modules linked in: stap_b2ea0de23f179d8ded86fcc19fcc533_(OE) > kpatch_meminfo_string(OE) kpatch(OE) rfcomm fuse ccm ipt_MASQUERADE >
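The reservation rule being discussed in this thread — at most one ftrace user may own regs->ip at a given function entry, with everyone else getting -EBUSY — can be modeled in a few lines of plain C. This is an illustrative userspace sketch only, not the kernel API; the record array, the `MODEL_EBUSY` value, and all function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: at most one ftrace user may "own" regs->ip at a
 * given function entry; a second claim gets a busy error. All names
 * and the error value are illustrative, not the kernel API.
 */
#define MODEL_EBUSY 16
#define MAX_ENTRIES 8

struct entry_rec {
	unsigned long ip;	/* function entry address */
	int ipmodify;		/* 1 if some user reserved regs->ip here */
};

static struct entry_rec recs[MAX_ENTRIES];

/* Try to reserve regs->ip modification rights at 'ip'. */
static int reserve_ipmodify(unsigned long ip)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (recs[i].ip == ip) {
			if (recs[i].ipmodify)
				return -MODEL_EBUSY;	/* conflict */
			recs[i].ipmodify = 1;
			return 0;
		}
	}
	return -1;	/* unknown function entry */
}

static void release_ipmodify(unsigned long ip)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++)
		if (recs[i].ip == ip)
			recs[i].ipmodify = 0;
}
```

Under this model, a second registration on the same entry fails until the first user releases it, which is the behavior the series enforces with -EBUSY.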
Re: [patch V4 02/10] rtmutex: Simplify rtmutex_slowtrylock()
On 06/12/2014 02:44 AM, Thomas Gleixner wrote:
> Oleg noticed that rtmutex_slowtrylock() has a pointless check for
> rt_mutex_owner(lock) != current.
>
> To avoid calling try_to_take_rtmutex() we really want to check whether
> the lock has an owner at all or whether the trylock failed because the
> owner is NULL, but the RT_MUTEX_HAS_WAITERS bit is set. This covers
> the lock is owned by caller situation as well.
>
> We can actually do this check lockless. trylock is taking a chance
> whether we take lock->wait_lock to do the check or not.
>
> Add comments to the function while at it.
>
> Reported-by: Oleg Nesterov
> Signed-off-by: Thomas Gleixner
> ---

Reviewed-by: Lai Jiangshan

Thanks,
Lai
Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
(2014/06/11 16:41), Namhyung Kim wrote:
> Hi Masami,
>
> On Wed, 11 Jun 2014 10:28:01 +0900, Masami Hiramatsu wrote:
>> (2014/06/10 22:53), Namhyung Kim wrote:
>>> Hi Masami,
>>>
>>> 2014-06-10 (Tue), 10:50 +, Masami Hiramatsu:
>>>> Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among ftrace
>>>> users who may modify regs->ip to change the execution path.
>>>> This also adds the flag to kprobe_ftrace_ops, since ftrace-based
>>>> kprobes already modify regs->ip. Thus, if another user modifies
>>>> regs->ip on the same function entry, one of them will be broken.
>>>> So both should add the IPMODIFY flag and make sure that
>>>> ftrace_set_filter_ip() succeeds.
>>>>
>>>> Note that currently conflicts of IPMODIFY are detected on the
>>>> filter hash. It does NOT care about the notrace hash. This means
>>>> that if you set the filter hash to all functions and notrace (mask)
>>>> some of them, the IPMODIFY flag will be applied to all functions.
>>>
>>> [SNIP]
>>>> +static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
>>>> +					 struct ftrace_hash *old_hash,
>>>> +					 struct ftrace_hash *new_hash)
>>>> +{
>>>> +	struct ftrace_page *pg;
>>>> +	struct dyn_ftrace *rec, *end = NULL;
>>>> +	int in_old, in_new;
>>>> +
>>>> +	/* Only update if the ops has been registered */
>>>> +	if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
>>>> +		return 0;
>>>> +
>>>> +	if (!(ops->flags & FTRACE_OPS_FL_SAVE_REGS) ||
>>>> +	    !(ops->flags & FTRACE_OPS_FL_IPMODIFY))
>>>> +		return 0;
>>>> +
>>>> +	/* Update rec->flags */
>>>> +	do_for_each_ftrace_rec(pg, rec) {
>>>> +		/* We need to update only differences of filter_hash */
>>>> +		in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip);
>>>> +		in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip);
>>>
>>> Why not use ftrace_hash_empty() here instead of checking NULL?
>>
>> Ah, a trick is here. Since an empty filter_hash must hit all, we can not
>> enable/disable filter_hash if we use ftrace_hash_empty() here.
>>
>> To enable the new_hash, old_hash must be EMPTY_HASH, which means in_old
>> is always false. To disable, new_hash is EMPTY_HASH too.
>> Please see ftrace_hash_ipmodify_enable/disable/update().
>
> I'm confused. 8-p I guess what you want to do is to check records in
> either of the filter hashes, right? If so, what about this?
>
> in_old = !ftrace_hash_empty(old_hash) && ftrace_lookup_ip(old_hash, rec->ip);
> in_new = !ftrace_hash_empty(new_hash) && ftrace_lookup_ip(new_hash, rec->ip);

No, ftrace_lookup_ip() returns NULL if the hash is empty, so adding
!ftrace_hash_empty() is meaningless :)

Actually, here I intended to have 3 meanings for the new/old_hash arguments:
- If it is NULL, it hits all
- If it is EMPTY_HASH, it hits nothing
- If it has some entries, it hits those entries.

And in ftrace.c (__ftrace_hash_rec_update), AFAICS, ops->filter_hash has
only 2 meanings:
- If it is EMPTY_HASH or NULL, it hits all
- If it has some entries, it hits those entries.

So I had to make the above change...

>>> Also return value of ftrace_lookup_ip is not boolean.. maybe you need
>>> to add !! or convert the type of in_{old,new} to bool.
>>
>> Yeah, I see. And there is '||' (logical OR) which evaluates the result
>> as boolean. :)
>
> Argh... you're right!
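The three-state hash semantics Masami describes can be checked with a small userspace model. The helper names below are hypothetical stand-ins; only the NULL / EMPTY_HASH / populated distinction is taken from the discussion above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in for an ftrace hash: a zero-terminated list of ips.
 * Semantics modeled on the discussion (illustrative only):
 *   hash == NULL       -> matches every ip
 *   hash == EMPTY_HASH -> matches nothing
 *   otherwise          -> matches only the listed ips
 */
static const unsigned long EMPTY[] = { 0 };	/* plays EMPTY_HASH */
#define EMPTY_HASH EMPTY

static bool lookup_ip(const unsigned long *hash, unsigned long ip)
{
	if (!hash)
		return false;	/* caller must special-case NULL first */
	for (size_t i = 0; hash[i]; i++)
		if (hash[i] == ip)
			return true;
	return false;
}

/* Mirrors "in_new = !new_hash || lookup(new_hash, ip)" from the patch:
 * the !hash test is what makes NULL mean "hit all". */
static bool hash_matches(const unsigned long *hash, unsigned long ip)
{
	return !hash || lookup_ip(hash, ip);
}
```

This is why replacing the NULL check with ftrace_hash_empty() would collapse the "hit all" and "hit nothing" cases into one.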
:) > >> >>> >>> + if (in_old == in_new) + continue; + + if (in_new) { + /* New entries must ensure no others are using it */ + if (rec->flags & FTRACE_FL_IPMODIFY) + goto rollback; + rec->flags |= FTRACE_FL_IPMODIFY; + } else /* Removed entry */ + rec->flags &= ~FTRACE_FL_IPMODIFY; + } while_for_each_ftrace_rec(); + + return 0; + +rollback: + end = rec; + + /* Roll back what we did above */ + do_for_each_ftrace_rec(pg, rec) { + if (rec == end) + goto err_out; + + in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip); + in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip); + if (in_old == in_new) + continue; + + if (in_new) + rec->flags &= ~FTRACE_FL_IPMODIFY; + else + rec->flags |= FTRACE_FL_IPMODIFY; + } while_for_each_ftrace_rec(); + +err_out: + return -EBUSY; +} + +static int ftrace_hash_ipmodify_enable(struct ftrace_ops *ops) +{ + struct ftrace_hash *hash = ops->filter_hash; + + if (ftrace_hash_empty(hash)) + hash = NULL; + + return __ftrace_hash_update_ipmodify(ops, EMPTY_HASH, hash); +} >>> >>> Please see above comment. You can pass an empty hash as is, or
RE: [patch 11/13] wireless: mwifiex: Use the proper interfaces
Hi Thomas,

Thanks for your patch.

> Why is converting time formats so desired if there are proper
> interfaces for this?
>
> Signed-off-by: Thomas Gleixner
> Cc: Bing Zhao
> Cc: "John W. Linville"
> Cc: linux-wirel...@vger.kernel.org

[...]

> Index: linux/drivers/net/wireless/mwifiex/main.c
> ===
> --- linux.orig/drivers/net/wireless/mwifiex/main.c
> +++ linux/drivers/net/wireless/mwifiex/main.c
> @@ -611,7 +611,6 @@ mwifiex_hard_start_xmit(struct sk_buff *
> 	struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev);
> 	struct sk_buff *new_skb;
> 	struct mwifiex_txinfo *tx_info;
> -	struct timeval tv;
>
> 	dev_dbg(priv->adapter->dev, "data: %lu BSS(%d-%d): Data <= kernel\n",
> 		jiffies, priv->bss_type, priv->bss_num);
> @@ -658,8 +657,7 @@ mwifiex_hard_start_xmit(struct sk_buff *
> 	 * firmware for aggregate delay calculation for stats and
> 	 * MSDU lifetime expiry.
> 	 */
> -	do_gettimeofday(&tv);
> -	skb->tstamp = timeval_to_ktime(tv);
> +	__net_timestamp(skb);
>
> 	mwifiex_queue_tx_pkt(priv, skb);
>
> Index: linux/drivers/net/wireless/mwifiex/tdls.c
> ===
> --- linux.orig/drivers/net/wireless/mwifiex/tdls.c
> +++ linux/drivers/net/wireless/mwifiex/tdls.c
> @@ -552,8 +552,7 @@ int mwifiex_send_tdls_data_frame(struct
> 	tx_info->bss_num = priv->bss_num;
> 	tx_info->bss_type = priv->bss_type;
>
> -	do_gettimeofday(&tv);
> -	skb->tstamp = timeval_to_ktime(tv);
> +	__net_timestamp(skb);

I guess we need to remove the "struct timeval tv" local variable too.

> 	mwifiex_queue_tx_pkt(priv, skb);
>
> 	return 0;
> @@ -710,8 +709,7 @@ int mwifiex_send_tdls_action_frame(struc
> 	pkt_len = skb->len - MWIFIEX_MGMT_FRAME_HEADER_SIZE - sizeof(pkt_len);
> 	memcpy(skb->data + MWIFIEX_MGMT_FRAME_HEADER_SIZE, &pkt_len,
> 	       sizeof(pkt_len));
> -	do_gettimeofday(&tv);
> -	skb->tstamp = timeval_to_ktime(tv);
> +	__net_timestamp(skb);

And here too.

Could you please remove these two "struct timeval tv" and send v2 with
my ACK?
Acked-by: Bing Zhao

Thanks,
Bing

> 	mwifiex_queue_tx_pkt(priv, skb);
>
> 	return 0;
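For readers outside the kernel, the arithmetic the removed code was doing by hand can be sketched in userspace C. This is an illustrative model of the timeval-to-nanoseconds conversion only; in the kernel, __net_timestamp() stamps the skb directly from the real-time clock, which is why the timeval round-trip and the local variable can go away entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Userspace sketch of what a timeval_to_ktime()-style conversion
 * boils down to: a (seconds, microseconds) pair collapsed into a
 * single signed nanosecond count.
 */
static int64_t timeval_to_ns(const struct timeval *tv)
{
	return (int64_t)tv->tv_sec * 1000000000LL +
	       (int64_t)tv->tv_usec * 1000LL;
}
```

The one-call replacement avoids both the conversion and the risk of forgetting the now-unused local, which is exactly the loose end pointed out above.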
Re: drivers/char/random.c: More futzing about
On Wed, Jun 11, 2014 at 08:32:49PM -0400, George Spelvin wrote:
> Comparable, but slightly slower. Clearly, I need to do better.
> And you can see the first-iteration effects clearly. Still,
> nothing *remotely* like 7x!

I redid my numbers, and I can no longer reproduce the 7x slowdown. I
do see that if you compile w/o -O2, fast_mix2 is twice as slow. But
it's not 7x slower.

When compiling w/o -O2:

              fast_mix   fast_mix2
task-clock    221.3 ms   460.7 ms

When compiling with -O2 -Os:

              fast_mix   fast_mix2
task-clock    115.4 ms   71.5 ms

And here are the numbers I got with a single iteration using rdtsc:

fast_mix: 164	fast_mix2: 237
fast_mix: 168	fast_mix2: 230
fast_mix: 166	fast_mix2: 228
fast_mix: 164	fast_mix2: 230
fast_mix: 166	fast_mix2: 230
fast_mix: 168	fast_mix2: 232
fast_mix: 166	fast_mix2: 228
fast_mix: 164	fast_mix2: 228
fast_mix: 166	fast_mix2: 234
fast_mix: 166	fast_mix2: 230

- Ted

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

typedef unsigned int __u32;

struct fast_pool {
	__u32		pool[4];
	unsigned long	last;
	unsigned short	count;
	unsigned char	rotate;
	unsigned char	last_timer_intr;
};

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

static __u32 const twist_table[8] = {
	0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
	0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector. It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
extern void fast_mix(struct fast_pool *f, __u32 input[4])
{
	__u32		w;
	unsigned	input_rotate = f->rotate;

	w = rol32(input[0], input_rotate) ^ f->pool[0] ^ f->pool[3];
	f->pool[0] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 14) & 31;
	w = rol32(input[1], input_rotate) ^ f->pool[1] ^ f->pool[0];
	f->pool[1] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 7) & 31;
	w = rol32(input[2], input_rotate) ^ f->pool[2] ^ f->pool[1];
	f->pool[2] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 7) & 31;
	w = rol32(input[3], input_rotate) ^ f->pool[3] ^ f->pool[2];
	f->pool[3] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 7) & 31;

	f->rotate = input_rotate;
	f->count++;
}

extern void fast_mix2(struct fast_pool *f, __u32 const input[4])
{
	__u32 a = f->pool[0] ^ input[0], b = f->pool[1] ^ input[1];
	__u32 c = f->pool[2] ^ input[2], d = f->pool[3] ^ input[3];
	int i;

	for (i = 0; i < 3; i++) {
		/*
		 * Inspired by ChaCha's QuarterRound, but
		 * modified for much greater parallelism.
		 * Surprisingly, rotating a and c seems to work
		 * better than b and d. And it runs faster.
		 */
		a += b; c += d;
		d ^= a; b ^= c;
		a = rol32(a, 15); c = rol32(c, 21);

		a += b; c += d;
		d ^= a; b ^= c;
		a = rol32(a, 3); c = rol32(c, 7);
	}
	f->pool[0] = a; f->pool[1] = b;
	f->pool[2] = c; f->pool[3] = d;
	f->count++;
}

static __inline__ volatile unsigned long long rdtsc(void)
{
	unsigned long long int x;
	__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
	return x;
}

int main(int argc, char **argv)
{
	struct fast_pool f;
	int i;
	__u32 input[4];
	unsigned volatile long long start_time, end_time;

	memset(&f, 0, sizeof(f));
	memset(&input, 0, sizeof(input));
	f.pool[0] = 1;
#if !defined(BENCH_FASTMIX) && !defined(BENCH_FASTMIX2)
	for (i=0; i < 10; i++) {
		usleep(5);
		start_time = rdtsc();
		fast_mix(&f, input);
		end_time = rdtsc();
		printf("fast_mix: %llu\t", end_time - start_time);
		usleep(5);
		start_time = rdtsc();
		fast_mix2(&f, input);
		end_time = rdtsc();
		printf("fast_mix2: %llu\n", end_time - start_time);
	}
#endif
#ifdef BENCH_FASTMIX
	for (i=0; i < 1024; i++) {
		fast_mix(&f, input);
	}
#endif
#ifdef BENCH_FASTMIX2
	for (i=0; i < 1024; i++) {
		fast_mix2(&f, input);
	}
#endif
}
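As a quick sanity check of the fast_mix2() routine above, the function can be exercised standalone: every step (add, xor, fixed-distance rotate) is invertible, so two inputs differing in even a single bit must leave the 128-bit pool in different states. A minimal harness, with stdint types swapped in for the kernel-style typedefs:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standalone copy of fast_mix2() from the test program above, reduced
 * to the fields it touches. Used only to check the bijectivity
 * property described in the lead-in.
 */
typedef uint32_t u32;

struct fast_pool { u32 pool[4]; unsigned short count; };

static inline u32 rol32(u32 word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

static void fast_mix2(struct fast_pool *f, const u32 input[4])
{
	u32 a = f->pool[0] ^ input[0], b = f->pool[1] ^ input[1];
	u32 c = f->pool[2] ^ input[2], d = f->pool[3] ^ input[3];

	for (int i = 0; i < 3; i++) {
		/* three double rounds of the ChaCha-inspired mix */
		a += b; c += d; d ^= a; b ^= c;
		a = rol32(a, 15); c = rol32(c, 21);
		a += b; c += d; d ^= a; b ^= c;
		a = rol32(a, 3); c = rol32(c, 7);
	}
	f->pool[0] = a; f->pool[1] = b;
	f->pool[2] = c; f->pool[3] = d;
	f->count++;
}
```

Because the round function is a bijection on the xored state, distinct inputs from the same starting pool can never collide into the same final pool.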
[PATCH v2 08/10] mm, cma: clean-up cma allocation error path
We can remove one call site for clear_cma_bitmap() if we first call it
before checking the error number.

Signed-off-by: Joonsoo Kim

diff --git a/mm/cma.c b/mm/cma.c
index 1e1b017..01a0713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
 			break;
-		} else if (ret != -EBUSY) {
-			clear_cma_bitmap(cma, pfn, count);
-			break;
 		}
+
+		clear_cma_bitmap(cma, pfn, count);
+		if (ret != -EBUSY)
+			break;
+
 		pr_debug("%s(): memory range at %p is busy, retrying\n",
			 __func__, pfn_to_page(pfn));
		/* try again with a bit different memory target */
-- 
1.7.9.5
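The control-flow change in this patch — run the cleanup once, unconditionally, then decide whether to retry or bail — can be restated in a standalone sketch. The functions and the error value below are stand-ins invented for illustration, not the kernel's:

```c
#include <assert.h>

/* Illustrative model of the restructured error path: a single cleanup
 * call site serves both the "hard error" and the "retry" branches.
 */
#define MODEL_EBUSY 16

static int cleanup_calls;

static void cleanup(void) { cleanup_calls++; }

/* Each results[i] simulates one try_alloc() outcome: 0 on success,
 * -MODEL_EBUSY if worth retrying, any other negative on hard failure. */
static int alloc_with_retry(const int *results, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = results[i];
		if (ret == 0)
			return 0;	/* success: no cleanup needed */
		cleanup();		/* single cleanup call site */
		if (ret != -MODEL_EBUSY)
			return ret;	/* hard error: give up */
		/* -EBUSY: fall through and retry */
	}
	return -MODEL_EBUSY;
}
```

Hoisting the cleanup above the error test is what lets the duplicate call disappear without changing behavior on either branch.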
[PATCH v2 10/10] mm, cma: use spinlock instead of mutex
Currently, we take the mutex to manipulate the bitmap. This job is
really simple and short, so we don't need to sleep if contended. So
change it to a spinlock.

Signed-off-by: Joonsoo Kim

diff --git a/mm/cma.c b/mm/cma.c
index 22a5b23..3085e8c 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -36,7 +37,7 @@ struct cma {
 	unsigned long	count;
 	unsigned long	*bitmap;
 	int order_per_bit;	/* Order of pages represented by one bit */
-	struct mutex	lock;
+	spinlock_t	lock;
 };

 /*
@@ -72,9 +73,9 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
 	nr_bits = cma_bitmap_pages_to_bits(cma, count);

-	mutex_lock(&cma->lock);
+	spin_lock(&cma->lock);
 	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
-	mutex_unlock(&cma->lock);
+	spin_unlock(&cma->lock);
 }

 static int __init cma_activate_area(struct cma *cma)
@@ -112,7 +113,7 @@ static int __init cma_activate_area(struct cma *cma)
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);

-	mutex_init(&cma->lock);
+	spin_lock_init(&cma->lock);
 	return 0;

 err:
@@ -261,11 +262,11 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 	nr_bits = cma_bitmap_pages_to_bits(cma, count);

 	for (;;) {
-		mutex_lock(&cma->lock);
+		spin_lock(&cma->lock);
 		bitmapno = bitmap_find_next_zero_area(cma->bitmap,
					bitmap_maxno, start, nr_bits, mask);
 		if (bitmapno >= bitmap_maxno) {
-			mutex_unlock(&cma->lock);
+			spin_unlock(&cma->lock);
 			break;
 		}
 		bitmap_set(cma->bitmap, bitmapno, nr_bits);
@@ -274,7 +275,7 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
	 * our exclusive use. If the migration fails we will take the
	 * lock again and unmark it.
	 */
-	mutex_unlock(&cma->lock);
+	spin_unlock(&cma->lock);

 	pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
 	mutex_lock(&cma_mutex);
-- 
1.7.9.5
[PATCH v2 00/10] CMA: generalize CMA reserved area management code
Currently, there are two users of the CMA functionality: one is the DMA
subsystem and the other is kvm on powerpc. They have their own code to
manage the CMA reserved area even though it looks really similar.
My guess is that this is caused by differing needs for bitmap
management. The kvm side wants to maintain the bitmap not with one bit
per page, but at a coarser granularity; eventually it uses a bitmap
where one bit represents 64 pages.

When I implement CMA related patches, I have to change both places to
apply my changes, and that is painful. I want to change this situation
and reduce future code management overhead through this patchset. This
change could also help developers who want to use CMA in their new
feature development, since they can use CMA easily without copying &
pasting this reserved area management code.

v2: Although this patchset looks very different from v1, the end
result, that is, mm/cma.c, is the same as v1's. So I carry the Acks to
patches 6-7.

Patch 1-5 prepare some features to cover ppc kvm's requirements.
Patch 6-7 generalize CMA reserved area management code and change users
to use it.
Patch 8-10 clean up minor things.
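The "one bit represents 64 pages" bookkeeping mentioned above reduces to a small piece of arithmetic, introduced later in the series as cma_bitmap_pages_to_bits(). A standalone sketch — the ALIGN_UP macro stands in for the kernel's ALIGN, and the function takes order_per_bit directly instead of a struct cma:

```c
#include <assert.h>

/* With order_per_bit = k, one bitmap bit covers 2^k pages; a request
 * for 'pages' pages is rounded up to whole bits. The ppc kvm case is
 * one bit per 64 pages, i.e. k = 6.
 */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

static unsigned long cma_bitmap_pages_to_bits(int order_per_bit,
					      unsigned long pages)
{
	return ALIGN_UP(pages, 1UL << order_per_bit) >> order_per_bit;
}
```

Parameterizing this one conversion is what lets a single mm/cma.c serve both the DMA case (one bit per page) and the kvm case (one bit per 64 pages).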
Joonsoo Kim (10):
  DMA, CMA: clean-up log message
  DMA, CMA: fix possible memory leak
  DMA, CMA: separate core cma management codes from DMA APIs
  DMA, CMA: support alignment constraint on cma region
  DMA, CMA: support arbitrary bitmap granularity
  CMA: generalize CMA reserved area management functionality
  PPC, KVM, CMA: use general CMA reserved area management framework
  mm, cma: clean-up cma allocation error path
  mm, cma: move output param to the end of param list
  mm, cma: use spinlock instead of mutex

 arch/powerpc/kvm/book3s_hv_builtin.c |  17 +-
 arch/powerpc/kvm/book3s_hv_cma.c     | 240 ----
 arch/powerpc/kvm/book3s_hv_cma.h     |  27 ---
 drivers/base/Kconfig                 |  10 -
 drivers/base/dma-contiguous.c        | 248 ++---
 include/linux/cma.h                  |  12 ++
 include/linux/dma-contiguous.h       |   3 +-
 mm/Kconfig                           |  11 ++
 mm/Makefile                          |   1 +
 mm/cma.c                             | 333 ++
 10 files changed, 382 insertions(+), 520 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.c
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.h
 create mode 100644 include/linux/cma.h
 create mode 100644 mm/cma.c
-- 
1.7.9.5
[PATCH v2 09/10] mm, cma: move output param to the end of param list
Conventionally, we put output params at the end of the param list.
cma_declare_contiguous() doesn't look like that, so change it.

Additionally, move down the cma_areas reference code to the position
where it is really needed.

Signed-off-by: Joonsoo Kim

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 28ec226..97613ea 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void)
 		align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
 		cma_declare_contiguous(selected_size, 0, 0, align_size,
-			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
+			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
 	}
 }

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index f177f73..bfd4553 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 {
 	int ret;

-	ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
+	ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma);
 	if (ret)
 		return ret;

diff --git a/include/linux/cma.h b/include/linux/cma.h
index e38efe9..e53eead 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -6,7 +6,7 @@ struct cma;
 extern int __init cma_declare_contiguous(phys_addr_t size,
			phys_addr_t base, phys_addr_t limit,
			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed);
+			bool fixed, struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
 extern bool cma_release(struct cma *cma, struct page *pages, int count);
 #endif

diff --git a/mm/cma.c b/mm/cma.c
index 01a0713..22a5b23 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas);
 * @limit: End address of the reserved memory (optional, 0 for any).
 * @alignment: Alignment for the contiguous memory area, should be power of 2
 * @order_per_bit: Order of pages represented by one bit on bitmap.
- * @res_cma: Pointer to store the created cma region.
 * @fixed: hint about where to place the reserved area
+ * @res_cma: Pointer to store the created cma region.
 *
 * This function reserves memory from early allocator. It should be
 * called by arch specific code once the early allocator (memblock or bootmem)
@@ -156,9 +156,9 @@
 int __init cma_declare_contiguous(phys_addr_t size,
			phys_addr_t base, phys_addr_t limit,
			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed)
+			bool fixed, struct cma **res_cma)
 {
-	struct cma *cma = &cma_areas[cma_area_count];
+	struct cma *cma;
 	int ret = 0;

 	pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
@@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
	 * Each reserved area must be initialised later, when more kernel
	 * subsystems (like slab allocator) are available.
	 */
+	cma = &cma_areas[cma_area_count];
 	cma->base_pfn = PFN_DOWN(base);
 	cma->count = size >> PAGE_SHIFT;
 	cma->order_per_bit = order_per_bit;
-- 
1.7.9.5
[PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Currently, there are two users of the CMA functionality: one is the DMA
subsystem and the other is kvm on powerpc. They have their own code to
manage the CMA reserved area even though it looks really similar.
My guess is that this is caused by differing needs for bitmap
management. The kvm side wants to maintain the bitmap not with one bit
per page, but at a coarser granularity; eventually it uses a bitmap
where one bit represents 64 pages.

When I implement CMA related patches, I have to change both places to
apply my changes, and that is painful. I want to change this situation
and reduce future code management overhead through this patch. This
change could also help developers who want to use CMA in their new
feature development, since they can use CMA easily without copying &
pasting this reserved area management code.

In previous patches, we have prepared some features to generalize CMA
reserved area management and now it's time to do it. This patch moves
the core functions to mm/cma.c and changes the DMA APIs to use these
functions. There is no functional change in the DMA APIs.

v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

Acked-by: Michal Nazarewicz
Signed-off-by: Joonsoo Kim

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 00e13ce..4eac559 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -283,16 +283,6 @@ config CMA_ALIGNMENT

	  If unsure, leave the default value "8".

-config CMA_AREAS
-	int "Maximum count of the CMA device-private areas"
-	default 7
-	help
-	  CMA allows to create CMA areas for particular devices. This parameter
-	  sets the maximum number of such device private CMA areas in the
-	  system.
-
-	  If unsure, leave the default value "7".
-
 endif

 endmenu

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 9bc9340..f177f73 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -24,25 +24,10 @@
 #include
 #include
-#include
-#include
-#include
 #include
-#include
-#include
-#include
 #include
 #include
-
-struct cma {
-	unsigned long	base_pfn;
-	unsigned long	count;
-	unsigned long	*bitmap;
-	int order_per_bit;	/* Order of pages represented by one bit */
-	struct mutex	lock;
-};
-
-struct cma *dma_contiguous_default_area;
+#include

 #ifdef CONFIG_CMA_SIZE_MBYTES
 #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
@@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area;
 #define CMA_SIZE_MBYTES 0
 #endif

+struct cma *dma_contiguous_default_area;
+
 /*
 * Default global CMA area size can be defined in kernel's .config.
 * This is useful mainly for distro maintainers to create a kernel
@@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 	}
 }

-static DEFINE_MUTEX(cma_mutex);
-
-static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
-{
-	return (1 << (align_order >> cma->order_per_bit)) - 1;
-}
-
-static unsigned long cma_bitmap_maxno(struct cma *cma)
-{
-	return cma->count >> cma->order_per_bit;
-}
-
-static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
-						unsigned long pages)
-{
-	return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
-}
-
-static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-	unsigned long bitmapno, nr_bits;
-
-	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
-	nr_bits = cma_bitmap_pages_to_bits(cma, count);
-
-	mutex_lock(&cma->lock);
-	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
-	mutex_unlock(&cma->lock);
-}
-
-static int __init cma_activate_area(struct cma *cma)
-{
-	int bitmap_maxno = cma_bitmap_maxno(cma);
-	int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
-	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-	unsigned i =
cma->count >> pageblock_order; - struct zone *zone; - - pr_debug("%s()\n", __func__); - - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); - if (!cma->bitmap) - return -ENOMEM; - - WARN_ON_ONCE(!pfn_valid(pfn)); - zone = page_zone(pfn_to_page(pfn)); - - do { - unsigned j; - base_pfn = pfn; - for (j = pageblock_nr_pages; j; --j, pfn++) { - WARN_ON_ONCE(!pfn_valid(pfn)); - /* -* alloc_contig_range requires the pfn range -* specified to be in the same zone. Make this -* simple by forcing the entire CMA resv range -* to be in the same zone. -*/ - if (page_zone(pfn_to_page(pfn)) != zone) - goto err; - } - init_cma_reserved_pageblock(pfn_to_page(base_pfn)); - } while (--i); - -
[PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Now, we have a general CMA reserved area management framework, so use
it for future maintainability. There is no functional change.

Acked-by: Michal Nazarewicz
Acked-by: Paolo Bonzini
Signed-off-by: Joonsoo Kim

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7cde8a6..28ec226 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -16,12 +16,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include

-#include "book3s_hv_cma.h"
+#define KVM_CMA_CHUNK_ORDER	18
+
 /*
 * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
 * should be power of 2.
@@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
 unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT;	/* 128MB */
 EXPORT_SYMBOL_GPL(kvm_rma_pages);

+static struct cma *kvm_cma;
+
 /* Work out RMLS (real mode limit selector) field value for a given RMA size.
   Assumes POWER7 or PPC970. */
 static inline int lpcr_rmls(unsigned long rma_size)
@@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
+	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
 void kvm_release_rma(struct kvm_rma_info *ri)
 {
 	if (atomic_dec_and_test(&ri->use_count)) {
-		kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
+		cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
 		kfree(ri);
 	}
 }
@@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return kvm_alloc_cma(nr_pages, align_pages);
+	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

 void kvm_release_hpt(struct page *page, unsigned long nr_pages)
 {
-	kvm_release_cma(page, nr_pages);
+	cma_release(kvm_cma, page, nr_pages);
 }
 EXPORT_SYMBOL_GPL(kvm_release_hpt);

@@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
 			align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;

 		align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-		kvm_cma_declare_contiguous(selected_size, align_size);
+		cma_declare_contiguous(selected_size, 0, 0, align_size,
+			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
 	}
 }

diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
deleted file mode 100644
index d9d3d85..000
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Contiguous Memory Allocator for ppc KVM hash pagetable based on CMA
- * for DMA mapping framework
- *
- * Copyright IBM Corporation, 2013
- * Author Aneesh Kumar K.V
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License or (at your optional) any later version of the license.
- *
- */
-#define pr_fmt(fmt) "kvm_cma: " fmt
-
-#ifdef CONFIG_CMA_DEBUG
-#ifndef DEBUG
-# define DEBUG
-#endif
-#endif
-
-#include
-#include
-#include
-#include
-
-#include "book3s_hv_cma.h"
-
-struct kvm_cma {
-	unsigned long base_pfn;
-	unsigned long count;
-	unsigned long *bitmap;
-};
-
-static DEFINE_MUTEX(kvm_cma_mutex);
-static struct kvm_cma kvm_cma_area;
-
-/**
- * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
- * for kvm hash pagetable
- * @size: Size of the reserved memory.
- * @alignment: Alignment for the contiguous memory area
- *
- * This function reserves memory for kvm cma area. It should be
- * called by arch code when early allocator (memblock or bootmem)
- * is still activate.
- */
-long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t alignment)
-{
-	long base_pfn;
-	phys_addr_t addr;
-	struct kvm_cma *cma = &kvm_cma_area;
-
-	pr_debug("%s(size %lx)\n", __func__, (unsigned long)size);
-
-	if (!size)
-		return -EINVAL;
-	/*
-	 * Sanitise input arguments.
-	 * We should be pageblock aligned for CMA.
-	 */
-	alignment = max(alignment, (phys_addr_t)(PAGE_SIZE << pageblock_order));
-	size = ALIGN(size, alignment);
-	/*
-	 * Reserve memory
-	 * Use __memblock_alloc_base() since
-	 * memblock_alloc_base() panic()s.
-	 */
-	addr = __memblock_alloc_base(size, alignment, 0);
-	if (!addr) {
-
[PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity
ppc kvm's cma region management requires arbitrary bitmap granularity,
since it wants to reserve very large memory and manage this region with
a bitmap where one bit stands for several pages, to reduce management
overhead. So support arbitrary bitmap granularity for the following
generalization.

Signed-off-by: Joonsoo Kim

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index bc4c171..9bc9340 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -38,6 +38,7 @@ struct cma {
 	unsigned long	base_pfn;
 	unsigned long	count;
 	unsigned long	*bitmap;
+	int order_per_bit;	/* Order of pages represented by one bit */
 	struct mutex	lock;
 };

@@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit)

 static DEFINE_MUTEX(cma_mutex);

+static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
+{
+	return (1 << (align_order >> cma->order_per_bit)) - 1;
+}
+
+static unsigned long cma_bitmap_maxno(struct cma *cma)
+{
+	return cma->count >> cma->order_per_bit;
+}
+
+static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
+						unsigned long pages)
+{
+	return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
+}
+
+static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
+{
+	unsigned long bitmapno, nr_bits;
+
+	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
+	nr_bits = cma_bitmap_pages_to_bits(cma, count);
+
+	mutex_lock(&cma->lock);
+	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
+	mutex_unlock(&cma->lock);
+}
+
 static int __init cma_activate_area(struct cma *cma)
 {
-	int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
+	int bitmap_maxno = cma_bitmap_maxno(cma);
+	int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
 	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
 	unsigned i = cma->count >> pageblock_order;
 	struct zone *zone;
@@ -221,6 +251,7 @@ core_initcall(cma_init_reserved_areas);
 * @base: Base address of the reserved area optional, use 0 for any
 *
@limit: End address of the reserved memory (optional, 0 for any). * @alignment: Alignment for the contiguous memory area, should be power of 2 + * @order_per_bit: Order of pages represented by one bit on bitmap. * @res_cma: Pointer to store the created cma region. * @fixed: hint about where to place the reserved area * @@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas); */ static int __init __dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, - phys_addr_t alignment, + phys_addr_t alignment, int order_per_bit, struct cma **res_cma, bool fixed) { struct cma *cma = &cma_areas[cma_area_count]; @@ -269,6 +300,8 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, base = ALIGN(base, alignment); size = ALIGN(size, alignment); limit &= ~(alignment - 1); + /* size should be aligned with order_per_bit */ + BUG_ON(!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit)); /* Reserve memory */ if (base && fixed) { @@ -294,6 +327,7 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, */ cma->base_pfn = PFN_DOWN(base); cma->count = size >> PAGE_SHIFT; + cma->order_per_bit = order_per_bit; *res_cma = cma; cma_area_count++; @@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = __dma_contiguous_reserve_area(size, base, limit, 0, + ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0, res_cma, fixed); if (ret) return ret; @@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, return 0; } -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) -{ - mutex_lock(&cma->lock); - bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count); - mutex_unlock(&cma->lock); -} - /** * dma_alloc_from_contiguous() - allocate pages from contiguous area * @dev: Pointer to device for which the allocation is performed. 
@@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, unsigned int align) { - unsigned long mask, pfn, pageno, start = 0; + unsigned long mask, pfn, start = 0; + unsigned long bitmap_maxno, bitmapno, nr_bits; struct page *page = NULL; int ret; @@ -358,18 +386,19 @@ static struct page *__dma_alloc_from_contiguous(struct cma *cma,
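For readers new to the granularity scheme, the pages-to-bits arithmetic in the patch can be sketched in plain userspace C. This is a simplified model: `align_up` stands in for the kernel's ALIGN() macro and the struct is trimmed to the relevant fields.

```c
#include <assert.h>

/* Trimmed-down model of struct cma: one bitmap bit covers 2^order_per_bit pages. */
struct cma {
    unsigned long base_pfn;
    unsigned long count;          /* size of the area, in pages */
    int order_per_bit;
};

/* Userspace stand-in for the kernel's ALIGN() macro (a must be a power of 2). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Number of bits needed to cover the whole area. */
static unsigned long cma_bitmap_maxno(struct cma *cma)
{
    return cma->count >> cma->order_per_bit;
}

/* Round a page count up to whole bits, then convert pages to bits. */
static unsigned long cma_bitmap_pages_to_bits(struct cma *cma, unsigned long pages)
{
    return align_up(pages, 1UL << cma->order_per_bit) >> cma->order_per_bit;
}
```

With order_per_bit = 2, a 256-page area needs a 64-bit bitmap, and a 5-page request occupies two bits (5 pages round up to 8).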
[PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
ppc kvm's cma area management needs an alignment constraint on the cma region. So support it to prepare for generalization of the cma area management functionality. Additionally, add some comments which explain why an alignment constraint is needed on the cma region. Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 8a44c82..bc4c171 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -32,6 +32,7 @@ #include #include #include +#include <linux/log2.h> struct cma { unsigned long base_pfn; @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); * @size: Size of the reserved area (in bytes), * @base: Base address of the reserved area optional, use 0 for any * @limit: End address of the reserved memory (optional, 0 for any). + * @alignment: Alignment for the contiguous memory area, should be power of 2 * @res_cma: Pointer to store the created cma region. * @fixed: hint about where to place the reserved area * @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); */ static int __init __dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, + phys_addr_t alignment, struct cma **res_cma, bool fixed) { struct cma *cma = &cma_areas[cma_area_count]; - phys_addr_t alignment; int ret = 0; - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, -(unsigned long)size, (unsigned long)base, -(unsigned long)limit); + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", + __func__, (unsigned long)size, (unsigned long)base, + (unsigned long)limit, (unsigned long)alignment); /* Sanity checks */ if (cma_area_count == ARRAY_SIZE(cma_areas)) { @@ -253,8 +255,17 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, if (!size) return -EINVAL; - /* Sanitise input arguments */ - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); + if (alignment && !is_power_of_2(alignment)) + return -EINVAL; + + /* +* Sanitise input arguments. 
+* CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, +* CMA area could be merged into other MIGRATE_TYPE by buddy mechanism +* and CMA property will be broken. +*/ + alignment = max(alignment, + (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); base = ALIGN(base, alignment); size = ALIGN(size, alignment); limit &= ~(alignment - 1); @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); + ret = __dma_contiguous_reserve_area(size, base, limit, 0, + res_cma, fixed); if (ret) return ret; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
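The alignment sanitisation this patch introduces can be sketched as follows. This is a hedged model: the page size and order constants are illustrative values chosen for the sketch, not taken from the patch, and the kernel returns -EINVAL where the sketch returns 0.

```c
#include <assert.h>

/* Illustrative constants (assumptions, not the kernel's values on every arch):
 * 4 KiB pages, MAX_ORDER = 11, pageblock_order = 10. */
#define PAGE_SIZE_BYTES 4096UL
#define MAX_ORDER 11
#define PAGEBLOCK_ORDER 10

static int is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Mirror of the patch's logic: reject non-power-of-2 requests, then raise
 * the caller-supplied alignment to at least the MAX_ORDER-1 / pageblock
 * floor so the CMA area cannot be merged into other migrate types. */
static unsigned long sanitise_alignment(unsigned long alignment)
{
    int order = MAX_ORDER - 1 > PAGEBLOCK_ORDER ? MAX_ORDER - 1 : PAGEBLOCK_ORDER;
    unsigned long floor = PAGE_SIZE_BYTES << order;

    if (alignment && !is_power_of_2(alignment))
        return 0; /* the kernel returns -EINVAL here */
    return alignment > floor ? alignment : floor;
}
```

With these constants the floor is 4 MiB: a request of 0 gets the floor, a 16 MiB request is kept, and a non-power-of-2 request is rejected.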
[PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
To prepare future generalization work on cma area management code, we need to separate core cma management codes from DMA APIs. We will extend these core functions to cover requirements of ppc kvm's cma area management functionality in following patches. This separation helps us not to touch DMA APIs while extending core functions. Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index fb0cdce..8a44c82 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); * If @fixed is true, reserve contiguous area at exactly @base. If false, * reserve in range from @base to @limit. */ -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, - phys_addr_t limit, struct cma **res_cma, - bool fixed) +static int __init __dma_contiguous_reserve_area(phys_addr_t size, + phys_addr_t base, phys_addr_t limit, + struct cma **res_cma, bool fixed) { struct cma *cma = _areas[cma_area_count]; phys_addr_t alignment; @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, pr_info("%s(): reserved %ld MiB at %08lx\n", __func__, (unsigned long)size / SZ_1M, (unsigned long)base); - - /* Architecture specific contiguous memory fixup. */ - dma_contiguous_early_fixup(base, size); return 0; + err: pr_err("%s(): failed to reserve %ld MiB\n", __func__, (unsigned long)size / SZ_1M); return ret; } +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, + phys_addr_t limit, struct cma **res_cma, + bool fixed) +{ + int ret; + + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); + if (ret) + return ret; + + /* Architecture specific contiguous memory fixup. 
*/ + dma_contiguous_early_fixup(base, size); + + return 0; +} + static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) { mutex_lock(>lock); @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) * global one. Requires architecture specific dev_get_cma_area() helper * function. */ -struct page *dma_alloc_from_contiguous(struct device *dev, int count, +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, unsigned int align) { unsigned long mask, pfn, pageno, start = 0; - struct cma *cma = dev_get_cma_area(dev); struct page *page = NULL; int ret; if (!cma || !cma->count) return NULL; - if (align > CONFIG_CMA_ALIGNMENT) - align = CONFIG_CMA_ALIGNMENT; - pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, count, align); @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count, return page; } +struct page *dma_alloc_from_contiguous(struct device *dev, int count, + unsigned int align) +{ + struct cma *cma = dev_get_cma_area(dev); + + if (align > CONFIG_CMA_ALIGNMENT) + align = CONFIG_CMA_ALIGNMENT; + + return __dma_alloc_from_contiguous(cma, count, align); +} + /** * dma_release_from_contiguous() - release allocated pages * @dev: Pointer to device for which the pages were allocated. @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count, * It returns false when provided pages do not belong to contiguous area and * true otherwise. 
*/ -bool dma_release_from_contiguous(struct device *dev, struct page *pages, +static bool __dma_release_from_contiguous(struct cma *cma, struct page *pages, int count) { - struct cma *cma = dev_get_cma_area(dev); unsigned long pfn; if (!cma || !pages) @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages, return true; } + +bool dma_release_from_contiguous(struct device *dev, struct page *pages, +int count) +{ + struct cma *cma = dev_get_cma_area(dev); + + return __dma_release_from_contiguous(cma, pages, count); +} -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
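The shape of this split — a thin public wrapper that keeps the DMA-specific policy while the double-underscore core becomes reusable — can be sketched like this. Names, return values, and the CMA_ALIGNMENT value are illustrative, not the kernel's.

```c
#include <assert.h>

#define CMA_ALIGNMENT 8  /* stand-in for CONFIG_CMA_ALIGNMENT */

/* Core function: knows nothing about devices or DMA policy.
 * Returns the align it was given so the test can observe clamping. */
static int core_alloc(int count, unsigned int align)
{
    (void)count;
    return (int)align;
}

/* Public wrapper: the DMA-specific clamp stays here, exactly as the
 * patch keeps the CONFIG_CMA_ALIGNMENT clamp in
 * dma_alloc_from_contiguous() and out of the core function. */
static int dma_alloc(int count, unsigned int align)
{
    if (align > CMA_ALIGNMENT)
        align = CMA_ALIGNMENT;
    return core_alloc(count, align);
}
```

This is why the later generalization patches can extend the core functions without touching the DMA API surface.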
[PATCH v2 02/10] DMA, CMA: fix possible memory leak
We should free the memory for the bitmap when we find a zone mismatch, otherwise this memory will leak. Additionally, I copy the code comment from ppc kvm's cma code to explain why we need to check for a zone mismatch. Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index bd0bb81..fb0cdce 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma) base_pfn = pfn; for (j = pageblock_nr_pages; j; --j, pfn++) { WARN_ON_ONCE(!pfn_valid(pfn)); + /* +* alloc_contig_range requires the pfn range +* specified to be in the same zone. Make this +* simple by forcing the entire CMA resv range +* to be in the same zone. +*/ if (page_zone(pfn_to_page(pfn)) != zone) - return -EINVAL; + goto err; } init_cma_reserved_pageblock(pfn_to_page(base_pfn)); } while (--i); mutex_init(&cma->lock); return 0; + +err: + kfree(cma->bitmap); + return -EINVAL; } static struct cma cma_areas[MAX_CMA_AREAS]; -- 1.7.9.5
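The effect of switching from a direct `return -EINVAL` to a common error label can be modelled in userspace with a crude allocation counter. Names and the failure trigger are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* crude leak detector for this sketch */

static void *xalloc(size_t n) { live_allocs++; return malloc(n); }
static void xfree(void *p)    { live_allocs--; free(p); }

/* Model of the fixed cma_activate_area(): on a mid-loop validation
 * failure (the stand-in for page_zone() != zone), jump to one error
 * label that frees the bitmap, instead of returning directly and
 * leaking it. */
static int activate_area(int nblocks, int bad_block)
{
    void *bitmap = xalloc(16);
    if (!bitmap)
        return -12; /* -ENOMEM */

    for (int i = 0; i < nblocks; i++) {
        if (i == bad_block)   /* stand-in for the zone-mismatch check */
            goto err;
    }
    return 0;   /* success: the area keeps its bitmap, as in the kernel */
err:
    xfree(bitmap);
    return -22; /* -EINVAL */
}
```

Before the fix, the `goto err` path was a bare `return -EINVAL`, so `live_allocs` would stay at 1 after a zone mismatch — exactly the leak the patch closes.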
[PATCH v2 01/10] DMA, CMA: clean-up log message
We don't need explicit 'CMA:' prefix, since we already define prefix 'cma:' in pr_fmt. So remove it. And, some logs print function name and others doesn't. This looks bad to me, so I unify log format to print function name consistently. Lastly, I add one more debug log on cma_activate_area(). Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 83969f8..bd0bb81 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) } if (selected_size && !dma_contiguous_default_area) { - pr_debug("%s: reserving %ld MiB for global area\n", __func__, + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, (unsigned long)selected_size / SZ_1M); dma_contiguous_reserve_area(selected_size, selected_base, @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) unsigned i = cma->count >> pageblock_order; struct zone *zone; - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); + pr_debug("%s()\n", __func__); + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); if (!cma->bitmap) return -ENOMEM; @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, /* Sanity checks */ if (cma_area_count == ARRAY_SIZE(cma_areas)) { - pr_err("Not enough slots for CMA reserved regions!\n"); + pr_err("%s(): Not enough slots for CMA reserved regions!\n", + __func__); return -ENOSPC; } @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, *res_cma = cma; cma_area_count++; - pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M, - (unsigned long)base); + pr_info("%s(): reserved %ld MiB at %08lx\n", + __func__, (unsigned long)size / SZ_1M, (unsigned long)base); /* Architecture specific contiguous memory fixup. 
*/ dma_contiguous_early_fixup(base, size); return 0; err: - pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M); + pr_err("%s(): failed to reserve %ld MiB\n", + __func__, (unsigned long)size / SZ_1M); return ret; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/7 v7] trace, RAS: Add eMCA trace event interface
On Wed, Jun 11, 2014 at 09:02:15PM +0200, Borislav Petkov wrote: > > +EXPORT_SYMBOL_GPL(cper_mem_err_pack); > > Why do we export this one and the one below? What .config warrants this? > > CONFIG_ACPI_EXTLOG=m doesn't need them, AFAICT. > Right. acpi_extlog doesn't use it. They can be exported later until needed. > > + TP_STRUCT__entry( > > + __field(u32, err_seq) > > + __field(u8, etype) > > + __field(u8, sev) > > + __field(u64, pa) > > + __field(u8, pa_mask_lsb) > > + __array(u8, fru_id, 40) > > How did you come up with this magic number? Why isn't that sizeof(uuid_le)? Cause I want to convert it into a string. > > + snprintf(__entry->fru_id, 39, "%pUl", fru_id); > > Yeah, I didn't catch the reasoning behind why we need to convert the FRU > into a string and not leave it simply as u8[16]... Fair enough. It can be compressed a little bit more. signature.asc Description: Digital signature
Re: [PULL] modules-next
Mark Brown writes: > On Wed, Jun 11, 2014 at 03:03:47PM +0930, Rusty Russell wrote: > >> drivers/regulator/virtual: avoid world-writable sysfs files. > > Acked-by: Mark Brown > > if you need to respin - please do send patches to maintainers. If the address in drivers/regulator/virtual.c is incorrect, please update it: Subject: [PATCH 5/9] drivers/regulator/virtual: avoid world-writable sysfs files. To: linux-kernel@vger.kernel.org Cc: Rusty Russell , Mark Brown Date: Tue, 22 Apr 2014 13:03:28 +0930 In line with practice for module parameters, we're adding a build-time check that sysfs files aren't world-writable. Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PULL] virtio-next
The following changes since commit ec6931b281797b69e6cf109f9cc94d5a2bf994e0: word-at-a-time: avoid undefined behaviour in zero_bytemask macro (2014-04-27 15:20:05 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git tags/virtio-next-for-linus for you to fetch changes up to c77fba9ab058d1e96ed51d4215e56905c9ef8d2a: virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock. (2014-05-21 11:25:41 +0930) Main excitement is a virtio_scsi fix for alloc holding spinlock on the abort path, which I refuse to CC stable since (1) I discovered it myself, and (2) it's been there forever with no reports. Cheers, Rusty. Amos Kong (1): virtio-rng: support multiple virtio-rng devices Heinz Graalfs (1): virtio_ccw: introduce device_lost in virtio_ccw_device Rusty Russell (2): virtio: virtio_break_device() to mark all virtqueues broken. virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock. Sasha Levin (2): virtio-rng: fix boot with virtio-rng device virtio-rng: fixes for device registration/unregistration drivers/char/hw_random/virtio-rng.c | 105 +++- drivers/s390/kvm/virtio_ccw.c | 49 - drivers/scsi/virtio_scsi.c | 15 +++--- drivers/virtio/virtio_ring.c| 15 ++ include/linux/virtio.h | 2 + 5 files changed, 127 insertions(+), 59 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: console: lockup on boot
On 06/11/2014 05:31 PM, Jan Kara wrote: > On Wed 11-06-14 22:34:36, Jan Kara wrote: >> > On Wed 11-06-14 10:55:55, Sasha Levin wrote: >>> > > On 06/10/2014 11:59 AM, Peter Hurley wrote: > > > On 06/06/2014 03:05 PM, Sasha Levin wrote: > > > >> On 05/30/2014 10:07 AM, Jan Kara wrote: >> > > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote: > > > On 05/30/2014 09:11 AM, Sasha Levin wrote: >> > > >>> Hi all, >> > > >>> >> > > >>> I sometime see lockups when booting my KVM guest with >> > > >>> the latest -next kernel, >> > > >>> it basically hangs right when it should start 'init', >> > > >>> and after a while I get >> > > >>> the following spew: >> > > >>> >> > > >>> [ 30.790833] BUG: spinlock lockup suspected on CPU#1, >> > > >>> swapper/1/0 > > > > > > Maybe related to this report: > > > https://lkml.org/lkml/2014/5/30/26 > > > from Jet Chen which was bisected to > > > > > > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0 > > > Author: Jan Kara > > > AuthorDate: Thu May 22 10:43:35 2014 +1000 > > > Commit: Stephen Rothwell > > > CommitDate: Thu May 22 10:43:35 2014 +1000 > > > > > > printk: enable interrupts before calling > > > console_trylock_for_printk() > > > We need interrupts disabled when calling > > > console_trylock_for_printk() only > > > so that cpu id we pass to can_use_console() remains > > > valid (for other > > > things console_sem provides all the exclusion we need > > > and deadlocks on > > > console_sem due to interrupts are impossible because we > > > use > > > down_trylock()). However if we are rescheduled, we are > > > guaranteed to run > > > on an online cpu so we can easily just get the cpu id in > > > can_use_console(). 
> > > We can lose a bit of performance when we enable > > > interrupts in > > > vprintk_emit() and then disable them again in > > > console_unlock() but OTOH it > > > can somewhat reduce interrupt latency caused by > > > console_unlock() > > > especially since later in the patch series we will want > > > to spin on > > > console_sem in console_trylock_for_printk(). > > > Signed-off-by: Jan Kara > > > Signed-off-by: Andrew Morton > > > > > > ? >> > > >>>Yeah, very likely. I think I see the problem, I'll send the >> > > >>> fix shortly. > > > >> > > > >> Hi Jan, > > > >> > > > >> It seems that the issue I'm seeing is different from the "[prink] > > > >> BUG: spinlock > > > >> lockup suspected on CPU#0, swapper/1". > > > >> > > > >> Is there anything else I could try here? The issue is very common > > > >> during testing. > > > > > > Sasha, > > > > > > Is this bisectable? Maybe that's the best way forward here. >>> > > >>> > > I've ran a bisection again and ended up at the same commit as Jet Chen >>> > > (the commit unfortunately already made it to Linus's tree). >>> > > >>> > > Note that I did try Jan's proposed fix and that didn't solve the issue >>> > > for me, I believe we're seeing different issues caused by the same >>> > > commit. >> > Sorry it has been busy time lately and I didn't have as much time to look >> > into this as would be needed. > Oops, pressed send too early... So I have two debug patches for you. Can > you try whether the problem reproduces with the first one or with both of > them applied? The first patch fixed it (I assumed that there's no need to try the second). Thanks, Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: timekeeping: exiting task with held timekeeping locks
On 06/11/2014 07:30 PM, John Stultz wrote: > On Wed, Jun 11, 2014 at 4:04 PM, Sasha Levin wrote: >> > Hi all, >> > >> > While fuzzing with trinity inside a KVM tools guest running the latest >> > -next >> > kernel I've stumbled on the following spew: >> > >> > [ 3460.136058] = >> > [ 3460.138017] [ BUG: trinity-c70/27193 still has locks held! ] >> > [ 3460.141491] 3.15.0-next-20140611-sasha-00022-g9466d2f-dirty #638 Not >> > tainted >> > [ 3460.143219] - >> > [ 3460.167979] 2 locks held by trinity-c70/27193: >> > [ 3460.169172] #0: (tick_broadcast_lock){-.-.-.}, at: >> > tick_handle_periodic_broadcast (kernel/time/tick-broadcast.c:301) >> > [ 3460.468004] #1: (timekeeper_lock){-.-.-.}, at: update_wall_time >> > (kernel/time/timekeeping.c:1371) >> > [ 3460.920025] >> > [ 3460.920025] stack backtrace: >> > [ 3460.928146] CPU: 0 PID: 27193 Comm: trinity-c70 Not tainted >> > 3.15.0-next-20140611-sasha-00022-g9466d2f-dirty #638 >> > [ 3460.928648] can: request_module (can-proto-3) failed. >> > [ 3460.943111] 8800576ef4c8 8800576efc88 a551093c >> > 0001 >> > [ 3460.962511] 880056f9b000 8800576efca8 a21c6a43 >> > 880056f9bbe8 >> > [ 3461.007184] 880056f9bbe8 8800576efd48 >> > [ 3461.017661] can: request_module (can-proto-0) failed. >> > [ 3461.045536] a21636ea 8800576efcc8 >> > [ 3461.170992] Call Trace: >> > [ 3461.174122] dump_stack (lib/dump_stack.c:52) >> > [ 3461.558864] debug_check_no_locks_held (kernel/locking/lockdep.c:4107 >> > kernel/locking/lockdep.c:4113) >> > [ 3461.577066] do_exit (kernel/exit.c:796) >> > [ 3461.592523] ? debug_smp_processor_id (lib/smp_processor_id.c:57) >> > [ 3461.629067] ? _raw_spin_unlock_irq >> > (./arch/x86/include/asm/paravirt.h:819 >> > include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199) >> > [ 3461.671525] do_group_exit (kernel/exit.c:884) >> > [ 3461.717091] get_signal_to_deliver (kernel/signal.c:2347) >> > [ 3461.724142] ? 
vtime_account_user (kernel/sched/cputime.c:687) >> > [ 3461.800505] do_signal (arch/x86/kernel/signal.c:698) >> > [ 3461.808792] ? vtime_account_user (kernel/sched/cputime.c:687) >> > [ 3461.812780] ? preempt_count_sub (kernel/sched/core.c:2602) >> > [ 3461.824601] ? context_tracking_user_exit >> > (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) >> > kernel/context_tracking.c:182 (discriminator 2)) >> > [ 3461.827619] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) >> > [ 3461.831486] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 >> > kernel/locking/lockdep.c:2599) >> > [ 3461.841516] do_notify_resume (arch/x86/kernel/signal.c:751) >> > [ 3461.847056] retint_signal (arch/x86/kernel/entry_64.S:921) > > Huh.. Got me.. I don't see how the task can get out of > update_wall_time() w/o releasing the timekeeper_lock. Same with the > tick_broadcast_lock. Does this happen all the time or was this a > one-off? One-off, only seen once. Thanks, Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: Bad page state in process swapper pfn:00000
On 6/11/2014 12:19 PM, Geert Uytterhoeven wrote: > Hi Laura, > > On Wed, Jun 11, 2014 at 7:32 PM, Laura Abbott wrote: >> On 6/11/2014 4:40 AM, Geert Uytterhoeven wrote: >>> With current mainline, I get an early crash on r8a7791/koelsch: >>> >>> BUG: Bad page state in process swapper pfn:0 >>> page:ee20b000 count:0 mapcount:0 mapping:66756200 index:0x65726566 >>> page flags: >>> 0x74656b63(locked|error|lru|active|owner_priv_1|arch_1|private|writeback|head|swapcache >>> |reclaim|mlocked) >>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >>> bad because of flags: >>> page flags: 0x212861(locked|lru|active|private|writeback|swapcache|mlocked) >>> >>> I bisected it to >>> >>> commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76 >>> Author: Laura Abbott >>> Date: Sun Apr 13 22:54:58 2014 +0100 >>> >>> ARM: 8025/1: Get rid of meminfo > >>> -Truncating RAM at 4000-bfff to -6f7f (vmalloc region overlap). >>> +Truncating RAM at 0x-0xc000 to -0x6f80 >> >> I'm guessing this is the issue right there. >> >> memory@4000 { >> device_type = "memory"; >> reg = <0 0x4000 0 0x4000>; >> }; >> >> memory@2 { >> device_type = "memory"; >> reg = <2 0x 0 0x4000>; >> }; >> >> Those are the memory nodes from r8a7791-koelsch.dts. It looks like the memory >> outside 32-bit address range is not being dropped. It was suggested to drop >> early_init_dt_add_memory_arch which called arm_add_memory and just use the >> generic of code directly but the problem is arm_add_memory does additional >> bounds checking. It looks like early_init_dt_add_memory_arch in >> drivers/of/fdt.c checks for overflow on u64 types but not for overflow >> on phys_addr_t (32 bits) which is what memblock_add actually uses. 
>> >> For a quick test, can you try bringing back early_init_dt_add_memory_arch >> and see if that fixes the problem: >> >> diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c >> index e94a157..ea9ce92 100644 >> --- a/arch/arm/kernel/devtree.c >> +++ b/arch/arm/kernel/devtree.c >> @@ -27,6 +27,10 @@ >> #include >> #include >> >> +void __init early_init_dt_add_memory_arch(u64 base, u64 size) >> +{ >> + arm_add_memory(base, size); >> +} >> >> #ifdef CONFIG_SMP >> extern struct of_cpu_method __cpu_method_of_table[]; > > Thanks, my board boots again after applying this quick hack. > Great! Russell are you okay with taking the above as a fix or would you prefer I fixup drivers/of/fdt.c right now? Thanks, Laura 8< >From 14bda557a108ad197e7c5f040f50ca024b45cc17 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Wed, 11 Jun 2014 19:39:29 -0700 Subject: [PATCH] arm: Bring back early_init_dt_add_memory_arch Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) removed early_init_dt_add_memory_arch in favor of using the common method. The common method does not currently check for memory outside of 32-bit bounds which may lead to memory being incorrectly added to the system. Bring back early_init_dt_add_memory_arch for now until the generic function can be fixed up. Reported-by: Geert Uytterhoeven Signed-off-by: Laura Abbott --- arch/arm/kernel/devtree.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c index e94a157..ea9ce92 100644 --- a/arch/arm/kernel/devtree.c +++ b/arch/arm/kernel/devtree.c @@ -27,6 +27,10 @@ #include #include +void __init early_init_dt_add_memory_arch(u64 base, u64 size) +{ + arm_add_memory(base, size); +} #ifdef CONFIG_SMP extern struct of_cpu_method __cpu_method_of_table[]; -- The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, hosted by The Linux Foundation
[PATCH v2] sctp: Fix sk_ack_backlog wrap-around problem
Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman Signed-off-by: Xufeng Zhang --- Change for v2: Drop the redundant test for temp suggested by Vlad Yasevich. net/sctp/associola.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 39579c3..0b8 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -330,7 +330,7 @@ void sctp_association_free(struct sctp_association *asoc) /* Only real associations count against the endpoint, so * don't bother for if this is a temporary association. */ - if (!asoc->temp) { + if (!list_empty(>asocs)) { list_del(>asocs); /* Decrement the backlog value for a TCP-style listening -- 1.7.0.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
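The wrap-around itself is plain unsigned arithmetic: sk_ack_backlog was a 16-bit unsigned value at the time, so decrementing it from its initial 0 yields 65535, which SCTP then compares against the accept-queue limit.

```c
#include <assert.h>

/* sk_ack_backlog modelled as an unsigned short: decrementing it for an
 * association that was never counted (e.g. one freed before being added
 * to the endpoint's list) wraps 0 around to 65535, making the accept
 * queue look permanently full. */
static unsigned short backlog_dec(unsigned short backlog)
{
    return backlog - 1;
}
```

The fix avoids the bogus decrement by only adjusting the backlog for associations actually on the endpoint's list.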
Re: [RFC PATCH] nouveau: rename aux.c to auxiliary.c for reviewing it on Windows
On 06/11/2014 04:24 PM, Borislav Petkov wrote: > On Wed, Jun 11, 2014 at 03:53:55PM +0800, Lai Jiangshan wrote: >> When I tried to review the linux kernel on Windows in my laptop >> and incidentally found that it failed to open the aux.c. >> >> And Microsoft tells me: >> (http://msdn.microsoft.com/en-us/library/aa365247.aspx) >> >>> Do not use the following reserved names for the name of a file: >>> CON, PRN, AUX, NUL, , and LPT9. Also avoid these names >>> followed immediately by an extension; for example, NUL.txt >>> is not recommended. >> >> The name "aux" is listed above. And it sometimes makes sense to >> review linux on windows, so we rename the aux.c to auxiliary.c. > > I think you missed April 1st by more than 2 months. > EVERY DAY IS APRIL FIRST if you fool others' convenience. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Proposal to realize hot-add *several sections one time*
On 2014/6/12 6:08, David Rientjes wrote: > On Wed, 11 Jun 2014, Zhang Zhen wrote: > >> Hi, >> >> Now we can hot-add memory by >> >> % echo start_address_of_new_memory > /sys/devices/system/memory/probe >> >> Then, [start_address_of_new_memory, start_address_of_new_memory + >> memory_block_size] memory range is hot-added. >> >> But we can only hot-add *one section one time* by this way. >> Whether we can add an argument on behalf of the count of the sections to add >> ? >> So we can can hot-add *several sections one time*. Just like: >> > > Not necessarily true, it depends on sections_per_block. Don't believe > Documentation/memory-hotplug.txt that suggests this is only for powerpc, > x86 and sh allow this interface as well. > >> % echo start_address_of_new_memory count_of_sections > >> /sys/devices/system/memory/probe >> >> Then, [start_address_of_new_memory, start_address_of_new_memory + >> count_of_sections * memory_block_size] memory range is hot-added. >> >> If this proposal is reasonable, i will send a patch to realize it. >> > > The problem is knowing how much memory is being onlined so that you can > definitively determine what count_of_sections should be. The number of > pages per memory section depends on PAGE_SIZE and SECTION_SIZE_BITS which > differ depending on the architectures that support this interface. So if > you support count_of_sections, it would return errno even though you have > onlined some sections. > Hum, sorry. My expression is not right. The count of sections one time hot-added depends on sections_per_block. Now we are porting the memory-hotplug to arm. But we can only hot-add *fixed number of sections one time* on particular architecture. Whether we can add an argument on behalf of the count of the blocks to add ? % echo start_address_of_new_memory count_of_blocks > /sys/devices/system/memory/probe Then, [start_address_of_new_memory, start_address_of_new_memory + count_of_blocks * memory_block_size] memory range is hot-added. 
So users don't need to execute echo several times when they want to hot-add a multi-block memory range. Any comments are welcome. Best regards!
Re: [RFC PATCH] nouveau: rename aux.c to auxiliary.c for reviewing it on Windows
On 06/11/2014 04:24 PM, Borislav Petkov wrote: > On Wed, Jun 11, 2014 at 03:53:55PM +0800, Lai Jiangshan wrote: >> When I tried to review the Linux kernel on Windows on my laptop, >> I incidentally found that it failed to open aux.c. >> >> And Microsoft tells me: >> (http://msdn.microsoft.com/en-us/library/aa365247.aspx) >> >>> Do not use the following reserved names for the name of a file: >>> CON, PRN, AUX, NUL, COM1-COM9, and LPT1-LPT9. Also avoid these names >>> followed immediately by an extension; for example, NUL.txt >>> is not recommended. >> >> The name "aux" is listed above. And it sometimes makes sense to >> review Linux on Windows, so we rename aux.c to auxiliary.c. > > I think you missed April 1st by more than 2 months. > EVERY DAY IS APRIL FIRST if you dismiss others' convenience.
Re: [RFC PATCH 09/10] mm, compaction: try to capture the just-created high-order freepage
On Wed, Jun 11, 2014 at 04:56:49PM +0200, Vlastimil Babka wrote: > On 06/09/2014 11:26 AM, Vlastimil Babka wrote: > > Compaction uses watermark checking to determine if it succeeded in creating > > a high-order free page. My testing has shown that this is quite racy and it > > can happen that watermark checking in compaction succeeds, and moments later > > the watermark checking in page allocation fails, even though the number of > > free pages has increased meanwhile. > > > > It should be more reliable if direct compaction captured the high-order free > > page as soon as it detects it, and passed it back to allocation. This would > > also reduce the window for somebody else to allocate the free page. > > > > This was already implemented by 1fb3f8ca0e92 ("mm: compaction: capture > > a > > suitable high-order page immediately when it is made available"), but later > > reverted by 8fb74b9f ("mm: compaction: partially revert capture of suitable > > high-order page") due to flaws. > > > > This patch differs from the previous attempt in two aspects: > > > > 1) The previous patch scanned free lists to capture the page. In this patch, > > only the cc->order aligned block that the migration scanner just > > finished > > is considered, but only if pages were actually isolated for migration in > > that block. Tracking cc->order aligned blocks also has benefits for the > > following patch that skips blocks where non-migratable pages were found. > > Generally I like this. > > 2) In this patch, the isolated free page is allocated through extending > > get_page_from_freelist() and buffered_rmqueue(). This ensures that it > > gets > > all operations such as prep_new_page() and page->pfmemalloc setting that > > was missing in the previous attempt, zone statistics are updated etc. > > But this part is a problem. Capturing is not common, but you are adding more overhead in the hot path for rare cases that are even OK to fail, so it's not a good deal.
In such a case, we have no choice but to do the things you mentioned (e.g. statistics, prep_new_page, pfmemalloc) manually in __alloc_pages_direct_compact. > > Evaluation is pending. > > Uh, so if anyone wants to test it, here's a fixed version, as initial > evaluation > showed it does not actually capture anything (which should not affect patch > 10/10 > though) and debugging this took a while. > > - for pageblock_order (i.e. THP), capture was never attempted, as the for > cycle > in isolate_migratepages_range() has ended right before the > low_pfn == next_capture_pfn check > - lru_add_drain() has to be done before pcplists drain. This made a big > difference > (~50 successful captures -> ~1300 successful captures) > Note that __alloc_pages_direct_compact() is missing lru_add_drain() as > well, and > all the existing watermark-based compaction termination decisions (which > happen > before the drain in __alloc_pages_direct_compact()) don't do any draining > at all. > > -8<- > From: Vlastimil Babka > Date: Wed, 28 May 2014 17:05:18 +0200 > Subject: [PATCH fixed 09/10] mm, compaction: try to capture the just-created > high-order freepage > > Compaction uses watermark checking to determine if it succeeded in creating > a high-order free page. My testing has shown that this is quite racy and it > can happen that watermark checking in compaction succeeds, and moments later > the watermark checking in page allocation fails, even though the number of > free pages has increased meanwhile. > > It should be more reliable if direct compaction captured the high-order free > page as soon as it detects it, and passed it back to allocation. This would > also reduce the window for somebody else to allocate the free page. > > This was already implemented by 1fb3f8ca0e92 ("mm: compaction: capture a > suitable high-order page immediately when it is made available"), but later > reverted by 8fb74b9f ("mm: compaction: partially revert capture of suitable > high-order page") due to flaws.
> > This patch differs from the previous attempt in two aspects: > > 1) The previous patch scanned free lists to capture the page. In this patch, >only the cc->order aligned block that the migration scanner just finished >is considered, but only if pages were actually isolated for migration in >that block. Tracking cc->order aligned blocks also has benefits for the >following patch that skips blocks where non-migratable pages were found. > > 2) In this patch, the isolated free page is allocated through extending >get_page_from_freelist() and buffered_rmqueue(). This ensures that it gets >all operations such as prep_new_page() and page->pfmemalloc setting that >was missing in the previous attempt, zone statistics are updated etc. > > Evaluation is pending. > > Signed-off-by: Vlastimil Babka > Cc: Minchan Kim > Cc: Mel Gorman > Cc: Joonsoo Kim > Cc: Michal Nazarewicz > Cc: Naoya Horiguchi > Cc: Christoph
Re: [PATCH 12/14] block: Add specific data integrity errors
> "Christoph" == Christoph Hellwig writes: >> Introduce a set of error codes that can be used by the block >> integrity subsystem to signal which class of error was encountered by >> either the I/O controller or the storage device. Christoph> I'd also love to see something catching these so that they Christoph> don't leak to userspace. This patch was really meant as an RFC. But it is absolutely my intent to expose these to userspace. Albeit only to applications that supply or request protection information via Darrick's aio extensions. I also use these errors extensively in my test utilities to verify that the correct problem gets detected by the correct entity when I inject an error. I should add that in the past I had a separate error status inside the bip that contained the data integrity specific errors. But that involved all sorts of evil hacks when bios were cloned, split and stacked. After talking to nab about his needs for target I figured it was better to just define new error codes and handle them like Hannes did for the extended SCSI errors. -- Martin K. Petersen Oracle Linux Engineering
linux-next: build failure after merge of the net-next tree
Hi all, After merging the net-next tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: net/bridge/br_multicast.c: In function 'br_multicast_has_querier_adjacent': net/bridge/br_multicast.c:2248:25: error: 'struct net_bridge' has no member named 'ip6_other_query' if (!timer_pending(&br->ip6_other_query.timer) || ^ In file included from include/linux/idr.h:18:0, from include/linux/kernfs.h:14, from include/linux/sysfs.h:15, from include/linux/kobject.h:21, from include/linux/device.h:17, from include/linux/dma-mapping.h:5, from arch/powerpc/include/asm/machdep.h:14, from arch/powerpc/include/asm/archrandom.h:6, from include/linux/random.h:81, from include/linux/net.h:22, from include/linux/skbuff.h:27, from include/linux/if_ether.h:23, from net/bridge/br_multicast.c:15: net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ In file included from include/linux/err.h:4:0, from net/bridge/br_multicast.c:13: net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ In file included from include/linux/idr.h:18:0, from include/linux/kernfs.h:14, from include/linux/sysfs.h:15, from include/linux/kobject.h:21, from include/linux/device.h:17, from include/linux/dma-mapping.h:5, from arch/powerpc/include/asm/machdep.h:14, from arch/powerpc/include/asm/archrandom.h:6, from include/linux/random.h:81, from include/linux/net.h:22, from include/linux/skbuff.h:27, from include/linux/if_ether.h:23, from net/bridge/br_multicast.c:15: net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member
named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ Caused by commit 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port"). This build has CONFIG_IPV6 not set. I have reverted that commit for today. -- Cheers, Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH v2] powerpc: Avoid circular dependency with zImage.%
This v2 patch is good, Tested-by: Mike Qiu On 06/11/2014 11:40 PM, Michal Marek wrote: The rule to create the final images uses a zImage.% pattern. Unfortunately, this also matches the names of the zImage.*.lds linker scripts, which appear as a dependency of the final images. This somehow worked when $(srctree) used to be an absolute path, but now the pattern matches too much. List only the images from $(image-y) as the target of the rule, to avoid the circular dependency. Signed-off-by: Michal Marek --- v2: - Filter out duplicates in the target list - fix the platform argument to cmd_wrap arch/powerpc/boot/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 426dce7..ccc25ed 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -333,8 +333,8 @@ $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz $(obj)/zImage.initrd.%: vmlinux $(wrapperbits) $(call if_changed,wrap,$*,,,$(obj)/ramdisk.image.gz) -$(obj)/zImage.%: vmlinux $(wrapperbits) - $(call if_changed,wrap,$*) +$(addprefix $(obj)/, $(sort $(filter zImage.%, $(image-y)))): vmlinux $(wrapperbits) + $(call if_changed,wrap,$(subst $(obj)/zImage.,,$@)) # dtbImage% - a dtbImage is a zImage with an embedded device tree blob $(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb
Re: [PATCH 1/2] sched: Rework migrate_tasks()
On Wed, 2014-06-11 at 23:33 +0400, Kirill Tkhai wrote: > On Wed, 11/06/2014 at 17:43 +0400, Kirill Tkhai wrote: > > > > 11.06.2014, 17:15, "Srikar Dronamraju" : > > >>> * Kirill Tkhai [2014-06-11 13:52:10]: > > Currently migrate_tasks() skips throttled tasks, > > because they are not pickable by pick_next_task(). > > >>> Before migrate_tasks() is called, we do call set_rq_offline(), in > > >>> migration_call(). > > >>> > > >>> Shouldn't this take care of unthrottling the tasks and making sure that > > >>> they can be picked by pick_next_task(). > > >> If we do this separately for every class, we'll have to do this 3 times. > > >> Furthermore, the deadline class does not have a list of throttled tasks. > > >> So we'll have to do the same as I did: lock tasklist_lock and iterate > > >> through all of the tasks in the system just to find deadline tasks. > > > > > > I think you misread my comment. > > > > > > Currently migrate_task() gets called from migration_call() and in the > > > migration_call() before migrate_tasks(), set_rq_offline() should put > > > tasks back using unthrottle_cfs_rq(). > > > > > > So my question is: Why are these tasks not getting unthrottled > > > through we are calling set_rq_offline? To me set_rq_offline is > > > calling the actual sched class routines to do the needful. > > > > > > I can understand about deadline tasks, because we don't have a deadline > > > But that's the only tasks that we need to fix. > > > > Hm, I tested that on fair class tasks. They used to disappear from > > /proc/sched_debug and used to hang. I'll check all once again. > > > > I agree with you; if set_rq_offline() is already present, we should use it. > > > > /me went to clarify why it does not work in my test. > > Ok, it looks like the problem is that an unthrottled cfs_rq may become throttled again ;) Dejavu. You could try either of the below.
On Thu, Apr 03, 2014 at 10:02:18AM +0200, Mike Galbraith wrote: > Prevent large wakeup latencies from being accounted to the wrong task. > > Cc: > Signed-off-by:Mike Galbraith > --- > kernel/sched/core.c |7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -118,7 +118,12 @@ void update_rq_clock(struct rq *rq) > { > s64 delta; > > - if (rq->skip_clock_update > 0) > + /* > + * Set during wakeup to indicate we are on the way to schedule(). > + * Decrement to ensure that a very large latency is not accounted > + * to the wrong task. > + */ > + if (rq->skip_clock_update-- > 0) > return; > > delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; OK; so as previously mentioned (Oct '13); I've entirely had it with skip_clock_update bugs, so I got angry and did the below. Its not something I can merge, not least because it uses trace_printk(), but it should be usable to 1) demonstate the above actually helps and 2) make damn sure we got it right this time :-) I've not really stared at the output much yet; but when you select function_graph tracer; we get lovely things like: 8) | wake_up_process() { 8) | try_to_wake_up() { 8) 0.076 us| _raw_spin_lock_irqsave(); 8) 0.092 us| task_waking_fair(); 8) 0.106 us| select_task_rq_fair(); 8) 0.161 us| _raw_spin_lock(); 8) | ttwu_do_activate.constprop.103() { 8) | activate_task() { 8) | enqueue_task() { 8) | update_rq_clock() { 8) | /* clock update: 420411 */ 8) 0.084 us| sched_avg_update(); 8) 1.277 us|} 8) | enqueue_task_fair() { 8) | enqueue_entity() { 8) 0.083 us| update_curr(); 8) 0.071 us| __compute_runnable_contrib(); 8) 0.074 us| __update_entity_load_avg_contrib(); 8) 0.121 us| update_cfs_rq_blocked_load(); 8) 0.236 us|
linux-next: manual merge of the net-next tree with Linus' tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in drivers/infiniband/hw/cxgb4/cm.c between commits 11b8e22d4d09 ("RDMA/cxgb4: Fix vlan support") and 9eccfe109b27 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service") from Linus' tree and commits 92e7ae71726c ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections") and b408ff282dda ("iw_cxgb4: don't truncate the recv window size") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell s...@canb.auug.org.au diff --cc drivers/infiniband/hw/cxgb4/cm.c index 96d7131ab974,965eaafd5851.. --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@@ -533,38 -532,17 +537,49 @@@ static int send_abort(struct c4iw_ep *e return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t); } +/* + * c4iw_form_pm_msg - Form a port mapper message with mapping info + */ +static void c4iw_form_pm_msg(struct c4iw_ep *ep, + struct iwpm_sa_data *pm_msg) +{ + memcpy(&pm_msg->loc_addr, &ep->com.local_addr, + sizeof(ep->com.local_addr)); + memcpy(&pm_msg->rem_addr, &ep->com.remote_addr, + sizeof(ep->com.remote_addr)); +} + +/* + * c4iw_form_reg_msg - Form a port mapper message with dev info + */ +static void c4iw_form_reg_msg(struct c4iw_dev *dev, + struct iwpm_dev_data *pm_msg) +{ + memcpy(pm_msg->dev_name, dev->ibdev.name, IWPM_DEVNAME_SIZE); + memcpy(pm_msg->if_name, dev->rdev.lldi.ports[0]->name, + IWPM_IFNAME_SIZE); +} + +static void c4iw_record_pm_msg(struct c4iw_ep *ep, + struct iwpm_sa_data *pm_msg) +{ + memcpy(&ep->com.mapped_local_addr, &pm_msg->mapped_loc_addr, + sizeof(ep->com.mapped_local_addr)); + memcpy(&ep->com.mapped_remote_addr, &pm_msg->mapped_rem_addr, + sizeof(ep->com.mapped_remote_addr)); +} + + static void best_mtu(const unsigned short *mtus, unsigned short mtu, +unsigned int *idx, int use_ts) + { + unsigned short hdr_size = sizeof(struct iphdr) + + sizeof(struct tcphdr) + + (use_ts ?
12 : 0); + unsigned short data_size = mtu - hdr_size; + + cxgb4_best_aligned_mtu(mtus, hdr_size, data_size, 8, idx); + } + static int send_connect(struct c4iw_ep *ep) { struct cpl_act_open_req *req; @@@ -583,14 -561,11 +598,15 @@@ int sizev6 = is_t4(ep->com.dev->rdev.lldi.adapter_type) ? sizeof(struct cpl_act_open_req6) : sizeof(struct cpl_t5_act_open_req6); - struct sockaddr_in *la = (struct sockaddr_in *)&ep->com.local_addr; - struct sockaddr_in *ra = (struct sockaddr_in *)&ep->com.remote_addr; - struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)&ep->com.local_addr; - struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)&ep->com.remote_addr; + struct sockaddr_in *la = (struct sockaddr_in *) + &ep->com.mapped_local_addr; + struct sockaddr_in *ra = (struct sockaddr_in *) + &ep->com.mapped_remote_addr; + struct sockaddr_in6 *la6 = (struct sockaddr_in6 *) + &ep->com.mapped_local_addr; + struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *) + &ep->com.mapped_remote_addr; + int win; wrlen = (ep->com.remote_addr.ss_family == AF_INET) ? roundup(sizev4, 16) : @@@ -1796,7 -1821,8 +1862,8 @@@ static int import_ep(struct c4iw_ep *ep step = cdev->rdev.lldi.nrxq / cdev->rdev.lldi.nchan; ep->rss_qid = cdev->rdev.lldi.rxq_ids[ - cxgb4_port_idx(n->dev) * step]; + cxgb4_port_idx(pdev) * step]; + set_tcp_window(ep, (struct port_info *)netdev_priv(pdev)); if (clear_mpa_v1) { ep->retry_with_mpa_v1 = 0;
Re: [PATCH v5 4/4] drivers: net: Add APM X-Gene SoC ethernet driver support.
On Thu, Jun 5, 2014 at 12:45 AM, David Miller wrote: > From: Iyappan Subramanian > Date: Mon, 2 Jun 2014 12:39:14 -0700 > >> + netdev_err(ndev, "LERR: %d ring_num: %d ", status, ring->num); >> + switch (status) { >> + case HBF_READ_DATA: >> + netdev_err(ndev, "HBF read data error\n"); >> + break; > > This is not really appropriate. > > We have statistics like the ones you are incrementing in this > function as the mechanism people can use to learn what events > happened on an interface, and how many times they happened. > > Therefore, emitting a log message for each one of those events too is > not necessary. > > We don't emit a netdev_err() for every packet that the IPv4 stack > drops due to a bad checksum, for example. > > Please get rid of this. Sure. I will remove the error message prints and clean up the function.
Re: drivers/char/random.c: More futzing about
> Sadly I can't find the tree, but I'm 94% sure it was Skein-256 > (specifically the SHA3-256 candidate parameter set.) It would be nice to have two hash functions, optimized separately for 32- and 64-bit processors. As the Skein report says, the algorithm can be adapted to 32 bits easily enough. I also did some work a while ago to adapt the Skein parameter search code to develop a Skein-192 (6x32 bits) that would fit into registers on x86-32. (It got stalled when I e-mailed Niels Ferguson about it and never heard back; it fell off the to-do list while I was waiting.) The intended target was IPv6 address hashing for sequence number randomization, but it could be used for pool hashing, too.
[PATCH] Input: evdev - Fix incorrect kfree of err_free_client after vzalloc
This bug was introduced by commit 92eb77d ("Input: evdev - fall back to vmalloc for client event buffer"). vzalloc is used to allocate memory as a fallback when kzalloc fails. But err_free_client was not considered in the case below: 1. kzalloc fails 2. vzalloc succeeds 3. evdev_open_device fails 4. kfree So address checking is needed to call the correct free function. Signed-off-by: Yongtaek Lee Reviewed-by: Daniel Stone --- drivers/input/evdev.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c index ce953d8..f60daa0 100644 --- a/drivers/input/evdev.c +++ b/drivers/input/evdev.c @@ -422,7 +422,10 @@ static int evdev_open(struct inode *inode, struct file *file) err_free_client: evdev_detach_client(evdev, client); - kfree(client); + if (is_vmalloc_addr(client)) + vfree(client); + else + kfree(client); return error; } -- 1.7.1
Re: [PATCH v2 2/3] lib: glob.c: Add CONFIG_GLOB_SELFTEST
>> Persuading GCC to throw away *all* the self-test data after running >> it was surprisingly annoying. > > Yeah. Props for making the attempt. *Whew*. I was worried I'd get upbraided for overoptimization. >> The one thing I'm not really sure about is what to do if the self-test >> fails. For now, I make the module_init function fail too. Opinions? > > The printk should suffice - someone will notice it eventually. > > Using KERN_ERR to report a failure might help draw attention to it. I'm not sure what you mean by "might"; I already *do* report it as KERN_ERR. If you think failing the module load is a bad idea, feel free to modify the patch.
Re: [RFC PATCH 4/5] kernel/rcu/tree.c:3435 fix a sparse warning
On Wed, Jun 11, 2014 at 5:25 PM, wrote: > On Wed, Jun 11, 2014 at 04:39:42PM -0400, Pranith Kumar wrote: >> kernel/rcu/tree.c:3435:21: warning: incorrect type in argument 1 (different >> modifiers) >> kernel/rcu/tree.c:3435:21: expected int ( *threadfn )( ... ) >> kernel/rcu/tree.c:3435:21: got int ( static [toplevel] [noreturn] >> * )( ... ) >> >> by removing the __noreturn attribute and adding unreachable() as suggested on the >> mailing list: http://www.kernelhub.org/?p=2=436683 >> >> Signed-off-by: Pranith Kumar > > No, we should not do this. And the mailing list post you point to seems > to explicitly recommend using noreturn rather than unreachable. > > If sparse doesn't understand this, that's a bug in sparse, not in the > kernel. Sparse needs to understand that it's OK to drop noreturn from a > function pointer type, just not OK to add it. > > Rationale: If you call a noreturn function through a non-noreturn > function pointer, you might end up with unnecessary cleanup code, but > the call will work. If you call a non-noreturn function through a > noreturn function pointer, the caller will not expect a return, and may > crash; *that* should require a cast. > Yes, I understand the rationale. I think this should be fixed in sparse. Please drop this patch. Thanks! -- Pranith
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On 6/11/2014 6:54 AM, Peter Zijlstra wrote: On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: Enabling this configuration feature causes a slight decrease in the performance of an uncontended lock-unlock operation by about 1-2%, mainly due to the use of a static key. However, uncontended lock-unlock operations are really just a tiny percentage of a real workload, so there should be no noticeable change in application performance. No, entirely unacceptable. +#ifdef CONFIG_VIRT_UNFAIR_LOCKS +/** + * queue_spin_trylock_unfair - try to acquire the queue spinlock unfairly + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static __always_inline int queue_spin_trylock_unfair(struct qspinlock *lock) +{ + union arch_qspinlock *qlock = (union arch_qspinlock *)lock; + + if (!qlock->locked && (cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0)) + return 1; + return 0; +} + +/** + * queue_spin_lock_unfair - acquire a queue spinlock unfairly + * @lock: Pointer to queue spinlock structure + */ +static __always_inline void queue_spin_lock_unfair(struct qspinlock *lock) +{ + union arch_qspinlock *qlock = (union arch_qspinlock *)lock; + + if (likely(cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0)) + return; + /* +* Since the lock is now unfair, we should not activate the 2-task +* pending bit spinning code path which disallows lock stealing. +*/ + queue_spin_lock_slowpath(lock, -1); +} Why is this needed? I added the unfair version of lock and trylock as my original version isn't a simple test-and-set lock. Now I changed the core part to use the simple test-and-set lock. However, I still think that an unfair version in the fast path can be helpful to performance when both the unfair lock and paravirt spinlock are enabled. In this case, paravirt spinlock code will disable the unfair lock code in the slowpath, but still allow the unfair version in the fast path to get the best possible performance in a virtual guest.
Yes, I could take that out to allow either unfair or paravirt spinlock, but not both. I do think that a little bit of unfairness will help in the virtual environment. +/* + * Redefine arch_spin_lock and arch_spin_trylock as inline functions that will + * jump to the unfair versions if the static key virt_unfairlocks_enabled + * is true. + */ +#undef arch_spin_lock +#undef arch_spin_trylock +#undef arch_spin_lock_flags + +/** + * arch_spin_lock - acquire a queue spinlock + * @lock: Pointer to queue spinlock structure + */ +static inline void arch_spin_lock(struct qspinlock *lock) +{ + if (static_key_false(&virt_unfairlocks_enabled)) + queue_spin_lock_unfair(lock); + else + queue_spin_lock(lock); +} + +/** + * arch_spin_trylock - try to acquire the queue spinlock + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static inline int arch_spin_trylock(struct qspinlock *lock) +{ + if (static_key_false(&virt_unfairlocks_enabled)) + return queue_spin_trylock_unfair(lock); + else + return queue_spin_trylock(lock); +} So I really don't see the point of all this? Why do you need special {try,}lock paths for this case? Are you worried about the upper 24bits? No, as I said above. I was planning for the coexistence of the unfair lock in the fast path and the paravirt spinlock in the slowpath. diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index ae1b19d..3723c83 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -217,6 +217,14 @@ static __always_inline int try_set_locked(struct qspinlock *lock) { struct __qspinlock *l = (void *)lock; +#ifdef CONFIG_VIRT_UNFAIR_LOCKS + /* +* Need to use atomic operation to grab the lock when lock stealing +* can happen. +*/ + if (static_key_false(&virt_unfairlocks_enabled)) + return cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0; +#endif barrier(); ACCESS_ONCE(l->locked) = _Q_LOCKED_VAL; barrier(); Why? If we have a simple test-and-set lock like below, we'll never get here at all.
Again, it is due to the coexistence of the unfair lock in the fast path and the paravirt spinlock in the slowpath. @@ -252,6 +260,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); +#ifdef CONFIG_VIRT_UNFAIR_LOCKS + /* +* A simple test and set unfair lock +*/ + if (static_key_false(&virt_unfairlocks_enabled)) { + cpu_relax();/* Relax after a failed lock attempt */ Meh, I don't think anybody can tell the difference if you put that in or not, therefore don't. Yes, I can take out the cpu_relax() here. -Longman
Re: [PATCH] sctp: Fix sk_ack_backlog wrap-around problem
On 06/12/2014 12:55 AM, Vlad Yasevich wrote: On 06/11/2014 08:55 AM, Vlad Yasevich wrote: On 06/10/2014 10:37 PM, Xufeng Zhang wrote: Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks, a new association would be created in sctp_unpack_cookie(), but afterwards some processing may fail, and sctp_association_free() will be called to free the previously allocated association. In sctp_association_free(), the sk_ack_backlog value is decremented for this socket; since the initial value of sk_ack_backlog is 0, after the decrement it wraps around to 65535. If we then want to establish new associations on the same socket, an ABORT is triggered, since SCTP deems the accept queue full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman Signed-off-by: Xufeng Zhang --- net/sctp/associola.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 39579c3..60564f2 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -330,7 +330,7 @@ void sctp_association_free(struct sctp_association *asoc) /* Only real associations count against the endpoint, so * don't bother for if this is a temporary association. */ - if (!asoc->temp) { + if (!asoc->temp && !list_empty(&asoc->asocs)) { list_del(&asoc->asocs); /* Decrement the backlog value for a TCP-style listening I am not crazy about this patch. It's been suggested before that maybe duplicate cookie processing should really be creating a temporary association, since that is how that association is being used. I had another look at the description for triggering this issue and realized that I was thinking about something else when looking at this solution. There is, however, no need to test both the list and the temp value.
We can simply always test that the list is not empty before doing list_del(). Thanks a lot for the comment! I'll send V2 later. Thanks, Xufeng -vlad It might be nice to try that approach. It actually benefits us, as the association destruction would happen immediately instead of being delayed.
Re: [RFC 0/5] of: Automatic console registration cleanups
On Fri, Mar 28, 2014 at 11:08 AM, Grant Likely wrote: > Hi all, > > This is a series that I've been playing with over the last few days to > clean up the selection of default console devices when using the device > tree. The device tree defines a way of specifying the console by using a > "stdout-path" property in the /chosen node, but very few drivers > actually attempt to use that data, and so for most platforms there needs > to be a "console=" line in the command line if a serial port is intended > to be used as the console. > > With this series, if there is a /chosen/stdout-path property, and if > that property points to a serial port node, then when the serial driver > registers the port, the core uart_add_one_port() function will notice > and if no console= argument was provided then add it as a preferred > console. > > I've not tested this very extensively yet, but I want to get some > feedback before I go further. > > The one downside with this approach is that it doesn't do anything for > early console setup. That still needs to be added on a per-driver basis, > but at least it shouldn't conflict with this approach. Hey, what happened with this series? Rob
Re: [PATCH v2 3/4] mutex: Try to acquire mutex only if it is unlocked
On 6/11/2014 2:37 PM, Jason Low wrote: Upon entering the slowpath in __mutex_lock_common(), we try once more to acquire the mutex. We only try to acquire if (lock->count >= 0). However, what we actually want here is to try to acquire if the mutex is unlocked (lock->count == 1). This patch changes it so that we only try-acquire the mutex upon entering the slowpath if it is unlocked, rather than if the lock count is non-negative. This helps further reduce unnecessary atomic xchg() operations. Furthermore, this patch uses !mutex_is_locked(lock) to do the initial checks for if the lock is free rather than directly calling atomic_read() on the lock->count, in order to improve readability. Signed-off-by: Jason Low --- kernel/locking/mutex.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4bd9546..e4d997b 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -432,7 +432,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, if (owner && !mutex_spin_on_owner(lock, owner)) break; - if ((atomic_read(&lock->count) == 1) && + /* Try to acquire the mutex if it is unlocked. */ + if (!mutex_is_locked(lock) && (atomic_cmpxchg(&lock->count, 1, 0) == 1)) { lock_acquired(&lock->dep_map, ip); if (use_ww_ctx) { @@ -479,9 +480,9 @@ slowpath: /* * Once more, try to acquire the lock. Only try-lock the mutex if -* lock->count >= 0 to reduce unnecessary xchg operations. +* it is unlocked to reduce unnecessary xchg() operations. */ - if (atomic_read(&lock->count) >= 0 && (atomic_xchg(&lock->count, 0) == 1)) + if (!mutex_is_locked(lock) && (atomic_xchg(&lock->count, 0) == 1)) goto skip_wait; debug_mutex_lock_common(lock, &waiter); Acked-by: Waiman Long
Re: [PATCH v2 2/4] mutex: Delete the MUTEX_SHOW_NO_WAITER macro
On 6/11/2014 2:37 PM, Jason Low wrote: v1->v2: - There were discussions in v1 about a possible mutex_has_waiters() function. This patch didn't use that function because the places which used MUTEX_SHOW_NO_WAITER require checking for lock->count while an actual mutex_has_waiters() should check for !list_empty(wait_list). We'll just delete the macro and directly use atomic_read() + comments. MUTEX_SHOW_NO_WAITER() is a macro which checks whether there are "no waiters" on a mutex by checking if the lock count is non-negative. Based on feedback from the discussion in the earlier version of this patchset, the macro is not very readable. Furthermore, checking lock->count isn't always the correct way to determine if there are "no waiters" on a mutex. For example, a negative count on a mutex really only means that there "potentially" are waiters. Likewise, there can be waiters on the mutex even if the count is non-negative. Thus, "MUTEX_SHOW_NO_WAITER" doesn't always do what the name of the macro suggests. So this patch deletes the MUTEX_SHOW_NO_WAITER() macro, directly uses atomic_read() instead of the macro, and adds comments which elaborate on how the extra atomic_read() checks can help reduce unnecessary xchg() operations. Signed-off-by: Jason Low --- kernel/locking/mutex.c | 18 ++++++++---------- 1 files changed, 8 insertions(+), 10 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index dd26bf6..4bd9546 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -46,12 +46,6 @@ # include #endif -/* - * A negative mutex count indicates that waiters are sleeping waiting for the - * mutex. - */ -#define MUTEX_SHOW_NO_WAITER(mutex) (atomic_read(&(mutex)->count) >= 0) - void __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key) { @@ -483,8 +477,11 @@ slowpath: #endif spin_lock_mutex(&lock->wait_lock, flags); - /* once more, can we acquire the lock?
*/ - if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, 0) == 1)) + /* +* Once more, try to acquire the lock. Only try-lock the mutex if +* lock->count >= 0 to reduce unnecessary xchg operations. +*/ + if (atomic_read(&lock->count) >= 0 && (atomic_xchg(&lock->count, 0) == 1)) goto skip_wait; debug_mutex_lock_common(lock, &waiter); @@ -504,9 +501,10 @@ slowpath: * it's unlocked. Later on, if we sleep, this is the * operation that gives us the lock. We xchg it to -1, so * that when we release the lock, we properly wake up the -* other waiters: +* other waiters. We only attempt the xchg if the count is +* non-negative in order to avoid unnecessary xchg operations: */ - if (MUTEX_SHOW_NO_WAITER(lock) && + if (atomic_read(&lock->count) >= 0 && (atomic_xchg(&lock->count, -1) == 1)) break; Acked-by: Waiman Long
Re: [Patch v5.1 03/03]: hwrng: khwrngd derating per device
On 05/27/2014 07:11 AM, Torsten Duwe wrote: > [checkpatch tells me not to 0-init...] > > This patch introduces a derating factor to struct hwrng for > the random bits going into the kernel input pool, and a common > default derating for drivers which do not specify one. > > Signed-off-by: Torsten Duwe > Did we lose track of this patchset? -hpa
Re: [RFC PATCH 1/3] locking/mutex: Try to acquire mutex only if it is unlocked
On 6/11/2014 5:48 PM, Jason Low wrote: On Wed, 2014-06-11 at 17:00 -0400, Long, Wai Man wrote: On 6/9/2014 1:38 PM, Jason Low wrote: On Wed, 2014-06-04 at 13:58 -0700, Davidlohr Bueso wrote: On Wed, 2014-06-04 at 13:57 -0700, Davidlohr Bueso wrote: In addition, how about the following helpers instead: - mutex_is_unlocked() : count > 0 - mutex_has_waiters() : count < 0, or list_empty(&lock->wait_list) ^ err, that's !list_empty() Between checking for (count < 0) or checking for !list_empty(wait_list) for waiters: Now that I think about it, I would expect a mutex_has_waiters() function to return !list_empty(wait_list) as that really tells whether or not there are waiters. For example, in highly contended cases, there can still be waiters on the mutex if count is 1. Likewise, in places where we currently use "MUTEX_SHOW_NO_WAITER", we need to check for (count < 0) to ensure lock->count is a negative value before the thread sleeps on the mutex. One option would be to still remove MUTEX_SHOW_NO_WAITER(), directly use atomic_read() in place of the macro, and just comment on why we have an extra atomic_read() that may "appear redundant". Another option could be to provide a function that checks for "potential waiters" on the mutex. Any thoughts? For the first MUTEX_SHOW_NO_WAITER() call site, you can replace it with a check for (count > 0). Yup, in my v2 patch, the first call site becomes !mutex_is_locked(lock) which is really a check for (count == 1). Yes, your v2 patch looks fine to me. The second call site within the for loop, however, is a bit more tricky. It has to serve 2 purposes: 1. Opportunistically get the lock 2. Set the count value to -1 to indicate someone is waiting on the lock, that is why an xchg() operation has to be done even if its value is 0. I do agree that the naming isn't that good.
Maybe it can be changed to something like static inline int mutex_value_has_waiters(struct mutex *lock) { return atomic_read(&lock->count) < 0; } So I can imagine that a mutex_value_has_waiters() function might still not be a great name, since the mutex can have waiters in the case that the value lock->count >= 0. In the second call site, do you think we should just do a direct atomic_read(&lock->count) >= 0 and comment that we only do the xchg if the count is non-negative to avoid unnecessary xchg? That's what I did in my v2 patch. I think that is a good idea to avoid any controversy in naming. -Longman
Re: [PATCH 1/1] xhci: clear root port wake on bits if controller isn't wake-up capable
On 06/11/2014 11:26 PM, Greg Kroah-Hartman wrote: On Wed, Jun 11, 2014 at 06:25:20AM +0800, Lu Baolu wrote: When the xHCI PCI host is suspended, if do_wakeup is false in xhci_pci_suspend, xhci_bus_suspend needs to clear all root port wake on bits. Otherwise some Intel platforms may get a spurious wakeup, even if PCI PME# is disabled. http://marc.info/?l=linux-usb&m=138194006009255&w=2 Signed-off-by: Lu Baolu --- drivers/usb/host/xhci-hub.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) Should this also be a stable kernel patch? If so, how far back? Yes. This patch should be back-ported to kernels as old as 2.6.37, that contains the commit 9777e3ce907d4cb5a513902a87ecd03b52499569 "USB: xHCI: bus power management implementation". Thanks, -baolu thanks, greg k-h
Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus
Ok, some misconfiguration here probably, never mind. I'll finish the tests tomorrow, compare with existing ones and let you know asap. Tks. On Wed, Jun 11, 2014 at 10:09 PM, Eric W. Biederman wrote: > Rafael Tinoco writes: > >> I'm getting a kernel panic with your patch: >> >> -- panic >> -- mount_block_root >> -- mount_root >> -- prepare_namespace >> -- kernel_init_freeable >> >> It is giving me an unknown block device for the same config file i >> used on other builds. Since my test is running on a kvm guest under a >> ramdisk, i'm still checking if there are any differences between this >> build and other ones but I think there aren't. >> >> Any chances that "prepare_namespace" might be breaking mount_root ? > > My patch boots for me > > Eric -- -- Rafael David Tinoco Software Sustaining Engineer @ Canonical Canonical Technical Services Engineering Team # Email: rafael.tin...@canonical.com (GPG: 87683FC0) # Phone: +55.11.9.6777.2727 (Americas/Sao_Paulo) # LP: ~inaddy | IRC: tinoco | Skype: rafael.tinoco
Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus
Rafael Tinoco writes: > I'm getting a kernel panic with your patch: > > -- panic > -- mount_block_root > -- mount_root > -- prepare_namespace > -- kernel_init_freeable > > It is giving me an unknown block device for the same config file i > used on other builds. Since my test is running on a kvm guest under a > ramdisk, i'm still checking if there are any differences between this > build and other ones but I think there aren't. > > Any chances that "prepare_namespace" might be breaking mount_root ? My patch boots for me Eric
Re: [PATCH 1/2] tracing: Fix memory leak on failure path in ftrace_allocate_pages()
Hi Steve, On Wed, 11 Jun 2014 10:03:40 -0400, Steven Rostedt wrote: > On Wed, 11 Jun 2014 17:06:53 +0900 > Namhyung Kim wrote: > >> As struct ftrace_page is managed in a single linked list, it should >> free from the start page. >> >> Signed-off-by: Namhyung Kim >> --- >> kernel/trace/ftrace.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >> index 5b372e3ed675..ddfda763ded7 100644 >> --- a/kernel/trace/ftrace.c >> +++ b/kernel/trace/ftrace.c >> @@ -2398,7 +2398,8 @@ ftrace_allocate_pages(unsigned long num_to_init) >> return start_pg; >> >> free_pages: >> -while (start_pg) { >> +pg = start_pg; >> +while (pg) { > > It works with just the added "pg = start_pg", I would keep the > while (start_pg) still. The reason why I changed it is the code actually uses pg rather than start_pg in the loop. So it's more comfortable for me to check the pg in the condition. But it's minor, I won't insist on it strongly.. :) Thanks, Namhyung >> order = get_count_order(pg->size / ENTRIES_PER_PAGE); >> free_pages((unsigned long)pg->records, order); >> start_pg = pg->next;
Re: rtc/hctosys.c Problem during kernel boot
On Wed, Jun 11, 2014 at 04:53:55PM -0700, John Stultz wrote: > On Wed, Jun 11, 2014 at 4:01 PM, John Whitmore wrote: > > I'm having a problem with a DS3234 SPI based RTC chip and rtc/hctosys.c on > > the > > 3.10.29 kernel of the RaspberryPi. I'm not sure this is a bug or not but > > thought I'd ask. I've enabled the kernel config option for HCTOSYS which, on > > boot, should set the system's date/time to the value read from the RTC. I > > tried this but it would never happen on the RPi. I eventually found in > > syslog > > that the kernel boot is attempting to execute the hctosys functionality > > prior > > to the SPI being initialised. As a result of this when hctosys is attempted > > there is no /dev/rtc0 yet. A short time later the DS3234 RTC is initialised > > but by then it's too late. > > > > Once the system has booted and I've logged in I can read and write to the > > RTC > > and all seems good but /sys/class/rtc/rtc0/hctosys is '0' indicating that > > the > > system time was not set on boot. > > > > There is a "deprecated" warning in the syslog coming from the spi of the > > board > > file so perhaps that is the cause. So is this a bug? And if so what can I do > > to resolve it. The hctosys is on a "late_initcall" so not sure of timing. > > Sigh. Yea, this issue was brought up previously, but we never got > around to a solution that could be merged. > > Basically hctosys is late_init, but if the driver is a module, it > might not be loaded in time. Adding hooks at module load time when > RTCs are registered could be done, but then you have the issue that > userspace might have set the clock via something like ntpdate, so > HCTOSYS could then cause the clock to be less accurate. > > So we need to make the HCTOSYS functionality happen at RTC register > time, but it needs to set the clock only if nothing has set the clock > already.
This requires a new timekeeping interface - something like > timekeeping_set_time_if_unset(), which atomically would set the time > if it has never been set. > > You can read some of the previous discussion here: > https://lkml.org/lkml/2013/6/17/533 > Thanks a million for that information. I'll have a look, as I might try and resolve the issue. > I'd be very interested in patches to resolve this! > > thanks > -john
Re: [PATCH 3/3] perf timechart: add more options to IO mode
On Tue, 10 Jun 2014 19:04:54 +0400, Stanislav Fomichev wrote: > --io-skip-eagain - don't show EAGAIN errors > --io-min-time - make small io bursts visible > --io-merge-dist - merge adjacent events > > Signed-off-by: Stanislav Fomichev > --- > tools/perf/Documentation/perf-timechart.txt | 9 ++ > tools/perf/builtin-timechart.c | 49 > +++-- > 2 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/Documentation/perf-timechart.txt > b/tools/perf/Documentation/perf-timechart.txt > index ec6b46c7bca0..62c29656ad95 100644 > --- a/tools/perf/Documentation/perf-timechart.txt > +++ b/tools/perf/Documentation/perf-timechart.txt > @@ -64,6 +64,15 @@ TIMECHART OPTIONS > duration or tasks with given name. If number is given it's interpreted > as number of nanoseconds. If non-numeric string is given it's > interpreted as task name. > +--io-skip-eagain:: > + Don't draw EAGAIN IO events. > +--io-min-time:: > + Draw small events as if they lasted min-time. Useful when you need > + to see very small and fast IO. Default value is 1ms. It's in nano-second units, right? If so, it's very inconvenient for the user to specify. Maybe we could support parsing units (s, ms, us, ...) also. > +--io-merge-dist:: > + Merge events that are merge-dist nanoseconds apart. > + Reduces number of figures on the SVG and makes it more render-friendly. > + Default value is 1us. Ditto. Thanks, Namhyung
Re: drivers/char/random.c: more ruminations
On 06/11/2014 06:11 AM, Theodore Ts'o wrote: > On Tue, Jun 10, 2014 at 11:58:06PM -0400, George Spelvin wrote: >> You can forbid underflows, but the code doesn't forbid overflows. >> >> 1. Assume the entropy count starts at 512 bytes (input pool full) >> 2. Random writer mixes in 20 bytes of entropy into the input pool. >> 2a. Input pool entropy is, however, capped at 512 bytes. >> 3. Random extractor extracts 32 bytes of entropy from the pool. >> Succeeds because 32 < 512. Pool is left with 480 bytes of >> entropy. >> 3a. Random extractor decrements pool entropy estimate to 480 bytes. >> This is accurate. >> 4. Random writer credits pool with 20 bytes of entropy. >> 5. Input pool entropy is now 480 bytes, estimate is 500 bytes. > > Good point, that's a potential problem, although messing up the > accounting between 480 and 500 bytes is not nearly as bad as messing > up 0 and 20. > > It's not something where, if the fix required massive changes, I'd > necessarily feel the need to backport them to stable. It's a > certificational weakness, but it's not a disaster. > Actually, with the new accounting code it will be even less serious, because mixing into a nearly full pool is discounted heavily -- because it is not like filling a queue; the mixing function will probabilistically overwrite existing pool entropy. So it is still a race condition, and still wrong, but it is a lot less wrong. -hpa
Re: [PATCH 1/3] perf timechart: implement IO mode
Hi Stanislav, On Tue, 10 Jun 2014 19:04:52 +0400, Stanislav Fomichev wrote: > In IO mode timechart shows any disk/network activity. [SNIP] > +Record system-wide IO events: > + > + $ perf timechart record -I I got a segfault here: Core was generated by `perf timechart record -I'. Program terminated with signal 11, Segmentation fault. #0 parse_options_step (ctx=ctx@entry=0x7fff6dcd8ef0, options=options@entry=0x587de0, usagestr=usagestr@entry=0x588900) at util/parse-options.c:353 353 if (*arg != '-' || !arg[1]) { Missing separate debuginfos, use: debuginfo-install glibc-2.17-9.fc20.x86_64 nss-softokn-freebl-3.15-1.fc20.x86_64 numactl-libs-2.0.7-6.fc17.x86_64 (gdb) bt #0 parse_options_step (ctx=ctx@entry=0x7fff6dcd8ef0, options=options@entry= 0x587de0, usagestr=usagestr@entry=0x588900) at util/parse-options.c:353 #1 0x00465cf4 in parse_options_subcommand (argc=argc@entry=197, argv=argv@entry=0x13fd6d0, options=options@entry=0x587de0, subcommands=subcommands@entry=0x0, usagestr=usagestr@entry=0x588900, flags=flags@entry=2) at util/parse-options.c:462 #2 0x00465f54 in parse_options (argc=argc@entry=197, argv=argv@entry= 0x13fd6d0, options=options@entry=0x587de0, usagestr=usagestr@entry= 0x588900, flags=flags@entry=2) at util/parse-options.c:492 #3 0x00429ef8 in cmd_record (argc=argc@entry=197, argv=argv@entry= 0x13fd6d0, prefix=prefix@entry=0x0) at builtin-record.c:894 #4 0x00434c59 in timechart__io_record (argv=0x7fff6dcdc270, argc=0) at builtin-timechart.c:1756 #5 cmd_timechart (argc=0, argv=0x7fff6dcdc270, prefix=) at builtin-timechart.c:1957 #6 0x0041b603 in run_builtin (p=p@entry=0x7fa230, argc=argc@entry=3, argv=argv@entry=0x7fff6dcdc270) at perf.c:319 #7 0x0041ae82 in handle_internal_command (argv=0x7fff6dcdc270, argc=3) at perf.c:376 #8 run_argv (argv=0x7fff6dcdc060, argcp=0x7fff6dcdc06c) at perf.c:420 #9 main (argc=3, argv=0x7fff6dcdc270) at perf.c:534 It was because, as I said, my system doesn't have pread64 syscall.. 
you forgot to decrease rec_argc when skipping invalid events. :) > + > + then generate timechart: > + > + $ perf timechart After fixing the problem, I could run timechart and generate an output.svg file. But it doesn't show any IO activity... process info was there in grey boxes (rect.process3) but no color boxes. I also tried recording with ping and dd, but the result was the same. I suspect it's because of some mis-calculation of position or size of the boxes. Thanks, Namhyung
Re: [PATCH v6] NVMe: conversion to blk-mq
On Mon, Jun 9, 2014 at 6:40 PM, Ming Lei wrote: > The root cause is that device returns > NVME_INTERNAL_DEV_ERROR(0x6) with your conversion > patch. The above problem is caused by qemu not handling -EAGAIN from io_submit(), so please ignore the report. Thanks, -- Ming Lei
Re: [PATCH] of: Add vendor 2nd prefix for Asahi Kasei Corp
On Wed, Jun 11, 2014 at 05:53:02PM -0700, Kuninori Morimoto wrote: > From: Kuninori Morimoto > > Current vendor-prefixes.txt already has > "ak" prefix for Asahi Kasei Corp by > ae8c4209af2cec065fef15d200a42a04130799f7 > (of: Add vendor prefix for Asahi Kasei Corp.) > > It went through the appropriate review process, > and is already in use. > But, almost all Asahi Kasei chip driver is > using "asahi-kasei" prefix today. > > Due to ABIness, this patch adds > "asahi-kasei" to vendor-prefixes.txt. > checkpatch.pl will report WARNING without this patch. > (DT compatible string vendor "asahi-kasei" appears un-documented) > > OTOH, Asahi Kasei is usually referred as "AKM", > but this patch doesn't care about it. > Because no DT is using it today. > > Cc: Stephen Warren > Cc: Mark Brown > Cc: Geert Uytterhoeven , > Signed-off-by: Kuninori Morimoto Acked-by: Simon Horman > --- > .../devicetree/bindings/vendor-prefixes.txt|1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt > b/Documentation/devicetree/bindings/vendor-prefixes.txt > index abc3080..7e4bb83 100644 > --- a/Documentation/devicetree/bindings/vendor-prefixes.txt > +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt > @@ -17,6 +17,7 @@ amstaos AMS-Taos Inc. > apm Applied Micro Circuits Corporation (APM) > arm ARM Ltd. > armadeus ARMadeus Systems SARL > +asahi-kasei Asahi Kasei Corp. > atmelAtmel Corporation > auo AU Optronics Corporation > avagoAvago Technologies > -- > 1.7.9.5 >
[tip:x86/mm] x86/smep: Be more informative when signalling an SMEP fault
Commit-ID: eff50c347fcc8feeb8c1723c23c89aba67c60263 Gitweb: http://git.kernel.org/tip/eff50c347fcc8feeb8c1723c23c89aba67c60263 Author: Jiri Kosina AuthorDate: Tue, 10 Jun 2014 22:49:31 +0200 Committer: H. Peter Anvin CommitDate: Wed, 11 Jun 2014 17:55:30 -0700 x86/smep: Be more informative when signalling an SMEP fault If pagefault triggers due to SMEP triggering, it can't be really easily distinguished from any other oops-causing pagefault, which might lead to quite some confusion when trying to understand the reason for the oops. Print an explanatory message in case the fault happened during instruction fetch for _PAGE_USER page which is present and executable on SMEP-enabled CPUs. This is consistent with what we are doing for NX already; in addition to immediately seeing from the oops what might be happening, it can even easily give a good indication to sysadmins who are carefully monitoring their kernel logs that someone might be trying to pwn them. Signed-off-by: Jiri Kosina Link: http://lkml.kernel.org/r/alpine.lnx.2.00.1406102248490.1...@pobox.suse.cz Signed-off-by: H. Peter Anvin --- arch/x86/mm/fault.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 858b47b..9de4cdb 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -575,6 +575,8 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address) static const char nx_warning[] = KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n"; +static const char smep_warning[] = KERN_CRIT +"unable to execute userspace code (SMEP?)
(uid: %d)\n"; static void show_fault_oops(struct pt_regs *regs, unsigned long error_code, @@ -595,6 +597,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, if (pte && pte_present(*pte) && !pte_exec(*pte)) printk(nx_warning, from_kuid(&init_user_ns, current_uid())); + if (pte && pte_present(*pte) && pte_exec(*pte) && + (pgd_flags(*pgd) & _PAGE_USER) && + (read_cr4() & X86_CR4_SMEP)) + printk(smep_warning, from_kuid(&init_user_ns, current_uid())); } printk(KERN_ALERT "BUG: unable to handle kernel ");
Re: NTB driver support in haswell platform?
Hi Jon, Thanks for your detailed explanation. Now I have a clearer understanding of it. Thanks! :) Yijing. On 2014/6/12 1:18, Jon Mason wrote: > On Wed, Jun 11, 2014 at 05:03:38PM +0800, Yijing Wang wrote: >> Hi Jon, >>I have an Intel Haswell platform in hand, and our team want to use NTB in >> this platform. >> I checked the current intel NTB driver in Linux kernel, I found the Haswell >> NTB pci device id >> is not contained in ntb_pci_tbl[]. I want to know whether current kernel ntb >> driver can support >> the ntb device in Haswell platform ? > > Yes, it does support Haswell and the Device IDs are in there. > PCI_DEVICE_ID_INTEL_NTB_B2B_HSX, PCI_DEVICE_ID_INTEL_NTB_PS_HSX, and > PCI_DEVICE_ID_INTEL_NTB_SS_HSX are the relevant dev ids for Haswell. > >> Haswell NTB device id: >> >> From Haswell EDS 7.4.2 >> >> did >> Bus: 0 Device: 3 Function: 0 Offset: 2 >> Bit Attr Default Description >> 15:0 RO-V 2F08h Device_Identification_Number - Device ID values vary from >> function to function. >> Bits 15:8 are equal to 0x2F. The following list is a breakdown of the >> function groups. >> 0x2F00 - 0x2F1F : PCI Express and DMI2 >> 0x2F20 - 0x2F3F : Integrated I/O Features >> 0x2F40 - 0x2F5F : Performance Monitors >> 0x2F80 - 0x2F9F : Intel QPI >> 0x2FA0 - 0x2FBF : Home Agent/Memory Controller >> 0x2FC0 - 0x2FDF : Power Management >> 0x2FE0 - 0x2FFF : Cbo/Ring >> Default value may vary based on bus, device, and function of this CSR >> location.
>> >> >> Current ntb_pci_tbl[] in Linux: >> >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_JSF 0x3725 >> #define PCI_DEVICE_ID_INTEL_NTB_PS_JSF 0x3726 >> #define PCI_DEVICE_ID_INTEL_NTB_SS_JSF 0x3727 >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_SNB 0x3C0D >> #define PCI_DEVICE_ID_INTEL_NTB_PS_SNB 0x3C0E >> #define PCI_DEVICE_ID_INTEL_NTB_SS_SNB 0x3C0F >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_IVT 0x0E0D >> #define PCI_DEVICE_ID_INTEL_NTB_PS_IVT 0x0E0E >> #define PCI_DEVICE_ID_INTEL_NTB_SS_IVT 0x0E0F >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_HSX 0x2F0D >> #define PCI_DEVICE_ID_INTEL_NTB_PS_HSX 0x2F0E >> #define PCI_DEVICE_ID_INTEL_NTB_SS_HSX 0x2F0F >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD 0x0C4E >> >> So we should modify the default device id to 0x2F0D, 0x2F0E or 0x2F0F ? > > The device IDs are present above: PCI_DEVICE_ID_INTEL_NTB_B2B_HSX, > PCI_DEVICE_ID_INTEL_NTB_PS_HSX, and PCI_DEVICE_ID_INTEL_NTB_SS_HSX. > >> What's the difference between them? > > The last 3 letters are the name of the CPU where NTB is found. HSX is > Haswell Xeon. The 2-3 letters before that are the configuration type > of the NTB device. B2B is for "Back-to-back" configurations, aka > "NTB-NTB". > > B2B > [CPU]---[NTB]===[NTB]---[CPU] > > PS/SS is for NTB-RP configurations. PS is "Primary Side" and SS is > "Secondary Side". > > [CPU]---[SS|PS]---[CPU] > > I have an NTB wiki on my github account > (https://github.com/jonmason/ntb/wiki) describing the configuration, > etc. Also on the wiki is a link to a doc (not written by me, and > contains references to a driver that was not made public) that has > some graphics that might be useful. Specifically, pages 10 and 17. > To save time, the URL is > http://download.intel.com/design/intarch/papers/323328.pdf > > Let me know if you have any questions or issues, and I'll be happy to > walk you through it. > > Thanks, > Jon > >> >> Thanks! >> Yijing. >> >> >> >> >> -- >> Thanks! 
>> Yijing >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > . > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] of: Add vendor 2nd prefix for Asahi Kasei Corp
From: Kuninori Morimoto

The current vendor-prefixes.txt already has an "ak" prefix for Asahi Kasei
Corp, added by ae8c4209af2cec065fef15d200a42a04130799f7 ("of: Add vendor
prefix for Asahi Kasei Corp."). That prefix went through the appropriate
review process and is already in use. But almost all Asahi Kasei chip
drivers are using the "asahi-kasei" prefix today. Since that string is
effectively ABI, this patch adds "asahi-kasei" to vendor-prefixes.txt as
well. Without this patch, checkpatch.pl reports a WARNING:

  DT compatible string vendor "asahi-kasei" appears un-documented

On the other hand, Asahi Kasei is usually referred to as "AKM", but this
patch doesn't add that prefix, because no DT is using it today.

Cc: Stephen Warren
Cc: Mark Brown
Cc: Geert Uytterhoeven
Signed-off-by: Kuninori Morimoto
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index abc3080..7e4bb83 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -17,6 +17,7 @@ amstaos	AMS-Taos Inc.
 apm	Applied Micro Circuits Corporation (APM)
 arm	ARM Ltd.
 armadeus	ARMadeus Systems SARL
+asahi-kasei	Asahi Kasei Corp.
 atmel	Atmel Corporation
 auo	AU Optronics Corporation
 avago	Avago Technologies
--
1.7.9.5
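For context, this is how the documented prefix is consumed from a device
tree. The snippet below is a hypothetical board fragment: the bus node,
unit address, and the AK4642 codec model are illustrative assumptions; the
point is only the "asahi-kasei" vendor prefix in the compatible string.

```dts
/* Hypothetical board snippet; node names and addresses are examples. */
&i2c0 {
	ak4642: codec@12 {
		compatible = "asahi-kasei,ak4642";
		reg = <0x12>;
	};
};
```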
[patch v3] mm, pcp: allow restoring percpu_pagelist_fraction default
Oleg reports a division by zero error on a zero-length write() to the
percpu_pagelist_fraction sysctl:

  divide error:  [#1] SMP DEBUG_PAGEALLOC
  CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: 8800d5aeb6e0 ti: 8800d87a2000 task.ti: 8800d87a2000
  RIP: 0010:[] [] percpu_pagelist_fraction_sysctl_handler+0x84/0x120
  RSP: 0018:8800d87a3e78 EFLAGS: 00010246
  RAX: 0f89 RBX: 88011f7fd000 RCX: RDX: RSI: 0001 RDI: 0010
  RBP: 8800d87a3e98 R08: 81d002c8 R09: 8800d87a3f50 R10: 000b
  R11: 0246 R12: 0060 R13: 81c3c3e0 R14: 81cfddf8 R15: 8801193b0800
  FS: 7f614f1e9740() GS:88011f44() knlGS:
  CS: 0010 DS: ES: CR0: 8005003b CR2: 7f614f1fa000 CR3: d9291000 CR4: 06e0
  Stack:
   0001 ffea 81c3c3e0 8800d87a3ee8
   8122b163 8800d87a3f50 7fff1564969c 8800d8098f00
   7fff1564969c 8800d87a3f50
  Call Trace:
   [] proc_sys_call_handler+0xb3/0xc0
   [] proc_sys_write+0x14/0x20
   [] vfs_write+0xba/0x1e0
   [] SyS_write+0x46/0xb0
   [] tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl has been set by the user,
it is also impossible to restore it to the kernel default, since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior. It
still requires a fraction equal to or larger than 8, however, as stated by
the documentation, for sanity. If a value in the range [1, 7] is written,
the sysctl will return EINVAL. This successfully solves the divide by zero
issue at the same time.
Reported-by: Oleg Drokin
Cc: sta...@vger.kernel.org
Signed-off-by: David Rientjes
---
 v3: remove needless ret = 0 assignment per Oleg
     rewrote changelog
     added sta...@vger.kernel.org

 Documentation/sysctl/vm.txt |  3 ++-
 kernel/sysctl.c             |  3 +--
 mm/page_alloc.c             | 40 ++++++++++++++++++++++++++--------------
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -702,7 +702,8 @@ The batch value of each per cpu pagelist is also updated as a result.  It is
 set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)

 The initial value is zero.  Kernel does not use this value at boot time to set
-the high water marks for each per cpu page list.
+the high water marks for each per cpu page list.  If the user writes '0' to this
+sysctl, it will revert to this default behavior.

 ==
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -136,7 +136,6 @@ static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
-static int min_percpu_pagelist_fract = 8;

 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
@@ -1328,7 +1327,7 @@ static struct ctl_table vm_table[] = {
 		.maxlen		= sizeof(percpu_pagelist_fraction),
 		.mode		= 0644,
 		.proc_handler	= percpu_pagelist_fraction_sysctl_handler,
-		.extra1		= &min_percpu_pagelist_fract,
+		.extra1		= &zero,
 	},
 #ifdef CONFIG_MMU
 	{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -69,6 +69,7 @@

 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
+#define MIN_PERCPU_PAGELIST_FRACTION	(8)

 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
@@ -4145,7 +4146,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 	memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif

-static int __meminit zone_batchsize(struct zone *zone)
+static int zone_batchsize(struct zone *zone)
 {
 #ifdef CONFIG_MMU
 	int batch;
@@ -4261,8 +4262,8 @@ static void pageset_set_high(struct per_cpu_pageset *p,
 	pageset_update(&p->pcp, high, batch);
 }

-static void __meminit pageset_set_high_and_batch(struct zone *zone,
-		struct per_cpu_pageset *pcp)
+static void pageset_set_high_and_batch(struct zone *zone,
+		struct per_cpu_pageset *pcp)
 {
 	if (percpu_pagelist_fraction)
 		pageset_set_high(pcp,
@@ -5881,23
Re: drivers/char/random.c: More futzing about
On 06/11/2014 01:41 PM, H. Peter Anvin wrote:
> On 06/11/2014 12:25 PM, Theodore Ts'o wrote:
>> On Wed, Jun 11, 2014 at 09:48:31AM -0700, H. Peter Anvin wrote:
>>> While talking about performance, I did a quick prototype of random using
>>> Skein instead of SHA-1, and it was measurably faster, in part because
>>> Skein produces more output per hash.
>>
>> Which Skein parameters did you use, and how much stack space was
>> required for it?  Skein-512 is described as needing 200 bytes of
>> state, IIRC (most of which, I assume, comes from the Threefish key
>> schedule).
>
> I believe I used Skein-256, but I'd have to dig to find it again.
>
> 	-hpa

Sadly I can't find the tree, but I'm 94% sure it was Skein-256
(specifically the SHA3-256 candidate parameter set.)

	-hpa
Re: drivers/char/random.c: more ruminations
> It's not something where, if it required massive changes, I'd
> necessarily feel the need to backport them to stable.  It's a
> certificational weakness, but it's not a disaster.

Agreed!  It's been there for years, and I'm not too worried.  It takes a
pretty tight race to cause the problem in the first place.  As you note,
it only happens with a full pool (already a very secure situation), and
the magnitude is limited by the size of entropy additions, which are
normally small.

I'm just never happy with bugs in security-critical code.  "I don't think
that bug is exploitable" is almost as ominous a phrase as "Y'all watch
this!"
linux-next: manual merge of the samsung tree with Linus' tree
Hi Kukjin,

Today's linux-next merge of the samsung tree got a conflict in
arch/arm/mach-exynos/sleep.S between commit 25a9ef63cd2b ("ARM: l2c:
exynos: convert to common l2c310 early resume functionality") from Linus'
tree and commit af728bd84cc8 ("ARM: EXYNOS: Fix build error with thumb2")
from the samsung tree.

I fixed it up (the former removed the code updated by the latter) and can
carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell    s...@canb.auug.org.au