Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
On Thu, Jun 12, 2014 at 02:25:43PM +0900, Minchan Kim wrote:
> On Thu, Jun 12, 2014 at 12:21:39PM +0900, Joonsoo Kim wrote:
> > We should free memory for bitmap when we find zone mis-match,
> > otherwise this memory will leak.
>
> Then, -stable stuff?

I don't think so. This is just a possible leak candidate, so we don't need to push this to the stable tree.

> >
> > Additionally, I copy code comment from ppc kvm's cma code to notify
> > why we need to check zone mis-match.
> >
> > Signed-off-by: Joonsoo Kim
> >
> > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> > index bd0bb81..fb0cdce 100644
> > --- a/drivers/base/dma-contiguous.c
> > +++ b/drivers/base/dma-contiguous.c
> > @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
> >  		base_pfn = pfn;
> >  		for (j = pageblock_nr_pages; j; --j, pfn++) {
> >  			WARN_ON_ONCE(!pfn_valid(pfn));
> > +			/*
> > +			 * alloc_contig_range requires the pfn range
> > +			 * specified to be in the same zone. Make this
> > +			 * simple by forcing the entire CMA resv range
> > +			 * to be in the same zone.
> > +			 */
> >  			if (page_zone(pfn_to_page(pfn)) != zone)
> > -				return -EINVAL;
> > +				goto err;
>
> At a first glance, I thought it would be better to handle such an error
> before activating.
> So when I looked at the registration code (ie, dma_contiguous_reserve_area),
> I realized it is impossible because we didn't set up the zone yet. :(
>
> If so, when we detect a failure here, wouldn't it be better to report a more
> meaningful error message, like what the successful zone was, what the new
> zone is, and the failing pfn number?

What I want to do in the early phase of this patchset is to make the cma code behind the DMA APIs similar to ppc kvm's cma code. ppc kvm's cma code already has this error handling logic, so I made this patch. If we think we need more, we can do that in the generic cma code after merging this patchset.

Thanks.
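The leak being fixed is the classic early-return-before-cleanup bug: the bitmap is allocated, and a later `return -EINVAL` skips freeing it. A minimal userspace sketch of the goto-based cleanup pattern the patch adopts (names like `fake_zone_of` and `activate_area` are illustrative, not the kernel API):

```c
#include <stdlib.h>

/* Illustrative stand-in for page_zone(pfn_to_page(pfn)). */
static int fake_zone_of(unsigned long pfn) { return pfn < 100 ? 0 : 1; }

static int activate_area(unsigned long base_pfn, unsigned long count)
{
	unsigned long *bitmap = calloc((count + 63) / 64, sizeof(*bitmap));
	unsigned long pfn;
	int zone;

	if (!bitmap)
		return -12; /* -ENOMEM */
	zone = fake_zone_of(base_pfn);
	for (pfn = base_pfn; pfn < base_pfn + count; pfn++) {
		if (fake_zone_of(pfn) != zone)
			goto err;	/* was "return -EINVAL": leaked bitmap */
	}
	/* in the kernel the bitmap is kept on success; freed here so the
	 * userspace sketch itself is leak-free */
	free(bitmap);
	return 0;
err:
	free(bitmap);	/* the point of the patch: free on the error path */
	return -22;	/* -EINVAL */
}
```

With the fake zone boundary at pfn 100, a range crossing it takes the error path and still frees the bitmap.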
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: Tree for Jun 12
Hi all,

The powerpc allyesconfig is again broken more than usual.

Changes since 20140611:

Dropped tree: drm-intel-fixes (build problems)

The drm-intel-fixes tree still had its build failure so I dropped it at the maintainer's request.

The samsung tree gained a conflict against Linus' tree.

The pci tree lost its build failure.

The net-next tree gained a conflict against Linus' tree and a build failure for which I reverted a commit.

The virtio tree gained a conflict against Linus' tree.

The target-updates tree gained a conflict against the virtio tree.

Non-merge commits (relative to Linus' tree): 3656
 2925 files changed, 115781 insertions(+), 54892 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory.

Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a multi_v7_defconfig for arm. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm defconfig.

Below is a summary of the state of the merge. I am currently merging 219 trees (counting Linus' and 29 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next .
If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (4251c2a67011 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux)
Merging fixes/master (4b660a7f5c80 Linux 3.15-rc6)
Merging kbuild-current/rc-fixes (38dbfb59d117 Linus 3.14-rc1)
Merging arc-current/for-curr (89ca3b881987 Linux 3.15-rc4)
Merging arm-current/fixes (3f8517e7937d ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs)
Merging m68k-current/for-linus (e8d6dc5ad26e m68k/hp300: Convert printk to pr_foo())
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging powerpc-merge/merge (8212f58a9b15 powerpc: Wire renameat2() syscall)
Merging sparc/master (8ecc1bad4c9b sparc64: fix format string mismatch in arch/sparc/kernel/sysfs.c)
Merging net/master (c5b46160877a net/core: Add VF link state control policy)
Merging ipsec/master (6d004d6cc739 vti: Use the tunnel mark for lookup in the error handlers.)
Merging sound-current/for-linus (6538de03a98f ALSA: hda - Add quirk for ABit AA8XE)
Merging pci-current/for-linus (d0b4cc4e3270 PCI: Wrong register used to check pending traffic)
Merging wireless/master (2c316e699fa4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging driver-core.current/driver-core-linus (4b660a7f5c80 Linux 3.15-rc6)
Merging tty.current/tty-linus (d6d211db37e7 Linux 3.15-rc5)
Merging usb.current/usb-linus (5dc2808c4729 xhci: delete endpoints from bandwidth list before freeing whole device)
Merging usb-gadget-fixes/fixes (886c7c426d46 usb: gadget: at91-udc: fix irq and iomem resource retrieval)
Merging staging.current/staging-linus (9326c5ca0982 staging: r8192e_pci: fix htons error)
Merging char-misc.current/char-misc-linus (d1db0eea8524 Linux 3.15-rc3)
Merging input-current/for-linus (a292241cccb7 Merge branch 'next' into for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (3901c1124ec5 crypto: s390 - fix aes,des ctr mode concurrency finding.)
Merging ide/master (5b40dd30bbfa ide: Fix SC1200 dependencies)
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (4b660a7f5c80 Linux 3.15-rc6)
Merging rr-fixes/fixes (79465d2fd48e module: remove warning about waiting module removal.)
Merging mfd-fixes/master (73beb63d290f mfd: rtsx_pcr: Disable interrupts before cancelling delayed works)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 'for-joerg/arm-smmu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into for-linus)
Merging drm-in
Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
On Thu, 12 Jun 2014 12:29:09 +0900, Masami Hiramatsu wrote:
> NO, ftrace_lookup_ip() returns NULL if the hash is empty, so adding
> !ftrace_hash_empty() is meaningless :)
>
> Actually, here I intended to have 3 meanings for the new/old_hash arguments,
> - If it is NULL, it hits all
> - If it is EMPTY_HASH, it hits nothing
> - If it has some entries, it hits those entries.
>
> And in ftrace.c (__ftrace_hash_rec_update), AFAICS, ops->filter_hash has only
> 2 meanings,
> - If it is EMPTY_HASH or NULL, it hits all
> - If it has some entries, it hits those entries.

Then I found an unrelated issue during review. It seems that checking only for a NULL other_hash in the 'all' case of __ftrace_hash_rec_update() is not sufficient; it should check the EMPTY_HASH case too. But then the check can be dropped entirely, since it is already covered by ftrace_lookup_ip().

Thanks,
Namhyung

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 13885590a184..8bd7aa69a479 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1545,7 +1545,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 		 * Only the filter_hash affects all records.
 		 * Update if the record is not in the notrace hash.
 		 */
-		if (!other_hash || !ftrace_lookup_ip(other_hash, rec->ip))
+		if (!ftrace_lookup_ip(other_hash, rec->ip))
 			match = 1;
 	} else {
 		in_hash = !!ftrace_lookup_ip(hash, rec->ip);
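The three-state convention Masami describes (NULL hits all, EMPTY_HASH hits nothing, a populated hash hits its entries) can be sketched with a toy hash in userspace. `tiny_hash`, `hash_lookup`, and `hash_hits` are illustrative stand-ins, not the kernel's `ftrace_hash` API:

```c
#include <stdbool.h>
#include <stddef.h>

struct tiny_hash { size_t count; unsigned long ips[8]; };

/* Like ftrace_lookup_ip(): a NULL or empty hash finds nothing. */
static bool hash_lookup(const struct tiny_hash *h, unsigned long ip)
{
	size_t i;

	if (!h)
		return false;
	for (i = 0; i < h->count; i++)
		if (h->ips[i] == ip)
			return true;
	return false;
}

/* The "hits" semantics used when updating records: NULL means match all,
 * an empty hash matches nothing, a populated hash matches its entries. */
static bool hash_hits(const struct tiny_hash *h, unsigned long ip)
{
	return !h || hash_lookup(h, ip);
}
```

This also shows why prefixing the lookup with an emptiness check is redundant, as noted above: `hash_lookup` already returns false for an empty hash.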
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
On Thu, Jun 12, 2014 at 02:18:53PM +0900, Minchan Kim wrote:
> Hi Joonsoo,
>
> On Thu, Jun 12, 2014 at 12:21:38PM +0900, Joonsoo Kim wrote:
> > We don't need explicit 'CMA:' prefix, since we already define prefix
> > 'cma:' in pr_fmt. So remove it.
> >
> > And, some logs print function name and others doesn't. This looks
> > bad to me, so I unify log format to print function name consistently.
> >
> > Lastly, I add one more debug log on cma_activate_area().
>
> When I take a look, it just indicates cma_activate_area was called or not,
> without what range for the area was reserved successfully, so I couldn't see
> the intention for the new message. The description should explain it so that
> everybody can agree on your claim.

Hello, I pasted the answer in the other thread. This pr_debug() comes from ppc kvm's kvm_cma_init_reserved_areas(). I want to keep all log messages as much as possible to reduce confusion from this generalization. If I need to respin this patchset, I will explain more about it.

Thanks.
Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
On Thu, Jun 12, 2014 at 12:21:41PM +0900, Joonsoo Kim wrote: > ppc kvm's cma area management needs alignment constraint on > cma region. So support it to prepare generalization of cma area > management functionality. > > Additionally, add some comments which tell us why alignment > constraint is needed on cma region. > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 8a44c82..bc4c171 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > > struct cma { > unsigned long base_pfn; > @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); > * @size: Size of the reserved area (in bytes), > * @base: Base address of the reserved area optional, use 0 for any > * @limit: End address of the reserved memory (optional, 0 for any). > + * @alignment: Alignment for the contiguous memory area, should be power of 2 > * @res_cma: Pointer to store the created cma region. > * @fixed: hint about where to place the reserved area > * Pz, move the all description to new API function rather than internal one. > @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); > */ > static int __init __dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, phys_addr_t limit, > + phys_addr_t alignment, > struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > - phys_addr_t alignment; > int ret = 0; > > - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, > - (unsigned long)size, (unsigned long)base, > - (unsigned long)limit); > + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", Why is it called by "align_order"? 
> + __func__, (unsigned long)size, (unsigned long)base, > + (unsigned long)limit, (unsigned long)alignment); > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > @@ -253,8 +255,17 @@ static int __init > __dma_contiguous_reserve_area(phys_addr_t size, > if (!size) > return -EINVAL; > > - /* Sanitise input arguments */ > - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); > + if (alignment && !is_power_of_2(alignment)) > + return -EINVAL; > + > + /* > + * Sanitise input arguments. > + * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, > + * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism I'm not a native but try for clear documenation. Pages both ends in CMA area could be merged into adjacent unmovable migratetype page by page allocator's buddy algorithm. In the case, you couldn't get a contiguous memory, which is not what we want. > + * and CMA property will be broken. > + */ > + alignment = max(alignment, > + (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); > base = ALIGN(base, alignment); > size = ALIGN(size, alignment); > limit &= ~(alignment - 1); > @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, > { > int ret; > > - ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); > + ret = __dma_contiguous_reserve_area(size, base, limit, 0, > + res_cma, fixed); > if (ret) > return ret; > > -- > 1.7.9.5 -- Kind regards, Minchan Kim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
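The sanitisation step in the patch has two parts: reject a caller-supplied alignment that is not a power of two, then raise it to at least the minimum CMA alignment before rounding base/size. A userspace sketch (`MIN_CMA_ALIGN` is an illustrative stand-in for `PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)`; `sanitise_alignment` is not a kernel function):

```c
#include <stdint.h>

#define MIN_CMA_ALIGN (1ULL << 22)	/* stand-in minimum CMA alignment */

static int is_power_of_2(uint64_t n) { return n != 0 && (n & (n - 1)) == 0; }

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/* Returns the effective alignment, or 0 for an invalid request
 * (the kernel code returns -EINVAL in that case). */
static uint64_t sanitise_alignment(uint64_t requested)
{
	if (requested && !is_power_of_2(requested))
		return 0;
	return requested > MIN_CMA_ALIGN ? requested : MIN_CMA_ALIGN;
}
```

A requested alignment of 0 means "use the default", mirroring how `dma_contiguous_reserve_area()` passes 0 through to the internal helper in the patch.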
Re: [PATCH v3] usb: host: uhci-grlib.c : use devm_ functions
On 2014-06-11 20:38, Himangi Saraogi wrote: The various devm_ functions allocate memory that is released when a driver detaches. This patch uses devm_ioremap_resource for data that is allocated in the probe function of a platform device and is only freed in the remove function. The corresponding free functions are removed and two labels are done away with. Also, linux/device.h is added to make sure the devm_*() routine declarations are unambiguously available. Signed-off-by: Himangi Saraogi Looks and works fine now! Acked-by: Andreas Larsson Best regards, Andreas Larsson --- Not compile tested due to incompatible architecture. v3: pass correct arguments to devm_ioremap_resource drivers/usb/host/uhci-grlib.c | 31 +-- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/drivers/usb/host/uhci-grlib.c b/drivers/usb/host/uhci-grlib.c index ab25dc3..05f57ff 100644 --- a/drivers/usb/host/uhci-grlib.c +++ b/drivers/usb/host/uhci-grlib.c @@ -17,6 +17,7 @@ * (C) Copyright 2004-2007 Alan Stern, st...@rowland.harvard.edu */ +#include #include #include #include @@ -113,24 +114,17 @@ static int uhci_hcd_grlib_probe(struct platform_device *op) hcd->rsrc_start = res.start; hcd->rsrc_len = resource_size(); - if (!request_mem_region(hcd->rsrc_start, hcd->rsrc_len, hcd_name)) { - printk(KERN_ERR "%s: request_mem_region failed\n", __FILE__); - rv = -EBUSY; - goto err_rmr; - } - irq = irq_of_parse_and_map(dn, 0); if (irq == NO_IRQ) { printk(KERN_ERR "%s: irq_of_parse_and_map failed\n", __FILE__); rv = -EBUSY; - goto err_irq; + goto err_usb; } - hcd->regs = ioremap(hcd->rsrc_start, hcd->rsrc_len); - if (!hcd->regs) { - printk(KERN_ERR "%s: ioremap failed\n", __FILE__); - rv = -ENOMEM; - goto err_ioremap; + hcd->regs = devm_ioremap_resource(>dev, ); + if (IS_ERR(hcd->regs)) { + rv = PTR_ERR(hcd->regs); + goto err_irq; } uhci = hcd_to_uhci(hcd); @@ -139,18 +133,14 @@ static int uhci_hcd_grlib_probe(struct platform_device *op) rv = usb_add_hcd(hcd, irq, 0); if (rv) - goto 
err_uhci; + goto err_irq; device_wakeup_enable(hcd->self.controller); return 0; -err_uhci: - iounmap(hcd->regs); -err_ioremap: - irq_dispose_mapping(irq); err_irq: - release_mem_region(hcd->rsrc_start, hcd->rsrc_len); -err_rmr: + irq_dispose_mapping(irq); +err_usb: usb_put_hcd(hcd); return rv; @@ -164,10 +154,7 @@ static int uhci_hcd_grlib_remove(struct platform_device *op) usb_remove_hcd(hcd); - iounmap(hcd->regs); irq_dispose_mapping(hcd->irq); - release_mem_region(hcd->rsrc_start, hcd->rsrc_len); - usb_put_hcd(hcd); return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
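The point of the devm_ conversion above is that resources tied to a device are released centrally when the driver detaches, so probe-time error paths need fewer unwinding labels. A toy userspace analogue of that managed-resource idea (the `devres_*` names and `toy_dev` struct are illustrative, not the kernel devres API):

```c
#include <stdlib.h>

struct devres { void *ptr; struct devres *next; };
struct toy_dev { struct devres *res; };

/* Allocate memory tied to the device; it is tracked on a list instead of
 * being freed by hand on every error path. */
static void *devres_alloc(struct toy_dev *d, size_t n)
{
	struct devres *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->ptr = calloc(1, n);
	if (!r->ptr) {
		free(r);
		return NULL;
	}
	r->next = d->res;
	d->res = r;
	return r->ptr;
}

/* One central release, analogous to what the driver core does on detach. */
static void devres_release_all(struct toy_dev *d)
{
	while (d->res) {
		struct devres *r = d->res;

		d->res = r->next;
		free(r->ptr);
		free(r);
	}
}
```

This is why the patch can delete `err_rmr`/`err_ioremap` and the explicit `iounmap`/`release_mem_region` calls: the managed variants are released for the driver.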
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On Wed, Jun 11, 2014 at 09:37:55PM -0400, Long, Wai Man wrote: > > On 6/11/2014 6:54 AM, Peter Zijlstra wrote: > >On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: > >>Enabling this configuration feature causes a slight decrease the > >>performance of an uncontended lock-unlock operation by about 1-2% > >>mainly due to the use of a static key. However, uncontended lock-unlock > >>operation are really just a tiny percentage of a real workload. So > >>there should no noticeable change in application performance. > >No, entirely unacceptable. > > > >>+#ifdef CONFIG_VIRT_UNFAIR_LOCKS > >>+/** > >>+ * queue_spin_trylock_unfair - try to acquire the queue spinlock unfairly > >>+ * @lock : Pointer to queue spinlock structure > >>+ * Return: 1 if lock acquired, 0 if failed > >>+ */ > >>+static __always_inline int queue_spin_trylock_unfair(struct qspinlock > >>*lock) > >>+{ > >>+ union arch_qspinlock *qlock = (union arch_qspinlock *)lock; > >>+ > >>+ if (!qlock->locked && (cmpxchg(>locked, 0, _Q_LOCKED_VAL) == 0)) > >>+ return 1; > >>+ return 0; > >>+} > >>+ > >>+/** > >>+ * queue_spin_lock_unfair - acquire a queue spinlock unfairly > >>+ * @lock: Pointer to queue spinlock structure > >>+ */ > >>+static __always_inline void queue_spin_lock_unfair(struct qspinlock *lock) > >>+{ > >>+ union arch_qspinlock *qlock = (union arch_qspinlock *)lock; > >>+ > >>+ if (likely(cmpxchg(>locked, 0, _Q_LOCKED_VAL) == 0)) > >>+ return; > >>+ /* > >>+* Since the lock is now unfair, we should not activate the 2-task > >>+* pending bit spinning code path which disallows lock stealing. > >>+*/ > >>+ queue_spin_lock_slowpath(lock, -1); > >>+} > >Why is this needed? > > I added the unfair version of lock and trylock as my original version isn't > a simple test-and-set lock. Now I changed the core part to use the simple > test-and-set lock. 
However, I still think that an unfair version in the fast > path can be helpful to performance when both the unfair lock and paravirt > spinlock are enabled. In this case, paravirt spinlock code will disable the > unfair lock code in the slowpath, but still allow the unfair version in the > fast path to get the best possible performance in a virtual guest. > > Yes, I could take that out to allow either unfair or paravirt spinlock, but > not both. I do think that a little bit of unfairness will help in the > virtual environment. When will you learn to like simplicity and stop this massive over engineering effort? There's no sane reason to have the test-and-set virt and paravirt locks enabled at the same bloody time. There's 3 distinct cases: - native - virt - paravirt And they do not overlap. Furthermore, if there is any possibility at all of not polluting the native code, grab it with both hands. Native performance is king, try your very utmost bestest to preserve that, paravirt is a distant second and nobody sane should care about the virt case at all. If you want extra lock stealing in the paravirt case, put it in the slowpath code before you start queueing. pgpji3CPU64HJ.pgp Description: PGP signature
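The "unfair" fast path being debated is just a plain test-and-set acquired via compare-and-swap: there is no queue, so a late arrival can steal the lock from earlier waiters. A userspace sketch using C11 atomics (this mirrors the shape of the cmpxchg fast path only; it is not the kernel's qspinlock API):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int locked; } ts_lock;

static bool ts_trylock(ts_lock *l)
{
	int expected = 0;

	/* Succeeds only if the lock word was 0. A thread that arrives
	 * later can win this race against a spinning earlier waiter,
	 * which is exactly what makes the lock unfair (no FIFO order). */
	return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

static void ts_unlock(ts_lock *l)
{
	atomic_store(&l->locked, 0);
}
```

Peter's objection below is about keeping this test-and-set path strictly separate from the native queued path, not about the mechanism itself.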
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
On Thu, Jun 12, 2014 at 10:11:19AM +0530, Aneesh Kumar K.V wrote: > Joonsoo Kim writes: > > > We don't need explicit 'CMA:' prefix, since we already define prefix > > 'cma:' in pr_fmt. So remove it. > > > > And, some logs print function name and others doesn't. This looks > > bad to me, so I unify log format to print function name consistently. > > > > Lastly, I add one more debug log on cma_activate_area(). > > > > Signed-off-by: Joonsoo Kim > > > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > > index 83969f8..bd0bb81 100644 > > --- a/drivers/base/dma-contiguous.c > > +++ b/drivers/base/dma-contiguous.c > > @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) > > } > > > > if (selected_size && !dma_contiguous_default_area) { > > - pr_debug("%s: reserving %ld MiB for global area\n", __func__, > > + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, > > (unsigned long)selected_size / SZ_1M); > > Do we need to do function(), or just function:. I have seen the later > usage in other parts of the kernel. Hello, I also haven't seen this format in other kernel code, but, in cma, they use this format as following. function(arg1, arg2, ...): some message If we all dislike this format, we can change it after merging this patchset. Until then, it seems better to me to leave it as is. > > > > > dma_contiguous_reserve_area(selected_size, selected_base, > > @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) > > unsigned i = cma->count >> pageblock_order; > > struct zone *zone; > > > > - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > > + pr_debug("%s()\n", __func__); > > why ? > This pr_debug() comes from ppc kvm's kvm_cma_init_reserved_areas(). I want to maintain all log messages as much as possible to reduce confusion with this generalization. Thanks. 
Re: [PATCH 1/2] mm: mark remap_file_pages() syscall as deprecated
Hi Kirill, On Thu, May 8, 2014 at 2:41 PM, Kirill A. Shutemov wrote: > The remap_file_pages() system call is used to create a nonlinear mapping, > that is, a mapping in which the pages of the file are mapped into a > nonsequential order in memory. The advantage of using remap_file_pages() > over using repeated calls to mmap(2) is that the former approach does not > require the kernel to create additional VMA (Virtual Memory Area) data > structures. > > Supporting of nonlinear mapping requires significant amount of non-trivial > code in kernel virtual memory subsystem including hot paths. Also to get > nonlinear mapping work kernel need a way to distinguish normal page table > entries from entries with file offset (pte_file). Kernel reserves flag in > PTE for this purpose. PTE flags are scarce resource especially on some CPU > architectures. It would be nice to free up the flag for other usage. > > Fortunately, there are not many users of remap_file_pages() in the wild. > It's only known that one enterprise RDBMS implementation uses the syscall > on 32-bit systems to map files bigger than can linearly fit into 32-bit > virtual address space. This use-case is not critical anymore since 64-bit > systems are widely available. > > The plan is to deprecate the syscall and replace it with an emulation. > The emulation will create new VMAs instead of nonlinear mappings. It's > going to work slower for rare users of remap_file_pages() but ABI is > preserved. > > One side effect of emulation (apart from performance) is that user can hit > vm.max_map_count limit more easily due to additional VMAs. See comment for > DEFAULT_MAX_MAP_COUNT for more details on the limit. Best to CC linux-api@ (https://www.kernel.org/doc/man-pages/linux-api-ml.html) on patches like this, as well as the man-pages maintainer, so that something goes into the man page. 
I added the following into the man page:

       Note: this system call is (since Linux 3.16) deprecated and
       will eventually be replaced by a slower in-kernel emulation.
       Those few applications that use this system call should
       consider migrating to alternatives.

Okay?

Cheers,
Michael

> Signed-off-by: Kirill A. Shutemov
> ---
>  Documentation/vm/remap_file_pages.txt | 28
>  mm/fremap.c                           |  4
>  2 files changed, 32 insertions(+)
>  create mode 100644 Documentation/vm/remap_file_pages.txt
>
> diff --git a/Documentation/vm/remap_file_pages.txt b/Documentation/vm/remap_file_pages.txt
> new file mode 100644
> index ..560e4363a55d
> --- /dev/null
> +++ b/Documentation/vm/remap_file_pages.txt
> @@ -0,0 +1,28 @@
> +The remap_file_pages() system call is used to create a nonlinear mapping,
> +that is, a mapping in which the pages of the file are mapped into a
> +nonsequential order in memory. The advantage of using remap_file_pages()
> +over using repeated calls to mmap(2) is that the former approach does not
> +require the kernel to create additional VMA (Virtual Memory Area) data
> +structures.
> +
> +Supporting of nonlinear mapping requires significant amount of non-trivial
> +code in kernel virtual memory subsystem including hot paths. Also to get
> +nonlinear mapping work kernel need a way to distinguish normal page table
> +entries from entries with file offset (pte_file). Kernel reserves flag in
> +PTE for this purpose. PTE flags are scarce resource especially on some CPU
> +architectures. It would be nice to free up the flag for other usage.
> +
> +Fortunately, there are not many users of remap_file_pages() in the wild.
> +It's only known that one enterprise RDBMS implementation uses the syscall
> +on 32-bit systems to map files bigger than can linearly fit into 32-bit
> +virtual address space. This use-case is not critical anymore since 64-bit
> +systems are widely available.
> +
> +The plan is to deprecate the syscall and replace it with an emulation.
> +The emulation will create new VMAs instead of nonlinear mappings. It's > +going to work slower for rare users of remap_file_pages() but ABI is > +preserved. > + > +One side effect of emulation (apart from performance) is that user can hit > +vm.max_map_count limit more easily due to additional VMAs. See comment for > +DEFAULT_MAX_MAP_COUNT for more details on the limit. > diff --git a/mm/fremap.c b/mm/fremap.c > index 34feba60a17e..12c3bb63b7f9 100644 > --- a/mm/fremap.c > +++ b/mm/fremap.c > @@ -152,6 +152,10 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, > unsigned long, size, > int has_write_lock = 0; > vm_flags_t vm_flags = 0; > > + pr_warn_once("%s (%d) uses depricated remap_file_pages() syscall. " > + "See Documentation/vm/remap_file_pages.txt.\n", > + current->comm, current->pid); > + > if (prot) > return err; > /* > -- > 2.0.0.rc2 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm'
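The alternative the deprecation text points applications at is to build the nonlinear layout themselves: reserve a region, then place each file page with a separate `MAP_FIXED` mmap() over it. Each mmap creates its own VMA, which is exactly the `vm.max_map_count` cost the patch description mentions. A minimal sketch mapping two file pages in swapped order (`map_pages_swapped` is an illustrative helper, not a library function):

```c
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

/* Map file page 1 at offset 0 of the region and file page 0 after it,
 * i.e. the pages appear in nonsequential (swapped) order in memory. */
static char *map_pages_swapped(int fd, long pagesz)
{
	/* Reserve two pages of contiguous address space. */
	char *base = mmap(NULL, 2 * pagesz, PROT_READ, MAP_PRIVATE, fd, 0);

	if (base == MAP_FAILED)
		return NULL;
	/* MAP_FIXED atomically replaces the existing mapping at that
	 * address, so each call pins one file page where we want it. */
	if (mmap(base, pagesz, PROT_READ, MAP_PRIVATE | MAP_FIXED,
		 fd, pagesz) == MAP_FAILED)
		return NULL;
	if (mmap(base + pagesz, pagesz, PROT_READ, MAP_PRIVATE | MAP_FIXED,
		 fd, 0) == MAP_FAILED)
		return NULL;
	return base;
}
```

The trade-off matches the text above: this uses three VMAs (later merged to two) where remap_file_pages() needed one, but it works on any kernel with no nonlinear-mapping support at all.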
Re: Re: [PATCH ftrace/core 0/2] ftrace, kprobes: Introduce IPMODIFY flag for ftrace_ops to detect conflicts
(2014/06/12 1:58), Josh Poimboeuf wrote: > On Tue, Jun 10, 2014 at 10:50:01AM +, Masami Hiramatsu wrote: >> Hi, >> >> Here is a pair of patches which introduces IPMODIFY flag for >> ftrace_ops to detect conflicts of ftrace users who can modify >> regs->ip in their handler. >> Currently, only kprobes can change the regs->ip in the handler, >> but recently kpatch is also want to change it. Moreover, since >> the ftrace itself exported to modules, it might be considerable >> senario. >> >> Here we talked on github. >> https://github.com/dynup/kpatch/issues/47 >> >> To protect modified regs-ip from each other, this series >> introduces FTRACE_OPS_FL_IPMODIFY flag and ftrace now ensures >> the flag can be set on each function entry location. If there >> is someone who already reserve regs->ip on target function >> entry, ftrace_set_filter_ip or register_ftrace_function will >> return -EBUSY. Users must handle that. >> >> At this point, all kprobes will reserve regs->ip, since jprobe >> requires it. > > Masami, thanks very much for this! > > One issue with this approach is that it _always_ makes kprobes and > kpatch incompatible when probing/patching the same function, even when > kprobes doesn't need to touch regs->ip. Right. > Is it possible to add a kprobes flag (KPROBE_FLAG_IPMODIFY), which is > only set by those kprobes users (just jprobes?) which need to modify IP? > Then kprobes could only set the corresponding ftrace flag when it's > really needed. And I think kprobes could even enforce the fact that > !KPROBE_FLAG_IPMODIFY users don't change regs->ip. No, actually we don't need that additional flag, we can slightly change the kprobes behavior(spec) that requires setting kprobe->break_handler a function if it modifies regs->ip. (this doesn't break jprobe) The problem is that we need a separate ftrace_ops for jprobe and other probes which can change the regs->ip. But current kprobes don't expected that such case... 
> BTW, I've done some testing with this patch set by patching/probing the > same function with FTRACE_OPS_FL_IPMODIFY, and got some warnings. I saw > the following warning when attempting to kpatch a kprobed function: Ah, thanks for testing! I think it needs more work on failure path. > > WARNING: CPU: 2 PID: 18351 at kernel/trace/ftrace.c:419 > __unregister_ftrace_function+0x1be/0x1d0() > Modules linked in: kpatch_meminfo_string(OE+) kpatch(OE) > stap_8d70d6e041605bd1e144cba4801652_14636(OE) rfcomm fuse ipt_MASQUERADE ccm > xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack bnep ebtable_nat > ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat > nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle > ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack > iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic > x86_pkg_temp_thermal coretemp kvm_intel snd_hda_intel iTCO_wdt > iTCO_vendor_support snd_hda_controller kvm snd_hda_codec iwlwifi snd_hwdep > uvcvideo snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device >videobuf2_core btusb v4l2_common snd_pcm videodev nfsd cfg80211 microcode > e1000e bluetooth media thinkpad_acpi joydev sdhci_pci sdhci pcspkr serio_raw > snd_timer i2c_i801 snd mmc_core auth_rpcgss mei_me mei lpc_ich mfd_core > shpchp ptp pps_core wmi tpm_tis soundcore tpm rfkill nfs_acl lockd sunrpc > dm_crypt i915 i2c_algo_bit drm_kms_helper drm crct10dif_pclmul crc32_pclmul > crc32c_intel ghash_clmulni_intel i2c_core video > CPU: 2 PID: 18351 Comm: insmod Tainted: GW OE 3.15.0-IPMODIFY+ #1 > Hardware name: LENOVO 2356BH8/2356BH8, BIOS G7ET63WW (2.05 ) 11/12/2012 > b39bd289 8803b78d7bc0 816f31ed > 8803b78d7bf8 8108914d a07f9040 >fff0 0001 8803e7ac4200 > Call Trace: >[] dump_stack+0x45/0x56 >[] warn_slowpath_common+0x7d/0xa0 >[] warn_slowpath_null+0x1a/0x20 >[] 
__unregister_ftrace_function+0x1be/0x1d0 >[] ftrace_startup+0x1e4/0x220 >[] register_ftrace_function+0x43/0x60 >[] kpatch_register+0x664/0x830 [kpatch] >[] ? 0xa080 >[] ? 0xa080 >[] patch_init+0x194/0x1000 [kpatch_meminfo_string] >[] ? 0xa0045fff >[] do_one_initcall+0xd4/0x210 >[] ? set_memory_nx+0x43/0x50 >[] load_module+0x1d92/0x25e0 >[] ? store_uevent+0x70/0x70 >[] ? kernel_read+0x50/0x80 >[] SyS_finit_module+0xa6/0xd0 >[] system_call_fastpath+0x16/0x1b > > > That warning happened because __unregister_ftrace_function() doesn't > expect FTRACE_OPS_FL_ENABLED to be cleared in the ftrace_startup error > path. Ah, right! I'll fix that. > I tried removing the FTRACE_OPS_FL_ENABLED clearing line in > ftrace_startup, but I saw more warnings. This one
Re: [RFC PATCH 00/13][V3] kexec: A new system call to allow in kernel loading
On 06/03/14 at 09:06am, Vivek Goyal wrote:
> Hi,
>
> This is V3 of the patchset. Previous versions were posted here.
>
> V1: https://lkml.org/lkml/2013/11/20/540
> V2: https://lkml.org/lkml/2014/1/27/331
>
> Changes since v2:
>
> - Took care of most of the review comments from V2.
> - Added support for kexec/kdump on EFI systems.
> - Dropped support for loading ELF vmlinux.
>
> This patch series is generated on top of 3.15.0-rc8. It also requires a
> two patch cleanup series which is sitting in -tip tree here.
>
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/log/?h=x86/boot
>
> This patch series does not do kernel signature verification yet. I plan
> to post another patch series for that. Now bzImage is already signed
> with PKCS7 signature I plan to parse and verify those signatures.
>
> Primary goal of this patchset is to prepare groundwork so that kernel
> image can be signed and signatures be verified during kexec load. This
> should help with two things.
>
> - It should allow kexec/kdump on secureboot enabled machines.
>
> - In general it can help even without secureboot. By being able to verify
>   kernel image signature in kexec, it should help with avoiding module
>   signing restrictions. Matthew Garret showed how to boot into a custom
>   kernel, modify first kernel's memory and then jump back to old kernel and
>   bypass any policy one wants to.
>
> Any feedback is welcome.

Hi, Vivek

For the efi ioremapping case, in the 3.15 kernel efi runtime maps will not be saved if efi=old_map is used. So you need to detect this and fail the kexec file load. Otherwise the patchset works for me.

Thanks
Dave
Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
Hi Masami, On Thu, 12 Jun 2014 12:29:09 +0900, Masami Hiramatsu wrote: > (2014/06/11 16:41), Namhyung Kim wrote: >> Hi Masami, >> >> On Wed, 11 Jun 2014 10:28:01 +0900, Masami Hiramatsu wrote: >>> (2014/06/10 22:53), Namhyung Kim wrote: Hi Masami, 2014-06-10 (ํ), 10:50 +, Masami Hiramatsu: > Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among > + /* Update rec->flags */ > + do_for_each_ftrace_rec(pg, rec) { > + /* We need to update only differences of filter_hash */ > + in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip); > + in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip); Why not use ftrace_hash_empty() here instead of checking NULL? >>> >>> Ah, a trick is here. Since an empty filter_hash must hit all, we can not >>> enable/disable filter_hash if we use ftrace_hash_empty() here. >>> >>> To enabling the new_hash, old_hash must be EMPTY_HASH which means in_old >>> always be false. To disabling, new_hash is EMPTY_HASH too. >>> Please see ftrace_hash_ipmodify_enable/disable/update(). >> >> I'm confused. 8-p I guess what you want to do is checking records in >> either of the filter_hash, right? If so, what about this? >> >> in_old = !ftrace_hash_empty(old_hash) && ftrace_lookup_ip(old_hash, >> rec->ip); >> in_new = !ftrace_hash_empty(new_hash) && ftrace_lookup_ip(new_hash, >> rec->ip); > > NO, ftrace_lookup_ip() returns NULL if the hash is empty, so adding > !ftrace_hash_empty() is meaningless :) Ah, you're right! > > Actually, here I intended to have 3 meanings for the new/old_hash arguments, > - If it is NULL, it hits all > - If it is EMPTY_HASH, it hits nothing > - If it has some entries, it hits those entries. > > And in ftrace.c(__ftrace_hash_rec_update), AFAICS, ops->filter_hash has only > 2 meanings, > - If it is EMPTY_HASH or NULL, it hits all > - If it has some entries, it hits those entries. > > So I had to do above change... 
Then I propose to use a different value/symbol instead of EMPTY_HASH in order to prevent future confusion and add some comments there. [SNIP] > +static int ftrace_hash_ipmodify_enable(struct ftrace_ops *ops) > +{ > + struct ftrace_hash *hash = ops->filter_hash; > + > + if (ftrace_hash_empty(hash)) > + hash = NULL; > + > + return __ftrace_hash_update_ipmodify(ops, EMPTY_HASH, hash); > +} Please see above comment. You can pass an empty hash as is, or pass NULL as second arg. The same goes to below... >>> >>> As I said above, that is by design :). EMPTY_HASH means it hits nothing, >>> NULL means it hits all. >> >> But doesn't it make unrelated records also get the flag updated? I'm >> curious when new_hash can be empty on _enable() case.. > > NO, _enable() is called right before ftrace_hash_rec_enable(ops,1) which > always enables filter_hash (since the 2nd arg is 1). If the filter_hash > is empty, ftrace_hash_rec_enable() enables ftrace_ops on all ftrace_recs. But AFAICS both of kprobes and kpatch call ftrace_set_filter_ip() before calling register_ftrace_function(). That means there's no case when ops->filter_hash can be empty, right? > > Ah, but I found I made a redundant mistake (different one) in > ftrace_hash_move(), > ftrace_hash_ipmodify_update() should be done only if "enable" is set (that > means ftrace_hash_move() updates filter_hash, not notrace_hash). > I'll update this patch. Right. Thanks, Namhyung
Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
On Thu, Jun 12, 2014 at 12:21:40PM +0900, Joonsoo Kim wrote: > To prepare future generalization work on cma area management code, > we need to separate core cma management codes from DMA APIs. > We will extend these core functions to cover requirements of > ppc kvm's cma area management functionality in following patches. > This separation helps us not to touch DMA APIs while extending > core functions. > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index fb0cdce..8a44c82 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); > * If @fixed is true, reserve contiguous area at exactly @base. If false, > * reserve in range from @base to @limit. > */ > -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > -phys_addr_t limit, struct cma **res_cma, > -bool fixed) > +static int __init __dma_contiguous_reserve_area(phys_addr_t size, > + phys_addr_t base, phys_addr_t limit, > + struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > phys_addr_t alignment; > @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > > pr_info("%s(): reserved %ld MiB at %08lx\n", > __func__, (unsigned long)size / SZ_1M, (unsigned long)base); > - > - /* Architecture specific contiguous memory fixup. */ > - dma_contiguous_early_fixup(base, size); > return 0; > + > err: > pr_err("%s(): failed to reserve %ld MiB\n", > __func__, (unsigned long)size / SZ_1M); > return ret; > } > > +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > +phys_addr_t limit, struct cma **res_cma, > +bool fixed) > +{ > + int ret; > + > + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); > + if (ret) > + return ret; > + > + /* Architecture specific contiguous memory fixup. 
*/ > + dma_contiguous_early_fixup(base, size); In old, base and size are aligned with alignment and passed into arch fixup but your patch is changing it. I didn't look at what kinds of side effect it makes but just want to confirm. > + > + return 0; > +} > + > static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) > { > mutex_lock(>lock); > @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned > long pfn, int count) > * global one. Requires architecture specific dev_get_cma_area() helper > * function. > */ > -struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, > unsigned int align) > { > unsigned long mask, pfn, pageno, start = 0; > - struct cma *cma = dev_get_cma_area(dev); > struct page *page = NULL; > int ret; > > if (!cma || !cma->count) > return NULL; > > - if (align > CONFIG_CMA_ALIGNMENT) > - align = CONFIG_CMA_ALIGNMENT; > - > pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, >count, align); > > @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > return page; > } > Please move the description in __dma_alloc_from_contiguous to here exported API. > +struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +unsigned int align) > +{ > + struct cma *cma = dev_get_cma_area(dev); > + > + if (align > CONFIG_CMA_ALIGNMENT) > + align = CONFIG_CMA_ALIGNMENT; > + > + return __dma_alloc_from_contiguous(cma, count, align); > +} > + > /** > * dma_release_from_contiguous() - release allocated pages > * @dev: Pointer to device for which the pages were allocated. > @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > * It returns false when provided pages do not belong to contiguous area and > * true otherwise. 
> */ > -bool dma_release_from_contiguous(struct device *dev, struct page *pages, > +static bool __dma_release_from_contiguous(struct cma *cma, struct page > *pages, >int count) > { > - struct cma *cma = dev_get_cma_area(dev); > unsigned long pfn; > > if (!cma || !pages) > @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, > struct page *pages, > > return true; > } > + Ditto. > +bool dma_release_from_contiguous(struct device *dev, struct page *pages, > + int count) > +{ > + struct cma *cma =
[Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
I've been seeing some ext4 corruption with recent kernels under qemu-system-arm. This issue seems to crop up after shutting down uncleanly (terminating qemu), shortly after booting, about 50% of the time. ext4/mmc related dmesg details are:

[3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at 0x10005000 irq 41,42 (pio)
[3.268316] mmc0: new SDHC card at address 4567
[3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
[3.315699] mmcblk0: p1 p2 p3 p4 < p5 p6 >
...
[ 11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
[ 11.904714] EXT4-fs (mmcblk0p5): recovery complete
[ 11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: nomblk_io_submit,errors=panic
...
[ 91.558824] EXT4-fs error (device mmcblk0p5): ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in gd; block bitmap corrupt.
[ 91.560641] Aborting journal on device mmcblk0p5-8.
[ 91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5): panic forced after error
[ 91.562589]
[ 91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
[ 91.564616] [] (unwind_backtrace) from [] (show_stack+0x11/0x14)
[ 91.565154] [] (show_stack) from [] (dump_stack+0x59/0x7c)
[ 91.565666] [] (dump_stack) from [] (panic+0x67/0x178)
[ 91.566147] [] (panic) from [] (ext4_handle_error+0x69/0x74)
[ 91.566659] [] (ext4_handle_error) from [] (__ext4_grp_locked_error+0x6b/0x160)
[ 91.567223] [] (__ext4_grp_locked_error) from [] (ext4_mb_generate_buddy+0x1b1/0x29c)
[ 91.567860] [] (ext4_mb_generate_buddy) from [] (ext4_mb_init_cache+0x219/0x4e0)
[ 91.568473] [] (ext4_mb_init_cache) from [] (ext4_mb_init_group+0xbb/0x138)
[ 91.569021] [] (ext4_mb_init_group) from [] (ext4_mb_good_group+0xf3/0xfc)
[ 91.569659] [] (ext4_mb_good_group) from [] (ext4_mb_regular_allocator+0x153/0x2c4)
[ 91.570250] [] (ext4_mb_regular_allocator) from [] (ext4_mb_new_blocks+0x2fd/0x4e4)
[ 91.570868] [] (ext4_mb_new_blocks) from [] (ext4_ext_map_blocks+0x965/0x10bc)
[ 91.571444] []
(ext4_ext_map_blocks) from [] (ext4_map_blocks+0xfb/0x36c)
[ 91.571992] [] (ext4_map_blocks) from [] (mpage_map_and_submit_extent+0x99/0x5f0)
[ 91.572614] [] (mpage_map_and_submit_extent) from [] (ext4_writepages+0x2b9/0x4e8)
[ 91.573201] [] (ext4_writepages) from [] (do_writepages+0x19/0x28)
[ 91.573709] [] (do_writepages) from [] (__filemap_fdatawrite_range+0x3d/0x44)
[ 91.574265] [] (__filemap_fdatawrite_range) from [] (filemap_flush+0x23/0x28)
[ 91.574854] [] (filemap_flush) from [] (ext4_rename+0x2f9/0x3e4)
[ 91.575360] [] (ext4_rename) from [] (vfs_rename+0x183/0x45c)
[ 91.575911] [] (vfs_rename) from [] (SyS_renameat2+0x22b/0x26c)
[ 91.576460] [] (SyS_renameat2) from [] (SyS_rename+0x1f/0x24)
[ 91.576961] [] (SyS_rename) from [] (ret_fast_syscall+0x1/0x5c)

Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq before DATA irq).

Which I guess shouldn't be surprising, as I saw problems with that patch earlier in the 3.15-rc cycle: https://lkml.org/lkml/2014/4/14/824

However that discussion petered out (possibly my fault for not following up) as to whether it was an issue with the patch or an issue with qemu. Then the original issue disappeared for me, which I figured was due to a fix upstream, but now I'm guessing it coincided with me updating my system and getting qemu v2.0 (whereas previously I was on 1.5).

$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright (c) 2003-2008 Fabrice Bellard

While the previous behavior was annoying and kept my emulated environments from booting, this one, while a bit more rare and subtle, eats the disks, which is much more painful for my testing. Unfortunately reverting the change (manually, as it doesn't revert cleanly anymore) doesn't seem to completely avoid the issue, so the bisection may have gone slightly astray (though it is interesting it landed on the same commit I earlier had trouble with).
So I'll back-track and double check some of the last few "good" results to validate I didn't just luck into 3 good boots accidentally. I'll also review my revert in case I missed something subtle in doing it manually.

Anyway, if there are any thoughts on how to better chase this down and debug it, I'd appreciate it! I can also provide reproduction instructions with a pre-built Linaro android disk image and hand-built kernel if anyone wants to debug this themselves.

thanks
-john
[f2fs-dev][PATCH 3/3] f2fs: avoid to truncate non-updated page partially
After we call find_data_page in truncate_partial_data_page, we cannot guarantee the page is up-to-date, as an error may have occurred in a lower layer. We'd better check the status of the page so that a non-updated page is not written back to the device.

Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9c49c59..fc569ca 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -380,13 +380,15 @@ static void truncate_partial_data_page(struct inode *inode, u64 from)
 		return;
 	lock_page(page);
-	if (unlikely(page->mapping != inode->i_mapping)) {
-		f2fs_put_page(page, 1);
-		return;
-	}
+	if (unlikely(!PageUptodate(page) ||
+			page->mapping != inode->i_mapping))
+		goto out;
+
 	f2fs_wait_on_page_writeback(page, DATA);
 	zero_user(page, offset, PAGE_CACHE_SIZE - offset);
 	set_page_dirty(page);
+
+out:
 	f2fs_put_page(page, 1);
 }
--
1.7.9.5
[f2fs-dev][PATCH 2/3] f2fs: avoid unneeded SetPageUptodate in f2fs_write_end
We have already set the page up-to-date in ->write_begin, so we should remove the redundant SetPageUptodate in ->write_end.

Signed-off-by: Chao Yu
---
 fs/f2fs/data.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c1fb6dd..fd133cf 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1003,7 +1003,6 @@ static int f2fs_write_end(struct file *file,
 	trace_f2fs_write_end(inode, pos, len, copied);
-	SetPageUptodate(page);
 	set_page_dirty(page);
 	if (pos + copied > i_size_read(inode)) {
--
1.7.9.5
Re: [GIT PULL] MMC updates for 3.16-rc1
Hi Linus,

On Wed, 11 Jun 2014, Linus Torvalds wrote:
> On Tue, Jun 10, 2014 at 2:50 PM, Linus Torvalds wrote:
> >
> > Also, that new drivers/mmc/host/usdhi6rol0.c driver is one f*cking
> > noisy compile, and certainly has never been tested in a 64-bit
> > environment. Please either fix it, or make it depend on BROKEN.
>
> Guys? Seriously, if that driver isn't fixed, I'm going to mark it
> broken myself. It pretty much generates as many lines of warnings as
> the rest of my "allmodconfig" build combined.
>
> It's extremely annoying, and the crazy warnings are likely to hide
> potential real problems elsewhere, so right now that driver has
> negative value. I do a lot of allmodconfig builds during the merge
> window, and I am not going to look at that warning much longer.
>
> Fix it promptly, or it gets disabled.

I sent a patch a few hours ago: https://patchwork.kernel.org/patch/4338531/

Since it's only changing print format strings, it should be a trivial one to review, so, just waiting for Chris to pick it up and push it to you. Sorry about the trouble.

Thanks
Guennadi
Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
On Thu, Jun 12, 2014 at 12:21:39PM +0900, Joonsoo Kim wrote:
> We should free memory for bitmap when we find zone mis-match,
> otherwise this memory will leak.

Then, -stable stuff?

> Additionally, I copy code comment from ppc kvm's cma code to notify
> why we need to check zone mis-match.
>
> Signed-off-by: Joonsoo Kim
>
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index bd0bb81..fb0cdce 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma)
> 		base_pfn = pfn;
> 		for (j = pageblock_nr_pages; j; --j, pfn++) {
> 			WARN_ON_ONCE(!pfn_valid(pfn));
> +			/*
> +			 * alloc_contig_range requires the pfn range
> +			 * specified to be in the same zone. Make this
> +			 * simple by forcing the entire CMA resv range
> +			 * to be in the same zone.
> +			 */
> 			if (page_zone(pfn_to_page(pfn)) != zone)
> -				return -EINVAL;
> +				goto err;

At a first glance, I thought it would be better to handle such an error before activating. But when I looked at the registration code (ie, dma_contiguous_reserve_area), I realized it is impossible because we haven't set up the zone yet. :(

If so, when we detect the failure here, wouldn't it be better to report a more meaningful error message, like which zone was expected, which zone was found, and the failing pfn number?

> 		}
> 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> 	} while (--i);
>
> 	mutex_init(&cma->lock);
> 	return 0;
> +
> +err:
> +	kfree(cma->bitmap);
> +	return -EINVAL;
> }
>
> static struct cma cma_areas[MAX_CMA_AREAS];
> --
> 1.7.9.5

--
Kind regards,
Minchan Kim
[f2fs-dev][PATCH 1/3] f2fs: check lower bound nid value in check_nid_range
This patch adds lower-bound verification for nid in check_nid_range, so reserved nids (0, node, meta) passed by the caller can be checked there. check_nid_range can then be used in f2fs_nfs_get_inode to simplify the code.

Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h  | 3 ++-
 fs/f2fs/inode.c | 1 +
 fs/f2fs/super.c | 4 +---
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7ef7acd..58df97e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -641,7 +641,8 @@ static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
  */
 static inline int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
 {
-	WARN_ON((nid >= NM_I(sbi)->max_nid));
+	if (unlikely(nid < F2FS_ROOT_INO(sbi)))
+		return -EINVAL;
 	if (unlikely(nid >= NM_I(sbi)->max_nid))
 		return -EINVAL;
 	return 0;

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index adc622c..2cf6962 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -78,6 +78,7 @@ static int do_read_inode(struct inode *inode)
 	if (check_nid_range(sbi, inode->i_ino)) {
 		f2fs_msg(inode->i_sb, KERN_ERR, "bad inode number: %lu",
 				(unsigned long) inode->i_ino);
+		WARN_ON(1);
 		return -EINVAL;
 	}

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b2b1863..8f96d93 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -689,9 +689,7 @@ static struct inode *f2fs_nfs_get_inode(struct super_block *sb,
 	struct f2fs_sb_info *sbi = F2FS_SB(sb);
 	struct inode *inode;
-	if (unlikely(ino < F2FS_ROOT_INO(sbi)))
-		return ERR_PTR(-ESTALE);
-	if (unlikely(ino >= NM_I(sbi)->max_nid))
+	if (check_nid_range(sbi, ino))
 		return ERR_PTR(-ESTALE);
 	/*
--
1.7.9.5
Re: [PATCH] drm/cirrus: bind also to qemu-xen-traditional
Ping?

On Fri, Apr 11, Olaf Hering wrote:
> qemu as used by xend/xm toolstack uses a different subvendor id.
> Bind the drm driver also to this emulated card.
>
> Signed-off-by: Olaf Hering
> ---
>  drivers/gpu/drm/cirrus/cirrus_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/cirrus/cirrus_drv.c b/drivers/gpu/drm/cirrus/cirrus_drv.c
> index 953fc8a..848 100644
> --- a/drivers/gpu/drm/cirrus/cirrus_drv.c
> +++ b/drivers/gpu/drm/cirrus/cirrus_drv.c
> @@ -31,6 +31,8 @@ static struct drm_driver driver;
>  static DEFINE_PCI_DEVICE_TABLE(pciidlist) = {
>  	{ PCI_VENDOR_ID_CIRRUS, PCI_DEVICE_ID_CIRRUS_5446, 0x1af4, 0x1100, 0,
>  	  0, 0 },
> +	{ PCI_VENDOR_ID_CIRRUS, PCI_DEVICE_ID_CIRRUS_5446, PCI_VENDOR_ID_XEN,
> +	  0x0001, 0, 0, 0 },
>  	{0,}
>  };
Re: PATCH[[vme/bridges/vme_ca91cx42.c:1382: Bad if test Bug Fix]
On Thu, Jun 12, 2014 at 04:17:36AM +, Nick Krause wrote:
> Here is the fixed patch as per Greg's recommendations. Unfortunately my email
> client removes tabs so I will have to be sending it as a patch file if that's
> Ok.
> Nick

HTML is rejected by the mailing lists, and we can't take a base64 attachment either :(

Take a look at Documentation/email_clients.txt for ideas on how to fix this up on your end.

thanks,

greg k-h
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
Hi Joonsoo, On Thu, Jun 12, 2014 at 12:21:38PM +0900, Joonsoo Kim wrote: > We don't need explicit 'CMA:' prefix, since we already define prefix > 'cma:' in pr_fmt. So remove it. > > And, some logs print function name and others doesn't. This looks > bad to me, so I unify log format to print function name consistently. > > Lastly, I add one more debug log on cma_activate_area(). When I take a look, it just indicates cma_activate_area was called or not, without what range for the area was reserved successfully so I couldn't see the intention for new message. Description should explain it so that everybody can agree on your claim. > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 83969f8..bd0bb81 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) > } > > if (selected_size && !dma_contiguous_default_area) { > - pr_debug("%s: reserving %ld MiB for global area\n", __func__, > + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, >(unsigned long)selected_size / SZ_1M); > > dma_contiguous_reserve_area(selected_size, selected_base, > @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) > unsigned i = cma->count >> pageblock_order; > struct zone *zone; > > - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > + pr_debug("%s()\n", __func__); > > + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > if (!cma->bitmap) > return -ENOMEM; > > @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > - pr_err("Not enough slots for CMA reserved regions!\n"); > + pr_err("%s(): Not enough slots for CMA reserved regions!\n", > + __func__); > return -ENOSPC; > } > > @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > *res_cma 
= cma;
> 	cma_area_count++;
>
> -	pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M,
> -		(unsigned long)base);
> +	pr_info("%s(): reserved %ld MiB at %08lx\n",
> +		__func__, (unsigned long)size / SZ_1M, (unsigned long)base);
>
> 	/* Architecture specific contiguous memory fixup. */
> 	dma_contiguous_early_fixup(base, size);
> 	return 0;
> err:
> -	pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M);
> +	pr_err("%s(): failed to reserve %ld MiB\n",
> +		__func__, (unsigned long)size / SZ_1M);
> 	return ret;
> }
>
> --
> 1.7.9.5

--
Kind regards,
Minchan Kim
[PATCH v1] fs2dt: Refine kdump device_tree sort
From: Yang Wei

Commit b02d735bf rearranged the device-tree entries, assuming these entries are sorted in ascending order. But actually, while validating kexec and kdump, I found that the order of the serial nodes was still changed. We should compare not only the length of the directory name but also the name itself; this ensures that the device nodes really sort in ascending order.

Signed-off-by: Yang Wei
---
 kexec/fs2dt.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

It is validated on Freescale t4240qds.

diff --git a/kexec/fs2dt.c b/kexec/fs2dt.c
index 1e5f074..0bffaf5 100644
--- a/kexec/fs2dt.c
+++ b/kexec/fs2dt.c
@@ -479,6 +479,9 @@ static int comparefunc(const struct dirent **dentry1,
 {
 	char *str1 = (*(struct dirent **)dentry1)->d_name;
 	char *str2 = (*(struct dirent **)dentry2)->d_name;
+	char* ptr1 = strchr(str1, '@');
+	char* ptr2 = strchr(str2, '@');
+	int len1, len2;
 	/*
 	 * strcmp scans from left to right and fails to idetify for some
@@ -486,9 +489,13 @@ static int comparefunc(const struct dirent **dentry1,
 	 * Therefore, we get the wrong sorted order like memory@1000 and
 	 * memory@f00.
 	 */
-	if (strchr(str1, '@') && strchr(str2, '@') &&
-		(strlen(str1) > strlen(str2)))
-		return 1;
+	if (ptr1 && ptr2) {
+		len1 = ptr1 - str1;
+		len2 = ptr2 - str2;
+		if (!strncmp(str1, str2, len1 > len2 ? len1 : len2) &&
+			(strlen(str1) > strlen(str2)))
+			return 1;
+	}
 	return strcmp(str1, str2);
 }
--
1.7.9.5
Re: [PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
Joonsoo Kim writes: > ppc kvm's cma area management needs alignment constraint on > cma region. So support it to prepare generalization of cma area > management functionality. > > Additionally, add some comments which tell us why alignment > constraint is needed on cma region. > > Signed-off-by: Joonsoo Kim Reviewed-by: Aneesh Kumar K.V > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 8a44c82..bc4c171 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -32,6 +32,7 @@ > #include > #include > #include > +#include > > struct cma { > unsigned long base_pfn; > @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); > * @size: Size of the reserved area (in bytes), > * @base: Base address of the reserved area optional, use 0 for any > * @limit: End address of the reserved memory (optional, 0 for any). > + * @alignment: Alignment for the contiguous memory area, should be power of 2 > * @res_cma: Pointer to store the created cma region. 
> * @fixed: hint about where to place the reserved area > * > @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); > */ > static int __init __dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, phys_addr_t limit, > + phys_addr_t alignment, > struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > - phys_addr_t alignment; > int ret = 0; > > - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, > - (unsigned long)size, (unsigned long)base, > - (unsigned long)limit); > + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", > + __func__, (unsigned long)size, (unsigned long)base, > + (unsigned long)limit, (unsigned long)alignment); > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > @@ -253,8 +255,17 @@ static int __init > __dma_contiguous_reserve_area(phys_addr_t size, > if (!size) > return -EINVAL; > > - /* Sanitise input arguments */ > - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); > + if (alignment && !is_power_of_2(alignment)) > + return -EINVAL; > + > + /* > + * Sanitise input arguments. > + * CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, > + * CMA area could be merged into other MIGRATE_TYPE by buddy mechanism > + * and CMA property will be broken. 
> +	 */
> +	alignment = max(alignment,
> +		(phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order));
> 	base = ALIGN(base, alignment);
> 	size = ALIGN(size, alignment);
> 	limit &= ~(alignment - 1);
> @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
> {
> 	int ret;
>
> -	ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed);
> +	ret = __dma_contiguous_reserve_area(size, base, limit, 0,
> +					res_cma, fixed);
> 	if (ret)
> 		return ret;
>
> --
> 1.7.9.5
Re: [PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
Joonsoo Kim writes: > To prepare future generalization work on cma area management code, > we need to separate core cma management codes from DMA APIs. > We will extend these core functions to cover requirements of > ppc kvm's cma area management functionality in following patches. > This separation helps us not to touch DMA APIs while extending > core functions. > > Signed-off-by: Joonsoo Kim Reviewed-by: Aneesh Kumar K.V > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index fb0cdce..8a44c82 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); > * If @fixed is true, reserve contiguous area at exactly @base. If false, > * reserve in range from @base to @limit. > */ > -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > -phys_addr_t limit, struct cma **res_cma, > -bool fixed) > +static int __init __dma_contiguous_reserve_area(phys_addr_t size, > + phys_addr_t base, phys_addr_t limit, > + struct cma **res_cma, bool fixed) > { > struct cma *cma = _areas[cma_area_count]; > phys_addr_t alignment; > @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > > pr_info("%s(): reserved %ld MiB at %08lx\n", > __func__, (unsigned long)size / SZ_1M, (unsigned long)base); > - > - /* Architecture specific contiguous memory fixup. */ > - dma_contiguous_early_fixup(base, size); > return 0; > + > err: > pr_err("%s(): failed to reserve %ld MiB\n", > __func__, (unsigned long)size / SZ_1M); > return ret; > } > > +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, > +phys_addr_t limit, struct cma **res_cma, > +bool fixed) > +{ > + int ret; > + > + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); > + if (ret) > + return ret; > + > + /* Architecture specific contiguous memory fixup. 
*/ > + dma_contiguous_early_fixup(base, size); > + > + return 0; > +} > + > static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) > { > mutex_lock(>lock); > @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned > long pfn, int count) > * global one. Requires architecture specific dev_get_cma_area() helper > * function. > */ > -struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, > unsigned int align) > { > unsigned long mask, pfn, pageno, start = 0; > - struct cma *cma = dev_get_cma_area(dev); > struct page *page = NULL; > int ret; > > if (!cma || !cma->count) > return NULL; > > - if (align > CONFIG_CMA_ALIGNMENT) > - align = CONFIG_CMA_ALIGNMENT; > - > pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, >count, align); > > @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > return page; > } > > +struct page *dma_alloc_from_contiguous(struct device *dev, int count, > +unsigned int align) > +{ > + struct cma *cma = dev_get_cma_area(dev); > + > + if (align > CONFIG_CMA_ALIGNMENT) > + align = CONFIG_CMA_ALIGNMENT; > + > + return __dma_alloc_from_contiguous(cma, count, align); > +} > + > /** > * dma_release_from_contiguous() - release allocated pages > * @dev: Pointer to device for which the pages were allocated. > @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device > *dev, int count, > * It returns false when provided pages do not belong to contiguous area and > * true otherwise. 
> */ > -bool dma_release_from_contiguous(struct device *dev, struct page *pages, > +static bool __dma_release_from_contiguous(struct cma *cma, struct page > *pages, > int count) > { > - struct cma *cma = dev_get_cma_area(dev); > unsigned long pfn; > > if (!cma || !pages) > @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, > struct page *pages, > > return true; > } > + > +bool dma_release_from_contiguous(struct device *dev, struct page *pages, > + int count) > +{ > + struct cma *cma = dev_get_cma_area(dev); > + > + return __dma_release_from_contiguous(cma, pages, count); > +} > -- > 1.7.9.5
Re: [PATCH v2 02/10] DMA, CMA: fix possible memory leak
Joonsoo Kim writes: > We should free memory for bitmap when we find zone mis-match, > otherwise this memory will leak. > > Additionally, I copy code comment from ppc kvm's cma code to notify > why we need to check zone mis-match. > > Signed-off-by: Joonsoo Kim Reviewed-by: Aneesh Kumar K.V > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index bd0bb81..fb0cdce 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma) > base_pfn = pfn; > for (j = pageblock_nr_pages; j; --j, pfn++) { > WARN_ON_ONCE(!pfn_valid(pfn)); > + /* > + * alloc_contig_range requires the pfn range > + * specified to be in the same zone. Make this > + * simple by forcing the entire CMA resv range > + * to be in the same zone. > + */ > if (page_zone(pfn_to_page(pfn)) != zone) > - return -EINVAL; > + goto err; > } > init_cma_reserved_pageblock(pfn_to_page(base_pfn)); > } while (--i); > > mutex_init(&cma->lock); > return 0; > + > +err: > + kfree(cma->bitmap); > + return -EINVAL; > } > > static struct cma cma_areas[MAX_CMA_AREAS]; > -- > 1.7.9.5
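The leak being fixed follows a common error-handling shape: an early return inside a loop skipped freeing an allocation made earlier in the function, and the fix routes the failure through a single error label. A minimal user-space sketch of the before/after (hypothetical names; error codes inlined as plain integers):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the pattern fixed by the patch: on a zone mismatch the old
 * code returned -EINVAL directly, leaking the bitmap allocated above.
 * Routing the failure through one error label frees it first. */
static int activate(int zones_match, char **out_bitmap)
{
    char *bitmap = calloc(1, 32);
    if (!bitmap)
        return -12;                 /* -ENOMEM */

    if (!zones_match)
        goto err;                   /* was: bare "return -EINVAL" (leak) */

    *out_bitmap = bitmap;           /* success: caller now owns it */
    return 0;

err:
    free(bitmap);                   /* the fix: release before bailing */
    return -22;                     /* -EINVAL */
}
```

The same shape is why the patch moves the failure path below the function body instead of returning from inside the pageblock loop.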
Re: [PATCH v2 01/10] DMA, CMA: clean-up log message
Joonsoo Kim writes: > We don't need explicit 'CMA:' prefix, since we already define prefix > 'cma:' in pr_fmt. So remove it. > > And, some logs print function name and others don't. This looks > bad to me, so I unify log format to print function name consistently. > > Lastly, I add one more debug log on cma_activate_area(). > > Signed-off-by: Joonsoo Kim > > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c > index 83969f8..bd0bb81 100644 > --- a/drivers/base/dma-contiguous.c > +++ b/drivers/base/dma-contiguous.c > @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) > } > > if (selected_size && !dma_contiguous_default_area) { > - pr_debug("%s: reserving %ld MiB for global area\n", __func__, > + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, > (unsigned long)selected_size / SZ_1M); Do we need to do function(), or just function:? I have seen the latter usage in other parts of the kernel. > > dma_contiguous_reserve_area(selected_size, selected_base, > @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) > unsigned i = cma->count >> pageblock_order; > struct zone *zone; > > - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > + pr_debug("%s()\n", __func__); why?
> > + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); > if (!cma->bitmap) > return -ENOMEM; > > @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, > phys_addr_t base, > > /* Sanity checks */ > if (cma_area_count == ARRAY_SIZE(cma_areas)) { > - pr_err("Not enough slots for CMA reserved regions!\n"); > + pr_err("%s(): Not enough slots for CMA reserved regions!\n", > + __func__); > return -ENOSPC; > } > > @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t > size, phys_addr_t base, > *res_cma = cma; > cma_area_count++; > > - pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M, > - (unsigned long)base); > + pr_info("%s(): reserved %ld MiB at %08lx\n", > + __func__, (unsigned long)size / SZ_1M, (unsigned long)base); > > /* Architecture specific contiguous memory fixup. */ > dma_contiguous_early_fixup(base, size); > return 0; > err: > - pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M); > + pr_err("%s(): failed to reserve %ld MiB\n", > + __func__, (unsigned long)size / SZ_1M); > return ret; > } > > -- > 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
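For readers following the pr_fmt point: the prefix comes from a macro the source file defines before including the printk headers, so every pr_* call picks it up automatically and explicit "CMA:" strings become redundant. A user-space sketch of the mechanism, with snprintf into a buffer standing in for printk:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_msg[128];

/* Stand-in for the kernel's pr_fmt convention: a pr_fmt macro defined
 * before the printk helpers prepends a subsystem prefix to every
 * message emitted through pr_debug/pr_info/pr_err. */
#define pr_fmt(fmt) "cma: " fmt
#define pr_debug(fmt, ...) \
    snprintf(last_msg, sizeof(last_msg), pr_fmt(fmt), ##__VA_ARGS__)

static const char *log_reserve(const char *func, unsigned long mib)
{
    /* Same shape as the patched call sites: function name + size */
    pr_debug("%s(): reserving %lu MiB for global area\n", func, mib);
    return last_msg;
}
```

This is why the patch can drop the hand-written prefixes without changing the logged output.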
Re: [GIT PULL] MMC updates for 3.16-rc1
On Tue, Jun 10, 2014 at 2:50 PM, Linus Torvalds wrote: > > Also, that new drivers/mmc/host/usdhi6rol0.c driver is one f*cking > noisy compile, and certainly has never been tested in a 64-bit > environment. Please either fix it, or make it depend on BROKEN. Guys? Seriously, if that driver isn't fixed, I'm going to mark it broken myself. It pretty much generates as many lines of warnings as the rest of my "allmodconfig" build combined. It's extremely annoying, and the crazy warnings are likely to hide potential real problems elsewhere, so right now that driver has negative value. I do a lot of allmodconfig builds during the merge window, and I am not going to look at that warning much longer. Fix it promptly, or it gets disabled. Linus
Re: [PATCH v2 1/2] usb: ehci-exynos: Make provision for vdd regulators
On Thursday, June 12, 2014 12:39 AM, Alan Stern wrote: > On Fri, 6 Jun 2014, Vivek Gautam wrote: > > > Facilitate getting required 3.3V and 1.0V VDD supply for > > EHCI controller on Exynos. > > > > With patches for regulators' nodes merged in 3.15: > > c8c253f ARM: dts: Add regulator entries to smdk5420 > > 275dcd2 ARM: dts: add max77686 pmic node for smdk5250, > > > > certain peripherals will now need to ensure that > > they request VDD regulators in their drivers, and enable > > them so as to make them work. > > "Certain peripherals"? Don't you mean "certain controllers"? > > Does this mean some controllers don't need to use the VDD regulators? > > > @@ -193,7 +196,31 @@ static int exynos_ehci_probe(struct platform_device > > *pdev) > > > > err = exynos_ehci_get_phy(&pdev->dev, exynos_ehci); > > if (err) > > - goto fail_clk; > > + goto fail_regulator1; > > + > > + exynos_ehci->vdd33 = devm_regulator_get(&pdev->dev, "vdd33"); > > + if (!IS_ERR(exynos_ehci->vdd33)) { > > + err = regulator_enable(exynos_ehci->vdd33); > > + if (err) { > > + dev_err(&pdev->dev, > > + "Failed to enable 3.3V Vdd supply\n"); > > + goto fail_regulator1; > > + } > > + } else { > > + dev_warn(&pdev->dev, "Regulator 3.3V Vdd supply not found\n"); > > + } > > What if this is one of the controllers that don't need to use a VDD > regulator? Do you really want to print out a warning in that case? > Should you call devm_regulator_get_optional() instead? I agree with Alan's suggestion. This warning message is not appropriate when USB controllers that don't need a VDD regulator are used. The devm_regulator_get_optional() looks better. Best regards, Jingoo Han
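The devm_regulator_get_optional() pattern the reviewers suggest treats an absent supply as a normal condition rather than something to warn about. A user-space sketch of that control flow, with stub stand-ins for the kernel's ERR_PTR machinery (the stubs are hypothetical, not the regulator API):

```c
#include <assert.h>
#include <stddef.h>

/* Stubs mimicking the kernel's error-pointer convention: an optional
 * resource getter returns an error pointer when the supply is absent. */
#define ENODEV 19
#define IS_ERR(p) ((unsigned long)(p) >= (unsigned long)-4095L)
static void *ERR_PTR(long err) { return (void *)err; }

static int have_vdd;            /* simulated: is the supply in the DT? */
static int dummy_regulator;     /* stands in for a struct regulator */

static void *stub_get_optional(void)
{
    return have_vdd ? (void *)&dummy_regulator : ERR_PTR(-ENODEV);
}

/* Returns 1 when the supply exists (and would be enabled), 0 when it
 * is simply absent -- the case that should not print a warning. */
static int probe_vdd(void)
{
    void *reg = stub_get_optional();
    if (IS_ERR(reg))
        return 0;               /* optional supply absent: not an error */
    return 1;                   /* real driver: regulator_enable(reg) */
}
```

The design point is that "optional" moves the absence decision out of the driver's warning path entirely.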
linux-next: manual merge of the target-updates tree with the virtio tree
Hi Nicholas, Today's linux-next merge of the target-updates tree got a conflict in drivers/scsi/virtio_scsi.c between commit c77fba9ab058 ("virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock") from the virtio tree and commit e6dc783a38ec ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD") from the target-updates tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/scsi/virtio_scsi.c index 99fdb9403944,1c326b63ca55.. --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@@ -396,10 -438,11 +398,10 @@@ static void virtscsi_event_done(struct */ static int virtscsi_add_cmd(struct virtqueue *vq, struct virtio_scsi_cmd *cmd, - size_t req_size, size_t resp_size, gfp_t gfp) + size_t req_size, size_t resp_size) { struct scsi_cmnd *sc = cmd->sc; - struct scatterlist *sgs[4], req, resp; + struct scatterlist *sgs[6], req, resp; struct sg_table *out, *in; unsigned out_num = 0, in_num = 0; @@@ -425,10 -472,14 +431,14 @@@ sgs[out_num + in_num++] = /* Data-in buffer */ - if (in) + if (in) { + /* Place READ protection SGLs before Data IN payload */ + if (scsi_prot_sg_count(sc)) + sgs[out_num + in_num++] = scsi_prot_sglist(sc); sgs[out_num + in_num++] = in->sgl; + } - return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp); + return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC); } static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq, @@@ -455,9 -538,10 +497,10 @@@ static int virtscsi_queuecommand(struc struct virtio_scsi_vq *req_vq, struct scsi_cmnd *sc) { - struct virtio_scsi_cmd *cmd; - int ret, req_size; - struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev); + struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc); ++ int req_size; + BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize); /* TODO: check feature bit and fail if unsupported? 
*/ @@@ -466,26 -550,34 +509,24 @@@ dev_dbg(>device->sdev_gendev, "cmd %p CDB: %#02x\n", sc, sc->cmnd[0]); - ret = SCSI_MLQUEUE_HOST_BUSY; - cmd = mempool_alloc(virtscsi_cmd_pool, GFP_ATOMIC); - if (!cmd) - goto out; - memset(cmd, 0, sizeof(*cmd)); cmd->sc = sc; - cmd->req.cmd = (struct virtio_scsi_cmd_req){ - .lun[0] = 1, - .lun[1] = sc->device->id, - .lun[2] = (sc->device->lun >> 8) | 0x40, - .lun[3] = sc->device->lun & 0xff, - .tag = (unsigned long)sc, - .task_attr = VIRTIO_SCSI_S_SIMPLE, - .prio = 0, - .crn = 0, - }; BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE); - memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); - if (virtscsi_kick_cmd(req_vq, cmd, - sizeof cmd->req.cmd, sizeof cmd->resp.cmd) != 0) + if (virtio_has_feature(vscsi->vdev, VIRTIO_SCSI_F_T10_PI)) { + virtio_scsi_init_hdr_pi(>req.cmd_pi, sc); + memcpy(cmd->req.cmd_pi.cdb, sc->cmnd, sc->cmd_len); + req_size = sizeof(cmd->req.cmd_pi); + } else { + virtio_scsi_init_hdr(>req.cmd, sc); + memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); + req_size = sizeof(cmd->req.cmd); + } + - if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd), -GFP_ATOMIC) == 0) - ret = 0; - else - mempool_free(cmd, virtscsi_cmd_pool); - -out: - return ret; ++ if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof cmd->resp.cmd) != 0) + return SCSI_MLQUEUE_HOST_BUSY; + return 0; } static int virtscsi_queuecommand_single(struct Scsi_Host *sh, signature.asc Description: PGP signature
random: Benchmarking fast_mix2
> I redid my numbers, and I can no longer reproduce the 7x slowdown. I > do see that if you compile w/o -O2, fast_mix2 is twice as slow. But > it's not 7x slower. For my single-round, I needed to drop to 2 loops rather than 3 to match the speed. That's in the source I posted, but I didn't point it out. (It wasn't an attempt to be deceptive, that's just how I happened to have left the file when I was experimenting with various options. I figured if we were looking for 7x, 1.5x wasn't all that important.) That explains some of the residual difference between our figures. When developing, I was using a many-iteration benchmark, and I suspect it fitted in the Ivy Bridge uop cache, which let it saturate the execution resources. Sorry for the premature alarm; I'll go back to work and find something better. I still get comparable speed for 2 loops and -O2: $ cc -W -Wall -m32 -O2 -march=native random.c -o random32 # ./perftest ../spooky/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:148124 (-24) 1: 48 36 (-12) 2: 40 36 (-4) 3: 44 40 (-4) 4: 44 40 (-4) 5: 36 36 (+0) 6: 52 36 (-16) 7: 44 32 (-12) 8: 44 36 (-8) 9: 48 36 (-12) $ cc -W -Wall -m64 -O2 -march=native random.c -o random64 # ./perftest ../spooky/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:132104 (-28) 1: 40 40 (+0) 2: 36 44 (+8) 3: 32 40 (+8) 4: 40 36 (-4) 5: 32 40 (+8) 6: 36 44 (+8) 7: 40 40 (+0) 8: 36 44 (+8) 9: 40 36 (-4) $ cc -W -Wall -m32 -O3 -march=native random.c -o random32 # ./perftest ./random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 88 48 (-40) 1: 36 40 (+4) 2: 36 44 (+8) 3: 32 40 (+8) 4: 36 40 (+4) 5: 96 40 (-56) 6: 40 40 (+0) 7: 36 40 (+4) 8: 28 48 (+20) 9: 28 40 (+12) $ cc -W -Wall -m64 -O3 -march=native random.c -o random64 # ./perftest ./random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 
72 80 (+8) 1: 36 52 (+16) 2: 32 36 (+4) 3: 32 36 (+4) 4: 28 40 (+12) 5: 32 40 (+8) 6: 32 40 (+8) 7: 32 36 (+4) 8: 28 44 (+16) 9: 36 36 (+0) $ cc -W -Wall -m32 -Os -march=native random.c -o random32 # ./perftest ./random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:108132 (+24) 1: 44 44 (+0) 2: 76 40 (-36) 3: 44 48 (+4) 4: 36 40 (+4) 5: 32 44 (+12) 6: 40 56 (+16) 7: 44 36 (-8) 8: 44 40 (-4) 9: 32 40 (+8) $ $ cc -W -Wall -m64 -Os -march=native random.c -o random64 # ./perftest ./random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 96108 (+12) 1: 44 52 (+8) 2: 40 40 (+0) 3: 40 36 (-4) 4: 40 32 (-8) 5: 36 36 (+0) 6: 44 32 (-12) 7: 36 36 (+0) 8: 40 36 (-4) 9: 40 36 (-4) Yours looks much more careful about the timing. A few GCC warnings I ended up fixing: 1) "volatile" on rdtsc is meaningless and ignore (with a warning) 2) fast_mix2() needs a void return type; it defaults to int. 3) int main() needs a "return 0" Here's what I got running *your* program, unmodified except for the above (meaning 3 inner loop iterations). Compiled with GCC 4.9.0 (Devian 4.9.0-6), -O2. i7-4940K# ./perftest ./ted32 fast_mix: 430 fast_mix2: 431 fast_mix: 442 fast_mix2: 464 fast_mix: 442 fast_mix2: 465 fast_mix: 442 fast_mix2: 431 fast_mix: 442 fast_mix2: 465 fast_mix: 431 fast_mix2: 430 fast_mix: 442 fast_mix2: 431 fast_mix: 431 fast_mix2: 465 fast_mix: 431 fast_mix2: 465 fast_mix: 431 fast_mix2: 431 i7-4940K# ./perftest ./ted64 fast_mix: 454 fast_mix2: 465 fast_mix: 453 fast_mix2: 465 fast_mix: 442 fast_mix2: 464 fast_mix: 453 fast_mix2: 464 fast_mix: 454 fast_mix2: 465 fast_mix: 453 fast_mix2: 465 fast_mix: 442 fast_mix2: 464 fast_mix: 453 fast_mix2: 464 fast_mix: 453 fast_mix2: 464 fast_mix: 453 fast_mix2: 465 In other words, pretty damn near the same speed (with 3 loops). So we still have some discrepancy to track
Re: [PATCH 1/4] spi: qup: Remove chip select function
On Mon, May 19, 2014 at 11:07:38AM +0300, Ivan T. Ivanov wrote: > > +- num-cs: total number of chipselects > > My understanding is that "num-cs" have to be parsed by > master driver, not by core SPI driver. Right. I need to parse it and check vs the max cs and use that value to set the master->num_chipselect > > > - > > - /* Disable auto CS toggle and use manual */ > > - iocontol &= ~SPI_IO_C_MX_CS_MODE; > > Probably we should keep this? Actually this is cleared in the probe during the initial settings of IO_CONTROL. So this isn't necessary. -- sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
Re: PATCH[vme/bridges/vme_ca91cx42.c:1382: Bad if test Bug Fix]
On Thu, Jun 12, 2014 at 03:44:34AM +0000, Nick Krause wrote: > Hey Fellow Developers, > This is my first patch so if there are any errors please reply as i will > fix them. Below is the patch. > --- drivers/vme/bridges/vme_ca91cx42.h.orig 2014-06-11 22:50:29.339671939 > -0400 > +++ drivers/vme/bridges/vme_ca91cx42.h 2014-06-11 23:15:36.027685173 -0400 > @@ -526,7 +526,7 @@ static const int CA91CX42_LINT_LM[] = { > #define CA91CX42_VSI_CTL_SUPER_SUPR (1<<21) > > #define CA91CX42_VSI_CTL_VAS_M (7<<16) > -#define CA91CX42_VSI_CTL_VAS_A16 0 > +#define CA91CX42_VSI_CTL_VAS_A16 (3<<16) > #define CA91CX42_VSI_CTL_VAS_A24 (1<<16) > #define CA91CX42_VSI_CTL_VAS_A32 (1<<17) > #define CA91CX42_VSI_CTL_VAS_USER1 (3<<17) > @@ -549,7 +549,7 @@ static const int CA91CX42_LINT_LM[] = { > #define CA91CX42_LM_CTL_SUPR (1<<21) > #define CA91CX42_LM_CTL_NPRIV (1<<20) > #define CA91CX42_LM_CTL_AS_M (5<<16) > -#define CA91CX42_LM_CTL_AS_A16 0 > +#define CA91CX42_LM_CTL_AS_A16 (3<<16) > #define CA91CX42_LM_CTL_AS_A24 (1<<16) > #define CA91CX42_LM_CTL_AS_A32 (1<<17) > Signed-off-by: Nicholas Krause Always run your patch through scripts/checkpatch.pl first to catch the issues that are 'obvious'. After that, the signed-off-by: needs to be up in the changelog area, there needs to be a changelog explaining why this patch is needed, and the tabs need to be put back in the patch (your email client ate them.) Can you try again? thanks, greg k-h
PATCH[vme/bridges/vme_ca91cx42.c:1382: Bad if test Bug Fix]
Hey Fellow Developers, This is my first patch so if there are any errors please reply as i will fix them. Below is the patch. --- drivers/vme/bridges/vme_ca91cx42.h.orig 2014-06-11 22:50:29.339671939 -0400 +++ drivers/vme/bridges/vme_ca91cx42.h 2014-06-11 23:15:36.027685173 -0400 @@ -526,7 +526,7 @@ static const int CA91CX42_LINT_LM[] = { #define CA91CX42_VSI_CTL_SUPER_SUPR (1<<21) #define CA91CX42_VSI_CTL_VAS_M (7<<16) -#define CA91CX42_VSI_CTL_VAS_A16 0 +#define CA91CX42_VSI_CTL_VAS_A16 (3<<16) #define CA91CX42_VSI_CTL_VAS_A24 (1<<16) #define CA91CX42_VSI_CTL_VAS_A32 (1<<17) #define CA91CX42_VSI_CTL_VAS_USER1 (3<<17) @@ -549,7 +549,7 @@ static const int CA91CX42_LINT_LM[] = { #define CA91CX42_LM_CTL_SUPR (1<<21) #define CA91CX42_LM_CTL_NPRIV (1<<20) #define CA91CX42_LM_CTL_AS_M (5<<16) -#define CA91CX42_LM_CTL_AS_A16 0 +#define CA91CX42_LM_CTL_AS_A16 (3<<16) #define CA91CX42_LM_CTL_AS_A24 (1<<16) #define CA91CX42_LM_CTL_AS_A32 (1<<17) Signed-off-by: Nicholas Krause Nick
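The "bad if test" in the subject stems from a multi-bit field constant defined as 0: a plain bitwise-AND test against such a constant can never succeed, so the field has to be masked out and compared as a whole. A small sketch of the distinction (the field layout below is illustrative only; whether (3<<16) is the right encoding for this hardware is a question for the ca91cx42 datasheet):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout: a 3-bit "address space" field at bit 16.
 * A value of 0 is a valid field encoding, but it cannot be detected
 * with "reg & FLAG" -- that expression is always false for FLAG == 0. */
#define VAS_MASK (7u << 16)
#define VAS_A16  (0u << 16)          /* zero-valued encoding */
#define VAS_A24  (1u << 16)

static int field_is(uint32_t ctl, uint32_t val)
{
    return (ctl & VAS_MASK) == val;  /* correct: compare the whole field */
}

static int bad_flag_test(uint32_t ctl, uint32_t flag)
{
    return (ctl & flag) != 0;        /* degenerates when flag == 0 */
}
```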
Re: [net/ipvs] BUG: unable to handle kernel NULL pointer dereference at 00000004
On Wed, Jun 11, 2014 at 04:34:19PM +0800, Jet Chen wrote: > On 06/11/2014 01:59 PM, Julian Anastasov wrote: > > > > Hello, > > > > On Wed, 11 Jun 2014, Jet Chen wrote: > > > >> Hi Wensong, > >> > >> 0day kernel testing robot got the below dmesg. > >> > >> +-------------------------------------------------------+----+ > >> | boot_successes | 26 | > >> | boot_failures | 4 | > >> | BUG:unable_to_handle_kernel_NULL_pointer_dereference | 4 | > >> | Oops | 4 | > >> | EIP_is_at_ip_vs_stop_estimator | 4 | > >> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 4 | > >> | backtrace:cleanup_net | 4 | > >> +-------------------------------------------------------+----+ > >> > >> > >> [child0:2725] process_vm_readv (347) returned ENOSYS, marking as inactive. > >> [child0:2725] uid changed! Was: 0, now -788547075 > >> Bailing main loop. Exit reason: UID changed. > >> [ 12.182233] BUG: unable to handle kernel NULL pointer dereference at > >> 00000004 > >> [ 12.183011] IP: [<4c2f6567>] ip_vs_stop_estimator+0x20/0x3e > >> [ 12.183011] *pdpt = *pde = f000ff53f000ff53 [ > >> 12.183011] Oops: 0002 [#1] DEBUG_PAGEALLOC > >> [ 12.183011] Modules linked in: > >> [ 12.183011] CPU: 0 PID: 57 Comm: kworker/u2:1 Not tainted 3.15.0-rc8 #1 > >> [ 12.183011] Workqueue: netns cleanup_net > >> [ 12.183011] task: 528773f0 ti: 52878000 task.ti: 52878000 > >> [ 12.183011] EIP: 0060:[<4c2f6567>] EFLAGS: 00010206 CPU: 0 > >> [ 12.183011] EIP is at ip_vs_stop_estimator+0x20/0x3e > >> [ 12.183011] EAX: EBX: 51c39a54 ECX: EDX: > > > > ip_vs_stop_estimator fails at list_del(&est->list) > > on mov %eax,0x4(%edx) instruction and EDX is 0. It means, > > this estimator was never started (initialized with > > INIT_LIST_HEAD in ip_vs_start_estimator) or stopped > > before with the same list_del. > > > > At first look, it is strange but I think the reason > > is the missing CONFIG_SYSCTL. ip_vs_control_net_cleanup > > fails at ip_vs_stop_estimator(net, &ipvs->tot_stats) > > because it is called not depending on CONFIG_SYSCTL but > > without CONFIG_SYSCTL ip_vs_start_estimator was never > > called. > > > > Can you test such a patch?
> > Julian, your patch works. Thanks. > > Tested-by: Jet Chen Thanks, Julian, should I take this one? I'm assuming this problem has been present for quite a number of releases. > > ipvs: stop tot_stats estimator only under CONFIG_SYSCTL > > > > The tot_stats estimator is started only when CONFIG_SYSCTL > > is defined. But it is stopped without checking CONFIG_SYSCTL. > > Fix the crash by moving ip_vs_stop_estimator into > > ip_vs_control_net_cleanup_sysctl. > > > > Signed-off-by: Julian Anastasov > > --- > > net/netfilter/ipvs/ip_vs_ctl.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c > > index c42e83d..581a658 100644 > > --- a/net/netfilter/ipvs/ip_vs_ctl.c > > +++ b/net/netfilter/ipvs/ip_vs_ctl.c > > @@ -3778,6 +3778,7 @@ static void __net_exit > > ip_vs_control_net_cleanup_sysctl(struct net *net) > > cancel_delayed_work_sync(&ipvs->defense_work); > > cancel_work_sync(&ipvs->defense_work.work); > > unregister_net_sysctl_table(ipvs->sysctl_hdr); > > + ip_vs_stop_estimator(net, &ipvs->tot_stats); > > } > > > > #else > > @@ -3840,7 +3841,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net > > *net) > > struct netns_ipvs *ipvs = net_ipvs(net); > > > > ip_vs_trash_cleanup(net); > > - ip_vs_stop_estimator(net, &ipvs->tot_stats); > > ip_vs_control_net_cleanup_sysctl(net); > > remove_proc_entry("ip_vs_stats_percpu", net->proc_net); > > remove_proc_entry("ip_vs_stats", net->proc_net); > >
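The failure mode is easy to model in isolation: the estimator is started only when sysctl support is compiled in, but the old cleanup path stopped it unconditionally, tearing down state that was never initialized. A toy sketch (names are stand-ins, not the ipvs code):

```c
#include <assert.h>

static int est_started;

static void start_estimator(void) { est_started = 1; }

/* Returns -1 for the "never started" case that oopsed in the report. */
static int stop_estimator(void)
{
    if (!est_started)
        return -1;
    est_started = 0;
    return 0;
}

#ifdef WITH_SYSCTL                 /* stand-in for CONFIG_SYSCTL */
static void ctl_init(void)    { start_estimator(); }
static int  ctl_cleanup(void) { return stop_estimator(); }
#else
static void ctl_init(void)    { }
static int  ctl_cleanup(void) { return 0; }  /* patched: no stop here */
#endif

/* Pre-patch shape: the generic cleanup called stop unconditionally. */
static int buggy_cleanup(void)
{
    return stop_estimator();
}
```

Keeping the start/stop pair inside the same conditional is the whole fix: every teardown mirrors an init taken under identical configuration.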
Re: drivers/char/random.c: More futzing about
Just to add to my total confusion about the totally disparate performance numbers we're seeing, I did some benchmarks on other machines. The speedup isn't as good one-pass as it is iterated, and as I mentioned it's slower on a P4, but it's not 7 times slower by any stretch. There are all 1-iteration numbers, run immediately after scp-ing the binary to the machine so there's no possibility if anything being cached. (The "64" and "32" versions are compiled -m32 and -m64, of course.) 2.5 GHz Phenom 9850: $ /tmp/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:199142 (-57) 1:104 95 (-9) 2:104110 (+6) 3:103109 (+6) 4:105 89 (-16) 5:103 88 (-15) 6:104 89 (-15) 7:104 95 (-9) 8:105 85 (-20) 9:105 85 (-20) $ /tmp/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0:324147 (-177) 1:100 86 (-14) 2:100 99 (-1) 3:100 88 (-12) 4:100 86 (-14) 5:100 86 (-14) 6:100 89 (-11) 7:100111 (+11) 8:100111 (+11) 9:100 88 (-12) $ /tmp/random64 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 554788 220327 (-334461) 1: 554825 220176 (-334649) 2: 553505 220148 (-57) 3: 554661 220064 (-334597) 4: 569559 220064 (-349495) 5: 612798 220065 (-392733) 6: 570287 220064 (-350223) 7: 554790 220064 (-334726) 8: 554715 220065 (-334650) 9: 569840 220064 (-349776) $ /tmp/random32 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 520117 280225 (-239892) 1: 520125 280154 (-239971) 2: 520104 280094 (-240010) 3: 520079 280060 (-240019) 4: 520069 280060 (-240009) 5: 520060 280060 (-24) 6: 558971 280060 (-278911) 7: 520102 280060 (-240042) 8: 520082 280060 (-240022) 9: 520058 280060 (-239998) 3 GHz i5-3330: $ /tmp/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 78 75 (-3) 1: 36 33 (-3) 2: 33 39 (+6) 3: 36 30 (-6) 4: 36 33 (-3) 5: 30 33 (+3) 6: 30 54 (+24) 7: 24 48 (+24) 
8: 27 33 (+6) 9: 30 33 (+3) $ /tmp/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 66 78 (+12) 1: 39 39 (+0) 2: 36 39 (+3) 3: 45 33 (-12) 4: 42 33 (-9) 5: 33 42 (+9) 6: 45 33 (-12) 7: 39 36 (-3) 8:105 48 (-57) 9: 42 39 (-3) $ /tmp/random64 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 406188 218104 (-188084) 1: 402620 246968 (-155652) 2: 402652 239840 (-162812) 3: 402720 200312 (-202408) 4: 402584 200080 (-202504) 5: 447488 200228 (-247260) 6: 402788 200312 (-202476) 7: 402688 200080 (-202608) 8: 427140 224320 (-202820) 9: 402576 200080 (-202496) $ /tmp/random32 1 pool 1 = 54b3ba06 4769d67a eb04bbf3 5e42df6e pool 2 = 9d3c469e 6fecdb60 423af4ca 465173d1 0: 406485 266670 (-139815) 1: 392694 266463 (-126231) 2: 392496 266763 (-125733) 3: 426003 266145 (-159858) 4: 392688 27 (-126021) 5: 432231 266589 (-165642) 6: 392754 298734 (-94020) 7: 392883 284994 (-107889) 8: 392637 266694 (-125943) 9: 392985 267024 (-125961) 3.5 GHz i7-2700: # /tmp/perftest /tmp/random64 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 82 90 (+8) 1: 38 41 (+3) 2: 46 38 (-8) 3: 35 41 (+6) 4: 46 41 (-5) 5: 38 38 (+0) 6: 41 55 (+14) 7: 41 35 (-6) 8: 46 24 (-22) 9: 35 38 (+3) # /tmp/perftest /tmp/random32 pool 1 = 85670974 e96b1f8f 51244abf 5863283f pool 2 = 03564c6c eba81d03 55c77fa1 760374a7 0: 82 76 (-6) 1: 32 53 (+21) 2: 49 44 (-5) 3: 35 41 (+6) 4: 46 35 (-11) 5: 35 44 (+9) 6: 49 50 (+1) 7: 41 41 (+0) 8: 32 44 (+12) 9: 49 44 (-5) #
linux-next: manual merge of the virtio tree with Linus' tree
Hi Rusty, Today's linux-next merge of the virtio tree got a conflict in drivers/scsi/virtio_scsi.c between commit b54197c43db8 ("virtio_scsi: use cmd_size") from Linus' tree and commit c77fba9ab058 ("virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock") from the virtio tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell s...@canb.auug.org.au diff --cc drivers/scsi/virtio_scsi.c index d4727b339474,e2a68aece3da.. --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@@ -484,10 -529,13 +483,9 @@@ static int virtscsi_queuecommand(struc memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len); if (virtscsi_kick_cmd(req_vq, cmd, - sizeof cmd->req.cmd, sizeof cmd->resp.cmd, - GFP_ATOMIC) != 0) -sizeof cmd->req.cmd, sizeof cmd->resp.cmd) == 0) - ret = 0; - else - mempool_free(cmd, virtscsi_cmd_pool); - -out: - return ret; ++sizeof cmd->req.cmd, sizeof cmd->resp.cmd) != 0) + return SCSI_MLQUEUE_HOST_BUSY; + return 0; } static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
Re: [PATCH ftrace/core 0/2] ftrace, kprobes: Introduce IPMODIFY flag for ftrace_ops to detect conflicts
Hi Josh, On Wed, 11 Jun 2014 11:58:26 -0500, Josh Poimboeuf wrote: > On Tue, Jun 10, 2014 at 10:50:01AM +0000, Masami Hiramatsu wrote: >> Hi, >> >> Here is a pair of patches which introduce an IPMODIFY flag for >> ftrace_ops to detect conflicts of ftrace users who can modify >> regs->ip in their handler. >> Currently, only kprobes can change the regs->ip in the handler, >> but recently kpatch also wants to change it. Moreover, since >> ftrace itself is exported to modules, it is a conceivable >> scenario. >> >> Here we talked on github. >> https://github.com/dynup/kpatch/issues/47 >> >> To protect modified regs->ip from each other, this series >> introduces the FTRACE_OPS_FL_IPMODIFY flag and ftrace now ensures >> the flag can be set on each function entry location. If there >> is someone who already reserves regs->ip on the target function >> entry, ftrace_set_filter_ip or register_ftrace_function will >> return -EBUSY. Users must handle that. >> >> At this point, all kprobes will reserve regs->ip, since jprobe >> requires it. > > Masami, thanks very much for this! > > One issue with this approach is that it _always_ makes kprobes and > kpatch incompatible when probing/patching the same function, even when > kprobes doesn't need to touch regs->ip. > > Is it possible to add a kprobes flag (KPROBE_FLAG_IPMODIFY), which is > only set by those kprobes users (just jprobes?) which need to modify IP? > Then kprobes could only set the corresponding ftrace flag when it's > really needed. And I think kprobes could even enforce the fact that > !KPROBE_FLAG_IPMODIFY users don't change regs->ip. > > > BTW, I've done some testing with this patch set by patching/probing the > same function with FTRACE_OPS_FL_IPMODIFY, and got some warnings.
I saw > the following warning when attempting to kpatch a kprobed function: > > > WARNING: CPU: 2 PID: 18351 at kernel/trace/ftrace.c:419 > __unregister_ftrace_function+0x1be/0x1d0() > Modules linked in: kpatch_meminfo_string(OE+) kpatch(OE) > stap_8d70d6e041605bd1e144cba4801652_14636(OE) rfcomm fuse ipt_MASQUERADE ccm > xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack bnep ebtable_nat > ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat > nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle > ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack > iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic > x86_pkg_temp_thermal coretemp kvm_intel snd_hda_intel iTCO_wdt > iTCO_vendor_support snd_hda_controller kvm snd_hda_codec iwlwifi snd_hwdep > uvcvideo snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_dev > ice >videobuf2_core btusb v4l2_common snd_pcm videodev nfsd cfg80211 microcode > e1000e bluetooth media thinkpad_acpi joydev sdhci_pci sdhci pcspkr serio_raw > snd_timer i2c_i801 snd mmc_core auth_rpcgss mei_me mei lpc_ich mfd_core > shpchp ptp pps_core wmi tpm_tis soundcore tpm rfkill nfs_acl lockd sunrpc > dm_crypt i915 i2c_algo_bit drm_kms_helper drm crct10dif_pclmul crc32_pclmul > crc32c_intel ghash_clmulni_intel i2c_core video > CPU: 2 PID: 18351 Comm: insmod Tainted: GW OE 3.15.0-IPMODIFY+ #1 > Hardware name: LENOVO 2356BH8/2356BH8, BIOS G7ET63WW (2.05 ) 11/12/2012 > b39bd289 8803b78d7bc0 816f31ed > 8803b78d7bf8 8108914d a07f9040 >fff0 0001 8803e7ac4200 > Call Trace: >[] dump_stack+0x45/0x56 >[] warn_slowpath_common+0x7d/0xa0 >[] warn_slowpath_null+0x1a/0x20 >[] __unregister_ftrace_function+0x1be/0x1d0 >[] ftrace_startup+0x1e4/0x220 >[] register_ftrace_function+0x43/0x60 >[] kpatch_register+0x664/0x830 [kpatch] >[] ? 0xa080 >[] ? 
0xa080 >[] patch_init+0x194/0x1000 [kpatch_meminfo_string] >[] ? 0xa0045fff >[] do_one_initcall+0xd4/0x210 >[] ? set_memory_nx+0x43/0x50 >[] load_module+0x1d92/0x25e0 >[] ? store_uevent+0x70/0x70 >[] ? kernel_read+0x50/0x80 >[] SyS_finit_module+0xa6/0xd0 >[] system_call_fastpath+0x16/0x1b > > > That warning happened because __unregister_ftrace_function() doesn't > expect FTRACE_OPS_FL_ENABLED to be cleared in the ftrace_startup error > path. I tried removing the FTRACE_OPS_FL_ENABLED clearing line in > ftrace_startup, but I saw more warnings. Did you just remove the clearing line or actually clear the flag after __unregister_ftrace_function() was called? > This one happened when attempting to kprobe a kpatched function: > > > WARNING: CPU: 3 PID: at kernel/kprobes.c:953 arm_kprobe+0xa7/0xe0() > Failed to init kprobe-ftrace (-16) > Modules linked in: stap_b2ea0de23f179d8ded86fcc19fcc533_(OE) > kpatch_meminfo_string(OE) kpatch(OE) rfcomm fuse ccm ipt_MASQUERADE >
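The reservation rule being discussed in this thread — at most one ftrace user may own regs->ip at a given function entry, with everyone else getting -EBUSY — can be modeled in a few lines of plain C. This is an illustrative userspace sketch only, not the kernel API; the record array, the `MODEL_EBUSY` value, and all function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: at most one ftrace user may "own" regs->ip at a
 * given function entry; a second claim gets a busy error. All names
 * and the error value are illustrative, not the kernel API.
 */
#define MODEL_EBUSY 16
#define MAX_ENTRIES 8

struct entry_rec {
	unsigned long ip;	/* function entry address */
	int ipmodify;		/* 1 if some user reserved regs->ip here */
};

static struct entry_rec recs[MAX_ENTRIES];

/* Try to reserve regs->ip modification rights at 'ip'. */
static int reserve_ipmodify(unsigned long ip)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (recs[i].ip == ip) {
			if (recs[i].ipmodify)
				return -MODEL_EBUSY;	/* conflict */
			recs[i].ipmodify = 1;
			return 0;
		}
	}
	return -1;	/* unknown function entry */
}

static void release_ipmodify(unsigned long ip)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++)
		if (recs[i].ip == ip)
			recs[i].ipmodify = 0;
}
```

Under this model, a second registration on the same entry fails until the first user releases it, which is the behavior the series enforces with -EBUSY.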
Re: [patch V4 02/10] rtmutex: Simplify rtmutex_slowtrylock()
On 06/12/2014 02:44 AM, Thomas Gleixner wrote:
> Oleg noticed that rtmutex_slowtrylock() has a pointless check for
> rt_mutex_owner(lock) != current.
>
> To avoid calling try_to_take_rtmutex() we really want to check whether
> the lock has an owner at all or whether the trylock failed because the
> owner is NULL, but the RT_MUTEX_HAS_WAITERS bit is set. This covers
> the lock is owned by caller situation as well.
>
> We can actually do this check lockless. trylock is taking a chance
> whether we take lock->wait_lock to do the check or not.
>
> Add comments to the function while at it.
>
> Reported-by: Oleg Nesterov
> Signed-off-by: Thomas Gleixner
> ---

Reviewed-by: Lai Jiangshan

Thanks,
Lai
Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
(2014/06/11 16:41), Namhyung Kim wrote:
> Hi Masami,
>
> On Wed, 11 Jun 2014 10:28:01 +0900, Masami Hiramatsu wrote:
>> (2014/06/10 22:53), Namhyung Kim wrote:
>>> Hi Masami,
>>>
>>> 2014-06-10 (Tue), 10:50 +, Masami Hiramatsu:
>>>> Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among ftrace
>>>> users who may modify regs->ip to change the execution path.
>>>> This also adds the flag to kprobe_ftrace_ops, since ftrace-based
>>>> kprobes already modify regs->ip. Thus, if another user modifies
>>>> regs->ip on the same function entry, one of them will be broken.
>>>> So both should add the IPMODIFY flag and make sure that
>>>> ftrace_set_filter_ip() succeeds.
>>>>
>>>> Note that currently conflicts of IPMODIFY are detected on the
>>>> filter hash. It does NOT care about the notrace hash. This means
>>>> that if you set the filter hash to all functions and notrace (mask)
>>>> some of them, the IPMODIFY flag will be applied to all functions.
>>>
>>> [SNIP]
>>>> +static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
>>>> +					 struct ftrace_hash *old_hash,
>>>> +					 struct ftrace_hash *new_hash)
>>>> +{
>>>> +	struct ftrace_page *pg;
>>>> +	struct dyn_ftrace *rec, *end = NULL;
>>>> +	int in_old, in_new;
>>>> +
>>>> +	/* Only update if the ops has been registered */
>>>> +	if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
>>>> +		return 0;
>>>> +
>>>> +	if (!(ops->flags & FTRACE_OPS_FL_SAVE_REGS) ||
>>>> +	    !(ops->flags & FTRACE_OPS_FL_IPMODIFY))
>>>> +		return 0;
>>>> +
>>>> +	/* Update rec->flags */
>>>> +	do_for_each_ftrace_rec(pg, rec) {
>>>> +		/* We need to update only differences of filter_hash */
>>>> +		in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip);
>>>> +		in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip);
>>>
>>> Why not use ftrace_hash_empty() here instead of checking NULL?
>>
>> Ah, a trick is here. Since an empty filter_hash must hit all, we can not
>> enable/disable filter_hash if we use ftrace_hash_empty() here.
>>
>> To enable the new_hash, old_hash must be EMPTY_HASH, which means in_old
>> is always false. To disable, new_hash is EMPTY_HASH too.
>> Please see ftrace_hash_ipmodify_enable/disable/update().
>
> I'm confused. 8-p I guess what you want to do is to check records in
> either of the filter hashes, right? If so, what about this?
>
> in_old = !ftrace_hash_empty(old_hash) && ftrace_lookup_ip(old_hash, rec->ip);
> in_new = !ftrace_hash_empty(new_hash) && ftrace_lookup_ip(new_hash, rec->ip);

No, ftrace_lookup_ip() returns NULL if the hash is empty, so adding
!ftrace_hash_empty() is meaningless :)

Actually, here I intended to have 3 meanings for the new/old_hash arguments:
- If it is NULL, it hits all
- If it is EMPTY_HASH, it hits nothing
- If it has some entries, it hits those entries.

And in ftrace.c (__ftrace_hash_rec_update), AFAICS, ops->filter_hash has
only 2 meanings:
- If it is EMPTY_HASH or NULL, it hits all
- If it has some entries, it hits those entries.

So I had to make the above change...

>>> Also return value of ftrace_lookup_ip is not boolean.. maybe you need
>>> to add !! or convert the type of in_{old,new} to bool.
>>
>> Yeah, I see. And there is '||' (logical OR) which evaluates the result
>> as boolean. :)
>
> Argh... you're right!
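The three-state hash semantics Masami describes can be checked with a small userspace model. The helper names below are hypothetical stand-ins; only the NULL / EMPTY_HASH / populated distinction is taken from the discussion above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in for an ftrace hash: a zero-terminated list of ips.
 * Semantics modeled on the discussion (illustrative only):
 *   hash == NULL       -> matches every ip
 *   hash == EMPTY_HASH -> matches nothing
 *   otherwise          -> matches only the listed ips
 */
static const unsigned long EMPTY[] = { 0 };	/* plays EMPTY_HASH */
#define EMPTY_HASH EMPTY

static bool lookup_ip(const unsigned long *hash, unsigned long ip)
{
	if (!hash)
		return false;	/* caller must special-case NULL first */
	for (size_t i = 0; hash[i]; i++)
		if (hash[i] == ip)
			return true;
	return false;
}

/* Mirrors "in_new = !new_hash || lookup(new_hash, ip)" from the patch:
 * the !hash test is what makes NULL mean "hit all". */
static bool hash_matches(const unsigned long *hash, unsigned long ip)
{
	return !hash || lookup_ip(hash, ip);
}
```

This is why replacing the NULL check with ftrace_hash_empty() would collapse the "hit all" and "hit nothing" cases into one.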
:) > >> >>> >>> + if (in_old == in_new) + continue; + + if (in_new) { + /* New entries must ensure no others are using it */ + if (rec->flags & FTRACE_FL_IPMODIFY) + goto rollback; + rec->flags |= FTRACE_FL_IPMODIFY; + } else /* Removed entry */ + rec->flags &= ~FTRACE_FL_IPMODIFY; + } while_for_each_ftrace_rec(); + + return 0; + +rollback: + end = rec; + + /* Roll back what we did above */ + do_for_each_ftrace_rec(pg, rec) { + if (rec == end) + goto err_out; + + in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip); + in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip); + if (in_old == in_new) + continue; + + if (in_new) + rec->flags &= ~FTRACE_FL_IPMODIFY; + else + rec->flags |= FTRACE_FL_IPMODIFY; + } while_for_each_ftrace_rec(); + +err_out: + return -EBUSY; +} + +static int ftrace_hash_ipmodify_enable(struct ftrace_ops *ops) +{ + struct ftrace_hash *hash = ops->filter_hash; + + if (ftrace_hash_empty(hash)) + hash = NULL; + + return __ftrace_hash_update_ipmodify(ops, EMPTY_HASH, hash); +} >>> >>> Please see above comment. You can pass an empty hash as is, or
RE: [patch 11/13] wireless: mwifiex: Use the proper interfaces
Hi Thomas,

Thanks for your patch.

> Why is converting time formats so desired if there are proper
> interfaces for this?
>
> Signed-off-by: Thomas Gleixner
> Cc: Bing Zhao
> Cc: "John W. Linville"
> Cc: linux-wirel...@vger.kernel.org

[...]

> Index: linux/drivers/net/wireless/mwifiex/main.c
> ===
> --- linux.orig/drivers/net/wireless/mwifiex/main.c
> +++ linux/drivers/net/wireless/mwifiex/main.c
> @@ -611,7 +611,6 @@ mwifiex_hard_start_xmit(struct sk_buff *
> 	struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev);
> 	struct sk_buff *new_skb;
> 	struct mwifiex_txinfo *tx_info;
> -	struct timeval tv;
>
> 	dev_dbg(priv->adapter->dev, "data: %lu BSS(%d-%d): Data <= kernel\n",
> 		jiffies, priv->bss_type, priv->bss_num);
> @@ -658,8 +657,7 @@ mwifiex_hard_start_xmit(struct sk_buff *
> 	 * firmware for aggregate delay calculation for stats and
> 	 * MSDU lifetime expiry.
> 	 */
> -	do_gettimeofday(&tv);
> -	skb->tstamp = timeval_to_ktime(tv);
> +	__net_timestamp(skb);
>
> 	mwifiex_queue_tx_pkt(priv, skb);
>
> Index: linux/drivers/net/wireless/mwifiex/tdls.c
> ===
> --- linux.orig/drivers/net/wireless/mwifiex/tdls.c
> +++ linux/drivers/net/wireless/mwifiex/tdls.c
> @@ -552,8 +552,7 @@ int mwifiex_send_tdls_data_frame(struct
> 	tx_info->bss_num = priv->bss_num;
> 	tx_info->bss_type = priv->bss_type;
>
> -	do_gettimeofday(&tv);
> -	skb->tstamp = timeval_to_ktime(tv);
> +	__net_timestamp(skb);

I guess we need to remove the "struct timeval tv" local variable too.

> 	mwifiex_queue_tx_pkt(priv, skb);
>
> 	return 0;
> @@ -710,8 +709,7 @@ int mwifiex_send_tdls_action_frame(struc
> 	pkt_len = skb->len - MWIFIEX_MGMT_FRAME_HEADER_SIZE - sizeof(pkt_len);
> 	memcpy(skb->data + MWIFIEX_MGMT_FRAME_HEADER_SIZE, &pkt_len,
> 	       sizeof(pkt_len));
> -	do_gettimeofday(&tv);
> -	skb->tstamp = timeval_to_ktime(tv);
> +	__net_timestamp(skb);

And here too.

Could you please remove these two "struct timeval tv" and send v2 with
my ACK?
Acked-by: Bing Zhao

Thanks,
Bing

> 	mwifiex_queue_tx_pkt(priv, skb);
>
> 	return 0;
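For readers outside the kernel, the arithmetic the removed code was doing by hand can be sketched in userspace C. This is an illustrative model of the timeval-to-nanoseconds conversion only; in the kernel, __net_timestamp() stamps the skb directly from the real-time clock, which is why the timeval round-trip and the local variable can go away entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Userspace sketch of what a timeval_to_ktime()-style conversion
 * boils down to: a (seconds, microseconds) pair collapsed into a
 * single signed nanosecond count.
 */
static int64_t timeval_to_ns(const struct timeval *tv)
{
	return (int64_t)tv->tv_sec * 1000000000LL +
	       (int64_t)tv->tv_usec * 1000LL;
}
```

The one-call replacement avoids both the conversion and the risk of forgetting the now-unused local, which is exactly the loose end pointed out above.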
Re: drivers/char/random.c: More futzing about
On Wed, Jun 11, 2014 at 08:32:49PM -0400, George Spelvin wrote:
> Comparable, but slightly slower. Clearly, I need to do better.
> And you can see the first-iteration effects clearly. Still,
> nothing *remotely* like 7x!

I redid my numbers, and I can no longer reproduce the 7x slowdown. I
do see that if you compile w/o -O2, fast_mix2 is twice as slow. But
it's not 7x slower.

When compiling w/o -O2:

              fast_mix   fast_mix2
task-clock    221.3 ms   460.7 ms

When compiling with -O2 -Os:

              fast_mix   fast_mix2
task-clock    115.4 ms   71.5 ms

And here are the numbers I got with a single iteration using rdtsc:

fast_mix: 164	fast_mix2: 237
fast_mix: 168	fast_mix2: 230
fast_mix: 166	fast_mix2: 228
fast_mix: 164	fast_mix2: 230
fast_mix: 166	fast_mix2: 230
fast_mix: 168	fast_mix2: 232
fast_mix: 166	fast_mix2: 228
fast_mix: 164	fast_mix2: 228
fast_mix: 166	fast_mix2: 234
fast_mix: 166	fast_mix2: 230

- Ted

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

typedef unsigned int __u32;

struct fast_pool {
	__u32		pool[4];
	unsigned long	last;
	unsigned short	count;
	unsigned char	rotate;
	unsigned char	last_timer_intr;
};

/**
 * rol32 - rotate a 32-bit value left
 * @word: value to rotate
 * @shift: bits to roll
 */
static inline __u32 rol32(__u32 word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

static __u32 const twist_table[8] = {
	0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
	0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector. It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
extern void fast_mix(struct fast_pool *f, __u32 input[4])
{
	__u32		w;
	unsigned	input_rotate = f->rotate;

	w = rol32(input[0], input_rotate) ^ f->pool[0] ^ f->pool[3];
	f->pool[0] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 14) & 31;
	w = rol32(input[1], input_rotate) ^ f->pool[1] ^ f->pool[0];
	f->pool[1] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 7) & 31;
	w = rol32(input[2], input_rotate) ^ f->pool[2] ^ f->pool[1];
	f->pool[2] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 7) & 31;
	w = rol32(input[3], input_rotate) ^ f->pool[3] ^ f->pool[2];
	f->pool[3] = (w >> 3) ^ twist_table[w & 7];
	input_rotate = (input_rotate + 7) & 31;

	f->rotate = input_rotate;
	f->count++;
}

extern void fast_mix2(struct fast_pool *f, __u32 const input[4])
{
	__u32 a = f->pool[0] ^ input[0], b = f->pool[1] ^ input[1];
	__u32 c = f->pool[2] ^ input[2], d = f->pool[3] ^ input[3];
	int i;

	for (i = 0; i < 3; i++) {
		/*
		 * Inspired by ChaCha's QuarterRound, but
		 * modified for much greater parallelism.
		 * Surprisingly, rotating a and c seems to work
		 * better than b and d. And it runs faster.
		 */
		a += b; c += d;
		d ^= a; b ^= c;
		a = rol32(a, 15); c = rol32(c, 21);

		a += b; c += d;
		d ^= a; b ^= c;
		a = rol32(a, 3); c = rol32(c, 7);
	}
	f->pool[0] = a; f->pool[1] = b;
	f->pool[2] = c; f->pool[3] = d;
	f->count++;
}

static __inline__ volatile unsigned long long rdtsc(void)
{
	unsigned long long int x;
	__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
	return x;
}

int main(int argc, char **argv)
{
	struct fast_pool f;
	int i;
	__u32 input[4];
	unsigned volatile long long start_time, end_time;

	memset(&f, 0, sizeof(f));
	memset(&input, 0, sizeof(input));
	f.pool[0] = 1;
#if !defined(BENCH_FASTMIX) && !defined(BENCH_FASTMIX2)
	for (i=0; i < 10; i++) {
		usleep(5);
		start_time = rdtsc();
		fast_mix(&f, input);
		end_time = rdtsc();
		printf("fast_mix: %llu\t", end_time - start_time);
		usleep(5);
		start_time = rdtsc();
		fast_mix2(&f, input);
		end_time = rdtsc();
		printf("fast_mix2: %llu\n", end_time - start_time);
	}
#endif
#ifdef BENCH_FASTMIX
	for (i=0; i < 1024; i++) {
		fast_mix(&f, input);
	}
#endif
#ifdef BENCH_FASTMIX2
	for (i=0; i < 1024; i++) {
		fast_mix2(&f, input);
	}
#endif
}
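As a quick sanity check of the fast_mix2() routine above, the function can be exercised standalone: every step (add, xor, fixed-distance rotate) is invertible, so two inputs differing in even a single bit must leave the 128-bit pool in different states. A minimal harness, with stdint types swapped in for the kernel-style typedefs:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standalone copy of fast_mix2() from the test program above, reduced
 * to the fields it touches. Used only to check the bijectivity
 * property described in the lead-in.
 */
typedef uint32_t u32;

struct fast_pool { u32 pool[4]; unsigned short count; };

static inline u32 rol32(u32 word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

static void fast_mix2(struct fast_pool *f, const u32 input[4])
{
	u32 a = f->pool[0] ^ input[0], b = f->pool[1] ^ input[1];
	u32 c = f->pool[2] ^ input[2], d = f->pool[3] ^ input[3];

	for (int i = 0; i < 3; i++) {
		/* three double rounds of the ChaCha-inspired mix */
		a += b; c += d; d ^= a; b ^= c;
		a = rol32(a, 15); c = rol32(c, 21);
		a += b; c += d; d ^= a; b ^= c;
		a = rol32(a, 3); c = rol32(c, 7);
	}
	f->pool[0] = a; f->pool[1] = b;
	f->pool[2] = c; f->pool[3] = d;
	f->count++;
}
```

Because the round function is a bijection on the xored state, distinct inputs from the same starting pool can never collide into the same final pool.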
[PATCH v2 08/10] mm, cma: clean-up cma allocation error path
We can remove one call site for clear_cma_bitmap() if we first call it
before checking the error number.

Signed-off-by: Joonsoo Kim

diff --git a/mm/cma.c b/mm/cma.c
index 1e1b017..01a0713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -282,11 +282,12 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
 			break;
-		} else if (ret != -EBUSY) {
-			clear_cma_bitmap(cma, pfn, count);
-			break;
 		}
+
+		clear_cma_bitmap(cma, pfn, count);
+		if (ret != -EBUSY)
+			break;
+
 		pr_debug("%s(): memory range at %p is busy, retrying\n",
			 __func__, pfn_to_page(pfn));
		/* try again with a bit different memory target */
-- 
1.7.9.5
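The control-flow change in this patch — run the cleanup once, unconditionally, then decide whether to retry or bail — can be restated in a standalone sketch. The functions and the error value below are stand-ins invented for illustration, not the kernel's:

```c
#include <assert.h>

/* Illustrative model of the restructured error path: a single cleanup
 * call site serves both the "hard error" and the "retry" branches.
 */
#define MODEL_EBUSY 16

static int cleanup_calls;

static void cleanup(void) { cleanup_calls++; }

/* Each results[i] simulates one try_alloc() outcome: 0 on success,
 * -MODEL_EBUSY if worth retrying, any other negative on hard failure. */
static int alloc_with_retry(const int *results, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = results[i];
		if (ret == 0)
			return 0;	/* success: no cleanup needed */
		cleanup();		/* single cleanup call site */
		if (ret != -MODEL_EBUSY)
			return ret;	/* hard error: give up */
		/* -EBUSY: fall through and retry */
	}
	return -MODEL_EBUSY;
}
```

Hoisting the cleanup above the error test is what lets the duplicate call disappear without changing behavior on either branch.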
[PATCH v2 10/10] mm, cma: use spinlock instead of mutex
Currently, we take the mutex to manipulate the bitmap. This job is
really simple and short, so we don't need to sleep if contended. So
change it to a spinlock.

Signed-off-by: Joonsoo Kim

diff --git a/mm/cma.c b/mm/cma.c
index 22a5b23..3085e8c 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -36,7 +37,7 @@ struct cma {
 	unsigned long	count;
 	unsigned long	*bitmap;
 	int order_per_bit;	/* Order of pages represented by one bit */
-	struct mutex	lock;
+	spinlock_t	lock;
 };

 /*
@@ -72,9 +73,9 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
 	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
 	nr_bits = cma_bitmap_pages_to_bits(cma, count);

-	mutex_lock(&cma->lock);
+	spin_lock(&cma->lock);
 	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
-	mutex_unlock(&cma->lock);
+	spin_unlock(&cma->lock);
 }

 static int __init cma_activate_area(struct cma *cma)
@@ -112,7 +113,7 @@ static int __init cma_activate_area(struct cma *cma)
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);

-	mutex_init(&cma->lock);
+	spin_lock_init(&cma->lock);
 	return 0;

 err:
@@ -261,11 +262,11 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 	nr_bits = cma_bitmap_pages_to_bits(cma, count);

 	for (;;) {
-		mutex_lock(&cma->lock);
+		spin_lock(&cma->lock);
 		bitmapno = bitmap_find_next_zero_area(cma->bitmap,
					bitmap_maxno, start, nr_bits, mask);
 		if (bitmapno >= bitmap_maxno) {
-			mutex_unlock(&cma->lock);
+			spin_unlock(&cma->lock);
 			break;
 		}
 		bitmap_set(cma->bitmap, bitmapno, nr_bits);
@@ -274,7 +275,7 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
	 * our exclusive use. If the migration fails we will take the
	 * lock again and unmark it.
	 */
-	mutex_unlock(&cma->lock);
+	spin_unlock(&cma->lock);

 	pfn = cma->base_pfn + (bitmapno << cma->order_per_bit);
 	mutex_lock(&cma_mutex);
-- 
1.7.9.5
[PATCH v2 00/10] CMA: generalize CMA reserved area management code
Currently, there are two users of the CMA functionality: one is the DMA
subsystem and the other is kvm on powerpc. They have their own code to
manage the CMA reserved area even though it looks really similar.
My guess is that this is caused by differing needs for bitmap
management. The kvm side wants to maintain the bitmap not with one bit
per page, but at a coarser granularity; eventually it uses a bitmap
where one bit represents 64 pages.

When I implement CMA related patches, I have to change both places to
apply my changes, and that is painful. I want to change this situation
and reduce future code management overhead through this patchset. This
change could also help developers who want to use CMA in their new
feature development, since they can use CMA easily without copying &
pasting this reserved area management code.

v2: Although this patchset looks very different from v1, the end
result, that is, mm/cma.c, is the same as v1's. So I carry the Acks to
patches 6-7.

Patch 1-5 prepare some features to cover ppc kvm's requirements.
Patch 6-7 generalize CMA reserved area management code and change users
to use it.
Patch 8-10 clean up minor things.
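The "one bit represents 64 pages" bookkeeping mentioned above reduces to a small piece of arithmetic, introduced later in the series as cma_bitmap_pages_to_bits(). A standalone sketch — the ALIGN_UP macro stands in for the kernel's ALIGN, and the function takes order_per_bit directly instead of a struct cma:

```c
#include <assert.h>

/* With order_per_bit = k, one bitmap bit covers 2^k pages; a request
 * for 'pages' pages is rounded up to whole bits. The ppc kvm case is
 * one bit per 64 pages, i.e. k = 6.
 */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

static unsigned long cma_bitmap_pages_to_bits(int order_per_bit,
					      unsigned long pages)
{
	return ALIGN_UP(pages, 1UL << order_per_bit) >> order_per_bit;
}
```

Parameterizing this one conversion is what lets a single mm/cma.c serve both the DMA case (one bit per page) and the kvm case (one bit per 64 pages).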
Joonsoo Kim (10):
  DMA, CMA: clean-up log message
  DMA, CMA: fix possible memory leak
  DMA, CMA: separate core cma management codes from DMA APIs
  DMA, CMA: support alignment constraint on cma region
  DMA, CMA: support arbitrary bitmap granularity
  CMA: generalize CMA reserved area management functionality
  PPC, KVM, CMA: use general CMA reserved area management framework
  mm, cma: clean-up cma allocation error path
  mm, cma: move output param to the end of param list
  mm, cma: use spinlock instead of mutex

 arch/powerpc/kvm/book3s_hv_builtin.c |  17 +-
 arch/powerpc/kvm/book3s_hv_cma.c     | 240 ----
 arch/powerpc/kvm/book3s_hv_cma.h     |  27 ---
 drivers/base/Kconfig                 |  10 -
 drivers/base/dma-contiguous.c        | 248 ++---
 include/linux/cma.h                  |  12 ++
 include/linux/dma-contiguous.h       |   3 +-
 mm/Kconfig                           |  11 ++
 mm/Makefile                          |   1 +
 mm/cma.c                             | 333 ++
 10 files changed, 382 insertions(+), 520 deletions(-)
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.c
 delete mode 100644 arch/powerpc/kvm/book3s_hv_cma.h
 create mode 100644 include/linux/cma.h
 create mode 100644 mm/cma.c
-- 
1.7.9.5
[PATCH v2 09/10] mm, cma: move output param to the end of param list
Conventionally, we put output params at the end of the param list.
cma_declare_contiguous() doesn't look like that, so change it.

Additionally, move down the cma_areas reference code to the position
where it is really needed.

Signed-off-by: Joonsoo Kim

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 28ec226..97613ea 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -184,7 +184,7 @@ void __init kvm_cma_reserve(void)
 		align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
 		cma_declare_contiguous(selected_size, 0, 0, align_size,
-			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
+			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, false, &kvm_cma);
 	}
 }

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index f177f73..bfd4553 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -149,7 +149,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base,
 {
 	int ret;

-	ret = cma_declare_contiguous(size, base, limit, 0, 0, res_cma, fixed);
+	ret = cma_declare_contiguous(size, base, limit, 0, 0, fixed, res_cma);
 	if (ret)
 		return ret;

diff --git a/include/linux/cma.h b/include/linux/cma.h
index e38efe9..e53eead 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -6,7 +6,7 @@ struct cma;
 extern int __init cma_declare_contiguous(phys_addr_t size,
			phys_addr_t base, phys_addr_t limit,
			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed);
+			bool fixed, struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
 extern bool cma_release(struct cma *cma, struct page *pages, int count);
 #endif

diff --git a/mm/cma.c b/mm/cma.c
index 01a0713..22a5b23 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -142,8 +142,8 @@ core_initcall(cma_init_reserved_areas);
 * @limit: End address of the reserved memory (optional, 0 for any).
 * @alignment: Alignment for the contiguous memory area, should be power of 2
 * @order_per_bit: Order of pages represented by one bit on bitmap.
- * @res_cma: Pointer to store the created cma region.
 * @fixed: hint about where to place the reserved area
+ * @res_cma: Pointer to store the created cma region.
 *
 * This function reserves memory from early allocator. It should be
 * called by arch specific code once the early allocator (memblock or bootmem)
@@ -156,9 +156,9 @@
 int __init cma_declare_contiguous(phys_addr_t size,
			phys_addr_t base, phys_addr_t limit,
			phys_addr_t alignment, int order_per_bit,
-			struct cma **res_cma, bool fixed)
+			bool fixed, struct cma **res_cma)
 {
-	struct cma *cma = &cma_areas[cma_area_count];
+	struct cma *cma;
 	int ret = 0;

 	pr_debug("%s(size %lx, base %08lx, limit %08lx alignment %08lx)\n",
@@ -214,6 +214,7 @@ int __init cma_declare_contiguous(phys_addr_t size,
	 * Each reserved area must be initialised later, when more kernel
	 * subsystems (like slab allocator) are available.
	 */
+	cma = &cma_areas[cma_area_count];
 	cma->base_pfn = PFN_DOWN(base);
 	cma->count = size >> PAGE_SHIFT;
 	cma->order_per_bit = order_per_bit;
-- 
1.7.9.5
[PATCH v2 06/10] CMA: generalize CMA reserved area management functionality
Currently, there are two users of the CMA functionality: one is the DMA
subsystem and the other is kvm on powerpc. They have their own code to
manage the CMA reserved area even though it looks really similar.
My guess is that this is caused by differing needs for bitmap
management. The kvm side wants to maintain the bitmap not with one bit
per page, but at a coarser granularity; eventually it uses a bitmap
where one bit represents 64 pages.

When I implement CMA related patches, I have to change both places to
apply my changes, and that is painful. I want to change this situation
and reduce future code management overhead through this patch. This
change could also help developers who want to use CMA in their new
feature development, since they can use CMA easily without copying &
pasting this reserved area management code.

In previous patches, we have prepared some features to generalize CMA
reserved area management and now it's time to do it. This patch moves
the core functions to mm/cma.c and changes the DMA APIs to use these
functions. There is no functional change in the DMA APIs.

v2: There is no big change from v1 in mm/cma.c. Mostly renaming.

Acked-by: Michal Nazarewicz
Signed-off-by: Joonsoo Kim

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 00e13ce..4eac559 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -283,16 +283,6 @@ config CMA_ALIGNMENT

	  If unsure, leave the default value "8".

-config CMA_AREAS
-	int "Maximum count of the CMA device-private areas"
-	default 7
-	help
-	  CMA allows to create CMA areas for particular devices. This parameter
-	  sets the maximum number of such device private CMA areas in the
-	  system.
-
-	  If unsure, leave the default value "7".
-
 endif

 endmenu

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 9bc9340..f177f73 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -24,25 +24,10 @@
 #include
 #include
-#include
-#include
-#include
 #include
-#include
-#include
-#include
 #include
 #include
-
-struct cma {
-	unsigned long	base_pfn;
-	unsigned long	count;
-	unsigned long	*bitmap;
-	int order_per_bit;	/* Order of pages represented by one bit */
-	struct mutex	lock;
-};
-
-struct cma *dma_contiguous_default_area;
+#include

 #ifdef CONFIG_CMA_SIZE_MBYTES
 #define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
@@ -50,6 +35,8 @@ struct cma *dma_contiguous_default_area;
 #define CMA_SIZE_MBYTES 0
 #endif

+struct cma *dma_contiguous_default_area;
+
 /*
 * Default global CMA area size can be defined in kernel's .config.
 * This is useful mainly for distro maintainers to create a kernel
@@ -156,199 +143,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 	}
 }

-static DEFINE_MUTEX(cma_mutex);
-
-static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
-{
-	return (1 << (align_order >> cma->order_per_bit)) - 1;
-}
-
-static unsigned long cma_bitmap_maxno(struct cma *cma)
-{
-	return cma->count >> cma->order_per_bit;
-}
-
-static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
-						unsigned long pages)
-{
-	return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
-}
-
-static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
-{
-	unsigned long bitmapno, nr_bits;
-
-	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
-	nr_bits = cma_bitmap_pages_to_bits(cma, count);
-
-	mutex_lock(&cma->lock);
-	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
-	mutex_unlock(&cma->lock);
-}
-
-static int __init cma_activate_area(struct cma *cma)
-{
-	int bitmap_maxno = cma_bitmap_maxno(cma);
-	int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
-	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-	unsigned i =
cma->count >> pageblock_order; - struct zone *zone; - - pr_debug("%s()\n", __func__); - - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); - if (!cma->bitmap) - return -ENOMEM; - - WARN_ON_ONCE(!pfn_valid(pfn)); - zone = page_zone(pfn_to_page(pfn)); - - do { - unsigned j; - base_pfn = pfn; - for (j = pageblock_nr_pages; j; --j, pfn++) { - WARN_ON_ONCE(!pfn_valid(pfn)); - /* -* alloc_contig_range requires the pfn range -* specified to be in the same zone. Make this -* simple by forcing the entire CMA resv range -* to be in the same zone. -*/ - if (page_zone(pfn_to_page(pfn)) != zone) - goto err; - } - init_cma_reserved_pageblock(pfn_to_page(base_pfn)); - } while (--i); - -
[PATCH v2 07/10] PPC, KVM, CMA: use general CMA reserved area management framework
Now, we have a general CMA reserved area management framework, so use
it for future maintainability. There is no functional change.

Acked-by: Michal Nazarewicz
Acked-by: Paolo Bonzini
Signed-off-by: Joonsoo Kim

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7cde8a6..28ec226 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -16,12 +16,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include

-#include "book3s_hv_cma.h"
+#define KVM_CMA_CHUNK_ORDER	18
+
 /*
 * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
 * should be power of 2.
@@ -43,6 +45,8 @@ static unsigned long kvm_cma_resv_ratio = 5;
 unsigned long kvm_rma_pages = (1 << 27) >> PAGE_SHIFT;	/* 128MB */
 EXPORT_SYMBOL_GPL(kvm_rma_pages);

+static struct cma *kvm_cma;
+
 /* Work out RMLS (real mode limit selector) field value for a given RMA size.
   Assumes POWER7 or PPC970. */
 static inline int lpcr_rmls(unsigned long rma_size)
@@ -97,7 +101,7 @@ struct kvm_rma_info *kvm_alloc_rma()
 	ri = kmalloc(sizeof(struct kvm_rma_info), GFP_KERNEL);
 	if (!ri)
 		return NULL;
-	page = kvm_alloc_cma(kvm_rma_pages, kvm_rma_pages);
+	page = cma_alloc(kvm_cma, kvm_rma_pages, get_order(kvm_rma_pages));
 	if (!page)
 		goto err_out;
 	atomic_set(&ri->use_count, 1);
@@ -112,7 +116,7 @@ EXPORT_SYMBOL_GPL(kvm_alloc_rma);
 void kvm_release_rma(struct kvm_rma_info *ri)
 {
 	if (atomic_dec_and_test(&ri->use_count)) {
-		kvm_release_cma(pfn_to_page(ri->base_pfn), kvm_rma_pages);
+		cma_release(kvm_cma, pfn_to_page(ri->base_pfn), kvm_rma_pages);
 		kfree(ri);
 	}
 }
@@ -134,13 +138,13 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
 	/* Old CPUs require HPT aligned on a multiple of its size */
 	if (!cpu_has_feature(CPU_FTR_ARCH_206))
 		align_pages = nr_pages;
-	return kvm_alloc_cma(nr_pages, align_pages);
+	return cma_alloc(kvm_cma, nr_pages, get_order(align_pages));
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_hpt);

 void kvm_release_hpt(struct page *page, unsigned long nr_pages)
 {
-	kvm_release_cma(page, nr_pages);
+	cma_release(kvm_cma, page, nr_pages);
 }
 EXPORT_SYMBOL_GPL(kvm_release_hpt);

@@ -179,7 +183,8 @@ void __init kvm_cma_reserve(void)
 			align_size = HPT_ALIGN_PAGES << PAGE_SHIFT;

 		align_size = max(kvm_rma_pages << PAGE_SHIFT, align_size);
-		kvm_cma_declare_contiguous(selected_size, align_size);
+		cma_declare_contiguous(selected_size, 0, 0, align_size,
+			KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, &kvm_cma, false);
 	}
 }

diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
deleted file mode 100644
index d9d3d85..000
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Contiguous Memory Allocator for ppc KVM hash pagetable based on CMA
- * for DMA mapping framework
- *
- * Copyright IBM Corporation, 2013
- * Author Aneesh Kumar K.V
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License or (at your optional) any later version of the license.
- *
- */
-#define pr_fmt(fmt) "kvm_cma: " fmt
-
-#ifdef CONFIG_CMA_DEBUG
-#ifndef DEBUG
-# define DEBUG
-#endif
-#endif
-
-#include
-#include
-#include
-#include
-
-#include "book3s_hv_cma.h"
-
-struct kvm_cma {
-	unsigned long base_pfn;
-	unsigned long count;
-	unsigned long *bitmap;
-};
-
-static DEFINE_MUTEX(kvm_cma_mutex);
-static struct kvm_cma kvm_cma_area;
-
-/**
- * kvm_cma_declare_contiguous() - reserve area for contiguous memory handling
- * for kvm hash pagetable
- * @size: Size of the reserved memory.
- * @alignment: Alignment for the contiguous memory area
- *
- * This function reserves memory for kvm cma area. It should be
- * called by arch code when early allocator (memblock or bootmem)
- * is still activate.
- */
-long __init kvm_cma_declare_contiguous(phys_addr_t size, phys_addr_t alignment)
-{
-	long base_pfn;
-	phys_addr_t addr;
-	struct kvm_cma *cma = &kvm_cma_area;
-
-	pr_debug("%s(size %lx)\n", __func__, (unsigned long)size);
-
-	if (!size)
-		return -EINVAL;
-	/*
-	 * Sanitise input arguments.
-	 * We should be pageblock aligned for CMA.
-	 */
-	alignment = max(alignment, (phys_addr_t)(PAGE_SIZE << pageblock_order));
-	size = ALIGN(size, alignment);
-	/*
-	 * Reserve memory
-	 * Use __memblock_alloc_base() since
-	 * memblock_alloc_base() panic()s.
-	 */
-	addr = __memblock_alloc_base(size, alignment, 0);
-	if (!addr) {
-
[PATCH v2 05/10] DMA, CMA: support arbitrary bitmap granularity
ppc kvm's cma region management requires arbitrary bitmap granularity,
since it wants to reserve very large memory and manage this region with
a bitmap where one bit stands for several pages, to reduce management
overhead. So support arbitrary bitmap granularity for the following
generalization.

Signed-off-by: Joonsoo Kim

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index bc4c171..9bc9340 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -38,6 +38,7 @@ struct cma {
 	unsigned long	base_pfn;
 	unsigned long	count;
 	unsigned long	*bitmap;
+	int order_per_bit;	/* Order of pages represented by one bit */
 	struct mutex	lock;
 };

@@ -157,9 +158,38 @@ void __init dma_contiguous_reserve(phys_addr_t limit)

 static DEFINE_MUTEX(cma_mutex);

+static unsigned long cma_bitmap_aligned_mask(struct cma *cma, int align_order)
+{
+	return (1 << (align_order >> cma->order_per_bit)) - 1;
+}
+
+static unsigned long cma_bitmap_maxno(struct cma *cma)
+{
+	return cma->count >> cma->order_per_bit;
+}
+
+static unsigned long cma_bitmap_pages_to_bits(struct cma *cma,
+						unsigned long pages)
+{
+	return ALIGN(pages, 1 << cma->order_per_bit) >> cma->order_per_bit;
+}
+
+static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count)
+{
+	unsigned long bitmapno, nr_bits;
+
+	bitmapno = (pfn - cma->base_pfn) >> cma->order_per_bit;
+	nr_bits = cma_bitmap_pages_to_bits(cma, count);
+
+	mutex_lock(&cma->lock);
+	bitmap_clear(cma->bitmap, bitmapno, nr_bits);
+	mutex_unlock(&cma->lock);
+}
+
 static int __init cma_activate_area(struct cma *cma)
 {
-	int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
+	int bitmap_maxno = cma_bitmap_maxno(cma);
+	int bitmap_size = BITS_TO_LONGS(bitmap_maxno) * sizeof(long);
 	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
 	unsigned i = cma->count >> pageblock_order;
 	struct zone *zone;
@@ -221,6 +251,7 @@ core_initcall(cma_init_reserved_areas);
 * @base: Base address of the reserved area optional, use 0 for any
 *
@limit: End address of the reserved memory (optional, 0 for any). * @alignment: Alignment for the contiguous memory area, should be power of 2 + * @order_per_bit: Order of pages represented by one bit on bitmap. * @res_cma: Pointer to store the created cma region. * @fixed: hint about where to place the reserved area * @@ -235,7 +266,7 @@ core_initcall(cma_init_reserved_areas); */ static int __init __dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, - phys_addr_t alignment, + phys_addr_t alignment, int order_per_bit, struct cma **res_cma, bool fixed) { struct cma *cma = &cma_areas[cma_area_count]; @@ -269,6 +300,8 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, base = ALIGN(base, alignment); size = ALIGN(size, alignment); limit &= ~(alignment - 1); + /* size should be aligned with order_per_bit */ + BUG_ON(!IS_ALIGNED(size >> PAGE_SHIFT, 1 << order_per_bit)); /* Reserve memory */ if (base && fixed) { @@ -294,6 +327,7 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, */ cma->base_pfn = PFN_DOWN(base); cma->count = size >> PAGE_SHIFT; + cma->order_per_bit = order_per_bit; *res_cma = cma; cma_area_count++; @@ -313,7 +347,7 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = __dma_contiguous_reserve_area(size, base, limit, 0, + ret = __dma_contiguous_reserve_area(size, base, limit, 0, 0, res_cma, fixed); if (ret) return ret; @@ -324,13 +358,6 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, return 0; } -static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) -{ - mutex_lock(&cma->lock); - bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count); - mutex_unlock(&cma->lock); -} - /** * dma_alloc_from_contiguous() - allocate pages from contiguous area * @dev: Pointer to device for which the allocation is performed. 
@@ -345,7 +372,8 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, unsigned int align) { - unsigned long mask, pfn, pageno, start = 0; + unsigned long mask, pfn, start = 0; + unsigned long bitmap_maxno, bitmapno, nr_bits; struct page *page = NULL; int ret; @@ -358,18 +386,19 @@ static struct page *__dma_alloc_from_contiguous(struct cma *cma,
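For readers new to the granularity scheme, the pages-to-bits arithmetic in the patch can be sketched in plain userspace C. This is a simplified model: `align_up` stands in for the kernel's ALIGN() macro and the struct is trimmed to the relevant fields.

```c
#include <assert.h>

/* Trimmed-down model of struct cma: one bitmap bit covers 2^order_per_bit pages. */
struct cma {
    unsigned long base_pfn;
    unsigned long count;          /* size of the area, in pages */
    int order_per_bit;
};

/* Userspace stand-in for the kernel's ALIGN() macro (a must be a power of 2). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Number of bits needed to cover the whole area. */
static unsigned long cma_bitmap_maxno(struct cma *cma)
{
    return cma->count >> cma->order_per_bit;
}

/* Round a page count up to whole bits, then convert pages to bits. */
static unsigned long cma_bitmap_pages_to_bits(struct cma *cma, unsigned long pages)
{
    return align_up(pages, 1UL << cma->order_per_bit) >> cma->order_per_bit;
}
```

With order_per_bit = 2, a 256-page area needs a 64-bit bitmap, and a 5-page request occupies two bits (5 pages round up to 8).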
[PATCH v2 04/10] DMA, CMA: support alignment constraint on cma region
ppc kvm's cma area management needs an alignment constraint on the cma region. So support it to prepare for generalization of the cma area management functionality. Additionally, add some comments which explain why an alignment constraint is needed on the cma region. Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 8a44c82..bc4c171 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -32,6 +32,7 @@ #include #include #include +#include <linux/log2.h> struct cma { unsigned long base_pfn; @@ -219,6 +220,7 @@ core_initcall(cma_init_reserved_areas); * @size: Size of the reserved area (in bytes), * @base: Base address of the reserved area optional, use 0 for any * @limit: End address of the reserved memory (optional, 0 for any). + * @alignment: Alignment for the contiguous memory area, should be power of 2 * @res_cma: Pointer to store the created cma region. * @fixed: hint about where to place the reserved area * @@ -233,15 +235,15 @@ core_initcall(cma_init_reserved_areas); */ static int __init __dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, + phys_addr_t alignment, struct cma **res_cma, bool fixed) { struct cma *cma = &cma_areas[cma_area_count]; - phys_addr_t alignment; int ret = 0; - pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, -(unsigned long)size, (unsigned long)base, -(unsigned long)limit); + pr_debug("%s(size %lx, base %08lx, limit %08lx align_order %08lx)\n", + __func__, (unsigned long)size, (unsigned long)base, + (unsigned long)limit, (unsigned long)alignment); /* Sanity checks */ if (cma_area_count == ARRAY_SIZE(cma_areas)) { @@ -253,8 +255,17 @@ static int __init __dma_contiguous_reserve_area(phys_addr_t size, if (!size) return -EINVAL; - /* Sanitise input arguments */ - alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order); + if (alignment && !is_power_of_2(alignment)) + return -EINVAL; + + /* +* Sanitise input arguments. 
+* CMA area should be at least MAX_ORDER - 1 aligned. Otherwise, +* CMA area could be merged into other MIGRATE_TYPE by buddy mechanism +* and CMA property will be broken. +*/ + alignment = max(alignment, + (phys_addr_t)PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)); base = ALIGN(base, alignment); size = ALIGN(size, alignment); limit &= ~(alignment - 1); @@ -302,7 +313,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, { int ret; - ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); + ret = __dma_contiguous_reserve_area(size, base, limit, 0, + res_cma, fixed); if (ret) return ret; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
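The alignment sanitisation this patch introduces can be sketched as follows. This is a hedged model: the page size and order constants are illustrative values chosen for the sketch, not taken from the patch, and the kernel returns -EINVAL where the sketch returns 0.

```c
#include <assert.h>

/* Illustrative constants (assumptions, not the kernel's values on every arch):
 * 4 KiB pages, MAX_ORDER = 11, pageblock_order = 10. */
#define PAGE_SIZE_BYTES 4096UL
#define MAX_ORDER 11
#define PAGEBLOCK_ORDER 10

static int is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Mirror of the patch's logic: reject non-power-of-2 requests, then raise
 * the caller-supplied alignment to at least the MAX_ORDER-1 / pageblock
 * floor so the CMA area cannot be merged into other migrate types. */
static unsigned long sanitise_alignment(unsigned long alignment)
{
    int order = MAX_ORDER - 1 > PAGEBLOCK_ORDER ? MAX_ORDER - 1 : PAGEBLOCK_ORDER;
    unsigned long floor = PAGE_SIZE_BYTES << order;

    if (alignment && !is_power_of_2(alignment))
        return 0; /* the kernel returns -EINVAL here */
    return alignment > floor ? alignment : floor;
}
```

With these constants the floor is 4 MiB: a request of 0 gets the floor, a 16 MiB request is kept, and a non-power-of-2 request is rejected.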
[PATCH v2 03/10] DMA, CMA: separate core cma management codes from DMA APIs
To prepare future generalization work on cma area management code, we need to separate core cma management codes from DMA APIs. We will extend these core functions to cover requirements of ppc kvm's cma area management functionality in following patches. This separation helps us not to touch DMA APIs while extending core functions. Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index fb0cdce..8a44c82 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -231,9 +231,9 @@ core_initcall(cma_init_reserved_areas); * If @fixed is true, reserve contiguous area at exactly @base. If false, * reserve in range from @base to @limit. */ -int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, - phys_addr_t limit, struct cma **res_cma, - bool fixed) +static int __init __dma_contiguous_reserve_area(phys_addr_t size, + phys_addr_t base, phys_addr_t limit, + struct cma **res_cma, bool fixed) { struct cma *cma = _areas[cma_area_count]; phys_addr_t alignment; @@ -288,16 +288,30 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, pr_info("%s(): reserved %ld MiB at %08lx\n", __func__, (unsigned long)size / SZ_1M, (unsigned long)base); - - /* Architecture specific contiguous memory fixup. */ - dma_contiguous_early_fixup(base, size); return 0; + err: pr_err("%s(): failed to reserve %ld MiB\n", __func__, (unsigned long)size / SZ_1M); return ret; } +int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, + phys_addr_t limit, struct cma **res_cma, + bool fixed) +{ + int ret; + + ret = __dma_contiguous_reserve_area(size, base, limit, res_cma, fixed); + if (ret) + return ret; + + /* Architecture specific contiguous memory fixup. 
*/ + dma_contiguous_early_fixup(base, size); + + return 0; +} + static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) { mutex_lock(>lock); @@ -316,20 +330,16 @@ static void clear_cma_bitmap(struct cma *cma, unsigned long pfn, int count) * global one. Requires architecture specific dev_get_cma_area() helper * function. */ -struct page *dma_alloc_from_contiguous(struct device *dev, int count, +static struct page *__dma_alloc_from_contiguous(struct cma *cma, int count, unsigned int align) { unsigned long mask, pfn, pageno, start = 0; - struct cma *cma = dev_get_cma_area(dev); struct page *page = NULL; int ret; if (!cma || !cma->count) return NULL; - if (align > CONFIG_CMA_ALIGNMENT) - align = CONFIG_CMA_ALIGNMENT; - pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, count, align); @@ -377,6 +387,17 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count, return page; } +struct page *dma_alloc_from_contiguous(struct device *dev, int count, + unsigned int align) +{ + struct cma *cma = dev_get_cma_area(dev); + + if (align > CONFIG_CMA_ALIGNMENT) + align = CONFIG_CMA_ALIGNMENT; + + return __dma_alloc_from_contiguous(cma, count, align); +} + /** * dma_release_from_contiguous() - release allocated pages * @dev: Pointer to device for which the pages were allocated. @@ -387,10 +408,9 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count, * It returns false when provided pages do not belong to contiguous area and * true otherwise. 
*/ -bool dma_release_from_contiguous(struct device *dev, struct page *pages, +static bool __dma_release_from_contiguous(struct cma *cma, struct page *pages, int count) { - struct cma *cma = dev_get_cma_area(dev); unsigned long pfn; if (!cma || !pages) @@ -410,3 +430,11 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages, return true; } + +bool dma_release_from_contiguous(struct device *dev, struct page *pages, +int count) +{ + struct cma *cma = dev_get_cma_area(dev); + + return __dma_release_from_contiguous(cma, pages, count); +} -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
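The shape of this split — a thin public wrapper that keeps the DMA-specific policy while the double-underscore core becomes reusable — can be sketched like this. Names, return values, and the CMA_ALIGNMENT value are illustrative, not the kernel's.

```c
#include <assert.h>

#define CMA_ALIGNMENT 8  /* stand-in for CONFIG_CMA_ALIGNMENT */

/* Core function: knows nothing about devices or DMA policy.
 * Returns the align it was given so the test can observe clamping. */
static int core_alloc(int count, unsigned int align)
{
    (void)count;
    return (int)align;
}

/* Public wrapper: the DMA-specific clamp stays here, exactly as the
 * patch keeps the CONFIG_CMA_ALIGNMENT clamp in
 * dma_alloc_from_contiguous() and out of the core function. */
static int dma_alloc(int count, unsigned int align)
{
    if (align > CMA_ALIGNMENT)
        align = CMA_ALIGNMENT;
    return core_alloc(count, align);
}
```

This is why the later generalization patches can extend the core functions without touching the DMA API surface.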
[PATCH v2 02/10] DMA, CMA: fix possible memory leak
We should free the memory for the bitmap when we find a zone mismatch, otherwise this memory will leak. Additionally, I copy the code comment from ppc kvm's cma code to explain why we need to check for a zone mismatch. Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index bd0bb81..fb0cdce 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -177,14 +177,24 @@ static int __init cma_activate_area(struct cma *cma) base_pfn = pfn; for (j = pageblock_nr_pages; j; --j, pfn++) { WARN_ON_ONCE(!pfn_valid(pfn)); + /* +* alloc_contig_range requires the pfn range +* specified to be in the same zone. Make this +* simple by forcing the entire CMA resv range +* to be in the same zone. +*/ if (page_zone(pfn_to_page(pfn)) != zone) - return -EINVAL; + goto err; } init_cma_reserved_pageblock(pfn_to_page(base_pfn)); } while (--i); mutex_init(&cma->lock); return 0; + +err: + kfree(cma->bitmap); + return -EINVAL; } static struct cma cma_areas[MAX_CMA_AREAS]; -- 1.7.9.5
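The effect of switching from a direct `return -EINVAL` to a common error label can be modelled in userspace with a crude allocation counter. Names and the failure trigger are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* crude leak detector for this sketch */

static void *xalloc(size_t n) { live_allocs++; return malloc(n); }
static void xfree(void *p)    { live_allocs--; free(p); }

/* Model of the fixed cma_activate_area(): on a mid-loop validation
 * failure (the stand-in for page_zone() != zone), jump to one error
 * label that frees the bitmap, instead of returning directly and
 * leaking it. */
static int activate_area(int nblocks, int bad_block)
{
    void *bitmap = xalloc(16);
    if (!bitmap)
        return -12; /* -ENOMEM */

    for (int i = 0; i < nblocks; i++) {
        if (i == bad_block)   /* stand-in for the zone-mismatch check */
            goto err;
    }
    return 0;   /* success: the area keeps its bitmap, as in the kernel */
err:
    xfree(bitmap);
    return -22; /* -EINVAL */
}
```

Before the fix, the `goto err` path was a bare `return -EINVAL`, so `live_allocs` would stay at 1 after a zone mismatch — exactly the leak the patch closes.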
[PATCH v2 01/10] DMA, CMA: clean-up log message
We don't need explicit 'CMA:' prefix, since we already define prefix 'cma:' in pr_fmt. So remove it. And, some logs print function name and others doesn't. This looks bad to me, so I unify log format to print function name consistently. Lastly, I add one more debug log on cma_activate_area(). Signed-off-by: Joonsoo Kim diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c index 83969f8..bd0bb81 100644 --- a/drivers/base/dma-contiguous.c +++ b/drivers/base/dma-contiguous.c @@ -144,7 +144,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit) } if (selected_size && !dma_contiguous_default_area) { - pr_debug("%s: reserving %ld MiB for global area\n", __func__, + pr_debug("%s(): reserving %ld MiB for global area\n", __func__, (unsigned long)selected_size / SZ_1M); dma_contiguous_reserve_area(selected_size, selected_base, @@ -163,8 +163,9 @@ static int __init cma_activate_area(struct cma *cma) unsigned i = cma->count >> pageblock_order; struct zone *zone; - cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); + pr_debug("%s()\n", __func__); + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); if (!cma->bitmap) return -ENOMEM; @@ -234,7 +235,8 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, /* Sanity checks */ if (cma_area_count == ARRAY_SIZE(cma_areas)) { - pr_err("Not enough slots for CMA reserved regions!\n"); + pr_err("%s(): Not enough slots for CMA reserved regions!\n", + __func__); return -ENOSPC; } @@ -274,14 +276,15 @@ int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, *res_cma = cma; cma_area_count++; - pr_info("CMA: reserved %ld MiB at %08lx\n", (unsigned long)size / SZ_1M, - (unsigned long)base); + pr_info("%s(): reserved %ld MiB at %08lx\n", + __func__, (unsigned long)size / SZ_1M, (unsigned long)base); /* Architecture specific contiguous memory fixup. 
*/ dma_contiguous_early_fixup(base, size); return 0; err: - pr_err("CMA: failed to reserve %ld MiB\n", (unsigned long)size / SZ_1M); + pr_err("%s(): failed to reserve %ld MiB\n", + __func__, (unsigned long)size / SZ_1M); return ret; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/7 v7] trace, RAS: Add eMCA trace event interface
On Wed, Jun 11, 2014 at 09:02:15PM +0200, Borislav Petkov wrote: > > +EXPORT_SYMBOL_GPL(cper_mem_err_pack); > > Why do we export this one and the one below? What .config warrants this? > > CONFIG_ACPI_EXTLOG=m doesn't need them, AFAICT. > Right. acpi_extlog doesn't use it. They can be exported later until needed. > > + TP_STRUCT__entry( > > + __field(u32, err_seq) > > + __field(u8, etype) > > + __field(u8, sev) > > + __field(u64, pa) > > + __field(u8, pa_mask_lsb) > > + __array(u8, fru_id, 40) > > How did you come up with this magic number? Why isn't that sizeof(uuid_le)? Cause I want to convert it into a string. > > + snprintf(__entry->fru_id, 39, "%pUl", fru_id); > > Yeah, I didn't catch the reasoning behind why we need to convert the FRU > into a string and not leave it simply as u8[16]... Fair enough. It can be compressed a little bit more. signature.asc Description: Digital signature
Re: [PULL] modules-next
Mark Brown writes: > On Wed, Jun 11, 2014 at 03:03:47PM +0930, Rusty Russell wrote: > >> drivers/regulator/virtual: avoid world-writable sysfs files. > > Acked-by: Mark Brown > > if you need to respin - please do send patches to maintainers. If the address in drivers/regulator/virtual.c is incorrect, please update it: Subject: [PATCH 5/9] drivers/regulator/virtual: avoid world-writable sysfs files. To: linux-kernel@vger.kernel.org Cc: Rusty Russell , Mark Brown Date: Tue, 22 Apr 2014 13:03:28 +0930 In line with practice for module parameters, we're adding a build-time check that sysfs files aren't world-writable. Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PULL] virtio-next
The following changes since commit ec6931b281797b69e6cf109f9cc94d5a2bf994e0: word-at-a-time: avoid undefined behaviour in zero_bytemask macro (2014-04-27 15:20:05 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git tags/virtio-next-for-linus for you to fetch changes up to c77fba9ab058d1e96ed51d4215e56905c9ef8d2a: virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock. (2014-05-21 11:25:41 +0930) Main excitement is a virtio_scsi fix for alloc holding spinlock on the abort path, which I refuse to CC stable since (1) I discovered it myself, and (2) it's been there forever with no reports. Cheers, Rusty. Amos Kong (1): virtio-rng: support multiple virtio-rng devices Heinz Graalfs (1): virtio_ccw: introduce device_lost in virtio_ccw_device Rusty Russell (2): virtio: virtio_break_device() to mark all virtqueues broken. virtio_scsi: don't call virtqueue_add_sgs(... GFP_NOIO) holding spinlock. Sasha Levin (2): virtio-rng: fix boot with virtio-rng device virtio-rng: fixes for device registration/unregistration drivers/char/hw_random/virtio-rng.c | 105 +++- drivers/s390/kvm/virtio_ccw.c | 49 - drivers/scsi/virtio_scsi.c | 15 +++--- drivers/virtio/virtio_ring.c| 15 ++ include/linux/virtio.h | 2 + 5 files changed, 127 insertions(+), 59 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: console: lockup on boot
On 06/11/2014 05:31 PM, Jan Kara wrote: > On Wed 11-06-14 22:34:36, Jan Kara wrote: >> > On Wed 11-06-14 10:55:55, Sasha Levin wrote: >>> > > On 06/10/2014 11:59 AM, Peter Hurley wrote: > > > On 06/06/2014 03:05 PM, Sasha Levin wrote: > > > >> On 05/30/2014 10:07 AM, Jan Kara wrote: >> > > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote: > > > On 05/30/2014 09:11 AM, Sasha Levin wrote: >> > > >>> Hi all, >> > > >>> >> > > >>> I sometime see lockups when booting my KVM guest with >> > > >>> the latest -next kernel, >> > > >>> it basically hangs right when it should start 'init', >> > > >>> and after a while I get >> > > >>> the following spew: >> > > >>> >> > > >>> [ 30.790833] BUG: spinlock lockup suspected on CPU#1, >> > > >>> swapper/1/0 > > > > > > Maybe related to this report: > > > https://lkml.org/lkml/2014/5/30/26 > > > from Jet Chen which was bisected to > > > > > > commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0 > > > Author: Jan Kara > > > AuthorDate: Thu May 22 10:43:35 2014 +1000 > > > Commit: Stephen Rothwell > > > CommitDate: Thu May 22 10:43:35 2014 +1000 > > > > > > printk: enable interrupts before calling > > > console_trylock_for_printk() > > > We need interrupts disabled when calling > > > console_trylock_for_printk() only > > > so that cpu id we pass to can_use_console() remains > > > valid (for other > > > things console_sem provides all the exclusion we need > > > and deadlocks on > > > console_sem due to interrupts are impossible because we > > > use > > > down_trylock()). However if we are rescheduled, we are > > > guaranteed to run > > > on an online cpu so we can easily just get the cpu id in > > > can_use_console(). 
> > > We can lose a bit of performance when we enable > > > interrupts in > > > vprintk_emit() and then disable them again in > > > console_unlock() but OTOH it > > > can somewhat reduce interrupt latency caused by > > > console_unlock() > > > especially since later in the patch series we will want > > > to spin on > > > console_sem in console_trylock_for_printk(). > > > Signed-off-by: Jan Kara > > > Signed-off-by: Andrew Morton > > > > > > ? >> > > >>>Yeah, very likely. I think I see the problem, I'll send the >> > > >>> fix shortly. > > > >> > > > >> Hi Jan, > > > >> > > > >> It seems that the issue I'm seeing is different from the "[prink] > > > >> BUG: spinlock > > > >> lockup suspected on CPU#0, swapper/1". > > > >> > > > >> Is there anything else I could try here? The issue is very common > > > >> during testing. > > > > > > Sasha, > > > > > > Is this bisectable? Maybe that's the best way forward here. >>> > > >>> > > I've ran a bisection again and ended up at the same commit as Jet Chen >>> > > (the commit unfortunately already made it to Linus's tree). >>> > > >>> > > Note that I did try Jan's proposed fix and that didn't solve the issue >>> > > for me, I believe we're seeing different issues caused by the same >>> > > commit. >> > Sorry it has been busy time lately and I didn't have as much time to look >> > into this as would be needed. > Oops, pressed send too early... So I have two debug patches for you. Can > you try whether the problem reproduces with the first one or with both of > them applied? The first patch fixed it (I assumed that there's no need to try the second). Thanks, Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: timekeeping: exiting task with held timekeeping locks
On 06/11/2014 07:30 PM, John Stultz wrote: > On Wed, Jun 11, 2014 at 4:04 PM, Sasha Levin wrote: >> > Hi all, >> > >> > While fuzzing with trinity inside a KVM tools guest running the latest >> > -next >> > kernel I've stumbled on the following spew: >> > >> > [ 3460.136058] = >> > [ 3460.138017] [ BUG: trinity-c70/27193 still has locks held! ] >> > [ 3460.141491] 3.15.0-next-20140611-sasha-00022-g9466d2f-dirty #638 Not >> > tainted >> > [ 3460.143219] - >> > [ 3460.167979] 2 locks held by trinity-c70/27193: >> > [ 3460.169172] #0: (tick_broadcast_lock){-.-.-.}, at: >> > tick_handle_periodic_broadcast (kernel/time/tick-broadcast.c:301) >> > [ 3460.468004] #1: (timekeeper_lock){-.-.-.}, at: update_wall_time >> > (kernel/time/timekeeping.c:1371) >> > [ 3460.920025] >> > [ 3460.920025] stack backtrace: >> > [ 3460.928146] CPU: 0 PID: 27193 Comm: trinity-c70 Not tainted >> > 3.15.0-next-20140611-sasha-00022-g9466d2f-dirty #638 >> > [ 3460.928648] can: request_module (can-proto-3) failed. >> > [ 3460.943111] 8800576ef4c8 8800576efc88 a551093c >> > 0001 >> > [ 3460.962511] 880056f9b000 8800576efca8 a21c6a43 >> > 880056f9bbe8 >> > [ 3461.007184] 880056f9bbe8 8800576efd48 >> > [ 3461.017661] can: request_module (can-proto-0) failed. >> > [ 3461.045536] a21636ea 8800576efcc8 >> > [ 3461.170992] Call Trace: >> > [ 3461.174122] dump_stack (lib/dump_stack.c:52) >> > [ 3461.558864] debug_check_no_locks_held (kernel/locking/lockdep.c:4107 >> > kernel/locking/lockdep.c:4113) >> > [ 3461.577066] do_exit (kernel/exit.c:796) >> > [ 3461.592523] ? debug_smp_processor_id (lib/smp_processor_id.c:57) >> > [ 3461.629067] ? _raw_spin_unlock_irq >> > (./arch/x86/include/asm/paravirt.h:819 >> > include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199) >> > [ 3461.671525] do_group_exit (kernel/exit.c:884) >> > [ 3461.717091] get_signal_to_deliver (kernel/signal.c:2347) >> > [ 3461.724142] ? 
vtime_account_user (kernel/sched/cputime.c:687) >> > [ 3461.800505] do_signal (arch/x86/kernel/signal.c:698) >> > [ 3461.808792] ? vtime_account_user (kernel/sched/cputime.c:687) >> > [ 3461.812780] ? preempt_count_sub (kernel/sched/core.c:2602) >> > [ 3461.824601] ? context_tracking_user_exit >> > (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) >> > kernel/context_tracking.c:182 (discriminator 2)) >> > [ 3461.827619] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) >> > [ 3461.831486] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 >> > kernel/locking/lockdep.c:2599) >> > [ 3461.841516] do_notify_resume (arch/x86/kernel/signal.c:751) >> > [ 3461.847056] retint_signal (arch/x86/kernel/entry_64.S:921) > > Huh.. Got me.. I don't see how the task can get out of > update_wall_time() w/o releasing the timekeeper_lock. Same with the > tick_broadcast_lock. Does this happen all the time or was this a > one-off? One-off, only seen once. Thanks, Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: Bad page state in process swapper pfn:00000
On 6/11/2014 12:19 PM, Geert Uytterhoeven wrote: > Hi Laura, > > On Wed, Jun 11, 2014 at 7:32 PM, Laura Abbott wrote: >> On 6/11/2014 4:40 AM, Geert Uytterhoeven wrote: >>> With current mainline, I get an early crash on r8a7791/koelsch: >>> >>> BUG: Bad page state in process swapper pfn:0 >>> page:ee20b000 count:0 mapcount:0 mapping:66756200 index:0x65726566 >>> page flags: >>> 0x74656b63(locked|error|lru|active|owner_priv_1|arch_1|private|writeback|head|swapcache >>> |reclaim|mlocked) >>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >>> bad because of flags: >>> page flags: 0x212861(locked|lru|active|private|writeback|swapcache|mlocked) >>> >>> I bisected it to >>> >>> commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76 >>> Author: Laura Abbott >>> Date: Sun Apr 13 22:54:58 2014 +0100 >>> >>> ARM: 8025/1: Get rid of meminfo > >>> -Truncating RAM at 4000-bfff to -6f7f (vmalloc region overlap). >>> +Truncating RAM at 0x-0xc000 to -0x6f80 >> >> I'm guessing this is the issue right there. >> >> memory@4000 { >> device_type = "memory"; >> reg = <0 0x4000 0 0x4000>; >> }; >> >> memory@2 { >> device_type = "memory"; >> reg = <2 0x 0 0x4000>; >> }; >> >> Those are the memory nodes from r8a7791-koelsch.dts. It looks like the memory >> outside 32-bit address range is not being dropped. It was suggested to drop >> early_init_dt_add_memory_arch which called arm_add_memory and just use the >> generic of code directly but the problem is arm_add_memory does additional >> bounds checking. It looks like early_init_dt_add_memory_arch in >> drivers/of/fdt.c checks for overflow on u64 types but not for overflow >> on phys_addr_t (32 bits) which is what memblock_add actually uses. 
>> >> For a quick test, can you try bringing back early_init_dt_add_memory_arch >> and see if that fixes the problem: >> >> diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c >> index e94a157..ea9ce92 100644 >> --- a/arch/arm/kernel/devtree.c >> +++ b/arch/arm/kernel/devtree.c >> @@ -27,6 +27,10 @@ >> #include >> #include >> >> +void __init early_init_dt_add_memory_arch(u64 base, u64 size) >> +{ >> + arm_add_memory(base, size); >> +} >> >> #ifdef CONFIG_SMP >> extern struct of_cpu_method __cpu_method_of_table[]; > > Thanks, my board boots again after applying this quick hack. > Great! Russell are you okay with taking the above as a fix or would you prefer I fixup drivers/of/fdt.c right now? Thanks, Laura 8< >From 14bda557a108ad197e7c5f040f50ca024b45cc17 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Wed, 11 Jun 2014 19:39:29 -0700 Subject: [PATCH] arm: Bring back early_init_dt_add_memory_arch Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) removed early_init_dt_add_memory_arch in favor of using the common method. The common method does not currently check for memory outside of 32-bit bounds which may lead to memory being incorrectly added to the system. Bring back early_init_dt_add_memory_arch for now until the generic function can be fixed up. Reported-by: Geert Uytterhoeven Signed-off-by: Laura Abbott --- arch/arm/kernel/devtree.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c index e94a157..ea9ce92 100644 --- a/arch/arm/kernel/devtree.c +++ b/arch/arm/kernel/devtree.c @@ -27,6 +27,10 @@ #include #include +void __init early_init_dt_add_memory_arch(u64 base, u64 size) +{ + arm_add_memory(base, size); +} #ifdef CONFIG_SMP extern struct of_cpu_method __cpu_method_of_table[]; -- The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, hosted by The Linux Foundation
[PATCH v2] sctp: Fix sk_ack_backlog wrap-around problem
Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check, a new association would be created in sctp_unpack_cookie(), but afterwards, some processing maybe failed, and sctp_association_free() will be called to free the previously allocated association, in sctp_association_free(), sk_ack_backlog value is decremented for this socket, since the initial value for sk_ack_backlog is 0, after the decrement, it will be 65535, a wrap-around problem happens, and if we want to establish new associations afterward in the same socket, ABORT would be triggered since sctp deem the accept queue as full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman Signed-off-by: Xufeng Zhang --- Change for v2: Drop the redundant test for temp suggested by Vlad Yasevich. net/sctp/associola.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 39579c3..0b8 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -330,7 +330,7 @@ void sctp_association_free(struct sctp_association *asoc) /* Only real associations count against the endpoint, so * don't bother for if this is a temporary association. */ - if (!asoc->temp) { + if (!list_empty(>asocs)) { list_del(>asocs); /* Decrement the backlog value for a TCP-style listening -- 1.7.0.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
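The wrap-around itself is plain unsigned arithmetic: sk_ack_backlog was a 16-bit unsigned value at the time, so decrementing it from its initial 0 yields 65535, which SCTP then compares against the accept-queue limit.

```c
#include <assert.h>

/* sk_ack_backlog modelled as an unsigned short: decrementing it for an
 * association that was never counted (e.g. one freed before being added
 * to the endpoint's list) wraps 0 around to 65535, making the accept
 * queue look permanently full. */
static unsigned short backlog_dec(unsigned short backlog)
{
    return backlog - 1;
}
```

The fix avoids the bogus decrement by only adjusting the backlog for associations actually on the endpoint's list.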
Re: [RFC PATCH] nouveau: rename aux.c to auxiliary.c for reviewing it on Windows
On 06/11/2014 04:24 PM, Borislav Petkov wrote: > On Wed, Jun 11, 2014 at 03:53:55PM +0800, Lai Jiangshan wrote: >> When I tried to review the linux kernel on Windows in my laptop >> and incidentally found that it failed to open the aux.c. >> >> And Microsoft tells me: >> (http://msdn.microsoft.com/en-us/library/aa365247.aspx) >> >>> Do not use the following reserved names for the name of a file: >>> CON, PRN, AUX, NUL, , and LPT9. Also avoid these names >>> followed immediately by an extension; for example, NUL.txt >>> is not recommended. >> >> The name "aux" is listed above. And it sometimes makes sense to >> review linux on windows, so we rename the aux.c to auxiliary.c. > > I think you missed April 1st by more than 2 months. > EVERY DAY IS APRIL FIRST if you fool others' convenience. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Proposal to realize hot-add *several sections one time*
On 2014/6/12 6:08, David Rientjes wrote: > On Wed, 11 Jun 2014, Zhang Zhen wrote: > >> Hi, >> >> Now we can hot-add memory by >> >> % echo start_address_of_new_memory > /sys/devices/system/memory/probe >> >> Then, [start_address_of_new_memory, start_address_of_new_memory + >> memory_block_size] memory range is hot-added. >> >> But we can only hot-add *one section one time* by this way. >> Whether we can add an argument on behalf of the count of the sections to add >> ? >> So we can can hot-add *several sections one time*. Just like: >> > > Not necessarily true, it depends on sections_per_block. Don't believe > Documentation/memory-hotplug.txt that suggests this is only for powerpc, > x86 and sh allow this interface as well. > >> % echo start_address_of_new_memory count_of_sections > >> /sys/devices/system/memory/probe >> >> Then, [start_address_of_new_memory, start_address_of_new_memory + >> count_of_sections * memory_block_size] memory range is hot-added. >> >> If this proposal is reasonable, i will send a patch to realize it. >> > > The problem is knowing how much memory is being onlined so that you can > definitively determine what count_of_sections should be. The number of > pages per memory section depends on PAGE_SIZE and SECTION_SIZE_BITS which > differ depending on the architectures that support this interface. So if > you support count_of_sections, it would return errno even though you have > onlined some sections. > Hum, sorry. My expression is not right. The count of sections one time hot-added depends on sections_per_block. Now we are porting the memory-hotplug to arm. But we can only hot-add *fixed number of sections one time* on particular architecture. Whether we can add an argument on behalf of the count of the blocks to add ? % echo start_address_of_new_memory count_of_blocks > /sys/devices/system/memory/probe Then, [start_address_of_new_memory, start_address_of_new_memory + count_of_blocks * memory_block_size] memory range is hot-added. 
So users don't need to execute echo several times when they want to hot-add a multi-block memory range. Any comments are welcome. Best regards!
Re: [RFC PATCH] nouveau: rename aux.c to auxiliary.c for reviewing it on Windows
On 06/11/2014 04:24 PM, Borislav Petkov wrote: > On Wed, Jun 11, 2014 at 03:53:55PM +0800, Lai Jiangshan wrote: >> When I tried to review the Linux kernel on Windows on my laptop, >> I incidentally found that it failed to open aux.c. >> >> And Microsoft tells me: >> (http://msdn.microsoft.com/en-us/library/aa365247.aspx) >> >>> Do not use the following reserved names for the name of a file: >>> CON, PRN, AUX, NUL, COM1-COM9, and LPT1-LPT9. Also avoid these names >>> followed immediately by an extension; for example, NUL.txt >>> is not recommended. >> >> The name "aux" is listed above. And it sometimes makes sense to >> review Linux on Windows, so we rename aux.c to auxiliary.c. > > I think you missed April 1st by more than 2 months. > EVERY DAY IS APRIL FIRST if you dismiss others' convenience.
Re: [RFC PATCH 09/10] mm, compaction: try to capture the just-created high-order freepage
On Wed, Jun 11, 2014 at 04:56:49PM +0200, Vlastimil Babka wrote: > On 06/09/2014 11:26 AM, Vlastimil Babka wrote: > > Compaction uses watermark checking to determine if it succeeded in creating > > a high-order free page. My testing has shown that this is quite racy and it > > can happen that watermark checking in compaction succeeds, and moments later > > the watermark checking in page allocation fails, even though the number of > > free pages has increased meanwhile. > > > > It should be more reliable if direct compaction captured the high-order free > > page as soon as it detects it, and passed it back to allocation. This would > > also reduce the window for somebody else to allocate the free page. > > > > This was already implemented by 1fb3f8ca0e92 ("mm: compaction: capture > > a > > suitable high-order page immediately when it is made available"), but later > > reverted by 8fb74b9f ("mm: compaction: partially revert capture of suitable > > high-order page") due to flaws. > > > > This patch differs from the previous attempt in two aspects: > > > > 1) The previous patch scanned free lists to capture the page. In this patch, > > only the cc->order aligned block that the migration scanner just > > finished > > is considered, but only if pages were actually isolated for migration in > > that block. Tracking cc->order aligned blocks also has benefits for the > > following patch that skips blocks where non-migratable pages were found. > > Generally I like this. > > 2) In this patch, the isolated free page is allocated through extending > > get_page_from_freelist() and buffered_rmqueue(). This ensures that it > > gets > > all operations such as prep_new_page() and page->pfmemalloc setting that > > was missing in the previous attempt, zone statistics are updated etc. > > But this part is a problem. Capturing is not common, but you are adding more overhead in the hot path for rare cases that are even OK to fail, so it's not a good deal.
In such a case, we have no choice but to do the things you mentioned (e.g. statistics, prep_new_page, pfmemalloc) manually in __alloc_pages_direct_compact. > > Evaluation is pending. > > Uh, so if anyone wants to test it, here's a fixed version, as initial > evaluation > showed it does not actually capture anything (which should not affect patch > 10/10 > though) and debugging this took a while. > > - for pageblock_order (i.e. THP), capture was never attempted, as the for > cycle > in isolate_migratepages_range() has ended right before the > low_pfn == next_capture_pfn check > - lru_add_drain() has to be done before pcplists drain. This made a big > difference > (~50 successful captures -> ~1300 successful captures) > Note that __alloc_pages_direct_compact() is missing lru_add_drain() as > well, and > all the existing watermark-based compaction termination decisions (which > happen > before the drain in __alloc_pages_direct_compact()) don't do any draining > at all. > > -8<- > From: Vlastimil Babka > Date: Wed, 28 May 2014 17:05:18 +0200 > Subject: [PATCH fixed 09/10] mm, compaction: try to capture the just-created > high-order freepage > > Compaction uses watermark checking to determine if it succeeded in creating > a high-order free page. My testing has shown that this is quite racy and it > can happen that watermark checking in compaction succeeds, and moments later > the watermark checking in page allocation fails, even though the number of > free pages has increased meanwhile. > > It should be more reliable if direct compaction captured the high-order free > page as soon as it detects it, and passed it back to allocation. This would > also reduce the window for somebody else to allocate the free page. > > This was already implemented by 1fb3f8ca0e92 ("mm: compaction: capture a > suitable high-order page immediately when it is made available"), but later > reverted by 8fb74b9f ("mm: compaction: partially revert capture of suitable > high-order page") due to flaws.
> > This patch differs from the previous attempt in two aspects: > > 1) The previous patch scanned free lists to capture the page. In this patch, >only the cc->order aligned block that the migration scanner just finished >is considered, but only if pages were actually isolated for migration in >that block. Tracking cc->order aligned blocks also has benefits for the >following patch that skips blocks where non-migratable pages were found. > > 2) In this patch, the isolated free page is allocated through extending >get_page_from_freelist() and buffered_rmqueue(). This ensures that it gets >all operations such as prep_new_page() and page->pfmemalloc setting that >was missing in the previous attempt, zone statistics are updated etc. > > Evaluation is pending. > > Signed-off-by: Vlastimil Babka > Cc: Minchan Kim > Cc: Mel Gorman > Cc: Joonsoo Kim > Cc: Michal Nazarewicz > Cc: Naoya Horiguchi > Cc: Christoph
Re: [PATCH 12/14] block: Add specific data integrity errors
> "Christoph" == Christoph Hellwig writes: >> Introduce a set of error codes that can be used by the block >> integrity subsystem to signal which class of error was encountered by >> either the I/O controller or the storage device. Christoph> I'd also love to see something catching these so that they Christoph> don't leak to userspace. This patch was really meant as an RFC. But it is absolutely my intent to expose these to userspace. Albeit only to applications that supply or request protection information via Darrick's aio extensions. I also use these errors extensively in my test utilities to verify that the correct problem gets detected by the correct entity when I inject an error. I should add that in the past I had a separate error status inside the bip that contained the data integrity specific errors. But that involved all sorts of evil hacks when bios were cloned, split and stacked. After talking to nab about his needs for target I figured it was better to just define new error codes and handle them like Hannes did for the extended SCSI errors. -- Martin K. Petersen Oracle Linux Engineering
linux-next: build failure after merge of the net-next tree
Hi all, After merging the net-next tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: net/bridge/br_multicast.c: In function 'br_multicast_has_querier_adjacent': net/bridge/br_multicast.c:2248:25: error: 'struct net_bridge' has no member named 'ip6_other_query' if (!timer_pending(&br->ip6_other_query.timer) || ^ In file included from include/linux/idr.h:18:0, from include/linux/kernfs.h:14, from include/linux/sysfs.h:15, from include/linux/kobject.h:21, from include/linux/device.h:17, from include/linux/dma-mapping.h:5, from arch/powerpc/include/asm/machdep.h:14, from arch/powerpc/include/asm/archrandom.h:6, from include/linux/random.h:81, from include/linux/net.h:22, from include/linux/skbuff.h:27, from include/linux/if_ether.h:23, from net/bridge/br_multicast.c:15: net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ In file included from include/linux/err.h:4:0, from net/bridge/br_multicast.c:13: net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ In file included from include/linux/idr.h:18:0, from include/linux/kernfs.h:14, from include/linux/sysfs.h:15, from include/linux/kobject.h:21, from include/linux/device.h:17, from include/linux/dma-mapping.h:5, from arch/powerpc/include/asm/machdep.h:14, from arch/powerpc/include/asm/archrandom.h:6, from include/linux/random.h:81, from include/linux/net.h:22, from include/linux/skbuff.h:27, from include/linux/if_ether.h:23, from net/bridge/br_multicast.c:15: net/bridge/br_multicast.c:2249:25: error: 'struct net_bridge' has no member
named 'ip6_querier' rcu_dereference(br->ip6_querier.port) == port) ^ Caused by commit 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port"). This build has CONFIG_IPV6 not set. I have reverted that commit for today. -- Cheers, Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH v2] powerpc: Avoid circular dependency with zImage.%
This v2 patch is good, Tested-by: Mike Qiu On 06/11/2014 11:40 PM, Michal Marek wrote: The rule to create the final images uses a zImage.% pattern. Unfortunately, this also matches the names of the zImage.*.lds linker scripts, which appear as a dependency of the final images. This somehow worked when $(srctree) used to be an absolute path, but now the pattern matches too much. List only the images from $(image-y) as the target of the rule, to avoid the circular dependency. Signed-off-by: Michal Marek --- v2: - Filter out duplicates in the target list - fix the platform argument to cmd_wrap arch/powerpc/boot/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 426dce7..ccc25ed 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -333,8 +333,8 @@ $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz $(obj)/zImage.initrd.%: vmlinux $(wrapperbits) $(call if_changed,wrap,$*,,,$(obj)/ramdisk.image.gz) -$(obj)/zImage.%: vmlinux $(wrapperbits) - $(call if_changed,wrap,$*) +$(addprefix $(obj)/, $(sort $(filter zImage.%, $(image-y)))): vmlinux $(wrapperbits) + $(call if_changed,wrap,$(subst $(obj)/zImage.,,$@)) # dtbImage% - a dtbImage is a zImage with an embedded device tree blob $(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb
Re: [PATCH 1/2] sched: Rework migrate_tasks()
On Wed, 2014-06-11 at 23:33 +0400, Kirill Tkhai wrote: > On Wed, 11/06/2014 at 17:43 +0400, Kirill Tkhai wrote: > > > > 11.06.2014, 17:15, "Srikar Dronamraju" : > > >>> * Kirill Tkhai [2014-06-11 13:52:10]: > > Currently migrate_tasks() skips throttled tasks, > > because they are not pickable by pick_next_task(). > > >>> Before migrate_tasks() is called, we do call set_rq_offline(), in > > >>> migration_call(). > > >>> > > >>> Shouldn't this take care of unthrottling the tasks and making sure that > > >>> they can be picked by pick_next_task(). > > >> If we do this separately for every class, we'll have to do this 3 times. > > >> Furthermore, the deadline class does not have a list of throttled tasks. > > >> So we'll have to do the same as I did: lock tasklist_lock and iterate > > >> through all of the tasks in the system just to find deadline tasks. > > > > > > I think you misread my comment. > > > > > > Currently migrate_task() gets called from migration_call() and in the > > > migration_call() before migrate_tasks(), set_rq_offline() should put > > > tasks back using unthrottle_cfs_rq(). > > > > > > So my question is: Why are these tasks not getting unthrottled > > > through we are calling set_rq_offline? To me set_rq_offline is > > > calling the actual sched class routines to do the needful. > > > > > > I can understand about deadline tasks, because we don't have a deadline > > > But that's the only tasks that we need to fix. > > > > Hm, I tested that on fair class tasks. They used to disappear from > > /proc/sched_debug and used to hang. I'll check all once again. > > > > I agree with you; if set_rq_offline() is already present, we should use it. > > > > /me went to clarify why it does not work in my test. > > Ok, it looks like the problem is that an unthrottled cfs_rq may become throttled again ;) Dejavu. You could try either of the below.
On Thu, Apr 03, 2014 at 10:02:18AM +0200, Mike Galbraith wrote: > Prevent large wakeup latencies from being accounted to the wrong task. > > Cc: > Signed-off-by:Mike Galbraith > --- > kernel/sched/core.c |7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -118,7 +118,12 @@ void update_rq_clock(struct rq *rq) > { > s64 delta; > > - if (rq->skip_clock_update > 0) > + /* > + * Set during wakeup to indicate we are on the way to schedule(). > + * Decrement to ensure that a very large latency is not accounted > + * to the wrong task. > + */ > + if (rq->skip_clock_update-- > 0) > return; > > delta = sched_clock_cpu(cpu_of(rq)) - rq->clock; OK; so as previously mentioned (Oct '13); I've entirely had it with skip_clock_update bugs, so I got angry and did the below. Its not something I can merge, not least because it uses trace_printk(), but it should be usable to 1) demonstate the above actually helps and 2) make damn sure we got it right this time :-) I've not really stared at the output much yet; but when you select function_graph tracer; we get lovely things like: 8) | wake_up_process() { 8) | try_to_wake_up() { 8) 0.076 us| _raw_spin_lock_irqsave(); 8) 0.092 us| task_waking_fair(); 8) 0.106 us| select_task_rq_fair(); 8) 0.161 us| _raw_spin_lock(); 8) | ttwu_do_activate.constprop.103() { 8) | activate_task() { 8) | enqueue_task() { 8) | update_rq_clock() { 8) | /* clock update: 420411 */ 8) 0.084 us| sched_avg_update(); 8) 1.277 us|} 8) | enqueue_task_fair() { 8) | enqueue_entity() { 8) 0.083 us| update_curr(); 8) 0.071 us| __compute_runnable_contrib(); 8) 0.074 us| __update_entity_load_avg_contrib(); 8) 0.121 us| update_cfs_rq_blocked_load(); 8) 0.236 us|
linux-next: manual merge of the net-next tree with Linus' tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in drivers/infiniband/hw/cxgb4/cm.c between commits 11b8e22d4d09 ("RDMA/cxgb4: Fix vlan support") and 9eccfe109b27 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service") from Linus' tree and commits 92e7ae71726c ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections") and b408ff282dda ("iw_cxgb4: don't truncate the recv window size") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell s...@canb.auug.org.au diff --cc drivers/infiniband/hw/cxgb4/cm.c index 96d7131ab974,965eaafd5851.. --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@@ -533,38 -532,17 +537,49 @@@ static int send_abort(struct c4iw_ep *e return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t); } +/* + * c4iw_form_pm_msg - Form a port mapper message with mapping info + */ +static void c4iw_form_pm_msg(struct c4iw_ep *ep, + struct iwpm_sa_data *pm_msg) +{ + memcpy(&pm_msg->loc_addr, &ep->com.local_addr, + sizeof(ep->com.local_addr)); + memcpy(&pm_msg->rem_addr, &ep->com.remote_addr, + sizeof(ep->com.remote_addr)); +} + +/* + * c4iw_form_reg_msg - Form a port mapper message with dev info + */ +static void c4iw_form_reg_msg(struct c4iw_dev *dev, + struct iwpm_dev_data *pm_msg) +{ + memcpy(pm_msg->dev_name, dev->ibdev.name, IWPM_DEVNAME_SIZE); + memcpy(pm_msg->if_name, dev->rdev.lldi.ports[0]->name, + IWPM_IFNAME_SIZE); +} + +static void c4iw_record_pm_msg(struct c4iw_ep *ep, + struct iwpm_sa_data *pm_msg) +{ + memcpy(&ep->com.mapped_local_addr, &pm_msg->mapped_loc_addr, + sizeof(ep->com.mapped_local_addr)); + memcpy(&ep->com.mapped_remote_addr, &pm_msg->mapped_rem_addr, + sizeof(ep->com.mapped_remote_addr)); +} + + static void best_mtu(const unsigned short *mtus, unsigned short mtu, +unsigned int *idx, int use_ts) + { + unsigned short hdr_size = sizeof(struct iphdr) + + sizeof(struct tcphdr) + + (use_ts ?
12 : 0); + unsigned short data_size = mtu - hdr_size; + + cxgb4_best_aligned_mtu(mtus, hdr_size, data_size, 8, idx); + } + static int send_connect(struct c4iw_ep *ep) { struct cpl_act_open_req *req; @@@ -583,14 -561,11 +598,15 @@@ int sizev6 = is_t4(ep->com.dev->rdev.lldi.adapter_type) ? sizeof(struct cpl_act_open_req6) : sizeof(struct cpl_t5_act_open_req6); - struct sockaddr_in *la = (struct sockaddr_in *)&ep->com.local_addr; - struct sockaddr_in *ra = (struct sockaddr_in *)&ep->com.remote_addr; - struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)&ep->com.local_addr; - struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)&ep->com.remote_addr; + struct sockaddr_in *la = (struct sockaddr_in *) + &ep->com.mapped_local_addr; + struct sockaddr_in *ra = (struct sockaddr_in *) + &ep->com.mapped_remote_addr; + struct sockaddr_in6 *la6 = (struct sockaddr_in6 *) + &ep->com.mapped_local_addr; + struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *) + &ep->com.mapped_remote_addr; + int win; wrlen = (ep->com.remote_addr.ss_family == AF_INET) ? roundup(sizev4, 16) : @@@ -1796,7 -1821,8 +1862,8 @@@ static int import_ep(struct c4iw_ep *ep step = cdev->rdev.lldi.nrxq / cdev->rdev.lldi.nchan; ep->rss_qid = cdev->rdev.lldi.rxq_ids[ - cxgb4_port_idx(n->dev) * step]; + cxgb4_port_idx(pdev) * step]; + set_tcp_window(ep, (struct port_info *)netdev_priv(pdev)); if (clear_mpa_v1) { ep->retry_with_mpa_v1 = 0;
Re: [PATCH v5 4/4] drivers: net: Add APM X-Gene SoC ethernet driver support.
On Thu, Jun 5, 2014 at 12:45 AM, David Miller wrote: > From: Iyappan Subramanian > Date: Mon, 2 Jun 2014 12:39:14 -0700 > >> + netdev_err(ndev, "LERR: %d ring_num: %d ", status, ring->num); >> + switch (status) { >> + case HBF_READ_DATA: >> + netdev_err(ndev, "HBF read data error\n"); >> + break; > > This is not really appropriate. > > We have statistics like the ones you are incrementing in this > function as the mechanism people can use to learn what events > happened on an interface, and how many times they happened. > > Therefore, emitting a log message for each one of those events too is > not necessary. > > We don't emit a netdev_err() for every packet that the IPv4 stack > drops due to a bad checksum, for example. > > Please get rid of this. Sure. I will remove the error message prints and clean up the function.
Re: drivers/char/random.c: More futzing about
> Sadly I can't find the tree, but I'm 94% sure it was Skein-256 > (specifically the SHA3-256 candidate parameter set.) It would be nice to have two hash functions, optimized separately for 32- and 64-bit processors. As the Skein report says, the algorithm can be adapted to 32 bits easily enough. I also did some work a while ago to adapt the Skein parameter search code to develop a Skein-192 (6x32 bits) that would fit into registers on x86-32. (It got stalled when I e-mailed Niels Ferguson about it and never heard back; it fell off the to-do list while I was waiting.) The intended target was IPv6 address hashing for sequence number randomization, but it could be used for pool hashing, too.
[PATCH] Input: evdev - Fix incorrect kfree of err_free_client after vzalloc
This bug was introduced by commit 92eb77d ("Input: evdev - fall back to vmalloc for client event buffer"). vzalloc is used to allocate memory as a fallback when kzalloc fails. But err_free_client was not considered in the case below: 1. kzalloc fails 2. vzalloc succeeds 3. evdev_open_device fails 4. kfree So address checking is needed to call the correct free function. Signed-off-by: Yongtaek Lee Reviewed-by: Daniel Stone --- drivers/input/evdev.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c index ce953d8..f60daa0 100644 --- a/drivers/input/evdev.c +++ b/drivers/input/evdev.c @@ -422,7 +422,10 @@ static int evdev_open(struct inode *inode, struct file *file) err_free_client: evdev_detach_client(evdev, client); - kfree(client); + if (is_vmalloc_addr(client)) + vfree(client); + else + kfree(client); return error; } -- 1.7.1
Re: [PATCH v2 2/3] lib: glob.c: Add CONFIG_GLOB_SELFTEST
>> Persuading GCC to throw away *all* the self-test data after running >> it was surprisingly annoying. > > Yeah. Props for making the attempt. *Whew*. I was worried I'd get upbraided for overoptimization. >> The one thing I'm not really sure about is what to do if the self-test >> fails. For now, I make the module_init function fail too. Opinions? > > The printk should suffice - someone will notice it eventually. > > Using KERN_ERR to report a failure might help draw attention to it. I'm not sure what you mean by "might"; I already *do* report it as KERN_ERR. If you think failing the module load is a bad idea, feel free to modify the patch.
Re: [RFC PATCH 4/5] kernel/rcu/tree.c:3435 fix a sparse warning
On Wed, Jun 11, 2014 at 5:25 PM, wrote: > On Wed, Jun 11, 2014 at 04:39:42PM -0400, Pranith Kumar wrote: >> kernel/rcu/tree.c:3435:21: warning: incorrect type in argument 1 (different >> modifiers) >> kernel/rcu/tree.c:3435:21: expected int ( *threadfn )( ... ) >> kernel/rcu/tree.c:3435:21: got int ( static [toplevel] [noreturn] >> * )( ... ) >> >> by removing the __noreturn attribute and adding unreachable() as suggested on the >> mailing list: http://www.kernelhub.org/?p=2=436683 >> >> Signed-off-by: Pranith Kumar > > No, we should not do this. And the mailing list post you point to seems > to explicitly recommend using noreturn rather than unreachable. > > If sparse doesn't understand this, that's a bug in sparse, not in the > kernel. Sparse needs to understand that it's OK to drop noreturn from a > function pointer type, just not OK to add it. > > Rationale: If you call a noreturn function through a non-noreturn > function pointer, you might end up with unnecessary cleanup code, but > the call will work. If you call a non-noreturn function through a > noreturn function pointer, the caller will not expect a return, and may > crash; *that* should require a cast. > Yes, I understand the rationale. I think this should be fixed in sparse. Please drop this patch. Thanks! -- Pranith
Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest
On 6/11/2014 6:54 AM, Peter Zijlstra wrote: On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: Enabling this configuration feature causes a slight decrease in the performance of an uncontended lock-unlock operation by about 1-2%, mainly due to the use of a static key. However, uncontended lock-unlock operations are really just a tiny percentage of a real workload, so there should be no noticeable change in application performance. No, entirely unacceptable. +#ifdef CONFIG_VIRT_UNFAIR_LOCKS +/** + * queue_spin_trylock_unfair - try to acquire the queue spinlock unfairly + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static __always_inline int queue_spin_trylock_unfair(struct qspinlock *lock) +{ + union arch_qspinlock *qlock = (union arch_qspinlock *)lock; + + if (!qlock->locked && (cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0)) + return 1; + return 0; +} + +/** + * queue_spin_lock_unfair - acquire a queue spinlock unfairly + * @lock: Pointer to queue spinlock structure + */ +static __always_inline void queue_spin_lock_unfair(struct qspinlock *lock) +{ + union arch_qspinlock *qlock = (union arch_qspinlock *)lock; + + if (likely(cmpxchg(&qlock->locked, 0, _Q_LOCKED_VAL) == 0)) + return; + /* +* Since the lock is now unfair, we should not activate the 2-task +* pending bit spinning code path which disallows lock stealing. +*/ + queue_spin_lock_slowpath(lock, -1); +} Why is this needed? I added the unfair version of lock and trylock as my original version isn't a simple test-and-set lock. Now I changed the core part to use the simple test-and-set lock. However, I still think that an unfair version in the fast path can be helpful to performance when both the unfair lock and paravirt spinlock are enabled. In this case, paravirt spinlock code will disable the unfair lock code in the slowpath, but still allow the unfair version in the fast path to get the best possible performance in a virtual guest.
Yes, I could take that out to allow either unfair or paravirt spinlock, but not both. I do think that a little bit of unfairness will help in the virtual environment. +/* + * Redefine arch_spin_lock and arch_spin_trylock as inline functions that will + * jump to the unfair versions if the static key virt_unfairlocks_enabled + * is true. + */ +#undef arch_spin_lock +#undef arch_spin_trylock +#undef arch_spin_lock_flags + +/** + * arch_spin_lock - acquire a queue spinlock + * @lock: Pointer to queue spinlock structure + */ +static inline void arch_spin_lock(struct qspinlock *lock) +{ + if (static_key_false(&virt_unfairlocks_enabled)) + queue_spin_lock_unfair(lock); + else + queue_spin_lock(lock); +} + +/** + * arch_spin_trylock - try to acquire the queue spinlock + * @lock : Pointer to queue spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static inline int arch_spin_trylock(struct qspinlock *lock) +{ + if (static_key_false(&virt_unfairlocks_enabled)) + return queue_spin_trylock_unfair(lock); + else + return queue_spin_trylock(lock); +} So I really don't see the point of all this? Why do you need special {try,}lock paths for this case? Are you worried about the upper 24bits? No, as I said above. I was planning for the coexistence of the unfair lock in the fast path and the paravirt spinlock in the slowpath. diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index ae1b19d..3723c83 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -217,6 +217,14 @@ static __always_inline int try_set_locked(struct qspinlock *lock) { struct __qspinlock *l = (void *)lock; +#ifdef CONFIG_VIRT_UNFAIR_LOCKS + /* +* Need to use atomic operation to grab the lock when lock stealing +* can happen. +*/ + if (static_key_false(&virt_unfairlocks_enabled)) + return cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0; +#endif barrier(); ACCESS_ONCE(l->locked) = _Q_LOCKED_VAL; barrier(); Why? If we have a simple test-and-set lock like below, we'll never get here at all.
Again, it is due to the coexistence of the unfair lock in the fast path and the paravirt spinlock in the slowpath. @@ -252,6 +260,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); +#ifdef CONFIG_VIRT_UNFAIR_LOCKS + /* +* A simple test and set unfair lock +*/ + if (static_key_false(&virt_unfairlocks_enabled)) { + cpu_relax();/* Relax after a failed lock attempt */ Meh, I don't think anybody can tell the difference if you put that in or not, therefore don't. Yes, I can take out the cpu_relax() here. -Longman
Re: [PATCH] sctp: Fix sk_ack_backlog wrap-around problem
On 06/12/2014 12:55 AM, Vlad Yasevich wrote: On 06/11/2014 08:55 AM, Vlad Yasevich wrote: On 06/10/2014 10:37 PM, Xufeng Zhang wrote: Consider the scenario: For a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks, a new association would be created in sctp_unpack_cookie(), but afterwards some processing may fail, and sctp_association_free() will be called to free the previously allocated association. In sctp_association_free(), the sk_ack_backlog value is decremented for this socket; since the initial value of sk_ack_backlog is 0, after the decrement it wraps around to 65535. If we then want to establish new associations on the same socket, an ABORT is triggered, since SCTP deems the accept queue full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by: Neil Horman Signed-off-by: Xufeng Zhang --- net/sctp/associola.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/sctp/associola.c b/net/sctp/associola.c index 39579c3..60564f2 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -330,7 +330,7 @@ void sctp_association_free(struct sctp_association *asoc) /* Only real associations count against the endpoint, so * don't bother for if this is a temporary association. */ - if (!asoc->temp) { + if (!asoc->temp && !list_empty(&asoc->asocs)) { list_del(&asoc->asocs); /* Decrement the backlog value for a TCP-style listening I am not crazy about this patch. It's been suggested before that maybe duplicate cookie processing should really be creating a temporary association, since that is how that association is being used. I had another look at the description for triggering this issue and realized that I was thinking about something else when looking at this solution. There is, however, no need to test both the list and the temp value.
We can simply always test that the list is not empty before doing list_del(). Thanks a lot for the comment! I'll send V2 later. Thanks, Xufeng -vlad It might be nice to try that approach. It actually benefits us, as the association destruction would happen immediately instead of being delayed.
Re: [RFC 0/5] of: Automatic console registration cleanups
On Fri, Mar 28, 2014 at 11:08 AM, Grant Likely wrote: > Hi all, > > This is a series that I've been playing with over the last few days to > clean up the selection of default console devices when using the device > tree. The device tree defines a way of specifying the console by using a > "stdout-path" property in the /chosen node, but very few drivers > actually attempt to use that data, and so for most platforms there needs > to be a "console=" line in the command line if a serial port is intended > to be used as the console. > > With this series, if there is a /chosen/stdout-path property, and if > that property points to a serial port node, then when the serial driver > registers the port, the core uart_add_one_port() function will notice > and if no console= argument was provided then add it as a preferred > console. > > I've not tested this very extensively yet, but I want to get some > feedback before I go further. > > The one downside with this approach is that it doesn't do anything for > early console setup. That still needs to be added on a per-driver basis, > but at least it shouldn't conflict with this approach. Hey, what happened with this series? Rob
Re: [PATCH v2 3/4] mutex: Try to acquire mutex only if it is unlocked
On 6/11/2014 2:37 PM, Jason Low wrote: Upon entering the slowpath in __mutex_lock_common(), we try once more to acquire the mutex. We only try to acquire if (lock->count >= 0). However, what we actually want here is to try to acquire if the mutex is unlocked (lock->count == 1). This patch changes it so that we only try-acquire the mutex upon entering the slowpath if it is unlocked, rather than if the lock count is non-negative. This helps further reduce unnecessary atomic xchg() operations. Furthermore, this patch uses !mutex_is_locked(lock) to do the initial checks for if the lock is free rather than directly calling atomic_read() on the lock->count, in order to improve readability. Signed-off-by: Jason Low --- kernel/locking/mutex.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4bd9546..e4d997b 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -432,7 +432,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, if (owner && !mutex_spin_on_owner(lock, owner)) break; - if ((atomic_read(&lock->count) == 1) && + /* Try to acquire the mutex if it is unlocked. */ + if (!mutex_is_locked(lock) && (atomic_cmpxchg(&lock->count, 1, 0) == 1)) { lock_acquired(&lock->dep_map, ip); if (use_ww_ctx) { @@ -479,9 +480,9 @@ slowpath: /* * Once more, try to acquire the lock. Only try-lock the mutex if -* lock->count >= 0 to reduce unnecessary xchg operations. +* it is unlocked to reduce unnecessary xchg() operations. */ - if (atomic_read(&lock->count) >= 0 && (atomic_xchg(&lock->count, 0) == 1)) + if (!mutex_is_locked(lock) && (atomic_xchg(&lock->count, 0) == 1)) goto skip_wait; debug_mutex_lock_common(lock, &waiter); Acked-by: Waiman Long
Re: [PATCH v2 2/4] mutex: Delete the MUTEX_SHOW_NO_WAITER macro
On 6/11/2014 2:37 PM, Jason Low wrote: v1->v2: - There were discussions in v1 about a possible mutex_has_waiters() function. This patch didn't use that function because the places which used MUTEX_SHOW_NO_WAITER require checking for lock->count while an actual mutex_has_waiters() should check for !list_empty(wait_list). We'll just delete the macro and directly use atomic_read() + comments. MUTEX_SHOW_NO_WAITER() is a macro which checks whether there are "no waiters" on a mutex by checking if the lock count is non-negative. Based on feedback from the discussion in the earlier version of this patchset, the macro is not very readable. Furthermore, checking lock->count isn't always the correct way to determine if there are "no waiters" on a mutex. For example, a negative count on a mutex really only means that there "potentially" are waiters. Likewise, there can be waiters on the mutex even if the count is non-negative. Thus, "MUTEX_SHOW_NO_WAITER" doesn't always do what the name of the macro suggests. So this patch deletes the MUTEX_SHOW_NO_WAITER() macro, directly uses atomic_read() instead of the macro, and adds comments which elaborate on how the extra atomic_read() checks can help reduce unnecessary xchg() operations. Signed-off-by: Jason Low --- kernel/locking/mutex.c | 18 ++++++++---------- 1 files changed, 8 insertions(+), 10 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index dd26bf6..4bd9546 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -46,12 +46,6 @@ # include #endif -/* - * A negative mutex count indicates that waiters are sleeping waiting for the - * mutex. - */ -#define MUTEX_SHOW_NO_WAITER(mutex) (atomic_read(&(mutex)->count) >= 0) - void __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key) { @@ -483,8 +477,11 @@ slowpath: #endif spin_lock_mutex(&lock->wait_lock, flags); - /* once more, can we acquire the lock?
*/ - if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, 0) == 1)) + /* +* Once more, try to acquire the lock. Only try-lock the mutex if +* lock->count >= 0 to reduce unnecessary xchg operations. +*/ + if (atomic_read(&lock->count) >= 0 && (atomic_xchg(&lock->count, 0) == 1)) goto skip_wait; debug_mutex_lock_common(lock, &waiter); @@ -504,9 +501,10 @@ slowpath: * it's unlocked. Later on, if we sleep, this is the * operation that gives us the lock. We xchg it to -1, so * that when we release the lock, we properly wake up the -* other waiters: +* other waiters. We only attempt the xchg if the count is +* non-negative in order to avoid unnecessary xchg operations: */ - if (MUTEX_SHOW_NO_WAITER(lock) && + if (atomic_read(&lock->count) >= 0 && (atomic_xchg(&lock->count, -1) == 1)) break; Acked-by: Waiman Long
Re: [Patch v5.1 03/03]: hwrng: khwrngd derating per device
On 05/27/2014 07:11 AM, Torsten Duwe wrote: > [checkpatch tells me not to 0-init...] > > This patch introduces a derating factor to struct hwrng for > the random bits going into the kernel input pool, and a common > default derating for drivers which do not specify one. > > Signed-off-by: Torsten Duwe > Did we lose track of this patchset? -hpa
Re: [RFC PATCH 1/3] locking/mutex: Try to acquire mutex only if it is unlocked
On 6/11/2014 5:48 PM, Jason Low wrote: On Wed, 2014-06-11 at 17:00 -0400, Long, Wai Man wrote: On 6/9/2014 1:38 PM, Jason Low wrote: On Wed, 2014-06-04 at 13:58 -0700, Davidlohr Bueso wrote: On Wed, 2014-06-04 at 13:57 -0700, Davidlohr Bueso wrote: In addition, how about the following helpers instead: - mutex_is_unlocked() : count > 0 - mutex_has_waiters() : count < 0, or list_empty(&lock->wait_list) ^ err, that's !list_empty() Between checking for (count < 0) or checking for !list_empty(wait_list) for waiters: Now that I think about it, I would expect a mutex_has_waiters() function to return !list_empty(wait_list) as that really tells whether or not there are waiters. For example, in highly contended cases, there can still be waiters on the mutex if count is 1. Likewise, in places where we currently use "MUTEX_SHOW_NO_WAITER", we need to check for (count < 0) to ensure lock->count is a negative value before the thread sleeps on the mutex. One option would be to still remove MUTEX_SHOW_NO_WAITER(), directly use atomic_read() in place of the macro, and just comment on why we have an extra atomic_read() that may "appear redundant". Another option could be to provide a function that checks for "potential waiters" on the mutex. Any thoughts? For the first MUTEX_SHOW_NO_WAITER() call site, you can replace it with a check for (count > 0). Yup, in my v2 patch, the first call site becomes !mutex_is_locked(lock) which is really a check for (count == 1). Yes, your v2 patch looks fine to me. The second call site within the for loop, however, is a bit more tricky. It has to serve 2 purposes: 1. Opportunistically get the lock 2. Set the count value to -1 to indicate someone is waiting on the lock, that is why an xchg() operation has to be done even if its value is 0. I do agree that the naming isn't that good.
Maybe it can be changed to something like static inline int mutex_value_has_waiters(struct mutex *lock) { return atomic_read(&lock->count) < 0; } So I can imagine that a mutex_value_has_waiters() function might still not be a great name, since the mutex can have waiters in the case that the value lock->count >= 0. In the second call site, do you think we should just do a direct atomic_read(&lock->count) >= 0 and comment that we only do the xchg if the count is non-negative to avoid unnecessary xchg? That's what I did in my v2 patch. I think that is a good idea to avoid any controversy in naming. -Longman
Re: [PATCH 1/1] xhci: clear root port wake on bits if controller isn't wake-up capable
On 06/11/2014 11:26 PM, Greg Kroah-Hartman wrote: On Wed, Jun 11, 2014 at 06:25:20AM +0800, Lu Baolu wrote: When the xHCI PCI host is suspended, if do_wakeup is false in xhci_pci_suspend, xhci_bus_suspend needs to clear all root port wake on bits. Otherwise some Intel platforms may get a spurious wakeup, even if PCI PME# is disabled. http://marc.info/?l=linux-usb&m=138194006009255&w=2 Signed-off-by: Lu Baolu --- drivers/usb/host/xhci-hub.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) Should this also be a stable kernel patch? If so, how far back? Yes. This patch should be back-ported to kernels as old as 2.6.37, that contains the commit 9777e3ce907d4cb5a513902a87ecd03b52499569 "USB: xHCI: bus power management implementation". Thanks, -baolu thanks, greg k-h
Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus
Ok, some misconfiguration here probably, never mind. I'll finish the tests tomorrow, compare with existing ones and let you know asap. Tks. On Wed, Jun 11, 2014 at 10:09 PM, Eric W. Biederman wrote: > Rafael Tinoco writes: > >> I'm getting a kernel panic with your patch: >> >> -- panic >> -- mount_block_root >> -- mount_root >> -- prepare_namespace >> -- kernel_init_freeable >> >> It is giving me an unknown block device for the same config file i >> used on other builds. Since my test is running on a kvm guest under a >> ramdisk, i'm still checking if there are any differences between this >> build and other ones but I think there aren't. >> >> Any chances that "prepare_namespace" might be breaking mount_root ? > > My patch boots for me > > Eric -- -- Rafael David Tinoco Software Sustaining Engineer @ Canonical Canonical Technical Services Engineering Team # Email: rafael.tin...@canonical.com (GPG: 87683FC0) # Phone: +55.11.9.6777.2727 (Americas/Sao_Paulo) # LP: ~inaddy | IRC: tinoco | Skype: rafael.tinoco
Re: Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus
Rafael Tinoco writes: > I'm getting a kernel panic with your patch: > > -- panic > -- mount_block_root > -- mount_root > -- prepare_namespace > -- kernel_init_freeable > > It is giving me an unknown block device for the same config file i > used on other builds. Since my test is running on a kvm guest under a > ramdisk, i'm still checking if there are any differences between this > build and other ones but I think there aren't. > > Any chances that "prepare_namespace" might be breaking mount_root ? My patch boots for me Eric
Re: [PATCH 1/2] tracing: Fix memory leak on failure path in ftrace_allocate_pages()
Hi Steve, On Wed, 11 Jun 2014 10:03:40 -0400, Steven Rostedt wrote: > On Wed, 11 Jun 2014 17:06:53 +0900 > Namhyung Kim wrote: > >> As struct ftrace_page is managed in a single linked list, it should >> free from the start page. >> >> Signed-off-by: Namhyung Kim >> --- >> kernel/trace/ftrace.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >> index 5b372e3ed675..ddfda763ded7 100644 >> --- a/kernel/trace/ftrace.c >> +++ b/kernel/trace/ftrace.c >> @@ -2398,7 +2398,8 @@ ftrace_allocate_pages(unsigned long num_to_init) >> return start_pg; >> >> free_pages: >> -while (start_pg) { >> +pg = start_pg; >> +while (pg) { > > It works with just the added "pg = start_pg", I would keep the > while (start_pg) still. The reason why I changed it is the code actually uses pg rather than start_pg in the loop. So it's more comfortable for me to check the pg in the condition. But it's minor, I won't insist on it strongly.. :) Thanks, Namhyung >> order = get_count_order(pg->size / ENTRIES_PER_PAGE); >> free_pages((unsigned long)pg->records, order); >> start_pg = pg->next;
Re: rtc/hctosys.c Problem during kernel boot
On Wed, Jun 11, 2014 at 04:53:55PM -0700, John Stultz wrote: > On Wed, Jun 11, 2014 at 4:01 PM, John Whitmore wrote: > > I'm having a problem with a DS3234 SPI based RTC chip and rtc/hctosys.c on > > the > > 3.10.29 kernel of the RaspberryPi. I'm not sure this is a bug or not but > > thought I'd ask. I've enabled the kernel config option for HCTOSYS which, on > > boot, should set the system's date/time to the value read from the RTC. I > > tried this but it would never happen on the RPi. I eventually found in > > syslog > > that the kernel boot is attempting to execute the hctosys functionality > > prior > > to the SPI being initialised. As a result of this when hctosys is attempted > > there is no /dev/rtc0 yet. A short time later the DS3234 RTC is initialised > > but by then it's too late. > > > > Once the system has booted and I've logged in I can read and write to the > > RTC > > and all seems good but /sys/class/rtc/rtc0/hctosys is '0' indicating that > > the > > system time was not set on boot. > > > > There is a "deprecated" warning in the syslog coming from the spi of the > > board > > file so perhaps that is the cause. So is this a bug? And if so what can I do > > to resolve it. The hctosys is on a "late_initcall" so not sure of timing. > > Sigh. Yea, this issue was brought up previously, but we never got > around to a solution that could be merged. > > Basically hctosys is late_init, but if the driver is a module, it > might not be loaded in time. Adding hooks at module load time when > RTCs are registered could be done, but then you have the issue that > userspace might have set the clock via something like ntpdate, so > HCTOSYS could then cause the clock to be less accurate. > > So we need to make the HCTOSYS functionality happen at RTC register > time, but it needs to set the clock only if nothing has set the clock > already.
This requires a new timekeeping interface - something like > timekeeping_set_time_if_unset(), which atomically would set the time > if it has never been set. > > You can read some of the previous discussion here: > https://lkml.org/lkml/2013/6/17/533 > Thanks a million for that information. I'll have a look, as I might try and resolve the issue. > I'd be very interested in patches to resolve this! > > thanks > -john
Re: [PATCH 3/3] perf timechart: add more options to IO mode
On Tue, 10 Jun 2014 19:04:54 +0400, Stanislav Fomichev wrote: > --io-skip-eagain - don't show EAGAIN errors > --io-min-time - make small io bursts visible > --io-merge-dist - merge adjacent events > > Signed-off-by: Stanislav Fomichev > --- > tools/perf/Documentation/perf-timechart.txt | 9 ++ > tools/perf/builtin-timechart.c | 49 > +++-- > 2 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/Documentation/perf-timechart.txt > b/tools/perf/Documentation/perf-timechart.txt > index ec6b46c7bca0..62c29656ad95 100644 > --- a/tools/perf/Documentation/perf-timechart.txt > +++ b/tools/perf/Documentation/perf-timechart.txt > @@ -64,6 +64,15 @@ TIMECHART OPTIONS > duration or tasks with given name. If number is given it's interpreted > as number of nanoseconds. If non-numeric string is given it's > interpreted as task name. > +--io-skip-eagain:: > + Don't draw EAGAIN IO events. > +--io-min-time:: > + Draw small events as if they lasted min-time. Useful when you need > + to see very small and fast IO. Default value is 1ms. It's in nano-second units, right? If so, it's very inconvenient for the user to specify. Maybe we could support parsing units (s, ms, us, ...) also. > +--io-merge-dist:: > + Merge events that are merge-dist nanoseconds apart. > + Reduces number of figures on the SVG and makes it more render-friendly. > + Default value is 1us. Ditto. Thanks, Namhyung
Re: drivers/char/random.c: more ruminations
On 06/11/2014 06:11 AM, Theodore Ts'o wrote: > On Tue, Jun 10, 2014 at 11:58:06PM -0400, George Spelvin wrote: >> You can forbid underflows, but the code doesn't forbid overflows. >> >> 1. Assume the entropy count starts at 512 bytes (input pool full) >> 2. Random writer mixes in 20 bytes of entropy into the input pool. >> 2a. Input pool entropy is, however, capped at 512 bytes. >> 3. Random extractor extracts 32 bytes of entropy from the pool. >> Succeeds because 32 < 512. Pool is left with 480 bytes of >> entropy. >> 3a. Random extractor decrements pool entropy estimate to 480 bytes. >> This is accurate. >> 4. Random writer credits pool with 20 bytes of entropy. >> 5. Input pool entropy is now 480 bytes, estimate is 500 bytes. > > Good point, that's a potential problem, although messing up the > accounting between 480 and 500 bytes is not nearly as bad as messing > up 0 and 20. > > It's not something where, if the fix required massive changes, I'd > necessarily feel the need to backport them to stable. It's a > certificational weakness, but it's not a disaster. > Actually, with the new accounting code it will be even less serious, because mixing into a nearly full pool is discounted heavily -- because it is not like filling a queue; the mixing function will probabilistically overwrite existing pool entropy. So it is still a race condition, and still wrong, but it is a lot less wrong. -hpa
Re: [PATCH 1/3] perf timechart: implement IO mode
Hi Stanislav, On Tue, 10 Jun 2014 19:04:52 +0400, Stanislav Fomichev wrote: > In IO mode timechart shows any disk/network activity. [SNIP] > +Record system-wide IO events: > + > + $ perf timechart record -I I got a segfault here: Core was generated by `perf timechart record -I'. Program terminated with signal 11, Segmentation fault. #0 parse_options_step (ctx=ctx@entry=0x7fff6dcd8ef0, options=options@entry=0x587de0, usagestr=usagestr@entry=0x588900) at util/parse-options.c:353 353 if (*arg != '-' || !arg[1]) { Missing separate debuginfos, use: debuginfo-install glibc-2.17-9.fc20.x86_64 nss-softokn-freebl-3.15-1.fc20.x86_64 numactl-libs-2.0.7-6.fc17.x86_64 (gdb) bt #0 parse_options_step (ctx=ctx@entry=0x7fff6dcd8ef0, options=options@entry= 0x587de0, usagestr=usagestr@entry=0x588900) at util/parse-options.c:353 #1 0x00465cf4 in parse_options_subcommand (argc=argc@entry=197, argv=argv@entry=0x13fd6d0, options=options@entry=0x587de0, subcommands=subcommands@entry=0x0, usagestr=usagestr@entry=0x588900, flags=flags@entry=2) at util/parse-options.c:462 #2 0x00465f54 in parse_options (argc=argc@entry=197, argv=argv@entry= 0x13fd6d0, options=options@entry=0x587de0, usagestr=usagestr@entry= 0x588900, flags=flags@entry=2) at util/parse-options.c:492 #3 0x00429ef8 in cmd_record (argc=argc@entry=197, argv=argv@entry= 0x13fd6d0, prefix=prefix@entry=0x0) at builtin-record.c:894 #4 0x00434c59 in timechart__io_record (argv=0x7fff6dcdc270, argc=0) at builtin-timechart.c:1756 #5 cmd_timechart (argc=0, argv=0x7fff6dcdc270, prefix=) at builtin-timechart.c:1957 #6 0x0041b603 in run_builtin (p=p@entry=0x7fa230, argc=argc@entry=3, argv=argv@entry=0x7fff6dcdc270) at perf.c:319 #7 0x0041ae82 in handle_internal_command (argv=0x7fff6dcdc270, argc=3) at perf.c:376 #8 run_argv (argv=0x7fff6dcdc060, argcp=0x7fff6dcdc06c) at perf.c:420 #9 main (argc=3, argv=0x7fff6dcdc270) at perf.c:534 It was because, as I said, my system doesn't have pread64 syscall.. 
you forgot to decrease rec_argc when skipping invalid events. :) > + > + then generate timechart: > + > + $ perf timechart After fixing the problem, I could run timechart and generate an output.svg file. But it doesn't show any IO activity... process info was there in grey boxes (rect.process3) but no color boxes. I also tried recording with ping and dd, but the result was the same. I suspect it's because of some mis-calculation of position or size of the boxes. Thanks, Namhyung
Re: [PATCH v6] NVMe: conversion to blk-mq
On Mon, Jun 9, 2014 at 6:40 PM, Ming Lei wrote: > The root cause is that device returns > NVME_INTERNAL_DEV_ERROR(0x6) with your conversion > patch. The above problem is caused by qemu not handling -EAGAIN from io_submit(), so please ignore the report. Thanks, -- Ming Lei
Re: [PATCH] of: Add vendor 2nd prefix for Asahi Kasei Corp
On Wed, Jun 11, 2014 at 05:53:02PM -0700, Kuninori Morimoto wrote: > From: Kuninori Morimoto > > Current vendor-prefixes.txt already has > "ak" prefix for Asahi Kasei Corp by > ae8c4209af2cec065fef15d200a42a04130799f7 > (of: Add vendor prefix for Asahi Kasei Corp.) > > It went through the appropriate review process, > and is already in use. > But, almost all Asahi Kasei chip driver is > using "asahi-kasei" prefix today. > > Due to ABIness, this patch adds > "asahi-kasei" to vendor-prefixes.txt. > checkpatch.pl will report WARNING without this patch. > (DT compatible string vendor "asahi-kasei" appears un-documented) > > OTOH, Asahi Kasei is usually referred as "AKM", > but this patch doesn't care about it. > Because no DT is using it today. > > Cc: Stephen Warren > Cc: Mark Brown > Cc: Geert Uytterhoeven , > Signed-off-by: Kuninori Morimoto Acked-by: Simon Horman > --- > .../devicetree/bindings/vendor-prefixes.txt|1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt > b/Documentation/devicetree/bindings/vendor-prefixes.txt > index abc3080..7e4bb83 100644 > --- a/Documentation/devicetree/bindings/vendor-prefixes.txt > +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt > @@ -17,6 +17,7 @@ amstaos AMS-Taos Inc. > apm Applied Micro Circuits Corporation (APM) > arm ARM Ltd. > armadeus ARMadeus Systems SARL > +asahi-kasei Asahi Kasei Corp. > atmelAtmel Corporation > auo AU Optronics Corporation > avagoAvago Technologies > -- > 1.7.9.5 >
[tip:x86/mm] x86/smep: Be more informative when signalling an SMEP fault
Commit-ID: eff50c347fcc8feeb8c1723c23c89aba67c60263 Gitweb: http://git.kernel.org/tip/eff50c347fcc8feeb8c1723c23c89aba67c60263 Author: Jiri Kosina AuthorDate: Tue, 10 Jun 2014 22:49:31 +0200 Committer: H. Peter Anvin CommitDate: Wed, 11 Jun 2014 17:55:30 -0700 x86/smep: Be more informative when signalling an SMEP fault If pagefault triggers due to SMEP triggering, it can't be really easily distinguished from any other oops-causing pagefault, which might lead to quite some confusion when trying to understand the reason for the oops. Print an explanatory message in case the fault happened during instruction fetch for _PAGE_USER page which is present and executable on SMEP-enabled CPUs. This is consistent with what we are doing for NX already; in addition to immediately seeing from the oops what might be happening, it can even easily give a good indication to sysadmins who are carefully monitoring their kernel logs that someone might be trying to pwn them. Signed-off-by: Jiri Kosina Link: http://lkml.kernel.org/r/alpine.lnx.2.00.1406102248490.1...@pobox.suse.cz Signed-off-by: H. Peter Anvin --- arch/x86/mm/fault.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 858b47b..9de4cdb 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -575,6 +575,8 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address) static const char nx_warning[] = KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n"; +static const char smep_warning[] = KERN_CRIT +"unable to execute userspace code (SMEP?)
(uid: %d)\n"; static void show_fault_oops(struct pt_regs *regs, unsigned long error_code, @@ -595,6 +597,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, if (pte && pte_present(*pte) && !pte_exec(*pte)) printk(nx_warning, from_kuid(&init_user_ns, current_uid())); + if (pte && pte_present(*pte) && pte_exec(*pte) && + (pgd_flags(*pgd) & _PAGE_USER) && + (read_cr4() & X86_CR4_SMEP)) + printk(smep_warning, from_kuid(&init_user_ns, current_uid())); } printk(KERN_ALERT "BUG: unable to handle kernel ");
Re: NTB driver support in haswell platform?
Hi Jon, Thanks for your detailed explanation. Now I have a clearer understanding of it. Thanks! :) Yijing. On 2014/6/12 1:18, Jon Mason wrote: > On Wed, Jun 11, 2014 at 05:03:38PM +0800, Yijing Wang wrote: >> Hi Jon, >>I have an Intel Haswell platform in hand, and our team want to use NTB in >> this platform. >> I checked the current intel NTB driver in Linux kernel, I found the Haswell >> NTB pci device id >> is not contained in ntb_pci_tbl[]. I want to know whether current kernel ntb >> driver can support >> the ntb device in Haswell platform ? > > Yes, it does support Haswell and the Device IDs are in there. > PCI_DEVICE_ID_INTEL_NTB_B2B_HSX, PCI_DEVICE_ID_INTEL_NTB_PS_HSX, and > PCI_DEVICE_ID_INTEL_NTB_SS_HSX are the relevant dev ids for Haswell. > >> Haswell NTB device id: >> >> From Haswell EDS 7.4.2 >> >> did >> Bus: 0 Device: 3 Function: 0 Offset: 2 >> Bit Attr Default Description >> 15:0 RO-V 2F08h Device_Identification_Number - Device ID values vary from >> function to function. >> Bits 15:8 are equal to 0x2F. The following list is a breakdown of the >> function groups. >> 0x2F00 - 0x2F1F : PCI Express and DMI2 >> 0x2F20 - 0x2F3F : Integrated I/O Features >> 0x2F40 - 0x2F5F : Performance Monitors >> 0x2F80 - 0x2F9F : Intel QPI >> 0x2FA0 - 0x2FBF : Home Agent/Memory Controller >> 0x2FC0 - 0x2FDF : Power Management >> 0x2FE0 - 0x2FFF : Cbo/Ring >> Default value may vary based on bus, device, and function of this CSR >> location.
>> >> >> Current ntb_pci_tbl[] in Linux: >> >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_JSF 0x3725 >> #define PCI_DEVICE_ID_INTEL_NTB_PS_JSF 0x3726 >> #define PCI_DEVICE_ID_INTEL_NTB_SS_JSF 0x3727 >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_SNB 0x3C0D >> #define PCI_DEVICE_ID_INTEL_NTB_PS_SNB 0x3C0E >> #define PCI_DEVICE_ID_INTEL_NTB_SS_SNB 0x3C0F >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_IVT 0x0E0D >> #define PCI_DEVICE_ID_INTEL_NTB_PS_IVT 0x0E0E >> #define PCI_DEVICE_ID_INTEL_NTB_SS_IVT 0x0E0F >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_HSX 0x2F0D >> #define PCI_DEVICE_ID_INTEL_NTB_PS_HSX 0x2F0E >> #define PCI_DEVICE_ID_INTEL_NTB_SS_HSX 0x2F0F >> #define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD 0x0C4E >> >> So we should modify the default device id to 0x2F0D, 0x2F0E or 0x2F0F ? > > The device IDs are present above: PCI_DEVICE_ID_INTEL_NTB_B2B_HSX, > PCI_DEVICE_ID_INTEL_NTB_PS_HSX, and PCI_DEVICE_ID_INTEL_NTB_SS_HSX. > >> What's the difference between them? > > The last 3 letters are the name of the CPU where NTB is found. HSX is > Haswell Xeon. The 2-3 letters before that are the configuration type > of the NTB device. B2B is for "Back-to-back" configurations, aka > "NTB-NTB". > > B2B > [CPU]---[NTB]===[NTB]---[CPU] > > PS/SS is for NTB-RP configurations. PS is "Primary Side" and SS is > "Secondary Side". > > [CPU]---[SS|PS]---[CPU] > > I have an NTB wiki on my github account > (https://github.com/jonmason/ntb/wiki) describing the configuration, > etc. Also on the wiki is a link to a doc (not written by me, and > contains references to a driver that was not made public) that has > some graphics that might be useful. Specifically, pages 10 and 17. > To save time, the URL is > http://download.intel.com/design/intarch/papers/323328.pdf > > Let me know if you have any questions or issues, and I'll be happy to > walk you through it. > > Thanks, > Jon > >> >> Thanks! >> Yijing. >> >> >> >> >> -- >> Thanks! 
>> Yijing >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > . > -- Thanks! Yijing -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] of: Add vendor 2nd prefix for Asahi Kasei Corp
From: Kuninori Morimoto

The current vendor-prefixes.txt already has an "ak" prefix for Asahi Kasei
Corp, added by ae8c4209af2cec065fef15d200a42a04130799f7 ("of: Add vendor
prefix for Asahi Kasei Corp."). That prefix went through the appropriate
review process and is already in use. But almost all Asahi Kasei chip
drivers are using the "asahi-kasei" prefix today. Since that string is
effectively ABI, this patch adds "asahi-kasei" to vendor-prefixes.txt as
well. Without this patch, checkpatch.pl reports a WARNING:

  DT compatible string vendor "asahi-kasei" appears un-documented

On the other hand, Asahi Kasei is usually referred to as "AKM", but this
patch doesn't add that prefix, because no DT is using it today.

Cc: Stephen Warren
Cc: Mark Brown
Cc: Geert Uytterhoeven
Signed-off-by: Kuninori Morimoto
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index abc3080..7e4bb83 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -17,6 +17,7 @@ amstaos	AMS-Taos Inc.
 apm	Applied Micro Circuits Corporation (APM)
 arm	ARM Ltd.
 armadeus	ARMadeus Systems SARL
+asahi-kasei	Asahi Kasei Corp.
 atmel	Atmel Corporation
 auo	AU Optronics Corporation
 avago	Avago Technologies
--
1.7.9.5
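For context, this is how the documented prefix is consumed from a device
tree. The snippet below is a hypothetical board fragment: the bus node,
unit address, and the AK4642 codec model are illustrative assumptions; the
point is only the "asahi-kasei" vendor prefix in the compatible string.

```dts
/* Hypothetical board snippet; node names and addresses are examples. */
&i2c0 {
	ak4642: codec@12 {
		compatible = "asahi-kasei,ak4642";
		reg = <0x12>;
	};
};
```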
[patch v3] mm, pcp: allow restoring percpu_pagelist_fraction default
Oleg reports a division by zero error on a zero-length write() to the
percpu_pagelist_fraction sysctl:

  divide error:  [#1] SMP DEBUG_PAGEALLOC
  CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: 8800d5aeb6e0 ti: 8800d87a2000 task.ti: 8800d87a2000
  RIP: 0010:[] [] percpu_pagelist_fraction_sysctl_handler+0x84/0x120
  RSP: 0018:8800d87a3e78 EFLAGS: 00010246
  RAX: 0f89 RBX: 88011f7fd000 RCX: RDX: RSI: 0001 RDI: 0010
  RBP: 8800d87a3e98 R08: 81d002c8 R09: 8800d87a3f50 R10: 000b
  R11: 0246 R12: 0060 R13: 81c3c3e0 R14: 81cfddf8 R15: 8801193b0800
  FS: 7f614f1e9740() GS:88011f44() knlGS:
  CS: 0010 DS: ES: CR0: 8005003b CR2: 7f614f1fa000 CR3: d9291000 CR4: 06e0
  Stack:
   0001 ffea 81c3c3e0 8800d87a3ee8
   8122b163 8800d87a3f50 7fff1564969c 8800d8098f00
   7fff1564969c 8800d87a3f50
  Call Trace:
   [] proc_sys_call_handler+0xb3/0xc0
   [] proc_sys_write+0x14/0x20
   [] vfs_write+0xba/0x1e0
   [] SyS_write+0x46/0xb0
   [] tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl has been set by the user,
it is also impossible to restore it to the kernel default, since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior. It
still requires a fraction equal to or larger than 8, however, as stated by
the documentation, for sanity. If a value in the range [1, 7] is written,
the sysctl will return EINVAL. This successfully solves the divide by zero
issue at the same time.
Reported-by: Oleg Drokin
Cc: sta...@vger.kernel.org
Signed-off-by: David Rientjes
---
 v3: remove needless ret = 0 assignment per Oleg
     rewrote changelog
     added sta...@vger.kernel.org

 Documentation/sysctl/vm.txt |  3 ++-
 kernel/sysctl.c             |  3 +--
 mm/page_alloc.c             | 40 ++++++++++++++++++++++++++--------------
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -702,7 +702,8 @@ The batch value of each per cpu pagelist is also updated as a result.  It is
 set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)

 The initial value is zero.  Kernel does not use this value at boot time to set
-the high water marks for each per cpu page list.
+the high water marks for each per cpu page list.  If the user writes '0' to this
+sysctl, it will revert to this default behavior.

 ==
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -136,7 +136,6 @@ static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
-static int min_percpu_pagelist_fract = 8;

 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
@@ -1328,7 +1327,7 @@ static struct ctl_table vm_table[] = {
 		.maxlen		= sizeof(percpu_pagelist_fraction),
 		.mode		= 0644,
 		.proc_handler	= percpu_pagelist_fraction_sysctl_handler,
-		.extra1		= &min_percpu_pagelist_fract,
+		.extra1		= &zero,
 	},
 #ifdef CONFIG_MMU
 	{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -69,6 +69,7 @@

 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
+#define MIN_PERCPU_PAGELIST_FRACTION	(8)

 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
@@ -4145,7 +4146,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 	memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
 #endif

-static int __meminit zone_batchsize(struct zone *zone)
+static int zone_batchsize(struct zone *zone)
 {
 #ifdef CONFIG_MMU
 	int batch;
@@ -4261,8 +4262,8 @@ static void pageset_set_high(struct per_cpu_pageset *p,
 	pageset_update(&p->pcp, high, batch);
 }

-static void __meminit pageset_set_high_and_batch(struct zone *zone,
-		struct per_cpu_pageset *pcp)
+static void pageset_set_high_and_batch(struct zone *zone,
+		struct per_cpu_pageset *pcp)
 {
 	if (percpu_pagelist_fraction)
 		pageset_set_high(pcp,
@@ -5881,23
Re: drivers/char/random.c: More futzing about
On 06/11/2014 01:41 PM, H. Peter Anvin wrote:
> On 06/11/2014 12:25 PM, Theodore Ts'o wrote:
>> On Wed, Jun 11, 2014 at 09:48:31AM -0700, H. Peter Anvin wrote:
>>> While talking about performance, I did a quick prototype of random using
>>> Skein instead of SHA-1, and it was measurably faster, in part because
>>> Skein produces more output per hash.
>>
>> Which Skein parameters did you use, and how much stack space was
>> required for it?  Skein-512 is described as needing 200 bytes of
>> state, IIRC (most of which, I assume, comes from the Threefish key
>> schedule).
>
> I believe I used Skein-256, but I'd have to dig to find it again.
>
> 	-hpa

Sadly I can't find the tree, but I'm 94% sure it was Skein-256
(specifically the SHA3-256 candidate parameter set.)

	-hpa
Re: drivers/char/random.c: more ruminations
> It's not something where, if it required massive changes, I'd
> necessarily feel the need to backport them to stable.  It's a
> certificational weakness, but it's not a disaster.

Agreed!  It's been there for years, and I'm not too worried.  It takes a
pretty tight race to cause the problem in the first place.  As you note,
it only happens with a full pool (already a very secure situation), and
the magnitude is limited by the size of entropy additions, which are
normally small.

I'm just never happy with bugs in security-critical code.  "I don't think
that bug is exploitable" is almost as ominous a phrase as "Y'all watch
this!"
linux-next: manual merge of the samsung tree with Linus' tree
Hi Kukjin,

Today's linux-next merge of the samsung tree got a conflict in
arch/arm/mach-exynos/sleep.S between commit 25a9ef63cd2b ("ARM: l2c:
exynos: convert to common l2c310 early resume functionality") from Linus'
tree and commit af728bd84cc8 ("ARM: EXYNOS: Fix build error with thumb2")
from the samsung tree.

I fixed it up (the former removed the code updated by the latter) and can
carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell    s...@canb.auug.org.au