date:20130523

Re: [PATCH 3/6] dma: Add a jz4740 dmaengine driver

2013-05-23 Thread Lars-Peter Clausen

On 05/24/2013 07:54 AM, Vinod Koul wrote:
> On Fri, May 24, 2013 at 07:58:04AM +0200, Lars-Peter Clausen wrote:
>> This one needs both.
>>
 +  jzcfg.mode = JZ4740_DMA_MODE_SINGLE;
 +  jzcfg.request_type = config->slave_id;
 +
 +  chan->config = *config;
 +
 +  jz4740_dma_configure(chan->jz_chan, &jzcfg);
 +
 +  return 0;
>>> You are NOT use src_addr/dstn_addr? How else are you passing the periphral
>>> address?
>> I'm saving the whole config, which will later be used to retrieve the source 
>> or
>> dest address.
> well I missed that and it is a bad idea. You dont know when client has
> freed/thrown the pointer so copy this instead..

I do copy the full config, not just the pointer to the config. Although
src_addr and dest_addr are the only two fields which are used later on at this
point. So I could change it to just copy src_addr and dest_addr, or well just
one of them depending on the direction.

> 
>>
 +}
>> [...]
 +static int jz4740_dma_alloc_chan_resources(struct dma_chan *c)
 +{
 +  struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
 +
 +  chan->jz_chan = jz4740_dma_request(chan, NULL);
 +  if (!chan->jz_chan)
 +  return -EBUSY;
 +
 +  jz4740_dma_set_complete_cb(chan->jz_chan, jz4740_dma_complete_cb);
 +
 +  return 0;
>>> Zero is not expected value, you need to return the descriptors allocated
>>> sucessfully.
>>
>> Well, zero descriptors have been allocated. As far as I can see only a 
>> negative
>> return value is treated as an error. Also the core doesn't seem to use the
>> return value for anything else but checking if it is an error.
> This is the API defination
> * @device_alloc_chan_resources: allocate resources and return the
> *  number of allocated descriptors
> 

But 0 is still the number of descriptors that have been pre-allocated.

- Lars


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/6] dma: Add a jz4740 dmaengine driver

2013-05-23 Thread Vinod Koul

On Fri, May 24, 2013 at 07:58:04AM +0200, Lars-Peter Clausen wrote:
> This one needs both.
> 
> >> +  jzcfg.mode = JZ4740_DMA_MODE_SINGLE;
> >> +  jzcfg.request_type = config->slave_id;
> >> +
> >> +  chan->config = *config;
> >> +
> >> +  jz4740_dma_configure(chan->jz_chan, &jzcfg);
> >> +
> >> +  return 0;
> > You are NOT use src_addr/dstn_addr? How else are you passing the periphral
> > address?
> I'm saving the whole config, which will later be used to retrieve the source 
> or
> dest address.
well I missed that and it is a bad idea. You dont know when client has
freed/thrown the pointer so copy this instead..

> 
> >> +}
> [...]
> >> +static int jz4740_dma_alloc_chan_resources(struct dma_chan *c)
> >> +{
> >> +  struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
> >> +
> >> +  chan->jz_chan = jz4740_dma_request(chan, NULL);
> >> +  if (!chan->jz_chan)
> >> +  return -EBUSY;
> >> +
> >> +  jz4740_dma_set_complete_cb(chan->jz_chan, jz4740_dma_complete_cb);
> >> +
> >> +  return 0;
> > Zero is not expected value, you need to return the descriptors allocated
> > sucessfully.
> 
> Well, zero descriptors have been allocated. As far as I can see only a 
> negative
> return value is treated as an error. Also the core doesn't seem to use the
> return value for anything else but checking if it is an error.
This is the API defination
* @device_alloc_chan_resources: allocate resources and return the
*  number of allocated descriptors

> >> +}
> >> +
> [...]
> >> +  dd->chancnt = 6;
> > hard coding is not advised
> 
> But there are 6 channels ;)
JZ4740_MAX_CH 6 :)

--
~Vinod
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [net-next RFC 2/8] macvtap: return -EBADFD when TUNGETIFF fails

2013-05-23 Thread Jason Wang

On 05/23/2013 07:54 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:27AM +0800, Jason Wang wrote:
>> Tuntap return -EBADFD when TUNGETIFF fails, we should return the same value.
>>
>> Signed-off-by: Jason Wang 
> Can you add some more comments on why this matters?

Just to keep compatibility for tuntap.
> Any userspace that cares will have to handle both, right?

Ideally the userspace should expect the same behavior for both tap and
macvtap.

Maybe we should just keep this.
>
>> ---
>>  drivers/net/macvtap.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 59e9605..ce1c72a 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -928,7 +928,7 @@ static long macvtap_ioctl(struct file *file, unsigned 
>> int cmd,
>>  rcu_read_unlock_bh();
>>  
>>  if (!vlan)
>> -return -ENOLINK;
>> +return -EBADFD;
>>  
>>  ret = 0;
>>  if (copy_to_user(&ifr->ifr_name, vlan->dev->name, IFNAMSIZ) ||
>> -- 
>> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/6] crypto: ux500/[cryp|hash] - Enable Device Tree

2013-05-23 Thread Linus Walleij

On Thu, May 16, 2013 at 1:27 PM, Lee Jones  wrote:

> This patch-set simply allows ux500-[cryp|hash] drivers to be probed
> when Snowball is running with Device Tree enabled.
>
> Linus,
>
> Feel free to split the DTS changes out into your ux500-devicetree branch.

I have tentatively applied 1, 2, 3 & 4 to my device tree branch.

5 & 6 need bindings and ACKs.

I can merge them through the ux500 devicetree branch as well I
think, they "should" be orthogonal to other (DMA-related) changes
AFAICT.

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] crypto: ux500/cryp - Enable DT probing of the driver

2013-05-23 Thread Linus Walleij

On Thu, May 16, 2013 at 1:27 PM, Lee Jones  wrote:

> By providing an OF match table with a suitable compatible string, we
> can ensure the ux500-crypt driver is probed by supplying an associated
> DT node in a given platform's Device Tree.
>
> Cc: Herbert Xu 
> Cc: linux-cry...@vger.kernel.org
> Signed-off-by: Lee Jones 
> ---
>  drivers/crypto/ux500/cryp/cryp_core.c |6 ++

W00t! No binding document?

OK I guess it's just reg, irq ... but still it's compulsory,
right?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] crypto: ux500/cryp - Enable DT probing of the driver

2013-05-23 Thread Herbert Xu

On Fri, May 24, 2013 at 08:20:38AM +0200, Linus Walleij wrote:
> On Thu, May 16, 2013 at 1:27 PM, Lee Jones  wrote:
> 
> > By providing an OF match table with a suitable compatible string, we
> > can ensure the ux500-crypt driver is probed by supplying an associated
> > DT node in a given platform's Device Tree.
> >
> > Cc: Herbert Xu 
> > Cc: linux-cry...@vger.kernel.org
> > Signed-off-by: Lee Jones 
> 
> Herbert, can I have your ACK on patch 5 & 6 in this series to take it
> through the ARM SoC tree?

Sure,

Acked-by: Herbert Xu 

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [net-next RFC 3/8] macvtap: introduce macvtap_get_vlan()

2013-05-23 Thread Jason Wang

On 05/23/2013 11:11 PM, Sergei Shtylyov wrote:
> Hello.
>
> On 23-05-2013 7:12, Jason Wang wrote:
>
>> Factor out the device holding logic to a macvtap_get_vlan(), this
>> will be also
>> used by multiqueue API.
>
>> Signed-off-by: Jason Wang 
>> ---
>>   drivers/net/macvtap.c |   26 +++---
>>   1 files changed, 19 insertions(+), 7 deletions(-)
>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index ce1c72a..a36e49e 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -890,6 +890,23 @@ out:
>>   return ret;
>>   }
>>
>> +static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
>> +{
>> +struct macvlan_dev *vlan;
>
>Empty line wouldn't hurt here, after declaration.

Will add one line here. Thanks
>
>> +rcu_read_lock_bh();
>> +vlan = rcu_dereference_bh(q->vlan);
>> +if (vlan)
>> +dev_hold(vlan->dev);
>> +rcu_read_unlock_bh();
>> +
>> +return vlan;
>> +}
> [...]
>
> WBR, Sergei
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/6] crypto: ux500/cryp - Enable DT probing of the driver

2013-05-23 Thread Linus Walleij

On Thu, May 16, 2013 at 1:27 PM, Lee Jones  wrote:

> By providing an OF match table with a suitable compatible string, we
> can ensure the ux500-crypt driver is probed by supplying an associated
> DT node in a given platform's Device Tree.
>
> Cc: Herbert Xu 
> Cc: linux-cry...@vger.kernel.org
> Signed-off-by: Lee Jones 

Herbert, can I have your ACK on patch 5 & 6 in this series to take it
through the ARM SoC tree?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [net-next RFC 7/8] macvtap: add TUNSETQUEUE ioctl

2013-05-23 Thread Jason Wang

On 05/23/2013 07:52 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:32AM +0800, Jason Wang wrote:
>> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
>> enable a queue of macvtap. This is used to be compatible at API layer of 
>> tuntap
>> to simplify the userspace to manage the queues.
>>
>> This is done by just introduce a pointer detached which was set when the 
>> queue
>> were detached. Only the queue with NULL set to this pointer can be used to
>> forward packets.
>>
>> Signed-off-by: Jason Wang 
> Isn't this a bit too tricky?

Well, not tricky and it was just like what we did the reshuffling in
PATCH 5/8
> Why keep disabled queues on the array at all?

Both are ok, but this method need less data structures in macvlan_dev.
> Let's add a linked list of all queues for accounting.
> Use array for enabled queues only, for data path.

This would not be simpler than just do in place reshuffling.
>
>> ---
>>  drivers/net/macvtap.c  |  147 
>> +++
>>  include/linux/if_macvlan.h |7 ++
>>  2 files changed, 126 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index e3f9344..06b10a5 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -85,67 +85,131 @@ static const struct proto_ops macvtap_socket_ops;
>>   */
>>  static DEFINE_SPINLOCK(macvtap_lock);
>>  
>> +static void macvtap_swap_slot(struct macvlan_dev *vlan, int a, int b)
>> +{
>> +struct macvtap_queue *q1, *q2;
>> +
>> +if (a == b)
>> +return;
>> +
>> +q1 = rcu_dereference_protected(vlan->taps[a],
>> +   lockdep_is_held(&macvtap_lock));
>> +q2 = rcu_dereference_protected(vlan->taps[b],
>> +   lockdep_is_held(&macvtap_lock));
>> +
>> +BUG_ON(q1 == NULL || q2 == NULL);
>> +
>> +rcu_assign_pointer(vlan->taps[a], vlan->taps[b]);
>> +rcu_assign_pointer(vlan->taps[b], q1);
>> +
>> +q1->queue_index = b;
>> +q2->queue_index = a;
>> +}
>> +
>>  static int macvtap_set_queue(struct net_device *dev, struct file *file,
>> -struct macvtap_queue *q)
>> + struct macvtap_queue *q, bool enable)
>>  {
>>  struct macvlan_dev *vlan = netdev_priv(dev);
>>  int err = -EBUSY;
>> +int total;
>>  
>>  spin_lock(&macvtap_lock);
>> -if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
>> +
>> +total = vlan->numvtaps + vlan->numdisabled;
>> +if (!enable && total == MAX_MACVTAP_QUEUES)
>> +goto out;
>> +
>> +err = -EINVAL;
>> +if (enable && q->queue_index < vlan->numvtaps)
>>  goto out;
>>  
>>  err = 0;
>> -rcu_assign_pointer(q->vlan, vlan);
>> -rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>> -sock_hold(&q->sk);
>>  
>> -q->file = file;
>> -q->queue_index = vlan->numvtaps;
>> -file->private_data = q;
>> +if (enable) {
>> +BUG_ON(q->queue_index >= total);
>> +macvtap_swap_slot(vlan, q->queue_index, vlan->numvtaps);
>> +--vlan->numdisabled;
>> +} else {
>> +rcu_assign_pointer(q->vlan, vlan);
>> +rcu_assign_pointer(vlan->taps[total], q);
>> +sock_hold(&q->sk);
>> +
>> +q->file = file;
>> +q->queue_index = total;
>> +file->private_data = q;
>> +if (vlan->numdisabled)
>> +macvtap_swap_slot(vlan, vlan->numvtaps, total);
>> +
>> +/* Make sure the pointers were seen before numvtaps. */
>> +smp_wmb();
>> +}
>>  
>>  vlan->numvtaps++;
>> -
>>  out:
>>  spin_unlock(&macvtap_lock);
>>  return err;
>>  }
>>  
>>  /*
>> - * The file owning the queue got closed, give up both
>> - * the reference that the files holds as well as the
>> - * one from the macvlan_dev if that still exists.
>> + * The file owning the queue got closed or disabled.
>> + *
>> + * If closed give up both the reference that the files holds as well as the
>> + * one from the macvlan_dev if that still exists. If disabled, reshuffle the
>> + * array of taps.
>>   *
>>   * Using the spinlock makes sure that we don't get
>>   * to the queue again after destroying it.
>>   */
>> -static void macvtap_put_queue(struct macvtap_queue *q)
>> +static int macvtap_put_queue(struct macvtap_queue *q, bool clean)
> s/clean/release/
> and add a comment explaining what it is.

Ok.
>>  {
>> -struct macvtap_queue *nq;
>>  struct macvlan_dev *vlan;
>> +int err = -EINVAL;
>>  
>>  spin_lock(&macvtap_lock);
>>  vlan = rcu_dereference_protected(q->vlan,
>>   lockdep_is_held(&macvtap_lock));
>> +
>>  if (vlan) {
>> +int total = vlan->numvtaps + vlan->numdisabled;
>>  int index = q->queue_index;
>> -BUG_ON(index >= vlan->numvtaps);
>> +bool disabled = q->queue_i

Re: [RFC PATCH] fs: call_usermodehelper_root helper introduced

2013-05-23 Thread Stanislav Kinsbursky


24.05.2013 01:32, Eric W. Biederman пишет:

"J. Bruce Fields"  writes:


On Thu, May 23, 2013 at 03:55:47PM -0400, J. Bruce Fields wrote:

On Thu, May 23, 2013 at 09:05:26AM -0400, Jeff Layton wrote:

What might help most here is to lay out a particular scenario for how
you envision setting up knfsd in a container so we can ensure that it's
addressed properly by whatever solution you settle on.


BTW the problem I have here is that the only case I've personally had
any interest in is using network and file namespaces to isolate nfsd's
to make them safe to migrate across nodes of a cluster.

So while the idea of making user namespaces and unprivileged knfsd and
the rest work is really interesting and I'm happy to think about it, I'm
not sure how feasible or useful it is.

I'd therefore actually prefer just to take something like Stanislav's
patch now and put off the problem of how to make it work correctly with
user namespaces until we actually turn that on.  His patch fixes a real
bug that we have now, while user-namespaced-nfsd still sounds a bit
pie-in-the-sky to me.


But maybe I don't understand why Eric thinks nfsd in usernamespaces is
imminent.  Or maybe I'm missing some security problem that Stanislav's
patch would introduce now without allowing nfsd to run in a user
namespace.


The problem I saw is less pronounced but I think actually exists without
user namespaces as well.  In particular if we let root in the container
start knfsd in a network and mount namespace.  Then if that container
does not have certain capabilities like say CAP_MKNOD all of a sudden
we have a process running in the container with CAP_MKNOD.

I haven't had time to look at everything just yet but I don't think
solving this is particularly hard.

There are really two very simple solutions.
1) When we start knfsd in the container we create a kernel thread that
is a child of the userspace process that started knfsd.  That kernel
thread can then be used to launch user mode helpers.

This uses the same code as is needed today to launch user mode
helpers with just a different parent thread.



This might work for us.
but one of the reasons I'm worrying about is that the spawned kernel thread will
have the same capabilities as the process, calling for NFSd start.
And this capabilities can be insufficient for running "/sbin/nfsdcltrack" 
binary.


2) We pass enough information for the user mode helper to figure out
what container this is all for.  File descriptors or whatever.
Then the user mode helper outside the container using chdir and setns
sets up the environment that the user mode helper typically expects
and runs the process inside of the container.



How does chdir and setns differs from doing set_fs_root() in UMH  in terms of
security and isolation?


Or we look at what the user mode helper is doing and realize we don't
need to run the user mode binary inside of the container.  If all it
does is query /etc/passwd for username to uid mapping for example (for
user namespaces we will probably care but not without them).

I don't think any of this is hard to implement.

I think user namespaces are imminent because after my last pass through
the code the remaining work looked pretty trivial, and as soon as the
dust settles I expect user namespaces become the common way to run code
in containers, which should greatly increase the demand for this feature
in user namespaces.

Eric








--
Best regards,
Stanislav Kinsbursky
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v2 0/3][TESTS] LAB: Support for Legacy Application Booster governor - tests results

2013-05-23 Thread Lukasz Majewski

Hi Viresh,

> 
> > On 22 May 2013 15:57, Lukasz Majewski 
> > wrote:
> > >> On 3 May 2013 19:37, Jonghwa Lee 
> > >> wrote:
> > 
> > > I think, that overclocking support is crucial here. As you pointed
> > > out
> > > - ondemand and conservative benefit from it. Therefore, I would
> > > urge for its mainline acceptance.
> > >
> > > (code for reference)
> > > http://thread.gmane.org/gmane.linux.kernel/1484746/match=cpufreq
> > >
> > > In this RFC (patch 1/3), I've decided to put the burden of
> > > overclocking support to platform code (cpufreq/exynos-cpufreq.c
> > > and cpufreq/exynos4x12-cpufreq.c).
> > >
> > > Those changes aren't intrusive for other boards/archs. Moreover
> > > overclocking is closely related to processor clocking/power
> > > dissipation capabilities, so SoC specific code is a good place for
> > > it.
> > >
> > >
> > > What DO need a broad acceptance is the overclocking API proposed
> > > at: include/linux/cpufreq.h
> > >
> > > This introduces interface to which others will be bind. It
> > > shouldn't be difficult to implement overclocking at other SoCs (as
> > > it was proposed for Exynos).
> > >
> > > Feedback is welcome, since I might have overlooked oddities
> > > present at other SoCs.
> > 
> > Hi..
> > 
> > I am not talking about the minute details here... for example I
> > didn't like the way overclocking support is implemented... It has to
> > be a bit more framework oriented then driver...
> > 
> > What I am thinking right now is if it is worth to add both the
> > features you are trying. i.e. overclocking and LAB..
> > 
> > So, requested you to give some figures... of ondemand with and
> > without overclocking... Leave LAB for now...

As you wished, I've provided relevant data for overclocking.

Would you be so kind and comment on them?



> > 
> > Then we can give LAB a try with above...
> 
> Test HW Exynos4412 (4 Cores):
> Kernel 3.8.3
> 
> Ondemand max freq: 1.4 GHz
> Overclock max freq: 1.5 GHz
> 
> 
> Ondemand improvement with and without overclocking (called by us
> TurboBoost - TB):
> 
> Dhrystone has been built according to:
> http://zenit.senecac.on.ca/wiki/index.php/Dhrystone_howto
> It's Makefile is also attached.
> 
> 
> Dhrystone # of Threads
>   1   2   3   4
> ondemand  2054794 2061855 2097902 2090592
> ondemand + TB 2290076 2205882 2281368 2290076
> 
> Improvement:  10% 7%  8%  9%
> -
> 
> Electric charge [C]
> (Avg) [A] * [second]  # of Threads
>   1   2   3   4
> ondemand  1,334   1,837   2,296   3,096
> ondemand + TB 1,401   2,2025  2,907   4,34976
>   
> Power cost:   5%  17% 21% 29%
> -
> 
> Execution time [second]   # of Threads
>   1   2   3   4
> ondemand  2,827   2,8 2,787   2,872
> ondemand + TB 2,622   2,694   2,667   2,76
>   
>   
> Speedup:  -7% -4% -4% -4%
> 
> -
> 
> "Real life" example:
> time tar -czf linux-3.9.1.tar.gz linux-3.9.1/
> 
>   Avg current[mA] Time[s]
> Ondemand: 460 153 
> Ondemand + TB:512 144
> 
> Result:   +10%-6%
> 
> 
> 
> Conclusion: 
> 
> The main use case for TB is to speed up execution of tasks packed to
> one core. Other cores are then in IDLE state.
> 
> For a single core we can safely overclock, since we will not exceed
> its power consumption and thermal limits.
> 
> 


-- 
Best regards,

Lukasz Majewski

Samsung R&D Poland (SRPOL) | Linux Platform Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/6] dma: Add a jz4740 dmaengine driver

2013-05-23 Thread Lars-Peter Clausen

On 05/24/2013 06:59 AM, Vinod Koul wrote:
> On Thu, May 23, 2013 at 10:36:24PM +0200, Lars-Peter Clausen wrote:
>> This patch adds dmaengine support for the JZ4740 DMA controller. For now the
>> driver will be a wrapper around the custom JZ4740 DMA API. Once all users of 
>> the
>> custom JZ4740 DMA API have been converted to the dmaengine API the custom API
>> will be removed and direct hardware access will be added to the dmaengine
>> driver.
>>
>> Signed-off-by: Lars-Peter Clausen 
>> ---
> 
>> +static enum jz4740_dma_width jz4740_dma_width(enum dma_slave_buswidth width)
>> +{
>> +switch (width) {
>> +case DMA_SLAVE_BUSWIDTH_1_BYTE:
>> +return JZ4740_DMA_WIDTH_8BIT;
>> +case DMA_SLAVE_BUSWIDTH_2_BYTES:
>> +return JZ4740_DMA_WIDTH_16BIT;
>> +case DMA_SLAVE_BUSWIDTH_4_BYTES:
>> +return JZ4740_DMA_WIDTH_32BIT;
>> +default:
>> +return JZ4740_DMA_WIDTH_32BIT;
>> +}
> Only diff between the values here and dmaengien values is 
> JZ4740_DMA_WIDTH_32BIT
> as 0. But the header tells me taht its default and SIZE one has values in that
> pattern too. If that is the case you maybe able to get rid on conversion and 
> use
> dmaengine values directly.
> 

I'd prefer to keep it the way it is. The JZ4740_DMA_WIDTH constants end up
being written to the hardware, while the DMA_SLAVE_BUSWIDTH constants are Linux
internal. I prefer to not mix these two up.

>> +}
>> +
>> +static enum jz4740_dma_transfer_size jz4740_dma_maxburst(u32 maxburst)
>> +{
>> +if (maxburst <= 1)
>> +return JZ4740_DMA_TRANSFER_SIZE_1BYTE;
>> +else if (maxburst <= 3)
>> +return JZ4740_DMA_TRANSFER_SIZE_2BYTE;
>> +else if (maxburst <= 15)
>> +return JZ4740_DMA_TRANSFER_SIZE_4BYTE;
>> +else if (maxburst <= 31)
>> +return JZ4740_DMA_TRANSFER_SIZE_16BYTE;
>> +
>> +return JZ4740_DMA_TRANSFER_SIZE_32BYTE;
>> +}
>> +
>> +static int jz4740_dma_slave_config(struct dma_chan *c,
>> +const struct dma_slave_config *config)
>> +{
>> +struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
>> +struct jz4740_dma_config jzcfg;
>> +
>> +switch (config->direction) {
>> +case DMA_MEM_TO_DEV:
>> +jzcfg.flags = JZ4740_DMA_SRC_AUTOINC;
>> +jzcfg.transfer_size = jz4740_dma_maxburst(config->dst_maxburst);
>> +break;
>> +case DMA_DEV_TO_MEM:
>> +jzcfg.flags = JZ4740_DMA_DST_AUTOINC;
>> +jzcfg.transfer_size = jz4740_dma_maxburst(config->src_maxburst);
>> +break;
>> +default:
>> +return -EINVAL;
>> +}
>> +
>> +
>> +jzcfg.src_width = jz4740_dma_width(config->src_addr_width);
>> +jzcfg.dst_width = jz4740_dma_width(config->dst_addr_width);
> this should be direction based, typically DMA engines have only one width to 
> be
> programmed.

This one needs both.

>> +jzcfg.mode = JZ4740_DMA_MODE_SINGLE;
>> +jzcfg.request_type = config->slave_id;
>> +
>> +chan->config = *config;
>> +
>> +jz4740_dma_configure(chan->jz_chan, &jzcfg);
>> +
>> +return 0;
> You are NOT use src_addr/dstn_addr? How else are you passing the periphral
> address?

I'm saving the whole config, which will later be used to retrieve the source or
dest address.

>> +}
[...]
>> +static int jz4740_dma_alloc_chan_resources(struct dma_chan *c)
>> +{
>> +struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
>> +
>> +chan->jz_chan = jz4740_dma_request(chan, NULL);
>> +if (!chan->jz_chan)
>> +return -EBUSY;
>> +
>> +jz4740_dma_set_complete_cb(chan->jz_chan, jz4740_dma_complete_cb);
>> +
>> +return 0;
> Zero is not expected value, you need to return the descriptors allocated
> sucessfully.

Well, zero descriptors have been allocated. As far as I can see only a negative
return value is treated as an error. Also the core doesn't seem to use the
return value for anything else but checking if it is an error.

>> +}
>> +
[...]
>> +dd->chancnt = 6;
> hard coding is not advised

But there are 6 channels ;)

[...]
>> +
>> +static struct platform_driver jz4740_dma_driver = {
>> +.probe = jz4740_dma_probe,
>> +.remove = jz4740_dma_remove,
>> +.driver = {
>> +.name = "jz4740-dma",
>> +.owner = THIS_MODULE,
>> +},
>> +};
>> +module_platform_driver(jz4740_dma_driver);
> typically lot of dma driver like to be higher up in the module order. The 
> reason
> is to have device initialized before clients, pls check if you need that

I don't need it.

Thanks for the quick review.
- Lars
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] dma: use platform_{get,set}_drvdata()

2013-05-23 Thread Vinod Koul

On Fri, May 24, 2013 at 10:10:13AM +0900, Jingoo Han wrote:
> Use the wrapper functions for getting and setting the driver data using
> platform_device instead of using dev_{get,set}_drvdata() with &pdev->dev,
> so we can directly pass a struct platform_device.
> 
> Also, unnecessary dev_set_drvdata() is removed, because the driver core
> clears the driver data to NULL after device_release or on probe failure.
> 
> Signed-off-by: Jingoo Han 
Applied thanks

--
~Vinod
> ---
>  drivers/dma/fsldma.c  |5 ++---
>  drivers/dma/ppc4xx/adma.c |5 ++---
>  2 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 4fc2980..49e8fbd 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -1368,7 +1368,7 @@ static int fsldma_of_probe(struct platform_device *op)
>  
>   dma_set_mask(&(op->dev), DMA_BIT_MASK(36));
>  
> - dev_set_drvdata(&op->dev, fdev);
> + platform_set_drvdata(op, fdev);
>  
>   /*
>* We cannot use of_platform_bus_probe() because there is no
> @@ -1417,7 +1417,7 @@ static int fsldma_of_remove(struct platform_device *op)
>   struct fsldma_device *fdev;
>   unsigned int i;
>  
> - fdev = dev_get_drvdata(&op->dev);
> + fdev = platform_get_drvdata(op);
>   dma_async_device_unregister(&fdev->common);
>  
>   fsldma_free_irqs(fdev);
> @@ -1428,7 +1428,6 @@ static int fsldma_of_remove(struct platform_device *op)
>   }
>  
>   iounmap(fdev->regs);
> - dev_set_drvdata(&op->dev, NULL);
>   kfree(fdev);
>  
>   return 0;
> diff --git a/drivers/dma/ppc4xx/adma.c b/drivers/dma/ppc4xx/adma.c
> index 5d3d955..e68c51d 100644
> --- a/drivers/dma/ppc4xx/adma.c
> +++ b/drivers/dma/ppc4xx/adma.c
> @@ -4481,7 +4481,7 @@ static int ppc440spe_adma_probe(struct platform_device 
> *ofdev)
>   adev->dev = &ofdev->dev;
>   adev->common.dev = &ofdev->dev;
>   INIT_LIST_HEAD(&adev->common.channels);
> - dev_set_drvdata(&ofdev->dev, adev);
> + platform_set_drvdata(ofdev, adev);
>  
>   /* create a channel */
>   chan = kzalloc(sizeof(*chan), GFP_KERNEL);
> @@ -4594,14 +4594,13 @@ out:
>   */
>  static int ppc440spe_adma_remove(struct platform_device *ofdev)
>  {
> - struct ppc440spe_adma_device *adev = dev_get_drvdata(&ofdev->dev);
> + struct ppc440spe_adma_device *adev = platform_get_drvdata(ofdev);
>   struct device_node *np = ofdev->dev.of_node;
>   struct resource res;
>   struct dma_chan *chan, *_chan;
>   struct ppc_dma_chan_ref *ref, *_ref;
>   struct ppc440spe_adma_chan *ppc440spe_chan;
>  
> - dev_set_drvdata(&ofdev->dev, NULL);
>   if (adev->id < PPC440SPE_ADMA_ENGINES_NUM)
>   ppc440spe_adma_devices[adev->id] = -1;
>  
> -- 
> 1.7.10.4
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

2013-05-23 Thread Xiao Guangrong

On 05/24/2013 01:39 PM, Xiao Guangrong wrote:

>>> if (kvm_mmu_prepare_zap_page(sp, list))
>>> hlist_del(sp->hlist);
>>>
>>> Or, i missed your suggestion?
>> My assumption is that we can leave obsolete shadow pages on hashtable
>> till commit_zap time.
> 
> Ah, i see.
> 
> Yes, i agree with your idea. I think we can only skip the obsolete-and-invalid
> page since the obsolete-but-unzapped page still affects the mmu's behaviour,
> for example, it can cause page write-protect, kvm_mmu_unprotect_page()
> can not work by skipping unzapped-obsolete pages.

kvm_mmu_unprotect_page() can work, we can skip obsolete pages too when detect
whether need to write-protect a page, it is easier to make the page become
writable when zapping obsolete pages.

Will update it following your idea, sorry for my noise.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] fs: call_usermodehelper_root helper introduced

2013-05-23 Thread Stanislav Kinsbursky


23.05.2013 23:55, J. Bruce Fields пишет:

On Thu, May 23, 2013 at 09:05:26AM -0400, Jeff Layton wrote:

On Thu, 23 May 2013 15:25:20 +0300

I'm not familiar with nfsdcltrack but I would imagine it receives it's 
information from
Kernel as a command line parameters.

Would it not be the simplest approach to add a --chroot=/path/to/root optional
parameter to nfsdcltrack so it should access an alternate DB relative to
--chroot.

This would address Eric's concern of not executing user-privileged executable
from Kernel. I think

Just my $0.017
Boaz



I think that sounds reasonable. Is it always the case
that /path/to/root is reachable from the "primary" namespace?


I don't think we can assume that.



Yes, we can't. For example in case of different mount namespaces.


If not, you may need to do something more exotic there.


We should be able to pass a file descriptor and then work relative to
that.



We can't do this either.
Moreover, passing a file descriptor is something, that solves (?) completely 
different problem.
Imagine the following:
1) We have a host, based on, say RHEL6, which nfs-utils has doesn't have 
"/sbin/nfsdcltrack" and all.
2) And we have a container in it, based on, say, Fedora-19, which nfs-utils has 
this binary.

In case of starting NFSd in Fedora CT, we won't be able to execute the desired 
binary without root swapping.
Because we won't be able to even lookup it in the host file system.

So, as I said previously, the main problem here is not how to modify the 
userspace binary, but how to lookup and execute the right (!) one.
And I don't see, how we can do this (simple enough) without root swap.


--
Best regards,
Stanislav Kinsbursky
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v7 09/11] KVM: MMU: introduce kvm_mmu_prepare_zap_obsolete_page

2013-05-23 Thread Xiao Guangrong

On 05/23/2013 11:57 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 09:03:50PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 08:39 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 07:13:58PM +0800, Xiao Guangrong wrote:
 On 05/23/2013 04:09 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 03:50:16PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 03:37 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 02:31:47PM +0800, Xiao Guangrong wrote:
 On 05/23/2013 02:18 PM, Gleb Natapov wrote:
> On Thu, May 23, 2013 at 02:13:06PM +0800, Xiao Guangrong wrote:
>> On 05/23/2013 01:57 PM, Gleb Natapov wrote:
>>> On Thu, May 23, 2013 at 03:55:58AM +0800, Xiao Guangrong wrote:
 It is only used to zap the obsolete page. Since the obsolete page
 will not be used, we need not spend time to find its unsync 
 children
 out. Also, we delete the page from shadow page cache so that the 
 page
 is completely isolated after call this function.

 The later patch will use it to collapse tlb flushes

 Signed-off-by: Xiao Guangrong 
 ---
  arch/x86/kvm/mmu.c |   46 
 +-
  1 files changed, 41 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 9b57faa..e676356 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1466,7 +1466,7 @@ static inline void 
 kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
  static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
  {
ASSERT(is_empty_shadow_page(sp->spt));
 -  hlist_del(&sp->hash_link);
 +  hlist_del_init(&sp->hash_link);
>>> Why do you need hlist_del_init() here? Why not move it into
>>
>> Since the hlist will be double freed. We will it like this:
>>
>> kvm_mmu_prepare_zap_obsolete_page(page, list);
>> kvm_mmu_commit_zap_page(list);
>>kvm_mmu_free_page(page);
>>
>> The first place is kvm_mmu_prepare_zap_obsolete_page(page), which 
>> have
>> deleted the hash list.
>>
>>> kvm_mmu_prepare_zap_page() like we discussed it here:
>>> https://patchwork.kernel.org/patch/2580351/ instead of doing
>>> it differently for obsolete and non obsolete pages?
>>
>> It is can break the hash-list walking: we should rescan the
>> hash list once the page is prepared-ly zapped.
>>
>> I mentioned it in the changelog:
>>
>>   4): drop the patch which deleted page from hash list at the 
>> "prepare"
>>   time since it can break the walk based on hash list.
> Can you elaborate on how this can happen?

 There is a example:

 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 {
struct kvm_mmu_page *sp;
LIST_HEAD(invalid_list);
int r;

pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
r = 0;
spin_lock(&kvm->mmu_lock);
for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
 sp->role.word);
r = 1;
kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
}
kvm_mmu_commit_zap_page(kvm, &invalid_list);
spin_unlock(&kvm->mmu_lock);

return r;
 }

 It works fine since kvm_mmu_prepare_zap_page does not touch the hash 
 list.
 If we delete hlist in kvm_mmu_prepare_zap_page(), this kind of codes 
 should
 be changed to:

 restart:
for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
 sp->role.word);
r = 1;
if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
goto restart;
}
kvm_mmu_commit_zap_page(kvm, &invalid_list);

>>> Hmm, yes. So lets leave it as is and always commit invalid_list before
>>
>> So, you mean drop this patch and the patch of
>> KVM: MMU: collapse TLB flushes when zap all pages?
>>
> We still want to add kvm_reload_remote_mmus() to
> kvm_mmu_invalidate_zap_all_pages(). But yes, we disable a nice
> optimization here. So may be skipping obsolete pages while walking
> hashtable is better solution.

 I am willing to use this way instead, b

Re: [net-next RFC 6/8] macvtap: allow TUNSETIFF to create multiqueue device

2013-05-23 Thread Jason Wang

On 05/23/2013 07:45 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:31AM +0800, Jason Wang wrote:
>> Though the queue were in fact created by open(), we still need to add this 
>> check
>> to be compatible with tuntap which can let mgmt software use a single API to
>> manage queues. This patch only validates the device name and moves the 
>> TUNSETIFF
>> to a helper.
>>
>> Signed-off-by: Jason Wang 
>> ---
>>  drivers/net/macvtap.c |   50 
>> ++--
>>  1 files changed, 39 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 5fd341c..e3f9344 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -867,6 +867,7 @@ out:
>>  return ret;
>>  }
>>  
>> +
> Please don't add empty lines like this :).

Sure.
>
>>  static struct macvlan_dev *macvtap_get_vlan(struct macvtap_queue *q)
>>  {
>>  struct macvlan_dev *vlan;
>> @@ -884,6 +885,43 @@ static void macvtap_put_vlan(struct macvlan_dev *vlan)
>>  dev_put(vlan->dev);
>>  }
>>  
>> +static int macvtap_set_iff(struct file *file, struct ifreq __user *ifr_u)
>> +{
>> +struct macvtap_queue *q = file->private_data;
>> +struct net *net = current->nsproxy->net_ns;
>> +struct inode *inode = file_inode(file);
>> +struct net_device *dev, *dev2;
>> +struct ifreq ifr;
>> +
>> +if (copy_from_user(&ifr, ifr_u, sizeof(struct ifreq)))
>> +return -EFAULT;
>> +
>> +if (ifr.ifr_flags & IFF_MULTI_QUEUE) {
> So why do we validate the name?
> Pls add a comment.

Ok. The validation is to prevent the userspace create the queues on the
wrong device. Not sure whether or not this is needed for single queue case.
>
>> +dev = __dev_get_by_name(net, ifr.ifr_name);
>> +if (!dev)
>> +return -EINVAL;
>> +
>> +dev2 = dev_get_by_macvtap_minor(iminor(inode));
>> +if (!dev2)
>> +return -EINVAL;
>> +
>> +if (dev != dev2) {
>> +dev_put(dev2);
>> +return -EINVAL;
>> +}
>> +
>> +dev_put(dev2);
>> +}
>> +
>> +if ((ifr.ifr_flags & ~(IFF_VNET_HDR | IFF_MULTI_QUEUE)) !=
>> +(IFF_NO_PI | IFF_TAP))
>> +return -EINVAL;
>> +else
>> +q->flags = ifr.ifr_flags;
>> +
>> +return 0;
>> +}
>> +
>>  /*
>>   * provide compatibility with generic tun/tap interface
>>   */
>> @@ -902,17 +940,7 @@ static long macvtap_ioctl(struct file *file, unsigned 
>> int cmd,
>>  
>>  switch (cmd) {
>>  case TUNSETIFF:
>> -/* ignore the name, just look at flags */
>> -if (get_user(u, &ifr->ifr_flags))
>> -return -EFAULT;
>> -
>> -ret = 0;
>> -if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
>> -ret = -EINVAL;
>> -else
>> -q->flags = u;
>> -
>> -return ret;
>> +return macvtap_set_iff(file, ifr);
>>  
>>  case TUNGETIFF:
>>  vlan = macvtap_get_vlan(q);
>> -- 
>> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/6] dma: Add a jz4740 dmaengine driver

2013-05-23 Thread Vinod Koul

On Thu, May 23, 2013 at 10:36:24PM +0200, Lars-Peter Clausen wrote:
> This patch adds dmaengine support for the JZ4740 DMA controller. For now the
> driver will be a wrapper around the custom JZ4740 DMA API. Once all users of 
> the
> custom JZ4740 DMA API have been converted to the dmaengine API the custom API
> will be removed and direct hardware access will be added to the dmaengine
> driver.
> 
> Signed-off-by: Lars-Peter Clausen 
> ---

> +static enum jz4740_dma_width jz4740_dma_width(enum dma_slave_buswidth width)
> +{
> + switch (width) {
> + case DMA_SLAVE_BUSWIDTH_1_BYTE:
> + return JZ4740_DMA_WIDTH_8BIT;
> + case DMA_SLAVE_BUSWIDTH_2_BYTES:
> + return JZ4740_DMA_WIDTH_16BIT;
> + case DMA_SLAVE_BUSWIDTH_4_BYTES:
> + return JZ4740_DMA_WIDTH_32BIT;
> + default:
> + return JZ4740_DMA_WIDTH_32BIT;
> + }
Only diff between the values here and dmaengien values is JZ4740_DMA_WIDTH_32BIT
as 0. But the header tells me taht its default and SIZE one has values in that
pattern too. If that is the case you maybe able to get rid on conversion and use
dmaengine values directly.

> +}
> +
> +static enum jz4740_dma_transfer_size jz4740_dma_maxburst(u32 maxburst)
> +{
> + if (maxburst <= 1)
> + return JZ4740_DMA_TRANSFER_SIZE_1BYTE;
> + else if (maxburst <= 3)
> + return JZ4740_DMA_TRANSFER_SIZE_2BYTE;
> + else if (maxburst <= 15)
> + return JZ4740_DMA_TRANSFER_SIZE_4BYTE;
> + else if (maxburst <= 31)
> + return JZ4740_DMA_TRANSFER_SIZE_16BYTE;
> +
> + return JZ4740_DMA_TRANSFER_SIZE_32BYTE;
> +}
> +
> +static int jz4740_dma_slave_config(struct dma_chan *c,
> + const struct dma_slave_config *config)
> +{
> + struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
> + struct jz4740_dma_config jzcfg;
> +
> + switch (config->direction) {
> + case DMA_MEM_TO_DEV:
> + jzcfg.flags = JZ4740_DMA_SRC_AUTOINC;
> + jzcfg.transfer_size = jz4740_dma_maxburst(config->dst_maxburst);
> + break;
> + case DMA_DEV_TO_MEM:
> + jzcfg.flags = JZ4740_DMA_DST_AUTOINC;
> + jzcfg.transfer_size = jz4740_dma_maxburst(config->src_maxburst);
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> +
> + jzcfg.src_width = jz4740_dma_width(config->src_addr_width);
> + jzcfg.dst_width = jz4740_dma_width(config->dst_addr_width);
this should be direction based, typically DMA engines have only one width to be
programmed.
> + jzcfg.mode = JZ4740_DMA_MODE_SINGLE;
> + jzcfg.request_type = config->slave_id;
> +
> + chan->config = *config;
> +
> + jz4740_dma_configure(chan->jz_chan, &jzcfg);
> +
> + return 0;
You are NOT use src_addr/dstn_addr? How else are you passing the periphral
address?

> +}
> +
> +static int jz4740_dma_terminate_all(struct dma_chan *c)
> +{
> + struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
> + unsigned long flags;
> + LIST_HEAD(head);
> +
> + spin_lock_irqsave(&chan->vchan.lock, flags);
> + jz4740_dma_disable(chan->jz_chan);
> + chan->desc = NULL;
> + vchan_get_all_descriptors(&chan->vchan, &head);
> + spin_unlock_irqrestore(&chan->vchan.lock, flags);
> +
> + vchan_dma_desc_free_list(&chan->vchan, &head);
> +
> + return 0;
> +}
> +
> +static int jz4740_dma_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
> + unsigned long arg)
> +{
> + struct dma_slave_config *config = (struct dma_slave_config *)arg;
> +
> + switch (cmd) {
> + case DMA_SLAVE_CONFIG:
> + return jz4740_dma_slave_config(chan, config);
> + case DMA_TERMINATE_ALL:
> + return jz4740_dma_terminate_all(chan);
> + default:
> + return -EINVAL;
ENXIO/ENOSYS perhaps?
> + }
> +}
> +

> +static int jz4740_dma_alloc_chan_resources(struct dma_chan *c)
> +{
> + struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
> +
> + chan->jz_chan = jz4740_dma_request(chan, NULL);
> + if (!chan->jz_chan)
> + return -EBUSY;
> +
> + jz4740_dma_set_complete_cb(chan->jz_chan, jz4740_dma_complete_cb);
> +
> + return 0;
Zero is not expected value, you need to return the descriptors allocated
sucessfully.
> +}
> +
> +static void jz4740_dma_free_chan_resources(struct dma_chan *c)
> +{
> + struct jz4740_dmaengine_chan *chan = to_jz4740_dma_chan(c);
> +
> + vchan_free_chan_resources(&chan->vchan);
> + jz4740_dma_free(chan->jz_chan);
> + chan->jz_chan = NULL;
> +}
> +
> +static void jz4740_dma_desc_free(struct virt_dma_desc *vdesc)
> +{
> + kfree(container_of(vdesc, struct jz4740_dma_desc, vdesc));
> +}
> +
> +static int jz4740_dma_probe(struct platform_device *pdev)
> +{
> + struct jz4740_dmaengine_chan *chan;
> + struct jz4740_dma_dev *dmadev;
> + struct dma_device *dd;
> + unsigned int i;
> + int ret;
> +
> + dmadev = d

RE: [PATCH v1 2/9] usb: musb: nop: remove unused nop_xceiv_(un)register APIs from glue

2013-05-23 Thread B, Ravi


Subject: Re: [PATCH v1 2/9] usb: musb: nop: remove unused 
nop_xceiv_(un)register APIs from glue

Hello.

On 05/23/2013 09:07 PM, B, Ravi wrote:

>>> removed unused nop xceiv (un_)register API's from all musb platform 
>>> drivers
>> Since when are they unused?
> Please refer to commit id 662dca54 : usb: otg: support for multiple 
> transceivers by a single controller.
> Usb_get_phy() is used to get the of phy used by controller, phy bindings are 
> done through DT.

>Why are you sure that all these platforms support DT (in all 
> configurations)?
> It seems to me that you're simply breaking the patched glue layers with this 
> patch.
> I'll let Felipe decide the fate of this patch though...

You are correct, the bindings of phy and controller need not to done through DT 
alone, there is a saperate API
Phy API's available for such bindings done in respective board platform files. 

>
>>> Signed-off-by: Ravi Babu 
> --
> Ravi B

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH v1 8/9] usb: phy: dts: Adding usbphy DT bindings for am33xx

2013-05-23 Thread B, Ravi

Subject: Re: [PATCH v1 8/9] usb: phy: dts: Adding usbphy DT bindings for am33xx

Hello.

On 05/23/2013 09:13 PM, B, Ravi wrote:

>
>>> +   phy1: usbphy-gs70@44e10620 {
>>> +   compatible = "ti,dsps-usbphy";
>>> +   reg = <0x44e10620 0x8
>>> +  0x44e10648 0x4>;
>>> +   reg-names = "phy_ctrl","phy_wkup";
>>> +   id = <0>;
>>> +   };
>>> +
>>> +   phy2: usbphy-gs70@44e10628 {
>>> +   compatible = "ti,dsps-usbphy";
>>> +   reg = <0x44e10628 0x8
>>> +  0x44e10648 0x4>;
>> The second register conflicts with phy1.
> The two instances of phy uses common phy wakeup register.

>> That's why there is a resource conflict. Have you actually tried to 
>> instantiate the devices out of such tree?
>>This register should be declared somewhere above the PHYs I think...

I did not face any problem with this, I have tested both instances of phy used 
by dual instance controller, worked fine. 
What do you suggest, in case of common register which both phy have to use this 
for wakeup functionality.  
The DT should support this.  What do you suggest in such case?

--
Ravi B

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [net-next RFC 5/8] macvtap: eliminate linear search

2013-05-23 Thread Jason Wang

On 05/23/2013 07:41 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:30AM +0800, Jason Wang wrote:
>> Linear search were used in both get_slot() and macvtap_get_queue(), this is
>> because:
>>
>> - macvtap didn't reshuffle the array of taps when create or destroy a queue, 
>> so
>>   when adding a new queue, macvtap must do linear search to find a location 
>> for
>>   the new queue. This will also complicate the TUNSETQUEUE implementation for
>>   multiqueue API.
>> - the queue itself didn't track the queue index, so the we must do a linear
>>   search in the array to find the location of a existed queue.
>>
>> The solution is straightforward: reshuffle the array and introduce a 
>> queue_index
>> to macvtap_queue.
>>
>> Signed-off-by: Jason Wang 
>> ---
>>  drivers/net/macvtap.c |   63 
>> +++-
>>  1 files changed, 20 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index a36e49e..5fd341c 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -44,6 +44,7 @@ struct macvtap_queue {
>>  struct macvlan_dev __rcu *vlan;
>>  struct file *file;
>>  unsigned int flags;
>> +u16 queue_index;
>>  };
>>  
>>  static struct proto macvtap_proto = {
>> @@ -84,30 +85,10 @@ static const struct proto_ops macvtap_socket_ops;
>>   */
>>  static DEFINE_SPINLOCK(macvtap_lock);
>>  
>> -/*
>> - * get_slot: return a [unused/occupied] slot in vlan->taps[]:
>> - *  - if 'q' is NULL, return the first empty slot;
>> - *  - otherwise, return the slot this pointer occupies.
>> - */
>> -static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
>> -{
>> -int i;
>> -
>> -for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
>> -if (rcu_dereference_protected(vlan->taps[i],
>> -  lockdep_is_held(&macvtap_lock)) 
>> == q)
>> -return i;
>> -}
>> -
>> -/* Should never happen */
>> -BUG_ON(1);
>> -}
>> -
>>  static int macvtap_set_queue(struct net_device *dev, struct file *file,
>>  struct macvtap_queue *q)
>>  {
>>  struct macvlan_dev *vlan = netdev_priv(dev);
>> -int index;
>>  int err = -EBUSY;
>>  
>>  spin_lock(&macvtap_lock);
>> @@ -115,12 +96,12 @@ static int macvtap_set_queue(struct net_device *dev, 
>> struct file *file,
>>  goto out;
>>  
>>  err = 0;
>> -index = get_slot(vlan, NULL);
>>  rcu_assign_pointer(q->vlan, vlan);
>> -rcu_assign_pointer(vlan->taps[index], q);
>> +rcu_assign_pointer(vlan->taps[vlan->numvtaps], q);
>>  sock_hold(&q->sk);
>>  
>>  q->file = file;
>> +q->queue_index = vlan->numvtaps;
>>  file->private_data = q;
>>  
>>  vlan->numvtaps++;
>> @@ -140,16 +121,23 @@ out:
>>   */
>>  static void macvtap_put_queue(struct macvtap_queue *q)
>>  {
>> +struct macvtap_queue *nq;
>>  struct macvlan_dev *vlan;
>>  
>>  spin_lock(&macvtap_lock);
>>  vlan = rcu_dereference_protected(q->vlan,
>>   lockdep_is_held(&macvtap_lock));
>>  if (vlan) {
>> -int index = get_slot(vlan, q);
>> +int index = q->queue_index;
>> +BUG_ON(index >= vlan->numvtaps);
>>  
>> -RCU_INIT_POINTER(vlan->taps[index], NULL);
>> +rcu_assign_pointer(vlan->taps[index],
>> +   vlan->taps[vlan->numvtaps - 1]);
>>  RCU_INIT_POINTER(q->vlan, NULL);
>> +nq = rcu_dereference_protected(vlan->taps[index],
>> +   lockdep_is_held(&macvtap_lock));
>> +nq->queue_index = index;
>> +
>>  sock_put(&q->sk);
> Hmm this looks like a bug BTW: sock_put should be after
> no one uses this tap, so must be after synchronize_rcu.
> No?

Two refcnts were held, one is the socket itself (and we can safely put
it here), another is held by device when doing macvtap_set_queue(), it
was released by another sock_put() after synchronize_rcu().
>
>>  --vlan->numvtaps;
>>  }
> So:
>   nq = rcu_dereference_protected(vlan->taps[vlan->numvtaps - 1],
>  lockdep_is_held(&macvtap_lock));
>   rcu_assign_pointer(vlan->taps[index], nq);
>
> same thing?

Same and cleaner, thanks.
>
>> @@ -182,8 +170,7 @@ static struct macvtap_queue *macvtap_get_queue(struct 
>> net_device *dev,
>>  rxq = skb_get_rxhash(skb);
>>  if (rxq) {
>>  tap = rcu_dereference(vlan->taps[rxq % numvtaps]);
>> -if (tap)
>> -goto out;
>> +goto out;
>>  }
>>  
>>  if (likely(skb_rx_queue_recorded(skb))) {
>> @@ -193,17 +180,10 @@ static struct macvtap_queue *macvtap_get_queue(struct 
>> net_device *dev,
>>  rxq -= numvtaps;
>>  
>>  tap = rcu_dereference(vlan->taps[rxq]);
>> -if (tap

Re: [PATCH] cpufreq: fix governor start/stop race condition

2013-05-23 Thread Viresh Kumar

On 23 May 2013 08:14, Xiaoguang Chen  wrote:
> Do you mean my patch will cause deadlock? I once tried to add another lock
> to protect the GOV_STOP/START sequence instead of using the rwsem in this
> patch.
> But I saw deadlock indeed.
> In cpufreq_add_policy_cpu, the lock has to be added before the rwsem since
> GOV_STOP is
> before lock_policy_rwsem_write, but in cpufreq_update_policy, it will first
> get the rwsem, and then
> call __cpufreq_set_policy which will contain GOV_STOP again, if we add the
> new lock before this GOV_STOP,
> then we may get deadlock in below sequence:
> 1) hotplug in one cpu by calling cpufreq_add_policy_cpu in which new lock is
> locked first then rwsem is locked.
> 2) governor change in cpufreq_update_policy in which rwsem is locked first
> then new lock is locked.
> this is a deadlock issue if above two steps interleaves

Check this patch.

https://patchwork-mail.kernel.org/patch/2575231/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Documentation/kdump: Remove TODO in this document

2013-05-23 Thread Zhang Yanfei

Add cc'ing triv...@kernel.org

于 2013年05月24日 12:54, Rob Landley 写道:
> On 05/23/2013 03:30:15 AM, Zhang Yanfei wrote:
>> We have already had the relocatable kernel, so just remove
>> the TODO in the kdump document.
>>
>> Signed-off-by: Zhang Yanfei 
> 
> Acked-by: Rob Landley 
> 
> Please forward to triv...@kernel.org
> 
> Rob

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [net-next RFC 4/8] macvlan: reduce the max number of taps to 8

2013-05-23 Thread Jason Wang

On 05/23/2013 02:37 PM, Michael S. Tsirkin wrote:
> On Thu, May 23, 2013 at 11:12:29AM +0800, Jason Wang wrote:
>> To be same with tap.
>>
>> Signed-off-by: Jason Wang 
> Well for tap the very specific reason was that
> there's an array of big queue structures,
> so we need to limit it to make it fit in a page.
> No such reason here right?

Right, one of the reason is that the flow caches itself occupies too
much space, I'd move it out of tun_struct.
>
> We need at least as much as tap to be compatible, so
> let's just make it 16 unconditionally?

Ok, in fact we can make it even more.
>
>> ---
>>  include/linux/if_macvlan.h |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
>> index e47ad46..32e943a 100644
>> --- a/include/linux/if_macvlan.h
>> +++ b/include/linux/if_macvlan.h
>> @@ -50,7 +50,7 @@ struct macvlan_pcpu_stats {
>>   * Maximum times a macvtap device can be opened. This can be used to
>>   * configure the number of receive queue, e.g. for multiqueue virtio.
>>   */
>> -#define MAX_MACVTAP_QUEUES  (NR_CPUS < 16 ? NR_CPUS : 16)
>> +#define MAX_MACVTAP_QUEUES  8
>>  
>>  #define MACVLAN_MC_FILTER_BITS  8
>>  #define MACVLAN_MC_FILTER_SZ(1 << MACVLAN_MC_FILTER_BITS)
>> -- 
>> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Documentation/kdump: Remove TODO in this document

2013-05-23 Thread Rob Landley


On 05/23/2013 03:30:15 AM, Zhang Yanfei wrote:

We have already had the relocatable kernel, so just remove
the TODO in the kdump document.

Signed-off-by: Zhang Yanfei 


Acked-by: Rob Landley 

Please forward to triv...@kernel.org

Rob--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] build some drivers only when compile-testing

2013-05-23 Thread Rob Landley


On 05/23/2013 09:01:40 AM, Ben Hutchings wrote:

On Wed, 2013-05-22 at 19:23 -0700, Greg Kroah-Hartman wrote:
> On Wed, May 22, 2013 at 11:18:46AM +0200, Jiri Slaby wrote:
> > Some drivers can be built on more platforms than they run on. This
> > causes users and distributors packaging burden when they have to
> > manually deselect some drivers from their allmodconfigs. Or  
sometimes

> > it is even impossible to disable the drivers without patching the
> > kernel.
> >
> > Introduce a new config option COMPILE_TEST and make all those  
drivers
> > to depend on the platform they run on, or on the COMPILE_TEST  
option.
> > Now, when users/distributors choose COMPILE_TEST=n they will not  
have

> > the drivers in their allmodconfig setups, but developers still can
> > compile-test them with COMPILE_TEST=y.
>
> I understand the urge, and it's getting hard for distros to handle  
these

> drivers that just don't work on other architectures, but it's really
> valuable to ensure that they build properly, for those of us that  
don't

> have many/any cross compilers set up.


In http://landley.net/aboriginal/bin grab the cross-compiler-*.tar.bz2  
tarballs, extract them, add the "bin" subdirectory of each to the  
$PATH. Congratulations, you have cross compilers set up. (They're  
statically linked and relocatable, so should run just about anywhere.  
If they don't, let me know and I'll fix it.)


Example build:

  make ARCH=sparc sparc32_defconfig
  PATH=/home/landley/simple-cross-compiler-sparc/bin:$PATH \
make ARCH=sparc CROSS_COMPILE=sparc-

Rob--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH RFC v3 1/3] i2c: i2c-bfin-twi: convert to devm_* API

2013-05-23 Thread Zhang, Sonic

Acked-by: Sonic Zhang 


>-Original Message-
>From: Libo Chen [mailto:libo.c...@huawei.com]
>Sent: Thursday, May 23, 2013 8:00 PM
>To: w...@the-dreams.de
>Cc: guz.f...@cn.fujitsu.com; Zhang, Sonic; uclinux-dist-
>de...@blackfin.uclinux.org; linux-...@vger.kernel.org; linux-
>ker...@vger.kernel.org; lize...@huawei.com; libo.c...@huawei.com
>Subject: [PATCH RFC v3 1/3] i2c: i2c-bfin-twi: convert to devm_* API
>
>peripheral_request_list has got free if any one faild, so no need to free 
>again in err
>case.
>aovid this, convert them to devm_* API
>
>Signed-off-by: Libo Chen 
>---
> drivers/i2c/busses/i2c-bfin-twi.c |   38 +---
> 1 files changed, 10 insertions(+), 28 deletions(-)
>
>diff --git a/drivers/i2c/busses/i2c-bfin-twi.c 
>b/drivers/i2c/busses/i2c-bfin-twi.c
>index 05080c4..2b99c48 100644
>--- a/drivers/i2c/busses/i2c-bfin-twi.c
>+++ b/drivers/i2c/busses/i2c-bfin-twi.c
>@@ -621,35 +621,27 @@ static int i2c_bfin_twi_probe(struct platform_device
>*pdev)
>   int rc;
>   unsigned int clkhilow;
>
>-  iface = kzalloc(sizeof(struct bfin_twi_iface), GFP_KERNEL);
>+  iface = devm_kzalloc(&pdev->dev, sizeof(struct bfin_twi_iface),
>+  GFP_KERNEL);
>   if (!iface) {
>   dev_err(&pdev->dev, "Cannot allocate memory\n");
>-  rc = -ENOMEM;
>-  goto out_error_nomem;
>+  return -ENOMEM;
>   }
>
>   spin_lock_init(&(iface->lock));
>
>   /* Find and map our resources */
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>-  if (res == NULL) {
>-  dev_err(&pdev->dev, "Cannot get IORESOURCE_MEM\n");
>-  rc = -ENOENT;
>-  goto out_error_get_res;
>-  }
>-
>-  iface->regs_base = ioremap(res->start, resource_size(res));
>-  if (iface->regs_base == NULL) {
>+  iface->regs_base = devm_ioremap_resource(&pdev->dev, res);
>+  if (IS_ERR(iface->regs_base)) {
>   dev_err(&pdev->dev, "Cannot map IO\n");
>-  rc = -ENXIO;
>-  goto out_error_ioremap;
>+  return -ENXIO;
>   }
>
>   iface->irq = platform_get_irq(pdev, 0);
>   if (iface->irq < 0) {
>   dev_err(&pdev->dev, "No IRQ specified\n");
>-  rc = -ENOENT;
>-  goto out_error_no_irq;
>+  return -ENOENT;
>   }
>
>   p_adap = &iface->adap;
>@@ -666,10 +658,10 @@ static int i2c_bfin_twi_probe(struct platform_device
>*pdev)
>   "i2c-bfin-twi");
>   if (rc) {
>   dev_err(&pdev->dev, "Can't setup pin mux!\n");
>-  goto out_error_pin_mux;
>+  return rc;
>   }
>
>-  rc = request_irq(iface->irq, bfin_twi_interrupt_entry,
>+  rc = devm_request_irq(&pdev->dev, iface->irq,
>+bfin_twi_interrupt_entry,
>   0, pdev->name, iface);
>   if (rc) {
>   dev_err(&pdev->dev, "Can't get IRQ %d !\n", iface->irq); @@ -
>707,16 +699,9 @@ static int i2c_bfin_twi_probe(struct platform_device *pdev)
>   return 0;
>
> out_error_add_adapter:
>-  free_irq(iface->irq, iface);
> out_error_req_irq:
>-out_error_no_irq:
>   peripheral_free_list((unsigned short *)pdev->dev.platform_data);
>-out_error_pin_mux:
>-  iounmap(iface->regs_base);
>-out_error_ioremap:
>-out_error_get_res:
>-  kfree(iface);
>-out_error_nomem:
>+
>   return rc;
> }
>
>@@ -725,10 +710,7 @@ static int i2c_bfin_twi_remove(struct platform_device
>*pdev)
>   struct bfin_twi_iface *iface = platform_get_drvdata(pdev);
>
>   i2c_del_adapter(&(iface->adap));
>-  free_irq(iface->irq, iface);
>   peripheral_free_list((unsigned short *)pdev->dev.platform_data);
>-  iounmap(iface->regs_base);
>-  kfree(iface);
>
>   return 0;
> }
>--
>1.7.1
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] proc: move proc mount options out of pid_namespace

2013-05-23 Thread Stephen Mell

Gu,

On Friday, May 24, 2013 11:03:31 Gu Zheng wrote:
> Hi Stephen,
> 
> On 05/24/2013 07:32 AM, Stephen Mell wrote:
> 
> > On Thursday, May 23, 2013 18:20:57 Gu Zheng wrote:
> > 
> >> Here it'll create a new proc sb instance which holds the same context as 
> >> the old ones
> >> each time we mount proc though in the same PID namespace, won't it?
> > I believe so. But this is the point, right? 

> Yes, but I think it's also the problem.
> 
> >They won't be identical if different mount options are used, I don't think.
> 
> If different mount options are used, we'll create different super block 
> instance, and they have
> the same context, only the difference is each one holds different 
> proc_sb_info.
> But I think what we really want is only one proc sb instance and create 
> different proc_sb_info
> if different mount options are used.

Will having several different superblocks cause problems, or is it merely 
inefficient? I freely admit to not really knowing what I'm doing, and I thank 
you for your assistance.
How is this situation distinct from that of ramfs? It appears to have a 
superblock for each mount.
It would seem to me as though one cannot have different sb_infos with the same 
superblock, making storing the mount options in sb_info effectively the same as 
storing them in the superblock itself, for the purposes of this discussion. 
Where would the mount options be stored, if not in the superblock?

> > 
> >> Here the pre-check seems needless.
> > Is that new with my patch, or has it always been needless?
> 
> Yeah, it's always needless.
> 
> Thanks,
> Gu
> 
> > 
> > Thanks,
> > Stephen

Thanks again,
Stephen
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 6/8] Documentation: Updating a broken link in "the perfect patch"

2013-05-23 Thread Rob Landley


On 05/23/2013 07:49:58 AM, Ben Minerds wrote:
 The link to Andrews patches were out of date. Replacing with new  
links. Also

 slightly reformatted to allow easier selection of links.

 Signed-off-by: Ben Minerds 
---
 Documentation/development-process/patches/The-Perfect-Patch.txt | 8  


 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git  
a/Documentation/development-process/patches/The-Perfect-Patch.txt  
b/Documentation/development-process/patches/The-Perfect-Patch.txt

index 217a303..63eccbb 100644
--- a/Documentation/development-process/patches/The-Perfect-Patch.txt
+++ b/Documentation/development-process/patches/The-Perfect-Patch.txt
@@ -160,8 +160,8 @@ Overall



-The patch management scripts at  
http://www.zip.com.au/~akpm/linux/patches/

-implement all of the above.
+The patch management scripts here implement all of the above:
+  https://www.kernel.org/pub/linux/kernel/people/akpm/patches/


I looked in that directory, and I'm not seeing quilt.

-The patch management tools at  
https://savannah.nongnu.org/projects/quilt/ also

-implement all of the above.
\ No newline at end of file
+The patch management tools here implement all of the above:
+  https://savannah.nongnu.org/projects/quilt/


This appears to be the currently location. As mentioned on line 76 of  
the current Documentation/SubmittingPatches.


I think I'd like to NAK this whole series until you read what's  
currently in the tree, please?


Rob--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/8] Documentation: Adding "The Perfect Patch" by Andrew Morton

2013-05-23 Thread Joe Perches

On Thu, 2013-05-23 at 23:10 -0500, Rob Landley wrote:
> On 05/23/2013 07:49:53 AM, Ben Minerds wrote:
> >  Adding Andrews advice on patch submission and subdirectory for  
> > further patch
> >  documentstion.
> 
> You've seen Documentation/SubmittingPatches right?

Maybe not.

> > +- Ensure that your patch does not add new trailing whitespace.  The  
> > below
> > +  script will fix up your patch by stripping off such whitespace.
[]
> Doesn't scripts/checkpatch.pl check for this?

Yes it does.
scripts/cleanpatch exists too.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/8] Documentation: Replacing reference to broken submission format URL

2013-05-23 Thread Rob Landley


On 05/23/2013 07:49:57 AM, Ben Minerds wrote:
 Replacing refs to broken URL with internal documentation reference,  
and a

 little whitespace shuffle to keep it under 80 chars wide.

 Signed-off-by: Ben Minerds 
---
 Documentation/HOWTO   | 10 +-
 Documentation/SubmittingPatches   |  2 +-
 Documentation/ja_JP/HOWTO |  8 
 Documentation/ja_JP/SubmittingPatches |  2 +-
 Documentation/ko_KR/HOWTO |  2 +-
 Documentation/zh_CN/HOWTO |  2 +-
 Documentation/zh_CN/SubmittingPatches |  2 +-
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 11e597e..fba34c0 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -110,11 +110,11 @@ required reading:
 subject to scrutiny for content and style), but not following  
them

 will almost always prevent it.

-Other excellent descriptions of how to create patches properly  
are:

-   "The Perfect Patch"
-		 
Documentation/development-process/patches/The-Perfect-Patch.txt

-   "Linux kernel patch submission format"
-   http://linux.yyz.us/patch-format.html
+  Other excellent descriptions of how to create patches properly are:
+"The Perfect Patch"
+  Documentation/development-process/patches/The-Perfect-Patch.txt
+"Linux kernel patch submission format"
+   
Documentation/development-process/patches/Patch-Submission-Format.txt


Ok, this is the third consecutive patch to do approximately the same  
thing, and now you're patching lines you added in a previous patch in  
the same series.


Breaking files up into stages helps with bisectability. Are we really  
going to "git bisect" documentation?


On the larger question of "is this a good idea", I'm leaning towards  
"no" and would like you to explain why an 8 year old description  
duplicating portions of SubmittingPatches needs to be in-tree.


Rob--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/8] Documentation: Adding "Linux Kernel Patch Submission Format"

2013-05-23 Thread Rob Landley


On 05/23/2013 07:49:54 AM, Ben Minerds wrote:
 Adding "Linux Kernel Patch Submission Format" reference to remove  
external

 dependancy.

 Signed-off-by: Ben Minerds 
---
 .../patches/Patch-Submission-Format.txt| 89  
++

...
+For more details, read Documentation/SubmittingPatches and  
Documentation/SubmittingDrivers in the kernel source tree.


What's in this that _isn't_ in SubmittingPatches? What does having this  
second file add?


Rob--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/8] Documentation: Adding "The Perfect Patch" by Andrew Morton

2013-05-23 Thread Rob Landley


On 05/23/2013 07:49:53 AM, Ben Minerds wrote:
 Adding Andrews advice on patch submission and subdirectory for  
further patch

 documentstion.


You've seen Documentation/SubmittingPatches right?


 Signed-off-by: Ben Minerds 
---
 .../patches/The-Perfect-Patch.txt  | 167  
+

 1 file changed, 167 insertions(+)
 create mode 100644  
Documentation/development-process/patches/The-Perfect-Patch.txt


diff --git  
a/Documentation/development-process/patches/The-Perfect-Patch.txt  
b/Documentation/development-process/patches/The-Perfect-Patch.txt

new file mode 100644
index 000..b07074e
--- /dev/null
+++ b/Documentation/development-process/patches/The-Perfect-Patch.txt
@@ -0,0 +1,167 @@
+From: Andrew Morton [email blocked]
+To: Mukker, Atul [email blocked]
+Subject: Re: [patch] 2.6.9-rc1-mm1: megaraid_mbox.c compile error  
with gcc 3.4

+Date:  Sat, 28 Aug 2004 13:04:19 -0700
+


This email is eight years old.


+"Mukker, Atul" [email blocked] wrote:
+>
+> The driver and the patches with the re-ordered functions is  
available at

+>  ftp://ftp.lsil.com/pub/linux-megaraid/drivers/version-2.20.3.1/
+
+I dunno about James, but I *really* dislike receiving patches by  
going and
+getting them from internet servers.  It breaks our commonly-used  
tools.  It

+loses authorship info.  It loses Signed-off-by: info.  There is no
+changelog.  All this means that your patch is more likely to be  
ignored by

+busy people.  Please, just email the diffs.
+
+I wrote a little guide this week:
+
+
+
+The perfect patch. [email blocked]
+
+The latest version of this document may be found at
+http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt


Could you just grab the actual version from his repo instead of one  
that says "email blocked" and commemorates the URL of a rejected  
submission in a chatty email header?



+Delivery
+
+
+- Patches are delivered via email only.  Downloading them from  
internet

+  servers is a pain.
+
+- One patch per email, with the changelog in the body of the email.
+
+Subject:
+
+
+- The email's Subject: should consisely describe the patch which  
that email
+  contains.  The Subject: should not be a filename.  Do not use the  
same

+  Subject: for every patch in a whole patch series.
+
+  Bear in mind that the Subject: of your email becomes a  
globally-unique
+  identifier for that patch.  It propagates all the way into  
BitKeeper.  The
+  Subject: may later be used in developer discussions which refer to  
the

+  patch.  People will want to google for the patch's Subject: to read
+  discussion regarding that patch.
+
+- When sending a series of patches, the patches should be  
sequence-numbered

+  in the Subject:
+
+- It is nice if the Subject: includes mention of the subsystem which  
it

+  affects.  See example below.
+
+- Example Subject:
+
+   [patch 2/5] ext2: improve scalability of bitmap searching
+
+- Note that various people's patch receiving scripts will strip away
+  Subject: text which is inside brackets [].  So you should place  
information
+  which has no long-term relevance to the patch inside brackets.   
This
+  includes the word "patch" and any sequence numbering.  The  
subsystem

+  identifier ("ext2:") should hence be outside brackets.
+
+
+Changelog
+=
+
+- Bear in mind that the Subject: and changelog which you provide will
+  propagate all the way into the permanent kernel record.  Other  
developers
+  will want to read and understand your patch and changelog years in  
the

+  future.
+
+  So the changelog should describe the patch fully:
+
+  - why the kernel needed patching
+
+  - the overall design approach in the patch
+
+  - implementation details
+
+  - testing results
+
+- Don't bother mentioning what version of the kernel the patch  
applies to
+  ("applies to 2.6.8-rc1").  This is not interesting information -  
once the
+  patch is in bitkeeper, of _course_ it applied, and it'll probably  
be merged

+  into a later kernel than the one which you wrote it for.
+


Bitkeeper?

Is this really current advice? What part of this has _not_ been put in  
SubmittingPatches yet?


(I'm all for moving SubmittingPatches and friends under  
development-process/ and in fact vaguely plan a general Documentation/  
tree cleanup next time I have some downtime between contracts. But  
adding eight year old duplication: not so much.)


+- Do not refer to earlier patches when changelogging a new version  
of a
+  patch.  It's not very useful to have a bitkeeper changelog which  
says "OK,
+  this fixes the things you mentioned yesterday".  Each iteration of  
the patch
+  should contain a standalone changelog.  This implies that you need  
a patch

+  management system which maintains changelogs.  See below.
+
+- Add a Signed-off-by: line.
+


There's a whole edifice of signed-off-by line advice elsewhere in  
Documentation. (The pointy-haired types descended on this and attempted  
to make a bureau

Re: [PATCH 1/2] mmc: sdhci: Added set_power sdhci_ops handler.

2013-05-23 Thread Jaehoon Chung

On 05/23/2013 04:25 PM, Guennadi Liakhovetski wrote:
> On Wed, 22 May 2013, Felipe Ferreri Tonello wrote:
> 
>> Hi Guennadi,
>>
>> On Wednesday, May 22, 2013 10:30:40 PM Guennadi Liakhovetski wrote:
>>> On Wed, 22 May 2013, Felipe F. Tonello wrote:
 From: "Felipe F. Tonello" 

 This is useful for power managment purposes if a sdhci child host wants to
 turn off some other peripheral also.
>>>
>>> Sorry, could you elaborate a bit? In what situations is it exactly useful?
>>> And why cannot the regulator API be used there?
>>
>> Sorry about that.
>>
>> One example that I can think of is when you have a wifi module connected as 
>> a 
>> mmc card via sdio. So you can register a callback function in your machine 
>> source code to turn on/off the wifi module based on the mmc host power.
> 
> Ok, understand. Your second patch in this series adds such a callback in 
> your SDHCI host driver and there it just calls a platform callback. I 
> don't think this is a good idea. First, we want to go away from platform 
> callbacks, because they are incompatible with DT. Second, because the 
> proper solution IMHO would be for your platform to export a regulator, and 
> the SDHCI core driver already includes regulator support.
We can use the regulator framework.
i think this callback function didn't need.

Best Regards,
Jaehoon Chung
> 
>> I've seen this implementation in others mmc hosts, such as omap.
> 
> Which, however, doesn't yet mean, it's a good idea :)
> 
> Thanks
> Guennadi
> ---
> Guennadi Liakhovetski, Ph.D.
> Freelance Open-Source Software Developer
> http://www.open-technology.de/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC v3 2/3] i2c: pxa: convert to devm_* API

2013-05-23 Thread Libo Chen

On 2013/5/24 11:35, Gu Zheng wrote:
> Hi Libo,
>I think you can merge patch 1/3 and 2/3, they e thdo the saming that using 
>  devm_* API to simplify
> and make the code clean, and the additional goal is it also can fix a bug. 
> Besides, maybe you need to
> change the title and make it clear and self-described.
> 
> Thanks,
> Gu

thanks for your suggestion. But, I had referenced other guy`s patch title, they 
usually do like this, so I think it is enough.

Regards,

Libo


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] mmc: sdhci-s3c: Added set_power handler to platdata

2013-05-23 Thread Jaehoon Chung

Hi Felipe,

I didn't understand this patch, why need to add set_power?
We can use to control the power with the fixed regulator.
Then we can also use the regulator framework.
And i know also control the module like wifi with rfkill.
In set_power, what is it controlled?

Best Regards,
Jaehoon Chung
On 05/23/2013 02:47 AM, Felipe F. Tonello wrote:
> From: "Felipe F. Tonello" 
> 
> This is useful to turn off peripherals that are related to the mmc host. One
> common case is when the wifi module is connected as an mmc card to the host.
> 
> Signed-off-by: Felipe F. Tonello 
> ---
>  drivers/mmc/host/sdhci-s3c.c| 8 
>  include/linux/platform_data/mmc-sdhci-s3c.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-s3c.c b/drivers/mmc/host/sdhci-s3c.c
> index c6f6246..f7e740c 100644
> --- a/drivers/mmc/host/sdhci-s3c.c
> +++ b/drivers/mmc/host/sdhci-s3c.c
> @@ -360,11 +360,19 @@ static int sdhci_s3c_platform_bus_width(struct 
> sdhci_host *host, int width)
>   return 0;
>  }
>  
> +static void sdhci_s3c_set_power(struct sdhci_host *host, bool power)
> +{
> + struct sdhci_s3c *ourhost = to_s3c(host);
> + if (ourhost->pdata->set_power)
> + ourhost->pdata->set_power(power);
> +}
> +
>  static struct sdhci_ops sdhci_s3c_ops = {
>   .get_max_clock  = sdhci_s3c_get_max_clk,
>   .set_clock  = sdhci_s3c_set_clock,
>   .get_min_clock  = sdhci_s3c_get_min_clock,
>   .platform_bus_width = sdhci_s3c_platform_bus_width,
> + .set_power  = sdhci_s3c_set_power,
>  };
>  
>  static void sdhci_s3c_notify_change(struct platform_device *dev, int state)
> diff --git a/include/linux/platform_data/mmc-sdhci-s3c.h 
> b/include/linux/platform_data/mmc-sdhci-s3c.h
> index 249f023..55be925 100644
> --- a/include/linux/platform_data/mmc-sdhci-s3c.h
> +++ b/include/linux/platform_data/mmc-sdhci-s3c.h
> @@ -50,6 +50,7 @@ struct s3c_sdhci_platdata {
> int state));
>  
>   void(*cfg_gpio)(struct platform_device *dev, int width);
> + void(*set_power)(bool power);
>  };
>  
>  
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 03/07] Documentation patch of ams AS3722 PMIC against linux_3.8.8

2013-05-23 Thread Rob Landley


On 05/23/2013 07:07:37 AM, Florian Lobmaier wrote:

From: Florian Lobmaier 

Added multi-function device driver support for ams AS3722 PMIC
Includes modules gpio, regulator, rtc, and watchdog

Signed-off-by: Florian Lobmaier 


This seems like company-internal documentation that's got large  
irrelevant sections once merged into Linus's tree.



---
diff -uprN -X Documentation/dontdiff  
../kernel_3.8.8/linux-kernel/Documentation/mfd-as3722.txt  
./Documentation/mfd-as3722.txt
--- ../kernel_3.8.8/linux-kernel/Documentation/mfd-as3722.txt	 
1970-01-01 01:00:00.0 +0100
+++ ./Documentation/mfd-as3722.txt	2013-05-23 13:12:35.0  
+0200


Is there a better place for this than the top level Documentation  
directory?



@@ -0,0 +1,133 @@
+AS3722 PMIC Driver documentation
+
+Author: Florian Lobmaier 
+Last update: 2013-04-12


This is information you use git to find.

+This document describes how to use the multi-function device driver  
for

+AS3722 PMICs. The AS3722 PMICs are accessed via i2c.
+
+Version info:
+=
+Linux kernel: 3.4.38
+Patch created with: $ git diff  
f3b5af9a6e2a873110bb8546b42ae7c51f2213b3 > as3722_driver_v0.0.1.patch

+Apply patch with:  $ patch -p0 < as3722_driver_v0.0.1.patch


Again: if this goes into the tree with git, you're copying the git  
history into Documentation.


And "Apply patch with" means what after merging...? (You are submitting  
this for inclusion, right?)


$ git show f3b5af9a6e2a
fatal: ambiguous argument 'f3b5af9a6e2a': unknown revision or path not  
in the working tree.



+Description of files:
+=
+
+as3722_platform-data.c:
+Platform data and as3722 regulator initialisation support.
+
+drivers/mfd/as3722-core.c:
+AS3722 communication and device management
+
+drivers/regulator/as3722-regulator.c:
+Adds the LDO's DCDC's and current regulators to /sys/class/regulator.
+
+drivers/gpio/gpio-as3722.c:
+Adds the GPIOs of AS3722 to /sys/class/gpio and to GPIO framework.
+
+drivers/rtc/rtc-as3722.c:
+Adds the RTC functionality of AS3722 to RTC framework.
+
+drivers/mfd/as3722-test.c:
+Regulator test framework. Exports some tests to userspace (see below)
+
+include/linux/mfd/as3722-reg.h:
+Register definitions
+
+include/linux/mfd/as3722-plat.h:
+platform data struct definitions


Putting a one line comment at the top of each file would be more  
appropriate.



+How do I configure it?
+==
+
+1. Configure it in config by selecting
+
+CONFIG_MFD_CORE=y
+CONFIG_MFD_AS3722=y
+CONFIG_REGULATOR=y
+CONFIG_REGULATOR_AS3722=y
+CONFIG_GPIO_AS3722=y
+CONFIG_RTC_DRV_AS3722=y


Menuconfig help exists, and forward slash search "AS3722" within  
menuconfig also exists. There's bound to be a better way to phrase  
this...


+2. Configure the driver according to the schematic  by filling out  
the
+as3722_platform_data. Additionally regulator initial data and  
initial register

+setup can be done here. Add it to the list of attached devices when
+registering the proper i2c bus.


Ok, this looks like it might belong in Documentation. Except it doesn't  
say how to do any of that.


Also, this does _not_ support device tree? (Given the mention of gpio  
I'd generally assumed this was arm, but you haven't mentioned any  
actual use cases for it yet...)


The next section seems actually useful:


+Testing Regulators:
+===
+
+# insmod as3722-core.ko
+# insmod as3722-regulators.ko
+
+Regulators using the linux kernel regulator framework can be found at
+/sys/class/regulator. Via sysfs its possible to get some status  
information:

+
+# cd /sys/class/regulator/regulator.9
+# ls
+# ls
+device  microvolts  num_users   type
+max_microamps   min_microamps   power   uevent
+max_microvolts  min_microvolts  state
+microamps   namesubsystem
+
+We can perform the testing by using the as3722-test.ko modules which  
provides

+a testframework to userspace:
+
+Mount debugfs
+# mount -t debugfs none /sys/kernel/debug/
+
+Load test modules (needed for regulator tests)
+# insmod as3722-test.ko
+
+Use the provided test-scripts. Transfer the scripts and the  
respective

+command text files to the rowboat board via adb:


What's a rowboat board? Where are the provided test-scripts?

+(win cygwin) $ adb shell mkdir -p /testing/ && adb push regtest.sh  
/testing/regtest.sh

+(win cygwin) $ adb push test-dig-ldo.txt /testing/test-dig-ldo.txt
+
+Run the script on the beagle-board testing the ldo3:
+# sh /testing/regtest.sh as3722-ldo3 /testing/test-dig-ldo.txt
+
+Example testing the sd1 (using test-sd.txt):
+# sh /testing/regtest.sh as3722-sd1 /testing/test-sd.txt
+
+Testing gpio:
+=
+For testing the gpio framework the sysFS functions are used. Of  
course

+all gpio framework functionality exists as well.
+# cd /sys/class/gpio/
+
+Now "export" and "unexport" files should be available. Use export  
function
+to create a gpio211 node for GPIO #211 if th

Re: [PATCH RFC v3 2/3] i2c: pxa: convert to devm_* API

2013-05-23 Thread Li Zefan

On 2013/5/24 11:35, Gu Zheng wrote:
> Hi Libo,
>I think you can merge patch 1/3 and 2/3, they do the same thing that using 
>  devm_* API to simplify
> and make the code clean, and the additional goal is it also can fix a bug.

nope. they should be separated.

> Besides, maybe you need to
> change the title and make it clear and self-described.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

BUG_ON in virtio-ring.c

2013-05-23 Thread Dave Airlie

Hi Rusty,

current virtio-ring.c has a BUG_ON in virtqueue_add that checks
total_sg > vg->vring.num, however I'm not sure it really is 100%
correct.

If I have an indirect ring and I'm adding sgs to it and the host is
delayed (say I've got a thread consuming things from the vring and its
off doing something interesting),
I'd really like to get ENOSPC back from virtqueue_add. However if the
indirect addition fails due to free_sg being 0, we hit the BUG_ON
before we ever get to the ENOSPC check.

the BUG_ON is quite valid in the no indirect case, but when we have
indirect buffers it doesn't seem like it always makes sense.

Not sure best way to fix it, I'm just a virtio newbie :)

Dave.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 00/12] VFS hot tracking

2013-05-23 Thread Zhi Yong Wu

HI, Al Viro.

  I have incorporated all comments from all reviewers and waited for
so long time. If you have no comments,  can you merge the patchset?
thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC v3 2/3] i2c: pxa: convert to devm_* API

2013-05-23 Thread Gu Zheng

Hi Libo,
   I think you can merge patch 1/3 and 2/3, they do the same thing that using  
devm_* API to simplify
and make the code clean, and the additional goal is it also can fix a bug. 
Besides, maybe you need to
change the title and make it clear and self-described.

Thanks,
Gu

On 05/23/2013 08:00 PM, Libo Chen wrote:

> when i2c malloc faild, we should not try to call release_mem_region.
> aovid this, convert them to devm_* API.
> 
> Signed-off-by: Libo Chen 
> ---
>  drivers/i2c/busses/i2c-pxa.c |   63 
> ++
>  1 files changed, 15 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-pxa.c b/drivers/i2c/busses/i2c-pxa.c
> index ea6d45d..7b10102 100644
> --- a/drivers/i2c/busses/i2c-pxa.c
> +++ b/drivers/i2c/busses/i2c-pxa.c
> @@ -1091,10 +1091,9 @@ static int i2c_pxa_probe(struct platform_device *dev)
>   struct resource *res = NULL;
>   int ret, irq;
>  
> - i2c = kzalloc(sizeof(struct pxa_i2c), GFP_KERNEL);
> + i2c = devm_kzalloc(&dev->dev, sizeof(struct pxa_i2c), GFP_KERNEL);
>   if (!i2c) {
> - ret = -ENOMEM;
> - goto emalloc;
> + return -ENOMEM;
>   }


The {} pair is no longer needed.

>  
>   /* Default adapter num to device id; i2c_pxa_probe_dt can override. */
> @@ -1104,19 +1103,11 @@ static int i2c_pxa_probe(struct platform_device *dev)
>   if (ret > 0)
>   ret = i2c_pxa_probe_pdata(dev, i2c, &i2c_type);
>   if (ret < 0)
> - goto eclk;
> + return ret;
>  
> - res = platform_get_resource(dev, IORESOURCE_MEM, 0);
>   irq = platform_get_irq(dev, 0);
> - if (res == NULL || irq < 0) {
> - ret = -ENODEV;
> - goto eclk;
> - }
> -
> - if (!request_mem_region(res->start, resource_size(res), res->name)) {
> - ret = -ENOMEM;
> - goto eclk;
> - }
> + if (irq < 0)
> + return -ENODEV;
>  
>   i2c->adap.owner   = THIS_MODULE;
>   i2c->adap.retries = 5;
> @@ -1126,17 +1117,15 @@ static int i2c_pxa_probe(struct platform_device *dev)
>  
>   strlcpy(i2c->adap.name, "pxa_i2c-i2c", sizeof(i2c->adap.name));
>  
> - i2c->clk = clk_get(&dev->dev, NULL);
> + i2c->clk = devm_clk_get(&dev->dev, NULL);
>   if (IS_ERR(i2c->clk)) {
> - ret = PTR_ERR(i2c->clk);
> - goto eclk;
> + return PTR_ERR(i2c->clk);
>   }

the same.

>  
> - i2c->reg_base = ioremap(res->start, resource_size(res));
> - if (!i2c->reg_base) {
> - ret = -EIO;
> - goto eremap;
> - }
> + res = platform_get_resource(dev, IORESOURCE_MEM, 0);
> + i2c->reg_base = devm_ioremap_resource(&dev->dev, res);
> + if (IS_ERR(i2c->reg_base))
> + return PTR_ERR(i2c->reg_bas);
>  
>   i2c->reg_ibmr = i2c->reg_base + pxa_reg_layout[i2c_type].ibmr;
>   i2c->reg_idbr = i2c->reg_base + pxa_reg_layout[i2c_type].idbr;
> @@ -1166,10 +1155,10 @@ static int i2c_pxa_probe(struct platform_device *dev)
>   i2c->adap.algo = &i2c_pxa_pio_algorithm;
>   } else {
>   i2c->adap.algo = &i2c_pxa_algorithm;
> - ret = request_irq(irq, i2c_pxa_handler, IRQF_SHARED,
> -   dev_name(&dev->dev), i2c);
> + ret = devm_request_irq(&dev->dev, irq, i2c_pxa_handler,
> + IRQF_SHARED, dev_name(&dev->dev), i2c);
>   if (ret)
> - goto ereqirq;
> + return ret;
>   }
>  
>   i2c_pxa_reset(i2c);
> @@ -1183,7 +1172,7 @@ static int i2c_pxa_probe(struct platform_device *dev)
>   ret = i2c_add_numbered_adapter(&i2c->adap);
>   if (ret < 0) {
>   printk(KERN_INFO "I2C: Failed to add bus\n");
> - goto eadapt;
> + return ret;
>   }
>   of_i2c_register_devices(&i2c->adap);
>  
> @@ -1198,19 +1187,6 @@ static int i2c_pxa_probe(struct platform_device *dev)
>  #endif
>   return 0;
>  
> -eadapt:
> - if (!i2c->use_pio)
> - free_irq(irq, i2c);
> -ereqirq:
> - clk_disable(i2c->clk);
> - iounmap(i2c->reg_base);
> -eremap:
> - clk_put(i2c->clk);
> -eclk:
> - kfree(i2c);
> -emalloc:
> - release_mem_region(res->start, resource_size(res));
> - return ret;
>  }
>  
>  static int i2c_pxa_remove(struct platform_device *dev)
> @@ -1218,15 +1194,6 @@ static int i2c_pxa_remove(struct platform_device *dev)
>   struct pxa_i2c *i2c = platform_get_drvdata(dev);
>  
>   i2c_del_adapter(&i2c->adap);
> - if (!i2c->use_pio)
> - free_irq(i2c->irq, i2c);
> -
> - clk_disable(i2c->clk);
> - clk_put(i2c->clk);
> -
> - iounmap(i2c->reg_base);
> - release_mem_region(i2c->iobase, i2c->iosize);
> - kfree(i2c);
>  
>   return 0;
>  }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kern

Re: [Suggestion] fs/namespace.c: the direct cause of the warning for: "ida_remove called for id=0 which is not allocated" with mnt_release_group_id()

2013-05-23 Thread Chen Gang

> From: Takashi Iwai 
> Subject: vfs: fix invalid ida_remove() call
> 
> When the group id of a shared mount is not allocated, the umount still
> tries to call mnt_release_group_id(), which eventually hits a kernel
> warning at ida_remove() spewing a message like:
> 
>   ida_remove called for id=0 which is not allocated.
> 
> This patch fixes the bug simply checking the group id in the caller.
> 
> Signed-off-by: Takashi Iwai 
> Reported-by: Cristian Rodr�guez 
> Cc: Al Viro 
> Signed-off-by: Andrew Morton 
> ---
> 

Tested-by Chen Gang 

Thanks.


>  fs/pnode.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff -puN fs/pnode.c~vfs-fix-invalid-ida_remove-call fs/pnode.c
> --- a/fs/pnode.c~vfs-fix-invalid-ida_remove-call
> +++ a/fs/pnode.c
> @@ -83,7 +83,8 @@ static int do_make_slave(struct mount *m
>   if (peer_mnt == mnt)
>   peer_mnt = NULL;
>   }
> - if (IS_MNT_SHARED(mnt) && list_empty(&mnt->mnt_share))
> + if (mnt->mnt_group_id && IS_MNT_SHARED(mnt) &&
> + list_empty(&mnt->mnt_share))
>   mnt_release_group_id(mnt);
>  
>   list_del_init(&mnt->mnt_share);
> _
> 
> 
> 


-- 
Chen Gang

Asianux Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-23 Thread CAI Qian

OK, here is clearer stack output from the run.
CAI Qian

+ ./check
FSTYP -- xfs (non-debug)
PLATFORM  -- Linux/s390x ibm-z10-23 3.9.3

001  29s
002  3s
003  2s
004  [not run] this test requires a valid $SCRATCH_DEV
005  2s
006  9s
007  10s
008  7s
009  [not run] this test requires a valid $SCRATCH_DEV
010  [not run] dbtest was not built for this platform
011  9s
012  10s
013  35s
014  5s
015  [not run] this test requires a valid $SCRATCH_DEV
016  [not run] this test requires a valid $SCRATCH_DEV
017  [not run] this test requires a valid $SCRATCH_DEV
018  [not run] this test requires a valid $SCRATCH_DEV
019  [not run] this test requires a valid $SCRATCH_DEV
020 


[ 1316.571227] XFS (dm-0): Mounting Filesystem
[ 1316.697803] XFS (dm-0): Ending clean mount
[ 1318.080615] XFS (dm-0): Ending clean mount
[ 1348.791125] XFS (dm-0): Mounting Filesystem
[ 1348.989166] XFS (dm-0): Ending clean mount
[ 1353.335478] XFS (dm-0): Mounting Filesystem
[ 1353.496364] XFS (dm-0): Ending clean mount
[ 1357.495427] XFS (dm-0): Mounting Filesystem
[ 1357.676971] XFS (dm-0): Ending clean mount
[ 1361.646399] XFS (dm-0): Mounting Filesystem
[ 1361.890426] XFS (dm-0): Ending clean mount
[ 1371.798944] XFS (dm-0): Mounting Filesystem
[ 1371.976922] XFS (dm-0): Ending clean mount
[ 1384.559103] XFS (dm-0): Mounting Filesystem
[ 1384.725657] XFS (dm-0): Ending clean mount
[ 1393.131347] XFS (dm-0): Mounting Filesystem
[ 1393.357927] XFS (dm-0): Ending clean mount
[ 1407.282708] XFS (dm-0): Mounting Filesystem
[ 1407.745176] XFS (dm-0): Ending clean mount
[ 1422.927074] XFS (dm-0): Mounting Filesystem
[ 1423.136266] XFS (dm-0): Ending clean mount
[ 1425.500910] XFS (dm-0): Mounting Filesystem
[ 1425.608851] XFS (dm-0): Ending clean mount
[ 1450.978110] XFS (dm-0): Mounting Filesystem
[ 1451.255368] XFS (dm-0): Ending clean mount
[ 1453.603742] XFS (dm-0): Mounting Filesystem
[ 1453.680657] XFS (dm-0): Ending clean mount
[ 1456.262266] XFS (dm-0): Mounting Filesystem
[ 1456.330515] XFS (dm-0): Ending clean mount
[ 1457.053767] XFS (dm-0): Mounting Filesystem
[ 1457.107258] XFS (dm-0): Ending clean mount
[ 1462.049374] XFS (dm-0): Mounting Filesystem
[ 1462.111389] XFS (dm-0): Ending clean mount
[ 1471.109589] ODEBUG: deactivate not available (active state 0) object type: ti
mer_list hint: process_timeout+0x0/0x8
[ 1471.109683] [ cut here ]
[ 1471.109688] WARNING: at lib/debugobjects.c:260
[ 1471.109692] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F)
nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F) ip6ta
ble_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(
F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F)
 nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F)
 ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) qeth_l2(F
) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F) dasd_eckd_mod(F) lcs(F) dasd_mod(F
) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F) dm_mirror(F) dm_region_hash(F) dm_l
og(F) dm_mod(F)
[ 1471.109848] CPU: 0 Tainted: GF3.9.3 #2
[ 1471.109858] Process swapper/0 (pid: 0, task: 00a2b4d0, ksp: 0
0a17d28)
[ 1471.109868] Krnl PSW : 0404c0018000 0046c84a (debug_print_object+
0xca/0xd8)
[ 1471.114762]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:
3
Krnl GPRS:  00a2b4d0 0067 0101f708
[ 1471.114769]0046c846 84a4d448 0086936a 000
001040700
[ 1471.114773]01a0f290 0401 00874cf8 000
000a395d8
[ 1471.114777]0195f820 0001bd20 0046c846 000
1bc20
[ 1471.114792] Krnl Code: 0046c83a: e3441004lg  %r4,0(%r
4,%r1)
   0046c840: c0e500139f88   brasl   %r14,6e0750
  #0046c846: a7f40001   brc 15,46c848
  >0046c84a: a7f4ffc2   brc 15,46c7ce
   0046c84e: a729   lghi%r2,0
   0046c852: a7f4ffd7   brc 15,46c800
   0046c856: 0707   bcr 0,%r7
   0046c858: ebaff0680024   stmg%r10,%r15,104(%r15)
[ 1471.114825] Call Trace:
[ 1471.114828] ([<0046c846>] debug_print_object+0xc6/0xd8)
[ 1471.114833]  [<0046d35c>] debug_object_deactivate+0x15c/0x160
[ 1471.114838]  [<00148244>] run_timer_softirq+0x180/0x464
[ 1471.114843]  [<0013d8d6>] __do_softirq+0x112/0x42c
[ 1471.114847]  [<0013ddf8>] irq_exit+0xc8/0xe8
[ 1471.114851]  [<0010d55e>] do_extint+0x25e/0x318
[ 1471.114859]  [<006f0d90>] ext_skip+0x40/0x44
[ 1471.114866]  [<006f05d6>] vtime_stop_cpu+0x52/0xbc
[ 1471.114870] ([<006f05b4>] vtime_stop_c

Re: [PATCH] scsi: megaraid: check kzalloc

2013-05-23 Thread Santosh Y

On Fri, May 24, 2013 at 7:52 AM, Libo Chen  wrote:
>
> we should check kzalloc, avoid to hit oops
>
> Signed-off-by: Libo Chen 
> ---
>  drivers/scsi/megaraid.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
> index 846f475..195b095 100644
> --- a/drivers/scsi/megaraid.c
> +++ b/drivers/scsi/megaraid.c
> @@ -4162,6 +4162,10 @@ mega_internal_command(adapter_t *adapter, megacmd_t 
> *mc, mega_passthru *pthru)
>
> sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
> scmd->device = sdev;
  ^^
I guess assigning after the NULL check of 'sdev' is more appropriate.

> +   if (!scmd->device) {
> +   scsi_free_command(GFP_KERNEL, scmd);
> +   return -ENOMEM;
> +   }
>
> memset(adapter->int_cdb, 0, sizeof(adapter->int_cdb));
> scmd->cmnd = adapter->int_cdb;
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
~Santosh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ANNOUNCE] ktap 0.1 released

2013-05-23 Thread zhangwei(Jovi)

On 2013/5/22 2:13, Frank Ch. Eigler wrote:
> "zhangwei(Jovi)"  writes:
> 
>> I'm pleased to announce that ktap release v0.1, this is the first official
>> release of ktap project [...]
> 
> Congrats.
> 
> 
>> = what's ktap?
>>
>>Because this is the first release, so there wouldn't include too
>>much features, just contain several basic features about tracing,
>>[...]
> 
> Nice progress.  Reviewing the safety/security items from
> https://lkml.org/lkml/2013/1/17/623, I see improvement in most.
Thanks, frank, you give me a lot of helpful technical comments in that RFC mail,
also as this one :) really thanks.
> 
> For example, you seem to be using GFP_ATOMIC for run-time memory
> allocation, which is safer than before (though still could exhaust
> resources).  OTOH your code doesn't handle *failure* of such
> allocation attempts (see call sites to kp_*alloc).
Yes, memory allocation would be change to be more safer.
> 
> There still doesn't seem to be safety constraints on the incoming
> byte code (like jump ranges, or loop counts).
> 
> It's nice to see some arithmetic OP_* checks, and the user_string
> function is probably safe enough now.  You'll need something analogous
> for kernel space (and possibly as verification for the various %s
> kp_printfs).  The hash tables might be susceptible to the deliberate
> hash collision attacks from last year.
Current hashtable implementation is efficient, but need have more
security concern as you pointed.
> 
> It's nice to see the *_STACK_SIZE constraints in the bytecode
> interpreter; is there any C-level recursion remaining to consume
> excessive kernel stack?
library C functions should not be a problem, like other kernel functions,
author should take care on stack overflow in own risk.
> 
> Exposing os.sleep/os.wait (or general kernel functions) to become
> callable from the scripts is fraught with danger.  You just can't call
> the underlying functions from random kernel context (imagine from the
> most pessimal possible kprobe or tracepoint, somewhere within an
> atomic section), and you'll get crashes.
Right, so those functions only can be called from mainthread,
I will add these checking later.

> 
> You should write several time/space/invasivity stress-tests to help
> see how future progress improves the code's performance/safety on
> these and other problem areas.
Yes, there already have a test/ directory for basic functionality testing,
obviously it's not enough, I will add more benchmark and safety checking 
testcases.
> 
> 
>> = Planned Changes
>>
>>we are planning to enable more kernel ineroperability into ktap [...]
> 
> As per the above, you'll want to be extremely careful about the idea
> to export FFI to let the lua scripts call into arbitrary kernel
> functions.  Perhaps wrap it into a 'guru' mode flag?
Definitely, there must need a mode flag to separate safety and not-safety.
> 
> 
> - FChE
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC v3 1/3] i2c: i2c-bfin-twi: convert to devm_* API

2013-05-23 Thread Sachin Kamat

On 23 May 2013 17:30, Libo Chen  wrote:
> peripheral_request_list has got free if any one faild, so no need to free 
> again in err case.
> aovid this, convert them to devm_* API
>
> Signed-off-by: Libo Chen 

It is a good practice to include changelog while submitting revised
versions of the patch.

> ---
>  drivers/i2c/busses/i2c-bfin-twi.c |   38 +---
>  1 files changed, 10 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-bfin-twi.c 
> b/drivers/i2c/busses/i2c-bfin-twi.c
> index 05080c4..2b99c48 100644
> --- a/drivers/i2c/busses/i2c-bfin-twi.c
> +++ b/drivers/i2c/busses/i2c-bfin-twi.c
> @@ -621,35 +621,27 @@ static int i2c_bfin_twi_probe(struct platform_device 
> *pdev)
> int rc;
> unsigned int clkhilow;
>
> -   iface = kzalloc(sizeof(struct bfin_twi_iface), GFP_KERNEL);
> +   iface = devm_kzalloc(&pdev->dev, sizeof(struct bfin_twi_iface),
> +   GFP_KERNEL);
> if (!iface) {
> dev_err(&pdev->dev, "Cannot allocate memory\n");

This error message is not really required.

> -   rc = -ENOMEM;
> -   goto out_error_nomem;
> +   return -ENOMEM;
> }
>
> spin_lock_init(&(iface->lock));
>
> /* Find and map our resources */
> res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -   if (res == NULL) {
> -   dev_err(&pdev->dev, "Cannot get IORESOURCE_MEM\n");
> -   rc = -ENOENT;
> -   goto out_error_get_res;
> -   }
> -
> -   iface->regs_base = ioremap(res->start, resource_size(res));
> -   if (iface->regs_base == NULL) {
> +   iface->regs_base = devm_ioremap_resource(&pdev->dev, res);
> +   if (IS_ERR(iface->regs_base)) {
> dev_err(&pdev->dev, "Cannot map IO\n");

devm_ioremap_resource prints out the error messages. So the above line
is redundant.

> -   rc = -ENXIO;
> -   goto out_error_ioremap;
> +   return -ENXIO;

Why not return PTR_ERR(iface->regs_base)?

> }
>
> iface->irq = platform_get_irq(pdev, 0);
> if (iface->irq < 0) {
> dev_err(&pdev->dev, "No IRQ specified\n");
> -   rc = -ENOENT;
> -   goto out_error_no_irq;
> +   return -ENOENT;
> }
>
> p_adap = &iface->adap;
> @@ -666,10 +658,10 @@ static int i2c_bfin_twi_probe(struct platform_device 
> *pdev)
> "i2c-bfin-twi");
> if (rc) {
> dev_err(&pdev->dev, "Can't setup pin mux!\n");
> -   goto out_error_pin_mux;
> +   return rc;
> }
>
> -   rc = request_irq(iface->irq, bfin_twi_interrupt_entry,
> +   rc = devm_request_irq(&pdev->dev, iface->irq, 
> bfin_twi_interrupt_entry,
> 0, pdev->name, iface);
> if (rc) {
> dev_err(&pdev->dev, "Can't get IRQ %d !\n", iface->irq);
> @@ -707,16 +699,9 @@ static int i2c_bfin_twi_probe(struct platform_device 
> *pdev)
> return 0;
>
>  out_error_add_adapter:
> -   free_irq(iface->irq, iface);
>  out_error_req_irq:
> -out_error_no_irq:
> peripheral_free_list((unsigned short *)pdev->dev.platform_data);
> -out_error_pin_mux:
> -   iounmap(iface->regs_base);
> -out_error_ioremap:
> -out_error_get_res:
> -   kfree(iface);
> -out_error_nomem:
> +

Empty line not required.

You could probably combine all the empty labels into one meaningful label above.

> return rc;


-- 
With warm regards,
Sachin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5 v2] VFIO PPC64: add VFIO support on POWERPC64

2013-05-23 Thread Alexey Kardashevskiy

On 05/24/2013 12:56 AM, Alex Williamson wrote:
> On Tue, 2013-05-21 at 13:33 +1000, Alexey Kardashevskiy wrote:
>> The series adds support for VFIO on POWERPC in user space (such as QEMU).
>> The in-kernel real mode IOMMU support is added by another series posted
>> separately.
>>
>> As the first and main aim of this series is the POWERNV platform support,
>> the "Enable on POWERNV platform" patch goes first and introduces an API
>> to be used by the VFIO IOMMU driver. The "Enable on pSeries platform" patch
>> simply registers PHBs in the IOMMU subsystem and expects the API to be 
>> present,
>> it enables VFIO support in fully emulated QEMU guests.
>>
>> The main change is that this series was changed and tested against v3.10-rc1.
>> It also contains some bugfixes which are mentioned (if any) in the patch 
>> messages.
>>
>> Alexey Kardashevskiy (3):
>>   powerpc/vfio: Enable on POWERNV platform
>>   powerpc/vfio: Implement IOMMU driver for VFIO
>>   powerpc/vfio: Enable on pSeries platform
>>
>>  Documentation/vfio.txt  |   63 +
>>  arch/powerpc/include/asm/iommu.h|   26 ++
>>  arch/powerpc/kernel/iommu.c |  323 +++
>>  arch/powerpc/platforms/powernv/pci-ioda.c   |1 +
>>  arch/powerpc/platforms/powernv/pci-p5ioc2.c |5 +-
>>  arch/powerpc/platforms/powernv/pci.c|2 +
>>  arch/powerpc/platforms/pseries/iommu.c  |4 +
>>  drivers/iommu/Kconfig   |8 +
>>  drivers/vfio/Kconfig|6 +
>>  drivers/vfio/Makefile   |1 +
>>  drivers/vfio/vfio.c |1 +
>>  drivers/vfio/vfio_iommu_spapr_tce.c |  377 
>> +++
>>  include/uapi/linux/vfio.h   |   34 +++
>>  13 files changed, 850 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
>>
> 
> These look ok to me, how do you want to integrate them?  Should I
> provide Acks on patches 2 & 3 and let them get pushed through the ppc
> tree or should I wait for patch 1 then push 2 & 3 through my tree?

Please ack on 2 & 3 and Ben will merge all three into his tree. Thanks!



-- 
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC v3 1/3] i2c: i2c-bfin-twi: convert to devm_* API

2013-05-23 Thread Libo Chen

On 2013/5/24 10:49, Zhang, Sonic wrote:
>> >-  iface = kzalloc(sizeof(struct bfin_twi_iface), GFP_KERNEL);
>> >+  iface = devm_kzalloc(&pdev->dev, sizeof(struct bfin_twi_iface),
>> >+  GFP_KERNEL);
>> >   if (!iface) {
>> >   dev_err(&pdev->dev, "Cannot allocate memory\n");
>> >-  rc = -ENOMEM;
>> >-  goto out_error_nomem;
>> >+  return -ENOMEM;
>> >   }
>> >
>> >   spin_lock_init(&(iface->lock));
>> >
>> >   /* Find and map our resources */
>> >   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> >-  if (res == NULL) {
>> >-  dev_err(&pdev->dev, "Cannot get IORESOURCE_MEM\n");
>> >-  rc = -ENOENT;
>> >-  goto out_error_get_res;
>> >-  }
> Why remove the resource mem check?
> 
> Regards,
> 
> Sonic

devm_ioremap_resource() had done for us, you can refernce the function`comment.

Thanks,

Libo




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] proc: move proc mount options out of pid_namespace

2013-05-23 Thread Gu Zheng

Hi Stephen,

On 05/24/2013 07:32 AM, Stephen Mell wrote:

> On Thursday, May 23, 2013 18:20:57 Gu Zheng wrote:
> 
>> Here it'll create a new proc sb instance which holds the same context as the 
>> old ones
>> each time we mount proc though in the same PID namespace, won't it?
> I believe so. But this is the point, right? 

Yes, but I think it's also the problem.

>They won't be identical if different mount options are used, I don't think.

If different mount options are used, we'll create different super block 
instance, and they have
the same context, only the difference is each one holds different proc_sb_info.
But I think what we really want is only one proc sb instance and create 
different proc_sb_info
if different mount options are used.

> 
>> Here the pre-check seems needless.
> Is that new with my patch, or has it always been needless?

Yeah, it's always needless.

Thanks,
Gu

> 
> Thanks,
> Stephen
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: XFS assertion from truncate. (3.10-rc2)

2013-05-23 Thread Dave Jones

On Thu, May 23, 2013 at 09:52:19PM -0400, Dave Jones wrote:
 > On Fri, May 24, 2013 at 11:26:25AM +1000, Dave Chinner wrote:
 >  
 >  > You want to print the debug output if the masked value != 0.
 > 
 > XFS (sda2): xfs_setattr_size: mask 0xaa69, masked 0x801 ii 
 > 0x8802129e98c0, d 0x88021757d970 path /davej/src/trinity/tmp/tmp.5/

ah, sneaky. There's some unprintable characters there that didn't show up in my 
terminal when I cut-n-pasted.
They made it to the logs though..

XFS (sda2): xfs_setattr_size: mask 0xaa69, masked 0x801 ii 0x8802129e98c0, 
d 0x88021757d970 path /davej/src/trinity/tmp/tmp.5/

Hexdump: of that part.. 706d 352e 012f 0a01 
So filename is 0x01 0x01

Don't know if that's relevant to the bug or not..

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] regulator: 88pm800: add regulator driver

2013-05-23 Thread yi zhang

Hi, Brown:

2013/5/23 Mark Brown :
> On Wed, May 22, 2013 at 08:10:53PM +0800, yizhang.m...@gmail.com wrote:
>
> You need a DT binding document for any new DT bindings like this one.
>
Thanks for your comments, I'll do it;

>> +static const unsigned int BUCK1_table[] = {
>> + /* 0x00-0x4F: from 0.6 to 1.5875V with step 0.0125V */
>> + /* 0x50-0x7F: from 1.6 to 1.8V with step 0.05V */
>
> Write this out as code, don't use a big table for large sets of
> voltages.

 The voltage table is "const", if we write it out as code, seems
it's hard for us
 to keep this attribute;
 And the voltage of this BUCK is not linear(it's separated as two parts),
 seems I shouldn't use the framework helpers to implement it; Right?
  what do you think?

>> +static int pm800_list_voltage(struct regulator_dev *rdev, unsigned index)
>> +{
>> + struct pm800_regulator_info *info = rdev_get_drvdata(rdev);
>> + int ret = -EINVAL;
>> +
>> + if (info->vol_table && (index < rdev->desc->n_voltages))
>> + ret = info->vol_table[index];
>> +
>> + return ret;
>> +}
>
> For things that are just table lookups use the framework helpers.
>
   OK, I'll change it; thanks for your comments;
>> +static int choose_voltage(struct regulator_dev *rdev, int min_uv, int 
>> max_uv)
>> +{
>
> Similarly here.
>
OK, copy that; thanks;

>> +static int pm800_set_voltage(struct regulator_dev *rdev,
>> +  int min_uv, int max_uv, unsigned *sel)
>> +{
>> + *sel = choose_voltage(rdev, min_uv, max_uv);
>> + if (*sel < 0)
>> + return -EINVAL;
>> + return regulator_set_voltage_sel_regmap(rdev, *sel);
>> +}
>
> Implement map_voltage.
>
>> + for_each_child_of_node(nproot, np) {
>> + if (!of_node_cmp(np->name, info->desc.name)) {
>> + config->init_data =
>> + of_get_regulator_init_data(&pdev->dev, np);
>> + config->of_node = np;
>> + break;
>> + }
>> + }
>
> Use of_regulator_match().

Copy that, thanks for your comment; I'll change it;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] usb/xhci: unify parameter of xhci_msi_irq

2013-05-23 Thread Alex Shi


Ops, I just find a old patch left on my laptop. and it still works on latest
Linus tree. I don't remember there is a reasonable excuse to reject this patch.

So, anyone like to pick it up?

--
>From 6ae1b9e71f9b14be5774ae9c1b4cf57cd4e747ac Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Mon, 11 Jun 2012 15:10:18 +0800
Subject: [PATCH] usb/xhci: unify parameter of xhci_msi_irq

According to Felipe and Alan's comments the second parameter of irq
handler should be 'void *' not a specific structure pointer.
So change it.

Signed-off-by: Alex Shi 
Acked-by: Felipe Balbi 
---
 drivers/usb/host/xhci-ring.c |2 +-
 drivers/usb/host/xhci.c  |4 ++--
 drivers/usb/host/xhci.h  |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 23b4aef..cc8a52f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2479,7 +2479,7 @@ hw_died:
return IRQ_HANDLED;
 }
 
-irqreturn_t xhci_msi_irq(int irq, struct usb_hcd *hcd)
+irqreturn_t xhci_msi_irq(int irq, void *hcd)
 {
return xhci_irq(hcd);
 }
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index afdc73e..f7d40c1 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -215,7 +215,7 @@ static int xhci_setup_msi(struct xhci_hcd *xhci)
return ret;
}
 
-   ret = request_irq(pdev->irq, (irq_handler_t)xhci_msi_irq,
+   ret = request_irq(pdev->irq, xhci_msi_irq,
0, "xhci_hcd", xhci_to_hcd(xhci));
if (ret) {
xhci_dbg(xhci, "disable MSI interrupt\n");
@@ -287,7 +287,7 @@ static int xhci_setup_msix(struct xhci_hcd *xhci)
 
for (i = 0; i < xhci->msix_count; i++) {
ret = request_irq(xhci->msix_entries[i].vector,
-   (irq_handler_t)xhci_msi_irq,
+   xhci_msi_irq,
0, "xhci_hcd", xhci_to_hcd(xhci));
if (ret)
goto disable_msix;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index de3d6e3..737ef54 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1710,7 +1710,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated);
 
 int xhci_get_frame(struct usb_hcd *hcd);
 irqreturn_t xhci_irq(struct usb_hcd *hcd);
-irqreturn_t xhci_msi_irq(int irq, struct usb_hcd *hcd);
+irqreturn_t xhci_msi_irq(int irq, void *hcd);
 int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev);
 void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev);
 int xhci_alloc_tt_info(struct xhci_hcd *xhci,
-- 
1.7.5.4

-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 0/4] dma-mapping: Define dma_{alloc,free}_attrs() for all archs

2013-05-23 Thread Damian Hobson-Garcia

On 2013/05/23 18:47, Catalin Marinas wrote:
> On Thu, May 23, 2013 at 03:47:13AM +0100, Damian Hobson-Garcia wrote:
>> Hi Catalin,
>> On 2013/05/22 18:47, Catalin Marinas wrote:
>>> On Wed, May 22, 2013 at 03:37:17AM +0100, Damian Hobson-Garcia wrote:
 Hello,
 On 2013/04/30 12:01, Damian Hobson-Garcia wrote:
> Most architectures that define CONFIG_HAVE_DMA=y, have implementations for
> both dma_alloc_attrs() and dma_free_attrs().  All achitectures that do
> not define CONFIG_HAVE_DMA also have both of these definitions provided by
> dma-mapping-broken.h.
>>>
>>> BTW, shouldn't this be called CONFIG_HAVE_DMA_ATTRS?
>>
>> CONFIG_HAVE_DMA_ATTRS is currently used to enable the functions to
>> set/get the DMA attribute values. Poking through the headers, it looks
>> like the struct dma_attrs is defined regardless of the
>> CONFIG_HAVE_DMA_ATTRS setting, so in that respect
>> we always seem to "have" DMA attributes (if we have DMA), but they may
>> not always be meaningful (ie. set to some value).
> 
> My point was about the commit log - grep'ing the kernel for
> CONFIG_HAVE_DMA did not return anything.
> 

Oh yes, my mistake.  It should be CONFIG_HAS_DMA instead of
CONFIG_HAVE_DMA.  I'll update it.

Damian


-- 
Damian Hobson-Garcia
IGEL Co.,Ltd
http://www.igel.co.jp
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH RFC v3 1/3] i2c: i2c-bfin-twi: convert to devm_* API

2013-05-23 Thread Zhang, Sonic

Hi Libo,

>-Original Message-
>From: Libo Chen [mailto:libo.c...@huawei.com]
>Sent: Thursday, May 23, 2013 8:00 PM
>To: w...@the-dreams.de
>Cc: guz.f...@cn.fujitsu.com; Zhang, Sonic; uclinux-dist-
>de...@blackfin.uclinux.org; linux-...@vger.kernel.org; linux-
>ker...@vger.kernel.org; lize...@huawei.com; libo.c...@huawei.com
>Subject: [PATCH RFC v3 1/3] i2c: i2c-bfin-twi: convert to devm_* API
>
>peripheral_request_list has got free if any one faild, so no need to free 
>again in err
>case.
>aovid this, convert them to devm_* API
>
>Signed-off-by: Libo Chen 
>---
> drivers/i2c/busses/i2c-bfin-twi.c |   38 +---
> 1 files changed, 10 insertions(+), 28 deletions(-)
>
>diff --git a/drivers/i2c/busses/i2c-bfin-twi.c 
>b/drivers/i2c/busses/i2c-bfin-twi.c
>index 05080c4..2b99c48 100644
>--- a/drivers/i2c/busses/i2c-bfin-twi.c
>+++ b/drivers/i2c/busses/i2c-bfin-twi.c
>@@ -621,35 +621,27 @@ static int i2c_bfin_twi_probe(struct platform_device
>*pdev)
>   int rc;
>   unsigned int clkhilow;
>
>-  iface = kzalloc(sizeof(struct bfin_twi_iface), GFP_KERNEL);
>+  iface = devm_kzalloc(&pdev->dev, sizeof(struct bfin_twi_iface),
>+  GFP_KERNEL);
>   if (!iface) {
>   dev_err(&pdev->dev, "Cannot allocate memory\n");
>-  rc = -ENOMEM;
>-  goto out_error_nomem;
>+  return -ENOMEM;
>   }
>
>   spin_lock_init(&(iface->lock));
>
>   /* Find and map our resources */
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>-  if (res == NULL) {
>-  dev_err(&pdev->dev, "Cannot get IORESOURCE_MEM\n");
>-  rc = -ENOENT;
>-  goto out_error_get_res;
>-  }

Why remove the resource mem check?

Regards,

Sonic

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3.9-stable] staging:iio:light:tsl2x7x: fix the error handling in tsl2x7x_probe()

2013-05-23 Thread Jonghwan Choi

This patch looks like it should be in the 3.9-stable tree, should we apply
it?

--

From: "Wei Yongjun "

commit 3b813798aa7030f1beef638c75f8b0008f737a82 upstream

Fix to return -EINVAL in the i2c device found error handling
case instead of 0, as done elsewhere in this function.
And also correct the fail1 and fail2 lable to do the right thing.

Signed-off-by: Wei Yongjun 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Jonghwan Choi 
---
 drivers/staging/iio/light/tsl2x7x_core.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/iio/light/tsl2x7x_core.c
b/drivers/staging/iio/light/tsl2x7x_core.c
index a58731e..2d40c03 100644
--- a/drivers/staging/iio/light/tsl2x7x_core.c
+++ b/drivers/staging/iio/light/tsl2x7x_core.c
@@ -1869,6 +1869,7 @@ static int tsl2x7x_probe(struct i2c_client *clientp,
dev_info(&chip->client->dev,
"%s: i2c device found does not match
expected id\n",
__func__);
+   ret = -EINVAL;
goto fail1;
}
 
@@ -1907,7 +1908,7 @@ static int tsl2x7x_probe(struct i2c_client *clientp,
if (ret) {
dev_err(&clientp->dev,
"%s: irq request failed", __func__);
-   goto fail2;
+   goto fail1;
}
}
 
@@ -1920,17 +1921,17 @@ static int tsl2x7x_probe(struct i2c_client *clientp,
if (ret) {
dev_err(&clientp->dev,
"%s: iio registration failed\n", __func__);
-   goto fail1;
+   goto fail2;
}
 
dev_info(&clientp->dev, "%s Light sensor found.\n", id->name);
 
return 0;
 
-fail1:
+fail2:
if (clientp->irq)
free_irq(clientp->irq, indio_dev);
-fail2:
+fail1:
iio_device_free(indio_dev);
 
return ret;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3.9-stable] staging/iio/mxs-lradc: fix preenable for multiple

2013-05-23 Thread Jonghwan Choi

This patch looks like it should be in the 3.9-stable tree, should we apply
it?

--

From: "Michał Mirosław "

commit c80712c793febdf1b13ad0e1c71a051e071b3fd8 upstream

This fixes 'preenable failed: -EINVAL' error when using this driver.

Signed-off-by: Michał Mirosław "
Signed-off-by: Jonathan Cameron 
Signed-off-by: Jonghwan Choi 
---
 drivers/staging/iio/adc/mxs-lradc.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/iio/adc/mxs-lradc.c
b/drivers/staging/iio/adc/mxs-lradc.c
index 55a459b..f5e9e55 100644
--- a/drivers/staging/iio/adc/mxs-lradc.c
+++ b/drivers/staging/iio/adc/mxs-lradc.c
@@ -693,7 +693,6 @@ static void mxs_lradc_trigger_remove(struct iio_dev
*iio)
 static int mxs_lradc_buffer_preenable(struct iio_dev *iio)
 {
struct mxs_lradc *lradc = iio_priv(iio);
-   struct iio_buffer *buffer = iio->buffer;
int ret = 0, chan, ofs = 0;
unsigned long enable = 0;
uint32_t ctrl4_set = 0;
@@ -701,7 +700,7 @@ static int mxs_lradc_buffer_preenable(struct iio_dev
*iio)
uint32_t ctrl1_irq = 0;
const uint32_t chan_value = LRADC_CH_ACCUMULATE |
((LRADC_DELAY_TIMER_LOOP - 1) <<
LRADC_CH_NUM_SAMPLES_OFFSET);
-   const int len = bitmap_weight(buffer->scan_mask,
LRADC_MAX_TOTAL_CHANS);
+   const int len = bitmap_weight(iio->active_scan_mask,
LRADC_MAX_TOTAL_CHANS);
 
if (!len)
return -EINVAL;
@@ -728,7 +727,7 @@ static int mxs_lradc_buffer_preenable(struct iio_dev
*iio)
lradc->base + LRADC_CTRL1 + STMP_OFFSET_REG_CLR);
writel(0xff, lradc->base + LRADC_CTRL0 + STMP_OFFSET_REG_CLR);
 
-   for_each_set_bit(chan, buffer->scan_mask, LRADC_MAX_TOTAL_CHANS) {
+   for_each_set_bit(chan, iio->active_scan_mask, LRADC_MAX_TOTAL_CHANS)
{
ctrl4_set |= chan << LRADC_CTRL4_LRADCSELECT_OFFSET(ofs);
ctrl4_clr |= LRADC_CTRL4_LRADCSELECT_MASK(ofs);
ctrl1_irq |= LRADC_CTRL1_LRADC_IRQ_EN(ofs);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] scsi: megaraid: check kzalloc

2013-05-23 Thread Libo Chen


we should check kzalloc, avoid to hit oops

Signed-off-by: Libo Chen 
---
 drivers/scsi/megaraid.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 846f475..195b095 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -4162,6 +4162,10 @@ mega_internal_command(adapter_t *adapter, megacmd_t *mc, 
mega_passthru *pthru)

sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
scmd->device = sdev;
+   if (!scmd->device) {
+   scsi_free_command(GFP_KERNEL, scmd);
+   return -ENOMEM;
+   }

memset(adapter->int_cdb, 0, sizeof(adapter->int_cdb));
scmd->cmnd = adapter->int_cdb;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86: Have debug/nmi restore the IDT it replaced

2013-05-23 Thread Steven Rostedt

I've sent this out before, and I'm sending it out again.

https://patchwork.kernel.org/patch/2082571/

It was ignored because I signed it with "Not-yet-signed-off-by". This
time I'm signing it off with a real full signed off by, as this will
help with Seiji's patch set to do the IDT swap to enable tracing of
interrupt vectors.

-
I've been playing with Seiji's patches:

https://lkml.org/lkml/2013/1/21/573

Which add tracing to the x86 interrupt vectors. But to keep tracing
overhead down to zero when tracing is inactive, it uses a swap of the
IDT to replace the irq vectors with versions that have tracepoints in
them, when tracing is enabled. Even though tracepoints have "nops", we
do this to keep even that overhead eliminated.

These seemed to work, until I ran the following command with trace-cmd:


 trace-cmd start -p function_graph -g "smp_trace_apic_timer_interrupt" \
   -g "smp_apic_timer_interrupt" -e irq_vectors


What trace-cmd does above, is not only enables the irq_vectors, but also
produces the call graphs of the two interrupt vector functions:
smp_trace_apic_timer_interrupt and smp_apic_timer_interrupt

The result was rather surprising. It showed only
smp_apic_timer_interrupt being called, and not the trace version.
Obviously, the tracepoints for the vectors were also not there.

When starting without the function graph tracer, it worked fine, but I
wanted to see the trace versions being called to be sure, which required
one of the function tracers.

Investigating, I found the problem. It's with the NMI and breakpoint IDT
handling. I wont explain it too much here, but see:

commit f8988175f "x86: Allow nesting of the debug stack IDT setting"
commit 42181186a "x86: Add counter when debug stack is used with
interrupts enabled"
commit 228bdaa95 "x86: Keep current stack in NMI breakpoints"

The basic gist of the above commits is that on NMI or DEBUG trap
entering, it needs to modify the IDT so that the stack pointer doesn't
get reset to the top of the stack again. On boot up, two IDT tables are
created. One for normal operations and one for this NMI/DEBUG case.

The issue is that it just toggles between the two IDT tables. But if
this happens when Seiji's swap had already happened, it replaces the
trace IDT table back to the normal table, and tracing stops, which sorta
defeats the purpose.


To solve this, I've added a 'hold_idt_descr' per cpu variable that
stores the IDT that was loaded and will use that to replace it. If the
DEBUG/NMI came in when the IDT was normal, it would replace it back with
the normal IDT, and if it came in when it was the trace IDT, it would
put back the trace IDT.

I've run a few tests so far on this code, but I need to run more
stressful ones. Meanwhile, until I find any bugs, I'm posting this patch
for your enjoyment. I think I got all the cases, as NMIs causes the
store/restore functions to be re-entrent without any locks.

Signed-off-by: Steven Rostedt 

Index: linux-trace.git/arch/x86/kernel/cpu/common.c
===
--- linux-trace.git.orig/arch/x86/kernel/cpu/common.c
+++ linux-trace.git/arch/x86/kernel/cpu/common.c
@@ -1148,10 +1148,54 @@ int is_debug_stack(unsigned long addr)
 }
 
 static DEFINE_PER_CPU(u32, debug_stack_use_ctr);
+static DEFINE_PER_CPU(struct desc_ptr, hold_idt_descr);
+static DEFINE_PER_CPU(struct desc_ptr, copy_idt_descr);
 
+/*
+ * Debug traps and NMIs will swap the IDT to have the debug
+ * trap not modify the stack (nmi_idt_descr). But as both the
+ * debug and NMIs share this, they need to be re-entrant. A debug
+ * trap could be doing the swap and after it incs the debug_stack_use_ctr,
+ * an NMI could come in. It too would need to do the swap, and it would
+ * need to swap every time.
+ *
+ * But, the IDT can be changed by other users, and this must restore
+ * that as well.
+ *
+ * Luckily, as interrupts are disabled from the set_zero to reset,
+ * we do not need to worry about the IDT changing underneath us
+ * from other users.
+ */
 void debug_stack_set_zero(void)
 {
+   /*
+* Writing to the IDT is atomic. If an NMI were to come
+* in after we did the compare but before the store_idt(),
+* it too would see the address == 0 and do the store itself.
+*/
+   if (this_cpu_read(hold_idt_descr.address) == 0)
+   store_idt(this_cpu_ptr(&hold_idt_descr));
+
+   /*
+* If an NMI were to come in now, it would not set hold_idt_descr,
+* but on exit, it would restore the IDT because the counter is
+* still zero here. Then it would set the .address back to zero too.
+*/
+
this_cpu_inc(debug_stack_use_ctr);
+
+   /*
+* NMI's coming in now are not an issue as we set the .address
+* and also incremented the ctr. But, if an NMI came in before
+* the counter was incremented, and after the .address was set,
+* the NMI would set the .address back to zero, and would hav

Re: XFS assertion from truncate. (3.10-rc2)

2013-05-23 Thread Dave Jones

On Fri, May 24, 2013 at 11:26:25AM +1000, Dave Chinner wrote:
 
 > You want to print the debug output if the masked value != 0.

XFS (sda2): xfs_setattr_size: mask 0xaa69, masked 0x801 ii 0x8802129e98c0, 
d 0x88021757d970 path /davej/src/trinity/tmp/tmp.5/

Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily

2013-05-23 Thread Davidlohr Bueso

ping, Andrew?

On Wed, 2013-05-15 at 18:07 -0700, Davidlohr Bueso wrote:
> This patchset continues the work that began in the sysv ipc semaphore scaling 
> series: https://lkml.org/lkml/2013/3/20/546
> 
> Just like semaphores used to be, sysv shared memory and msg queues also abuse 
> the ipc
> lock, unnecessarily holding it for operations such as permission and security 
> checks. This
> patchset mostly deals with mqueues, and while shared mem can be done in a 
> very similar way,
> I want to get these patches out in the open first. It also does some pending 
> cleanups,
> mostly focused on the two level locking we have in ipc code, taking care of 
> ipc_addid() 
> and ipcctl_pre_down_nolock() - yes there are still functions that need to be 
> updated as well.
> 
> I have tried to split each patch to be as readable and specific as possible, 
> specially when
> shortening the critical regions, going one function at a time.
> 
> Patch 1 moves the locking to be explicitly done by the callers of ipc_addid.
> It updates msg, sem and shm.
> 
> Patches 2-3: are just wrappers around the ipc lock, initially suggested by 
> Linus.
> 
> Patch 4 is similar to patch 1, moving the rcu and rw_mutex locking out of
> ipcctl_pre_down_nolock so that the callers explicitly deals with them. It 
> updates msg, sem
> and shm.
> 
> Patch 5 shortens the critical region in msgctl_down(), using the lockless
> ipcctl_pre_down() function and only acquiring the ipc lock for RMID and SET 
> commands.
> 
> Patch 6 simply moves the what-should-be lockless logic of *_INFO and *_STAT 
> commands
> out of msgctl() into a new function, msgctl_lockless().
> 
> Patch 7 introduces the necessary wrappers around ipc_obtain_object[_check]()
> that will later enable us to separately lookup the ipc object without holding 
> the lock.
> 
> Patch 8 updates the previously added msgctl_nolock() to actually be lockless, 
> reducing
> the critical region for the STAT commands.
> 
> Patch 9 redoes the locking for msgsend().
> 
> Patch 10 redoes the locking for msgrcv().
> 
> Patch 11 removes the now unused msg_lock and msg_lock_check functions, 
> replaced by
> a smarter combination of rcu, ipc_obtain_object and ipc_object_lock.
> 
> Davidlohr Bueso (11):
>   ipc: move rcu lock out of ipc_addid
>   ipc: introduce ipc object locking helpers
>   ipc: close open coded spin lock calls
>   ipc: move locking out of ipcctl_pre_down_nolock
>   ipc,msg: shorten critical region in semctl_down
>   ipc,msg: introduce msgctl_nolock
>   ipc,msg: introduce lockless functions to obtain the ipc object
>   ipc,msg: make msgctl_nolock lockless
>   ipc,msg: reduce critical region in msgsnd
>   ipc,msg: make shorten critical region in msgrcv
>   ipc: remove unused functions
> 
>  ipc/msg.c  | 227 
> ++---
>  ipc/sem.c  |  42 +++-
>  ipc/shm.c  |  32 ++---
>  ipc/util.c |  25 ++-
>  ipc/util.h |  22 --
>  5 files changed, 211 insertions(+), 137 deletions(-)
> 
> Thanks,
> Davidlohr
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-23 Thread Tejun Heo

On Thu, May 23, 2013 at 11:47:25AM +0200, Paolo Bonzini wrote:
> > No no, I'm not talking about it not working for the users - it's just
> > passing the commands, it of course works.  I'm doubting about it being
> > a worthy security isolation layer.  cdb filtering (of any form really)
> > has always been something on the border.  It always was something we
> > got stuck with due to lack of other immediate options.
> 
> I understood correctly then. :)  I agree it's not the best, but it's not
> completely broken either.  It has bugs, and that's what these patches
> try to fix.

The same filtering table being applied to different classes of
hardware is a software bug, but my point is that the practive
essentially entrusts non-insignificant part of security enforcement to
the hardware itself.  The variety of hardware in question is very wide
and significant portion has historically been known to be flaky.

> > I'm wondering whether combining 3 into 4 would be good enough.
> 
> No, it wouldn't.  I learnt it the hard way (by having a patch nacked :))
> at http://thread.gmane.org/gmane.linux.kernel/1311887.

Of course you can't do that by adding dangerous commands to the
existing filtering table which is allowed by default.  I'm saying
"count me out" knob could be good enough.  Neither is perfect but at
least the latter doesn't affect the default cases.

> > Hmmm?  Somebody has to give out the access rights anyway, be it a udev
> > rule or polkit.  While not having to depend on them could be nice, I
> > don't think involving userland is a big deal.  It's already involved
> > in closely related capacities anyway.
> 
> Polkit need not do anything to give out the access rights, it only has
> to just confirm the credentials.  A helper setgid-disk can consult
> polkit, open the file, pass the file descriptor back, and exit.  We
> already do something similar for tap devices.

I see.  It really just comes down to plumbing and doesn't seem like a
particularly challenging one at that.

> Yes, I understand that.  But I don't believe it is a serious concern.
> In the case of tapes, for example, you're not talking about 10 dollar
> USB pendrives whose firmware was written in fire-and-forget mode.
> Firmware bugs would be updated by the manufacturers.

This is being applied to all block devices by default and it's a part
of a vicious circle which is being reinforced by this change.  We
added flawed security mechanism to work around deficiency in software
stack.  The mechanism could be mis-used for other purposed which in
turn developed into other use cases, which then pushes the expansion
of the existing flawed mechanism.  This is a process with postive feed
back built into it.

> I cannot exclude there will _never_ be a bug like this, of course.  But
> even if there will be one, IMHO it would be exceptional enough that we
> can afford fixing it with a per-device quirk.
>
> Scanners have never had any CDB filtering done on them; I searched for
> this kind of bug, and I couldn't find any report.
> 
> What I am trying to do is to reach a compromise.  The one I'm proposing
> is to forbid reserved or vendor-specific commands, while at the same
> time opening the doors on more standardized commands.

I feel pretty uncomfortable about the direction and think we should
reach compromise from the other direction.  If VMs need raw device
access, just entrust the device to those VMs and enforce whatever
extra restriction from hypervisor side.  It sure isn't perfect but
neither is the other compromise and the risk is taken by the ones
wanting to do (relatively) exotic stuff rather than forced on all
others.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: XFS assertion from truncate. (3.10-rc2)

2013-05-23 Thread Dave Jones

On Fri, May 24, 2013 at 11:26:25AM +1000, Dave Chinner wrote:

 > You want to print the debug output if the masked value != 0.

Derp. Thanks.
Rerunning the tests now. Hopefully I'll have something in a few hours.

Dave


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Is spin_is_locked() safe to use with BUG_ON()/WARN_ON()?

2013-05-23 Thread Ryan Mallon

On 24/05/13 01:12, David Howells wrote:
> Linus Torvalds  wrote:
> 
>> We do *not* want to add some crazy "spin_is_nt_locked". We just want
>> to get rid of these idiotic debug tests.
> 
> Generally, I think you are right, though there are also some checks in
> deallocation routines that check that a spinlock is not currently held before
> releasing the memory holding it - should those be allowed to stay?  I'd be
> tempted to wrap the whole check in something, perhaps an "spin_lock_uninit()"
> and move the check to a header file.  Would this be useful for lockdep or
> anything like that?

lockdep has lockdep_assert_held(), which might be what you want. Though
it looks like it possibly also has the false positive issues on SMP?

~Ryan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Suggestion] fs/namespace.c: the direct cause of the warning for: "ida_remove called for id=0 which is not allocated" with mnt_release_group_id()

2013-05-23 Thread Chen Gang

On 05/24/2013 07:18 AM, Andrew Morton wrote:
> On Tue, 14 May 2013 17:06:30 +0800 Chen Gang  wrote:
> 
>> > After call collect_mounts(), then call drop_collected_mounts(), it will
>> > report an warning: "ida_remove called for id=0 which is not allocated"
>> > (one sample is audit_add_tree_rule() in kernel/audit_tree.c).
>> > 
>> > The direct cause (maybe also be the root cause):
>> >   collect_mounts() passs 'CL_PRIVATE' to copy_tree() -> clone_mnt().
>> >   it will set "mnt->mnt_group_id = 0" in clone_mnt().
>> >   when drop_collected_mounts() -> mnt_release_group_id(), 
>> > 'mnt->mnt_group_id == 0'.
> I expect this patch also addresses the bug.  Can you please review and test 
> it?
> 

I am sure, after apply the patch below, it will be no warning again (fix
the direct cause).

Originally, I want to fix this bug just like the patch below, but I am
not quite sure whether it will hide the real root cause.


Thanks.

> 
> From: Takashi Iwai 
> Subject: vfs: fix invalid ida_remove() call
> 
> When the group id of a shared mount is not allocated, the umount still
> tries to call mnt_release_group_id(), which eventually hits a kernel
> warning at ida_remove() spewing a message like:
> 
>   ida_remove called for id=0 which is not allocated.
> 
> This patch fixes the bug simply checking the group id in the caller.
> 
> Signed-off-by: Takashi Iwai 
> Reported-by: Cristian Rodr�guez 
> Cc: Al Viro 
> Signed-off-by: Andrew Morton 
> ---
> 
>  fs/pnode.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff -puN fs/pnode.c~vfs-fix-invalid-ida_remove-call fs/pnode.c
> --- a/fs/pnode.c~vfs-fix-invalid-ida_remove-call
> +++ a/fs/pnode.c
> @@ -83,7 +83,8 @@ static int do_make_slave(struct mount *m
>   if (peer_mnt == mnt)
>   peer_mnt = NULL;
>   }
> - if (IS_MNT_SHARED(mnt) && list_empty(&mnt->mnt_share))
> + if (mnt->mnt_group_id && IS_MNT_SHARED(mnt) &&
> + list_empty(&mnt->mnt_share))
>   mnt_release_group_id(mnt);
>  
>   list_del_init(&mnt->mnt_share);
> _
> 
> 
> 


-- 
Chen Gang

Asianux Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: XFS assertion from truncate. (3.10-rc2)

2013-05-23 Thread Dave Chinner

On Thu, May 23, 2013 at 08:49:06PM -0400, Dave Jones wrote:
> On Fri, May 24, 2013 at 08:30:38AM +1000, Dave Chinner wrote:
> 
>  > >  > Which I put just before the assert that is firing on your machine.
>  > >  > 
>  > >  > And, obviously, it isn't firing on mine and obviously shouldn't be 
> firing on a
>  > >  > mask of 0xa068.
>  > > 
>  > > With this, I get a spew of these when I start a kernel build..
>  > > 
>  > > [  964.378690] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x880221b5d970 path /davej/tmp/ccN2RrM5.s
>  > > [  964.651927] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x88020ff80b90 path /davej/tmp/ccB1Cdmo.s
>  > > [  964.867444] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x8802218a5bc0 path /davej/tmp/ccCUaXbG.s
>  > > [  965.102661] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x88020ffacde0 path /davej/tmp/cckMLf2X.s
>  > > [  967.743312] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a93c200, d 0x88022212c250 path /davej/tmp/ccFMkBbA.s
>  > > [  967.947154] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a93c200, d 0x8802226cc6f0 path /davej/tmp/cc5iX4SR.s
>  > > [  968.009414] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a988000, d 0x8802219f1970 path /davej/tmp/ccvWCHTZ.o
>  > > [  968.091504] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a9898c0, d 0x88022208de10 path /davej/tmp/cc9n6fnm.ld
>  > > [  968.107997] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a98dac0, d 0x880221160de0 path /davej/tmp/cc5rlvHu.le
>  > 
>  > Right, It's printing them out for every truncate that is done.
>  > Just print on the conditional that is was triggering the assert...
> 
> I'm missing something.. That's what I did ?
> 
> -   ASSERT((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
> +
> +   if ((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
> ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
> -   ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
> +   ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0) {

You want to print the debug output if the masked value != 0.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] kernel/debug/kdb: code enhancement for 3 areas.

2013-05-23 Thread Chen Gang

On 05/24/2013 01:28 AM, Anton Vorontsov wrote:
> On Thu, May 16, 2013 at 07:25:46PM +0800, Chen Gang wrote:
>> > Delete waste '{' for 'case' statement.
>> > 
>> > For the return variable 'long res' in function kdb_task_state_string(),
>> > neither it matches the function return type 'unsigned long', nor is
>> > suitable as a flag variable to make 'or' operation. So use 'unsigned
>> > long' instead of 'long'.
>> > 
>> > If "case 'A'" happen in function kdb_task_state_string(), can return
>> > ~0UL directly to save the useless looping.
>> > 
>> > Signed-off-by: Chen Gang 
> If anything, these should be 3 separate patches...
> 

OK, I will separate them into 3 patches (originally, I think for trivial
cleanup patches, better let them in one patch, so can save the reviewers
and appliers time)

Excuse me, I have to do something else, so I will finish them within
this month (2013-05-31)


Thanks.
-- 
Chen Gang

Asianux Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] dma: use platform_{get,set}_drvdata()

2013-05-23 Thread Jingoo Han

Use the wrapper functions for getting and setting the driver data using
platform_device instead of using dev_{get,set}_drvdata() with &pdev->dev,
so we can directly pass a struct platform_device.

Also, unnecessary dev_set_drvdata() is removed, because the driver core
clears the driver data to NULL after device_release or on probe failure.

Signed-off-by: Jingoo Han 
---
 drivers/dma/fsldma.c  |5 ++---
 drivers/dma/ppc4xx/adma.c |5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4fc2980..49e8fbd 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1368,7 +1368,7 @@ static int fsldma_of_probe(struct platform_device *op)
 
dma_set_mask(&(op->dev), DMA_BIT_MASK(36));
 
-   dev_set_drvdata(&op->dev, fdev);
+   platform_set_drvdata(op, fdev);
 
/*
 * We cannot use of_platform_bus_probe() because there is no
@@ -1417,7 +1417,7 @@ static int fsldma_of_remove(struct platform_device *op)
struct fsldma_device *fdev;
unsigned int i;
 
-   fdev = dev_get_drvdata(&op->dev);
+   fdev = platform_get_drvdata(op);
dma_async_device_unregister(&fdev->common);
 
fsldma_free_irqs(fdev);
@@ -1428,7 +1428,6 @@ static int fsldma_of_remove(struct platform_device *op)
}
 
iounmap(fdev->regs);
-   dev_set_drvdata(&op->dev, NULL);
kfree(fdev);
 
return 0;
diff --git a/drivers/dma/ppc4xx/adma.c b/drivers/dma/ppc4xx/adma.c
index 5d3d955..e68c51d 100644
--- a/drivers/dma/ppc4xx/adma.c
+++ b/drivers/dma/ppc4xx/adma.c
@@ -4481,7 +4481,7 @@ static int ppc440spe_adma_probe(struct platform_device 
*ofdev)
adev->dev = &ofdev->dev;
adev->common.dev = &ofdev->dev;
INIT_LIST_HEAD(&adev->common.channels);
-   dev_set_drvdata(&ofdev->dev, adev);
+   platform_set_drvdata(ofdev, adev);
 
/* create a channel */
chan = kzalloc(sizeof(*chan), GFP_KERNEL);
@@ -4594,14 +4594,13 @@ out:
  */
 static int ppc440spe_adma_remove(struct platform_device *ofdev)
 {
-   struct ppc440spe_adma_device *adev = dev_get_drvdata(&ofdev->dev);
+   struct ppc440spe_adma_device *adev = platform_get_drvdata(ofdev);
struct device_node *np = ofdev->dev.of_node;
struct resource res;
struct dma_chan *chan, *_chan;
struct ppc_dma_chan_ref *ref, *_ref;
struct ppc440spe_adma_chan *ppc440spe_chan;
 
-   dev_set_drvdata(&ofdev->dev, NULL);
if (adev->id < PPC440SPE_ADMA_ENGINES_NUM)
ppc440spe_adma_devices[adev->id] = -1;
 
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Kernel Oops on xfrm_output_resume

2013-05-23 Thread laurent . jourden


Hello,

I don't know how to handle this, not a lkml subscriber, just reporting 
this bug to the devs.
All 3.9.y kernels crash after about 6 to 9 hours running without error 
messages before the Oops. I'm running 3.8.13 kernel so far without problem.


Even tried 3.10-rc2, same issue.

Context : two machines running with drbd block devices mirrored over 
ipsec. No Oops with kernels < 3.9.y


[29545.764401] BUG: unable to handle kernel NULL pointer dereference at 
0020

[29545.764419] IP: [] xfrm_output_resume+0x293/0x380
[29545.764433] PGD 0
[29545.764439] Oops:  [#1] SMP
[29545.764446] Modules linked in: authenc esp4 xfrm4_mode_transport 
xt_sctp ip6t_REJECT nf_conntrack_ipv6 ip6table_raw ip6table_mangle 
ip6table_filter ip6_tables xt_TCPMSS act_police cls_basic cls_flow 
cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq 
bridge stp llc xt_statistic xt_CT xt_LOG xt_time xt_connlimit xt_realm 
xt_addrtype iptable_raw xt_comment xt_recent xt_policy xt_nat ipt_ULOG 
ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set 
nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp 
nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp 
nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip 
nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp 
nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns 
nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 
nf_conntrack_ftp xt_TPROXY nf_defrag_ipv6 nf_tproxy_core xt_tcpmss 
xt_pkttype xt_physdev xt_owner xt_NF
QUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit 
xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp 
xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables 
deflate ctr twofish_generic twofish_avx_x86_64 twofish_x86_64_3way 
twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx_x86_64 
camellia_x86_64 cast5_avx_x86_64 cast5_generic cast_common des_generic 
xcbc rmd160 crypto_null af_key xfrm_algo ipv6 drbd lru_cache btrfs xor 
zlib_deflate raid6_pq libcrc32c crc32_pclmul crc32c_intel 
ghash_clmulni_intel

[29545.765130] CPU 3
[29545.765135] Pid: 3245, comm: drbd_r_bdata Not tainted 3.9.3-051922 
#14  /DH67BL
[29545.765182] RIP: 0010:[]  [] 
xfrm_output_resume+0x293/0x380

[29545.765220] RSP: 0018:8803faa21af0  EFLAGS: 00010246
[29545.765240] RAX:  RBX: 8804071258f8 RCX: 
0770
[29545.765263] RDX: 076f RSI: 8803fe126000 RDI: 
8804071258f8
[29545.765285] RBP: 8803faa21b18 R08: 0001a8f0 R09: 
a0298313
[29545.765308] R10: ea000ff84980 R11: 88040e801300 R12: 

[29545.765330] R13: 88040728b400 R14: 88040728b43c R15: 

[29545.765353] FS:  () GS:88041f38() 
knlGS:

[29545.765388] CS:  0010 DS:  ES:  CR0: 80050033
[29545.765409] CR2: 0020 CR3: 017d2000 CR4: 
000407e0
[29545.765432] DR0:  DR1:  DR2: 

[29545.765454] DR3:  DR6: 0ff0 DR7: 
0400
[29545.765477] Process drbd_r_bdata (pid: 3245, threadinfo 
8803faa2, task 880403791000)

[29545.765513] Stack:
[29545.765527]   8804071258f8 8804071258f8 
8804077cc89c
[29545.765563]  88040751ec00 8803faa21b28 815cb96e 
8803faa21b50
[29545.765599]  815cb9a6 8804071258f8 8804071258f8 
815c0aa0

[29545.765635] Call Trace:
[29545.765652]  [] xfrm_output2+0xe/0x10
[29545.765673]  [] xfrm_output+0x36/0xe0
[29545.765694]  [] ? xfrm4_extract_output+0xd0/0xd0
[29545.765717]  [] xfrm4_output_finish+0x22/0x40
[29545.765738]  [] xfrm4_output+0x46/0x80
[29545.765760]  [] ip_local_out+0x20/0x30
[29545.765780]  [] ip_queue_xmit+0x15a/0x3d0
[29545.765803]  [] tcp_transmit_skb+0x406/0x920
[29545.765824]  [] tcp_write_xmit+0x189/0xaa0
[29545.765847]  [] ? __kmalloc_reserve.isra.53+0x37/0xa0
[29545.765870]  [] __tcp_push_pending_frames+0x29/0x90
[29545.765892]  [] tcp_send_fin+0x6a/0x190
[29545.765913]  [] tcp_shutdown+0x5e/0x70
[29545.765934]  [] inet_shutdown+0x92/0x130
[29545.765955]  [] kernel_sock_shutdown+0xb/0x10
[29545.765983]  [] drbd_free_sock+0x7d/0xb0 [drbd]
[29545.766008]  [] conn_disconnect.part.48+0x49/0x444 
[drbd]

[29545.766035]  [] ? conn_request_state+0x53/0x80 [drbd]
[29545.766061]  [] drbdd_init+0x225/0x230 [drbd]
[29545.766087]  [] drbd_thread_setup+0x58/0x130 [drbd]
[29545.766112]  [] ? drbd_open+0xa0/0xa0 [drbd]
[29545.766135]  [] kthread+0xbb/0xc0
[29545.766155]  [] ? kthread_create_on_node+0x120/0x120
[29545.766178]  [] ret_from_fork+0x7c/0xb0
[29545.766199]  [] ? kthread_create_on_node+0x120/0x120
[29545.7

Re: XFS assertion from truncate. (3.10-rc2)

2013-05-23 Thread Dave Jones

On Fri, May 24, 2013 at 08:30:38AM +1000, Dave Chinner wrote:

 > >  > Which I put just before the assert that is firing on your machine.
 > >  > 
 > >  > And, obviously, it isn't firing on mine and obviously shouldn't be 
 > > firing on a
 > >  > mask of 0xa068.
 > > 
 > > With this, I get a spew of these when I start a kernel build..
 > > 
 > > [  964.378690] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a95dac0, d 0x880221b5d970 path /davej/tmp/ccN2RrM5.s
 > > [  964.651927] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a95dac0, d 0x88020ff80b90 path /davej/tmp/ccB1Cdmo.s
 > > [  964.867444] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a95dac0, d 0x8802218a5bc0 path /davej/tmp/ccCUaXbG.s
 > > [  965.102661] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a95dac0, d 0x88020ffacde0 path /davej/tmp/cckMLf2X.s
 > > [  967.743312] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a93c200, d 0x88022212c250 path /davej/tmp/ccFMkBbA.s
 > > [  967.947154] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a93c200, d 0x8802226cc6f0 path /davej/tmp/cc5iX4SR.s
 > > [  968.009414] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a988000, d 0x8802219f1970 path /davej/tmp/ccvWCHTZ.o
 > > [  968.091504] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a9898c0, d 0x88022208de10 path /davej/tmp/cc9n6fnm.ld
 > > [  968.107997] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
 > > 0x88020a98dac0, d 0x880221160de0 path /davej/tmp/cc5rlvHu.le
 > 
 > Right, It's printing them out for every truncate that is done.
 > Just print on the conditional that is was triggering the assert...

I'm missing something.. That's what I did ?

-   ASSERT((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
+
+   if ((mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
-   ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0);
+   ATTR_KILL_PRIV|ATTR_TIMES_SET)) == 0) {
+   struct dentry *d = d_find_alias(VFS_I(ip));
+   char buf[MAXPATHLEN];
+   char *ptr;
+
+   memset(buf, 0, MAXPATHLEN);
+   ptr = dentry_path(d, buf, MAXPATHLEN);
+   xfs_warn(mp, "%s: mask 0x%x, masked 0x%x ii 0x%p, d 0x%p path 
%s",
+   __func__, mask,
+   (mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
+   ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
+   ATTR_KILL_PRIV|ATTR_TIMES_SET)),
+   ip, d, ptr);
+   dput(d);
+   }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] x86/iommu: use bit structures for context_entry

2013-05-23 Thread ZhenHua


Use lo and hi for clear, may run faster than memset.

But it is not a big problem.  Never mind.

Thanks
ZhenHua

On 05/24/2013 08:36 AM, Joe Perches wrote:

On Fri, 2013-05-24 at 08:22 +0800, Li, Zhen-Hua wrote:

There is a structure named context_entry used by intel iommu, and there
are some bit operations on it. Use bit structure may make these operations
easy.

[]

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c

[]

@@ -221,55 +221,28 @@ get_context_addr_from_root(struct root_entry *root)
   * 8-23: domain id
   */
  struct context_entry {
-   u64 lo;
-   u64 hi;
+   union {
+   struct {
+   u64 present:1,

why use struct and union at all?

Why not just:

struct context_entry {
u64 present:1,
fpd:1,
translation_type:2,
reserved_l:8,
asr:52;
u64 aw:3,
avail:4,
reserved_h1:1,
did:16,
reserved_h2:40;
};


@@ -743,7 +716,8 @@ static void clear_context_table(struct intel_iommu *iommu, 
u8 bus, u8 devfn)

[]

if (context) {
-   context_clear_entry(&context[devfn]);
+   context[devfn].lo = 0;
+   context[devfn].hi = 0;

memset(&context[devfn], 0, sizeof(...))




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] x86/iommu: use bit structures for context_entry

2013-05-23 Thread Joe Perches

On Fri, 2013-05-24 at 08:22 +0800, Li, Zhen-Hua wrote:
> There is a structure named context_entry used by intel iommu, and there
> are some bit operations on it. Use bit structure may make these operations
> easy.
[]
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
[]
> @@ -221,55 +221,28 @@ get_context_addr_from_root(struct root_entry *root)
>   * 8-23: domain id
>   */
>  struct context_entry {
> - u64 lo;
> - u64 hi;
> + union {
> + struct {
> + u64 present:1,

why use struct and union at all?

Why not just:

struct context_entry {
u64 present:1,
fpd:1,
translation_type:2,
reserved_l:8,
asr:52;
u64 aw:3,
avail:4,
reserved_h1:1,
did:16,
reserved_h2:40;
}; 

> @@ -743,7 +716,8 @@ static void clear_context_table(struct intel_iommu 
> *iommu, u8 bus, u8 devfn)
[]
>   if (context) {
> - context_clear_entry(&context[devfn]);
> + context[devfn].lo = 0;
> + context[devfn].hi = 0;

memset(&context[devfn], 0, sizeof(...))


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/1] x86/iommu: fix dma pte address size error

2013-05-23 Thread Li, Zhen-Hua

In Intel Vt-D specs, Chapter 9.3 Page-Table Entry,
The size of ADDR(address) field is 12:51, but the function dma_pte_addr
 treats it as 12:63.

Signed-off-by: Li, Zhen-Hua 
---
 drivers/iommu/intel-iommu.c   |4 ++--
 include/linux/dma_remapping.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b4f0e28..c6d2847 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -311,10 +311,10 @@ static inline void dma_set_pte_prot(struct dma_pte *pte, 
unsigned long prot)
 static inline u64 dma_pte_addr(struct dma_pte *pte)
 {
 #ifdef CONFIG_64BIT
-   return pte->val & VTD_PAGE_MASK;
+   return pte->val & DMA_PTE_MASK;
 #else
/* Must have a full atomic 64-bit read */
-   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
+   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & DMA_PTE_MASK;
 #endif
 }
 
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 57c9a8a..7a1e212 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -16,6 +16,8 @@
 #define DMA_PTE_WRITE (2)
 #define DMA_PTE_LARGE_PAGE (1 << 7)
 #define DMA_PTE_SNP (1 << 11)
+#define DMA_PTE_ADD_LENGTH (40)
+#define DMA_PTE_MASK   u64)1 << DMA_PTE_ADD_LENGTH) - 1) << VTD_PAGE_SHIFT)
 
 #define CONTEXT_TT_MULTI_LEVEL 0
 #define CONTEXT_TT_DEV_IOTLB   1
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/1] x86/iommu: use bit structures for context_entry

2013-05-23 Thread Li, Zhen-Hua

There is a structure named context_entry used by intel iommu, and there
are some bit operations on it. Use bit structure may make these operations
easy.

Signed-off-by: Li, Zhen-Hua 
---
 drivers/iommu/intel-iommu.c |   88 +++
 1 file changed, 31 insertions(+), 57 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b4f0e28..ae10471 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -221,55 +221,28 @@ get_context_addr_from_root(struct root_entry *root)
  * 8-23: domain id
  */
 struct context_entry {
-   u64 lo;
-   u64 hi;
+   union {
+   struct {
+   u64 present:1,
+   fpd:1,
+   translation_type:2,
+   reserved_l:8,
+   asr:52;
+   };
+   u64 lo;
+   };
+   union {
+   struct {
+   u64 aw:3,
+   avail:4,
+   reserved_h1:1,
+   did:16,
+   reserved_h2:40;
+   };
+   u64 hi;
+   };
 };
 
-static inline bool context_present(struct context_entry *context)
-{
-   return (context->lo & 1);
-}
-static inline void context_set_present(struct context_entry *context)
-{
-   context->lo |= 1;
-}
-
-static inline void context_set_fault_enable(struct context_entry *context)
-{
-   context->lo &= (((u64)-1) << 2) | 1;
-}
-
-static inline void context_set_translation_type(struct context_entry *context,
-   unsigned long value)
-{
-   context->lo &= (((u64)-1) << 4) | 3;
-   context->lo |= (value & 3) << 2;
-}
-
-static inline void context_set_address_root(struct context_entry *context,
-   unsigned long value)
-{
-   context->lo |= value & VTD_PAGE_MASK;
-}
-
-static inline void context_set_address_width(struct context_entry *context,
-unsigned long value)
-{
-   context->hi |= value & 7;
-}
-
-static inline void context_set_domain_id(struct context_entry *context,
-unsigned long value)
-{
-   context->hi |= (value & ((1 << 16) - 1)) << 8;
-}
-
-static inline void context_clear_entry(struct context_entry *context)
-{
-   context->lo = 0;
-   context->hi = 0;
-}
-
 /*
  * 0: readable
  * 1: writable
@@ -727,7 +700,7 @@ static int device_context_mapped(struct intel_iommu *iommu, 
u8 bus, u8 devfn)
ret = 0;
goto out;
}
-   ret = context_present(&context[devfn]);
+   ret = context->present;
 out:
spin_unlock_irqrestore(&iommu->lock, flags);
return ret;
@@ -743,7 +716,8 @@ static void clear_context_table(struct intel_iommu *iommu, 
u8 bus, u8 devfn)
root = &iommu->root_entry[bus];
context = get_context_addr_from_root(root);
if (context) {
-   context_clear_entry(&context[devfn]);
+   context[devfn].lo = 0;
+   context[devfn].hi = 0;
__iommu_flush_cache(iommu, &context[devfn], \
sizeof(*context));
}
@@ -1573,7 +1547,7 @@ static int domain_context_mapping_one(struct dmar_domain 
*domain, int segment,
if (!context)
return -ENOMEM;
spin_lock_irqsave(&iommu->lock, flags);
-   if (context_present(context)) {
+   if (context->present) {
spin_unlock_irqrestore(&iommu->lock, flags);
return 0;
}
@@ -1623,7 +1597,7 @@ static int domain_context_mapping_one(struct dmar_domain 
*domain, int segment,
}
}
 
-   context_set_domain_id(context, id);
+   context->did = id;
 
if (translation != CONTEXT_TT_PASS_THROUGH) {
info = iommu_support_dev_iotlb(domain, segment, bus, devfn);
@@ -1635,15 +1609,15 @@ static int domain_context_mapping_one(struct 
dmar_domain *domain, int segment,
 * AGAW value supported by hardware. And ASR is ignored by hardware.
 */
if (unlikely(translation == CONTEXT_TT_PASS_THROUGH))
-   context_set_address_width(context, iommu->msagaw);
+   context->aw = iommu->msagaw;
else {
-   context_set_address_root(context, virt_to_phys(pgd));
-   context_set_address_width(context, iommu->agaw);
+   context->asr = virt_to_phys(pgd) >> VTD_PAGE_SHIFT;
+   context->aw  = iommu->agaw;
}
 
-   context_set_translation_type(context, translation);
-   context_set_fault_enable(context);
-   context_set_present(context);
+   context->translation_type = translation;
+   context->fpd = 0;
+   context->present = 1;
domain_flush_cac

[PATCH] mutex: do not unnecessarily deal with waiters

2013-05-23 Thread Davidlohr Bueso

From: Davidlohr Bueso 

Upon entering the slowpath, we immediately attempt to acquire the lock
by checking if it is already unlocked. If we are lucky enough that this
is the case, then we don't need to deal with any waiter related logic.

Furthermore any checks for an empty wait_list are unnecessary as we
already know that count is non-negative and hence no one is waiting for
the lock.

Move the count check and xchg calls to be done before any waiters are
setup - including waiter debugging. Upon failure to acquire the lock,
the xchg sets the counter to 0, instead of -1 as it was originally.
This can be done here since we set it back to -1 right at the beginning
of the loop so other waiters are woken up when the lock is released.

When tested on a 8-socket (80 core) system against a vanilla 3.10-rc1
kernel, this patch provides some small performance benefits (+2-6%).
While it could be considered in the noise level, the average percentages
were stable across multiple runs and no performance regressions were seen.
Two big winners, for small amounts of users (10-100), were the short and
compute workloads had a +19.36% and +%15.76% in jobs per minute.
  
Signed-off-by: Davidlohr Bueso 
---
 kernel/mutex.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/mutex.c b/kernel/mutex.c
index ad53a66..a8cd741 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -306,7 +306,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
owner = ACCESS_ONCE(lock->owner);
if (owner && !mutex_spin_on_owner(lock, owner)) {
mspin_unlock(MLOCK(lock), &node);
-   break;
+   goto slowpath;
}
 
if ((atomic_read(&lock->count) == 1) &&
@@ -314,8 +314,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
lock_acquired(&lock->dep_map, ip);
mutex_set_owner(lock);
mspin_unlock(MLOCK(lock), &node);
-   preempt_enable();
-   return 0;
+   goto done;
}
mspin_unlock(MLOCK(lock), &node);
 
@@ -326,7 +325,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
 * the owner complete.
 */
if (!owner && (need_resched() || rt_task(task)))
-   break;
+   goto slowpath;
 
/*
 * The cpu_relax() call is a compiler barrier which forces
@@ -340,6 +339,14 @@ slowpath:
 #endif
spin_lock_mutex(&lock->wait_lock, flags);
 
+   /* once more, can we acquire the lock? */
+   if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, 0) == 1)) {
+   lock_acquired(&lock->dep_map, ip);
+   mutex_set_owner(lock);
+   spin_unlock_mutex(&lock->wait_lock, flags);
+   goto done;
+   }
+
debug_mutex_lock_common(lock, &waiter);
debug_mutex_add_waiter(lock, &waiter, task_thread_info(task));
 
@@ -347,9 +354,6 @@ slowpath:
list_add_tail(&waiter.list, &lock->wait_list);
waiter.task = task;
 
-   if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, -1) == 1))
-   goto done;
-
lock_contended(&lock->dep_map, ip);
 
for (;;) {
@@ -363,7 +367,7 @@ slowpath:
 * other waiters:
 */
if (MUTEX_SHOW_NO_WAITER(lock) &&
-  (atomic_xchg(&lock->count, -1) == 1))
+   (atomic_xchg(&lock->count, -1) == 1))
break;
 
/*
@@ -388,9 +392,8 @@ slowpath:
spin_lock_mutex(&lock->wait_lock, flags);
}
 
-done:
+   /* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);
-   /* got the lock - rejoice! */
mutex_remove_waiter(lock, &waiter, current_thread_info());
mutex_set_owner(lock);
 
@@ -399,10 +402,9 @@ done:
atomic_set(&lock->count, 0);
 
spin_unlock_mutex(&lock->wait_lock, flags);
-
debug_mutex_free_waiter(&waiter);
+done:
preempt_enable();
-
return 0;
 }
 
-- 
1.7.11.7




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas

2013-05-23 Thread Andrew Morton

On Wed, 15 May 2013 07:46:36 -0500 Cliff Wickman  wrote:

> Certain tests in walk_page_range() (specifically split_huge_page_pmd())
> assume that all the mapped PFN's are backed with page structures. And this is
> not usually true for VM_PFNMAP areas. This can result in panics on kernel
> page faults when attempting to address those page structures.
> 
> There are a half dozen callers of walk_page_range() that walk through
> a task's entire page table (as N. Horiguchi pointed out). So rather than
> change all of them, this patch changes just walk_page_range() to ignore 
> VM_PFNMAP areas.
> 
> The logic of hugetlb_vma() is moved back into walk_page_range(), as we
> want to test any vma in the range.
> 
> VM_PFNMAP areas are used by:
> - graphics memory manager   gpu/drm/drm_gem.c
> - global reference unit sgi-gru/grufile.c
> - sgi special memorychar/mspec.c
> - and probably several out-of-tree modules

What are your thoughts on the urgency/scheduling of this fix?


(Just to be irritating: "When writing a changelog, please describe the
end-user-visible effects of the bug, so that others can more easily
decide which kernel version(s) should be fixed, and so that downstream
kernel maintainers can more easily work out whether this patch will fix
a problem which they or their customers are observing.")
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: OOPS in perf_mmap_close()

2013-05-23 Thread Vince Weaver

On Thu, 23 May 2013, Peter Zijlstra wrote:

> On Thu, May 23, 2013 at 10:10:36AM -0400, Vince Weaver wrote:
> > 
> > I can confirm your patch avoids the oops on my machine.
> > 
> > It does lead to interesting behavior if I run the sample program
> > multiple times (with added printfs):
> > 
> > vince@core2:~$ ./perf_mmap_close_bug 
> > mmap1=0x7f06a6e9
> > mmap2=0x7f06a6e7f000
> > vince@core2:~$ ./perf_mmap_close_bug 
> > mmap1=0x7f878a138000
> > mmap2=0x7f878a127000
> > vince@core2:~$ ./perf_mmap_close_bug 
> > mmap1=0x
> > Error opening fd2 Invalid argument
> > 
> > and then it never successfully completes again.  Is this unexpected 
> > behavior?  
> 
> Sounds weird to me, I'll see if I can reproduce/understand.

I tracked this down in case you haven't already.

The problem is that in the kernel patched 
with your patch locked_vm is getting decremented twice in the sample code 
and going negative.  I'm not sure why this isn't a problem until the 
third time through.  Here are my crude debug printk 
results from kernel/events/core.c

[   28.684862] user_extra: 17 user_lock_limit: 129
[   28.698458] user_locked: 17 locked_vm: 0 user_extra 17
[   28.713853] locked: 0 locked_vm: 0 pinned_vm: 0 extra: 0 lock_limit: 16
[   28.733728] perf_mmap: locked_vm: 17
[   28.744509] mmap_close: locked_vm=0
[   28.754939] mmap_close: locked_vm=-17
[   29.472741] user_extra: 17 user_lock_limit: 129
[   29.486332] user_locked: 0 locked_vm: -17 user_extra 17
[   29.501996] locked: 0 locked_vm: 0 pinned_vm: 0 extra: 0 lock_limit: 16
[   29.521874] perf_mmap: locked_vm: 0
[   29.532400] mmap_close: locked_vm=-17
[   29.543352] mmap_close: locked_vm=-34
[   30.028236] user_extra: 17 user_lock_limit: 129
[   30.041835] user_locked: -17 locked_vm: -34 user_extra 17
[   30.058018] extra: -275 user_locked: -17 user_lock_limit: 258
[   30.075232] locked: -275 locked_vm: 0 pinned_vm: 0 extra: -275 lock_limit: 16

Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] proc: move proc mount options out of pid_namespace

2013-05-23 Thread Stephen Mell

On Thursday, May 23, 2013 18:20:57 Gu Zheng wrote:

> Here it'll create a new proc sb instance which holds the same context as the 
> old ones
> each time we mount proc though in the same PID namespace, won't it?
I believe so. But this is the point, right? They won't be identical if 
different mount options are used, I don't think.

> Here the pre-check seems needless.
Is that new with my patch, or has it always been needless?

Thanks,
Stephen
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] msm: iommu: fix leak and invalid access

2013-05-23 Thread Andrew Morton

On Wed, 15 May 2013 12:47:03 +0800 Libo Chen  
wrote:

> It fixes two obvious problems:
> 1. We have registered msm_iommu_driver first, and need unregister it when
> registered msm_iommu_ctx_driver fail

yup, that's a bug.

> 2. We don`t need to kfree drvdata before kzalloc successful

The code's OK at present - kfree(NULL) is legal.

However I suppose the patch cleans things up a little bit, however it
missed a couple of things:



From: Andrew Morton 
Subject: drivers-iommu-msm_iommu_devc-fix-leak-and-invalid-access-fix

remove now-unneeded initialization of ctx_drvdata, remove unneeded braces

Cc: David Brown 
Cc: David Woodhouse 
Cc: James Hogan 
Cc: Libo Chen 
Cc: Libo Chen 
Signed-off-by: Andrew Morton 
---

 drivers/iommu/msm_iommu_dev.c |   13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff -puN 
drivers/iommu/msm_iommu_dev.c~drivers-iommu-msm_iommu_devc-fix-leak-and-invalid-access-fix
 drivers/iommu/msm_iommu_dev.c
--- 
a/drivers/iommu/msm_iommu_dev.c~drivers-iommu-msm_iommu_devc-fix-leak-and-invalid-access-fix
+++ a/drivers/iommu/msm_iommu_dev.c
@@ -289,22 +289,19 @@ static int msm_iommu_ctx_probe(struct pl
 {
struct msm_iommu_ctx_dev *c = pdev->dev.platform_data;
struct msm_iommu_drvdata *drvdata;
-   struct msm_iommu_ctx_drvdata *ctx_drvdata = NULL;
+   struct msm_iommu_ctx_drvdata *ctx_drvdata;
int i, ret;
-   if (!c || !pdev->dev.parent) {
+
+   if (!c || !pdev->dev.parent)
return -EINVAL;
-   }
 
drvdata = dev_get_drvdata(pdev->dev.parent);
-
-   if (!drvdata) {
+   if (!drvdata)
return -ENODEV;
-   }
 
ctx_drvdata = kzalloc(sizeof(*ctx_drvdata), GFP_KERNEL);
-   if (!ctx_drvdata) {
+   if (!ctx_drvdata)
return -ENOMEM;
-   }
 
ctx_drvdata->num = c->num;
ctx_drvdata->pdev = pdev;
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Suggestion] fs/namespace.c: the direct cause of the warning for: "ida_remove called for id=0 which is not allocated" with mnt_release_group_id()

2013-05-23 Thread Andrew Morton

On Tue, 14 May 2013 17:06:30 +0800 Chen Gang  wrote:

> After call collect_mounts(), then call drop_collected_mounts(), it will
> report an warning: "ida_remove called for id=0 which is not allocated"
> (one sample is audit_add_tree_rule() in kernel/audit_tree.c).
> 
> The direct cause (maybe also be the root cause):
>   collect_mounts() passs 'CL_PRIVATE' to copy_tree() -> clone_mnt().
>   it will set "mnt->mnt_group_id = 0" in clone_mnt().
>   when drop_collected_mounts() -> mnt_release_group_id(), 'mnt->mnt_group_id 
> == 0'.

I expect this patch also addresses the bug.  Can you please review and test it?


From: Takashi Iwai 
Subject: vfs: fix invalid ida_remove() call

When the group id of a shared mount is not allocated, the umount still
tries to call mnt_release_group_id(), which eventually hits a kernel
warning at ida_remove() spewing a message like:

  ida_remove called for id=0 which is not allocated.

This patch fixes the bug simply checking the group id in the caller.

Signed-off-by: Takashi Iwai 
Reported-by: Cristian Rodríguez 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
---

 fs/pnode.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/pnode.c~vfs-fix-invalid-ida_remove-call fs/pnode.c
--- a/fs/pnode.c~vfs-fix-invalid-ida_remove-call
+++ a/fs/pnode.c
@@ -83,7 +83,8 @@ static int do_make_slave(struct mount *m
if (peer_mnt == mnt)
peer_mnt = NULL;
}
-   if (IS_MNT_SHARED(mnt) && list_empty(&mnt->mnt_share))
+   if (mnt->mnt_group_id && IS_MNT_SHARED(mnt) &&
+   list_empty(&mnt->mnt_share))
mnt_release_group_id(mnt);
 
list_del_init(&mnt->mnt_share);
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor

2013-05-23 Thread Russell King - ARM Linux

On Thu, May 23, 2013 at 10:54:26AM -0700, Stephen Boyd wrote:
> On 05/15/13 12:38, Stephen Boyd wrote:
> > On 05/08/13 14:47, Stephen Boyd wrote:
> >> From: Brian Swetland 
> >>
> >> Currently v7 CPUs with an MIDR that has no bits set in the range
> >> [16:12] will be detected as old ARM CPUs with no caches and so
> >> the cache will never be turned on during decompression. ARM's
> >> Cortex chips have an 0xC in the range [16:12] so they never match
> >> this entry, but Qualcomm's Scorpion and Krait processors never
> >> set these bits to anything besides 0 so they always match.
> >>
> >> Skip this entry if we've compiled in support for v7 CPUs. This
> >> allows kernel decompression to happen nearly instantly instead of
> >> taking over 20 seconds.
> >>
> >> Signed-off-by: Brian Swetland 
> >> [sboyd: Clarified and extended commit text]
> >> Signed-off-by: Stephen Boyd 
> >> ---
> > Ping?
> 
> Russell, shall I add this to the patch tracker?

Yes please.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v7] arm: use built-in byte swap function

2013-05-23 Thread Russell King - ARM Linux

On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
> 
>text  data bss dec hex filename
> 2840310123932   61960 3026202  2e2d1a vmlinux-lart #orig
> 2840152123932   61960 3026044  2e2c7c vmlinux-lart #builtin-bswap
> 
> 6473120314840 5616016 12403976 bd4508 vmlinux-mxs #orig
> 6472586314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
> 
> 7419872318372  379556 8117800  7bde28 vmlinux-imx_v6_v7 #orig
> 7419170318364  379556 8117090  7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
> 
> Signed-off-by: Kim Phillips 
> Reviewed-by: Nicolas Pitre 
> Acked-by: David Woodhouse 
> ---
> resending as v6 appears to have fallen though the cracks.  Russell?

Please put it in the patch system (otherwise I do drop patches.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/PCI: setup data may be in highmem

2013-05-23 Thread H. Peter Anvin

On 05/22/2013 02:43 AM, Matt Fleming wrote:
> From: Matt Fleming 
> 
> pcibios_add_device() assumes that the physical addresses stored in
> setup_data are accessible via the direct kernel mapping, and that
> calling phys_to_virt() is valid. This isn't guaranteed to be true on x86
> where the direct mapping range is much smaller than on x86-64.
> 
> Calling phys_to_virt() on a highmem address results in the following,
> 

Bjorn: yours or mine?

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/7] staging/silicom/bypasslib: Reformat comments

2013-05-23 Thread Joe Perches

On Thu, 2013-05-23 at 13:51 -0700, Lisa Nguyen wrote:
> Resolved the C99 comment style issue by reformatting existing comments 
> to meet kernel coding standards in bp_ioctl.h
[]
> diff --git a/drivers/staging/silicom/bypasslib/bp_ioctl.h 
> b/drivers/staging/silicom/bypasslib/bp_ioctl.h
[]
> -#define BP_CAP   0x01//BIT_0
[...]
> +#define BP_CAP   0x01/* BIT_0 */

Better to use the more self-documenting

#define BP_CAP  BIT(0)

etc...  with out any comment at all.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] net: mv643xx_eth: proper initialization for Kirkwood SoCs

2013-05-23 Thread Sebastian Hesselbarth


On 05/23/2013 08:40 PM, Jason Cooper wrote:

On Thu, May 23, 2013 at 11:53:57AM -0600, Jason Gunthorpe wrote:

On Thu, May 23, 2013 at 01:23:39PM -0400, Jason Cooper wrote:

Shouldn't it rather be

compatible = "marvell,kirkwood-eth", "marvell,orion-eth";


Not sure about orion-eth?


Jason, Jason,

sorry I didn't came back to this conversation earlier. I already
reworked the patch to rely on 
of_device_is_compatible(.."marvell,kirkwood-eth"..). This is a

kirkwood only thing as other Orions cannot do clock gating or
retain critcal register content (Dove). I will stick with orion-eth
for all other and maybe introduce new compatible strings (and new
fixes) as soon as issues surface.


I'm inclined to go with of_machine_is_compatible() since the only
concrete difference we know is that the tweak is needed on kirkwood and
nowhere else.


But there is a larger problem here then just this one bit.

The PSC1 register must be set properly for the board layout, and today
we rely on the bootloader to set it. In fact, even with Sebastian's
change the ethernet port won't work without bootloader
intervention. The PortReset bit should also be cleared by the driver
(and it is only present on some variants of this IP block,
apparently).


Actually, fixing modular scenarios is only for the sake of multiarch
someday. I don't see the point in running current kernel without eth
compiled in _on a NAS SoC_ ;)

On Dockstar which I tested, clearing CLK125_BYPASS_EN to make it work
after clock gating, it might be a coincidence that bootloader's PSC1
setup matches reset value here - so please test the patch on other
Kirkwood boards also.

But, as long as no other issue arise, I will not start to modifiy
mv643xx_eth out of the blue. I has been working for ages and breaking
PPC is not my intention. There are other things David Miller already
requested to get fixed and honestly I even thought about a fresh start
for it. Maybe I'll come back to it when barebox gets it's driver
someday.


We know that some Marvell SOC's wack the ethernet registers when they
clock gate, and the flip of Clk125Bypass is another symptom of this
general problem.


Which SoCs except Kirkwood? I cannot reproduce any of this behavior on
Dove - and from what I can see from the FS of Orion5x or MV78x00 there
are no clock gating registers.


So, long term, the PSC1 must be fully set by the driver, based on DT
information describing the board (eg RGMII/MII/1000Base-X [SFP] Phy
type), and the layout of this register seems to vary on a SOC by SOC
basis.


Agree, but I tend to not go at it now. mv643xx_eth has never set up that
registers and actually it never connects anything else than GMII phy (or 
no phy at all). The latter is easy but the for the other, I like to

give up that brain-dead multi-device driver and stick with one device
for both shared and up to three ports. From what I can see from e.g.
ixgbe or any other multi-port eth drivers they all attach the network
device to a single (pci) device.


Thus, I think it is appropriate to call this variant of the eth IP
'marvell,kirkwood-eth' which indicates that the register block follows
the kirkwood manual and the PSC1 register specifically has the
kirkwood layout.


Ok, so mv643xx_eth would match both "marvell,orion-eth" and
"marvell,kirkwood-eth", then write to PSC1 iff it sees a node matching
"marvell,kirkwood-eth".  I'm not too keen on that, however, the matching
of the machine doesn't look to good, either.


I didn't choose "marvell,mv643xx-eth" for two reasons
(a) The DT layout is slightly different with phy-handle instead of phy
and marvell prefixed properties. Choosing a compatible string that
matches any PPC compatible string will cause driver racing with sysdev
code to set up platform_data.

(b) I chose to name the controller "orion-eth" and the port
"orion-eth-port" .. PPC has "mv64360-eth" for the port and some 
"mv64360-eth-block" or "-group" for the controller. IMHO not intuitive,

but it just a name anyway.


Perhaps a better answer is to add a boolean, "marvell,kirkwood_psc1" and
check for that?

Or, marvell,psc1_reset =<0xWWXXYYZZ>;


For the _long_ run: Exploit either already present phy properties for
MII and friends or introduce new marvell prefixed .. but not misuse DT
for register values here. Each SoC should setup mv643xx_eth properly,
but that should be based on a clean approach _and_ enough people willing
to test that.

I just have a Dockstar and Topkick which is running 24/7. I didn't even
check if only 6281 suffers from it or also 6282 or maybe only some
revisions of 6281. This patch is a fix, nothing more nothing
less. If you have Kirkwoods around, please test if it suffers from
loosing the MAC address and if it works after insmod with the fix
installed.


The question is what other Marvell SOCs have the same PSC1 layout as
kirkwood?


I think marvell,psc1_reset =<>; gives us the most flexibility in
accurately describing the hardware.


IMHO using that is just another workar

Re: [f2fs-dev] [PATCH 3/4] f2fs: return proper error from start_gc_thread

2013-05-23 Thread Namjae Jeon

2013/5/23, Jason Hrycay :
> On 5/23/2013 8:58 AM, Namjae Jeon wrote:
>> From: Namjae Jeon 
>>
>> when there is an error from kthread_run, then return proper error
>> rather than returning -ENOMEM.
>>
>> Signed-off-by: Namjae Jeon 
>> Signed-off-by: Amit Sahrawat 
>> ---
>>  fs/f2fs/gc.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index 25b083c..03d5ba1 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -105,7 +105,7 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
>>  if (IS_ERR(gc_th->f2fs_gc_task)) {
>>  kfree(gc_th);
>
> gc_th is free'd here, save off PTR_ERR result to avoid use-after-free?
Yes, I will change this patch.
Thanks for review :)
>
>>  sbi->gc_thread = NULL;
>> -return -ENOMEM;
>> +return PTR_ERR(gc_th->f2fs_gc_task);
>>  }
>>  return 0;
>>  }
>>
>
> --
> Jason Hrycay
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: XFS assertion from truncate. (3.10-rc2)

2013-05-23 Thread Dave Chinner

On Thu, May 23, 2013 at 02:49:48PM -0400, Dave Jones wrote:
> On Thu, May 23, 2013 at 07:54:54AM +1000, Dave Chinner wrote:
> 
>  > Gah, I've got not idea what the hell I was smoking yesterday
>  > afternoon. 0x2000 is actually ATTR_FILE, and 0x8000 is ATTR_OPEN.
>  > 
>  > So a mask of 0xa068 is correct and valid from the open path, and
>  > 0x2068 is just file from the truncate path.
>  > 
>  > But, neither of those should trigger that assert. indeed, on a test
>  > kernel (3.10-rc2 based):
>  > 
>  > # echo I need a new drug > /mnt/scr/bah/blah/black/sheep/foo
>  > [  296.742990] XFS (vdb): xfs_setattr_size: mask 0xa068, masked # 0x0 ii 
> 0x88003e6297c0, d 0x88003e5b9cb0 path /bah/blah/black/sheep/foo
>  > #
>  > 
>  > And there's not assert failure. Indeed, the "masked # 0x0" is what
>  > the assert is checking.
>  > 
>  > And yeah, path output works. Trick for anyone who doesn't read the
>  > code closely - the buffer is filled from the end backwards, and the
>  > start of the path is the return variable. So, the above code is:
>  > 
>  >{
>  >struct dentry *d = d_find_alias(VFS_I(ip));
>  >char buf[MAXPATHLEN];
>  >char *ptr;
>  > 
>  >memset(buf, 0, MAXPATHLEN);
>  >ptr = dentry_path(d, buf, MAXPATHLEN);
>  >xfs_warn(mp, "%s: mask 0x%x, masked 0x%x ii 0x%p, d 0x%p path 
> %s",
>  >__func__, mask,
>  >(mask & (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_ATIME|ATTR_ATIME_SET|
>  >ATTR_MTIME_SET|ATTR_KILL_SUID|ATTR_KILL_SGID|
>  >ATTR_KILL_PRIV|ATTR_TIMES_SET)),
>  >ip, d, ptr);
>  >dput(d);
>  >}
>  > 
>  > Which I put just before the assert that is firing on your machine.
>  > 
>  > And, obviously, it isn't firing on mine and obviously shouldn't be firing 
> on a
>  > mask of 0xa068.
> 
> With this, I get a spew of these when I start a kernel build..
> 
> [  964.378690] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x880221b5d970 path /davej/tmp/ccN2RrM5.s
> [  964.651927] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x88020ff80b90 path /davej/tmp/ccB1Cdmo.s
> [  964.867444] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x8802218a5bc0 path /davej/tmp/ccCUaXbG.s
> [  965.102661] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a95dac0, d 0x88020ffacde0 path /davej/tmp/cckMLf2X.s
> [  967.743312] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a93c200, d 0x88022212c250 path /davej/tmp/ccFMkBbA.s
> [  967.947154] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a93c200, d 0x8802226cc6f0 path /davej/tmp/cc5iX4SR.s
> [  968.009414] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a988000, d 0x8802219f1970 path /davej/tmp/ccvWCHTZ.o
> [  968.091504] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a9898c0, d 0x88022208de10 path /davej/tmp/cc9n6fnm.ld
> [  968.107997] XFS (sda2): xfs_setattr_size: mask 0xa068, masked 0x0 ii 
> 0x88020a98dac0, d 0x880221160de0 path /davej/tmp/cc5rlvHu.le

Right, It's printing them out for every truncate that is done.
Just print on the conditional that is was triggering the assert...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH RFC] sched/rt: preserve global runtime/period ratio in do_balance_runtime()

2013-05-23 Thread Peter Boonstoppel

I think my patch was aiming at enabling rt_runtime borrowing, while preserving 
the global invariant of not allowing 100% allocation of time to rt_rq. 
Disabling RT_RUNTIME_SHARE definitely addresses the safety mechanism concern.

However, I'd like to understand the original goal of this rt_runtime borrowing 
a little bit better. At a high level it seems to me that bandwidth allocation 
(with possible borrowing of unused bandwidth) and providing a safety mechanism 
against rogue rt threads are two independent goals, which just happen to share 
the same mechanism. And if you'd want both, you'd have to provision some amount 
of bandwidth that cannot be borrowed.

Other than that I second Mike's change for flipping the default.

Peter

From: Mike Galbraith [bitbuc...@online.de]
Sent: Wednesday, May 22, 2013 1:46 AM
To: Peter Zijlstra
Cc: Peter Boonstoppel; Ingo Molnar; linux-kernel@vger.kernel.org; Paul Walmsley
Subject: Re: [PATCH RFC] sched/rt: preserve global runtime/period ratio in 
do_balance_runtime()

On Wed, 2013-05-22 at 10:16 +0200, Peter Zijlstra wrote:

> I thought we already had a knob for that; RT_RUNTIME_SHARE. And as Mike
> said, we might consider flipping the default on that.

sched,rt: disable rt_runtime borrowing by default

Make the default RT_RUNTIME_SHARE setting reflect the most common
throttle role, that of safety mechanism to protect the box.

Signed-off-by: Mike Galbraith 

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..0945d38 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -57,7 +57,7 @@ SCHED_FEAT(NONTASK_POWER, true)
 SCHED_FEAT(TTWU_QUEUE, true)

 SCHED_FEAT(FORCE_SD_OVERLAP, false)
-SCHED_FEAT(RT_RUNTIME_SHARE, true)
+SCHED_FEAT(RT_RUNTIME_SHARE, false)
 SCHED_FEAT(LB_MIN, false)

 /*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore

2013-05-23 Thread Andrew Morton

On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke  
wrote:

> This patch introduces mmap_vmcore().
> 
> Don't permit writable nor executable mapping even with mprotect()
> because this mmap() is aimed at reading crash dump memory.
> Non-writable mapping is also requirement of remap_pfn_range() when
> mapping linear pages on non-consecutive physical pages; see
> is_cow_mapping().
> 
> Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
> remap_vmalloc_range_pertial at the same time for a single
> vma. do_munmap() can correctly clean partially remapped vma with two
> functions in abnormal case. See zap_pte_range(), vm_normal_page() and
> their comments for details.
> 
> On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
> limitation comes from the fact that the third argument of
> remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.

More reviewing and testing, please.


From: Andrew Morton 
Subject: vmcore-support-mmap-on-proc-vmcore-fix

use min(), switch to conventional error-unwinding approach

Cc: Atsushi Kumagai 
Cc: HATAYAMA Daisuke 
Cc: KOSAKI Motohiro 
Cc: Lisa Mitchell 
Cc: Vivek Goyal 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
---

 fs/proc/vmcore.c |   27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff -puN fs/proc/vmcore.c~vmcore-support-mmap-on-proc-vmcore-fix 
fs/proc/vmcore.c
--- a/fs/proc/vmcore.c~vmcore-support-mmap-on-proc-vmcore-fix
+++ a/fs/proc/vmcore.c
@@ -218,9 +218,7 @@ static int mmap_vmcore(struct file *file
if (start < elfcorebuf_sz) {
u64 pfn;
 
-   tsz = elfcorebuf_sz - start;
-   if (size < tsz)
-   tsz = size;
+   tsz = min(elfcorebuf_sz - (size_t)start, size);
pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
vma->vm_page_prot))
@@ -236,15 +234,11 @@ static int mmap_vmcore(struct file *file
if (start < elfcorebuf_sz + elfnotes_sz) {
void *kaddr;
 
-   tsz = elfcorebuf_sz + elfnotes_sz - start;
-   if (size < tsz)
-   tsz = size;
+   tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start, size);
kaddr = elfnotes_buf + start - elfcorebuf_sz;
if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
-   kaddr, tsz)) {
-   do_munmap(vma->vm_mm, vma->vm_start, len);
-   return -EAGAIN;
-   }
+   kaddr, tsz))
+   goto fail;
size -= tsz;
start += tsz;
len += tsz;
@@ -257,16 +251,12 @@ static int mmap_vmcore(struct file *file
if (start < m->offset + m->size) {
u64 paddr = 0;
 
-   tsz = m->offset + m->size - start;
-   if (size < tsz)
-   tsz = size;
+   tsz = min_t(size_t, m->offset + m->size - start, size);
paddr = m->paddr + start - m->offset;
if (remap_pfn_range(vma, vma->vm_start + len,
paddr >> PAGE_SHIFT, tsz,
-   vma->vm_page_prot)) {
-   do_munmap(vma->vm_mm, vma->vm_start, len);
-   return -EAGAIN;
-   }
+   vma->vm_page_prot))
+   goto fail;
size -= tsz;
start += tsz;
len += tsz;
@@ -277,6 +267,9 @@ static int mmap_vmcore(struct file *file
}
 
return 0;
+fail:
+   do_munmap(vma->vm_mm, vma->vm_start, len);
+   return -EAGAIN;
 }
 
 static const struct file_operations proc_vmcore_operations = {
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [regression, bisected] x86: efi: Pass boot services variable info to runtime code

2013-05-23 Thread Russ Anderson

On Thu, May 23, 2013 at 12:58:01PM +0100, Matt Fleming wrote:
> On Wed, 22 May, at 11:27:47AM, Russ Anderson wrote:
> > [6.062157] EFI Variables Facility v0.08 2004-May-17
> > [6.067731] BUG: unable to handle kernel paging request at 
> > 7ca95b10
> > [6.075519] IP: [] 0x88007dbf213f
> 
> This is a bit of a head scratcher. Could you paste the EFI memmap
> entries in your dmesg for the regions that cover 0x7ca95b10 and
> 0x7dbf2140?  My guess would be that they're EFI runtime code regions,
> which would at least explain why we seem to be executing code in the
> direct mapping region (0x8800).
> 
> Are you booting via the EFI boot stub?

Interesting data point.  The failure is on a rhel7/grub2 root.
The identical kernel on a rhel6/grub root boots.  So maybe
grub2 brings out the failure?  I suspect Fedora19/grub2 on
EFI should hit the problem (for someone looking to reproduce
it).

In both cases the kernel boot line options are the same.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc  r...@sgi.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 6/9] vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory

2013-05-23 Thread Andrew Morton

On Thu, 23 May 2013 14:25:30 +0900 HATAYAMA Daisuke  
wrote:

> The reasons why we don't allocate ELF note segment in the 1st kernel
> (old memory) on page boundary is to keep backward compatibility for
> old kernels, and that if doing so, we waste not a little memory due to
> round-up operation to fit the memory to page boundary since most of
> the buffers are in per-cpu area.
> 
> ELF notes are per-cpu, so total size of ELF note segments depends on
> number of CPUs. The current maximum number of CPUs on x86_64 is 5192,
> and there's already system with 4192 CPUs in SGI, where total size
> amounts to 1MB. This can be larger in the near future or possibly even
> now on another architecture that has larger size of note per a single
> cpu. Thus, to avoid the case where memory allocation for large block
> fails, we allocate vmcore objects on vmalloc memory.
> 
> This patch adds elfnotes_buf and elfnotes_sz variables to keep pointer
> to the ELF note segment buffer and its size. There's no longer the
> vmcore object that corresponds to the ELF note segment in
> vmcore_list. Accordingly, read_vmcore() has new case for ELF note
> segment and set_vmcore_list_offsets_elf{64,32}() and other helper
> functions starts calculating offset from sum of size of ELF headers
> and size of ELF note segment.
> 
> ...
>
> @@ -154,6 +157,26 @@ static ssize_t read_vmcore(struct file *file, char 
> __user *buffer,
>   return acc;
>   }
>  
> + /* Read Elf note segment */
> + if (*fpos < elfcorebuf_sz + elfnotes_sz) {
> + void *kaddr;
> +
> + tsz = elfcorebuf_sz + elfnotes_sz - *fpos;
> + if (buflen < tsz)
> + tsz = buflen;

We have min().

>
> ...
>
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> +char **notes_buf, size_t *notes_sz)
> +{
> + int i, nr_ptnote=0, rc=0;
> + char *tmp;
> + Elf64_Ehdr *ehdr_ptr;
> + Elf64_Phdr phdr;
> + u64 phdr_sz = 0, note_off;
> +
> + ehdr_ptr = (Elf64_Ehdr *)elfptr;
> +
> + rc = update_note_header_size_elf64(ehdr_ptr);
> + if (rc < 0)
> + return rc;
> +
> + rc = get_note_number_and_size_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz);
> + if (rc < 0)
> + return rc;
> +
> + *notes_sz = roundup(phdr_sz, PAGE_SIZE);
> + *notes_buf = vzalloc(*notes_sz);

I think this gets leaked in a number of places.

> + if (!*notes_buf)
> + return -ENOMEM;
> +
> + rc = copy_notes_elf64(ehdr_ptr, *notes_buf);
> + if (rc < 0)
> + return rc;
> +
>   /* Prepare merged PT_NOTE program header. */
>   phdr.p_type= PT_NOTE;
>   phdr.p_flags   = 0;
>   note_off = sizeof(Elf64_Ehdr) +
>   (ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> - phdr.p_offset  = note_off;
> + phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>   phdr.p_vaddr   = phdr.p_paddr = 0;
>   phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>   phdr.p_align   = 0;

Please review and test:

From: Andrew Morton 
Subject: vmcore-allocate-elf-note-segment-in-the-2nd-kernel-vmalloc-memory-fix

use min(), fix error-path vzalloc() leaks

Cc: Atsushi Kumagai 
Cc: HATAYAMA Daisuke 
Cc: KOSAKI Motohiro 
Cc: Lisa Mitchell 
Cc: Vivek Goyal 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
---

 fs/proc/vmcore.c |   16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff -puN 
fs/proc/vmcore.c~vmcore-allocate-elf-note-segment-in-the-2nd-kernel-vmalloc-memory-fix
 fs/proc/vmcore.c
--- 
a/fs/proc/vmcore.c~vmcore-allocate-elf-note-segment-in-the-2nd-kernel-vmalloc-memory-fix
+++ a/fs/proc/vmcore.c
@@ -142,9 +142,7 @@ static ssize_t read_vmcore(struct file *
 
/* Read ELF core header */
if (*fpos < elfcorebuf_sz) {
-   tsz = elfcorebuf_sz - *fpos;
-   if (buflen < tsz)
-   tsz = buflen;
+   tsz = min(elfcorebuf_sz - (size_t)*fpos, buflen);
if (copy_to_user(buffer, elfcorebuf + *fpos, tsz))
return -EFAULT;
buflen -= tsz;
@@ -161,9 +159,7 @@ static ssize_t read_vmcore(struct file *
if (*fpos < elfcorebuf_sz + elfnotes_sz) {
void *kaddr;
 
-   tsz = elfcorebuf_sz + elfnotes_sz - *fpos;
-   if (buflen < tsz)
-   tsz = buflen;
+   tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
if (copy_to_user(buffer, kaddr, tsz))
return -EFAULT;
@@ -179,9 +175,7 @@ static ssize_t read_vmcore(struct file *
 
list_for_each_entry(m, &vmcore_list, list) {
if (*fpos < m->offset + m->size) {
-   tsz = m->offset + m->size - *fpos;
-   if (buf

Re: [PATCH v8 5/9] vmalloc: introduce remap_vmalloc_range_partial

2013-05-23 Thread Andrew Morton

On Thu, 23 May 2013 14:25:24 +0900 HATAYAMA Daisuke  
wrote:

> We want to allocate ELF note segment buffer on the 2nd kernel in
> vmalloc space and remap it to user-space in order to reduce the risk
> that memory allocation fails on system with huge number of CPUs and so
> with huge ELF note segment that exceeds 11-order block size.
> 
> Although there's already remap_vmalloc_range for the purpose of
> remapping vmalloc memory to user-space, we need to specify user-space
> range via vma. Mmap on /proc/vmcore needs to remap range across
> multiple objects, so the interface that requires vma to cover full
> range is problematic.
> 
> This patch introduces remap_vmalloc_range_partial that receives
> user-space range as a pair of base address and size and can be used
> for mmap on /proc/vmcore case.
> 
> remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
> 
> ...
>
> +int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long 
> uaddr,
> + void *kaddr, unsigned long size)
>  {
>   struct vm_struct *area;
> - unsigned long uaddr = vma->vm_start;
> - unsigned long usize = vma->vm_end - vma->vm_start;
>  
> - if ((PAGE_SIZE-1) & (unsigned long)addr)
> + size = PAGE_ALIGN(size);
> +
> + if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
> + ((PAGE_SIZE-1) & (unsigned long)kaddr))
>   return -EINVAL;

hm, that's ugly.


Why don't we do this:

From: Andrew Morton 
Subject: include/linux/mm.h: add PAGE_ALIGNED() helper

To test whether an address is aligned to PAGE_SIZE.

Cc: HATAYAMA Daisuke 
Cc: "Eric W. Biederman" , 
Signed-off-by: Andrew Morton 
---

 include/linux/mm.h |3 +++
 1 file changed, 3 insertions(+)

diff -puN include/linux/mm.h~a include/linux/mm.h
--- a/include/linux/mm.h~a
+++ a/include/linux/mm.h
@@ -52,6 +52,9 @@ extern unsigned long sysctl_admin_reserv
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
 
+/* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE 
*/
+#define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)addr, PAGE_SIZE)
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way
_


(I'd have thought we already had such a thing, but we don't seem to)


Then this:

From: Andrew Morton 
Subject: vmalloc-introduce-remap_vmalloc_range_partial-fix

use PAGE_ALIGNED()

Cc: Atsushi Kumagai 
Cc: HATAYAMA Daisuke 
Cc: KOSAKI Motohiro 
Cc: Lisa Mitchell 
Cc: Vivek Goyal 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
---

 mm/vmalloc.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff -puN 
include/linux/vmalloc.h~vmalloc-introduce-remap_vmalloc_range_partial-fix 
include/linux/vmalloc.h
diff -puN mm/vmalloc.c~vmalloc-introduce-remap_vmalloc_range_partial-fix 
mm/vmalloc.c
--- a/mm/vmalloc.c~vmalloc-introduce-remap_vmalloc_range_partial-fix
+++ a/mm/vmalloc.c
@@ -1476,10 +1476,9 @@ static void __vunmap(const void *addr, i
if (!addr)
return;
 
-   if ((PAGE_SIZE-1) & (unsigned long)addr) {
-   WARN(1, KERN_ERR "Trying to vfree() bad address (%p)\n", addr);
+   if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
+   addr));
return;
-   }
 
area = remove_vm_area(addr);
if (unlikely(!area)) {
@@ -2170,8 +2169,7 @@ int remap_vmalloc_range_partial(struct v
 
size = PAGE_ALIGN(size);
 
-   if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
-   ((PAGE_SIZE-1) & (unsigned long)kaddr))
+   if (!PAGE_ALIGNED(uaddr) || !PAGE_ALIGNED(kaddr))
return -EINVAL;
 
area = find_vm_area(kaddr);
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list

2013-05-23 Thread Andrew Morton

On Thu, 23 May 2013 14:25:13 +0900 HATAYAMA Daisuke  
wrote:

> Treat memory chunks referenced by PT_LOAD program header entries in
> page-size boundary in vmcore_list. Formally, for each range [start,
> end], we set up the corresponding vmcore object in vmcore_list to
> [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
> 
> This change affects layout of /proc/vmcore.

Well, changing a userspace interface is generally unacceptable because
it can break existing userspace code.

If you think the risk is acceptable then please do explain why.  In
great detail!

> The gaps generated by the
> rearrangement are newly made visible to applications as
> holes. Concretely, they are two ranges [rounddown(start, PAGE_SIZE),
> start] and [end, roundup(end, PAGE_SIZE)].
> 
> Suppose variable m points at a vmcore object in vmcore_list, and
> variable phdr points at the program header of PT_LOAD type the
> variable m corresponds to. Then, pictorially:
> 
>   m->offset+---+
>| hole  |
> phdr->p_offset =   +---+
>   m->offset + (paddr - start)  |   |\
>| kernel memory | phdr->p_memsz
>|   |/
>+---+
>| hole  |
>   m->offset + m->size  +---+
> 
> where m->offset and m->offset + m->size are always page-size aligned.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH, RFC 19/22] staging/solo6x10: depend on CONFIG_FONTS

2013-05-23 Thread Arnd Bergmann

On Thursday 23 May 2013, Geert Uytterhoeven wrote:
> Sorry for only noticing this now, but CONFIG_FONTS is not about font support.
> It's about allowing the user to override the default list of builtin fonts.
> I know it's a bad name, but changing this would break make oldconfig.
> Or is this allowed?
> 
> My fix for the solo6x10 build breakage is
> http://marc.info/?l=linux-kernel&m=136861809223875

Right, that sounds like a better solution.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 2/9] vmcore: allocate buffer for ELF headers on page-size alignment

2013-05-23 Thread Andrew Morton

On Thu, 23 May 2013 14:25:07 +0900 HATAYAMA Daisuke  
wrote:

> Allocate ELF headers on page-size boundary using __get_free_pages()
> instead of kmalloc().
> 
> Later patch will merge PT_NOTE entries into a single unique one and
> decrease the buffer size actually used. Keep original buffer size in
> variable elfcorebuf_sz_orig to kfree the buffer later and actually
> used buffer size with rounded up to page-size boundary in variable
> elfcorebuf_sz separately.
> 
> The size of part of the ELF buffer exported from /proc/vmcore is
> elfcorebuf_sz.
> 
> The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz,
> elfcorebuf_sz_orig], is filled with 0.
> 
> Use size of the ELF headers as an initial offset value in
> set_vmcore_list_offsets_elf{64,32} and
> process_ptload_program_headers_elf{64,32} in order to indicate that
> the offset includes the holes towards the page boundary.
> 
> As a result, both set_vmcore_list_offsets_elf{64,32} have the same
> definition. Merge them as set_vmcore_list_offsets.
> 
> ...
>
> @@ -526,30 +505,35 @@ static int __init parse_crash_elf64_headers(void)
>   }
>  
>   /* Read in all elf headers. */
> - elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> + elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * 
> sizeof(Elf64_Phdr);
> + elfcorebuf_sz = elfcorebuf_sz_orig;
> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +get_order(elfcorebuf_sz_orig));
>   if (!elfcorebuf)
>   return -ENOMEM;
>   addr = elfcorehdr_addr;
> - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> + rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
>   if (rc < 0) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> +get_order(elfcorebuf_sz_orig));
>   return rc;
>   }
>  
>   /* Merge all PT_NOTE headers into one. */
>   rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
>   if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> +get_order(elfcorebuf_sz_orig));
>   return rc;
>   }
>   rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
>   &vmcore_list);
>   if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> +get_order(elfcorebuf_sz_orig));
>   return rc;
>   }
> - set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
> + set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
>   return 0;
>  }
>  
> @@ -581,30 +565,35 @@ static int __init parse_crash_elf32_headers(void)
>   }
>  
>   /* Read in all elf headers. */
> - elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> - elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> + elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * 
> sizeof(Elf32_Phdr);
> + elfcorebuf_sz = elfcorebuf_sz_orig;
> + elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +get_order(elfcorebuf_sz_orig));
>   if (!elfcorebuf)
>   return -ENOMEM;
>   addr = elfcorehdr_addr;
> - rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> + rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
>   if (rc < 0) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> +get_order(elfcorebuf_sz_orig));
>   return rc;
>   }
>  
>   /* Merge all PT_NOTE headers into one. */
>   rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
>   if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> +get_order(elfcorebuf_sz_orig));
>   return rc;
>   }
>   rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
>   &vmcore_list);
>   if (rc) {
> - kfree(elfcorebuf);
> + free_pages((unsigned long)elfcorebuf,
> +get_order(elfcorebuf_sz_orig));
>   return rc;
>   }
> - set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
> + set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
>   return 0;
>  }
>  
> @@ -629,14 +618,14 @@ static int __init parse_crash_elf_headers(void)
>   return rc;
>  
>   /* Determine vmcore size. */
> - vmcore_size = get_vmcore_size_elf64(elfcorebuf);
> + vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);

Re: [PATCH v2 04/11] pwm: Add Renesas TPU PWM driver

2013-05-23 Thread Thierry Reding

On Wed, Apr 24, 2013 at 10:50:09PM +0200, Laurent Pinchart wrote:
> The Timer Pulse Unit (TPU is a 4-channels 16-bit timer used to generate
> waveforms. This driver exposes PWM functions through the PWM API for
> other drivers to use.
> 
> The code is loosely based on the leds-renesas-tpu driver by Magnus Damm
> and the TPU PWM driver shipped in the Armadillo EVA 800 kernel sources.
> 
> Signed-off-by: Laurent Pinchart 
> Tested-by: Simon Horman 

Sorry for taking so long to look at this...

> diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> index 0e0bfa0..d57ef66 100644
> --- a/drivers/pwm/Kconfig
> +++ b/drivers/pwm/Kconfig
> @@ -115,6 +115,13 @@ config PWM_PXA
> To compile this driver as a module, choose M here: the module
> will be called pwm-pxa.
>  
> +config PWM_RENESAS_TPU
> + tristate "Renesas TPU PWM support"
> + depends on ARCH_SHMOBILE
> + help
> +   This driver exposes the Timer Pulse Unit (TPU) PWM controller found
> +   in Renesas chips through the PWM API.

If the driver can be built, it'd be good to include a short sentence
saying what it will be called, just like for the other drivers.

> diff --git a/drivers/pwm/pwm-renesas-tpu.c b/drivers/pwm/pwm-renesas-tpu.c
[...]
> +struct tpu_pwm_device {
> + bool timer_on;  /* Whether the timer is running */
> +
> + struct tpu_pwm_channel_data *pdata;
> + struct tpu_device *tpu;
> + unsigned int channel;   /* Channel number in the TPU */

I don't think you need this. The pwm_device.hwpwm field carries the same
information.

> +
> + unsigned int prescaler;
> + u16 period;
> + u16 duty;
> +};
> +
> +struct tpu_device {
> + struct platform_device *pdev;
> + struct pwm_chip chip;
> + spinlock_t lock;
> +
> + void __iomem *base;
> + struct clk *clk;
> +
> + struct tpu_pwm_device pwms[TPU_CHANNEL_MAX];
> +};

Can't you reuse the infrastructure built into the PWM subsystem? You can
associate chip-specific data with each PWM device. You can look at the
pwm-atmel-tcb and pwm-bfin drivers for usage examples. In a nutshell you
hook the .request() function and setup the driver-specific structure and
associate them with the PWM using pwm_set_chip_data().

This has the advantage that you don't need the pwms array in tpu_device
and you also don't need TPU_CHANNEL_MAX because only the pwm_chip.npwm
field needs to contain the number of channels.

> +static int tpu_pwm_timer_start(struct tpu_pwm_device *pwm)
> +{
> + int ret;
> +
> + if (!pwm->timer_on) {
> + /* Wake up device and enable clock. */
> + pm_runtime_get_sync(&pwm->tpu->pdev->dev);
> + ret = clk_prepare_enable(pwm->tpu->clk);
> + if (ret) {
> + dev_err(&pwm->tpu->pdev->dev, "cannot enable clock\n");
> + return ret;
> + }
> + pwm->timer_on = true;
> + }
> +
> + /* Make sure the channel is stopped, as we need to reconfigure it
> +  * completely. First drive the pin to the inactive state to avoid
> +  * glitches.
> +  */

Can you please stick to the standard coding style for block comments?
That is:

/*
 * Make sure...
 * ...
 * glitches.
 */

> +/* 
> -
> + * PWM API
> + */

I find these separators more distracting than helpful. But if you have a
strong preference for them I can live with it.

> +
> +static int tpu_pwm_request(struct pwm_chip *chip, struct pwm_device *_pwm)
> +{
> + struct tpu_device *tpu = to_tpu_device(chip);
> + struct tpu_pwm_device *pwm = &tpu->pwms[_pwm->hwpwm];
> +
> + return pwm->pdata == NULL ? -EPROBE_DEFER : 0;
> +}

If you use the same method as the pwm-bfin or pwm-atmel-tcb drivers, you
could initialize each PWM channel here and associated them with the PWM
device using pwm_set_chip_data() (as I hinted at earlier).

> +static int tpu_pwm_config(struct pwm_chip *chip, struct pwm_device *_pwm,
> +   int duty_ns, int period_ns)
[...]
> + if (period_ns <= 0 || duty_ns < 0 || duty_ns > period_ns)
> + return -EINVAL;

The core already performs these checks so they can be dropped.

> + /* Pick a prescaler to avoid overflowing the counter.
> +  * TODO: Pick the highest acceptable prescaler.
> +  */
> + clk_prepare_enable(tpu->clk);

Shouldn't you check the return value here?

> + clk_rate = clk_get_rate(tpu->clk);
> + clk_disable_unprepare(pwm->tpu->clk);

And does the clock really need to be enabled to obtain the rate?

> + for (prescaler = 0; prescaler < ARRAY_SIZE(prescalers); ++prescaler) {
> + period = clk_rate / prescalers[prescaler]
> +/ (NSEC_PER_SEC / period_ns);

I prefer to have the operator on the previous line, as in:

period = clk_rate / prescalers[prescaler] /

Re: [RFC 3/8] mfd:syscon: Introduce claim/read/write/release APIs

2013-05-23 Thread Arnd Bergmann

On Monday 20 May 2013, Srinivas KANDAGATLA wrote:
> On 17/05/13 15:36, Arnd Bergmann wrote:
>
> Some of the drivers like Ethernet already provide higher level
> interfaces via callbacks. We did implement such a callbacks per each SOC
> in non-DT case, and ended up having code duplicated for each SOC.
> 
> On the other hand using device trees to describe the HW
> configuration(sysconfs) made more sense and got rid of SOC specific
> callbacks.

I'm sure it's possible to reduce the duplication without going all
the way to describing every bit in DT.

> > For drivers that are essentially just wrappers around sysconf,
> > I would make them one driver per SoC and use a low-level interface
> > but still hardcode the offsets in the driver instead of using DT
> > to find the registers.
> > 
> > The pinctrl and reset drivers are examples of this.
> 
> In pinctrl bindings case, I think we could do better job by replacing
> the existing bindings of sysconfs for a group of banks with just two
> integer offsets. This would mean that, we can still use the a common
> driver across the SOCs.
>
> And w.r.t to reset, we are planning on using sysconf based
> reset-controller API sitting underneath the reset-controller subsystem.
> Passing the information from device trees would be much clear and
> flexible than adding new driver per/SOC.

Ok

> >> 2> The infrastructure should protect the claimed registers from
> >> over-writing by other drivers. We do this by claim-read/write-release
> >> style API.
> > 
> > I don't understand this part. Is it about atomicity of accesses to
> > 32-bit registers when you only want to change a bit? That is something
> > the regmap interface handles already.
> 
> I forget to mention a important point here, the protection is at bit
> level for the sysconfs. Some of sysconfs have bits shared across IPs, so
> protecting them while the IP is still using it is what we are trying to
> achieve with sysconf-claim/release apis.
> While it may be overkill, but In past it has found bugs and been helpful
> with a meaning full debug output.

Can you give an example of something that is shared across multiple IPs?

Are those always done in the way that you have e.g. the ethernet pinctrl
in the same register as the usb pinctrl, or could you have an ethernet
pinctrl setting mixed together with the usb clock setting for instance?

If the hardware has been designed in a sensible way rather than just
grown by randomly adding bits in open spaces, I would assume that it's
possible to have high-level interfaces built around ranges of registers
rather than arbitrary subsets of registers.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3 v3] dcache: make it more scalable on large system

2013-05-23 Thread Waiman Long


On 05/23/2013 05:42 AM, Dave Chinner wrote:

On Wed, May 22, 2013 at 09:37:25PM -0400, Waiman Long wrote:

Change log:

v2->v3
  - Fix the RCU lock problem found by Al Viro.
  - Rebase the code to the latest v3.10-rc1 linux mainline.
  - Remove patch 4 which may be problematic if the dentry is deleted.
  - Rerun performance measurement using 3.10-rc1 kernel.

v1->v2
  - Include performance improvement in the AIM7 benchmark results because
of this patch.
  - Modify dget_parent() to avoid taking the lock, if possible, to further
improve AIM7 benchmark results.

During some perf-record sessions of the kernel running the high_systime
workload of the AIM7 benchmark, it was found that quite a large portion
of the spinlock contention was due to the perf_event_mmap_event()
function itself. This perf kernel function calls d_path() which,
in turn, call path_get() and dput() indirectly. These 3 functions
were the hottest functions shown in the perf-report output of
the _raw_spin_lock() function in an 8-socket system with 80 cores
(hyperthreading off) with a 3.10-rc1 kernel:

What was it I said about this patchset when you posted it to speed
up an Oracle benchmark back in february? I'll repeat:

"Nobody should be doing reverse dentry-to-name lookups in a quantity
sufficient for it to become a performance limiting factor."


Thank for the comment, but my point is that it is the d_lock contention 
is skewing the data about how much spin lock contention had actually 
happened in the workload and it makes it harder to pinpoint problem 
areas to look at. This is not about performance, it is about accurate 
representation of performance data. Ideally, we want the overhead of 
turning on perf instrumentation to be as low as possible.





And that makes whatever that tracepoint is doing utterly stupid.
Dumping out full paths in a tracepoint is hardly "low overhead", and
that tracepoint should be stomped on from a great height. Sure,
print the filename straight out of the dentry into a tracepoint,
but repeated calculating the full path (probably for the same set of
dentries) is just a dumb thing to do.

Anyway, your numbers show that a single AIM7 microbenchmark goes
better under tracing the specific mmap event that uses d_path(), but
the others are on average a little bit slower. That doesn't convince
me that this is worth doing. Indeed, what happens to performance
when you aren't tracing?

Indeed, have you analysed what makes that
microbenchmark contend so much on the dentry lock while reverse
lookups are occuring? Dentry lock contention does not necessarily
indicate a problem with the dentry locks, and without knowing why
there is contention occuring (esp. compared to the other benchmarks)
we can't really determine if this is a good solution or not...


What made it contend so much was the large number of CPUs available in 
my test system which is a 8-socket Westmere EX machines with 80 cores. 
As perf was collecting data from every core, the threads will 
unavoidably bump into each other to translate dentries back to the full 
paths. The current code only allows one CPU in the d_path() critical 
path. My patch will allow all of them to be in the critical path 
concurrently.


The upcoming Ivy-Bridge EX can have up to 15 cores per socket. So even a 
4-socket machine will have up to 60 cores or 120 virtual CPUs if 
hyperthreading is turned on.

IOWs, you need more than one microbenchmark that interacts with
some naive monitoring code to justify the complexity these locking
changes introduce
The first patch can also independently improve the performance of a 
number of AIM7 workloads including almost 7X improvement in the short 
workload. More detailed information of these types of performance 
benefit was discussed in the patch description of the first patch. I 
will try to collect more performance improvement data on other workloads 
too.


Thank for the review.
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] fs: call_usermodehelper_root helper introduced

2013-05-23 Thread Eric W. Biederman

"J. Bruce Fields"  writes:

> On Thu, May 23, 2013 at 03:55:47PM -0400, J. Bruce Fields wrote:
>> On Thu, May 23, 2013 at 09:05:26AM -0400, Jeff Layton wrote:
>> > What might help most here is to lay out a particular scenario for how
>> > you envision setting up knfsd in a container so we can ensure that it's
>> > addressed properly by whatever solution you settle on.
>
> BTW the problem I have here is that the only case I've personally had
> any interest in is using network and file namespaces to isolate nfsd's
> to make them safe to migrate across nodes of a cluster.
>
> So while the idea of making user namespaces and unprivileged knfsd and
> the rest work is really interesting and I'm happy to think about it, I'm
> not sure how feasible or useful it is.
>
> I'd therefore actually prefer just to take something like Stanislav's
> patch now and put off the problem of how to make it work correctly with
> user namespaces until we actually turn that on.  His patch fixes a real
> bug that we have now, while user-namespaced-nfsd still sounds a bit
> pie-in-the-sky to me.
>
>
> But maybe I don't understand why Eric thinks nfsd in usernamespaces is
> imminent.  Or maybe I'm missing some security problem that Stanislav's
> patch would introduce now without allowing nfsd to run in a user
> namespace.

The problem I saw is less pronounced but I think actually exists without
user namespaces as well.  In particular if we let root in the container
start knfsd in a network and mount namespace.  Then if that container
does not have certain capabilities like say CAP_MKNOD all of a sudden
we have a process running in the container with CAP_MKNOD.

I haven't had time to look at everything just yet but I don't think
solving this is particularly hard.

There are really two very simple solutions.
1) When we start knfsd in the container we create a kernel thread that
   is a child of the userspace process that started knfsd.  That kernel
   thread can then be used to launch user mode helpers.

   This uses the same code as is needed today to launch user mode
   helpers with just a different parent thread.

2) We pass enough information for the user mode helper to figure out
   what container this is all for.  File descriptors or whatever.
   Then the user mode helper outside the container using chdir and setns
   sets up the environment that the user mode helper typically expects
   and runs the process inside of the container.

Or we look at what the user mode helper is doing and realize we don't
need to run the user mode binary inside of the container.  If all it
does is query /etc/passwd for username to uid mapping for example (for
user namespaces we will probably care but not without them).

I don't think any of this is hard to implement.

I think user namespaces are imminent because after my last pass through
the code the remaining work looked pretty trivial, and as soon as the
dust settles I expect user namespaces become the common way to run code
in containers, which should greatly increase the demand for this feature
in user namespaces.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCHv6 01/11] clockevents: Prefer CPU local devices over global devices

2013-05-23 Thread Sören Brinkmann

On Mon, May 13, 2013 at 12:26:05PM -0700, Stephen Boyd wrote:
> On an SMP system with only one global clockevent and a dummy
> clockevent per CPU we run into problems. We want the dummy
> clockevents to be registered as the per CPU tick devices, but
> we can only achieve that if we register the dummy clockevents
> before the global clockevent or if we artificially inflate the
> rating of the dummy clockevents to be higher than the rating
> of the global clockevent. Failure to do so leads to boot
> hangs when the dummy timers are registered on all other CPUs
> besides the CPU that accepted the global clockevent as its tick
> device and there is no broadcast timer to poke the dummy
> devices.
> 
> If we're registering multiple clockevents and one clockevent is
> global and the other is local to a particular CPU we should
> choose to use the local clockevent regardless of the rating of
> the device. This way, if the clockevent is a dummy it will take
> the tick device duty as long as there isn't a higher rated tick
> device and any global clockevent will be bumped out into
> broadcast mode, fixing the problem described above.
> 
> Reported-by: Mark Rutland 
> Tested-by: Mark Rutland 
> Cc: John Stultz 
> Cc: Thomas Gleixner 
> Signed-off-by: Stephen Boyd 
Tested-by: Sören Brinkmann 

I tested this on Zynq. Zynq uses the ARM twd timer as clockevent devices when
available. When I remove the twd nodes from DT the clockevent device is
a timer common to both CPUs. Without this patch this seems to stall the
second CPU resulting in the system to hang sooner or later.

[   65.36] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   65.36]  1: (1 GPs behind) idle=56e/0/0 softirq=0/0 
[   65.36]  (detected by 0, t=6502 jiffies, g=4294966997, c=4294966996, 
q=6844)
[   65.36] Task dump for CPU 1:
[   65.36] swapper/1   R running  0 0  1 0x

With this patch everything looks fine.

Sören


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] fs: ecryptfs: fixed msync to flush data

2013-05-23 Thread Paul Taysom

When msync is called on a memory mapped file, that
data is not flushed to the disk.

In Linux, msync calls fsync for the file. For ecryptfs,
fsync just calls the lower level file system's fsync.
Changed the ecryptfs fsync code to call filemap_write_and_wait
before calling the lower level fsync.

Addresses the problem described in http://crbug.com/239536

Signed-off-by: Paul Taysom 
---
 fs/ecryptfs/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index 201f0a0..16f509d 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -295,6 +295,7 @@ static int ecryptfs_release(struct inode *inode, struct 
file *file)
 static int
 ecryptfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
+   filemap_write_and_wait(file->f_mapping);
return vfs_fsync(ecryptfs_file_to_lower(file), datasync);
 }
 
-- 
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 >

1 - 100 of 578 matches

Mail list logo