Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-08-10 Thread Joseph Glanville
Hi Kent, Tejun

On 9 August 2012 09:57, Kent Overstreet  wrote:
>> Also, how was this tested?
>
> Well, AFAICT the only request based dm target is multipath, and from the
> documentation I've seen it doesn't appear to work without multipath
> hardware, or at least I haven't seen it documented how. So, unless
> there's another user I missed it's not been tested.

Multipath can be tested quite easily with a loopback scsi target, you
don't require specialized hardware.
The easiest way to do this would probably be the built in LIO target +
open_iscsi initiator.

I haven't attempted running this current version of the patch series
but I haven't run into issues with bcache+multipath in the past.

>
>>
>> Thanks.
>>
>> --
>> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846


Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-10 Thread Greg Thelen
On Thu, Aug 09 2012, Glauber Costa wrote:

> This patch introduces infrastructure for tracking kernel memory pages to
> a given memcg. This will happen whenever the caller includes the
> __GFP_KMEMCG flag, and the task belongs to a memcg other than the root.
>
> In memcontrol.h those functions are wrapped in inline accessors.  The
> idea is to later on, patch those with static branches, so we don't incur
> any overhead when no mem cgroups with limited kmem are being used.
>
> [ v2: improved comments and standardized function names ]
>
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> ---
>  include/linux/memcontrol.h |  79 +++
>  mm/memcontrol.c            | 185 +
>  2 files changed, 264 insertions(+)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8d9489f..75b247e 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -21,6 +21,7 @@
>  #define _LINUX_MEMCONTROL_H
>  #include 
>  #include 
> +#include 
>  
>  struct mem_cgroup;
>  struct page_cgroup;
> @@ -399,6 +400,11 @@ struct sock;
>  #ifdef CONFIG_MEMCG_KMEM
>  void sock_update_memcg(struct sock *sk);
>  void sock_release_memcg(struct sock *sk);
> +
> +#define memcg_kmem_on 1
> +bool __memcg_kmem_new_page(gfp_t gfp, void *handle, int order);
> +void __memcg_kmem_commit_page(struct page *page, void *handle, int order);
> +void __memcg_kmem_free_page(struct page *page, int order);
>  #else
>  static inline void sock_update_memcg(struct sock *sk)
>  {
> @@ -406,6 +412,79 @@ static inline void sock_update_memcg(struct sock *sk)
>  static inline void sock_release_memcg(struct sock *sk)
>  {
>  }
> +
> +#define memcg_kmem_on 0
> +static inline bool
> +__memcg_kmem_new_page(gfp_t gfp, void *handle, int order)
> +{
> + return false;
> +}
> +
> +static inline void  __memcg_kmem_free_page(struct page *page, int order)
> +{
> +}
> +
> +static inline void
> +__memcg_kmem_commit_page(struct page *page, struct mem_cgroup *handle, int order)
> +{
> +}
>  #endif /* CONFIG_MEMCG_KMEM */
> +
> +/**
> + * memcg_kmem_new_page: verify if a new kmem allocation is allowed.
> + * @gfp: the gfp allocation flags.
> + * @handle: a pointer to the memcg this was charged against.
> + * @order: allocation order.
> + *
> + * returns true if the memcg where the current task belongs can hold this
> + * allocation.
> + *
> + * We return true automatically if this allocation is not to be accounted to
> + * any memcg.
> + */
> +static __always_inline bool
> +memcg_kmem_new_page(gfp_t gfp, void *handle, int order)
> +{
> + if (!memcg_kmem_on)
> + return true;
> + if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL))
> + return true;
> + if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
> + return true;
> + return __memcg_kmem_new_page(gfp, handle, order);
> +}
> +
> +/**
> + * memcg_kmem_free_page: uncharge pages from memcg
> + * @page: pointer to struct page being freed
> + * @order: allocation order.
> + *
> + * there is no need to specify memcg here, since it is embedded in page_cgroup
> + */
> +static __always_inline void
> +memcg_kmem_free_page(struct page *page, int order)
> +{
> + if (memcg_kmem_on)
> + __memcg_kmem_free_page(page, order);
> +}
> +
> +/**
> + * memcg_kmem_commit_page: embeds correct memcg in a page
> + * @handle: a pointer to the memcg this was charged against.
> + * @page: pointer to struct page recently allocated
> + * @handle: the memcg structure we charged against
> + * @order: allocation order.
> + *
> + * Needs to be called after memcg_kmem_new_page, regardless of success or
> + * failure of the allocation. if @page is NULL, this function will revert the
> + * charges. Otherwise, it will commit the memcg given by @handle to the
> + * corresponding page_cgroup.
> + */
> +static __always_inline void
> +memcg_kmem_commit_page(struct page *page, struct mem_cgroup *handle, int order)
> +{
> + if (memcg_kmem_on)
> + __memcg_kmem_commit_page(page, handle, order);
> +}
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 54e93de..e9824c1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -10,6 +10,10 @@
>   * Copyright (C) 2009 Nokia Corporation
>   * Author: Kirill A. Shutemov
>   *
> + * Kernel Memory Controller
> + * Copyright (C) 2012 Parallels Inc. and Google Inc.
> + * Authors: Glauber Costa and Suleiman Souhlal
> + *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
>   * the Free Software Foundation; either version 2 of the License, or
> @@ -434,6 +438,9 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
>  #include 
>  
>  static bool 

Re: xtensa port maintenance

2012-08-10 Thread Marc Gauthier
Chris Zankel wrote:
> I have set up a tree on github for now, and will work close
> with Max to get his changes to Stephen's linux-next tree and
> eventually Linus' tree.
> I think it's fine to add Max as a second maintainer [...]

Thanks for helping!


Pete Delaney wrote:
> I'm afraid that doing it piecemeal has failed in the past

Proper patches work fine, was just never done.
(Can discuss off this To: list.)


> Mind adding Marc and myself?

Acting as maintainer involves submitting useful patches over
some period of time.  Right now that's Max and Chris.
(Love to do it, but have plenty as it is on my plate.)

Thanks,
-Marc


Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot

2012-08-10 Thread Justin Piszcz
On Fri, Aug 10, 2012 at 7:07 PM, Justin Piszcz
>
> Hi,
>
> Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem
> (60TB).
>
> The 3.4 kernel works fine.
>
> This is proven by commenting out the filesystem in /etc/fstab with
> 3.5.1, and all is OK.
>
> --
>
> Hi again,
>
> I tested with linux-3.6-rc1:
>
> The same problem, here is what I get from the strace:
>
> irectory)
> 4434  readlink("/dev", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument)
> 4434  readlink("/dev/sda1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument)
> 4434  readlink("/r1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument)
> 4434  getuid()  = 0
> 4434  geteuid() = 0
> 4434  getgid()  = 0
> 4434  getegid() = 0
> 4434  prctl(PR_GET_DUMPABLE)= 1
> 4434  lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
> 4434  getuid()  = 0
> 4434  geteuid() = 0
> 4434  getgid()  = 0
> 4434  getegid() = 0
> 4434  prctl(PR_GET_DUMPABLE)= 1
> 4434  stat("/run", {st_mode=S_IFDIR|0755, st_size=820, ...}) = 0
> 4434  lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> 4434  open("/run/mount/utab", O_RDWR|O_CREAT, 0644) = 3
> 4434  close(3)  = 0
> 4434  mount("/dev/sda1", "/r1", "ext4", MS_MGC_VAL|MS_NOATIME, NULL
>
> --
>
> (w/ 3.6-rc1)
>
> [   89.868843] mount   R  running task0  4434   4433
> 0x0009
> [   89.868847]  880c246b7b68 816c9279 880c246b7aa8
> 880c246b7fd8
> [   89.868851]  880c246b7fd8 4000 88062720cdb0
> 880c246862d0
> [   89.868855]  000116c0 880623a863c0 880623a863c0
> 
> [   89.868855] Call Trace:
> [   89.868858]  [] ? __schedule+0x299/0x770
> [   89.868860]  [] ? __schedule+0x299/0x770
> [   89.868864]  [] ? ext4_get_group_desc+0x49/0xb0
> [   89.868868]  [] ? ext4_calculate_overhead+0x131/0x3e0
> [   89.868871]  [] ? ext4_fill_super+0x1a4b/0x28d0
> [   89.868875]  [] ? mount_bdev+0x1a1/0x1e0
> [   89.868877]  [] ? ext4_calculate_overhead+0x3e0/0x3e0
> [   89.868880]  [] ? ext4_mount+0x10/0x20
> [   89.868882]  [] ? mount_fs+0x1b/0xd0
> [   89.868885]  [] ? vfs_kern_mount+0x6f/0x110
> [   89.86]  [] ? do_kern_mount+0x4f/0x100
> [   89.868890]  [] ? do_mount+0x2fe/0x8a0
> [   89.868894]  [] ? strndup_user+0x53/0x70
> [   89.868896]  [] ? sys_mount+0x90/0xe0
> [   89.868899]  [] ? tracesys+0xd4/0xd9
>
> Justin.
>
>
>

CC: linux-ext4

Any ideas here? Kernels 3.4 and below can mount the 60TB ext4 filesystem
with no issues, but 3.5.1 and later (I did not try 3.5) cannot mount it.

Justin.



Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-10 Thread Xiao Guangrong
On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>>
>> The test case can be found at:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>>
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault-pfn
>> and async is not allowed, then the vm will crash.
>>
>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>> to the guest, read access is happy for readonly memslot, write access on
>> readonly memslot will cause KVM_EXIT_MMIO exit.
> 
> Memory slots whose QEMU mapping is write protected is supported
> today, as long as there are no write faults.
> 
> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> again?
> 

Mapping !write host memory into a readonly memslot works fine, and the
two can coexist.

A readonly memslot checks write permission via slot->flags, while !write
memory is checked in hva_to_pfn(), which looks at vma->vm_flags. There is
no conflict between them.

> The initial objective was to fix a vm crash, can you explain that
> initial problem?
>

The issue was triggered by this code:

        } else {
                if (async && (vma->vm_flags & VM_WRITE))
                        *async = true;
                pfn = KVM_PFN_ERR_FAULT;
        }

If the host memory region is read-only (!(vma->vm_flags & VM_WRITE)) and
its physical page is swapped out (or the file data has not been read in
yet), get_user_page_nowait() fails. The code above then refuses to set
*async, so we end up with a fault pfn and async == false.

I guess this issue also exists in "QEMU write protected mapping" as
you mentioned above.
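
To make the failure mode concrete, here is a minimal userspace sketch of
just the quoted else-branch. It is not the kernel code: struct vma_model,
failed_lookup_model() and PFN_ERR_FAULT_MODEL are hypothetical stand-ins,
and the VM_WRITE value is chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; the real definitions live in the kernel. */
#define VM_WRITE            0x2UL
#define PFN_ERR_FAULT_MODEL (-1)

struct vma_model {
	unsigned long vm_flags;
};

/*
 * Models only the else-branch quoted above: the nowait lookup has already
 * failed, so the result is always the fault pfn. *async is raised only
 * when the VMA is writable, which is why a read-only mapping ends up with
 * a fault pfn and async still false.
 */
static int failed_lookup_model(const struct vma_model *vma, bool *async)
{
	if (async && (vma->vm_flags & VM_WRITE))
		*async = true;
	return PFN_ERR_FAULT_MODEL;
}
```

With a read-only VMA the caller gets the fault value and no request to
retry asynchronously, which matches the crash scenario described above.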



Re: [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info

2012-08-10 Thread Xiao Guangrong
On 08/11/2012 02:03 AM, Marcelo Tosatti wrote:

>>  int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>>  void *val, unsigned int bytes,
>>  struct x86_exception *exception,
>> @@ -3870,14 +3881,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>>  return rc;
>>
>>  gpa = vcpu->mmio_fragments[0].gpa;
>> -
>>  vcpu->mmio_needed = 1;
>>  vcpu->mmio_cur_fragment = 0;
>>
>> -vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
>> -vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
>> -vcpu->run->exit_reason = KVM_EXIT_MMIO;
>> -vcpu->run->mmio.phys_addr = gpa;
>> +set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);
>>
>>  return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
>>  }
>> @@ -5486,7 +5493,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>   */
>>  static int complete_mmio(struct kvm_vcpu *vcpu)
>>  {
>> -struct kvm_run *run = vcpu->run;
>>  struct kvm_mmio_fragment *frag;
>>  int r;
>>
>> @@ -5497,7 +5503,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>>  /* Complete previous fragment */
>>  frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
>>  if (!vcpu->mmio_is_write)
>> -memcpy(frag->data, run->mmio.data, frag->len);
>> +memcpy(frag->data, vcpu->run->mmio.data, frag->len);
>>  if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
>>  vcpu->mmio_needed = 0;
>>  if (vcpu->mmio_is_write)
>> @@ -5507,12 +5513,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>>  }
>>  /* Initiate next fragment */
>>  ++frag;
>> -run->exit_reason = KVM_EXIT_MMIO;
>> -run->mmio.phys_addr = frag->gpa;
>> -if (vcpu->mmio_is_write)
>> -memcpy(run->mmio.data, frag->data, frag->len);
>> -run->mmio.len = frag->len;
>> -run->mmio.is_write = vcpu->mmio_is_write;
>> +set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
>>  return 0;
>>
>>  }
>> -- 
>> 1.7.7.6
> 
> IMO having a function is unnecessary (it makes the code harder to read).

Okay, I will drop this patch.




Re: [PATCH v5 05/12] KVM: reorganize hva_to_pfn

2012-08-10 Thread Xiao Guangrong
On 08/11/2012 01:51 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:51:05PM +0800, Xiao Guangrong wrote:
>> We do too many things in hva_to_pfn; this patch reorganizes the code
>> to make it more readable.
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  virt/kvm/kvm_main.c |  159 +++
>>  1 files changed, 97 insertions(+), 62 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 26ffc87..dd01bcb 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1043,83 +1043,118 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>>  return rc == -EHWPOISON;
>>  }
>>
>> -static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
>> -bool write_fault, bool *writable)
>> +/*
>> + * The atomic path to get the writable pfn which will be stored in @pfn,
>> + * true indicates success, otherwise false is returned.
>> + */
>> +static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
>> +bool write_fault, bool *writable, pfn_t *pfn)
>>  {
>>  struct page *page[1];
>> -int npages = 0;
>> -pfn_t pfn;
>> +int npages;
>>
>> -/* we can do it either atomically or asynchronously, not both */
>> -BUG_ON(atomic && async);
>> +if (!(async || atomic))
>> +return false;
>>
>> -BUG_ON(!write_fault && !writable);
>> +npages = __get_user_pages_fast(addr, 1, 1, page);
>> +if (npages == 1) {
>> +*pfn = page_to_pfn(page[0]);
>>
>> -if (writable)
>> -*writable = true;
>> +if (writable)
>> +*writable = true;
>> +return true;
>> +}
>> +
>> +return false;
>> +}
>>
>> -if (atomic || async)
>> -npages = __get_user_pages_fast(addr, 1, 1, page);
>> +/*
>> + * The slow path to get the pfn of the specified host virtual address,
>> + * 1 indicates success, -errno is returned if error is detected.
>> + */
>> +static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>> +   bool *writable, pfn_t *pfn)
>> +{
>> +struct page *page[1];
>> +int npages = 0;
>>
>> -if (unlikely(npages != 1) && !atomic) {
>> -might_sleep();
>> +might_sleep();
>>
>> -if (writable)
>> -*writable = write_fault;
>> -
>> -if (async) {
>> -down_read(&current->mm->mmap_sem);
>> -npages = get_user_page_nowait(current, current->mm,
>> - addr, write_fault, page);
>> -up_read(&current->mm->mmap_sem);
>> -} else
>> -npages = get_user_pages_fast(addr, 1, write_fault,
>> - page);
>> -
>> -/* map read fault as writable if possible */
>> -if (unlikely(!write_fault) && npages == 1) {
>> -struct page *wpage[1];
>> -
>> -npages = __get_user_pages_fast(addr, 1, 1, wpage);
>> -if (npages == 1) {
>> -*writable = true;
>> -put_page(page[0]);
>> -page[0] = wpage[0];
>> -}
>> -npages = 1;
>> +if (writable)
>> +*writable = write_fault;
>> +
>> +if (async) {
>> +down_read(&current->mm->mmap_sem);
>> +npages = get_user_page_nowait(current, current->mm,
>> +  addr, write_fault, page);
>> +up_read(&current->mm->mmap_sem);
>> +} else
>> +npages = get_user_pages_fast(addr, 1, write_fault,
>> + page);
>> +if (npages != 1)
>> +return npages;
> 
>  * Returns number of pages pinned. This may be fewer than the number
>  * requested. If nr_pages is 0 or negative, returns 0. If no pages
>  * were pinned, returns -errno.
>  */
> int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> struct page **pages)
> 
> 
> Current behaviour is
> 
> if (atomic || async)
> npages = __get_user_pages_fast(addr, 1, 1, page);
> 
>   if (npages != 1) 
>   slow path retry;
> 
> The changes above change this, don't they?

Marcelo,

Sorry, I do not see why you think the logic was changed. In this patch,
the logic is:

        /* return true if it is successful. */
        if (hva_to_pfn_fast(addr, atomic, async, write_fault, writable, &pfn))
                return pfn;

        /* atomic can not go to slow path. */
        if (atomic)
                return KVM_PFN_ERR_FAULT;

        /* get pfn by the slow path */
        npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
        if (npages == 1)
                return pfn;

        /* the error-handle path. */
        ...
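
As a rough way to check that the ordering is preserved, the control flow
can be modelled in userspace. This is only a sketch: hva_to_pfn_model()
and the two *_MODEL constants are hypothetical, and the page lookups are
replaced by precomputed results passed in as arguments.

```c
#include <stdbool.h>

#define PFN_ERR_FAULT_MODEL (-1L)  /* stand-in for KVM_PFN_ERR_FAULT */
#define PFN_ERR_OTHER_MODEL (-2L)  /* stand-in for the elided error path */

/*
 * fast_pfn / slow_pfn: the pfn each path would find, or a negative value
 * if that path fails. They replace the real page-table lookups.
 */
static long hva_to_pfn_model(bool atomic, bool async_given,
			     long fast_pfn, long slow_pfn)
{
	/* The fast path is attempted only for atomic or async callers. */
	if ((atomic || async_given) && fast_pfn >= 0)
		return fast_pfn;

	/* Atomic callers must not sleep, so they never reach the slow path. */
	if (atomic)
		return PFN_ERR_FAULT_MODEL;

	/* Everyone else retries through the slow path. */
	if (slow_pfn >= 0)
		return slow_pfn;

	/* Error handling, elided in the mail above. */
	return PFN_ERR_OTHER_MODEL;
}
```

In this model a failed fast lookup still falls through to the slow path
for every non-atomic caller, which is the "slow path retry" behaviour
described for the current code.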


[BUGFIX PATCH][RESEND] kexec & iosapic: kexec oops when iosapic was removed

2012-08-10 Thread Hanjun Guo
Hi, all
We are working on a node hot-plug project, and the IOSAPIC is one of the
devices that can be removed. However, if we use kexec to start a new kernel
after an IOSAPIC has been removed, an oops happens.

I reviewed the code and found the following:
iosapic_remove
  iosapic_free
    memset(&iosapic_lists[index], 0, sizeof(iosapic_lists[0]))
      iosapic_lists[index].addr is set to 0

A later kexec of a new kernel then does:
kexec_disable_iosapic
  iosapic_write(rte->iosapic,..)
    __iosapic_write(iosapic->addr, reg, val);
      addr was set to 0 by iosapic_remove, so the write oopses

here is the oops information:

Starting new kernel
kexec[11336]: Oops 8804682956800 [1]
Modules linked in: raw(N) ipv6(N) acpi_cpufreq(N) binfmt_misc(N) fuse(N) nls_iso
8859_1(N) loop(N) ipmi_si(N) ipmi_devintf(N) ipmi_msghandler(N) mca_ereport(N) s
csi_ereport(N) nic_ereport(N) pcie_ereport(N) err_transport(N) nvlist(PN) dm_mod
(N) tpm_tis(N) tpm(N) ppdev(N) tpm_bios(N) serio_raw(N) i2c_i801(N) iTCO_wdt(N)
i2c_core(N) iTCO_vendor_support(N) sg(N) ioatdma(N) igb(N) mptctl(N) dca(N) parp
ort_pc(N) parport(N) container(N) button(N) usbhid(N) hid(N) uhci_hcd(N) ehci_hc
d(N) usbcore(N) sd_mod(N) crc_t10dif(N) ext3(N) mbcache(N) jbd(N) fan(N) process
or(N) ide_pci_generic(N) ide_core(N) ata_piix(N) libata(N) mptsas(N) mptscsih(N)
 mptbase(N) scsi_transport_sas(N) scsi_mod(N) thermal(N) thermal_sys(N) hwmon(N)

Supported: Yes, External

Pid: 11336, CPU 0, comm:kexec
psr : 101009522030 ifs : 8791 ip  : []Tain
ted: P  N  (2.6.32.12_RAS_V1R3C00B011)
ip is at kexec_disable_iosapic+0x120/0x1e0
unat:  pfs : 0791 rsc : 0003
rnat:  bsps:  pr  : 65519aa6a555a659
ldrs:  ccv : ea3cf51e fpsr: 0009804c8a70033f
csd :  ssd : 
b0  : a0010004c150 b6  : a00100012620 b7  : a001cda0
f6  : 0 f7  : 1003e0200
f8  : 1003e5003 f9  : 1003e028fb97183cd
f10 : 1003ee9f380df3c548b67 f11 : 1003e00cc
r1  : a001016cf660 r2  :  r3  : 
r8  : 001009526030 r9  : a00100012620 r10 : e0010053f600
r11 : c000fec34040 r12 : e0078f76fd30 r13 : e0078f76
r14 :  r15 :  r16 : 
r17 :  r18 : 7fff r19 : 
r20 :  r21 : e0010053f590 r22 : a00100cf
r23 : 0036 r24 : e007002f8a84 r25 : 0022
r26 : e007002f8a88 r27 : 0020 r28 : 0002
r29 : a001012c8c60 r30 :  r31 : 00322e49

Call Trace:
 [] show_stack+0x80/0xa0
sp=e0078f76f8f0 bsp=e0078f761380
 [] show_regs+0x640/0x920
sp=e0078f76fac0 bsp=e0078f761328
 [] die+0x190/0x2e0
sp=e0078f76fad0 bsp=e0078f7612e8
 [] ia64_do_page_fault+0x840/0xb20
sp=e0078f76fad0 bsp=e0078f761288
 [] ia64_native_leave_kernel+0x0/0x270
sp=e0078f76fb60 bsp=e0078f761288
 [] kexec_disable_iosapic+0x120/0x1e0
sp=e0078f76fd30 bsp=e0078f761200
 [] machine_shutdown+0x110/0x140
sp=e0078f76fd30 bsp=e0078f7611c8
 [] kernel_kexec+0xd0/0x120
sp=e0078f76fd30 bsp=e0078f7611a0
 [] sys_reboot+0x480/0x4e0
sp=e0078f76fd30 bsp=e0078f761128
 [] ia64_ret_from_syscall+0x0/0x20
sp=e0078f76fe30 bsp=e0078f761120
Kernel panic - not syncing: Fatal exception
irq 69: nobody cared (try booting with the "irqpoll" option)


Signed-off-by: Hanjun Guo 
Signed-off-by: Jianguo Wu 
---
 arch/ia64/kernel/iosapic.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index ef4b5d8..11ce1ec 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -276,6 +276,9 @@ kexec_disable_iosapic(void)
vec = irq_to_vector(irq);
list_for_each_entry(rte, &info->rtes,
rte_list) {
+   if (rte->refcnt == NO_REF_RTE)
+   continue;
+
iosapic_write(rte->iosapic,
IOSAPIC_RTE_LOW(rte->rte_index),
IOSAPIC_MASK|vec);
-- 
1.7.6.1
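
For readers who want to see the effect of the added check in isolation,
here is a small userspace sketch. It is not the ia64 code: the *_model
types and disable_rtes_model() are hypothetical, an array replaces the
kernel list, and NO_REF_RTE_MODEL merely stands in for NO_REF_RTE.

```c
#include <stddef.h>

#define NO_REF_RTE_MODEL 0   /* stand-in for the kernel's NO_REF_RTE */

struct iosapic_model {
	unsigned int *addr;      /* NULL once the IOSAPIC has been removed */
};

struct rte_model {
	struct iosapic_model *iosapic;
	int refcnt;
};

/* Masks every referenced entry and returns how many writes were issued. */
static int disable_rtes_model(struct rte_model *rtes, size_t n,
			      unsigned int mask)
{
	int writes = 0;

	for (size_t i = 0; i < n; i++) {
		/* Unreferenced entry: its hardware may be gone, skip it. */
		if (rtes[i].refcnt == NO_REF_RTE_MODEL)
			continue;
		*rtes[i].iosapic->addr = mask;
		writes++;
	}
	return writes;
}
```

Without the refcnt test, the entry whose addr is NULL would be
dereferenced, which corresponds to the oops shown above.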






[RFC PATCH][RESEND] Fusion MPT: disable pci device when mpt map resoures failed

2012-08-10 Thread Hanjun Guo

When probing a PCI device, we first enable it, and we must disable it
again if a later step fails: enabling sets the device's power state to
D0 and, if MSI is disabled, allocates an IRQ and registers a GSI for the
device.

mpt_mapresources(MPT_ADAPTER *ioc) does not disable the PCI device on its
error paths, so the IRQ and GSI are never released. This patch fixes that.

Signed-off-by: Hanjun Guo 
Signed-off-by: Jiang Liu 
---
 drivers/message/fusion/mptbase.c |   18 +++---
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index d99db56..fb69baa 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -1666,7 +1666,7 @@ mpt_mapresources(MPT_ADAPTER *ioc)
if (pci_request_selected_regions(pdev, ioc->bars, "mpt")) {
printk(MYIOC_s_ERR_FMT "pci_request_selected_regions() with "
"MEM failed\n", ioc->name);
-   return r;
+   goto out_pci_disable_device;
}
 
if (sizeof(dma_addr_t) > 4) {
@@ -1690,8 +1690,7 @@ mpt_mapresources(MPT_ADAPTER *ioc)
} else {
printk(MYIOC_s_WARN_FMT "no suitable DMA mask for %s\n",
ioc->name, pci_name(pdev));
-   pci_release_selected_regions(pdev, ioc->bars);
-   return r;
+   goto out_pci_release_region;
}
} else {
if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))
@@ -1704,8 +1703,7 @@ mpt_mapresources(MPT_ADAPTER *ioc)
} else {
printk(MYIOC_s_WARN_FMT "no suitable DMA mask for %s\n",
ioc->name, pci_name(pdev));
-   pci_release_selected_regions(pdev, ioc->bars);
-   return r;
+   goto out_pci_release_region;
}
}
 
@@ -1735,8 +1733,8 @@ mpt_mapresources(MPT_ADAPTER *ioc)
if (mem == NULL) {
printk(MYIOC_s_ERR_FMT ": ERROR - Unable to map adapter"
" memory!\n", ioc->name);
-   pci_release_selected_regions(pdev, ioc->bars);
-   return -EINVAL;
+   r = -EINVAL;
+   goto out_pci_release_region;
}
ioc->memmap = mem;
dinitprintk(ioc, printk(MYIOC_s_INFO_FMT "mem = %p, mem_phys = %llx\n",
@@ -1750,6 +1748,12 @@ mpt_mapresources(MPT_ADAPTER *ioc)
ioc->pio_chip = (SYSIF_REGS __iomem *)port;
 
return 0;
+
+out_pci_release_region:
+   pci_release_selected_regions(pdev, ioc->bars);
+out_pci_disable_device:
+   pci_disable_device(pdev);
+   return r;
 }
 
 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
-- 
1.7.6.1
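
The unwinding order used by the patch can be illustrated with a small
userspace sketch. Everything here is hypothetical (struct dev_model,
map_resources_model() and the fail_at parameter); two booleans stand in
for the enabled device and the held regions.

```c
#include <stdbool.h>

/* Hypothetical device state, standing in for the PCI core's bookkeeping. */
struct dev_model {
	bool enabled;
	bool regions_held;
};

/* fail_at: 0 = no failure, 1 = region request fails, 2 = mapping fails. */
static int map_resources_model(struct dev_model *dev, int fail_at)
{
	int r = -1;

	dev->enabled = true;            /* models enabling the device */

	if (fail_at == 1)
		goto out_disable_device;    /* nothing else to undo yet */
	dev->regions_held = true;       /* models requesting the regions */

	if (fail_at == 2) {
		r = -22;                    /* -EINVAL, as in the patch */
		goto out_release_region;
	}
	return 0;

	/* Labels run in reverse order of acquisition and fall through. */
out_release_region:
	dev->regions_held = false;
out_disable_device:
	dev->enabled = false;
	return r;
}
```

Because the labels fall through, a late failure releases the regions and
then disables the device, while an early failure only disables it.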




Re: [PATCH 2/2] ARM: local timers: add timer support using IO mapped register

2012-08-10 Thread Cyril Chemparathy

On 8/10/2012 5:58 PM, Rohit Vaswani wrote:

The current arch_timer only support accessing through CP15 interface.
Add support for ARM processors that only support IO mapped register
interface



It looks like this patch attempts to address both (a) non-percpu arch 
timers, and (b) memory mapped arch timers in one go.  These should 
probably be broken out into two distinct logical changes.


More below...


Signed-off-by: Rohit Vaswani 
---
  .../devicetree/bindings/arm/arch_timer.txt |7 +
  arch/arm/kernel/arch_timer.c   |  259 
  2 files changed, 223 insertions(+), 43 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt
index 52478c8..1c71799 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -14,6 +14,13 @@ The timer is attached to a GIC to deliver its per-processor interrupts.

  - clock-frequency : The frequency of the main counter, in Hz. Optional.

+- irq-is-not-percpu: Specify is the timer irq is *NOT* a percpu (PPI) interrupt
+  In the default case i.e without this property, the timer irq is treated as a
+  PPI interrupt. Optional.
+


The handling of non-percpu IRQs looks broken.  The code does 
(enable/disable)_percpu_irq() on IRQs that may no longer be percpu.



+- If the node address and reg is specified, the arch_timer will try to use the memory
+  mapped timer. Optional.
+
  Example:

timer {
diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c
index 1d0d9df..09604b7 100644
--- a/arch/arm/kernel/arch_timer.c
+++ b/arch/arm/kernel/arch_timer.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 

  #include 
@@ -29,8 +30,17 @@
  static unsigned long arch_timer_rate;
  static int arch_timer_ppi;
  static int arch_timer_ppi2;
+static int is_irq_percpu;

  static struct clock_event_device __percpu **arch_timer_evt;
+static void __iomem *timer_base;
+


Are percpu memory mapped arch timers an impossibility?


+struct arch_timer_operations {
+   void (*reg_write)(int, u32);
+   u32 (*reg_read)(int);
+   cycle_t (*get_cntpct)(void);
+   cycle_t (*get_cntvct)(void);
+};

  /*
   * Architected system timer support.
@@ -44,7 +54,29 @@ static struct clock_event_device __percpu **arch_timer_evt;
  #define ARCH_TIMER_REG_FREQ   1
  #define ARCH_TIMER_REG_TVAL   2

-static void arch_timer_reg_write(int reg, u32 val)
+/* Iomapped Register Offsets */
+#define ARCH_TIMER_CNTP_LOW_REG    0x000
+#define ARCH_TIMER_CNTP_HIGH_REG   0x004
+#define ARCH_TIMER_CNTV_LOW_REG    0x008
+#define ARCH_TIMER_CNTV_HIGH_REG   0x00C
+#define ARCH_TIMER_CTRL_REG        0x02C
+#define ARCH_TIMER_FREQ_REG        0x010
+#define ARCH_TIMER_CNTP_TVAL_REG   0x028
+#define ARCH_TIMER_CNTV_TVAL_REG   0x038
+


ARCH_TIMER_CNTV_TVAL_REG appears to be unused here.


+static void timer_reg_write_mem(int reg, u32 val)
+{
+   switch (reg) {
+   case ARCH_TIMER_REG_CTRL:
+   __raw_writel(val, timer_base + ARCH_TIMER_CTRL_REG);
+   break;
+   case ARCH_TIMER_REG_TVAL:
+   __raw_writel(val, timer_base + ARCH_TIMER_CNTP_TVAL_REG);
+   break;
+   }
+}
+


Wouldn't an array of offsets to map from ARCH_TIMER_REG_* to these 
memory mapped registers eliminate the need to switch-case your way 
through each register?
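
Something along these lines, as a rough userspace sketch rather than a
tested replacement. The offsets are the ones from the patch; the REG_*
names, reg_offset[] and the helper functions are hypothetical, REG_CTRL
being 0 is an assumption (only FREQ = 1 and TVAL = 2 are visible in the
hunk), and a plain array stands in for the ioremapped timer_base.

```c
#include <stdint.h>

/* Logical register ids. REG_CTRL = 0 is assumed, not shown in the hunk. */
enum { REG_CTRL = 0, REG_FREQ = 1, REG_TVAL = 2, REG_COUNT };

/* Byte offsets of the memory mapped registers, taken from the patch. */
static const uint32_t reg_offset[REG_COUNT] = {
	[REG_CTRL] = 0x02C,
	[REG_FREQ] = 0x010,
	[REG_TVAL] = 0x028,
};

/* Returns the byte offset for a logical register, or -1 if out of range. */
static int timer_reg_offset(int reg)
{
	if (reg < 0 || reg >= REG_COUNT)
		return -1;
	return (int)reg_offset[reg];
}

/* base models the mapped register window as an array of 32-bit words. */
static void timer_reg_write_model(uint32_t *base, int reg, uint32_t val)
{
	int off = timer_reg_offset(reg);

	if (off >= 0)
		base[off / 4] = val;
}

static uint32_t timer_reg_read_model(const uint32_t *base, int reg)
{
	int off = timer_reg_offset(reg);

	return off >= 0 ? base[off / 4] : 0;
}
```

Note that the table makes FREQ writable as well, which the switch in the
patch does not allow, so a real version would still need to decide how
to reject writes to read-only registers.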



+static void timer_reg_write_cp15(int reg, u32 val)
  {
switch (reg) {
case ARCH_TIMER_REG_CTRL:
@@ -58,7 +90,28 @@ static void arch_timer_reg_write(int reg, u32 val)
isb();
  }

-static u32 arch_timer_reg_read(int reg)
+static u32 timer_reg_read_mem(int reg)
+{
+   u32 val;
+
+   switch (reg) {
+   case ARCH_TIMER_REG_CTRL:
+   val = __raw_readl(timer_base + ARCH_TIMER_CTRL_REG);
+   break;
+   case ARCH_TIMER_REG_FREQ:
+   val = __raw_readl(timer_base + ARCH_TIMER_FREQ_REG);
+   break;
+   case ARCH_TIMER_REG_TVAL:
+   val = __raw_readl(timer_base + ARCH_TIMER_CNTP_TVAL_REG);
+   break;
+   default:
+   BUG();
+   }
+
+   return val;
+}
+


Same as above.


+static u32 timer_reg_read_cp15(int reg)
  {
u32 val;

@@ -79,6 +132,103 @@ static u32 arch_timer_reg_read(int reg)
return val;
  }

+static cycle_t arch_counter_get_cntpct_mem(void)
+{
+   u32 cvall, cvalh, thigh;
+
+   do {
+   cvalh = __raw_readl(timer_base + ARCH_TIMER_CNTP_HIGH_REG);
+   cvall = __raw_readl(timer_base + ARCH_TIMER_CNTP_LOW_REG);
+   thigh = __raw_readl(timer_base + ARCH_TIMER_CNTP_HIGH_REG);
+   } while (cvalh != thigh);
+
+   return ((cycle_t) cvalh << 32) | cvall;
+}
+
+static cycle_t arch_counter_get_cntpct_cp15(void)
+{
+   u32 cvall, cvalh;

Re: [PATCH 1/6] regulator: core: Add checking n_voltages if using list_voltage() to read voltage regulators

2012-08-10 Thread Axel Lin
2012/8/10 Mark Brown :
> On Fri, Aug 10, 2012 at 08:27:32PM +0800, Axel Lin wrote:
>> 2012/8/10 Mark Brown :
>
>> > We should be failing to register these regulators in the first place, or
>> > at least complaining extremely loudly about them.
>
>> Oh. My original intention is to prevent using list_voltage() to read
>> voltage regulators for the case "n_voltages > 1" in case of both get_voltage
>> and get_voltage_sel are not implemented.
>
> Yes, I see the intention - what I'm saying is that a regulator like that
> makes no sense in the first place.

We do have such a case in drivers/regulator/max1586.c:

/*
 * The Maxim 1586 controls V3 and V6 voltages, but offers no way of reading back
 * the set up value.
 */
static struct regulator_ops max1586_v3_ops = {
        .set_voltage_sel = max1586_v3_set_voltage_sel,
        .list_voltage = regulator_list_voltage_linear,
        .map_voltage = regulator_map_voltage_linear,
};

static struct regulator_ops max1586_v6_ops = {
        .set_voltage_sel = max1586_v6_set_voltage_sel,
        .list_voltage = regulator_list_voltage_table,
};
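
To illustrate the check I have in mind, here is a userspace sketch. It is
not the regulator core: the *_model structs, read_voltage_model(),
fixed_3v3() and ERR_INVAL_MODEL are all hypothetical, and the order in
which the callbacks are tried is an assumption made for the example.

```c
#include <stddef.h>

#define ERR_INVAL_MODEL (-22)   /* stand-in for -EINVAL */

struct regulator_ops_model {
	int (*get_voltage)(void);
	int (*get_voltage_sel)(void);
	int (*list_voltage)(unsigned int selector);
};

struct regulator_desc_model {
	unsigned int n_voltages;
	const struct regulator_ops_model *ops;
};

/* Example list_voltage callback for a fixed 3.3V supply (microvolts). */
static int fixed_3v3(unsigned int selector)
{
	(void)selector;
	return 3300000;
}

static int read_voltage_model(const struct regulator_desc_model *desc)
{
	const struct regulator_ops_model *ops = desc->ops;

	if (ops->get_voltage_sel) {
		int sel = ops->get_voltage_sel();

		if (sel < 0)
			return sel;
		if (!ops->list_voltage)
			return ERR_INVAL_MODEL;
		return ops->list_voltage((unsigned int)sel);
	}
	if (ops->get_voltage)
		return ops->get_voltage();

	/*
	 * No way to read back the selector: list_voltage(0) is only a
	 * valid answer when selector 0 is the sole possible setting.
	 */
	if (ops->list_voltage && desc->n_voltages == 1)
		return ops->list_voltage(0);

	return ERR_INVAL_MODEL;
}
```

With ops like the max1586 ones above (no get_voltage and no
get_voltage_sel) and n_voltages > 1, this returns an error instead of
reporting the voltage for selector 0.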


Re: [PATCH] [PATCH V4]Extcon: adc_jack: adc-jack driver to support 3.5 pi or simliar devices

2012-08-10 Thread anish kumar
Hello Jonathan,

Please refer to the latest patch.

Thanks,
On Fri, 2012-08-10 at 22:01 +0100, Jonathan Cameron wrote:
> On 08/08/2012 02:04 AM, anish kumar wrote:
> > From: anish kumar 
> >
> > External connector devices that decides connection information based on
> > ADC values may use adc-jack device driver. The user simply needs to
> > provide a table of adc range and connection states. Then, extcon
> > framework will automatically notify others.
> 
> Couple of utterly trivial points inline.
> Otherwise looks fine to me.
> >
> > Changes in V1:
> > added Lars-Peter Clausen suggested changes:
> > Using macros to get rid of boilerplate code such as devm_kzalloc
> > and module_platform_driver. Other changes suggested are related to
> > coding guidelines.
> >
> > Changes in V2:
> > Removed some unnecessary checks and changed the way we are unregistering
> > extcon and freeing the irq while removing.
> >
> > Changes in V3:
> > Renamed the files to comply with extcon naming.
> >
> > Changes in this version:
> > Added the cancel_work_sync during removing of driver.
> >
> > Reviewed-by: Lars-Peter Clausen 
> > Signed-off-by: anish kumar 
> > Signed-off-by: MyungJoo Ham 
> Don't these normally go in the order in which they occurred?
> Hence the first sign-offs are the authors', any acks / reviewed-bys come
> after that, and the final sign-offs are for the merges.
> > ---
> >  drivers/extcon/Kconfig |5 +
> >  drivers/extcon/Makefile|1 +
> >  drivers/extcon/extcon-adc-jack.c   |  194 
> > 
> >  include/linux/extcon/extcon-adc-jack.h |   73 
> >  4 files changed, 273 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/extcon/extcon-adc-jack.c
> >  create mode 100644 include/linux/extcon/extcon-adc-jack.h
> >
> > diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig
> > index e175c8e..596e277 100644
> > --- a/drivers/extcon/Kconfig
> > +++ b/drivers/extcon/Kconfig
> > @@ -21,6 +21,11 @@ config EXTCON_GPIO
> >   Say Y here to enable GPIO based extcon support. Note that GPIO
> >   extcon supports single state per extcon instance.
> >
> > +config EXTCON_ADC_JACK
> > +tristate "ADC Jack extcon support"
> > +help
> > +  Say Y here to enable extcon device driver based on ADC values.
> > +
> >  config EXTCON_MAX77693
> > tristate "MAX77693 EXTCON Support"
> > depends on MFD_MAX77693
> > diff --git a/drivers/extcon/Makefile b/drivers/extcon/Makefile
> > index 88961b3..bc7111e 100644
> > --- a/drivers/extcon/Makefile
> > +++ b/drivers/extcon/Makefile
> > @@ -4,6 +4,7 @@
> >
> >  obj-$(CONFIG_EXTCON)   += extcon_class.o
> >  obj-$(CONFIG_EXTCON_GPIO)  += extcon_gpio.o
> > +obj-$(CONFIG_EXTCON_ADC_JACK)   += extcon-adc-jack.o
> >  obj-$(CONFIG_EXTCON_MAX77693)  += extcon-max77693.o
> >  obj-$(CONFIG_EXTCON_MAX8997)   += extcon-max8997.o
> >  obj-$(CONFIG_EXTCON_ARIZONA)   += extcon-arizona.o
> > diff --git a/drivers/extcon/extcon-adc-jack.c 
> > b/drivers/extcon/extcon-adc-jack.c
> > new file mode 100644
> > index 000..cfc8c59
> > --- /dev/null
> > +++ b/drivers/extcon/extcon-adc-jack.c
> > @@ -0,0 +1,194 @@
> > +/*
> > + * drivers/extcon/extcon-adc-jack.c
> > + *
> > + * Analog Jack extcon driver with ADC-based detection capability.
> > + *
> > + * Copyright (C) 2012 Samsung Electronics
> > + * MyungJoo Ham 
> > + *
> > + * Modified for calling to IIO to get adc by 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/**
> > + * struct adc_jack_data - internal data for adc_jack device driver
> > + * @edev- extcon device.
> > + * @cable_names - list of supported cables.
> > + * @num_cables  - size of cable_names.
> > + * @adc_condition   - list of adc value conditions.
> > + * @num_condition   - size of adc_condition.
> > + * @irq - irq number of attach/detach event (0 if not exist).
> > + * @handling_delay  - interrupt handler will schedule extcon event
> > + *  handling at handling_delay jiffies.
> > + * @handler - extcon event handler called by interrupt handler.
> > + * @chan   - iio channel being queried.
> > + */
> > +struct adc_jack_data {
> > +   struct extcon_dev edev;
> > +
> > +   const char **cable_names;
> > +   int num_cables;
> > +   struct adc_jack_cond *adc_condition;
> > +   int num_conditions;
> > +
> > +   int irq;
> > +   unsigned long handling_delay; /* in jiffies */
> > +   struct delayed_work handler;
> > +
> > +   struct iio_channel *chan;
> > +};
> > +
> > +static void adc_jack_handler(struct work_struct *work)
> > +{
> > +   struct adc_jack_data *data = container_of(to_delayed_work(work),
> > +  

[PATCH 2/3 V1] block: Fix not tracing all device plug-operation.

2012-08-10 Thread Jianpeng Ma
If a process handles two or more devices, the plug operation is not traced
for some of them.

V0-->V1
Fix a bug: when inserting a req into a plug list that already holds a request
for the same request queue, list_add should be used, not list_add_tail.

Signed-off-by: Jianpeng Ma 
Signed-off-by: Jens Axboe 
---
 block/blk-core.c |   16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7a3abc6..034f186 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1521,11 +1521,25 @@ get_rq:
struct request *__rq;
 
__rq = list_entry_rq(plug->list.prev);
-   if (__rq->q != q)
+   if (__rq->q != q) {
plug->should_sort = 1;
+   trace_block_plug(q);
+   }
+   } else {
+   struct request *__rq;
+   list_for_each_entry_reverse(__rq, &plug->list,
+   queuelist) {
+   if (__rq->q == q) {
+   list_add(&req->queuelist,
+   &__rq->queuelist);
+   goto stat_acct;
+   }
+   }
+   trace_block_plug(q);
}
}
list_add_tail(&req->queuelist, &plug->list);
+stat_acct:
drive_stat_acct(req, 1);
} else {
spin_lock_irq(q->queue_lock);
-- 
1.7.9.5
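
The insertion rule described in the changelog can be sketched in user space as
follows. This is a simplified illustration, not the blk-core code: struct node,
insert_after() and plug_add() are made-up names, queue_id stands in for rq->q,
and the should_sort bookkeeping is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *prev, *next;
	int queue_id;	/* stands in for rq->q */
};

void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert @n right after @pos (the semantics of the kernel's list_add()). */
static void insert_after(struct node *n, struct node *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

/*
 * Walk the list backwards. If a request for the same queue exists, place the
 * new one right behind it; otherwise append at the tail. Returns 1 when this
 * is the first request for that queue (the point where a plug trace would be
 * emitted), 0 otherwise.
 */
int plug_add(struct node *head, struct node *n)
{
	struct node *it;

	assert(head != NULL && n != NULL);
	for (it = head->prev; it != head; it = it->prev) {
		if (it->queue_id == n->queue_id) {
			insert_after(n, it);
			return 0;
		}
	}
	insert_after(n, head->prev);	/* tail insertion */
	return 1;
}
```

Adding requests for queues 1, 2, 1 in that order leaves the list as 1, 1, 2:
requests for the same queue stay adjacent, and only the first request per
queue reports a new plug.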

[PATCH v2 09/22] ARM: LPAE: use phys_addr_t for initrd location and size

2012-08-10 Thread Cyril Chemparathy
From: Vitaly Andrianov 

This patch fixes the initrd setup code to use phys_addr_t instead of assuming
32-bit addressing.  Without this we cannot boot on systems where initrd is
located above the 4G physical address limit.

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
---
 arch/arm/mm/init.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 19ba70b..bae9d05 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -36,12 +36,13 @@
 
 #include "mm.h"
 
-static unsigned long phys_initrd_start __initdata = 0;
+static phys_addr_t phys_initrd_start __initdata = 0;
 static unsigned long phys_initrd_size __initdata = 0;
 
 static int __init early_initrd(char *p)
 {
-   unsigned long start, size;
+   phys_addr_t start;
+   unsigned long size;
char *endp;
 
    start = memparse(p, &endp);
@@ -347,14 +348,14 @@ void __init arm_memblock_init(struct meminfo *mi, struct 
machine_desc *mdesc)
 #ifdef CONFIG_BLK_DEV_INITRD
if (phys_initrd_size &&
!memblock_is_region_memory(phys_initrd_start, phys_initrd_size)) {
-   pr_err("INITRD: 0x%08lx+0x%08lx is not a memory region - 
disabling initrd\n",
-  phys_initrd_start, phys_initrd_size);
+   pr_err("INITRD: 0x%08llx+0x%08lx is not a memory region - 
disabling initrd\n",
+  (u64)phys_initrd_start, phys_initrd_size);
phys_initrd_start = phys_initrd_size = 0;
}
if (phys_initrd_size &&
memblock_is_region_reserved(phys_initrd_start, phys_initrd_size)) {
-   pr_err("INITRD: 0x%08lx+0x%08lx overlaps in-use memory region - 
disabling initrd\n",
-  phys_initrd_start, phys_initrd_size);
+   pr_err("INITRD: 0x%08llx+0x%08lx overlaps in-use memory region 
- disabling initrd\n",
+  (u64)phys_initrd_start, phys_initrd_size);
phys_initrd_start = phys_initrd_size = 0;
}
if (phys_initrd_size) {
-- 
1.7.9.5
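
To illustrate why the type change matters, here is a small user-space sketch.
The typedef names and the example address 0x800500000 are assumptions made for
the illustration; uint32_t stands in for unsigned long on 32-bit ARM, and
uint64_t for phys_addr_t with LPAE enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t phys_addr_sketch_t;	/* phys_addr_t with LPAE enabled */
typedef uint32_t ulong32_t;		/* unsigned long on 32-bit ARM */

/* What the old code effectively did: keep only the low 32 bits. */
ulong32_t store_as_ulong(phys_addr_sketch_t addr)
{
	return (ulong32_t)addr;
}

/* Format the range the way the patched pr_err() does: widen, then %llx. */
int format_initrd(char *buf, size_t len, phys_addr_sketch_t start,
		  ulong32_t size)
{
	assert(buf != NULL);
	return snprintf(buf, len, "0x%08llx+0x%08lx",
			(unsigned long long)start, (unsigned long)size);
}
```

An initrd start above 4G loses its upper bits when squeezed into the 32-bit
type, whereas the widened value prints in full.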



[RFC v2 21/22] ARM: keystone: enable SMP on Keystone machines

2012-08-10 Thread Cyril Chemparathy
This patch adds basic SMP support for Keystone machines.  Nothing very fancy
here, just enough to get 4 CPUs booted up.  This does not include support for
hotplug, etc.

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
---
 arch/arm/Kconfig|1 +
 arch/arm/configs/keystone_defconfig |2 +
 arch/arm/mach-keystone/Makefile |1 +
 arch/arm/mach-keystone/keystone.c   |3 ++
 arch/arm/mach-keystone/keystone.h   |   23 +++
 arch/arm/mach-keystone/platsmp.c|   74 +++
 6 files changed, 104 insertions(+)
 create mode 100644 arch/arm/mach-keystone/keystone.h
 create mode 100644 arch/arm/mach-keystone/platsmp.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f1b8aa0..37b4e9c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -416,6 +416,7 @@ config ARCH_KEYSTONE
select SPARSE_IRQ
select NEED_MACH_MEMORY_H
select HAVE_SCHED_CLOCK
+   select HAVE_SMP
help
  Support for boards based on the Texas Instruments Keystone family of
  SoCs.
diff --git a/arch/arm/configs/keystone_defconfig 
b/arch/arm/configs/keystone_defconfig
index 7f2a04b..5f71e66 100644
--- a/arch/arm/configs/keystone_defconfig
+++ b/arch/arm/configs/keystone_defconfig
@@ -1,7 +1,9 @@
 CONFIG_EXPERIMENTAL=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_ARCH_KEYSTONE=y
+CONFIG_SMP=y
 CONFIG_ARM_ARCH_TIMER=y
+CONFIG_NR_CPUS=4
 CONFIG_AEABI=y
 CONFIG_HIGHMEM=y
 CONFIG_VFP=y
diff --git a/arch/arm/mach-keystone/Makefile b/arch/arm/mach-keystone/Makefile
index d4671d5..3f6b8ab 100644
--- a/arch/arm/mach-keystone/Makefile
+++ b/arch/arm/mach-keystone/Makefile
@@ -1 +1,2 @@
 obj-y  := keystone.o
+obj-$(CONFIG_SMP)  += platsmp.o
diff --git a/arch/arm/mach-keystone/keystone.c 
b/arch/arm/mach-keystone/keystone.c
index 702c184..6a8ece9 100644
--- a/arch/arm/mach-keystone/keystone.c
+++ b/arch/arm/mach-keystone/keystone.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+#include "keystone.h"
+
 static struct map_desc io_desc[] = {
{
.virtual= 0xfe80UL,
@@ -73,6 +75,7 @@ static const char *keystone_match[] __initconst = {
 };
 
 DT_MACHINE_START(KEYSTONE, "Keystone")
+   smp_ops(keystone_smp_ops)
.map_io = keystone_map_io,
.init_irq   = keystone_init_irq,
.timer  = _timer,
diff --git a/arch/arm/mach-keystone/keystone.h 
b/arch/arm/mach-keystone/keystone.h
new file mode 100644
index 000..71bd0f4
--- /dev/null
+++ b/arch/arm/mach-keystone/keystone.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2010-2012 Texas Instruments, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef __KEYSTONE_H__
+#define __KEYSTONE_H__
+
+extern struct smp_ops keystone_smp_ops;
+extern void secondary_startup(void);
+
+#endif /* __KEYSTONE_H__ */
diff --git a/arch/arm/mach-keystone/platsmp.c b/arch/arm/mach-keystone/platsmp.c
new file mode 100644
index 000..dbe7601
--- /dev/null
+++ b/arch/arm/mach-keystone/platsmp.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright 2012 Texas Instruments, Inc.
+ *
+ * Based on platsmp.c, Copyright 2010-2011 Calxeda, Inc.
+ * Based on platsmp.c, Copyright (C) 2002 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "keystone.h"
+
+static void __init keystone_smp_init_cpus(void)
+{
+   unsigned int i, ncores;
+
+   ncores = 4;
+
+   /* sanity check */
+   if (ncores > NR_CPUS) {
+   pr_warn("restricted to %d cpus\n", NR_CPUS);
+   ncores = NR_CPUS;
+   }
+
+   for (i = 0; i < ncores; i++)
+   set_cpu_possible(i, true);
+
+   set_smp_cross_call(gic_raise_softirq);
+}
+
+static void __init 

[PATCH v2 06/22] ARM: LPAE: use signed arithmetic for mask definitions

2012-08-10 Thread Cyril Chemparathy
This patch applies to PAGE_MASK, PMD_MASK, and PGDIR_MASK, where forcing
unsigned long math truncates the mask at 32 bits.  This clearly does bad
things on PAE systems.

This patch fixes this problem by defining these masks as signed quantities.
We then rely on sign extension to do the right thing.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/page.h   |2 +-
 arch/arm/include/asm/pgtable-3level.h |6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index ecf9019..1e0fe08 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -13,7 +13,7 @@
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT 12
 #define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
-#define PAGE_MASK  (~(PAGE_SIZE-1))
+#define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm/include/asm/pgtable-3level.h 
b/arch/arm/include/asm/pgtable-3level.h
index b249035..ae39d11 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -48,16 +48,16 @@
 #define PMD_SHIFT  21
 
 #define PMD_SIZE   (1UL << PMD_SHIFT)
-#define PMD_MASK   (~(PMD_SIZE-1))
+#define PMD_MASK   (~((1 << PMD_SHIFT) - 1))
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
-#define PGDIR_MASK (~(PGDIR_SIZE-1))
+#define PGDIR_MASK (~((1 << PGDIR_SHIFT) - 1))
 
 /*
  * section address mask and size definitions.
  */
 #define SECTION_SHIFT  21
 #define SECTION_SIZE   (1UL << SECTION_SHIFT)
-#define SECTION_MASK   (~(SECTION_SIZE-1))
+#define SECTION_MASK   (~((1 << SECTION_SHIFT) - 1))
 
 #define USER_PTRS_PER_PGD  (PAGE_OFFSET / PGDIR_SIZE)
 
-- 
1.7.9.5
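
The sign extension argument can be checked in user space with the sketch
below. The macro and function names are made up for the illustration, and
UINT32_C(1) is used to emulate 32-bit unsigned long arithmetic on a host where
unsigned long may be wider.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Old definition: unsigned 32-bit arithmetic, as on 32-bit ARM. */
#define OLD_PAGE_MASK (~((UINT32_C(1) << SKETCH_PAGE_SHIFT) - 1))
/* New definition: plain int arithmetic, so the mask is a negative int. */
#define NEW_PAGE_MASK (~((1 << SKETCH_PAGE_SHIFT) - 1))

/* The unsigned mask is zero-extended to 64 bits, clearing bits 32 and up. */
uint64_t mask_old(uint64_t phys)
{
	return phys & OLD_PAGE_MASK;
}

/* The signed mask is sign-extended, so the upper address bits survive. */
uint64_t mask_new(uint64_t phys)
{
	uint64_t base = phys & NEW_PAGE_MASK;

	assert((base & 0xfff) == 0);
	return base;
}
```

For a physical address of 0x800001234, the old mask yields 0x1000 while the
signed mask yields 0x800001000.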



[PATCH v2 18/22] ARM: add virt_to_idmap for interconnect aliasing

2012-08-10 Thread Cyril Chemparathy
From: Vitaly Andrianov 

On some PAE systems (e.g. TI Keystone), memory is above the 32-bit addressable
limit, and the interconnect provides an aliased view of parts of physical
memory in the 32-bit addressable space.  This alias is strictly for boot time
usage, and is not otherwise usable because of coherency limitations.

On such systems, the idmap mechanism needs to take this aliased mapping into
account.  This patch introduces a virt_to_idmap() macro, which can be used on
such sub-architectures to represent the interconnect-supported boot time
alias.  Most other systems would leave this macro untouched, i.e., simply do a
virt_to_phys() and nothing more.

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
---
 arch/arm/include/asm/memory.h |9 +
 arch/arm/kernel/smp.c |2 +-
 arch/arm/mm/idmap.c   |4 ++--
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index e5d0cc8..59f101c 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -257,6 +257,15 @@ static inline void *phys_to_virt(phys_addr_t x)
 #define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
 
 /*
+ * These are for systems that have a hardware interconnect supported alias of
+ * physical memory for idmap purposes.  Most cases should leave these
+ * untouched.
+ */
+#ifndef virt_to_idmap
+#define virt_to_idmap(x) virt_to_phys(x)
+#endif
+
+/*
  * Virtual <-> DMA view memory address translations
  * Again, these are *only* valid on the kernel direct mapped RAM
  * memory.  Use of these is *deprecated* (and that doesn't mean
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 9831716..628f895 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -62,7 +62,7 @@ static DECLARE_COMPLETION(cpu_running);
 
 static unsigned long get_arch_pgd(pgd_t *pgd)
 {
-   phys_addr_t pgdir = virt_to_phys(pgd);
+   phys_addr_t pgdir = virt_to_idmap(pgd);
BUG_ON(pgdir & ARCH_PGD_MASK);
return pgdir >> ARCH_PGD_SHIFT;
 }
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..919cb6e 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -85,8 +85,8 @@ static int __init init_static_idmap(void)
return -ENOMEM;
 
/* Add an identity mapping for the physical address of the section. */
-   idmap_start = virt_to_phys((void *)__idmap_text_start);
-   idmap_end = virt_to_phys((void *)__idmap_text_end);
+   idmap_start = virt_to_idmap((void *)__idmap_text_start);
+   idmap_end = virt_to_idmap((void *)__idmap_text_end);
 
pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
(long long)idmap_start, (long long)idmap_end);
-- 
1.7.9.5
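
As an illustration of the override pattern, here is a user-space sketch of how
a platform definition takes precedence over the generic fallback. All names
carry a sketch_ prefix because they are hypothetical, and the three address
constants are assumed values chosen for the example, not taken from this
patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_sketch_t;

/* Assumed layout, for illustration only. */
#define SKETCH_PAGE_OFFSET	UINT64_C(0xC0000000)	/* kernel virtual base */
#define SKETCH_HIGH_PHYS_START	UINT64_C(0x800000000)	/* real RAM, above 4G */
#define SKETCH_LOW_PHYS_START	UINT64_C(0x80000000)	/* boot-time alias */

/* Ordinary translation: points at the real (high) physical address. */
phys_addr_sketch_t sketch_virt_to_phys(uint32_t virt)
{
	assert(virt >= SKETCH_PAGE_OFFSET);
	return (phys_addr_sketch_t)virt - SKETCH_PAGE_OFFSET +
	       SKETCH_HIGH_PHYS_START;
}

/* A platform header may define this before the generic fallback is seen. */
#define sketch_virt_to_idmap(x) \
	((phys_addr_sketch_t)(x) - SKETCH_PAGE_OFFSET + SKETCH_LOW_PHYS_START)

/* Generic fallback, same shape as the #ifndef block in the patch. */
#ifndef sketch_virt_to_idmap
#define sketch_virt_to_idmap(x) sketch_virt_to_phys(x)
#endif
```

Because the platform macro is defined first, the fallback is skipped and the
idmap address lands in the 32-bit alias, while sketch_virt_to_phys() still
returns the address above 4G.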



[PATCH v2 08/22] ARM: LPAE: use phys_addr_t in free_memmap()

2012-08-10 Thread Cyril Chemparathy
From: Vitaly Andrianov 

free_memmap() was mistakenly using the unsigned long type to represent
physical addresses.  This breaks on PAE systems where memory could be placed
above the 32-bit addressable limit.

This patch fixes this function to properly use phys_addr_t instead.

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
Acked-by: Nicolas Pitre 
---
 arch/arm/mm/init.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 9aec41f..19ba70b 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -457,7 +457,7 @@ static inline void
 free_memmap(unsigned long start_pfn, unsigned long end_pfn)
 {
struct page *start_pg, *end_pg;
-   unsigned long pg, pgend;
+   phys_addr_t pg, pgend;
 
/*
 * Convert start_pfn/end_pfn to a struct page pointer.
@@ -469,8 +469,8 @@ free_memmap(unsigned long start_pfn, unsigned long end_pfn)
 * Convert to physical addresses, and
 * round start upwards and end downwards.
 */
-   pg = (unsigned long)PAGE_ALIGN(__pa(start_pg));
-   pgend = (unsigned long)__pa(end_pg) & PAGE_MASK;
+   pg = PAGE_ALIGN(__pa(start_pg));
+   pgend = __pa(end_pg) & PAGE_MASK;
 
/*
 * If there are free pages between these,
-- 
1.7.9.5
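
The rounding done in free_memmap() can be sketched in user space with 64-bit
addresses as below. The function names and the 4K page size constant are
assumptions for the illustration; the point is only that both roundings must
be carried out in a type wide enough for the address.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Round a physical address up to the next page boundary (PAGE_ALIGN). */
uint64_t round_up_page(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Round a physical address down to a page boundary (addr & PAGE_MASK). */
uint64_t round_down_page(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Number of whole pages that fit between start (rounded up) and end (down). */
uint64_t whole_pages_between(uint64_t start, uint64_t end)
{
	uint64_t pg = round_up_page(start);
	uint64_t pgend = round_down_page(end);

	assert(start <= end);
	return pg < pgend ? (pgend - pg) / SKETCH_PAGE_SIZE : 0;
}
```

Rounding the start upwards and the end downwards means a partially covered
page is never counted, which is the same conservative choice the kernel
comment describes.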



[PATCH v2 02/22] ARM: add self test for runtime patch mechanism

2012-08-10 Thread Cyril Chemparathy
This patch adds basic sanity tests to ensure that the instruction patching
results in valid instruction encodings.  This is done by verifying the output
of the patch process against a vector of assembler generated instructions at
init time.

Signed-off-by: Cyril Chemparathy 
---
 arch/arm/Kconfig|   12 
 arch/arm/kernel/runtime-patch.c |   41 +++
 2 files changed, 53 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index d0a04ad..7e552dc 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -211,6 +211,18 @@ config ARM_PATCH_PHYS_VIRT
  this feature (eg, building a kernel for a single machine) and
  you need to shrink the kernel to the minimal size.
 
+config ARM_RUNTIME_PATCH_TEST
+   bool "Self test runtime patching mechanism" if ARM_RUNTIME_PATCH
+   default y
+   help
+ Select this to enable init time self checking for the runtime kernel
+ patching mechanism.  This enables an ISA specific set of tests that
+ ensure that the instructions generated by the patch process are
+ consistent with those generated by the assembler at compile time.
+
+ Only disable this option if you need to shrink the kernel to the
+ minimal size.
+
 config NEED_MACH_IO_H
bool
help
diff --git a/arch/arm/kernel/runtime-patch.c b/arch/arm/kernel/runtime-patch.c
index fd37a2b..c471d8c 100644
--- a/arch/arm/kernel/runtime-patch.c
+++ b/arch/arm/kernel/runtime-patch.c
@@ -163,6 +163,44 @@ static int apply_patch_imm8(const struct patch_info *p)
return 0;
 }
 
+#ifdef CONFIG_ARM_RUNTIME_PATCH_TEST
+static void __init __used __naked __patch_test_code_imm8(void)
+{
+   __asm__ __volatile__ (
+   "   .irp    shift1, 0, 6, 12, 18\n"
+   "   .irp    shift2, 0, 1, 2, 3, 4, 5\n"
+   "   add r1, r2, #(0x41 << (\\shift1 + \\shift2))\n"
+   "   .endr\n"
+   "   .endr\n"
+   "   .word   0\n"
+   : : :
+   );
+}
+
+static void __init test_patch_imm8(void)
+{
+   u32 test_code_addr = (u32)(&__patch_test_code_imm8);
+   u32 *test_code = (u32 *)(test_code_addr & ~0x3);
+   int i, ret;
+   u32 ninsn, insn;
+
+   insn = test_code[0];
+   for (i = 0; test_code[i]; i++) {
+   ret = do_patch_imm8(insn, 0x41 << i, &ninsn);
+   if (ret < 0)
+   pr_err("runtime patch (imm8): failed at shift %d\n", i);
+   else if (ninsn != test_code[i])
+   pr_err("runtime patch (imm8): failed, need %x got %x\n",
+  test_code[i], ninsn);
+   }
+}
+
+static void __init runtime_patch_test(void)
+{
+   test_patch_imm8();
+}
+#endif
+
 int runtime_patch(const void *table, unsigned size)
 {
const struct patch_info *p = table, *end = (table + size);
@@ -185,5 +223,8 @@ void __init runtime_patch_kernel(void)
const void *start = &__runtime_patch_table_begin;
const void *end   = &__runtime_patch_table_end;
 
+#ifdef CONFIG_ARM_RUNTIME_PATCH_TEST
+   runtime_patch_test();
+#endif
BUG_ON(runtime_patch(start, end - start));
 }
-- 
1.7.9.5
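
For context, the immediates exercised by the test are ARM data-processing
immediates: an 8-bit constant rotated right by an even amount. The sketch
below encodes such a value in user space. It is not the do_patch_imm8() from
this series; rol32_sketch() and encode_arm_imm8() are hypothetical names, and
the encoder simply picks the smallest rotation that works.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rol32_sketch(uint32_t v, unsigned int n)
{
	assert(n < 32);
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/*
 * Encode @value as an 8-bit constant rotated right by 2 * rot.
 * Returns the 12-bit field (rot << 8 | imm8), or -1 if the value
 * cannot be expressed that way.
 */
int encode_arm_imm8(uint32_t value)
{
	unsigned int rot;

	for (rot = 0; rot < 16; rot++) {
		/* Undo a rotate-right by 2 * rot with a rotate-left. */
		uint32_t imm8 = rol32_sketch(value, 2 * rot);

		if (imm8 <= 0xff)
			return (int)((rot << 8) | imm8);
	}
	return -1;
}
```

Every value of the form 0x41 << shift for shifts 0 to 23 is encodable, which
covers the shift1 + shift2 combinations generated by the .irp loops, whereas
something like 0x101 spans nine bits and is rejected.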



[PATCH v2 16/22] ARM: mm: cleanup checks for membank overlap with vmalloc area

2012-08-10 Thread Cyril Chemparathy
On Keystone platforms, physical memory is entirely outside the 32-bit
addressable range.  Therefore, the (bank->start > ULONG_MAX) check below marks
the entire system memory as highmem, and this causes unpleasantness all over.

This patch eliminates the extra bank start check (against ULONG_MAX) by
checking bank->start against the physical address corresponding to vmalloc_min
instead.

In the process, this patch also cleans up parts of the highmem sanity check
code by removing what has now become a redundant check for banks that entirely
overlap with the vmalloc range.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/mm/mmu.c |   19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f764c03..3d685c6 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -901,15 +901,12 @@ void __init sanity_check_meminfo(void)
    struct membank *bank = &meminfo.bank[j];
*bank = meminfo.bank[i];
 
-   if (bank->start > ULONG_MAX)
-   highmem = 1;
-
-#ifdef CONFIG_HIGHMEM
if (bank->start >= vmalloc_limit)
highmem = 1;
 
bank->highmem = highmem;
 
+#ifdef CONFIG_HIGHMEM
/*
 * Split those memory banks which are partially overlapping
 * the vmalloc area greatly simplifying things later.
@@ -932,8 +929,6 @@ void __init sanity_check_meminfo(void)
bank->size = vmalloc_limit - bank->start;
}
 #else
-   bank->highmem = highmem;
-
/*
 * Highmem banks not allowed with !CONFIG_HIGHMEM.
 */
@@ -946,18 +941,6 @@ void __init sanity_check_meminfo(void)
}
 
/*
-* Check whether this memory bank would entirely overlap
-* the vmalloc area.
-*/
-   if (bank->start >= vmalloc_limit) {
-   printk(KERN_NOTICE "Ignoring RAM at %.8llx-%.8llx "
-  "(vmalloc region overlap).\n",
-  (unsigned long long)bank->start,
-  (unsigned long long)bank->start + bank->size - 
1);
-   continue;
-   }
-
-   /*
 * Check whether this memory bank would partially overlap
 * the vmalloc area.
 */
-- 
1.7.9.5



[PATCH v2 11/22] ARM: LPAE: use 64-bit accessors for TTBR registers

2012-08-10 Thread Cyril Chemparathy
This patch adds TTBR accessor macros, and modifies cpu_get_pgd() and
the LPAE version of cpu_set_reserved_ttbr0() to use these instead.

In the process, we also fix these functions to correctly handle cases
where the physical address lies beyond the 4G limit of 32-bit addressing.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/proc-fns.h |   24 +++-
 arch/arm/mm/context.c   |9 ++---
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/proc-fns.h b/arch/arm/include/asm/proc-fns.h
index 75b5f14..24224df 100644
--- a/arch/arm/include/asm/proc-fns.h
+++ b/arch/arm/include/asm/proc-fns.h
@@ -116,13 +116,27 @@ extern void cpu_resume(void);
 #define cpu_switch_mm(pgd,mm) cpu_do_switch_mm(virt_to_phys(pgd),mm)
 
 #ifdef CONFIG_ARM_LPAE
+
+#define cpu_get_ttbr(nr)   \
+   ({  \
+   u64 ttbr;   \
+   __asm__("mrrc   p15, " #nr ", %Q0, %R0, c2" \
+   : "=r" (ttbr)   \
+   : : "cc");  \
+   ttbr;   \
+   })
+
+#define cpu_set_ttbr(nr, val)  \
+   do {\
+   u64 ttbr = val; \
+   __asm__("mcrr   p15, " #nr ", %Q0, %R0, c2" \
+   : : "r" (ttbr)  \
+   : "cc");\
+   } while (0)
+
 #define cpu_get_pgd()  \
({  \
-   unsigned long pg, pg2;  \
-   __asm__("mrrc   p15, 0, %0, %1, c2" \
-   : "=r" (pg), "=r" (pg2) \
-   :   \
-   : "cc");\
+   u64 pg = cpu_get_ttbr(0);   \
pg &= ~(PTRS_PER_PGD*sizeof(pgd_t)-1);  \
(pgd_t *)phys_to_virt(pg);  \
})
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 119bc52..cd27f35 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_RAW_SPINLOCK(cpu_asid_lock);
 unsigned int cpu_last_asid = ASID_FIRST_VERSION;
@@ -23,17 +24,11 @@ unsigned int cpu_last_asid = ASID_FIRST_VERSION;
 #ifdef CONFIG_ARM_LPAE
 void cpu_set_reserved_ttbr0(void)
 {
-   unsigned long ttbl = __pa(swapper_pg_dir);
-   unsigned long ttbh = 0;
-
/*
 * Set TTBR0 to swapper_pg_dir which contains only global entries. The
 * ASID is set to 0.
 */
-   asm volatile(
-   "   mcrr    p15, 0, %0, %1, c2  @ set TTBR0\n"
-   :
-   : "r" (ttbl), "r" (ttbh));
+   cpu_set_ttbr(0, __pa(swapper_pg_dir));
isb();
 }
 #else
-- 
1.7.9.5
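
What the 64-bit accessors buy can be shown with a user-space sketch of the
register-pair split. The struct and function names are made up for the
illustration; the lo/hi fields correspond to the two words that the %Q0 and
%R0 operand modifiers select from a 64-bit operand (least and most significant
word respectively).

```c
#include <assert.h>
#include <stdint.h>

struct reg_pair {
	uint32_t lo;	/* least significant word (%Q0) */
	uint32_t hi;	/* most significant word (%R0) */
};

/* New behaviour: both halves of the 64-bit value are passed on. */
struct reg_pair split_u64(uint64_t val)
{
	struct reg_pair p = { (uint32_t)val, (uint32_t)(val >> 32) };

	return p;
}

/* Old behaviour: the low word only, with the high word hard-coded to 0. */
struct reg_pair split_old(uint64_t val)
{
	struct reg_pair p = { (uint32_t)val, 0 };

	return p;
}

uint64_t join_u64(struct reg_pair p)
{
	return ((uint64_t)p.hi << 32) | p.lo;
}

/* Sanity helper: a full split must round-trip without loss. */
void check_roundtrip(uint64_t val)
{
	assert(join_u64(split_u64(val)) == val);
}
```

For a page table at 0x800004000, the full split keeps the 0x8 in the upper
word, while the old split would have programmed 0x4000.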



[RFC v2 22/22] ARM: keystone: add switch over to high physical address range

2012-08-10 Thread Cyril Chemparathy
Keystone platforms have their physical memory mapped at an address outside the
32-bit physical range.  A Keystone machine with 16G of RAM would find its
memory at 0x08 - 0x0b.

For boot purposes, the interconnect supports a limited alias of some of this
memory within the 32-bit addressable space (0x8000 - 0x).  This
aliasing is implemented in hardware, and is not intended to be used much
beyond boot.  For instance, DMA coherence does not work when running out of
this aliased address space.

Therefore, we've taken the approach of booting out of the low physical address
range, and subsequently we switch over to the high range once we're safely
inside machine specific territory.  This patch implements this switch over
mechanism, which involves rewiring the TTBRs and page tables to point to the
new physical address space.
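
The address arithmetic behind the switch over can be sketched in user space as
below. The constants are assumptions picked for the illustration (a 2G alias
window starting at 0x80000000 and a high range starting at 0x800000000), and
in_low_alias() and low_to_high() are hypothetical helpers, not functions from
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_LOW_START	UINT64_C(0x80000000)	/* alias window base */
#define SKETCH_LOW_SIZE		UINT64_C(0x80000000)	/* 2G alias window */
#define SKETCH_HIGH_START	UINT64_C(0x800000000)	/* real RAM base */

/* True if @phys lies inside the boot-time alias window. */
bool in_low_alias(uint64_t phys)
{
	return phys >= SKETCH_LOW_START &&
	       phys - SKETCH_LOW_START < SKETCH_LOW_SIZE;
}

/* Translate an address in the alias window to its high-range equivalent. */
uint64_t low_to_high(uint64_t phys)
{
	assert(in_low_alias(phys));
	return phys - SKETCH_LOW_START + SKETCH_HIGH_START;
}
```

Under these assumptions an initrd at 0x80500000 in the alias window ends up at
0x800500000 once the kernel runs out of the high range.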

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
---
 arch/arm/Kconfig |1 +
 arch/arm/boot/dts/keystone-sim.dts   |8 +++---
 arch/arm/configs/keystone_defconfig  |1 +
 arch/arm/mach-keystone/include/mach/memory.h |   25 +
 arch/arm/mach-keystone/keystone.c|   39 ++
 arch/arm/mach-keystone/platsmp.c |   16 +--
 6 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 37b4e9c..187 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -417,6 +417,7 @@ config ARCH_KEYSTONE
select NEED_MACH_MEMORY_H
select HAVE_SCHED_CLOCK
select HAVE_SMP
+   select ZONE_DMA if ARM_LPAE
help
  Support for boards based on the Texas Instruments Keystone family of
  SoCs.
diff --git a/arch/arm/boot/dts/keystone-sim.dts 
b/arch/arm/boot/dts/keystone-sim.dts
index acec30f8..17ee473 100644
--- a/arch/arm/boot/dts/keystone-sim.dts
+++ b/arch/arm/boot/dts/keystone-sim.dts
@@ -4,8 +4,8 @@
 / {
model = "Texas Instruments Keystone 2 SoC";
compatible = "ti,keystone-evm";
-   #address-cells = <1>;
-   #size-cells = <1>;
+   #address-cells = <2>;
+   #size-cells = <2>;
interrupt-parent = <>;
 
aliases {
@@ -13,11 +13,11 @@
};
 
chosen {
-   bootargs = "console=ttyS0,115200n8 debug earlyprintk lpj=5 
rdinit=/bin/ash rw root=/dev/ram0 initrd=0x8500,2M";
+   bootargs = "console=ttyS0,115200n8 debug earlyprintk lpj=5 
rdinit=/bin/ash rw root=/dev/ram0 initrd=0x80500,2M";
};
 
memory {
-   reg = <0x8000 0x800>;
+   reg = <0x0008 0x 0x 0x800>;
};
 
cpus {
diff --git a/arch/arm/configs/keystone_defconfig 
b/arch/arm/configs/keystone_defconfig
index 5f71e66..8ea3b96 100644
--- a/arch/arm/configs/keystone_defconfig
+++ b/arch/arm/configs/keystone_defconfig
@@ -1,6 +1,7 @@
 CONFIG_EXPERIMENTAL=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_ARCH_KEYSTONE=y
+CONFIG_ARM_LPAE=y
 CONFIG_SMP=y
 CONFIG_ARM_ARCH_TIMER=y
 CONFIG_NR_CPUS=4
diff --git a/arch/arm/mach-keystone/include/mach/memory.h 
b/arch/arm/mach-keystone/include/mach/memory.h
index 7c78b1e..a5f7a1a 100644
--- a/arch/arm/mach-keystone/include/mach/memory.h
+++ b/arch/arm/mach-keystone/include/mach/memory.h
@@ -19,4 +19,29 @@
 #define MAX_PHYSMEM_BITS   36
 #define SECTION_SIZE_BITS  34
 
+#define KEYSTONE_LOW_PHYS_START0x8000ULL
+#define KEYSTONE_LOW_PHYS_SIZE 0x8000ULL /* 2G */
+#define KEYSTONE_LOW_PHYS_END  (KEYSTONE_LOW_PHYS_START + \
+KEYSTONE_LOW_PHYS_SIZE - 1)
+
+#define KEYSTONE_HIGH_PHYS_START   0x8ULL
+#define KEYSTONE_HIGH_PHYS_SIZE0x4ULL  /* 16G */
+#define KEYSTONE_HIGH_PHYS_END (KEYSTONE_HIGH_PHYS_START + \
+KEYSTONE_HIGH_PHYS_SIZE - 1)
+#ifdef CONFIG_ARM_LPAE
+
+#ifndef __ASSEMBLY__
+
+static inline phys_addr_t __virt_to_idmap(unsigned long x)
+{
+   return (phys_addr_t)(x) - CONFIG_PAGE_OFFSET +
+   KEYSTONE_LOW_PHYS_START;
+}
+
+#define virt_to_idmap(x)   __virt_to_idmap((unsigned long)(x))
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_ARM_LPAE */
+
 #endif /* __ASM_MACH_MEMORY_H */
diff --git a/arch/arm/mach-keystone/keystone.c 
b/arch/arm/mach-keystone/keystone.c
index 6a8ece9..c4be7a7 100644
--- a/arch/arm/mach-keystone/keystone.c
+++ b/arch/arm/mach-keystone/keystone.c
@@ -74,6 +74,41 @@ static const char *keystone_match[] __initconst = {
NULL,
 };
 
+static void __init keystone_init_meminfo(void)
+{
+   bool lpae = IS_ENABLED(CONFIG_ARM_LPAE);
+   bool pvpatch = IS_ENABLED(CONFIG_ARM_PATCH_PHYS_VIRT);
+   phys_addr_t offset = PHYS_OFFSET - KEYSTONE_LOW_PHYS_START;
+   phys_addr_t mem_start, mem_end;
+
+   BUG_ON(meminfo.nr_banks < 1);
+   mem_start = meminfo.bank[0].start;
+   

[PATCH v2 00/22] Introducing the TI Keystone platform

2012-08-10 Thread Cyril Chemparathy
This series is a follow-on to the series posted earlier (archived at [1]).

Patches 01/22 .. 09/22 of this series have been pretty intensively reviewed;
thanks to all who helped.  We've modified per feedback, and these should be in
reasonable shape.

Patches 10/22 .. 19/22 of this series have not been very widely reviewed.
We'd very much appreciate eyeballs here.

Patches 20/22 .. 22/22 of this series are specific to the TI Keystone platform.
These are not ready to be merged in.  These are being provided here for the sake
of completeness, and to better illustrate the other patches in this series.
These are dependent on the smpops patches (see [2]).

These patches are also available on the following git repository:
git://arago-project.org/git/projects/linux-keystone.git keystone-v2


[1] - http://thread.gmane.org/gmane.linux.kernel/1336081
[2] - http://permalink.gmane.org/gmane.linux.ports.arm.kernel/171540


Series changelog:

(01/22) ARM: add mechanism for late code patching
  (v2)  pulled runtime patching code into separate source files
  (v2)  reordered arguments to patch macros for consistency with assembly
"Rd, Rt, imm" ordering
  (v2)  added support for mov immediate patching
  (v2)  cache flush patched instructions instead of entire kernel code
  (v2)  pack patch table to reduce table volume
  (v2)  add to module vermagic to reflect abi change
  (v2)  misc. cleanups in naming and structure

(02/22) ARM: add self test for runtime patch mechanism
  (v2)  added init-time tests to verify instruction encoding

(03/22) ARM: use late patch framework for phys-virt patching
  (v2)  move __pv_offset and __pv_phys_offset to C code
  (v2)  restore conditional init of __pv_offset and __pv_phys_offset

(04/22) ARM: LPAE: use phys_addr_t on virt <--> phys conversion
  (v2)  fix patched __phys_to_virt() to use 32-bit operand
  (v2)  convert non-patch __phys_to_virt and __virt_to_phys to inlines to retain
type checking

(05/22) ARM: LPAE: support 64-bit virt_to_phys patching
  (v2)  use phys_addr_t instead of split high/low phys_offsets
  (v2)  use mov immediate instead of add to zero when patching in high order
physical address bits
  (v2)  fix __pv_phys_offset handling for big-endian
  (v2)  remove set_phys_offset()

(06/22) ARM: LPAE: use signed arithmetic for mask definitions
(07/22) ARM: LPAE: use phys_addr_t in alloc_init_pud()
(08/22) ARM: LPAE: use phys_addr_t in free_memmap()
  (v2)  unchanged from v1

(09/22) ARM: LPAE: use phys_addr_t for initrd location and size
  (v2)  revert to unsigned long for initrd size

(10/22) ARM: LPAE: use phys_addr_t in switch_mm()
  (v2)  use phys_addr_t instead of u64 in switch_mm()
  (v2)  revert on changes to v6 and v7-2level
  (v2)  fix register mapping for big-endian in v7-3level

(11/22) ARM: LPAE: use 64-bit accessors for TTBR registers
  (v2)  restore comment in cpu_set_reserved_ttbr0()

(12/22) ARM: LPAE: define ARCH_LOW_ADDRESS_LIMIT for bootmem
(13/22) ARM: LPAE: factor out T1SZ and TTBR1 computations
  (v2)  unchanged from v1

(14/22) ARM: LPAE: accommodate >32-bit addresses for page table base
  (v2)  apply arch_pgd_shift only on lpae
  (v2)  move arch_pgd_shift definition to asm/memory.h
  (v2)  revert on changes to non-lpae procs
  (v2)  add check to ensure that the pgd physical address is aligned at an
ARCH_PGD_SHIFT boundary

(15/22) ARM: mm: use physical addresses in highmem sanity checks
(16/22) ARM: mm: cleanup checks for membank overlap with vmalloc area
(17/22) ARM: mm: clean up membank size limit checks
(18/22) ARM: add virt_to_idmap for interconnect aliasing
  (v2)  unchanged from v1

(19/22) ARM: recreate kernel mappings in early_paging_init()
  (v2)  disable on !lpae at compile time


 arch/arm/Kconfig  |   36 
 arch/arm/Makefile |1 +
 arch/arm/boot/dts/keystone-sim.dts|   77 +++
 arch/arm/configs/keystone_defconfig   |   23 +++
 arch/arm/include/asm/mach/arch.h  |1 +
 arch/arm/include/asm/memory.h |   94 ++---
 arch/arm/include/asm/module.h |7 +
 arch/arm/include/asm/page.h   |2 +-
 arch/arm/include/asm/pgtable-3level-hwdef.h   |   10 +
 arch/arm/include/asm/pgtable-3level.h |6 +-
 arch/arm/include/asm/proc-fns.h   |   28 ++-
 arch/arm/include/asm/runtime-patch.h  |  175 
 arch/arm/kernel/Makefile  |1 +
 arch/arm/kernel/armksyms.c|4 -
 arch/arm/kernel/head.S|  107 +++---
 arch/arm/kernel/module.c  |7 +-
 arch/arm/kernel/runtime-patch.c   |  230 +
 arch/arm/kernel/setup.c   |   18 ++
 arch/arm/kernel/smp.c |   11 +-
 arch/arm/kernel/vmlinux.lds.S   

[PATCH v2 03/22] ARM: use late patch framework for phys-virt patching

2012-08-10 Thread Cyril Chemparathy
This patch replaces the original physical offset patching implementation
with one that uses the newly added patching framework.  In the process, we now
unconditionally initialize the __pv_phys_offset and __pv_offset globals in the
head.S code.
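
As background for the imm8 patch type used below: ARM data-processing
instructions such as add/sub can only carry an 8-bit constant rotated right by
an even amount, which is why the old stub needed a 16MiB-aligned offset.  The
following is a minimal userspace sketch of that encoding rule.  The names
rol32() and arm_encode_imm8() are made up for illustration and are not the
helpers introduced by this series.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rol32(uint32_t v, unsigned int n)
{
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/*
 * Encode val as an ARM data-processing immediate: an 8-bit constant
 * rotated right by an even amount.  Returns the 12-bit field
 * (rot << 8 | imm8), or -1 if val cannot be expressed this way.
 * Hypothetical helper, for illustration only.
 */
static int arm_encode_imm8(uint32_t val)
{
	unsigned int rot;

	for (rot = 0; rot < 16; rot++) {
		/* Undo a rotate-right by 2*rot and see if 8 bits remain. */
		uint32_t imm8 = rol32(val, 2 * rot);

		if (imm8 <= 0xff)
			return (int)((rot << 8) | imm8);
	}
	return -1;
}
```

A constant such as 0x81000000 fits (0x81 rotated right by 8), whereas 0x101
does not, since its set bits span more than eight positions.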

Signed-off-by: Cyril Chemparathy 
---
 arch/arm/Kconfig  |1 +
 arch/arm/include/asm/memory.h |   26 +++
 arch/arm/kernel/armksyms.c|4 --
 arch/arm/kernel/head.S|   95 +++--
 arch/arm/kernel/module.c  |5 ---
 arch/arm/kernel/setup.c   |   12 ++
 arch/arm/kernel/vmlinux.lds.S |5 ---
 7 files changed, 36 insertions(+), 112 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7e552dc..9ac86ea 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -199,6 +199,7 @@ config ARM_PATCH_PHYS_VIRT
default y
depends on !XIP_KERNEL && MMU
depends on !ARCH_REALVIEW || !SPARSEMEM
+   select ARM_RUNTIME_PATCH
help
  Patch phys-to-virt and virt-to-phys translation functions at
  boot and module load time according to the position of the
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index e965f1b..3d93779 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -18,6 +18,8 @@
 #include <linux/types.h>
 #include <asm/sizes.h>
 
+#include <asm/runtime-patch.h>
+
 #ifdef CONFIG_NEED_MACH_MEMORY_H
 #include <mach/memory.h>
 #endif
@@ -151,35 +153,21 @@
 #ifndef __virt_to_phys
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
 
-/*
- * Constants used to force the right instruction encodings and shifts
- * so that all we need to do is modify the 8-bit constant field.
- */
-#define __PV_BITS_31_24        0x81000000
-
-extern unsigned long __pv_phys_offset;
-#define PHYS_OFFSET __pv_phys_offset
-
-#define __pv_stub(from,to,instr,type)  \
-   __asm__("@ __pv_stub\n" \
-   "1: " instr "   %0, %1, %2\n"   \
-   "   .pushsection .pv_table,\"a\"\n" \
-   "   .long   1b\n"   \
-   "   .popsection\n"  \
-   : "=r" (to) \
-   : "r" (from), "I" (type))
+extern unsigned long   __pv_offset;
+extern unsigned long   __pv_phys_offset;
+#define PHYS_OFFSET    __virt_to_phys(PAGE_OFFSET)
 
 static inline unsigned long __virt_to_phys(unsigned long x)
 {
unsigned long t;
-   __pv_stub(x, t, "add", __PV_BITS_31_24);
+   early_patch_imm8("add", t, x, __pv_offset, 0);
return t;
 }
 
 static inline unsigned long __phys_to_virt(unsigned long x)
 {
unsigned long t;
-   __pv_stub(x, t, "sub", __PV_BITS_31_24);
+   early_patch_imm8("sub", t, x, __pv_offset, 0);
return t;
 }
 #else
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..6b388f8 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -152,7 +152,3 @@ EXPORT_SYMBOL(mcount);
 #endif
 EXPORT_SYMBOL(__gnu_mcount_nc);
 #endif
-
-#ifdef CONFIG_ARM_PATCH_PHYS_VIRT
-EXPORT_SYMBOL(__pv_phys_offset);
-#endif
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 3db960e..69a3c09 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -117,7 +117,7 @@ ENTRY(stext)
bl  __fixup_smp
 #endif
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
-   bl  __fixup_pv_table
+   bl  __fixup_pv_offsets
 #endif
bl  __create_page_tables
 
@@ -511,92 +511,29 @@ ENDPROC(fixup_smp)
 
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
 
-/* __fixup_pv_table - patch the stub instructions with the delta between
- * PHYS_OFFSET and PAGE_OFFSET, which is assumed to be 16MiB aligned and
- * can be expressed by an immediate shifter operand. The stub instruction
- * has a form of '(add|sub) rd, rn, #imm'.
+/*
+ * __fixup_pv_offsets - update __pv_offset and __pv_phys_offset based on the
+ * runtime location of the kernel.
  */
__HEAD
-__fixup_pv_table:
+__fixup_pv_offsets:
adr r0, 1f
-   ldmia   r0, {r3-r5, r7}
+   ldmia   r0, {r3-r6}
sub r3, r0, r3  @ PHYS_OFFSET - PAGE_OFFSET
-   add r4, r4, r3  @ adjust table start address
-   add r5, r5, r3  @ adjust table end address
-   add r7, r7, r3  @ adjust __pv_phys_offset address
-   str r8, [r7]@ save computed PHYS_OFFSET to __pv_phys_offset
-   mov r6, r3, lsr #24 @ constant for add/sub instructions
-   teq r3, r6, lsl #24 @ must be 16MiB aligned
-THUMB( it  ne  @ cross section branch )
-   bne __error
-   str r6, [r7, #4]@ save to __pv_offset
-   b   __fixup_a_pv_table
-ENDPROC(__fixup_pv_table)
+   add r4, r4, r3  @ virt_to_phys(__pv_phys_offset)
+   add r5, r5, r3  @ virt_to_phys(__pv_offset)
+   add r6, r6, r3  @ virt_to_phys(PAGE_OFFSET) = PHYS_OFFSET
+   str r6, [r4]@ save 

[PATCH v2 07/22] ARM: LPAE: use phys_addr_t in alloc_init_pud()

2012-08-10 Thread Cyril Chemparathy
From: Vitaly Andrianov 

This patch fixes the alloc_init_pud() function to use phys_addr_t instead of
unsigned long when passing in the phys argument.

This is an extension to commit 97092e0c56830457af0639f6bd904537a150ea4a (ARM:
pgtable: use phys_addr_t for physical addresses), which applied similar changes
elsewhere in the ARM memory management code.

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
Acked-by: Nicolas Pitre 
---
 arch/arm/mm/mmu.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4c2d045..53eeeb8 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -622,7 +622,8 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 }
 
 static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
-   unsigned long end, unsigned long phys, const struct mem_type *type)
+ unsigned long end, phys_addr_t phys,
+ const struct mem_type *type)
 {
pud_t *pud = pud_offset(pgd, addr);
unsigned long next;
-- 
1.7.9.5


[PATCH v2 13/22] ARM: LPAE: factor out T1SZ and TTBR1 computations

2012-08-10 Thread Cyril Chemparathy
This patch moves the TTBR1 offset calculation and the T1SZ calculation out
of the TTB setup assembly code.  This should not affect functionality in
any way, but improves code readability as well as readability of subsequent
patches in this series.
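
To make the two factored-out values concrete, here is a small userspace sketch
that evaluates the same expressions as the TTBR1_SIZE and TTBR1_OFFSET macros
added below.  It assumes the usual PAGE_OFFSET values for the three VMSPLIT
options (0x40000000, 0x80000000, 0xc0000000); the function names are
illustrative only.

```c
#include <stdint.h>

/* TTBCR.T1SZ field for a given PAGE_OFFSET, mirroring TTBR1_SIZE. */
static uint32_t ttbr1_size(uint32_t page_offset)
{
	return ((page_offset >> 30) - 1) << 16;
}

/* Byte offset added to TTBR1, mirroring TTBR1_OFFSET. */
static uint32_t ttbr1_offset(uint32_t page_offset)
{
	switch (page_offset) {
	case 0x80000000u:	/* VMSPLIT_2G: skip two L1 entries */
		return 1u << 4;
	case 0xc0000000u:	/* VMSPLIT_3G: skip pgd + 3 pmd pages */
		return 4096u * (1 + 3);
	default:		/* VMSPLIT_1G: no adjustment (assumed) */
		return 0;
	}
}
```

With these inputs T1SZ comes out as 0, 1 and 2 (shifted into bits 16 and up)
for the 1G, 2G and 3G splits respectively.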

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/pgtable-3level-hwdef.h |   10 ++
 arch/arm/mm/proc-v7-3level.S|   16 
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..b501650 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -74,4 +74,14 @@
 #define PHYS_MASK_SHIFT        (40)
 #define PHYS_MASK  ((1ULL << PHYS_MASK_SHIFT) - 1)
 
+#if defined CONFIG_VMSPLIT_2G
+#define TTBR1_OFFSET   (1 << 4)                /* skip two L1 entries */
+#elif defined CONFIG_VMSPLIT_3G
+#define TTBR1_OFFSET   (4096 * (1 + 3))        /* only L2, skip pgd + 3*pmd */
+#else
+#define TTBR1_OFFSET   0
+#endif
+
+#define TTBR1_SIZE (((PAGE_OFFSET >> 30) - 1) << 16)
+
 #endif
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 78bd88c..e28383f 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -137,18 +137,10 @@ ENDPROC(cpu_v7_set_pte_ext)
 * booting secondary CPUs would end up using TTBR1 for the identity
 * mapping set up in TTBR0.
 */
-   bhi 9001f   @ PHYS_OFFSET > PAGE_OFFSET?
-   orr \tmp, \tmp, #(((PAGE_OFFSET >> 30) - 1) << 16) @ TTBCR.T1SZ
-#if defined CONFIG_VMSPLIT_2G
-   /* PAGE_OFFSET == 0x80000000, T1SZ == 1 */
-   add \ttbr1, \ttbr1, #1 << 4 @ skip two L1 entries
-#elif defined CONFIG_VMSPLIT_3G
-   /* PAGE_OFFSET == 0xc0000000, T1SZ == 2 */
-   add \ttbr1, \ttbr1, #4096 * (1 + 3) @ only L2 used, skip pgd+3*pmd
-#endif
-   /* CONFIG_VMSPLIT_1G does not need TTBR1 adjustment */
-9001:  mcr p15, 0, \tmp, c2, c0, 2 @ TTB control register
-   mcrr    p15, 1, \ttbr1, \zero, c2   @ load TTBR1
+   orrls   \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ
+   mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR
+   addls   \ttbr1, \ttbr1, #TTBR1_OFFSET
+   mcrr    p15, 1, \ttbr1, \zero, c2   @ load TTBR1
.endm
 
__CPUINIT
-- 
1.7.9.5


[PATCH v2 04/22] ARM: LPAE: use phys_addr_t on virt <--> phys conversion

2012-08-10 Thread Cyril Chemparathy
This patch fixes up the types used when converting back and forth between
physical and virtual addresses.
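
The reason the return and argument types matter can be seen in a short
userspace sketch.  It assumes a hypothetical layout with PAGE_OFFSET at
0xc0000000 and RAM starting above 4G at 0x800000000; the _EX names and the
functions are illustrative and not part of this patch.

```c
#include <stdint.h>

#define PAGE_OFFSET_EX	0xc0000000u	/* hypothetical kernel virtual base */
#define PHYS_OFFSET_EX	0x800000000ull	/* hypothetical RAM base above 4G */

/* Old style: the result is squeezed through a 32-bit type and truncated. */
static uint32_t virt_to_phys_32(uint32_t x)
{
	return (uint32_t)(x - PAGE_OFFSET_EX + PHYS_OFFSET_EX);
}

/* New style: a 64-bit physical type keeps the high-order bits. */
static uint64_t virt_to_phys_64(uint32_t x)
{
	return (uint64_t)x - PAGE_OFFSET_EX + PHYS_OFFSET_EX;
}

/* The reverse direction may narrow: virtual addresses are still 32-bit. */
static uint32_t phys_to_virt_64(uint64_t x)
{
	return (uint32_t)(x - PHYS_OFFSET_EX + PAGE_OFFSET_EX);
}
```

Under these assumptions the 32-bit variant maps 0xc0001000 to 0x1000, silently
dropping the bits above 4G, while the 64-bit variant round-trips correctly.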

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
---
 arch/arm/include/asm/memory.h |   26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 3d93779..81e1714 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -157,22 +157,32 @@ extern unsigned long  __pv_offset;
 extern unsigned long   __pv_phys_offset;
 #define PHYS_OFFSET    __virt_to_phys(PAGE_OFFSET)
 
-static inline unsigned long __virt_to_phys(unsigned long x)
+static inline phys_addr_t __virt_to_phys(unsigned long x)
 {
unsigned long t;
early_patch_imm8("add", t, x, __pv_offset, 0);
return t;
 }
 
-static inline unsigned long __phys_to_virt(unsigned long x)
+static inline unsigned long __phys_to_virt(phys_addr_t x)
 {
-   unsigned long t;
-   early_patch_imm8("sub", t, x, __pv_offset, 0);
+   unsigned long t, xlo = x;
+   early_patch_imm8("sub", t, xlo, __pv_offset, 0);
return t;
 }
+
 #else
-#define __virt_to_phys(x)  ((x) - PAGE_OFFSET + PHYS_OFFSET)
-#define __phys_to_virt(x)  ((x) - PHYS_OFFSET + PAGE_OFFSET)
+
+static inline phys_addr_t __virt_to_phys(unsigned long x)
+{
+   return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
+}
+
+static inline unsigned long __phys_to_virt(phys_addr_t x)
+{
+   return x - PHYS_OFFSET + PAGE_OFFSET;
+}
+
 #endif
 #endif
 
@@ -207,14 +217,14 @@ static inline phys_addr_t virt_to_phys(const volatile void *x)
 
 static inline void *phys_to_virt(phys_addr_t x)
 {
-   return (void *)(__phys_to_virt((unsigned long)(x)));
+   return (void *)__phys_to_virt(x);
 }
 
 /*
  * Drivers should NOT use these either.
  */
 #define __pa(x)                        __virt_to_phys((unsigned long)(x))
-#define __va(x)                        ((void *)__phys_to_virt((unsigned long)(x)))
+#define __va(x)                        ((void *)__phys_to_virt((phys_addr_t)(x)))
 #define pfn_to_kaddr(pfn)  __va((pfn) << PAGE_SHIFT)
 
 /*
-- 
1.7.9.5


[PATCH v2 01/22] ARM: add mechanism for late code patching

2012-08-10 Thread Cyril Chemparathy
The original phys_to_virt/virt_to_phys patching implementation relied on early
patching prior to MMU initialization.  On PAE systems running out of >4G
address space, this would have entailed an additional round of patching after
switching over to the high address space.

The approach implemented here conceptually extends the original PHYS_OFFSET
patching implementation with the introduction of "early" patch stubs.  Early
patch code is required to be functional out of the box, even before the patch
is applied.  This is implemented by inserting functional (but inefficient)
load code into the .runtime.patch.code init section.  Having functional code
out of the box then allows us to defer the init time patch application until
later in the init sequence.

In addition to fitting better with our need for physical address-space
switch-over, this implementation should be somewhat more extensible by virtue
of its more readable (and hackable) C implementation.  This should prove
useful for other similar init time specialization needs, especially in light
of our multi-platform kernel initiative.
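
The patch sites are recorded as variable-length records (struct patch_info and
patch_next() in the header added by this patch).  The following userspace
sketch shows how such a packed table might be walked.  It is an illustration
under assumptions, not the code from runtime-patch.c: struct patch_rec uses a
32-bit address field instead of a pointer, and rec_emit() and
demo_count_records() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct patch_info; insn is a 32-bit address here. */
struct patch_rec {
	uint32_t insn;
	uint16_t type;
	uint8_t  insn_size;
	uint8_t  data_size;	/* bytes of payload following the header */
	uint32_t data[];
};

/* Same stepping rule as patch_next(): header size plus payload size. */
static struct patch_rec *rec_next(struct patch_rec *p)
{
	return (struct patch_rec *)((char *)p + sizeof(*p) + p->data_size);
}

/* Write one record at p and return the position of the next one. */
static struct patch_rec *rec_emit(struct patch_rec *p, uint32_t insn,
				  const uint32_t *data, uint8_t nwords)
{
	p->insn = insn;
	p->type = 0x0001;	/* PATCH_IMM8 */
	p->insn_size = 4;
	p->data_size = (uint8_t)(nwords * sizeof(uint32_t));
	if (nwords)
		memcpy(p->data, data, p->data_size);
	return rec_next(p);
}

/* Build a three-record table and count the records by walking it. */
static int demo_count_records(void)
{
	static const uint32_t payload[2] = { 0x11, 0x22 };
	struct patch_rec *start, *end, *p;
	int count = 0;

	start = calloc(1, 64);	/* 36 bytes are used by the records below */
	if (!start)
		return -1;
	end = rec_emit(start, 0x8000, payload, 1);
	end = rec_emit(end, 0x8004, payload, 2);
	end = rec_emit(end, 0x8008, NULL, 0);

	for (p = start; p < end; p = rec_next(p))
		count++;

	free(start);
	return count;
}
```

Because each record carries its own payload length, records with no payload
cost only the 8-byte header in this sketch.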

This code has been boot tested in both ARM and Thumb-2 modes on an ARMv7
(Cortex-A8) device.

Note: the obtuse use of stringified symbols in patch_stub() and
early_patch_stub() is intentional.  Theoretically this should have been
accomplished with formal operands passed into the asm block, but this requires
the use of the 'c' modifier for instantiating the long (e.g. .long %c0).
However, the 'c' modifier has been found to ICE certain versions of GCC, and
therefore we resort to stringified symbols here.

Signed-off-by: Cyril Chemparathy 
---
 arch/arm/Kconfig |3 +
 arch/arm/include/asm/module.h|7 ++
 arch/arm/include/asm/runtime-patch.h |  175 +++
 arch/arm/kernel/Makefile |1 +
 arch/arm/kernel/module.c |4 +
 arch/arm/kernel/runtime-patch.c  |  189 ++
 arch/arm/kernel/setup.c  |3 +
 arch/arm/kernel/vmlinux.lds.S|   10 ++
 8 files changed, 392 insertions(+)
 create mode 100644 arch/arm/include/asm/runtime-patch.h
 create mode 100644 arch/arm/kernel/runtime-patch.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e91c7cd..d0a04ad 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -61,6 +61,9 @@ config ARM
 config ARM_HAS_SG_CHAIN
bool
 
+config ARM_RUNTIME_PATCH
+   bool
+
 config NEED_SG_DMA_LENGTH
bool
 
diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
index 6c6809f..2090486 100644
--- a/arch/arm/include/asm/module.h
+++ b/arch/arm/include/asm/module.h
@@ -43,9 +43,16 @@ struct mod_arch_specific {
 #define MODULE_ARCH_VERMAGIC_ARMTHUMB ""
 #endif
 
+#ifdef CONFIG_ARM_RUNTIME_PATCH
+#define MODULE_ARCH_VERMAGIC_RT_PATCH "rt-patch "
+#else
+#define MODULE_ARCH_VERMAGIC_RT_PATCH ""
+#endif
+
 #define MODULE_ARCH_VERMAGIC \
MODULE_ARCH_VERMAGIC_ARMVSN \
MODULE_ARCH_VERMAGIC_ARMTHUMB \
+   MODULE_ARCH_VERMAGIC_RT_PATCH \
MODULE_ARCH_VERMAGIC_P2V
 
 #endif /* _ASM_ARM_MODULE_H */
diff --git a/arch/arm/include/asm/runtime-patch.h b/arch/arm/include/asm/runtime-patch.h
new file mode 100644
index 0000000..6c6e8a2
--- /dev/null
+++ b/arch/arm/include/asm/runtime-patch.h
@@ -0,0 +1,175 @@
+/*
+ * arch/arm/include/asm/runtime-patch.h
+ * Note: this file should not be included by non-asm/.h files
+ *
+ * Copyright 2012 Texas Instruments, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_ARM_RUNTIME_PATCH_H
+#define __ASM_ARM_RUNTIME_PATCH_H
+
+#include 
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_ARM_RUNTIME_PATCH
+
+struct patch_info {
+       void    *insn;
+       u16      type;
+       u8       insn_size;
+       u8       data_size;
+       u32      data[0];
+};
+
+#define patch_next(p)          ((void *)(p) + sizeof(*(p)) + (p)->data_size)
+
+#define PATCH_TYPE_MASK                0x00ff
+#define PATCH_IMM8             0x0001
+
+#define PATCH_EARLY            0x8000
+
+#define patch_stub(type, code, patch_data, ...)                        \
+   __asm__("@ patch stub\n"\
+   "1:\n"  \
+   code\
+   "2:\n"  

[PATCH v2 15/22] ARM: mm: use physical addresses in highmem sanity checks

2012-08-10 Thread Cyril Chemparathy
This patch modifies the highmem sanity checking code to use physical addresses
instead of virtual addresses.  This eliminates the wrap-around problems
associated with the original virtual address based checks, and simplifies the
code a bit.

The one constraint imposed here is that low physical memory must be mapped in
a monotonically increasing fashion if there are multiple banks of memory,
i.e., x < y must imply pa(x) < pa(y).
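
The splitting logic in physical terms can be sketched in userspace as follows.
This is a simplified model, assuming a single physical limit that separates
lowmem from highmem; struct bank_ex and split_bank() are illustrative names and
the function is not the code from sanity_check_meminfo().

```c
#include <stdint.h>

/* Simplified stand-in for struct membank. */
struct bank_ex {
	uint64_t start;
	uint64_t size;
	int highmem;
};

/*
 * Split *bank at the physical address limit.  Returns 1 and fills *high
 * if the bank straddles the limit; returns 0 if the bank lies entirely
 * below or entirely at/above it.  No virtual addresses are involved, so
 * there is no wrap-around to guard against.
 */
static int split_bank(struct bank_ex *bank, struct bank_ex *high,
		      uint64_t limit)
{
	if (bank->start >= limit) {
		bank->highmem = 1;
		return 0;
	}
	bank->highmem = 0;
	if (bank->size <= limit - bank->start)
		return 0;

	high->start = limit;
	high->size = bank->size - (limit - bank->start);
	high->highmem = 1;
	bank->size = limit - bank->start;
	return 1;
}
```

For example, a 1G bank at 0x80000000 with a limit of 0xaf800000 would be split
into a 0x2f800000-byte low bank and a 0x10800000-byte high bank.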

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/mm/mmu.c |   22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 53eeeb8..f764c03 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -895,6 +895,7 @@ phys_addr_t arm_lowmem_limit __initdata = 0;
 void __init sanity_check_meminfo(void)
 {
int i, j, highmem = 0;
+   phys_addr_t vmalloc_limit = __pa(vmalloc_min - 1) + 1;
 
for (i = 0, j = 0; i < meminfo.nr_banks; i++) {
                struct membank *bank = &meminfo.bank[j];
@@ -904,8 +905,7 @@ void __init sanity_check_meminfo(void)
highmem = 1;
 
 #ifdef CONFIG_HIGHMEM
-   if (__va(bank->start) >= vmalloc_min ||
-   __va(bank->start) < (void *)PAGE_OFFSET)
+   if (bank->start >= vmalloc_limit)
highmem = 1;
 
bank->highmem = highmem;
@@ -914,8 +914,8 @@ void __init sanity_check_meminfo(void)
 * Split those memory banks which are partially overlapping
 * the vmalloc area greatly simplifying things later.
 */
-   if (!highmem && __va(bank->start) < vmalloc_min &&
-   bank->size > vmalloc_min - __va(bank->start)) {
+   if (!highmem && bank->start < vmalloc_limit &&
+   bank->size > vmalloc_limit - bank->start) {
if (meminfo.nr_banks >= NR_BANKS) {
printk(KERN_CRIT "NR_BANKS too low, "
 "ignoring high memory\n");
@@ -924,12 +924,12 @@ void __init sanity_check_meminfo(void)
(meminfo.nr_banks - i) * sizeof(*bank));
meminfo.nr_banks++;
i++;
-   bank[1].size -= vmalloc_min - __va(bank->start);
-   bank[1].start = __pa(vmalloc_min - 1) + 1;
+   bank[1].size -= vmalloc_limit - bank->start;
+   bank[1].start = vmalloc_limit;
bank[1].highmem = highmem = 1;
j++;
}
-   bank->size = vmalloc_min - __va(bank->start);
+   bank->size = vmalloc_limit - bank->start;
}
 #else
bank->highmem = highmem;
@@ -949,8 +949,7 @@ void __init sanity_check_meminfo(void)
 * Check whether this memory bank would entirely overlap
 * the vmalloc area.
 */
-   if (__va(bank->start) >= vmalloc_min ||
-   __va(bank->start) < (void *)PAGE_OFFSET) {
+   if (bank->start >= vmalloc_limit) {
printk(KERN_NOTICE "Ignoring RAM at %.8llx-%.8llx "
   "(vmalloc region overlap).\n",
   (unsigned long long)bank->start,
@@ -962,9 +961,8 @@ void __init sanity_check_meminfo(void)
 * Check whether this memory bank would partially overlap
 * the vmalloc area.
 */
-   if (__va(bank->start + bank->size) > vmalloc_min ||
-   __va(bank->start + bank->size) < __va(bank->start)) {
-   unsigned long newsize = vmalloc_min - __va(bank->start);
+               if (bank->start + bank->size > vmalloc_limit) {
+   unsigned long newsize = vmalloc_limit - bank->start;
printk(KERN_NOTICE "Truncating RAM at %.8llx-%.8llx "
   "to -%.8llx (vmalloc region overlap).\n",
   (unsigned long long)bank->start,
-- 
1.7.9.5


[PATCH v2 19/22] ARM: recreate kernel mappings in early_paging_init()

2012-08-10 Thread Cyril Chemparathy
This patch adds a step in the init sequence, in order to recreate the kernel
code/data page table mappings prior to full paging initialization.  This is
necessary on LPAE systems that run out of a physical address space outside the
4G limit.  On these systems, this implementation provides a machine descriptor
hook that allows the PHYS_OFFSET to be overridden in a machine specific
fashion.
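
As a rough guide to how much remapping is involved: with LPAE each pmd entry
maps a 2MiB section, so the kernel image needs one entry per 2MiB section it
touches.  The userspace sketch below computes that count.  The _EX constants
and pmd_entries_for() are illustrative names, and the function is a model of
the arithmetic rather than the loop in early_paging_init().

```c
#include <stdint.h>

#define PMD_SHIFT_EX	21			/* 2MiB sections, as with LPAE */
#define PMD_SIZE_EX	(1ull << PMD_SHIFT_EX)
#define PMD_MASK_EX	(~(PMD_SIZE_EX - 1))

/* Number of pmd entries needed to cover the range [start, end). */
static unsigned int pmd_entries_for(uint64_t start, uint64_t end)
{
	uint64_t first = start & PMD_MASK_EX;			  /* round down */
	uint64_t last = (end + PMD_SIZE_EX - 1) & PMD_MASK_EX;  /* round up */

	return (unsigned int)((last - first) >> PMD_SHIFT_EX);
}
```

A kernel spanning 0xc0008000 to 0xc0800000 would therefore need four entries
under these assumptions.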

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/mach/arch.h |1 +
 arch/arm/kernel/setup.c  |3 ++
 arch/arm/mm/mmu.c|   65 ++
 3 files changed, 69 insertions(+)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 0b1c94b..2b9ecc5 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -37,6 +37,7 @@ struct machine_desc {
        char                    restart_mode;   /* default restart mode */
        void                    (*fixup)(struct tag *, char **,
                                         struct meminfo *);
+       void                    (*init_meminfo)(void);
        void                    (*reserve)(void);/* reserve mem blocks  */
        void                    (*map_io)(void);/* IO mapping function  */
        void                    (*init_early)(void);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index edb4f42..e37cbaf 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -79,6 +79,7 @@ static int __init fpe_setup(char *line)
 __setup("fpe=", fpe_setup);
 #endif
 
+extern void early_paging_init(struct machine_desc *, struct proc_info_list *);
 extern void paging_init(struct machine_desc *desc);
 extern void sanity_check_meminfo(void);
 extern void reboot_setup(char *str);
@@ -978,6 +979,8 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
 
        sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
+
+       early_paging_init(mdesc, lookup_processor_type(read_cpuid_id()));
        sanity_check_meminfo();
        arm_memblock_init(&meminfo, mdesc);
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 662684b..5d240da 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1168,6 +1169,70 @@ static void __init map_lowmem(void)
}
 }
 
+#ifdef CONFIG_ARM_LPAE
+/*
+ * early_paging_init() recreates boot time page table setup, allowing machines
+ * to switch over to a high (>4G) address space on LPAE systems
+ */
+void __init early_paging_init(struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+   pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags;
+   unsigned long map_start, map_end;
+   pgd_t *pgd0, *pgdk;
+   pud_t *pud0, *pudk;
+   pmd_t *pmd0, *pmdk;
+   phys_addr_t phys;
+   int i;
+
+   /* remap kernel code and data */
+   map_start = init_mm.start_code;
+   map_end   = init_mm.brk;
+
+   /* get a handle on things... */
+   pgd0 = pgd_offset_k(0);
+   pud0 = pud_offset(pgd0, 0);
+   pmd0 = pmd_offset(pud0, 0);
+
+   pgdk = pgd_offset_k(map_start);
+   pudk = pud_offset(pgdk, map_start);
+   pmdk = pmd_offset(pudk, map_start);
+
+   phys = PHYS_OFFSET;
+
+   if (mdesc->init_meminfo)
+   mdesc->init_meminfo();
+
+   /* remap level 1 table */
+   for (i = 0; i < PTRS_PER_PGD; i++) {
+   *pud0++ = __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER);
+   pmd0 += PTRS_PER_PMD;
+   }
+
+   /* remap pmds for kernel mapping */
+   phys = __pa(map_start) & PMD_MASK;
+   do {
+   *pmdk++ = __pmd(phys | pmdprot);
+   phys += PMD_SIZE;
+   } while (phys < map_end);
+
+   flush_cache_all();
+   cpu_set_ttbr(0, __pa(pgd0));
+   cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
+   local_flush_tlb_all();
+}
+
+#else
+
+void __init early_paging_init(struct machine_desc *mdesc,
+ struct proc_info_list *procinfo)
+{
+   if (mdesc->init_meminfo)
+   mdesc->init_meminfo();
+}
+
+#endif
+
 /*
  * paging_init() sets up the page tables, initialises the zone memory
  * maps, and sets up the zero page, bad page and bad page tables.
-- 
1.7.9.5


[PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching

2012-08-10 Thread Cyril Chemparathy
This patch adds support for 64-bit physical addresses in virt_to_phys()
patching.  This does not do real 64-bit add/sub, but instead patches in the
upper 32-bits of the phys_offset directly into the output of virt_to_phys.

There is no corresponding change on the phys_to_virt() side, because
computations on the upper 32-bits would be discarded anyway.
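
The composition described above can be modelled in userspace as follows.  The
sketch assumes a hypothetical PAGE_OFFSET of 0xc0000000 and a physical base of
0x800000000, and further assumes that lowmem does not cross a 4G boundary, so
that the high word is the same for every lowmem address.  The _ex variables and
virt_to_phys_patched() are illustrative names only.

```c
#include <stdint.h>

/* Low 32 bits of (PHYS_OFFSET - PAGE_OFFSET) for the assumed layout. */
static uint32_t pv_offset_ex = 0x40000000u;
/* Full 64-bit physical base for the assumed layout. */
static uint64_t pv_phys_offset_ex = 0x800000000ull;

static uint64_t virt_to_phys_patched(uint32_t x)
{
	/* Models the patched "add": 32-bit arithmetic that may wrap. */
	uint32_t tlo = x + pv_offset_ex;
	/* Models the patched "mov": high word taken from the phys offset. */
	uint32_t thi = (uint32_t)(pv_phys_offset_ex >> 32);

	return (uint64_t)tlo | (uint64_t)thi << 32;
}
```

The 32-bit add wraps around, and the wrapped low word combined with the
patched-in high word yields the same result a full 64-bit computation would.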

Signed-off-by: Cyril Chemparathy 
---
 arch/arm/include/asm/memory.h |   22 ++
 arch/arm/kernel/head.S|4 
 arch/arm/kernel/setup.c   |2 +-
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 81e1714..dc5fbf3 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -154,14 +154,28 @@
 #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
 
 extern unsigned long   __pv_offset;
-extern unsigned long   __pv_phys_offset;
+extern phys_addr_t __pv_phys_offset;
 #define PHYS_OFFSET    __virt_to_phys(PAGE_OFFSET)
 
 static inline phys_addr_t __virt_to_phys(unsigned long x)
 {
-   unsigned long t;
-   early_patch_imm8("add", t, x, __pv_offset, 0);
-   return t;
+   unsigned long tlo, thi;
+
+   early_patch_imm8("add", tlo, x, __pv_offset, 0);
+
+#ifdef CONFIG_ARM_LPAE
+   /*
+* On LPAE, we do not _need_ to do 64-bit arithmetic because the high
+* order 32 bits are never changed by the phys-virt offset.  We simply
+* patch in the high order physical address bits instead.
+*/
+#ifdef __ARMEB__
+   early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
+#else
+   early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
+#endif
+#endif
+   return (u64)tlo | (u64)thi << 32;
 }
 
 static inline unsigned long __phys_to_virt(phys_addr_t x)
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 69a3c09..61fb8df 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -530,7 +530,11 @@ ENDPROC(__fixup_pv_offsets)
 
.align
 1: .long   .
+#if defined(CONFIG_ARM_LPAE) && defined(__ARMEB__)
+   .long   __pv_phys_offset + 4
+#else
.long   __pv_phys_offset
+#endif
.long   __pv_offset
.long   PAGE_OFFSET
 #endif
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 59e0f57..edb4f42 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -159,7 +159,7 @@ DEFINE_PER_CPU(struct cpuinfo_arm, cpu_data);
  * The initializers here prevent these from landing in the BSS section.
  */
 unsigned long __pv_offset = 0xdeadbeef;
-unsigned long __pv_phys_offset = 0xdeadbeef;
+phys_addr_t   __pv_phys_offset = 0xdeadbeef;
 EXPORT_SYMBOL(__pv_phys_offset);
 
 #endif
-- 
1.7.9.5


[RFC v2 20/22] ARM: keystone: introducing TI Keystone platform

2012-08-10 Thread Cyril Chemparathy
Texas Instruments Keystone family of multicore devices now includes an
upcoming slew of Cortex A15 based devices.  This patch adds basic definitions
for a new Keystone sub-architecture in ARM.

Subsequent patches in this series will extend support to include SMP and take
advantage of the large physical memory addressing capabilities via LPAE.

Signed-off-by: Vitaly Andrianov 
Signed-off-by: Cyril Chemparathy 
Reviewed-by: Arnd Bergmann 
---
 arch/arm/Kconfig  |   18 +
 arch/arm/Makefile |1 +
 arch/arm/boot/dts/keystone-sim.dts|   77 +++
 arch/arm/configs/keystone_defconfig   |   20 +
 arch/arm/mach-keystone/Makefile   |1 +
 arch/arm/mach-keystone/Makefile.boot  |1 +
 arch/arm/mach-keystone/include/mach/debug-macro.S |   44 +++
 arch/arm/mach-keystone/include/mach/memory.h  |   22 ++
 arch/arm/mach-keystone/include/mach/timex.h   |   21 ++
 arch/arm/mach-keystone/include/mach/uncompress.h  |   24 ++
 arch/arm/mach-keystone/keystone.c |   82 +
 11 files changed, 311 insertions(+)
 create mode 100644 arch/arm/boot/dts/keystone-sim.dts
 create mode 100644 arch/arm/configs/keystone_defconfig
 create mode 100644 arch/arm/mach-keystone/Makefile
 create mode 100644 arch/arm/mach-keystone/Makefile.boot
 create mode 100644 arch/arm/mach-keystone/include/mach/debug-macro.S
 create mode 100644 arch/arm/mach-keystone/include/mach/memory.h
 create mode 100644 arch/arm/mach-keystone/include/mach/timex.h
 create mode 100644 arch/arm/mach-keystone/include/mach/uncompress.h
 create mode 100644 arch/arm/mach-keystone/keystone.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9ac86ea..f1b8aa0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -402,6 +402,24 @@ config ARCH_HIGHBANK
help
  Support for the Calxeda Highbank SoC based boards.
 
+config ARCH_KEYSTONE
+   bool "Texas Instruments Keystone Devices"
+   select ARCH_WANT_OPTIONAL_GPIOLIB
+   select ARM_GIC
+   select MULTI_IRQ_HANDLER
+   select CLKDEV_LOOKUP
+   select COMMON_CLK
+   select CLKSRC_MMIO
+   select CPU_V7
+   select GENERIC_CLOCKEVENTS
+   select USE_OF
+   select SPARSE_IRQ
+   select NEED_MACH_MEMORY_H
+   select HAVE_SCHED_CLOCK
+   help
+ Support for boards based on the Texas Instruments Keystone family of
+ SoCs.
+
 config ARCH_CLPS711X
bool "Cirrus Logic CLPS711x/EP721x/EP731x-based"
select CPU_ARM720T
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30eae87..75b5b79 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -146,6 +146,7 @@ machine-$(CONFIG_ARCH_EP93XX)   := ep93xx
 machine-$(CONFIG_ARCH_GEMINI)  := gemini
 machine-$(CONFIG_ARCH_H720X)   := h720x
 machine-$(CONFIG_ARCH_HIGHBANK):= highbank
+machine-$(CONFIG_ARCH_KEYSTONE):= keystone
 machine-$(CONFIG_ARCH_INTEGRATOR)  := integrator
 machine-$(CONFIG_ARCH_IOP13XX) := iop13xx
 machine-$(CONFIG_ARCH_IOP32X)  := iop32x
diff --git a/arch/arm/boot/dts/keystone-sim.dts b/arch/arm/boot/dts/keystone-sim.dts
new file mode 100644
index 000..acec30f8
--- /dev/null
+++ b/arch/arm/boot/dts/keystone-sim.dts
@@ -0,0 +1,77 @@
+/dts-v1/;
+/include/ "skeleton.dtsi"
+
+/ {
+   model = "Texas Instruments Keystone 2 SoC";
+   compatible = "ti,keystone-evm";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   interrupt-parent = <>;
+
+   aliases {
+   serial0 = 
+   };
+
+   chosen {
+   bootargs = "console=ttyS0,115200n8 debug earlyprintk lpj=5 rdinit=/bin/ash rw root=/dev/ram0 initrd=0x8500,2M";
+   };
+
+   memory {
+   reg = <0x8000 0x800>;
+   };
+
+   cpus {
+   interrupt-parent = <>;
+
+   cpu@0 {
+   compatible = "arm,cortex-a15";
+   };
+
+   cpu@1 {
+   compatible = "arm,cortex-a15";
+   };
+
+   cpu@2 {
+   compatible = "arm,cortex-a15";
+   };
+
+   cpu@3 {
+   compatible = "arm,cortex-a15";
+   };
+
+   };
+
+   soc {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+   compatible = "ti,keystone","simple-bus";
+   interrupt-parent = <>;
+
+   gic:interrupt-controller@0256 {
+   compatible = "arm,cortex-a15-gic";
+   #interrupt-cells = <3>;
+   #size-cells = <0>;
+   #address-cells = <1>;
+   interrupt-controller;
+   reg = <0x02561000 0x1000>,
+ 

[PATCH v2 10/22] ARM: LPAE: use phys_addr_t in switch_mm()

2012-08-10 Thread Cyril Chemparathy
This patch modifies the switch_mm() processor functions to use phys_addr_t.
On LPAE systems, we now honor the upper 32-bits of the physical address that
is being passed in, and program these into TTBR as expected.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/proc-fns.h |4 ++--
 arch/arm/mm/proc-v7-3level.S|   26 ++
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/proc-fns.h b/arch/arm/include/asm/proc-fns.h
index f3628fb..75b5f14 100644
--- a/arch/arm/include/asm/proc-fns.h
+++ b/arch/arm/include/asm/proc-fns.h
@@ -60,7 +60,7 @@ extern struct processor {
/*
 * Set the page table
 */
-   void (*switch_mm)(unsigned long pgd_phys, struct mm_struct *mm);
+   void (*switch_mm)(phys_addr_t pgd_phys, struct mm_struct *mm);
/*
 * Set a possibly extended PTE.  Non-extended PTEs should
 * ignore 'ext'.
@@ -82,7 +82,7 @@ extern void cpu_proc_init(void);
 extern void cpu_proc_fin(void);
 extern int cpu_do_idle(void);
 extern void cpu_dcache_clean_area(void *, int);
-extern void cpu_do_switch_mm(unsigned long pgd_phys, struct mm_struct *mm);
+extern void cpu_do_switch_mm(phys_addr_t pgd_phys, struct mm_struct *mm);
 #ifdef CONFIG_ARM_LPAE
 extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte);
 #else
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 8de0f1d..78bd88c 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -39,6 +39,22 @@
 #define TTB_FLAGS_SMP  (TTB_IRGN_WBWA|TTB_S|TTB_RGN_OC_WBWA)
 #define PMD_FLAGS_SMP  (PMD_SECT_WBWA|PMD_SECT_S)
 
+#define rzero  r3
+#ifndef CONFIG_ARM_LPAE
+#  define rpgdl        r0
+#  define rpgdh        rzero
+#  define rmm  r1
+#else
+#  define rmm  r2
+#ifndef __ARMEB__
+#  define rpgdl        r0
+#  define rpgdh        r1
+#else
+#  define rpgdl        r1
+#  define rpgdh        r0
+#endif
+#endif
+
 /*
  * cpu_v7_switch_mm(pgd_phys, tsk)
  *
@@ -47,10 +63,12 @@
  */
 ENTRY(cpu_v7_switch_mm)
 #ifdef CONFIG_MMU
-   ldr r1, [r1, #MM_CONTEXT_ID]@ get mm->context.id
-   and r3, r1, #0xff
-   mov r3, r3, lsl #(48 - 32)  @ ASID
-   mcrr    p15, 0, r0, r3, c2  @ set TTB 0
+   mov rzero, #0
+   ldr rmm, [rmm, #MM_CONTEXT_ID]  @ get mm->context.id
+   and rmm, rmm, #0xff
+   mov rmm, rmm, lsl #(48 - 32)@ ASID
+   orr rpgdh, rpgdh, rmm   @ upper 32-bits of pgd phys
+   mcrr    p15, 0, rpgdl, rpgdh, c2    @ set TTB 0
isb
 #endif
mov pc, lr
-- 
1.7.9.5
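
The register juggling in the patch above can be sketched in plain C. This is
only an illustration of how the 64-bit TTBR0 value appears to be assembled
(low word from the pgd address, high word from the upper pgd bits ORed with the
ASID at bit 48); make_ttbr0() is a hypothetical name and is not part of the
patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: build the 64-bit value that
 * the mcrr instruction writes to TTBR0.
 */
static uint64_t make_ttbr0(uint64_t pgd_phys, uint32_t context_id)
{
    uint32_t lo = (uint32_t)pgd_phys;          /* rpgdl: low 32 bits */
    uint32_t hi = (uint32_t)(pgd_phys >> 32);  /* rpgdh: upper 32 bits */
    uint32_t asid = context_id & 0xff;         /* 8-bit ASID */

    /* The ASID lives at bit 48 of TTBR0, i.e. bit (48 - 32) of the high word. */
    hi |= asid << (48 - 32);

    return ((uint64_t)hi << 32) | lo;
}
```

With a 32-bit pgd address the high word carries only the ASID, which matches
the behaviour before the patch.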



[PATCH v2 12/22] ARM: LPAE: define ARCH_LOW_ADDRESS_LIMIT for bootmem

2012-08-10 Thread Cyril Chemparathy
This patch adds an architecture defined override for ARCH_LOW_ADDRESS_LIMIT.
On PAE systems, the absence of this override causes bootmem to incorrectly
limit itself to 32-bit addressable physical memory.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/memory.h |2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index dc5fbf3..64db5a4 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -292,6 +292,8 @@ static inline __deprecated void *bus_to_virt(unsigned long x)
 #define arch_is_coherent() 0
 #endif
 
+#define ARCH_LOW_ADDRESS_LIMIT PHYS_MASK
+
 #endif
 
 #include 
-- 
1.7.9.5



[PATCH v2 17/22] ARM: mm: clean up membank size limit checks

2012-08-10 Thread Cyril Chemparathy
This patch cleans up the highmem sanity check code by simplifying the range
checks with a pre-calculated size_limit.  This patch should otherwise have no
functional impact on behavior.

This patch also removes a redundant (bank->start < vmalloc_limit) check, since
this is already covered by the !highmem condition.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/mm/mmu.c |   19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 3d685c6..662684b 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -899,10 +899,15 @@ void __init sanity_check_meminfo(void)
 
for (i = 0, j = 0; i < meminfo.nr_banks; i++) {
struct membank *bank = &meminfo.bank[j];
+   phys_addr_t size_limit;
+
*bank = meminfo.bank[i];
+   size_limit = bank->size;
 
if (bank->start >= vmalloc_limit)
highmem = 1;
+   else
+   size_limit = vmalloc_limit - bank->start;
 
bank->highmem = highmem;
 
@@ -911,8 +916,7 @@ void __init sanity_check_meminfo(void)
 * Split those memory banks which are partially overlapping
 * the vmalloc area greatly simplifying things later.
 */
-   if (!highmem && bank->start < vmalloc_limit &&
-   bank->size > vmalloc_limit - bank->start) {
+   if (!highmem && bank->size > size_limit) {
if (meminfo.nr_banks >= NR_BANKS) {
printk(KERN_CRIT "NR_BANKS too low, "
 "ignoring high memory\n");
@@ -921,12 +925,12 @@ void __init sanity_check_meminfo(void)
(meminfo.nr_banks - i) * sizeof(*bank));
meminfo.nr_banks++;
i++;
-   bank[1].size -= vmalloc_limit - bank->start;
+   bank[1].size -= size_limit;
bank[1].start = vmalloc_limit;
bank[1].highmem = highmem = 1;
j++;
}
-   bank->size = vmalloc_limit - bank->start;
+   bank->size = size_limit;
}
 #else
/*
@@ -944,14 +948,13 @@ void __init sanity_check_meminfo(void)
 * Check whether this memory bank would partially overlap
 * the vmalloc area.
 */
-   if (bank->start + bank->size > vmalloc_limit)
-   unsigned long newsize = vmalloc_limit - bank->start;
+   if (bank->size > size_limit) {
printk(KERN_NOTICE "Truncating RAM at %.8llx-%.8llx "
   "to -%.8llx (vmalloc region overlap).\n",
   (unsigned long long)bank->start,
   (unsigned long long)bank->start + bank->size - 1,
-  (unsigned long long)bank->start + newsize - 1);
-   bank->size = newsize;
+  (unsigned long long)bank->start + size_limit - 1);
+   bank->size = size_limit;
}
 #endif
if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
-- 
1.7.9.5
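
As a rough illustration of the pre-calculated size_limit described above, the
range check can be modelled outside the kernel as follows. This is a
simplified sketch: struct membank is reduced to three fields, the helper names
are hypothetical, and the "highmem" flag is evaluated per bank rather than
carried over from earlier banks as in sanity_check_meminfo().

```c
#include <stdint.h>

/* Reduced stand-in for the kernel's struct membank (illustration only). */
struct membank {
    uint64_t start;
    uint64_t size;
    int highmem;
};

/* Largest size the bank may keep without crossing vmalloc_limit. */
static uint64_t bank_size_limit(const struct membank *bank,
                                uint64_t vmalloc_limit)
{
    if (bank->start >= vmalloc_limit)
        return bank->size;              /* wholly highmem: nothing to trim */
    return vmalloc_limit - bank->start; /* room left below the limit */
}

/* True when a lowmem bank straddles vmalloc_limit and must be split. */
static int bank_needs_split(const struct membank *bank,
                            uint64_t vmalloc_limit)
{
    int highmem = bank->start >= vmalloc_limit;

    return !highmem && bank->size > bank_size_limit(bank, vmalloc_limit);
}
```

Because size_limit already encodes the (bank->start < vmalloc_limit) case, the
split test needs only the single size comparison.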



[PATCH v2 14/22] ARM: LPAE: accomodate >32-bit addresses for page table base

2012-08-10 Thread Cyril Chemparathy
This patch redefines the early boot time use of the R4 register to steal a few
low order bits (ARCH_PGD_SHIFT bits) on LPAE systems.  This allows for up to
38-bit physical addresses.

Signed-off-by: Cyril Chemparathy 
Signed-off-by: Vitaly Andrianov 
---
 arch/arm/include/asm/memory.h |   15 +++
 arch/arm/kernel/head.S|   10 --
 arch/arm/kernel/smp.c |   11 +--
 arch/arm/mm/proc-v7-3level.S  |8 
 4 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 64db5a4..e5d0cc8 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #ifdef CONFIG_NEED_MACH_MEMORY_H
@@ -143,6 +144,20 @@
 #define page_to_phys(page) (__pfn_to_phys(page_to_pfn(page)))
 #define phys_to_page(phys) (pfn_to_page(__phys_to_pfn(phys)))
 
+/*
+ * Minimum guaranteed alignment in pgd_alloc().  The page table pointers passed
+ * around in head.S and proc-*.S are shifted by this amount, in order to
+ * leave spare high bits for systems with physical address extension.  This
+ * does not fully accommodate the 40-bit addressing capability of ARM LPAE, but
+ * gives us about 38-bits or so.
+ */
+#ifdef CONFIG_ARM_LPAE
+#define ARCH_PGD_SHIFT L1_CACHE_SHIFT
+#else
+#define ARCH_PGD_SHIFT 0
+#endif
+#define ARCH_PGD_MASK  ((1 << ARCH_PGD_SHIFT) - 1)
+
 #ifndef __ASSEMBLY__
 
 /*
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index 61fb8df..9664db0 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -152,7 +152,7 @@ ENDPROC(stext)
  *
  * Returns:
  *  r0, r3, r5-r7 corrupted
- *  r4 = physical page table address
+ *  r4 = page table (see ARCH_PGD_SHIFT in asm/memory.h)
  */
 __create_page_tables:
pgtbl   r4, r8  @ page table address
@@ -306,6 +306,7 @@ __create_page_tables:
 #endif
 #ifdef CONFIG_ARM_LPAE
sub r4, r4, #0x1000 @ point to the PGD table
+   mov r4, r4, lsr #ARCH_PGD_SHIFT
 #endif
mov pc, lr
 ENDPROC(__create_page_tables)
@@ -379,7 +380,7 @@ __secondary_data:
  *  r0  = cp#15 control register
  *  r1  = machine ID
  *  r2  = atags or dtb pointer
- *  r4  = page table pointer
+ *  r4  = page table (see ARCH_PGD_SHIFT in asm/memory.h)
  *  r9  = processor ID
  *  r13 = *virtual* address to jump to upon completion
  */
@@ -398,10 +399,7 @@ __enable_mmu:
 #ifdef CONFIG_CPU_ICACHE_DISABLE
bic r0, r0, #CR_I
 #endif
-#ifdef CONFIG_ARM_LPAE
-   mov r5, #0
-   mcrr    p15, 0, r4, r5, c2  @ load TTBR0
-#else
+#ifndef CONFIG_ARM_LPAE
mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
  domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
  domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index ebd8ad2..9831716 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -60,6 +60,13 @@ enum ipi_msg_type {
 
 static DECLARE_COMPLETION(cpu_running);
 
+static unsigned long get_arch_pgd(pgd_t *pgd)
+{
+   phys_addr_t pgdir = virt_to_phys(pgd);
+   BUG_ON(pgdir & ARCH_PGD_MASK);
+   return pgdir >> ARCH_PGD_SHIFT;
+}
+
 int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *idle)
 {
int ret;
@@ -69,8 +76,8 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *idle)
 * its stack and the page tables.
 */
secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
-   secondary_data.pgdir = virt_to_phys(idmap_pgd);
-   secondary_data.swapper_pg_dir = virt_to_phys(swapper_pg_dir);
+   secondary_data.pgdir = get_arch_pgd(idmap_pgd);
+   secondary_data.swapper_pg_dir = get_arch_pgd(swapper_pg_dir);
__cpuc_flush_dcache_area(&secondary_data, sizeof(secondary_data));
outer_clean_range(__pa(&secondary_data), __pa(&secondary_data + 1));
 
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index e28383f..6fa0444 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -120,6 +120,7 @@ ENDPROC(cpu_v7_set_pte_ext)
 */
.macro  v7_ttb_setup, zero, ttbr0, ttbr1, tmp
ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
+   mov \tmp, \tmp, lsr #ARCH_PGD_SHIFT
cmp \ttbr1, \tmp@ PHYS_OFFSET > PAGE_OFFSET? (branch below)
mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register
orr \tmp, \tmp, #TTB_EAE
@@ -139,8 +140,15 @@ ENDPROC(cpu_v7_set_pte_ext)
 */
orrls   \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ
mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR
+   mov \tmp, \ttbr1, lsr #(32 - ARCH_PGD_SHIFT)@ upper bits
+   mov \ttbr1, \ttbr1, lsl #ARCH_PGD_SHIFT @ lower bits

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Hello,

On Fri, Aug 10, 2012 at 04:13:03PM -0700, Andi Kleen wrote:
> Naoya Horiguchi  writes:
> 
> > Current error reporting of memory errors on dirty pagecache has silent
> > data lost problem because AS_EIO in struct address_space is cleared
> > once checked.
> 
> Seems very complicated.  I think I would prefer something simpler
> if possible, especially unless it's proven the case is common.
> It's hard to maintain rarely used error code when it's complicated.

I'm not sure if memory error is a rare event, because I don't have
any numbers about that on real systems. But assuming that hwpoison
events are not rare, dirty pagecache error is not an ignorable case
because dirty page ratio is typically ~10% of total physical memory
in average systems. It may be small but not negligible.

> Maybe try Fengguang's simple proposal first? That would fix other IO
> errors too.

In my understanding, Fengguang's patch (specified in this patch's
description) only fixes memory error reporting. And I'm not sure
that a similar approach (like making AS_EIO sticky) really fixes
the IO errors, because this change can break userspace applications
which expect the current behavior.

Anyway, OK, I agree to start with Fengguang's one and separate
out the additional suggestion about "making dirty pagecache error
recoverable". And if possible, I want your feedback about the
additional part of my idea. Can I ask a favor?

Thanks,
Naoya


Re: [PATCH 2/3] HWPOISON: undo memory error handling for dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Hi Andi,

On Fri, Aug 10, 2012 at 04:09:48PM -0700, Andi Kleen wrote:
> Naoya Horiguchi  writes:
> 
> > Current memory error handling on dirty pagecache has a bug that user
> > processes who use corrupted pages via read() or write() can't be aware
> > of the memory error and result in discarding dirty data silently.
> >
> > The following patch is to improve handling/reporting memory errors on
> > this case, but as a short term solution I suggest that we should undo
> > the present error handling code and just leave errors for such cases
> > (which expect the 2nd MCE to panic the system) to ensure data consistency.
> 
> Not sure that's the right approach. It's not worse than any other IO 
> errors isn't it? 

Right, in the current situation both memory errors and other IO errors
can lose data in the same manner.
I thought that in a real mission-critical system (for which I think the
HWPOISON feature is targeted), closing a dangerous path is better than
waiting for someone to solve the problem in a more generic manner.

But if we start with Fengguang's approach at first as you replied to
patch 3, this patch is not necessary.

Naoya


Linux 3.2.27

2012-08-10 Thread Ben Hutchings
I'm announcing the release of the 3.2.27 kernel.

All users of the 3.2 kernel series should upgrade.

The updated 3.2.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-3.2.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git

Ben.



 Documentation/sound/alsa/HD-Audio-Models.txt |3 +-
 Documentation/stable_kernel_rules.txt|   19 +-
 Makefile |2 +-
 arch/arm/include/asm/mutex.h |  119 +--
 arch/arm/kernel/entry-armv.S |  111 +++
 arch/arm/kernel/process.c|2 +
 arch/arm/kernel/traps.c  |8 -
 arch/arm/mm/tlb-v7.S |   12 ++
 arch/arm/vfp/entry.S |   16 +-
 arch/arm/vfp/vfphw.S |   19 +-
 arch/arm/vfp/vfpmodule.c |8 +-
 arch/ia64/include/asm/atomic.h   |4 +-
 arch/m68k/include/asm/entry.h|4 +-
 arch/m68k/kernel/sys_m68k.c  |8 +-
 arch/s390/include/asm/mmu_context.h  |   14 +-
 arch/s390/include/asm/processor.h|2 +
 arch/s390/mm/fault.c |   13 +-
 arch/s390/mm/mmap.c  |   12 +-
 arch/s390/mm/pgtable.c   |5 -
 arch/x86/kernel/alternative.c|2 +-
 arch/x86/xen/p2m.c   |   36 
 drivers/block/floppy.c   |8 +-
 drivers/block/virtio_blk.c   |9 +-
 drivers/char/mspec.c |2 +-
 drivers/char/random.c|  273 +++---
 drivers/firmware/pcdp.c  |4 +-
 drivers/gpu/drm/i915/intel_dp.c  |4 +-
 drivers/input/mouse/synaptics.c  |   23 +++
 drivers/md/dm-thin.c |7 +-
 drivers/md/raid1.c   |5 +-
 drivers/media/rc/ene_ir.c|3 +-
 drivers/mfd/ab3100-core.c|2 -
 drivers/mfd/wm831x-otp.c |8 +
 drivers/net/wireless/rt2x00/rt2800usb.c  |1 +
 drivers/platform/x86/asus-wmi.c  |7 +-
 drivers/rtc/rtc-wm831x.c |   24 ++-
 drivers/staging/media/lirc/lirc_sir.c|   60 +-
 drivers/tty/serial/pch_uart.c|   21 +-
 drivers/usb/core/hub.c   |9 +
 drivers/usb/early/ehci-dbgp.c|2 +-
 drivers/video/smscufx.c  |2 +-
 fs/exofs/ore.c   |   14 +-
 fs/nfs/file.c|7 +-
 fs/nfsd/nfs4xdr.c|2 +-
 fs/nilfs2/ioctl.c|4 +-
 fs/nilfs2/super.c|3 +
 fs/nilfs2/the_nilfs.c|1 +
 fs/nilfs2/the_nilfs.h|2 +
 include/linux/hugetlb.h  |   10 +
 include/linux/init_task.h|   12 +-
 include/linux/random.h   |4 +-
 include/linux/sched.h|5 +-
 kernel/futex.c   |   17 +-
 kernel/irq/handle.c  |7 +-
 kernel/sched.c   |   32 +--
 lib/vsprintf.c   |3 +-
 mm/hugetlb.c |   28 ++-
 mm/internal.h|2 +
 mm/memory.c  |7 +-
 mm/mmu_notifier.c|   45 ++---
 mm/page_alloc.c  |   33 ++--
 mm/sparse.c  |3 +
 net/core/dev.c   |3 +
 net/core/drop_monitor.c  |  113 ++-
 net/core/rtnetlink.c |1 +
 net/sunrpc/rpcb_clnt.c   |4 +-
 net/sunrpc/sched.c   |2 +
 net/sunrpc/xprtrdma/transport.c  |3 +-
 net/sunrpc/xprtsock.c|   10 +
 net/wireless/util.c  |3 +
 sound/drivers/mpu401/mpu401_uart.c   |1 +
 sound/pci/hda/patch_realtek.c|   28 +++
 sound/pci/hda/patch_via.c|7 +-
 sound/soc/codecs/wm8962.c|3 +
 sound/soc/codecs/wm8994.c|2 +-
 sound/usb/clock.c|3 +-
 76 files changed, 862 insertions(+), 455 deletions(-)

Alan Cox (2):
  x86, nops: Missing break resulting in incorrect selection on Intel
  pch_uart: Fix missing break for 16 byte fifo

Alasdair G Kergon (1):
  dm thin: reduce endio_hook pool size

Alex Hung (1):
  asus-wmi: use 

Re: [PATCH] xconfig: Display dependency values in debug_info

2012-08-10 Thread Randy Dunlap
On 08/09/2012 11:54 AM, Salar Ali Mumtaz wrote:

> On 12-08-07 12:55 PM, Randy Dunlap wrote:
> 
>> In Kconfig language, is "" the same as 'n' ?
>> If so, I'm OK with your proposal above.
>>
> 
> 
> So a colleague of mine tested this and came up with a conclusion that 
> expressions in Kconfig can only deal with boolean or tristate operands and no 
> casting is made with strings. Using any string operand as part of a boolean 
> expression is simply a type error and Kconfig probably marks such operand as 
> 'n', regardless of its value.
> 


My question was about FRV, which is a boolean,
but still did not display as 'n' in your example.


-- 
~Randy


[tip:x86/urgent] x86, build: Globally set -fno-pic

2012-08-10 Thread tip-bot for Andrew Boie
Commit-ID:  484d90eec884d814b005c9736bcf3fd018acba65
Gitweb: http://git.kernel.org/tip/484d90eec884d814b005c9736bcf3fd018acba65
Author: Andrew Boie 
AuthorDate: Fri, 10 Aug 2012 11:49:06 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 10 Aug 2012 16:12:30 -0700

x86, build: Globally set -fno-pic

GCC built with nonstandard options can enable -fpic by default.
We never want this for 32-bit kernels and it will break the build.

[ hpa: Notably the Android toolchain apparently does this. ]

Change-Id: Iaab7d66e598b1c65ac4a4f0229eca2cd3d0d2898
Signed-off-by: Andrew Boie 
Link: http://lkml.kernel.org/r/1344624546-29691-1-git-send-email-andrew.p.b...@intel.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/Makefile |4 
 arch/x86/boot/Makefile|2 +-
 arch/x86/realmode/rm/Makefile |2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b0c5276..682e9c2 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -27,6 +27,10 @@ ifeq ($(CONFIG_X86_32),y)
 
 KBUILD_CFLAGS += -msoft-float -mregparm=3 -freg-struct-return
 
+# Never want PIC in a 32-bit kernel, prevent breakage with GCC built
+# with nonstandard options
+KBUILD_CFLAGS += -fno-pic
+
 # prevent gcc from keeping the stack 16 byte aligned
 KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)
 
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 5a747dd..f7535be 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -57,7 +57,7 @@ KBUILD_CFLAGS := $(LINUXINCLUDE) -g -Os -D_SETUP -D__KERNEL__ \
   -Wall -Wstrict-prototypes \
   -march=i386 -mregparm=3 \
   -include $(srctree)/$(src)/code16gcc.h \
-  -fno-strict-aliasing -fomit-frame-pointer \
+  -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
   $(call cc-option, -ffreestanding) \
   $(call cc-option, -fno-toplevel-reorder,\
$(call cc-option, -fno-unit-at-a-time)) \
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index b2d534c..8869287 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -72,7 +72,7 @@ KBUILD_CFLAGS := $(LINUXINCLUDE) -m32 -g -Os -D_SETUP -D__KERNEL__ -D_WAKEUP \
   -Wall -Wstrict-prototypes \
   -march=i386 -mregparm=3 \
   -include $(srctree)/$(src)/../../boot/code16gcc.h \
-  -fno-strict-aliasing -fomit-frame-pointer \
+  -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
   $(call cc-option, -ffreestanding) \
   $(call cc-option, -fno-toplevel-reorder,\
$(call cc-option, -fno-unit-at-a-time)) \


Re: [PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

2012-08-10 Thread H. Peter Anvin
On 08/09/2012 03:34 PM, Betty Dall wrote:
> 
> I thought this should be a break instead of a continue since the code
> does a break if the class is 0x. If the function does not have a
> valid VENDOR_ID, then the remaining function numbers do not have to be
> scanned because functions are required to be implemented in order (no
> skipping a function number.)
> 

Is that true?  This is certainly not true in PCI in general: there is
required to be a function 0, but there is no guarantee that functions
1-7 don't have gaps.

-hpa
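
To make the break-versus-continue point concrete, here is a small
self-contained model of a per-device function scan. It is only a sketch: the
vendor IDs are passed in as an array rather than read from config space,
count_functions() is a hypothetical name, and a real scan would also look at
the multi-function bit in the header type of function 0.

```c
#include <stdint.h>

#define PCI_MAX_FUNCS   8
#define PCI_VENDOR_NONE 0xffff /* all-ones read: nothing responded */

/*
 * vendor_ids[] is a hypothetical snapshot of the vendor ID register of each
 * function of one device.  Returns how many functions are present.
 */
static int count_functions(const uint16_t vendor_ids[PCI_MAX_FUNCS])
{
    int func, found = 0;

    /* Function 0 is mandatory; without it the device is absent. */
    if (vendor_ids[0] == PCI_VENDOR_NONE)
        return 0;

    for (func = 0; func < PCI_MAX_FUNCS; func++) {
        if (vendor_ids[func] == PCI_VENDOR_NONE)
            continue; /* a gap: later functions may still exist */
        found++;
    }
    return found;
}
```

With a break in place of the continue, a device implementing only functions 0
and 2 would be reported as having a single function.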



Re: [PATCH 2/2] ARM: local timers: add timer support using IO mapped register

2012-08-10 Thread Rohit Vaswani

Thanks for your feedback Rob.

On 8/10/2012 3:10 PM, Rob Herring wrote:

On 08/10/2012 04:58 PM, Rohit Vaswani wrote:

The current arch_timer only support accessing through CP15 interface.
Add support for ARM processors that only support IO mapped register
interface

Signed-off-by: Rohit Vaswani 
---
  .../devicetree/bindings/arm/arch_timer.txt |7 +
  arch/arm/kernel/arch_timer.c   |  259 
  2 files changed, 223 insertions(+), 43 deletions(-)

The original file is 360 lines. It doesn't really seem like there's a
lot of overlap and I wonder if it is worth the extra overhead.


diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt
index 52478c8..1c71799 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -14,6 +14,13 @@ The timer is attached to a GIC to deliver its per-processor interrupts.
  
  - clock-frequency : The frequency of the main counter, in Hz. Optional.
  
+- irq-is-not-percpu: Specify is the timer irq is *NOT* a percpu (PPI) interrupt

+  In the default case i.e without this property, the timer irq is treated as a
+  PPI interrupt. Optional.

The first field in the gic interrupts binding already defines this.
Is there a generic way to extract that information from the interrupts
binding? I saw Chris Smith's patch that adds an irq_is_per_cpu function.
Perhaps we can use that once it is merged?



+
+- If the node address and reg is specified, the arch_timer will try to use the memory
+  mapped timer. Optional.

This timer is fundamentally different h/w. You need a new compatible string.
I think that the timer is the same, but it just has a different
interface. Do you still think we need a new compatible string?



+
  Example:
  
  	timer {

diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c
index 1d0d9df..09604b7 100644
--- a/arch/arm/kernel/arch_timer.c
+++ b/arch/arm/kernel/arch_timer.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  #include 

@@ -29,8 +30,17 @@
  static unsigned long arch_timer_rate;
  static int arch_timer_ppi;
  static int arch_timer_ppi2;
+static int is_irq_percpu;
  
  static struct clock_event_device __percpu **arch_timer_evt;

+static void __iomem *timer_base;
+
+struct arch_timer_operations {
+   void (*reg_write)(int, u32);
+   u32 (*reg_read)(int);
+   cycle_t (*get_cntpct)(void);
+   cycle_t (*get_cntvct)(void);
+};
  
  /*

   * Architected system timer support.
@@ -44,7 +54,29 @@ static struct clock_event_device __percpu **arch_timer_evt;
  #define ARCH_TIMER_REG_FREQ   1
  #define ARCH_TIMER_REG_TVAL   2
  
-static void arch_timer_reg_write(int reg, u32 val)

+/* Iomapped Register Offsets */
+#define ARCH_TIMER_CNTP_LOW_REG0x000
+#define ARCH_TIMER_CNTP_HIGH_REG   0x004
+#define ARCH_TIMER_CNTV_LOW_REG0x008
+#define ARCH_TIMER_CNTV_HIGH_REG   0x00C
+#define ARCH_TIMER_CTRL_REG0x02C
+#define ARCH_TIMER_FREQ_REG0x010
+#define ARCH_TIMER_CNTP_TVAL_REG   0x028
+#define ARCH_TIMER_CNTV_TVAL_REG   0x038
+
+static void timer_reg_write_mem(int reg, u32 val)
+{
+   switch (reg) {
+   case ARCH_TIMER_REG_CTRL:
+   __raw_writel(val, timer_base + ARCH_TIMER_CTRL_REG);
+   break;
+   case ARCH_TIMER_REG_TVAL:
+   __raw_writel(val, timer_base + ARCH_TIMER_CNTP_TVAL_REG);
+   break;

This whole function seems a bit pointless as it only adds timer_base.

Rob
I tried to keep the functions similar to the cp15 interface ones. Is
there something else you suggest doing?


Thanks,
Rohit Vaswani
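
For readers following the exchange above, one possible shape for a slimmer
memory-mapped accessor is a register-index-to-offset table instead of a switch
statement. This is purely an illustrative sketch and not something proposed in
the thread: the offsets are taken from the quoted patch, the value 0 for
ARCH_TIMER_REG_CTRL is an assumption, and the base pointer is an ordinary
array here rather than an ioremap()ed region.

```c
#include <stdint.h>

enum {
    ARCH_TIMER_REG_CTRL, /* assumed to be 0; not shown in the quoted hunk */
    ARCH_TIMER_REG_FREQ,
    ARCH_TIMER_REG_TVAL,
};

/* Byte offsets of the memory-mapped registers, as listed in the patch. */
static const uint16_t timer_reg_offset[] = {
    [ARCH_TIMER_REG_CTRL] = 0x02c,
    [ARCH_TIMER_REG_FREQ] = 0x010,
    [ARCH_TIMER_REG_TVAL] = 0x028, /* CNTP_TVAL */
};

/* Write one timer register through a word-addressed base pointer. */
static void timer_reg_write_mem(volatile uint32_t *base, int reg, uint32_t val)
{
    base[timer_reg_offset[reg] / sizeof(uint32_t)] = val;
}

/* Read one timer register through the same table. */
static uint32_t timer_reg_read_mem(volatile uint32_t *base, int reg)
{
    return base[timer_reg_offset[reg] / sizeof(uint32_t)];
}
```

In kernel code the array accesses would be replaced by the usual MMIO
accessors on timer_base plus the looked-up offset.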

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.



Re: pull request: wireless 2012-08-10

2012-08-10 Thread David Miller
From: "John W. Linville" 
Date: Fri, 10 Aug 2012 14:33:51 -0400

> Here is a handful of fixes intended for 3.6.
> 
> Daniel Drake offers a cfg80211 fix to consume pending events before
> taking a wireless device down.  This prevents a resource leak.
> 
> Stanislaw Gruszka gives us a fix for a NULL pointer dereference in
> rt61pci.
> 
> Johannes Berg provides an iwlwifi patch to disable "greenfield" mode.
> Use of that mode was causing a rate scaling problem in for iwlwifi.
> 
> Please let me know if there are problems!

Pulled, thanks John.


Re: [PATCH 2/2] ARM: nomadik: configure Nomadik for pin control

2012-08-10 Thread Stephen Warren
On 08/09/2012 04:43 PM, Linus Walleij wrote:
> This converts the Nomadik to using pin control using the
> driver for the STN8815 ASIC.

> diff --git a/arch/arm/mach-nomadik/cpu-8815.c b/arch/arm/mach-nomadik/cpu-8815.c

> +static inline void
> +cpu8815_add_pinctrl(struct device *parent, const char *name)
> +{
> + struct platform_device_info pdevinfo = {
> + .parent = parent,
> + .name = name,
> + .id = -1,
> + };
> +
> + platform_device_register_full(&pdevinfo);
> +}

Out of curiosity, why platform_device_register_full() not
platform_device_register() here?

Otherwise,
Acked-by: Stephen Warren 



Re: [PATCH 1/2] pinctrl/nomadik: add STn8815 ASIC support

2012-08-10 Thread Stephen Warren
On 08/09/2012 04:43 PM, Linus Walleij wrote:
> This adds support for the STN8815 ASIC for the Nomadik pin
> controller.

Acked-by: Stephen Warren 



Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Andi Kleen
Naoya Horiguchi  writes:

> Current error reporting of memory errors on dirty pagecache has silent
> data lost problem because AS_EIO in struct address_space is cleared
> once checked.

Seems very complicated.  I think I would prefer something simpler
if possible, especially unless it's proven the case is common.
It's hard to maintain rarely used error code when it's complicated.
Maybe try Fengguang's simple proposal first? That would fix other IO
errors too.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


[PATCH] rtc: add MAX8907 RTC driver

2012-08-10 Thread Stephen Warren
From: Stephen Warren 

The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.

The driver is based on an original written or fixed by:
* Tom Cherry 
* Prashant Gaikwad 
* Joseph Yoon 

During upstreaming, I (swarren):
* Converted to regmap.
* Fixed handling of RTC_HOUR register containing 12.
* Fixed handling of RTC_WEEKDAY register.
* General cleanup.

Signed-off-by: Stephen Warren 
---
 drivers/rtc/Kconfig   |   10 ++
 drivers/rtc/Makefile  |1 +
 drivers/rtc/rtc-max8907.c |  245 +
 3 files changed, 256 insertions(+), 0 deletions(-)
 create mode 100644 drivers/rtc/rtc-max8907.c

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index fabc99a..c84f960 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -203,6 +203,16 @@ config RTC_DRV_MAX6900
  This driver can also be built as a module. If so, the module
  will be called rtc-max6900.
 
+config RTC_DRV_MAX8907
+   tristate "Maxim MAX8907"
+   depends on MFD_MAX8907
+   help
+ If you say yes here you will get support for the
+ RTC of Maxim MAX8907 PMIC.
+
+ This driver can also be built as a module. If so, the module
+ will be called rtc-max8907.
+
 config RTC_DRV_MAX8925
tristate "Maxim MAX8925"
depends on MFD_MAX8925
diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile
index 0d5b2b6..a0b4fbe 100644
--- a/drivers/rtc/Makefile
+++ b/drivers/rtc/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_RTC_DRV_M48T59)  += rtc-m48t59.o
 obj-$(CONFIG_RTC_DRV_M48T86)   += rtc-m48t86.o
 obj-$(CONFIG_RTC_DRV_MXC)  += rtc-mxc.o
 obj-$(CONFIG_RTC_DRV_MAX6900)  += rtc-max6900.o
+obj-$(CONFIG_RTC_DRV_MAX8907)  += rtc-max8907.o
 obj-$(CONFIG_RTC_DRV_MAX8925)  += rtc-max8925.o
 obj-$(CONFIG_RTC_DRV_MAX8998)  += rtc-max8998.o
 obj-$(CONFIG_RTC_DRV_MAX6902)  += rtc-max6902.o
diff --git a/drivers/rtc/rtc-max8907.c b/drivers/rtc/rtc-max8907.c
new file mode 100644
index 000..4880374
--- /dev/null
+++ b/drivers/rtc/rtc-max8907.c
@@ -0,0 +1,245 @@
+/*
+ * RTC driver for Maxim MAX8907
+ *
+ * Copyright (c) 2011-2012, NVIDIA Corporation.
+ *
+ * Based on drivers/rtc/rtc-max8925.c,
+ * Copyright (C) 2009-2010 Marvell International Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum {
+   RTC_SEC = 0,
+   RTC_MIN,
+   RTC_HOUR,
+   RTC_WEEKDAY,
+   RTC_DATE,
+   RTC_MONTH,
+   RTC_YEAR1,
+   RTC_YEAR2,
+};
+
+#define TIME_NUM   8
+#define ALARM_1SEC (1 << 7)
+#define HOUR_12    (1 << 7)
+#define HOUR_AM_PM (1 << 5)
+#define ALARM0_IRQ (1 << 3)
+#define ALARM1_IRQ (1 << 2)
+#define ALARM0_STATUS  (1 << 2)
+#define ALARM1_STATUS  (1 << 1)
+
+struct max8907_rtc {
+   struct max8907  *max8907;
+   struct regmap   *regmap;
+   struct rtc_device   *rtc_dev;
+   int irq;
+};
+
+static irqreturn_t max8907_irq_handler(int irq, void *data)
+{
+   struct max8907_rtc *rtc = data;
+
+   regmap_update_bits(rtc->regmap, MAX8907_REG_ALARM0_CNTL, 0x7f, 0);
+
+   rtc_update_irq(rtc->rtc_dev, 1, RTC_IRQF | RTC_AF);
+
+   return IRQ_HANDLED;
+}
+
+static void regs_to_tm(u8 *regs, struct rtc_time *tm)
+{
+   tm->tm_year = bcd2bin(regs[RTC_YEAR2]) * 100 +
+   bcd2bin(regs[RTC_YEAR1]) - 1900;
+   tm->tm_mon = bcd2bin(regs[RTC_MONTH] & 0x1f) - 1;
+   tm->tm_mday = bcd2bin(regs[RTC_DATE] & 0x3f);
+   tm->tm_wday = (regs[RTC_WEEKDAY] & 0x07) - 1;
+   if (regs[RTC_HOUR] & HOUR_12) {
+   tm->tm_hour = bcd2bin(regs[RTC_HOUR] & 0x01f);
+   if (tm->tm_hour == 12)
+   tm->tm_hour = 0;
+   if (regs[RTC_HOUR] & HOUR_AM_PM)
+   tm->tm_hour += 12;
+   } else {
+   tm->tm_hour = bcd2bin(regs[RTC_HOUR] & 0x03f);
+   }
+   tm->tm_min = bcd2bin(regs[RTC_MIN] & 0x7f);
+   tm->tm_sec = bcd2bin(regs[RTC_SEC] & 0x7f);
+}
+
+static void tm_to_regs(struct rtc_time *tm, u8 *regs)
+{
+   u8 high, low;
+
+   high = (tm->tm_year + 1900) / 100;
+   low = tm->tm_year % 100;
+   regs[RTC_YEAR2] = bin2bcd(high);
+   regs[RTC_YEAR1] = bin2bcd(low);
+   regs[RTC_MONTH] = bin2bcd(tm->tm_mon + 1);
+   regs[RTC_DATE] = bin2bcd(tm->tm_mday);
+   regs[RTC_WEEKDAY] = tm->tm_wday + 1;
+   regs[RTC_HOUR] = bin2bcd(tm->tm_hour);
+   regs[RTC_MIN] = bin2bcd(tm->tm_min);
+   regs[RTC_SEC] = bin2bcd(tm->tm_sec);
+}
+
+static int 
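
For readers following the "RTC_HOUR register containing 12" fix: the hour
decoding done by regs_to_tm() can be tried in user space. The sketch below is
not part of the patch. decode_hour() is a hypothetical helper name, and
bcd2bin() is re-implemented locally as a stand-in for the kernel helper of the
same name; the bit definitions are the ones from the patch.

```c
#include <assert.h>   /* assert(), for callers that sanity-check results */
#include <stdint.h>

#define HOUR_12    (1u << 7)  /* register holds a 12-hour value */
#define HOUR_AM_PM (1u << 5)  /* set for PM when in 12-hour mode */

/* User-space stand-in for the kernel's bcd2bin() helper. */
static unsigned int bcd2bin(uint8_t val)
{
	return (val & 0x0f) + (val >> 4) * 10;
}

/* Decode an RTC_HOUR register value into 0..23, following regs_to_tm(). */
static int decode_hour(uint8_t reg)
{
	int hour;

	if (reg & HOUR_12) {
		hour = bcd2bin(reg & 0x1f);
		if (hour == 12)         /* 12 AM maps to 0, 12 PM to 12 */
			hour = 0;
		if (reg & HOUR_AM_PM)
			hour += 12;
	} else {
		hour = bcd2bin(reg & 0x3f);
	}
	return hour;
}
```

With these definitions, a register value of HOUR_12 | 0x12 (12 AM) decodes to
0, and HOUR_12 | HOUR_AM_PM | 0x12 (12 PM) decodes to 12.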

Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 12:28 PM, Alan Cox wrote:
> Explicitly for Linux yes - this is not generally true of the AF_UNIX
> socket domain and even the permissions aspect isn't guaranteed to be
> supported on some BSD environments !

Yes, but let's worry about what the Linux behavior should be.

> The name is however just a proxy for the socket itself. You don't even
> get a device node in the usual sense or the same inode in the file system
> space.


No, but it is looked up the same way any other inode is (the difference
between FIFOs and sockets is that sockets have separate connections,
which is also why open() on sockets would be nice.)

However, there is a fundamental difference between AF_UNIX sockets and
open(), and that is how the pathname is delivered.  It thus would make
more sense to provide the openat()-like information in struct
sockaddr_un, but that may be very hard to do in a sensible way.  In that
sense it perhaps would be cleaner to be able to do an open[at]() on the
socket node with O_PATH (perhaps there should be an O_SOCKET option,
even?) and pass the resulting file descriptor to bind() or connect().

-hpa





Re: [PATCH 2/3] HWPOISON: undo memory error handling for dirty pagecache

2012-08-10 Thread Andi Kleen
Naoya Horiguchi  writes:

> Current memory error handling on dirty pagecache has a bug that user
> processes who use corrupted pages via read() or write() can't be aware
> of the memory error and result in discarding dirty data silently.
>
> The following patch is to improve handling/reporting memory errors on
> this case, but as a short term solution I suggest that we should undo
> the present error handling code and just leave errors for such cases
> (which expect the 2nd MCE to panic the system) to ensure data consistency.

Not sure that's the right approach. It's not worse than any other IO
error, is it?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: null pointer dereference while loading i915

2012-08-10 Thread Mihai Moldovan
* On 10.08.2012 07:44 PM, Mihai Moldovan wrote:
> Hm, OK.
>
> Well, I'm done now.
>
> bisect log:
>
> git bisect start
> # good: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
> git bisect good 805a6af8dba5dfdd35ec35dc52ec0122400b2610
> # bad: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
> git bisect bad 28a33cbc24e4256c143dce96c7d93bf423229f92
> # good: [49d99a2f9c4d033cc3965958a1397b1fad573dd3] Merge branch 'for-linus' of
> git://oss.sgi.com/xfs/xfs
> git bisect good 49d99a2f9c4d033cc3965958a1397b1fad573dd3
> # good: [813a95e5b4fa936bbde10ef89188932745dcd7f4] Merge tag 'pinctrl' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> git bisect good 813a95e5b4fa936bbde10ef89188932745dcd7f4
> # bad: [9978306e31a8f89bd81fbc4c49fd9aefb1d30d10] Merge branch 'for-linus' of
> git://oss.sgi.com/xfs/xfs
> git bisect bad 9978306e31a8f89bd81fbc4c49fd9aefb1d30d10
> # good: [927ad551031798d4cba49766549600bbb33872d7] Merge tag
> 'ktest-v3.5-spelling' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
> git bisect good 927ad551031798d4cba49766549600bbb33872d7
> # good: [2c01e7bc46f10e9190818437e564f7e0db875ae9] Merge branch 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
> git bisect good 2c01e7bc46f10e9190818437e564f7e0db875ae9
> # bad: [5f54d29ee9dace1e2ef4e8c9873ad4dd7a06d11a] drm/nva3/pm: make pll->pll
> mode work
> git bisect bad 5f54d29ee9dace1e2ef4e8c9873ad4dd7a06d11a
> # bad: [8b2e326dc7c5aa6952c88656d04d0d81fd85a6f8] drm/i915: Unconditionally
> initialise the interrupt workers
> git bisect bad 8b2e326dc7c5aa6952c88656d04d0d81fd85a6f8
> # bad: [f637fde434c9e3687798730c7ddd367e93666013] drm/i915: inline
> enable/disable_irq into ring->get/put_irq
> git bisect bad f637fde434c9e3687798730c7ddd367e93666013
> # bad: [23e3f9b37e7368ee8530ba99907508363feebc14] drm/i915: check for disabled
> interrupts on ValleyView
> git bisect bad 23e3f9b37e7368ee8530ba99907508363feebc14
> # good: [8489731c9bd22c27ab17a2190cd7444604abf95f] drm/i915: move clflushing
> into shmem_pread
> git bisect good 8489731c9bd22c27ab17a2190cd7444604abf95f
> # good: [3bd7d90938f1fe77de5991dc4b727843c4980b2a] drm/i915/intel_i2c: 
> refactor
> using intel_gmbus_get_adapter
> git bisect good 3bd7d90938f1fe77de5991dc4b727843c4980b2a
> # bad: [57f350b6722f9569f407872f6ead56e2d221d98a] drm/i915: add DPIO support
> git bisect bad 57f350b6722f9569f407872f6ead56e2d221d98a
> # bad: [93e537a10f2c8c0f2e74409b6cb473fc221758fa] drm/i915: split LVDS update
> code out of i9xx_crtc_mode_set
> git bisect bad 93e537a10f2c8c0f2e74409b6cb473fc221758fa
> # bad: [f2c9677be3158c31ba19f527e2be0f7a519e19d1] drm/i915/intel_i2c: allocate
> gmbus array as part of drm_i915_private
> git bisect bad f2c9677be3158c31ba19f527e2be0f7a519e19d1
> # bad: [2ed06c93a1fce057808894d73167aae03c76deaf] drm/i915/intel_i2c: gmbus
> disabled and reserved ports are invalid
> git bisect bad 2ed06c93a1fce057808894d73167aae03c76deaf

Just to be safe, I also tested git HEAD (3.6.0-rc1-00209-gf62bf17): no dice
either.

Best regards,


Mihai





Re: [PATCH 1/3] HWPOISON: fix action_result() to print out dirty/clean

2012-08-10 Thread Andi Kleen
Naoya Horiguchi  writes:

> action_result() fails to print out "dirty" even if an error occurred on a
> dirty pagecache, because when we check PageDirty in action_result() it was
> cleared after page isolation even if it's dirty before error handling. This
> can break some applications that monitor this message, so should be fixed.
>
> There are several callers of action_result() except page_action(), but
> either of them are not for LRU pages but for free pages or kernel pages,
> so we don't have to consider dirty or not for them.

Looks good

Reviewed-by: Andi Kleen 


-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


RE: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot

2012-08-10 Thread Justin Piszcz


-Original Message-
From: Justin Piszcz [mailto:jpis...@lucidpixels.com] 
Sent: Friday, August 10, 2012 5:46 PM
To: Jesper Juhl
Cc: linux-kernel@vger.kernel.org; a...@solarrain.com
Subject: Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot

On Fri, Aug 10, 2012 at 1:53 PM, Jesper Juhl  wrote:
> On Fri, 10 Aug 2012, Justin Piszcz wrote:
>
>> Hello,
>>
>> Motherboard: Supermicro X8DTH-6F
>> Distro: Debian Testing x86_64
>>
>> From 3.4 -> 3.5.1 on x86_64 make oldconfig and a few minor changes and the
>> machine attempts to boot but hangs at the filesystem mounting part of the
>> boot process.

Hi,

Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem
(60TB).

The 3.4 kernel works fine.

This is proven by commenting out the filesystem in /etc/fstab with
3.5.1, and all is OK.

--

Hi again,

I tested with linux-3.6-rc1:

The same problem, here is what I get from the strace:

irectory)
4434  readlink("/dev", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument)
4434  readlink("/dev/sda1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument)
4434  readlink("/r1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument)
4434  getuid()  = 0
4434  geteuid() = 0
4434  getgid()  = 0
4434  getegid() = 0
4434  prctl(PR_GET_DUMPABLE)= 1
4434  lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
4434  getuid()  = 0
4434  geteuid() = 0
4434  getgid()  = 0
4434  getegid() = 0
4434  prctl(PR_GET_DUMPABLE)= 1
4434  stat("/run", {st_mode=S_IFDIR|0755, st_size=820, ...}) = 0
4434  lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
4434  open("/run/mount/utab", O_RDWR|O_CREAT, 0644) = 3
4434  close(3)  = 0
4434  mount("/dev/sda1", "/r1", "ext4", MS_MGC_VAL|MS_NOATIME, NULL

--

(w/ 3.6-rc1) 

[   89.868843] mount   R  running task0  4434   4433
0x0009
[   89.868847]  880c246b7b68 816c9279 880c246b7aa8
880c246b7fd8
[   89.868851]  880c246b7fd8 4000 88062720cdb0
880c246862d0
[   89.868855]  000116c0 880623a863c0 880623a863c0

[   89.868855] Call Trace:
[   89.868858]  [] ? __schedule+0x299/0x770
[   89.868860]  [] ? __schedule+0x299/0x770
[   89.868864]  [] ? ext4_get_group_desc+0x49/0xb0
[   89.868868]  [] ? ext4_calculate_overhead+0x131/0x3e0
[   89.868871]  [] ? ext4_fill_super+0x1a4b/0x28d0
[   89.868875]  [] ? mount_bdev+0x1a1/0x1e0
[   89.868877]  [] ? ext4_calculate_overhead+0x3e0/0x3e0
[   89.868880]  [] ? ext4_mount+0x10/0x20
[   89.868882]  [] ? mount_fs+0x1b/0xd0
[   89.868885]  [] ? vfs_kern_mount+0x6f/0x110
[   89.86]  [] ? do_kern_mount+0x4f/0x100
[   89.868890]  [] ? do_mount+0x2fe/0x8a0
[   89.868894]  [] ? strndup_user+0x53/0x70
[   89.868896]  [] ? sys_mount+0x90/0xe0
[   89.868899]  [] ? tracesys+0xd4/0xd9

Justin.





dma mapping error check analysis

2012-08-10 Thread Shuah Khan
I analyzed current calls to dma_map_single() and dma_map_page() in the kernel
to see if dma mapping errors are checked after mapping routines return.

Reference linux-next August 6 2012.

This analysis stemmed from the discussion on my patch that disables swiotlb
overflow as a first step towards removing the support altogether. Please
refer to the thread below:

https://lkml.org/lkml/2012/7/24/391

The goal of this analysis is to find drivers that don't currently check dma
mapping errors and fix them. I did a grep for dma_map_single() and
dma_map_page() and looked at the code that calls these routines. I classified
the results of dma mapping error check status as follows:

Broken:
1. No error checks
2. Partial checks - In that source file, not all calls are followed by checks.
3. Checks dma mapping errors, doesn't unmap already mapped pages when mapping
   error occurs in the middle of a multiple mapping attempt.

The first two categories are classified as broken and need fixing.

The third one needs fixing, since it leaves dangling mapped pages, and holds
on to them which is equivalent to memory leak. Some drivers release all mapped
pages when the device closes, but others don't. Not doing unmap might be
harmless on some architectures going by the comments I found in some source
files.

Good:
1. Checks dma mapping errors and unmaps already mapped pages when mapping
   error occurs in the middle of a multiple mapping attempt.
2. Checks dma mapping errors without unlikely()
3. Checks dma mapping errors with unlikely()

I lumped the above three cases together as good. Using unlikely() is icing on
the cake, and not something we need to be concerned about compared to the
other problems in this area.
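
To make the first "Good" case concrete, here is a user-space sketch of the
check-and-unwind pattern. It does not use the real DMA API: mock_map(),
mock_mapping_error() and mock_unmap() are hypothetical stand-ins for
dma_map_page(), dma_mapping_error() and dma_unmap_page(), and fail_at is a
test knob that forces a failure at a chosen index.

```c
#include <assert.h>   /* assert(), for callers that sanity-check results */
#include <stdbool.h>

#define MOCK_MAPPING_ERROR 0UL   /* hypothetical "bad handle" value */

static int fail_at = -1;         /* index at which the mock mapping fails */
static int live_mappings;        /* mappings currently outstanding */

/* Mock mapping routine: fails at index fail_at, otherwise hands out a handle. */
static unsigned long mock_map(int idx)
{
	if (idx == fail_at)
		return MOCK_MAPPING_ERROR;
	live_mappings++;
	return 0x1000UL * (unsigned long)(idx + 1);
}

static bool mock_mapping_error(unsigned long handle)
{
	return handle == MOCK_MAPPING_ERROR;
}

static void mock_unmap(unsigned long handle)
{
	(void)handle;
	live_mappings--;
}

/* Map n buffers; on failure unmap the ones already mapped and return -1. */
static int map_all(unsigned long *handles, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		handles[i] = mock_map(i);
		if (mock_mapping_error(handles[i]))
			goto unwind;
	}
	return 0;

unwind:
	while (--i >= 0)
		mock_unmap(handles[i]);
	return -1;
}
```

A driver in the "Doesn't unmap" category corresponds to dropping the unwind
loop: with fail_at set to 2, two mappings would then stay outstanding after
map_all() reports the error.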

- dma_map_single() - results
No error checks: 195 (46%)
Partial checks: 46 (11%)
Doesn't unmap: 26 (6%)
Good: 147 (35%)

- dma_map_page() - results
No error checks: 61 (59%)
Partial checks: 7 (7%)
Doesn't unmap: 15 (14.5%)
Good: 20 (19%)

In summary a large % of the cases (> 50%) go unchecked. That raises the
following questions:

When do mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Based on what I found, I am not too eager to remove swiotlb overflow support
which would increase the probability of returning dma mapping errors.

However I propose the following to gather more information:

- Change swiotlb to log (pr_info or pr_debug) cases where overflow buffer is
  triggered. (This is a delta on the disable swiotlb patch I sent a few weeks
  ago - References in this posting).
- Change dma_map_single() and dma_map_page() to track how many times they
  return mapping errors, before attempting to fix all the places that don't
  do dma mapping error checks. (Maybe a counter that keeps track; pr_* is not
  an option.)
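
The counter idea in the second item could look roughly like the following
user-space sketch. It uses C11 atomics rather than any kernel facility, and
the names dma_map_errors, note_map_result() and map_errors_seen() are
hypothetical.

```c
#include <assert.h>   /* assert(), for callers that sanity-check results */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical statistic: number of mapping attempts that returned an error. */
static atomic_ulong dma_map_errors;

/* Record the outcome of one mapping attempt; only failures are counted. */
static void note_map_result(bool failed)
{
	if (failed)
		atomic_fetch_add_explicit(&dma_map_errors, 1,
					  memory_order_relaxed);
}

/* Read the number of failures recorded so far. */
static unsigned long map_errors_seen(void)
{
	return atomic_load_explicit(&dma_map_errors, memory_order_relaxed);
}
```

A relaxed increment is about as cheap as a shared counter gets, which matters
because the mapping routines sit on hot paths where logging every failure is
not an option.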

Comments, thoughts on the analysis and proposal are welcome.

-- Shuah



Re: xtensa port maintenance

2012-08-10 Thread Chris Zankel

Hi Arnd,

Given the recent renewed push for Xtensa, I'll step in to feed the 
changes upstream. We might change that in future, though.


Max has volunteered to help bring the Xtensa port up-to-date. Most of 
the recent development was done on outdated trees and never got 
submitted in true kernel-manner (i.e. small changes at a time). It's 
also important to bring the ecosystem (compilers, libraries, etc.) to 
the latest trees, and my understanding is that there's also work going 
on in that area.


I have set up a tree on github for now, and will work closely with Max to
get his changes to Stephen's linux-next tree and eventually Linus' tree.
I think it's fine to add Max as a second maintainer, so he can help
filter patches.


Cheers!
-Chris

On 8/10/12 2:15 PM, Arnd Bergmann wrote:
> On Monday 06 August 2012, Max Filippov wrote:
>> I have a couple of questions regarding the path of xtensa-specific patches
>> upstream:
>>  - which git tree should they be targeted for? Should I set up a tree for
>>    pull requests, or will patches be picked up into some existing tree?
>>    (Looks like Linus' tree is the right target. AFAIK previously xtensa
>>    patches went mostly through akpm tree).
>
> Setting up a git tree is a good first step if you want to be the official
> maintainer, and if you want to get it included into linux-next.





[PATCH v8 6/6] kvm: Add de-assert option to KVM_IRQ_ACKFD

2012-08-10 Thread Alex Williamson
It's likely (vfio) that one of the reasons to watch for an IRQ ACK
is to de-assert and re-enable an interrupt.  As the IRQ ACK notifier
is already watching a GSI for an IRQ source ID we can easily couple
these together.

Signed-off-by: Alex Williamson 
---

 Documentation/virtual/kvm/api.txt |4 
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |3 +++
 virt/kvm/eventfd.c|   14 +-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 77b4837..128d4c3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2029,6 +2029,10 @@ the eventfd is only triggered when the specified IRQ source ID is
 pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
 must be provided.
 
+When KVM_CAP_IRQ_ACKFD_DEASSERT is available the flag
+KVM_IRQ_ACKFD_FLAG_DEASSERT may be used on assignment to specify
+that the GSI should be de-asserted prior to eventfd notification.
+This flag requires an IRQ source ID to be provided as described above.
 
 5. The kvm_run structure
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d98e59..691b00d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2178,6 +2178,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_IRQFD_ASSERT_ONLY:
case KVM_CAP_IRQ_ACKFD:
case KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID:
+   case KVM_CAP_IRQ_ACKFD_DEASSERT:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 0f53bd5..331631e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -623,6 +623,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_IRQFD_ASSERT_ONLY 83
 #define KVM_CAP_IRQ_ACKFD 84
 #define KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID 85
+#define KVM_CAP_IRQ_ACKFD_DEASSERT 86
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -712,6 +713,8 @@ struct kvm_irq_source_id {
 #define KVM_IRQ_ACKFD_FLAG_DEASSIGN (1 << 0)
 /* Available with KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID */
 #define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1 << 1)
+/* Available with KVM_CAP_IRQ_ACKFD_DEASSERT */
+#define KVM_IRQ_ACKFD_FLAG_DEASSERT (1 << 2)
 
 struct kvm_irq_ackfd {
__u32 flags;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index ff5c784..ffc6a13 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -682,6 +682,7 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 struct _irq_ackfd {
struct kvm *kvm;
+   bool deassert; /* de-assert on ack? */
struct eventfd_ctx *eventfd; /* signaled on ack */
struct kvm_irq_ack_notifier notifier;
/* Setup/shutdown */
@@ -805,6 +806,10 @@ static void irq_ackfd_acked(struct kvm_irq_ack_notifier *notifier)
 
ackfd = container_of(notifier, struct _irq_ackfd, notifier);
 
+   if (ackfd->deassert)
+   kvm_set_irq(ackfd->kvm, ackfd->notifier.irq_source_id,
+   ackfd->notifier.gsi, 0);
+
eventfd_signal(ackfd->eventfd, 1);
 }
 
@@ -845,6 +850,12 @@ static int kvm_assign_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
ackfd->notifier.irq_source_id = -1;
ackfd->notifier.irq_acked = irq_ackfd_acked;
 
+   ackfd->deassert = args->flags & KVM_IRQ_ACKFD_FLAG_DEASSERT;
+   if (ackfd->deassert && ackfd->notifier.irq_source_id < 0) {
+   ret = -EINVAL;
+   goto fail;
+   }
+
/*
 * Install our own custom wake-up handling so we are notified via
 * a callback whenever someone releases the underlying eventfd
@@ -945,7 +956,8 @@ fail:
 int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
 {
if (args->flags & ~(KVM_IRQ_ACKFD_FLAG_DEASSIGN |
-   KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID))
+   KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID |
+   KVM_IRQ_ACKFD_FLAG_DEASSERT))
return -EINVAL;
 
if (args->flags & KVM_IRQ_ACKFD_FLAG_DEASSIGN)
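
The flag rules this patch adds can be summarised in a small user-space sketch.
The flag values are the ones from the include/linux/kvm.h hunk;
check_ackfd_flags() is a hypothetical helper, and it assumes an assign request
in which the IRQ source ID flag is what makes a source ID available (that part
of the series is not visible in this message).

```c
#include <assert.h>   /* assert(), for callers that sanity-check results */
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the patch. */
#define KVM_IRQ_ACKFD_FLAG_DEASSIGN      (1u << 0)
#define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1u << 1)
#define KVM_IRQ_ACKFD_FLAG_DEASSERT      (1u << 2)

/* Hypothetical helper: 0 if the flag combination is acceptable, else -EINVAL. */
static int check_ackfd_flags(uint32_t flags)
{
	const uint32_t valid = KVM_IRQ_ACKFD_FLAG_DEASSIGN |
			       KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID |
			       KVM_IRQ_ACKFD_FLAG_DEASSERT;

	if (flags & ~valid)
		return -EINVAL;         /* unknown bits are rejected */

	/* De-assert needs a source ID to know which assertion to drop. */
	if ((flags & KVM_IRQ_ACKFD_FLAG_DEASSERT) &&
	    !(flags & KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID))
		return -EINVAL;

	return 0;
}
```

So KVM_IRQ_ACKFD_FLAG_DEASSERT on its own is rejected, while combining it with
KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID passes this check.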



[PATCH v8 5/6] kvm: KVM_IRQ_ACKFD

2012-08-10 Thread Alex Williamson
Enable a mechanism for IRQ ACKs to be exposed through an eventfd.  The
user can specify the GSI and optionally an IRQ source ID and have the
provided eventfd trigger whenever the irqchip resamples its inputs,
for instance on EOI.

Signed-off-by: Alex Williamson 
---

 Documentation/virtual/kvm/api.txt |   18 ++
 arch/x86/kvm/x86.c|2 
 include/linux/kvm.h   |   16 ++
 include/linux/kvm_host.h  |   13 ++
 virt/kvm/eventfd.c|  285 +
 virt/kvm/kvm_main.c   |   10 +
 6 files changed, 344 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 17cd599..77b4837 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2011,6 +2011,24 @@ of IRQ source IDs.  These IDs are also shared with KVM internal users
 (ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
 may be allocated through this interface.
 
+4.78 KVM_IRQ_ACKFD
+
+Capability: KVM_CAP_IRQ_ACKFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irq_ackfd (in)
+Returns: 0 on success, -errno on error
+
+Allows userspace notification of IRQ ACKs, or resampling of irqchip
+inputs, often associated with an EOI.  User provided kvm_irq_ackfd.fd
+and kvm_irq_ackfd.gsi are required and result in an eventfd trigger
+whenever the GSI is acknowledged.  When KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID
+is available, KVM_IRQ_ACKFD supports the KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID
+flag which indicates kvm_irq_ackfd.irq_source_id is provided.  With this,
+the eventfd is only triggered when the specified IRQ source ID is
+pending.  On deassign, fd, gsi, and irq_source_id (if provided on assign)
+must be provided.
+
 
 5. The kvm_run structure
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19680ed..3d98e59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2176,6 +2176,8 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_KVMCLOCK_CTRL:
case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
case KVM_CAP_IRQFD_ASSERT_ONLY:
+   case KVM_CAP_IRQ_ACKFD:
+   case KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 19b1235..0f53bd5 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -621,6 +621,8 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_NR_IRQ_SOURCE_ID 81
 #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
 #define KVM_CAP_IRQFD_ASSERT_ONLY 83
+#define KVM_CAP_IRQ_ACKFD 84
+#define KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID 85
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -707,6 +709,18 @@ struct kvm_irq_source_id {
__u8 pad[24];
 };
 
+#define KVM_IRQ_ACKFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQ_ACKFD_IRQ_SOURCE_ID */
+#define KVM_IRQ_ACKFD_FLAG_IRQ_SOURCE_ID (1 << 1)
+
+struct kvm_irq_ackfd {
+   __u32 flags;
+   __u32 fd;
+   __u32 gsi;
+   __u32 irq_source_id;
+   __u8 pad[16];
+};
+
 struct kvm_clock_data {
__u64 clock;
__u32 flags;
@@ -849,6 +863,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
 /* Available with KVM_CAP_IRQ_SOURCE_ID */
 #define KVM_IRQ_SOURCE_ID _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
+/* Available with KVM_CAP_IRQ_ACKFD */
+#define KVM_IRQ_ACKFD _IOW(KVMIO,  0xa9, struct kvm_irq_ackfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea6d7a1..cdc55c2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,10 @@ struct kvm {
struct list_head  items;
} irqfds;
struct list_head ioeventfds;
+   struct {
+   spinlock_t        lock;
+   struct list_head  items;
+   } irq_ackfds;
 #endif
struct kvm_vm_stat stat;
struct kvm_arch arch;
@@ -831,6 +835,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args);
+void kvm_irq_ackfd_release(struct kvm *kvm);
 
 #else
 
@@ -843,6 +849,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
 
+static inline int kvm_irq_ackfd(struct kvm *kvm, struct kvm_irq_ackfd *args)
+{
+   return -EINVAL;
+}
+
+static inline void kvm_irq_ackfd_release(struct kvm *kvm) {}
+
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
 static inline void kvm_irq_routing_update(struct kvm *kvm,
  struct kvm_irq_routing_table *irq_rt)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index df41038..ff5c784 100644
--- a/virt/kvm/eventfd.c
+++ 

[PATCH v8 4/6] kvm: Add assert-only option to KVM_IRQFD

2012-08-10 Thread Alex Williamson
This allows specifying that an irqfd is used only to assert the
specified gsi, whereas standard behavior is to follow the assertion
with a deassertion.  This will later allow a level interrupt to be
asserted via eventfd and later de-asserted by other means.

Signed-off-by: Alex Williamson 
---

 Documentation/virtual/kvm/api.txt |6 ++
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |3 +++
 virt/kvm/eventfd.c|8 ++--
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 46f4b4d..17cd599 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1951,6 +1951,12 @@ KVM_IRQFD_FLAG_IRQ_SOURCE_ID which can be used to specify an IRQ
 source ID (see KVM_IRQ_SOURCE_ID) to be used for the guest interrupt.
 This flag has no effect on deassignment.
 
+When KVM_CAP_IRQFD_ASSERT_ONLY is available, KVM_IRQFD supports the
+KVM_IRQFD_FLAG_ASSERT_ONLY which specifies that an interrupt injected
+via the eventfd is only asserted.  The default behavior is to assert
+then deassert the specified gsi when the eventfd is triggered.  This
+flag has no effect on deassignment.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 946c5bd..19680ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
+   case KVM_CAP_IRQFD_ASSERT_ONLY:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index deda8a9..19b1235 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -620,6 +620,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_NR_IRQ_SOURCE_ID 81
 #define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
+#define KVM_CAP_IRQFD_ASSERT_ONLY 83
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -687,6 +688,8 @@ struct kvm_xen_hvm_config {
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
 /* Available with KVM_CAP_IRQFD_IRQ_SOURCE_ID */
 #define KVM_IRQFD_FLAG_IRQ_SOURCE_ID (1 << 1)
+/* Available with KVM_CAP_IRQFD_ASSERT_ONLY */
+#define KVM_IRQFD_FLAG_ASSERT_ONLY (1 << 2)
 
 struct kvm_irqfd {
__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 30150f1..df41038 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -52,6 +52,7 @@ struct _irqfd {
/* Used for level IRQ fast-path */
int gsi;
int irq_source_id;
+   bool assert_only;
struct work_struct inject;
/* Used for setup/shutdown */
struct eventfd_ctx *eventfd;
@@ -69,7 +70,8 @@ irqfd_inject(struct work_struct *work)
struct kvm *kvm = irqfd->kvm;
 
kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 1);
-   kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
+   if (!irqfd->assert_only)
+   kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
 }
 
 /*
@@ -218,6 +220,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
irqfd->irq_source_id = args->irq_source_id;
else
irqfd->irq_source_id = KVM_USERSPACE_IRQ_SOURCE_ID;
+   irqfd->assert_only = args->flags & KVM_IRQFD_FLAG_ASSERT_ONLY;
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
@@ -346,7 +349,8 @@ int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
-   KVM_IRQFD_FLAG_IRQ_SOURCE_ID))
+   KVM_IRQFD_FLAG_IRQ_SOURCE_ID |
+   KVM_IRQFD_FLAG_ASSERT_ONLY))
return -EINVAL;
 
if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
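
The behavioural difference introduced by the assert-only flag can be seen in
a small user-space model. Nothing here is KVM code: struct mock_line,
mock_set_irq() and mock_inject() are hypothetical stand-ins for the interrupt
line state, kvm_set_irq() and irqfd_inject().

```c
#include <assert.h>   /* assert(), for callers that sanity-check results */
#include <stdbool.h>

struct mock_line {
	int level;      /* current level of the interrupt line */
	int edges;      /* number of 0 -> 1 transitions seen so far */
};

/* Stand-in for kvm_set_irq(): drive the line and count rising edges. */
static void mock_set_irq(struct mock_line *line, int level)
{
	if (level && !line->level)
		line->edges++;
	line->level = level;
}

/* Mirrors irqfd_inject(): assert, then de-assert unless assert_only is set. */
static void mock_inject(struct mock_line *line, bool assert_only)
{
	mock_set_irq(line, 1);
	if (!assert_only)
		mock_set_irq(line, 0);
}
```

Without the flag the line is back at 0 when mock_inject() returns. With the
flag it stays at 1 until something else lowers it, which is the "de-asserted
by other means" case the commit message describes.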



[PATCH v8 3/6] kvm: Add IRQ source ID option to KVM_IRQFD

2012-08-10 Thread Alex Williamson
This allows specifying an IRQ source ID to be used when injecting an
interrupt.  When not specified KVM_USERSPACE_IRQ_SOURCE_ID is used.

Signed-off-by: Alex Williamson 
---

 Documentation/virtual/kvm/api.txt |5 +
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |6 +-
 virt/kvm/eventfd.c|   14 ++
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 062cfd5..46f4b4d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1946,6 +1946,11 @@ the guest using the specified gsi pin.  The irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 and kvm_irqfd.gsi.
 
+When KVM_CAP_IRQFD_IRQ_SOURCE_ID is available, KVM_IRQFD supports the
+KVM_IRQFD_FLAG_IRQ_SOURCE_ID which can be used to specify an IRQ
+source ID (see KVM_IRQ_SOURCE_ID) to be used for the guest interrupt.
+This flag has no effect on deassignment.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 75e743e..946c5bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2174,6 +2174,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_GET_TSC_KHZ:
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
+   case KVM_CAP_IRQFD_IRQ_SOURCE_ID:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 67b6b49..deda8a9 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_NR_IRQ_SOURCE_ID 81
+#define KVM_CAP_IRQFD_IRQ_SOURCE_ID 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -684,12 +685,15 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQFD_IRQ_SOURCE_ID */
+#define KVM_IRQFD_FLAG_IRQ_SOURCE_ID (1 << 1)
 
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
__u32 flags;
-   __u8  pad[20];
+   __u32 irq_source_id;
+   __u8  pad[16];
 };
 
 #define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..30150f1 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -51,6 +51,7 @@ struct _irqfd {
struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
/* Used for level IRQ fast-path */
int gsi;
+   int irq_source_id;
struct work_struct inject;
/* Used for setup/shutdown */
struct eventfd_ctx *eventfd;
@@ -67,8 +68,8 @@ irqfd_inject(struct work_struct *work)
struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
struct kvm *kvm = irqfd->kvm;
 
-   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+   kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 1);
+   kvm_set_irq(kvm, irqfd->irq_source_id, irqfd->gsi, 0);
 }
 
 /*
@@ -138,7 +139,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
irq = rcu_dereference(irqfd->irq_entry);
/* An event has been signaled, inject an interrupt */
if (irq)
-   kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+   kvm_set_msi(irq, kvm, irqfd->irq_source_id, 1);
else
schedule_work(&irqfd->inject);
rcu_read_unlock();
@@ -213,6 +214,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 
irqfd->kvm = kvm;
irqfd->gsi = args->gsi;
+   if (args->flags & KVM_IRQFD_FLAG_IRQ_SOURCE_ID)
+   irqfd->irq_source_id = args->irq_source_id;
+   else
+   irqfd->irq_source_id = KVM_USERSPACE_IRQ_SOURCE_ID;
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
@@ -340,7 +345,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-   if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+   if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
+   KVM_IRQFD_FLAG_IRQ_SOURCE_ID))
return -EINVAL;
 
if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)



[PATCH v8 2/6] kvm: Expose IRQ source IDs to userspace

2012-08-10 Thread Alex Williamson
Introduce KVM_IRQ_SOURCE_ID and KVM_CAP_NR_IRQ_SOURCE_ID to allow
user allocation of IRQ source IDs and querying both the capability
and the total count of IRQ source IDs.  These will later be used
by interfaces for setting up level IRQs.
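
The OR semantics of IRQ source IDs can be sketched with one bitmap per pin,
one bit per source ID.  This is a simplified model, assuming the number of
IDs equals the bit width of unsigned long (matching the BITS_PER_LONG value
reported by this patch).  pin_set_level() and MODEL_NR_SOURCE_IDS are
hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Assumed ID count: one bit per source ID in an unsigned long. */
#define MODEL_NR_SOURCE_IDS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Record the level driven by one source ID on a pin and return the
 * resulting pin level, which is the logical OR across all source IDs. */
int pin_set_level(unsigned long *pin_state, int source_id, int level)
{
	assert(source_id >= 0 && source_id < MODEL_NR_SOURCE_IDS);
	if (level)
		*pin_state |= 1UL << source_id;
	else
		*pin_state &= ~(1UL << source_id);
	return *pin_state != 0;
}
```

The pin only drops to 0 once every source ID that asserted it has
de-asserted, which is why separate users need separate IDs for level IRQs.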

Signed-off-by: Alex Williamson 
---

 Documentation/virtual/kvm/api.txt |   20 
 arch/x86/kvm/Kconfig  |1 +
 arch/x86/kvm/x86.c|3 +++
 include/linux/kvm.h   |   11 +++
 include/linux/kvm_host.h  |1 +
 virt/kvm/Kconfig  |3 +++
 virt/kvm/irq_comm.c   |   22 ++
 virt/kvm/kvm_main.c   |   16 
 8 files changed, 77 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..062cfd5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1980,6 +1980,26 @@ return the hash table order in the parameter.  (If the guest is using
 the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
+4.77 KVM_IRQ_SOURCE_ID
+
+Capability: KVM_CAP_NR_IRQ_SOURCE_ID
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irq_source_id (in/out)
+Returns: 0 on success, -errno on error
+
+Allows allocating and freeing IRQ source IDs.  Each IRQ source ID
+represents a complete set of irqchip pin inputs which are logically
+OR'd with other IRQ source IDs for determining the final assertion
+level of a pin.  The flag KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN indicates
+whether the call is for an allocation or deallocation.
+kvm_irq_source_id.irq_source_id returns the allocated IRQ source ID
+on success and specifies the freed IRQ source ID on deassign.  The
+return value of KVM_CAP_NR_IRQ_SOURCE_ID indicates the total number
+of IRQ source IDs.  These IDs are also shared with KVM internal users
+(ex. KVM assigned devices, PIT, shared user ID), therefore not all IDs
+may be allocated through this interface.
+
 
 5. The kvm_run structure
 
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a28f338..bfd2082 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -37,6 +37,7 @@ config KVM
select TASK_DELAY_ACCT
select PERF_EVENTS
select HAVE_KVM_MSI
+   select HAVE_KVM_IRQ_SOURCE_ID
---help---
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..75e743e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2209,6 +2209,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_TSC_DEADLINE_TIMER:
r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
break;
+   case KVM_CAP_NR_IRQ_SOURCE_ID:
+   r = BITS_PER_LONG; /* kvm->arch.irq_sources_bitmap */
+   break;
default:
r = 0;
break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..67b6b49 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_NR_IRQ_SOURCE_ID 81
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -691,6 +692,14 @@ struct kvm_irqfd {
__u8  pad[20];
 };
 
+#define KVM_IRQ_SOURCE_ID_FLAG_DEASSIGN (1 << 0)
+
+struct kvm_irq_source_id {
+   __u32 flags;
+   __u32 irq_source_id;
+   __u8 pad[24];
+};
+
 struct kvm_clock_data {
__u64 clock;
__u32 flags;
@@ -831,6 +840,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_SMMU_INFO  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB  _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_NR_IRQ_SOURCE_ID */
+#define KVM_IRQ_SOURCE_ID _IOWR(KVMIO, 0xa8, struct kvm_irq_source_id)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2ad3e4a..ea6d7a1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -636,6 +636,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+int kvm_irq_source_id(struct kvm *kvm, struct kvm_irq_source_id *id);
 
 /* For vcpu->arch.iommu_flags */
 #define KVM_IOMMU_CACHE_COHERENCY  0x1
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 28694f4..b7e0d4d 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -21,3 +21,6 @@ config KVM_ASYNC_PF
 
 config HAVE_KVM_MSI
bool
+
+config HAVE_KVM_IRQ_SOURCE_ID
+   bool
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c

[PATCH v8 1/6] kvm: Allow filtering of acked irqs

2012-08-10 Thread Alex Williamson
Registering a kvm_irq_ack_notifier with kian.irq_source_id < 0
retains existing behavior.  Filling in the actual irq_source_id results
in the callback only being called when the specified irq_source_id is
asserting the given gsi.

The i8254 PIT remains unfiltered because it de-asserts its irq source
id, so its notifier would never get called otherwise.  KVM device
assignment gets filtering as it de-asserts the GSI in its notifier.
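
The filter rule described above can be written as a small predicate.  This
is a sketch under the assumption that irq_source_ids is the per-pin bitmap
passed to kvm_notify_acked_irq(), with one bit per source ID.
ack_notifier_matches() is a hypothetical helper, not a function from this
series.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Decide whether an ack notifier should run for an acked pin.
 * notifier_source_id < 0 means "no filter" (the pre-patch behavior). */
bool ack_notifier_matches(int notifier_source_id, unsigned long irq_source_ids)
{
	if (notifier_source_id < 0)
		return true;
	assert(notifier_source_id < (int)(sizeof(unsigned long) * CHAR_BIT));
	/* Filtered: run only if this source ID is asserting the pin. */
	return (irq_source_ids >> notifier_source_id) & 1UL;
}
```

The PIT case falls under the first branch.  By the time the ack arrives its
bit is already clear, so a filtered notifier would never match.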

Signed-off-by: Alex Williamson 
---

 arch/x86/kvm/i8254.c |1 +
 arch/x86/kvm/i8259.c |8 +++-
 include/linux/kvm_host.h |4 +++-
 virt/kvm/assigned-dev.c  |1 +
 virt/kvm/ioapic.c|5 -
 virt/kvm/irq_comm.c  |6 --
 6 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index adba28f..2355d19 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -709,6 +709,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
hrtimer_init(&pit_state->pit_timer.timer,
 CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
pit_state->irq_ack_notifier.gsi = 0;
+   pit_state->irq_ack_notifier.irq_source_id = -1; /* No filter */
pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
pit_state->pit_timer.reinject = true;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index e498b18..d2175a9 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -74,9 +74,14 @@ static void pic_unlock(struct kvm_pic *s)
 
 static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 {
+   unsigned long irq_source_ids;
+
s->isr &= ~(1 << irq);
if (s != &s->pics_state->pics[0])
irq += 8;
+
+   irq_source_ids = s->pics_state->irq_states[irq];
+
/*
 * We are dropping lock while calling ack notifiers since ack
 * notifier callbacks for assigned devices call into PIC recursively.
@@ -84,7 +89,8 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 * it should be safe since PIC state is already updated at this stage.
 */
pic_unlock(s->pics_state);
-   kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
+   kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq,
+irq_source_ids);
pic_lock(s->pics_state);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..2ad3e4a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -577,6 +577,7 @@ int kvm_is_mmio_pfn(pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
struct hlist_node link;
+   int irq_source_id;
unsigned gsi;
void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
 };
@@ -627,7 +628,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
int irq_source_id, int level);
-void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
+void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin,
+ unsigned long irq_source_ids);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
 void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 23a41a9..a08c9c1 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -407,6 +407,7 @@ static int assigned_device_enable_guest_intx(struct kvm *kvm,
 {
dev->guest_irq = irq->guest_irq;
dev->ack_notifier.gsi = irq->guest_irq;
+   dev->ack_notifier.irq_source_id = dev->irq_source_id;
return 0;
 }
 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ef61d52..1a9f445 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -241,10 +241,12 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
 
for (i = 0; i < IOAPIC_NUM_PINS; i++) {
union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
+   unsigned long irq_source_ids;
 
if (ent->fields.vector != vector)
continue;
 
+   irq_source_ids = ioapic->irq_states[i];
/*
 * We are dropping lock while calling ack notifiers because ack
 * notifier callbacks for assigned devices call into IOAPIC
@@ -254,7 +256,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
 * after ack notifier returns.
 */
spin_unlock(&ioapic->lock);
-   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
+   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i,
+

[PATCH v8 0/6] kvm: level irqfd support

2012-08-10 Thread Alex Williamson
v8:

Trying a new approach.  Nobody seems to like the internal IRQ
source ID object and the interactions it implies between irqfd
and eoifd, so let's get rid of it.  Instead, simply expose
IRQ source IDs to userspace.  This lets the user be in charge
of freeing them or hanging onto a source ID for later use.  They
can also detach and re-attach components at will.  It also opens
up the possibility that userspace might want to use each IRQ
source ID for more than one GSI (and avoids the kernel needing
to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.

I really wanted to add a de-assert-only option to irqfd so the
irq_ackfd could be fed directly into an irqfd, but I'm dependent
on the ordering of de-assert _then_ signal an eventfd.  Keeping
that ordering doesn't seem to be possible, especially since irqfd
uses a workqueue, if I attempt to make that connection.  Thanks,

Alex

---

Alex Williamson (6):
  kvm: Add de-assert option to KVM_IRQ_ACKFD
  kvm: KVM_IRQ_ACKFD
  kvm: Add assert-only option to KVM_IRQFD
  kvm: Add IRQ source ID option to KVM_IRQFD
  kvm: Expose IRQ source IDs to userspace
  kvm: Allow filtering of acked irqs


 Documentation/virtual/kvm/api.txt |   53 ++
 arch/x86/kvm/Kconfig  |1 
 arch/x86/kvm/i8254.c  |1 
 arch/x86/kvm/i8259.c  |8 +
 arch/x86/kvm/x86.c|8 +
 include/linux/kvm.h   |   39 -
 include/linux/kvm_host.h  |   18 ++
 virt/kvm/Kconfig  |3 
 virt/kvm/assigned-dev.c   |1 
 virt/kvm/eventfd.c|  315 +
 virt/kvm/ioapic.c |5 -
 virt/kvm/irq_comm.c   |   28 +++
 virt/kvm/kvm_main.c   |   26 +++
 13 files changed, 496 insertions(+), 10 deletions(-)


Re: mellanox mlx4_core and SR-IOV

2012-08-10 Thread Lukas Hejtmanek
On Fri, Aug 10, 2012 at 12:51:53PM -0600, Chris Friesen wrote:
> On 08/03/2012 02:33 AM, Lukas Hejtmanek wrote:
> >I also tried OFED package from Mellanox which seems to have better SR-IOV
> >support (at least mlx4_ib does not complain that SR-IOV is not supported).
> >However, it does not work when SR-IOV enabled:
> 
> Last I heard they were not officially providing support for SR-IOV.
> Has anyone heard otherwise from the Mellanox folks?

They have been talking about it for 2 years:
http://www.openfabrics.org/archives/spring2010sonoma/Monday/1.30%20Liran%20Liss%20I%3FO%20Virtualization/sriov_liss.ppt

These are modified OFED drivers, which seem to contain SR-IOV code for the IB
layer as well.
http://www.mellanox.com/content/pages.php?pg=products_dyn_family=26_section=34#tab-three

-- 
Lukáš Hejtmánek


[PATCH 04/16] perf utils: remove unused function map__objdump_2ip

2012-08-10 Thread Cody P Schafer
map__objdump_2ip was introduced in:
ee11b90b12 perf top: Fix annotate for userspace

And its last user was removed in:
36532461a0 perf top: Ditch private annotation code, share perf annotate's

Remove it.

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/map.c | 8 
 tools/perf/util/map.h | 1 -
 2 files changed, 9 deletions(-)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 287cb34..7d37159 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -262,14 +262,6 @@ u64 map__rip_2objdump(struct map *map, u64 rip)
return addr;
 }
 
-u64 map__objdump_2ip(struct map *map, u64 addr)
-{
-   u64 ip = map->dso->adjust_symbols ?
-   addr :
-   map->unmap_ip(map, addr);   /* RIP -> IP */
-   return ip;
-}
-
 void map_groups__init(struct map_groups *mg)
 {
int i;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 1e183d1..6c38d9e 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -104,7 +104,6 @@ static inline u64 identity__map_ip(struct map *map __used, u64 ip)
 
 /* rip/ip <-> addr suitable for passing to `objdump --start-address=` */
 u64 map__rip_2objdump(struct map *map, u64 rip);
-u64 map__objdump_2ip(struct map *map, u64 addr);
 
 struct symbol;
 
-- 
1.7.11.3



[PATCH 07/16] perf symbol: simplify out_fixup in kernel syms loading

2012-08-10 Thread Cody P Schafer
The only site that jumps to out_fixup has (kallsyms_filename == NULL).
And all paths that reach 'if (err > 0)' without 'goto out_fixup' have
kallsyms_filename != NULL.

So skip over both the check & dso__set_long_name(), and remove the
check.

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 2127002..e5c3817 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1503,9 +1503,8 @@ do_kallsyms:
free(kallsyms_allocated_filename);
 
if (err > 0) {
+   dso__set_long_name(dso, strdup("[kernel.kallsyms]"));
 out_fixup:
-   if (kallsyms_filename != NULL)
-   dso__set_long_name(dso, strdup("[kernel.kallsyms]"));
map__fixup_start(map);
map__fixup_end(map);
}
-- 
1.7.11.3



[PATCH 14/16] perf symbol: factor want_symtab out of dso__load_sym()

2012-08-10 Thread Cody P Schafer
Only one callsite of dso__load_sym() uses the want_symtab functionality,
so place the logic at the callsite instead of within dso__load_sym().

This sets us up for removal of want_symtab completely once we keep
multiple elf handles (within symsrc's) around.
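
The caller-side check can be pictured as a filter applied while walking the
candidate images.  A minimal model follows, with hypothetical names
(struct symsrc_model, pick_source()).  The real loop in dso__load() does
more work per candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of struct symsrc to the two facts that matter. */
struct symsrc_model {
	bool has_symtab;
	bool has_dynsym;
};

/* Return the index of the first usable candidate, or -1 if none.
 * With want_symtab set, candidates lacking a full symtab are skipped
 * by the caller instead of being rejected inside the loader. */
int pick_source(const struct symsrc_model *srcs, size_t n, bool want_symtab)
{
	size_t i;

	assert(srcs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (want_symtab && !srcs[i].has_symtab)
			continue;
		if (srcs[i].has_symtab || srcs[i].has_dynsym)
			return (int)i;
	}
	return -1;
}
```

Calling it first with want_symtab true and then with false corresponds to
the two passes of the existing restart logic.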

Setup for the later patch
"perf symbol: use both runtime and debug images"

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 21 ++---
 tools/perf/util/symbol-minimal.c |  8 ++--
 tools/perf/util/symbol.c | 10 +++---
 tools/perf/util/symbol.h |  3 ++-
 4 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 5915947..492ebec 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -525,6 +525,10 @@ static int dso__swap_init(struct dso *dso, unsigned char eidata)
return 0;
 }
 
+bool symsrc__has_symtab(struct symsrc *ss)
+{
+   return ss->symtab != NULL;
+}
 
 void symsrc__destroy(struct symsrc *ss)
 {
@@ -616,7 +620,7 @@ out_close:
 }
 
 int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *ss,
- symbol_filter_t filter, int kmodule, int want_symtab)
+ symbol_filter_t filter, int kmodule)
 {
struct kmap *kmap = dso->kernel ? map__kmap(map) : NULL;
struct map *curr_map = map;
@@ -636,21 +640,16 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *ss,
 
dso->symtab_type = ss->type;
 
+   if (!ss->symtab) {
+   ss->symtab  = ss->dynsym;
+   ss->symshdr = ss->dynshdr;
+   }
+
elf = ss->elf;
ehdr = ss->ehdr;
sec = ss->symtab;
shdr = ss->symshdr;
 
-   if (sec == NULL) {
-   if (want_symtab)
-   goto out_elf_end;
-
-   sec  = ss->dynsym;
-   shdr = ss->dynshdr;
-   if (sec == NULL)
-   goto out_elf_end;
-   }
-
opdsec = ss->opdsec;
opdshdr = ss->opdshdr;
opdidx  = ss->opdidx;
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index 2f1584b..3290f04 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -260,6 +260,11 @@ out_close:
return -1;
 }
 
+bool symsrc__has_symtab(struct symsrc *ss __used)
+{
+   return false;
+}
+
 void symsrc__destroy(struct symsrc *ss)
 {
free(ss->name);
@@ -275,8 +280,7 @@ int dso__synthesize_plt_symbols(struct dso *dso __used,
 }
 
 int dso__load_sym(struct dso *dso, struct map *map __used, struct symsrc *ss,
- symbol_filter_t filter __used, int kmodule __used,
- int want_symtab __used)
+ symbol_filter_t filter __used, int kmodule __used)
 {
unsigned char *build_id[BUILD_ID_SIZE];
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index f8a3068..8e7d74f 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1089,8 +1089,12 @@ restart:
if (symsrc__init(&ss, dso, name, symtab_type) < 0)
continue;
 
-   ret = dso__load_sym(dso, map, &ss, filter, 0,
-   want_symtab);
+   if (want_symtab && !symsrc__has_symtab(&ss)) {
+   symsrc__destroy(&ss);
+   continue;
+   }
+
+   ret = dso__load_sym(dso, map, &ss, filter, 0);
 
/*
 * Some people seem to have debuginfo files _WITHOUT_ debug
@@ -1376,7 +1380,7 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
if (symsrc__init(&ss, dso, symfs_vmlinux, symtab_type))
return -1;
 
-   err = dso__load_sym(dso, map, &ss, filter, 0, 0);
+   err = dso__load_sym(dso, map, &ss, filter, 0);
symsrc__destroy(&ss);
 
if (err > 0) {
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 95b3996..c73f4f2 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -252,6 +252,7 @@ struct symsrc {
 void symsrc__destroy(struct symsrc *ss);
 int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 enum dso_binary_type type);
+bool symsrc__has_symtab(struct symsrc *ss);
 
 #define DSO__SWAP(dso, type, val)  \
 ({ \
@@ -369,7 +370,7 @@ ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
u8 *data, ssize_t size);
 int dso__test_data(void);
 int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *ss,
- symbol_filter_t filter, int kmodule, int want_symtab);
+ symbol_filter_t filter, int kmodule);
 int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss,
struct map *map, symbol_filter_t filter);
 
-- 
1.7.11.3


[PATCH 02/16] perf symbol: remove unused 'end' arg in kallsyms parse cb

2012-08-10 Thread Cody P Schafer
kallsyms__parse() takes a callback that is called on every discovered
symbol. As /proc/kallsyms does not supply symbol sizes, the callback was
simply called with end=start, faking the symbol size to 1.

All of the callbacks (there are 2) used in calls to kallsyms__parse()
are _only_ used as callbacks for kallsyms__parse().

Given that kallsyms__parse() lacks real information about what
end/length should be, don't make up a length in kallsyms__parse().
Instead have the callbacks handle guessing the length.

Also relocate a comment regarding symbol creation to the callback which
does symbol creation (kallsyms__parse() is not in general used to create
symbols).
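
To make the new callback shape concrete, here is a stand-alone sketch of a
parser for one "address type name" line that hands the callback only the
start address.  parse_kallsyms_line(), struct sym_model and make_symbol_cb()
are hypothetical.  The real kallsyms__parse() reads a whole file and parses
the fields by hand.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Callback shape after the patch: no 'end' argument. */
typedef int (*symbol_cb)(void *arg, const char *name, char type,
			 uint64_t start);

/* Parse a single kallsyms-style line and invoke cb on it.
 * Returns the callback's result, or -1 if the line is malformed. */
int parse_kallsyms_line(const char *line, void *arg, symbol_cb cb)
{
	uint64_t start;
	char type;
	char name[128];

	assert(line != NULL && cb != NULL);
	if (sscanf(line, "%" SCNx64 " %c %127s", &start, &type, name) != 3)
		return -1;
	return cb(arg, name, type, start);
}

/* Hypothetical symbol record; the length is left for a later fixup pass. */
struct sym_model {
	uint64_t start;
	uint64_t len;
};

/* Callback that "creates" a symbol: it picks the placeholder length
 * itself instead of receiving a made-up end from the parser. */
int make_symbol_cb(void *arg, const char *name, char type, uint64_t start)
{
	struct sym_model *sym = arg;

	(void)name;
	(void)type;
	sym->start = start;
	sym->len = 0;
	return 0;
}
```

Callbacks that only look up an address, such as find_symbol_cb(), simply
ignore the length question.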

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/event.c  |  2 +-
 tools/perf/util/symbol.c | 21 ++---
 tools/perf/util/symbol.h |  2 +-
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 2a6f33c..3a0f1a5 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -412,7 +412,7 @@ struct process_symbol_args {
 };
 
 static int find_symbol_cb(void *arg, const char *name, char type,
- u64 start, u64 end __used)
+ u64 start)
 {
struct process_symbol_args *args = arg;
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 891f83c..9afe6b1 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -563,7 +563,7 @@ size_t dso__fprintf(struct dso *dso, enum map_type type, FILE *fp)
 
 int kallsyms__parse(const char *filename, void *arg,
int (*process_symbol)(void *arg, const char *name,
- char type, u64 start, u64 end))
+ char type, u64 start))
 {
char *line = NULL;
size_t n;
@@ -603,13 +603,8 @@ int kallsyms__parse(const char *filename, void *arg,
break;
}
 
-   /*
-* module symbols are not sorted so we add all
-* symbols, setting length to 1, and rely on
-* symbols__fixup_end() to fix it up.
-*/
err = process_symbol(arg, symbol_name,
-symbol_type, start, start);
+symbol_type, start);
if (err)
break;
}
@@ -636,7 +631,7 @@ static u8 kallsyms2elf_type(char type)
 }
 
 static int map__process_kallsym_symbol(void *arg, const char *name,
-  char type, u64 start, u64 end)
+  char type, u64 start)
 {
struct symbol *sym;
struct process_kallsyms_args *a = arg;
@@ -645,8 +640,12 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
if (!symbol_type__is_a(type, a->map->type))
return 0;
 
-   sym = symbol__new(start, end - start + 1,
- kallsyms2elf_type(type), name);
+   /*
+* module symbols are not sorted so we add all
+* symbols, setting length to 0, and rely on
+* symbols__fixup_end() to fix it up.
+*/
+   sym = symbol__new(start, 0, kallsyms2elf_type(type), name);
if (sym == NULL)
return -ENOMEM;
/*
@@ -1731,7 +1730,7 @@ struct process_args {
 };
 
 static int symbol__in_kernel(void *arg, const char *name,
-char type __used, u64 start, u64 end __used)
+char type __used, u64 start)
 {
struct process_args *args = arg;
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 38ccbbb..c9534fe 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -299,7 +299,7 @@ bool __dsos__read_build_ids(struct list_head *head, bool with_hits);
 int build_id__sprintf(const u8 *build_id, int len, char *bf);
 int kallsyms__parse(const char *filename, void *arg,
int (*process_symbol)(void *arg, const char *name,
- char type, u64 start, u64 end));
+ char type, u64 start));
 int filename__read_debuglink(const char *filename, char *debuglink,
 size_t size);
 
-- 
1.7.11.3



[PATCH 03/16] perf symbol: only un-prelink non-zero symbols

2012-08-10 Thread Cody P Schafer
Prelink only adjusts the addresses of non-zero symbols. Do the same when
we reverse the adjustments.
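
In arithmetic terms, the reverse adjustment subtracts the difference between
the section's load address and its file offset, and the patch leaves symbols
with a zero value alone.  A sketch, assuming the adjustment has the form
st_value -= sh_addr - sh_offset.  unprelink_value() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Undo the address adjustment for one symbol value.
 * A zero st_value was never adjusted, so it is returned unchanged. */
uint64_t unprelink_value(uint64_t st_value, uint64_t sh_addr,
			 uint64_t sh_offset)
{
	uint64_t delta = sh_addr - sh_offset;

	if (st_value == 0)
		return 0;
	/* Model-only sanity check: the subtraction must not wrap. */
	assert(st_value >= delta);
	return st_value - delta;
}
```

For example, st_value 0x401234 in a section with sh_addr 0x401000 and
sh_offset 0x1000 maps to 0x1234.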

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 9ca89f8..e037609 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -717,7 +717,7 @@ int dso__load_sym(struct dso *dso, struct map *map, const char *name, int fd,
goto new_symbol;
}
 
-   if (curr_dso->adjust_symbols) {
+   if (curr_dso->adjust_symbols && sym.st_value) {
pr_debug4("%s: adjusting symbol: st_value: %#" PRIx64 " "
  "sh_addr: %#" PRIx64 " sh_offset: %#" PRIx64 "\n", __func__,
  (u64)sym.st_value, (u64)shdr.sh_addr,
-- 
1.7.11.3



[PATCH 05/16] perf symbol: don't try to synthesize plt without dynstr.

2012-08-10 Thread Cody P Schafer
If .dynsym exists but .dynstr is empty (NO_BITS or size==0), a segfault
occurs. Avoid this by checking that .dynstr is not empty.

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index e037609..a2e994e 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -232,6 +232,9 @@ int dso__synthesize_plt_symbols(struct dso *dso, char *name, struct map *map,
if (symstrs == NULL)
goto out_elf_end;
 
+   if (symstrs->d_size == 0)
+   goto out_elf_end;
+
nr_rel_entries = shdr_rel_plt.sh_size / shdr_rel_plt.sh_entsize;
plt_offset = shdr_plt.sh_offset;
 
-- 
1.7.11.3



[PATCH 08/16] perf symbol: only set vmlinux longname & mark loaded if really loaded

2012-08-10 Thread Cody P Schafer
dso__load_vmlinux() uses the filename passed to it to directly set the
dso long_name, which resulted in a use after free due to
dso__load_vmlinux_path() treating 0 symbols as a load failure and
subsequently freeing the contents of dso->long_name.

Change dso__load_vmlinux() so that finding 0 symbols does not cause it
to consider itself loaded, and do not set long_name in such a case.

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index e5c3817..96dbf28 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1364,13 +1364,14 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
if (fd < 0)
return -1;
 
-   dso__set_long_name(dso, (char *)vmlinux);
-   dso__set_loaded(dso, map->type);
err = dso__load_sym(dso, map, symfs_vmlinux, fd, filter, 0, 0);
close(fd);
 
-   if (err > 0)
+   if (err > 0) {
+   dso__set_long_name(dso, (char *)vmlinux);
+   dso__set_loaded(dso, map->type);
pr_debug("Using %s for symbols\n", symfs_vmlinux);
+   }
 
return err;
 }
-- 
1.7.11.3



[PATCH 16/16] perf symbol: use both runtime and debug images

2012-08-10 Thread Cody P Schafer
We keep both a 'runtime' elf image as well as a 'debug' elf image around
and generate symbols by looking at both of these.

This eliminates the need for the want_symtab/goto restart mechanism
combined with iterating over and reopening the elf images a second time.

Also give dso__synthesize_plt_symbols() the runtime image (which has
dynsyms) instead of the symbol image (which may only have a symtab and
no dynsyms).

Previously if a debug image was found all runtime images were ignored.

This fixes 2 issues:

 - Symbol resolution failure on PowerPC systems with debug symbols
   installed, as the debug images lack a '.opd' section which contains
   function descriptors.

 - On all archs, plt synthesis failed when a debug image was loaded and
   that debug image lacks a dynsym section while a runtime image has a
   dynsym section.

Assumptions:

 - If a .opd section exists, it is contained in the highest priority
   image with a dynsym section.

 - This generally implies that the debug image lacks a dynsym section
   (ie: it is marked as NO_BITS).
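
The selection of a symbol image and a runtime image can be sketched as a
single pass over the candidates in priority order.  The types and the
fallback rule below are assumptions made for illustration (struct
image_model, struct selection and select_images() are hypothetical).  The
has_dynsym/has_opd test mirrors symsrc__possibly_runtime() from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what one candidate ELF image provides. */
struct image_model {
	bool has_symtab;
	bool has_dynsym;
	bool has_opd;
};

/* Indices of the chosen images, or -1 when no candidate qualifies. */
struct selection {
	int syms;
	int runtime;
};

/* Pick the first image with a symtab as the symbol source and the first
 * image with a dynsym or .opd section as the runtime source. */
struct selection select_images(const struct image_model *imgs, size_t n)
{
	struct selection sel = { -1, -1 };
	size_t i;

	assert(imgs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (sel.syms < 0 && imgs[i].has_symtab)
			sel.syms = (int)i;
		if (sel.runtime < 0 && (imgs[i].has_dynsym || imgs[i].has_opd))
			sel.runtime = (int)i;
		if (sel.syms >= 0 && sel.runtime >= 0)
			break;
	}
	/* Assumed fallback: if only one kind was found, use it for both. */
	if (sel.syms < 0)
		sel.syms = sel.runtime;
	if (sel.runtime < 0)
		sel.runtime = sel.syms;
	return sel;
}
```

With a debug image listed first and a runtime image second, symbols come
from the former while plt synthesis and .opd lookups use the latter.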

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c |  5 +++
 tools/perf/util/symbol-minimal.c |  6 
 tools/perf/util/symbol.c | 77 +++-
 tools/perf/util/symbol.h |  1 +
 4 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 36e4a45..5b37e13 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -525,6 +525,11 @@ static int dso__swap_init(struct dso *dso, unsigned char 
eidata)
return 0;
 }
 
+bool symsrc__possibly_runtime(struct symsrc *ss)
+{
+   return ss->dynsym || ss->opdsec;
+}
+
 bool symsrc__has_symtab(struct symsrc *ss)
 {
return ss->symtab != NULL;
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index c0377a9..133a591 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -260,6 +260,12 @@ out_close:
return -1;
 }
 
+bool symsrc__possibly_runtime(struct symsrc *ss __used)
+{
+   /* Assume all sym sources could be a runtime image. */
+   return true;
+}
+
 bool symsrc__has_symtab(struct symsrc *ss __used)
 {
return false;
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 739e5a3..2aa871a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1026,11 +1026,12 @@ int dso__load(struct dso *dso, struct map *map, 
symbol_filter_t filter)
 {
char *name;
int ret = -1;
-   struct symsrc ss;
u_int i;
struct machine *machine;
char *root_dir = (char *) "";
-   int want_symtab;
+   int ss_pos = 0;
+   struct symsrc ss_[2];
+   struct symsrc *syms_ss = NULL, *runtime_ss = NULL;
 
dso__set_loaded(dso, map->type);
 
@@ -1072,12 +1073,12 @@ int dso__load(struct dso *dso, struct map *map, 
symbol_filter_t filter)
root_dir = machine->root_dir;
 
/* Iterate over candidate debug images.
-* On the first pass, only load images if they have a full symtab.
-* Failing that, do a second pass where we accept .dynsym also
+* Keep track of "interesting" ones (those which have a symtab, dynsym,
+* and/or opd section) for processing.
 */
-   want_symtab = 1;
-restart:
for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
+   struct symsrc *ss = &ss_[ss_pos];
+   bool next_slot = false;
 
enum dso_binary_type symtab_type = binary_type_symtab[i];
 
@@ -1086,45 +1087,55 @@ restart:
continue;
 
/* Name is now the name of the next image to try */
-   if (symsrc__init(&ss, dso, name, symtab_type) < 0)
+   if (symsrc__init(ss, dso, name, symtab_type) < 0)
continue;
 
-   if (want_symtab && !symsrc__has_symtab(&ss)) {
-   symsrc__destroy(&ss);
-   continue;
+   if (!syms_ss && symsrc__has_symtab(ss)) {
+   syms_ss = ss;
+   next_slot = true;
}
 
-   ret = dso__load_sym(dso, map, &ss, &ss, filter, 0);
-
-   /*
-* Some people seem to have debuginfo files _WITHOUT_ debug
-* info!?!?
-*/
-   if (!ret) {
-   symsrc__destroy(&ss);
-   continue;
+   if (!runtime_ss && symsrc__possibly_runtime(ss)) {
+   runtime_ss = ss;
+   next_slot = true;
}
 
-   if (ret > 0) {
-   int nr_plt;
+   if (next_slot) {
+   ss_pos++;
 
-   nr_plt = dso__synthesize_plt_symbols(dso, &ss, map, filter);
-   if (nr_plt > 0)
-   

[PATCH 06/16] perf symbol: remove unneeded call to dso__set_long_name()

2012-08-10 Thread Cody P Schafer
dso__set_long_name() is already called by dso__load_vmlinux(), avoid
calling it a second time unnecessarily.

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9afe6b1..2127002 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1387,10 +1387,8 @@ int dso__load_vmlinux_path(struct dso *dso, struct map 
*map,
filename = dso__build_id_filename(dso, NULL, 0);
if (filename != NULL) {
err = dso__load_vmlinux(dso, map, filename, filter);
-   if (err > 0) {
-   dso__set_long_name(dso, filename);
+   if (err > 0)
goto out;
-   }
free(filename);
}
 
-- 
1.7.11.3



[PATCH 11/16] perf symbol: introduce symsrc structure.

2012-08-10 Thread Cody P Schafer
Factors opening of certain sections & tracking certain elf info into an
external structure.

The goal here is to keep multiple elfs (and their looked up
sections/indexes) around during the symbol generation process (in
dso__load()).

We need this to properly resolve symbols on PPC due to the
use of function descriptors & the .opd section (ie: symbols which are
functions don't point to their actual location, they point to their
function descriptor in .opd which contains their actual location).
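
As a rough illustration of the indirection described above, the sketch below resolves a symbol value through a mock .opd section. The function name opd_entry_address() and its parameters are hypothetical; it assumes the section data is already in host byte order (perf handles swapping separately) and that the entry address is the first 64-bit word of each descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Resolve a function symbol through a mock .opd section.
 *
 *   opd_addr  - address the .opd section is loaded at (its sh_addr)
 *   opd_data  - raw bytes of the section
 *   opd_size  - size of the section in bytes
 *   sym_value - st_value of the symbol, which points at its descriptor
 *
 * Returns the first 64-bit word of the descriptor, taken here to be
 * the real entry address of the function.
 */
uint64_t opd_entry_address(uint64_t opd_addr, const unsigned char *opd_data,
			   size_t opd_size, uint64_t sym_value)
{
	uint64_t offset = sym_value - opd_addr;
	uint64_t entry;

	assert(sym_value >= opd_addr);
	assert(offset + sizeof(entry) <= opd_size);

	/* memcpy avoids alignment assumptions about opd_data. */
	memcpy(&entry, opd_data + offset, sizeof(entry));
	return entry;
}
```

The subtraction mirrors the "st_value minus section address" step used when a symbol's section index matches the .opd section; everything else in the sketch is scaffolding.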

It would be possible to just keep the (Elf *) around, but then we'd end
up with duplicate code for looking up the same sections and checking for
the existence of an important section wouldn't be as clean (and we need
to keep the Elf stuff confined to symbol-elf.c).

Utilized by the later patch
"perf symbol: use both runtime and debug images"

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 119 +--
 tools/perf/util/symbol-minimal.c |  30 +-
 tools/perf/util/symbol.c |  22 
 tools/perf/util/symbol.h |  36 +++-
 4 files changed, 163 insertions(+), 44 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index a9a194d..6974b2a 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -536,24 +536,25 @@ static int dso__swap_init(struct dso *dso, unsigned char 
eidata)
return 0;
 }
 
-int dso__load_sym(struct dso *dso, struct map *map, const char *name, int fd,
- symbol_filter_t filter, int kmodule, int want_symtab)
+
+void symsrc__destroy(struct symsrc *ss)
+{
+   free(ss->name);
+   elf_end(ss->elf);
+   close(ss->fd);
+}
+
+int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
+enum dso_binary_type type)
 {
-   struct kmap *kmap = dso->kernel ? map__kmap(map) : NULL;
-   struct map *curr_map = map;
-   struct dso *curr_dso = dso;
-   Elf_Data *symstrs, *secstrs;
-   uint32_t nr_syms;
int err = -1;
-   uint32_t idx;
GElf_Ehdr ehdr;
-   GElf_Shdr shdr, opdshdr;
-   Elf_Data *syms, *opddata = NULL;
-   GElf_Sym sym;
-   Elf_Scn *sec, *sec_strndx, *opdsec;
Elf *elf;
-   int nr = 0;
-   size_t opdidx = 0;
+   int fd;
+
+   fd = open(name, O_RDONLY);
+   if (fd < 0)
+   return -1;
 
elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
if (elf == NULL) {
@@ -580,19 +581,88 @@ int dso__load_sym(struct dso *dso, struct map *map, const 
char *name, int fd,
goto out_elf_end;
}
 
-   sec = elf_section_by_name(elf, &ehdr, &shdr, ".symtab", NULL);
+   ss->symtab = elf_section_by_name(elf, &ehdr, &ss->symshdr, ".symtab",
+   NULL);
+   if (ss->symshdr.sh_type != SHT_SYMTAB)
+   ss->symtab = NULL;
+
+   ss->dynsym_idx = 0;
+   ss->dynsym = elf_section_by_name(elf, &ehdr, &ss->dynshdr, ".dynsym",
+   &ss->dynsym_idx);
+   if (ss->dynshdr.sh_type != SHT_DYNSYM)
+   ss->dynsym = NULL;
+
+   ss->opdidx = 0;
+   ss->opdsec = elf_section_by_name(elf, &ehdr, &ss->opdshdr, ".opd",
+   &ss->opdidx);
+   if (ss->opdshdr.sh_type != SHT_PROGBITS)
+   ss->opdsec = NULL;
+
+   if (dso->kernel == DSO_TYPE_USER) {
+   GElf_Shdr shdr;
+   ss->adjust_symbols = (ehdr.e_type == ET_EXEC ||
+   elf_section_by_name(elf, &ehdr, &shdr,
+".gnu.prelink_undo",
+NULL) != NULL);
+   } else {
+   ss->adjust_symbols = 0;
+   }
+
+   ss->name   = strdup(name);
+   if (!ss->name)
+   goto out_elf_end;
+
+   ss->elf= elf;
+   ss->fd = fd;
+   ss->ehdr   = ehdr;
+   ss->type   = type;
+
+   return 0;
+
+out_elf_end:
+   elf_end(elf);
+out_close:
+   close(fd);
+   return err;
+}
+
+int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *ss,
+ symbol_filter_t filter, int kmodule, int want_symtab)
+{
+   struct kmap *kmap = dso->kernel ? map__kmap(map) : NULL;
+   struct map *curr_map = map;
+   struct dso *curr_dso = dso;
+   Elf_Data *symstrs, *secstrs;
+   uint32_t nr_syms;
+   int err = -1;
+   uint32_t idx;
+   GElf_Ehdr ehdr;
+   GElf_Shdr shdr, opdshdr;
+   Elf_Data *syms, *opddata = NULL;
+   GElf_Sym sym;
+   Elf_Scn *sec, *sec_strndx, *opdsec;
+   Elf *elf;
+   int nr = 0;
+   size_t opdidx = 0;
+
+   elf = ss->elf;
+   ehdr = ss->ehdr;
+   sec = ss->symtab;
+   shdr = ss->symshdr;
+
if (sec == NULL) {
if (want_symtab)
goto out_elf_end;
 
-   sec = elf_section_by_name(elf, &ehdr, &shdr, ".dynsym", NULL);
+   sec  = ss->dynsym;
+   

[PATCH 10/16] perf symbol: track symtab_type of vmlinux

2012-08-10 Thread Cody P Schafer
Previously, symtab_type would have been left at 0, or KALLSYMS, which is not
quite accurate.

Introduce DSO_BINARY_TYPE__VMLINUX and DSO_BINARY_TYPE__GUEST_VMLINUX.
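
A minimal sketch of how a new type feeds a one-character origin table, assuming a shortened, hypothetical enum (mock_binary_type) and a hypothetical lookup function rather than perf's real definitions. The '!' returned for the not-found case is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical, shortened version of the binary type enum. */
enum mock_binary_type {
	MOCK_TYPE__KALLSYMS = 0,
	MOCK_TYPE__GUEST_KALLSYMS,
	MOCK_TYPE__VMLINUX,
	MOCK_TYPE__GUEST_VMLINUX,
	MOCK_TYPE__NOT_FOUND,
	MOCK_TYPE__COUNT
};

/* Map a binary type to a one-character origin marker. */
char mock_symtab_origin(enum mock_binary_type type)
{
	/* Designated initializers keep the table in sync with the enum. */
	static const char origin[MOCK_TYPE__COUNT] = {
		[MOCK_TYPE__KALLSYMS]       = 'k',
		[MOCK_TYPE__GUEST_KALLSYMS] = 'g',
		[MOCK_TYPE__VMLINUX]        = 'v',
		[MOCK_TYPE__GUEST_VMLINUX]  = 'V',
	};
	int t = (int)type;

	assert(t >= 0 && t < MOCK_TYPE__COUNT);

	if (type == MOCK_TYPE__NOT_FOUND)
		return '!';
	return origin[t];
}
```

Because the table is indexed by enum value, adding the two vmlinux entries in the middle of the enum does not disturb the existing markers.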

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol.c | 9 +
 tools/perf/util/symbol.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 96dbf28..8f5cabbf 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -923,6 +923,7 @@ char dso__symtab_origin(const struct dso *dso)
 {
static const char origin[] = {
[DSO_BINARY_TYPE__KALLSYMS] = 'k',
+   [DSO_BINARY_TYPE__VMLINUX]  = 'v',
[DSO_BINARY_TYPE__JAVA_JIT] = 'j',
[DSO_BINARY_TYPE__DEBUGLINK]= 'l',
[DSO_BINARY_TYPE__BUILD_ID_CACHE]   = 'B',
@@ -933,6 +934,7 @@ char dso__symtab_origin(const struct dso *dso)
[DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE]  = 'K',
[DSO_BINARY_TYPE__GUEST_KALLSYMS]   = 'g',
[DSO_BINARY_TYPE__GUEST_KMODULE]= 'G',
+   [DSO_BINARY_TYPE__GUEST_VMLINUX]= 'V',
};
 
if (dso == NULL || dso->symtab_type == DSO_BINARY_TYPE__NOT_FOUND)
@@ -1008,7 +1010,9 @@ int dso__binary_type_file(struct dso *dso, enum 
dso_binary_type type,
 
default:
case DSO_BINARY_TYPE__KALLSYMS:
+   case DSO_BINARY_TYPE__VMLINUX:
case DSO_BINARY_TYPE__GUEST_KALLSYMS:
+   case DSO_BINARY_TYPE__GUEST_VMLINUX:
case DSO_BINARY_TYPE__JAVA_JIT:
case DSO_BINARY_TYPE__NOT_FOUND:
ret = -1;
@@ -1364,6 +1368,11 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
if (fd < 0)
return -1;
 
+   if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
+   dso->symtab_type = DSO_BINARY_TYPE__GUEST_VMLINUX;
+   else
+   dso->symtab_type = DSO_BINARY_TYPE__VMLINUX;
+
err = dso__load_sym(dso, map, symfs_vmlinux, fd, filter, 0, 0);
close(fd);
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c9534fe..37f1ea1 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -158,6 +158,8 @@ struct addr_location {
 enum dso_binary_type {
DSO_BINARY_TYPE__KALLSYMS = 0,
DSO_BINARY_TYPE__GUEST_KALLSYMS,
+   DSO_BINARY_TYPE__VMLINUX,
+   DSO_BINARY_TYPE__GUEST_VMLINUX,
DSO_BINARY_TYPE__JAVA_JIT,
DSO_BINARY_TYPE__DEBUGLINK,
DSO_BINARY_TYPE__BUILD_ID_CACHE,
-- 
1.7.11.3



[PATCH 09/16] perf symbol: avoid segfault in elf_strptr

2012-08-10 Thread Cody P Schafer
If we call elf_section_by_name() with a truncated elf image (ie: the file
header indicates that the section headers are placed past the end of the
file), elf_strptr() causes a segfault within libelf.

Avoid this by checking that we can access the section string table
properly.

Should really be fixed in libelf/elfutils.
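
For illustration only, the truncation condition described above (section headers placed past the end of the file) can be checked without libelf as sketched below. This is a different technique from the elf_rawdata() test the patch uses; struct mock_ehdr and shdr_table_in_file() are hypothetical names that carry only the header fields relevant here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the ELF file header fields relevant here. */
struct mock_ehdr {
	uint64_t e_shoff;     /* file offset of the section header table */
	uint16_t e_shentsize; /* size of one section header */
	uint16_t e_shnum;     /* number of section headers */
};

/* True if the whole section header table lies within the file. */
bool shdr_table_in_file(const struct mock_ehdr *eh, uint64_t file_size)
{
	uint64_t table_size;

	assert(eh != NULL);

	table_size = (uint64_t)eh->e_shentsize * eh->e_shnum;

	if (eh->e_shoff > file_size)
		return false;
	/* Compare against the remaining bytes to avoid overflow. */
	return table_size <= file_size - eh->e_shoff;
}
```

A file whose header claims ten 64-byte section headers at offset 1000 passes only if it is at least 1640 bytes long.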

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index a2e994e..a9a194d 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -129,6 +129,10 @@ static Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr 
*ep,
Elf_Scn *sec = NULL;
size_t cnt = 1;
 
+   /* Elf is corrupted/truncated, avoid calling elf_strptr. */
+   if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL))
+   return NULL;
+
while ((sec = elf_nextscn(elf, sec)) != NULL) {
char *str;
 
-- 
1.7.11.3



[PATCH 15/16] perf symbol: convert dso__load_syms to take 2 symsrc's

2012-08-10 Thread Cody P Schafer
To properly handle platforms with an opd section, both a runtime image
(which contains the opd section but possibly lacks symbols) and a symbol
image (which probably lacks an opd section but has symbols) are needed.

The next patch ("perf symbol: use both runtime and debug images")
adjusts the callsite in dso__load() to take advantage of being able to
pass both runtime & debug images.

Assumptions made here:

 - The opd section, if it exists in the runtime image, has headers in
   both the runtime image and the debug/syms image.
 - The index of the opd section (again, only if it exists in the
   runtime image) is the same in both the runtime and debug/symbols
   image.

Both of these are true on RHEL, but it is unclear how accurate they are
in general (on platforms with function descriptors in opd sections).

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 47 
 tools/perf/util/symbol-minimal.c |  1 +
 tools/perf/util/symbol.c |  4 ++--
 tools/perf/util/symbol.h |  5 +++--
 4 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 492ebec..36e4a45 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -619,7 +619,8 @@ out_close:
return err;
 }
 
-int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *ss,
+int dso__load_sym(struct dso *dso, struct map *map,
+ struct symsrc *syms_ss, struct symsrc *runtime_ss,
  symbol_filter_t filter, int kmodule)
 {
struct kmap *kmap = dso->kernel ? map__kmap(map) : NULL;
@@ -630,31 +631,27 @@ int dso__load_sym(struct dso *dso, struct map *map, 
struct symsrc *ss,
int err = -1;
uint32_t idx;
GElf_Ehdr ehdr;
-   GElf_Shdr shdr, opdshdr;
+   GElf_Shdr shdr;
Elf_Data *syms, *opddata = NULL;
GElf_Sym sym;
-   Elf_Scn *sec, *sec_strndx, *opdsec;
+   Elf_Scn *sec, *sec_strndx;
Elf *elf;
int nr = 0;
-   size_t opdidx = 0;
 
-   dso->symtab_type = ss->type;
+   dso->symtab_type = syms_ss->type;
 
-   if (!ss->symtab) {
-   ss->symtab  = ss->dynsym;
-   ss->symshdr = ss->dynshdr;
+   if (!syms_ss->symtab) {
+   syms_ss->symtab  = syms_ss->dynsym;
+   syms_ss->symshdr = syms_ss->dynshdr;
}
 
-   elf = ss->elf;
-   ehdr = ss->ehdr;
-   sec = ss->symtab;
-   shdr = ss->symshdr;
+   elf = syms_ss->elf;
+   ehdr = syms_ss->ehdr;
+   sec = syms_ss->symtab;
+   shdr = syms_ss->symshdr;
 
-   opdsec = ss->opdsec;
-   opdshdr = ss->opdshdr;
-   opdidx  = ss->opdidx;
-   if (opdsec)
-   opddata = elf_rawdata(opdsec, NULL);
+   if (runtime_ss->opdsec)
+   opddata = elf_rawdata(runtime_ss->opdsec, NULL);
 
syms = elf_getdata(sec, NULL);
if (syms == NULL)
@@ -679,13 +676,14 @@ int dso__load_sym(struct dso *dso, struct map *map, 
struct symsrc *ss,
nr_syms = shdr.sh_size / shdr.sh_entsize;
 
memset(&sym, 0, sizeof(sym));
-   dso->adjust_symbols = ss->adjust_symbols;
+   dso->adjust_symbols = runtime_ss->adjust_symbols;
elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
struct symbol *f;
const char *elf_name = elf_sym__name(&sym, symstrs);
char *demangled = NULL;
int is_label = elf_sym__is_label(&sym);
const char *section_name;
+   bool used_opd = false;
 
if (kmap && kmap->ref_reloc_sym && kmap->ref_reloc_sym->name &&
strcmp(elf_name, kmap->ref_reloc_sym->name) == 0)
@@ -704,14 +702,16 @@ int dso__load_sym(struct dso *dso, struct map *map, 
struct symsrc *ss,
continue;
}
 
-   if (opdsec && sym.st_shndx == opdidx) {
-   u32 offset = sym.st_value - opdshdr.sh_addr;
+   if (runtime_ss->opdsec && sym.st_shndx == runtime_ss->opdidx) {
+   u32 offset = sym.st_value - syms_ss->opdshdr.sh_addr;
u64 *opd = opddata->d_buf + offset;
sym.st_value = DSO__SWAP(dso, u64, *opd);
-   sym.st_shndx = elf_addr_to_index(elf, sym.st_value);
+   sym.st_shndx = elf_addr_to_index(runtime_ss->elf,
+   sym.st_value);
+   used_opd = true;
}
 
-   sec = elf_getscn(elf, sym.st_shndx);
+   sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
if (!sec)
goto out_elf_end;
 
@@ -777,7 +777,8 @@ int dso__load_sym(struct dso *dso, struct map *map, struct 
symsrc *ss,
goto new_symbol;
}
 
-   if (curr_dso->adjust_symbols && sym.st_value) {
+ 

[PATCH 13/16] perf symbol: switch dso__synthesize_plt_symbols() to use symsrc

2012-08-10 Thread Cody P Schafer
Previously dso__synthesize_plt_symbols() was reopening the elf file to
obtain dynsyms from it. Rather than reopen the file, use the already
opened reference within the symsrc to access it.

Setup for the later patch
"perf symbol: use both runtime and debug images"

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c | 25 +++--
 tools/perf/util/symbol-minimal.c |  3 ++-
 tools/perf/util/symbol.c |  8 +---
 tools/perf/util/symbol.h |  4 ++--
 4 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3a9c38a..5915947 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -166,7 +166,7 @@ static Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
  * And always look at the original dso, not at debuginfo packages, that
  * have the PLT data stripped out (shdr_rel_plt.sh_type == SHT_NOBITS).
  */
-int dso__synthesize_plt_symbols(struct dso *dso, char *name, struct map *map,
+int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss, struct map 
*map,
symbol_filter_t filter)
 {
uint32_t nr_rel_entries, idx;
@@ -181,21 +181,15 @@ int dso__synthesize_plt_symbols(struct dso *dso, char 
*name, struct map *map,
GElf_Ehdr ehdr;
char sympltname[1024];
Elf *elf;
-   int nr = 0, symidx, fd, err = 0;
+   int nr = 0, symidx, err = 0;
 
-   fd = open(name, O_RDONLY);
-   if (fd < 0)
-   goto out;
-
-   elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
-   if (elf == NULL)
-   goto out_close;
+   elf = ss->elf;
+   ehdr = ss->ehdr;
 
-   if (gelf_getehdr(elf, &ehdr) == NULL)
-   goto out_elf_end;
+   scn_dynsym = ss->dynsym;
+   shdr_dynsym = ss->dynshdr;
+   dynsym_idx = ss->dynsym_idx;
 
-   scn_dynsym = elf_section_by_name(elf, &ehdr, &shdr_dynsym,
-".dynsym", &dynsym_idx);
if (scn_dynsym == NULL)
goto out_elf_end;
 
@@ -291,13 +285,8 @@ int dso__synthesize_plt_symbols(struct dso *dso, char 
*name, struct map *map,
 
err = 0;
 out_elf_end:
-   elf_end(elf);
-out_close:
-   close(fd);
-
if (err == 0)
return nr;
-out:
pr_debug("%s: problems reading %s PLT info.\n",
 __func__, dso->long_name);
return 0;
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index f8b5764..2f1584b 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -266,7 +266,8 @@ void symsrc__destroy(struct symsrc *ss)
close(ss->fd);
 }
 
-int dso__synthesize_plt_symbols(struct dso *dso __used, char *name __used,
+int dso__synthesize_plt_symbols(struct dso *dso __used,
+   struct symsrc *ss __used,
struct map *map __used,
symbol_filter_t filter __used)
 {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 2b3495a..f8a3068 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1091,21 +1091,23 @@ restart:
 
ret = dso__load_sym(dso, map, &ss, filter, 0,
want_symtab);
-   symsrc__destroy(&ss);
 
/*
 * Some people seem to have debuginfo files _WITHOUT_ debug
 * info!?!?
 */
-   if (!ret)
+   if (!ret) {
+   symsrc__destroy(&ss);
continue;
+   }
 
if (ret > 0) {
int nr_plt;
 
-   nr_plt = dso__synthesize_plt_symbols(dso, name, map, filter);
+   nr_plt = dso__synthesize_plt_symbols(dso, &ss, map, filter);
if (nr_plt > 0)
ret += nr_plt;
+   symsrc__destroy(&ss);
break;
}
}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 5e55f98..95b3996 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -370,8 +370,8 @@ ssize_t dso__data_read_addr(struct dso *dso, struct map 
*map,
 int dso__test_data(void);
 int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *ss,
  symbol_filter_t filter, int kmodule, int want_symtab);
-int dso__synthesize_plt_symbols(struct dso *dso, char *name, struct map *map,
-   symbol_filter_t filter);
+int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss,
+   struct map *map, symbol_filter_t filter);
 
 void symbols__insert(struct rb_root *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root *symbols);
-- 
1.7.11.3


[PATCH 12/16] perf symbol: set symtab_type in dso__load_sym

2012-08-10 Thread Cody P Schafer
In certain cases, dso__load requires dso->symtab_type to be set prior to
calling it. With the introduction of symsrc*, the symtab_type is now
stored in a symsrc which is then passed to dso__load_sym().

Change dso__load_sym() to use the symtab_type from the symsrc (setting
dso->symtab_type as well).

Setup for later patch
"perf symbol: use both runtime and debug images"

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol-elf.c |  2 ++
 tools/perf/util/symbol.c | 13 +++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 6974b2a..3a9c38a 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -645,6 +645,8 @@ int dso__load_sym(struct dso *dso, struct map *map, struct 
symsrc *ss,
int nr = 0;
size_t opdidx = 0;
 
+   dso->symtab_type = ss->type;
+
elf = ss->elf;
ehdr = ss->ehdr;
sec = ss->symtab;
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index afec3f0..2b3495a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1079,14 +1079,14 @@ int dso__load(struct dso *dso, struct map *map, 
symbol_filter_t filter)
 restart:
for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
 
-   dso->symtab_type = binary_type_symtab[i];
+   enum dso_binary_type symtab_type = binary_type_symtab[i];
 
-   if (dso__binary_type_file(dso, dso->symtab_type,
+   if (dso__binary_type_file(dso, symtab_type,
  root_dir, name, PATH_MAX))
continue;
 
/* Name is now the name of the next image to try */
-   if (symsrc__init(&ss, dso, name, dso->symtab_type) < 0)
+   if (symsrc__init(&ss, dso, name, symtab_type) < 0)
continue;
 
ret = dso__load_sym(dso, map, &ss, filter, 0,
@@ -1361,16 +1361,17 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
int err = -1;
struct symsrc ss;
char symfs_vmlinux[PATH_MAX];
+   enum dso_binary_type symtab_type;
 
snprintf(symfs_vmlinux, sizeof(symfs_vmlinux), "%s%s",
 symbol_conf.symfs, vmlinux);
 
if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
-   dso->symtab_type = DSO_BINARY_TYPE__GUEST_VMLINUX;
+   symtab_type = DSO_BINARY_TYPE__GUEST_VMLINUX;
else
-   dso->symtab_type = DSO_BINARY_TYPE__VMLINUX;
+   symtab_type = DSO_BINARY_TYPE__VMLINUX;
 
-   if (symsrc__init(&ss, dso, symfs_vmlinux, dso->symtab_type))
+   if (symsrc__init(&ss, dso, symfs_vmlinux, symtab_type))
return -1;
 
err = dso__load_sym(dso, map, &ss, filter, 0, 0);
-- 
1.7.11.3



[PATCH 01/16] perf symbol: correct comment wrt kallsyms loading

2012-08-10 Thread Cody P Schafer
In kallsyms__parse() when calling process_symbol() (a callback argument
to kallsyms__parse()), we pass start as both start & end (ie:
start=start, end=start).

In map__process_kallsym_symbol(), the length is calculated as 'end - start + 1',
making the length 1, not 0.

Essentially, start & end define an inclusive range.
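
The arithmetic can be checked with a tiny sketch; inclusive_len() is a hypothetical helper used only for illustration, not a perf function.

```c
#include <assert.h>
#include <stdint.h>

/* Length of an inclusive address range [start, end]. */
uint64_t inclusive_len(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}
```

Passing the same value as start and end therefore describes a one-byte symbol, never a zero-length one.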

Signed-off-by: Cody P Schafer 
---
 tools/perf/util/symbol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index f02de8a..891f83c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -605,7 +605,7 @@ int kallsyms__parse(const char *filename, void *arg,
 
/*
 * module symbols are not sorted so we add all
-* symbols with zero length and rely on
+* symbols, setting length to 1, and rely on
 * symbols__fixup_end() to fix it up.
 */
err = process_symbol(arg, symbol_name,
-- 
1.7.11.3



[PATCH v2 00/16] perf: various symbol resolution fixes, including .opd section use.

2012-08-10 Thread Cody P Schafer
1-4,6,7 are small cleanups.

5 fixes a potential segfault.

8 fixes a use after free for dso->long_name

9 avoids a segfault in elfutils when a truncated elf is loaded.

10 properly tracks that a dso had symbols loaded from a vmlinux image

11-16 fix handling of the '.opd' section in the presence of debuginfo. They
  should also fix plt symbol synthesis (haven't seen the plt issues in
  practice).

--

Changes from v1:
 - rebased on top of
   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git  perf/core
 - In #2, make the symbols have size 0 instead of 1.

--

 tools/perf/util/event.c  |   2 +-
 tools/perf/util/map.c|   8 --
 tools/perf/util/map.h|   1 -
 tools/perf/util/symbol-elf.c | 182 ++-
 tools/perf/util/symbol-minimal.c |  48 +--
 tools/perf/util/symbol.c | 136 +
 tools/perf/util/symbol.h |  49 +--
 7 files changed, 290 insertions(+), 136 deletions(-)



Re: [RFC PATCH 3/5] drm/i915: register LVDS connector even if we can't get a panel mode

2012-08-10 Thread Seth Forshee
On Mon, Aug 06, 2012 at 07:44:16AM +1000, Dave Airlie wrote:
> >> The "correct" approach is clearly to just have the drm core change the
> >> i2c mux before requesting edid, but that's made difficult because of the
> >> absence of ordering guarantees in initialisation. I don't like quirking
> >> this, since we're then back to the situation of potentially having to
> >> add every new piece of related hardware to the quirk list.
> >
> > The "correct" approach of switching the mux before we fetch the edid is
> > actualy the one I fear will result in fragile code: Only run on few
> > machines, and as you say with tons of funky interactions with the init
> > sequence ordering. And I guess people will bitch about the flickering
> > this will cause ;-)
> >
> > As long as it's only apple shipping multi-gpu machines with
> > broken/non-existing vbt, I'll happily stomach the quirk list entries.
> > They're bad, but imo the lesser evil.
> 
> Well in theory you can switch the ddc lines without switching the other lines,
> so we could do a mutex protected mux switch around edid retrieval,
> 
> Of course someone would have to code it up first then we could see how
> ugly it would be.

I coded it up, and it's not really too bad. I've put a dump of my local
changes below. But there are a couple of problems.

First, I don't have a solution for the ordering of initialization. It
just happens to work out for me right now.

Even so, the code only works if I delay loading i915. apple-gmux is
definitely initializing first so the i2c mux should be getting switched,
but the transactions are failing.

[   19.445658] [drm:gmbus_xfer], GMBUS [i915 gmbus panel] timed out after NAK 
[   19.445672] [drm:gmbus_xfer], GMBUS [i915 gmbus panel] NAK for addr: 0050 r(1)

But if I prevent i915 from being auto-loaded and load later on it works
fine. I haven't been able to figure out what's going on. Any ideas?
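
Stripped of the DRM specifics, the switch/read/restore sequence under a lock looks roughly like the userspace sketch below. It uses a pthread mutex and mock state in place of the kernel mutex and the gmux hardware; ddc_owner, mock_read_edid(), read_edid_as() and current_ddc_owner() are all hypothetical names for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical mux state: which GPU currently owns the DDC lines. */
static int ddc_owner = 0;
static pthread_mutex_t ddc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for an EDID read: reports which GPU could see the panel. */
static int mock_read_edid(void)
{
	return ddc_owner;
}

/*
 * Switch the DDC lines to `gpu`, read, then hand the lines back to
 * whoever owned them before. The lock keeps two readers from
 * interleaving their switch/read/restore sequences.
 */
int read_edid_as(int gpu)
{
	int previous, seen;

	assert(gpu >= 0);

	pthread_mutex_lock(&ddc_lock);
	previous = ddc_owner;
	ddc_owner = gpu;
	seen = mock_read_edid();
	ddc_owner = previous;
	pthread_mutex_unlock(&ddc_lock);

	return seen;
}

/* Report the current owner, for checking that it was restored. */
int current_ddc_owner(void)
{
	int owner;

	pthread_mutex_lock(&ddc_lock);
	owner = ddc_owner;
	pthread_mutex_unlock(&ddc_lock);
	return owner;
}
```

The sketch says nothing about initialization ordering, which is the open problem described above.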

Thanks,
Seth


diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index a8743c3..3f18e8a 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "drmP.h"
 #include "drm_edid.h"
 #include "drm_edid_modes.h"
@@ -82,6 +83,8 @@ struct detailed_mode_closure {
 #define LEVEL_GTF2 2
 #define LEVEL_CVT  3
 
+static DEFINE_MUTEX(drm_edid_mutex);
+
 static struct edid_quirk {
char vendor[4];
int product_id;
@@ -395,12 +398,25 @@ struct edid *drm_get_edid(struct drm_connector *connector,
  struct i2c_adapter *adapter)
 {
struct edid *edid = NULL;
+   struct pci_dev *pdev = connector->dev->pdev;
+   struct pci_dev *active_pdev = NULL;
+
+   mutex_lock(&drm_edid_mutex);
+
+   if (pdev) {
+   active_pdev = vga_switcheroo_get_active_client();
+   vga_switcheroo_switch_ddc(pdev);
+   }
 
if (drm_probe_ddc(adapter))
edid = (struct edid *)drm_do_get_edid(connector, adapter);
 
+   if (active_pdev)
+   vga_switcheroo_switch_ddc(active_pdev);
+
connector->display_info.raw_edid = (char *)edid;
 
+   mutex_unlock(&drm_edid_mutex);
return edid;
 
 }
diff --git a/drivers/gpu/vga/vga_switcheroo.c b/drivers/gpu/vga/vga_switcheroo.c
index e25cf31..e53f67d 100644
--- a/drivers/gpu/vga/vga_switcheroo.c
+++ b/drivers/gpu/vga/vga_switcheroo.c
@@ -205,6 +205,20 @@ find_active_client(struct list_head *head)
return NULL;
 }
 
+struct pci_dev *vga_switcheroo_get_active_client(void)
+{
+   struct vga_switcheroo_client *client;
+   struct pci_dev *pdev = NULL;
+
+   mutex_lock(&vgasr_mutex);
+   client = find_active_client(&vgasr_priv.clients);
+   if (client)
+   pdev = client->pdev;
+   mutex_unlock(&vgasr_mutex);
+   return pdev;
+}
+EXPORT_SYMBOL(vga_switcheroo_get_active_client);
+
 int vga_switcheroo_get_client_state(struct pci_dev *pdev)
 {
struct vga_switcheroo_client *client;
@@ -252,6 +266,29 @@ void vga_switcheroo_client_fb_set(struct pci_dev *pdev,
 }
 EXPORT_SYMBOL(vga_switcheroo_client_fb_set);
 
+int vga_switcheroo_switch_ddc(struct pci_dev *pdev)
+{
+   int ret = 0;
+   int id;
+
+   mutex_lock(&vgasr_mutex);
+
+   if (!vgasr_priv.handler) {
+   ret = -ENODEV;
+   goto out;
+   }
+
+   if (vgasr_priv.handler->switch_ddc) {
+   id = vgasr_priv.handler->get_client_id(pdev);
+   ret = vgasr_priv.handler->switch_ddc(id);
+   }
+
+out:
+   mutex_unlock(&vgasr_mutex);
+   return ret;
+}
+EXPORT_SYMBOL(vga_switcheroo_switch_ddc);
+
 static int vga_switcheroo_show(struct seq_file *m, void *v)
 {
struct vga_switcheroo_client *client;
@@ -342,9 +379,15 @@ static int vga_switchto_stage2(struct 
vga_switcheroo_client *new_client)
fb_notifier_call_chain(FB_EVENT_REMAP_ALL_CONSOLE, );
}
 
+   if (vgasr_priv.handler->switch_ddc) {
+   ret = 

Re: [PATCH 2/2] ARM: local timers: add timer support using IO mapped register

2012-08-10 Thread Rob Herring
On 08/10/2012 04:58 PM, Rohit Vaswani wrote:
> The current arch_timer only supports access through the CP15 interface.
> Add support for ARM processors that only support the IO mapped register
> interface.
> 
> Signed-off-by: Rohit Vaswani 
> ---
>  .../devicetree/bindings/arm/arch_timer.txt |7 +
>  arch/arm/kernel/arch_timer.c   |  259 
> 
>  2 files changed, 223 insertions(+), 43 deletions(-)

The original file is 360 lines. It doesn't really seem like there's a
lot of overlap and I wonder if it is worth the extra overhead.

> 
> diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt 
> b/Documentation/devicetree/bindings/arm/arch_timer.txt
> index 52478c8..1c71799 100644
> --- a/Documentation/devicetree/bindings/arm/arch_timer.txt
> +++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
> @@ -14,6 +14,13 @@ The timer is attached to a GIC to deliver its 
> per-processor interrupts.
>  
>  - clock-frequency : The frequency of the main counter, in Hz. Optional.
>  
> +- irq-is-not-percpu: Specify is the timer irq is *NOT* a percpu (PPI) 
> interrupt
> +  In the default case i.e without this property, the timer irq is treated as 
> a
> +  PPI interrupt. Optional.

The first field in the gic interrupts binding already defines this.

> +
> +- If the node address and reg is specified, the arch_timer will try to use 
> the memory
> +  mapped timer. Optional.

This timer is fundamentally different h/w. You need a new compatible string.

> +
>  Example:
>  
>   timer {
> diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c
> index 1d0d9df..09604b7 100644
> --- a/arch/arm/kernel/arch_timer.c
> +++ b/arch/arm/kernel/arch_timer.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -29,8 +30,17 @@
>  static unsigned long arch_timer_rate;
>  static int arch_timer_ppi;
>  static int arch_timer_ppi2;
> +static int is_irq_percpu;
>  
>  static struct clock_event_device __percpu **arch_timer_evt;
> +static void __iomem *timer_base;
> +
> +struct arch_timer_operations {
> + void (*reg_write)(int, u32);
> + u32 (*reg_read)(int);
> + cycle_t (*get_cntpct)(void);
> + cycle_t (*get_cntvct)(void);
> +};
>  
>  /*
>   * Architected system timer support.
> @@ -44,7 +54,29 @@ static struct clock_event_device __percpu **arch_timer_evt;
>  #define ARCH_TIMER_REG_FREQ  1
>  #define ARCH_TIMER_REG_TVAL  2
>  
> -static void arch_timer_reg_write(int reg, u32 val)
> +/* Iomapped Register Offsets */
> +#define ARCH_TIMER_CNTP_LOW_REG  0x000
> +#define ARCH_TIMER_CNTP_HIGH_REG 0x004
> +#define ARCH_TIMER_CNTV_LOW_REG  0x008
> +#define ARCH_TIMER_CNTV_HIGH_REG 0x00C
> +#define ARCH_TIMER_CTRL_REG  0x02C
> +#define ARCH_TIMER_FREQ_REG  0x010
> +#define ARCH_TIMER_CNTP_TVAL_REG 0x028
> +#define ARCH_TIMER_CNTV_TVAL_REG 0x038
> +
> +static void timer_reg_write_mem(int reg, u32 val)
> +{
> + switch (reg) {
> + case ARCH_TIMER_REG_CTRL:
> + __raw_writel(val, timer_base + ARCH_TIMER_CTRL_REG);
> + break;
> + case ARCH_TIMER_REG_TVAL:
> + __raw_writel(val, timer_base + ARCH_TIMER_CNTP_TVAL_REG);
> + break;

This whole function seems a bit pointless as it only adds timer_base.

Rob

> + }
> +}
> +
> +static void timer_reg_write_cp15(int reg, u32 val)
>  {
>   switch (reg) {
>   case ARCH_TIMER_REG_CTRL:
> @@ -58,7 +90,28 @@ static void arch_timer_reg_write(int reg, u32 val)
>   isb();
>  }
>  
> -static u32 arch_timer_reg_read(int reg)
> +static u32 timer_reg_read_mem(int reg)
> +{
> + u32 val;
> +
> + switch (reg) {
> + case ARCH_TIMER_REG_CTRL:
> + val = __raw_readl(timer_base + ARCH_TIMER_CTRL_REG);
> + break;
> + case ARCH_TIMER_REG_FREQ:
> + val = __raw_readl(timer_base + ARCH_TIMER_FREQ_REG);
> + break;
> + case ARCH_TIMER_REG_TVAL:
> + val = __raw_readl(timer_base + ARCH_TIMER_CNTP_TVAL_REG);
> + break;
> + default:
> + BUG();
> + }
> +
> + return val;
> +}
> +
> +static u32 timer_reg_read_cp15(int reg)
>  {
>   u32 val;
>  
> @@ -79,6 +132,103 @@ static u32 arch_timer_reg_read(int reg)
>   return val;
>  }
>  
> +static cycle_t arch_counter_get_cntpct_mem(void)
> +{
> + u32 cvall, cvalh, thigh;
> +
> + do {
> + cvalh = __raw_readl(timer_base + ARCH_TIMER_CNTP_HIGH_REG);
> + cvall = __raw_readl(timer_base + ARCH_TIMER_CNTP_LOW_REG);
> + thigh = __raw_readl(timer_base + ARCH_TIMER_CNTP_HIGH_REG);
> + } while (cvalh != thigh);
> +
> + return ((cycle_t) cvalh << 32) | cvall;
> +}
> +
> +static cycle_t arch_counter_get_cntpct_cp15(void)
> +{
> + u32 cvall, cvalh;
> +
> + asm volatile("mrrc p15, 0, %0, %1, c14" : "=r" (cvall), "=r" (cvalh));
> +
> + return 

Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
On Fri, Aug 10, 2012 at 05:41:53PM -0400, Naoya Horiguchi wrote:
...
> +/*
>   * Dirty cache page page
>   * Issues: when the error hit a hole page the error is not properly
>   * propagated.
>   */
>  static int me_pagecache_dirty(struct page *p, unsigned long pfn)
>  {
> - /*
> -  * The original memory error handling on dirty pagecache has
> -  * a bug that user processes who use corrupted pages via read()
> -  * or write() can't be aware of the memory error and result
> -  * in throwing out dirty data silently.
> -  *
> -  * Until we solve the problem, let's close the path of memory
> -  * error handling for dirty pagecache. We just leave errors
> -  * for the 2nd MCE to trigger panics.
> -  */
> - return IGNORED;
> + struct address_space *mapping = page_mapping(p);
> +
> + SetPageError(p);
> + if (mapping) {
> + struct hwp_dirty *hwp;
> + struct inode *inode = mapping->host;
> +
> + /*
> +  * Memory error is reported to userspace by AS_HWPOISON flags
> +  * in mapping->flags. The mechanism is similar to that of
> +  * AS_EIO, but we have separete flags because there'are two
> +  * differences between them:
> +  *  1. Expected userspace handling. When user processes get
> +  * -EIO, they can retry writeback hoping the error in IO
> +  * devices is temporary, switch to write to other devices,
> +  * or do some other application-specific handling.
> +  * For -EHWPOISON, we can clear the error by overwriting
> +  * the corrupted page.
> +  *  2. When to clear. For -EIO, we can think that we recover
> +  * from the error when writeback succeeds. For -EHWPOISON
> +  * OTOH, we can see that things are back to normal when
> +  * corrupted data are overwritten from user buffer.
> +  */
> + hwp = kmalloc(sizeof(struct hwp_dirty), GFP_ATOMIC);
> + hwp->page = p;
> + hwp->fpage = NULL;
> + hwp->mapping = mapping;
> + hwp->index = page_index(p);

> + hwp->ino = inode->i_ino;
> + hwp->dev = inode->i_sb->s_dev;

Sorry, these two members are not in struct hwp_dirty in the current version.
Please ignore them.

Thanks,
Naoya

> + add_hwp_dirty(hwp);
> +
> + pr_err("MCE %#lx: Corrupted dirty pagecache, dev %u:%u, 
> inode:%lu, index:%lu\n",
> +pfn, MAJOR(inode->i_sb->s_dev),
> +MINOR(inode->i_sb->s_dev), inode->i_ino, page_index(p));
> + mapping_set_error(mapping, -EHWPOISON);
> + }
> +
> + return me_pagecache_clean(p, pfn);
>  }
>  
>  /*


[PATCH 2/2] ARM: local timers: add timer support using IO mapped register

2012-08-10 Thread Rohit Vaswani
The current arch_timer only supports access through the CP15 interface.
Add support for ARM processors that only support the IO mapped register
interface.
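
As an illustration of the memory mapped path: the counter is 64 bits wide
but is read as two 32-bit registers, so the accessors in this patch read
the high word, the low word, then the high word again, and retry if the
high word changed in between. A minimal user-space sketch of that loop
follows. struct fake_counter and its helpers are hypothetical stand-ins
for the registers (the counter is assumed to advance by "step" after every
32-bit read); none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit counter exposed as two 32-bit halves. */
struct fake_counter {
	uint64_t value;
	uint64_t step;	/* how far the counter advances after each 32-bit read */
};

static uint32_t fake_read_low(struct fake_counter *c)
{
	uint32_t v = (uint32_t)c->value;

	c->value += c->step;
	return v;
}

static uint32_t fake_read_high(struct fake_counter *c)
{
	uint32_t v = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return v;
}

/* Read high, low, high again; retry if the high word changed in between. */
static uint64_t fake_read_counter64(struct fake_counter *c)
{
	uint32_t hi, lo, hi2;

	assert(c != NULL);

	do {
		hi = fake_read_high(c);
		lo = fake_read_low(c);
		hi2 = fake_read_high(c);
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

With the counter starting at 0xFFFFFFFF and advancing between reads, the
first pass sees the high word change from 0 to 1 and retries, so the
result never combines an old high word with a new low word.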

Signed-off-by: Rohit Vaswani 
---
 .../devicetree/bindings/arm/arch_timer.txt |7 +
 arch/arm/kernel/arch_timer.c   |  259 
 2 files changed, 223 insertions(+), 43 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt
index 52478c8..1c71799 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -14,6 +14,13 @@ The timer is attached to a GIC to deliver its per-processor interrupts.
 
 - clock-frequency : The frequency of the main counter, in Hz. Optional.
 
+- irq-is-not-percpu: Specify if the timer irq is *NOT* a percpu (PPI) interrupt.
+  In the default case, i.e. without this property, the timer irq is treated as a
+  PPI interrupt. Optional.
+
+- If the node address and reg are specified, the arch_timer will try to use the
+  memory mapped timer. Optional.
+
 Example:
 
timer {
diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c
index 1d0d9df..09604b7 100644
--- a/arch/arm/kernel/arch_timer.c
+++ b/arch/arm/kernel/arch_timer.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -29,8 +30,17 @@
 static unsigned long arch_timer_rate;
 static int arch_timer_ppi;
 static int arch_timer_ppi2;
+static int is_irq_percpu;
 
 static struct clock_event_device __percpu **arch_timer_evt;
+static void __iomem *timer_base;
+
+struct arch_timer_operations {
+   void (*reg_write)(int, u32);
+   u32 (*reg_read)(int);
+   cycle_t (*get_cntpct)(void);
+   cycle_t (*get_cntvct)(void);
+};
 
 /*
  * Architected system timer support.
@@ -44,7 +54,29 @@ static struct clock_event_device __percpu **arch_timer_evt;
 #define ARCH_TIMER_REG_FREQ1
 #define ARCH_TIMER_REG_TVAL2
 
-static void arch_timer_reg_write(int reg, u32 val)
+/* Iomapped Register Offsets */
+#define ARCH_TIMER_CNTP_LOW_REG0x000
+#define ARCH_TIMER_CNTP_HIGH_REG   0x004
+#define ARCH_TIMER_CNTV_LOW_REG0x008
+#define ARCH_TIMER_CNTV_HIGH_REG   0x00C
+#define ARCH_TIMER_CTRL_REG0x02C
+#define ARCH_TIMER_FREQ_REG0x010
+#define ARCH_TIMER_CNTP_TVAL_REG   0x028
+#define ARCH_TIMER_CNTV_TVAL_REG   0x038
+
+static void timer_reg_write_mem(int reg, u32 val)
+{
+   switch (reg) {
+   case ARCH_TIMER_REG_CTRL:
+   __raw_writel(val, timer_base + ARCH_TIMER_CTRL_REG);
+   break;
+   case ARCH_TIMER_REG_TVAL:
+   __raw_writel(val, timer_base + ARCH_TIMER_CNTP_TVAL_REG);
+   break;
+   }
+}
+
+static void timer_reg_write_cp15(int reg, u32 val)
 {
switch (reg) {
case ARCH_TIMER_REG_CTRL:
@@ -58,7 +90,28 @@ static void arch_timer_reg_write(int reg, u32 val)
isb();
 }
 
-static u32 arch_timer_reg_read(int reg)
+static u32 timer_reg_read_mem(int reg)
+{
+   u32 val;
+
+   switch (reg) {
+   case ARCH_TIMER_REG_CTRL:
+   val = __raw_readl(timer_base + ARCH_TIMER_CTRL_REG);
+   break;
+   case ARCH_TIMER_REG_FREQ:
+   val = __raw_readl(timer_base + ARCH_TIMER_FREQ_REG);
+   break;
+   case ARCH_TIMER_REG_TVAL:
+   val = __raw_readl(timer_base + ARCH_TIMER_CNTP_TVAL_REG);
+   break;
+   default:
+   BUG();
+   }
+
+   return val;
+}
+
+static u32 timer_reg_read_cp15(int reg)
 {
u32 val;
 
@@ -79,6 +132,103 @@ static u32 arch_timer_reg_read(int reg)
return val;
 }
 
+static cycle_t arch_counter_get_cntpct_mem(void)
+{
+   u32 cvall, cvalh, thigh;
+
+   do {
+   cvalh = __raw_readl(timer_base + ARCH_TIMER_CNTP_HIGH_REG);
+   cvall = __raw_readl(timer_base + ARCH_TIMER_CNTP_LOW_REG);
+   thigh = __raw_readl(timer_base + ARCH_TIMER_CNTP_HIGH_REG);
+   } while (cvalh != thigh);
+
+   return ((cycle_t) cvalh << 32) | cvall;
+}
+
+static cycle_t arch_counter_get_cntpct_cp15(void)
+{
+   u32 cvall, cvalh;
+
+   asm volatile("mrrc p15, 0, %0, %1, c14" : "=r" (cvall), "=r" (cvalh));
+
+   return ((cycle_t) cvalh << 32) | cvall;
+}
+
+static cycle_t arch_counter_get_cntvct_mem(void)
+{
+   u32 cvall, cvalh, thigh;
+
+   do {
+   cvalh = __raw_readl(timer_base + ARCH_TIMER_CNTV_HIGH_REG);
+   cvall = __raw_readl(timer_base + ARCH_TIMER_CNTV_LOW_REG);
+   thigh = __raw_readl(timer_base + ARCH_TIMER_CNTV_HIGH_REG);
+   } while (cvalh != thigh);
+
+   return ((cycle_t) cvalh << 32) | cvall;
+}
+
+static cycle_t arch_counter_get_cntvct_cp15(void)
+{
+   u32 cvall, cvalh;
+
+   asm volatile("mrrc p15, 1, %0, %1, c14" : "=r" (cvall), "=r" 

[PATCH 1/2] ARM: local timers: Unmask interrupt before new TVAL is set

2012-08-10 Thread Rohit Vaswani
A level-triggered interrupt is deasserted by writing a new TVAL only
if the interrupt is unmasked. Make sure that the interrupt is unmasked
in the CTL register before TVAL is written.
If this order is not followed, you may not receive any timer
interrupts on some hardware.
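
The ordering can be pictured with a toy model. The sketch below assumes
only what the description above says (a TVAL write drops the pending level
interrupt only while the interrupt is unmasked) and additionally assumes
that the timer starts out masked with its interrupt line asserted.
struct toy_timer and the toy_* helpers are made up for illustration and do
not model any specific hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CTRL_ENABLE  (1u << 0)
#define TOY_CTRL_IT_MASK (1u << 1)

/* Toy model of the timer state; not a real register layout. */
struct toy_timer {
	uint32_t ctrl;
	uint32_t tval;
	bool irq_asserted;	/* level of the interrupt line */
};

static void toy_write_ctrl(struct toy_timer *t, uint32_t ctrl)
{
	t->ctrl = ctrl;
}

static void toy_write_tval(struct toy_timer *t, uint32_t tval)
{
	t->tval = tval;
	/* Assumed rule: the line is deasserted only while unmasked. */
	if (!(t->ctrl & TOY_CTRL_IT_MASK))
		t->irq_asserted = false;
}

/* Order before the patch: TVAL first, then CTRL. Returns the line level. */
static bool toy_set_next_event_old(struct toy_timer *t, uint32_t evt)
{
	uint32_t ctrl;

	assert(t != NULL);
	ctrl = (t->ctrl | TOY_CTRL_ENABLE) & ~TOY_CTRL_IT_MASK;
	toy_write_tval(t, evt);
	toy_write_ctrl(t, ctrl);
	return t->irq_asserted;
}

/* Order after the patch: CTRL (unmask) first, then TVAL. */
static bool toy_set_next_event_new(struct toy_timer *t, uint32_t evt)
{
	uint32_t ctrl;

	assert(t != NULL);
	ctrl = (t->ctrl | TOY_CTRL_ENABLE) & ~TOY_CTRL_IT_MASK;
	toy_write_ctrl(t, ctrl);
	toy_write_tval(t, evt);
	return t->irq_asserted;
}
```

In this model the old order leaves the stale interrupt line asserted after
reprogramming, while the new order clears it.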

Signed-off-by: Rohit Vaswani 
---
 arch/arm/kernel/arch_timer.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c
index dd58035..1d0d9df 100644
--- a/arch/arm/kernel/arch_timer.c
+++ b/arch/arm/kernel/arch_timer.c
@@ -126,8 +126,8 @@ static int arch_timer_set_next_event(unsigned long evt,
ctrl |= ARCH_TIMER_CTRL_ENABLE;
ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
 
-   arch_timer_reg_write(ARCH_TIMER_REG_TVAL, evt);
arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
+   arch_timer_reg_write(ARCH_TIMER_REG_TVAL, evt);
 
return 0;
 }
-- 
Sent by an employee of the Qualcomm Innovation Center,Inc
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.



Failure while make modules_install if kmod was compiled with --with-rootprefix set

2012-08-10 Thread Arokux B.
Dear Mr. Marek, dear all,

I have found a hidden failure while building the kernel. If
--with-rootprefix is set for kmod, then depmod will look for modules
installed at the location $ROOTPREFIX/lib/modules/. The
kernel build system does not know anything about $ROOTPREFIX, so the
test at scripts/depmod.sh:19 that decides whether the hack for an older
version of depmod is needed creates the wrong directory: mkdir -p
"$tmp_dir/lib/modules/$KERNELRELEASE". That is why "$DEPMOD" -b
"$tmp_dir" $KERNELRELEASE will always fail, the kernel build system
will think that the hack is always needed, and depmod_hack_needed will
always be true. The symlink created after that is also wrong, since it
does not contain $ROOTPREFIX either, which depmod will prepend. That is
why depmod will fail.

To cure the problem, an additional variable $MOD_ROOT_PREFIX can be
introduced and used to parametrize the paths in scripts/depmod.sh.
This variable should be set to the same value that was passed to
--with-rootprefix when kmod was compiled. Example: if --with-rootprefix
is set to /usr and the modules should be installed at the location
/home/john, then the following make call should be issued:
make INSTALL_MOD_PATH=/home/john MOD_ROOT_PREFIX=/usr. After that the
modules will be installed at /home/john/usr. However, the variable would
also have to be added to the other places where the actual installation
takes place, so I do not think this solution is optimal. Nevertheless,
please find the patch for depmod.sh at the end of this e-mail.

A better solution would probably be a new option for depmod that
allows $ROOTPREFIX to be overridden. This option could then be used
in depmod.sh to override $ROOTPREFIX with an empty string.

I was unsure which solution is better, if either, hence such
a lengthy e-mail...

With kind regards,

Arokux


diff --git a/scripts/depmod.sh b/scripts/depmod.sh
index 2ae4817..87a6e42 100755
--- a/scripts/depmod.sh
+++ b/scripts/depmod.sh
@@ -16,16 +16,18 @@ fi
 # numbers, so we cheat with a symlink here
 depmod_hack_needed=true
 tmp_dir=$(mktemp -d ${TMPDIR:-/tmp}/depmod.XX)
-mkdir -p "$tmp_dir/lib/modules/$KERNELRELEASE"
+mkdir -p "$tmp_dir/$MOD_ROOT_PREFIX/lib/modules/$KERNELRELEASE"
+"$DEPMOD" -b "$tmp_dir" $KERNELRELEASE
+echo hello
 if "$DEPMOD" -b "$tmp_dir" $KERNELRELEASE 2>/dev/null; then
-   if test -e "$tmp_dir/lib/modules/$KERNELRELEASE/modules.dep" -o \
-   -e "$tmp_dir/lib/modules/$KERNELRELEASE/modules.dep.bin"; then
+   if test -e "$tmp_dir/$MOD_ROOT_PREFIX/lib/modules/$KERNELRELEASE/modules.dep" -o \
+   -e "$tmp_dir/$MOD_ROOT_PREFIX/lib/modules/$KERNELRELEASE/modules.dep.bin"; then
depmod_hack_needed=false
fi
 fi
 rm -rf "$tmp_dir"
 if $depmod_hack_needed; then
-   symlink="$INSTALL_MOD_PATH/lib/modules/99.98.$KERNELRELEASE"
+   symlink="$INSTALL_MOD_PATH/$MOD_ROOT_PREFIX/lib/modules/99.98.$KERNELRELEASE"
ln -s "$KERNELRELEASE" "$symlink"
KERNELRELEASE=99.98.$KERNELRELEASE
 fi


Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot

2012-08-10 Thread Justin Piszcz
On Fri, Aug 10, 2012 at 1:53 PM, Jesper Juhl  wrote:
> On Fri, 10 Aug 2012, Justin Piszcz wrote:
>
>> Hello,
>>
>> Motherboard: Supermicro X8DTH-6F
>> Distro: Debian Testing x86_64
>>
>> From 3.4 -> 3.5.1 on x86_64 make oldconfig and a few minor changes and the
>> machine attempts to boot but hangs at the filesystem mounting part of the
>> boot process.

Hi,

Found the root cause: the 3.5.1 kernel cannot mount my ext4 filesystem (60TB).

The 3.4 kernel works fine.

This is confirmed by commenting out the filesystem in /etc/fstab with
3.5.1, after which all is OK.

When I run mount for that filesystem, it hangs. I ran alt+sysrq+t to
get additional output and have pasted it below for the 3.5.1
kernel:

[  160.373406] mount   R  running task0  4361   4355 0x
[  160.373407]  8806266bdb68 0086 8806266bdaa8
8806266bdfd8
[  160.373410]  8806266bdfd8 4000 8806270b0600
880626c73a10
[  160.373413]  00011240 880c260177c0 880c260177c0

[  160.373415] Call Trace:
[  160.373416]  [] ? __schedule+0x299/0x770
[  160.373418]  [] __cond_resched+0x25/0x40
[  160.373420]  [] _cond_resched+0x2a/0x40
[  160.373421]  [] ext4_calculate_overhead+0x239/0x3e0
[  160.373425]  [] ext4_fill_super+0x1aa9/0x2930
[  160.373427]  [] mount_bdev+0x19f/0x1e0
[  160.373429]  [] ? ext4_calculate_overhead+0x3e0/0x3e0
[  160.373431]  [] ext4_mount+0x10/0x20
[  160.373433]  [] mount_fs+0x1b/0xd0
[  160.373434]  [] vfs_kern_mount+0x6f/0x110
[  160.373437]  [] do_kern_mount+0x4f/0x100
[  160.373439]  [] do_mount+0x2fe/0x8a0
[  160.373440]  [] ? strndup_user+0x53/0x70
[  160.373442]  [] sys_mount+0x90/0xe0
[  160.373443]  [] system_call_fastpath+0x1a/0x1f
[  160.373446] jbd2/sda1-8 S 880c2675f800 0  4362  2 0x
[  160.373448]  880623ca9e50 0046 880626c73a10
880623ca9fd8
[  160.373450]  880623ca9fd8 4000 8806271b9850
880626d08250
[  160.373453]  880623ca9da0 8806266bdbe0 880c2675f8a0
880c2675f888
[  160.373455] Call Trace:
[  160.373456]  [] ? default_wake_function+0xd/0x10
[  160.373458]  [] ? autoremove_wake_function+0x11/0x40
[  160.373460]  [] ? __wake_up_common+0x55/0x90
[  160.373462]  [] schedule+0x24/0x70
[  160.373463]  [] kjournald2+0x1ce/0x1e0
[  160.373465]  [] ? abort_exclusive_wait+0xb0/0xb0
[  160.373467]  [] ? commit_timeout+0x10/0x10
[  160.373469]  [] kthread+0x8e/0xa0
[  160.373471]  [] kernel_thread_helper+0x4/0x10
[  160.373472]  [] ? kthread_flush_work_fn+0x10/0x10
[  160.373474]  [] ? gs_change+0xb/0xb

Justin.


Re: [PATCH 0/4] vfs: fix file creation mode bugs

2012-08-10 Thread Richard W.M. Jones
On Tue, Aug 07, 2012 at 02:45:45PM +0200, Miklos Szeredi wrote:
> Al,
> 
> Please consider the following patches.
> 
> The first one fixes an old bug (stable CC-d).  The others are fixes for the
> atomic-open series.
> 
> Thanks,
> Miklos
> 
> 
> Miklos Szeredi (4):
>   vfs: canonicalize create mode in build_open_flags()
>   vfs: atomic_open(): fix create mode usage
>   vfs: pass right create mode to may_o_create()
>   fuse: check create mode in atomic open
> 
> ---
>  fs/fuse/dir.c |3 +++
>  fs/namei.c|4 ++--
>  fs/open.c |7 ---
>  3 files changed, 9 insertions(+), 5 deletions(-)

I added these four patches to the Fedora Rawhide kernel and they
fix the problems with ntfs-3g and my FUSE module.

Tested-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw


[PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Current error reporting of memory errors on dirty pagecache has a silent
data loss problem because AS_EIO in struct address_space is cleared
once checked.
A simple solution is to make AS_EIO sticky (as Wu Fengguang proposed in
https://lkml.org/lkml/2009/6/11/294), but this patch does more to make
dirty pagecache errors recoverable under some conditions. Consider the
case where a copy of the corrupted dirty pagecache exists in a user
buffer: if you write() over the error page with the copy, we can ignore
the effect of the error because no one consumes the corrupted data.

To implement this, this patch does roughly the following:
  - add data structures and handling routines to manage the metadata
    of memory errors on dirty pagecaches,
  - return -EHWPOISON when the error-affected address is accessed with
    read(), partial-page write(), or fsync(),
  - cancel hwpoison when a full-page write() is done over the
    error-affected address.

One reason why we have a separate flag AS_HWPOISON is that the conditions
for clearing the flags differ between legacy IO errors and memory errors.
AS_EIO is cleared when subsequent writeback for the error-affected file
succeeds. OTOH, AS_HWPOISON can be cleared when the pagecache on which the
error lies is fully overwritten with copy data from a user buffer.
Another reason is that we expect user processes which get the error report
from the kernel to handle the two types of errors differently.
Processes which get -EHWPOISON can search for copy data in their buffers
and try to write() over the error pages if they have it.

We have one behavioral change on the PageHWPoison flag. Before this patch,
PageHWPoison means literally "the page is corrupted," and the pages with
PageHWPoison set are never reused. After this patch, we give another role
to this flag. When a user process tries to access the address which was
backed by the corrupted page (which has already been removed from pagecache
by the memory error handler), we permit adding a new page onto the pagecache
with the PageHWPoison flag set. But we refuse read() and partial write()
on the page until the PageHWPoison flag is cleared by a whole-page write().
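
As a rough illustration of the semantics described above (not the patch's
implementation), the difference between the two flags can be modeled in
user space. Everything named toy_* below is hypothetical, the model tracks
a single poisoned page per mapping, and -1 stands in for -EHWPOISON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE   4096u
#define TOY_AS_EIO      (1u << 0)
#define TOY_AS_HWPOISON (1u << 1)

struct toy_mapping {
	unsigned int flags;
	size_t poisoned_index;	/* index of the single poisoned page */
};

/* AS_EIO style: the error is reported once and cleared by the check. */
static bool toy_check_eio(struct toy_mapping *m)
{
	bool hit;

	assert(m != NULL);
	hit = (m->flags & TOY_AS_EIO) != 0;
	m->flags &= ~TOY_AS_EIO;
	return hit;
}

/* Does [pos, pos + count) touch the poisoned page while the flag is set? */
static bool toy_range_hits_poison(const struct toy_mapping *m,
				  size_t pos, size_t count)
{
	size_t start = m->poisoned_index * TOY_PAGE_SIZE;
	size_t end = start + TOY_PAGE_SIZE;	/* exclusive */

	if (!(m->flags & TOY_AS_HWPOISON) || count == 0)
		return false;
	return pos < end && pos + count > start;
}

/* read(): refused as long as the range touches the poisoned page. */
static int toy_read(const struct toy_mapping *m, size_t pos, size_t count)
{
	return toy_range_hits_poison(m, pos, count) ? -1 : 0;
}

/* write(): a partial-page write is refused, a full-page write clears the flag. */
static int toy_write(struct toy_mapping *m, size_t pos, size_t count)
{
	size_t start = m->poisoned_index * TOY_PAGE_SIZE;

	if (!toy_range_hits_poison(m, pos, count))
		return 0;
	if (pos <= start && pos + count >= start + TOY_PAGE_SIZE) {
		m->flags &= ~TOY_AS_HWPOISON;
		return 0;
	}
	return -1;
}
```

In this model a second toy_check_eio() call no longer sees the error,
while toy_read() keeps failing on the poisoned page until a full-page
toy_write() covers it.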

Signed-off-by: Naoya Horiguchi 
---
 include/linux/page-flags.h |   2 +
 include/linux/pagemap.h|  91 
 mm/filemap.c   |  51 +++
 mm/memory-failure.c| 343 +++--
 mm/truncate.c  |   3 +
 5 files changed, 479 insertions(+), 11 deletions(-)

diff --git v3.6-rc1.orig/include/linux/page-flags.h v3.6-rc1/include/linux/page-flags.h
index b5d1384..25bbde0 100644
--- v3.6-rc1.orig/include/linux/page-flags.h
+++ v3.6-rc1/include/linux/page-flags.h
@@ -272,6 +272,8 @@ TESTSCFLAG(HWPoison, hwpoison)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 #else
 PAGEFLAG_FALSE(HWPoison)
+SETPAGEFLAG_NOOP(HWPoison)
+CLEARPAGEFLAG_NOOP(HWPoison)
 #define __PG_HWPOISON 0
 #endif
 
diff --git v3.6-rc1.orig/include/linux/pagemap.h v3.6-rc1/include/linux/pagemap.h
index e42c762..8b18560 100644
--- v3.6-rc1.orig/include/linux/pagemap.h
+++ v3.6-rc1/include/linux/pagemap.h
@@ -24,6 +24,7 @@ enum mapping_flags {
AS_ENOSPC   = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE  = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
+   AS_HWPOISON = __GFP_BITS_SHIFT + 4, /* pagecache is hwpoisoned */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -31,6 +32,8 @@ static inline void mapping_set_error(struct address_space *mapping, int error)
if (unlikely(error)) {
if (error == -ENOSPC)
set_bit(AS_ENOSPC, &mapping->flags);
+   else if (error == -EHWPOISON)
+   set_bit(AS_HWPOISON, &mapping->flags);
else
set_bit(AS_EIO, &mapping->flags);
}
@@ -541,4 +544,92 @@ static inline int add_to_page_cache(struct page *page,
return error;
 }
 
+#ifdef CONFIG_MEMORY_FAILURE
+extern int __hwpoison_file_range(struct address_space *mapping, loff_t start,
+   loff_t end);
+extern int __hwpoison_partial_write(struct address_space *mapping, loff_t pos,
+   size_t count);
+extern void __remove_hwp_dirty_pgoff(struct address_space *mapping,
+   pgoff_t index);
+extern void __remove_hwp_dirty_file(struct inode *inode);
+extern void __add_fake_hwpoison(struct page *page,
+   struct address_space *mapping, pgoff_t index);
+extern void __remove_fake_hwpoison(struct page *page,
+   struct address_space *mapping);
+
+static inline int hwpoison_file_range(struct address_space *mapping,
+   loff_t start, loff_t end)
+{
+   if (unlikely(test_bit(AS_HWPOISON, &mapping->flags)))
+   return 

[PATCH 0/3 v1] HWPOISON: improve dirty pagecache error handling

2012-08-10 Thread Naoya Horiguchi
Hi,

This patchset is to improve handling and reporting of memory errors on
dirty pagecache.

Patch 1 fixes a messaging bug, and patch 2 temporarily undoes the code
that can cause the data loss.  I think these two are obvious fixes, so
I would like to get them merged promptly.

Patch 3 is for a new feature. The problem in error reporting (where AS_EIO,
which we rely on to report the error to userspace, is cleared once checked)
was discussed when the hwpoison core patches were reviewed, and we left it
unfixed because it could be fixed with a more generic solution which also
covers legacy EIO. But in my opinion, legacy EIO and hwpoison differ in how
they can or should be handled (for example, as described in patch 3, we can
recover from memory errors on dirty pagecache by overwriting). So this patch
only solves the problem of memory error reporting.

My test for this patchset is available on:
https://github.com/Naoya-Horiguchi/test_memory_error_on_dirty_pagecache.git

Could you review or comment?

Thanks,
Naoya


[PATCH 2/3] HWPOISON: undo memory error handling for dirty pagecache

2012-08-10 Thread Naoya Horiguchi
Current memory error handling on dirty pagecache has a bug: user
processes that access corrupted pages via read() or write() are not made
aware of the memory error, and dirty data is silently discarded.

The following patch improves the handling/reporting of memory errors in
this case, but as a short term solution I suggest that we undo
the present error handling code and just leave errors in such cases
alone (expecting the 2nd MCE to panic the system) to ensure data
consistency.

Signed-off-by: Naoya Horiguchi 
Cc: sta...@vger.kernel.org
---
 mm/memory-failure.c | 54 +++--
 1 file changed, 11 insertions(+), 43 deletions(-)

diff --git v3.6-rc1.orig/mm/memory-failure.c v3.6-rc1/mm/memory-failure.c
index 79dfb2f..7e62797 100644
--- v3.6-rc1.orig/mm/memory-failure.c
+++ v3.6-rc1/mm/memory-failure.c
@@ -613,49 +613,17 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
  */
 static int me_pagecache_dirty(struct page *p, unsigned long pfn)
 {
-   struct address_space *mapping = page_mapping(p);
-
-   SetPageError(p);
-   /* TBD: print more information about the file. */
-   if (mapping) {
-   /*
-* IO error will be reported by write(), fsync(), etc.
-* who check the mapping.
-* This way the application knows that something went
-* wrong with its dirty file data.
-*
-* There's one open issue:
-*
-* The EIO will be only reported on the next IO
-* operation and then cleared through the IO map.
-* Normally Linux has two mechanisms to pass IO error
-* first through the AS_EIO flag in the address space
-* and then through the PageError flag in the page.
-* Since we drop pages on memory failure handling the
-* only mechanism open to use is through AS_AIO.
-*
-* This has the disadvantage that it gets cleared on
-* the first operation that returns an error, while
-* the PageError bit is more sticky and only cleared
-* when the page is reread or dropped.  If an
-* application assumes it will always get error on
-* fsync, but does other operations on the fd before
-* and the page is dropped between then the error
-* will not be properly reported.
-*
-* This can already happen even without hwpoisoned
-* pages: first on metadata IO errors (which only
-* report through AS_EIO) or when the page is dropped
-* at the wrong time.
-*
-* So right now we assume that the application DTRT on
-* the first EIO, but we're not worse than other parts
-* of the kernel.
-*/
-   mapping_set_error(mapping, EIO);
-   }
-
-   return me_pagecache_clean(p, pfn);
+   /*
+* The original memory error handling on dirty pagecache has
+* a bug that user processes who use corrupted pages via read()
+* or write() can't be aware of the memory error and result
+* in throwing out dirty data silently.
+*
+* Until we solve the problem, let's close the path of memory
+* error handling for dirty pagecache. We just leave errors
+* for the 2nd MCE to trigger panics.
+*/
+   return IGNORED;
 }
 
 /*
-- 
1.7.11.2



[PATCH 1/3] HWPOISON: fix action_result() to print out dirty/clean

2012-08-10 Thread Naoya Horiguchi
action_result() fails to print out "dirty" even if an error occurred on a
dirty pagecache, because by the time we check PageDirty in action_result()
the flag has already been cleared by page isolation, even if the page was
dirty before error handling. This can break some applications that monitor
this message, so it should be fixed.

There are several callers of action_result() other than page_action(), but
none of them handle LRU pages; they handle free pages or kernel pages,
so we don't have to consider dirty or not for them.
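
One way to picture the table-driven naming: the description string is
chosen by matching page flags against mask/result pairs, so the dirty or
clean wording comes from the matched entry rather than from testing
PageDirty again at print time. The sketch below is a simplified user-space
model with made-up toy_* names and only two flag bits, and it assumes the
flags used for the lookup were captured before isolation cleared the dirty
bit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LRU   (1u << 0)
#define TOY_DIRTY (1u << 1)

struct toy_page_state {
	unsigned int mask;	/* which flag bits this entry looks at */
	unsigned int res;	/* required value of those bits */
	const char *msg;	/* text used in the log message */
};

static const struct toy_page_state toy_states[] = {
	{ TOY_LRU | TOY_DIRTY, TOY_LRU | TOY_DIRTY, "dirty LRU" },
	{ TOY_LRU | TOY_DIRTY, TOY_LRU,             "clean LRU" },
	{ 0,                   0,                   "unknown page state" }, /* catch-all */
};

/* Return the message of the first entry whose masked flags match. */
static const char *toy_state_msg(unsigned int saved_flags)
{
	size_t n = sizeof(toy_states) / sizeof(toy_states[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if ((saved_flags & toy_states[i].mask) == toy_states[i].res)
			return toy_states[i].msg;

	return "unknown page state";	/* not reached: the last entry matches everything */
}

/* Small self-check showing which entry each flag combination selects. */
void toy_state_selfcheck(void)
{
	assert(strcmp(toy_state_msg(TOY_LRU | TOY_DIRTY), "dirty LRU") == 0);
	assert(strcmp(toy_state_msg(TOY_LRU), "clean LRU") == 0);
}
```

Because the first matching entry wins, the more specific "dirty" entry
has to come before the "clean" entry for the same page type.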

Signed-off-by: Naoya Horiguchi 
---
 mm/memory-failure.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git v3.6-rc1.orig/mm/memory-failure.c v3.6-rc1/mm/memory-failure.c
index a6e2141..79dfb2f 100644
--- v3.6-rc1.orig/mm/memory-failure.c
+++ v3.6-rc1/mm/memory-failure.c
@@ -779,16 +779,16 @@ static struct page_state {
{ compound, compound,   "huge", me_huge_page },
 #endif
 
-   { sc|dirty, sc|dirty,   "swapcache",me_swapcache_dirty },
-   { sc|dirty, sc, "swapcache",me_swapcache_clean },
+   { sc|dirty, sc|dirty,   "dirty swapcache",  me_swapcache_dirty },
+   { sc|dirty, sc, "clean swapcache",  me_swapcache_clean },
 
-   { unevict|dirty, unevict|dirty, "unevictable LRU", me_pagecache_dirty},
-   { unevict,  unevict,"unevictable LRU", me_pagecache_clean},
+   { unevict|dirty, unevict|dirty, "dirty unevictable LRU", me_pagecache_dirty },
+   { unevict,  unevict,"clean unevictable LRU", me_pagecache_clean },
 
-   { mlock|dirty,  mlock|dirty,"mlocked LRU",  me_pagecache_dirty },
-   { mlock,mlock,  "mlocked LRU",  me_pagecache_clean },
+   { mlock|dirty,  mlock|dirty,"dirty mlocked LRU",me_pagecache_dirty },
+   { mlock,mlock,  "clean mlocked LRU",me_pagecache_clean },
 
-   { lru|dirty,lru|dirty,  "LRU",  me_pagecache_dirty },
+   { lru|dirty,lru|dirty,  "dirty LRU",me_pagecache_dirty },
{ lru|dirty,lru,"clean LRU",me_pagecache_clean },
 
/*
@@ -812,12 +812,8 @@ static struct page_state {
 
 static void action_result(unsigned long pfn, char *msg, int result)
 {
-   struct page *page = pfn_to_page(pfn);
-
-   printk(KERN_ERR "MCE %#lx: %s%s page recovery: %s\n",
-   pfn,
-   PageDirty(page) ? "dirty " : "",
-   msg, action_name[result]);
+   pr_err("MCE %#lx: %s page recovery: %s\n",
+   pfn, msg, action_name[result]);
 }
 
 static int page_action(struct page_state *ps, struct page *p,
-- 
1.7.11.2



[PATCH RESEND 2/2] msm: io: Change the default static iomappings to be shared

2012-08-10 Thread Rohit Vaswani
With the 3.4 kernel, static iomappings can be shared with ioremap
mappings. If ioremap is called with an address for which a static
mapping already exists, then that mapping should be used instead
of creating a new one.
However, the MT_DEVICE_NONSHARED flag prevents this. Hence, get rid
of this flag. Targets (7X00) that require the static iomappings
to be NONSHARED use the MSM_DEVICE_TYPE and MSM_CHIP_DEVICE_TYPE macros.
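
The macro layering can be tried outside the kernel. The sketch below
mirrors the token pasting used by MSM_CHIP_DEVICE_TYPE with hypothetical
TOY_* names, made-up addresses, and a stand-in enum for the memory types.
Unlike the patch, it defines the default-type macro in terms of the typed
one.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel memory types; the values are arbitrary. */
enum toy_mem_type { TOY_MT_DEVICE, TOY_MT_DEVICE_NONSHARED };

struct toy_map_desc {
	unsigned long virt;
	unsigned long phys;
	unsigned long length;
	enum toy_mem_type type;
};

/* Hypothetical addresses, chosen only so the macros have something to paste. */
#define TOY_VIC_BASE   0xF0000000UL
#define TOY_VIC_PHYS   0xC0000000UL
#define TOY_VIC_SIZE   0x1000UL
#define TOY_CSR_BASE   0xF0001000UL
#define CHIPX_CSR_PHYS 0xC0100000UL
#define CHIPX_CSR_SIZE 0x1000UL

/* Entry with an explicit memory type. */
#define TOY_CHIP_DEVICE_TYPE(name, chip, mem_type) { \
	.virt = TOY_##name##_BASE, \
	.phys = chip##_##name##_PHYS, \
	.length = chip##_##name##_SIZE, \
	.type = mem_type, \
}

/* Default entries are shared (TOY_MT_DEVICE). */
#define TOY_CHIP_DEVICE(name, chip) \
	TOY_CHIP_DEVICE_TYPE(name, chip, TOY_MT_DEVICE)
#define TOY_DEVICE(name) TOY_CHIP_DEVICE(name, TOY)
#define TOY_DEVICE_TYPE(name, mem_type) \
	TOY_CHIP_DEVICE_TYPE(name, TOY, mem_type)

static const struct toy_map_desc toy_io_desc[] = {
	TOY_DEVICE(VIC),                                /* shared by default */
	TOY_DEVICE_TYPE(VIC, TOY_MT_DEVICE_NONSHARED),  /* per-entry override */
	TOY_CHIP_DEVICE_TYPE(CSR, CHIPX, TOY_MT_DEVICE_NONSHARED),
};

/* Small self-check of what the macros expand to. */
void toy_io_desc_selfcheck(void)
{
	assert(toy_io_desc[0].type == TOY_MT_DEVICE);
	assert(toy_io_desc[1].type == TOY_MT_DEVICE_NONSHARED);
	assert(toy_io_desc[2].phys == CHIPX_CSR_PHYS);
}
```

Keeping the memory type as a macro argument lets one table mix shared and
non-shared entries without duplicating the initializer.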

Signed-off-by: Rohit Vaswani 
---
 arch/arm/mach-msm/io.c |   25 +
 1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mach-msm/io.c b/arch/arm/mach-msm/io.c
index 2409c0b..5fc2e48 100644
--- a/arch/arm/mach-msm/io.c
+++ b/arch/arm/mach-msm/io.c
@@ -33,23 +33,32 @@
.virtual = (unsigned long) MSM_##name##_BASE, \
.pfn = __phys_to_pfn(chip##_##name##_PHYS), \
.length = chip##_##name##_SIZE, \
-   .type = MT_DEVICE_NONSHARED, \
+   .type = MT_DEVICE, \
+}
+
+#define MSM_CHIP_DEVICE_TYPE(name, chip, mem_type) { \
+   .virtual = (unsigned long) MSM_##name##_BASE, \
+   .pfn = __phys_to_pfn(chip##_##name##_PHYS), \
+   .length = chip##_##name##_SIZE, \
+   .type = mem_type, \
 }
 
 #define MSM_DEVICE(name) MSM_CHIP_DEVICE(name, MSM)
+#define MSM_DEVICE_TYPE(name, mem_type) \
+   MSM_CHIP_DEVICE_TYPE(name, MSM, mem_type)
 
 #if defined(CONFIG_ARCH_MSM7X00A) || defined(CONFIG_ARCH_MSM7X27) \
|| defined(CONFIG_ARCH_MSM7X25)
 static struct map_desc msm_io_desc[] __initdata = {
-   MSM_DEVICE(VIC),
-   MSM_CHIP_DEVICE(CSR, MSM7X00),
-   MSM_DEVICE(DMOV),
-   MSM_CHIP_DEVICE(GPIO1, MSM7X00),
-   MSM_CHIP_DEVICE(GPIO2, MSM7X00),
-   MSM_DEVICE(CLK_CTL),
+   MSM_DEVICE_TYPE(VIC, MT_DEVICE_NONSHARED),
+   MSM_CHIP_DEVICE_TYPE(CSR, MSM7X00, MT_DEVICE_NONSHARED),
+   MSM_DEVICE_TYPE(DMOV, MT_DEVICE_NONSHARED),
+   MSM_CHIP_DEVICE_TYPE(GPIO1, MSM7X00, MT_DEVICE_NONSHARED),
+   MSM_CHIP_DEVICE_TYPE(GPIO2, MSM7X00, MT_DEVICE_NONSHARED),
+   MSM_CHIP_DEVICE_TYPE(CLK_CTL, MSM7X00, MT_DEVICE_NONSHARED),
 #if defined(CONFIG_DEBUG_MSM_UART1) || defined(CONFIG_DEBUG_MSM_UART2) || \
defined(CONFIG_DEBUG_MSM_UART3)
-   MSM_DEVICE(DEBUG_UART),
+   MSM_DEVICE_TYPE(DEBUG_UART, MT_DEVICE_NONSHARED),
 #endif
{
.virtual =  (unsigned long) MSM_SHARED_RAM_BASE,
-- 
Sent by an employee of the Qualcomm Innovation Center,Inc
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
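
Archive note: the macro layering in the patch above can be tried outside the kernel. This is a rough userspace sketch, assuming a simplified struct map_desc and made-up addresses; PHYS_TO_PFN, the MSM_*_BASE/PHYS/SIZE values and demo_io_desc are hypothetical. As a simplification, MSM_CHIP_DEVICE is written here in terms of MSM_CHIP_DEVICE_TYPE, whereas the patch keeps the two as separate definitions.

```c
/* Stand-ins for the ARM memory types; the numeric values are assumptions. */
enum { MT_DEVICE = 0, MT_DEVICE_NONSHARED = 1 };

/* Simplified layout, modelled on the fields the patch initialises. */
struct map_desc {
	unsigned long virtual;
	unsigned long pfn;
	unsigned long length;
	unsigned int type;
};

/* Hypothetical helper: 4 KiB pages assumed. */
#define PHYS_TO_PFN(paddr)	((unsigned long)(paddr) >> 12)

/* Made-up addresses, for illustration only. */
#define MSM_VIC_BASE		0xE0000000UL
#define MSM_VIC_PHYS		0xC0000000UL
#define MSM_VIC_SIZE		0x1000UL
#define MSM_CSR_BASE		0xE0001000UL
#define MSM7X00_CSR_PHYS	0xC0100000UL
#define MSM7X00_CSR_SIZE	0x1000UL

/* Token pasting builds the symbol names from the region and chip names. */
#define MSM_CHIP_DEVICE_TYPE(name, chip, mem_type) { \
	.virtual = (unsigned long) MSM_##name##_BASE, \
	.pfn = PHYS_TO_PFN(chip##_##name##_PHYS), \
	.length = chip##_##name##_SIZE, \
	.type = mem_type, \
}

/* Default: shared device mapping, so ioremap can reuse it. */
#define MSM_CHIP_DEVICE(name, chip) \
	MSM_CHIP_DEVICE_TYPE(name, chip, MT_DEVICE)
#define MSM_DEVICE(name) MSM_CHIP_DEVICE(name, MSM)
#define MSM_DEVICE_TYPE(name, mem_type) \
	MSM_CHIP_DEVICE_TYPE(name, MSM, mem_type)

static const struct map_desc demo_io_desc[] = {
	MSM_DEVICE(VIC),					/* MT_DEVICE */
	MSM_CHIP_DEVICE_TYPE(CSR, MSM7X00, MT_DEVICE_NONSHARED),	/* opt out */
};
```

The point of the split is that the default becomes the shared type, and only tables that need MT_DEVICE_NONSHARED say so explicitly.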



[PATCH RESEND 1/2] msm: io: Remove 7x30 iomap region from 7x00

2012-08-10 Thread Rohit Vaswani
This is redundant code.

Signed-off-by: Rohit Vaswani 
---
 arch/arm/mach-msm/io.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mach-msm/io.c b/arch/arm/mach-msm/io.c
index a1e7b11..2409c0b 100644
--- a/arch/arm/mach-msm/io.c
+++ b/arch/arm/mach-msm/io.c
@@ -51,9 +51,6 @@ static struct map_desc msm_io_desc[] __initdata = {
defined(CONFIG_DEBUG_MSM_UART3)
MSM_DEVICE(DEBUG_UART),
 #endif
-#ifdef CONFIG_ARCH_MSM7X30
-   MSM_DEVICE(GCC),
-#endif
{
.virtual =  (unsigned long) MSM_SHARED_RAM_BASE,
.pfn = __phys_to_pfn(MSM_SHARED_RAM_PHYS),
-- 
Sent by an employee of the Qualcomm Innovation Center,Inc
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.



Re: xtensa port maintenance

2012-08-10 Thread Arnd Bergmann
On Monday 06 August 2012, Max Filippov wrote:
> I have a couple of questions regarding the path of xtensa-specific patches
> upstream:
> - which git tree should they be targeted for? Should I set up a tree for
>   pull requests, or will patches be picked up into some existing tree?
>   (Looks like Linus' tree is the right target. AFAIK previously xtensa
>   patches went mostly through akpm tree).

Setting up a git tree is a good first step if you want to be the official
maintainer, and if you want to get it included into linux-next.

You should also update the maintainers file to list your git tree and name,
and have Chris give you an official approval for that update. My impression
is that he is still occasionally doing work on upstream maintenance but
has moved on to other priorities now. The two of you should decide
together if you want to both be listed as maintainers or one of you
should be a primary contact and the other one doing work in the background.

> - which mailing lists should they go to?
>   (I guess that besides linux-xte...@linux-xtensa.org list they should go
>   to linux-kernel@vger.kernel.org for general review. Anything else?)

There is also linux-arch, which has the architecture maintainers. You can
consult that list if you have specific questions about changes that are
going on across architectures.

What kind of changes do you expect to make to the architecture port?
Are there additional platforms you want to get supported? Do you
want to stay compatible with existing user space software, or are you
thinking about moving to the new generic system call interfaces that
would require rebuilding all user land binaries?

Arnd


Re: [PATCH] MPILIB: Provide count_leading/trailing_zeros() based on arch functions

2012-08-10 Thread David Miller
From: Jan Engelhardt 
Date: Fri, 10 Aug 2012 14:51:49 +0200 (CEST)

> 
> On Saturday 2012-07-21 02:46, David Miller wrote:
>>> Arnd Bergmann  wrote:
>>> 
>>>> I don't generally like to put stuff into asm-generic when it's unlikely
>>>> to be overridden by architectures. It would really belong into
>>>> include/linux, but then again we have all the other bitops in asm-generic
>>>> as well, so whatever...
>>> 
>>> Some arches (such as Sparc, I think) have count-leading-zero instructions.
>>
>>Yes, newer sparc64 chips have leading-zero-detect, and I was pretty
>>sure that powerpc had something similar.  It's called count-leading-
>>zeros or something like that.
> 
> And gcc has a __builtin_clz.

Which I can't use.  I have to patch the code at run time based upon
whether the cpu has the 'lzd' instruction or not.
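
Archive note: the distinction drawn here is between a choice the compiler makes at build time (__builtin_clz) and a choice made once the running CPU is known. The sketch below is a userspace analogue only, assuming a function pointer in place of the kernel's run-time instruction patching; clz64_generic, clz64_fast, clz64_select and the cpu_has_fast_clz flag are hypothetical names. The "fast" path uses the GCC/Clang builtin merely as a stand-in for a hardware instruction such as lzd.

```c
#include <stdint.h>

/* Portable fallback: count leading zeros by halving the search window. */
static unsigned int clz64_generic(uint64_t x)
{
	unsigned int n = 0;

	if (x == 0)
		return 64;
	if (!(x & 0xFFFFFFFF00000000ULL)) { n += 32; x <<= 32; }
	if (!(x & 0xFFFF000000000000ULL)) { n += 16; x <<= 16; }
	if (!(x & 0xFF00000000000000ULL)) { n += 8;  x <<= 8; }
	if (!(x & 0xF000000000000000ULL)) { n += 4;  x <<= 4; }
	if (!(x & 0xC000000000000000ULL)) { n += 2;  x <<= 2; }
	if (!(x & 0x8000000000000000ULL)) { n += 1; }
	return n;
}

/* Stand-in for a hardware instruction; the builtin is undefined for 0. */
static unsigned int clz64_fast(uint64_t x)
{
	return x ? (unsigned int)__builtin_clzll(x) : 64;
}

/* Callers go through this pointer; it starts on the safe fallback. */
static unsigned int (*clz64)(uint64_t) = clz64_generic;

/* Pick the implementation once, after the CPU feature has been probed. */
static void clz64_select(int cpu_has_fast_clz)
{
	clz64 = cpu_has_fast_clz ? clz64_fast : clz64_generic;
}
```

A function pointer costs an indirect call on every use, which is one reason kernel code tends to patch the instruction stream instead; the sketch only illustrates where the decision is made.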


Re: [PATCH] [PATCH V4]Extcon: adc_jack: adc-jack driver to support 3.5 pi or simliar devices

2012-08-10 Thread Jonathan Cameron
On 08/08/2012 02:04 AM, anish kumar wrote:
> From: anish kumar 
>
> External connector devices that decides connection information based on
> ADC values may use adc-jack device driver. The user simply needs to
> provide a table of adc range and connection states. Then, extcon
> framework will automatically notify others.

Couple of utterly trivial points inline.
Otherwise looks fine to me.
>
> Changes in V1:
> added Lars-Peter Clausen suggested changes:
> Using macros to get rid of boiler plate code such as devm_kzalloc
> and module_platform_driver.Other changes suggested are related to
> coding guidelines.
>
> Changes in V2:
> Removed some unnecessary checks and changed the way we are un-regitering
> extcon and freeing the irq while removing.
>
> Changes in V3:
> Renamed the files to comply with extcon naming.
>
> Changes in this version:
> Added the cancel_work_sync during removing of driver.
>
> Reviewed-by: Lars-Peter Clausen 
> Signed-off-by: anish kumar 
> Signed-off-by: MyungJoo Ham 
Don't these normally go in order of when they occurred?
Hence first sign offs are the authors, any acks / reviewed-bys
after that and final sign offs for the merges.
> ---
>  drivers/extcon/Kconfig |5 +
>  drivers/extcon/Makefile|1 +
>  drivers/extcon/extcon-adc-jack.c   |  194 
> 
>  include/linux/extcon/extcon-adc-jack.h |   73 
>  4 files changed, 273 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/extcon/extcon-adc-jack.c
>  create mode 100644 include/linux/extcon/extcon-adc-jack.h
>
> diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig
> index e175c8e..596e277 100644
> --- a/drivers/extcon/Kconfig
> +++ b/drivers/extcon/Kconfig
> @@ -21,6 +21,11 @@ config EXTCON_GPIO
> Say Y here to enable GPIO based extcon support. Note that GPIO
> extcon supports single state per extcon instance.
>
> +config EXTCON_ADC_JACK
> +tristate "ADC Jack extcon support"
> +help
> +  Say Y here to enable extcon device driver based on ADC values.
> +
>  config EXTCON_MAX77693
>   tristate "MAX77693 EXTCON Support"
>   depends on MFD_MAX77693
> diff --git a/drivers/extcon/Makefile b/drivers/extcon/Makefile
> index 88961b3..bc7111e 100644
> --- a/drivers/extcon/Makefile
> +++ b/drivers/extcon/Makefile
> @@ -4,6 +4,7 @@
>
>  obj-$(CONFIG_EXTCON) += extcon_class.o
>  obj-$(CONFIG_EXTCON_GPIO)+= extcon_gpio.o
> +obj-$(CONFIG_EXTCON_ADC_JACK)   += extcon-adc-jack.o
>  obj-$(CONFIG_EXTCON_MAX77693)+= extcon-max77693.o
>  obj-$(CONFIG_EXTCON_MAX8997) += extcon-max8997.o
>  obj-$(CONFIG_EXTCON_ARIZONA) += extcon-arizona.o
> diff --git a/drivers/extcon/extcon-adc-jack.c 
> b/drivers/extcon/extcon-adc-jack.c
> new file mode 100644
> index 000..cfc8c59
> --- /dev/null
> +++ b/drivers/extcon/extcon-adc-jack.c
> @@ -0,0 +1,194 @@
> +/*
> + * drivers/extcon/extcon-adc-jack.c
> + *
> + * Analog Jack extcon driver with ADC-based detection capability.
> + *
> + * Copyright (C) 2012 Samsung Electronics
> + * MyungJoo Ham 
> + *
> + * Modified for calling to IIO to get adc by 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/**
> + * struct adc_jack_data - internal data for adc_jack device driver
> + * @edev- extcon device.
> + * @cable_names - list of supported cables.
> + * @num_cables  - size of cable_names.
> + * @adc_condition   - list of adc value conditions.
> + * @num_condition   - size of adc_condition.
> + * @irq - irq number of attach/detach event (0 if not exist).
> + * @handling_delay  - interrupt handler will schedule extcon event
> + *  handling at handling_delay jiffies.
> + * @handler - extcon event handler called by interrupt handler.
> + * @chan - iio channel being queried.
> + */
> +struct adc_jack_data {
> + struct extcon_dev edev;
> +
> + const char **cable_names;
> + int num_cables;
> + struct adc_jack_cond *adc_condition;
> + int num_conditions;
> +
> + int irq;
> + unsigned long handling_delay; /* in jiffies */
> + struct delayed_work handler;
> +
> + struct iio_channel *chan;
> +};
> +
> +static void adc_jack_handler(struct work_struct *work)
> +{
> + struct adc_jack_data *data = container_of(to_delayed_work(work),
> +   struct adc_jack_data,
> +   handler);
> + u32 state = 0;
> + int ret, adc_val;
> + int i;
> +
> + ret = iio_read_channel_raw(data->chan, &adc_val);
> + if (ret < 0) {
> + dev_err(data->edev.dev, "read channel() error: %d\n", ret);
> + return;
> 

linux-user-chroot 2012.2

2012-08-10 Thread Colin Walters
Hi,

This is the release of linux-user-chroot 2012.2.  The major change now
is that it makes use of Andy's new PR_SET_NO_NEW_PRIVS.  This doesn't
close any security hole I'm aware of - our previous use of the MS_NOSUID
bind mount over / should work - but, belt and suspenders as they say.

The code:
http://git.gnome.org/browse/linux-user-chroot/commit/?id=515c714471d0b5923f6633ef44a2270b23656ee9

As for how linux-user-chroot and PR_SET_NO_NEW_PRIVS relate, see this
thread:
http://thread.gmane.org/gmane.linux.kernel.lsm/15339
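
Archive note: for readers unfamiliar with the flag, this is roughly what setting it looks like from C. A minimal sketch, not taken from linux-user-chroot itself; drop_new_privs() and no_new_privs_is_set() are hypothetical helper names, and the fallback constants are only used if the installed headers predate the feature (it needs Linux 3.5 or later at run time).

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/*
 * Ask the kernel never to grant this process (or its children) new
 * privileges through execve(), e.g. via setuid binaries.
 * Returns 0 on success, -1 on failure with errno set by prctl().
 */
static int drop_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the flag is set, 0 if not, -1 on error. */
static int no_new_privs_is_set(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Once set, the flag cannot be cleared and is inherited across fork and exec, so a tool would normally set it just before executing the untrusted command.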

Summary
---

This tool allows regular (non-root) users to call chroot(2), create
Linux bind mounts, and use some Linux container features.  It's
primarily intended for use by build systems.

Project information
---

There's no web page yet; send patches to
Colin Walters 





[RFC PATCH] spi/bcm63xx: Ensure that memory is freed only after it is no longer used

2012-08-10 Thread Guenter Roeck
The call to spi_unregister_master() in the device remove function frees device
memory, and with it any device local data. However, device local data is still
accessed after the call to spi_unregister_master().

Acquire a reference to the SPI device and release it after cleanup is complete
to solve the problem.

Cc: Florian Fainelli 
Signed-off-by: Guenter Roeck 
---
Several drivers have this problem, and I am trying to find a common fix.

This solution is modeled after the approach used in spi-txx9spi:txx9spi_remove.
The other possible fix would be to move spi_unregister_master() to the end of
bcm63xx_spi_remove(), but I am not sure if it is a good idea to clean up
before the call to spi_unregister_master().

 drivers/spi/spi-bcm63xx.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-bcm63xx.c b/drivers/spi/spi-bcm63xx.c
index 6e25ef1..ea0aaa3 100644
--- a/drivers/spi/spi-bcm63xx.c
+++ b/drivers/spi/spi-bcm63xx.c
@@ -438,7 +438,7 @@ out:
 
 static int __devexit bcm63xx_spi_remove(struct platform_device *pdev)
 {
-   struct spi_master *master = platform_get_drvdata(pdev);
+   struct spi_master *master = spi_master_get(platform_get_drvdata(pdev));
struct bcm63xx_spi *bs = spi_master_get_devdata(master);
 
spi_unregister_master(master);
@@ -452,6 +452,8 @@ static int __devexit bcm63xx_spi_remove(struct 
platform_device *pdev)
 
platform_set_drvdata(pdev, 0);
 
+   spi_master_put(master);
+
return 0;
 }
 
-- 
1.7.9.7
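
Archive note: the pattern in the patch above (take a reference, unregister, finish cleanup, drop the reference) can be modelled in plain C. This is a rough sketch under the assumption that unregistering drops the reference held by registration and that the object is freed when the count reaches zero; struct fake_master and every fake_* function are hypothetical and only imitate the shape of spi_master_get()/spi_master_put().

```c
#include <stdlib.h>

struct fake_master {
	int refcount;	/* number of holders */
	int hw_enabled;	/* stands in for device-local data */
};

static int fake_released;	/* counts how many objects were freed */

static struct fake_master *fake_master_alloc(void)
{
	struct fake_master *m = calloc(1, sizeof(*m));

	if (m) {
		m->refcount = 1;	/* reference owned by "registration" */
		m->hw_enabled = 1;
	}
	return m;
}

static struct fake_master *fake_master_get(struct fake_master *m)
{
	m->refcount++;
	return m;
}

static void fake_master_put(struct fake_master *m)
{
	if (--m->refcount == 0) {
		fake_released++;
		free(m);
	}
}

/* Unregistering gives up the reference that registration held. */
static void fake_unregister(struct fake_master *m)
{
	fake_master_put(m);
}

/*
 * Remove path: returns how many releases had happened at the moment
 * device-local data was touched; 0 means the access was safe.
 */
static int fake_remove(struct fake_master *registered)
{
	struct fake_master *m = fake_master_get(registered);
	int released_during_cleanup;

	fake_unregister(m);	/* would free m without the extra reference */
	m->hw_enabled = 0;	/* cleanup still sees valid memory */
	released_during_cleanup = fake_released;
	fake_master_put(m);	/* last reference: object is freed here */
	return released_during_cleanup;
}
```

Without the fake_master_get() call the write to hw_enabled would land in freed memory, which is the model's equivalent of the problem described in the commit message.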



Re: [PATCH v6 3/3] KVM: perf kvm events analysis tool

2012-08-10 Thread David Ahern

Thanks for resubmitting this; it was on my to-do list as well.

On 8/9/12 9:19 PM, Dong Hao wrote:

+static bool kvm_events_exist(const char *event)
+{
+   char evt_path[MAXPATHLEN];
+   int fd;
+
+   snprintf(evt_path, MAXPATHLEN, "%s/kvm/%s/id", tracing_events_path,
+event);
+
+   fd = open(evt_path, O_RDONLY);


Use is_valid_tracepoint().

For consistency, it's worth adding a check for the other events too with 
an appropriate config message. e.g.,:

  https://lkml.org/lkml/2012/8/9/359

David

