date:20121214

Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-14 Thread Yinghai Lu

On Fri, Dec 14, 2012 at 12:08 PM, Yinghai Lu  wrote:
> On Fri, Dec 14, 2012 at 12:04 PM, Yinghai Lu  wrote:
>> On Fri, Dec 14, 2012 at 11:46 AM, H. Peter Anvin  wrote:
>>>
>>> I suspect we don't need init_level4_pgt at all and should just plan to
>>> get rid of it.  Is there any reason we can't just build the proper
>>> kernel page table set in pagetable_init() and switch to it there?
>>
>> then how to pass the info to AP?
>
> also we should merge early_level4_pgt with init_level4_pgt.
>
> and the #PF handler could just be extended to use BRK ...
>
> but we need to make sure BRK gets mapped first, and BRK could cross the
> 1G or 512G boundary ...
>
> that would have less impact on everyone.
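Whether BRK crosses a 1G or 512G boundary comes down to whether a range spans more than one PUD entry (1 GiB) or PGD entry (512 GiB) under x86-64 4-level paging. Each extra entry crossed needs another page-table page. Below is a minimal userspace sketch of that check. `range_crosses()` is a hypothetical helper, not a kernel function, and the spans assume 4-level paging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 4-level paging: a PUD entry maps 1 GiB, a PGD entry 512 GiB. */
#define PUD_SPAN (UINT64_C(1) << 30)
#define PGD_SPAN (UINT64_C(1) << 39)

/*
 * Hypothetical helper: true if the half-open range [start, end) touches
 * more than one naturally aligned block of 'span' bytes.  Crossing a
 * PUD_SPAN boundary means two PMD tables are needed; crossing a PGD_SPAN
 * boundary means two PUD tables are needed.
 */
static bool range_crosses(uint64_t start, uint64_t end, uint64_t span)
{
	assert(end > start);
	assert(span != 0 && (span & (span - 1)) == 0);	/* power of two */

	/* Compare the block index of the first and of the last byte. */
	return start / span != (end - 1) / span;
}
```

Because the end is exclusive, a range that stops exactly at a 1 GiB boundary is not reported as crossing it.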

I tailored your patch and made it map in 2M page increments, replacing the
ioremap patch:

   [PATCH v6 12/27] x86: use io_remap to access real_mode_data

It extends init_level4_pgt to map the extra range, which limits the
impact on everything else.

Please check whether that is OK with you.
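One reading of "2M page increments" is that each early fault maps one 2 MiB page. The handler rounds the address down to a 2 MiB boundary and locates the PGD/PUD/PMD slots for it. The sketch below shows only that address arithmetic and assumes x86-64 4-level paging. `struct map_step`, `step_for()` and `steps_to_cover()` are hypothetical, and the attached patch itself is not reproduced in this archive.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging shifts; each table has 512 entries. */
#define PMD_SHIFT	21	/* one PMD entry maps 2 MiB */
#define PUD_SHIFT	30
#define PGDIR_SHIFT	39
#define PTRS_PER_TABLE	512
#define PMD_PAGE_SIZE	(UINT64_C(1) << PMD_SHIFT)

/* Hypothetical description of one 2 MiB mapping step. */
struct map_step {
	uint64_t base;		/* 2 MiB-aligned address to map */
	unsigned int pgd_idx, pud_idx, pmd_idx;
};

/* Round 'addr' down to 2 MiB and find the table slots that map it. */
static struct map_step step_for(uint64_t addr)
{
	struct map_step s;

	s.base = addr & ~(PMD_PAGE_SIZE - 1);
	s.pgd_idx = (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1));
	s.pud_idx = (unsigned int)((addr >> PUD_SHIFT) & (PTRS_PER_TABLE - 1));
	s.pmd_idx = (unsigned int)((addr >> PMD_SHIFT) & (PTRS_PER_TABLE - 1));
	return s;
}

/* How many 2 MiB steps are needed to cover [start, start + size). */
static uint64_t steps_to_cover(uint64_t start, uint64_t size)
{
	assert(size > 0);
	return ((start + size - 1) >> PMD_SHIFT) - (start >> PMD_SHIFT) + 1;
}
```

For instance, a 0x2000-byte range starting at 0x1ff000 straddles a 2 MiB boundary and needs two steps.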

Thanks

Yinghai


limit_pf_handler.patch
Description: Binary data

Re: [PATCH] time: create __getnstimeofday for WARNless calls

2012-12-14 Thread Kees Cook

On Fri, Dec 14, 2012 at 5:16 PM, John Stultz  wrote:
> On 12/13/2012 10:17 AM, Kees Cook wrote:
>>
>> John, any feedback on this?
>>
> Sorry, yea, I've been meaning to get back to this.
>
> I'm still on the fence about just making getnstimeofday() safe for when
> timekeeping is suspended, but at the same time, your issue needs fixing.
> Also bailing out at the end still seems off to me. Even if someone is using
> the values despite the WARN_ON, they really are getting junk values, and for
> all the time that WARN_ON has been there, you're the first to report running
> into it.
>
> Even so, I think I'm ok with this patch for now, but I suspect we may want
> to rework it later.
>
> Looking at my inbox, I actually can't find a copy of this specific patch. Do
> you mind bouncing it to me, so I have something I can apply?
>
> Should this also get marked for -stable?

Nah, I don't think it's worth it.

Thanks!

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] cpufreq_stats: fix race between stats allocation and first usage

2012-12-14 Thread Konstantin Khlebnikov


Rafael J. Wysocki wrote:

On Friday, December 14, 2012 02:59:21 PM Konstantin Khlebnikov wrote:

This patch forces the struct cpufreq_stats allocation to complete for all cpus
before registering the CPUFREQ_TRANSITION_NOTIFIER notifier. Otherwise, under
some conditions, cpufreq_stat_notifier_trans() can be called in the middle of
stats allocation; in this case cpufreq_stats_table already exists, but
stat->freq_table is NULL.
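The race can be reproduced deterministically in a small single-threaded model. Suppose a transition arrives after a CPU's stats entry is published but before its freq_table is filled. It hits the NULL pointer unless the transition notifier is registered only after all allocations are complete. Everything here is a hypothetical stand-in (`struct stat_model`, `notifier_trans()`, the injected transition). The actual patch also moves register_hotcpu_notifier() and the cpufreq_update_policy() loop ahead of the transition-notifier registration.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 2

/* Simplified, hypothetical stand-in for struct cpufreq_stats. */
struct stat_model {
	const unsigned int *freq_table;	/* NULL until allocation finishes */
	unsigned int state_num;
};

struct driver_model {
	struct stat_model *table[NCPUS];	/* like the per-cpu stats table */
	bool trans_registered;			/* transition notifier active? */
};

enum { IGNORED = -1, NULL_DEREF = -2 };

/* Models cpufreq_stat_notifier_trans(): only runs once registered. */
static int notifier_trans(const struct driver_model *d, int cpu,
			  unsigned int freq)
{
	const struct stat_model *stat = d->table[cpu];

	if (!d->trans_registered || !stat)
		return IGNORED;
	if (!stat->freq_table)
		return NULL_DEREF;	/* the oops in the report */
	for (unsigned int i = 0; i < stat->state_num; i++)
		if (stat->freq_table[i] == freq)
			return (int)i;
	return IGNORED;
}

/*
 * Build stats for every CPU.  A transition is injected right after CPU 0's
 * entry is published but before its freq_table is filled in.
 */
static int init_with_injected_transition(bool register_first)
{
	static const unsigned int freqs[] = { 800000, 1600000 };
	struct stat_model stats[NCPUS] = { { 0 } };
	struct driver_model d = { { 0 }, false };
	int seen = IGNORED;

	if (register_first)
		d.trans_registered = true;		/* old ordering */

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		d.table[cpu] = &stats[cpu];		/* entry published */
		if (cpu == 0)
			seen = notifier_trans(&d, 0, freqs[1]);
		stats[cpu].freq_table = freqs;		/* allocation done */
		stats[cpu].state_num = 2;
	}

	if (!register_first)
		d.trans_registered = true;		/* patched ordering */

	/* After init, a transition is accounted normally either way. */
	assert(notifier_trans(&d, 1, freqs[1]) == 1);
	return seen;
}
```

With the old ordering the model reports the NULL freq_table. With the patched ordering the early transition is simply ignored.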


I'll queue it up for submission as v3.8 material.

Does it need to be marked as -stable material too?


It's a very old and rare bug. I think you can leave it as is.



Rafael



Signed-off-by: Konstantin Khlebnikov
Cc: Rafael J. Wysocki
Cc: cpufreq
Cc: linux-pm

---

<1>[  363.116198] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[  363.116668] IP: [] cpufreq_stat_notifier_trans+0x64/0xf0 [cpufreq_stats]
<4>[  363.116977] PGD 23177e067 PUD 2349c1067 PMD 0
<4>[  363.117151] Oops:  [#1] SMP
<4>[  363.117151] last sysfs file: /sys/module/freq_table/initstate
<4>[  363.117151] CPU 5
<4>[  363.117151] Modules linked in: cpufreq_stats(+)(U) [a lot] [last unloaded: umc]
<4>[  363.117151]
<4>[  363.117151] Pid: 1690, comm: kondemand/5 veid: 0 Tainted: PWC ---  T 2.6.32-279.5.1.el6-042stab061.7-vz #112 042stab061_7 System manufacturer System Product Name/Crosshair IV Formula
<4>[  363.117151] RIP: 0010:[]  [] cpufreq_stat_notifier_trans+0x64/0xf0 [cpufreq_stats]
<4>[  363.117151] RSP: 0018:880234281920  EFLAGS: 00010246
<4>[  363.117151] RAX: 001e12e8 RBX:  RCX: 002ab980
<4>[  363.117151] RDX: 0004 RSI:  RDI: 0005
<4>[  363.117151] RBP: 880234281940 R08:  R09:
<4>[  363.117151] R10:  R11:  R12: 880218ce7400
<4>[  363.117151] R13:  R14:  R15:
<4>[  363.117151] FS:  7f499ffe0700() GS:88003100() knlGS:
<4>[  363.117151] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
<4>[  363.117151] CR2:  CR3: 000230af7000 CR4: 06e0
<4>[  363.117151] DR0:  DR1:  DR2:
<4>[  363.117151] DR3:  DR6: 0ff0 DR7: 0400
<4>[  363.117151] Process kondemand/5 (pid: 1690, veid: 0, threadinfo 88023428, task 8802330c48c0)
<4>[  363.117151] Stack:
<4>[  363.117151]  810cf4f3 0001  a11a7ac0
<4>[  363.117151]  880234281990 815454a8 880234281c80
<4>[  363.117151]  880234281a10 833be978 833be8e0 0001
<4>[  363.117151] Call Trace:
<4>[  363.117151]  [] ? is_module_text_address+0x23/0x30
<4>[  363.117151]  [] notifier_call_chain+0x58/0xb0
<4>[  363.117151]  [] __srcu_notifier_call_chain+0x5d/0x90
<4>[  363.117151]  [] srcu_notifier_call_chain+0x16/0x20
<4>[  363.117151]  [] cpufreq_notify_transition+0x12a/0x190
<4>[  363.117151]  [] powernowk8_target+0x628/0xb30 [powernow_k8]
<4>[  363.117151]  [] __cpufreq_driver_target+0x8b/0x90
<4>[  363.117151]  [] do_dbs_timer+0x3b8/0x3bc [cpufreq_ondemand]
<4>[  363.117151]  [] ? do_dbs_timer+0x0/0x3bc [cpufreq_ondemand]
<4>[  363.117151]  [] worker_thread+0x264/0x440
<4>[  363.117151]  [] ? worker_thread+0x213/0x440
<4>[  363.117151]  [] ? worker_thread+0x0/0x440
<4>[  363.117151]  [] ? autoremove_wake_function+0x0/0x40
<4>[  363.117151]  [] ? worker_thread+0x0/0x440
<4>[  363.117151]  [] kthread+0x96/0xa0
<4>[  363.117151]  [] child_rip+0xa/0x20
<4>[  363.117151]  [] ? restore_args+0x0/0x30
<4>[  363.117151]  [] ? kthread+0x0/0xa0
<4>[  363.117151]  [] ? child_rip+0x0/0x20
<4>[  363.117151] Code: 89 f9 48 8b 0c cd 20 53 9c 81 4c 8b 24 08 4d 85 e4 74 d3 8b 4a 08 41 8b 54 24 10 45 8b 6c 24 18 85 d2 74 22 49 8b 74 24 28 31 db<3b>  0e 75 10 eb 1a 66 0f 1f 44 00 00 48 63 c3 3b 0c 86 74 0c 83
<1>[  363.117151] RIP  [] cpufreq_stat_notifier_trans+0x64/0xf0 [cpufreq_stats]
<4>[  363.117151]  RSP
<4>[  363.117151] CR2: 
---
  drivers/cpufreq/cpufreq_stats.c |   11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index e40e508..9d7732b 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -364,18 +364,21 @@ static int __init cpufreq_stats_init(void)
if (ret)
return ret;

+   register_hotcpu_notifier(_stat_cpu_notifier);
+   for_each_online_cpu(cpu)
+   cpufreq_update_policy(cpu);
+
ret = cpufreq_register_notifier(_trans_block,
CPUFREQ_TRANSITION_NOTIFIER);
if (ret) {
cpufreq_unregister_notifier(_policy_block,
CPUFREQ_POLICY_NOTIFIER);
+   unregister_hotcpu_notifier(_stat_cpu_notifier);
+

[PATCH v3 5/5] KVM: x86: improve reexecute_instruction

2012-12-14 Thread Xiao Guangrong

The current reexecute_instruction cannot reliably detect failed instruction
emulation. It allows the guest to retry all instructions except those that
access an error pfn.

For example, one such case is nested write protection: the page we want to
write is used as a PDE, but that PDE chains to itself. In this case, we
should stop the emulation and report it to userspace.
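The detection described here relies on the page-fault path recording whether the target gfn is itself one of the guest page tables (`target_gfn_is_pt` in the diff below). The reworked helper has to compute that flag as well, so it walks every level instead of returning at the first match. The following is a simplified, hypothetical model of that loop. `struct walk_model`, `PAGES_PER_HPAGE` and `scan_levels()` stand in for the guest walker, `KVM_PAGES_PER_HPAGE` and `FNAME(is_self_change_mapping)`. The write-permission precheck is omitted, and the sketch returns both flags instead of ORing one into per-vcpu state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed x86 geometry: a level-N page spans 512^(N-1) 4 KiB frames. */
#define PAGES_PER_HPAGE(level) (UINT64_C(1) << (((level) - 1) * 9))

/* Hypothetical, trimmed-down stand-in for struct guest_walker. */
struct walk_model {
	gfn_t gfn;		/* frame targeted by the faulting access */
	int level;		/* level the gfn would be mapped at */
	int max_level;		/* top level of the guest walk */
	gfn_t table_gfn[4];	/* table_gfn[l - 1]: frame of the level-l table */
};

struct scan_result {
	bool self_changed;	/* a huge page at 'level' would cover a table */
	bool target_is_pt;	/* the target frame is itself a table */
};

/*
 * Visit every level rather than returning at the first hit, so both
 * flags come out of one pass.
 */
static struct scan_result scan_levels(const struct walk_model *w)
{
	gfn_t mask = ~(PAGES_PER_HPAGE(w->level) - 1);
	struct scan_result r = { false, false };

	assert(w->level >= 1 && w->max_level <= 4);
	for (int level = w->level; level <= w->max_level; level++) {
		gfn_t diff = w->gfn ^ w->table_gfn[level - 1];

		r.self_changed |= !(diff & mask);	/* same huge-page region */
		r.target_is_pt |= !diff;		/* exactly the same frame */
	}
	return r;
}
```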

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |7 +
 arch/x86/kvm/paging_tmpl.h  |   23 +++-
 arch/x86/kvm/x86.c  |   58 +--
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc87b65..487f0a1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -502,6 +502,13 @@ struct kvm_vcpu_arch {
u64 msr_val;
struct gfn_to_hva_cache data;
} pv_eoi;
+
+   /*
+* Cache the access info when fix page fault then use
+* them to detect unhandeable instruction.
+*/
+   gva_t fault_addr;
+   bool target_gfn_is_pt;
 };

 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0453fa0..b67fab3 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -506,21 +506,27 @@ out_gpte_changed:
  * size to map the gfn which is used as PDPT.
  */
 static bool
-FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
+FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, gva_t addr,
  struct guest_walker *walker, int user_fault)
 {
int level;
gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
+   bool self_changed = false;

if (!(walker->pte_access & ACC_WRITE_MASK ||
  (!is_write_protection(vcpu) && !user_fault)))
return false;

-   for (level = walker->level; level <= walker->max_level; level++)
-   if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
-   return true;
+   vcpu->arch.fault_addr = addr;

-   return false;
+   for (level = walker->level; level <= walker->max_level; level++) {
+   gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
+
+   self_changed |= !(gfn & mask);
+   vcpu->arch.target_gfn_is_pt |= !gfn;
+   }
+
+   return self_changed;
 }

 /*
@@ -548,7 +554,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
int level = PT_PAGE_TABLE_LEVEL;
int force_pt_level;
unsigned long mmu_seq;
-   bool map_writable;
+   bool map_writable, is_self_change_mapping;

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

@@ -576,9 +582,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;
}

+   is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu, addr,
+  , user_fault);
+
if (walker.level >= PT_DIRECTORY_LEVEL)
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
-  || FNAME(is_self_change_mapping)(vcpu, , user_fault);
+  || is_self_change_mapping;
else
force_pt_level = 1;
if (!force_pt_level) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf66169..fc33563 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4756,29 +4756,25 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
 static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 {
gpa_t gpa = cr2;
+   gfn_t gfn;
pfn_t pfn;
-   unsigned int indirect_shadow_pages;
-
-   spin_lock(>kvm->mmu_lock);
-   indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
-   spin_unlock(>kvm->mmu_lock);
-
-   if (!indirect_shadow_pages)
-   return false;

if (!vcpu->arch.mmu.direct_map) {
-   gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
+   /*
+* Write permission should be allowed since only
+* write access need to be emulated.
+*/
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);
+
+   /*
+* If the mapping is invalid in guest, let cpu retry
+* it to generate fault.
+*/
if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
+   return true;
}

-   /*
-* if emulation was due to access to shadowed page table
-* and it failed try to unshadow page and re-enter the
-* guest to let CPU execute the instruction.
-*/
-   if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
-   return true;
+   gfn = gpa_to_gfn(gpa);

[PATCH v3 4/5] KVM: x86: let reexecute_instruction work for tdp

2012-12-14 Thread Xiao Guangrong

Currently, reexecute_instruction refuses to retry any instruction when TDP
is enabled. If nested NPT is used, the emulation may be caused by a shadow
page, which can be fixed by dropping the shadow page.
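Before this patch the function gave up whenever tdp_enabled was set. Afterwards, the early exits depend on whether any indirect (shadowed) guest page tables exist and whether the MMU is direct-mapped. Below is a hypothetical userspace model of that decision order. `struct vcpu_model`, `reexecute_model()` and the sample translators are stand-ins, and `UNMAPPED_GVA` is assumed to be the all-ones value KVM uses. The mmu_lock taken around the read and the later steps of the real function are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gpa_t;
#define UNMAPPED_GVA (~(gpa_t)0)	/* assumed "no translation" marker */

/* Hypothetical stand-in for the vcpu state the check looks at. */
struct vcpu_model {
	unsigned int indirect_shadow_pages;	/* shadowed guest page tables */
	bool direct_map;			/* TDP: cr2 already holds a gpa */
	gpa_t (*gva_to_gpa)(uint64_t gva);	/* guest walk for shadow paging */
};

enum reexec { NO_RETRY, RETRY_FAULT, TRY_UNPROTECT };

/* Decision order of reexecute_instruction() after this patch (sketch). */
static enum reexec reexecute_model(const struct vcpu_model *v, uint64_t cr2,
				   gpa_t *gpa_out)
{
	gpa_t gpa = cr2;

	/* Nothing is shadowed, so unprotecting a page cannot help. */
	if (!v->indirect_shadow_pages)
		return NO_RETRY;

	/* Only shadow paging needs to walk the guest page tables. */
	if (!v->direct_map) {
		assert(v->gva_to_gpa != NULL);
		gpa = v->gva_to_gpa(cr2);
		if (gpa == UNMAPPED_GVA)
			return RETRY_FAULT;	/* let the CPU raise the fault */
	}

	*gpa_out = gpa;
	return TRY_UNPROTECT;	/* real code goes on to unprotect the page */
}

/* Example translators for trying the model out. */
static gpa_t sample_walk(uint64_t gva) { return gva + 0x100000; }
static gpa_t sample_unmapped(uint64_t gva) { (void)gva; return UNMAPPED_GVA; }
```

With `direct_map` set, the guest walk is never called, which is why a direct-mapped vcpu reaches the unprotect step even with `sample_unmapped`.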

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/x86.c |   19 +--
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eccd040..bf66169 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4753,17 +4753,24 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
return r;
 }

-static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
+static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 {
-   gpa_t gpa;
+   gpa_t gpa = cr2;
pfn_t pfn;
+   unsigned int indirect_shadow_pages;
+
+   spin_lock(>kvm->mmu_lock);
+   indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages;
+   spin_unlock(>kvm->mmu_lock);

-   if (tdp_enabled)
+   if (!indirect_shadow_pages)
return false;

-   gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-   if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
+   if (!vcpu->arch.mmu.direct_map) {
+   gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
+   if (gpa == UNMAPPED_GVA)
+   return true; /* let cpu generate fault */
+   }

/*
 * if emulation was due to access to shadowed page table
-- 
1.7.7.6

[PATCH v3 3/5] KVM: x86: clean up reexecute_instruction

2012-12-14 Thread Xiao Guangrong

A small cleanup of reexecute_instruction; also use gpa_to_gfn in
retry_instruction.
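Replacing `gpa >> PAGE_SHIFT` with `gpa_to_gfn(gpa)` is a readability change, because both drop the offset within the page. Here is a standalone sketch assuming 4 KiB pages (a PAGE_SHIFT of 12). The typedefs and both helpers are local re-implementations for illustration, not the KVM headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

#define PAGE_SHIFT 12	/* 4 KiB pages on x86 */

/* Guest physical address -> guest frame number: drop the page offset. */
static inline gfn_t gpa_to_gfn(gpa_t gpa)
{
	return (gfn_t)(gpa >> PAGE_SHIFT);
}

/* Guest frame number -> address of the first byte of that frame. */
static inline gpa_t gfn_to_gpa(gfn_t gfn)
{
	/* No bits may be lost by the shift. */
	assert(gfn < (UINT64_C(1) << (64 - PAGE_SHIFT)));
	return (gpa_t)gfn << PAGE_SHIFT;
}
```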

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/x86.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76f5446..eccd040 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4761,19 +4761,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
if (tdp_enabled)
return false;

+   gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
+   if (gpa == UNMAPPED_GVA)
+   return true; /* let cpu generate fault */
+
/*
 * if emulation was due to access to shadowed page table
 * and it failed try to unshadow page and re-enter the
 * guest to let CPU execute the instruction.
 */
-   if (kvm_mmu_unprotect_page_virt(vcpu, gva))
+   if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
return true;

-   gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-
-   if (gpa == UNMAPPED_GVA)
-   return true; /* let cpu generate fault */
-
/*
 * Do not retry the unhandleable instruction if it faults on the
 * readonly host memory, otherwise it will goto a infinite loop:
@@ -4828,7 +4827,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
if (!vcpu->arch.mmu.direct_map)
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-   kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));

return true;
 }
-- 
1.7.7.6

[PATCH v3 2/5] KVM: MMU: fix infinite fault access retry

2012-12-14 Thread Xiao Guangrong

There are two issues in the current code:
- if the target gfn is used as its own page table, the guest will refault and
  then kvm will use a small page size to map it. We need two #PFs to fix its
  shadow page table.

- sometimes, e.g. when an exception is triggered during a vm-exit caused by a
  #PF (see handle_exception() in vmx.c), we remove all the shadow pages
  shadowed by the target gfn before going into the page fault path, which
  causes an infinite loop:
  delete shadow pages shadowed by the gfn -> try to use a large page size to
  map the gfn -> retry the access -> ...

To fix these, we can adjust the page size early if the target gfn is used as
a page table.
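The check behind this fix compares frame numbers with the low bits of the huge page masked off. If the target gfn and a guest page-table gfn agree on all bits above the huge-page size, a large mapping of the target would also cover the page table, which KVM has to keep write-protected. Below is a minimal sketch of that mask test. `pages_per_hpage()` is assumed to compute what `KVM_PAGES_PER_HPAGE` does on x86, and `same_hpage()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed x86 geometry: level 1 = 4 KiB, 2 = 2 MiB (512 frames), 3 = 1 GiB. */
static uint64_t pages_per_hpage(int level)
{
	assert(level >= 1 && level <= 3);
	return UINT64_C(1) << ((level - 1) * 9);
}

/*
 * True if 'table_gfn' lies inside the huge page that would map 'gfn' at
 * 'level': the two frame numbers differ only in the bits below the
 * huge-page size.
 */
static bool same_hpage(gfn_t gfn, gfn_t table_gfn, int level)
{
	gfn_t mask = ~(pages_per_hpage(level) - 1);

	return !((gfn ^ table_gfn) & mask);
}
```

At level 2, gfn 0x1234 and table gfn 0x1200 share a 512-frame region, so a 2 MiB mapping would be refused and a 4 KiB one used instead.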

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   13 -
 arch/x86/kvm/paging_tmpl.h |   35 ++-
 2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a3c890..54fc61e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2380,15 +2380,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (pte_access & ACC_WRITE_MASK) {

/*
-* There are two cases:
-* - the one is other vcpu creates new sp in the window
-*   between mapping_level() and acquiring mmu-lock.
-* - the another case is the new sp is created by itself
-*   (page-fault path) when guest uses the target gfn as
-*   its page table.
-* Both of these cases can be fixed by allowing guest to
-* retry the access, it will refault, then we can establish
-* the mapping by using small page.
+* Other vcpu creates new sp in the window between
+* mapping_level() and acquiring mmu-lock. We can
+* allow guest to retry the access, the mapping can
+* be fixed if guest refault.
 */
if (level > PT_PAGE_TABLE_LEVEL &&
has_wrprotected_page(vcpu->kvm, gfn, level))
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index c1e01b6..0453fa0 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -491,6 +491,38 @@ out_gpte_changed:
return 0;
 }

+ /*
+ * To see whether the mapped gfn can write its page table in the current
+ * mapping.
+ *
+ * It is the helper function of FNAME(page_fault). When guest uses large page
+ * size to map the writable gfn which is used as current page table, we should
+ * force kvm to use small page size to map it because new shadow page will be
+ * created when kvm establishes shadow page table that stop kvm using large
+ * page size. Do it early can avoid unnecessary #PF and emulation.
+ *
+ * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
+ * since the PDPT is always shadowed, that means, we can not use large page
+ * size to map the gfn which is used as PDPT.
+ */
+static bool
+FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
+ struct guest_walker *walker, int user_fault)
+{
+   int level;
+   gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
+
+   if (!(walker->pte_access & ACC_WRITE_MASK ||
+ (!is_write_protection(vcpu) && !user_fault)))
+   return false;
+
+   for (level = walker->level; level <= walker->max_level; level++)
+   if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
+   return true;
+
+   return false;
+}
+
 /*
  * Page fault handler.  There are several causes for a page fault:
  *   - there is no shadow pte for the guest pte
@@ -545,7 +577,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
}

if (walker.level >= PT_DIRECTORY_LEVEL)
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
+   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
+  || FNAME(is_self_change_mapping)(vcpu, , user_fault);
else
force_pt_level = 1;
if (!force_pt_level) {
-- 
1.7.7.6

[PATCH v3 1/5] KVM: MMU: fix Dirty bit missed if CR0.WP = 0

2012-12-14 Thread Xiao Guangrong

If the write-fault access is from the supervisor and CR0.WP is not set on the
vcpu, kvm fixes it by adjusting the pte access: it sets the W bit on the pte
and clears the U bit. This is how kvm may change the pte access from readonly
to writable.

Unfortunately, the pte access is also the access of the 'direct' shadow page
table, i.e. direct sp.role.access = pte_access, so we will create a writable
spte entry on a readonly shadow page table. As a result, the Dirty bit is not
tracked when two guest ptes point to the same large page. Note that there is
no impact other than on the Dirty bit, since cr0.wp is encoded into sp.role.

It can be fixed by adjusting the pte access before establishing the shadow
page table. After that, no mmu-specific code remains in the common function,
and two parameters can be dropped from set_spte.
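The fix-up that the patch moves ahead of shadow-page creation can be written as a pure function of the fault and control-register state. That way, sp.role.access and the spte are derived from the same value. This is a minimal sketch under stated assumptions. The mask values are assumed to follow KVM's x86 definitions, `adjust_pte_access()` is a hypothetical name, and the paging_tmpl.h hunk that carries the real code is cut off in this archive. The SMEP handling mirrors the comment in the code removed from set_spte().

```c
#include <assert.h>
#include <stdbool.h>

/* Access bits assumed to match KVM's x86 definitions. */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u	/* same bit as PT_WRITABLE_MASK */
#define ACC_USER_MASK  4u	/* same bit as PT_USER_MASK */
#define ACC_ALL        (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)

/*
 * Hypothetical helper.  A supervisor write to a read-only pte with
 * CR0.WP = 0 is allowed by the CPU, so grant W but drop U (user mode
 * must still not write).  With SMEP enabled the page, now treated as a
 * kernel page, must not be executable by the kernel either.
 */
static unsigned int adjust_pte_access(unsigned int pte_access, bool write_fault,
				      bool user_fault, bool cr0_wp, bool smep)
{
	assert((pte_access & ~ACC_ALL) == 0);

	if (!write_fault || user_fault || cr0_wp ||
	    (pte_access & ACC_WRITE_MASK))
		return pte_access;	/* no fix-up needed */

	pte_access |= ACC_WRITE_MASK;
	pte_access &= ~ACC_USER_MASK;
	if (smep)
		pte_access &= ~ACC_EXEC_MASK;
	return pte_access;
}
```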

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   47 ---
 arch/x86/kvm/paging_tmpl.h |   30 +++
 2 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 01d7c2a..2a3c890 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2342,8 +2342,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 }

 static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-   unsigned pte_access, int user_fault,
-   int write_fault, int level,
+   unsigned pte_access, int level,
gfn_t gfn, pfn_t pfn, bool speculative,
bool can_unsync, bool host_writable)
 {
@@ -2378,9 +2377,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

spte |= (u64)pfn << PAGE_SHIFT;

-   if ((pte_access & ACC_WRITE_MASK)
-   || (!vcpu->arch.mmu.direct_map && write_fault
-   && !is_write_protection(vcpu) && !user_fault)) {
+   if (pte_access & ACC_WRITE_MASK) {

/*
 * There are two cases:
@@ -2399,19 +2396,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,

spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;

-   if (!vcpu->arch.mmu.direct_map
-   && !(pte_access & ACC_WRITE_MASK)) {
-   spte &= ~PT_USER_MASK;
-   /*
-* If we converted a user page to a kernel page,
-* so that the kernel can write to it when cr0.wp=0,
-* then we should prevent the kernel from executing it
-* if SMEP is enabled.
-*/
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
-   spte |= PT64_NX_MASK;
-   }
-
/*
 * Optimization: for pte sync, if spte was writable the hash
 * lookup is unnecessary (and expensive). Write protection
@@ -2442,18 +2426,15 @@ done:

 static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 unsigned pt_access, unsigned pte_access,
-int user_fault, int write_fault,
-int *emulate, int level, gfn_t gfn,
-pfn_t pfn, bool speculative,
-bool host_writable)
+int write_fault, int *emulate, int level, gfn_t gfn,
+pfn_t pfn, bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;

-   pgprintk("%s: spte %llx access %x write_fault %d"
-" user_fault %d gfn %llx\n",
+   pgprintk("%s: spte %llx access %x write_fault %d gfn %llx\n",
 __func__, *sptep, pt_access,
-write_fault, user_fault, gfn);
+write_fault, gfn);

if (is_rmap_spte(*sptep)) {
/*
@@ -2477,9 +2458,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
was_rmapped = 1;
}

-   if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
- level, gfn, pfn, speculative, true,
- host_writable)) {
+   if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
+ true, host_writable)) {
if (write_fault)
*emulate = 1;
kvm_mmu_flush_tlb(vcpu);
@@ -2571,10 +2551,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;

for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, ACC_ALL,
-access, 0, 0, NULL,
-sp->role.level, gfn,
-page_to_pfn(pages[i]), true, true);
+   mmu_set_spte(vcpu, start, ACC_ALL, access, 0, NULL,
+sp->role.level, gfn, page_to_pfn(pages[i]),
+true, true);

return 0;
 }
@@ -2636,8 +2615,8 @@

[PATCH v3 0/5] KVM: x86: improve reexecute_instruction

2012-12-14 Thread Xiao Guangrong

Changelog:
- do not change pte access for mmio access
- fix a newly exposed bug: the Dirty bit is not tracked if CR0.WP = 0
- cache some information on the page fault path and use it to detect
  unhandleable instructions, as suggested by Marcelo

I will add the two testcases for unhandleable instructions after figuring
out a way to report the unemulatable error to the guest.

Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-14 Thread Mike Galbraith

On Fri, 2012-12-14 at 11:43 +0100, Vincent Guittot wrote: 
> On 14 December 2012 08:45, Mike Galbraith  wrote:
> > On Fri, 2012-12-14 at 14:36 +0800, Alex Shi wrote:
> >> On 12/14/2012 12:45 PM, Mike Galbraith wrote:
> >> >> > Do you have further ideas for buddy cpu on such example?
> >> >>> > >
> >> >>> > > Which kind of sched_domain configuration have you for such system ?
> >> >>> > > and how many sched_domain level have you ?
> >> >> >
> >> >> > it is general X86 domain configuration. with 4 levels,
> >> >> > sibling/core/cpu/numa.
> >> > CPU is a bug that slipped into domain degeneration.  You should have
> >> > SIBLING/MC/NUMA (chasing that down is on todo).
> >>
> >> Maybe.
> >> the CPU/NUMA is different on domain flags, CPU has SD_PREFER_SIBLING.
> >
> > What I noticed during (an unrelated) bisection on a 40 core box was
> > domains going from so..
> >
> > 3.4.0-bisect (virgin)
> > [5.056214] CPU0 attaching sched-domain:
> > [5.065009]  domain 0: span 0,32 level SIBLING
> > [5.075011]   groups: 0 (cpu_power = 589) 32 (cpu_power = 589)
> > [5.088381]   domain 1: span 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
> > [5.107669]groups: 0,32 (cpu_power = 1178)  4,36 (cpu_power = 1178)  8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
> >  16,48 (cpu_power = 1177) 20,52 (cpu_power = 1178) 24,56 (cpu_power = 1177) 28,60 (cpu_power = 1177)
> >  64,72 (cpu_power = 1176) 68,76 (cpu_power = 1176)
> > [5.162115]domain 2: span 0-79 level NODE
> > [5.171927] groups: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11773)
> > 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11772)
> > 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11773)
> > 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11770)
> >
> > ..to so, which looks a little bent.  CPU and MC have identical spans, so
> > CPU should have gone away, as it used to do.
> >
> > 3.6.0-bisect (virgin)
> > [3.978338] CPU0 attaching sched-domain:
> > [3.987125]  domain 0: span 0,32 level SIBLING
> > [3.997125]   groups: 0 (cpu_power = 588) 32 (cpu_power = 589)
> > [4.010477]   domain 1: span 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
> > [4.029748]groups: 0,32 (cpu_power = 1177)  4,36 (cpu_power = 1177)  8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
> >  16,48 (cpu_power = 1178) 20,52 (cpu_power = 1178) 24,56 (cpu_power = 1178) 28,60 (cpu_power = 1178)
> >  64,72 (cpu_power = 1178) 68,76 (cpu_power = 1177)
> > [4.084143]domain 2: span 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level CPU
> > [4.103796] groups: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)
> > [4.124373] domain 3: span 0-79 level NUMA
> > [4.134369]  groups: 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)
> > 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11778)
> > 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11778)
> > 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11780)
> >
> 
> Thanks, that's an interesting example of a NUMA topology.
> 
> Regarding your sched_domain difference:
> On 3.4, SD_PREFER_SIBLING was set for both the MC and CPU levels thanks to
> sd_balance_for_mc_power and sd_balance_for_package_power.
> On 3.6, SD_PREFER_SIBLING is only set for the CPU level, and this flag
> difference with the MC level prevents the destruction of the CPU
> sched_domain during degeneration.
> 
> We may need to set SD_PREFER_SIBLING for the MC level.
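The degeneration rule referred to above works roughly as follows. A parent domain whose span equals its child's is dropped only if it carries no flag the child lacks, so a flag such as SD_PREFER_SIBLING set on CPU but not on MC keeps the CPU level alive. This is a simplified model, not sd_parent_degenerate() itself. The flag values and `struct dom` are hypothetical, spans are toy bitmasks instead of cpumasks, and special cases such as a parent with a single group are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the kernel's SD_* values are different. */
#define F_LOAD_BALANCE   (1u << 0)
#define F_SHARE_PKG_RES  (1u << 1)
#define F_PREFER_SIBLING (1u << 2)

struct dom {
	uint64_t span;		/* one bit per CPU (toy-sized cpumask) */
	unsigned int flags;
};

/*
 * Simplified parent-degeneration rule: a parent covering exactly the same
 * CPUs as its child is redundant only if it has no flag the child lacks.
 */
static bool parent_degenerate(const struct dom *child, const struct dom *parent)
{
	assert(child != NULL && parent != NULL);
	if (child->span != parent->span)
		return false;
	return (parent->flags & ~child->flags) == 0;
}
```

Setting the prefer-sibling bit on the child (MC) as well makes the identical-span parent (CPU) degenerate, which matches the suggestion above.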

Ah, that explains the oddity. (todo--).

Hm, it seems changing flags should trigger a rebuild. (todo++, drat).

-Mike

Re: [PATCH] input: add MELFAS mms144 touchscreen driver

2012-12-14 Thread Nikolay Epifanov

On 15.12.2012 05:46, Joonyoung Shim wrote:
> Hi,
> 
> On Thursday, December 13, 2012, Nikolay Epifanov wrote:
> 
> This is an initial driver for MELFAS touchscreen chip mms144.
> 
> Signed-off-by: Nikolay Epifanov 
> ---
> I don't know whether a single driver could be used for both mms114 and
> mms144; I couldn't find datasheets for either of them.
> 
> 
> The touch data processing logic of this driver is almost the same as that
> of the mms114 driver, and the offsets of many registers are also the same.
> I think you can merge this into the mms114 driver.
> 
> Thanks.
> 
> 
> There are two firmware images available under a redistribution license
> from the AOSP tree, for FPCB revisions 3.2 and 3.1. I am not sure whether
> they are Melfas originals or were modified by Samsung. For reference, they
> are mms144_ts_rev3*.fw at
> 
> https://android.googlesource.com/device/samsung/tuna/+/214d003a47e7fe2962df667c5d65bce92a21a40e/
> 
> mux_fw_flash(bool) from the platform data switches the pins between
> I2C and GPIO modes.
> 
> The driver has been tested on a Samsung Galaxy Nexus, except for
> suspend/resume.
> 
> Applies on top of the next branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
> 
>   drivers/input/touchscreen/Kconfig  |   14 +
>   drivers/input/touchscreen/Makefile |1 +
>   drivers/input/touchscreen/mms144.c |  821
>   include/linux/i2c/mms144.h |   34 ++
>   4 files changed, 870 insertions(+)
> 
> diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig
> index b93b598..78d9cd3 100644
> --- a/drivers/input/touchscreen/Kconfig
> +++ b/drivers/input/touchscreen/Kconfig
> @@ -369,6 +369,20 @@ config TOUCHSCREEN_MMS114
>To compile this driver as a module, choose M here: the
>module will be called mms114.
> 
> +config TOUCHSCREEN_MMS144
> +   tristate "MELFAS MMS144 touchscreen"
> +   depends on I2C
> +   help
> + Say Y here if you have the MELFAS MMS144 touchscreen controller
> + chip in your system.
> + Such kind of chip can be found in Samsung Galaxy Nexus
> + touchscreens.
> +
> + If unsure, say N.
> +
> + To compile this driver as a module, choose M here: the
> + module will be called mms144.
> +
>   config TOUCHSCREEN_MTOUCH
>  tristate "MicroTouch serial touchscreens"
>  select SERIO
> diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile
> index 5f949c0..cfbe87c 100644
> --- a/drivers/input/touchscreen/Makefile
> +++ b/drivers/input/touchscreen/Makefile
> @@ -39,6 +39,7 @@ obj-$(CONFIG_TOUCHSCREEN_MC13783) += mc13783_ts.o
>   obj-$(CONFIG_TOUCHSCREEN_MCS5000)  += mcs5000_ts.o
>   obj-$(CONFIG_TOUCHSCREEN_MIGOR)+= migor_ts.o
>   obj-$(CONFIG_TOUCHSCREEN_MMS114)   += mms114.o
> +obj-$(CONFIG_TOUCHSCREEN_MMS144)   += mms144.o
>   obj-$(CONFIG_TOUCHSCREEN_MTOUCH)   += mtouch.o
>   obj-$(CONFIG_TOUCHSCREEN_MK712)+= mk712.o
>   obj-$(CONFIG_TOUCHSCREEN_HP600)+= hp680_ts_input.o
> diff --git a/drivers/input/touchscreen/mms144.c b/drivers/input/touchscreen/mms144.c
> new file mode 100644
> index 000..3bb84d0
> --- /dev/null
> +++ b/drivers/input/touchscreen/mms144.c
> @@ -0,0 +1,821 @@
> +/*
> + * mms144.c - Touchscreen driver for Melfas MMS144 touch controllers
> + *
> + * Copyright (C) 2011 Google Inc.
> + * Author: Dima Zavin 
> + * Simon Wilson 
> + * Copyright (C) 2012 Nikolay Epifanov 
> + *
> + * ISP reflashing code based on original code from Melfas.
> + *
> + * This program is free software; you can redistribute  it and/or modify it
> + * under  the terms of  the GNU General  Public License as published by the
> + * Free Software Foundation;  either version 2 of the  License, or (at your
> + * option) any later version.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MAX_FINGERS 10
> +#define MAX_WIDTH  30
> +#define MAX_PRESSURE   255
> +#define FINGER_EVENT_SZ 6
> +
> +/* Registers */
> +#define MMS_MODE_CONTROL   0x01
> +#define MMS_XYRES_HI   0x02
> +#define MMS_XRES_LO 0x03
> +#define MMS_YRES_LO 0x04
> +
> +#define MMS_INPUT_EVENT_PKT_SZ 0x0F
> +#define MMS_INPUT_EVENT0

[PATCH v2 1/6] staging/fwserial: Refine Kconfig help text

2012-12-14 Thread Peter Hurley

Users should be informed up front that this is currently a Linux-only
affair.

Signed-off-by: Peter Hurley 
---
 drivers/staging/fwserial/Kconfig | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/fwserial/Kconfig b/drivers/staging/fwserial/Kconfig
index 580406c..b2f8331 100644
--- a/drivers/staging/fwserial/Kconfig
+++ b/drivers/staging/fwserial/Kconfig
@@ -3,7 +3,9 @@ config FIREWIRE_SERIAL
depends on FIREWIRE
help
   This enables TTY over IEEE 1394, providing high-speed serial
- connectivity to cabled peers.
+ connectivity to cabled peers. This driver implements an
+ ad-hoc transport protocol and is currently limited to
+ Linux-to-Linux communication.
 
  To compile this driver as a module, say M here:  the module will
  be called firewire-serial.
-- 
1.8.0.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2 2/6] staging/fwserial: Remove bandwidth limit logic

2012-12-14 Thread Peter Hurley

Self-limiting asynchronous bandwidth (via reducing the payload)
is not necessary and does not work, because
 1) asynchronous traffic will absorb all available bandwidth (less that
being used for isochronous traffic)
 2) isochronous arbitration always wins.

Signed-off-by: Peter Hurley 
---
 drivers/staging/fwserial/fwserial.c |  3 ---
 drivers/staging/fwserial/fwserial.h | 15 ++-
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/fwserial/fwserial.c b/drivers/staging/fwserial/fwserial.c
index 61ee290..be5db8a 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -40,12 +40,10 @@ static int num_ttys = 4;/* # of std ttys to create per fw_card*/
/* - doubles as loopback port index   */
 static bool auto_connect = true;/* try to VIRT_CABLE to every peer */
 static bool create_loop_dev = true; /* create a loopback device for each card */
-bool limit_bw; /* limit async bandwidth to 20% of max*/
 
 module_param_named(ttys, num_ttys, int, S_IRUGO | S_IWUSR);
 module_param_named(auto, auto_connect, bool, S_IRUGO | S_IWUSR);
 module_param_named(loop, create_loop_dev, bool, S_IRUGO | S_IWUSR);
-module_param(limit_bw, bool, S_IRUGO | S_IWUSR);
 
 /*
  * Threshold below which the tty is woken for writing
@@ -2940,4 +2938,3 @@ MODULE_DEVICE_TABLE(ieee1394, fwserial_id_table);
 MODULE_PARM_DESC(ttys, "Number of ttys to create for each local firewire node");
 MODULE_PARM_DESC(auto, "Auto-connect a tty to each firewire node discovered");
 MODULE_PARM_DESC(loop, "Create a loopback device, fwloop, with ttys");
-MODULE_PARM_DESC(limit_bw, "Limit bandwidth utilization to 20%.");
diff --git a/drivers/staging/fwserial/fwserial.h b/drivers/staging/fwserial/fwserial.h
index 8b572ed..cb0eea0 100644
--- a/drivers/staging/fwserial/fwserial.h
+++ b/drivers/staging/fwserial/fwserial.h
@@ -351,7 +351,6 @@ struct fw_serial {
 #define TTY_DEV_NAME   "fwtty" /* ttyFW was taken   */
 static const char tty_dev_name[] =  TTY_DEV_NAME;
 static const char loop_dev_name[] = "fwloop";
-extern bool limit_bw;
 
 struct tty_driver *fwtty_driver;
 
@@ -370,18 +369,16 @@ static inline void fwtty_bind_console(struct fwtty_port *port,
 
 /*
  * Returns the max send async payload size in bytes based on the unit device
- * link speed - if set to limit bandwidth to max 20%, use lookup table
+ * link speed. Self-limiting asynchronous bandwidth (via reducing the payload)
+ * is not necessary and does not work, because
+ *   1) asynchronous traffic will absorb all available bandwidth (less that
+ * being used for isochronous traffic)
+ *   2) isochronous arbitration always wins.
  */
 static inline int link_speed_to_max_payload(unsigned speed)
 {
-   static const int max_async[] = { 307, 614, 1229, 2458, 4916, 9832, };
-   BUILD_BUG_ON(ARRAY_SIZE(max_async) - 1 != SCODE_3200);
-
speed = clamp(speed, (unsigned) SCODE_100, (unsigned) SCODE_3200);
-   if (limit_bw)
-   return max_async[speed];
-   else
-   return 1 << (speed + 9);
+   return 1 << (speed + 9);
 }
 
 #endif /* _FIREWIRE_FWSERIAL_H */
-- 
1.8.0.1
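
For comparison, here is a minimal user-space sketch of what this patch leaves in link_speed_to_max_payload(), shown next to the lookup table it removes. The SCODE_* values are assumed to match <linux/firewire-constants.h> (S100 = 0 through S3200 = 5). The helper names max_payload() and print_payload_table() are illustrative and do not come from fwserial.

```c
#include <stdio.h>

/* Speed codes; assumed to match <linux/firewire-constants.h>. */
enum { SCODE_100, SCODE_200, SCODE_400, SCODE_800, SCODE_1600, SCODE_3200 };

/* Removed by this patch: table used only when limit_bw was set. */
static const int max_async_limited[] = { 307, 614, 1229, 2458, 4916, 9832 };

/* Kept: clamp to the known speed codes, then 2^(speed + 9) bytes. */
int max_payload(unsigned int speed)
{
	if (speed > SCODE_3200)
		speed = SCODE_3200;
	return 1 << (speed + 9);
}

/* Print both columns for S100..S3200. */
void print_payload_table(void)
{
	for (unsigned int s = SCODE_100; s <= SCODE_3200; s++)
		printf("S%-4d limited=%5d now=%5d\n", 100 << s,
		       max_async_limited[s], max_payload(s));
}
```

At S100 the limited table allowed 307 bytes per packet. The remaining formula gives 512.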


[PATCH v2 3/6] staging/fwserial: Limit tx/rx to 1394-2008 spec maximum

2012-12-14 Thread Peter Hurley

Per this conversation https://lkml.org/lkml/2012/11/27/587
limit the maximum transmission to the IEEE 1394-2008 specification
maximum size of 4096 bytes for asynchronous packets.

Signed-off-by: Peter Hurley 
---
 drivers/staging/fwserial/TODO   | 3 ---
 drivers/staging/fwserial/fwserial.c | 8 
 drivers/staging/fwserial/fwserial.h | 4 ++--
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/fwserial/TODO b/drivers/staging/fwserial/TODO
index 7269005..ffe47d1 100644
--- a/drivers/staging/fwserial/TODO
+++ b/drivers/staging/fwserial/TODO
@@ -12,9 +12,6 @@ TODOs
 1. This driver uses the same unregistered vendor id that the firewire core does
  (0xd00d1e). Perhaps this could be exposed as a define in
  firewire-constants.h?
-2. MAX_ASYNC_PAYLOAD needs to be publicly exposed by core/ohci
-   - otherwise how will this driver know the max size of address window to
- open for one packet write?
 3. Maybe device_max_receive() and link_speed_to_max_payload() should be
  taken up by the firewire core?
 4. To avoid dropping rx data while still limiting the maximum buffering,
diff --git a/drivers/staging/fwserial/fwserial.c b/drivers/staging/fwserial/fwserial.c
index be5db8a..db1378d 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -174,10 +174,11 @@ static void dump_profile(struct seq_file *m, struct stats *stats)
 #define dump_profile(m, stats)
 #endif
 
-/* Returns the max receive packet size for the given card */
+/* Returns the max receive packet size for the given node */
 static inline int device_max_receive(struct fw_device *fw_device)
 {
-   return 1 <<  (clamp_t(int, fw_device->max_rec, 8U, 13U) + 1);
+   /* see IEEE 1394-2008 table 8-8 */
+   return 1 <<  (clamp_t(int, fw_device->max_rec, 8U, 11U) + 1);
 }
 
 static void fwtty_log_tx_error(struct fwtty_port *port, int rcode)
@@ -1683,8 +1684,7 @@ static void fwserial_virt_plug_complete(struct fwtty_peer *peer,
 
/* reconfigure tx_fifo optimally for this peer */
spin_lock_bh(&port->lock);
-   port->max_payload = min3(peer->max_payload, peer->fifo_len,
-MAX_ASYNC_PAYLOAD);
+   port->max_payload = min(peer->max_payload, peer->fifo_len);
dma_fifo_change_tx_limit(&port->tx_fifo, port->max_payload);
spin_unlock_bh(>port->lock);
 
diff --git a/drivers/staging/fwserial/fwserial.h b/drivers/staging/fwserial/fwserial.h
index cb0eea0..953ece6 100644
--- a/drivers/staging/fwserial/fwserial.h
+++ b/drivers/staging/fwserial/fwserial.h
@@ -377,8 +377,8 @@ static inline void fwtty_bind_console(struct fwtty_port *port,
  */
 static inline int link_speed_to_max_payload(unsigned speed)
 {
-   speed = clamp(speed, (unsigned) SCODE_100, (unsigned) SCODE_3200);
-   return 1 << (speed + 9);
+   /* Max async payload is 4096 - see IEEE 1394-2008 tables 6-4, 16-18 */
+   return min(512 << speed, 4096);
 }
 
 #endif /* _FIREWIRE_FWSERIAL_H */
-- 
1.8.0.1
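
A quick way to sanity-check the new limits is to evaluate both helpers in user space. This sketch makes a few assumptions. Speed codes 0..5 stand for S100..S3200, as in <linux/firewire-constants.h>. MAX_ASYNC_PAYLOAD here is a local definition of the 4096-byte cap, not the driver's macro. clamp_int() is a stand-in for the kernel's clamp_t().

```c
/* Local definition of the 4096-byte IEEE 1394-2008 async payload cap. */
#define MAX_ASYNC_PAYLOAD 4096

/* Stand-in for the kernel's clamp_t(int, v, lo, hi). */
static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Same result as min(512 << speed, 4096) for speed codes 0..5, but
 * written so that an out-of-range speed cannot overflow the shift.
 */
int link_speed_to_max_payload(unsigned int speed)
{
	return speed >= 3 ? MAX_ASYNC_PAYLOAD : 512 << speed;
}

/* max_rec encodes 2^(max_rec + 1) bytes; clamping to 8..11 gives 512..4096. */
int device_max_receive(int max_rec)
{
	return 1 << (clamp_int(max_rec, 8, 11) + 1);
}
```

With these helpers, every speed from S800 up reports the same 4096-byte payload.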


[PATCH v2 4/6] staging/fwserial: Update TODO file per reviewer comments

2012-12-14 Thread Peter Hurley

Pursuant to this review https://lkml.org/lkml/2012/11/12/500
by Stefan Richter, update the TODO file.
- Clarify purpose of TODO file
- Remove firewire item #4. As discussed in this conversation
  https://lkml.org/lkml/2012/11/13/564 knowing the AR buffer size
  is not a hard requirement. The required rx buffer size can be
  determined experimentally.
- Remove firewire item #5. This was a private note for further
  experimentation.
- Change firewire item #1. Change suggested header from uapi header
  to kernel-only header.

Signed-off-by: Peter Hurley 
---
 drivers/staging/fwserial/TODO | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/fwserial/TODO b/drivers/staging/fwserial/TODO
index ffe47d1..8dae8fb 100644
--- a/drivers/staging/fwserial/TODO
+++ b/drivers/staging/fwserial/TODO
@@ -1,5 +1,5 @@
-TODOs
--
+TODOs prior to this driver moving out of staging
+
 1. Implement retries for RCODE_BUSY, RCODE_NO_ACK and RCODE_SEND_ERROR
- I/O is handled asynchronously which presents some issues when error
  conditions occur.
@@ -11,14 +11,9 @@ TODOs
 -- Issues with firewire stack --
 1. This driver uses the same unregistered vendor id that the firewire core does
  (0xd00d1e). Perhaps this could be exposed as a define in
- firewire-constants.h?
+ firewire.h?
 3. Maybe device_max_receive() and link_speed_to_max_payload() should be
  taken up by the firewire core?
-4. To avoid dropping rx data while still limiting the maximum buffering,
- the size of the AR context must be known. How to expose this to drivers?
-5. Explore if bigger AR context will reduce RCODE_BUSY responses
-   (or auto-grow to certain max size -- but this would require major surgery
-as the current AR is contiguously mapped)
 
 -- Issues with TTY core --
   1. Hack for alternate device name scheme
-- 
1.8.0.1


[PATCH v2 5/6] staging/fwserial: Assume firmware is OHCI-compliant

2012-12-14 Thread Peter Hurley

Devices which are OHCI v1.0/ v1.1/ v1.2-draft compliant or
RFC 2734 compliant are required by specification to support
max_rec of 8 (512 bytes) or more. Accept reported value.

Signed-off-by: Peter Hurley 
---
 drivers/staging/fwserial/fwserial.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fwserial/fwserial.c b/drivers/staging/fwserial/fwserial.c
index db1378d..59e90e6 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -174,11 +174,15 @@ static void dump_profile(struct seq_file *m, struct stats *stats)
 #define dump_profile(m, stats)
 #endif
 
-/* Returns the max receive packet size for the given node */
+/*
+ * Returns the max receive packet size for the given node
+ * Devices which are OHCI v1.0/ v1.1/ v1.2-draft or RFC 2734 compliant
+ * are required by specification to support max_rec of 8 (512 bytes) or more.
+ */
 static inline int device_max_receive(struct fw_device *fw_device)
 {
/* see IEEE 1394-2008 table 8-8 */
-   return 1 <<  (clamp_t(int, fw_device->max_rec, 8U, 11U) + 1);
+   return min(2 << fw_device->max_rec, 4096);
 }
 
 static void fwtty_log_tx_error(struct fwtty_port *port, int rcode)
-- 
1.8.0.1
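
The only behavioral change from the 3/6 version of device_max_receive() is the missing lower bound. Here is a user-space sketch comparing the two. It assumes max_rec is the 4-bit value held in fw_device->max_rec (0..15). The _old/_new helper names, min_int() and clamp_int() are hypothetical stand-ins.

```c
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Patch 3/6 version: max_rec forced into [8, 11]. */
int device_max_receive_old(int max_rec)
{
	return 1 << (clamp_int(max_rec, 8, 11) + 1);
}

/* This patch: trust the reported max_rec, still cap at 4096 bytes. */
int device_max_receive_new(int max_rec)
{
	return min_int(2 << max_rec, 4096); /* 2 << n == 2^(n + 1) */
}
```

For any max_rec of 8 or more, the two versions agree. A non-compliant device reporting max_rec 7 would now get 256 bytes instead of being bumped to 512.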


[PATCH v2 6/6] staging/fwserial: Drop suggestion for helper fn integration

2012-12-14 Thread Peter Hurley

The firewire core does not require or want the suggested helper fns;
drop suggestion from TODO file.

Signed-off-by: Peter Hurley 
---
 drivers/staging/fwserial/TODO | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/fwserial/TODO b/drivers/staging/fwserial/TODO
index 8dae8fb..dc61d97 100644
--- a/drivers/staging/fwserial/TODO
+++ b/drivers/staging/fwserial/TODO
@@ -12,8 +12,6 @@ TODOs prior to this driver moving out of staging
 1. This driver uses the same unregistered vendor id that the firewire core does
  (0xd00d1e). Perhaps this could be exposed as a define in
  firewire.h?
-3. Maybe device_max_receive() and link_speed_to_max_payload() should be
- taken up by the firewire core?
 
 -- Issues with TTY core --
   1. Hack for alternate device name scheme
-- 
1.8.0.1


[PATCH v2 0/5] staging/fwserial: Address reviewer comments

2012-12-14 Thread Peter Hurley

Overdue respin.
v2 changes:
   Don't use 'card' when referring to firewire node
   Removed lower clamp in link_speed_to_max_payload()
   Ripped out the bandwidth limit logic
   Drop suggestion to integrate link_speed_to_max_payload()
 & device_max_receive()
   Note required minimum max_rec value for OHCI-compliant devices

Peter Hurley (6):
  staging/fwserial: Refine Kconfig help text
  staging/fwserial: Remove bandwidth limit logic
  staging/fwserial: Limit tx/rx to 1394-2008 spec maximum
  staging/fwserial: Update TODO file per reviewer comments
  staging/fwserial: Assume firmware is OHCI-compliant
  staging/fwserial: Drop suggestion for helper fn integration

 drivers/staging/fwserial/Kconfig|  4 +++-
 drivers/staging/fwserial/TODO   | 16 +++-
 drivers/staging/fwserial/fwserial.c | 15 ---
 drivers/staging/fwserial/fwserial.h | 17 +++--
 4 files changed, 21 insertions(+), 31 deletions(-)

-- 
1.8.0.1


Re: mmotm 2012-12-14-17-51 uploaded (staging/sb105x)

2012-12-14 Thread Steven Rostedt

On Fri, 2012-12-14 at 21:29 -0800, Randy Dunlap wrote:
> On 12/14/12 17:52, a...@linux-foundation.org wrote:
> > The mm-of-the-moment snapshot 2012-12-14-17-51 has been uploaded to
> > 
> >http://www.ozlabs.org/~akpm/mmotm/
> > 
> > mmotm-readme.txt says
> > 
> > README for mm-of-the-moment:
> > 
> > http://www.ozlabs.org/~akpm/mmotm/
> > 
> > This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> > more than once a week.
> > 
> 
> 
> (not from mmotm patches)
> 
> When CONFIG_PARPORT is not enabled:
> 
> drivers/built-in.o: In function `multi_init':
> sb_pci_mp.c:(.init.text+0x162ac): undefined reference to 
> `parport_pc_probe_port'
> 

I sent out a fix today. Thanks,

https://lkml.org/lkml/2012/12/14/250

-- Steve



Re: mmotm 2012-12-14-17-51 uploaded (staging/sb105x)

2012-12-14 Thread Randy Dunlap

On 12/14/12 17:52, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2012-12-14-17-51 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> http://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.
> 


(not from mmotm patches)

When CONFIG_PARPORT is not enabled:

drivers/built-in.o: In function `multi_init':
sb_pci_mp.c:(.init.text+0x162ac): undefined reference to `parport_pc_probe_port'



-- 
~Randy

[PATCH 2/3] arch/tile: implement arch_ptrace using user_regset on tilegx

2012-12-14 Thread Simon Marchi

This patch changes arch_ptrace on tilegx so that it uses the user_regset
to implement the PTRACE_GETREGS and PTRACE_SETREGS operations.

The ifdefs and the old code can be removed when user_regset support for
the older architectures is there.

Signed-off-by: Simon Marchi 
---
 arch/tile/kernel/ptrace.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/tile/kernel/ptrace.c b/arch/tile/kernel/ptrace.c
index 0e68d06..9435dd1 100644
--- a/arch/tile/kernel/ptrace.c
+++ b/arch/tile/kernel/ptrace.c
@@ -196,18 +196,28 @@ long arch_ptrace(struct task_struct *child, long request,
break;
 
case PTRACE_GETREGS:  /* Get all registers from the child. */
+#ifdef __tilegx__
+   ret = copy_regset_to_user(child, &tile_user_regset_view, REGSET_GPR,
+ 0, sizeof(struct pt_regs), datap);
+#else /* __tilegx__ */
if (copy_to_user(datap, getregs(child, ),
 sizeof(struct pt_regs)) == 0) {
ret = 0;
}
+#endif /* __tilegx__ */
break;
 
case PTRACE_SETREGS:  /* Set all registers in the child. */
+#ifdef __tilegx__
+   ret = copy_regset_from_user(child, &tile_user_regset_view, REGSET_GPR,
+   0, sizeof(struct pt_regs), datap);
+#else /* __tilegx__ */
if (copy_from_user(, datap,
   sizeof(struct pt_regs)) == 0) {
putregs(child, );
ret = 0;
}
+#endif /* __tilegx__ */
break;
 
case PTRACE_GETFPREGS:  /* Get the child FPU state. */
-- 
1.7.1
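
With the regset view in place, copy_regset_to_user() essentially looks up view->regsets[setno] and calls that regset's ->get() hook. The following user-space toy model shows that dispatch. Every type and function in it is a simplified, hypothetical stand-in: there are no __user pointers, no access_ok(), and the bounds check on setno exists only in the toy.

```c
#include <errno.h>
#include <string.h>

/* Toy stand-in for struct pt_regs. */
struct toy_regs {
	unsigned long gpr[4];
};

struct toy_regset;

/* Simplified ->get(): copies into a plain buffer only. */
typedef int (*toy_get_fn)(const struct toy_regs *target,
			  const struct toy_regset *regset,
			  unsigned int pos, unsigned int count, void *kbuf);

struct toy_regset {
	unsigned int n;      /* number of slots */
	unsigned int size;   /* bytes per slot */
	toy_get_fn get;
};

struct toy_regset_view {
	const struct toy_regset *regsets;
	unsigned int n;
};

enum toy_regset_id { REGSET_GPR };

/* Plays the role of tile_gpr_get(): copy bytes [pos, pos + count). */
static int toy_gpr_get(const struct toy_regs *target,
		       const struct toy_regset *regset,
		       unsigned int pos, unsigned int count, void *kbuf)
{
	(void)regset;
	if (pos > sizeof(*target) || count > sizeof(*target) - pos)
		return -EINVAL;
	memcpy(kbuf, (const char *)target + pos, count);
	return 0;
}

static const struct toy_regset toy_regsets[] = {
	[REGSET_GPR] = {
		.n = 4,
		.size = sizeof(unsigned long),
		.get = toy_gpr_get,
	},
};

static const struct toy_regset_view toy_view = {
	.regsets = toy_regsets,
	.n = sizeof(toy_regsets) / sizeof(toy_regsets[0]),
};

/* Mirrors the shape of copy_regset_to_user(): index, check ->get, call it. */
int copy_regset_to_buf(const struct toy_regs *target,
		       const struct toy_regset_view *view, unsigned int setno,
		       unsigned int offset, unsigned int size, void *buf)
{
	const struct toy_regset *regset;

	if (setno >= view->n)          /* toy-only bounds check */
		return -EINVAL;
	regset = &view->regsets[setno];
	if (!regset->get)
		return -EOPNOTSUPP;
	return regset->get(target, regset, offset, size, buf);
}
```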


[PATCH 1/3] arch/tile: implement user_regset interface on tilegx

2012-12-14 Thread Simon Marchi

This is an implementation of user_regset for the tilegx architecture. It
reuses the basic blocks that were already there.

Signed-off-by: Simon Marchi 
---
I only tested these patches on a 3.0 kernel, as this is what my current
setup allows me. Some testing on more recent versions would be
appreciated, although I don't think the user_regset framework changed
much since then.

Also, I put some ifdefs so that these patches only affect tilegx and not
the older tile architectures, which I don't have access to. Hopefully
someone else can finish the work for those, it's probably not much.

 arch/tile/kernel/ptrace.c |   65 +
 1 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/arch/tile/kernel/ptrace.c b/arch/tile/kernel/ptrace.c
index b32bc3f..0e68d06 100644
--- a/arch/tile/kernel/ptrace.c
+++ b/arch/tile/kernel/ptrace.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 void user_enable_single_step(struct task_struct *child)
@@ -80,6 +82,69 @@ static void putregs(struct task_struct *child, struct pt_regs *uregs)
*regs = *uregs;
 }
 
+#ifdef __tilegx__
+
+enum tile_regset {
+   REGSET_GPR,
+};
+
+static int tile_gpr_get(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+   struct pt_regs regs;
+
+   getregs(target, &regs);
+
+   return user_regset_copyout(&pos, &count, &kbuf, &ubuf, &regs, 0,
+  sizeof(regs));
+}
+
+static int tile_gpr_set(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+   int ret;
+   struct pt_regs regs;
+
+   ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &regs, 0,
+sizeof(regs));
+   if (ret)
+   return ret;
+
+   putregs(target, &regs);
+
+   return 0;
+}
+
+static const struct user_regset tile_user_regset[] = {
+   [REGSET_GPR] = {
+   .core_note_type = NT_PRSTATUS,
+   .n = ELF_NGREG,
+   .size = sizeof(elf_greg_t),
+   .align = sizeof(elf_greg_t),
+   .get = tile_gpr_get,
+   .set = tile_gpr_set,
+   },
+};
+
+static const struct user_regset_view tile_user_regset_view = {
+   .name = "tilegx",
+   .e_machine = ELF_ARCH,
+   .ei_osabi = ELF_OSABI,
+   .regsets = tile_user_regset,
+   .n = ARRAY_SIZE(tile_user_regset),
+};
+
+const struct user_regset_view *task_user_regset_view(struct task_struct *task)
+{
+   return &tile_user_regset_view;
+}
+
+#endif /* __tilegx__ */
+
 long arch_ptrace(struct task_struct *child, long request,
 unsigned long addr, unsigned long data)
 {
-- 
1.7.1
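
tile_gpr_get() and tile_gpr_set() rely on user_regset_copyout()/user_regset_copyin() to clip the caller's [pos, pos + count) window against the register block. Here is a user-space model of the kernel-buffer branch of the copy-out helper. It approximates the helper in include/linux/regset.h, with the __user branch dropped and the BUG_ON() turned into an error return. regset_copyout_kbuf() is a hypothetical name.

```c
#include <errno.h>
#include <string.h>

/*
 * Model of the kernel-buffer (kbuf) branch of user_regset_copyout().
 * 'data' holds the register block covering byte offsets
 * [start_pos, end_pos); end_pos < 0 means "no upper bound".
 */
int regset_copyout_kbuf(unsigned int *pos, unsigned int *count,
			void **kbuf, const void *data,
			int start_pos, int end_pos)
{
	if (*count == 0)
		return 0;
	if (*pos < (unsigned int)start_pos)   /* the kernel BUG()s here */
		return -EINVAL;
	if (end_pos < 0 || *pos < (unsigned int)end_pos) {
		unsigned int copy = *count;

		/* Clip the copy so it never runs past end_pos. */
		if (end_pos >= 0 && (unsigned int)end_pos - *pos < copy)
			copy = (unsigned int)end_pos - *pos;

		memcpy(*kbuf, (const char *)data + (*pos - start_pos), copy);
		*kbuf = (char *)*kbuf + copy;
		*pos += copy;
		*count -= copy;
	}
	return 0;
}
```

A request that starts one register in copies only the remaining registers. count is left holding the bytes the caller asked for but the regset could not supply.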


[PATCH 3/3] arch/tile: set CORE_DUMP_USE_REGSET on tilegx

2012-12-14 Thread Simon Marchi

Following the previous patch which adds support for user_regset, tilegx
can now use this feature.

Signed-off-by: Simon Marchi 
---
 arch/tile/include/asm/elf.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/tile/include/asm/elf.h b/arch/tile/include/asm/elf.h
index f8ccf08..7a793c7 100644
--- a/arch/tile/include/asm/elf.h
+++ b/arch/tile/include/asm/elf.h
@@ -169,4 +169,8 @@ do { \
 
 #endif /* CONFIG_COMPAT */
 
+#ifdef __tilegx__
+#define CORE_DUMP_USE_REGSET
+#endif /* __tilegx__ */
+
 #endif /* _ASM_TILE_ELF_H */
-- 
1.7.1


Re: [GIT PULL] x86/uapi for 3.8

2012-12-14 Thread H. Peter Anvin

On 12/14/2012 05:16 PM, Linus Torvalds wrote:
> On Fri, Dec 14, 2012 at 3:45 PM, David Howells  wrote:
>> Linus Torvalds  wrote:
>>
>>> Yeah, I think I have most of the x86 stuff merged now (just merged the
>>> EFI and ACPI trees), and at this point it might be worth regenerating
>>> it and getting this over and done with.
>>
>> Okay, regenerated and pushed.
> 
> Ugh. This doesn't seem to work for me at all. It causes infinite
> scrolling of some text that I have no idea about.
> 
> I started bisecting (because I thought it might be something else and
> I hadn't booted after every pull), but by now the only thing I have
> left is ARM and a couple of tiny OF patches .. and the x86 UAPI split.
> 
> The split *should* have been safe, since it's mostly a "compile or
> not" thing like Peter said, but we had similar problems on other
> architectures, when things compiled but didn't actually work due to
> missing #define's and #ifdef handling. Things like
> architecture-specific macros that have default versions available when
> the macro is missing etc.
> 
> Now, maybe it's some of the other remaining commits after all, but
> from where I am in the bisection it really looks like the uapi patch
> is the most likely culprit. So I thought I'd let people know...
> 

Hmmm... I can't reproduce your failure.  Could you send me your config
and a hint about what hardware you're seeing this on?  That might help
chase this down.

Macs in particular are EFI machines, except they hide OF snippets inside
ACPI (I kid you not.)  All of that creates interesting cross-dependencies.

-hpa



Re: WARNING: at drivers/tty/tty_buffer.c:476 flush_to_ldisc+0x1de/0x1f0()

2012-12-14 Thread Peter Hurley

On Fri, 2012-12-14 at 18:29 -0800, Greg Kroah-Hartman wrote:
> On Tue, Dec 11, 2012 at 10:01:24PM -0500, Dave Jones wrote:
> > Fuzz-testing fallout from post 3.7 tree as of commit 
> > 414a6750e59b0b687034764c464e9ddecac0f7a6
> > 
> > [ 2181.230579] [ cut here ]
> > [ 2181.231277] WARNING: at drivers/tty/tty_buffer.c:476 
> > flush_to_ldisc+0x1de/0x1f0()
> > [ 2181.232358] Hardware name: GA-MA78GM-S2H
> > [ 2181.232925] tty is NULL
> > [ 2181.233430] Modules linked in: l2tp_ppp l2tp_core fuse rfcomm 
> > binfmt_misc hidp bnep scsi_transport_iscsi ipt_ULOG nfnetlink rose ipx 
> > p8023 p8022 caif_socket caif af_rxrpc x25 irda af_key appletalk pppoe 
> > netrom pppox ppp_generic decnet phonet slhc psnap crc_ccitt ax25 llc2 rds 
> > atm llc nfc can nfsv3 nfs_acl nfs fscache lockd sunrpc ip6t_REJECT 
> > nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter 
> > ip6_tables snd_hda_codec_realtek btusb snd_hda_intel bluetooth usb_debug 
> > snd_hda_codec microcode snd_pcm serio_raw pcspkr snd_page_alloc snd_timer 
> > edac_core snd soundcore r8169 mii vhost_net tun macvtap macvlan kvm_amd kvm
> > [ 2181.245632] Pid: 29787, comm: kworker/0:1 Not tainted 3.7.0+ #12
> > [ 2181.246503] Call Trace:
> > [ 2181.246851]  [] warn_slowpath_common+0x7f/0xc0
> > [ 2181.247725]  [] warn_slowpath_fmt+0x46/0x50
> > [ 2181.248558]  [] ? ___ratelimit+0x9a/0x120
> > [ 2181.249347]  [] flush_to_ldisc+0x1de/0x1f0
> > [ 2181.250164]  [] process_one_work+0x207/0x750
> > [ 2181.251013]  [] ? process_one_work+0x197/0x750
> > [ 2181.251893]  [] ? destroy_work_on_stack+0x20/0x20
> > [ 2181.252809]  [] ? 
> > tty_insert_flip_string_fixed_flag+0x110/0x110
> > [ 2181.253993]  [] worker_thread+0x156/0x440
> > [ 2181.254815]  [] ? rescuer_thread+0x240/0x240
> > [ 2181.255638]  [] kthread+0xed/0x100
> > [ 2181.256374]  [] ? put_lock_stats.isra.23+0xe/0x40
> > [ 2181.257290]  [] ? kthread_create_on_node+0x160/0x160
> > [ 2181.258223]  [] ret_from_fork+0x7c/0xb0
> > [ 2181.259018]  [] ? kthread_create_on_node+0x160/0x160
> > [ 2181.259969] ---[ end trace 12dd9f01acd7e09f ]---
> 
> Jiri, I thought we resolved these warnings in the linux-next tree, how
> are they still showing up?

Greg, that's what the series that I just sent v2 of fixes. Look for
"[PATCH v2 0/11] tty: Fix buffer work access-after-free" et al.

I tried to get it done sooner but got waylaid by GP fault in SLUB caused
by nouveau (solved) and page allocation exhaustion in -next on 10gb
machine (not solved). That and some frustration with getting netconsole
running with kvm (solved).

Dave, how do you have your trinity command line + kvm configured? I had
to write a test jig to get this to happen but I'd prefer to reproduce it
in trinity.

Regards,
Peter Hurley


Re: [ANNOUNCE] 3.6.9-rt21

2012-12-14 Thread Mike Galbraith

On Wed, 2012-12-05 at 17:05 +0100, Thomas Gleixner wrote: 
> Dear RT Folks,
> 
> I'm pleased to announce the 3.6.9-rt21 release. 3.6.7-rt18, 3.6.8-rt19
> and 3.6.9-rt20 are not announced updates to the respective 3.6.y
> stable releases without any RT changes
> 
> Changes since 3.6.9-rt20:
> 
>* Fix the PREEMPT_LAZY implementation on ARM
> 
>* Fix the RCUTINY issues
> 
>* Fix a long standing scheduler bug (See commit log of
>  sched-enqueue-to-head.patch)

That last has an oversight buglet.

sched: add missing userspace->kernel struct sched_param.sched_priority inversion

Signed-off-by: Mike Galbraith 
---
 kernel/sched/core.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4624,7 +4624,7 @@ static int __sched_setscheduler(struct t
p->sched_reset_on_fork = reset_on_fork;
 
oldprio = p->prio;
-   if (oldprio == param->sched_priority)
+   if (oldprio == (MAX_RT_PRIO - 1) - param->sched_priority)
goto out;
 
on_rq = p->on_rq;
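
The inversion is needed because, for RT tasks, the kernel's p->prio runs in the opposite direction from the user-visible sched_priority. Comparing the two directly therefore matches the wrong cases. Below is a minimal sketch of the mapping and both versions of the early-exit test. It assumes MAX_RT_PRIO is 100, as in mainline kernels of this era. The function names are illustrative.

```c
#define MAX_RT_PRIO 100  /* assumed mainline value */

/* RT tasks: user sched_priority 1..99 maps to kernel prio 98..0. */
int rt_user_to_kernel_prio(int sched_priority)
{
	return (MAX_RT_PRIO - 1) - sched_priority;
}

/* Early-exit test before the fix: compares two different scales. */
int same_prio_buggy(int oldprio, int sched_priority)
{
	return oldprio == sched_priority;
}

/* Early-exit test after the fix: convert to the kernel scale first. */
int same_prio_fixed(int oldprio, int sched_priority)
{
	return oldprio == rt_user_to_kernel_prio(sched_priority);
}
```

Take a task currently at user priority 50 (p->prio 49). With the fix, a request for priority 50 is recognised as a no-op, and a request for 49 is no longer wrongly skipped.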



RE: [PATCH] EXTCON: Get and set cable properties

2012-12-14 Thread Tc, Jenny

Anton,

Could you please have a look at my comments below?

-jtc

> > While I see nothing wrong with the patch itself, I beg you to send
> > some users for the new calls. Don't be obsessed with the extcon
> > internals too much, think more about how things will interact (i.e. I
> > really really want to see how you use these calls from the power supply
> drivers).
> 
> The usage of extcon cable property is captured in patch
> https://lkml.org/lkml/2012/10/18/219
> That patch uses an extcon_dev callback function get_cable_properties() to get
> the cable properties. As discussed in the previous mail thread, it may not be
> good to have an extcon callback function, since the extcon provider may not
> be aware of the cable properties. This patch replaces the callback function
> with an API, so that whoever knows the cable property can set the property
> using the extcon API extcon_cable_set_data().
> 
> The usage flow would be:
> 1) The consumer gets a notification from extcon.
> 2) The consumer reads the property using the API extcon_cable_get_data().
> 
> This way it isn't mandatory for the extcon provider to supply the cable
> property.
> Anyone who is aware of the cable property can set the cable property using
> the API.
> It makes the consumer and provider implementations very simple.
> 
> With this new API, the callback function in patch
> https://lkml.org/lkml/2012/10/18/219 can be replaced by the API
> extcon_cable_set_data().
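
The two-step flow described above, where the property is set independently of the provider, can be modelled in a few lines of user-space C. This is a sketch only. struct cable, cable_set_data(), cable_get_data(), cable_state_changed() and the charger property are hypothetical stand-ins, not the signatures of extcon_cable_set_data()/extcon_cable_get_data() in the patch.

```c
#include <stddef.h>

/* Hypothetical cable object; not the extcon data structures. */
struct cable {
	const char *name;
	const void *data;   /* property blob, set by whoever knows it */
};

typedef void (*cable_notify_fn)(struct cable *cable, void *ctx);

/* Anyone aware of the property (provider or not) may set it. */
void cable_set_data(struct cable *cable, const void *data)
{
	cable->data = data;
}

const void *cable_get_data(const struct cable *cable)
{
	return cable->data;
}

/* Step 1: notify the consumer that the cable state changed. */
void cable_state_changed(struct cable *cable, cable_notify_fn notify, void *ctx)
{
	if (notify)
		notify(cable, ctx);
}

/* Example property a charger driver might care about (illustrative). */
struct charger_props {
	int max_current_ma;
};

/* Step 2: the consumer reads the property; -1 means "not provided". */
void charger_consumer_notify(struct cable *cable, void *ctx)
{
	const struct charger_props *props = cable_get_data(cable);
	int *limit_ma = ctx;

	*limit_ma = props ? props->max_current_ma : -1;
}
```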

Re: [v3.7 Regression] [SCSI] sd: Implement support for WRITE SAME

2012-12-14 Thread Joseph Salisbury


On 12/14/2012 05:35 PM, Martin K. Petersen wrote:
> "Joseph" == Joseph Salisbury  writes:
>
> Joseph> I see that you are the author of this patch, so I wanted to run
> Joseph> this by you.  I was thinking of requesting a revert for v3.7,
> Joseph> but I wanted to get your feedback first.
>
> I copied luksformat from a Debian box so I could try to reproduce on
> OL6. Everything works fine for me here. I set up an encrypted ext4
> device, unpacked a kernel tarball, and did a build.
>
> The oops screenshot in the launchpad bug report is pretty useless.
> Please provide a full backtrace so we can get a better idea what's going
> on.

Thanks for the feedback, Martin.

I will do some additional research and testing.  I'll be sure to capture 
a full backtrace for analysis.


Thanks again,

Joe

Re: [v3.7 Regression] [SCSI] sd: Implement support for WRITE SAME

2012-12-14 Thread Joseph Salisbury


On 12/14/2012 04:11 PM, Mike Snitzer wrote:
> On Fri, Dec 14 2012 at  3:30pm -0500,
> Joseph Salisbury  wrote:
>
>> Hi Martin,
>>
>> A bug was opened against the Ubuntu kernel[0].  After a kernel
>> bisect, it was found that reverting the following commit resolved
>> this bug:
>>
>> commit 5db44863b6ebbb400c5e61d56ebe8f21ef48b1bd
>> Author: Martin K. Petersen 
>> Date:   Tue Sep 18 12:19:32 2012 -0400
>> [SCSI] sd: Implement support for WRITE SAME
>>
>> The regression was introduced as of v3.7-rc7.
>>
>> The bug can be reproduced with the following commands, which will
>> operate on a virtual scsi_debug device, so they won't change any
>> data on the test system. However, this will completely crash the
>> system:
>>
>> sudo modprobe scsi_debug
>> sudo luksformat -t ext4 /dev/sdb <- Or whatever device gets assigned after inserting scsi_debug.
>> sudo cryptsetup luksOpen /dev/sdb treasure
>>
>> Everything works fine up to here, but the following will cause the crash:
>>
>> sudo mount /dev/mapper/treasure /mnt
>>
>> The bug can be reproduced on bare metal, in a VM and on i386 or amd64.
>>
>> I see that you are the author of this patch, so I wanted to run this
>> by you.  I was thinking of requesting a revert for v3.7, but I
>> wanted to get your feedback first.
>>
>> Thanks,
>>
>> Joe
>>
>> [0] https://bugs.launchpad.net/bugs/1089818
>
> The WRITE SAME change was introduced long before v3.7-rc7.  I think your
> bisect is somehow wrong.

From Linus' tree:
git describe --contains 5db4486
v3.7-rc7~19^2

Reverting commit 5db4486 solves the bug previously mentioned.

> Milan Broz recently pointed out issues he found with luks when using a
> late 3.7-rc (rc7 afaik).  Linus fixed that issue with this commit (which
> landed in the final v3.7):
> http://git.kernel.org/linus/684c9aaebbb0ea3a9954
>
> That may not be _the_ problem though.  But have you tried the final
> v3.7?

Yes, v3.7 without reverting 5db4486 exhibits the bug.

> Mike

I'll research further and provide additional data.

Re: WARNING: at drivers/tty/tty_buffer.c:476 flush_to_ldisc+0x1de/0x1f0()

2012-12-14 Thread Greg Kroah-Hartman

On Tue, Dec 11, 2012 at 10:01:24PM -0500, Dave Jones wrote:
> Fuzz-testing fallout from post 3.7 tree as of commit 414a6750e59b0b687034764c464e9ddecac0f7a6
> 
> [ 2181.230579] [ cut here ]
> [ 2181.231277] WARNING: at drivers/tty/tty_buffer.c:476 flush_to_ldisc+0x1de/0x1f0()
> [ 2181.232358] Hardware name: GA-MA78GM-S2H
> [ 2181.232925] tty is NULL
> [ 2181.233430] Modules linked in: l2tp_ppp l2tp_core fuse rfcomm binfmt_misc 
> hidp bnep scsi_transport_iscsi ipt_ULOG nfnetlink rose ipx p8023 p8022 
> caif_socket caif af_rxrpc x25 irda af_key appletalk pppoe netrom pppox 
> ppp_generic decnet phonet slhc psnap crc_ccitt ax25 llc2 rds atm llc nfc can 
> nfsv3 nfs_acl nfs fscache lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 
> nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables 
> snd_hda_codec_realtek btusb snd_hda_intel bluetooth usb_debug snd_hda_codec 
> microcode snd_pcm serio_raw pcspkr snd_page_alloc snd_timer edac_core snd 
> soundcore r8169 mii vhost_net tun macvtap macvlan kvm_amd kvm
> [ 2181.245632] Pid: 29787, comm: kworker/0:1 Not tainted 3.7.0+ #12
> [ 2181.246503] Call Trace:
> [ 2181.246851]  [] warn_slowpath_common+0x7f/0xc0
> [ 2181.247725]  [] warn_slowpath_fmt+0x46/0x50
> [ 2181.248558]  [] ? ___ratelimit+0x9a/0x120
> [ 2181.249347]  [] flush_to_ldisc+0x1de/0x1f0
> [ 2181.250164]  [] process_one_work+0x207/0x750
> [ 2181.251013]  [] ? process_one_work+0x197/0x750
> [ 2181.251893]  [] ? destroy_work_on_stack+0x20/0x20
> [ 2181.252809]  [] ? tty_insert_flip_string_fixed_flag+0x110/0x110
> [ 2181.253993]  [] worker_thread+0x156/0x440
> [ 2181.254815]  [] ? rescuer_thread+0x240/0x240
> [ 2181.255638]  [] kthread+0xed/0x100
> [ 2181.256374]  [] ? put_lock_stats.isra.23+0xe/0x40
> [ 2181.257290]  [] ? kthread_create_on_node+0x160/0x160
> [ 2181.258223]  [] ret_from_fork+0x7c/0xb0
> [ 2181.259018]  [] ? kthread_create_on_node+0x160/0x160
> [ 2181.259969] ---[ end trace 12dd9f01acd7e09f ]---

Jiri, I thought we resolved these warnings in the linux-next tree, how
are they still showing up?

thanks,

greg k-h

[PATCH v2] mm: Downgrade mmap_sem before locking or populating on mmap

2012-12-14 Thread Andy Lutomirski

This is a serious cause of mmap_sem contention.  MAP_POPULATE
and MCL_FUTURE, in particular, are disastrous in multithreaded programs.

Signed-off-by: Andy Lutomirski 
---

Changes from v1:

The non-unlocking versions of do_mmap_pgoff and mmap_region are still
available for aio_setup_ring's benefit.  In theory, aio_setup_ring
would do better with a lock-downgrading version, but that would be
somewhat ugly and doesn't help my workload.
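A minimal user-space sketch of the lock-downgrading pattern this patch relies on: hold the lock for write only while the mapping itself is installed, then downgrade to a read lock for the slow MAP_POPULATE/MCL_FUTURE population, and return with the lock already dropped. The demo_rwsem type and every demo_* helper are hypothetical stand-ins built on pthreads; they are not the kernel's rw_semaphore or downgrade_write(), and demo_mmap_unlock() is not the patch's mmap_region_unlock().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an rw_semaphore, just enough to show a
 * writer turning itself into a reader without releasing the lock. */
struct demo_rwsem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int readers;	/* number of active readers */
	int writer;	/* 1 while a writer holds the lock */
};

#define DEMO_RWSEM_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

static void demo_down_write(struct demo_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->writer || s->readers)
		pthread_cond_wait(&s->cond, &s->lock);
	s->writer = 1;
	pthread_mutex_unlock(&s->lock);
}

static void demo_down_read(struct demo_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->writer)
		pthread_cond_wait(&s->cond, &s->lock);
	s->readers++;
	pthread_mutex_unlock(&s->lock);
}

static void demo_up_read(struct demo_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->cond);	/* wake waiting writers */
	pthread_mutex_unlock(&s->lock);
}

/* Writer becomes a reader atomically; other readers may now proceed. */
static void demo_downgrade_write(struct demo_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	s->writer = 0;
	s->readers++;
	pthread_cond_broadcast(&s->cond);	/* wake waiting readers */
	pthread_mutex_unlock(&s->lock);
}

/* Entered with the lock held for write; returns with it released. */
static int demo_mmap_unlock(struct demo_rwsem *s, int *vma_installed,
			    int *pages, int npages)
{
	*vma_installed = 1;		/* short exclusive part: "insert the VMA" */
	demo_downgrade_write(s);
	for (int i = 0; i < npages; i++)
		pages[i] = 1;		/* slow shared part: "fault in" each page */
	demo_up_read(s);		/* caller must not unlock again */
	return 0;
}
```

As with the patch's *_unlock variants, a caller of demo_mmap_unlock() has to skip its own unlock afterwards, which is what the added `goto out_fput` in ipc/shm.c does.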

 arch/tile/mm/elf.c |  9 +++---
 fs/aio.c   |  4 +++
 include/linux/mm.h | 19 ++--
 ipc/shm.c  |  6 ++--
 mm/fremap.c| 10 --
 mm/mmap.c  | 89 --
 mm/util.c  |  3 +-
 7 files changed, 117 insertions(+), 23 deletions(-)

diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 3cfa98b..a0441f2 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -129,12 +129,13 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 */
if (!retval) {
unsigned long addr = MEM_USER_INTRPT;
-   addr = mmap_region(NULL, addr, INTRPT_SIZE,
-  MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
-  VM_READ|VM_EXEC|
-  VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
+   addr = mmap_region_unlock(NULL, addr, INTRPT_SIZE,
+ MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
if (addr > (unsigned long) -PAGE_SIZE)
retval = (int) addr;
+   return retval;  /* We already unlocked mmap_sem. */
}
 #endif
 
diff --git a/fs/aio.c b/fs/aio.c
index 71f613c..253396c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -127,6 +127,10 @@ static int aio_setup_ring(struct kioctx *ctx)
info->mmap_size = nr_pages * PAGE_SIZE;
dprintk("attempting mmap of %lu bytes\n", info->mmap_size);
down_write(&ctx->mm->mmap_sem);
+   /*
+* XXX: If MCL_FUTURE is set, this will hold mmap_sem for write for
+*  longer than necessary.
+*/
info->mmap_base = do_mmap_pgoff(NULL, 0, info->mmap_size, 
PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE, 0);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcaab4e..139f636 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1441,14 +1441,27 @@ extern int install_special_mapping(struct mm_struct *mm,
 
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 
+/* These must be called with mmap_sem held for write. */
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff);
-extern unsigned long do_mmap_pgoff(struct file *, unsigned long,
-unsigned long, unsigned long,
-unsigned long, unsigned long);
+extern unsigned long do_mmap_pgoff(struct file *, unsigned long addr,
+   unsigned long len, unsigned long prot,
+   unsigned long flags, unsigned long pgoff);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
+/*
+ * These must be called with mmap_sem held for write, and they will release
+ * mmap_sem before they return.  They hold mmap_sem for a shorter time than
+ * the non-unlocking variants.
+ */
+extern unsigned long mmap_region_unlock(struct file *file, unsigned long addr,
+   unsigned long len, unsigned long flags,
+   vm_flags_t vm_flags, unsigned long pgoff);
+extern unsigned long do_mmap_pgoff_unlock(struct file *, unsigned long addr,
+   unsigned long len, unsigned long prot,
+   unsigned long flags, unsigned long pgoff);
+
 /* These take the mm semaphore themselves */
 extern unsigned long vm_brk(unsigned long, unsigned long);
 extern int vm_munmap(unsigned long, size_t);
diff --git a/ipc/shm.c b/ipc/shm.c
index dff40c9..d0001c8 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1068,12 +1068,14 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
addr > current->mm->start_stack - size - PAGE_SIZE * 5)
goto invalid;
}
-   
-   user_addr = do_mmap_pgoff(file, addr, size, prot, flags, 0);
+
+   user_addr = do_mmap_pgoff_unlock(file, addr, size, prot, flags, 0);
*raddr = user_addr;
err = 0;
if (IS_ERR_VALUE(user_addr))
err = (long)user_addr;
+   goto out_fput;
+
 invalid:
up_write(&current->mm->mmap_sem);
 
diff --git a/mm/fremap.c b/mm/fremap.c
index a0aaf0e..7ebe0a4 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -200,8 +200,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,

Re: [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write

2012-12-14 Thread Andy Lutomirski

On Fri, Dec 14, 2012 at 6:01 PM, Darrick J. Wong
 wrote:
> On Fri, Dec 14, 2012 at 05:12:37PM -0800, Andy Lutomirski wrote:
>> It survived.  I hit at least one mm bug, but I really don't think it's
>> a problem with your code.  (I have not tried this workload on Linux
>> 3.7 at all before.  It normally runs on 3.5.)  The box in question is
>
> Would you mind sending along the bug report so I can make sure?

http://marc.info/?l=linux-mm&m=135553342803210&w=2

>
>> ext4 on LVM on dm-crypt on (hardware) RAID 5 on hpsa, which should not
>> need stable pages.
>>
>> The majority of the data written (that wasn't unlinked before it was
>> dropped from cache) was checksummed when written and verified later.
>> Most of this data was written using mmap.  This workload hammers the
>> vm concurrently in several threads, and it frequently stalls when
>> stable pages are enabled, so it's probably exercising the code
>> decently well.
>
> Did you observe any change in performance?

No.  But I'm comparing to 3.5 + butchery to remove stable pages.  With
stable pages on, this workload performs terribly.  (It's a soft
real-time thing, as you can possibly guess from my domain name, and
various latency monitoring things go nuts when stable pages are
active.)

Actually, performance appears to be improved, probably due to
https://lkml.org/lkml/2012/12/14/14, which I tested concurrently.

>
>> Feel free to add Tested-by: Andy Lutomirski 
>
> Will do!  Thanks for the testing!

My pleasure.  When these changes go into an upstream kernel, they'll
represent a significant reduction in how much our kernel differs from
kernel.org's :)  Thanks for fixing this.

--Andy

Re: [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write

2012-12-14 Thread Darrick J. Wong

On Fri, Dec 14, 2012 at 05:12:37PM -0800, Andy Lutomirski wrote:
> On Thu, Dec 13, 2012 at 6:10 PM, Darrick J. Wong
>  wrote:
> > On Thu, Dec 13, 2012 at 05:48:06PM -0800, Andy Lutomirski wrote:
> >> On 12/13/2012 12:08 AM, Darrick J. Wong wrote:
> >> > Several complaints have been received regarding long file write latencies
> >> > when memory pages must be held stable during writeback.  Since it might
> >> > not be acceptable to stall programs for the entire duration of a page
> >> > write (which may take many milliseconds even on good hardware), enable a
> >> > second strategy wherein pages are snapshotted as part of submit_bio; the
> >> > snapshot can be held stable while writes continue.
> >> >
> >> > This provides a band-aid to provide stable page writes on jbd without
> >> > needing to backport the fixed locking scheme in jbd2.  A mount option is
> >> > added to ext4 to allow administrators to enable it there.
> >>
> >> I'm a bit confused as to what it has to do with ext3.  Wouldn't this be
> >> useful as a mount option everywhere, though?
> >
> > ext3 requires snapshots; the rest are ok with either strategy.
> >
> > *If* snapshotting is generally liked, then yes I'll go redo it as a vfs
> > mount option.
> >
> >> If this becomes widely used, would it be better to snapshot on
> >> wait_for_stable_page instead of on io submission?
> >
> > That really depends on how long you can afford to wait and how much free
> > memory you have. :)  It's all a big tradeoff between write latency and
> > consumption of memory pages and bandwidth, and one that I doubt I'm
> > qualified to make for everyone.
> >
> >> FWIW, I'm about to pound pretty hard on this whole patchset on a box
> >> that doesn't need stable pages.  I'll let you know how it goes.
> >
> > Yay!
> >
> > --D
> 
> It survived.  I hit at least one mm bug, but I really don't think it's
> a problem with your code.  (I have not tried this workload on Linux
> 3.7 at all before.  It normally runs on 3.5.)  The box in question is

Would you mind sending along the bug report so I can make sure?

> ext4 on LVM on dm-crypt on (hardware) RAID 5 on hpsa, which should not
> need stable pages.
> 
> The majority of the data written (that wasn't unlinked before it was
> dropped from cache) was checksummed when written and verified later.
> Most of this data was written using mmap.  This workload hammers the
> vm concurrently in several threads, and it frequently stalls when
> stable pages are enabled, so it's probably exercising the code
> decently well.

Did you observe any change in performance?

> Feel free to add Tested-by: Andy Lutomirski 

Will do!  Thanks for the testing!

--D
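To illustrate why snapshotting at submit_bio time keeps the data stable without stalling writers, here is a toy sketch under stated assumptions: the page size, the demo_bio type, and the additive checksum are all hypothetical, standing in for whatever (for example a T10 PI guard tag or a RAID parity computation) needs the data to stay unchanged while I/O is in flight. It is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size */

/* Toy checksum standing in for any integrity data computed at submit time. */
static uint32_t demo_csum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Hypothetical in-flight write: the "device" transfers from data[]. */
struct demo_bio {
	unsigned char data[DEMO_PAGE_SIZE];
	uint32_t csum;	/* checksum computed when the write is submitted */
};

/* Snapshot strategy: copy the page when the I/O is submitted, so the
 * application may keep dirtying the original page immediately. */
static void demo_submit_snapshot(struct demo_bio *bio, const unsigned char *page)
{
	memcpy(bio->data, page, DEMO_PAGE_SIZE);
	bio->csum = demo_csum(bio->data, DEMO_PAGE_SIZE);
}

/* What the device would verify when it finally transfers the data. */
static int demo_bio_csum_ok(const struct demo_bio *bio)
{
	return demo_csum(bio->data, DEMO_PAGE_SIZE) == bio->csum;
}
```

The trade-off Darrick describes is visible here: the writer never waits, but every write costs a page-sized copy and the extra memory for the snapshot.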

Re: [PATCH 3/4] userns: Add a more complete capability subset test to commit_creds

2012-12-14 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> When unsharing a user namespace we reduce our credentials to just what
> can be done in that user namespace.  This is a subset of the credentials
> we previously had.  Teach commit_creds to recognize this is a subset
> of the credentials we have had before and don't clear the dumpability flag.
> 
> This allows an unprivileged program to do:
> unshare(CLONE_NEWUSER);
> fd = open("/proc/self/uid_map", O_RDWR);
> 
> Where previously opening the uid_map writable would fail because
> the task had been made non-dumpable.
> 
> Signed-off-by: "Eric W. Biederman" 

Acked-by: Serge Hallyn 

> ---
>  kernel/cred.c |   27 ++-
>  1 files changed, 26 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 48cea3d..709d521 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -455,6 +455,31 @@ error_put:
>   return ret;
>  }
>  
> +static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
> +{
> + const struct user_namespace *set_ns = set->user_ns;
> + const struct user_namespace *subset_ns = subset->user_ns;
> +
> + /* If the two credentials are in the same user namespace see if
> +  * the capabilities of subset are a subset of set.
> +  */
> + if (set_ns == subset_ns)
> + return cap_issubset(subset->cap_permitted, set->cap_permitted);
> +
> + /* The credentials are in different user namespaces,
> +  * therefore one is a subset of the other only if set's namespace
> +  * is an ancestor of subset's and set->euid owns subset's namespace
> +  * or one of its ancestors.
> +  */
> + for (;subset_ns != &init_user_ns; subset_ns = subset_ns->parent) {
> + if ((set_ns == subset_ns->parent)  &&
> + uid_eq(subset_ns->owner, set->euid))
> + return true;
> + }
> +
> + return false;
> +}
> +
>  /**
>   * commit_creds - Install new credentials upon the current task
>   * @new: The credentials to be assigned
> @@ -493,7 +518,7 @@ int commit_creds(struct cred *new)
>   !gid_eq(old->egid, new->egid) ||
>   !uid_eq(old->fsuid, new->fsuid) ||
>   !gid_eq(old->fsgid, new->fsgid) ||
> - !cap_issubset(new->cap_permitted, old->cap_permitted)) {
> + !cred_cap_issubset(old, new)) {
>   if (task->mm)
>   set_dumpable(task->mm, suid_dumpable);
>   task->pdeath_signal = 0;
> -- 
> 1.7.5.4
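A simplified, self-contained model of the cred_cap_issubset() logic in the patch above, handy for checking the namespace walk by hand. demo_ns and demo_cred are hypothetical reductions of struct user_namespace and struct cred (euid as a plain unsigned int rather than kuid_t, capabilities as a 64-bit mask); only the decision logic is meant to mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct user_namespace. */
struct demo_ns {
	const struct demo_ns *parent;	/* NULL for the initial namespace */
	unsigned int owner;		/* euid of the namespace creator */
};

/* Hypothetical stand-in for struct cred. */
struct demo_cred {
	const struct demo_ns *user_ns;
	unsigned int euid;
	uint64_t cap_permitted;		/* one bit per capability */
};

static const struct demo_ns demo_init_ns = { NULL, 0 };

static bool demo_cred_cap_issubset(const struct demo_cred *set,
				   const struct demo_cred *subset)
{
	const struct demo_ns *set_ns = set->user_ns;
	const struct demo_ns *subset_ns = subset->user_ns;

	/* Same namespace: plain capability-bitmask subset test. */
	if (set_ns == subset_ns)
		return (subset->cap_permitted & ~set->cap_permitted) == 0;

	/* Different namespaces: walk up from subset's namespace, looking for
	 * a namespace whose parent is set's and which set->euid owns. */
	for (; subset_ns != &demo_init_ns; subset_ns = subset_ns->parent) {
		if (set_ns == subset_ns->parent && subset_ns->owner == set->euid)
			return true;
	}
	return false;
}
```

In this model, a process with euid 1000 that unshares into a namespace it owns keeps dumpability, while the same child credentials compared against a different euid in the parent namespace do not count as a subset.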

Re: [PATCH 1/1] cpuidle: coupled: fix ready counter decrement

2012-12-14 Thread Colin Cross

On Fri, Dec 14, 2012 at 3:37 PM, Rafael J. Wysocki  wrote:
> On Friday, December 14, 2012 10:42:08 AM Sivaram Nair wrote:
>> The ready_waiting_counts atomic variable is compared against the wrong
>> online cpu count. The latter is computed incorrectly using logical-OR
>> instead of bit-OR. This patch fixes that.
>
> I'm queuing this up for submission as v3.8 material.
>
> I suppose it should be marked for -stable too?
>
> Rafael

Acked-by: Colin Cross 

Looks suitable for stable.

>> Signed-off-by: Sivaram Nair 
>> ---
>>  drivers/cpuidle/coupled.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
>> index 3265844..2a297f8 100644
>> --- a/drivers/cpuidle/coupled.c
>> +++ b/drivers/cpuidle/coupled.c
>> @@ -209,7 +209,7 @@ inline int cpuidle_coupled_set_not_ready(struct cpuidle_coupled *coupled)
>>   int all;
>>   int ret;
>>
>> - all = coupled->online_count || (coupled->online_count << WAITING_BITS);
>> + all = coupled->online_count | (coupled->online_count << WAITING_BITS);
>>   ret = atomic_add_unless(&coupled->ready_waiting_counts,
>>   -MAX_WAITING_CPUS, all);
>>
>>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
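A small sketch of why the one-character fix matters. It assumes WAITING_BITS is 16 (check drivers/cpuidle/coupled.c for the real value), and demo_add_unless() is a hypothetical, non-atomic model of atomic_add_unless(). The packed counter holds the waiting count in the low bits and the ready count in the high bits, so `all` should mean "every online CPU is waiting and ready"; with `||` it collapses to 1.

```c
#include <assert.h>

/* Assumed value; the real WAITING_BITS lives in drivers/cpuidle/coupled.c. */
#define DEMO_WAITING_BITS 16
#define DEMO_MAX_WAITING_CPUS (1 << DEMO_WAITING_BITS)

/* Buggy form: logical OR yields only 0 or 1. */
static int demo_all_logical_or(int online_count)
{
	return online_count || (online_count << DEMO_WAITING_BITS);
}

/* Fixed form: online_count in the waiting field and in the ready field. */
static int demo_all_bitwise_or(int online_count)
{
	return online_count | (online_count << DEMO_WAITING_BITS);
}

/* Non-atomic model of atomic_add_unless(): add a to *v unless *v == u.
 * Returns nonzero if the add was performed. */
static int demo_add_unless(int *v, int a, int u)
{
	if (*v == u)
		return 0;
	*v += a;
	return 1;
}
```

With the buggy `all`, the guard only triggers when the whole counter equals 1, so the ready count is decremented even when every online CPU is already ready.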

mmotm 2012-12-14-17-51 uploaded

2012-12-14 Thread akpm

The mm-of-the-moment snapshot 2012-12-14-17-51 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (3.x
or 3.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/?p=linux-mmotm.git;a=summary

To develop on top of mmotm git:

  $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/?p=linux-mmots.git;a=summary

and use of this tree is similar to
http://git.cmpxchg.org/?p=linux-mmotm.git, described above.


This mmotm tree contains the following patches against 3.7:
(patches marked "*" will be included in linux-next)

  origin.patch
  linux-next.patch
  linux-next-rejects-git-rejects.patch
  make-my-i386-build-work.patch
  i-need-old-gcc.patch
  arch-alpha-kernel-systblss-remove-debug-check.patch
* cris-fix-i-o-macros.patch
* vfs-d_obtain_alias-needs-to-use-as-default-name.patch
* fs-block_devc-page-cache-wrongly-left-invalidated-after-revalidate_disk.patch
* arch-x86-platform-iris-irisc-register-a-platform-device-and-a-platform-driver.patch
* x86-numa-dont-check-if-node-is-numa_no_node.patch
* arch-x86-tools-insn_sanityc-identify-source-of-messages.patch
* uv-fix-incorrect-tlb-flush-all-issue.patch
* olpc-fix-olpc-xo1-scic-build-errors.patch
* x86-convert-update_mmu_cache-and-update_mmu_cache_pmd-to-functions.patch
* x86-fix-the-argument-passed-to-sync_global_pgds.patch
* x86-fix-a-compile-error-a-section-type-conflict.patch
* x86-make-mem=-option-to-work-for-efi-platform.patch
* audit-create-explicit-audit_seccomp-event-type.patch
* audit-catch-possible-null-audit-buffers.patch
* ceph-fix-dentry-reference-leak-in-ceph_encode_fh.patch
* cris-use-int-for-ssize_t-to-match-size_t.patch
* pcmcia-move-unbind-rebind-into-dev_pm_opscomplete.patch
* fb-rework-locking-to-fix-lock-ordering-on-takeover.patch
* fb-rework-locking-to-fix-lock-ordering-on-takeover-fix.patch
* fb-rework-locking-to-fix-lock-ordering-on-takeover-fix-2.patch
* cyber2000fb-avoid-palette-corruption-at-higher-clocks.patch
* irq-tsk-comm-is-an-array.patch
* kconfig-fix-irq-subsystem-menu.patch
* timeconstpl-remove-deprecated-defined-array.patch
* time-dont-inline-export_symbol-functions.patch
* coccinelle-add-api-d_find_aliascocci.patch
* h8300-select-generic-atomic64_t-support.patch
* mm-mempolicy-introduce-spinlock-to-read-shared-policy-tree.patch
* drivers-message-fusion-mptscsihc-missing-break.patch
* block-restore-proc-partitions-to-not-display-non-partitionable-removable-devices.patch
* block-remove-deadlock-in-disk_clear_events.patch
* block-remove-deadlock-in-disk_clear_events-fix.patch
* block-prevent-race-cleanup.patch
* block-prevent-race-cleanup-fix.patch
* vfs-increment-iversion-when-a-file-is-truncated.patch
* fs-change-return-values-from-eacces-to-eperm.patch
* fs-block_devc-need-not-to-check-inode-i_bdev-in-bd_forget.patch
* watchdog-trigger-all-cpu-backtrace-when-locked-up-and-going-to-panic.patch
* mm-slab-remove-duplicate-check.patch
  mm.patch
* memory-hotplug-document-and-enable-config_movable_node.patch
* memory-hotplug-document-and-enable-config_movable_node-fix.patch
* mm-memmap_init_zone-performance-improvement.patch
*

Re: [PATCH] cpuidle: coupled: fix the potensial race condition and deadlock

2012-12-14 Thread Colin Cross

On Sun, Dec 2, 2012 at 6:59 PM, Joseph Lo  wrote:
> Consider the case where two CPUs enter cpuidle_enter_state_coupled at
> nearly the same time. The 1st CPU increases the waiting count and the 2nd
> CPU does the same thing right away. The 2nd CPU finds the 1st CPU already
> waiting and prepares to poke it.
>
> Before the 2nd CPU pokes the 1st CPU, the 1st finds the waiting count
> already equal to the cpu_online count. So the 1st won't go into the waiting
> loop, and the 2nd CPU hasn't poked it yet. The 1st CPU goes into the ready
> loop directly.
>
> Then the 2nd CPU sets up the couple_cpuidle_mask for the 1st CPU and pokes
> it. But the 1st CPU has already lost the chance to handle the poke and
> clear the couple_cpuidle_mask. Now the whole MPCore is already coupled, and
> it goes into the power-saving idle mode.
>
> Because the poke is implemented as an "smp_call_function_single", it's a
> software interrupt of "IPI_SINGLE_FUNC". If the power-saving idle mode of
> the platform can't retain the software interrupt (e.g. Tegra's powered-down
> idle mode shuts off CPU power, including the power of the GIC), the
> software interrupt gets lost.

This is the root of your problem.  The cpu should never go to idle
while an IPI is pending.  I thought we already had a patch to return
an error from gic_cpu_save when an IPI was pending and abort the idle
transition, but apparently not and I can't find any references to it.

> When the CPU resumes from the power-saving idle mode and the system is still
> idle, it will go into idle mode again immediately. Because
> "smp_call_function_single" does not allow the same function to be called
> for the same CPU twice (otherwise it blocks on a lock), the
> "couple_cpuidle_mask" can't be cleared by the 1st CPU, and the 2nd CPU
> also can't poke it again. Then the deadlock happens here.
>
> The fix here uses a different wake-up mechanism. Because there are already
> two loops and a global variable "ready_waiting_counts" to sync the status
> of the MPCore to the coupled state, the "coupled_cpuidle_mask" was not
> really necessary. Just waking up the CPU from waiting and checking whether
> the CPU needs rescheduling, so the outside world can take the CPU out of
> idle, is enough. This fix doesn't modify the original behavior of the
> coupled cpuidle framework, so it should still be compatible with the
> original. cpuidle drivers that already use coupled cpuidle don't need to
> change either.

I don't like using the arch IPI functions directly, especially not
reusing arch_send_call_function_single_ipi without a function to call.
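A single-threaded, hypothetical model of the lost-poke scenario described above. It assumes the poke path only sends a new IPI when the target CPU's bit in a "poked" mask is not already set, and that the IPI handler is what clears the bit. demo_poked_mask, demo_poke() and demo_poke_handler() are illustrative names, not the coupled.c API, and the real code would use atomic cpumask operations rather than plain bit twiddling.

```c
#include <assert.h>

/* Bit N set means "CPU N has a poke in flight" (hypothetical model). */
static unsigned long demo_poked_mask;
static int demo_ipis_sent;	/* counts IPIs actually sent */

/* Send a poke only if one is not already pending (test-and-set). */
static void demo_poke(int cpu)
{
	unsigned long bit = 1UL << cpu;

	if (!(demo_poked_mask & bit)) {
		demo_poked_mask |= bit;
		demo_ipis_sent++;	/* stands in for the real IPI */
	}
}

/* Runs on the target CPU only if the IPI is actually delivered. */
static void demo_poke_handler(int cpu)
{
	demo_poked_mask &= ~(1UL << cpu);
}
```

If the power-down eats the IPI, demo_poke_handler() never runs, the bit stays set, and every later demo_poke() for that CPU is silently a no-op, which is the wedge Colin suggests avoiding by refusing to enter idle while an IPI is pending.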

Re: [GIT PULL] x86/uapi for 3.8

2012-12-14 Thread Linus Torvalds

On Fri, Dec 14, 2012 at 5:41 PM, Linus Torvalds
 wrote:
> I was wrong. It's not the x86 UAPI split, it's the DT pull. More people added.

Looking at the merge (just in case it could have done something odd),
I'm starting to worry that this is some nasty heisenbug, and my
bisection is not trustworthy at all. Because the DT pull sure as heck
doesn't look like a likely candidate for anything either.

Ho humm. Anybody else see anything strange?

 Linus

Re: percpu allocation failures in kvm

2012-12-14 Thread Andy Lutomirski

On Fri, Dec 14, 2012 at 5:03 PM, Marcelo Tosatti  wrote:
> On Thu, Dec 13, 2012 at 09:43:23PM -0800, Andy Lutomirski wrote:
>> On 3.7.0 + irrelevant patches, I get this on boot.  I've seen it on
>> and off on earlier kernels, I think (although I'm not currently
>> getting it on 3.5).
>>
>> [   10.230054] PERCPU: allocation failed, size=304 align=32, alloc from reserved chunk failed
>> [   10.230059] Pid: 1026, comm: modprobe Tainted: GW3.7.0-ama+ #5
>> [   10.230060] Call Trace:
>> [   10.230070]  [] pcpu_alloc+0x9db/0xa40
>> [   10.230074]  [] ? find_symbol_in_section+0x4d/0x140
>> [   10.230077]  [] ? finished_loading+0x50/0x50
>> [   10.230080]  [] ? each_symbol_section+0x30/0x70
>> [   10.230083]  [] ? find_symbol+0x31/0x60
>> [   10.230086]  [] __alloc_reserved_percpu+0x13/0x20
>> [   10.230089]  [] load_module+0x3ed/0x1b50
>> [   10.230093]  [] ? __srcu_read_unlock+0x4b/0x70
>>
>> --Andy
>
> You're loading the kvm module, or loading some other module inside
> a kvm guest?
>

This is loading the kvm module on startup.  There are no guests.

-- 
Andy Lutomirski
AMA Capital Management, LLC

Re: [PATCH 5/5] Input: Add ChromeOS EC keyboard driver

2012-12-14 Thread Dmitry Torokhov

On Saturday, December 15, 2012 01:13:45 AM Grant Likely wrote:
> On Wed, 12 Dec 2012 13:33:48 -0800, Simon Glass  wrote:
> > Use the key-matrix layer to interpret key scan information from the EC
> > and inject input based on the FDT-supplied key map. This driver registers
> > itself with the ChromeOS EC driver to perform communications.
> > 
> > Additional FDT bindings are provided to specify rows/columns and the
> > auto-repeat information.
> > 
> > Signed-off-by: Simon Glass 
> > Signed-off-by: Luigi Semenzato 
> > Signed-off-by: Vincent Palatin 
> > ---
> > 
> >  .../devicetree/bindings/input/cros-ec-keyb.txt |   77 
> >  drivers/input/keyboard/Kconfig |   10 +
> >  drivers/input/keyboard/Makefile|1 +
> >  drivers/input/keyboard/cros_ec_keyb.c  |  413
> >   4 files changed, 501 insertions(+), 0 deletions(-)
> >  create mode 100644
> >  Documentation/devicetree/bindings/input/cros-ec-keyb.txt
> >  create mode 100644 drivers/input/keyboard/cros_ec_keyb.c
> > 
> > diff --git a/Documentation/devicetree/bindings/input/cros-ec-keyb.txt b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt
> > new file mode 100644
> > index 000..67f51d8
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt
> > @@ -0,0 +1,77 @@
> > +ChromeOS EC Keyboard
> > +
> > +Google's ChromeOS EC Keyboard is a simple matrix keyboard implemented on
> > +a separate EC (Embedded Controller) device. It provides a message for
> > +reading key scans from the EC. These are then converted into keycodes
> > +for processing by the kernel.
> > +
> > +Required properties:
> > +- compatible: "google,cros-ec-keyb"
> > +- google,key-rows: Number of keyboard rows (must be <= 8)
> > +- google,key-columns: Number of keyboard columns (must be <= 13)
> > +- google,repeat-delay-ms: Key repeat delay in milliseconds
> > +- google,repeat-rate-ms: Key repeat rate in milliseconds
> 
> Hmmm, these should probably be in a common binding. Take a look at
> the other input bindings and make a proposal for properties to add to
> matrix-keymap.txt.

Actually these are not essential for bring-up and can be set from userspace,
so I'd say simply drop them.

Thanks.

--  
Dmitry
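For readers unfamiliar with matrix keyboards, a sketch of how a key scan can be turned into scan codes within the google,key-rows/google,key-columns limits from the binding. The byte-per-column, bit-per-row scan layout is an assumption for illustration, not a statement about the EC protocol. demo_scan_code() follows the (row << row_shift) + col arithmetic of MATRIX_SCAN_CODE(), with row_shift computed the way get_count_order() would; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ROWS 8		/* google,key-rows must be <= 8 */
#define DEMO_MAX_COLS 13	/* google,key-columns must be <= 13 */

/* Smallest shift with (1 << shift) >= cols, like get_count_order(). */
static unsigned int demo_row_shift(unsigned int cols)
{
	unsigned int shift = 0;

	while ((1u << shift) < cols)
		shift++;
	return shift;
}

/* Same arithmetic as MATRIX_SCAN_CODE(row, col, row_shift). */
static unsigned int demo_scan_code(unsigned int row, unsigned int col,
				   unsigned int row_shift)
{
	return (row << row_shift) + col;
}

/* Compare two scans (assumed: one byte per column, one bit per row) and
 * record the scan codes of keys whose state changed; returns the count. */
static int demo_diff_scan(const uint8_t *old_state, const uint8_t *new_state,
			  unsigned int rows, unsigned int cols,
			  unsigned int *codes, int max_codes)
{
	unsigned int shift = demo_row_shift(cols);
	int n = 0;

	for (unsigned int col = 0; col < cols; col++) {
		uint8_t changed = old_state[col] ^ new_state[col];

		for (unsigned int row = 0; row < rows && n < max_codes; row++)
			if (changed & (1u << row))
				codes[n++] = demo_scan_code(row, col, shift);
	}
	return n;
}
```

A real driver would also report whether each changed key was pressed or released and map the scan code through the keymap; this sketch stops at locating the changed keys.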

Re: [GIT PULL] x86/uapi for 3.8

2012-12-14 Thread Linus Torvalds

I was wrong. It's not the x86 UAPI split, it's the DT pull. More people added.

Grant and Guennadi - it looks like some nasty memory corruption,
because now it didn't cause infinite scrolling, now it caused an oops
in unmap_single_vma() during exit_vmas(). In udevd. Followed by a
hang. I don't know *which* commit it is yet, but commit 4939e27d46fe
("Merge tag 'devicetree-for-linus' ...") is bad, and the previous
merge (the ARM mvebu one) is fine.

I'll bisect to the commit, but wanted to let David off the hook, and
put the DT people on the hook. Maybe it's some odd merge artifact,
although I don't think that one had any conflicts at all.

 Linus

On Fri, Dec 14, 2012 at 5:16 PM, Linus Torvalds
 wrote:
>
> Ugh. This doesn't seem to work for me at all. It causes infinite
> scrolling of some text that I have no idea about.
>
> I started bisecting (because I thought it might be something else and
> I hadn't booted after every pull), but by now the only thing I have
> left is ARM and a couple of tiny OF patches .. and the x86 UAPI split.
>
> The split *should* have been safe, since it's mostly a "compile or
> not" thing like Peter said, but we had similar problems on other
> architectures, when things compiled but didn't actually work due to
> missing #define's and #ifdef handling. Things like
> architecture-specific macros that have default versions available when
> the macro is missing etc.
>
> Now, maybe it's some of the other remaining commits after all, but
> from where I am in the bisection it really looks like the uapi patch
> is the most likely culprit. So I thought I'd let people know...
>
> Linus

Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation

2012-12-14 Thread Toshi Kani

On Thu, 2012-12-13 at 23:15 +0800, Jiang Liu wrote:
> On 12/13/2012 10:42 PM, Toshi Kani wrote:
> > On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
> >> On 12/08/2012 09:08 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
>  On 2012-12-7 10:57, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
 :
> >>>
>  2) an ACPI based hotplug manager driver, which is a platform independent
> driver and manages all hotplug slot created by the slot driver.
> >>>
> >>> It is surely impressive work, but I think it is a bit overdone.  I
> >>> expect hot-pluggable servers come with management console and/or GUI
> >>> where a user can manage hardware units and initiate hot-plug operations.
> >>> I do not think the kernel needs to step into such area since it tends to
> >>> be platform-specific. 
> >> One of the major usages of this feature is for testing.
> >> It will be hard for OSVs and OEMs to verify hotplug functionalities if it
> >> could only be tested by physical hotplug or through management console. So
> >> to pave the way for hotplug, we need to provide a mechanism for OEMs and
> >> OSVs to execute auto stress tests for hotplug functionalities.
> > 
> > Yes, but such OS->FW interface is platform-specific.  Some platforms use
> > IPMI for the OS to communicate with the management console.  In this
> > case, an OEM-specific command can be used to request a hotplug through
> > IPMI.  Some platforms may also support test programs to run on the
> > management console for validations.
> > 
> > For early development testing, Yinghai's SCI emulation patch can be used
> > to emulate hotplug events from the OS.  It would be part of the kernel
> > debugging features once this patch is accepted. 
> Hi Toshi,
>   ACPI 5.0 has provided some mechanism to normalize the way to issue
> RAS related requests to firmware. I hope ACPI 5.x will define some
> standardized ways based on the PCC defined in 5.0. If needed, we may
> provide platform specific methods for them too.

Thanks for the pointer!  Yeah, the spec purposely does not define the
command.  When we support PCC, we will need to provide a way for a user
app or an OEM module to supply a payload.

Thanks,
-Toshi


Re: percpu allocation failures in kvm

2012-12-14 Thread Marcelo Tosatti

On Thu, Dec 13, 2012 at 09:43:23PM -0800, Andy Lutomirski wrote:
> On 3.7.0 + irrelevant patches, I get this on boot.  I've seen it on
> and off on earlier kernels, I think (although I'm not currently
> getting it on 3.5).
> 
> [   10.230054] PERCPU: allocation failed, size=304 align=32, alloc
> from reserved chunk failed
> [   10.230059] Pid: 1026, comm: modprobe Tainted: G        W    3.7.0-ama+ #5
> [   10.230060] Call Trace:
> [   10.230070]  [] pcpu_alloc+0x9db/0xa40
> [   10.230074]  [] ? find_symbol_in_section+0x4d/0x140
> [   10.230077]  [] ? finished_loading+0x50/0x50
> [   10.230080]  [] ? each_symbol_section+0x30/0x70
> [   10.230083]  [] ? find_symbol+0x31/0x60
> [   10.230086]  [] __alloc_reserved_percpu+0x13/0x20
> [   10.230089]  [] load_module+0x3ed/0x1b50
> [   10.230093]  [] ? __srcu_read_unlock+0x4b/0x70
> 
> --Andy

You're loading the kvm module, or loading some other module inside
a kvm guest?


Re: [PATCH v2 00/10] kvm: memory slot cleanups, fix, and increase

2012-12-14 Thread Marcelo Tosatti

On Mon, Dec 10, 2012 at 10:32:39AM -0700, Alex Williamson wrote:
> v2: Update 02/10 to not check userspace_addr when slot is removed.
> Yoshikawa-san withdrew objection to increase slot_bitmap prior
> to his series to remove slot_bitmap.
> 
> This series does away with any kind of complicated resizing of the
> slot array and simply does a one time increase.  I do compact struct
> kvm_memory_slot a bit to take better advantage of the space we are
> using.  This reduces each slot from 64 bytes (x86_64) to 56 bytes.
> By enforcing the API around valid operations for an unused slot and
> fields that can be modified runtime, I found and was able to fix a
> bug in iommu mapping for slots.  The renames enabled me to find the
> previously posted bug fix for catching slot overlaps.
> 
> As mentioned in the series, the primary motivation for increasing
> memory slots is assigned devices.  With this, I've been able to
> assign 30 devices to a single VM and could have gone further, but
> ran out of SRIOV VFs.  Typical devices use anywhere from 2-4 slots
> and max out at 8 slots.  125 user slots (3 private slots) allows
> us to support between 28 and 56 typical devices per VM.
> 
> Tested on x86_64, compiled on ia64, powerpc, and s390.
> 
> Thanks,
> Alex

Applied, thanks.


Re: [PATCH v2 0/5] KVM: x86: improve reexecute_instruction

2012-12-14 Thread Marcelo Tosatti

On Fri, Dec 14, 2012 at 12:50:09PM +0800, Xiao Guangrong wrote:
> >>> program a timer interrupt and #GP? 
> >>
> >> Could you please explain the detail?
> > 
> > Before the instruction which writes continuously to the pagetable, arm
> > say lapic timer. #GP on the interrupt handler and test with failure.
> 
> Sorry, I am confused about this. After Qemu exits due to KVM_EXIT_INTERNAL_ERROR,
> the vm is stopped then interrupt can not be injected to guest. Or i missed
> something?

Yes, but without the fixed kernel, the kvm-unit-tests executable loops continuously.
Perhaps it's more appropriate to fix this generically; never mind.


Re: [PATCH] time: create __getnstimeofday for WARNless calls

2012-12-14 Thread John Stultz


On 12/14/2012 05:16 PM, John Stultz wrote:
> On 12/13/2012 10:17 AM, Kees Cook wrote:
>>
>> John, any feedback on this?
>>
>
> Looking at my inbox, I actually can't find a copy of this specific
> patch. Do you mind bouncing it to me, so I have something I can apply?

Nm, I found it.

thanks
-john


Re: [PATCH] time: create __getnstimeofday for WARNless calls

2012-12-14 Thread John Stultz


On 12/13/2012 10:17 AM, Kees Cook wrote:
> John, any feedback on this?

Sorry, yea, I've been meaning to get back to this.

I'm still on the fence about just making getnstimeofday() safe for when
timekeeping is suspended, but at the same time, your issue needs
fixing.  Also bailing out at the end still seems off to me. Even if
someone is using the values despite the WARN_ON, they really are getting
junk values, and for all the time that WARN_ON has been there, you're
the first to report running into it.

Even so, I think I'm ok with this patch for now, but I suspect we may
want to rework it later.

Looking at my inbox, I actually can't find a copy of this specific
patch. Do you mind bouncing it to me, so I have something I can apply?

Should this also get marked for -stable?

thanks
-john


Re: [GIT PULL] x86/uapi for 3.8

2012-12-14 Thread Linus Torvalds

On Fri, Dec 14, 2012 at 3:45 PM, David Howells  wrote:
> Linus Torvalds  wrote:
>
>> Yeah, I think I have most of the x86 stuff merged now (just merged the
>> EFI and ACPI trees), and at this point it might be worth regenerating
>> it and getting this over and done with.
>
> Okay, regenerated and pushed.

Ugh. This doesn't seem to work for me at all. It causes infinite
scrolling of some text that I have no idea about.

I started bisecting (because I thought it might be something else and
I hadn't booted after every pull), but by now the only thing I have
left is ARM and a couple of tiny OF patches .. and the x86 UAPI split.

The split *should* have been safe, since it's mostly a "compile or
not" thing like Peter said, but we had similar problems on other
architectures, when things compiled but didn't actually work due to
missing #define's and #ifdef handling. Things like
architecture-specific macros that have default versions available when
the macro is missing etc.
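The failure mode described here, where a missing arch-specific #define is silently replaced by a generic default so the tree still compiles but behaves differently, can be sketched in a few lines of C. All macro and function names below are hypothetical; this only illustrates the pattern and is not taken from any actual x86 UAPI header.

```c
/*
 * Hypothetical names throughout.  HYPO_HAVE_ARCH_HEADER stands in for
 * "the arch-specific header is still included here".  Set it to 0 to
 * simulate the include being lost in a header split: the file still
 * compiles, but the generic fallback silently wins.
 */
#define HYPO_HAVE_ARCH_HEADER 1

#if HYPO_HAVE_ARCH_HEADER
/* What the architecture header would provide. */
#define HYPO_ARCH_ALIGN 64
#endif

/* Generic header pattern: supply a default only when the arch did not. */
#ifndef HYPO_ARCH_ALIGN
#define HYPO_ARCH_ALIGN 8
#endif

int hypo_arch_align(void)
{
	return HYPO_ARCH_ALIGN;
}
```

Flipping the switch to 0 changes the returned value from 64 to 8 without any compiler diagnostic, which is why a "compiles fine" check alone cannot catch this class of breakage.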

Now, maybe it's some of the other remaining commits after all, but
from where I am in the bisection it really looks like the uapi patch
is the most likely culprit. So I thought I'd let people know...

Linus

Re: [PATCH 5/5] Input: Add ChromeOS EC keyboard driver

2012-12-14 Thread Grant Likely

On Wed, 12 Dec 2012 13:33:48 -0800, Simon Glass  wrote:
> Use the key-matrix layer to interpret key scan information from the EC
> and inject input based on the FDT-supplied key map. This driver registers
> itself with the ChromeOS EC driver to perform communications.
> 
> Additional FDT bindings are provided to specify rows/columns and the
> auto-repeat information.
> 
> Signed-off-by: Simon Glass 
> Signed-off-by: Luigi Semenzato 
> Signed-off-by: Vincent Palatin 
> ---
>  .../devicetree/bindings/input/cros-ec-keyb.txt |   77 
>  drivers/input/keyboard/Kconfig |   10 +
>  drivers/input/keyboard/Makefile|1 +
>  drivers/input/keyboard/cros_ec_keyb.c  |  413
>  4 files changed, 501 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/input/cros-ec-keyb.txt
>  create mode 100644 drivers/input/keyboard/cros_ec_keyb.c
> 
> diff --git a/Documentation/devicetree/bindings/input/cros-ec-keyb.txt b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt
> new file mode 100644
> index 000..67f51d8
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt
> @@ -0,0 +1,77 @@
> +ChromeOS EC Keyboard
> +
> +Google's ChromeOS EC Keyboard is a simple matrix keyboard implemented on
> +a separate EC (Embedded Controller) device. It provides a message for reading
> +key scans from the EC. These are then converted into keycodes for processing
> +by the kernel.
> +
> +Required properties:
> +- compatible: "google,cros-ec-keyb"
> +- google,key-rows: Number of keyboard rows (must be <= 8)
> +- google,key-columns: Number of keyboard columns (must be <= 13)
> +- google,repeat-delay-ms: Key repeat delay in milliseconds
> +- google,repeat-rate-ms: Key repeat rate in milliseconds

Hmmm, these should probably be in a common binding. Take a look at
the other input bindings and make a proposal for properties to add to
matrix-keymap.txt.

> +- linux.keymap: Key map as for matrix-keypad.txt

should be: linux,keymap (comma instead of period)

g.

Re: [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write

2012-12-14 Thread Andy Lutomirski

On Thu, Dec 13, 2012 at 6:10 PM, Darrick J. Wong
 wrote:
> On Thu, Dec 13, 2012 at 05:48:06PM -0800, Andy Lutomirski wrote:
>> On 12/13/2012 12:08 AM, Darrick J. Wong wrote:
>> > Several complaints have been received regarding long file write latencies when
>> > memory pages must be held stable during writeback.  Since it might not be
>> > acceptable to stall programs for the entire duration of a page write (which may
>> > take many milliseconds even on good hardware), enable a second strategy wherein
>> > pages are snapshotted as part of submit_bio; the snapshot can be held stable
>> > while writes continue.
>> >
>> > This provides a band-aid to provide stable page writes on jbd without needing
>> > to backport the fixed locking scheme in jbd2.  A mount option is added to ext4
>> > to allow administrators to enable it there.
>>
>> I'm a bit confused as to what it has to do with ext3.  Wouldn't this be
>> useful as a mount option everywhere, though?
>
> ext3 requires snapshots; the rest are ok with either strategy.
>
> *If* snapshotting is generally liked, then yes I'll go redo it as a vfs mount
> option.
>
>> If this becomes widely used, would it be better to snapshot on
>> wait_for_stable_page instead of on io submission?
>
> That really depends on how long you can afford to wait and how much free
> memory you have. :)  It's all a big tradeoff between write latency and
> consumption of memory pages and bandwidth, and one that I doubt I'm qualified
> to make for everyone.
>
>> FWIW, I'm about to pound pretty hard on this whole patchset on a box
>> that doesn't need stable pages.  I'll let you know how it goes.
>
> Yay!
>
> --D

It survived.  I hit at least one mm bug, but I really don't think it's
a problem with your code.  (I have not tried this workload on Linux
3.7 at all before.  It normally runs on 3.5.)  The box in question is
ext4 on LVM on dm-crypt on (hardware) RAID 5 on hpsa, which should not
need stable pages.

The majority of the data written (that wasn't unlinked before it was
dropped from cache) was checksummed when written and verified later.
Most of this data was written using mmap.  This workload hammers the
vm concurrently in several threads, and it frequently stalls when
stable pages are enabled, so it's probably exercising the code
decently well.

Feel free to add Tested-by: Andy Lutomirski 

--Andy

[PATCH] fadvise: perform WILLNEED readahead in a workqueue

2012-12-14 Thread Eric Wong

Applications streaming large files may want to reduce disk spinups and
I/O latency by performing large amounts of readahead up front.
Applications also tend to read files soon after opening them, so waiting
on a slow fadvise may cause unpleasant latency when the application
starts reading the file.

As a userspace hacker, I'm sometimes tempted to create a background
thread in my app to run readahead().  However, I believe doing this
in the kernel will make life easier for other userspace hackers.
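A minimal userspace sketch of the background-thread workaround mentioned above, assuming POSIX threads and using the portable posix_fadvise(POSIX_FADV_WILLNEED) call rather than the Linux-specific readahead(). The struct ra_job type and start_background_readahead() are hypothetical names for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical job descriptor; the worker thread owns fd and closes it. */
struct ra_job {
	int fd;
	int result;	/* posix_fadvise() return value: 0 or an errno value */
};

static void *ra_thread(void *arg)
{
	struct ra_job *job = arg;

	/* len == 0 means "to the end of the file".  This call may block
	 * while readahead I/O is submitted, which is why it runs here and
	 * not on the application's main thread. */
	job->result = posix_fadvise(job->fd, 0, 0, POSIX_FADV_WILLNEED);
	close(job->fd);
	return job;	/* the joiner frees the job */
}

/* Opens path and starts readahead in a new thread.  Returns 0 on success
 * and fills *tid; the caller later joins *tid to collect the job. */
int start_background_readahead(const char *path, pthread_t *tid)
{
	struct ra_job *job = malloc(sizeof(*job));

	if (!job)
		return -1;
	job->fd = open(path, O_RDONLY);
	if (job->fd < 0) {
		free(job);
		return -1;
	}
	job->result = -1;
	if (pthread_create(tid, NULL, ra_thread, job) != 0) {
		close(job->fd);
		free(job);
		return -1;
	}
	return 0;
}
```

The application can start reading the file immediately and join (or detach) the thread whenever convenient; the patch proposes moving this deferral into the kernel so every application does not have to carry its own thread.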

Since fadvise makes no guarantees about when (or even if) readahead
is performed, this change should not hurt existing applications.

"strace -T" timing on an uncached, one gigabyte file:

 Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
  After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.61>

Signed-off-by: Eric Wong 
---
 N.B.: I'm not sure if I'm misusing any kernel APIs here.  I managed to
 compile, boot, and run fadvise in a loop without anything blowing up.
 I've verified readahead gets performed via mincore().

 If the workqueue approach is acceptable, I'll proceed with
 changing MADV_WILLNEED, too.

 include/linux/mm.h |3 +++
 mm/fadvise.c   |   10 -
 mm/readahead.c |   62 
 3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcaab4e..17ab7d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1473,6 +1473,9 @@ void task_dirty_inc(struct task_struct *tsk);
 #define VM_MAX_READAHEAD   128 /* kbytes */
 #define VM_MIN_READAHEAD   16  /* kbytes (includes current page) */
 
+void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
+   pgoff_t offset, unsigned long nr_to_read);
+
 int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
 
diff --git a/mm/fadvise.c b/mm/fadvise.c
index a47f0f5..cf3bd4c 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -102,12 +102,10 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
if (!nrpages)
nrpages = ~0UL;
 
-   /*
-* Ignore return value because fadvise() shall return
-* success even if filesystem can't retrieve a hint,
-*/
-   force_page_cache_readahead(mapping, f.file, start_index,
-  nrpages);
+   get_file(f.file); /* fput() is called by workqueue */
+
+   /* queue up the request, don't care if it fails */
+   wq_page_cache_readahead(mapping, f.file, start_index, nrpages);
break;
case POSIX_FADV_NOREUSE:
break;
diff --git a/mm/readahead.c b/mm/readahead.c
index 7963f23..56a80a9 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -19,6 +19,27 @@
 #include 
 #include 
 #include 
+#include 
+
+static struct workqueue_struct *readahead_wq __read_mostly;
+
+struct wq_ra_req {
+   struct work_struct work;
+   struct address_space *mapping;
+   struct file *file;
+   pgoff_t offset;
+   unsigned long nr_to_read;
+};
+
+static int __init init_readahead_wq(void)
+{
+   readahead_wq = alloc_workqueue("readahead", WQ_UNBOUND,
+   WQ_UNBOUND_MAX_ACTIVE);
+   BUG_ON(!readahead_wq);
+   return 0;
+}
+
+early_initcall(init_readahead_wq);
 
 /*
  * Initialise a struct file's readahead state.  Assumes that the caller has
@@ -204,6 +225,47 @@ out:
return ret;
 }
 
+static void wq_ra_req_fn(struct work_struct *work)
+{
+   struct wq_ra_req *req = container_of(work, struct wq_ra_req, work);
+
+   /* ignore errors, caller wanted fire-and-forget operation */
+   force_page_cache_readahead(req->mapping, req->file,
+   req->offset, req->nr_to_read);
+
+   fput(req->file);
+   kfree(req);
+}
+
+/*
+ * Fire-and-forget readahead using a workqueue, this allocates pages
+ * inside a workqueue and returns as soon as possible.
+ */
+void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
+   pgoff_t offset, unsigned long nr_to_read)
+{
+   struct wq_ra_req *req;
+
+   req = kzalloc(sizeof(*req), GFP_ATOMIC);
+
+   /*
+* we are fire-and-forget, not having enough memory means readahead
+* is not worth doing anyways
+*/
+   if (!req) {
+   fput(filp);
+   return;
+   }
+
+   INIT_WORK(&req->work, wq_ra_req_fn);
+   req->mapping = mapping;
+   req->file = filp;
+   req->offset = offset;
+   req->nr_to_read = nr_to_read;
+
+   queue_work(readahead_wq, &req->work);
+}
+
 /*
  * Chunk the readahead into 2 megabyte units, so that we don't pin too much
  * memory at once.
-- 
Eric Wong

Re: [PATCH v2] core_pattern: set core helpers root and namespace to crashing process

2012-12-14 Thread Neil Horman

On Fri, Dec 14, 2012 at 03:10:30PM -0800, Eric W. Biederman wrote:
> Neil Horman  writes:
> 
> > As its currently implemented, redirection of core dumps to a pipe reader should
> > be executed such that the reader runs in the namespace of the crashing process,
> > and it currently does not. This is the only sane way to deal with namespaces
> > properly it seems to me, and this patch implements that functionality.
> 
> I actually rather strongly disagree.
> 
> While we have a global core dump pattern core dumps to a pipe reader
> should be executed such that the reader runs in the namespace of the
> process that set the pattern.  We can easily restrict that to the
> initial namespaces to make the implementation simpler.
> 
> If you want to play namespace games you can implement all of those in
> user space once my tree merges for v3.8.
> 
> I am really not a fan of the trigger process being able to control the
> environment of a privileged process.  It makes writing the privileged
> process much trickier.
> 
Why?  What specific problem do you see with allowing a privileged process to
execute within a specific namespace, that doesn't also exist with having the
pipe reader execute in the init namespace?  Note I'm not saying that a poorly
constructed pipe reader application doesn't have security problems if it doesn't
validate the environment that it's running in, but that's something that the pipe
reader needs to be sure about.

Note also, that if the token in core_pattern is set such that the core_pattern
is namespace/root relative, that container needs to install the application
relative to its root as well (i.e. positive action still needs to be taken on the
part of the container admin to make this work).  For example, if you set
core_pattern="||/usr/bin/foo"
Then a process running in a chroot based at /sub/root/ still needs to install a
file /sub/root/usr/bin/foo, or the core dump will fail.  So it's not like a
container can just have a core reader execute without first making an
administrative decision to do so.
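The /sub/root example amounts to prefixing the absolute helper path with the crashing task's root. A string-level sketch of that composition, assuming the helper path is absolute; resolve_in_root() is a hypothetical name, and a real implementation would presumably perform the lookup relative to the task's root directory rather than by concatenating strings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compose the path a root-relative core_pattern
 * helper would resolve to.  helper must be absolute ("/usr/bin/foo").
 * Returns 0 on success, -1 if the result does not fit in out. */
int resolve_in_root(const char *root, const char *helper,
		    char *out, size_t outlen)
{
	size_t rlen = strlen(root);

	/* Drop trailing slashes so "/sub/root/" + "/usr/bin/foo" does not
	 * produce a double slash. */
	while (rlen > 1 && root[rlen - 1] == '/')
		rlen--;
	/* A root of "/" contributes nothing to the prefix. */
	if (rlen == 1 && root[0] == '/')
		rlen = 0;

	int n = snprintf(out, outlen, "%.*s%s", (int)rlen, root, helper);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

With root "/sub/root/" and helper "/usr/bin/foo" this yields "/sub/root/usr/bin/foo", the file the container admin would have to install.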

Neil


[PATCH 3/4] userns: Add a more complete capability subset test to commit_creds

2012-12-14 Thread Eric W. Biederman


When unsharing a user namespace we reduce our credentials to just what
can be done in that user namespace.  This is a subset of the credentials
we previously had.  Teach commit_creds to recognize this is a subset
of the credentials we have had before and not clear the dumpability flag.

This allows an unprivileged program to do:
unshare(CLONE_NEWUSER);
fd = open("/proc/self/uid_map", O_RDWR);

Where previously opening the uid_map writable would fail because
the task had been made non-dumpable.

Signed-off-by: "Eric W. Biederman" 
---
 kernel/cred.c |   27 ++-
 1 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 48cea3d..709d521 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -455,6 +455,31 @@ error_put:
return ret;
 }
 
+static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
+{
+   const struct user_namespace *set_ns = set->user_ns;
+   const struct user_namespace *subset_ns = subset->user_ns;
+
+   /* If the two credentials are in the same user namespace see if
+* the capabilities of subset are a subset of set.
+*/
+   if (set_ns == subset_ns)
+   return cap_issubset(subset->cap_permitted, set->cap_permitted);
+
+   /* The credentials are in different user namespaces,
+* therefore one is a subset of the other only if set is an
+* ancestor of subset and set->euid is the owner of subset or one
+* of subset's ancestors.
+*/
+   for (;subset_ns != &init_user_ns; subset_ns = subset_ns->parent) {
+   if ((set_ns == subset_ns->parent)  &&
+   uid_eq(subset_ns->owner, set->euid))
+   return true;
+   }
+
+   return false;
+}
+
 /**
  * commit_creds - Install new credentials upon the current task
  * @new: The credentials to be assigned
@@ -493,7 +518,7 @@ int commit_creds(struct cred *new)
!gid_eq(old->egid, new->egid) ||
!uid_eq(old->fsuid, new->fsuid) ||
!gid_eq(old->fsgid, new->fsgid) ||
-   !cap_issubset(new->cap_permitted, old->cap_permitted)) {
+   !cred_cap_issubset(old, new)) {
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
task->pdeath_signal = 0;
-- 
1.7.5.4
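
To make the namespace walk in cred_cap_issubset() easier to follow, here is a userspace model with toy structures. toy_ns, toy_cred and toy_cred_cap_issubset() are hypothetical stand-ins, not kernel types; the loop mirrors the patch by treating a namespace with no parent as the initial one, and capabilities are modelled as a plain 64-bit mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for struct user_namespace and struct cred (hypothetical). */
struct toy_ns {
	const struct toy_ns *parent;	/* NULL for the initial namespace */
	uint32_t owner;			/* euid of the namespace creator */
};

struct toy_cred {
	const struct toy_ns *ns;
	uint32_t euid;
	uint64_t cap_permitted;		/* capability bitmask */
};

static bool caps_subset(uint64_t subset, uint64_t set)
{
	return (subset & ~set) == 0;
}

/* Same namespace: plain bitmask subset test.  Otherwise walk subset's
 * namespaces upward, looking for one whose parent is set's namespace
 * and whose owner is set's euid. */
bool toy_cred_cap_issubset(const struct toy_cred *set,
			   const struct toy_cred *subset)
{
	const struct toy_ns *ns = subset->ns;

	if (set->ns == ns)
		return caps_subset(subset->cap_permitted, set->cap_permitted);

	for (; ns->parent != NULL; ns = ns->parent) {
		if (ns->parent == set->ns && ns->owner == set->euid)
			return true;
	}
	return false;
}
```

A task that unshares a user namespace gets full capabilities inside it, yet is still treated as a subset of its old credentials because it owns the new namespace; a different euid in the parent would not pass the test.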


Re: [PATCH 3/4] userns: Add a more complete capability subset test to commit_creds

2012-12-14 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> "Serge E. Hallyn"  writes:
> 
> > Quoting Eric W. Biederman (ebied...@xmission.com):
> >> 
> >> When unsharing a user namespace we reduce our credentials to just what
> >> can be done in that user namespace.  This is a subset of the credentials
> >> we previously had.  Teach commit_creds to recognize this is a subset
> >> of the credentials we have had before and don't clear the dumpability flag.
> >> 
> >> This allows an unprivileged  program to do:
> >> unshare(CLONE_NEWUSER);
> >> fd = open("/proc/self/uid_map", O_RDWR);
> >> 
> >> Where previously opening the uid_map writable would fail because
> >> the task had been made non-dumpable.
> >> 
> >> Signed-off-by: "Eric W. Biederman" 
> >
> > Acked-by: Serge Hallyn 
> >
> >> ---
> >>  kernel/cred.c |   26 +-
> >>  1 files changed, 25 insertions(+), 1 deletions(-)
> >> 
> >> diff --git a/kernel/cred.c b/kernel/cred.c
> >> index 48cea3d..993a7ea41 100644
> >> --- a/kernel/cred.c
> >> +++ b/kernel/cred.c
> >> @@ -455,6 +455,30 @@ error_put:
> >>return ret;
> >>  }
> >>  
> >
> > Do you think we need to warn that this can only be used for
> > commit_creds?  (i.e. if someone tried to use this in some
> > other context, the 'creds are subset of target ns is a child
> > of current_ns' assumption would be wrong)
> 
> This function should be a general test valid at any time.
> 
> Except that I forgot the bit of the test that asks is the original cred
> the owner of the subset user namespace.

Ok, with that change that'll be fine :)

> I will respin this patch.

Cool, thanks.
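
The unshare-then-open sequence from the commit message can be tried from userspace. This sketch runs it in a forked child so the caller's namespaces are left alone; try_uid_map_open() and its exit-code convention are hypothetical. Per the commit message, on a kernel without this change the open is expected to fail (status 2) because the task was made non-dumpable; status 1 only means user namespaces are unavailable in the environment.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child exit status (hypothetical convention):
 *   0 - unshare(CLONE_NEWUSER) and the O_RDWR open both succeeded
 *   1 - unshare failed (e.g. user namespaces disabled or filtered)
 *   2 - unshare worked but opening uid_map for writing failed
 * Returns the child's exit status, or -1 if fork/wait failed. */
int try_uid_map_open(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (unshare(CLONE_NEWUSER) != 0)
			_exit(1);
		int fd = open("/proc/self/uid_map", O_RDWR);
		if (fd < 0)
			_exit(2);
		close(fd);
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```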

Re: [PATCH 2/2] spi: devicetree: add support for loopback mode

2012-12-14 Thread Grant Likely

On Wed, 12 Dec 2012 10:46:00 +0200, Felipe Balbi  wrote:
> there are a few spi master drivers which make
> use of that flag but there is no way to pass it
> through devicetree.
> 
> This patch just creates a way to pass SPI_LOOP
> via devicetree.

I don't understand how this would be useful since loopback mode is
really just a test feature. Is there any reason to do loopback for
something other than test?

I think it would be better to add a sysfs or debugfs property to
manipulate the SPI_LOOP flag from userspace. What do you think?
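
For devices bound to the spidev driver, userspace can already toggle SPI_LOOP through the existing SPI_IOC_RD_MODE/SPI_IOC_WR_MODE ioctls, which is one way to get test-only loopback without a devicetree property. A minimal sketch, assuming a spidev node has already been opened; spidev_set_loopback() is a hypothetical helper name, and this does not implement the sysfs/debugfs knob suggested above.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: read the current 8-bit SPI mode, set or clear
 * SPI_LOOP, and write it back.  Returns 0 on success, -1 on failure
 * (errno is left as set by ioctl). */
int spidev_set_loopback(int fd, int enable)
{
	uint8_t mode;

	if (ioctl(fd, SPI_IOC_RD_MODE, &mode) < 0)
		return -1;
	if (enable)
		mode |= SPI_LOOP;
	else
		mode &= (uint8_t)~SPI_LOOP;
	if (ioctl(fd, SPI_IOC_WR_MODE, &mode) < 0)
		return -1;
	return 0;
}
```

Whether the controller honours the flag is up to the master driver; drivers that do not support loopback are expected to reject the mode change.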

g.

> 
> Signed-off-by: Felipe Balbi 
> ---
>  Documentation/devicetree/bindings/spi/spi-bus.txt | 2 ++
>  drivers/spi/spi.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/spi/spi-bus.txt b/Documentation/devicetree/bindings/spi/spi-bus.txt
> index 296015e..1949586 100644
> --- a/Documentation/devicetree/bindings/spi/spi-bus.txt
> +++ b/Documentation/devicetree/bindings/spi/spi-bus.txt
> @@ -55,6 +55,8 @@ contain the following properties.
>   chip select active high
>  - spi-3wire   - (optional) Empty property indicating device requires
>   3-wire mode.
> +- spi-loopback- (optional) Empty property indicating device requires
> + loopback mode.
>  
>  If a gpio chipselect is used for the SPI slave the gpio number will be passed
>  via the cs_gpio
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 3f1b9ee..6bcdc03 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -868,6 +868,8 @@ static void of_register_spi_devices(struct spi_master *master)
>   spi->mode |= SPI_CS_HIGH;
>   if (of_find_property(nc, "spi-3wire", NULL))
>   spi->mode |= SPI_3WIRE;
> + if (of_find_property(nc, "spi-loopback", NULL))
> + spi->mode |= SPI_LOOP;
>  
>   /* Device speed */
>   prop = of_get_property(nc, "spi-max-frequency", &len);
> -- 
> 1.8.1.rc1.5.g7e0651a
> 

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

Re: [PATCH 1/2] spi: omap2: disable DMA requests before complete()

2012-12-14 Thread Grant Likely

On Wed, 12 Dec 2012 10:45:59 +0200, Felipe Balbi  wrote:
> No actual errors have been found for completing
> before disabling DMA request lines, but it just
> looks more semantically correct that on our DMA
> callback we quiesce the whole thing before stating
> transfer is finished.
> 
> Signed-off-by: Felipe Balbi 

Applied, thanks.

g.

> ---
>  drivers/spi/spi-omap2-mcspi.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
> index b610f52..68446db 100644
> --- a/drivers/spi/spi-omap2-mcspi.c
> +++ b/drivers/spi/spi-omap2-mcspi.c
> @@ -298,10 +298,10 @@ static void omap2_mcspi_rx_callback(void *data)
>   struct omap2_mcspi *mcspi = spi_master_get_devdata(spi->master);
>   struct omap2_mcspi_dma *mcspi_dma = &mcspi->dma_channels[spi->chip_select];
>  
> - complete(&mcspi_dma->dma_rx_completion);
> -
>   /* We must disable the DMA RX request */
>   omap2_mcspi_set_dma_req(spi, 1, 0);
> +
> + complete(&mcspi_dma->dma_rx_completion);
>  }
>  
>  static void omap2_mcspi_tx_callback(void *data)
> @@ -310,10 +310,10 @@ static void omap2_mcspi_tx_callback(void *data)
>   struct omap2_mcspi *mcspi = spi_master_get_devdata(spi->master);
>   struct omap2_mcspi_dma *mcspi_dma = &mcspi->dma_channels[spi->chip_select];
>  
> - complete(&mcspi_dma->dma_tx_completion);
> -
>   /* We must disable the DMA TX request */
>   omap2_mcspi_set_dma_req(spi, 0, 0);
> +
> + complete(&mcspi_dma->dma_tx_completion);
>  }
>  
>  static void omap2_mcspi_tx_dma(struct spi_device *spi,
> -- 
> 1.8.1.rc1.5.g7e0651a
> 

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

Re: [PATCH - v2] spi: davinci: add OF support for the spi controller

2012-12-14 Thread Grant Likely

On Tue, 11 Dec 2012 16:20:39 -0500, Murali Karicheri  
wrote:
> This adds OF support to DaVinci SPI controller to configure platform
> data through device bindings. Also replaces clk_enable() with
> clk_prepare_enable() as well as clk_disable() with
> clk_disable_unprepare().
> 
> Signed-off-by: Murali Karicheri 
> Reviewed-by : Grant Likely 

Applied, thanks.

I did remove the OF_ALIAS_N property though. I know the COMPATIBLE one
uses it, but it is actually kind of redundant since it can also be
determined by counting the number of OF_ALIAS_* entries, and having the
_N one in there means extra work needs to be done to filter it out.

Also, I had to add a #ifndef _LINUX_OF_PRIVATE_H wrapper around the
whole header file. This is needed for all header files to protect
against multiple includes.

g.

> ---
>  - Change log
>  - v2 - changed the compatibility strings to include soc name
>   -  changed ti,davinci-num-cs to num-cs
>v1 - removed attribute for spi version. instead, compatibility string is
>  modified to include version info.
>   - pdata ptr in davinci_spi_platform_data is replaced with struct itself.
>   - spi_davinci_get_pdata() now populates the pdata in the above structure
>  with parsed values from DT bindings.
>   - rebased to v3.7 rc7 of linux-next
>   - replaces clk_* APIs with their prepare/unprepare version
>  .../devicetree/bindings/spi/spi-davinci.txt|   51 ++
>  drivers/spi/spi-davinci.c  |  102 +---
>  2 files changed, 139 insertions(+), 14 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/spi/spi-davinci.txt
> 
> diff --git a/Documentation/devicetree/bindings/spi/spi-davinci.txt b/Documentation/devicetree/bindings/spi/spi-davinci.txt
> new file mode 100644
> index 000..8cb3fee
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/spi/spi-davinci.txt
> @@ -0,0 +1,51 @@
> +Davinci SPI controller device bindings
> +
> +Required properties:
> +- #address-cells: number of cells required to define a chip select
> + address on the SPI bus. Should be set to 1.
> +- #size-cells: should be zero.
> +- compatible:
> + - "ti,dm644x-spi" for SPI used similar to that on DM644x SoC family
> + - "ti,da8xx-spi" for SPI used similar to that on DA8xx SoC family
> +- reg: Offset and length of SPI controller register space
> +- num-cs: Number of chip selects
> +- ti,davinci-spi-intr-line: interrupt line used to connect the SPI
> + IP to the interrupt controller within the SoC. Possible values
> + are 0 and 1. Manual says one of the two possible interrupt
> + lines can be tied to the interrupt controller. Set this
> + based on a specific SoC configuration.
> +- interrupts: interrupt number offset at the irq parent
> +- clocks: spi clk phandle
> +
> +Example of a NOR flash slave device (n25q032) connected to DaVinci
> +SPI controller device over the SPI bus.
> +
> +spi0:spi@20BF {
> + #address-cells  = <1>;
> + #size-cells = <0>;
> + compatible  = "ti,dm644x-spi";
> + reg = <0x20BF 0x1000>;
> + num-cs  = <4>;
> + ti,davinci-spi-intr-line= <0>;
> + interrupts  = <338>;
> + clocks  = <>;
> +
> + flash: n25q032@0 {
> + #address-cells = <1>;
> + #size-cells = <1>;
> + compatible = "st,m25p32";
> + spi-max-frequency = <2500>;
> + reg = <0>;
> +
> + partition@0 {
> + label = "u-boot-spl";
> + reg = <0x0 0x8>;
> + read-only;
> + };
> +
> + partition@1 {
> + label = "test";
> + reg = <0x8 0x38>;
> + };
> + };
> +};
> diff --git a/drivers/spi/spi-davinci.c b/drivers/spi/spi-davinci.c
> index 147dfa8..e5d8489 100644
> --- a/drivers/spi/spi-davinci.c
> +++ b/drivers/spi/spi-davinci.c
> @@ -28,6 +28,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -135,7 +137,7 @@ struct davinci_spi {
>   int dma_rx_chnum;
>   int dma_tx_chnum;
>  
> - struct davinci_spi_platform_data *pdata;
> + struct davinci_spi_platform_data pdata;
>  
>   void(*get_rx)(u32 rx_data, struct davinci_spi *);
>   u32 (*get_tx)(struct davinci_spi *);
> @@ -213,7 +215,7 @@ static void davinci_spi_chipselect(struct spi_device *spi, int value)
>   bool gpio_chipsel = false;
>  
>   dspi = spi_master_get_devdata(spi->master);
> - pdata = dspi->pdata;
> + pdata = &dspi->pdata;
>  
>   if (pdata->chip_sel && chip_sel < pdata->num_chipselect &&
>
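The quoted change stores struct davinci_spi_platform_data by value inside struct davinci_spi instead of keeping a pointer, which is why the chipselect code now takes its address. The following is a minimal userspace sketch of why that helps. It assumes trimmed-down stand-in structures and a hypothetical example_get_pdata() rather than the driver's real parsing code. Because the driver owns the storage, it can either copy board-file data in or fill the fields from parsed DT values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the driver's structures. */
struct davinci_spi_platform_data {
	uint8_t num_chipselect;
	uint8_t intr_line;
};

struct davinci_spi {
	/* Embedded copy, not a pointer: the driver owns this storage. */
	struct davinci_spi_platform_data pdata;
};

/*
 * Fill dspi->pdata from board-file platform data when it exists,
 * otherwise from values a DT parser would have produced.
 */
static void example_get_pdata(struct davinci_spi *dspi,
			      const struct davinci_spi_platform_data *board,
			      uint8_t dt_num_cs, uint8_t dt_intr_line)
{
	if (board) {
		dspi->pdata = *board;           /* plain struct copy */
		return;
	}
	dspi->pdata.num_chipselect = dt_num_cs;
	dspi->pdata.intr_line = dt_intr_line;
}
```

Callers then work through a pointer to the embedded copy, as in `pdata = &dspi->pdata;`.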

[PATCH] PM / QoS: Rename local variable in dev_pm_qos_add_ancestor_request()

2012-12-14 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

Local variable 'error' in dev_pm_qos_add_ancestor_request() need
not contain error codes only, so rename it to 'ret'.
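A minimal userspace model of the lookup in dev_pm_qos_add_ancestor_request() may be useful here. It assumes a hypothetical struct example_dev with only the two fields the loop reads. The hypothetical stand-in for dev_pm_qos_add_request() returns 1 to illustrate the changelog's point: the value that ends up being returned is not necessarily an error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal device model: only what the ancestor walk reads. */
struct example_dev {
	struct example_dev *parent;
	bool ignore_children;
};

/* Climb from dev->parent until a device with ignore_children set is found. */
static struct example_dev *find_qos_ancestor(struct example_dev *dev)
{
	struct example_dev *ancestor = dev->parent;

	while (ancestor && !ancestor->ignore_children)
		ancestor = ancestor->parent;

	return ancestor;
}

/* Hypothetical stand-in for dev_pm_qos_add_request(); returns a non-error 1. */
static int example_add_request(struct example_dev *target)
{
	(void)target;
	return 1;
}

static int example_add_ancestor_request(struct example_dev *dev)
{
	int ret = -ENODEV;              /* stays an error only if no ancestor */
	struct example_dev *ancestor = find_qos_ancestor(dev);

	if (ancestor)
		ret = example_add_request(ancestor);

	return ret;
}
```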

Signed-off-by: Rafael J. Wysocki 
---
 drivers/base/power/qos.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux/drivers/base/power/qos.c
===
--- linux.orig/drivers/base/power/qos.c
+++ linux/drivers/base/power/qos.c
@@ -542,19 +542,19 @@ int dev_pm_qos_add_ancestor_request(stru
struct dev_pm_qos_request *req, s32 value)
 {
struct device *ancestor = dev->parent;
-   int error = -ENODEV;
+   int ret = -ENODEV;
 
while (ancestor && !ancestor->power.ignore_children)
ancestor = ancestor->parent;
 
if (ancestor)
-   error = dev_pm_qos_add_request(ancestor, req,
-  DEV_PM_QOS_LATENCY, value);
+   ret = dev_pm_qos_add_request(ancestor, req,
+DEV_PM_QOS_LATENCY, value);
 
-   if (error < 0)
+   if (ret < 0)
req->dev = NULL;
 
-   return error;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_add_ancestor_request);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Resend][PATCH] PM: Move disabling/enabling runtime PM to late suspend/early resume

2012-12-14 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

Currently, the PM core disables runtime PM for all devices right
after executing subsystem/driver .suspend() callbacks for them
and re-enables it right before executing subsystem/driver .resume()
callbacks for them.  This may lead to problems when there are
two devices such that the .suspend() callback executed for one of
them depends on runtime PM working for the other.  In that case,
if runtime PM has already been disabled for the second device,
the first one's .suspend() won't work correctly (and analogously
for resume).

To make those issues go away, make the PM core disable runtime PM
for devices right before executing subsystem/driver .suspend_late()
callbacks for them and enable runtime PM for them right after
executing subsystem/driver .resume_early() callbacks for them.  This
way the potential conflicts between .suspend_late()/.resume_early()
and their runtime PM counterparts are still prevented from happening,
but the subtle ordering issues related to disabling/enabling runtime
PM for devices during system suspend/resume are much easier to avoid.
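The ordering problem can be seen in a toy, single-threaded model. Two devices are suspended in list order, and device A's .suspend() needs device B, which is processed first, to still be usable through runtime PM. Everything here (struct sim_dev, sim_system_suspend(), the 'needs' link) is hypothetical. It ignores async suspend, pm_runtime_barrier() and usage counts, and only contrasts disabling runtime PM right after .suspend() with disabling it right before .suspend_late().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: just enough state for the ordering question. */
struct sim_dev {
	bool runtime_enabled;      /* can this device still be runtime-resumed? */
	struct sim_dev *needs;     /* device our .suspend() relies on, or NULL */
};

/* A .suspend() callback that only works if its dependency is runtime-PM-usable. */
static bool sim_suspend(const struct sim_dev *d)
{
	return d->needs == NULL || d->needs->runtime_enabled;
}

/*
 * Run the .suspend() phase over devs[0..n-1] in order, then the
 * .suspend_late() phase.  late_disable == false disables runtime PM right
 * after each device's .suspend() (old behaviour); true disables it right
 * before each device's .suspend_late() (behaviour after the patch).
 * Returns true if every .suspend() callback succeeded.
 */
static bool sim_system_suspend(struct sim_dev **devs, size_t n, bool late_disable)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < n; i++) {              /* .suspend() phase */
		if (!sim_suspend(devs[i]))
			ok = false;
		if (!late_disable)
			devs[i]->runtime_enabled = false;
	}

	for (i = 0; i < n; i++) {              /* .suspend_late() phase */
		if (late_disable)
			devs[i]->runtime_enabled = false;
		/* the device's .suspend_late() callback would run here */
	}

	return ok;
}
```

With the order {B, A}, the old scheme fails A's .suspend() because B was already disabled; the new scheme succeeds and still ends with runtime PM disabled for both devices before the late phase completes.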

Reported-and-tested-by: Jan-Matthias Braun 
Signed-off-by: Rafael J. Wysocki 
---
 Documentation/power/runtime_pm.txt |9 +
 drivers/base/power/main.c  |9 -
 2 files changed, 9 insertions(+), 9 deletions(-)

Index: linux/drivers/base/power/main.c
===
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -513,6 +513,8 @@ static int device_resume_early(struct de
 
  Out:
TRACE_RESUME(error);
+
+   pm_runtime_enable(dev);
return error;
 }
 
@@ -589,8 +591,6 @@ static int device_resume(struct device *
if (!dev->power.is_suspended)
goto Unlock;
 
-   pm_runtime_enable(dev);
-
if (dev->pm_domain) {
info = "power domain ";
callback = pm_op(&dev->pm_domain->ops, state);
@@ -930,6 +930,8 @@ static int device_suspend_late(struct de
pm_callback_t callback = NULL;
char *info = NULL;
 
+   __pm_runtime_disable(dev, false);
+
if (dev->power.syscore)
return 0;
 
@@ -1133,11 +1135,8 @@ static int __device_suspend(struct devic
 
  Complete:
complete_all(&dev->power.completion);
-
if (error)
async_error = error;
-   else if (dev->power.is_suspended)
-   __pm_runtime_disable(dev, false);
 
return error;
 }
Index: linux/Documentation/power/runtime_pm.txt
===
--- linux.orig/Documentation/power/runtime_pm.txt
+++ linux/Documentation/power/runtime_pm.txt
@@ -642,12 +642,13 @@ out the following operations:
   * During system suspend it calls pm_runtime_get_noresume() and
 pm_runtime_barrier() for every device right before executing the
 subsystem-level .suspend() callback for it.  In addition to that it calls
-pm_runtime_disable() for every device right after executing the
-subsystem-level .suspend() callback for it.
+__pm_runtime_disable() with 'false' as the second argument for every device
+right before executing the subsystem-level .suspend_late() callback for it.
 
   * During system resume it calls pm_runtime_enable() and pm_runtime_put_sync()
-for every device right before and right after executing the subsystem-level
-.resume() callback for it, respectively.
+for every device right after executing the subsystem-level .resume_early()
+callback and right after executing the subsystem-level .resume() callback
+for it, respectively.
 
 7. Generic subsystem callbacks
 


Re: [patch 2/8] mm: vmscan: disregard swappiness shortly before going OOM

2012-12-14 Thread Johannes Weiner

On Fri, Dec 14, 2012 at 05:13:45PM +0100, Michal Hocko wrote:
> On Fri 14-12-12 10:43:55, Rik van Riel wrote:
> > On 12/14/2012 03:37 AM, Michal Hocko wrote:
> > 
> > >I can answer the latter. Because memsw comes with its price and
> > >swappiness is much cheaper. On the other hand it makes sense that
> > >swappiness==0 doesn't swap at all. Or do you think we should get back to
> > >_almost_ doesn't swap at all?
> > 
> > swappiness==0 will swap in emergencies, specifically when we have
> > almost no page cache left, we will still swap things out:
> > 
> > if (global_reclaim(sc)) {
> > free  = zone_page_state(zone, NR_FREE_PAGES);
> > if (unlikely(file + free <= high_wmark_pages(zone))) {
> > /*
> >  * If we have very few page cache pages, force-scan
> >  * anon pages.
> >  */
> > fraction[0] = 1;
> > fraction[1] = 0;
> > denominator = 1;
> > goto out;
> > 
> > This makes sense, because people who set swappiness==0 but
> > do have swap space available would probably prefer some
> > emergency swapping over an OOM kill.
> 
> Yes, but this is the global reclaim path. I was arguing about
> swappiness==0 & memcg. As this patch doesn't make a big difference for
> the global case (as both the changelog and you mentioned) then we should
> focus on whether this is a desirable change for the memcg path. I think it
> makes sense to keep "no swapping at all for memcg semantic" as we have
> it currently.

I would prefer we could agree on one thing, though.  Having global
reclaim behave differently from memcg reclaim violates the principle of
least surprise.  Having the code behave like that implicitly without
any mention of global_reclaim() and vm_swappiness() is unacceptable.
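For reference, the emergency check Rik quoted can be modelled in isolation. This is a hypothetical userspace sketch: struct scan_balance and force_anon_scan() are not kernel names, and fraction[0]/fraction[1] are assumed to be the anon/file shares as in the quoted fragment. When global_reclaim is false, as in memcg reclaim, the force-anon path is never taken. That is the global/memcg difference being argued about.

```c
#include <stdbool.h>

/* Hypothetical mirror of the fraction/denominator outputs in the fragment. */
struct scan_balance {
	unsigned long fraction[2];   /* [0] = anon share, [1] = file share */
	unsigned long denominator;
};

/*
 * Returns true (and forces an anon-only scan) when, in global reclaim,
 * file pages plus free pages no longer exceed the zone's high watermark.
 * Otherwise returns false and leaves the normal swappiness balancing alone.
 */
static bool force_anon_scan(bool global_reclaim, unsigned long file,
			    unsigned long free, unsigned long high_wmark,
			    struct scan_balance *out)
{
	if (global_reclaim && file + free <= high_wmark) {
		out->fraction[0] = 1;    /* all scanning pressure on anon */
		out->fraction[1] = 0;    /* none on page cache */
		out->denominator = 1;
		return true;
	}
	return false;
}
```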

[PATCH] atkbd: Fix multi-char scancode handling on reconnect.

2012-12-14 Thread Shawn Nematbakhsh

On resume from suspend there is a possibility for multi-byte scancodes
to be handled incorrectly. atkbd_reconnect disables the processing of
scancodes in software by calling atkbd_disable, but the keyboard may
still be active because no disconnect command was sent. Later, software
handling is re-enabled. If a multi-byte scancode sent from the keyboard
straddles the re-enable, only the latter byte(s) will be handled.

In practice, this leads to cases where multi-byte break codes (e.g. "e0
4d" - break code for right-arrow) are misread as make codes ("4d" - make
code for numeric 6), so that one or more characters the user never typed
are reported.

The solution implemented here involves sending command f5 (reset
disable) to the keyboard prior to disabling software handling of codes.
Later, the command to re-enable the keyboard is sent only after we are
prepared to handle scancodes.
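The failure mode can be illustrated with a toy decoder. This is a hypothetical sketch, not atkbd's real state machine. 0xe0 marks the next byte as extended, and bytes that arrive while software handling is disabled are simply never fed in. Starting at index 1 reproduces the bug, because 0x4d is then decoded on its own. With the patch, the keyboard is told to stop sending before handling is disabled, so a sequence should no longer be split this way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state: remembers whether an 0xe0 prefix was seen. */
struct sc_decoder {
	int extended;
};

/* Returns a decoded code (0x100 | byte for extended ones), or -1 for a prefix. */
static int sc_feed(struct sc_decoder *d, uint8_t byte)
{
	if (byte == 0xe0) {
		d->extended = 1;
		return -1;              /* prefix only, nothing complete yet */
	}
	int code = d->extended ? (0x100 | byte) : byte;
	d->extended = 0;
	return code;
}

/*
 * Feed bytes[first..n-1]; bytes before 'first' were sent while software
 * handling was off.  Returns the last complete code, or -1 if none.
 */
static int sc_decode_from(const uint8_t *bytes, size_t n, size_t first)
{
	struct sc_decoder d = { 0 };
	int last = -1;
	size_t i;

	for (i = first; i < n; i++) {
		int code = sc_feed(&d, bytes[i]);
		if (code >= 0)
			last = code;
	}
	return last;
}
```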

Signed-off-by: Shawn Nematbakhsh 
---
 drivers/input/keyboard/atkbd.c |   26 --
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c
index add5ffd..da49e8b 100644
--- a/drivers/input/keyboard/atkbd.c
+++ b/drivers/input/keyboard/atkbd.c
@@ -844,6 +844,24 @@ static int atkbd_activate(struct atkbd *atkbd)
 }
 
 /*
+ * atkbd_deactivate() resets and disables the keyboard from sending
+ * keystrokes.
+ */
+static int atkbd_deactivate(struct atkbd *atkbd)
+{
+   struct ps2dev *ps2dev = &atkbd->ps2dev;
+
+   if (ps2_command(ps2dev, NULL, ATKBD_CMD_RESET_DIS)) {
+   dev_err(&ps2dev->serio->dev,
+   "Failed to deactivate keyboard on %s\n",
+   ps2dev->serio->phys);
+   return -1;
+   }
+
+   return 0;
+}
+
+/*
  * atkbd_cleanup() restores the keyboard state so that BIOS is happy after a
  * reboot.
  */
@@ -1199,6 +1217,9 @@ static int atkbd_reconnect(struct serio *serio)
 
mutex_lock(&atkbd->mutex);
 
+   if (atkbd->write)
+   atkbd_deactivate(atkbd);
+
atkbd_disable(atkbd);
 
if (atkbd->write) {
@@ -1208,8 +1229,6 @@ static int atkbd_reconnect(struct serio *serio)
if (atkbd->set != atkbd_select_set(atkbd, atkbd->set, 
atkbd->extra))
goto out;
 
-   atkbd_activate(atkbd);
-
/*
 * Restore LED state and repeat rate. While input core
 * will do this for us at resume time reconnect may happen
@@ -1224,6 +1243,9 @@ static int atkbd_reconnect(struct serio *serio)
}
 
atkbd_enable(atkbd);
+   if (atkbd->write)
+   atkbd_activate(atkbd);
+
retval = 0;
 
  out:
-- 
1.7.8.6


Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-14 Thread NeilBrown

On Thu, 13 Dec 2012 11:42:18 -0600 Jon Hunter  wrote:

> 
> On 12/12/2012 10:33 PM, NeilBrown wrote:
> > On Thu, 13 Dec 2012 14:06:35 +1100 NeilBrown  wrote:
> > 
>  +omap_dm_timer_enable(omap->dm_timer);
> >>>
> >>> Do you need to call omap_dm_timer_enable here? _set_load and _set_match
> >>> will enable the timer. So this should not be necessary.
> >>
> >> True.  That is what you get for copying someone else's code and not
> >> understanding it fully.
> > 
> > However  omap_dm_timer_write_counter *doesn't* enable the timer, and
> > explicitly checks that it is already runtime-enabled.
> > 
> > Does that mean I don't need to call omap_dm_timer_write_counter here?  Or
> > does it mean that I do need the enable/disable pair?
> 
> Typically, omap_dm_timer_write_counter() is used to update the counter
> value while the counter is running and hence is enabled.
> 
> Looking at the code some more, I now see what they are trying to do. It
> seems that they are trying to force an overflow to occur as soon as they
> enable the timer. This will cause the timer to load the count value from
> the timer load register into the timer counter register. So that does
> make sense to me. However, this should not be necessary as
> omap_dm_timer_set_load should do this for you. Therefore, I think that
> you could accomplish the same thing by doing ...
> 
> omap_pwm_config
>   --> omap_dm_timer_set_load()
>   --> omap_dm_timer_set_match()
>   --> omap_dm_timer_set_pwm()
> 
> omap_pwm_enable
>   --> omap_dm_timer_start()
> 
> If we call _set_load in config then we don't need to call _load_start in
> the enable, we can just call _start.
> 
> Can you try this and see if this is working ok?
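As a rough illustration of what the config step would compute before calling omap_dm_timer_set_load() and omap_dm_timer_set_match(), here is a hypothetical sketch. It assumes the timer is a 32-bit up-counter that reloads from the load register on overflow, and that the PWM output toggles on match. pwm_calc_load_match() is an illustrative name, not the pwm_calc_value() helper from the patch below. It also assumes 0 < duty < period < 2^32 cycles.

```c
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Convert a duration in ns to timer ticks at clk_rate_hz (64-bit to avoid overflow). */
static uint64_t ns_to_cycles(uint64_t clk_rate_hz, uint64_t ns)
{
	return clk_rate_hz * ns / EXAMPLE_NSEC_PER_SEC;
}

/*
 * load:  counting up from here overflows after exactly 'period' ticks.
 * match: reached 'duty' ticks after each reload.
 */
static void pwm_calc_load_match(uint64_t clk_rate_hz, uint64_t period_ns,
				uint64_t duty_ns, uint32_t *load, uint32_t *match)
{
	uint64_t period = ns_to_cycles(clk_rate_hz, period_ns);
	uint64_t duty = ns_to_cycles(clk_rate_hz, duty_ns);

	*load = (uint32_t)(0x100000000ULL - period);
	*match = (uint32_t)(*load + duty);
}
```

For a 32768 Hz clock, a 1 s period and a 0.5 s duty give load 0xFFFF8000 and match 0xFFFFC000.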

Seems to work, and is much neater.  Thanks.

Below is my current patch.
Unresolved issues are:
 - It uses omap_dm_timer_request_specific(), which apparently isn't
   ideal.
 - It still zeros things in the suspend routine.  I haven't explored this at
   all yet.

Thanks,
NeilBrown

From 69ed735d1bc377e8e65345792997f809e60b5fbf Mon Sep 17 00:00:00 2001
From: NeilBrown 
Date: Sun, 2 Dec 2012 14:53:20 +1100
Subject: [PATCH] pwm: omap: Add PWM support using dual-mode timers

This patch is based on an earlier patch by Grant Erickson
which provided PWM devices using the 'legacy' interface.

This driver instead uses the new framework interface.

Platform data must be provided to identify which dmtimer to use.

Lots of cleanup and improvements thanks to Thierry Reding
and Jon Hunter.

Cc: Grant Erickson 
Signed-off-by: NeilBrown 

diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
index ed81720..32c1253 100644
--- a/drivers/pwm/Kconfig
+++ b/drivers/pwm/Kconfig
@@ -85,6 +85,15 @@ config PWM_MXS
  To compile this driver as a module, choose M here: the module
  will be called pwm-mxs.
 
+config PWM_OMAP
+   tristate "OMAP PWM support"
+   depends on ARCH_OMAP && OMAP_DM_TIMER
+   help
+ Generic PWM framework driver for OMAP
+
+ To compile this driver as a module, choose M here: the module
 will be called pwm-omap.
+
 config PWM_PUV3
tristate "PKUnity NetBook-0916 PWM support"
depends on ARCH_PUV3
diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
index acfe482..f5d200d 100644
--- a/drivers/pwm/Makefile
+++ b/drivers/pwm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_PWM_IMX)   += pwm-imx.o
 obj-$(CONFIG_PWM_JZ4740)   += pwm-jz4740.o
 obj-$(CONFIG_PWM_LPC32XX)  += pwm-lpc32xx.o
 obj-$(CONFIG_PWM_MXS)  += pwm-mxs.o
+obj-$(CONFIG_PWM_OMAP) += pwm-omap.o
 obj-$(CONFIG_PWM_PUV3) += pwm-puv3.o
 obj-$(CONFIG_PWM_PXA)  += pwm-pxa.o
 obj-$(CONFIG_PWM_SAMSUNG)  += pwm-samsung.o
diff --git a/drivers/pwm/pwm-omap.c b/drivers/pwm/pwm-omap.c
new file mode 100644
index 000..344072c
--- /dev/null
+++ b/drivers/pwm/pwm-omap.c
@@ -0,0 +1,271 @@
+/*
+ *Copyright (c) 2012 NeilBrown 
+ *Heavily based on earlier code which is:
+ *Copyright (c) 2010 Grant Erickson 
+ *
+ *Also based on pwm-samsung.c
+ *
+ *This program is free software; you can redistribute it and/or
+ *modify it under the terms of the GNU General Public License
+ *version 2 as published by the Free Software Foundation.
+ *
+ *Description:
+ *  This file is the core OMAP support for the generic, Linux
+ *  PWM driver / controller, using the OMAP's dual-mode timers.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define DM_TIMER_LOAD_MIN  0xFFFE
+
+struct omap_chip {
+   struct omap_dm_timer*dm_timer;
+   enum pwm_polarity   polarity;
+   unsigned intduty_ns, period_ns;
+   struct pwm_chip chip;
+};
+
+#define to_omap_chip(chip) container_of(chip, struct omap_chip, chip)
+
+/**
+ * pwm_calc_value - Determine the counter value for a clock rate and period.
+ *

RE: [PATCH] EXTCON: Get and set cable properties

2012-12-14 Thread Tc, Jenny


> > I replied on the thread and pointed out issues that I see with this
> > solution. IMHO it's not fair to register a cable with
> > power_supply/regulator/charger manager just to expose the cable
> > properties.
> Don't we already do that in almost all drivers? Like if I have a keyboard
> driver, I register with the input framework, and if the keyboard driver
> needs clk services then I register with the clk framework as well, and the
> same with other services needed by the keyboard driver. I think it makes
> sense for a cable to register with a different framework if it supports
> that functionality, as those services would then also know that we have
> a cable with such-and-such properties.

IMHO it's not a good choice to register a cable itself with any of the three
subsystems (power_supply/regulator/charger manager).

For example, it cannot register with the power_supply subsystem since it's
not a power_supply; it's just a source for a power_supply. We register
either the charger or the battery with power supply. I couldn't find a way
to register the cable with the power supply subsystem.

In previous discussion Anton also agreed to this.

Anton,
Could you please confirm?

I think the same applies to the regulator framework. A cable doesn't belong
to the regulator framework. A cable doesn't expose any control attributes
(current control/voltage control). It just has properties (e.g. current)
controlled by external agents (host machine/wall charger).

And charger manager is a consumer, not a provider. It cannot decide the
charger cable capabilities.


> > > As I said, the extcon provider driver doesn't directly provide the
> > > charging current (int mA) and some state (unsigned long state) because
> > > the extcon provider driver (Micro USB interface controller device or
> > > other device related to the external connector) has no mechanism to
> > > detect the charging current dynamically and immediately. If you wish
> > > to provide charging current data to an extcon consumer driver or
> > > framework, you should use the regulator/power_supply framework or
> > > another method in the extcon provider driver.

This information need not come from the extcon provider. It can come from
any driver that knows the state of a cable. It may be a platform driver or
a driver belonging to any of the three subsystems we discussed above
(power_supply/regulator/charger_manager). With an API like this
(extcon_cable_set_data), it's not mandatory to register the cable with any
of those subsystems.

> > > Also if you want to define 'struct extcon_chrgr_cable_props', you
> > > should certainly show how to detect the charging current or state dynamically.

For USB cables, it's determined by USB enumeration: for USB 3.0 it would be
900mA and for USB 2.0 it's 500mA. In the unconfigured state this would be
150mA and 100mA respectively.
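The figures above amount to a small lookup. The following is a hypothetical sketch of what a driver that knows the enumeration state might report. The enum and function names are illustrative only and are not part of the proposed extcon API.

```c
#include <stdbool.h>

/* Hypothetical cable classification, for this sketch only. */
enum example_usb_cable { EXAMPLE_USB2, EXAMPLE_USB3 };

/*
 * Current available from a USB host port in mA, using the figures quoted
 * above: 500/900 mA once configured by enumeration, 100/150 mA before that.
 */
static int example_usb_cable_current_ma(enum example_usb_cable type, bool configured)
{
	switch (type) {
	case EXAMPLE_USB3:
		return configured ? 900 : 150;
	case EXAMPLE_USB2:
	default:
		return configured ? 500 : 100;
	}
}
```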

Re: [PATCH] PNP: Simplify setting of resources

2012-12-14 Thread Rafael J. Wysocki

On Wednesday, December 12, 2012 04:32:33 PM Witold Szczeponik wrote:
> This patch factors out the setting of PNP resources into one function which
> is then reused for all PNP resource types.  This makes the code more concise
> and avoids duplication.  The parameters "type" and "flags" are not used at
> the moment but may be used by follow-up patches.  Placeholders for these
> patches can be found in the comment lines that contain the "TBD" marker.
> 
> As the code does not make any changes to the ABI, no regressions are expected.
> 
> NB: While at it, support for bus type resources is added. 
> 
> The patch is applied against Linux 3.7 as well as linux-pm.git/master 
> as of 2012-12-12.

Both patches queued up for submission as v3.8 material later in the cycle.

Thanks,
Rafael
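The start/end handling factored into pnp_get_resource_value() can be exercised in userspace. This sketch assumes a hypothetical skip_ws() in place of the kernel's skip_spaces() and uses strtoull() in place of simple_strtoull(). As in the patch, a single value yields end == start.

```c
#include <ctype.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's skip_spaces(). */
static const char *skip_ws(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Parse "<start>" or "<start> - <end>" (base 0, so 0x... is accepted).
 * Returns a pointer to where parsing stopped.
 */
static const char *parse_range(const char *buf, unsigned long long *start,
			       unsigned long long *end)
{
	char *next;

	buf = skip_ws(buf);
	*start = strtoull(buf, &next, 0);
	buf = skip_ws(next);
	if (*buf == '-') {
		buf = skip_ws(buf + 1);
		*end = strtoull(buf, &next, 0);
		buf = next;
	} else {
		*end = *start;
	}
	return buf;
}
```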


> Signed-off-by: Witold Szczeponik 
> Reviewed-by: Bjorn Helgaas 
> 
> 
> Index: linux/drivers/pnp/interface.c
> ===
> --- linux.orig/drivers/pnp/interface.c
> +++ linux/drivers/pnp/interface.c
> @@ -298,6 +298,39 @@ static ssize_t pnp_show_current_resource
>   return ret;
>  }
>  
> +static char *pnp_get_resource_value(char *buf,
> + unsigned long type,
> + resource_size_t *start,
> + resource_size_t *end,
> + unsigned long *flags)
> +{
> + if (start)
> + *start = 0;
> + if (end)
> + *end = 0;
> + if (flags)
> + *flags = 0;
> +
> + /* TBD: allow for disabled resources */
> +
> + buf = skip_spaces(buf);
> + if (start) {
> + *start = simple_strtoull(buf, &buf, 0);
> + if (end) {
> + buf = skip_spaces(buf);
> + if (*buf == '-') {
> + buf = skip_spaces(buf + 1);
> + *end = simple_strtoull(buf, &buf, 0);
> + } else
> + *end = *start;
> + }
> + }
> +
> + /* TBD: allow for additional flags, e.g., IORESOURCE_WINDOW */
> +
> + return buf;
> +}
> +
>  static ssize_t pnp_set_current_resources(struct device *dmdev,
>struct device_attribute *attr,
>const char *ubuf, size_t count)
> @@ -305,7 +338,6 @@ static ssize_t pnp_set_current_resources
>   struct pnp_dev *dev = to_pnp_dev(dmdev);
>   char *buf = (void *)ubuf;
>   int retval = 0;
> - resource_size_t start, end;
>  
>   if (dev->status & PNP_ATTACHED) {
>   retval = -EBUSY;
> @@ -349,6 +381,10 @@ static ssize_t pnp_set_current_resources
>   goto done;
>   }
>   if (!strnicmp(buf, "set", 3)) {
> + resource_size_t start;
> + resource_size_t end;
> + unsigned long flags;
> +
>   if (dev->active)
>   goto done;
>   buf += 3;
> @@ -357,42 +393,37 @@ static ssize_t pnp_set_current_resources
>   while (1) {
>   buf = skip_spaces(buf);
>   if (!strnicmp(buf, "io", 2)) {
> - buf = skip_spaces(buf + 2);
> - start = simple_strtoul(buf, &buf, 0);
> - buf = skip_spaces(buf);
> - if (*buf == '-') {
> - buf = skip_spaces(buf + 1);
> - end = simple_strtoul(buf, &buf, 0);
> - } else
> - end = start;
> - pnp_add_io_resource(dev, start, end, 0);
> - continue;
> - }
> - if (!strnicmp(buf, "mem", 3)) {
> - buf = skip_spaces(buf + 3);
> - start = simple_strtoul(buf, &buf, 0);
> - buf = skip_spaces(buf);
> - if (*buf == '-') {
> - buf = skip_spaces(buf + 1);
> - end = simple_strtoul(buf, &buf, 0);
> - } else
> - end = start;
> - pnp_add_mem_resource(dev, start, end, 0);
> - continue;
> - }
> - if (!strnicmp(buf, "irq", 3)) {
> - buf = skip_spaces(buf + 3);
> - start = simple_strtoul(buf, &buf, 0);
> - pnp_add_irq_resource(dev, start, 0);
> - continue;
> - }
> - if (!strnicmp(buf, "dma", 3)) {
> - buf = skip_spaces(buf + 3);
> - start =

Re: [PATCH 3/4] userns: Add a more complete capability subset test to commit_creds

2012-12-14 Thread Eric W. Biederman

"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> 
>> When unsharing a user namespace we reduce our credentials to just what
>> can be done in that user namespace.  This is a subset of the credentials
>> we previously had.  Teach commit_creds to recognize this is a subset
>> of the credentials we have had before and don't clear the dumpability flag.
>> 
>> This allows an unprivileged  program to do:
>> unshare(CLONE_NEWUSER);
>> fd = open("/proc/self/uid_map", O_RDWR);
>> 
>> Where previously opening the uid_map writable would fail because
>> the task had been made non-dumpable.
>> 
>> Signed-off-by: "Eric W. Biederman" 
>
> Acked-by: Serge Hallyn 
>
>> ---
>>  kernel/cred.c |   26 +-
>>  1 files changed, 25 insertions(+), 1 deletions(-)
>> 
>> diff --git a/kernel/cred.c b/kernel/cred.c
>> index 48cea3d..993a7ea41 100644
>> --- a/kernel/cred.c
>> +++ b/kernel/cred.c
>> @@ -455,6 +455,30 @@ error_put:
>>  return ret;
>>  }
>>  
>
> Do you think we need to warn that this can only be used for
> commit_creds?  (i.e. if someone tried to use this in some
> other context, the 'creds are subset of target ns is a child
> of current_ns' assumption would be wrong)

This function should be a general test valid at any time.

Except that I forgot the bit of the test that asks whether the original
cred is the owner of the subset user namespace.

I will respin this patch.

As a small segue: the property that unshare(CLONE_NEWUSER) results
in a subset of the capabilities a process already had is a very
important property to make it possible to reason about user namespaces.
Maintaining this property is the reason behind the choices I made
in fixing cap_capable.

>> +static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
>> +{
>> +const struct user_namespace *set_ns = set->user_ns;
>> +const struct user_namespace *subset_ns = subset->user_ns;
>> +
>> +/* If the two credentials are in the same user namespace see if
>> + * the capabilities of subset are a subset of set.
>> + */
>> +if (set_ns == subset_ns)
>> +return cap_issubset(subset->cap_permitted, set->cap_permitted);
>> +
>> +/* The credentials are in different user namespaces
>
> This can only happen during setns and CLONE_NEWUSER right?

Right.  This can only happen during setns, unshare(CLONE_NEWUSER),
and possibly during clone.  Otherwise we are not changing the user
namespace.  However for clarity and robustness I don't want the code
to rely on that.

>> + * therefore one is a subset of the other only if a set is an
>> + * ancestor of subset.
>> + */
>
>> +while (subset_ns != &init_user_ns) {
>> +if (set_ns == subset_ns->parent)
>> +return true;
>> +subset_ns = subset_ns->parent;
>> +}
>> +
>> +return false;
>> +}
>> +
>>  /**
>>   * commit_creds - Install new credentials upon the current task
>>   * @new: The credentials to be assigned
>> @@ -493,7 +517,7 @@ int commit_creds(struct cred *new)
>>  !gid_eq(old->egid, new->egid) ||
>>  !uid_eq(old->fsuid, new->fsuid) ||
>>  !gid_eq(old->fsgid, new->fsgid) ||
>> -!cap_issubset(new->cap_permitted, old->cap_permitted)) {
>> +!cred_cap_issubset(old, new)) {
>>  if (task->mm)
>>  set_dumpable(task->mm, suid_dumpable);
>>  task->pdeath_signal = 0;
>> -- 
>> 1.7.5.4
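A userspace model may help when reviewing the respin. It follows the quoted function's structure and adds the owner check Eric says was missing. The structures are hypothetical, capabilities are a plain 64-bit mask, and 'owner' is the euid that created each namespace. It is a sketch of the logic, not the respun patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for user_namespace and cred. */
struct model_ns {
	const struct model_ns *parent;  /* NULL for the initial namespace */
	unsigned int owner;             /* euid that created this namespace */
};

struct model_cred {
	const struct model_ns *user_ns;
	unsigned int euid;
	uint64_t cap_permitted;         /* capability bitmask */
};

/* true if every bit in a is also set in b */
static bool caps_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

static bool model_cred_cap_issubset(const struct model_cred *set,
				    const struct model_cred *subset)
{
	const struct model_ns *subset_ns = subset->user_ns;

	/* Same namespace: plain capability-set comparison. */
	if (set->user_ns == subset_ns)
		return caps_subset(subset->cap_permitted, set->cap_permitted);

	/*
	 * Different namespaces: set's namespace must be an ancestor of
	 * subset's, and set must own the child namespace on that path.
	 * The loop stops at the root (parent == NULL), like init_user_ns.
	 */
	for (; subset_ns->parent != NULL; subset_ns = subset_ns->parent) {
		if (subset_ns->parent == set->user_ns &&
		    subset_ns->owner == set->euid)
			return true;
	}
	return false;
}
```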

Re: [RFC][PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps.

2012-12-14 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> "Serge E. Hallyn"  writes:
> 
> > Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> >> A child user namespace having capabilities against processes in its
> >> parent seems totally bizarre and pretty dangerous from a capabilities
> >> standpoint.
> >
> > How would it have them against its parent?
> 
> init_user_ns
>userns a --- created by kuid 1

Now a mapping needs to be set up (by a task with CAP_SYS_ADMIN in
init_user_ns) which allows kuids 1 and 2 to be used by userns a.
Otherwise (if no mapping is set up) userns a only has the overflowuid.

Realistically only kuids over 10 (let's say) would be used, i.e.
kuids 100,000-199,999 would map to container uids 0-99,999.

>  userns b -- created by kuid 2

Now a mapping needs to be set up (by a task with CAP_SYS_ADMIN in
userns a) which allows kuids 1 and 2 to be used by userns b.

If userns had been mapped with kuids 100,000-199,999 mapping to
uids 0-9, then only kuids in that range could be mapped into
userns b.

> process c in userns b with kuid 1
>
> Serge, read the first permission check in common_cap.
> Think what happens in the above example.
> 
> For the rest I understand your concern.

Ok.  Then we can discuss my concern later (after the new year).

> Serge please read and look at the patches I have posted to fix
> the issues Andy found with the user namespace tree.  Especially
> the fix to commit_creds.

The setns fixes were IMO the most important - and interesting - ones :)

Thanks, Andy!

> After you have looked at the patches to fix the issues I will
> be happy to discuss things further with you.

Thanks,
-serge
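The mapping arithmetic in Serge's example can be sketched with a single-extent map in the "inside outside count" form used by /proc/<pid>/uid_map. The struct and helper names are hypothetical. NO_MAPPING stands in for an unmapped id; roughly speaking, userspace sees overflowuid in that case. Nesting simply composes the maps, because userns b's lower ids are userns a's ids.

```c
#include <stdint.h>

/* Hypothetical single extent: ids first..first+count-1 map to lower_first.. */
struct uid_extent {
	uint32_t first;        /* first id inside the namespace */
	uint32_t lower_first;  /* corresponding id in the parent namespace */
	uint32_t count;
};

#define NO_MAPPING UINT32_MAX  /* stand-in for "no mapping" */

/* namespace id -> parent id; unsigned wraparound also rejects id < first */
static uint32_t map_down(const struct uid_extent *e, uint32_t id)
{
	if (id - e->first < e->count)
		return e->lower_first + (id - e->first);
	return NO_MAPPING;
}

/* parent id -> namespace id */
static uint32_t map_up(const struct uid_extent *e, uint32_t lower)
{
	if (lower - e->lower_first < e->count)
		return e->first + (lower - e->lower_first);
	return NO_MAPPING;
}
```

For example, with userns a mapping 0-99,999 onto kuids 100,000-199,999, `map_up(&a, 150000)` gives 50,000 and `map_up(&a, 5)` gives NO_MAPPING.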

Gnu

2012-12-14 Thread Ove Karlsen

The GNU GPL licence is an open-source licence. However, GNU has come to
be associated with cows, stupid slogans involving beer, and Richard 
Stallman in a tinfoil-hat making obscure statements, such as acting as 
thought-police and forbidding the term "open source". Maybe he was one 
of those people who did a lot of LSD in the early hacking-culture. Silly 
dances, and naive geekness, treating trash-women with respect, and 
having romantic thoughts about them on his blog.


I don't know what happened, but the open-source movement has grown
large, and too large for these kinds of obscurities.


Humans, before God, have not been enjoined to respect obscure licences,
or the culture associated with them.


But God is just. And these kinds of licences can easily be replaced by
others, which respect the sentiments of open-source. And that we can do
together, in a common decision in agreement by all. And as open-source
is growing, that is what I think we should do. Take the licence to the
level of the movement as a whole. If Stallman wants to protest, it will
just be him, and his obscure behaviour.


And while we are at it, the Linux penguin can go as well.

And of course that should be true of eagles in symbolism related to the
US, or lions in symbolism related to Norway, for that matter.


Do you know what an eagle does? It looks at its prey, maybe a mouse,
from high above, flies down, rips its skull open, and eats its brain.
That is how the animal world is. Trained from childhood for killing, to
survive. But what that has to do with national symbolism, you really can
wonder.


Peace Be With You.
Uwaysi.

Also posted on my blog - http://paradoxuncreated.com/Blog/wordpress/?p=5801


Re: [git pull] m68k updates for 3.8

2012-12-14 Thread Al Viro

On Fri, Dec 14, 2012 at 03:48:20PM -0600, Rob Landley wrote:
> On 12/14/2012 06:04:51 AM, Greg Ungerer wrote:
> >Hi Rob,
> ...
> >>Somebody got one of my images to boot under aranym but they had
> >>to patch
> >>the kernel fairly extensively to add the emulated device support that
> >>emulator provided. It doesn't emulate real devices the way qemu does,
> >>but qemu doesn't fully emulate the processor (just coldfire in
> >>mainline)...
> >
> >I use aranym for testing m68k. Though I don't really pound too heavily
> >on the devices. I really only cross-compile small systems for testing
> >on it.
> 
> What kernel config do you use for aranym? I don't see an aranym entry in
> arch/m68k/configs, and I stopped using it precisely because it
> required several large patches to add emulated device support for
> everything from serial console to block devices. (There was a kernel
> upgrade, it broke, I cut a release without it. Pretty much the same
> reason I stopped using squashfs for a year or so until it finally
> got merged.)

config NATFEAT
	bool "ARAnyM emulator support"
	depends on ATARI
	help
	  This option enables support for ARAnyM native features, such as
	  access to a disk image as /dev/hda.

followed by rather obvious options that depend on it (block/console/NIC).

Re: [PATCH 3/4] userns: Add a more complete capability subset test to commit_creds

2012-12-14 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> When unsharing a user namespace we reduce our credentials to just what
> can be done in that user namespace.  This is a subset of the credentials
> we previously had.  Teach commit_creds to recognize this is a subset
> of the credentials we have had before and don't clear the dumpability flag.
> 
> This allows an unprivileged  program to do:
> unshare(CLONE_NEWUSER);
> fd = open("/proc/self/uid_map", O_RDWR);
> 
> Where previously opening the uid_map writable would fail because
> the task had been made non-dumpable.
> 
> Signed-off-by: "Eric W. Biederman" 

Acked-by: Serge Hallyn 

> ---
>  kernel/cred.c |   26 +-
>  1 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 48cea3d..993a7ea41 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -455,6 +455,30 @@ error_put:
>   return ret;
>  }
>  

Do you think we need to warn that this can only be used for
commit_creds?  (i.e. if someone tried to use this in some
other context, the 'creds are a subset if the target ns is a child
of current_ns' assumption would be wrong)

> +static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
> +{
> + const struct user_namespace *set_ns = set->user_ns;
> + const struct user_namespace *subset_ns = subset->user_ns;
> +
> + /* If the two credentials are in the same user namespace see if
> +  * the capabilities of subset are a subset of set.
> +  */
> + if (set_ns == subset_ns)
> + return cap_issubset(subset->cap_permitted, set->cap_permitted);
> +
> + /* The credentials are in a different user namespaces

This can only happen during setns and CLONE_NEWUSER right?

> +  * therefore one is a subset of the other only if a set is an
> +  * ancestor of subset.
> +  */

> + while (subset_ns != _user_ns) {
> + if (set_ns == subset_ns->parent)
> + return true;
> + subset_ns = subset_ns->parent;
> + }
> +
> + return false;
> +}
> +
>  /**
>   * commit_creds - Install new credentials upon the current task
>   * @new: The credentials to be assigned
> @@ -493,7 +517,7 @@ int commit_creds(struct cred *new)
>   !gid_eq(old->egid, new->egid) ||
>   !uid_eq(old->fsuid, new->fsuid) ||
>   !gid_eq(old->fsgid, new->fsgid) ||
> - !cap_issubset(new->cap_permitted, old->cap_permitted)) {
> + !cred_cap_issubset(old, new)) {
>   if (task->mm)
>   set_dumpable(task->mm, suid_dumpable);
>   task->pdeath_signal = 0;
> -- 
> 1.7.5.4
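The subset test in the patch can be exercised outside the kernel with a small model. This is an illustrative sketch under stated assumptions, not kernel code: `struct toy_ns`, `struct toy_cred`, `toy_init_ns` and the helpers are hypothetical stand-ins for `struct user_namespace`, `struct cred` and the root user namespace, and capabilities are reduced to a 64-bit mask. It follows the decision procedure of the posted `cred_cap_issubset()`: the same namespace means a plain mask comparison, while different namespaces mean walking up from `subset` and succeeding only if `set`'s namespace is an ancestor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct user_namespace. */
struct toy_ns {
	const struct toy_ns *parent;	/* NULL only for the root */
};

/* Hypothetical stand-in for struct cred: namespace plus permitted caps. */
struct toy_cred {
	const struct toy_ns *user_ns;
	uint64_t cap_permitted;		/* one bit per capability */
};

/* Root of the toy namespace tree (plays the role of the initial user ns). */
static const struct toy_ns toy_init_ns = { NULL };

/* True if every capability bit in 'sub' is also present in 'set'. */
static bool toy_cap_issubset(uint64_t sub, uint64_t set)
{
	return (sub & ~set) == 0;
}

/* Same decision procedure as the posted cred_cap_issubset(). */
static bool toy_cred_cap_issubset(const struct toy_cred *set,
				  const struct toy_cred *subset)
{
	const struct toy_ns *set_ns = set->user_ns;
	const struct toy_ns *subset_ns = subset->user_ns;

	assert(set_ns != NULL && subset_ns != NULL);

	/* Same namespace: compare the permitted masks directly. */
	if (set_ns == subset_ns)
		return toy_cap_issubset(subset->cap_permitted,
					set->cap_permitted);

	/* Different namespaces: walk up from subset's namespace and
	 * succeed only if set's namespace is one of its ancestors. */
	while (subset_ns != &toy_init_ns) {
		if (set_ns == subset_ns->parent)
			return true;
		subset_ns = subset_ns->parent;
	}
	return false;
}
```

With a namespace `child` whose parent is `toy_init_ns`, a credential in `toy_init_ns` counts as a superset of any credential in `child`, even one holding every capability, but not the other way round; that is the unshare(CLONE_NEWUSER) case described in the changelog.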

Re: [PATCH 2/2] cpuidle: fix sysfs output for power_usage

2012-12-14 Thread Rafael J. Wysocki

On Friday, December 14, 2012 03:17:37 PM Sivaram Nair wrote:
> cpuidle_state->power_usage is signed; so change the corresponding sysfs
> ops to output signed value instead of unsigned.

What's actually wrong with printing it as an unsigned int?

Rafael


> Signed-off-by: Sivaram Nair 
> ---
>  drivers/cpuidle/sysfs.c |9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
> index 3409429..2fc79cd 100644
> --- a/drivers/cpuidle/sysfs.c
> +++ b/drivers/cpuidle/sysfs.c
> @@ -232,6 +232,13 @@ static struct cpuidle_state_attr attr_##_name = 
> __ATTR(_name, 0644, show, store)
>  static ssize_t show_state_##_name(struct cpuidle_state *state, \
>struct cpuidle_state_usage *state_usage, char *buf) \
>  { \
> + return sprintf(buf, "%d\n", state->_name);\
> +}
> +
> +#define define_show_state_u_function(_name) \
> +static ssize_t show_state_##_name(struct cpuidle_state *state, \
> +  struct cpuidle_state_usage *state_usage, char *buf) \
> +{ \
>   return sprintf(buf, "%u\n", state->_name);\
>  }
>  
> @@ -270,7 +277,7 @@ static ssize_t show_state_##_name(struct cpuidle_state 
> *state, \
>   return sprintf(buf, "%s\n", state->_name);\
>  }
>  
> -define_show_state_function(exit_latency)
> +define_show_state_u_function(exit_latency)
>  define_show_state_function(power_usage)
>  define_show_state_ull_function(usage)
>  define_show_state_ull_function(time)
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
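To make the question above concrete: %d and %u only produce different text when power_usage holds a negative value. The sketch below uses hypothetical helpers `format_signed()` and `format_unsigned()` (not the sysfs macros themselves) to format the same int both ways. The expected %u output assumes a 32-bit `unsigned int`, as on the usual Linux ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Print an int with %d, as the proposed define_show_state_function() does. */
static int format_signed(char *buf, size_t len, int v)
{
	return snprintf(buf, len, "%d\n", v);
}

/* Print the same int with %u, as the existing code does. The cast makes
 * the conversion explicit: a negative value wraps modulo UINT_MAX + 1. */
static int format_unsigned(char *buf, size_t len, int v)
{
	return snprintf(buf, len, "%u\n", (unsigned int)v);
}
```

For -1 the %d form yields "-1\n" while the %u form yields "4294967295\n". For non-negative values the two forms are identical.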

Re: [GIT PULL] please pull infiniband.git

2012-12-14 Thread Roland Dreier

On Fri, Dec 14, 2012 at 7:36 AM, Linus Torvalds
 wrote:
>> Any problem with this tree, or did it just slip through the cracks?
>
> It was merged seven hours before your email. Forgot to check?

No, just dumb-assery in how I fetched in one place and checked in
another.  Sorry.

Re: [PATCH 1/2] cpuidle: fix finding state with min power_usage

2012-12-14 Thread Rafael J. Wysocki

On Friday, December 14, 2012 03:17:36 PM Sivaram Nair wrote:
> Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
> of -1) to init the local copies so that functions that try to find
> cpuidle states with minimum power usage work correctly even if they use
> non-negative values.

I'm queuing this up for submission as v3.8 material.

Thanks,
Rafael


> Signed-off-by: Sivaram Nair 
> ---
>  drivers/cpuidle/cpuidle.c|2 +-
>  drivers/cpuidle/governors/menu.c |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 8df53dd..fb4a7dd 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -70,7 +70,7 @@ int cpuidle_play_dead(void)
>   struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>   struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
>   int i, dead_state = -1;
> - int power_usage = -1;
> + int power_usage = INT_MAX;
>  
>   if (!drv)
>   return -ENODEV;
> diff --git a/drivers/cpuidle/governors/menu.c 
> b/drivers/cpuidle/governors/menu.c
> index bd40b94..20ea33a 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -312,7 +312,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
> cpuidle_device *dev)
>  {
>   struct menu_device *data = &__get_cpu_var(menu_devices);
>   int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
> - int power_usage = -1;
> + int power_usage = INT_MAX;
>   int i;
>   int multiplier;
>   struct timespec t;
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
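A stripped-down minimum search shows why the initial value matters. `struct toy_state` and `toy_min_power_state()` are hypothetical. The loop assumes only what the changelog implies: a state is picked when its power_usage is strictly lower than the best value seen so far. Starting from -1, no non-negative power_usage can ever win. Starting from INT_MAX, the lowest one does.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for struct cpuidle_state: only the field we need. */
struct toy_state {
	int power_usage;
};

/*
 * Return the index of the state with the lowest power_usage, or -1 if no
 * state is strictly lower than 'init'.
 */
static int toy_min_power_state(const struct toy_state *s, int n, int init)
{
	int best = init;
	int idx = -1;

	for (int i = 0; i < n; i++) {
		if (s[i].power_usage < best) {
			best = s[i].power_usage;
			idx = i;
		}
	}
	return idx;
}
```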

Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin

On 12/14/2012 03:48 PM, John Stultz wrote:
> On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
>> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>>
>>>
>>> This won't help in case of scenario you've been pointing in
>>> previous email (where c/r happens in a middle of vdso),
>>> would it? Because we still need somehow to be sure we're not
>>> checkpointing in a middle of signal handler which will return
>>> to some vdso place.
>> It is okay if and only if those vdso places never change... which I
>> think is doable if they only contain trivial system call wrappers, i.e.
>> something like:
>>
>> movl $__SYS_gettimeofday, %eax
>> syscall
>> ret
> 
> Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
> I know Andi always wanted to avoid having syscall instructions at a
> fixed location for the old vsyscall code (though I know we had it
> nonetheless for a while). But maybe I'm confusing issues here?
> 

They aren't in fixed addresses across processes... the vdso location can
still be randomized.  It just has to be the same across the
checkpoint/restart operation, just like all the other instructions.

-hpa
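For comparison, here is a C-level analogue of the kind of "trivial system call wrapper" described above: it traps into the kernel and has no clock-reading logic of its own. It uses glibc's `syscall()` and `SYS_gettimeofday` from `<sys/syscall.h>`, so it assumes a Linux/glibc userland. `plain_gettimeofday()` is a hypothetical name, and this is ordinary user code, not a vDSO entry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical plain wrapper: no shared data page and no user-space clock
 * reading, just the gettimeofday system call via glibc's syscall().
 */
static long plain_gettimeofday(struct timeval *tv)
{
	return syscall(SYS_gettimeofday, tv, NULL);
}
```

A stub like this carries no kernel-version-specific logic, so its instructions could plausibly stay identical across the checkpoint/restart boundary. Where the page holding it is mapped is a separate matter and, as noted above, can still be randomized.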



[GIT PULL] target updates for v3.8-rc1

2012-12-14 Thread Nicholas A. Bellinger

Hello Linus!

Here are the target updates for v3.8-rc1 merge window code.  Please go
ahead and pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next

Just a heads up that there is a minor merge conflict that you'll
encounter in target_handle_task_attr() code, that sfr has been carrying
a fix for recently within -next.  After dropping the HEAD section, the
resolution should look like:

http://git.kernel.org/?p=linux/kernel/git/nab/target-pending.git;a=commitdiff;h=50f966ab0a8630cacb46bb382cd2822a5c1448ac

It has been a very busy development cycle this time around in target
land, with the highlights including:

 - Kill struct se_subsystem_dev, in favor of direct se_device usage (hch)
 - Simplify reservations code by combining SPC-3 + SCSI-2 support 
   for virtual backends only (hch)
 - Simplify ALUA code for virtual only backends, and remove left
   over abstractions (hch)
 - Pass sense_reason_t as return value for I/O submission path (hch)
 - Refactor MODE_SENSE emulation to allow for easier addition of
   new mode pages. (roland)
 - Add emulation of MODE_SELECT (roland)
 - Fix bug in handling of ExpStatSN wrap-around (steve)
 - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
 - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
 - Convert ib_srpt to use modern target_submit_cmd caller +
   drop legacy ioctx->kref usage (nab)
 - Convert ib_srpt to use modern target_submit_tmr caller (nab)
 - Add link_magic for fabric allow_link destination target_items
   for symlinks within target_core_fabric_configfs.c code (nab)
 - Allocate pointers instead of full structs for
   config_group->default_groups (sebastian)
 - Fix 32-bit highmem breakage for FILEIO (sebastian)
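The ExpStatSN item above concerns 32-bit iSCSI sequence numbers that wrap around. This pull request does not show the fix itself. The sketch below only illustrates a common technique for comparing such counters, RFC 1982 serial number arithmetic with 32-bit serials, where a plain `<` gives the wrong answer across the wrap. `sn_lt()` is a hypothetical helper, not the target code's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Serial number "less than" for 32-bit sequence numbers (RFC 1982 style,
 * SERIAL_BITS = 32): a precedes b when the forward distance from a to b,
 * taken modulo 2^32, is non-zero and below 2^31.
 */
static bool sn_lt(uint32_t a, uint32_t b)
{
	return a != b && (uint32_t)(b - a) < UINT32_C(0x80000000);
}
```

With this comparison 0xffffffff is treated as earlier than 0, whereas a plain unsigned `<` says the opposite. RFC 1982 leaves a distance of exactly 2^31 undefined, and this sketch answers false in both directions for that case.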

All told, hch was able to shave off another ~1K LOC by killing the
se_subsystem_dev abstraction, along with a number of PR + ALUA
simplifications.  Also, a nice patch by Roland is the refactoring of
MODE_SENSE handling, along with the addition of initial MODE_SELECT
emulation support for virtual backends.

Sebastian found a long-standing issue with full config_group structs
being allocated instead of pointers for config_group->default_groups[]
in a number of areas; fixing it saves memory with big configurations.
He also managed to fix another long-standing BUG that broke 32-bit
highmem support within the FILEIO backend driver.

Thank you again to everyone who contributed this round!

--nab

Andy Grover (1):
  target/iscsi_target: Add NodeACL tags for initiator group support

Chris Boot (2):
  sbp-target: use simple assignment in tgt_agent_rw_agent_state()
  sbp-target: fix error path in sbp_make_tpg()

Christoph Hellwig (11):
  target: kill struct se_subsystem_dev
  target: rename spc_ops
  target: move REPORT LUNS emulation to target_core_spc.c
  target/pscsi: call spc_emulate_report_luns directly
  target: provide generic sbc device type/revision helpers
  pscsi: fix REPORT LUNS handling
  target: kill dev->dev_task_attr_type
  target: simplify reservations code
  target: simplify alua support
  target: remove ->get_device_rev
  target: pass sense_reason as a return value

Dan Carpenter (1):
  target: update error handling for sbc_setup_write_same()

Fengguang Wu (1):
  target/pscsi: Make pscsi_configure_device + target_release_session static

Kees Cook (1):
  sbp-target: remove depends on CONFIG_EXPERIMENTAL

Nicholas Bellinger (14):
  target: Fix incorrect starting offset after MODE_SENSE refactoring
  target: Fix incorrect inversion of TPGS_EXPLICT_ALUA check
  target: Fix possible TFO->write_pending() sense_reason_t silent WRITE corruption
  target: Fix exception path pr_reg put regression for PR RELEASE
  target: Change sbc_emulate_noop to return sense_reason_t
  target/sbc: Seperate WRITE_SAME based on UNMAP flag in sbc_ops
  target: Add/check max_write_same_len device attribute + update block limits VPD
  target/iblock: Add WRITE_SAME w/ UNMAP=0 emulation support
  target: Update copyright information to 2012
  target/iblock: Forward declare bio helpers
  target: Make spc_get_write_same_sectors return sector_t
  ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
  ib_srpt: Convert TMR path to target_submit_tmr
  target: Add link_magic for fabric allow_link destination target_items

Roland Dreier (8):
  iscsi-target: Use list_first_entry() where appropriate
  target: Refactor MODE SENSE emulation
  target: Implement mode page 0x1c, "Informational Exceptions"
  target: Add emulation for MODE SELECT
  iscsi-target: Fix potential deadlock on lock taken in timer
  iscsi-target: Always send a response before terminating iSCSI connection
  target: Clean up logic in transport_put_cmd()
  target: Clean up flow in transport_check_aborted_status()

Sachin Kamat (1):
  iscsi_target: Remove redundant null check before kfree

Sebastian Andrzej Siewior (6):
  target/configfs: allocate pointers instead of full struct for default_groups

Re: [PATCH RESEND 0/6 v10] gpio: Add block GPIO

2012-12-14 Thread Roland Stigge

Hi Wolfgang,

thank you for the patch!

On 14/12/12 18:58, Wolfgang Grandegger wrote:
> +static void at91_gpiolib_set_block(struct gpio_chip *chip, unsigned long mask, unsigned long val)
> +{
> + struct at91_gpio_chip *at91_gpio = to_at91_gpio_chip(chip);
> + void __iomem *pio = at91_gpio->regbase;
> +
> + __raw_writel(mask, pio + (val ? PIO_SODR : PIO_CODR));
> +}
> +

Without having an AT91 available right now, I guess the hardware
interface of this GPIO chip is different from the GPIO block API. While
the hardware has clear and set registers, the val parameter of
at91_gpiolib_set_block() should be interpreted as the actual output
values. See lpc32xx_gpo_set_block() for an example for handling set and
clear registers like this: First, set_bits and clear_bits words are
calculated from mask and val parameters, and finally written to the
respective hardware registers.

Note that one .set_block() can result in writing both the set and clear
registers of the hardware when val contains both 0s and 1s in
respectively masked positions.

Roland
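The set/clear split described above can be sketched as follows. The register model (`struct toy_gpio_bank` and its write helpers) is hypothetical and only imitates write-1-to-set and write-1-to-clear registers such as PIO_SODR and PIO_CODR. It does not use the at91 or lpc32xx driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GPIO bank with write-1-to-set / write-1-to-clear registers. */
struct toy_gpio_bank {
	uint32_t sodr_last;	/* last value written to the "set" register */
	uint32_t codr_last;	/* last value written to the "clear" register */
	uint32_t out;		/* resulting output levels */
};

static void toy_write_set(struct toy_gpio_bank *b, uint32_t bits)
{
	b->sodr_last = bits;
	b->out |= bits;
}

static void toy_write_clear(struct toy_gpio_bank *b, uint32_t bits)
{
	b->codr_last = bits;
	b->out &= ~bits;
}

/* .set_block() semantics: drive each pin in 'mask' to its bit in 'val'. */
static void toy_set_block(struct toy_gpio_bank *b, uint32_t mask, uint32_t val)
{
	uint32_t set_bits = mask & val;		/* masked pins that go high */
	uint32_t clear_bits = mask & ~val;	/* masked pins that go low */

	if (set_bits)
		toy_write_set(b, set_bits);
	if (clear_bits)
		toy_write_clear(b, clear_bits);
}
```

Pins outside `mask` keep their level, and one call touches both registers when `val` has both 0s and 1s under the mask. The posted at91 version instead writes the whole mask to a single register chosen by whether `val` is nonzero.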

Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread John Stultz


On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>
>>
>> This won't help in case of scenario you've been pointing in
>> previous email (where c/r happens in a middle of vdso),
>> would it? Because we still need somehow to be sure we're not
>> checkpointing in a middle of signal handler which will return
>> to some vdso place.
> It is okay if and only if those vdso places never change... which I
> think is doable if they only contain trivial system call wrappers, i.e.
> something like:
>
> movl $__SYS_gettimeofday, %eax
> syscall
> ret

Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
I know Andi always wanted to avoid having syscall instructions at a
fixed location for the old vsyscall code (though I know we had it
nonetheless for a while). But maybe I'm confusing issues here?


thanks
-john

Re: [GIT PULL] x86/uapi for 3.8

2012-12-14 Thread David Howells

Linus Torvalds  wrote:

> Yeah, I think I have most of the x86 stuff merged now (just merged the
> EFI and ACPI trees), and at this point it might be worth regenerating
> it and getting this over and done with.

Okay, regenerated and pushed.

David
---
The following changes since commit d42b3a2906a10b732ea7d7f849d49be79d242ef0:

  Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2012-12-14 10:08:40 -0800)

are available in the git repository at:


  git://git.infradead.org/users/dhowells/linux-headers.git tags/disintegrate-x86-20121214

for you to fetch changes up to af170c5061dd78512c469e6e2d211980cdb2c193:

  UAPI: (Scripted) Disintegrate arch/x86/include/asm (2012-12-14 22:37:13 +)


UAPI disintegration 2012-12-14


David Howells (1):
  UAPI: (Scripted) Disintegrate arch/x86/include/asm

 arch/x86/include/asm/Kbuild   |  26 ---
 arch/x86/include/asm/boot.h   |   9 +-
 arch/x86/include/asm/debugreg.h   |  79 +---
 arch/x86/include/asm/e820.h   |  74 +---
 arch/x86/include/asm/hw_breakpoint.h  |   5 +-
 arch/x86/include/asm/ist.h|  17 +-
 arch/x86/include/asm/kvm_para.h   |  99 +-
 arch/x86/include/asm/mce.h| 119 +---
 arch/x86/include/asm/msr.h|  11 +-
 arch/x86/include/asm/mtrr.h   |  93 +
 arch/x86/include/asm/posix_types.h|  10 -
 arch/x86/include/asm/processor-flags.h|  97 +-
 arch/x86/include/asm/ptrace.h |  75 +---
 arch/x86/include/asm/setup.h  |   5 +-
 arch/x86/include/asm/sigcontext.h | 216 +
 arch/x86/include/asm/signal.h | 140 +-
 arch/x86/include/asm/svm.h| 132 +
 arch/x86/include/asm/unistd.h |  14 +-
 arch/x86/include/asm/vm86.h   | 128 +
 arch/x86/include/asm/vmx.h|  89 +
 arch/x86/include/asm/vsyscall.h   |  16 +-
 arch/x86/include/uapi/asm/Kbuild  |  58 ++
 arch/x86/include/{ => uapi}/asm/a.out.h   |   0
 arch/x86/include/{ => uapi}/asm/auxvec.h  |   0
 arch/x86/include/{ => uapi}/asm/bitsperlong.h |   0
 arch/x86/include/uapi/asm/boot.h  |  10 +
 arch/x86/include/{ => uapi}/asm/bootparam.h   |   0
 arch/x86/include/{ => uapi}/asm/byteorder.h   |   0
 arch/x86/include/uapi/asm/debugreg.h  |  80 
 arch/x86/include/uapi/asm/e820.h  |  75 
 arch/x86/include/{ => uapi}/asm/errno.h   |   0
 arch/x86/include/{ => uapi}/asm/fcntl.h   |   0
 arch/x86/include/{ => uapi}/asm/hyperv.h  |   0
 arch/x86/include/{ => uapi}/asm/ioctl.h   |   0
 arch/x86/include/{ => uapi}/asm/ioctls.h  |   0
 arch/x86/include/{ => uapi}/asm/ipcbuf.h  |   0
 arch/x86/include/uapi/asm/ist.h   |  29 +++
 arch/x86/include/{ => uapi}/asm/kvm.h |   0
 arch/x86/include/uapi/asm/kvm_para.h  | 100 ++
 arch/x86/include/{ => uapi}/asm/ldt.h |   0
 arch/x86/include/uapi/asm/mce.h   | 121 
 arch/x86/include/{ => uapi}/asm/mman.h|   0
 arch/x86/include/{ => uapi}/asm/msgbuf.h  |   0
 arch/x86/include/{ => uapi}/asm/msr-index.h   |   0
 arch/x86/include/uapi/asm/msr.h   |  15 ++
 arch/x86/include/uapi/asm/mtrr.h  | 117 
 arch/x86/include/{ => uapi}/asm/param.h   |   0
 arch/x86/include/{ => uapi}/asm/perf_regs.h   |   0
 arch/x86/include/{ => uapi}/asm/poll.h|   0
 arch/x86/include/uapi/asm/posix_types.h   |   9 +
 arch/x86/include/{ => uapi}/asm/posix_types_32.h  |   0
 arch/x86/include/{ => uapi}/asm/posix_types_64.h  |   0
 arch/x86/include/{ => uapi}/asm/posix_types_x32.h |   0
 arch/x86/include/{ => uapi}/asm/prctl.h   |   0
 arch/x86/include/uapi/asm/processor-flags.h   |  99 ++
 arch/x86/include/{ => uapi}/asm/ptrace-abi.h  |   0
 arch/x86/include/uapi/asm/ptrace.h|  78 
 arch/x86/include/{ => uapi}/asm/resource.h|   0
 arch/x86/include/{ => uapi}/asm/sembuf.h  |   0
 arch/x86/include/{ => uapi}/asm/shmbuf.h  |   0
 arch/x86/include/uapi/asm/sigcontext.h| 221 ++
 arch/x86/include/{ => uapi}/asm/sigcontext32.h|   0
 arch/x86/include/{ => uapi}/asm/siginfo.h |   0
 arch/x86/include/uapi/asm/signa

Re: [PATCH v2] NTP: Add a CONFIG_RTC_SYSTOHC configuration

2012-12-14 Thread John Stultz


On 12/14/2012 03:19 PM, Jason Gunthorpe wrote:
> The purpose of this option is to allow ARM/etc systems that rely on the
> class RTC subsystem to have the same kind of automatic NTP based
> synchronization that we have on PC platforms. Today ARM does not
> implement update_persistent_clock and makes extensive use of the class
> RTC system.
>
> When enabled CONFIG_RTC_SYSTOHC will provide a generic
> rtc_update_persistent_clock that stores the current time in the RTC and
> is intended to complement the existing CONFIG_RTC_HCTOSYS option that
> loads the RTC at boot.
>
> Like with RTC_HCTOSYS the platform's update_persistent_clock is used
> first, if it works. Platforms with mixed class RTC and non-RTC drivers
> need to return ENODEV when class RTC should be used. Such an update for
> PPC is included in this patch.
>
> Long term, implementations of update_persistent_clock should migrate to
> proper class RTC drivers and use CONFIG_RTC_SYSTOHC instead.


Ok. This all sounds good.


Still one minor question below.

> diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
> index 19c03ab..7b3702b 100644
> --- a/drivers/rtc/Kconfig
> +++ b/drivers/rtc/Kconfig
> @@ -28,6 +28,7 @@ config RTC_HCTOSYS
>   config RTC_HCTOSYS_DEVICE
> string "RTC used to set the system time"
> depends on RTC_HCTOSYS = y
> +   depends on RTC_SYSTOHC = y


Is this right?

This should probably be a "depends on RTC_HCTOSYS || RTC_SYSTOHC".
Otherwise you have to select both in order to change the default device.



> diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
> new file mode 100644
> index 000..a625740
> --- /dev/null
> +++ b/drivers/rtc/systohc.c
> @@ -0,0 +1,44 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + */
> +#include 
> +#include 
> +
> +/**
> + * rtc_update_persistent_clock - Save NTP synchronized time to the RTC
> + * @now: Current time of day
> + *
> + * Replacement for the NTP platform function update_persistent_clock
> + * that stores time for later retrieval by rtc_hctosys
> + *
> + * Returns 0 on successful RTC update, -ENODEV if a RTC update is not
> + * possible at all, and various other -errno for specific temporary failure
> + * cases.
> + *
> + * If temporary failure is indicated the caller should try again 'soon'
> + */
> +int rtc_update_persistent_clock(struct timespec now)


If we're going to move away from update_persistent_clock across the
board (and as the update path doesn't have the same constraints as the
read_persistent_clock interface), might it be better just to name this
rtc_update_clock() (or something similar)?

That way if/when we do finally remove the other users of
update_persistent_clock() and move them to an RTC driver, we will avoid
any confusion between read/update.

This is in a similar vein to your suggestion of changing
update_persistent_clock to platform_save_ntp_time_to_rtc().

Sorry for the last minute nit! Other than these two issues, I'm happy to
queue this (if Alessandro doesn't object). Although with the merge
window already open it may have to wait until 3.9, but I'll see what
Thomas says.


thanks
-john
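The ordering described in the changelog (platform update_persistent_clock first, class RTC only on -ENODEV) can be sketched with function pointers. Everything here is hypothetical: `toy_sync_rtc()` and the stub writers illustrate the control flow only, they are not the NTP or RTC code from the patch, and time is reduced to a `long` seconds value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical writer hook: returns 0 or a negative errno value. */
typedef int (*toy_rtc_writer)(long now_sec);

/*
 * Try the platform hook first and fall back to the class-RTC path only
 * when the platform reports -ENODEV. Other errors are returned unchanged.
 */
static int toy_sync_rtc(toy_rtc_writer platform, toy_rtc_writer class_rtc,
			long now_sec)
{
	int ret = platform ? platform(now_sec) : -ENODEV;

	if (ret == -ENODEV && class_rtc)
		ret = class_rtc(now_sec);
	return ret;
}

/* Demo stubs standing in for real platform and class-RTC writers. */
static int toy_platform_no_rtc(long now_sec)
{
	(void)now_sec;
	return -ENODEV;
}

static int toy_platform_busy(long now_sec)
{
	(void)now_sec;
	return -EBUSY;
}

static int toy_class_rtc_ok(long now_sec)
{
	(void)now_sec;
	return 0;
}
```

Errors other than -ENODEV are passed through untouched; following the kernel-doc in the patch, a caller would treat those as temporary and retry soon.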


Re: [PATCH] PCIe/PM: Do not suspend port if any subordinate device need PME polling

2012-12-14 Thread Rafael J. Wysocki

On Friday, December 14, 2012 10:52:11 AM Huang Ying wrote:
> In
> 
>   http://www.mail-archive.com/linux-usb@vger.kernel.org/msg07976.html
> 
> Ulrich reported that his USB3 cardreader does not work reliably when
> connected to the USB3 port.  It turns out that the USB3 controller
> failed to be woken up when the USB3 cardreader was plugged in.  Further
> experiments found that the USB3 host controller can only be woken up
> via polling, not via PME interrupt.  But if the PCIe port that the USB3
> host controller is connected to is suspended, we cannot poll the USB3
> host controller because its config space is not accessible when the
> PCIe port is put into a low power state.
> 
> To solve the issue, the PCIe port will not be suspended if any
> subordinate device needs PME polling.
> 
> Reported-by: Ulrich Eckhardt 
> Signed-off-by: Huang Ying 
> Tested-by: Sarah Sharp 
> Cc: sta...@vger.kernel.org# 3.6+
> ---
>  drivers/pci/pcie/portdrv_pci.c |   18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -134,10 +134,26 @@ static int pcie_port_runtime_resume(stru
>   return 0;
>  }
>  
> +static int pci_dev_pme_poll(struct pci_dev *pdev, void *data)
> +{
> + int *pme_poll = data;
> + *pme_poll = *pme_poll || pdev->pme_poll;

I would write that as

*pme_poll |= pdev->pme_poll;

It is not a big deal, though.

> + return 0;
> +}
> +
>  static int pcie_port_runtime_idle(struct device *dev)
>  {
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int pme_poll = false;
> +
> + /*
> +  * If any subordinate device needs pme poll, we should keep
> +  * the port in D0, because we need port in D0 to poll it.
> +  */
> + pci_walk_bus(pdev->subordinate, pci_dev_pme_poll, &pme_poll);
>   /* Delay for a short while to prevent too frequent suspend/resume */
> - pm_schedule_suspend(dev, 10);
> + if (!pme_poll)
> + pm_schedule_suspend(dev, 10);
>   return -EBUSY;
>  }
>  #else

Acked-by: Rafael J. Wysocki 


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
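The walk-and-accumulate pattern in `pcie_port_runtime_idle()` can be modelled without PCI. `struct toy_dev`, `toy_walk()` and the other helpers are hypothetical. `toy_walk()` only imitates the callback contract of `pci_walk_bus()` (visit each device below the bus, stop early on a nonzero return), and the callback ORs each device's `pme_poll` flag into a caller-provided `bool`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; real code uses struct pci_dev / pci_bus. */
struct toy_dev {
	bool pme_poll;
	struct toy_dev *children;	/* array of child devices */
	size_t nr_children;
};

/* Callback in the style of pci_walk_bus(): return 0 to keep walking. */
static int toy_dev_pme_poll(struct toy_dev *dev, void *data)
{
	bool *pme_poll = data;

	*pme_poll = *pme_poll || dev->pme_poll;
	return 0;
}

/* Depth-first walk over every device below 'bus' (not 'bus' itself). */
static int toy_walk(struct toy_dev *bus,
		    int (*cb)(struct toy_dev *, void *), void *data)
{
	for (size_t i = 0; i < bus->nr_children; i++) {
		struct toy_dev *d = &bus->children[i];
		int ret = cb(d, data);

		if (ret)
			return ret;
		ret = toy_walk(d, cb, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Allow the port to suspend only if nothing below it needs PME polling. */
static bool toy_port_may_suspend(struct toy_dev *port)
{
	bool pme_poll = false;

	toy_walk(port, toy_dev_pme_poll, &pme_poll);
	return !pme_poll;
}
```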

Re: [PATCH 1/1] ACPI/APEI: Fix the returned value in erst_dbg_read

2012-12-14 Thread Rafael J. Wysocki

On Friday, December 14, 2012 04:08:34 PM Adrian Huang wrote:
> If the persistent store is empty initially, the function 'erst_dbg_read'
> returns a nonzero value. It is better to return zero, indicating that
> the read operation has reached EOF.
> 
> Tested on two different servers.

I'm queuing this up for submission as v3.8 material.

Thanks,
Rafael


> Signed-off-by: Adrian Huang 
> ---
>  drivers/acpi/apei/erst-dbg.c |   11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/erst-dbg.c b/drivers/acpi/apei/erst-dbg.c
> index 903549d..04ab5c9 100644
> --- a/drivers/acpi/apei/erst-dbg.c
> +++ b/drivers/acpi/apei/erst-dbg.c
> @@ -111,8 +111,17 @@ retry_next:
>   if (rc)
>   goto out;
>   /* no more record */
> - if (id == APEI_ERST_INVALID_RECORD_ID)
> + if (id == APEI_ERST_INVALID_RECORD_ID) {
> + /*
> +  * If the persistent store is empty initially, the function
> +  * 'erst_read' below will return "-ENOENT" value. This causes
> +  * 'retry_next' label is entered again. The returned value
> +  * should be zero indicating the read operation is EOF.
> +  */
> + len = 0;
> +
>   goto out;
> + }
>  retry:
>   rc = len = erst_read(id, erst_dbg_buf, erst_dbg_buf_len);
>   /* The record may be cleared by others, try read next record */
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
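Returning 0 for an empty store matters because readers typically follow the read() convention and loop until they see 0. The sketch below models that with a hypothetical in-memory `struct toy_store` and `toy_store_read()`. It is not the erst-dbg code, and it assumes records are non-empty so that a 0 return can only mean EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record store: a fixed list of records read one at a time. */
struct toy_store {
	const char **records;
	size_t count;
	size_t next;
};

/*
 * read()-style interface: copy the next record into buf and return its
 * length, or return 0 once there are no more records (EOF).
 */
static ssize_t toy_store_read(struct toy_store *s, char *buf, size_t len)
{
	if (s->next >= s->count)
		return 0;	/* no more records: EOF */

	size_t n = strlen(s->records[s->next]);
	if (n > len)
		n = len;	/* truncate to the caller's buffer */
	memcpy(buf, s->records[s->next], n);
	s->next++;
	return (ssize_t)n;
}

/* Typical consumer: keep reading until 0 (EOF) or a negative error. */
static size_t toy_count_records(struct toy_store *s)
{
	char buf[64];
	size_t nr = 0;

	while (toy_store_read(s, buf, sizeof(buf)) > 0)
		nr++;
	return nr;
}
```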

Re: [PATCH 1/1] cpuidle: coupled: fix ready counter decrement

2012-12-14 Thread Santosh Shilimkar


On Friday 14 December 2012 09:42 AM, Sivaram Nair wrote:
> The ready_waiting_counts atomic variable is compared against the wrong
> online cpu count. The latter is computed incorrectly using logical-OR
> instead of bit-OR. This patch fixes that.
>
> Signed-off-by: Sivaram Nair 
> ---

Looks right.
Acked-by: Santosh Shilimkar 


Re: [PATCH 1/1] cpuidle: coupled: fix ready counter decrement

2012-12-14 Thread Rafael J. Wysocki

On Friday, December 14, 2012 10:42:08 AM Sivaram Nair wrote:
> The ready_waiting_counts atomic variable is compared against the wrong
> online cpu count. The latter is computed incorrectly using logical-OR
> instead of bit-OR. This patch fixes that.

I'm queuing this up for submission as v3.8 material.

I suppose it should be marked for -stable too?

Rafael


> Signed-off-by: Sivaram Nair 
> ---
>  drivers/cpuidle/coupled.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
> index 3265844..2a297f8 100644
> --- a/drivers/cpuidle/coupled.c
> +++ b/drivers/cpuidle/coupled.c
> @@ -209,7 +209,7 @@ inline int cpuidle_coupled_set_not_ready(struct 
> cpuidle_coupled *coupled)
>   int all;
>   int ret;
>  
> - all = coupled->online_count || (coupled->online_count << WAITING_BITS);
> + all = coupled->online_count | (coupled->online_count << WAITING_BITS);
>   ret = atomic_add_unless(&coupled->ready_waiting_counts,
>   -MAX_WAITING_CPUS, all);
>  
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
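The bug is easy to reproduce in isolation. The sketch assumes, as the fix implies, that ready_waiting_counts packs one count in the low WAITING_BITS bits and another count above them. The value 16 for the hypothetical `TOY_WAITING_BITS` is an assumption for this example. With `||` the expression collapses to 0 or 1, so the comparison value no longer encodes the online count in either field.

```c
#include <assert.h>

/* Assumed layout: low bits hold one count, high bits hold the other. */
#define TOY_WAITING_BITS 16

/* Buggy form: logical OR yields only 0 or 1. */
static int toy_all_logical(int online_count)
{
	return online_count || (online_count << TOY_WAITING_BITS);
}

/* Fixed form: bitwise OR packs the same count into both fields. */
static int toy_all_bitwise(int online_count)
{
	return online_count | (online_count << TOY_WAITING_BITS);
}
```

With four online CPUs the buggy form gives 1, while the fixed form gives 0x40004, that is, 4 in each field.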

Re: [PATCH 4/4] userns: Fix typo in description of the limitation of userns_install

2012-12-14 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> Signed-off-by: "Eric W. Biederman" 

Acked-by: Serge Hallyn 

> ---
>  kernel/user_namespace.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index f5975cc..2b042c4 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -799,7 +799,7 @@ static int userns_install(struct nsproxy *nsproxy, void 
> *ns)
>   if (user_ns == current_user_ns())
>   return -EINVAL;
>  
> - /* Threaded many not enter a different user namespace */
> + /* Threaded processes may not enter a different user namespace */
>   if (atomic_read(>mm->mm_users) > 1)
>   return -EINVAL;
>  
> -- 
> 1.7.5.4

Re: [PATCH 2/4] userns: Require CAP_SYS_ADMIN for most uses of setns.

2012-12-14 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> Andy Lutomirski  found a nasty little bug in
> the permissions of setns.  With unprivileged user namespaces it
> became possible to create new namespaces without privilege.
> 
> However, the setns calls were relaxed to only require CAP_SYS_ADMIN in
> the user namespace of the target namespace.
> 
> Which made the following nasty sequence possible.
> 
> pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
> if (pid == 0) { /* child */
>   system("mount --bind /home/me/passwd /etc/passwd");
> }
> else if (pid != 0) { /* parent */
>   char path[PATH_MAX];
>   snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
>   fd = open(path, O_RDONLY);
>   setns(fd, 0);
>   system("su -");
> }
>
> Prevent this possibility by requiring CAP_SYS_ADMIN
> in the current user namespace when joining all but the user namespace.
> 
> Signed-off-by: "Eric W. Biederman" 

Acked-by: Serge Hallyn 

> ---
>  fs/namespace.c   |3 ++-
>  ipc/namespace.c  |3 ++-
>  kernel/pid_namespace.c   |3 ++-
>  kernel/utsname.c |3 ++-
>  net/core/net_namespace.c |3 ++-
>  5 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c1bbe86..398a50f 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2781,7 +2781,8 @@ static int mntns_install(struct nsproxy *nsproxy, void *ns)
>   struct path root;
>  
>   if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
> - !nsown_capable(CAP_SYS_CHROOT))
> + !nsown_capable(CAP_SYS_CHROOT) ||
> + !nsown_capable(CAP_SYS_ADMIN))
>   return -EPERM;
>  
>   if (fs->users != 1)
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index cf3386a..7c1fa45 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -170,7 +170,8 @@ static void ipcns_put(void *ns)
>  static int ipcns_install(struct nsproxy *nsproxy, void *new)
>  {
>   struct ipc_namespace *ns = new;
> - if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> + !nsown_capable(CAP_SYS_ADMIN))
>   return -EPERM;
>  
>   /* Ditch state from the old ipc namespace */
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 560da0d..fdbd0cd 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -325,7 +325,8 @@ static int pidns_install(struct nsproxy *nsproxy, void *ns)
>   struct pid_namespace *active = task_active_pid_ns(current);
>   struct pid_namespace *ancestor, *new = ns;
>  
> - if (!ns_capable(new->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(new->user_ns, CAP_SYS_ADMIN) ||
> + !nsown_capable(CAP_SYS_ADMIN))
>   return -EPERM;
>  
>   /*
> diff --git a/kernel/utsname.c b/kernel/utsname.c
> index f6336d5..08b197e 100644
> --- a/kernel/utsname.c
> +++ b/kernel/utsname.c
> @@ -113,7 +113,8 @@ static int utsns_install(struct nsproxy *nsproxy, void *new)
>  {
>   struct uts_namespace *ns = new;
>  
> - if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> + !nsown_capable(CAP_SYS_ADMIN))
>   return -EPERM;
>  
>   get_uts_ns(ns);
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 2e9a313..8acce01 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -649,7 +649,8 @@ static int netns_install(struct nsproxy *nsproxy, void *ns)
>  {
>   struct net *net = ns;
>  
> - if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(net->user_ns, CAP_SYS_ADMIN) ||
> + !nsown_capable(CAP_SYS_ADMIN))
>   return -EPERM;
>  
>   put_net(nsproxy->net_ns);
> -- 
> 1.7.5.4
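
To make the effect of the five *_install() changes concrete, here is a toy user-space model of the permission test. It is an assumption-laden sketch, not kernel code: struct caller_caps and both check functions are hypothetical, the two booleans stand in for ns_capable(ns->user_ns, CAP_SYS_ADMIN) and nsown_capable(CAP_SYS_ADMIN), and the extra CAP_SYS_CHROOT test in mntns_install() is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what the caller holds. */
struct caller_caps {
	bool admin_in_target_owner; /* models ns_capable(ns->user_ns, CAP_SYS_ADMIN) */
	bool admin_in_own_userns;   /* models nsown_capable(CAP_SYS_ADMIN) */
};

/* Pre-patch rule: only the user namespace owning the target was checked. */
static int install_check_old(const struct caller_caps *c)
{
	if (!c->admin_in_target_owner)
		return -EPERM;
	return 0;
}

/* Post-patch rule: both tests must pass, mirroring
 * "!ns_capable(...) || !nsown_capable(CAP_SYS_ADMIN)". */
static int install_check_new(const struct caller_caps *c)
{
	if (!c->admin_in_target_owner || !c->admin_in_own_userns)
		return -EPERM;
	return 0;
}
```

In the scenario from the commit message, the unprivileged parent owns the child's new user namespace, so it passes the old test, but it holds no CAP_SYS_ADMIN in its own user namespace, which is exactly the case the added condition rejects.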

Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-14 Thread H. Peter Anvin

On 12/14/2012 03:09 PM, Stefani Seibold wrote:
> 
> Sorry for not following the discussion, but I am currently trying to
> compile the vclocktime.c as a 32 bit object. Most of the (clever) work
> is done.
> 
> After this the next step is to map the needed fixmaps into the 32 bit
> address space. Maybe this can be done with install_special_mapping().
> 

install_special_mapping() is indeed how it is done.  The suggestion is
to make the vvar page an actual section inside the vdso, and then just
substitute the vvar page into the mapping array when installing the vdso
into the process user space.

-hpa
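
A minimal user-space model of the substitution described above, assuming the vvar section occupies exactly one page at a known index of the vdso image. build_vdso_pages() and page_ref are hypothetical names; in the kernel, the resulting page array is what would be handed to install_special_mapping().

```c
#include <stddef.h>

/* Hypothetical: a "page" is just an opaque pointer in this model. */
typedef const void *page_ref;

/*
 * Copy the vdso image's pages into the mapping array, but put the single
 * shared vvar page in the slot reserved for the vvar section, so every
 * process maps the same data page. Returns the number of entries written,
 * or 0 if vvar_index is out of range.
 */
static size_t build_vdso_pages(page_ref *out, const page_ref *image_pages,
                               size_t npages, size_t vvar_index,
                               page_ref shared_vvar_page)
{
	if (vvar_index >= npages)
		return 0;
	for (size_t i = 0; i < npages; i++)
		out[i] = (i == vvar_index) ? shared_vvar_page : image_pages[i];
	return npages;
}
```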



Re: [PATCH] cpufreq_stats: fix race between stats allocation and first usage

2012-12-14 Thread Rafael J. Wysocki

On Friday, December 14, 2012 02:59:21 PM Konstantin Khlebnikov wrote:
> This patch forces complete struct cpufreq_stats allocation for all cpus before
> registering the CPUFREQ_TRANSITION_NOTIFIER notifier; otherwise, in some conditions
> cpufreq_stat_notifier_trans() can be called in the middle of stats allocation,
> when cpufreq_stats_table already exists but stat->freq_table is still NULL.

I'll queue it up for submission as v3.8 material.

Does it need to be marked as -stable material too?

Rafael


> Signed-off-by: Konstantin Khlebnikov 
> Cc: Rafael J. Wysocki 
> Cc: cpufreq 
> Cc: linux-pm 
> 
> ---
> 
> <1>[  363.116198] BUG: unable to handle kernel NULL pointer dereference at 
> (null)
> <1>[  363.116668] IP: [] 
> cpufreq_stat_notifier_trans+0x64/0xf0 [cpufreq_stats]
> <4>[  363.116977] PGD 23177e067 PUD 2349c1067 PMD 0
> <4>[  363.117151] Oops:  [#1] SMP
> <4>[  363.117151] last sysfs file: /sys/module/freq_table/initstate
> <4>[  363.117151] CPU 5
> <4>[  363.117151] Modules linked in: cpufreq_stats(+)(U) [a lot] [last 
> unloaded: umc]
> <4>[  363.117151]
> <4>[  363.117151] Pid: 1690, comm: kondemand/5 veid: 0 Tainted: PWC 
> ---  T 2.6.32-279.5.1.el6-042stab061.7-vz #112 042stab061_7 
> System manufacturer System Product Name/Crosshair IV Formula
> <4>[  363.117151] RIP: 0010:[]  [] 
> cpufreq_stat_notifier_trans+0x64/0xf0 [cpufreq_stats]
> <4>[  363.117151] RSP: 0018:880234281920  EFLAGS: 00010246
> <4>[  363.117151] RAX: 001e12e8 RBX:  RCX: 
> 002ab980
> <4>[  363.117151] RDX: 0004 RSI:  RDI: 
> 0005
> <4>[  363.117151] RBP: 880234281940 R08:  R09: 
> 
> <4>[  363.117151] R10:  R11:  R12: 
> 880218ce7400
> <4>[  363.117151] R13:  R14:  R15: 
> 
> <4>[  363.117151] FS:  7f499ffe0700() GS:88003100() 
> knlGS:
> <4>[  363.117151] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> <4>[  363.117151] CR2:  CR3: 000230af7000 CR4: 
> 06e0
> <4>[  363.117151] DR0:  DR1:  DR2: 
> 
> <4>[  363.117151] DR3:  DR6: 0ff0 DR7: 
> 0400
> <4>[  363.117151] Process kondemand/5 (pid: 1690, veid: 0, threadinfo 
> 88023428, task 8802330c48c0)
> <4>[  363.117151] Stack:
> <4>[  363.117151]  810cf4f3 0001  
> a11a7ac0
> <4>[  363.117151]  880234281990 815454a8 880234281c80 
> 
> <4>[  363.117151]  880234281a10 833be978 833be8e0 
> 0001
> <4>[  363.117151] Call Trace:
> <4>[  363.117151]  [] ? is_module_text_address+0x23/0x30
> <4>[  363.117151]  [] notifier_call_chain+0x58/0xb0
> <4>[  363.117151]  [] __srcu_notifier_call_chain+0x5d/0x90
> <4>[  363.117151]  [] srcu_notifier_call_chain+0x16/0x20
> <4>[  363.117151]  [] cpufreq_notify_transition+0x12a/0x190
> <4>[  363.117151]  [] powernowk8_target+0x628/0xb30 
> [powernow_k8]
> <4>[  363.117151]  [] __cpufreq_driver_target+0x8b/0x90
> <4>[  363.117151]  [] do_dbs_timer+0x3b8/0x3bc 
> [cpufreq_ondemand]
> <4>[  363.117151]  [] ? do_dbs_timer+0x0/0x3bc 
> [cpufreq_ondemand]
> <4>[  363.117151]  [] worker_thread+0x264/0x440
> <4>[  363.117151]  [] ? worker_thread+0x213/0x440
> <4>[  363.117151]  [] ? worker_thread+0x0/0x440
> <4>[  363.117151]  [] ? autoremove_wake_function+0x0/0x40
> <4>[  363.117151]  [] ? worker_thread+0x0/0x440
> <4>[  363.117151]  [] kthread+0x96/0xa0
> <4>[  363.117151]  [] child_rip+0xa/0x20
> <4>[  363.117151]  [] ? restore_args+0x0/0x30
> <4>[  363.117151]  [] ? kthread+0x0/0xa0
> <4>[  363.117151]  [] ? child_rip+0x0/0x20
> <4>[  363.117151] Code: 89 f9 48 8b 0c cd 20 53 9c 81 4c 8b 24 08 4d 85 e4 74 
> d3 8b 4a 08 41 8b 54 24 10 45 8b 6c 24 18 85 d2 74 22 49 8b 74 24 28 31 db 
> <3b> 0e 75 10 eb 1a 66 0f 1f 44 00 00 48 63 c3 3b 0c 86 74 0c 83
> <1>[  363.117151] RIP  [] 
> cpufreq_stat_notifier_trans+0x64/0xf0 [cpufreq_stats]
> <4>[  363.117151]  RSP 
> <4>[  363.117151] CR2: 
> ---
>  drivers/cpufreq/cpufreq_stats.c |   11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index e40e508..9d7732b 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -364,18 +364,21 @@ static int __init cpufreq_stats_init(void)
>   if (ret)
>   return ret;
>  
> + register_hotcpu_notifier(&cpufreq_stat_cpu_notifier);
> + for_each_online_cpu(cpu)
> + cpufreq_update_policy(cpu);
> +
>   ret = cpufreq_register_notifier(&notifier_trans_block,
>   CPUFREQ_TRANSITION_NOTIFIER);
>   if (ret) {
>   cpufreq_unregister_notifier(&notifier_policy_block,
>
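
The ordering the patch enforces can be shown with a deterministic, single-threaded model. Everything here is hypothetical: struct stats is a cut-down stand-in for struct cpufreq_stats, on_transition() reports the NULL freq_table case instead of dereferencing it as the real notifier does, and a transition is assumed to arrive immediately after the notifier is registered. CPU hotplug and cpufreq_update_policy() are not modelled.

```c
#include <stddef.h>

#define NOT_READY (-2) /* freq_table still NULL: the window that oopsed */
#define NOT_FOUND (-1)

/* Hypothetical stand-in for one CPU's struct cpufreq_stats. */
struct stats {
	const unsigned int *freq_table; /* NULL until allocation completes */
	size_t nfreq;
};

/* Models cpufreq_stat_notifier_trans(): look up the new frequency. */
static int on_transition(const struct stats *st, unsigned int new_freq)
{
	if (st->freq_table == NULL)
		return NOT_READY;
	for (size_t i = 0; i < st->nfreq; i++)
		if (st->freq_table[i] == new_freq)
			return (int)i;
	return NOT_FOUND;
}

/* Returns what the notifier saw for a transition fired right after
 * registration, under the old or the patched init order. */
static int init_and_race(struct stats *st, const unsigned int *table,
                         size_t nfreq, unsigned int freq, int register_first)
{
	int seen;

	if (register_first) {
		/* Old order: notifier live, transition fires, then allocation. */
		seen = on_transition(st, freq);
		st->freq_table = table;
		st->nfreq = nfreq;
	} else {
		/* Patched order: allocation completes before the notifier is live. */
		st->freq_table = table;
		st->nfreq = nfreq;
		seen = on_transition(st, freq);
	}
	return seen;
}
```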

Minimum toolchain requirements?

2012-12-14 Thread Rob Landley

Although the README and Documentation/Changes both say the kernel
builds with gcc 3.2, this is no longer the case: the new 3.7 kernel
does not even build under unpatched gcc 4.2.1 (the last GPLv2
release).


Building for i686 breaks with "arch/x86/kernel/cpu/perf_event_p6.c:22:  
error: p6_hw_cache_event_ids causes a section type conflict" (trivial  
workaround: patch kernel so CONFIG_BROKEN_RODATA defaults to y).  
Building for mips breaks with "arch/mips/lib/delay.c:24:5: warning:  
"__SIZEOF_LONG__" is not defined". (Introduced January 2007, gcc git  
commit 6a60f216c210e. Easy enough to add an equivalent to my toolchain.)
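
For a compiler that predates the builtin, one possible stopgap, sketched here as an assumption and not as what the Aboriginal Linux toolchain actually does, is a preprocessor fallback keyed off LONG_MAX from <limits.h>. It deliberately defines a reserved identifier, so it only makes sense as a toolchain-side shim, and it assumes long is either 32 or 64 bits wide.

```c
#include <limits.h>

/* Hypothetical shim: only takes effect if the compiler lacks the builtin. */
#ifndef __SIZEOF_LONG__
# if LONG_MAX == 2147483647L
#  define __SIZEOF_LONG__ 4
# else
#  define __SIZEOF_LONG__ 8 /* assumes the only other width is 64 bits */
# endif
#endif

/* Code can then branch on the width at preprocessing time. */
#if __SIZEOF_LONG__ == 4
static const char long_width_desc[] = "32-bit long";
#else
static const char long_width_desc[] = "64-bit long";
#endif
```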


Over in my Aboriginal Linux project I've been patching both the kernel  
and my GPLv2 toolchain for a while to work around these random  
breakages (such as when sh4 decided it would only build with binutils  
2.20, which had only been out for 3 months at the time). But since I'm  
supposed to be catching Documentation stuff that falls through the  
cracks, if the docs are clearly out of date I should probably update  
them.


I'm still regression testing each new kernel release against gcc 4.2.1,
binutils 2.17, and make 3.81 (i.e. the last GPLv2 release of each
tool). My personal preference would be to update Documentation/Changes
to say "gcc 4.2.1 and binutils 2.17 are the oldest supported versions",
and then try to push patches upstream that fix anything preventing the
kernel from building with those. Unfortunately, when I try to push
patches to make older toolchains work, the reception isn't exactly warm.


So I ask the question: what are the current minimum requirements to  
build the kernel?


Rob

Re: [git pull] m68k updates for 3.8

2012-12-14 Thread Rob Landley


On 12/14/2012 06:04:51 AM, Greg Ungerer wrote:

> Hi Rob,
>
> ...
>> Somebody got one of my images to boot under aranym but they had to patch
>> the kernel fairly extensively to add the emulated device support that
>> emulator provided. It doesn't emulate real devices the way qemu does,
>> but qemu doesn't fully emulate the processor (just coldfire in mainline)...
>
> I use aranym for testing m68k. Though I don't really pound too heavily
> on the devices. I really only cross-compile small systems for testing
> on it.


What kernel config do you use for aranym? I don't see an aranym entry in
arch/m68k/configs, and I stopped using it precisely because it required  
several large patches to add emulated device support for everything  
from serial console to block devices. (There was a kernel upgrade, it  
broke, I cut a release without it. Pretty much the same reason I  
stopped using squashfs for a year or so until it finally got merged.)


I can poke Laurent Vivier about possibly getting the qemu-system-m68k  
and the q800 board emulation to work better if there's interest from  
anyone other than me. (I just checked and it dies at the same place it  
did last year: setting up the page tables. The MMU emulation ain't  
there, and I haven't got documentation for it.)


My interest is that my aboriginal linux setup builds the same system  
for a dozen different targets and then natively builds packages inside  
the emulator. This allows me to regression test if their behavior  
diverges, even from a cron job if I want to. From my viewpoint, the  
more targets the merrier.


(I don't care hugely about which board emulation I'm using, the point  
is to run a native root filesystem including a native toolchain and  
build stuff locally on the board. This requires at least 256 megs of  
memory in the emulated board for gcc 4.2 (more for newer versions), and  
ideally I want a virtual network card so I can hook up distcc to the  
cross compiler and move the heavy lifting of compilation outside the  
emulator without reintroducing the whole "keep track of two  
simultaneous build contexts" complexity of cross compiling. So it's not  
"q800 vs aranym", it's "I already got qemu to emulate all the other  
targets I'm testing and it doesn't require an extensively patched  
kernel" vs "other emulator requiring patched kernel"...)


Thanks,

Rob

Re: [PATCH 1/3] timekeeping: Add persistent_clock_exist flag

2012-12-14 Thread John Stultz


On 12/14/2012 01:56 PM, Jason Gunthorpe wrote:

> On Fri, Dec 14, 2012 at 01:22:50PM -0800, John Stultz wrote:
>
>> Although from a timekeeping perspective, the read_persistent_clock()
>> interface is actually *much* preferred over the rtc HCTOSYS device.
>>
>> Since read_persistent_clock() has the requirement that it's safe to
>> call with IRQs disabled, we can use it in the timekeeping
>> suspend/resume code, which allows for better time accuracy.
>
> Sure, but my view on this is that it has nothing to do with
> read_persistent_clock. Whether the RTC driver can run with IRQs off is a
> property of the RTC driver and RTC hardware - it has nothing to do
> with the platform. ARM platforms will vary on a machine by machine
> basis. The rtc-mv driver used on my ARM system is perfectly
> re-entrant; lots of rtc on SOC drivers will be the same.
>
> If this is the only thing keeping you on read_persistent_clock, for
> real RTCs, then how about an RTC_DEV_SAFE_READ flag (or whatever) in
> rtc_device.flags?
>
> Reserve read_persistent_clock for things like that very specialized
> non-RTC ARM counter.

Something like this could work, although I worry it only causes even
more code paths:
1) read_persistent_clock for non-RTC counters, called from
   timekeeping_suspend/resume
2) IRQ safe RTC called from timekeeping_suspend/resume
3) Non-IRQ safe RTC suspend/resume logic



>> While we're suggesting cleanups, the RTC Kconfig choices probably
>> need a cleanup too, as the list of all possible drivers can be
>> confusing, when usually each architecture has only a few that they
>
> That is a general pain with the new 'everything is a driver'..

True, and maybe something I just have to live with.  But I can still
make holiday wish lists :)


thanks
-john
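
The three code paths listed above can be sketched as a single dispatch decision. This is a hypothetical model only: RTC_DEV_SAFE_READ is the flag proposed in this thread and its value here is made up, and the enums and pick_path() are invented for illustration.

```c
/* Hypothetical classification of the persistent-time source. */
enum time_source { SRC_NON_RTC_COUNTER, SRC_RTC };

/* Hypothetical bit for the proposed "safe to read with IRQs off" flag. */
#define RTC_DEV_SAFE_READ 0x1u

enum resume_path {
	PATH_PERSISTENT_CLOCK, /* 1) read_persistent_clock in timekeeping_suspend/resume */
	PATH_IRQ_SAFE_RTC,     /* 2) IRQ-safe RTC read from timekeeping_suspend/resume */
	PATH_RTC_DEFERRED      /* 3) non-IRQ-safe RTC, separate suspend/resume logic */
};

static enum resume_path pick_path(enum time_source src, unsigned int rtc_flags)
{
	if (src == SRC_NON_RTC_COUNTER)
		return PATH_PERSISTENT_CLOCK;
	if (rtc_flags & RTC_DEV_SAFE_READ)
		return PATH_IRQ_SAFE_RTC;
	return PATH_RTC_DEFERRED;
}
```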

[PATCH v2] NTP: Add a CONFIG_RTC_SYSTOHC configuration

2012-12-14 Thread Jason Gunthorpe

The purpose of this option is to allow ARM/etc systems that rely on the
class RTC subsystem to have the same kind of automatic NTP based
synchronization that we have on PC platforms. Today ARM does not
implement update_persistent_clock and makes extensive use of the class
RTC system.

When enabled, CONFIG_RTC_SYSTOHC will provide a generic
rtc_update_persistent_clock that stores the current time in the RTC and
is intended to complement the existing CONFIG_RTC_HCTOSYS option that
reads the RTC at boot.

As with RTC_HCTOSYS, the platform's update_persistent_clock is tried
first; the class RTC is used only if the platform function is not
provided or returns -ENODEV. Platforms with mixed class RTC and non-RTC
drivers need to return -ENODEV when class RTC should be used. Such an
update for PPC is included in this patch.

Long term, implementations of update_persistent_clock should migrate to
proper class RTC drivers and use CONFIG_RTC_SYSTOHC instead.

Tested on ARM kirkwood and PPC405

Signed-off-by: Jason Gunthorpe 
---
 arch/powerpc/kernel/time.c |2 +-
 drivers/rtc/Kconfig|9 +
 drivers/rtc/Makefile   |1 +
 drivers/rtc/systohc.c  |   44 
 include/linux/time.h   |1 +
 kernel/time/ntp.c  |   15 +++
 6 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 drivers/rtc/systohc.c

v2 updates:
 - Be very careful to return ENODEV from rtc_update_persistent_clock,
   we don't want to loop on the fast retry path of sync_cmos_clock
   if there is no RTC set support available
 - Call the platform update_persistent_clock first. Only try
   the RTC version if it is compiled out, or explicitly returns ENODEV
 - Added 'depends on RTC_SYSTOHC = y' to Kconfig
 - Don't fast retry in sync_cmos_clock if ENODEV is returned
 - Update PPC to return ENODEV if there is no mach specific function
   available in ppc_md. This will give rtc_update_persistent_clock
   a chance.
 - Use rtc_set_time not mms since rtc_hctosys assumes UTC.
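
The v2 calling convention (platform hook first, class-RTC fallback only on -ENODEV, no fast retry on -ENODEV) can be modelled in a few lines. This is a simplified sketch: persist_fn, persist_time(), retry_soon() and the example back ends are hypothetical, the real functions take a struct timespec rather than a seconds count, and the retry scheduling inside sync_cmos_clock() is reduced to a yes/no answer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature shared by both back ends in this model. */
typedef int (*persist_fn)(long long now_sec);

/* Platform hook first; fall back to the class-RTC helper only when the
 * platform hook is absent (compiled out) or reports -ENODEV. */
static int persist_time(persist_fn platform, persist_fn rtc_fallback,
                        long long now_sec)
{
	int err = -ENODEV;

	if (platform)
		err = platform(now_sec);
	if (err == -ENODEV && rtc_fallback)
		err = rtc_fallback(now_sec);
	return err;
}

/* Fast retry only for temporary failures: -ENODEV means no RTC can be
 * set at all, so retrying quickly would just spin. */
static int retry_soon(int err)
{
	return err != 0 && err != -ENODEV;
}

/* Example back ends, for illustration only. */
static int platform_without_rtc(long long s) { (void)s; return -ENODEV; }
static int rtc_ok(long long s)               { (void)s; return 0; }
static int rtc_busy(long long s)             { (void)s; return -EBUSY; }
```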

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index ce4cb77..bc844a8 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -667,7 +667,7 @@ int update_persistent_clock(struct timespec now)
struct rtc_time tm;
 
if (!ppc_md.set_rtc_time)
-   return 0;
+   return -ENODEV;
 
to_tm(now.tv_sec + 1 + timezone_offset, &tm);
tm.tm_year -= 1900;
diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 19c03ab..7b3702b 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -28,6 +28,7 @@ config RTC_HCTOSYS
 config RTC_HCTOSYS_DEVICE
string "RTC used to set the system time"
depends on RTC_HCTOSYS = y
+   depends on RTC_SYSTOHC = y
default "rtc0"
help
  The RTC device that will be used to (re)initialize the system
@@ -48,6 +49,14 @@ config RTC_HCTOSYS_DEVICE
  sleep states. Do not specify an RTC here unless it stays powered
  during all this system's supported sleep states.
 
+config RTC_SYSTOHC
+   bool "Set the RTC time based on NTP synchronization"
+   default y
+   help
+ If you say yes here, the system time (wall clock) will be stored
+  in the RTC specified by RTC_HCTOSYS_DEVICE approximately every 11
+ minutes if userspace reports synchronized NTP status.
+
 config RTC_DEBUG
bool "RTC debug support"
help
diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile
index 56297f0..69d11f1 100644
--- a/drivers/rtc/Makefile
+++ b/drivers/rtc/Makefile
@@ -6,6 +6,7 @@ ccflags-$(CONFIG_RTC_DEBUG) := -DDEBUG
 
 obj-$(CONFIG_RTC_LIB)  += rtc-lib.o
 obj-$(CONFIG_RTC_HCTOSYS)  += hctosys.o
+obj-$(CONFIG_RTC_SYSTOHC)  += systohc.o
 obj-$(CONFIG_RTC_CLASS)+= rtc-core.o
 rtc-core-y := class.o interface.o
 
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
new file mode 100644
index 0000000..a625740
--- /dev/null
+++ b/drivers/rtc/systohc.c
@@ -0,0 +1,44 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ */
+#include 
+#include 
+
+/**
+ * rtc_update_persistent_clock - Save NTP synchronized time to the RTC
+ * @now: Current time of day
+ *
+ * Replacement for the NTP platform function update_persistent_clock
+ * that stores time for later retrieval by rtc_hctosys
+ *
+ * Returns 0 on successful RTC update, -ENODEV if a RTC update is not
+ * possible at all, and various other -errno for specific temporary failure
+ * cases.
+ *
+ * If temporary failure is indicated the caller should try again 'soon'
+ */
+int rtc_update_persistent_clock(struct timespec now)
+{
+   struct rtc_device *rtc;
+   struct rtc_time tm;
+   int err = -ENODEV;
+
+   if (now.tv_nsec < (NSEC_PER_SEC >> 1))
+   rtc_time_to_tm(now.tv_sec, &tm);
+   else
+

[ 07/37] Input: matrix-keymap - provide proper module license

2012-12-14 Thread Greg Kroah-Hartman

3.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Florian Fainelli 

commit 55220bb3e5f917dd5fee1153c612f9a83599f639 upstream.

The matrix-keymap module is currently lacking a proper module license;
add one so we don't have this module tainting the entire kernel.  This
issue has been present since commit 1932811f426f ("Input: matrix-keymap
- uninline and prepare for device tree support")

Signed-off-by: Florian Fainelli 
Signed-off-by: Dmitry Torokhov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/input/matrix-keymap.c |3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/input/matrix-keymap.c
+++ b/drivers/input/matrix-keymap.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static bool matrix_keypad_map_key(struct input_dev *input_dev,
@@ -161,3 +162,5 @@ int matrix_keypad_build_keymap(const str
return 0;
 }
 EXPORT_SYMBOL(matrix_keypad_build_keymap);
+
+MODULE_LICENSE("GPL");



[ 06/37] Staging: ipack/bridges/tpci200: avoid kernel bug when uninstalling a device

2012-12-14 Thread Greg Kroah-Hartman

3.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Samuel Iglesias Gonsálvez 

commit 9e58d05a1b24d2c0471c3b4df8f473a7543d7647 upstream.

Signed-off-by: Samuel Iglesias Gonsálvez 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/staging/ipack/bridges/tpci200.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/staging/ipack/bridges/tpci200.c
+++ b/drivers/staging/ipack/bridges/tpci200.c
@@ -604,8 +604,8 @@ static int tpci200_slot_unregister(struc
if (mutex_lock_interruptible(&tpci200->mutex))
return -ERESTARTSYS;
 
-   ipack_device_unregister(dev);
tpci200->slots[dev->slot].dev = NULL;
+   ipack_device_unregister(dev);
mutex_unlock(&tpci200->mutex);
 
return 0;



[ 25/37] ACPI / PNP: Do not crash due to stale pointer use during system resume

2012-12-14 Thread Greg Kroah-Hartman

3.6-stable review patch.  If anyone has any objections, please let me know.

--

From: "Rafael J. Wysocki" 

commit a6b5e88c0e42093b9057856f35770966c8c591e3 upstream.

During resume from system suspend the 'data' field of
struct pnp_dev in pnpacpi_set_resources() may be a stale pointer,
due to removal of the associated ACPI device node object in the
previous suspend-resume cycle.  This happens, for example, if a
dockable machine is booted in the docking station and then suspended
and resumed and suspended again.  If that happens,
pnpacpi_build_resource_template() called from pnpacpi_set_resources()
attempts to use that pointer and crashes.

However, pnpacpi_set_resources() actually checks the device's ACPI
handle, attempts to find the ACPI device node object attached to it
and returns an error code if that fails, so in fact it knows what the
correct value of dev->data should be.  Use this observation to update
dev->data with the correct value if necessary and dump a call trace
if that's the case (once).

We still need to fix the root cause of this issue, but preventing
systems from crashing because of it is an improvement too.

Reported-and-tested-by: Zdenek Kabelac 
References: https://bugzilla.kernel.org/show_bug.cgi?id=51071
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/pnp/pnpacpi/core.c |3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/pnp/pnpacpi/core.c
+++ b/drivers/pnp/pnpacpi/core.c
@@ -95,6 +95,9 @@ static int pnpacpi_set_resources(struct
return -ENODEV;
}
 
+   if (WARN_ON_ONCE(acpi_dev != dev->data))
+   dev->data = acpi_dev;
+
ret = pnpacpi_build_resource_template(dev, &buffer);
if (ret)
return ret;
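
The fix leans on WARN_ON_ONCE() returning its condition, so one expression both reports the stale pointer (only the first time) and gates the repair. Below is a user-space sketch of that pattern. WARN_ON_ONCE_USER(), struct fake_dev and refresh_cached_pointer() are hypothetical stand-ins, and the macro uses a GNU C statement expression, so it assumes gcc or clang.

```c
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed; /* lets the example observe the "once" */

/* Hypothetical stand-in for WARN_ON_ONCE(): the condition is evaluated
 * and returned every time, but the warning prints once per call site. */
#define WARN_ON_ONCE_USER(cond) ({                                  \
	static bool warned_;                                        \
	bool hit_ = (cond);                                         \
	if (hit_ && !warned_) {                                     \
		warned_ = true;                                     \
		warnings_printed++;                                 \
		fprintf(stderr, "WARNING at %s:%d: %s\n",           \
			__FILE__, __LINE__, #cond);                 \
	}                                                           \
	hit_;                                                       \
})

/* Hypothetical stand-in for struct pnp_dev: only the cached pointer. */
struct fake_dev {
	void *data;
};

/* Mirrors the patched check: if the cached pointer disagrees with the one
 * just looked up, warn (once) and repair it. Returns 1 if it repaired. */
static int refresh_cached_pointer(struct fake_dev *dev, void *looked_up)
{
	if (WARN_ON_ONCE_USER(looked_up != dev->data)) {
		dev->data = looked_up;
		return 1;
	}
	return 0;
}
```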



[ 36/37] rcu: Fix batch-limit size problem

2012-12-14 Thread Greg Kroah-Hartman

3.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

commit 878d7439d0f45a95869e417576774673d1fa243f upstream.

Commit 29c00b4a1d9e27 (rcu: Add event-tracing for RCU callback
invocation) added a regression in rcu_do_batch().

Under stress, RCU is supposed to process all items in the queue,
instead of a batch of 10 items (blimit), but an integer overflow makes
the effective limit 1.  So, unless there are frequent idle periods
(during which RCU ignores batch limits), RCU can be forced into a
state where it cannot keep up with the callback-generation rate,
eventually resulting in OOM.

This commit therefore converts a few variables in rcu_do_batch() from
int to long to fix this problem, along with the module parameters
controlling the batch limits.

Signed-off-by: Eric Dumazet 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Greg Kroah-Hartman 

---
 kernel/rcutree.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -212,13 +212,13 @@ DEFINE_PER_CPU(struct rcu_dynticks, rcu_
.dynticks = ATOMIC_INIT(1),
 };
 
-static int blimit = 10;/* Maximum callbacks per rcu_do_batch. */
-static int qhimark = 10000;/* If this many pending, ignore blimit. */
-static int qlowmark = 100; /* Once only this many pending, use blimit. */
-
-module_param(blimit, int, 0);
-module_param(qhimark, int, 0);
-module_param(qlowmark, int, 0);
+static long blimit = 10;   /* Maximum callbacks per rcu_do_batch. */
+static long qhimark = 10000;   /* If this many pending, ignore blimit. */
+static long qlowmark = 100;/* Once only this many pending, use blimit. */
+
+module_param(blimit, long, 0);
+module_param(qhimark, long, 0);
+module_param(qlowmark, long, 0);
 
 int rcu_cpu_stall_suppress __read_mostly; /* 1 = suppress stall warnings. */
 int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
@@ -1543,7 +1543,8 @@ static void rcu_do_batch(struct rcu_stat
 {
unsigned long flags;
struct rcu_head *next, *list, **tail;
-   int bl, count, count_lazy, i;
+   long bl, count, count_lazy;
+   int i;
 
/* If no callbacks are ready, just return.*/
if (!cpu_has_callbacks_ready_to_invoke(rdp)) {
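
The overflow can be reproduced in user space. This sketch assumes the "process everything" limit is represented as LONG_MAX, and it simplifies the exit test in rcu_do_batch() by dropping the need_resched() and idle-task conditions; callbacks_invoked() and limit_as_int() are hypothetical helpers.

```c
#include <limits.h>

/* Simplified model of the loop exit test "if (++count >= bl) break;". */
static long callbacks_invoked(long queued, long bl)
{
	long count = 0;

	while (count < queued) {
		/* ...invoke one callback here... */
		if (++count >= bl)
			break;
	}
	return count;
}

/* The pre-patch code kept the limit in an int. Converting LONG_MAX to int
 * is implementation-defined; gcc and clang keep the low 32 bits, which on
 * a 64-bit build yields -1. */
static long limit_as_int(long lim)
{
	int bl = (int)lim;
	return bl;
}
```

On a 64-bit build the int copy of LONG_MAX becomes -1, so the very first callback already satisfies count >= bl, which matches the effective limit of 1 described in the commit message.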



[ 11/27] USB: cp210x: add Virtenio Preon32 device id

2012-12-14 Thread Greg Kroah-Hartman

3.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Markus Becker 

commit 356fe44f4b8ece867bdb9876b1854d7adbef9de2 upstream.

Signed-off-by: Markus Becker 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/serial/cp210x.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -118,6 +118,7 @@ static const struct usb_device_id id_tab
{ USB_DEVICE(0x10C4, 0x8477) }, /* Balluff RFID */
{ USB_DEVICE(0x10C4, 0x85EA) }, /* AC-Services IBUS-IF */
{ USB_DEVICE(0x10C4, 0x85EB) }, /* AC-Services CIS-IBUS */
+   { USB_DEVICE(0x10C4, 0x85F8) }, /* Virtenio Preon32 */
{ USB_DEVICE(0x10C4, 0x8664) }, /* AC-Services CAN-IF */
{ USB_DEVICE(0x10C4, 0x8665) }, /* AC-Services OBD-IF */
{ USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */



[ 26/37] ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000

2012-12-14 Thread Greg Kroah-Hartman

3.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Zhang Rui 

commit 129ff8f8d58297b04f47b5d6fad81aa2d08404e1 upstream.

Without this quirk, the laptop boots with a dimmed screen.

References: https://bugzilla.kernel.org/show_bug.cgi?id=51141
Tested-by: Stefan Nagy 
Signed-off-by: Zhang Rui 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/acpi/video.c |   14 ++
 1 file changed, 14 insertions(+)

--- a/drivers/acpi/video.c
+++ b/drivers/acpi/video.c
@@ -389,6 +389,12 @@ static int __init video_set_bqc_offset(c
return 0;
 }
 
+static int video_ignore_initial_backlight(const struct dmi_system_id *d)
+{
+   use_bios_initial_backlight = 0;
+   return 0;
+}
+
 static struct dmi_system_id video_dmi_table[] __initdata = {
/*
 * Broken _BQC workaround http://bugzilla.kernel.org/show_bug.cgi?id=13121
@@ -433,6 +439,14 @@ static struct dmi_system_id video_dmi_ta
DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 7720"),
},
},
+   {
+.callback = video_ignore_initial_backlight,
+.ident = "HP Folio 13-2000",
+.matches = {
+   DMI_MATCH(DMI_BOARD_VENDOR, "Hewlett-Packard"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "HP Folio 13 - 2000 Notebook PC"),
+   },
+   },
{}
 };
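
The new table entry works because a matching DMI entry runs its callback, which clears use_bios_initial_backlight. Here is a hypothetical, cut-down model of that table walk, loosely based on dmi_check_system() (substring matches, the callback runs on each match, a non-zero return stops the scan). struct quirk, quirk_table and check_quirks() are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct dmi_system_id. */
struct quirk {
	int (*callback)(const struct quirk *q);
	const char *ident;
	const char *board_vendor; /* like DMI_MATCH(DMI_BOARD_VENDOR, ...) */
	const char *product_name; /* like DMI_MATCH(DMI_PRODUCT_NAME, ...) */
};

static int use_bios_initial_backlight = 1;

static int ignore_initial_backlight(const struct quirk *q)
{
	(void)q;
	use_bios_initial_backlight = 0;
	return 0; /* 0: keep scanning the table */
}

static const struct quirk quirk_table[] = {
	{ ignore_initial_backlight, "HP Folio 13-2000",
	  "Hewlett-Packard", "HP Folio 13 - 2000 Notebook PC" },
	{ NULL, NULL, NULL, NULL } /* terminator, like the trailing {} */
};

/* Each match string must occur inside the machine's field; matching
 * entries run their callback. Returns the number of matching entries. */
static int check_quirks(const struct quirk *t, const char *vendor,
                        const char *product)
{
	int count = 0;

	for (; t->ident; t++) {
		if (!strstr(vendor, t->board_vendor) ||
		    !strstr(product, t->product_name))
			continue;
		count++;
		if (t->callback && t->callback(t))
			break;
	}
	return count;
}
```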
 



[ 01/27] mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls

2012-12-14 Thread Greg Kroah-Hartman

3.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Marek Szyprowski 

commit 387870f2d6d679746020fa8e25ef786ff338dc98 upstream.

dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless of the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionally,
on ARM architecture any driver which is using dmapools will sooner or
later trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.

This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.

Reported-by: Soeren Moch 
Reported-by: Thomas Petazzoni 
Signed-off-by: Marek Szyprowski 
Tested-by: Andrew Lunn 
Tested-by: Soeren Moch 
Signed-off-by: Greg Kroah-Hartman 

---
 mm/dmapool.c |   31 +++
 1 file changed, 7 insertions(+), 24 deletions(-)

--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -50,7 +50,6 @@ struct dma_pool { /* the pool */
size_t allocation;
size_t boundary;
char name[32];
-   wait_queue_head_t waitq;
struct list_head pools;
 };
 
@@ -62,8 +61,6 @@ struct dma_page { /* cacheable header f
unsigned int offset;
 };
 
-#definePOOL_TIMEOUT_JIFFIES((100 /* msec */ * HZ) / 1000)
-
 static DEFINE_MUTEX(pools_lock);
 
 static ssize_t
@@ -172,7 +169,6 @@ struct dma_pool *dma_pool_create(const c
retval->size = size;
retval->boundary = boundary;
retval->allocation = allocation;
-   init_waitqueue_head(&retval->waitq);
 
if (dev) {
int ret;
@@ -227,7 +223,6 @@ static struct dma_page *pool_alloc_page(
memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
pool_initialise_page(pool, page);
-   list_add(&page->page_list, &pool->page_list);
page->in_use = 0;
page->offset = 0;
} else {
@@ -315,30 +310,21 @@ void *dma_pool_alloc(struct dma_pool *po
might_sleep_if(mem_flags & __GFP_WAIT);
 
spin_lock_irqsave(&pool->lock, flags);
- restart:
list_for_each_entry(page, &pool->page_list, page_list) {
if (page->offset < pool->allocation)
goto ready;
}
-   page = pool_alloc_page(pool, GFP_ATOMIC);
-   if (!page) {
-   if (mem_flags & __GFP_WAIT) {
-   DECLARE_WAITQUEUE(wait, current);
 
-   __set_current_state(TASK_UNINTERRUPTIBLE);
-   __add_wait_queue(&pool->waitq, &wait);
-   spin_unlock_irqrestore(&pool->lock, flags);
+   /* pool_alloc_page() might sleep, so temporarily drop &pool->lock */
+   spin_unlock_irqrestore(&pool->lock, flags);
 
-   schedule_timeout(POOL_TIMEOUT_JIFFIES);
+   page = pool_alloc_page(pool, mem_flags);
+   if (!page)
+   return NULL;
 
-   spin_lock_irqsave(&pool->lock, flags);
-   __remove_wait_queue(&pool->waitq, &wait);
-   goto restart;
-   }
-   retval = NULL;
-   goto done;
-   }
+   spin_lock_irqsave(&pool->lock, flags);
 
+   list_add(&page->page_list, &pool->page_list);
  ready:
page->in_use++;
offset = page->offset;
@@ -348,7 +334,6 @@ void *dma_pool_alloc(struct dma_pool *po
 #ifdef DMAPOOL_DEBUG
memset(retval, POOL_POISON_ALLOCATED, pool->size);
 #endif
- done:
spin_unlock_irqrestore(&pool->lock, flags);
return retval;
 }
@@ -435,8 +420,6 @@ void dma_pool_free(struct dma_pool *pool
page->in_use--;
*(int *)vaddr = page->offset;
page->offset = offset;
-   if (waitqueue_active(&pool->waitq))
-   wake_up_locked(&pool->waitq);
/*
 * Resist a temptation to do
 *if (!is_page_busy(page)) pool_free_page(pool, page);
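
The new locking pattern in dma_pool_alloc() can be sketched in userspace C. The pattern is: scan under the lock, drop the lock around an allocation that may sleep, then re-take it and only then link the new page.

This is a hypothetical model, not the kernel code:
- a C11 atomic_flag spinlock stands in for the pool spinlock;
- malloc() stands in for the possibly sleeping pool_alloc_page();
- blocks are carved out with a simple bump pointer instead of dmapool's embedded free list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: names and layout are illustrative, not the kernel's. */
struct page {
    struct page *next;
    unsigned char *mem;
    size_t offset;          /* next unused byte in mem */
};

struct pool {
    atomic_flag lock;       /* stand-in for the pool spinlock */
    struct page *pages;     /* singly linked page list */
    size_t size;            /* bytes per block handed out */
    size_t allocation;      /* bytes per page */
};

static void pool_lock(struct pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;                   /* spin until the holder releases it */
}

static void pool_unlock(struct pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* May block (malloc stands in for a sleeping allocation): call unlocked. */
static struct page *pool_alloc_page(struct pool *p)
{
    struct page *pg = malloc(sizeof(*pg));

    if (!pg)
        return NULL;
    pg->mem = malloc(p->allocation);
    if (!pg->mem) {
        free(pg);
        return NULL;
    }
    pg->offset = 0;
    pg->next = NULL;
    return pg;
}

static void *pool_alloc(struct pool *p)
{
    struct page *pg;
    void *ret;

    pool_lock(p);
    for (pg = p->pages; pg; pg = pg->next)
        if (pg->offset + p->size <= p->allocation)
            goto ready;

    /* No room: drop the lock around the allocation that might block. */
    pool_unlock(p);
    pg = pool_alloc_page(p);
    if (!pg)
        return NULL;        /* the lock is not held on this path */
    pool_lock(p);

    /* Link the new page only once the lock is held again. */
    pg->next = p->pages;
    p->pages = pg;
ready:
    ret = pg->mem + pg->offset;
    pg->offset += p->size;
    pool_unlock(p);
    return ret;
}
```

Like the patched code, the sketch does not rescan the list after re-taking the lock. Two concurrent callers may therefore each add a page, which costs memory but not correctness. No wait queue or retry loop is needed: a failed allocation simply returns NULL, and the caller's own flags decided whether it was allowed to block.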


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[ 14/27] ACPI / PM: Add Sony Vaio VPCEB1S1E to nonvs blacklist.

2012-12-14 Thread Greg Kroah-Hartman

3.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Lan Tianyu 

commit 876ab79055019e248508cfd0dee7caa3c0c831ed upstream.

Sony Vaio VPCEB1S1E does not resume correctly without
acpi_sleep=nonvs, so add it to the ACPI sleep blacklist.

References: https://bugzilla.kernel.org/show_bug.cgi?id=48781
Reported-by: Sébastien Wilmet 
Signed-off-by: Lan Tianyu 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/acpi/sleep.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -519,6 +519,14 @@ static struct dmi_system_id __initdata a
},
{
.callback = init_nvs_nosave,
+   .ident = "Sony Vaio VPCEB1S1E",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "VPCEB1S1E"),
+   },
+   },
+   {
+   .callback = init_nvs_nosave,
.ident = "Sony Vaio VGN-FW520F",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
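
The blacklist works by matching DMI strings and running the entry's callback on a hit. The sketch below is minimal and rests on these assumptions:
- struct quirk, check_quirks() and mark_nvs_nosave() are hypothetical names. mark_nvs_nosave() is a stand-in for init_nvs_nosave that only records that it ran.
- Each field is matched with plain strstr() substring matching. This is how we understand non-exact DMI_MATCH entries to behave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a DMI quirk table. */
struct quirk {
    const char *ident;
    const char *sys_vendor;
    const char *product_name;
    int (*callback)(const struct quirk *q);
};

static int nvs_nosave_applied;

/* Hypothetical stand-in for init_nvs_nosave: just record that it ran. */
static int mark_nvs_nosave(const struct quirk *q)
{
    (void)q;
    nvs_nosave_applied = 1;
    return 0;
}

static const struct quirk sleep_quirks[] = {
    { "Sony Vaio VPCEB1S1E", "Sony Corporation", "VPCEB1S1E", mark_nvs_nosave },
    { NULL, NULL, NULL, NULL }    /* terminator */
};

/* Run the callback of every entry whose strings all occur in the DMI data. */
static int check_quirks(const struct quirk *table,
                        const char *vendor, const char *product)
{
    int matched = 0;

    for (const struct quirk *q = table; q->ident; q++) {
        if (strstr(vendor, q->sys_vendor) && strstr(product, q->product_name)) {
            (void)q->callback(q);
            matched++;
        }
    }
    return matched;
}
```

Because matching is by substring in this model, an entry would also catch any product name that merely contains "VPCEB1S1E".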


[ 08/27] USB: option: blacklist network interface on Huawei E173

2012-12-14 Thread Greg Kroah-Hartman

3.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Bjørn Mork 

commit f36446cf9bbebaa03a80d95cfeeafbaf68218249 upstream.

The Huawei E173 will normally appear as 12d1:1436 in Linux. But
the modem has another mode with a different device ID and a slightly
different set of descriptors. This is the mode used by Windows, where
it shows up like this:

  3G Modem:    USB\VID_12D1&PID_140C&MI_00\6&3A1D2012&0&0000
  Networkcard: USB\VID_12D1&PID_140C&MI_01\6&3A1D2012&0&0001
  Appli.Inter: USB\VID_12D1&PID_140C&MI_02\6&3A1D2012&0&0002
  PC UI Inter: USB\VID_12D1&PID_140C&MI_03\6&3A1D2012&0&0003

All interfaces have the same ff/ff/ff class codes in this mode.
Blacklist the network interface so that it can be picked up by
the network driver.

Reported-by: Thomas Schäfer 
Signed-off-by: Bjørn Mork 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/serial/option.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -81,6 +81,7 @@ static void option_instat_callback(struc
 #define OPTION_PRODUCT_GTM380_MODEM 0x7201
 
 #define HUAWEI_VENDOR_ID   0x12D1
+#define HUAWEI_PRODUCT_E173    0x140C
 #define HUAWEI_PRODUCT_K4505   0x1464
 #define HUAWEI_PRODUCT_K3765   0x1465
 #define HUAWEI_PRODUCT_K4605   0x14C6
@@ -553,6 +554,8 @@ static const struct usb_device_id option
{ USB_DEVICE(QUANTA_VENDOR_ID, QUANTA_PRODUCT_GLX) },
{ USB_DEVICE(QUANTA_VENDOR_ID, QUANTA_PRODUCT_GKE) },
{ USB_DEVICE(QUANTA_VENDOR_ID, QUANTA_PRODUCT_GLE) },
+   { USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_E173, 0xff, 0xff, 0xff),
+   .driver_info = (kernel_ulong_t) &net_intf1_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_K4505, 0xff, 0xff, 0xff),
.driver_info = (kernel_ulong_t) &huawei_cdc12_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_K3765, 0xff, 0xff, 0xff),
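
Blacklisting here means the option serial driver declines the listed interface numbers, so that another driver can claim them. Below is a minimal sketch of that decision. It assumes a hypothetical struct blacklist_info whose reserved bitmask marks the interfaces to skip; this is modelled on, but not copied from, option.c's blacklist structures. net_intf1 and serial_should_bind() are illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1UL << (n))

/* Hypothetical, simplified per-device blacklist. */
struct blacklist_info {
    unsigned long reserved;   /* bit N set: leave interface N to another driver */
};

/* Hypothetical entry for "the network function is interface 1". */
static const struct blacklist_info net_intf1 = { .reserved = BIT(1) };

/* Should the serial driver bind to interface ifnum of this device? */
static bool serial_should_bind(const struct blacklist_info *bl, unsigned int ifnum)
{
    if (!bl)
        return true;                          /* no blacklist: bind everything */
    if (ifnum >= sizeof(bl->reserved) * CHAR_BIT)
        return true;                          /* outside the bitmask */
    return !(bl->reserved & BIT(ifnum));
}
```

For the 12d1:140C interface layout listed in the commit message, this binds the serial driver to interfaces 0, 2 and 3. Interface 1, the network card, is left for the network driver.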


[ 06/27] x86: hpet: Fix masking of MSI interrupts

2012-12-14 Thread Greg Kroah-Hartman

3.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Jan Beulich 

commit 6acf5a8c931da9d26c8dd77d784daaf07fa2bff0 upstream.

HPET_TN_FSB is not a proper mask bit; it merely toggles between MSI and
legacy interrupt delivery. The proper mask bit is HPET_TN_ENABLE, so
use both bits when (un)masking the interrupt.
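
The effect of the two masking strategies can be modelled with a simplified timer configuration register. This sketch rests on these assumptions:
- The bit positions are those of the HPET specification as we understand them: Tn_INT_ENB_CNF at bit 2 and Tn_FSB_EN_CNF at bit 14.
- A hypothetical timer_fires() helper reports how a firing timer would be delivered.
- Every other bit of the real register is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values (HPET spec bits 2 and 14); names mirror the kernel's. */
#define HPET_TN_ENABLE 0x0004u
#define HPET_TN_FSB    0x4000u

enum delivery { DELIVER_NONE, DELIVER_LEGACY, DELIVER_MSI };

/* Simplified model: how the interrupt is delivered when the timer fires. */
static enum delivery timer_fires(uint32_t cfg)
{
    if (!(cfg & HPET_TN_ENABLE))
        return DELIVER_NONE;
    return (cfg & HPET_TN_FSB) ? DELIVER_MSI : DELIVER_LEGACY;
}

/* Old mask: clears only the FSB (MSI) routing bit. */
static uint32_t mask_old(uint32_t cfg)   { return cfg & ~HPET_TN_FSB; }

/* Patched mask/unmask: touch both bits, leave all other bits alone. */
static uint32_t mask_new(uint32_t cfg)   { return cfg & ~(HPET_TN_ENABLE | HPET_TN_FSB); }
static uint32_t unmask_new(uint32_t cfg) { return cfg | HPET_TN_ENABLE | HPET_TN_FSB; }
```

In this model, clearing only HPET_TN_FSB merely reroutes the interrupt to legacy delivery. Clearing HPET_TN_ENABLE as well actually stops it.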

Signed-off-by: Jan Beulich 
Link: http://lkml.kernel.org/r/5093e0900278000a6...@nat28.tlf.novell.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/kernel/hpet.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -434,7 +434,7 @@ void hpet_msi_unmask(struct irq_data *da
 
/* unmask it */
cfg = hpet_readl(HPET_Tn_CFG(hdev->num));
-   cfg |= HPET_TN_FSB;
+   cfg |= HPET_TN_ENABLE | HPET_TN_FSB;
hpet_writel(cfg, HPET_Tn_CFG(hdev->num));
 }
 
@@ -445,7 +445,7 @@ void hpet_msi_mask(struct irq_data *data
 
/* mask it */
cfg = hpet_readl(HPET_Tn_CFG(hdev->num));
-   cfg &= ~HPET_TN_FSB;
+   cfg &= ~(HPET_TN_ENABLE | HPET_TN_FSB);
hpet_writel(cfg, HPET_Tn_CFG(hdev->num));
 }
 

