Re: [PATCH 7/8] nohz: Evaluate tick dependency once on context switch

2015-06-16 Thread Preeti U Murthy
On 06/12/2015 02:16 AM, Rik van Riel wrote:
> On 06/11/2015 01:36 PM, Frederic Weisbecker wrote:
>> The tick dependency is evaluated on every irq. This is a batch of checks
>> which determine whether it is safe to stop the tick or not. These checks
>> are often split in many details: posix cpu timers, scheduler, sched clock,
>> perf events. Each of which are made of smaller details: posix cpu
>> timer involves checking process wide timers then thread wide timers. Perf
>> involves checking freq events then more per cpu details.
>>
>> Checking these details every time we update the full dynticks state
>> brings avoidable overhead.
>>
>> So let's evaluate these dependencies once on context switch. Then the
>> further dependency checks will be performed through a single state check.
>>
>> This is a first step that can be later optimized by dividing task level
>> dependency, CPU level dependency and global dependency and update
>> each at the right time.
> 
>> +static void tick_nohz_full_update_dependencies(void)
>> +{
>> +struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
>> +
>> +if (!posix_cpu_timers_can_stop_tick(current))
>> +ts->tick_needed |= TICK_NEEDED_POSIX_CPU_TIMER;
>> +
>> +if (!perf_event_can_stop_tick())
>> +ts->tick_needed |= TICK_NEEDED_PERF_EVENT;
>> +
>> +if (!sched_can_stop_tick())
>> +ts->tick_needed |= TICK_NEEDED_SCHED;
>>  
> 
> I see this getting kicked from task work and from ipi
> context, but does it get kicked on task wakeup, when
> we have a second runnable task on a CPU, but we decide
> not to preempt the currently running task to switch to
> it yet, but we will want to preempt the currently running
> task at a later point in time?

+1. As far as I can see, this is not taken care of either.

Regards
Preeti U Murthy
> 
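The concern raised above can be made concrete with a small userspace sketch. It is not the patch's code: the struct, its fields, and the helper names are hypothetical stand-ins, and unlike the quoted hunk it recomputes the whole mask on each update. It only illustrates the idea of caching the dependency checks in one mask, and why a wakeup that adds a second runnable task without a context switch has to refresh that cache.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the patch's TICK_NEEDED_* flags. */
enum {
	TICK_NEEDED_POSIX_CPU_TIMER = 1u << 0,
	TICK_NEEDED_PERF_EVENT      = 1u << 1,
	TICK_NEEDED_SCHED           = 1u << 2,
};

/* Inputs the real kernel would query from its subsystems (assumed). */
struct cpu_state {
	bool posix_timers_armed;	/* a posix cpu timer is pending */
	bool perf_freq_events;		/* perf needs periodic sampling */
	unsigned int nr_running;	/* runnable tasks on this CPU */
	unsigned int tick_needed;	/* cached dependency mask */
};

/* Slow path: walk every dependency once and cache the result. */
static void update_dependencies(struct cpu_state *cs)
{
	unsigned int mask = 0;

	if (cs->posix_timers_armed)
		mask |= TICK_NEEDED_POSIX_CPU_TIMER;
	if (cs->perf_freq_events)
		mask |= TICK_NEEDED_PERF_EVENT;
	if (cs->nr_running > 1)
		mask |= TICK_NEEDED_SCHED;

	cs->tick_needed = mask;
}

/* Fast path: a single state check instead of the full batch. */
static bool can_stop_tick(const struct cpu_state *cs)
{
	return cs->tick_needed == 0;
}

/*
 * The case asked about: a wakeup makes a second task runnable without
 * a context switch. If the cache is not refreshed here, it goes stale.
 */
static void task_wakeup(struct cpu_state *cs)
{
	cs->nr_running++;
	update_dependencies(cs);
}
```

With one runnable task and no timers, can_stop_tick() is true; after task_wakeup() it turns false because TICK_NEEDED_SCHED is now set in the cached mask.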

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 04/14] crypto: add a new driver for Marvell's CESA

2015-06-16 Thread Herbert Xu
On Tue, Jun 16, 2015 at 11:58:58AM +0200, Boris Brezillon wrote:
>
> +config CRYPTO_DEV_MARVELL_CESA
> + tristate "New Marvell's Cryptographic Engine driver"
> + depends on (PLAT_ORION || ARCH_MVEBU || COMPILE_TEST) && HAS_DMA && HAS_IOMEM
> + select CRYPTO_ALGAPI
> + select CRYPTO_AES
> + select CRYPTO_DES
> + select CRYPTO_BLKCIPHER2
> + select CRYPTO_HASH

While you're fixing the DMA issue, could you also replace BLKCIPHER2
with BLKCIPHER and kill the select on ALGAPI? BLKCIPHER2 is internal
to the crypto API and should not be used elsewhere while ALGAPI is
only meant to be selected by crypto types such as BLKCIPHER.

I know the existing driver does this too and I will fix that now.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v5 05/14] crypto: marvell/CESA: add TDMA support

2015-06-16 Thread Herbert Xu
On Wed, Jun 17, 2015 at 01:05:27PM +0800, Herbert Xu wrote:
> On Tue, Jun 16, 2015 at 11:58:59AM +0200, Boris Brezillon wrote:
> >
> > +   ret = dma_map_sg(cesa_dev->dev, req->src, creq->src_nents,
> > +DMA_TO_DEVICE);
> > +   if (ret != creq->src_nents)
> > +   return -ENOMEM;
> 
> Hmm it doesn't quite work like that.  It returns zero on error,
> otherwise it returns the number (n) of mapped entries which may be
> less than what you gave it due to merging.  You're then supposed
> to use only the first n entries which should contain everything.

Please take a look at Documentation/DMA-API-HOWTO.txt, it has
everything you need to know about the dma_map_sg interface.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
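The return convention described above can be sketched in userspace as follows. mock_dma_map_sg() and map_source() are hypothetical names used only for illustration; the mock merely imitates the documented contract (zero on error, otherwise the number of mapped entries, which may be smaller than the number passed in) and does no real DMA mapping.

```c
#include <errno.h>

/*
 * Hypothetical mock of the dma_map_sg() contract: returns 0 on failure,
 * otherwise the number of mapped entries, which may be smaller than
 * nents when adjacent entries were merged.
 */
static int mock_dma_map_sg(int nents, int merged, int fail)
{
	if (fail || nents <= 0)
		return 0;
	return nents - merged;
}

/* Returns the number of mapped entries to use, or -ENOMEM on failure. */
static int map_source(int src_nents, int merged, int fail)
{
	int mapped = mock_dma_map_sg(src_nents, merged, fail);

	/* Zero signals an error; "mapped != src_nents" does not. */
	if (!mapped)
		return -ENOMEM;

	/* The caller should walk only the first 'mapped' entries. */
	return mapped;
}
```

In this sketch, a merge that shrinks four entries to three is treated as success, whereas the check in the quoted hunk would have returned -ENOMEM for it.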


Re: [PATCH V1] x86, espfix: postpone the initialization of espfix stack for AP

2015-06-16 Thread Zhu Guihua

Any feedback about this?

On 06/04/2015 05:45 PM, Gu Zheng wrote:

The following lockdep warning occurs when running with the latest kernel:
[3.178000] [ cut here ]
[3.183000] WARNING: CPU: 128 PID: 0 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0xdd/0xe0()
[3.193000] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[3.199000] Modules linked in:

[3.203000] CPU: 128 PID: 0 Comm: swapper/128 Not tainted 4.1.0-rc3 #70
[3.221000]   2d6601fb3e6d4e4c 88086fd5fc38 
81773f0a
[3.23]   88086fd5fc90 88086fd5fc78 
8108c85a
[3.238000]  88086fd6 0092 88086fd6 
00d0
[3.246000] Call Trace:
[3.249000]  [] dump_stack+0x4c/0x65
[3.255000]  [] warn_slowpath_common+0x8a/0xc0
[3.261000]  [] warn_slowpath_fmt+0x55/0x70
[3.268000]  [] lockdep_trace_alloc+0xdd/0xe0
[3.274000]  [] __alloc_pages_nodemask+0xad/0xca0
[3.281000]  [] ? __lock_acquire+0xf6d/0x1560
[3.288000]  [] alloc_page_interleave+0x3a/0x90
[3.295000]  [] alloc_pages_current+0x17d/0x1a0
[3.301000]  [] ? __get_free_pages+0xe/0x50
[3.308000]  [] __get_free_pages+0xe/0x50
[3.314000]  [] init_espfix_ap+0x17b/0x320
[3.32]  [] start_secondary+0xf1/0x1f0
[3.327000] ---[ end trace 1b3327d9d6a1d62c ]---

init_espfix_ap() allocates pages with GFP_KERNEL, but it is called before
local irqs are enabled. The lockdep subsystem treats this as allocating
memory with GFP_FS while local irqs are disabled, and triggers the warning
shown above.

We could allocate the pages on the boot CPU side and hand them over to the
secondary CPU, but that seems wasteful if some of the cpus are offline.
Since these pages (the espfix stack) are not needed until we try to run user
code, postpone the initialization of the espfix stack and let the boot-up
routine init the espfix stack for the target cpu after it has booted, to
avoid the noise.

Signed-off-by: Gu Zheng 
---
v1:
   Alloc the page on the node the target CPU is on.
RFC:
   Let the boot up routine init the espfix stack for the target cpu after it
   booted.
---
---
  arch/x86/include/asm/espfix.h |2 +-
  arch/x86/kernel/espfix_64.c   |   28 
  arch/x86/kernel/smpboot.c |   14 +++---
  3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/espfix.h b/arch/x86/include/asm/espfix.h
index 99efebb..ca3ce9a 100644
--- a/arch/x86/include/asm/espfix.h
+++ b/arch/x86/include/asm/espfix.h
@@ -9,7 +9,7 @@ DECLARE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack);
  DECLARE_PER_CPU_READ_MOSTLY(unsigned long, espfix_waddr);
  
  extern void init_espfix_bsp(void);

-extern void init_espfix_ap(void);
+extern void init_espfix_ap(int cpu);
  
  #endif /* CONFIG_X86_64 */
  
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c

index f5d0730..e397583 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -131,25 +131,24 @@ void __init init_espfix_bsp(void)
init_espfix_random();
  
  	/* The rest is the same as for any other processor */

-   init_espfix_ap();
+   init_espfix_ap(0);
  }
  
-void init_espfix_ap(void)

+void init_espfix_ap(int cpu)
  {
-   unsigned int cpu, page;
+   unsigned int page;
unsigned long addr;
pud_t pud, *pud_p;
pmd_t pmd, *pmd_p;
pte_t pte, *pte_p;
-   int n;
+   int n, node;
void *stack_page;
pteval_t ptemask;
  
  	/* We only have to do this once... */

-   if (likely(this_cpu_read(espfix_stack)))
+   if (likely(per_cpu(espfix_stack, cpu)))
return; /* Already initialized */
  
-	cpu = smp_processor_id();

addr = espfix_base_addr(cpu);
page = cpu/ESPFIX_STACKS_PER_PAGE;
  
@@ -165,12 +164,15 @@ void init_espfix_ap(void)

if (stack_page)
goto unlock_done;
  
+	node = cpu_to_node(cpu);

ptemask = __supported_pte_mask;
  
  	pud_p = &espfix_pud_page[pud_index(addr)];

pud = *pud_p;
if (!pud_present(pud)) {
-   pmd_p = (pmd_t *)__get_free_page(PGALLOC_GFP);
+   struct page *page = alloc_pages_node(node, PGALLOC_GFP, 0);
+
+   pmd_p = (pmd_t *)page_address(page);
pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask));
paravirt_alloc_pmd(&init_mm, __pa(pmd_p) >> PAGE_SHIFT);
for (n = 0; n < ESPFIX_PUD_CLONES; n++)
@@ -180,7 +182,9 @@ void init_espfix_ap(void)
pmd_p = pmd_offset(&pud, addr);
pmd = *pmd_p;
if (!pmd_present(pmd)) {
-   pte_p = (pte_t *)__get_free_page(PGALLOC_GFP);
+   struct page *page = alloc_pages_node(node, PGALLOC_GFP, 0);
+
+   pte_p = (pte_t *)page_address(page);
pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask));
paravirt_alloc_pte(&init_mm, 

Re: [uClinux-dev] m68k compile issue with 4.0.5

2015-06-16 Thread Waldemar Brodkorb
Hi Greg,
Greg Ungerer wrote,
> 
> I don't have any compile (or runtime) problems with m5208evb_defconfig
> on linux-4.0.5 either. That is close to what most people use with qemu.
> What .config are you using?

Are you sure the defconfig creates a bootable image?

I still need at least two patches.

http://cgit.openadk.org/cgi/cgit/openadk.git/tree/target/m68k/qemu-m68k/patches/4.0.5/qemu-coldfire.patch
Without this one I get black screen.

http://cgit.openadk.org/cgi/cgit/openadk.git/tree/target/m68k/qemu-m68k/patches/4.0.5/m68k-coldfire-fec.patch
Without this one I get 
qemu: hardware error: mcf_fec_read: Bad address 0x1c4
...
Abort

Networking still does not work for me after booting with the two
patches applied:
fec fec.0 (unnamed net_device) (uninitialized): MDIO read timeout
fec: probe of fec.0 failed with error -5

Any idea?

best regards
 Waldemar



Re: [PATCH perf/core v3 3/3] [BUGFIX] perf probe: Show usage even if the last event is skipped

2015-06-16 Thread Masami Hiramatsu
On 2015/06/16 23:46, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jun 16, 2015 at 08:50:57PM +0900, Masami Hiramatsu escreveu:
>> When the last part of converted events are blacklisted or out-of-text,
>> those are skipped and perf probe doesn't show usage examples.
>> This fixes it to show the example even if the last part of event list
>> is skipped.
>>
>> E.g. without this patch, events are added, but suddenly end;
> 
> End what? Stop being added?

"End without the last message", I meant.

> I.e. not all eligible events are added? From
> your description the problem seems to be that that last message: "You
> can now use it..."  is not presented, but here, without this patch, it
> is:

I see, actually, this happens only if the skipped symbols (
vfs_caches_init_early or vfs_caches_init) are placed at the end of the
matched symbol list. On Ubuntu 15.04 kernel, it doesn't have
vfs_load_quota_inode etc. and the vfs_caches_init is the last part of
the matched list. Since it is hard to reproduce, I've added a Note on
the end of description :)



>>
>> Note that this can be reproduced ONLY IF the vfs_caches_init*
>> is the last part of matched symbol list. I've checked this happens
>> on "3.19.0-generic #18-Ubuntu" kernel binary.
>>



To reproduce this bug, you need to find a symbol matching pattern which
(1) matches both a valid function in .text and an invalid function in
.inittext, and (2) puts the invalid one at the end of the matched function list.

Fortunately I hit such a pattern and found this bug, but it depends on
the kernel binary.

[...]


> I.e. the only problem I found was this:
> 
> [root@zoo ~]# time perf probe -l > /dev/null
> 
> real  0m15.408s
> user  0m14.892s
> sys   0m0.534s
> [root@zoo ~]# 
> [root@zoo ~]# perf stat perf probe -l > /dev/null
> 
>  Performance counter stats for 'perf probe -l':
> 
>       15256.588897      task-clock (msec)         #    1.001 CPUs utilized
>                116      context-switches          #    0.008 K/sec
>                  4      cpu-migrations            #    0.000 K/sec
>            230,720      page-faults               #    0.015 M/sec
>     47,830,405,530      cycles                    #    3.135 GHz
>     43,974,134,505      stalled-cycles-frontend   #   91.94% frontend cycles idle
>                         stalled-cycles-backend
>     11,540,587,038      instructions              #    0.24  insns per cycle
>                                                   #    3.81  stalled cycles per insn
>      2,807,769,592      branches                  #  184.037 M/sec
>         20,087,075      branch-misses             #    0.72% of all branches
> 
>   15.240796324 seconds time elapsed
> 
> [root@zoo ~]#
> 
> Can you check why it takes so long and check the need for this patch?

It is because perf probe -l is not optimized to show a lot of probes yet.
It initializes and loads debuginfo for each probe. I guess we can reuse
debuginfo among them. let me try...

Thank you,


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu...@hitachi.com
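The reuse idea mentioned above ("I guess we can reuse debuginfo among them") could look roughly like the sketch below. This is not perf's code: struct debuginfo here is a hypothetical stand-in, debuginfo_open() only pretends to be the expensive load, and the one-entry cache is an assumption made to keep the example short.

```c
#include <string.h>

/* Hypothetical stand-in for perf's debuginfo handle. */
struct debuginfo {
	char module[64];
};

static struct debuginfo cached;	/* last opened debuginfo */
static int cache_valid;
static int open_count;		/* how many expensive opens happened */

/* Pretend this is the expensive step (loading DWARF data for a module). */
static void debuginfo_open(struct debuginfo *dbg, const char *module)
{
	strncpy(dbg->module, module, sizeof(dbg->module) - 1);
	dbg->module[sizeof(dbg->module) - 1] = '\0';
	open_count++;
}

/* Return a debuginfo for 'module', reusing the cached one when it matches. */
static struct debuginfo *debuginfo_get(const char *module)
{
	if (!cache_valid || strcmp(cached.module, module) != 0) {
		debuginfo_open(&cached, module);
		cache_valid = 1;
	}
	return &cached;
}
```

Listing many probes that live in the same module would then pay for a single open instead of one per probe, which is the cost the timing above points at.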


Re: m68k compile issue with 4.0.5

2015-06-16 Thread Waldemar Brodkorb
Hi Geert,
Geert Uytterhoeven wrote,

> Hi Waldemar,
> 
> On Mon, Jun 15, 2015 at 10:23 PM, Waldemar Brodkorb  wrote:
> > I am trying to build a M68K (Coldfire no-MMU) kernel for Qemu-system-m68k.
> > Any idea what change breaks the compile?
> 
> I tried a few m68knommu defconfigs, but can't reproduce it.
> 
> Is this a plain v4.0.5? I can't find the offending call to vma_fput().
> "git grep vma_fput" tells me there's no "vma_fput" in the kernel sources?

You are totally right. It is not plain vanilla, I am really sorry
about this ugly bug report. I normally use an option to disable any
external kernel patches, before reporting. I missed it this time.

I have experimented with aufs4 kernel patches lately.

You are allowed to kick my ass ;)

best regards
 Waldemar



Re: [PATCH v2] TCO watchdog pretimeout handler

2015-06-16 Thread Guenter Roeck

On 06/16/2015 06:45 AM, Francois-Nicolas Muller wrote:

Use TCO watchdog first timeout (pretimeout) to dump CPU backtraces
and ease debug of watchdog expiration causes.
TCO logic generates a SCI interrupt, then its handler dumps all CPU
backtraces and calls panic (in order to execute registered panic
callbacks).
SCI interrupt number (GPE) is configured from ACPI tables.

Signed-off-by: Francois-Nicolas Muller 
---
Thanks Guenter for your review.


If I recall correctly, the iTCO watchdog can also generate an NMI.
Would it make sense to add support for handling this NMI as well ?


As far as I know, there is no NMI option for TCO watchdog interrupt.
Do you have any documentation about this ?



Actually that was a miscommunication, sorry. I confused it with another
watchdog.

I assume you took out all mention of SMI because it is not (yet) supported.
Would be interesting to know what systems out there actually use / configure.

 Here is a new version (v2) of the patch:

- rebased on latest kernel
- fixed coding style issues

Francois-Nicolas
---
  drivers/watchdog/iTCO_wdt.c | 50 +
  1 file changed, 50 insertions(+)

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index 3c3fd41..cd2569a 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -68,6 +68,8 @@
  #include/* For inb/outb/... */
  #include 
  #include 
+#include 
+#include 

  #include "iTCO_vendor.h"

@@ -127,6 +129,12 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");

+#define DEFAULT_PRETIMEOUT 0
+static bool pretimeout = DEFAULT_PRETIMEOUT;
+module_param(pretimeout, bool, 0);
+MODULE_PARM_DESC(pretimeout, "Enable watchdog pretimeout (default="
+   __MODULE_STRING(DEFAULT_PRETIMEOUT) ")");
+
  /*
   * Some TCO specific functions
   */
@@ -201,6 +209,45 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
return ret; /* returns: 0 = OK, -EIO = Error */
  }

+static unsigned char *tco_hid = "8086229C";
+


Do people understand what this means ? Is that some Intel magic string ?
Does this work for all instances of iTCO watchdogs, or only for a specific
system or iTCO version ?

Rafael asked this question as well, but I don't recall seeing an answer.

I see that it maps to a PCI ID for Intel Braswell, but I have no idea
how that translates to something useful for ACPI. Is this a well defined
(and allocated) ACPI HID ? How about other chips (non-Braswell)
which are supported by this driver ?


+static u32 iTCO_wdt_pretimeout_handler(acpi_handle gpe_device, u32 gpe,
+  void *context)
+{
+   /* dump backtraces for all available cores */
+   trigger_all_cpu_backtrace();
+
+   /* call panic notifiers */
+   panic("Kernel Watchdog");
+
+   return ACPI_INTERRUPT_HANDLED;
+}
+
+static acpi_status __init iTCO_wdt_register_gpe(acpi_handle handle,
+   u32 lvl, void *context, void **rv)
+{
+   unsigned long long gpe;
+   acpi_status status;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_evaluate_object(handle, "_GPE", NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return status;
+
+   if (object.type != ACPI_TYPE_INTEGER)
+   return AE_BAD_DATA;
+
+   gpe = object.integer.value;
+   status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+ iTCO_wdt_pretimeout_handler, NULL);


Do we know for sure that _GPE is always associated with the watchdog ?
Is that because of tco_hid ?

Thanks,
Guenter


+   if (ACPI_FAILURE(status))
+   return status;
+
+   acpi_enable_gpe(NULL, gpe);
+   return AE_OK;
+}
+
  static int iTCO_wdt_start(struct watchdog_device *wd_dev)
  {
unsigned int val;
@@ -641,6 +688,9 @@ static int __init iTCO_wdt_init_module(void)
if (err)
return err;

+   if (pretimeout)
+   acpi_get_devices(tco_hid, iTCO_wdt_register_gpe, NULL, NULL);
+
return 0;
  }






Re: [Resend PATCH v8 0/4] sched: Rewrite runnable load and utilization average tracking

2015-06-16 Thread Boqun Feng
On Wed, Jun 17, 2015 at 11:06:50AM +0800, Boqun Feng wrote:
> Hi Yuyang,
> 
> I've run the test as follow on tip/master without and with your
> patchset:
> 
> On a 12-core system (Intel(R) Xeon(R) CPU X5690 @ 3.47GHz)
> run stress --cpu 12
> run dbench 1

Sorry, I forgot to say that `stress --cpu 12` and `dbench 1` were running
simultaneously. Thanks to Yuyang for reminding me.

Regards,
Boqun




Re: [PATCH v5 05/14] crypto: marvell/CESA: add TDMA support

2015-06-16 Thread Herbert Xu
On Tue, Jun 16, 2015 at 11:58:59AM +0200, Boris Brezillon wrote:
>
> + ret = dma_map_sg(cesa_dev->dev, req->src, creq->src_nents,
> +  DMA_TO_DEVICE);
> + if (ret != creq->src_nents)
> + return -ENOMEM;

Hmm it doesn't quite work like that.  It returns zero on error,
otherwise it returns the number (n) of mapped entries which may be
less than what you gave it due to merging.  You're then supposed
to use only the first n entries which should contain everything.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Re: [PATCH 1/1] net: replace if()/BUG with BUG_ON()

2015-06-16 Thread Maninder Singh
Hi David, 

>> Use BUG_ON(condition) instead of if(condition)/BUG()
>> 
>> Signed-off-by: Maninder Singh 
>> Reviewed-by: Akhilesh Kumar 

>Your email client corrupted this patch, making it unusable

I resent the patch with git send-email. The result looks OK; hopefully the
patch is not corrupted this time.

Thanks

Re: perf,ftrace: fuzzer triggers warning in trace_events_filter code

2015-06-16 Thread Vince Weaver
On Tue, 16 Jun 2015, Steven Rostedt wrote:
> > Does this patch fix your issue?
> 
> I got this patch all ready and set to push to Linus. But I was hoping
> to add a Tested-by: from you. I already have you as Reported-by.
> 
> I would like to get this in before 4.1 is released.

sorry for the delay.  I've patched the kernel and I am unable to replicate 
the Warning with it applied.  So feel free to add a Tested-by from me.

Vince


Re: [PATCH v4] pagemap: switch to the new format and do some cleanup

2015-06-16 Thread Konstantin Khlebnikov
On Wed, Jun 17, 2015 at 12:29 AM, Andrew Morton
 wrote:
> On Mon, 15 Jun 2015 08:56:49 +0300 Konstantin Khlebnikov  
> wrote:
>
>> This patch removes page-shift bits (scheduled to remove since 3.11) and
>> completes migration to the new bit layout. Also it cleans messy macro.
>
> hm, I can't find any kernel version to which this patch comes close to
> applying.

This patchset applies to  4.1-rc8 and current mmotm without problems.
I guess you've tried to pick this patch alone, without the previous changes.


Re: membarrier selftest issue

2015-06-16 Thread Pranith Kumar
Hi Mathieu,

On Mon, Jun 15, 2015 at 11:31 AM, Mathieu Desnoyers
 wrote:
> Hi,
>
> The updated membarrier tree:
> https://github.com/compudj/linux-dev
> branch: sys-membarrier-volatile
>
> has an issue building selftests (make -C tools/testing/selftests)
>
> It appears that including linux/membarrier.h from those userspace
> tests now fails (as of 4.1-rc8). Of course, my system does not have
> the updated userspace headers installed. What is the right course
> of action there ?
>

This somehow is working for me. Did you install the kernel? I am on
debian and I had to install kernel+headers+libc for this to work. pmu
tests did not work due to missing headers but membarrier and memfd did
work.

pranith@macmini:~/linux$ make -C tools/testing/selftests
make: Entering directory '/home/pranith/linux/tools/testing/selftests'
for TARGET in breakpoints cpu-hotplug efivarfs exec firmware ftrace
kcmp membarrier memfd memory-hotplug mount mqueue net powerpc ptrace
size sysctl timers user vm x86; do \
make -C $TARGET; \
done;
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/breakpoints'
echo "Not an x86 target, can't build breakpoints selftests"
Not an x86 target, can't build breakpoints selftests
make[1]: Leaving directory
'/home/pranith/linux/tools/testing/selftests/breakpoints'
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/cpu-hotplug'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory
'/home/pranith/linux/tools/testing/selftests/cpu-hotplug'
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/efivarfs'
gcc -Wall open-unlink.c -o open-unlink
gcc -Wall create-read.c -o create-read
make[1]: Leaving directory
'/home/pranith/linux/tools/testing/selftests/efivarfs'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/exec'
gcc -Wall -o execveat execveat.c
ln -s -f execveat execveat.symlink
cp execveat execveat.denatured
chmod -x execveat.denatured
echo '#!/bin/sh' > script
echo 'exit $*' >> script
chmod +x script
mkdir -p subdir
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/exec'
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/firmware'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory
'/home/pranith/linux/tools/testing/selftests/firmware'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/ftrace'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/ftrace'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/kcmp'
gcc -I../../../../usr/include/ kcmp_test.c -o kcmp_test
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/kcmp'
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/membarrier'
gcc -g -I../../../../usr/include/ membarrier_test.c -o membarrier_test
make[1]: Leaving directory
'/home/pranith/linux/tools/testing/selftests/membarrier'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/memfd'
gcc -D_FILE_OFFSET_BITS=64 -I../../../../include/uapi/
-I../../../../include/ -I../../../../usr/include/ memfd_test.c -o
memfd_test
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/memfd'
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/memory-hotplug'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory
'/home/pranith/linux/tools/testing/selftests/memory-hotplug'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/mount'
gcc -Wall -O2 unprivileged-remount-test.c -o unprivileged-remount-test
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/mount'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/mqueue'
gcc -O2 mq_open_tests.c -o mq_open_tests -lrt
gcc -O2 -o mq_perf_tests mq_perf_tests.c -lrt -lpthread -lpopt
mq_perf_tests.c:39:18: fatal error: popt.h: No such file or directory
 #include <popt.h>
  ^
compilation terminated.
Makefile:4: recipe for target 'all' failed
make[1]: *** [all] Error 1
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/mqueue'
make[1]: Entering directory '/home/pranith/linux/tools/testing/selftests/net'
gcc -Wall -O2 -g -I../../../../usr/include/ -o socket socket.c
gcc -Wall -O2 -g -I../../../../usr/include/ -o psock_fanout psock_fanout.c
gcc -Wall -O2 -g -I../../../../usr/include/ -o psock_tpacket psock_tpacket.c
make[1]: Leaving directory '/home/pranith/linux/tools/testing/selftests/net'
make[1]: Entering directory
'/home/pranith/linux/tools/testing/selftests/powerpc'
make -k -C pmu all
make[2]: Entering directory
'/home/pranith/linux/tools/testing/selftests/powerpc/pmu'
gcc -Wall -O2 -flto -Wall -Werror
-DGIT_VERSION='"v4.1-rc8-3-g9ab055c-dirty"'
-I/home/pranith/linux/tools/testing/selftests/powerpc  -m64 -o
count_instructions loop.S count_instructions.c ../harness.c event.c
lib.c
In file 

[PATCH v2] Staging: wilc1000: Remove casting the values returned by kmalloc()

2015-06-16 Thread Abdul, Hussain (H.)
From: Abdul Hussain 

This patch removes casting the values returned by memory allocation functions.

Signed-off-by: Abdul Hussain 
---
 drivers/staging/wilc1000/linux_mon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/wilc1000/linux_mon.c 
b/drivers/staging/wilc1000/linux_mon.c
index bc7feb4..8cba13c 100644
--- a/drivers/staging/wilc1000/linux_mon.c
+++ b/drivers/staging/wilc1000/linux_mon.c
@@ -247,7 +247,7 @@ static int mon_mgmt_tx(struct net_device *dev, const u8 
*buf, size_t len)
nic = netdev_priv(dev);
 
netif_stop_queue(dev);
-   mgmt_tx = (struct tx_complete_mon_data *)kmalloc(sizeof(struct 
tx_complete_mon_data), GFP_ATOMIC);
+   mgmt_tx = kmalloc(sizeof(struct tx_complete_mon_data), GFP_ATOMIC);
if (mgmt_tx == NULL) {
PRINT_ER("Failed to allocate memory for mgmt_tx structure\n");
return WILC_FAIL;
-- 
1.9.1
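For readers unfamiliar with why the cast can simply go away: in C, a void * converts implicitly to any object pointer type. The userspace sketch below uses malloc() in place of kmalloc(), and struct tx_data is a hypothetical stand-in for the driver's struct tx_complete_mon_data; the sizeof(*tx) form is a common companion style choice, not something this patch does.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's struct tx_complete_mon_data. */
struct tx_data {
	void *buff;
	unsigned int size;
};

static struct tx_data *alloc_tx_data(unsigned int size)
{
	/*
	 * void * converts implicitly in C, so no cast is needed.
	 * sizeof(*tx) keeps the size tied to the variable's type.
	 */
	struct tx_data *tx = malloc(sizeof(*tx));

	if (!tx)
		return NULL;

	tx->buff = NULL;
	tx->size = size;
	return tx;
}
```

The caller checks the result for NULL exactly as before; only the redundant cast is gone.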


Re: Suspicious RCU usage at boot w/ arm ipi trace events?

2015-06-16 Thread Paul E. McKenney
On Tue, Jun 16, 2015 at 05:41:29PM -0700, Stephen Boyd wrote:
> On 06/16/2015 05:31 PM, Paul E. McKenney wrote:
> > On Tue, Jun 16, 2015 at 05:21:59PM -0700, John Stultz wrote:
> >> Not sure if I have the right folks on CC. Noticed this booting on a
> >> apq8064 based system w/ 4.1-rc7 (plus a few patches that I believe are
> >> unrelated).
> >>
> >>
> >> [0.008796] CPU0: thread -1, cpu 0, socket 0, mpidr 8000
> >> [0.009024] Setting up static identity map for 0x80208280 - 0x802082d8
> >> [0.084781] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
> >> [0.085119]
> >> [0.085139] ===
> >> [0.085152] [ INFO: suspicious RCU usage. ]
> >> [0.085168] 4.1.0-rc7-00033-gb5bed2f #153 Not tainted
> >> [0.085180] ---
> >> [0.085195] include/trace/events/ipi.h:68 suspicious
> >> rcu_dereference_check() usage!
> >> [0.085205]
> >> [0.085205] other info that might help us debug this:
> >> [0.085205]
> >> [0.085220]
> >> [0.085220] RCU used illegally from idle CPU!
> >> [0.085220] rcu_scheduler_active = 1, debug_locks = 0
> >> [0.085235] RCU used illegally from extended quiescent state!
> >> [0.085247] no locks held by swapper/0/0.
> >> [0.085256]
> >> [0.085256] stack backtrace:
> >> [0.085275] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> >> 4.1.0-rc7-00033-gb5bed2f #153
> >> [0.085288] Hardware name: Qualcomm (Flattened Device Tree)
> >> [0.085323] [] (unwind_backtrace) from []
> >> (show_stack+0x10/0x14)
> >> [0.085348] [] (show_stack) from []
> >> (dump_stack+0x70/0xbc)
> >> [0.085369] [] (dump_stack) from []
> >> (handle_IPI+0x428/0x604)
> >> [0.085393] [] (handle_IPI) from []
> >> (gic_handle_irq+0x54/0x5c)
> >> [0.085423] [] (gic_handle_irq) from []
> >> (__irq_svc+0x44/0x7c)
> >> [0.085436] Exception stack(0xc09f3f48 to 0xc09f3f90)
> >> [0.085451] 3f40:   0001 0001 
> >> c09f73b8 c09f4528 c0a5de9c
> >> [0.085468] 3f60: c076b4f0   c09ef108 c0a5cec1
> >> 0001  c09f3f90
> >> [0.085481] 3f80: c026bf60 c0210ab8 2113 
> >> [0.085512] [] (__irq_svc) from []
> >> (arch_cpu_idle+0x20/0x3c)
> >> [0.085549] [] (arch_cpu_idle) from []
> >> (cpu_startup_entry+0x2c0/0x5dc)
> >> [0.085576] [] (cpu_startup_entry) from []
> >> (start_kernel+0x358/0x3c4)
> >> [0.085600] [] (start_kernel) from [<8020807c>] (0x8020807c)
> >>
> >>
> >> Let me know if there's any patches you'd like me to test, or if you'd
> >> like config details.
> > At first glance, looks like a failure to invoke irq_enter() from
> > __irq_svc(), gic_handle_irq(), or handle_IPI().
> >
> > Thanx, Paul
> >
> 
> The tracepoint 'trace_ipi_entry' in handle_IPI() is using RCU and we
> haven't called irq_enter() yet at that point. Does this tracepoint need
> to have _rcuidle() added to it?

Yes, I believe that would fix this problem.

Thanx, Paul



Re: [PATCH] Staging: wilc1000: Remove casting the values returned by kmalloc()

2015-06-16 Thread Abdul, Hussain (H.)
On Wednesday 17 June 2015 07:48 AM, gre...@linuxfoundation.org wrote:
> On Tue, Jun 16, 2015 at 07:09:56AM +, Abdul, Hussain (H.) wrote:
>> From: Abdul Hussain 
>>
>> This patch removes casting the values returned by memory allocation 
>> functions.
>>
>> Signed-off-by: Abdul Hussain 
>> ---
>>  drivers/staging/wilc1000/linux_mon.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> Does not apply :(
>
Greg,

I will verify and resend the patch.

Thanks,
Abdul


Re: [PATCH 2/6] mtd: nandsim: Fix kasprintf() usage

2015-06-16 Thread Joe Perches
On Tue, 2015-06-16 at 19:07 -0700, Brian Norris wrote:
> On Mon, Jun 01, 2015 at 11:10:50PM +0200, Richard Weinberger wrote:
> > kasprintf() used in get_partition_name() does a dynamic
> > memory allocation and can fail. We have to handle that case.
[]
> > diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
[]
> > @@ -743,6 +743,11 @@ static int init_nandsim(struct mtd_info *mtd)
> > goto error;
> > }
> > ns->partitions[i].name   = get_partition_name(i);
> > +   if (!ns->partitions[i].name) {
> > +   NS_ERR("unable to allocate memory.\n");
> 
> Probably don't really need the allocation failure messages. But this
> matches the current style, so we can just rip the messages out at
> another time.

Maybe that other time can use the more typical
pr_ mechanisms instead of NS_ too.

As far as I can tell, the only thing that the
NS_ macros do is prefix "error: " and
"warning: " to the output.

"[nandsim] " could be added via pr_fmt and it
could be changed to "nandsim: " for commonality
with the majority of the kernel logging.
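
A minimal userspace sketch of the pr_fmt idea, assuming a GCC or Clang toolchain. In the kernel, a source file defines pr_fmt() ahead of its includes and every pr_err()/pr_warn() call in that file picks up the prefix. The pr_err_to_buf() helper below is hypothetical (it is not a kernel API); it formats into a buffer instead of the kernel log so the result can be inspected.

```c
#include <stdio.h>

/*
 * In a kernel source file this define would precede the #include lines,
 * so that pr_err(), pr_warn() and friends pick the prefix up.
 */
#define pr_fmt(fmt) "nandsim: " fmt

/*
 * Hypothetical stand-in for pr_err(): formats into a caller-supplied
 * buffer rather than the kernel log. ##__VA_ARGS__ is a GNU extension,
 * the same one the kernel's own pr_*() macros rely on.
 */
#define pr_err_to_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

Every message formatted through the helper starts with "nandsim: ", so the individual call sites no longer need to spell out a prefix of their own.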




[RESUBMIT Patch 1/1] net: replace if()/BUG with BUG_ON

2015-06-16 Thread Maninder Singh
Use BUG_ON(condition) instead of if (condition) BUG().

Signed-off-by: Maninder Singh 
Reviewed-by: Akhilesh Kumar 
---
 net/packet/af_packet.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5989c6..c91d405 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -547,8 +547,7 @@ static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring)
 {
struct tpacket_kbdq_core *pkc;
 
-   if (tx_ring)
-   BUG();
+   BUG_ON(tx_ring);
 
pkc = tx_ring ? GET_PBDQC_FROM_RB(&po->tx_ring) :
GET_PBDQC_FROM_RB(&po->rx_ring);
-- 
1.7.1
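
For readers unfamiliar with the idiom, here is a userspace sketch of why the two forms behave the same. The BUG() and BUG_ON() definitions below are stand-ins written for this example (the mock BUG() only counts hits instead of halting); the generic kernel BUG_ON() has the same if-then-BUG() shape, with an unlikely() hint added.

```c
/* Counts how many times the mock BUG() fired; the real BUG() never returns. */
static int bug_hits;

#define BUG()             (bug_hits++)
#define BUG_ON(condition) do { if (condition) BUG(); } while (0)

/* Old form from the patch: explicit test followed by BUG(). */
static void check_old(int tx_ring)
{
	if (tx_ring)
		BUG();
}

/* New form from the patch: the test is folded into BUG_ON(). */
static void check_new(int tx_ring)
{
	BUG_ON(tx_ring);
}
```

Both helpers leave bug_hits untouched for tx_ring == 0 and bump it once for any non-zero value, so the conversion is purely a readability change.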



Re: [RFC NEXT] mm: Fix suspicious RCU usage at kernel/sched/core.c:7318

2015-06-16 Thread Martin KaFai Lau
On Tue, Jun 16, 2015 at 09:43:28PM -0500, Larry Finger wrote:
> On 06/16/2015 04:07 PM, Martin KaFai Lau wrote:
> >On Mon, Jun 15, 2015 at 04:25:18PM -0500, Larry Finger wrote:
> >>Additional backtrace lines are truncated. In addition, the above splat is
> >>followed by several "BUG: sleeping function called from invalid context
> >>at mm/slub.c:1268" outputs. As suggested by Martin KaFai Lau, these are the
> >>clue to the fix. Routine kmemleak_alloc_percpu() always uses GFP_KERNEL
> >>for its allocations, whereas it should use the value input to pcpu_alloc().
> >Just a minor nit, 'kmemleak_alloc_percpu() should follow the gfp from
> >per_alloc()' may be a more accurate title to describe the patch.
> 
> Do you mean that the subject should be changed?
Yes

A typo in my last response: s/per_alloc/pcpu_alloc/

--Martin


Re: [PATCH] arm64: dts: symlink cros-ec-keyboard from arm to arm64

2015-06-16 Thread Daniel Kurtz
On Tue, Jun 16, 2015 at 10:55 PM, Will Deacon  wrote:
> On Tue, Jun 16, 2015 at 03:35:41PM +0100, Daniel Kurtz wrote:
>> The cros-ec-keyboard.dtsi snippet is useful for both arm and arm64 boards.
>> Create a link between the two.
>>
>> This may not be the most scalable solution, so consider it temporary until
>> we find a more central repository for such shared .dtsi snippets.
>
> I don't have strong opinions either way, but we should be consistent as
> to whether we use relative paths:
>
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-June/350267.html
>
> or symlinks, like below.
>
> Does anybody have some technical arguments one way or the other?

 /include/ "../../../../arm/boot/dts/cros-ec-keyboard.dtsi"

Directly embedding the relative path to .dtsi in /arm/ from .dts in
/arm64/ is more straightforward and does not require maintaining extra
symlink .dtsi files.

I actually like it better than the symlink that I proposed :-).
Consider my patch abandoned, unless someone else feels strongly.

-Dan

>
> Will
>
>> Signed-off-by: Daniel Kurtz 
>> ---
>>  arch/arm64/boot/dts/cros-ec-keyboard.dtsi | 1 +
>>  1 file changed, 1 insertion(+)
>>  create mode 120000 arch/arm64/boot/dts/cros-ec-keyboard.dtsi
>>
>> diff --git a/arch/arm64/boot/dts/cros-ec-keyboard.dtsi b/arch/arm64/boot/dts/cros-ec-keyboard.dtsi
>> new file mode 120000
>> index 0000000..42220ac
>> --- /dev/null
>> +++ b/arch/arm64/boot/dts/cros-ec-keyboard.dtsi
>> @@ -0,0 +1 @@
>> +../../../arm/boot/dts/cros-ec-keyboard.dtsi
>> \ No newline at end of file
>> --
>> 2.2.0.rc0.207.ga3a616c
>>


Re: Possible broken MM code in dell-laptop.c?

2015-06-16 Thread Darren Hart
On Tue, Jun 16, 2015 at 09:15:23AM +0200, Pali Rohár wrote:
> On Tuesday 16 June 2015 08:33:46 Michal Hocko wrote:
> > On Mon 15-06-15 23:27:59, Pali Rohár wrote:
> > > On Monday 15 June 2015 23:18:16 Michal Hocko wrote:
> > > > On Sun 14-06-15 11:05:07, Pali Rohár wrote:
> > > > > Hello,
> > > > > 
> > > > > in drivers/platform/x86/dell-laptop.c is this part of code:
> > > > > 
> > > > > static int __init dell_init(void)
> > > > > {
> > > > > ...
> > > > > 
> > > > >   /*
> > > > >   
> > > > >* Allocate buffer below 4GB for SMI data--only 32-bit physical
> > > > >addr * is passed to SMI handler.
> > > > >*/
> > > > >   
> > > > >   bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> > > > 
> > > > [...]
> > > > 
> > > > >   buffer = page_address(bufferpage);
> > > > 
> > > > [...]
> > > > 
> > > > > fail_rfkill:
> > > > >   free_page((unsigned long)bufferpage);
> > > > 
> > > > This one should be __free_page because it consumes struct page* and
> > > > it is the proper counter part for alloc_page. free_page, just to
> > > > make it confusing, consumes an address which has to be translated to
> > > > a struct page.
> > > > 
> > > > I have no idea why the API has been done this way and yeah, it is
> > > > really confusing.
> > > > 
> > > > [...]
> > > > 
> > > > > static void __exit dell_exit(void)
> > > > > {
> > > > > ...
> > > > > 
> > > > >   free_page((unsigned long)buffer);
> > > 
> > > So both, either:
> > > 
> > >  free_page((unsigned long)buffer);
> > > 
> > > or
> > > 
> > >  __free_page(bufferpage);
> > > 
> > > is correct?
> > 
> > Yes. Although I would use __free_page variant as both seem to be
> > globally visible.
> > 

Michal - thanks for the context.

I'm surprised by your recommendation to use __free_page() out here in platform
driver land.

I'd also prefer that the driver consistently free the same address to avoid
confusion.

For these reasons, free_page((unsigned long)buffer) seems like the better
option.

Can you elaborate on why you feel __free_page() is a better choice?

-- 
Darren Hart
Intel Open Source Technology Center
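
To make the two pairings discussed in this thread concrete, here is a userspace sketch. Every mock_* name is a stand-in invented for this example, not the kernel implementation: mock_alloc_page() plays the role of alloc_page(), mock__free_page() that of __free_page() (takes a struct page *), and mock_free_page() that of free_page() (takes an address).

```c
#include <stdlib.h>

#define MOCK_PAGE_SIZE 4096
#define MOCK_MAX_PAGES 8

/* Simplified stand-in for the kernel's struct page. */
struct page {
	void *vaddr;	/* address of the memory this page describes */
	int in_use;
};

static struct page mock_mem_map[MOCK_MAX_PAGES];

/* Like alloc_page(): hands back a struct page *, not an address. */
static struct page *mock_alloc_page(void)
{
	for (int i = 0; i < MOCK_MAX_PAGES; i++) {
		struct page *page = &mock_mem_map[i];

		if (page->in_use)
			continue;
		page->vaddr = malloc(MOCK_PAGE_SIZE);
		if (!page->vaddr)
			return NULL;
		page->in_use = 1;
		return page;
	}
	return NULL;
}

/* Like page_address(): struct page * -> address. */
static void *mock_page_address(const struct page *page)
{
	return page->vaddr;
}

/* Like __free_page(): consumes the struct page * from mock_alloc_page(). */
static void mock__free_page(struct page *page)
{
	free(page->vaddr);
	page->vaddr = NULL;
	page->in_use = 0;
}

/* Like free_page(): consumes an address and translates it back first. */
static void mock_free_page(unsigned long addr)
{
	for (int i = 0; i < MOCK_MAX_PAGES; i++) {
		struct page *page = &mock_mem_map[i];

		if (page->in_use && (unsigned long)page->vaddr == addr) {
			mock__free_page(page);
			return;
		}
	}
	/* Unknown address: the mock ignores it; real code has no such safety net. */
}

/* Number of mock pages currently allocated. */
static int mock_pages_in_use(void)
{
	int n = 0;

	for (int i = 0; i < MOCK_MAX_PAGES; i++)
		n += mock_mem_map[i].in_use;
	return n;
}
```

Passing (unsigned long)bufferpage to mock_free_page(), which is the mix-up in the fail_rfkill path quoted above, releases nothing in this mock. Either mock_free_page((unsigned long)buffer) or mock__free_page(bufferpage) releases the page, mirroring the two pairings confirmed as correct in the thread.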


[PATCH] perf tools: Fix a problem when opening old perf.data with different byte order

2015-06-16 Thread Wang Nan
The following error occurs when using 'perf report' on x86_64 to
cross-analyze a perf.data file generated by an old perf on a big-endian
machine:

 # perf report
 *** Error in `/home/w00229757/perf': free(): invalid next size (fast): 
0x032c99f0 ***
 === Backtrace: =
 /lib64/libc.so.6(+0x6eeef)[0x7ff6ff7e2eef]
 /lib64/libc.so.6(+0x78cae)[0x7ff6ff7eccae]
 /lib64/libc.so.6(+0x79987)[0x7ff6ff7ed987]
 /path/to/perf[0x4ac734]
 /path/to/perf[0x4ac829]
 /path/to/perf(perf_header__process_sections+0x129)[0x4ad2c9]
 /path/to/perf(perf_session__read_header+0x2e1)[0x4ad9e1]
 /path/to/perf(perf_session__new+0x168)[0x4bd458]
 /path/to/perf(cmd_report+0xfa0)[0x43eb70]
 /path/to/perf[0x47adc3]
 /path/to/perf(main+0x5f6)[0x42fd06]
 /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff6ff795bd5]
 /path/to/perf[0x42fe35]
 === Memory map: 
 [SNIP]

The bug is in perf_event__attr_swap(). It swaps every field in
'struct perf_event_attr' without checking whether the field actually
exists in the data. In addition, read_event_desc() allocates memory for
attr according to the size read from perf.data. Therefore, if the
perf.data was collected by an old perf (one without aux_watermark, for
example), perf_event__attr_swap() writes past the end of the allocation
when it swaps attr->aux_watermark and corrupts malloc's metadata.

This patch introduces boundary checking in perf_event__attr_swap(). It
adds the macros bswap_field_64 and bswap_field_32 to
perf_event__attr_swap() so that only existing fields are swapped.

Signed-off-by: Wang Nan 
---
 tools/perf/util/session.c | 40 ++--
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 88d87bf..d51ba31 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -517,20 +517,32 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
 {
attr->type  = bswap_32(attr->type);
attr->size  = bswap_32(attr->size);
-   attr->config= bswap_64(attr->config);
-   attr->sample_period = bswap_64(attr->sample_period);
-   attr->sample_type   = bswap_64(attr->sample_type);
-   attr->read_format   = bswap_64(attr->read_format);
-   attr->wakeup_events = bswap_32(attr->wakeup_events);
-   attr->bp_type   = bswap_32(attr->bp_type);
-   attr->bp_addr   = bswap_64(attr->bp_addr);
-   attr->bp_len= bswap_64(attr->bp_len);
-   attr->branch_sample_type = bswap_64(attr->branch_sample_type);
-   attr->sample_regs_user   = bswap_64(attr->sample_regs_user);
-   attr->sample_stack_user  = bswap_32(attr->sample_stack_user);
-   attr->aux_watermark  = bswap_32(attr->aux_watermark);
-
-   swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
+#define bswap_safe(f) \
+   (attr->size > offsetof(struct perf_event_attr, f))
+#define bswap_field(f, sz) \
+do {   \
+   if (bswap_safe(f))  \
+   attr->f = bswap_##sz(attr->f);  \
+} while(0)
+#define bswap_field_32(f) bswap_field(f, 32)
+#define bswap_field_64(f) bswap_field(f, 64)
+
+   bswap_field_64(config);
+   bswap_field_64(sample_period);
+   bswap_field_64(sample_type);
+   bswap_field_64(read_format);
+   bswap_field_32(wakeup_events);
+   bswap_field_32(bp_type);
+   bswap_field_64(bp_addr);
+   bswap_field_64(bp_len);
+   bswap_field_64(branch_sample_type);
+   bswap_field_64(sample_regs_user);
+   bswap_field_32(sample_stack_user);
+   bswap_field_32(aux_watermark);
+
+   if (bswap_safe(read_format))
+   swap_bitfield((u8 *) (&attr->read_format + 1),
+ sizeof(u64));
 }
 
 static void perf_event__hdr_attr_swap(union perf_event *event,
-- 
1.8.3.4
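
The boundary check can be tried outside perf with a small userspace sketch. struct mock_attr is a hypothetical, cut-down stand-in for struct perf_event_attr, and swap32()/swap64() are local helpers rather than the bswap_32()/bswap_64() that perf uses. One deliberate difference from the patch: field_present() requires the whole field to fit inside attr->size, whereas bswap_safe() in the patch only checks that the field starts inside it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct perf_event_attr. */
struct mock_attr {
	uint32_t type;
	uint32_t size;		/* bytes of this struct present in the file */
	uint64_t config;
	uint32_t wakeup_events;
	uint32_t aux_watermark;	/* newer field, absent from old files */
};

static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

static uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) |
	       swap32((uint32_t)(v >> 32));
}

/* True when all of field f lies inside the first attr->size bytes. */
#define field_present(attr, f) \
	((attr)->size >= offsetof(struct mock_attr, f) + sizeof((attr)->f))

/* Swap field f only if the recorded size says it exists. */
#define swap_field(attr, f, bits)				\
do {								\
	if (field_present(attr, f))				\
		(attr)->f = swap##bits((attr)->f);		\
} while (0)

static void mock_attr_swap(struct mock_attr *attr)
{
	/* type and size are always present; size is host-order afterwards. */
	attr->type = swap32(attr->type);
	attr->size = swap32(attr->size);

	swap_field(attr, config, 64);
	swap_field(attr, wakeup_events, 32);
	swap_field(attr, aux_watermark, 32);
}
```

With a recorded size of 20 bytes (an "old" layout that ends before aux_watermark), mock_attr_swap() leaves aux_watermark untouched; in perf those bytes would lie beyond the allocation made by read_event_desc(). With a size of 24 the field is swapped like the others.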



[PATCH 1/3] cpu-hotadd-patch-liujiang

2015-06-16 Thread gongzg
From: songxiumiao 

---
 arch/x86/kernel/acpi/boot.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index dbe76a1..b8f93b3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -733,6 +733,10 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 
nid = acpi_get_node(handle);
if (nid != -1) {
+   if (try_online_node(nid)) {
+   pr_warn("failed to online node%d for CPU%d, use node%d instead.\n",nid, cpu, first_node(node_online_map));
+   nid = first_node(node_online_map);
+   }
set_apicid_to_node(physid, nid);
numa_set_node(cpu, nid);
}
-- 
1.7.1



Re: [PATCH V2 1/6] SLIMbus: Device management on SLIMbus

2015-06-16 Thread Joe Perches
On Tue, 2015-06-16 at 19:45 -0600, Sagar Dharia wrote:
> SLIMbus (Serial Low Power Interchip Media Bus) is a specification
> developed by MIPI (Mobile Industry Processor Interface) alliance.
> SLIMbus is a 2-wire implementation, which is used to communicate with
> peripheral components like audio-codec.
[]
> diff --git a/drivers/slimbus/slimbus.c b/drivers/slimbus/slimbus.c
[]
> +static bool slim_eaddr_equal(struct slim_eaddr *a, struct slim_eaddr *b)
> +{
> + return (a->manf_id == b->manf_id &&
> + a->prod_code == b->prod_code &&
> + a->dev_index == b->dev_index &&
> + a->instance == b->instance);
> +}
> +
> +static const struct slim_device_id *
> +slim_match(const struct slim_device_id *id, const struct slim_device *slim_dev)
> +{
> + while (id->manf_id != 0 || id->prod_code != 0) {
> + if (id->manf_id == slim_dev->e_addr.manf_id &&
> + id->prod_code == slim_dev->e_addr.prod_code &&
> + id->dev_index == slim_dev->e_addr.dev_index)
> + return id;
> + id++;
> + }
> + return NULL;
> +}
> +
> +static int slim_device_match(struct device *dev, struct device_driver *driver)
> +{
> + struct slim_device *slim_dev;
> + struct slim_driver *drv = to_slim_driver(driver);
> +
> + if (dev->type != _dev_type)
> + return 0;
> +
> + slim_dev = to_slim_device(dev);
> + if (drv->id_table)
> + return slim_match(drv->id_table, slim_dev) != NULL;
> + return 0;
> +}

This should probably be a bool function return.

Maybe this:

static bool slim_device_match(struct device *dev, struct device_driver *driver)
{
struct slim_driver *drv = to_slim_driver(driver);

if (dev->type != _dev_type || !drv->id_table)
return false;

return slim_match(drv->id_table, to_slim_device(dev));
}




linux-next: Tree for Jun 17

2015-06-16 Thread Michael Ellerman
Hi all,

Changes since 20150615:

The rdma tree gained a conflict against the nfs tree.

The rdma tree gained a conflict against the net-next tree.

The tip tree gained a conflict against the iommu tree.

The tip tree gained a conflict against the crypto tree.

The akpm-current tree gained a conflict against the vfs tree.

Non-merge commits (relative to Linus' tree): 11023
 9537 files changed, 1021004 insertions(+), 233731 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 219 trees (counting Linus' and 31 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.


cheers


$ git checkout master
$ git reset --hard stable
Merging origin/master (0f57d86787d8 Linux 4.1-rc8)
Merging fixes/master (b94d525e58dc Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging kbuild-current/rc-fixes (c517d838eb7d Linux 4.0-rc1)
Merging arc-current/for-curr (e4140819dadc ARC: signal handling robustify)
Merging arm-current/fixes (0bbe6b5a73c0 ARM: 8388/1: tcm: Don't crash when TCM 
banks are protected by TrustZone)
Merging m68k-current/for-linus (b24f670b7f5b m68k/mac: Fix out-of-bounds array 
index in OSS IRQ source initialization)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-merge-mpe/fixes (c65b99f04684 Linux 4.1-rc6)
Merging powerpc-merge/merge (c517d838eb7d Linux 4.0-rc1)
Merging sparc/master (c46a024ea5eb Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging net/master (ac0a72a3e6e8 net/mlx4_core: Disable Granular QoS per VF 
under IB/Eth VPI configuration)
Merging ipsec/master (31a418986a58 xen: netback: read hotplug script once at 
start of day.)
Merging sound-current/for-linus (145c0e914d2c ALSA: hda - Fix unused label 
skip_i915)
Merging pci-current/for-linus (552bc94ebeeb PCI: Preserve resource size during 
alignment reordering)
Merging wireless-drivers/master (38fe44e61a89 Merge tag 
'iwlwifi-for-kalle-2015-05-28' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging driver-core.current/driver-core-linus (d4a4f75cd8f2 Linux 4.1-rc7)
Merging tty.current/tty-linus (d4a4f75cd8f2 Linux 4.1-rc7)
Merging usb.current/usb-linus (d4a4f75cd8f2 Linux 4.1-rc7)
Merging usb-gadget-fixes/fixes (c94e289f195e usb: gadget: remove incorrect 
__init/__exit annotations)
Merging usb-serial-fixes/usb-linus (df72d588c54d USB: cp210x: add ID for HubZ 
dual ZigBee and Z-Wave dongle)
Merging staging.current/staging-linus (d4a4f75cd8f2 Linux 4.1-rc7)
Merging char-misc.current/char-misc-linus (e26081808eda Linux 4.1-rc4)
Merging input-current/for-linus (7f2ca8b55aef Input: synaptics - add min/max 
quirk for Lenovo S540)
Merging crypto-current/master (412c98c1bef6 crypto: caam - fix RNG buffer cache 
alignment)
Merging ide/master (d681f1166919 ide: remove deprecated use of pci api)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test 
for PPC_PSERIES)
Merging rr-fixes/fixes (f36963c9d3f6 cpumask_set_cpu_local_first => 
cpumask_local_spread, lament)
Merging vfio-fixes/for-linus (db7d4d7f4021 vfio: Fix runaway interruptible 
timeout)
Merging kselftest-fixes/fixes (ba155e2d21f6 Linux 4.1-rc5)
Merging backlight-fixes/for-backlight-fixes (68feaca0b13e backlight: pwm: 
Handle EPROBE_DEFER while requesting the PWM)
Merging drm-intel-fixes/for-linux-next-fixes (245ec9d85696 Revert "drm/i915: 
Don't skip request retirement if the active list is empty")
Merging asm-generic/master (643165c8bbc8 Merge tag 'uaccess_for_upstream' of 

Re: kexec_load(2) bypasses signature verification

2015-06-16 Thread Theodore Ts'o
On Tue, Jun 16, 2015 at 04:27:57PM -0400, Vivek Goyal wrote:
> 
> So looks like you are looking for a system/option where you just want to
> always make use of kexec_file_load() and disable kexec_load(). This sounds
> like you want a kernel where kexec_load() is compiled out and you want
> only kexec_file_load() in.

Either compiled out or disabled via some flag (similar to how signed
module verification can be required via a flag that can be set, but
not unset once it is set), yes.

> Right now one can't do that because kexec_file_load() depends on
> the CONFIG_KEXEC option.
> 
> I am wondering about making CONFIG_KEXEC_FILE_LOAD independent
> of CONFIG_KEXEC. That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
> only signed kernels can be kexeced on that system.

That would certainly also be a workable strategy.

> This should gel well with long term strategy of deprecating kexec_load()
> at some point of time when kexec_file_load() is ready to completely
> replace it.

Well, note that Debian and Ubuntu are still using kexec-tools 2.0.7
(even in their latest development/unstable releases), which doesn't
have support for kexec_file_load().  So we need to get Debian to
upgrade its kexec-tools as part of this.  I'll try to file a
nag-o-gram to the Debian BTS.

- Ted


linux-next: manual merge of the akpm-current tree with the vfs tree

2015-06-16 Thread Michael Ellerman
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  fs/ufs/namei.c

between commit:

  a50e4a02ad69 "ufs: don't bother with lock_ufs()/unlock_ufs() for directory 
access"

from the vfs tree and commit:

  2ce1efe3965b "fs/ufs: revert "ufs: fix deadlocks introduced by sb mutex merge""

from the akpm-current tree.

I fixed it up by taking the vfs version and can carry the fix as necessary (no
action is required).

cheers





linux-next: manual merge of the tip tree with the crypto tree

2015-06-16 Thread Michael Ellerman
Hi all,

Today's linux-next merge of the tip tree got conflicts in:

  drivers/crypto/vmx/aes.c
  drivers/crypto/vmx/aes_cbc.c
  drivers/crypto/vmx/ghash.c

between commit:

  4beb10604597 "crypto: vmx - Reindent to kernel style"

from the crypto tree and commit:

  5f76eea88dcb "sched/preempt, powerpc: Disable preemption in 
enable_kernel_altivec() explicitly"

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

cheers


diff --cc drivers/crypto/vmx/aes.c
index 023e5f014783,a9064e36e7b5..
--- a/drivers/crypto/vmx/aes.c
+++ b/drivers/crypto/vmx/aes.c
@@@ -76,47 -73,53 +76,53 @@@ static void p8_aes_exit(struct crypto_t
  }
  
  static int p8_aes_setkey(struct crypto_tfm *tfm, const u8 *key,
 -unsigned int keylen)
 +   unsigned int keylen)
  {
 -int ret;
 -struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 -
 -preempt_disable();
 -pagefault_disable();
 -enable_kernel_altivec();
 -ret = aes_p8_set_encrypt_key(key, keylen * 8, >enc_key);
 -ret += aes_p8_set_decrypt_key(key, keylen * 8, >dec_key);
 -pagefault_enable();
 -preempt_enable();
 -
 -ret += crypto_cipher_setkey(ctx->fallback, key, keylen);
 -return ret;
 +  int ret;
 +  struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 +
++  preempt_disable();
 +  pagefault_disable();
 +  enable_kernel_altivec();
 +  ret = aes_p8_set_encrypt_key(key, keylen * 8, >enc_key);
 +  ret += aes_p8_set_decrypt_key(key, keylen * 8, >dec_key);
 +  pagefault_enable();
++  preempt_enable();
 +
 +  ret += crypto_cipher_setkey(ctx->fallback, key, keylen);
 +  return ret;
  }
  
  static void p8_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
  {
 -struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 -
 -if (in_interrupt()) {
 -crypto_cipher_encrypt_one(ctx->fallback, dst, src);
 -} else {
 -  preempt_disable();
 -pagefault_disable();
 -enable_kernel_altivec();
 -aes_p8_encrypt(src, dst, >enc_key);
 -pagefault_enable();
 -  preempt_enable();
 -}
 +  struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 +
 +  if (in_interrupt()) {
 +  crypto_cipher_encrypt_one(ctx->fallback, dst, src);
 +  } else {
++  preempt_disable();
 +  pagefault_disable();
 +  enable_kernel_altivec();
 +  aes_p8_encrypt(src, dst, >enc_key);
 +  pagefault_enable();
++  preempt_enable();
 +  }
  }
  
  static void p8_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
  {
 -struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 -
 -if (in_interrupt()) {
 -crypto_cipher_decrypt_one(ctx->fallback, dst, src);
 -} else {
 -  preempt_disable();
 -pagefault_disable();
 -enable_kernel_altivec();
 -aes_p8_decrypt(src, dst, >dec_key);
 -pagefault_enable();
 -  preempt_enable();
 -}
 +  struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);
 +
 +  if (in_interrupt()) {
 +  crypto_cipher_decrypt_one(ctx->fallback, dst, src);
 +  } else {
++  preempt_disable();
 +  pagefault_disable();
 +  enable_kernel_altivec();
 +  aes_p8_decrypt(src, dst, >dec_key);
 +  pagefault_enable();
++  preempt_enable();
 +  }
  }
  
  struct crypto_alg p8_aes_alg = {
diff --cc drivers/crypto/vmx/aes_cbc.c
index 7120ab24d8c6,477284abdd11..
--- a/drivers/crypto/vmx/aes_cbc.c
+++ b/drivers/crypto/vmx/aes_cbc.c
@@@ -77,95 -74,95 +77,101 @@@ static void p8_aes_cbc_exit(struct cryp
  }
  
  static int p8_aes_cbc_setkey(struct crypto_tfm *tfm, const u8 *key,
 -unsigned int keylen)
 +   unsigned int keylen)
  {
 -int ret;
 -struct p8_aes_cbc_ctx *ctx = crypto_tfm_ctx(tfm);
 -
 -preempt_disable();
 -pagefault_disable();
 -enable_kernel_altivec();
 -ret = aes_p8_set_encrypt_key(key, keylen * 8, >enc_key);
 -ret += aes_p8_set_decrypt_key(key, keylen * 8, >dec_key);
 -pagefault_enable();
 -preempt_enable();
 -
 -ret += crypto_blkcipher_setkey(ctx->fallback, key, keylen);
 -return ret;
 +  int ret;
 +  struct p8_aes_cbc_ctx *ctx = crypto_tfm_ctx(tfm);
 +
++  preempt_disable();
 +  pagefault_disable();
 +  enable_kernel_altivec();
 +  ret = aes_p8_set_encrypt_key(key, keylen * 8, >enc_key);
 +  ret += aes_p8_set_decrypt_key(key, keylen * 8, >dec_key);
 +  pagefault_enable();
++  preempt_enable();
 +
 +  ret += crypto_blkcipher_setkey(ctx->fallback, key, keylen);
 +  return ret;
  }
  
  static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
 -struct scatterlist *dst, struct scatterlist *src,
 -unsigned int nbytes)
 +struct scatterlist *dst,
 +struct scatterlist *src, unsigned int 

linux-next: manual merge of the rdma tree with the net-next tree

2015-06-16 Thread Michael Ellerman
Hi Doug,

Today's linux-next merge of the rdma tree got a conflict in:

  drivers/infiniband/hw/mlx4/mad.c

between commit:

  7193a141eb74 "IB/mlx4: Set VF to read from QP counters"

from the net-next tree and commit:

  4cd7c9479aff "IB/mad: Add support for additional MAD info to/from drivers"

from the rdma tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

cheers


diff --cc drivers/infiniband/hw/mlx4/mad.c
index bc09b4e1f57c,3e2dee46caa2..
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@@ -817,12 -827,14 +819,12 @@@ static void edit_counter(struct mlx4_co
  }
  
  static int iboe_process_mad(struct ib_device *ibdev, int mad_flags, u8 
port_num,
-   struct ib_wc *in_wc, struct ib_grh *in_grh,
-   struct ib_mad *in_mad, struct ib_mad *out_mad)
+   const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+   const struct ib_mad *in_mad, struct ib_mad *out_mad)
  {
 -  struct mlx4_cmd_mailbox *mailbox;
 +  struct mlx4_counter counter_stats;
struct mlx4_ib_dev *dev = to_mdev(ibdev);
int err;
 -  u32 inmod = dev->counters[port_num - 1] & 0x;
 -  u8 mode;
  
if (in_mad->mad_hdr.mgmt_class != IB_MGMT_CLASS_PERF_MGMT)
return -EINVAL;
@@@ -850,15 -868,21 +852,23 @@@
  }
  
  int mlx4_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
-   struct ib_wc *in_wc, struct ib_grh *in_grh,
-   struct ib_mad *in_mad, struct ib_mad *out_mad)
+   const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+   const struct ib_mad_hdr *in, size_t in_mad_size,
+   struct ib_mad_hdr *out, size_t *out_mad_size,
+   u16 *out_mad_pkey_index)
  {
 +  struct mlx4_ib_dev *dev = to_mdev(ibdev);
+   const struct ib_mad *in_mad = (const struct ib_mad *)in;
+   struct ib_mad *out_mad = (struct ib_mad *)out;
+ 
+   BUG_ON(in_mad_size != sizeof(*in_mad) ||
+  *out_mad_size != sizeof(*out_mad));
+ 
switch (rdma_port_get_link_layer(ibdev, port_num)) {
case IB_LINK_LAYER_INFINIBAND:
 -  return ib_process_mad(ibdev, mad_flags, port_num, in_wc,
 -in_grh, in_mad, out_mad);
 +  if (!mlx4_is_slave(dev->dev))
 +  return ib_process_mad(ibdev, mad_flags, port_num, in_wc,
 +in_grh, in_mad, out_mad);
case IB_LINK_LAYER_ETHERNET:
return iboe_process_mad(ibdev, mad_flags, port_num, in_wc,
  in_grh, in_mad, out_mad);





Re: [RFC 3/3] mm: make swapin readahead to improve thp collapse rate

2015-06-16 Thread Rik van Riel
On 06/16/2015 05:15 PM, Andrew Morton wrote:
> On Sun, 14 Jun 2015 18:04:43 +0300 Ebru Akagunduz  
> wrote:
> 
>> This patch makes swapin readahead to improve thp collapse rate.
>> When khugepaged scanned pages, there can be a few of the pages
>> in swap area.
>>
>> With the patch THP can collapse 4kB pages into a THP when
>> there are up to max_ptes_swap swap ptes in a 2MB range.
>>
>> The patch was tested with a test program that allocates
>> 800MB of memory, writes to it, and then sleeps. I force
>> the system to swap out all. Afterwards, the test program
>> touches the area by writing, it skips a page in each
>> 20 pages of the area.
>>
>> Without the patch, the system did no swapin readahead.
>> The THP rate was 47% of the program's memory, and it
>> did not change over time.
>>
>> With this patch, after 10 minutes of waiting khugepaged had
>> collapsed 99% of the program's memory.
>>
>> ...
>>
>> +/*
>> + * Bring missing pages in from swap, to complete THP collapse.
>> + * Only done if khugepaged_scan_pmd believes it is worthwhile.
>> + *
>> + * Called and returns without pte mapped or spinlocks held,
>> + * but with mmap_sem held to protect against vma changes.
>> + */
>> +
>> +static void __collapse_huge_page_swapin(struct mm_struct *mm,
>> +struct vm_area_struct *vma,
>> +unsigned long address, pmd_t *pmd,
>> +pte_t *pte)
>> +{
>> +unsigned long _address;
>> +pte_t pteval = *pte;
>> +int swap_pte = 0;
>> +
>> +pte = pte_offset_map(pmd, address);
>> +for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
>> + pte++, _address += PAGE_SIZE) {
>> +pteval = *pte;
>> +if (is_swap_pte(pteval)) {
>> +swap_pte++;
>> +do_swap_page(mm, vma, _address, pte, pmd, 0x0, pteval);
>> +/* pte is unmapped now, we need to map it */
>> +pte = pte_offset_map(pmd, _address);
>> +}
>> +}
>> +pte--;
>> +pte_unmap(pte);
>> +trace_mm_collapse_huge_page_swapin(mm, vma->vm_start, swap_pte);
>> +}
> 
> This is doing a series of synchronous reads.  That will be slow on
> spinning disks.
>
> This function should be significantly faster if it first gets all the
> necessary I/O underway.  I don't think we have a function which exactly
> does this.  Perhaps generalise swapin_readahead() or open-code
> something like

Looking at do_swap_page() and __lock_page_or_retry(), I guess
there already is a way to do the above.

Passing a "flags" of FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT
to do_swap_page() should result in do_swap_page() returning with
the pte unmapped and the mmap_sem still held if the page was not
immediately available to map into the pte (trylock_page fails).

Ebru, can you try passing the above as the flags argument to
do_swap_page(), and see what happens?
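
As a rough illustration of why this matters (a toy Python model, not
kernel code): if each swap-in is issued and waited for in turn, the
latencies add up, whereas starting every read first and waiting
afterwards costs roughly one device latency. The function names and
the latency figures are made up for the example, and the model assumes
the device can overlap all outstanding reads.

```python
def swapin_sync(latencies_ms):
    """Each read is issued and waited for before the next one starts."""
    return sum(latencies_ms)


def swapin_two_pass(latencies_ms):
    """Pass 1 starts every read without waiting (the RETRY_NOWAIT idea);
    pass 2 waits for them. Assumes all outstanding reads overlap."""
    clock = 0
    completion_times = []
    for latency in latencies_ms:        # pass 1: submit only, never block
        completion_times.append(clock + latency)
    for done_at in completion_times:    # pass 2: wait for each read in order
        clock = max(clock, done_at)
    return clock


if __name__ == "__main__":
    reads = [8, 8, 8, 8]                # hypothetical per-page latencies in ms
    print(swapin_sync(reads), swapin_two_pass(reads))
```

On a real disk the reads will not overlap perfectly, so the second
figure is a lower bound rather than a prediction.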

-- 
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the rdma tree with the nfs tree

2015-06-16 Thread Michael Ellerman
Hi Doug,

Today's linux-next merge of the rdma tree got a conflict in:

  net/sunrpc/xprtrdma/verbs.c

between commit:

  89e0d11258e9 "xprtrdma: Use ib_device pointer safely"

from the nfs tree and commit:

  8e37210b38fb "IB/core: Change ib_create_cq to use struct ib_cq_init_attr"

from the rdma tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

cheers


diff --cc net/sunrpc/xprtrdma/verbs.c
index 234083560d0e,52df265b472a..
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@@ -763,9 -692,9 +684,9 @@@ rpcrdma_ep_create(struct rpcrdma_ep *ep
init_waitqueue_head(&ep->rep_connect_wait);
INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
  
+   cq_attr.cqe = ep->rep_attr.cap.max_send_wr + 1;
 -  sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
 -rpcrdma_cq_async_error_upcall, ep, &cq_attr);
 +  sendcq = ib_create_cq(ia->ri_device, rpcrdma_sendcq_upcall,
- rpcrdma_cq_async_error_upcall, ep,
- ep->rep_attr.cap.max_send_wr + 1, 0);
++rpcrdma_cq_async_error_upcall, ep, &cq_attr);
if (IS_ERR(sendcq)) {
rc = PTR_ERR(sendcq);
dprintk("RPC:   %s: failed to create send CQ: %i\n",
@@@ -780,9 -709,9 +701,9 @@@
goto out2;
}
  
+   cq_attr.cqe = ep->rep_attr.cap.max_recv_wr + 1;
 -  recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
 -rpcrdma_cq_async_error_upcall, ep, &cq_attr);
 +  recvcq = ib_create_cq(ia->ri_device, rpcrdma_recvcq_upcall,
- rpcrdma_cq_async_error_upcall, ep,
- ep->rep_attr.cap.max_recv_wr + 1, 0);
++rpcrdma_cq_async_error_upcall, ep, &cq_attr);
if (IS_ERR(recvcq)) {
rc = PTR_ERR(recvcq);
dprintk("RPC:   %s: failed to create recv CQ: %i\n",





linux-next: manual merge of the rdma tree with the nfs tree

2015-06-16 Thread Michael Ellerman
Hi Doug,

Today's linux-next merge of the rdma tree got a conflict in:

  net/sunrpc/xprtrdma/frwr_ops.c

between commit:

  951e721ca0d6 "xprtrdma: Introduce an FRMR recovery workqueue"

from the nfs tree and commit:

  76357c715f32 "xprtrdma, svcrdma: Switch to generic logging helpers"

from the rdma tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

cheers

diff --cc net/sunrpc/xprtrdma/frwr_ops.c
index 661fbc1784ab,d234521320a4..
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@@ -252,8 -128,8 +252,8 @@@ frwr_sendcompletion(struct ib_wc *wc
  
/* WARNING: Only wr_id and status are reliable at this point */
r = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
-   pr_warn("RPC:   %s: frmr %p flushed, status %d\n",
-   __func__, r, wc->status);
 -  dprintk("RPC:   %s: frmr %p (stale), status %s (%d)\n",
++  pr_warn("RPC:   %s: frmr %p flushed, status %s (%d)\n",
+   __func__, r, ib_wc_status_msg(wc->status), wc->status);
r->r.frmr.fr_state = FRMR_IS_STALE;
  }
  







Re: [PATCH 3/3] writeback, blkio: add documentation for cgroup writeback support

2015-06-16 Thread Theodore Ts'o
On Tue, Jun 16, 2015 at 05:54:36PM -0400, Tejun Heo wrote:
> Hello, Ted.
> 
> On Mon, Jun 15, 2015 at 07:35:19PM -0400, Theodore Ts'o wrote:
> > So if there is some way we can signal to any cgroup that that might be
> > throttling writeback or disk I/O that the jbd/jbd2 process should be
> > considered privileged, that would be good since it would allow us to
> > avoid a potential priority inversion problem. 
> 
> I see.  In the long term, I think we might need to come up with a way
> to overcharge a slower cgroup to avoid blocking faster ones for cases
> where some IOs are depended upon by more than one cgroup.  That'd
> take quite a bit of work from blkcg side.  Will think more about it.

Hmm, while we're at it, there's another priority inversion that can be
painful.  If a directory block has been pushed out of memory (possibly
because it was initially accessed by a cgroup with a very tiny amount
of memory allocated to its cgroup) and a process with a cgroup tries
to do a lookup in that directory, it will issue the read with such a
tightly constrained disk time that it might take minutes for the read
to complete.  The problem is that the VFS has locked the directory's
i_mutex *before* calling ext4_lookup().

If a high priority process then tries to read the same directory, or
in fact any VFS operation which requires taking the directory's
i_mutex first, including renaming the directory, the high priority
process will end up blocking until the read is completed --- which can
be minutes if the low priority process has a tiny amount of disk time
allocated to it.

There is a related problem where if a read for a particular block is
issued with a very low amount of disk time, and that same
block is required by a high priority process, we can also get hit with
a very similar priority inversion problem.

To date the answer has always been, "Doctor, Doctor it hurts when I do
that"  The only way I can think of fixing the directory mutex
problem is by returning an error code to the VFS layer which instructs
it to unlock the directory, and then have it wait on some wait channel
so it ends up calling the lookup after the directory block has been
read into memory (and we can hope that due to a tight memory cgroup
the block doesn't end up getting ejected from memory right away).
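
To make the shape of that idea concrete, here is a toy Python sketch
(not kernel code, and not an existing VFS interface): the filesystem
lookup never waits for I/O while the lock is held; it returns a
hypothetical RETRY value instead, and the caller drops the lock, waits
for the read, and calls the lookup again. All names are invented for
the example.

```python
RETRY = object()  # hypothetical "unlock the directory and retry" return value


def fs_lookup(cache, name):
    """Runs with the directory lock held and never waits for I/O."""
    return cache.get(name, RETRY)


def vfs_lookup(cache, name, read_block, trace):
    """Retry loop: the slow read happens only while the lock is dropped."""
    while True:
        trace.append("lock")
        result = fs_lookup(cache, name)
        trace.append("unlock")
        if result is not RETRY:
            return result
        trace.append("wait-io")          # other lookups can take the lock here
        cache[name] = read_block(name)   # stands in for the directory block read


if __name__ == "__main__":
    events = []
    print(vfs_lookup({}, "foo", lambda n: "inode:" + n, events), events)
```

The sketch ignores the case mentioned above where the block is evicted
again before the retry; a real implementation would have to bound the
number of retries or pin the block.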

As another solution for another part of the problem, if a high
priority process attempts a read and the I/O is already queued up, but
it's at the back of the bus because it was originally posted by a low
priority cgroup, the rest of the fix would be to elevate the priority
of said I/O request and then resort the queue.
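
A minimal Python sketch of the "elevate and re-sort" step, assuming a
simple priority queue where a lower number means more urgent. The
class and method names are hypothetical; a real block-layer queue is
considerably more involved.

```python
import heapq


class IOQueue:
    """Toy I/O queue: resubmitting a queued block at a better priority boosts it."""

    def __init__(self):
        self._heap = []
        self._prio = {}   # block -> best priority currently queued
        self._seq = 0     # tie-breaker keeps FIFO order within a priority

    def submit(self, block, prio):
        current = self._prio.get(block)
        if current is not None and current <= prio:
            return        # already queued at this priority or better
        self._prio[block] = prio
        heapq.heappush(self._heap, (prio, self._seq, block))
        self._seq += 1

    def pop(self):
        while self._heap:
            prio, _, block = heapq.heappop(self._heap)
            if self._prio.get(block) == prio:   # skip stale, pre-boost entries
                del self._prio[block]
                return block
        return None


if __name__ == "__main__":
    q = IOQueue()
    q.submit("dirblock", 7)   # posted by a low priority cgroup
    q.submit("a", 3)
    q.submit("b", 4)
    q.submit("dirblock", 0)   # a high priority reader now wants the same block
    print([q.pop() for _ in range(4)])
```

The boost is done by pushing a second entry and ignoring the stale one
on pop, which avoids re-sorting the whole queue.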

As far as the filemap_fdatawait() call is concerned, if it's being
called by fsync() run by a low priority process, or from the writeback
thread, then it can certainly take place at a low priority.  But if the
filemap_fdatawait() is being done by a high priority process, such as
a jbd/jbd2 thread, then there needs to be a way that we can set a flag
in the wbc structure indicating that the writes should be submitted as
if they were issued from the kernel thread, and not based on who
originally dirtied the page.

It's going to be a number of point solutions, which is a bit ugly, but
I think that is much more likely to be successful than trying to
implement, say, a generalized priority inheritance scheme for block
I/O requests and related locks.   :-)

- Ted


Re: [PATCH RFC] storage:rbd: make the size of request is equal to the, size of the object

2015-06-16 Thread juncheng bai



On 2015/6/16 23:51, Ilya Dryomov wrote:
On Tue, Jun 16, 2015 at 5:14 PM, juncheng bai wrote:
On 2015/6/16 21:30, Ilya Dryomov wrote:
On Tue, Jun 16, 2015 at 2:57 PM, juncheng bai wrote:
On 2015/6/16 16:37, Ilya Dryomov wrote:
On Tue, Jun 16, 2015 at 6:28 AM, juncheng bai wrote:
On 2015/6/15 22:27, Ilya Dryomov wrote:
On Mon, Jun 15, 2015 at 4:23 PM, juncheng bai wrote:
On 2015/6/15 21:03, Ilya Dryomov wrote:
On Mon, Jun 15, 2015 at 2:18 PM, juncheng bai wrote:

 From 6213215bd19926d1063d4e01a248107dab8a899b Mon Sep 17 00:00:00 2001
From: juncheng bai 
Date: Mon, 15 Jun 2015 18:34:00 +0800
Subject: [PATCH] storage:rbd: make the size of request is equal to the
  size of the object

Ensure that the merged size of a request can reach the size of the
object. When merging a bio into a request, or a request into a request,
the sum of the segment count of the current request and the segment
count of the bio must not exceed the max segments of the request, so
the max size of a request is 512k if the max segments of the request is
BLK_MAX_SEGMENTS.

Signed-off-by: juncheng bai 
---
  drivers/block/rbd.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 0a54c58..dec6045 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3757,6 +3757,8 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
 segment_size = rbd_obj_bytes(&rbd_dev->header);
 blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
 blk_queue_max_segment_size(q, segment_size);
+   if (segment_size > BLK_MAX_SEGMENTS * PAGE_SIZE)
+   blk_queue_max_segments(q, segment_size / PAGE_SIZE);
 blk_queue_io_min(q, segment_size);
 blk_queue_io_opt(q, segment_size);






I made a similar patch on Friday, investigating blk-mq plugging issue
reported by Nick.  My patch sets it to BIO_MAX_PAGES unconditionally -
AFAIU there is no point in setting to anything bigger since the bios
will be clipped to that number of vecs.  Given that BIO_MAX_PAGES is
256, this gives us 1M direct I/Os.





Hi. For a single bio, the max number of bio_vec is BIO_MAX_PAGES, but a
request can be merged from multiple bios. See the functions
ll_back_merge_fn, ll_front_merge_fn, etc.
I tested with this patch on kernel 3.18 and did:
echo 4096 > /sys/block/rbd0/queue/max_sectors_kb
We used systemtap to trace the request size; it is up to 4M.





Kernel 3.18 is pre rbd blk-mq transition, which happened in 4.0.  You
should test whatever patches you have with at least 4.0.

Putting that aside, I must be missing something.  You'll get 4M
requests on 3.18 both with your patch and without it, the only
difference would be the size of bios being merged - 512k vs 1M.  Can
you describe your test workload and provide before and after traces?


Hi. I updated the kernel to 4.0.5. The test information is shown
below.
Base information:
03:28:13-root@server-186:~$uname -r
4.0.5

My simple systemtap script:
probe module("rbd").function("rbd_img_request_create")
{
   printf("offset:%lu length:%lu\n", ulong_arg(2), ulong_arg(3));
}

I use dd to execute the test case:
dd if=/dev/zero  of=/dev/rbd0 bs=4M count=1 oflag=direct

Case one: Without patch
03:30:23-root@server-186:~$cat /sys/block/rbd0/queue/max_sectors_kb
4096
03:30:35-root@server-186:~$cat /sys/block/rbd0/queue/max_segments
128

The output of systemtap for normal data:
offset:0 length:524288
offset:524288 length:524288
offset:1048576 length:524288
offset:1572864 length:524288
offset:2097152 length:524288
offset:2621440 length:524288
offset:3145728 length:524288
offset:3670016 length:524288

Case two: With patch
cat /sys/block/rbd0/queue/max_sectors_kb
4096
03:49:14-root@server-186:linux-4.0.5$cat /sys/block/rbd0/queue/max_segments
1024
The output of systemtap for normal data:
offset:0 length:1048576
offset:1048576 length:1048576
offset:2097152 length:1048576
offset:3145728 length:1048576
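
The two sets of lengths above can be reproduced with a little
arithmetic. The sketch below is only a model under stated assumptions:
each segment is one 4 KiB page, a single bio is clipped to
BIO_MAX_PAGES vecs (256, as mentioned earlier in the thread), and bios
are not merged into larger requests. The function name and the
bios_merged switch are invented for the example.

```python
PAGE_SIZE = 4096        # bytes, assuming 4 KiB pages
BIO_MAX_PAGES = 256     # value quoted earlier in the thread


def max_request_bytes(max_segments, max_sectors_kb, bios_merged=False):
    """Largest request the modelled queue limits allow, in bytes."""
    limit = min(max_segments * PAGE_SIZE, max_sectors_kb * 1024)
    if not bios_merged:
        # Without merging, a request is at most one bio.
        limit = min(limit, BIO_MAX_PAGES * PAGE_SIZE)
    return limit


if __name__ == "__main__":
    print(max_request_bytes(128, 4096))         # case one: max_segments = 128
    print(max_request_bytes(1024, 4096))        # case two: max_segments = 1024
    print(max_request_bytes(1024, 4096, True))  # if bios were merged
```

Under these assumptions case one is bounded by max_segments (512k) and
case two by the bio size (1M); a 4M request needs bios to be merged.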

According to the test, you are right.
This is because blk-mq doesn't use any scheduling policy:
03:52:13-root@server-186:linux-4.0.5$cat /sys/block/rbd0/queue/scheduler
none

In kernel versions before 4.0, rbd used the default scheduler: cfq.

So, I think blk-mq needs to do more?




There is no scheduler support in blk-mq as of now but your numbers
don't have anything to do with that.  The current behaviour is a result
of a bug in blk-mq.  It's fixed by [1], if you apply it you should see
4M requests with your stap script.

[1] http://article.gmane.org/gmane.linux.kernel/1941750


Hi.
First, let's look at the result on kernel version 3.18.
The function blk_limits_max_hw_sectors is implemented differently in
3.18 and 4.0+. We need to do:
echo 4094 >/sys/block/rbd0/queue/max_sectors_kb

The rbd device information:
11:13:18-root@server-186:~$cat /sys/block/rbd0/queue/max_sectors_kb
4094
11:15:28-root@server-186:~$cat 

Re: [PATCH RFC v7 2/3] crypto: rsa: add a new rsa generic implementation

2015-06-16 Thread Tadeusz Struk
On 06/16/2015 07:36 PM, Herbert Xu wrote:
> The existing crypto/asymmetric_key errno scheme doesn't really
> mesh in with the rest of crypto.  So you'll just have to pick one
> scheme and stick with it.
> 
> I don't really mind either way as long as the error codes are
> unique and meaningful.

So I would use -EINVAL since these are input parameters. Do you agree David?


Re: [RFC NEXT] mm: Fix suspicious RCU usage at kernel/sched/core.c:7318

2015-06-16 Thread Larry Finger

On 06/16/2015 04:07 PM, Martin KaFai Lau wrote:
> On Mon, Jun 15, 2015 at 04:25:18PM -0500, Larry Finger wrote:
>> Additional backtrace lines are truncated. In addition, the above splat is
>> followed by several "BUG: sleeping function called from invalid context
>> at mm/slub.c:1268" outputs. As suggested by Martin KaFai Lau, these are the
>> clue to the fix. Routine kmemleak_alloc_percpu() always uses GFP_KERNEL
>> for its allocations, whereas it should use the value input to pcpu_alloc().
>
> Just a minor nit, 'kmemleak_alloc_percpu() should follow the gfp from
> per_alloc()' may be a more accurate title to describe the patch.

Do you mean that the subject should be changed?

Larry




Re: [Intel-gfx] [PATCH 6/8] pwm: crc: Add Crystalcove (CRC) PWM driver

2015-06-16 Thread Shobhit Kumar
On Wed, May 20, 2015 at 8:39 PM, Shobhit Kumar  wrote:
> On Thu, May 7, 2015 at 12:49 PM, Shobhit Kumar  wrote:
>> On Wed, May 6, 2015 at 5:44 PM, Thierry Reding  
>> wrote:
>>> On Tue, May 05, 2015 at 03:08:36PM +0530, Shobhit Kumar wrote:
>>>> The Crystalcove PMIC controls PWM signals and this driver exports that
>>>
>>> You say signal_s_ here, but you only expose a single PWM device. Does
>>> the PMIC really control more than one? If it isn't, this should probably
>>> become: "controls a PWM output and this driver...".
>>
>> Actually it does support 3 of them but on the platform only one is
>> being used and I exported only that as of now. Probably I should
>> expand a little in the commit message indicating this. will re-post
>> after fixing based on your other comments.
>
> Updates pending due to personal leave. Can be expected next week.

Folks, really sorry, been busy with lot of unexpected and unavoidable
stuff. Working on getting the patches right. Expect them this week.

>>>
>>>> capability as a PWM chip driver. This is platform device implementtaion
>>>
>>> "implementation"
>>>
>>>> of the drivers/mfd cell device for CRC PMIC
>>>
>>> Sentences should end with a full stop.
>>>
>>>> diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
>>>> index b1541f4..954da3e 100644
>>>> --- a/drivers/pwm/Kconfig
>>>> +++ b/drivers/pwm/Kconfig
>>>> @@ -183,6 +183,13 @@ config PWM_LPC32XX
>>>> To compile this driver as a module, choose M here: the module
>>>> will be called pwm-lpc32xx.
>>>>
>>>> +config PWM_CRC
>>>> + bool "Intel Crystalcove (CRC) PWM support"
>>>> + depends on X86 && INTEL_SOC_PMIC
>>>> + help
>>>> +   Generic PWM framework driver for Crystalcove (CRC) PMIC based PWM
>>>> +   control.
>>>> +
>>>
>>> This is badly sorted. Please keep the list sorted alphabetically.
>>>
>>>>  config PWM_LPSS
>>>>   tristate "Intel LPSS PWM support"
>>>>   depends on X86
>>>> diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
>>>> index ec50eb5..3d38fed 100644
>>>> --- a/drivers/pwm/Makefile
>>>> +++ b/drivers/pwm/Makefile
>>>> @@ -35,3 +35,4 @@ obj-$(CONFIG_PWM_TIPWMSS)   += pwm-tipwmss.o
>>>>  obj-$(CONFIG_PWM_TWL)+= pwm-twl.o
>>>>  obj-$(CONFIG_PWM_TWL_LED)+= pwm-twl-led.o
>>>>  obj-$(CONFIG_PWM_VT8500) += pwm-vt8500.o
>>>> +obj-$(CONFIG_PWM_CRC)+= pwm-crc.o
>>>
>>> This too.
>>>
>>>> diff --git a/drivers/pwm/pwm-crc.c b/drivers/pwm/pwm-crc.c
>>>> new file mode 100644
>>>> index 000..987f3b4
>>>> --- /dev/null
>>>> +++ b/drivers/pwm/pwm-crc.c
>>>> @@ -0,0 +1,171 @@
>>>> +/*
>>>> + * pwm-crc.c - Intel Crystal Cove PWM Driver
>>>
>>> I think you can safely remove this line. You already know what file it
>>> is when you open it in your editor, and the description is in the
>>> MODULE_DESCRIPTION string already.
>>>
>>>> + *
>>>> + * Copyright (C) 2015 Intel Corporation. All rights reserved.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or
>>>> + * modify it under the terms of the GNU General Public License version
>>>> + * 2 as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * Author: Shobhit Kumar 
>>>> + */
>>>> +
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +
>>>> +#define PWM0_CLK_DIV 0x4B
>>>> +#define  PWM_OUTPUT_ENABLE   (1<<7)
>>>
>>> Should have spaces around <<.
>>>
>>>> +#define  PWM_DIV_CLK_0   0x00 /* DIVIDECLK = BASECLK */
>>>> +#define  PWM_DIV_CLK_100 0x63 /* DIVIDECLK = BASECLK/100 */
>>>> +#define  PWM_DIV_CLK_128 0x7F /* DIVIDECLK = BASECLK/128 */
>>>> +
>>>> +#define PWM0_DUTY_CYCLE  0x4E
>>>> +#define BACKLIGHT_EN 0x51
>>>> +
>>>> +#define PWM_MAX_LEVEL0xFF
>>>> +
>>>> +#define PWM_BASE_CLK 6000/* 6 MHz */
>>>
>>> This number is actually 6 KHz. I think it'd be better if you stuck with
>>> one unit here. Or perhaps there's some other reason why you can't use
>>> 600 here instead?
>>>
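
For what it is worth, the quoted constants are consistent with a 6 MHz
base clock: dividing 6 MHz by 128 gives 46.875 kHz, whose period is
about 21333 ns, matching the PWM_MAX_PERIOD_NS comment. The Python
snippet below just checks that arithmetic. It takes the divisor itself
(1, 100, 128), not the register encoding, and BASE_CLK_HZ is an
assumption read from the "6 MHz" comment rather than from the macro
value.

```python
BASE_CLK_HZ = 6_000_000   # assumption: the "6 MHz" in the comment is intended


def pwm_freq_hz(div):
    """Output frequency when the base clock is divided by `div`."""
    return BASE_CLK_HZ / div


def pwm_period_ns(div):
    """Period of one PWM cycle in whole nanoseconds (rounded down)."""
    return 1_000_000_000 * div // BASE_CLK_HZ


if __name__ == "__main__":
    print(pwm_freq_hz(128), pwm_period_ns(128))
```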
>>>> +#define PWM_MAX_PERIOD_NS21333 /* 46.875KHz */
>>>> +
>>>> +/**
>>>> + * struct crystalcove_pwm - Crystal Cove PWM controller
>>>> + * @chip: the abstract pwm_chip structure.
>>>> + * @regmap: the regmap from the parent device.
>>>> + */
>>>> +struct crystalcove_pwm {
>>>> + struct pwm_chip chip;
>>>> + struct platform_device *pdev;
>>>
>>> I think I had at some point requested that you get rid of this and use
>>> the chip.dev member instead. There's no kerneldoc for it and it isn't
>>> (well, almost, see below) used anywhere else, so perhaps you forgot to
>>> remove it here?
>>>
>>>> + struct regmap *regmap;
>>>> +};
>>>> +
>>>> +static inline struct 

Re: [PATCH RFC v7 2/3] crypto: rsa: add a new rsa generic implementation

2015-06-16 Thread Herbert Xu
On Tue, Jun 16, 2015 at 07:32:16PM -0700, Tadeusz Struk wrote:
>
> I thought that -EBADMSG was mainly used for authenticated ciphers in case 
> when verification of auth data fails.
> Since these are input params I thought that -EINVAL would be more appropriate.
> I can change it to -EBADMSG, no problem.
> Herbert, what do you think?

The existing crypto/asymmetric_key errno scheme doesn't really
mesh in with the rest of crypto.  So you'll just have to pick one
scheme and stick with it.

I don't really mind either way as long as the error codes are
unique and meaningful.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH RFC v7 2/3] crypto: rsa: add a new rsa generic implementation

2015-06-16 Thread Tadeusz Struk
Hi David,
On 06/16/2015 03:10 PM, David Howells wrote:
>> +static int _rsa_sign(const struct rsa_key *key, MPI s, MPI m)
>> +{
>> +  /* (1) Validate 0 <= m < n */
>> +  if (mpi_cmp_ui(m, 0) < 0 || mpi_cmp(m, key->n) >= 0)
>> +  return -EINVAL;
> Why -EINVAL not -EBADMSG?

I thought that -EBADMSG was mainly used for authenticated ciphers in case when 
verification of auth data fails.
Since these are input params I thought that -EINVAL would be more appropriate.
I can change it to -EBADMSG, no problem.
Herbert, what do you think?
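
For reference, the check under discussion amounts to the following,
shown here as a toy Python sketch with small textbook numbers. This is
not the kernel MPI API; the function name is invented, and the choice
of -EINVAL simply mirrors the patch as posted.

```python
import errno


def rsa_sign_raw(m, d, n):
    """Return m^d mod n, or a negative errno if m is outside 0 <= m < n."""
    if not 0 <= m < n:
        return -errno.EINVAL   # out-of-range input parameter
    return pow(m, d, n)


if __name__ == "__main__":
    n, e, d = 3233, 17, 2753   # toy key: n = 61 * 53
    s = rsa_sign_raw(65, d, n)
    print(s, pow(s, e, n))     # the second value recovers the message, 65
    print(rsa_sign_raw(n, d, n))
```

Because valid results are never negative, a negative errno cannot be
confused with a signature in this sketch.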


Re: [RFC PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2015-06-16 Thread David Ahern

On 6/16/15 7:24 PM, Hemant Kumar wrote:
> Because, this depends on the kernel tracepoint "kvm_hv:kvm_guest_exit".
> perf_prepare_sample() in the kernel side sets the event->header.misc
> field to PERF_RECORD_MISC_KERNEL through perf_misc_flags(pt_regs). In
> case of tracepoints which always get hit in the host kernel context, the
> perf_misc_flags() will always return PERF_RECORD_MISC_KERNEL.
>
> IMHO we will rather have to set the cpumode in the user space for this
> tracepoint and we can't depend on the event->header.misc field for this
> case.
>
> What would you suggest?

oh, right you are using a tracepoint for this. It does not have the 
hooks to specify cpumode. Never mind.



Re: [PATCH] Staging: wilc1000: Remove casting the values returned by kmalloc()

2015-06-16 Thread gre...@linuxfoundation.org
On Tue, Jun 16, 2015 at 07:10:28AM +, Abdul, Hussain (H.) wrote:
> From: Abdul Hussain 
> 
> This patch removes casting the values returned by memory allocation functions.
> 
> Signed-off-by: Abdul Hussain 
> ---
>  drivers/staging/wilc1000/linux_wlan.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Does not apply :(


Re: [PATCH] Staging: wilc1000: Remove casting the values returned by kmalloc()

2015-06-16 Thread gre...@linuxfoundation.org
On Tue, Jun 16, 2015 at 07:09:56AM +, Abdul, Hussain (H.) wrote:
> From: Abdul Hussain 
> 
> This patch removes casting the values returned by memory allocation functions.
> 
> Signed-off-by: Abdul Hussain 
> ---
>  drivers/staging/wilc1000/linux_mon.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Does not apply :(


Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-16 Thread Guoqing Jiang
Hi David,

David Teigland wrote:
> On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
>   
>> Would you consider taking the following cleanup? If yes, I will send a
>> formal patch; otherwise please ignore it.
>> 
>
> On first glance, the old and new code do not appear to do the same thing,
> so let's leave it as it is.
>
>   
>> -   to_nodeid = dlm_dir_nodeid(r);
>> 
Sorry, it seems that is the only difference. If the previous change is
combined with the modification below, the behavior is the same.

@@ -3644,7 +3644,10 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb *lkb, int mstype)
struct dlm_mhandle *mh;
int to_nodeid, error;
 
-   to_nodeid = r->res_nodeid;
+   if (mstype == DLM_MSG_LOOKUP)
+   to_nodeid = dlm_dir_nodeid(r);
+   else
+   to_nodeid = r->res_nodeid;

And for create_message, the second parameter (lkb) has no effect when
creating the three message types (REQUEST/LOOKUP/REMOVE).

Thanks,
Guoqing


MCE bug?

2015-06-16 Thread Rui Wang
Hi Boris & Tony,

While injecting MCEs using einj, I encountered a panic:

[0.305697] mce: CPU supports 22 MCE banks
[0.310288] BUG: unable to handle kernel NULL pointer dereference at 0100
[0.319057] IP: [] __queue_work+0x32/0x370
[0.325398] PGD 0
[0.327656] Oops:  [#1] SMP

...

[0.484045] Call Trace:
[0.486780]  [] queue_work_on+0x2b/0x50
[0.492821]  [] mce_schedule_work.part.16+0x29/0x30
[0.500020]  [] machine_check_poll+0x249/0x260
[0.506733]  [] __mcheck_cpu_init_generic+0x33/0x100
[0.514018]  [] mcheck_cpu_init+0x161/0x4b0
[0.520443]  [] identify_cpu+0x365/0x450
[0.526576]  [] identify_boot_cpu+0x10/0x7e
[0.532994]  [] check_bugs+0x9/0x2d
[0.538643]  [] start_kernel+0x469/0x495
[0.544771]  [] ? set_init_arg+0x55/0x55
[0.550900]  [] ? early_idt_handlers+0x120/0x120
[0.557805]  [] x86_64_start_reservations+0x2a/0x2c
[0.565001]  [] x86_64_start_kernel+0x13d/0x14c

It happened after the machine rebooted  (due to an injected fatal error). It 
tried to find leftover banks and then called mce_schedule_work() in 
machine_check_poll(), but it seemed too early and system_wq wasn't allocated 
yet, thus the NULL pointer.
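
To spell out the ordering problem, here is a toy Python model (not
kernel code, and not a proposed patch): work requested before the
queue exists is remembered and queued once the queue becomes
available, instead of being queued through a missing object. The class
and method names are invented; `wq` stands in for system_wq.

```python
class MceWork:
    """Toy model of deferring work until the workqueue has been allocated."""

    def __init__(self):
        self.wq = None          # stands in for system_wq before allocation
        self.pending = False
        self.runs = 0

    def _work(self):
        self.runs += 1

    def schedule(self):
        if self.wq is None:     # too early: remember the request instead
            self.pending = True
            return False
        self.wq.append(self._work)
        return True

    def workqueue_ready(self, wq):
        """Called once the queue exists; flushes any remembered request."""
        self.wq = wq
        if self.pending:
            self.pending = False
            self.schedule()


if __name__ == "__main__":
    w = MceWork()
    print(w.schedule())         # requested during early boot
    queue = []
    w.workqueue_ready(queue)
    print(len(queue))           # the deferred work is now queued
```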

Is it a known problem? I'm based on Linux 4.1.0-rc3-7.

Thanks
Rui




Re: [PATCH 1/6] mtd: r852: Fix device_create_file() usage

2015-06-16 Thread Brian Norris
On Mon, Jun 01, 2015 at 11:10:49PM +0200, Richard Weinberger wrote:
> device_create_file() can fail, therefore we have to
> handle this case and abort.
> 
> Signed-off-by: Richard Weinberger 

Pushed the first 5. Still looking at the 6th.


Re: [PATCH 2/6] mtd: nandsim: Fix kasprintf() usage

2015-06-16 Thread Brian Norris
On Mon, Jun 01, 2015 at 11:10:50PM +0200, Richard Weinberger wrote:
> kasprintf() used in get_partition_name() does a dynamic
> memory allocation and can fail. We have to handle that case.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  drivers/mtd/nand/nandsim.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
> index f232427..52c0c1a 100644
> --- a/drivers/mtd/nand/nandsim.c
> +++ b/drivers/mtd/nand/nandsim.c
> @@ -743,6 +743,11 @@ static int init_nandsim(struct mtd_info *mtd)
>   goto error;
>   }
>   ns->partitions[i].name   = get_partition_name(i);
> + if (!ns->partitions[i].name) {
> + NS_ERR("unable to allocate memory.\n");

Probably don't really need the allocation failure messages. But this
matches the current style, so we can just rip the messages out at
another time.

> + ret = -ENOMEM;
> + goto error;
> + }
>   ns->partitions[i].offset = next_offset;
>   ns->partitions[i].size   = part_sz;
>   next_offset += ns->partitions[i].size;
> @@ -756,6 +761,11 @@ static int init_nandsim(struct mtd_info *mtd)
>   goto error;
>   }
>   ns->partitions[i].name   = get_partition_name(i);
> + if (!ns->partitions[i].name) {
> + NS_ERR("unable to allocate memory.\n");

Same here.

> + ret = -ENOMEM;
> + goto error;
> + }
>   ns->partitions[i].offset = next_offset;
>   ns->partitions[i].size   = remains;
>   ns->nbparts += 1;

Brian


[PATCH V2 2/6] of/slimbus: OF helper for SLIMbus

2015-06-16 Thread Sagar Dharia
OF helper routine scans the SLIMbus DeviceTree, allocates resources,
and creates slim_devices according to the hierarchy.

Signed-off-by: Sagar Dharia 
---
 Documentation/devicetree/bindings/slimbus/bus.txt | 34 ++
 drivers/slimbus/slimbus.c | 76 +++
 2 files changed, 110 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/slimbus/bus.txt

diff --git a/Documentation/devicetree/bindings/slimbus/bus.txt 
b/Documentation/devicetree/bindings/slimbus/bus.txt
new file mode 100644
index 000..a7a36aa
--- /dev/null
+++ b/Documentation/devicetree/bindings/slimbus/bus.txt
@@ -0,0 +1,34 @@
+SLIM(Serial Low Power Interchip Media Bus) bus
+
+SLIMbus is a 2-wire bus, and is used to communicate with peripheral
+components like audio-codec.
+
+Required property for SLIMbus controller node:
+- compatible   - name of SLIMbus controller.
+
+No other properties are required in the SLIMbus controller bus node.
+
+Child nodes:
+
+Every SLIMbus controller node can contain zero or more child nodes
+representing slave devices on the bus. Every SLIMbus slave device is
+uniquely determined by the 6 byte enumeration address.
+
+Required property for SLIMbus child node:
+enumeration-addr   - 6 byte enumeration address of the slave
+
+SLIMbus example for Qualcomm's slimbus manager component:
+
+   slim@2808 {
+   compatible = "qcom,slim-msm";
+   reg = <0x2808 0x2000>;
+   reg-names = "slimbus_physical";
+   interrupts = <0 33 0>;
+   interrupt-names = "slimbus_irq";
+   clocks = < SLIMBUS_SRC>, < AUDIO_SLIMBUS_CLK>;
+   clock-names = "iface_clk", "core_clk";
+
+   slim_codec_slave {
+   enumeration-addr = [00 01 60 00 17 02];
+   };
+   };
diff --git a/drivers/slimbus/slimbus.c b/drivers/slimbus/slimbus.c
index 2baf43a..0295a06 100644
--- a/drivers/slimbus/slimbus.c
+++ b/drivers/slimbus/slimbus.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_MUTEX(slim_lock);
 static DEFINE_IDR(ctrl_idr);
@@ -270,6 +271,80 @@ static LIST_HEAD(board_list);
 static LIST_HEAD(slim_ctrl_list);
 static DEFINE_MUTEX(board_lock);
 
+#if IS_ENABLED(CONFIG_OF)
+/* OF helpers for SLIMbus */
+static void of_register_slim_devices(struct slim_controller *ctrl)
+{
+   struct device_node *node;
+   struct slim_boardinfo *temp, *binfo = NULL;
+   int ret, n = 0;
+
+   if (!ctrl->dev.of_node)
+   return;
+
+   for_each_child_of_node(ctrl->dev.of_node, node) {
+   struct property *prop;
+   u8 *ea;
+   struct slim_device *slim;
+   char *name;
+
+   prop = of_find_property(node, "enumeration-addr", NULL);
+   if (!prop || prop->length != 6) {
+   dev_err(&ctrl->dev, "of_slim: invalid E-addr\n");
+   continue;
+   }
+   ea = (u8 *)prop->value;
+   name = kcalloc(SLIMBUS_NAME_SIZE, sizeof(char), GFP_KERNEL);
+   if (!name)
+   goto of_slim_err;
+
+   ret = of_modalias_node(node, name, SLIMBUS_NAME_SIZE);
+   if (ret < 0) {
+   dev_err(&ctrl->dev, "of_slim: modalias fail:%d on %s\n",
+   ret, node->full_name);
+   kfree(name);
+   continue;
+   }
+   slim = kzalloc(sizeof(struct slim_device), GFP_KERNEL);
+   if (!slim) {
+   kfree(name);
+   goto of_slim_err;
+   }
+   slim->e_addr.manf_id = (u16)(ea[5] << 8) | ea[4];
+   slim->e_addr.prod_code = (u16)(ea[3] << 8) | ea[2];
+   slim->e_addr.dev_index = ea[1];
+   slim->e_addr.instance = ea[0];
+
+
+   temp = krealloc(binfo, (n + 1) * sizeof(struct slim_boardinfo),
+   GFP_KERNEL);
+   if (!temp) {
+   kfree(name);
+   kfree(slim);
+   goto of_slim_err;
+   }
+   binfo = temp;
+   slim->dev.of_node = of_node_get(node);
+   slim->name = name;
+   binfo[n].bus_num = ctrl->nr;
+   binfo[n].slim_slave = slim;
+   n++;
+   }
+   slim_register_board_info(binfo, n);
+   return;
+
+of_slim_err:
+   /* Unwind: free the entries added so far, last to first */
+   while (--n >= 0) {
+   kfree(binfo[n].slim_slave->name);
+   kfree(binfo[n].slim_slave);
+   }
+   kfree(binfo);
+}
+#else
+static void of_register_slim_devices(struct slim_controller *ctrl) { }
+#endif
+
 /* If controller is not present, only add to boards list */
 static void slim_match_ctrl_to_boardinfo(struct slim_controller *ctrl,
 struct 

Re: kexec_load(2) bypasses signature verification

2015-06-16 Thread Vivek Goyal
On Tue, Jun 16, 2015 at 08:32:37PM -0500, Eric W. Biederman wrote:
> Vivek Goyal  writes:
> 
> > On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:
> >> 
> >> Adding Vivek as he is the one who implemented kexec_file_load.
> >> I was hoping he would respond to this thread, and it looks like he
> >> simply has not ever been Cc'd.
> >> 
> >> Theodore Ts'o  writes:
> >> 
> >> > On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
> >> >> The bits that actually read Secure Boot state out of the UEFI
> >> >> variables, and apply protections to the machine to avoid compromise
> >> >> under the SB threat model.  Things like disabling the old kexec...
> >> >
> >> > I don't have any real interest in using Secure Boot, but I *am*
> >> > interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
> >> > have something similar to what we have with signed modules in terms of
> >> > CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
> >> > KEXEC_VERIFY_SIG.  This would mean creating a separate flag
> >> > independent of the one Linus suggested for Secure Boot, but since we
> >> > have one for signed modules, we do have precedent for this sort of
> >> > thing.
> >> 
> >> My overall request with respect to kexec has been that we implement
> >> things that make sense outside of the bizarre threat model of the Linux
> >> folks who were talking about secure boot.
> >> 
> >> I have not navigated the labyrinth of config options but having a way to
> >> only boot signed things with kexec seems a completely sensible way to
> >> operate in the context of signed images.
> >> 
> >> I don't know how much that will help given that actors with sufficient
> >> resources have demonstrated the ability to steal private keys, but
> >> assuming binary signing is an effective technique (or why else do it)
> >> then having an option to limit kexec to only loading signed images seems
> >> sensible.
> >
> > I went through the mail chain on web and here are my thoughts.
> >
> > - So yes, upstream does not have the logic which automatically disables
> >   the old syscall (kexec_load()) on secureboot systems. Distributions
> >   carry those patches.
> >
> > - This KEXEC_VERIFY_SIG option only controls the behavior for
> >   kexec_file_load() syscall and is not meant to directly affect any
> >   behavior of old syscall (kexec_load()). I think I should have named
> >   it KEXEC_FILE_VERIFY_SIG. Though help text makes it clear.
> >   "Verify kernel signature during kexec_file_load() syscall".
> >
> > - I think disabling old system call if KEXEC_VERIFY_SIG() is set
> >   will break existing setup which use old system call by default, except
> >   the case of secureboot system. And old syscall path is well tested
> >   and new syscall might not be in a position to support all the corner
> >   cases, at least as of now.
> >
> > Ted, 
> >
> > So looks like you are looking for a system/option where you just want to
> > always make use of kexec_file_load() and disable kexec_load(). This sounds
> > like you want a kernel where kexec_load() is compiled out and you want
> > only kexec_file_load() in.
> >
> > Right now one can't do that because kexec_file_load() depends on
> > CONFIG_KEXEC option.
> >
> > I am wondering that how about making CONFIG_KEXEC_FILE_LOAD independent
> > of CONFIG_KEXEC. That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
> > only signed kernel can be kexeced on that system.
> >
> > This should gel well with long term strategy of deprecating kexec_load()
> > at some point of time when kexec_file_load() is ready to completely
> > replace it.
> 
> Interesting.
> 
> I suspect that what we want is to have CONFIG_KEXEC for the core
> and additional CONFIG_KEXEC_LOAD option that covers that kexec_load call.
> 
> That should make it trivially easy to disable the kexec_load system call
> in cases where people care.

Or, we could create another option CONFIG_KEXEC_CORE/CONFIG_KEXEC_COMMON
which will be automatically selected when either CONFIG_KEXEC or
CONFIG_KEXEC_FILE are selected.

All common code can go under this option and rest can go under respective
config options.

That way, those who have CONFIG_KEXEC=y in old config files will not be
broken. They don't have to learn about new options at all.
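
A rough illustration of that layering, as a small user-space C sketch rather than kernel code. CONFIG_KEXEC_CORE is the option name proposed above; the preprocessor derivation only stands in for what a Kconfig "select" would do, and kexec_built_parts() is a hypothetical helper invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Old-style config file: the user has only CONFIG_KEXEC set. */
#define CONFIG_KEXEC 1
/* #define CONFIG_KEXEC_FILE 1 */

/*
 * Assumption: stand-in for a Kconfig "select". The common core is
 * built whenever at least one of the two syscalls is enabled.
 */
#if defined(CONFIG_KEXEC) || defined(CONFIG_KEXEC_FILE)
#define CONFIG_KEXEC_CORE 1
#endif

/* Hypothetical helper: lists which pieces this configuration builds. */
static const char *kexec_built_parts(void)
{
	static char buf[64];

	buf[0] = '\0';
#ifdef CONFIG_KEXEC_CORE
	strcat(buf, "core ");            /* shared code */
#endif
#ifdef CONFIG_KEXEC
	strcat(buf, "kexec_load ");      /* old syscall */
#endif
#ifdef CONFIG_KEXEC_FILE
	strcat(buf, "kexec_file_load "); /* new syscall */
#endif
	assert(strlen(buf) < sizeof(buf));
	return buf;
}
```

With only CONFIG_KEXEC defined, the core and the old syscall are built, so an existing config keeps working. Defining only CONFIG_KEXEC_FILE instead would build the core plus kexec_file_load() and leave kexec_load() out.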

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 5/6] slimbus: Add support for 'clock-pause' feature

2015-06-16 Thread Sagar Dharia
Per slimbus specification, a reconfiguration sequence known as
'clock pause' needs to be broadcast over the bus while entering low-
power mode. Clock-pause is initiated by the controller driver.
To exit clock-pause, controller typically wakes up the framer device.
Since the wakeup procedure is controller-specific, the framework invokes
it through the controller's function pointer.
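
The flow described above can be sketched in user-space C as follows. This is only a simplified model under stated assumptions: all demo_* names, the pending_txns counter and the send hook are hypothetical, locking and the completion wait are left out, and the state chosen after the sequence finishes (paused on success, active on failure) is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_clk_state { DEMO_CLK_ACTIVE, DEMO_CLK_ENTERING_PAUSE, DEMO_CLK_PAUSED };
enum { DEMO_BEGIN_RECONFIG, DEMO_NEXT_PAUSE_CLOCK, DEMO_RECONFIGURE_NOW };

struct demo_ctrl {
	enum demo_clk_state state;
	int pending_txns;                        /* replies still outstanding */
	int (*wakeup)(struct demo_ctrl *ctrl);   /* controller-specific hook */
	int (*send)(struct demo_ctrl *ctrl, int msg); /* broadcast one message */
};

/* Enter (wakeup == false) or exit (wakeup == true) clock pause. */
static int demo_clk_pause(struct demo_ctrl *ctrl, bool wakeup)
{
	static const int seq[] = {
		DEMO_BEGIN_RECONFIG, DEMO_NEXT_PAUSE_CLOCK, DEMO_RECONFIGURE_NOW,
	};
	int ret = 0;
	size_t i;

	assert(ctrl != NULL);

	if (wakeup) {
		if (ctrl->state == DEMO_CLK_ACTIVE)
			return 0;                /* nothing to do */
		if (ctrl->state == DEMO_CLK_PAUSED && ctrl->wakeup)
			ret = ctrl->wakeup(ctrl); /* framework calls the hook */
		if (!ret)
			ctrl->state = DEMO_CLK_ACTIVE;
		return ret;
	}

	if (ctrl->state == DEMO_CLK_PAUSED)
		return 0;                        /* already paused */
	if (ctrl->pending_txns > 0)
		return -EBUSY;                   /* a reply is still expected */

	ctrl->state = DEMO_CLK_ENTERING_PAUSE;
	for (i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
		ret = ctrl->send(ctrl, seq[i]);  /* 3-message reconfiguration */
		if (ret)
			break;
	}
	/* Assumption: paused on success, back to active on failure. */
	ctrl->state = ret ? DEMO_CLK_ACTIVE : DEMO_CLK_PAUSED;
	return ret;
}

/* Trivial hooks so the sketch can be exercised. */
static int demo_sent;
static int demo_send(struct demo_ctrl *c, int msg)
{
	(void)c; (void)msg;
	demo_sent++;
	return 0;
}
static int demo_wakeup(struct demo_ctrl *c)
{
	(void)c;
	return 0;
}
```

A caller would first see -EBUSY while a transaction is pending, then a successful pause once the table is empty, and a later call with wakeup set brings the state back to active through the hook.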

Signed-off-by: Sagar Dharia 
---
 drivers/slimbus/slimbus.c | 109 ++
 include/linux/slimbus.h   |  65 +++
 2 files changed, 174 insertions(+)

diff --git a/drivers/slimbus/slimbus.c b/drivers/slimbus/slimbus.c
index d74dfec..3cfa414 100644
--- a/drivers/slimbus/slimbus.c
+++ b/drivers/slimbus/slimbus.c
@@ -436,7 +436,9 @@ static int slim_register_controller(struct slim_controller *ctrl)
ctrl->min_cg = SLIM_MIN_CLK_GEAR;
if (!ctrl->max_cg)
ctrl->max_cg = SLIM_MAX_CLK_GEAR;
+   init_completion(>sched.pause_comp);
mutex_init(>m_ctrl);
+   mutex_init(>sched.m_reconf);
ret = device_register(>dev);
if (ret)
goto out_list;
@@ -1059,6 +1061,113 @@ int slim_request_clear_inf_element(struct slim_device *sb,
 }
 EXPORT_SYMBOL(slim_request_clear_inf_element);
 
+/**
+ * slim_ctrl_clk_pause: Called by slimbus controller to enter/exit 'clock pause'
+ * Slimbus specification needs this sequence to turn-off clocks for the bus.
+ * The sequence involves sending 3 broadcast messages (reconfiguration
+ * sequence) to inform all devices on the bus.
+ * To exit clock-pause, controller typically wakes up active framer device.
+ * @ctrl: controller requesting bus to be paused or woken up
+ * @wakeup: Wakeup this controller from clock pause.
+ * @restart: Restart time value per spec used for clock pause. This value
+ * isn't used when controller is to be woken up.
+ * This API executes clock pause reconfiguration sequence if wakeup is false.
+ * If wakeup is true, controller's wakeup is called.
+ * For entering clock-pause, -EBUSY is returned if a message txn is pending.
+ */
+int slim_ctrl_clk_pause(struct slim_controller *ctrl, bool wakeup, u8 restart)
+{
+   int i, ret = 0;
+   unsigned long flags;
+   struct slim_sched *sched = >sched;
+   struct slim_val_inf msg = {0, 0, NULL, NULL, NULL, NULL};
+
+   DEFINE_SLIM_BCAST_TXN(txn, SLIM_MSG_MC_BEGIN_RECONFIGURATION,
+   3, SLIM_LA_MANAGER, );
+
+   if (wakeup == false && restart > SLIM_CLK_UNSPECIFIED)
+   return -EINVAL;
+   mutex_lock(>m_reconf);
+   if (wakeup) {
+   if (sched->clk_state == SLIM_CLK_ACTIVE) {
+   mutex_unlock(>m_reconf);
+   return 0;
+   }
+   /**
+* Fine-tune calculation based on clock gear,
+* message-bandwidth after bandwidth management
+*/
+   ret = wait_for_completion_timeout(>pause_comp,
+   msecs_to_jiffies(100));
+   if (!ret) {
+   mutex_unlock(>m_reconf);
+   pr_err("Previous clock pause did not finish");
+   return -ETIMEDOUT;
+   }
+   ret = 0;
+   /**
+* Slimbus framework will call controller wakeup
+* Controller should make sure that it sets active framer
+* out of clock pause
+*/
+   if (sched->clk_state == SLIM_CLK_PAUSED && ctrl->wakeup)
+   ret = ctrl->wakeup(ctrl);
+   if (!ret)
+   sched->clk_state = SLIM_CLK_ACTIVE;
+   mutex_unlock(>m_reconf);
+   return ret;
+   }
+
+   /* already paused */
+   if (ctrl->sched.clk_state == SLIM_CLK_PAUSED) {
+   mutex_unlock(>m_reconf);
+   return 0;
+   }
+
+   spin_lock_irqsave(>txn_lock, flags);
+   for (i = 0; i < ctrl->last_tid; i++) {
+   /* Pending response for a message */
+   if (ctrl->txnt[i]) {
+   spin_unlock_irqrestore(>txn_lock, flags);
+   mutex_unlock(>m_reconf);
+   return -EBUSY;
+   }
+   }
+   spin_unlock_irqrestore(>txn_lock, flags);
+
+   sched->clk_state = SLIM_CLK_ENTERING_PAUSE;
+
+   /* clock pause sequence */
+   ret = slim_processtxn(ctrl, );
+   if (ret)
+   goto clk_pause_ret;
+
+   txn.mc = SLIM_MSG_MC_NEXT_PAUSE_CLOCK;
+   txn.rl = 4;
+   msg.num_bytes = 1;
+   msg.wbuf = 
+   ret = slim_processtxn(ctrl, );
+   if (ret)
+   goto clk_pause_ret;
+
+   txn.mc = SLIM_MSG_MC_RECONFIGURE_NOW;
+   txn.rl = 3;
+   msg.num_bytes = 1;
+   msg.wbuf = NULL;
+   ret = slim_processtxn(ctrl, );
+
+clk_pause_ret:
+   if (ret) {
+   

[PATCH V2 4/6] slim: qcom: Add Qualcomm Slimbus controller driver

2015-06-16 Thread Sagar Dharia
This controller driver programs manager, interface, and framer
devices for Qualcomm's slimbus HW block.
Manager component currently implements logical address setting,
and messaging interface.
Interface device reports bus synchronization information, and framer
device clocks the bus from the time it's woken up, until clock-pause
is executed by the manager device.

Signed-off-by: Sagar Dharia 
---
 .../devicetree/bindings/slimbus/slim-qcom-ctrl.txt |  45 ++
 drivers/slimbus/Kconfig|  10 +
 drivers/slimbus/Makefile   |   2 +
 drivers/slimbus/slim-qcom-ctrl.c   | 723 +
 drivers/slimbus/slim-qcom.c|  96 +++
 drivers/slimbus/slim-qcom.h|  89 +++
 6 files changed, 965 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/slimbus/slim-qcom-ctrl.txt
 create mode 100644 drivers/slimbus/slim-qcom-ctrl.c
 create mode 100644 drivers/slimbus/slim-qcom.c
 create mode 100644 drivers/slimbus/slim-qcom.h

diff --git a/Documentation/devicetree/bindings/slimbus/slim-qcom-ctrl.txt b/Documentation/devicetree/bindings/slimbus/slim-qcom-ctrl.txt
new file mode 100644
index 000..25c9609
--- /dev/null
+++ b/Documentation/devicetree/bindings/slimbus/slim-qcom-ctrl.txt
@@ -0,0 +1,45 @@
+Qualcomm SLIMBUS controller
+"qcom,slim-msm": This controller is used if applications processor
+   driver is controlling slimbus master component. This driver is
+   responsible for communicating with slave HW directly using
+   messaging interface, and doing data channel management.
+
+Required properties:
+
+ - reg : Offset and length of the register region(s) for the device
+ - reg-names : Register region name(s) referenced in reg above
+Required register resource entries are:
+"slimbus_physical": Physical address of controller register blocks
+"slimbus_bam_physical": Physical address of Bus Access Module (BAM)
+for this controller
+ - compatible : should be "qcom,slim-msm" if this is master component driver
+ - cell-index : SLIMBUS number used for this controller
+ - interrupts : Interrupt numbers used by this controller
+ - interrupt-names : Required interrupt resource entries are:
+   "slimbus_irq" : Interrupt for SLIMBUS core
+ - clocks : Interface and core clocks used by this slimbus controller
+ - clock-names : Required clock-name entries are:
+   "iface_clk" : Interface clock for this controller
+   "core_clk" : Clock for controller core's BAM
+
+Optional property:
+ - reg entry for slew rate : If slew rate control register is provided, this
+   entry should be used.
+ - reg-name for slew rate: "slimbus_slew_reg"
+ - dmaengine, and pipes used to communicate between controller and memory if
+   sps-BAM HW is used
+
+Example:
+   slim@2808 {
+   compatible = "qcom,slim-msm";
+   reg = <0x2808 0x2000>;
+   reg-names = "slimbus_physical";
+   interrupts = <0 33 0>;
+   interrupt-names = "slimbus_irq";
+   clocks = < SLIMBUS_SRC>, < AUDIO_SLIMBUS_CLK>;
+   clock-names = "iface_clk", "core_clk";
+
+   slim_codec_slave {
+   enumeration-addr = [00 01 60 00 17 02];
+   };
+   };
diff --git a/drivers/slimbus/Kconfig b/drivers/slimbus/Kconfig
index fb30497..ce988c2 100644
--- a/drivers/slimbus/Kconfig
+++ b/drivers/slimbus/Kconfig
@@ -7,3 +7,13 @@ menuconfig SLIMBUS
  Slimbus is standard interface between baseband and audio codec,
  and other peripheral components in mobile terminals.
 
+if SLIMBUS
+config SLIM_QCOM_CTRL
+   tristate "Qualcomm Slimbus Manager Component"
+   depends on SLIMBUS
+   default n
+   help
+ Select driver if Qualcomm's Slimbus Manager Component is
+ programmed using Linux kernel.
+
+endif
diff --git a/drivers/slimbus/Makefile b/drivers/slimbus/Makefile
index 05f53bc..efc3db4 100644
--- a/drivers/slimbus/Makefile
+++ b/drivers/slimbus/Makefile
@@ -2,3 +2,5 @@
 # Makefile for kernel slimbus framework.
 #
 obj-$(CONFIG_SLIMBUS)  += slimbus.o
+
+obj-$(CONFIG_SLIM_QCOM_CTRL)   += slim-qcom.o slim-qcom-ctrl.o
diff --git a/drivers/slimbus/slim-qcom-ctrl.c b/drivers/slimbus/slim-qcom-ctrl.c
new file mode 100644
index 000..feae9d1
--- /dev/null
+++ b/drivers/slimbus/slim-qcom-ctrl.c
@@ -0,0 +1,723 @@
+/* Copyright (c) 2011-2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  

[PATCH V2 6/6] slim: qcom: Add runtime-pm support using clock-pause feature

2015-06-16 Thread Sagar Dharia
Slimbus HW mandates that clock-pause sequence has to be executed
before disabling relevant interface and core clocks.
Runtime-PM's autosuspend feature is used here to enter/exit low
power mode for Qualcomm's Slimbus controller. Autosuspend feature
enables driver to avoid changing power-modes too frequently since
entering clock-pause is an expensive sequence.

Signed-off-by: Sagar Dharia 
---
 drivers/slimbus/slim-qcom-ctrl.c | 133 ++-
 drivers/slimbus/slim-qcom.h  |   1 +
 2 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/drivers/slimbus/slim-qcom-ctrl.c b/drivers/slimbus/slim-qcom-ctrl.c
index feae9d1..ec8687d 100644
--- a/drivers/slimbus/slim-qcom-ctrl.c
+++ b/drivers/slimbus/slim-qcom-ctrl.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "slim-qcom.h"
 
 #define MSM_SLIM_NAME  "msm_slim_ctrl"
@@ -193,6 +194,7 @@ rx_ret_irq:
if (notify_rx)
complete(>rx_msgq_notify);
}
+   pm_runtime_mark_last_busy(dev->dev);
return IRQ_HANDLED;
 }
 
@@ -211,6 +213,28 @@ static void msm_slim_wait_retry(struct msm_slim_ctrl *dev)
msleep(msec_per_frm);
 }
 
+static int msm_clk_pause_wakeup(struct slim_controller *ctrl)
+{
+   struct msm_slim_ctrl *dev = slim_get_ctrldata(ctrl);
+
+   clk_prepare_enable(dev->hclk);
+   clk_prepare_enable(dev->rclk);
+   writel_relaxed(1, dev->base + FRM_WAKEUP);
+   /* Make sure framer wakeup write goes through before ISR fires */
+   mb();
+   /**
+* HW Workaround: Currently, slave is reporting lost-sync messages
+* after slimbus comes out of clock pause.
+* Transaction with slave fail before slave reports that message
+* Give some time for that report to come
+* Slimbus wakes up in clock gear 10 at 24.576MHz. With each superframe
+* being 250 usecs, we wait for 5-10 superframes here to ensure
+* we get the message
+*/
+   usleep_range(1250, 2500);
+   return 0;
+}
+
 static void msm_slim_cb(void *ctx, int err)
 {
if (err)
@@ -225,14 +249,29 @@ static int msm_xfer_msg(struct slim_controller *ctrl, struct slim_msg_txn *txn)
struct msm_slim_ctrl *dev = slim_get_ctrldata(ctrl);
u32 *pbuf;
u8 *puc;
-   int timeout;
+   int ret, timeout;
u8 la = txn->la;
+   enum slim_clk_state cur_clk_state = ctrl->sched.clk_state;
struct msm_wr_cb wr_cb = {msm_slim_cb, (void *)};
 
/* No support to send this dest-type of message */
if (txn->dt == SLIM_MSG_DEST_ENUMADDR)
return -EPROTONOSUPPORT;
 
+   if (cur_clk_state == SLIM_CLK_ENTERING_PAUSE) {
+   if (txn->mc != SLIM_MSG_MC_BEGIN_RECONFIGURATION &&
+   txn->mc != SLIM_MSG_MC_RECONFIGURE_NOW &&
+   txn->mc != SLIM_MSG_MC_NEXT_PAUSE_CLOCK)
+   return -EBUSY;
+   } else {
+   ret = pm_runtime_get_sync(dev->dev);
+   if (ret < 0) {
+   pm_runtime_set_suspended(dev->dev);
+   dev_err(dev->dev, "runtime-pm vote failed:%d\n", ret);
+   return ret;
+   }
+   }
+
/* HW expects length field to be excluded */
txn->rl--;
 
@@ -276,6 +315,8 @@ static int msm_xfer_msg(struct slim_controller *ctrl, struct slim_msg_txn *txn)
txn->mt);
 
mutex_unlock(>txn_lock);
+   if (cur_clk_state != SLIM_CLK_ENTERING_PAUSE)
+   pm_runtime_put(dev->dev);
return timeout ? 0 : -ETIMEDOUT;
 }
 
@@ -295,6 +336,13 @@ static int msm_set_laddr(struct slim_controller *ctrl,
ea[2] = (u8) (ead->prod_code & 0xFF);
ea[1] = ead->dev_index;
ea[0] = ead->instance;
+
+   ret = pm_runtime_get_sync(dev->dev);
+   if (ret < 0) {
+   pm_runtime_set_suspended(dev->dev);
+   dev_err(dev->dev, "runtime-pm vote failed:%d\n", ret);
+   return ret;
+   }
mutex_lock(>txn_lock);
/**
 * Retries are needed since bus may lose sync when multiple devices
@@ -305,6 +353,7 @@ retry_laddr:
buf = (u32 *)msm_slim_get_tx(dev, _cb);
if (buf == NULL) {
mutex_unlock(>txn_lock);
+   pm_runtime_put(dev->dev);
return -ENOMEM;
}
 
@@ -332,6 +381,7 @@ retry_laddr:
}
}
mutex_unlock(>txn_lock);
+   pm_runtime_put(dev->dev);
return ret;
 }
 
@@ -512,6 +562,7 @@ static int msm_slim_probe(struct platform_device *pdev)
}
dev->ctrl.set_laddr = msm_set_laddr;
dev->ctrl.xfer_msg = msm_xfer_msg;
+   dev->ctrl.wakeup =  msm_clk_pause_wakeup;
 
mutex_init(>txn_lock);
init_completion(>rx_msgq_notify);
@@ -615,6 +666,12 @@ static int msm_slim_probe(struct platform_device *pdev)
/* Add devices registered with 

[PATCH V2 3/6] slimbus: Add messaging APIs to slimbus framework

2015-06-16 Thread Sagar Dharia
Slimbus devices use value-element, and information elements to
control device parameters (e.g. value element is used to represent
gain for codec, information element is used to represent interrupt
status for codec when codec interrupt fires).
Messaging APIs are used to set/get these value and information
elements. Slimbus specification uses 8-bit "transaction IDs" for
messages where a read-value is anticipated. Framework uses a table
of pointers to store those TIDs and responds back to the caller in
O(1).
Caller can opt to do synchronous, or asynchronous reads/writes. For
asynchronous operations, the callback can be called from atomic
context.
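
The TID table idea can be sketched in user-space C as below. This is a simplified model, not the framework code: the demo_* names and the fixed 16-byte reply buffer are assumptions made for the example, and locking and completion callbacks are left out. Allocation scans for a free slot, while delivering a response is a single array lookup by TID.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_TID 256 /* 8-bit transaction IDs */

struct demo_msg {
	unsigned char rbuf[16]; /* reply is copied here */
	size_t rlen;
	int done;               /* set once the reply has arrived */
};

struct demo_table {
	struct demo_msg *slot[DEMO_MAX_TID]; /* indexed directly by TID */
};

/* Reserve a free TID for msg. Returns the TID, or -ENOMEM if all are busy. */
static int demo_tid_alloc(struct demo_table *t, struct demo_msg *msg)
{
	int i;

	assert(t != NULL && msg != NULL);
	for (i = 0; i < DEMO_MAX_TID; i++) {
		if (!t->slot[i]) {
			t->slot[i] = msg;
			return i;
		}
	}
	return -ENOMEM;
}

/* Deliver a reply: O(1) lookup by TID, then release the slot. */
static int demo_tid_response(struct demo_table *t, unsigned char tid,
			     const unsigned char *reply, size_t len)
{
	struct demo_msg *msg = t->slot[tid];

	if (!msg || len > sizeof(msg->rbuf))
		return -EINVAL; /* unknown TID or oversized reply */
	memcpy(msg->rbuf, reply, len);
	msg->rlen = len;
	msg->done = 1;
	t->slot[tid] = NULL;    /* TID can be reused from now on */
	return 0;
}
```

A second response for the same TID is rejected with -EINVAL because the slot was cleared by the first one, and the freed TID is handed out again by the next allocation.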

Signed-off-by: Sagar Dharia 
Tested-by: Naveen Kaje 
---
 drivers/slimbus/slimbus.c | 275 ++
 include/linux/slimbus.h   | 128 +
 2 files changed, 403 insertions(+)

diff --git a/drivers/slimbus/slimbus.c b/drivers/slimbus/slimbus.c
index 0295a06..d74dfec 100644
--- a/drivers/slimbus/slimbus.c
+++ b/drivers/slimbus/slimbus.c
@@ -26,6 +26,14 @@ static DEFINE_IDR(ctrl_idr);
 static struct device_type slim_dev_type;
 static struct device_type slim_ctrl_type;
 
+#define DEFINE_SLIM_LDEST_TXN(name, mc, rl, la, msg) \
+   struct slim_msg_txn name = { rl, 0, mc, SLIM_MSG_DEST_LOGICALADDR, 0,\
+   0, la, msg, }
+
+#define DEFINE_SLIM_BCAST_TXN(name, mc, rl, la, msg) \
+   struct slim_msg_txn name = { rl, 0, mc, SLIM_MSG_DEST_BROADCAST, 0,\
+   0, la, msg, }
+
 static bool slim_eaddr_equal(struct slim_eaddr *a, struct slim_eaddr *b)
 {
return (a->manf_id == b->manf_id &&
@@ -784,6 +792,273 @@ int slim_get_logical_addr(struct slim_device *sb, struct slim_eaddr *e_addr,
 }
 EXPORT_SYMBOL(slim_get_logical_addr);
 
+/**
+ * slim_msg_response: Deliver Message response received from a device to the
+ * framework.
+ * @ctrl: Controller handle
+ * @reply: Reply received from the device
+ * @len: Length of the reply
+ * @tid: Transaction ID received with which framework can associate reply.
+ * Called by controller to inform framework about the response received.
+ * This helps in making the API asynchronous, and controller-driver doesn't need
+ * to manage 1 more table other than the one managed by framework mapping TID
+ * with buffers
+ */
+void slim_msg_response(struct slim_controller *ctrl, u8 *reply, u8 tid, u8 len)
+{
+   struct slim_val_inf *msg;
+
+   spin_lock(>txn_lock);
+   msg = ctrl->txnt[tid];
+   if (msg == NULL || msg->rbuf == NULL) {
+   spin_unlock(>txn_lock);
+   dev_err(>dev, "Got response to invalid TID:%d, len:%d\n",
+   tid, len);
+   return;
+   }
+   memcpy(msg->rbuf, reply, len);
+   ctrl->txnt[tid] = NULL;
+   if (msg->comp_cb)
+   msg->comp_cb(msg->ctx, 0);
+   spin_unlock(>txn_lock);
+}
+EXPORT_SYMBOL(slim_msg_response);
+
+static int slim_processtxn(struct slim_controller *ctrl,
+   struct slim_msg_txn *txn)
+{
+   int i = 0;
+   unsigned long flags;
+
+   if (slim_tid_txn(txn->mt, txn->mc)) {
+   spin_lock_irqsave(>txn_lock, flags);
+   for (i = 0; i < ctrl->last_tid; i++) {
+   if (ctrl->txnt[i] == NULL)
+   break;
+   }
+   if (i >= ctrl->last_tid) {
+   if (ctrl->last_tid == 255) {
+   spin_unlock_irqrestore(>txn_lock, flags);
+   return -ENOMEM;
+   }
+   ctrl->last_tid++;
+   }
+   ctrl->txnt[i] = txn->msg;
+   txn->tid = i;
+   spin_unlock_irqrestore(>txn_lock, flags);
+   }
+
+   return ctrl->xfer_msg(ctrl, txn);
+}
+
+static int slim_val_inf_sanity(struct slim_controller *ctrl,
+  struct slim_val_inf *msg, u8 mc)
+{
+   if (!msg || msg->num_bytes > 16 ||
+   (msg->start_offset + msg->num_bytes) > 0xC00)
+   goto reterr;
+   switch (mc) {
+   case SLIM_MSG_MC_REQUEST_VALUE:
+   case SLIM_MSG_MC_REQUEST_INFORMATION:
+   if (msg->rbuf != NULL)
+   return 0;
+   break;
+   case SLIM_MSG_MC_CHANGE_VALUE:
+   case SLIM_MSG_MC_CLEAR_INFORMATION:
+   if (msg->wbuf != NULL)
+   return 0;
+   break;
+   case SLIM_MSG_MC_REQUEST_CHANGE_VALUE:
+   case SLIM_MSG_MC_REQUEST_CLEAR_INFORMATION:
+   if (msg->rbuf != NULL && msg->wbuf != NULL)
+   return 0;
+   break;
+   default:
+   break;
+   }
+reterr:
+   dev_err(>dev, "Sanity check failed:msg:offset:0x%x, mc:%d\n",
+   msg->start_offset, mc);
+   return -EINVAL;
+}
+
+static u16 

[PATCH V2 1/6] SLIMbus: Device management on SLIMbus

2015-06-16 Thread Sagar Dharia
SLIMbus (Serial Low Power Interchip Media Bus) is a specification
developed by MIPI (Mobile Industry Processor Interface) alliance.
SLIMbus is a 2-wire implementation, which is used to communicate with
peripheral components like audio-codec.
SLIMbus uses Time-Division-Multiplexing to accommodate multiple data
channels, and control channel. Control channel has messages to do
device-enumeration, messages to send/receive control-data to/from
slimbus devices, messages for port/channel management, and messages to
do bandwidth allocation.
The framework supports multiple instances of the bus (1 controller per
bus), and multiple slave devices per controller.

This patch does device enumeration, logical address assignment,
informing device when the device reports present/absent etc.
Reporting present may need the driver to do the needful (e.g. turning
on voltage regulators powering the device). So probe is called
if the device is added to board-info list for a controller.
Additionally device is probed when it reports present if that device
doesn't need any such steps mentioned above.

Signed-off-by: Sagar Dharia 
---
 drivers/Kconfig |   2 +
 drivers/Makefile|   1 +
 drivers/slimbus/Kconfig |   9 +
 drivers/slimbus/Makefile|   4 +
 drivers/slimbus/slimbus.c   | 714 
 include/linux/mod_devicetable.h |  13 +
 include/linux/slimbus.h | 396 ++
 7 files changed, 1139 insertions(+)
 create mode 100644 drivers/slimbus/Kconfig
 create mode 100644 drivers/slimbus/Makefile
 create mode 100644 drivers/slimbus/slimbus.c
 create mode 100644 include/linux/slimbus.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index c0cc96b..e39c969 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -182,4 +182,6 @@ source "drivers/thunderbolt/Kconfig"
 
 source "drivers/android/Kconfig"
 
+source "drivers/slimbus/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 46d2554..37c1c88 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -74,6 +74,7 @@ obj-$(CONFIG_TARGET_CORE) += target/
 obj-$(CONFIG_MTD)  += mtd/
 obj-$(CONFIG_SPI)  += spi/
 obj-$(CONFIG_SPMI) += spmi/
+obj-$(CONFIG_SLIMBUS)  += slimbus/
 obj-y  += hsi/
 obj-y  += net/
 obj-$(CONFIG_ATM)  += atm/
diff --git a/drivers/slimbus/Kconfig b/drivers/slimbus/Kconfig
new file mode 100644
index 000..fb30497
--- /dev/null
+++ b/drivers/slimbus/Kconfig
@@ -0,0 +1,9 @@
+#
+# SLIMBUS driver configuration
+#
+menuconfig SLIMBUS
+   tristate "Slimbus support"
+   help
+ Slimbus is standard interface between baseband and audio codec,
+ and other peripheral components in mobile terminals.
+
diff --git a/drivers/slimbus/Makefile b/drivers/slimbus/Makefile
new file mode 100644
index 000..05f53bc
--- /dev/null
+++ b/drivers/slimbus/Makefile
@@ -0,0 +1,4 @@
+#
+# Makefile for kernel slimbus framework.
+#
+obj-$(CONFIG_SLIMBUS)  += slimbus.o
diff --git a/drivers/slimbus/slimbus.c b/drivers/slimbus/slimbus.c
new file mode 100644
index 000..2baf43a
--- /dev/null
+++ b/drivers/slimbus/slimbus.c
@@ -0,0 +1,714 @@
+/* Copyright (c) 2011-2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static DEFINE_MUTEX(slim_lock);
+static DEFINE_IDR(ctrl_idr);
+static struct device_type slim_dev_type;
+static struct device_type slim_ctrl_type;
+
+static bool slim_eaddr_equal(struct slim_eaddr *a, struct slim_eaddr *b)
+{
+   return (a->manf_id == b->manf_id &&
+   a->prod_code == b->prod_code &&
+   a->dev_index == b->dev_index &&
+   a->instance == b->instance);
+}
+
+static const struct slim_device_id *
+slim_match(const struct slim_device_id *id, const struct slim_device *slim_dev)
+{
+   while (id->manf_id != 0 || id->prod_code != 0) {
+   if (id->manf_id == slim_dev->e_addr.manf_id &&
+   id->prod_code == slim_dev->e_addr.prod_code &&
+   id->dev_index == slim_dev->e_addr.dev_index)
+   return id;
+   id++;
+   }
+   return NULL;
+}
+
+static int slim_device_match(struct device *dev, struct device_driver *driver)
+{
+   struct slim_device *slim_dev;
+   struct slim_driver *drv = to_slim_driver(driver);
+
+   

[PATCH V2 0/6] Introduce framework for SLIMbus device drivers

2015-06-16 Thread Sagar Dharia
SLIMbus (Serial Low Power Interchip Media Bus) is a specification
developed by MIPI (Mobile Industry Processor Interface) alliance.
SLIMbus is a 2-wire implementation, which is used to communicate with
peripheral components like audio-codec.
SLIMbus uses Time-Division-Multiplexing to accommodate multiple data
channels, and control channel. Control channel has messages to do
device-enumeration, messages to send/receive control-data to/from
slimbus devices, messages for port/channel management, and messages to
do bandwidth allocation.
Framework is introduced to support multiple instances of the bus
(1 controller per bus), and multiple slave devices per controller.
The SPI and I2C frameworks, and the review comments on my previous
submission of these patches, were used as references for this framework.

These patchsets introduce device-management, OF helpers, and messaging
APIs, and clock-pause feature for entering/exiting low-power mode for
SLIMbus. Framework patches to do channel, port and bandwidth
management are work-in-progress and will be sent out soon.

These patchsets were tested on Qualcomm Snapdragon processor board
using a controller driver.


Changes from V1 to V2:
* Addressed inline-code review comments from Joe and Mark Brown.
* Added initial binding document for slimbus
* Added initial version of Qualcomm controller driver and initial
  binding document for it.
* Added clock-pause feature for entering/exiting low power mode
* Added runtime-pm to Qualcomm controller driver

Sagar Dharia (6):
  SLIMbus: Device management on SLIMbus
  of/slimbus: OF helper for SLIMbus
  slimbus: Add messaging APIs to slimbus framework
  slim: qcom: Add Qualcomm Slimbus controller driver
  slimbus: Add support for 'clock-pause' feature
  slim: qcom: Add runtime-pm support using clock-pause feature

 Documentation/devicetree/bindings/slimbus/bus.txt  |   34 +
 .../devicetree/bindings/slimbus/slim-qcom-ctrl.txt |   45 +
 drivers/Kconfig|2 +
 drivers/Makefile   |1 +
 drivers/slimbus/Kconfig|   19 +
 drivers/slimbus/Makefile   |6 +
 drivers/slimbus/slim-qcom-ctrl.c   |  852 ++
 drivers/slimbus/slim-qcom.c|   96 ++
 drivers/slimbus/slim-qcom.h|   90 ++
 drivers/slimbus/slimbus.c  | 1174 
 include/linux/mod_devicetable.h|   13 +
 include/linux/slimbus.h|  589 ++
 12 files changed, 2921 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/slimbus/bus.txt
 create mode 100644 Documentation/devicetree/bindings/slimbus/slim-qcom-ctrl.txt
 create mode 100644 drivers/slimbus/Kconfig
 create mode 100644 drivers/slimbus/Makefile
 create mode 100644 drivers/slimbus/slim-qcom-ctrl.c
 create mode 100644 drivers/slimbus/slim-qcom.c
 create mode 100644 drivers/slimbus/slim-qcom.h
 create mode 100644 drivers/slimbus/slimbus.c
 create mode 100644 include/linux/slimbus.h

-- 
1.8.2.1



[PATCH v0] arm64: dts: Add poweroff button device node for APM X-Gene platform

2015-06-16 Thread Y Vo
This patch adds poweroff button device node to support poweroff feature
on APM X-Gene Mustang platform.

Signed-off-by: Y Vo 
---
 arch/arm64/boot/dts/apm/apm-mustang.dts |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/apm/apm-mustang.dts b/arch/arm64/boot/dts/apm/apm-mustang.dts
index 83578e7..67979c8 100644
--- a/arch/arm64/boot/dts/apm/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm/apm-mustang.dts
@@ -23,6 +23,17 @@
device_type = "memory";
reg = < 0x1 0x 0x0 0x8000 >; /* Updated by bootloader */
};
+
+   gpio-keys {
+   compatible = "gpio-keys";
+   apm_ctrl_name = "Power Button";
+   btn3 {
+   label = "EXT_PWR";
+   linux,code = <116>;
+   linux,input-type = <0x1>; /* EV_KEY */
+   interrupts = <0x0 0x2d 0x1>;
+   };
+   };
 };
 
  {
-- 
1.7.1



[PATCH] acer-wmi: update removal notice for sysfs interface

2015-06-16 Thread Martin Kepplinger
Signed-off-by: Martin Kepplinger 
---
These just look odd when out of date. The proper fix would probably be
to create a Documentation/ABI/testing/sysfs-platform-acer-wmi file and
remove the deprecated ones.


 drivers/platform/x86/acer-wmi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/platform/x86/acer-wmi.c b/drivers/platform/x86/acer-wmi.c
index 3ac29a1..83c4850 100644
--- a/drivers/platform/x86/acer-wmi.c
+++ b/drivers/platform/x86/acer-wmi.c
@@ -1670,7 +1670,7 @@ static ssize_t show_bool_threeg(struct device *dev,
u32 result; \
acpi_status status;
 
-   pr_info("This threeg sysfs will be removed in 2014 - used by: %s\n",
+   pr_info("This threeg sysfs will be removed in 2015 - used by: %s\n",
current->comm);
status = get_u32(&result, ACER_CAP_THREEG);
if (ACPI_SUCCESS(status))
@@ -1683,7 +1683,7 @@ static ssize_t set_bool_threeg(struct device *dev,
 {
u32 tmp = simple_strtoul(buf, NULL, 10);
acpi_status status = set_u32(tmp, ACER_CAP_THREEG);
-   pr_info("This threeg sysfs will be removed in 2014 - used by: %s\n",
+   pr_info("This threeg sysfs will be removed in 2015 - used by: %s\n",
current->comm);
if (ACPI_FAILURE(status))
return -EINVAL;
@@ -1695,7 +1695,7 @@ static DEVICE_ATTR(threeg, S_IRUGO | S_IWUSR, show_bool_threeg,
 static ssize_t show_interface(struct device *dev, struct device_attribute *attr,
char *buf)
 {
-   pr_info("This interface sysfs will be removed in 2014 - used by: %s\n",
+   pr_info("This interface sysfs will be removed in 2015 - used by: %s\n",
current->comm);
switch (interface->type) {
case ACER_AMW0:
-- 
2.1.4



Re: kexec_load(2) bypasses signature verification

2015-06-16 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Tue, Jun 16, 2015 at 02:38:31PM -0500, Eric W. Biederman wrote:
>> 
>> Adding Vivek as he is the one who implemented kexec_file_load.
>> I was hoping he would respond to this thread, and it looks like he
>> simply has not ever been Cc'd.
>> 
>> Theodore Ts'o  writes:
>> 
>> > On Mon, Jun 15, 2015 at 09:37:05AM -0400, Josh Boyer wrote:
>> >> The bits that actually read Secure Boot state out of the UEFI
>> >> variables, and apply protections to the machine to avoid compromise
>> >> under the SB threat model.  Things like disabling the old kexec...
>> >
>> > I don't have any real interest in using Secure Boot, but I *am*
>> > interested in using CONFIG_KEXEC_VERIFY_SIG[1].  So perhaps we need to
>> > have something similar to what we have with signed modules in terms of
>> > CONFIG_MODULE_SIG_FORCE and module/sig_enforce, but for
>> > KEXEC_VERIFY_SIG.  This would mean creating a separate flag
>> > independent of the one Linus suggested for Secure Boot, but since we
>> > have one for signed modules, we do have precedent for this sort of
>> > thing.
>> 
>> My overall request with respect to kexec has been that we implement
>> things that make sense outside of the bizarre threat model of the Linux
>> folks who were talking about secure boot.
>> 
>> I have not navigated the labyrinth of config options but having a way to
>> only boot signed things with kexec seems a completely sensible way to
>> operate in the context of signed images.
>> 
>> I don't know how much that will help given that actors with sufficient
>> resources have demonstrated the ability to steal private keys, but
>> assuming binary signing is an effective technique (or why else do it)
>> then having an option to limit kexec to only loading signed images seems
>> sensible.
>
> I went through the mail chain on web and here are my thoughts.
>
> - So yes, upstream does not have the logic which automatically disables
>   the old syscall (kexec_load()) on secureboot systems. Distributions
>   carry those patches.
>
> - This KEXEC_VERIFY_SIG option only controls the behavior for
>   kexec_file_load() syscall and is not meant to directly affect any
>   behavior of old syscall (kexec_load()). I think I should have named
>   it KEXEC_FILE_VERIFY_SIG. Though help text makes it clear.
>   "Verify kernel signature during kexec_file_load() syscall".
>
> - I think disabling old system call if KEXEC_VERIFY_SIG() is set
>   will break existing setup which use old system call by default, except
>   the case of secureboot system. And old syscall path is well tested
>   and new syscall might not be in a position to support all the corner
>   cases, at least as of now.
>
> Ted, 
>
> So looks like you are looking for a system/option where you just want to
> always make use of kexec_file_load() and disable kexec_load(). This sounds
> like you want a kernel where kexec_load() is compiled out and you want
> only kexec_file_load() in.
>
> Right now one can't do that because kexec_file_load() depends on
> CONFIG_KEXEC option.
>
> I am wondering that how about making CONFIG_KEXEC_FILE_LOAD independent
> of CONFIG_KEXEC. That way one can set CONFIG_KEXEC_VERIFY_SIG=y, and
> only signed kernel can be kexeced on that system.
>
> This should gel well with long term strategy of deprecating kexec_load()
> at some point of time when kexec_file_load() is ready to completely
> replace it.

Interesting.

I suspect that what we want is to have CONFIG_KEXEC for the core
and additional CONFIG_KEXEC_LOAD option that covers that kexec_load call.

That should make it trivially easy to disable the kexec_load system call
in cases where people care.

Eric


[PATCH v2 2/2] apei/erst-dbg: Define pr_fmt macro to avoid the duplication of ERST_DBG_PFX

2015-06-16 Thread Wang Long
Define pr_fmt macro with {ERST DBG: } prefix, then remove all use
of ERST_DBG_PFX in the pr_* functions.

Signed-off-by: Wang Long 
---
 drivers/acpi/apei/erst-dbg.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/erst-dbg.c b/drivers/acpi/apei/erst-dbg.c
index 3a57ffb..c38133a 100644
--- a/drivers/acpi/apei/erst-dbg.c
+++ b/drivers/acpi/apei/erst-dbg.c
@@ -23,6 +23,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -31,8 +33,6 @@
 
 #include "apei-internal.h"
 
-#define ERST_DBG_PFX   "ERST DBG: "
-
 #define ERST_DBG_RECORD_LEN_MAX	0x4000
 
 static void *erst_dbg_buf;
@@ -130,8 +130,7 @@ retry:
if (rc < 0)
goto out;
if (len > ERST_DBG_RECORD_LEN_MAX) {
-   pr_warn(ERST_DBG_PFX
-  "Record (ID: 0x%llx) length is too long: %zd\n",
+   pr_warn("Record (ID: 0x%llx) length is too long: %zd\n",
   id, len);
rc = -EIO;
goto out;
@@ -171,7 +170,7 @@ static ssize_t erst_dbg_write(struct file *filp, const char __user *ubuf,
return -EPERM;
 
if (usize > ERST_DBG_RECORD_LEN_MAX) {
-   pr_err(ERST_DBG_PFX "Too long record to be written\n");
+   pr_err("Too long record to be written\n");
return -EINVAL;
}
 
@@ -223,7 +222,7 @@ static struct miscdevice erst_dbg_dev = {
 static __init int erst_dbg_init(void)
 {
if (erst_disable) {
-   pr_info(ERST_DBG_PFX "ERST support is disabled.\n");
+   pr_info("ERST support is disabled.\n");
return -ENODEV;
}
return misc_register(&erst_dbg_dev);
-- 
1.8.3.4
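
For readers unfamiliar with the pr_fmt convention used by this patch, the
following is a minimal userspace sketch of the mechanism, not kernel code.
It assumes a GNU-compatible compiler (for ##__VA_ARGS__), defines
KBUILD_MODNAME by hand as "erst_dbg" (in a real build kbuild supplies it),
and uses the hypothetical names demo_pr_info() and demo_report_disabled()
as stand-ins for the real pr_* helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: kbuild would normally provide this for erst-dbg.c. */
#define KBUILD_MODNAME "erst_dbg"

/* Defined first, before the (stand-in) logging header below. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Stand-in for the logging header: only supplies a default if none exists. */
#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif

/* Hypothetical helper: formats into a buffer instead of the kernel log. */
#define demo_pr_info(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Mirrors the message printed by erst_dbg_init() when ERST is disabled. */
static int demo_report_disabled(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return demo_pr_info(buf, len, "ERST support is disabled.\n");
}
```

Because the prefix is pasted in by the macro at every call site, the
individual format strings no longer need to carry ERST_DBG_PFX, and no
#undef is needed as long as the #define precedes the first #include.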



[PATCH v2 1/2] apei/*.c: use pr_warn instead of pr_warning

2015-06-16 Thread Wang Long
This patch avoids the following WARNING when running
checkpatch.pl:
WARNING: Prefer pr_warn(... to pr_warning(...
#21: FILE: drivers/acpi/apei/erst-dbg.c:134:
+   pr_warning("Record (ID: 0x%llx) length is too long: %zd\n",

Signed-off-by: Wang Long 
---
 drivers/acpi/apei/apei-base.c | 12 ++--
 drivers/acpi/apei/einj.c  |  6 +++---
 drivers/acpi/apei/erst-dbg.c  |  2 +-
 drivers/acpi/apei/ghes.c  | 14 +++---
 drivers/acpi/apei/hest.c  |  6 +++---
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index a85ac07..5e85af3 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -182,7 +182,7 @@ rewind:
if (ip == ctx->ip) {
if (entry->instruction >= ctx->instructions ||
!ctx->ins_table[entry->instruction].run) {
-   pr_warning(FW_WARN APEI_PFX
+   pr_warn(FW_WARN APEI_PFX
"Invalid action table, unknown instruction type: %d\n",
   entry->instruction);
return -EINVAL;
@@ -223,7 +223,7 @@ static int apei_exec_for_each_entry(struct apei_exec_context *ctx,
if (end)
*end = i;
if (ins >= ctx->instructions || !ins_table[ins].run) {
-   pr_warning(FW_WARN APEI_PFX
+   pr_warn(FW_WARN APEI_PFX
"Invalid action table, unknown instruction type: %d\n",
   ins);
return -EINVAL;
@@ -589,7 +589,7 @@ static int apei_check_gar(struct acpi_generic_address *reg, u64 *paddr,
space_id = reg->space_id;
*paddr = get_unaligned(&reg->address);
if (!*paddr) {
-   pr_warning(FW_BUG APEI_PFX
+   pr_warn(FW_BUG APEI_PFX
   "Invalid physical address in GAR [0x%llx/%u/%u/%u/%u]\n",
   *paddr, bit_width, bit_offset, access_size_code,
   space_id);
@@ -597,7 +597,7 @@ static int apei_check_gar(struct acpi_generic_address *reg, u64 *paddr,
}
 
if (access_size_code < 1 || access_size_code > 4) {
-   pr_warning(FW_BUG APEI_PFX
+   pr_warn(FW_BUG APEI_PFX
   "Invalid access size code in GAR [0x%llx/%u/%u/%u/%u]\n",
   *paddr, bit_width, bit_offset, access_size_code,
   space_id);
@@ -614,7 +614,7 @@ static int apei_check_gar(struct acpi_generic_address *reg, u64 *paddr,
*access_bit_width = 64;
 
if ((bit_width + bit_offset) > *access_bit_width) {
-   pr_warning(FW_BUG APEI_PFX
+   pr_warn(FW_BUG APEI_PFX
   "Invalid bit width + offset in GAR [0x%llx/%u/%u/%u/%u]\n",
   *paddr, bit_width, bit_offset, access_size_code,
   space_id);
@@ -623,7 +623,7 @@ static int apei_check_gar(struct acpi_generic_address *reg, u64 *paddr,
 
if (space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY &&
space_id != ACPI_ADR_SPACE_SYSTEM_IO) {
-   pr_warning(FW_BUG APEI_PFX
+   pr_warn(FW_BUG APEI_PFX
   "Invalid address space type in GAR [0x%llx/%u/%u/%u/%u]\n",
   *paddr, bit_width, bit_offset, access_size_code,
   space_id);
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index a095d4f..ff3daca 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -183,7 +183,7 @@ static int einj_get_available_error_type(u32 *type)
 static int einj_timedout(u64 *t)
 {
if ((s64)*t < SPIN_UNIT) {
-   pr_warning(FW_WARN EINJ_PFX
+   pr_warn(FW_WARN EINJ_PFX
   "Firmware does not respond in time\n");
return 1;
}
@@ -325,7 +325,7 @@ static int __einj_error_trigger(u64 trigger_paddr, u32 type,
}
rc = einj_check_trigger_header(trigger_tab);
if (rc) {
-   pr_warning(FW_BUG EINJ_PFX
+   pr_warn(FW_BUG EINJ_PFX
   "The trigger error action table is invalid\n");
goto out_rel_header;
}
@@ -707,7 +707,7 @@ static int __init einj_init(void)
 
rc = einj_check_table(einj_tab);
if (rc) {
-   pr_warning(FW_BUG EINJ_PFX "EINJ table is invalid\n");
+   pr_warn(FW_BUG EINJ_PFX "EINJ table is invalid\n");
return -EINVAL;
}
 
diff --git a/drivers/acpi/apei/erst-dbg.c b/drivers/acpi/apei/erst-dbg.c
index 04ab5c9..3a57ffb 100644
--- a/drivers/acpi/apei/erst-dbg.c
+++ b/drivers/acpi/apei/erst-dbg.c
@@ -130,7 +130,7 @@ retry:
if 

Re: [PATCH 0/3] aio: ctx->dead cleanups

2015-06-16 Thread Oleg Nesterov
Damn, sorry for noise, forgot to mention.

And whatever I missed, this ctx->dead check in aio_ring_remap()
looks wrong anyway. Please correct me.

On 06/17, Oleg Nesterov wrote:
>
> On 06/17, Oleg Nesterov wrote:
> >
> > On 06/17, Al Viro wrote:
> > >
> > > On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> > > > Al, please help. We are trying to backport some aio fixes and I am
> > > > absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> > > > race".
> > > >
> > > >
> > > > Firstly, I simply can't understand what exactly it tries to fix. OK,
> > > > aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> > > > ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> > > > ctx->mmap_base after it drops mm->ioctx_lock.
> > >
> > > Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> > > Which tries to grab mmap_sem, blocks for mremap() from another thread
> > > and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > > nothing whatsoever to the argument we'd passed to vm_munmap().
> >
> > Yes. But it seems that you missed another part of my email:
> >
> > So it seems to me we only need this change to ensure that move_vma() can
> > not succeed if ctx was already removed from ->ioctx_table, or, if we race
> > with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
> > ensure that move_vma()->aio_ring_mmap() can not race with
> > vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
> > doesn't look really bad. The kernel can't crash, just the application can
> > fool itself.
> >
> > So once again, could explain why do we really need to prevent this?
> > Afaics, if the application is stupid, it can only fool itself.
> >
> > And please note that ctx->mmap_base or/and ctx->mmap_size can be wrong
> > anyway. Say, an application can munmap() this vma, or munmap() the part
> > of this vma.
> 
> And speaking of aio_ring_remap() it can "corrupt" ->mmap_base even with
> this patch. Just you need to mremap() the tail of aio-mapped memory.
> 
> No?
> 
> Oleg.



Re: [PATCH 2/2] apei/erst-dbg: Define pr_fmt macro to avoid the duplication of ERST_DBG_PFX

2015-06-16 Thread long.wanglong
On 2015/6/15 22:42, Joe Perches wrote:
> On Mon, 2015-06-15 at 09:57 +, Wang Long wrote:
>> Define pr_fmt macro with {ERST DBG: } prefix, then remove all use
>> of ERST_DBG_PFX in the pr_* functions.
> []
>> diff --git a/drivers/acpi/apei/erst-dbg.c b/drivers/acpi/apei/erst-dbg.c
> []
>> @@ -31,7 +31,8 @@
>>  
>>  #include "apei-internal.h"
>>  
>> -#define ERST_DBG_PFX	"ERST DBG: "
>> +#undef pr_fmt
>> +#define pr_fmt(fmt) "ERST DBG: " fmt
> 
> Moving this #define above the first #include is what's
> generally done and doesn't require an #undef
> 
OK, Thanks

> 
> 
> .
> 




Re: [PATCH v2 2/3] watchdog: bcm2835: Add poweroff code for the Raspberry Pi

2015-06-16 Thread Stephen Warren
On 06/16/2015 03:39 AM, Noralf Trønnes wrote:
> 
> Den 16.06.2015 05:07, skrev Stephen Warren:
>> On 06/13/2015 05:39 AM, Noralf Trønnes wrote:
>>> This adds a new poweroff function to the watchdog driver for the
>>> Raspberry Pi. Currently poweroff/halt results in a reboot.
>>>
>>> The Raspberry Pi firmware uses the RSTS register to know which
>>> partition to boot from. The partition value is spread into bits
>>> 0, 2, 4, 6, 8, 10. Partition 63 is a special partition used by
>>> the firmware to indicate halt.
>>>
>>> The firmware made this change in 19 Aug 2013 and was matched
>>> by the downstream commit:
>>> Changes for new NOOBS multi partition booting from gsh
>> I don't understand why we need a new compatible value here; why not
>> simply modify the existing bcm2835_power_off() function. That is written
>> to do something that's interpreted by the RPi firmware, not something
>> that the bcm2835 HW does.
>>
>> Admittedly the current name is a bit misleading, but fixing that should
>> be a separate change to fixing the implementation to do what the current
>> firmware expects.
> 
> There are other boards that use the BCM2835 and I didn't want to break the
> behaviour for those that use the reference firmware.

We don't support those other boards in mainline Linux AFAIK. In other
discussions, Eric Anholt stated that the Roku 2 for example doesn't use
the same firmware (albeit they were derived from the same base a long
way back apparently) so I have no good reason to believe this logic is a
standard across different bcm2835 devices. Do you know more specific
details?

> Roku 2 device uses
> this soc, and changing bcm2835_power_off() would break support for it.
> ODROID-W also use BCM2835, but this is a Pi clone so I don't know if they
> have matched their firmware behaviour to that of the Pi (admittedly not
> many boards were made, their source of chips went dry).
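
To make the RSTS encoding described in the quoted commit message concrete,
here is a small sketch of how a 6-bit partition number can be spread into
bits 0, 2, 4, 6, 8, 10. It is only an illustration of the bit layout stated
above; the names rsts_spread_partition(), RSTS_PARTITION_BITS and
RSTS_HALT_PARTITION are hypothetical and are not taken from the driver or
the firmware.

```c
#include <assert.h>
#include <stdint.h>

#define RSTS_PARTITION_BITS 6   /* partition numbers 0..63 */
#define RSTS_HALT_PARTITION 63  /* special "halt" partition per the description */

/*
 * Spread bit i of the partition number to bit 2*i of the result, so the
 * six partition bits land on register bits 0, 2, 4, 6, 8 and 10.
 */
static uint32_t rsts_spread_partition(uint32_t partition)
{
	uint32_t spread = 0;
	unsigned int i;

	assert(partition < (1u << RSTS_PARTITION_BITS));

	for (i = 0; i < RSTS_PARTITION_BITS; i++) {
		if (partition & (1u << i))
			spread |= 1u << (2 * i);
	}
	return spread;
}
```

With this layout, rsts_spread_partition(RSTS_HALT_PARTITION) sets all six
even bits and yields 0x555.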


Re: [RFC PATCH] perf/kvm: Guest Symbol Resolution for powerpc

2015-06-16 Thread Hemant Kumar

Hi David,

Thanks for the review.

On 06/16/2015 08:23 PM, David Ahern wrote:
> On 6/15/15 8:50 PM, Hemant Kumar wrote:
>> +/*
>> + * Get the instruction pointer from the tracepoint data
>> + */
>> +u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *data)
>> +{
>> +	u64 tp_ip = data->ip;
>> +	int trap;
>> +
>> +	if (!strcmp(KVMPPC_EXIT, evsel->name)) {
>> +		trap = raw_field_value(evsel->tp_format, "trap", data->raw_data);
>> +
>> +		if (trap == HV_DECREMENTER)
>> +			tp_ip = raw_field_value(evsel->tp_format, "pc",
>> +						data->raw_data);
>> +	}
>> +	return tp_ip;
>> +}
>
> You can tie a handler to an event; see builtin-trace.c for example
> (evsel->handler = handler). Then have the sample handler call it (e.g,
> see trace__process_sample). Then you don't have to check event names
> on each pass like this and just do event based processing.
>
>> +
>> +/*
>> + * Get the HV and PR bits and accordingly, determine the cpumode
>> + */
>> +u8 arch__get_cpumode(union perf_event *event, struct perf_evsel *evsel,
>> +		     struct perf_sample *data)
>> +{
>> +	unsigned long hv, pr, msr;
>> +	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
>> +
>> +	if (strcmp(KVMPPC_EXIT, evsel->name))
>> +		goto ret;
>> +
>> +	if (data->raw_data)
>> +		msr = raw_field_value(evsel->tp_format, "msr", data->raw_data);
>> +	else
>> +		goto ret;
>> +
>> +	hv = msr & ((long unsigned)1 << (PPC_MAX - HV_BIT));
>> +	pr = msr & ((long unsigned)1 << (PPC_MAX - PR_BIT));
>> +
>> +	if (!hv && pr)
>> +		cpumode = PERF_RECORD_MISC_GUEST_USER;
>> +	else
>> +		cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
>> +ret:
>> +	return cpumode;
>> +}
>
> Why isn't that set properly kernel side when the sample is generated?

Because, this depends on the kernel tracepoint "kvm_hv:kvm_guest_exit".
perf_prepare_sample() in the kernel side sets the event->header.misc field to
PERF_RECORD_MISC_KERNEL through perf_misc_flags(pt_regs). In case of
tracepoints which always get hit in the host kernel context, the
perf_misc_flags() will always return PERF_RECORD_MISC_KERNEL.

IMHO we will rather have to set the cpumode in the user space for this tracepoint
and we can't depend on the event->header.misc field for this case.

What would you suggest?

--
Thanks,
Hemant Kumar
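
As an illustration of the HV/PR decision in arch__get_cpumode() discussed
above, here is a standalone sketch that can be compiled in userspace. The
values of PPC_MAX, HV_BIT and PR_BIT are not shown in this thread; the ones
below are an assumption based on IBM bit numbering of the 64-bit MSR (bit 0
is the most significant bit, HV at bit 3, PR at bit 49). The enum and the
function name classify_msr() are hypothetical stand-ins for the
PERF_RECORD_MISC_GUEST_* constants and the perf code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not taken from the patch as quoted here. */
#define PPC_MAX 63
#define HV_BIT   3
#define PR_BIT  49

/* Stand-ins for PERF_RECORD_MISC_GUEST_KERNEL / PERF_RECORD_MISC_GUEST_USER. */
enum guest_mode { GUEST_KERNEL, GUEST_USER };

/*
 * Same decision as the quoted patch: a sample counts as guest user only
 * when the hypervisor bit is clear and the problem-state bit is set.
 */
static enum guest_mode classify_msr(uint64_t msr)
{
	uint64_t hv = msr & (UINT64_C(1) << (PPC_MAX - HV_BIT));
	uint64_t pr = msr & (UINT64_C(1) << (PPC_MAX - PR_BIT));

	return (!hv && pr) ? GUEST_USER : GUEST_KERNEL;
}
```

Under these assumed bit positions, an MSR with only bit 14 (LSB numbering)
set classifies as guest user, and any MSR with bit 60 set classifies as
guest kernel.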



Re: [PATCH 0/3] aio: ctx->dead cleanups

2015-06-16 Thread Oleg Nesterov
On 06/17, Al Viro wrote:
>
> On Wed, Jun 17, 2015 at 01:39:06AM +0100, Al Viro wrote:
>
> > Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> > Which tries to grab mmap_sem, blocks for mremap() from another thread
> > and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > nothing whatsoever to the argument we'd passed to vm_munmap().  Sure,
> > it had been recalculated by aio_ring_remap(), but it's too late for
> > us - we'd already fetched the old value.
>
> And yes, the leak you've spotted is real, but I would very much prefer
> to avoid that goto - something like this instead:
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 034e2d3..b36b530 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -291,7 +291,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>   if (err < 0) {
>   move_page_tables(new_vma, new_addr, vma, old_addr,
>moved_len, true);
> - return err;
> + vma = new_vma;
> + old_len = new_len;
> + old_addr = new_addr;
> + new_addr = err;

Personally, I'd really prefer to factor out at least this
move_page_tables() with six args. Although I agree, "goto previous_if"
doesn't look nice too, this needs cleanup.

But this is minor. I am already sleeping, most probably I misread
this code. But it seems that there is another bug with VM_ACCOUNT.

I'll recheck tomorrow and write another email.

Oleg.



Re: [PATCH v2 2/2] TPM: remove unnecessary little endian conversion

2015-06-16 Thread Ashley Lai

Looks good.

Reviewed-by: Ashley Lai 

Thanks,
--Ashley

On Tue, 9 Jun 2015, Hon Ching(Vicky) Lo wrote:


The base pointer for the event log is allocated in the local
kernel (in prom_instantiate_sml()), therefore it is already in
the host's endian byte order and requires no conversion.

The content of the 'basep' pointer in read_log() stores the
base address of the log. This patch ensures that it is correctly
implemented.

Signed-off-by: Hon Ching(Vicky) Lo 
Signed-off-by: Joy Latten 
---
drivers/char/tpm/tpm_of.c |4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm_of.c b/drivers/char/tpm/tpm_of.c
index c002d1b..62a22ce 100644
--- a/drivers/char/tpm/tpm_of.c
+++ b/drivers/char/tpm/tpm_of.c
@@ -24,7 +24,7 @@ int read_log(struct tpm_bios_log *log)
{
struct device_node *np;
const u32 *sizep;
-   const __be64 *basep;
+   const u64 *basep;

if (log->bios_event_log != NULL) {
pr_err("%s: ERROR - Eventlog already initialized\n", __func__);
@@ -63,7 +63,7 @@ int read_log(struct tpm_bios_log *log)

log->bios_event_log_end = log->bios_event_log + *sizep;

-   memcpy(log->bios_event_log, __va(be64_to_cpup(basep)), *sizep);
+   memcpy(log->bios_event_log, __va(*basep), *sizep);

return 0;

--
1.7.1





Re: [PATCH v2 1/2] vTPM: support little endian guests

2015-06-16 Thread Ashley Lai

Just a small comment otherwise it looks good.

On Tue, 9 Jun 2015, Hon Ching(Vicky) Lo wrote:

> case NONHOST_INFO:
> -   name = tcpa_event_type_strings[event->event_type];
> +   name =
> +   tcpa_event_type_strings[do_endian_conversion
> +   (event->event_type)];

Not being picky but if it does not exceed 80 characters it looks better
to join the line above.
   name = tcpa_event_type_strings[do_endian_conversion
  (event->event_type)];

> case POST_CONTENTS:
> -   name = tcpa_pc_event_id_strings[pc_event->event_id];
> +   name =
> +   tcpa_pc_event_id_strings[do_endian_conversion
> +(pc_event->event_id)];

Same as above.

Thanks,
--Ashley Lai


Re: call_rcu from trace_preempt

2015-06-16 Thread Steven Rostedt
On Tue, 16 Jun 2015 18:04:39 -0700
Alexei Starovoitov  wrote:
> 
> you mean similar to what rb_wakeups() and friends are doing?
> makes sense. starting to study it...

Yes, I meant those functions. But yours would be slightly different. As
it would be the one calling the irq work that would be setting the
flag, not a separate entity (like the reader of the ring buffer).

I haven't fully thought it through, but I imagine it should still work.

-- Steve



Re: [PATCH 0/3] aio: ctx->dead cleanups

2015-06-16 Thread Oleg Nesterov
On 06/17, Oleg Nesterov wrote:
>
> On 06/17, Al Viro wrote:
> >
> > On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> > > Al, please help. We are trying to backport some aio fixes and I am
> > > absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> > > race".
> > >
> > >
> > > Firstly, I simply can't understand what exactly it tries to fix. OK,
> > > aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> > > ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> > > ctx->mmap_base after it drops mm->ioctx_lock.
> >
> > Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> > Which tries to grab mmap_sem, blocks for mremap() from another thread
> > and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> > nothing whatsoever to the argument we'd passed to vm_munmap().
>
> Yes. But it seems that you missed another part of my email:
>
>   So it seems to me we only need this change to ensure that move_vma() can
>   not succeed if ctx was already removed from ->ioctx_table, or, if we race
>   with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
>   ensure that move_vma()->aio_ring_mmap() can not race with
>   vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
>   doesn't look really bad. The kernel can't crash, just the application can
>   fool itself.
>
> So once again, could explain why do we really need to prevent this?
> Afaics, if the application is stupid, it can only fool itself.
>
> And please note that ctx->mmap_base or/and ctx->mmap_size can be wrong
> anyway. Say, an application can munmap() this vma, or munmap() the part
> of this vma.

And speaking of aio_ring_remap() it can "corrupt" ->mmap_base even with
this patch. Just you need to mremap() the tail of aio-mapped memory.

No?

Oleg.



Re: [PATCH 0/3] aio: ctx->dead cleanups

2015-06-16 Thread Oleg Nesterov
On 06/17, Al Viro wrote:
>
> On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> > Al, please help. We are trying to backport some aio fixes and I am
> > absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> > race".
> >
> >
> > Firstly, I simply can't understand what exactly it tries to fix. OK,
> > aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> > ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> > ctx->mmap_base after it drops mm->ioctx_lock.
>
> Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> Which tries to grab mmap_sem, blocks for mremap() from another thread
> and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> nothing whatsoever to the argument we'd passed to vm_munmap().

Yes. But it seems that you missed another part of my email:

So it seems to me we only need this change to ensure that move_vma() can
not succeed if ctx was already removed from ->ioctx_table, or, if we race
with ioctx_alloc(), it was not added to ->ioctx_table. IOW, we need to
ensure that move_vma()->aio_ring_mmap() can not race with
vm_munmap(ctx->mmap_base) in kill_ioctx() or ioctx_alloc(). And this race
doesn't look really bad. The kernel can't crash, just the application can
fool itself.

So once again, could explain why do we really need to prevent this?
Afaics, if the application is stupid, it can only fool itself.

And please note that ctx->mmap_base or/and ctx->mmap_size can be wrong
anyway. Say, an application can munmap() this vma, or munmap() the part
of this vma.

Oleg.



Re: [PATCH v3 08/18] baycom_epp: Replace rdtscl() with native_read_tsc()

2015-06-16 Thread Thomas Sailer

Acked-by: Thomas Sailer 

On 06/17/2015 02:35 AM, Andy Lutomirski wrote:

This is only used if BAYCOM_DEBUG is defined.

Cc: walter harms 
Cc: Ralf Baechle 
Cc: Thomas Sailer 
Cc: linux-h...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---

I'm hoping for an ack for this to go through -tip.

  drivers/net/hamradio/baycom_epp.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index 83c7cce0d172..44e5c3b5e0af 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -638,7 +638,7 @@ static int receive(struct net_device *dev, int cnt)
  #define GETTICK(x)\
  ({\
if (cpu_has_tsc)  \
-   rdtscl(x);\
+   x = (unsigned int)native_read_tsc();  \
  })
  #else /* __i386__ */
  #define GETTICK(x)
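
A note on the cast in the new GETTICK(): the replacement keeps only the low
32 bits of the 64-bit counter value. The sketch below, which runs in
userspace and does not read the TSC, shows that truncation and why deltas
between two truncated samples stay meaningful across a wrap. The function
names tick_low32() and tick_delta() are hypothetical, and the use of the
value for short interval measurements in the debug code is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit counter, as the cast in GETTICK does. */
static uint32_t tick_low32(uint64_t counter)
{
	return (uint32_t)counter;
}

/*
 * Elapsed ticks between two truncated samples. Unsigned arithmetic wraps
 * modulo 2^32, so the result is correct even if the low word overflowed
 * once between the two samples.
 */
static uint32_t tick_delta(uint32_t start, uint32_t end)
{
	return end - start;
}
```

The delta is only meaningful while the measured interval is shorter than one
full wrap of the 32-bit value.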




[PATCH] isdn: disable HiSax NetJet driver on microblaze arch

2015-06-16 Thread Nicolai Stange
Fix an allmodconfig compilation failure on microblaze due to big endian
architectures being apparently unsupported by the NetJet code:
  drivers/isdn/hisax/nj_s.c: In function 'setup_netjet_s':
  drivers/isdn/hisax/nj_s.c:265:2:
  error: #error "not running on big endian machines now"

Modify the relevant Kconfig such that the NetJet code is not built on
microblaze anymore.

Note that endianness on microblaze is not determined through Kconfig,
but by means of a compiler provided CPP macro, namely __MICROBLAZEEL__.
However, gcc defaults to big endianness on that platform.

Signed-off-by: Nicolai Stange 
---
 The maintainer tree listed under "ISDN SUBSYSTEM" in MAINTAINERS does
 not exist anymore. I created the diff against the Linus tree.

 drivers/isdn/hisax/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/isdn/hisax/Kconfig b/drivers/isdn/hisax/Kconfig
index 97465ac..eb83d94 100644
--- a/drivers/isdn/hisax/Kconfig
+++ b/drivers/isdn/hisax/Kconfig
@@ -237,7 +237,7 @@ config HISAX_MIC
 
 config HISAX_NETJET
bool "NETjet card"
-   depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN)))
+   depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN) || MICROBLAZE))
depends on VIRT_TO_BUS
help
  This enables HiSax support for the NetJet from Traverse
@@ -249,7 +249,7 @@ config HISAX_NETJET
 
 config HISAX_NETJET_U
bool "NETspider U card"
-   depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN)))
+   depends on PCI && (BROKEN || !(PPC || PARISC || M68K || (MIPS && !CPU_LITTLE_ENDIAN) || FRV || (XTENSA && !CPU_LITTLE_ENDIAN) || MICROBLAZE))
depends on VIRT_TO_BUS
help
  This enables HiSax support for the Netspider U interface ISDN card
-- 
2.4.3



Re: call_rcu from trace_preempt

2015-06-16 Thread Alexei Starovoitov

On 6/16/15 5:47 PM, Steven Rostedt wrote:

> Do what I do in tracing. Use a bit (per cpu?) test.
>
> Add the element to the list (that will be a cmpxchg, but I'm not sure
> you can avoid it), then check the bit to see if the irq work has already
> been activated. If not, then activate the irq work and set the bit.
> Then you will not have any more cmpxchg in the fast path.


you mean similar to what rb_wakeups() and friends are doing?
makes sense. starting to study it...



[PATCH] scripts/tags.sh: Include trace_*_rcuidle() in tags

2015-06-16 Thread Stephen Boyd
Every tracepoint creates two functions, the usual one 'trace_*()'
and the rcuidle one 'trace_*_rcuidle()'. Add regex for the
rcuidle variant so that we can jump to the tracepoints that use
rcuidle.

Cc: Steven Rostedt 
Signed-off-by: Stephen Boyd 
---
 scripts/tags.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/scripts/tags.sh b/scripts/tags.sh
index cdb491d84503..896bb303f735 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -170,7 +170,9 @@ exuberant()
--regex-c='/^SYSCALL_DEFINE[[:digit:]]?\(([^,)]*).*/sys_\1/' \

--regex-c='/^COMPAT_SYSCALL_DEFINE[[:digit:]]?\(([^,)]*).*/compat_sys_\1/' \
--regex-c++='/^TRACE_EVENT\(([^,)]*).*/trace_\1/'   \
+   --regex-c++='/^TRACE_EVENT\(([^,)]*).*/trace_\1_rcuidle/'   \
--regex-c++='/^DEFINE_EVENT\([^,)]*, *([^,)]*).*/trace_\1/' \
+   --regex-c++='/^DEFINE_EVENT\([^,)]*, *([^,)]*).*/trace_\1_rcuidle/' \
--regex-c++='/PAGEFLAG\(([^,)]*).*/Page\1/' \
--regex-c++='/PAGEFLAG\(([^,)]*).*/SetPage\1/'  \
--regex-c++='/PAGEFLAG\(([^,)]*).*/ClearPage\1/'\
@@ -233,7 +235,9 @@ emacs()
--regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/'   \
--regex='/^COMPAT_SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/compat_sys_\1/' \
--regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'  \
+   --regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1_rcuidle/'  \
--regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1/' \
+   --regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1_rcuidle/' \
--regex='/PAGEFLAG(\([^,)]*\).*/Page\1/'\
--regex='/PAGEFLAG(\([^,)]*\).*/SetPage\1/' \
--regex='/PAGEFLAG(\([^,)]*\).*/ClearPage\1/'   \
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 0/3] aio: ctx->dead cleanups

2015-06-16 Thread Al Viro
On Wed, Jun 17, 2015 at 01:39:06AM +0100, Al Viro wrote:

> Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
> Which tries to grab mmap_sem, blocks for mremap() from another thread
> and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
> nothing whatsoever to do with the argument we'd passed to vm_munmap().  Sure,
> it had been recalculated by aio_ring_remap(), but it's too late for
> us - we'd already fetched the old value.

And yes, the leak you've spotted is real, but I would very much prefer
to avoid that goto - something like this instead:

diff --git a/mm/mremap.c b/mm/mremap.c
index 034e2d3..b36b530 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -291,7 +291,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (err < 0) {
move_page_tables(new_vma, new_addr, vma, old_addr,
 moved_len, true);
-   return err;
+   vma = new_vma;
+   old_len = new_len;
+   old_addr = new_addr;
+   new_addr = err;
}
}
 


Re: call_rcu from trace_preempt

2015-06-16 Thread Steven Rostedt
On Tue, 16 Jun 2015 17:33:24 -0700
Alexei Starovoitov  wrote:

> On 6/16/15 10:37 AM, Steven Rostedt wrote:
> >>> + kfree(l);
> >> >
> >> >that's not right, since such thread defeats rcu protection of lookup.
> >> >We need either kfree_rcu/call_rcu or synchronize_rcu.
> >> >Obviously the former is preferred that's why I'm still digging into it.
> >> >Probably a thread that does kfree_rcu would be ok, but we shouldn't
> >> >be doing it unconditionally. For all networking programs and 99%
> >> >of tracing programs the existing code is fine and I don't want to
> >> >slow it down to tackle the corner case.
> >> >Extra spin_lock just to add it to the list is also quite costly.
> > Use an irq_work() handler to do the kfree_rcu(), and use llist (lockless
> > list) to add items to the list.
> 
> have been studying irq_work and llist... it will work, but it's quite
> costly too. Every kfree_rcu will be replaced with irq_work_queue(),
> which is irq_work_claim() with one lock_cmpxchg plus another
> lock_cmpxchg in llist_add, plus another lock_cmpxchg for our own llist
> of 'to be kfree_rcu-ed htab elements'. That's a lot.
> There must be a better solution. Need to explore more.

Do what I do in tracing. Use a bit (per cpu?) test.

Add the element to the list (that will be a cmpxchg, but I'm not sure
you can avoid it), then check the bit to see if the irq work has already
been activated. If not, then activate the irq work and set the bit.
Then you will not have any more cmpxchg in the fast path.

In your irq work handler, you clear the bit, process all the entries
until they are empty, check if the bit is set again, and repeat.
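
The scheme above can be sketched in plain userspace C. This is only a
single-threaded illustration under stated assumptions: C11 atomics
stand in for the kernel's llist, a counter stands in for
irq_work_queue(), and every name below (defer_free, work_handler,
work_armed, ...) is hypothetical rather than taken from the bpf or
tracing code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element type; in the thread this would be an htab element. */
struct node {
	struct node *next;
	int payload;
};

static _Atomic(struct node *) pending_list;	/* lockless LIFO list head */
static atomic_bool work_armed;			/* the "bit" from the mail */
static int work_queued_count;	/* times we would have queued the irq work */
static int processed_count;	/* elements the handler has drained */

/* Fast path: one cmpxchg loop to push, then a plain test of the bit. */
static void defer_free(struct node *n)
{
	struct node *head = atomic_load(&pending_list);

	do {
		n->next = head;
	} while (!atomic_compare_exchange_weak(&pending_list, &head, n));

	/* Arm the work only if it is not already armed. */
	if (!atomic_load(&work_armed)) {
		atomic_store(&work_armed, true);
		work_queued_count++;	/* stands in for irq_work_queue() */
	}
}

/* Handler: clear the bit, drain all entries, re-check the bit, repeat. */
static void work_handler(void)
{
	do {
		struct node *n;

		atomic_store(&work_armed, false);
		n = atomic_exchange(&pending_list, (struct node *)NULL);
		while (n) {
			struct node *next = n->next;

			processed_count++;	/* stands in for kfree_rcu(n) */
			n = next;
		}
	} while (atomic_load(&work_armed));
}
```

Pushing three elements before the handler runs arms the work once, so
only the list push pays for a cmpxchg on the later calls. A real
version would need the bit to be per cpu (or otherwise serialized),
since the test-then-set here is not atomic as a pair.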

I haven't looked at the thread before I was added to the Cc, so I'm
answering this out of context.

-- Steve


[PATCH v3 01/18] x86/tsc: Inline native_read_tsc and remove __native_read_tsc

2015-06-16 Thread Andy Lutomirski
In cdc7957d1954 ("x86: move native_read_tsc() offline"),
native_read_tsc was moved out of line, presumably for some
now-obsolete vDSO-related reason.  Undo it.

The entire "rdtsc; shl; or" sequence is only 11 bytes, and calls via
rdtscl and similar helpers were already inlined.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/vdso/vclock_gettime.c  | 2 +-
 arch/x86/include/asm/msr.h| 8 +++-
 arch/x86/include/asm/pvclock.h| 2 +-
 arch/x86/include/asm/stackprotector.h | 2 +-
 arch/x86/include/asm/tsc.h| 2 +-
 arch/x86/kernel/apb_timer.c   | 4 ++--
 arch/x86/kernel/tsc.c | 6 --
 7 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9793322751e0..972b488ac16a 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -186,7 +186,7 @@ notrace static cycle_t vread_tsc(void)
 * but no one has ever seen it happen.
 */
rdtsc_barrier();
-   ret = (cycle_t)__native_read_tsc();
+   ret = (cycle_t)native_read_tsc();
 
last = gtod->cycle_last;
 
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index e6a707eb5081..88711470af7f 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -106,12 +106,10 @@ notrace static inline int native_write_msr_safe(unsigned int msr,
return err;
 }
 
-extern unsigned long long native_read_tsc(void);
-
 extern int rdmsr_safe_regs(u32 regs[8]);
 extern int wrmsr_safe_regs(u32 regs[8]);
 
-static __always_inline unsigned long long __native_read_tsc(void)
+static __always_inline unsigned long long native_read_tsc(void)
 {
DECLARE_ARGS(val, low, high);
 
@@ -181,10 +179,10 @@ static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
 }
 
 #define rdtscl(low)\
-   ((low) = (u32)__native_read_tsc())
+   ((low) = (u32)native_read_tsc())
 
 #define rdtscll(val)   \
-   ((val) = __native_read_tsc())
+   ((val) = native_read_tsc())
 
 #define rdpmc(counter, low, high)  \
 do {   \
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index d6b078e9fa28..71bd485c2986 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -62,7 +62,7 @@ static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
 static __always_inline
 u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
 {
-   u64 delta = __native_read_tsc() - src->tsc_timestamp;
+   u64 delta = native_read_tsc() - src->tsc_timestamp;
return pvclock_scale_delta(delta, src->tsc_to_system_mul,
   src->tsc_shift);
 }
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index c2e00bb2a136..bc5fa2af112e 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -72,7 +72,7 @@ static __always_inline void boot_init_stack_canary(void)
 * on during the bootup the random pool has true entropy too.
 */
get_random_bytes(&canary, sizeof(canary));
-   tsc = __native_read_tsc();
+   tsc = native_read_tsc();
canary += tsc + (tsc << 32UL);
 
current->stack_canary = canary;
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 94605c0e9cee..fd11128faf25 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -42,7 +42,7 @@ static __always_inline cycles_t vget_cycles(void)
if (!cpu_has_tsc)
return 0;
 #endif
-   return (cycles_t)__native_read_tsc();
+   return (cycles_t)native_read_tsc();
 }
 
 extern void tsc_init(void);
diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c
index ede92c3364d3..9fe111cc50f8 100644
--- a/arch/x86/kernel/apb_timer.c
+++ b/arch/x86/kernel/apb_timer.c
@@ -390,13 +390,13 @@ unsigned long apbt_quick_calibrate(void)
old = dw_apb_clocksource_read(clocksource_apbt);
old += loop;
 
-   t1 = __native_read_tsc();
+   t1 = native_read_tsc();
 
do {
new = dw_apb_clocksource_read(clocksource_apbt);
} while (new < old);
 
-   t2 = __native_read_tsc();
+   t2 = native_read_tsc();
 
shift = 5;
if (unlikely(loop >> shift == 0)) {
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 505449700e0c..e7710cd7ba00 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -308,12 +308,6 @@ unsigned long long
 sched_clock(void) __attribute__((alias("native_sched_clock")));
 #endif
 
-unsigned long long native_read_tsc(void)
-{
-   return __native_read_tsc();
-}
-EXPORT_SYMBOL(native_read_tsc);
-
 int check_tsc_unstable(void)
 {
return tsc_unstable;
-- 

Re: [PATCH] regulator: qcom_spmi: Fix missing parenthesis

2015-06-16 Thread Stephen Boyd
On 06/16/2015 05:39 PM, Axel Lin wrote:
> Fix the logic in spmi_regulator_common_list_voltage() to return correct uV.
>
> Signed-off-by: Axel Lin 

There are two, and a patch has already been sent [1].

[1] https://patchwork.kernel.org/patch/6619381/

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: Suspicious RCU usage at boot w/ arm ipi trace events?

2015-06-16 Thread Stephen Boyd
On 06/16/2015 05:31 PM, Paul E. McKenney wrote:
> On Tue, Jun 16, 2015 at 05:21:59PM -0700, John Stultz wrote:
>> Not sure if I have the right folks on CC. Noticed this booting on a
>> apq8064 based system w/ 4.1-rc7 (plus a few patches that I believe are
>> unrelated).
>>
>>
>> [0.008796] CPU0: thread -1, cpu 0, socket 0, mpidr 8000
>> [0.009024] Setting up static identity map for 0x80208280 - 0x802082d8
>> [0.084781] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
>> [0.085119]
>> [0.085139] ===
>> [0.085152] [ INFO: suspicious RCU usage. ]
>> [0.085168] 4.1.0-rc7-00033-gb5bed2f #153 Not tainted
>> [0.085180] ---
>> [0.085195] include/trace/events/ipi.h:68 suspicious
>> rcu_dereference_check() usage!
>> [0.085205]
>> [0.085205] other info that might help us debug this:
>> [0.085205]
>> [0.085220]
>> [0.085220] RCU used illegally from idle CPU!
>> [0.085220] rcu_scheduler_active = 1, debug_locks = 0
>> [0.085235] RCU used illegally from extended quiescent state!
>> [0.085247] no locks held by swapper/0/0.
>> [0.085256]
>> [0.085256] stack backtrace:
>> [0.085275] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 4.1.0-rc7-00033-gb5bed2f #153
>> [0.085288] Hardware name: Qualcomm (Flattened Device Tree)
>> [0.085323] [] (unwind_backtrace) from []
>> (show_stack+0x10/0x14)
>> [0.085348] [] (show_stack) from []
>> (dump_stack+0x70/0xbc)
>> [0.085369] [] (dump_stack) from []
>> (handle_IPI+0x428/0x604)
>> [0.085393] [] (handle_IPI) from []
>> (gic_handle_irq+0x54/0x5c)
>> [0.085423] [] (gic_handle_irq) from []
>> (__irq_svc+0x44/0x7c)
>> [0.085436] Exception stack(0xc09f3f48 to 0xc09f3f90)
>> [0.085451] 3f40:   0001 0001 
>> c09f73b8 c09f4528 c0a5de9c
>> [0.085468] 3f60: c076b4f0   c09ef108 c0a5cec1
>> 0001  c09f3f90
>> [0.085481] 3f80: c026bf60 c0210ab8 2113 
>> [0.085512] [] (__irq_svc) from []
>> (arch_cpu_idle+0x20/0x3c)
>> [0.085549] [] (arch_cpu_idle) from []
>> (cpu_startup_entry+0x2c0/0x5dc)
>> [0.085576] [] (cpu_startup_entry) from []
>> (start_kernel+0x358/0x3c4)
>> [0.085600] [] (start_kernel) from [<8020807c>] (0x8020807c)
>>
>>
>> Let me know if there's any patches you'd like me to test, or if you'd
>> like config details.
> At first glance, looks like a failure to invoke irq_enter() from
> __irq_svc(), gic_handle_irq(), or handle_IPI().
>
>   Thanx, Paul
>

The tracepoint 'trace_ipi_entry' in handle_IPI() is using RCU and we
haven't called irq_enter() yet at that point. Does this tracepoint need
to have _rcuidle() added to it?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3 09/18] staging/lirc_serial: Remove TSC-based timing

2015-06-16 Thread Andy Lutomirski
It wasn't compiled in by default.  I suspect that the driver was and
still is broken, though -- it's calling udelay with a parameter
that's derived from loops_per_jiffy.

Cc: Jarod Wilson 
Cc: de...@driverdev.osuosl.org
Cc: Greg Kroah-Hartman 
Signed-off-by: Andy Lutomirski 
---
 drivers/staging/media/lirc/lirc_serial.c | 63 ++--
 1 file changed, 4 insertions(+), 59 deletions(-)

diff --git a/drivers/staging/media/lirc/lirc_serial.c b/drivers/staging/media/lirc/lirc_serial.c
index dc7984455c3a..465796a686c4 100644
--- a/drivers/staging/media/lirc/lirc_serial.c
+++ b/drivers/staging/media/lirc/lirc_serial.c
@@ -327,9 +327,6 @@ static void safe_udelay(unsigned long usecs)
  * time
  */
 
-/* So send_pulse can quickly convert microseconds to clocks */
-static unsigned long conv_us_to_clocks;
-
 static int init_timing_params(unsigned int new_duty_cycle,
unsigned int new_freq)
 {
@@ -344,7 +341,6 @@ static int init_timing_params(unsigned int new_duty_cycle,
/* How many clocks in a microsecond?, avoiding long long divide */
work = loops_per_sec;
work *= 4295;  /* 4295 = 2^32 / 1e6 */
-   conv_us_to_clocks = work >> 32;
 
/*
 * Carrier period in clocks, approach good up to 32GHz clock,
@@ -357,10 +353,9 @@ static int init_timing_params(unsigned int new_duty_cycle,
pulse_width = period * duty_cycle / 100;
space_width = period - pulse_width;
dprintk("in init_timing_params, freq=%d, duty_cycle=%d, "
-   "clk/jiffy=%ld, pulse=%ld, space=%ld, "
-   "conv_us_to_clocks=%ld\n",
+   "clk/jiffy=%ld, pulse=%ld, space=%ld\n",
freq, duty_cycle, __this_cpu_read(cpu_info.loops_per_jiffy),
-   pulse_width, space_width, conv_us_to_clocks);
+   pulse_width, space_width);
return 0;
 }
 #else /* ! USE_RDTSC */
@@ -431,63 +426,14 @@ static long send_pulse_irdeo(unsigned long length)
return ret;
 }
 
-#ifdef USE_RDTSC
-/* Version that uses Pentium rdtsc instruction to measure clocks */
-
-/*
- * This version does sub-microsecond timing using rdtsc instruction,
- * and does away with the fudged LIRC_SERIAL_TRANSMITTER_LATENCY
- * Implicitly i586 architecture...  - Steve
- */
-
-static long send_pulse_homebrew_softcarrier(unsigned long length)
-{
-   int flag;
-   unsigned long target, start, now;
-
-   /* Get going quick as we can */
-   rdtscl(start);
-   on();
-   /* Convert length from microseconds to clocks */
-   length *= conv_us_to_clocks;
-   /* And loop till time is up - flipping at right intervals */
-   now = start;
-   target = pulse_width;
-   flag = 1;
-   /*
-* FIXME: This looks like a hard busy wait, without even an occasional,
-* polite, cpu_relax() call.  There's got to be a better way?
-*
-* The i2c code has the result of a lot of bit-banging work, I wonder if
-* there's something there which could be helpful here.
-*/
-   while ((now - start) < length) {
-   /* Delay till flip time */
-   do {
-   rdtscl(now);
-   } while ((now - start) < target);
-
-   /* flip */
-   if (flag) {
-   rdtscl(now);
-   off();
-   target += space_width;
-   } else {
-   rdtscl(now); on();
-   target += pulse_width;
-   }
-   flag = !flag;
-   }
-   rdtscl(now);
-   return ((now - start) - length) / conv_us_to_clocks;
-}
-#else /* ! USE_RDTSC */
 /* Version using udelay() */
 
 /*
  * here we use fixed point arithmetic, with 8
  * fractional bits.  that gets us within 0.1% or so of the right average
  * frequency, albeit with some jitter in pulse length - Steve
+ *
+ * This should use ndelay instead.
  */
 
 /* To match 8 fractional bits used for pulse/space length */
@@ -520,7 +466,6 @@ static long send_pulse_homebrew_softcarrier(unsigned long length)
}
return (actual-length) >> 8;
 }
-#endif /* USE_RDTSC */
 
 static long send_pulse_homebrew(unsigned long length)
 {
-- 
2.4.2



[PATCH v3 07/18] x86/cpu/amd: Use the full 64-bit TSC to detect the 2.6.2 bug

2015-06-16 Thread Andy Lutomirski
This code is timing 100k indirect calls, so the added overhead of
counting the number of cycles elapsed as a 64-bit number should be
insignificant.  Drop the optimization of using a 32-bit count.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/cpu/amd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 5bd3a99dc20b..c5ceec532799 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -107,7 +107,7 @@ static void init_amd_k6(struct cpuinfo_x86 *c)
const int K6_BUG_LOOP = 100;
int n;
void (*f_vide)(void);
-   unsigned long d, d2;
+   u64 d, d2;
 
printk(KERN_INFO "AMD K6 stepping B detected - ");
 
@@ -118,10 +118,10 @@ static void init_amd_k6(struct cpuinfo_x86 *c)
 
n = K6_BUG_LOOP;
f_vide = vide;
-   rdtscl(d);
+   d = native_read_tsc();
while (n--)
f_vide();
-   rdtscl(d2);
+   d2 = native_read_tsc();
d = d2-d;
 
if (d > 20*K6_BUG_LOOP)
-- 
2.4.2



[PATCH v3 08/18] baycom_epp: Replace rdtscl() with native_read_tsc()

2015-06-16 Thread Andy Lutomirski
This is only used if BAYCOM_DEBUG is defined.

Cc: walter harms 
Cc: Ralf Baechle 
Cc: Thomas Sailer 
Cc: linux-h...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---

I'm hoping for an ack for this to go through -tip.

 drivers/net/hamradio/baycom_epp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hamradio/baycom_epp.c b/drivers/net/hamradio/baycom_epp.c
index 83c7cce0d172..44e5c3b5e0af 100644
--- a/drivers/net/hamradio/baycom_epp.c
+++ b/drivers/net/hamradio/baycom_epp.c
@@ -638,7 +638,7 @@ static int receive(struct net_device *dev, int cnt)
 #define GETTICK(x)\
 ({\
if (cpu_has_tsc)  \
-   rdtscl(x);\
+   x = (unsigned int)native_read_tsc();  \
 })
 #else /* __i386__ */
 #define GETTICK(x)
-- 
2.4.2



[PATCH v3 04/18] x86/tsc: Replace rdtscll with native_read_tsc

2015-06-16 Thread Andy Lutomirski
Now that the read_tsc paravirt hook is gone, rdtscll() is just a
wrapper around native_read_tsc().  Unwrap it.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/boot/compressed/aslr.c  | 2 +-
 arch/x86/include/asm/msr.h   | 3 ---
 arch/x86/include/asm/tsc.h   | 5 +
 arch/x86/kernel/apb_timer.c  | 4 ++--
 arch/x86/kernel/apic/apic.c  | 8 
 arch/x86/kernel/cpu/mcheck/mce.c | 4 ++--
 arch/x86/kernel/espfix_64.c  | 2 +-
 arch/x86/kernel/hpet.c   | 4 ++--
 arch/x86/kernel/trace_clock.c| 2 +-
 arch/x86/kernel/tsc.c| 4 ++--
 arch/x86/kvm/vmx.c   | 2 +-
 arch/x86/lib/delay.c | 2 +-
 drivers/thermal/intel_powerclamp.c   | 4 ++--
 tools/power/cpupower/debug/kernel/cpufreq-test_tsc.c | 4 ++--
 14 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index d7b1f655b3ef..ea33236190b1 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -82,7 +82,7 @@ static unsigned long get_random_long(void)
 
if (has_cpuflag(X86_FEATURE_TSC)) {
debug_putstr(" RDTSC");
-   rdtscll(raw);
+   raw = native_read_tsc();
 
random ^= raw;
use_i8254 = false;
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index d1afac7df484..7273b74e0f99 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -192,9 +192,6 @@ do {
\
 #define rdtscl(low)\
((low) = (u32)native_read_tsc())
 
-#define rdtscll(val)   \
-   ((val) = native_read_tsc())
-
 #define rdtscp(low, high, aux) \
 do {\
unsigned long long _val = native_read_tscp(&(aux)); \
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 3da1cc1218ac..b4883902948b 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -21,15 +21,12 @@ extern void disable_TSC(void);
 
 static inline cycles_t get_cycles(void)
 {
-   unsigned long long ret = 0;
-
 #ifndef CONFIG_X86_TSC
if (!cpu_has_tsc)
return 0;
 #endif
-   rdtscll(ret);
 
-   return ret;
+   return native_read_tsc();
 }
 
 extern void tsc_init(void);
diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c
index 9fe111cc50f8..25efa534c4e4 100644
--- a/arch/x86/kernel/apb_timer.c
+++ b/arch/x86/kernel/apb_timer.c
@@ -263,7 +263,7 @@ static int apbt_clocksource_register(void)
 
/* Verify whether apbt counter works */
t1 = dw_apb_clocksource_read(clocksource_apbt);
-   rdtscll(start);
+   start = native_read_tsc();
 
/*
 * We don't know the TSC frequency yet, but waiting for
@@ -273,7 +273,7 @@ static int apbt_clocksource_register(void)
 */
do {
rep_nop();
-   rdtscll(now);
+   now = native_read_tsc();
} while ((now - start) < 20UL);
 
/* APBT is the only always on clocksource, it has to work! */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb52850a28f..51af1ed1ae2e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -457,7 +457,7 @@ static int lapic_next_deadline(unsigned long delta,
 {
u64 tsc;
 
-   rdtscll(tsc);
+   tsc = native_read_tsc();
wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
return 0;
 }
@@ -592,7 +592,7 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
unsigned long pm = acpi_pm_read_early();
 
if (cpu_has_tsc)
-   rdtscll(tsc);
+   tsc = native_read_tsc();
 
switch (lapic_cal_loops++) {
case 0:
@@ -1209,7 +1209,7 @@ void setup_local_APIC(void)
long long max_loops = cpu_khz ? cpu_khz : 100;
 
if (cpu_has_tsc)
-   rdtscll(tsc);
+   tsc = native_read_tsc();
 
if (disable_apic) {
disable_ioapic_support();
@@ -1293,7 +1293,7 @@ void setup_local_APIC(void)
}
if (queued) {
if (cpu_has_tsc && cpu_khz) {
-   rdtscll(ntsc);
+   ntsc = native_read_tsc();
max_loops = (cpu_khz << 10) - (ntsc - tsc);
} else
max_loops--;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 

[PATCH v3 05/18] x86/tsc: Remove the rdtscp and rdtscpll macros

2015-06-16 Thread Andy Lutomirski
They have no users.  Leave native_read_tscp, which seems potentially
useful despite also having no callers.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/msr.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 7273b74e0f99..626f78199665 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -192,15 +192,6 @@ do {   
\
 #define rdtscl(low)\
((low) = (u32)native_read_tsc())
 
-#define rdtscp(low, high, aux) \
-do {\
-   unsigned long long _val = native_read_tscp(&(aux)); \
-   (low) = (u32)_val;  \
-   (high) = (u32)(_val >> 32); \
-} while (0)
-
-#define rdtscpll(val, aux) (val) = native_read_tscp(&(aux))
-
 /*
  * 64-bit version of wrmsr_safe():
  */
-- 
2.4.2



[PATCH v3 12/18] x86/tsc: Remove rdtscl()

2015-06-16 Thread Andy Lutomirski
It has no more callers, and it was never a very sensible interface
to begin with.  Users of the TSC should either read all 64 bits or
explicitly throw out the high bits.
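
As a small illustration of that distinction (a sketch only, with
hypothetical helper names and plain integers standing in for TSC
reads):

```c
#include <stdint.h>

/* Elapsed cycles computed from full 64-bit counter values. */
static uint64_t elapsed64(uint64_t start, uint64_t end)
{
	return end - start;
}

/*
 * Elapsed cycles when the caller explicitly keeps only the low 32 bits.
 * Unsigned subtraction wraps modulo 2^32, so short intervals that cross
 * a 32-bit boundary still come out right, but any interval of 2^32
 * cycles or more is silently reduced modulo 2^32.
 */
static uint32_t elapsed32(uint64_t start, uint64_t end)
{
	return (uint32_t)end - (uint32_t)start;
}
```

With the cast spelled out at the call site, the truncation is visible
to the reader instead of being hidden inside a macro.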

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/msr.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 626f78199665..c89ed6ceed02 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -189,9 +189,6 @@ do {
\
 
 #endif /* !CONFIG_PARAVIRT */
 
-#define rdtscl(low)\
-   ((low) = (u32)native_read_tsc())
-
 /*
  * 64-bit version of wrmsr_safe():
  */
-- 
2.4.2



Re: [PATCH] ipmi: Convert the IPMI SI ACPI handling to a platform device

2015-06-16 Thread Corey Minyard
Ping: would anyone on the ACPI list care to comment on this?

I should give some history here.  The IPMI spec specifies four different
IPMI interfaces using the IPI0001 ID.  Three of the interfaces are close
enough that they are in the same driver, since they all have about the
same behavior and the differences can be handled by a small state
machine.  However, one interface is quite different since it runs over
SMBus, so I wrote a different driver for it.

IPI0001 was hard-coded in the acpi_pnp_device_ids table, and that was
preventing the SMBus interface from finding IPI0001, even when it was
inside the SMBus ACPI path.  However, removing the direct PNP
interfaces and instead using the platform interface allowed the removal
of IPI0001 from the acpi_pnp_device_ids table, and that seems to allow
both the SMBus and platform IPMI interface to work properly.

I have two specific questions:

  * Is having two device drivers handle the same id like this right?

  * Will automatic module loading do the right thing here?  To work right,
it would have to load both drivers and succeed if one of the drivers
succeeded.


There is a proposal on the table to create a new device id for the SMBus
interface, but if that could be avoided it would be better, I think.

Thanks,

-corey

On 06/10/2015 07:34 AM, miny...@acm.org wrote:
> From: Corey Minyard 
>
> The IPMI SI driver was using direct PNP, but that was not really
> ideal because the IPMI device is a platform device.  There was
> some special handling in the acpi_pnp.c code for making this work,
> but that was breaking ACPI handling for the IPMI SSIF driver.
>
> So use a platform device for ACPI detection and remove the
> entry from acpi_pnp.c.
>
> Signed-off-by: Corey Minyard 
> ---
>  drivers/acpi/acpi_pnp.c  |   2 -
>  drivers/char/ipmi/ipmi_si_intf.c | 320 +++
>  2 files changed, 157 insertions(+), 165 deletions(-)
>
>
> I'm posting this, primarily for comments from the ACPI group, since I'm
> removing IPI0001 from the acpi_pnp_device_ids array, and to be sure that
> I'm doing this correctly.
>
> Also, anyone feel free to test this, obviously :).
>
> Thanks,
>
> -corey
>
>
> diff --git a/drivers/acpi/acpi_pnp.c b/drivers/acpi/acpi_pnp.c
> index ff6d8ad..896351be 100644
> --- a/drivers/acpi/acpi_pnp.c
> +++ b/drivers/acpi/acpi_pnp.c
> @@ -19,8 +19,6 @@ static const struct acpi_device_id acpi_pnp_device_ids[] = {
>   {"PNP0600"},/* Generic ESDI/IDE/ATA compatible hard disk controller */
>   /* floppy */
>   {"PNP0700"},
> - /* ipmi_si */
> - {"IPI0001"},
>   /* tpm_inf_pnp */
>   {"IFX0101"},/* Infineon TPMs */
>   {"IFX0102"},/* Infineon TPMs */
> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index e8b406b..7a93574 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -64,7 +64,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -309,9 +308,6 @@ static int num_force_kipmid;
>  #ifdef CONFIG_PCI
>  static bool pci_registered;
>  #endif
> -#ifdef CONFIG_ACPI
> -static bool pnp_registered;
> -#endif
>  #ifdef CONFIG_PARISC
>  static bool parisc_registered;
>  #endif
> @@ -2233,134 +2229,6 @@ static void spmi_find_bmc(void)
>   try_init_spmi(spmi);
>   }
>  }
> -
> -static int ipmi_pnp_probe(struct pnp_dev *dev,
> - const struct pnp_device_id *dev_id)
> -{
> - struct acpi_device *acpi_dev;
> - struct smi_info *info;
> - struct resource *res, *res_second;
> - acpi_handle handle;
> - acpi_status status;
> - unsigned long long tmp;
> - int rv = -EINVAL;
> -
> - acpi_dev = pnp_acpi_device(dev);
> - if (!acpi_dev)
> - return -ENODEV;
> -
> - info = smi_info_alloc();
> - if (!info)
> - return -ENOMEM;
> -
> - info->addr_source = SI_ACPI;
> - printk(KERN_INFO PFX "probing via ACPI\n");
> -
> - handle = acpi_dev->handle;
> - info->addr_info.acpi_info.acpi_handle = handle;
> -
> - /* _IFT tells us the interface type: KCS, BT, etc */
> - status = acpi_evaluate_integer(handle, "_IFT", NULL, &tmp);
> - if (ACPI_FAILURE(status)) {
> - dev_err(&dev->dev, "Could not find ACPI IPMI interface type\n");
> - goto err_free;
> - }
> -
> - switch (tmp) {
> - case 1:
> - info->si_type = SI_KCS;
> - break;
> - case 2:
> - info->si_type = SI_SMIC;
> - break;
> - case 3:
> - info->si_type = SI_BT;
> - break;
> - case 4: /* SSIF, just ignore */
> - rv = -ENODEV;
> - goto err_free;
> - default:
> - dev_info(&dev->dev, "unknown IPMI type %lld\n", tmp);
> - goto err_free;
> - }
> -
> - res = pnp_get_resource(dev, IORESOURCE_IO, 0);
> - if (res) 

[PATCH v3 11/18] drivers/input/gameport: Replace rdtscl() with native_read_tsc()

2015-06-16 Thread Andy Lutomirski
It's unclear to me why this code exists in the first place.

Acked-by: Dmitry Torokhov 
Cc: linux-in...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
 drivers/input/gameport/gameport.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/gameport/gameport.c b/drivers/input/gameport/gameport.c
index e853a2134680..abc0cb22e750 100644
--- a/drivers/input/gameport/gameport.c
+++ b/drivers/input/gameport/gameport.c
@@ -149,9 +149,9 @@ static int old_gameport_measure_speed(struct gameport *gameport)
 
for(i = 0; i < 50; i++) {
local_irq_save(flags);
-   rdtscl(t1);
+   t1 = native_read_tsc();
for (t = 0; t < 50; t++) gameport_read(gameport);
-   rdtscl(t2);
+   t2 = native_read_tsc();
local_irq_restore(flags);
udelay(i * 10);
if (t2 - t1 < tx) tx = t2 - t1;
-- 
2.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] regulator: qcom_spmi: Fix missing parenthesis

2015-06-16 Thread Axel Lin
Fix the logic in spmi_regulator_common_list_voltage() to return correct uV.
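
To see why the missing braces matter, here is a minimal standalone sketch
of the selector walk.  struct range, both function names and any numbers
fed to them are made up for illustration and are not the driver's
definitions; only the shape of the loop follows the hunk in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one voltage range of the regulator. */
struct range {
	int set_point_min_uV;
	int step_uV;
	unsigned int n_voltages;
};

/* Fixed form: break only once the selector falls inside range i. */
int list_voltage(const struct range *r, size_t count, unsigned int selector)
{
	int uV = 0;
	size_t i;

	assert(r != NULL);
	for (i = 0; i < count; i++) {
		if (selector < r[i].n_voltages) {
			uV = (int)selector * r[i].step_uV + r[i].set_point_min_uV;
			break;
		}
		/* Not in this range: make the selector relative to the next one. */
		selector -= r[i].n_voltages;
	}
	return uV;
}

/* Pre-patch form: without braces the break belongs to the loop body. */
int list_voltage_buggy(const struct range *r, size_t count, unsigned int selector)
{
	int uV = 0;
	size_t i;

	assert(r != NULL);
	for (i = 0; i < count; i++) {
		if (selector < r[i].n_voltages)
			uV = (int)selector * r[i].step_uV + r[i].set_point_min_uV;
		break;	/* unconditional, so only range 0 is ever examined */

		selector -= r[i].n_voltages;	/* never reached */
	}
	return uV;
}
```

With two ranges of four and three set points, a selector of 5 belongs to
the second range.  The fixed loop reaches it; the pre-patch loop leaves
after the first iteration and returns 0.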

Signed-off-by: Axel Lin 
---
 drivers/regulator/qcom_spmi-regulator.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 162b865..1cc435e 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -747,10 +747,11 @@ static int spmi_regulator_common_list_voltage(struct regulator_dev *rdev,
return 0;
 
for (i = 0; i < vreg->set_points->count; i++) {
-   if (selector < vreg->set_points->range[i].n_voltages)
+   if (selector < vreg->set_points->range[i].n_voltages) {
uV = selector * vreg->set_points->range[i].step_uV
+ vreg->set_points->range[i].set_point_min_uV;
break;
+   }
 
selector -= vreg->set_points->range[i].n_voltages;
}
-- 
2.1.0





Re: [PATCH 0/3] aio: ctx->dead cleanups

2015-06-16 Thread Al Viro
On Wed, Jun 17, 2015 at 01:04:14AM +0200, Oleg Nesterov wrote:
> Al, please help. We are trying to backport some aio fixes and I am
> absolutely confused by your b2edffdd912b "fix mremap() vs. ioctx_kill()
> race".
> 
> 
> Firstly, I simply can't understand what exactly it tries to fix. OK,
> aio_free_ring() can race with kill and we can remap the soon-to-be-killed
> ctx. So what? kill_ioctx() will see the correct (already re-mapped)
> ctx->mmap_base after it drops mm->ioctx_lock.

Huh?  kill_ioctx() picks ctx->mmap_base and passes it to vm_munmap().
Which tries to grab mmap_sem, blocks for mremap() from another thread
and waits for it to drop mmap_sem.  By that time ctx->mmap_base has
nothing whatsoever to do with the argument we'd passed to vm_munmap().  Sure,
it had been recalculated by aio_ring_remap(), but it's too late for
us - we'd already fetched the old value.
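
The ordering described above can be replayed in a single-threaded
sketch.  This is only a model of the race under stated assumptions:
struct ctx, remap() and kill_unfixed() are invented names, and the
blocking on mmap_sem is represented simply by running the remap between
the fetch and the use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one context field that matters here. */
struct ctx {
	unsigned long mmap_base;
};

/* Models the remap path: it updates the base while the killer is blocked. */
void remap(struct ctx *c, unsigned long new_base)
{
	assert(c != NULL);
	c->mmap_base = new_base;
}

/* Returns the address the pre-fix kill path would hand to the unmap call. */
unsigned long kill_unfixed(struct ctx *c, unsigned long remap_to)
{
	unsigned long base;

	assert(c != NULL);
	base = c->mmap_base;	/* fetched before the lock is taken */
	remap(c, remap_to);	/* the other thread wins the race */
	return base;		/* stale: no longer where the ring lives */
}
```

The value returned is the old base even though the context already
holds the new one, which is the "too late for us" case.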


[PATCH v3 10/18] input/joystick/analog: Switch from rdtscl() to native_read_tsc()

2015-06-16 Thread Andy Lutomirski
This timing code is hideous, and this doesn't help.  It gets rid of
one of the last users of rdtscl, though.

Acked-by: Dmitry Torokhov 
Cc: linux-in...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
 drivers/input/joystick/analog.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/joystick/analog.c b/drivers/input/joystick/analog.c
index 4284080e481d..f871b4f00056 100644
--- a/drivers/input/joystick/analog.c
+++ b/drivers/input/joystick/analog.c
@@ -143,7 +143,7 @@ struct analog_port {
 
 #include 
 
-#define GET_TIME(x) do { if (cpu_has_tsc) rdtscl(x); else x = get_time_pit(); } while (0)
+#define GET_TIME(x) do { if (cpu_has_tsc) x = (unsigned int)native_read_tsc(); else x = get_time_pit(); } while (0)
 #define DELTA(x,y) (cpu_has_tsc ? ((y) - (x)) : ((x) - (y) + ((x) < (y) ? PIT_TICK_RATE / HZ : 0)))
 #define TIME_NAME  (cpu_has_tsc?"TSC":"PIT")
 static unsigned int get_time_pit(void)
@@ -160,7 +160,7 @@ static unsigned int get_time_pit(void)
 return count;
 }
 #elif defined(__x86_64__)
-#define GET_TIME(x)rdtscl(x)
+#define GET_TIME(x)do { x = (unsigned int)native_read_tsc(); } while (0)
 #define DELTA(x,y) ((y)-(x))
 #define TIME_NAME  "TSC"
 #elif defined(__alpha__) || defined(CONFIG_MN10300) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_TILE)
-- 
2.4.2



[PATCH v3 16/18] x86/tsc: In read_tsc, use rdtsc_ordered() instead of get_cycles()

2015-06-16 Thread Andy Lutomirski
There are two logical changes here.  First, this removes a check for
cpu_has_tsc.  That check is unnecessary, as we don't register the
TSC as a clocksource on systems that have no TSC.  Second, it adds a
barrier, thus preventing observable non-monotonicity.

I suspect that the missing barrier was never a problem in practice
because system calls themselves were heavy enough barriers to
prevent user code from observing time warps due to speculation.
(Without the corresponding barrier in the vDSO, however,
non-monotonicity is easy to detect.)

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/tsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 21d6e04e3e82..451bade0d320 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -961,7 +961,7 @@ static struct clocksource clocksource_tsc;
  */
 static cycle_t read_tsc(struct clocksource *cs)
 {
-   return (cycle_t)get_cycles();
+   return (cycle_t)rdtsc_ordered();
 }
 
 /*
-- 
2.4.2



[PATCH v3 13/18] x86/tsc: Rename native_read_tsc() to rdtsc()

2015-06-16 Thread Andy Lutomirski
Now that there is no paravirt TSC, the "native" is inappropriate.
The function does RDTSC, so give it the obvious name: rdtsc()

Suggested-by: Borislav Petkov 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/boot/compressed/aslr.c  |  2 +-
 arch/x86/entry/vdso/vclock_gettime.c |  2 +-
 arch/x86/include/asm/msr.h   | 11 ++-
 arch/x86/include/asm/pvclock.h   |  2 +-
 arch/x86/include/asm/stackprotector.h|  2 +-
 arch/x86/include/asm/tsc.h   |  2 +-
 arch/x86/kernel/apb_timer.c  |  8 
 arch/x86/kernel/apic/apic.c  |  8 
 arch/x86/kernel/cpu/amd.c|  4 ++--
 arch/x86/kernel/cpu/mcheck/mce.c |  4 ++--
 arch/x86/kernel/espfix_64.c  |  2 +-
 arch/x86/kernel/hpet.c   |  4 ++--
 arch/x86/kernel/trace_clock.c|  2 +-
 arch/x86/kernel/tsc.c|  4 ++--
 arch/x86/kvm/lapic.c |  4 ++--
 arch/x86/kvm/svm.c   |  4 ++--
 arch/x86/kvm/vmx.c   |  4 ++--
 arch/x86/kvm/x86.c   | 12 ++--
 arch/x86/lib/delay.c |  8 
 drivers/input/gameport/gameport.c|  4 ++--
 drivers/input/joystick/analog.c  |  4 ++--
 drivers/net/hamradio/baycom_epp.c|  2 +-
 drivers/thermal/intel_powerclamp.c   |  4 ++--
 tools/power/cpupower/debug/kernel/cpufreq-test_tsc.c |  4 ++--
 24 files changed, 58 insertions(+), 49 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index ea33236190b1..6a9b96b4624d 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -82,7 +82,7 @@ static unsigned long get_random_long(void)
 
if (has_cpuflag(X86_FEATURE_TSC)) {
debug_putstr(" RDTSC");
-   raw = native_read_tsc();
+   raw = rdtsc();
 
random ^= raw;
use_i8254 = false;
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 972b488ac16a..0340d93c18ca 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -186,7 +186,7 @@ notrace static cycle_t vread_tsc(void)
 * but no one has ever seen it happen.
 */
rdtsc_barrier();
-   ret = (cycle_t)native_read_tsc();
+   ret = (cycle_t)rdtsc();
 
last = gtod->cycle_last;
 
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index c89ed6ceed02..ff0c120dafe5 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -109,7 +109,16 @@ notrace static inline int native_write_msr_safe(unsigned int msr,
 extern int rdmsr_safe_regs(u32 regs[8]);
 extern int wrmsr_safe_regs(u32 regs[8]);
 
-static __always_inline unsigned long long native_read_tsc(void)
+/**
+ * rdtsc() - returns the current TSC without ordering constraints
+ *
+ * rdtsc() returns the result of RDTSC as a 64-bit integer.  The
+ * only ordering constraint it supplies is the ordering implied by
+ * "asm volatile": it will put the RDTSC in the place you expect.  The
+ * CPU can and will speculatively execute that RDTSC, though, so the
+ * results can be non-monotonic if compared on different CPUs.
+ */
+static __always_inline unsigned long long rdtsc(void)
 {
DECLARE_ARGS(val, low, high);
 
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 71bd485c2986..6084bce345fc 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -62,7 +62,7 @@ static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
 static __always_inline
 u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
 {
-   u64 delta = native_read_tsc() - src->tsc_timestamp;
+   u64 delta = rdtsc() - src->tsc_timestamp;
return pvclock_scale_delta(delta, src->tsc_to_system_mul,
   src->tsc_shift);
 }
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index bc5fa2af112e..58505f01962f 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -72,7 +72,7 @@ static __always_inline void boot_init_stack_canary(void)
 * on during the bootup the random pool has true entropy too.
 */
get_random_bytes(&canary, sizeof(canary));
-   tsc = native_read_tsc();
+   tsc = rdtsc();
canary += tsc + (tsc << 32UL);
 
current->stack_canary = canary;
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index b4883902948b..3df7675debcf 100644
--- a/arch/x86/include/asm/tsc.h
+++ 

[PATCH v3 15/18] x86/tsc: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers

2015-06-16 Thread Andy Lutomirski
Using get_cycles was unnecessary: check_tsc_warp() is not called on
TSC-less systems.  Replace rdtsc_barrier(); get_cycles() with
rdtsc_ordered().

While we're at it, make the somewhat more dangerous change of
removing the rdtsc_barrier() after RDTSC in the TSC warp check
code.  This should be okay, though -- the vDSO TSC code doesn't have
that barrier, so, if removing the barrier from the warp check would
cause us to detect a warp that we otherwise wouldn't detect, then we
have a genuine bug.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/tsc_sync.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index dd8d0791dfb5..78083bf23ed1 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -39,16 +39,15 @@ static cycles_t max_warp;
 static int nr_warps;
 
 /*
- * TSC-warp measurement loop running on both CPUs:
+ * TSC-warp measurement loop running on both CPUs.  This is not called
+ * if there is no TSC.
  */
 static void check_tsc_warp(unsigned int timeout)
 {
cycles_t start, now, prev, end;
int i;
 
-   rdtsc_barrier();
-   start = get_cycles();
-   rdtsc_barrier();
+   start = rdtsc_ordered();
/*
 * The measurement runs for 'timeout' msecs:
 */
@@ -63,9 +62,7 @@ static void check_tsc_warp(unsigned int timeout)
 */
arch_spin_lock(&sync_lock);
prev = last_tsc;
-   rdtsc_barrier();
-   now = get_cycles();
-   rdtsc_barrier();
+   now = rdtsc_ordered();
last_tsc = now;
arch_spin_unlock(&sync_lock);
 
@@ -126,7 +123,7 @@ void check_tsc_sync_source(int cpu)
 
/*
 * No need to check if we already know that the TSC is not
-* synchronized:
+* synchronized or if we have no TSC.
 */
if (unsynchronized_tsc())
return;
@@ -190,6 +187,7 @@ void check_tsc_sync_target(void)
 {
int cpus = 2;
 
+   /* Also aborts if there is no TSC. */
if (unsynchronized_tsc() || tsc_clocksource_reliable)
return;
 
-- 
2.4.2



[PATCH v3 14/18] x86: Add rdtsc_ordered() and use it in trivial call sites

2015-06-16 Thread Andy Lutomirski
rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires
more thought than should be necessary.  Add an rdtsc_ordered()
helper and replace the trivial call sites with it.

This should not change generated code.  The duplication of the fence
asm is temporary.
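
For readers who want to try the idea outside the kernel, a rough
userspace sketch of the same "fence, then RDTSC" pattern follows.  It is
not the kernel helper: rdtsc_ordered_sketch() and cycles_for() are
invented names, the fence is hard-coded to LFENCE (the kernel picks
MFENCE or LFENCE per CPU through alternatives), and builds that are not
x86-64 fall back to clock() purely so the file still compiles.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__)
#include <x86intrin.h>
#endif

/* Reads a cycle counter after a fence so earlier loads are not passed. */
static inline uint64_t rdtsc_ordered_sketch(void)
{
#if defined(__x86_64__)
	_mm_lfence();		/* assumption: LFENCE is the right fence on this CPU */
	return __rdtsc();	/* raw, otherwise unordered RDTSC */
#else
	/* Placeholder for other targets: CPU time in clock ticks, not TSC cycles. */
	return (uint64_t)clock();
#endif
}

/* Measures a caller-supplied function; meaningful on one CPU only. */
uint64_t cycles_for(void (*fn)(void))
{
	uint64_t start, end;

	start = rdtsc_ordered_sketch();
	fn();
	end = rdtsc_ordered_sketch();
	assert(end >= start);	/* assumes no migration to an unsynced TSC */
	return end - start;
}
```

Calling the fence and the read through one helper is the point of the
patch: a caller cannot forget the barrier because there is no separate
barrier call to forget.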

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/vdso/vclock_gettime.c | 16 ++--
 arch/x86/include/asm/msr.h   | 26 ++
 arch/x86/kernel/trace_clock.c|  7 +--
 arch/x86/kvm/x86.c   | 16 ++--
 arch/x86/lib/delay.c |  9 +++--
 5 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 0340d93c18ca..ca94fa649251 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -175,20 +175,8 @@ static notrace cycle_t vread_pvclock(int *mode)
 
 notrace static cycle_t vread_tsc(void)
 {
-   cycle_t ret;
-   u64 last;
-
-   /*
-* Empirically, a fence (of type that depends on the CPU)
-* before rdtsc is enough to ensure that rdtsc is ordered
-* with respect to loads.  The various CPU manuals are unclear
-* as to whether rdtsc can be reordered with later loads,
-* but no one has ever seen it happen.
-*/
-   rdtsc_barrier();
-   ret = (cycle_t)rdtsc();
-
-   last = gtod->cycle_last;
+   cycle_t ret = (cycle_t)rdtsc_ordered();
+   u64 last = gtod->cycle_last;
 
if (likely(ret >= last))
return ret;
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index ff0c120dafe5..02bdd6c65017 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -127,6 +127,32 @@ static __always_inline unsigned long long rdtsc(void)
return EAX_EDX_VAL(val, low, high);
 }
 
+/**
+ * rdtsc_ordered() - read the current TSC in program order
+ *
+ * rdtsc_ordered() returns the result of RDTSC as a 64-bit integer.
+ * It is ordered like a load to a global in-memory counter.  It should
+ * be impossible to observe non-monotonic rdtsc_unordered() behavior
+ * across multiple CPUs as long as the TSC is synced.
+ */
+static __always_inline unsigned long long rdtsc_ordered(void)
+{
+   /*
+* The RDTSC instruction is not ordered relative to memory
+* access.  The Intel SDM and the AMD APM are both vague on this
+* point, but empirically an RDTSC instruction can be
+* speculatively executed before prior loads.  An RDTSC
+* immediately after an appropriate barrier appears to be
+* ordered as a normal load, that is, it provides the same
+* ordering guarantees as reading from a global memory location
+* that some other imaginary CPU is updating continuously with a
+* time stamp.
+*/
+   alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
+ "lfence", X86_FEATURE_LFENCE_RDTSC);
+   return rdtsc();
+}
+
 static inline unsigned long long native_read_pmc(int counter)
 {
DECLARE_ARGS(val, low, high);
diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
index 67efb8c96fc4..80bb24d9b880 100644
--- a/arch/x86/kernel/trace_clock.c
+++ b/arch/x86/kernel/trace_clock.c
@@ -12,10 +12,5 @@
  */
 u64 notrace trace_clock_x86_tsc(void)
 {
-   u64 ret;
-
-   rdtsc_barrier();
-   ret = rdtsc();
-
-   return ret;
+   return rdtsc_ordered();
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0afdc74c28a..dfccaf2f2e00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1419,20 +1419,8 @@ EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
 static cycle_t read_tsc(void)
 {
-   cycle_t ret;
-   u64 last;
-
-   /*
-* Empirically, a fence (of type that depends on the CPU)
-* before rdtsc is enough to ensure that rdtsc is ordered
-* with respect to loads.  The various CPU manuals are unclear
-* as to whether rdtsc can be reordered with later loads,
-* but no one has ever seen it happen.
-*/
-   rdtsc_barrier();
-   ret = (cycle_t)rdtsc();
-
-   last = pvclock_gtod_data.clock.cycle_last;
+   cycle_t ret = (cycle_t)rdtsc_ordered();
+   u64 last = pvclock_gtod_data.clock.cycle_last;
 
if (likely(ret >= last))
return ret;
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index f24bc59ab0a0..4453d52a143d 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -54,11 +54,9 @@ static void delay_tsc(unsigned long __loops)
 
preempt_disable();
cpu = smp_processor_id();
-   rdtsc_barrier();
-   bclock = rdtsc();
+   bclock = rdtsc_ordered();
for (;;) {
-   rdtsc_barrier();
-   now = rdtsc();
+   now = rdtsc_ordered();
if ((now - bclock) >= loops)
break;
 
@@ -79,8 

[PATCH v3 17/18] x86/kvm/tsc: Drop extra barrier and use rdtsc_ordered in kvmclock

2015-06-16 Thread Andy Lutomirski
__pvclock_read_cycles had an unnecessary barrier.  Get rid of that
barrier and clean up the code by just using rdtsc_ordered().

Cc: Paolo Bonzini 
Cc: Radim Krcmar 
Cc: Marcelo Tosatti 
Cc: k...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---

I'm hoping to get an ack for this to go in through -tip.  (Arguably
I'm the maintainer of this code given how it's used, but I should
still ask for an ack.)

arch/x86/include/asm/pvclock.h | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 6084bce345fc..cf2329ca4812 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -62,7 +62,18 @@ static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
 static __always_inline
 u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
 {
-   u64 delta = rdtsc() - src->tsc_timestamp;
+   /*
+* Note: emulated platforms which do not advertise SSE2 support
+* break rdtsc_ordered, resulting in kvmclock not using the
+* necessary RDTSC barriers.  Without barriers, it is possible
+* that RDTSC instruction is executed before prior loads,
+* resulting in violation of monotonicity.
+*
+* On an SMP guest without SSE2, it's unclear how anything is
+* supposed to work correctly, though -- memory fences
+* (e.g. smp_mb) are important for more than just timing.
+*/
+   u64 delta = rdtsc_ordered() - src->tsc_timestamp;
return pvclock_scale_delta(delta, src->tsc_to_system_mul,
   src->tsc_shift);
 }
@@ -76,17 +87,9 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
u8 ret_flags;
 
version = src->version;
-   /* Note: emulated platforms which do not advertise SSE2 support
-* result in kvmclock not using the necessary RDTSC barriers.
-* Without barriers, it is possible that RDTSC instruction reads from
-* the time stamp counter outside rdtsc_barrier protected section
-* below, resulting in violation of monotonicity.
-*/
-   rdtsc_barrier();
offset = pvclock_get_nsec_offset(src);
ret = src->system_time + offset;
ret_flags = src->flags;
-   rdtsc_barrier();
 
*cycles = ret;
*flags = ret_flags;
-- 
2.4.2



[PATCH v3 18/18] x86/tsc: Remove rdtsc_barrier()

2015-06-16 Thread Andy Lutomirski
All callers have been converted to rdtsc_ordered().

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/barrier.h | 11 ---
 arch/x86/um/asm/barrier.h  | 13 -
 2 files changed, 24 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index e51a8f803f55..818cb8788225 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -91,15 +91,4 @@ do { \
 #define smp_mb__before_atomic()barrier()
 #define smp_mb__after_atomic() barrier()
 
-/*
- * Stop RDTSC speculation. This is needed when you need to use RDTSC
- * (or get_cycles or vread that possibly accesses the TSC) in a defined
- * code region.
- */
-static __always_inline void rdtsc_barrier(void)
-{
-   alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
- "lfence", X86_FEATURE_LFENCE_RDTSC);
-}
-
 #endif /* _ASM_X86_BARRIER_H */
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index b9531d343134..755481f14d90 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -45,17 +45,4 @@
 #define read_barrier_depends() do { } while (0)
 #define smp_read_barrier_depends() do { } while (0)
 
-/*
- * Stop RDTSC speculation. This is needed when you need to use RDTSC
- * (or get_cycles or vread that possibly accesses the TSC) in a defined
- * code region.
- *
- * (Could use an alternative three way for this if there was one.)
- */
-static inline void rdtsc_barrier(void)
-{
-   alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
- "lfence", X86_FEATURE_LFENCE_RDTSC);
-}
-
 #endif
-- 
2.4.2



[PATCH v3 03/18] x86/tsc/paravirt: Remove the read_tsc and read_tscp paravirt hooks

2015-06-16 Thread Andy Lutomirski
We've had read_tsc and read_tscp paravirt hooks since the very
beginning of paravirt, i.e., d3561b7fa0fb ("[PATCH] paravirt: header
and stubs for paravirtualisation").  AFAICT the only paravirt guest
implementation that ever replaced these calls was vmware, and it's
gone.  Arguably even vmware shouldn't have hooked rdtsc -- we fully
support systems that don't have a TSC at all, so there's no point
for a paravirt implementation to pretend that we have a TSC but to
replace it.

I also doubt that these hooks actually worked.  Calls to rdtscl and
rdtscll, which respected the hooks, were used seemingly
interchangeably with native_read_tsc, which did not.

Just remove them.  If anyone ever needs them again, they can try
to make a case for why they need them.

Before, on a paravirt config:
    text    data      bss      dec     hex filename
13426505 1827056 14508032 29761593 1c62039 vmlinux

After:
    text    data      bss      dec     hex filename
13426617 1827056 14508032 29761705 1c620a9 vmlinux

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/msr.h| 16 
 arch/x86/include/asm/paravirt.h   | 34 --
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/paravirt.c|  2 --
 arch/x86/kernel/paravirt_patch_32.c   |  2 --
 arch/x86/xen/enlighten.c  |  3 ---
 6 files changed, 8 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 88711470af7f..d1afac7df484 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -178,12 +178,6 @@ static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
return err;
 }
 
-#define rdtscl(low)\
-   ((low) = (u32)native_read_tsc())
-
-#define rdtscll(val)   \
-   ((val) = native_read_tsc())
-
 #define rdpmc(counter, low, high)  \
 do {   \
u64 _l = native_read_pmc((counter));\
@@ -193,6 +187,14 @@ do { \
 
 #define rdpmcl(counter, val) ((val) = native_read_pmc(counter))
 
+#endif /* !CONFIG_PARAVIRT */
+
+#define rdtscl(low)\
+   ((low) = (u32)native_read_tsc())
+
+#define rdtscll(val)   \
+   ((val) = native_read_tsc())
+
 #define rdtscp(low, high, aux) \
 do {\
unsigned long long _val = native_read_tscp(&(aux)); \
@@ -202,8 +204,6 @@ do { \
 
 #define rdtscpll(val, aux) (val) = native_read_tscp(&(aux))
 
-#endif /* !CONFIG_PARAVIRT */
-
 /*
  * 64-bit version of wrmsr_safe():
  */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d143bfad45d7..c2be0375bcad 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -174,19 +174,6 @@ static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
return err;
 }
 
-static inline u64 paravirt_read_tsc(void)
-{
-   return PVOP_CALL0(u64, pv_cpu_ops.read_tsc);
-}
-
-#define rdtscl(low)\
-do {   \
-   u64 _l = paravirt_read_tsc();   \
-   low = (int)_l;  \
-} while (0)
-
-#define rdtscll(val) (val = paravirt_read_tsc())
-
 static inline unsigned long long paravirt_sched_clock(void)
 {
return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
@@ -215,27 +202,6 @@ do {   \
 
 #define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
 
-static inline unsigned long long paravirt_rdtscp(unsigned int *aux)
-{
-   return PVOP_CALL1(u64, pv_cpu_ops.read_tscp, aux);
-}
-
-#define rdtscp(low, high, aux) \
-do {   \
-   int __aux;  \
-   unsigned long __val = paravirt_rdtscp(&__aux);  \
-   (low) = (u32)__val; \
-   (high) = (u32)(__val >> 32);\
-   (aux) = __aux;  \
-} while (0)
-
-#define rdtscpll(val, aux) \
-do {   \
-   unsigned long __aux;\
-   val = paravirt_rdtscp(&__aux);  \
-   (aux) = __aux;  \
-} while (0)
-
 static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 {
PVOP_VCALL2(pv_cpu_ops.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h 

[PATCH v3 06/18] x86/tsc: Use the full 64-bit tsc in tsc_delay

2015-06-16 Thread Andy Lutomirski
As a very minor optimization, tsc_delay was only using the low 32
bits of the TSC.  It's a delay function, so just use the whole
thing.
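
As a small illustration of the difference, the sketch below computes an
elapsed cycle count both ways.  The function names are invented for this
note and the code is plain userspace C, not the kernel's delay loop.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed, computed from only the low 32 bits of each sample. */
uint32_t elapsed_lo32(uint64_t start, uint64_t now)
{
	/* Correct modulo 2^32; anything above that is silently dropped. */
	return (uint32_t)now - (uint32_t)start;
}

/* Cycles elapsed, computed from the full 64-bit samples. */
uint64_t elapsed_full(uint64_t start, uint64_t now)
{
	assert(now >= start);	/* assumes one CPU and a TSC that never goes backwards */
	return now - start;
}
```

For short intervals both agree, even across a wrap of the low word.
Once more than 2^32 cycles have passed, the 32-bit form reports only the
remainder, while the 64-bit form reports the whole interval.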

Signed-off-by: Andy Lutomirski 
---
 arch/x86/lib/delay.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 9a52ad0c0758..35115f3786a9 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -49,16 +49,16 @@ static void delay_loop(unsigned long loops)
 /* TSC based delay: */
 static void delay_tsc(unsigned long __loops)
 {
-   u32 bclock, now, loops = __loops;
+   u64 bclock, now, loops = __loops;
int cpu;
 
preempt_disable();
cpu = smp_processor_id();
rdtsc_barrier();
-   rdtscl(bclock);
+   bclock = native_read_tsc();
for (;;) {
rdtsc_barrier();
-   rdtscl(now);
+   now = native_read_tsc();
if ((now - bclock) >= loops)
break;
 
@@ -80,7 +80,7 @@ static void delay_tsc(unsigned long __loops)
loops -= (now - bclock);
cpu = smp_processor_id();
rdtsc_barrier();
-   rdtscl(bclock);
+   bclock = native_read_tsc();
}
}
preempt_enable();
-- 
2.4.2



[PATCH v3 00/18] x86/tsc: Clean up rdtsc helpers

2015-06-16 Thread Andy Lutomirski
My sincere apologies for the spam.  I sent an unholy mixture of the
real patch set and an old poorly split-up patch set, and the result
is incomprehensible.  Here's what I meant to send.

After some recent threads about rdtsc barriers, I remembered
that our RDTSC wrappers are a big mess.  Let's clean it up.

Currently we have rdtscl, rdtscll, native_read_tsc,
paravirt_read_tsc, and rdtsc_barrier.  For people who haven't
noticed rdtsc_barrier and who haven't carefully read the docs,
there's no indication that all of the other accessors have a giant
ordering gotcha.  The macro forms are ugly, and the paravirt
implementation is completely pointless.

rdtscl is particularly awful.  It reads the low bits.  There are no
performance critical users of just the low bits anywhere in the
kernel.

Clean it up.  After this patch set, there are exactly two
functions.  rdtsc() is a function that does a raw RDTSC and returns
a 64-bit number.  rdtsc_ordered() is a function that does a properly
ordered RDTSC for general-purpose use.  rdtsc_barrier() is gone.

Changes from v2:
 - Rename rdtsc_unordered to just rdtsc
 - Get rid of rdtsc_barrier entirely instead of renaming it
 - The KVM patch is new (see above)
 - Added some acks

Changes from v1:
 - None, except that I screwed up the v1 emails.

Andy Lutomirski (18):
  x86/tsc: Inline native_read_tsc and remove __native_read_tsc
  x86/msr/kvm: Remove vget_cycles()
  x86/tsc/paravirt: Remove the read_tsc and read_tscp paravirt hooks
  x86/tsc: Replace rdtscll with native_read_tsc
  x86/tsc: Remove the rdtscp and rdtscpll macros
  x86/tsc: Use the full 64-bit tsc in tsc_delay
  x86/cpu/amd: Use the full 64-bit TSC to detect the 2.6.2 bug
  baycom_epp: Replace rdtscl() with native_read_tsc()
  staging/lirc_serial: Remove TSC-based timing
  input/joystick/analog: Switch from rdtscl() to native_read_tsc()
  drivers/input/gameport: Replace rdtscl() with native_read_tsc()
  x86/tsc: Remove rdtscl()
  x86/tsc: Rename native_read_tsc() to rdtsc()
  x86: Add rdtsc_ordered() and use it in trivial call sites
  x86/tsc: Use rdtsc_ordered() in check_tsc_warp() and drop extra
barriers
  x86/tsc: In read_tsc, use rdtsc_ordered() instead of get_cycles()
  x86/kvm/tsc: Drop extra barrier and use rdtsc_ordered in kvmclock
  x86/tsc: Remove rdtsc_barrier()

 arch/x86/boot/compressed/aslr.c|  2 +-
 arch/x86/entry/vdso/vclock_gettime.c   | 16 +-
 arch/x86/include/asm/barrier.h | 11 
 arch/x86/include/asm/msr.h | 54 ---
 arch/x86/include/asm/paravirt.h| 34 
 arch/x86/include/asm/paravirt_types.h  |  2 -
 arch/x86/include/asm/pvclock.h | 21 
 arch/x86/include/asm/stackprotector.h  |  2 +-
 arch/x86/include/asm/tsc.h | 18 +--
 arch/x86/kernel/apb_timer.c|  8 +--
 arch/x86/kernel/apic/apic.c|  8 +--
 arch/x86/kernel/cpu/amd.c  |  6 +--
 arch/x86/kernel/cpu/mcheck/mce.c   |  4 +-
 arch/x86/kernel/espfix_64.c|  2 +-
 arch/x86/kernel/hpet.c |  4 +-
 arch/x86/kernel/paravirt.c |  2 -
 arch/x86/kernel/paravirt_patch_32.c|  2 -
 arch/x86/kernel/trace_clock.c  |  7 +--
 arch/x86/kernel/tsc.c  | 12 ++---
 arch/x86/kernel/tsc_sync.c | 14 +++--
 arch/x86/kvm/lapic.c   |  4 +-
 arch/x86/kvm/svm.c |  4 +-
 arch/x86/kvm/vmx.c |  4 +-
 arch/x86/kvm/x86.c | 26 +++--
 arch/x86/lib/delay.c   | 13 ++---
 arch/x86/um/asm/barrier.h  | 13 -
 arch/x86/xen/enlighten.c   |  3 --
 drivers/input/gameport/gameport.c  |  4 +-
 drivers/input/joystick/analog.c|  4 +-
 drivers/net/hamradio/baycom_epp.c  |  2 +-
 drivers/staging/media/lirc/lirc_serial.c   | 63 ++
 drivers/thermal/intel_powerclamp.c |  4 +-
 .../power/cpupower/debug/kernel/cpufreq-test_tsc.c |  4 +-
 33 files changed, 110 insertions(+), 267 deletions(-)

-- 
2.4.2



[PATCH v3 02/18] x86/msr/kvm: Remove vget_cycles()

2015-06-16 Thread Andy Lutomirski
The only caller was kvm's read_tsc.  The only difference between
vget_cycles and native_read_tsc was that vget_cycles returned zero
instead of crashing on TSC-less systems.  KVM already checks
vclock_mode before calling that function, so the extra check is
unnecessary.

(Off-topic, but the whole KVM clock host implementation is gross.
 IMO it should be rewritten.)

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/tsc.h | 13 -
 arch/x86/kvm/x86.c |  2 +-
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index fd11128faf25..3da1cc1218ac 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -32,19 +32,6 @@ static inline cycles_t get_cycles(void)
return ret;
 }
 
-static __always_inline cycles_t vget_cycles(void)
-{
-   /*
-* We only do VDSOs on TSC capable CPUs, so this shouldn't
-* access boot_cpu_data (which is not VDSO-safe):
-*/
-#ifndef CONFIG_X86_TSC
-   if (!cpu_has_tsc)
-   return 0;
-#endif
-   return (cycles_t)native_read_tsc();
-}
-
 extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26eaeb522cab..c26faf408bce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1430,7 +1430,7 @@ static cycle_t read_tsc(void)
 * but no one has ever seen it happen.
 */
rdtsc_barrier();
-   ret = (cycle_t)vget_cycles();
+   ret = (cycle_t)native_read_tsc();
 
last = pvclock_gtod_data.clock.cycle_last;
 
-- 
2.4.2


