Re: [PATCH] Inet-hashtable: Change the range of sk->hash lock to avoid the race condition.

2013-09-12 Thread Eric Dumazet
On Fri, 2013-09-13 at 05:47 -0400, Jun Chen wrote:
> On Thu, 2013-09-12 at 05:00 -0700, Eric Dumazet wrote:
> > On Thu, 2013-09-12 at 12:32 -0400, Jun Chen wrote:
> > > When adding a node to the list in __inet_hash_nolisten(), the code first
> > > gets the list and then takes the lock. In an extreme case, another context
> > > can delete the node before the lock is taken, so the node can be NULL. This
> > > patch takes the lock first and then gets the list, to avoid the race.
> > 
> > I suspect another bug. This should not happen.
> > 
> > Care to describe the problem you got ?
> > 
> > Thanks
> > 
> > 
> 
> OK, I only have this call stack and no further information; please help take a
> look. Thanks!
> 
> <1>[ 88.548263] BUG: unable to handle kernel NULL pointer dereference at 0004
> <1>[ 88.548490] IP: [] __inet_hash_nolisten+0xc1/0x140
> <4>[ 88.548617] *pde = 
> <4>[ 88.549927] EIP is at __inet_hash_nolisten+0xc1/0x140
> <4>[ 88.550008] EAX:  EBX: e08c ECX: edf846e0 EDX: e08c0020
> <4>[ 88.550055] ESI: c20213c0 EDI: edc12dc0 EBP: ce4bfdfc ESP: ce4bfde8
> <4>[ 88.550137] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> <4>[ 88.550184] CR0: 80050033 CR2: 0004 CR3: 2b4ff000 CR4: 001007d0
> <4>[ 88.550266] DR0:  DR1:  DR2:  DR3: 
> <4>[ 88.550346] DR6: 0ff0 DR7: 0400
> <0>[ 88.550392] Process WebViewCoreThre (pid: 2137, ti=ce4be000 task=eb193c80 task.ti=ce4be000)
> <0>[ 88.551746] Call Trace:
> <4>[ 88.551797] [] __inet_hash_connect+0x295/0x2d0
> <4>[ 88.551883] [] inet_hash_connect+0x40/0x50
> <4>[ 88.551932] [] ? inet_unhash+0x90/0x90
> <4>[ 88.551981] [] ? __inet_lookup_listener+0x1b0/0x1b0
> <4>[ 88.552067] [] tcp_v4_connect+0x247/0x4a0
> <4>[ 88.552117] [] ? lock_sock_nested+0x3e/0x50
> <4>[ 88.552205] [] inet_stream_connect+0xe2/0x290
> <4>[ 88.552254] [] ? _copy_from_user+0x35/0x50
> <4>[ 88.552342] [] sys_connect+0xb2/0xd0
> <4>[ 88.552393] [] ? alloc_file+0x20/0xa0
> <4>[ 88.552441] [] ? tcp_setsockopt+0x50/0x60
> <4>[ 88.552525] [] ? fget_light+0x44/0xe0
> <4>[ 88.552574] [] ? sock_common_setsockopt+0x27/0x40
> <4>[ 88.552659] [] ? _copy_from_user+0x35/0x50
> <4>[ 88.552708] [] sys_socketcall+0xab/0x2b0
> <4>[ 88.552790] [] ? trace_hardirqs_on_thunk+0xc/0x10
> <4>[ 88.552840] [] syscall_call+0x7/0xb
> <4>[ 88.552923] [] ? mutex_trylock+0x30/0x140
> 

This makes no sense to me. This could be a random memory corruption.

Do you have disassembly of __inet_hash_nolisten ?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] perf: fix core dump for "perf bench all"

2013-09-12 Thread Yinghai Lu
When the numa benchmark is compiled into perf, "perf bench all" dumps core.

| #./perf bench numa all
| # Running numa/mem benchmark...
|
| Segmentation fault (core dumped)
| # ./perf bench all
| # Running numa/mem benchmark...
|
| Segmentation fault (core dumped)

Fix it by adding a dedicated "all" handler for the numa bench. all_suite()
now searches for such a handler and calls it first; if none is found, it
falls back to the old way.

Also fix the argv setting in that function: we should use argv[0],
as argc is 1.

Signed-off-by: Yinghai Lu 

---
 tools/perf/bench/bench.h   |1 +
 tools/perf/bench/numa.c|5 +++--
 tools/perf/builtin-bench.c |   19 +--
 3 files changed, 21 insertions(+), 4 deletions(-)

Index: linux-2.6/tools/perf/bench/bench.h
===
--- linux-2.6.orig/tools/perf/bench/bench.h
+++ linux-2.6/tools/perf/bench/bench.h
@@ -26,6 +26,7 @@
 #endif
 
 extern int bench_numa(int argc, const char **argv, const char *prefix);
+int bench_numa_all(int argc, const char **argv, const char *prefix);
 extern int bench_sched_messaging(int argc, const char **argv, const char *prefix);
 extern int bench_sched_pipe(int argc, const char **argv, const char *prefix);
 extern int bench_mem_memcpy(int argc, const char **argv,
Index: linux-2.6/tools/perf/bench/numa.c
===
--- linux-2.6.orig/tools/perf/bench/numa.c
+++ linux-2.6/tools/perf/bench/numa.c
@@ -1691,7 +1691,8 @@ static const char *tests[][MAX_ARGS] = {
  "mem",  "-p",  "2", "-t", "16", "-T",  "192", OPT_BW_NOTHP },
 };
 
-static int bench_all(void)
+int bench_numa_all(int argc __maybe_unused, const char **argv __maybe_unused,
+const char *prefix __maybe_unused)
 {
int nr = ARRAY_SIZE(tests);
int ret;
@@ -1718,7 +1719,7 @@ int bench_numa(int argc, const char **ar
goto err;
 
if (p0.run_all)
-   return bench_all();
+   return bench_numa_all(argc, argv, prefix);
 
if (__bench_numa(NULL))
goto err;
Index: linux-2.6/tools/perf/builtin-bench.c
===
--- linux-2.6.orig/tools/perf/builtin-bench.c
+++ linux-2.6/tools/perf/builtin-bench.c
@@ -40,7 +40,7 @@ static struct bench_suite numa_suites[]
{ "mem",
  "Benchmark for NUMA workloads",
  bench_numa },
-   suite_all,
+   { "all", "Test all benchmark suites", bench_numa_all },
{ NULL,
  NULL,
  NULL  }
@@ -166,6 +166,21 @@ static void all_suite(struct bench_subsy
struct bench_suite *suites = subsys->suites;
 
argv[1] = NULL;
+
+   /* own all handling ? */
+   for (i = 0; suites[i].fn; i++) {
+   if (!strcmp("all", suites[i].name)) {
+   printf("# Running %s/%s benchmark...\n",
+  subsys->name,
+  suites[i].name);
+   fflush(stdout);
+
+   argv[0] = suites[i].name;
+   suites[i].fn(1, argv, NULL);
+   return;
+   }
+   }
+
/*
 * TODO:
 * preparing preset parameters for
@@ -178,7 +193,7 @@ static void all_suite(struct bench_subsy
   suites[i].name);
fflush(stdout);
 
-   argv[1] = suites[i].name;
+   argv[0] = suites[i].name;
suites[i].fn(1, argv, NULL);
printf("\n");
}


Re: [PATCH] /dev/random: Insufficient of entropy on many architectures

2013-09-12 Thread Stephan Mueller
On Thursday, 12 September 2013, 17:31:48, Jörn Engel wrote:

Hi Jörn,

>On Tue, 10 September 2013 15:08:12 -0700, John Stultz wrote:
>> Though
>> I probably should be hesitant with my suggestions, as I'm not well
>> versed in RNG theory.
>
>The basic principle of Ted's RNG is very simple and quite sane:
>- You collect as much data as possible, some of which is (hopefully)
>  unpredictable.
>- All the data gets dumped into a small buffer.
>- When reading from the buffer, you create a crypto-hash of the entire
>  buffer.  Even if most of the buffer is predictable, the few
>  unpredictable bits will randomly flip every output bit.

And here the RNG theory breaks: a whitening function (crypto function) 
like the SHA1 used here does not add entropy. Thus, SHA1 just spreads 
the entropy evenly over the output buffer. As entropy can be considered 
a kind of percentage value, if, say, 10% of your input buffer holds 
entropy, then after applying a whitening function your output buffer 
still holds only 10% of entropy.

That said, simply using a whitening function on a buffer with poor 
entropy is NOT going to cut it.
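
The bound behind the "does not add entropy" statement can be written down as a short sketch. The symbols are illustrative and not taken from the thread: X is the pool content, f is a deterministic function such as a hash with an n-bit output, and H is the Shannon entropy in bits.

```latex
% A deterministic function cannot create entropy, and an n-bit output
% cannot carry more than n bits of it.
H\bigl(f(X)\bigr) \le \min\bigl(H(X),\, n\bigr)

% Hypothetical numbers: a pool holding only 16 bits of entropy, hashed
% to a 160-bit output, still yields at most 16 bits of entropy.
H(X) = 16,\quad n = 160 \;\Longrightarrow\; H\bigl(f(X)\bigr) \le 16
```

The numbers are made up for illustration; the point is only that hashing a poorly seeded buffer leaves the output bounded by whatever entropy went in.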

>- Half of the hash gets returned to the reader, the other half gets
>  added back into the pool.
>
>It doesn't matter if you collect predictable data - it neither helps

Oh yes, it hurts if you update the entropy estimator on those 
predictable bits, because then, in the worst case, you get a 
deterministic RNG like /dev/urandom. Thus you degrade the quality of 
/dev/random, which relies on its blocking nature.

>nor hurts.  But you should collect as much unpredictable data as
>possible and do it as cheaply as possible.  If you want to improve the
>RNG, you either collect more data, collect better (less predictable)
>data or make the collection cheaper.
>
>Jörn
>
>--
>People really ought to be forced to read their code aloud over the
>phone. That would rapidly improve the choice of identifiers.
>-- Al Viro


Ciao
Stephan


Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE

2013-09-12 Thread Thomas Hellstrom

On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:

On 12-09-13 18:44, Thomas Hellstrom wrote:

On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:

On 12-09-13 17:36, Daniel Vetter wrote:

On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra  wrote:

So I'm poking around the preemption code and stumbled upon:

drivers/gpu/drm/i915/i915_gem.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:set_need_resched();
drivers/gpu/drm/udl/udl_gem.c:  set_need_resched();

All these sites basically do:

while (!trylock())
  yield();

which is a horrible and broken locking pattern.

Firstly it's deadlock prone: suppose the faulting process is a FIFOn+1
task that preempted the lock holder at FIFOn.

Secondly the implementation is worse than usual by abusing
VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
doesn't retry, but you're using it as a get out of fault path. And
you're using set_need_resched() which is not something a driver should
_ever_ touch.

Now I'm going to take away set_need_resched() -- and while you can
'reimplement' it using set_thread_flag() you're not going to do that
because it will be broken due to changes to the preempt code.

So please fix this ASAP and don't allow anybody to trick you into
merging silly things like that again ;-)
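
As a userspace analogy of the trylock-and-yield pattern criticised above, here is a minimal sketch. It is not the driver code: the toy lock, the C11 atomics and every name in it are assumptions made for illustration. It shows why the loop only makes progress if the scheduler happens to run the lock holder.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; the initial (clear) state means "unlocked". */
struct toy_lock {
	atomic_flag held;
};

/* Try once to take the lock; returns true on success. */
bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * The anti-pattern: poll with trylock and yield in between.
 * Returns how many times the caller had to yield. If the caller is a
 * SCHED_FIFO task with higher priority than the holder on the same
 * CPU, sched_yield() never lets the holder run and the loop spins
 * forever.
 */
unsigned long toy_lock_by_polling(struct toy_lock *l)
{
	unsigned long yields = 0;

	while (!toy_trylock(l)) {
		sched_yield();
		yields++;
	}
	return yields;
}
```

A sleeping lock avoids the problem because the waiter blocks inside the lock primitive instead of staying runnable, so the holder can be scheduled and release the lock.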

The set_need_resched in i915_gem.c:i915_gem_fault can actually be
removed. It was there to give the error handler a chance to sneak in
and reset the hw/sw tracking when the gpu is dead. That hack goes back
to the days when the locking around our error handler was somewhere
between nonexistent and totally broken, nowadays we keep things from
live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
whip up a patch to rip this out. I'll also check that our testsuite
properly exercises this path (needs a bit of work on a quick look for
better coverage).

The one in ttm is just bonghits to shut up lockdep: ttm can recurse
into its own pagefault handler and then deadlock, the trylock just
keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
fun userspace did and now have testcases for them. The right solution
to fix this is to use copy_to|from_user_atomic in ttm everywhere it
holds locks and have slowpaths which drops locks, copies stuff into a
temp allocation and then continues. At least that's how we've fixed
all those inversions in i915-gem. I'm not volunteering to fix this ;-)

Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine I'll look into it a bit, hopefully before tuesday. Else it might take a 
bit longer since I'll be on my way to plumbers..

I think a possible fix would be if fault() were allowed to return an error and 
drop the mmap_sem() before returning.

Otherwise we need to track down all copy_to_user / copy_from_user which happen 
with bo::reserve held.

CONFIG_PROVE_LOCKING=y

and hard grab that reserve lock within the fault handler, done.. lockdep will 
spit it out for you :p

~Maarten


Given that all copy_to_user / copy_from_user paths are actually hit 
during testing, right?


/Thomas


[uclinux-dist-devel] [GIT PULL] Blackfin updates for 3.12

2013-09-12 Thread Steven Miao
Hi Linus,

please pull blackfin updates for Linux 3.12, some minor bug fixes.

The following changes since commit 6e4664525b1db28f8c4e1130957f70a94c19213e:

  Linux 3.11 (2013-09-02 13:46:10 -0700)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux.git tags/blackfin-for-linus

for you to fetch changes up to 08b67faa23fdd4df3f3f7bb06f0ad67b7366d11c:

  blackfin: Ignore generated uImages (2013-09-13 10:42:39 +0800)


blackfin updates for Linux 3.12


Mark Brown (1):
  blackfin: Ignore generated uImages

Scott Jiang (2):
  bf609: add adv7343 video encoder support
  bf609: adv7343: add S-Video and Component output support

Sonic Zhang (2):
  blackfin: scb: Add SCB1 to SCB9 config options and data.
  blackfin: Add STMMAC platform data to enable dwmac1000 driver on BF60x.

Steven Miao (2):
  blackfin: scb: Add system crossbar init code.
  clock: add stmmac clock for ethernet driver

 arch/blackfin/boot/.gitignore  |1 +
 arch/blackfin/include/asm/scb.h|   21 +
 arch/blackfin/kernel/setup.c   |6 +
 arch/blackfin/mach-bf609/Kconfig   | 1655 
 arch/blackfin/mach-bf609/Makefile  |1 +
 arch/blackfin/mach-bf609/boards/ezkit.c|   91 ++
 arch/blackfin/mach-bf609/clock.c   |   17 +
 .../mach-bf609/include/mach/defBF60x_base.h|   10 +
 arch/blackfin/mach-bf609/scb.c |  363 +
 arch/blackfin/mach-common/Makefile |1 +
 arch/blackfin/mach-common/scb-init.c   |   53 +
 11 files changed, 2219 insertions(+)
 create mode 100644 arch/blackfin/include/asm/scb.h
 create mode 100644 arch/blackfin/mach-bf609/scb.c
 create mode 100644 arch/blackfin/mach-common/scb-init.c



Re: [GIT PULL] perf fixes

2013-09-12 Thread Ingo Molnar

* David Ahern  wrote:

> > By default a simple 'make' should build perf to the maximum extent 
> > possible, with no other input required from the user - with warnings 
> > displayed as package install suggestions.
> 
> By default there is no config. Autoprobing generates a first one or a 
> user can specify a defconfig.

This could work if there are not two but three states for individual 
features:

  - autoprobe
  - on
  - off

and if autoprobing, once a system feature has been probed successfully, 
automatically turned the corresponding 'autoprobe' entry into 'on'.

That would give us the best of all worlds - autodetection, configurability 
and caching:

 - initial user types 'make' and gets a .config that has almost all
   entries 'on', a few 'autoprobe'.

 - once the user installs a dependency, the corresponding .config entry
   turns into 'on'.

 - the regular user or developers would have libraries that turn all
   entries in the .config to 'on'.

 - if a user is genuinely uninterested in a feature, he can mark it 'off', 
   which would then stay off permanently. This could also be used by
   embedded/specialized builds.

 - other specialized users, like distro builds, could use a .config with
   all entries 'on' and could enforce the presence of all dependencies for 
   a successful build. [We could add 'make allyesconfig' to help that.]
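
The three-state scheme above can be sketched in C. This is only an illustration under assumptions: the enum, the function name, and the rule that an 'on' entry with a missing dependency fails the build are hypothetical and not part of perf's build system.

```c
#include <stdbool.h>

/* Hypothetical three-state setting for one build feature. */
enum feature_state { FEATURE_OFF, FEATURE_ON, FEATURE_AUTOPROBE };

/*
 * Resolve one .config entry.
 *   state:  value currently stored in the .config
 *   probed: whether the system dependency was detected
 * Returns the value to cache back into the .config. *build says whether
 * the feature gets compiled in, *fatal whether the build must stop
 * (assumed here for 'on' entries whose dependency is missing).
 */
enum feature_state resolve_feature(enum feature_state state, bool probed,
				   bool *build, bool *fatal)
{
	*fatal = false;

	switch (state) {
	case FEATURE_OFF:
		/* Stays off permanently, even if the dependency exists. */
		*build = false;
		return FEATURE_OFF;
	case FEATURE_ON:
		*build = probed;
		*fatal = !probed;
		return FEATURE_ON;
	case FEATURE_AUTOPROBE:
	default:
		/* A successful probe is cached by flipping the entry to on. */
		*build = probed;
		return probed ? FEATURE_ON : FEATURE_AUTOPROBE;
	}
}
```

With this shape, a plain 'make' only has to re-probe the entries still marked 'autoprobe', while an all-'on' .config would behave like the enforcing distro build described above.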

Thanks,

Ingo


Re: [PATCH v2] powerpc 8xx: Fixing issue with CONFIG_PIN_TLB

2013-09-12 Thread leroy christophe

On 12/09/2013 20:44, Scott Wood wrote:

On Thu, 2013-09-12 at 20:25 +0200, Christophe Leroy wrote:

This is a reorganisation of the setup of the TLB at kernel startup, in order
to handle the CONFIG_PIN_TLB case in accordance with chapter 8.10.3 of MPC866
and MPC885 reference manuals.

Signed-off-by: Christophe Leroy 

diff -ur linux-3.11.org/arch/powerpc/kernel/head_8xx.S linux-3.11/arch/powerpc/kernel/head_8xx.S
--- linux-3.11.org/arch/powerpc/kernel/head_8xx.S   2013-09-02 22:46:10.0 +0200
+++ linux-3.11/arch/powerpc/kernel/head_8xx.S   2013-09-09 11:28:54.0 +0200
@@ -785,27 +785,24 @@
   * these mappings is mapped by page tables.
   */
  initial_mmu:
-   tlbia   /* Invalidate all TLB entries */
-/* Always pin the first 8 MB ITLB to prevent ITLB
-   misses while mucking around with SRR0/SRR1 in asm
-*/
-   lis r8, MI_RSV4I@h
-   ori r8, r8, 0x1c00
-
+   lis r8, MI_RESETVAL@h
mtspr   SPRN_MI_CTR, r8 /* Set instruction MMU control */
  
-#ifdef CONFIG_PIN_TLB

-   lis r10, (MD_RSV4I | MD_RESETVAL)@h
-   ori r10, r10, 0x1c00
-   mr  r8, r10
-#else
lis r10, MD_RESETVAL@h
-#endif
  #ifndef CONFIG_8xx_COPYBACK
oris r10, r10, MD_WTDEF@h
  #endif
mtspr   SPRN_MD_CTR, r10 /* Set data TLB control */
  
+	tlbia			/* Invalidate all TLB entries */

Is this change to make sure we invalidate everything even if the
bootloader set RSV4I?

Most probably. It is step 2 of the process defined in MPC866 and MPC885 
Reference Manuals:


§8.10.3 Loading Locked TLB Entries:
The process of loading a single reserved entry in the TLB is as follows:
1. Disable the TLB by clearing MSR[IR] or MSR[DR] as needed.
2. Clear MI_CTR[RSV4I] (MD_CTR[RSV4D]).
3. Invalidate the EA of the reserved page by using tlbia or tlbie.
4. Set MI_CTR[ITLB_INDX] (MD_CTR[DTLB_INDX]) to the appropriate value 
(between 27 and 31).
5. Load Mx_EPN with the effective page number, the ASID of the reserved 
page, and set EV.
6. Run software tablewalk code to load the appropriate entry into the 
translation lookaside buffer. See Section 8.10.1.1, “Translation Reload 
Examples.”

7. Repeat steps 4–6 to load other TLB entries.
8. Set MI_CTR[RSV4I] (MD_CTR[RSV4D])



+   ori r8, r8, 0x1c00
+   mtspr   SPRN_MI_CTR, r8 /* Set instruction MMU control */
+#ifdef CONFIG_PIN_TLB
+   ori r10, r10, 0x1c00
+   mtspr   SPRN_MD_CTR, r10/* Set data TLB control */
+#endif

Still 0x1c00?

Yes, I kept the same entries in order to limit modifications:
* 28 = First 8Mbytes page
* 29 = IMMR
* 30 = Second 8Mbytes page
* 31 = Third 8Mbytes page
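
The relation between the 0x1c00 constant and entry 28 can be checked with a small sketch. It assumes the TLB index is the 5-bit field under mask 0x00001f00 in MI_CTR/MD_CTR, which is consistent with the list above; the macro and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: the TLB index field of MI_CTR/MD_CTR is the 5-bit field
 * under mask 0x00001f00, so 0x1c00 selects entry 0x1c = 28.
 */
#define CTR_IDX_MASK	0x00001f00u
#define CTR_IDX_SHIFT	8

/* Extract the TLB entry index encoded in a control register value. */
unsigned int ctr_tlb_index(uint32_t ctr)
{
	return (ctr & CTR_IDX_MASK) >> CTR_IDX_SHIFT;
}

/* Return ctr with its index field replaced by idx (0..31). */
uint32_t ctr_with_index(uint32_t ctr, unsigned int idx)
{
	return (ctr & ~CTR_IDX_MASK) |
	       (((uint32_t)idx << CTR_IDX_SHIFT) & CTR_IDX_MASK);
}
```

Under that assumption, ctr_tlb_index(0x1c00) gives 28, the first of the four reserved entries (28 to 31).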



/* Now map the lower 8 Meg into the TLBs.  For this quick hack,
 * we can load the instruction and data TLB registers with the
 * same values.
@@ -825,6 +822,12 @@
mtspr   SPRN_MI_AP, r8
mtspr   SPRN_MD_AP, r8
  
+	/* Always pin the first 8 MB ITLB to prevent ITLB

+* misses while mucking around with SRR0/SRR1 in asm
+*/
+   lis r8, (MI_RSV4I | MI_RESETVAL)@h
+   mtspr   SPRN_MI_CTR, r8 /* Set instruction MMU control */

Entry 0 is not pinnable.

Here we are not trying to pin entry 0. We are at step 8, we are setting 
MI_RSV4I. At the same time, we set MD_CTR to 0 which is off the pinned 
range, to be sure that we won't overwrite one of the pinned entries.


The main difference compared to the previous implementation is that 
before, we were setting the RSV4I bit before loading the TLB entries. 
Now, as defined in the Reference Manuals, we are doing it at the end.


Christophe


Re: [00/23] 3.4.62-stable review

2013-09-12 Thread Guenter Roeck
Odd, the 00/23 mail for 3.4.62 doesn't show up on lkml.
So this mail will most likely show up as reply to 01/23.

Anyway, here are my build results for 3.4.62:
total: 103 pass: 89 skipped: 10 fail: 4

More configurations (added two crisv32 as well as several arm builds),
one less failure (m32r:defconfig now builds).

qemu: microblaze, mips, mips64, ppc, sh, x86, x86_64 pass
qemu:arm doesn't work with the 3.4 kernel and was skipped.

sh results are new. As with other qemu results, qemu needs to be patched
to work for the sh target.

Details are at http://server.roeck-us.net:8010/builders.

Guenter


RE: [PATCH] net: fec: remove deprecated IRQF_DISABLED

2013-09-12 Thread Duan Fugang-B38611
From: Michael Opdenacker [mailto:michael.opdenac...@free-electrons.com]
Date: Friday, September 13, 2013 11:45 AM

> To: da...@davemloft.net; Estevam Fabio-R49496
> Cc: frank...@freescale.net; Duan Fugang-B38611; jim_bax...@mentor.com;
> net...@vger.kernel.org; linux-kernel@vger.kernel.org; Michael Opdenacker
> Subject: [PATCH] net: fec: remove deprecated IRQF_DISABLED
> 
> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/freescale/fec_main.c
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker 
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index f9aacf5..b2793b9 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -2199,7 +2199,7 @@ fec_probe(struct platform_device *pdev)
>   goto failed_irq;
>   }
>   ret = devm_request_irq(&pdev->dev, irq, fec_enet_interrupt,
> -IRQF_DISABLED, pdev->name, ndev);
> +0, pdev->name, ndev);
>   if (ret)
>   goto failed_irq;

Can you remove the flag from all net drivers, such as:

/linux-next/linux-2.6$ grep -nr "IRQF_DISABLED" drivers/net/ethernet/
drivers/net/ethernet/adi/bfin_mac.c:533: IRQF_DISABLED, "EMAC_WAKE", dev);
drivers/net/ethernet/adi/bfin_mac.c:1689: IRQF_DISABLED, "EMAC_RX", ndev);
drivers/net/ethernet/lantiq_etop.c:285: request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
drivers/net/ethernet/lantiq_etop.c:294: request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
drivers/net/ethernet/micrel/ks8851_mll.c:918:#define KS_INT_FLAGS (IRQF_DISABLED|IRQF_TRIGGER_LOW)
drivers/net/ethernet/pasemi/pasemi_mac.c:1222: ret = request_irq(mac->tx->chan.irq, pasemi_mac_tx_intr, IRQF_DISABLED,
drivers/net/ethernet/pasemi/pasemi_mac.c:1233: ret = request_irq(mac->rx->chan.irq, pasemi_mac_rx_intr, IRQF_DISABLED,
drivers/net/ethernet/amd/sun3lance.c:361: if (request_irq(LANCE_IRQ, lance_interrupt, IRQF_DISABLED, "SUN3 Lance", dev) < 0) {
drivers/net/ethernet/dec/tulip/de4x5.c:356: Added IRQF_DISABLED temporary fix from
drivers/net/ethernet/dec/tulip/de4x5.c:1324: if (request_irq(dev->irq, de4x5_interrupt, IRQF_DISABLED | IRQF_SHARED,
drivers/net/ethernet/freescale/fec_main.c:2212: IRQF_DISABLED, pdev->name, ndev);
drivers/net/ethernet/smsc/smc91x.h:274:#define SMC_IRQ_FLAGS (IRQF_DISABLED)
drivers/net/ethernet/smsc/smsc9420.c:1359: result = request_irq(irq, smsc9420_isr, IRQF_SHARED | IRQF_DISABLED,
drivers/net/ethernet/ibm/ehea/ehea_main.c:1288: IRQF_DISABLED, port->int_aff_name, port);
drivers/net/ethernet/ibm/ehea/ehea_main.c:1306: IRQF_DISABLED, pr->int_send_name,
drivers/net/ethernet/ibm/ehea/ehea_main.c:3323: ehea_interrupt_neq, IRQF_DISABLED,
drivers/net/ethernet/korina.c:999: IRQF_DISABLED, "Korina ethernet Rx", dev);
drivers/net/ethernet/korina.c:1006: IRQF_DISABLED, "Korina ethernet Tx", dev);
drivers/net/ethernet/korina.c:1015: IRQF_DISABLED, "Ethernet Overflow", dev);
drivers/net/ethernet/korina.c:1024: IRQF_DISABLED, "Ethernet Underflow", dev);
drivers/net/ethernet/marvell/pxa168_eth.c:1127: IRQF_DISABLED, dev->name, dev);
drivers/net/ethernet/natsemi/jazzsonic.c:86: retval = request_irq(dev->irq, sonic_interrupt, IRQF_DISABLED,
drivers/net/ethernet/natsemi/xtsonic.c:98: retval = request_irq(dev->irq, sonic_interrupt, IRQF_DISABLED,
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1729: IRQF_DISABLED, netdev->name, card);
drivers/net/ethernet/hp/hp100.c:1100: HP100_BUS_EISA ? IRQF_SHARED : IRQF_DISABLED,
drivers/net/ethernet/broadcom/bcm63xx_enet.c:929: ret = request_irq(priv->irq_rx, bcm_enet_isr_dma, IRQF_DISABLED,
drivers/net/ethernet/broadcom/bcm63xx_enet.c:935: IRQF_DISABLED, dev->name, dev);
drivers/net/ethernet/broadcom/bcm63xx_enet.c:2159: IRQF_DISABLED, dev->name, dev);
drivers/net/ethernet/broadcom/bcm63xx_enet.c:2165: IRQF_DISABLED, dev->name, dev);

Thanks,
Andy



[PATCH] net: ps3_gelic: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/toshiba/ps3_gelic_net.c

It's a NOOP since 2.6.35 and I will remove it one day ;)

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 9c805e0..f7f2ef4 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -1726,7 +1726,7 @@ static int ps3_gelic_driver_probe(struct ps3_system_bus_device *dev)
goto fail_alloc_irq;
}
result = request_irq(card->irq, gelic_card_interrupt,
-IRQF_DISABLED, netdev->name, card);
+0, netdev->name, card);
 
if (result) {
dev_info(ctodev(card), "%s:request_irq failed (%d)\n",
-- 
1.8.1.2



[PATCH] net: smsc: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
code in drivers/net/ethernet/smsc/

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/smsc/smc91x.h   | 2 +-
 drivers/net/ethernet/smsc/smsc9420.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index 370e13d..5730fe2 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -271,7 +271,7 @@ static inline void mcf_outsw(void *a, unsigned char *p, int l)
 #define SMC_insw(a, r, p, l)   mcf_insw(a + r, p, l)
 #define SMC_outsw(a, r, p, l)  mcf_outsw(a + r, p, l)
 
-#define SMC_IRQ_FLAGS  (IRQF_DISABLED)
+#define SMC_IRQ_FLAGS  0
 
 #else
 
diff --git a/drivers/net/ethernet/smsc/smsc9420.c b/drivers/net/ethernet/smsc/smsc9420.c
index ffa5c4a..5f9e79f 100644
--- a/drivers/net/ethernet/smsc/smsc9420.c
+++ b/drivers/net/ethernet/smsc/smsc9420.c
@@ -1356,8 +1356,7 @@ static int smsc9420_open(struct net_device *dev)
smsc9420_reg_write(pd, INT_STAT, 0x);
smsc9420_pci_flush_write(pd);
 
-   result = request_irq(irq, smsc9420_isr, IRQF_SHARED | IRQF_DISABLED,
-DRV_NAME, pd);
+   result = request_irq(irq, smsc9420_isr, IRQF_SHARED, DRV_NAME, pd);
if (result) {
smsc_warn(IFUP, "Unable to use IRQ = %d", irq);
result = -ENODEV;
-- 
1.8.1.2



Re: cpufreq_stats NULL deref on second system suspend

2013-09-12 Thread Viresh Kumar
On 12 September 2013 22:56, Srivatsa S. Bhat wrote:
> On 09/12/2013 09:25 PM, Stephen Warren wrote:
> Anyway, nevermind, as of now, subsystems do work around this suitably, so
> there is no known bug as such at the present. Just that we could have probably
> done it a better way, that's all.

Yeah, there is no bug as of now, thanks to the number of hacks adopted by
different frameworks. I believe we can still have a cleanup series to take
care of this stuff. That would be an improvement and would be better for the
future. Otherwise this kind of problem will keep coming up again and again.

> You're absolutely right! Regular CPU hotplug is more demanding than
> suspend/resume in the context we are discussing, since any CPU can be
> hotplugged at any time and put back in any order. So code like cpufreq should
> be prepared to work with any ordering.

And that part is well implemented and tested as far as I know..


Re: [PATCH 2/2] cpufreq: make sure frequency transitions are serialized

2013-09-12 Thread Viresh Kumar
On 12 September 2013 15:40, Viresh Kumar  wrote:
> Some part of this patch was pushed in mainline earlier but was then removed due
> to loopholes in the patch. Those are now fixed and this patch is tested by the
> people who reported these problems.
>
> Whenever we are changing frequency of a cpu, we are calling PRECHANGE and
> POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE
> shouldn't be called twice contiguously. Also, calls to cpufreq_driver_target()
> or cpufreq_driver->target() must also be serialized. Following examples show why
> this is important:

Adding:

Tested-by: Stephen Warren 

Picked from the other thread..


[PATCH] net: pasemi: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/pasemi/pasemi_mac.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/pasemi/pasemi_mac.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index c498181..5b65356 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1219,7 +1219,7 @@ static int pasemi_mac_open(struct net_device *dev)
snprintf(mac->tx_irq_name, sizeof(mac->tx_irq_name), "%s tx",
 dev->name);
 
-   ret = request_irq(mac->tx->chan.irq, pasemi_mac_tx_intr, IRQF_DISABLED,
+   ret = request_irq(mac->tx->chan.irq, pasemi_mac_tx_intr, 0,
  mac->tx_irq_name, mac->tx);
if (ret) {
dev_err(&mac->pdev->dev, "request_irq of irq %d failed: %d\n",
@@ -1230,7 +1230,7 @@ static int pasemi_mac_open(struct net_device *dev)
snprintf(mac->rx_irq_name, sizeof(mac->rx_irq_name), "%s rx",
 dev->name);
 
-   ret = request_irq(mac->rx->chan.irq, pasemi_mac_rx_intr, IRQF_DISABLED,
+   ret = request_irq(mac->rx->chan.irq, pasemi_mac_rx_intr, 0,
  mac->rx_irq_name, mac->rx);
if (ret) {
dev_err(&mac->pdev->dev, "request_irq of irq %d failed: %d\n",
-- 
1.8.1.2



[PATCH] net: natsemi: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
code in drivers/net/ethernet/natsemi/

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/natsemi/jazzsonic.c | 3 +--
 drivers/net/ethernet/natsemi/xtsonic.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/jazzsonic.c b/drivers/net/ethernet/natsemi/jazzsonic.c
index c20766c..79257f7 100644
--- a/drivers/net/ethernet/natsemi/jazzsonic.c
+++ b/drivers/net/ethernet/natsemi/jazzsonic.c
@@ -83,8 +83,7 @@ static int jazzsonic_open(struct net_device* dev)
 {
int retval;
 
-   retval = request_irq(dev->irq, sonic_interrupt, IRQF_DISABLED,
-   "sonic", dev);
+   retval = request_irq(dev->irq, sonic_interrupt, 0, "sonic", dev);
if (retval) {
printk(KERN_ERR "%s: unable to get IRQ %d.\n",
dev->name, dev->irq);
diff --git a/drivers/net/ethernet/natsemi/xtsonic.c 
b/drivers/net/ethernet/natsemi/xtsonic.c
index c2e0256..4da172a 100644
--- a/drivers/net/ethernet/natsemi/xtsonic.c
+++ b/drivers/net/ethernet/natsemi/xtsonic.c
@@ -95,8 +95,7 @@ static int xtsonic_open(struct net_device *dev)
 {
int retval;
 
-   retval = request_irq(dev->irq, sonic_interrupt, IRQF_DISABLED,
-   "sonic", dev);
+   retval = request_irq(dev->irq, sonic_interrupt, 0, "sonic", dev);
if (retval) {
printk(KERN_ERR "%s: unable to get IRQ %d.\n",
   dev->name, dev->irq);
-- 
1.8.1.2



linux-next: Tree for Sep 13

2013-09-12 Thread Stephen Rothwell
Hi all,

Please do not add any code for v3.13 to your linux-next included branches
until after v3.12-rc1 is released.

Changes since 20130912:

The akpm tree lost lots of patches that turned up in Linus' tree.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 222 trees (counting Linus' and 30 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (5a7d8a2 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus)
Merging fixes/master (fa8218d Merge tag 'regmap-v3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap)
Merging kbuild-current/rc-fixes (ad81f05 Linux 3.11-rc1)
Merging arc-current/for-curr (07b9b65 ARC: fix new Section mismatches in build (post __cpuinit cleanup))
Merging arm-current/fixes (6e46645 Linux 3.11)
Merging m68k-current/for-linus (5549005 m68k/atari: ARAnyM - Always use physical addresses in NatFeat calls)
Merging metag-fixes/fixes (3b2f64d Linux 3.11-rc2)
Merging powerpc-merge/merge (363edbe powerpc: Default arch idle could cede processor on pseries)
Merging sparc/master (4de9ad9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile)
Merging net/master (95ee620 net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit)
Merging ipsec/master (302a50b xfrm: Fix potential null pointer dereference in xdst_queue_output)
Merging sound-current/for-linus (3d0049e Merge tag 'asoc-v3.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (a923874 Merge tag 'pci-v3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci)
Merging wireless/master (f4e1a4d rt2800: change initialization sequence to fix system freeze)
Merging driver-core.current/driver-core-linus (816434e Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging tty.current/tty-linus (c095ba7 Linux 3.11-rc4)
Merging usb.current/usb-linus (3272c54 vfs: use __getname/__putname for getcwd() system call)
Merging staging.current/staging-linus (3272c54 vfs: use __getname/__putname for getcwd() system call)
Merging char-misc.current/char-misc-linus (b36f4be Linux 3.11-rc6)
Merging input-current/for-linus (c7dc657 Input: evdev - add EVIOCREVOKE ioctl)
Merging md-current/for-linus (f94c0b6 md/raid5: fix interaction of 'replace' and 'recovery'.)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (26052f9 crypto: crct10dif - Add fallback for broken initrds)
Merging ide/master (64110c1 ide: sgiioc4: Staticize ioc4_ide_attach_one())
Merging dwmw2/master (5950f08 pcmcia: remove RPX board stuff)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging devicetree-current/devicetree/merge (cf9e236 of/irq: init struct resource to 0 in of_irq_to_resource())
Merging rr-fixes/fixes (6c2580c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32)
Merging mfd-fixes/master (5649d8f mfd: ab8500-sysctrl: Let sysctrl driver work without pdata)
Merging vfio-fixes/for-linus (d24cdbf vfio

[PATCH] net: ks8851-ml: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/micrel/ks8851_mll.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/micrel/ks8851_mll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/micrel/ks8851_mll.c b/drivers/net/ethernet/micrel/ks8851_mll.c
index 0fba153..075f4e2 100644
--- a/drivers/net/ethernet/micrel/ks8851_mll.c
+++ b/drivers/net/ethernet/micrel/ks8851_mll.c
@@ -915,7 +915,7 @@ static int ks_net_open(struct net_device *netdev)
 	struct ks_net *ks = netdev_priv(netdev);
 	int err;
 
-#define	KS_INT_FLAGS	(IRQF_DISABLED|IRQF_TRIGGER_LOW)
+#define	KS_INT_FLAGS	IRQF_TRIGGER_LOW
 	/* lock the card, even if we may not actually do anything
 	 * else at the moment.
 	 */
-- 
1.8.1.2



[PATCH] net: pxa168_eth: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/marvell/pxa168_eth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index 4ae0c74..fff6246 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -1123,8 +1123,7 @@ static int pxa168_eth_open(struct net_device *dev)
 	struct pxa168_eth_private *pep = netdev_priv(dev);
 	int err;
 
-	err = request_irq(dev->irq, pxa168_eth_int_handler,
-			  IRQF_DISABLED, dev->name, dev);
+	err = request_irq(dev->irq, pxa168_eth_int_handler, 0, dev->name, dev);
 	if (err) {
 		dev_err(&dev->dev, "can't assign irq\n");
 		return -EAGAIN;
-- 
1.8.1.2



[PATCH] net: lantiq_etop: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/lantiq_etop.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/lantiq_etop.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c
index bfdb0686..6a6c1f7 100644
--- a/drivers/net/ethernet/lantiq_etop.c
+++ b/drivers/net/ethernet/lantiq_etop.c
@@ -282,8 +282,7 @@ ltq_etop_hw_init(struct net_device *dev)
 
 		if (IS_TX(i)) {
 			ltq_dma_alloc_tx(&ch->dma);
-			request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
-				"etop_tx", priv);
+			request_irq(irq, ltq_etop_dma_irq, 0, "etop_tx", priv);
 		} else if (IS_RX(i)) {
 			ltq_dma_alloc_rx(&ch->dma);
 			for (ch->dma.desc = 0; ch->dma.desc < LTQ_DESC_NUM;
@@ -291,8 +290,7 @@ ltq_etop_hw_init(struct net_device *dev)
 				if (ltq_etop_alloc_skb(ch))
 					return -ENOMEM;
 			ch->dma.desc = 0;
-			request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
-				"etop_rx", priv);
+			request_irq(irq, ltq_etop_dma_irq, 0, "etop_rx", priv);
 		}
 		ch->dma.irq = irq;
 	}
-- 
1.8.1.2



Re: TPMs and random numbers

2013-09-12 Thread Jörn Engel
On Thu, 12 September 2013 22:13:49 -0400, Theodore Ts'o wrote:
> On Thu, Sep 12, 2013 at 06:23:09PM -0400, Jörn Engel wrote:
> > It is worse in three ways:
> > - it costs performance,
> > - it may create a false sense of safety and
> > - it actively does harm if we credit it as entropy.
> > 
> > How much weight you assign to each of those is up to you.  So long as
> > we don't credit any of it as entropy, I am not too averse to mixing
> > it in.  But I can equally see benefit in burning the bridges.
> 
> Well, mixing it in and using /dev/[u]random is certainly better than
> blindly using the output from the RNG from the TPM directly as a
> key.

Absolutely!

> I'm not sure what you mean by "burning the bridges"; what is the
> alternative that you are suggesting?

Not using hardware RNGs at all, see three messages back.  We know
those things can be compromised, we know a compromise cannot be
detected and we know of people/groups that have both a strong
motivation and likely the ability to pull it off.

I am not taking sides, but I can see good arguments for both
approaches.

Jörn

--
...one more straw can't possibly matter...
-- Kirby Bakken


[PATCH] net: hp100: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/hp/hp100.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/hp/hp100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hp/hp100.c b/drivers/net/ethernet/hp/hp100.c
index e3c7c69..91227d0 100644
--- a/drivers/net/ethernet/hp/hp100.c
+++ b/drivers/net/ethernet/hp/hp100.c
@@ -1097,7 +1097,7 @@ static int hp100_open(struct net_device *dev)
 	/* New: if bus is PCI or EISA, interrupts might be shared interrupts */
 	if (request_irq(dev->irq, hp100_interrupt,
 			lp->bus == HP100_BUS_PCI || lp->bus ==
-			HP100_BUS_EISA ? IRQF_SHARED : IRQF_DISABLED,
+			HP100_BUS_EISA ? IRQF_SHARED : 0,
 			"hp100", dev)) {
 		printk("hp100: %s: unable to get IRQ %d\n", dev->name, dev->irq);
 		return -EAGAIN;
-- 
1.8.1.2





[PATCH] net: fec: remove deprecated IRQF_DISABLED

2013-09-12 Thread Michael Opdenacker
This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/freescale/fec_main.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker 
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index f9aacf5..b2793b9 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2199,7 +2199,7 @@ fec_probe(struct platform_device *pdev)
 				goto failed_irq;
 			}
 			ret = devm_request_irq(&pdev->dev, irq, fec_enet_interrupt,
-					       IRQF_DISABLED, pdev->name, ndev);
+					       0, pdev->name, ndev);
 			if (ret)
 				goto failed_irq;
 		}
-- 
1.8.1.2



Re: increased vmap_area_lock contentions on "n_tty: Move buffers into n_tty_data"

2013-09-12 Thread Greg KH
On Fri, Sep 13, 2013 at 11:38:04AM +0800, Fengguang Wu wrote:
> On Thu, Sep 12, 2013 at 08:17:00PM -0700, Greg KH wrote:
> > On Fri, Sep 13, 2013 at 08:51:33AM +0800, Fengguang Wu wrote:
> > > Hi Peter,
> > > 
> > > FYI, we noticed much increased vmap_area_lock contentions since this
> > > commit:
> > 
> > What does that mean?  What is happening, are we allocating/removing more
> > memory now?
> 
> // leave this question to Peter and Tejun. :)
> 
> > What type of load were you running that showed this problem?
> 
> The increased contentions and lock hold/wait time showed up in a
> number of test cases.
> 
> The second email has the details, and this section of data is most
> illustrative.
> 
> 8cb06c983822103da1cf  20bafb3d23d108bc0a89
> --------------------  --------------------
>      3470.31  +1631.0%      60070.49  fat/micro/dd-write/1HDD-cfq-ext4-10dd
>      3339.57  +1676.9%      59340.71  fat/micro/dd-write/1HDD-cfq-ext4-1dd
>      2848.79  +1489.1%      45269.54  lkp-a04/micro/netperf/120s-200%-TCP_CRR
>      3563.01  +1515.2%      57548.50  lkp-a04/micro/netperf/120s-200%-TCP_MAERTS
>      2678.64  +1346.0%      38733.80  lkp-a04/micro/netperf/120s-200%-TCP_RR
>      2839.84  +1450.2%      44022.08  lkp-a04/micro/netperf/120s-200%-TCP_SENDFILE
>      3417.37  +1571.4%      57116.34  lkp-a04/micro/netperf/120s-200%-TCP_STREAM
>      2558.59  +1450.8%      39677.58  lkp-a04/micro/netperf/120s-200%-UDP_RR
>      3737.24  +1558.0%      61963.62  lkp-a04/micro/netperf/120s-200%-UDP_STREAM
>     20219.50  +1488.7%     321218.02  lkp-a06/crypto/tcrypt/2s-200-204
>     21017.17  +1457.1%     327257.41  lkp-a06/crypto/tcrypt/2s-205-210
>     22109.84  +1240.3%     296346.33  lkp-a06/crypto/tcrypt/2s-401-417
>     17909.60  +1467.3%     280693.71  lkp-a06/micro/dbench/100%
>    489739.50   +978.5%    5281916.05  lkp-ne04/micro/aim7/shell_rtns_1
>   1601675.63   +906.7%   16123642.52  lkp-snb01/micro/aim7/exec_test
>     12105.00  +2453.6%     309110.42  nhm-white/micro/aim7/dbase
>    822461.02  +1585.0%   13858430.62  nhm-white/micro/aim7/exec_test
>      9858.11  +2715.9%     277595.41  nhm-white/micro/aim7/fork_test
>      3452.91  +1685.5%      61650.74  nhm-white/micro/aim7/fserver
>       300.14  +2621.5%       8168.53  nhm-white/micro/aim7/misc_rtns_1
>    345479.21  +1624.5%    5957828.25  nhm-white/micro/aim7/shell_rtns_1
>      2694.48  +1974.4%      55894.19  nhm-white/sysbench/oltp/100%-600-100
>      4415.67  +1202.2%      57501.52  nhm8/micro/dbench/100%
>      2284.65  +1505.2%      36672.75  snb-drag/crypto/tcrypt/2s-200-204
>      2446.02  +1537.1%      40042.87  snb-drag/crypto/tcrypt/2s-205-210
>      2484.11  +1599.6%      42219.71  snb-drag/crypto/tcrypt/2s-500-504
>      2118.55  +1155.8%      26604.99  vpx/crypto/tcrypt/2s-200-204
>      2713.48  +1198.5%      35234.77  vpx/crypto/tcrypt/2s-205-210
>      2711.31   +973.8%      29114.07  vpx/crypto/tcrypt/2s-301-319
>      2369.23   +940.3%      24648.12  vpx/crypto/tcrypt/2s-401-417
>      2620.64  +1428.7%      40060.71  vpx/crypto/tcrypt/2s-500-504
>      1713.98  +1624.3%      29553.72  vpx/crypto/tcrypt/2s-505-509
>   3423353.12  +1184.9%   43985148.08  TOTAL lock_stat.vmap_area_lock.holdtime-total
> 
> The format of report is
> 
>    parent commit          commit
> --------------------  --------------------
>       number   %change        number  testbox/testcase/test-params
>          ...
>          ...
>          ...
> total number   %change  total number  TOTAL perf-metric-name being compared

That's a lot of slowdowns, especially for such a simple patch.

Peter, any ideas?



Re: increased vmap_area_lock contentions on "n_tty: Move buffers into n_tty_data"

2013-09-12 Thread Fengguang Wu
On Thu, Sep 12, 2013 at 08:17:00PM -0700, Greg KH wrote:
> On Fri, Sep 13, 2013 at 08:51:33AM +0800, Fengguang Wu wrote:
> > Hi Peter,
> > 
> > FYI, we noticed much increased vmap_area_lock contentions since this
> > commit:
> 
> What does that mean?  What is happening, are we allocating/removing more
> memory now?

// leave this question to Peter and Tejun. :)

> What type of load were you running that showed this problem?

The increased contentions and lock hold/wait time showed up in a
number of test cases.

The second email has the details, and this section of data is most
illustrative.

8cb06c983822103da1cf  20bafb3d23d108bc0a89
--------------------  --------------------
     3470.31  +1631.0%      60070.49  fat/micro/dd-write/1HDD-cfq-ext4-10dd
     3339.57  +1676.9%      59340.71  fat/micro/dd-write/1HDD-cfq-ext4-1dd
     2848.79  +1489.1%      45269.54  lkp-a04/micro/netperf/120s-200%-TCP_CRR
     3563.01  +1515.2%      57548.50  lkp-a04/micro/netperf/120s-200%-TCP_MAERTS
     2678.64  +1346.0%      38733.80  lkp-a04/micro/netperf/120s-200%-TCP_RR
     2839.84  +1450.2%      44022.08  lkp-a04/micro/netperf/120s-200%-TCP_SENDFILE
     3417.37  +1571.4%      57116.34  lkp-a04/micro/netperf/120s-200%-TCP_STREAM
     2558.59  +1450.8%      39677.58  lkp-a04/micro/netperf/120s-200%-UDP_RR
     3737.24  +1558.0%      61963.62  lkp-a04/micro/netperf/120s-200%-UDP_STREAM
    20219.50  +1488.7%     321218.02  lkp-a06/crypto/tcrypt/2s-200-204
    21017.17  +1457.1%     327257.41  lkp-a06/crypto/tcrypt/2s-205-210
    22109.84  +1240.3%     296346.33  lkp-a06/crypto/tcrypt/2s-401-417
    17909.60  +1467.3%     280693.71  lkp-a06/micro/dbench/100%
   489739.50   +978.5%    5281916.05  lkp-ne04/micro/aim7/shell_rtns_1
  1601675.63   +906.7%   16123642.52  lkp-snb01/micro/aim7/exec_test
    12105.00  +2453.6%     309110.42  nhm-white/micro/aim7/dbase
   822461.02  +1585.0%   13858430.62  nhm-white/micro/aim7/exec_test
     9858.11  +2715.9%     277595.41  nhm-white/micro/aim7/fork_test
     3452.91  +1685.5%      61650.74  nhm-white/micro/aim7/fserver
      300.14  +2621.5%       8168.53  nhm-white/micro/aim7/misc_rtns_1
   345479.21  +1624.5%    5957828.25  nhm-white/micro/aim7/shell_rtns_1
     2694.48  +1974.4%      55894.19  nhm-white/sysbench/oltp/100%-600-100
     4415.67  +1202.2%      57501.52  nhm8/micro/dbench/100%
     2284.65  +1505.2%      36672.75  snb-drag/crypto/tcrypt/2s-200-204
     2446.02  +1537.1%      40042.87  snb-drag/crypto/tcrypt/2s-205-210
     2484.11  +1599.6%      42219.71  snb-drag/crypto/tcrypt/2s-500-504
     2118.55  +1155.8%      26604.99  vpx/crypto/tcrypt/2s-200-204
     2713.48  +1198.5%      35234.77  vpx/crypto/tcrypt/2s-205-210
     2711.31   +973.8%      29114.07  vpx/crypto/tcrypt/2s-301-319
     2369.23   +940.3%      24648.12  vpx/crypto/tcrypt/2s-401-417
     2620.64  +1428.7%      40060.71  vpx/crypto/tcrypt/2s-500-504
     1713.98  +1624.3%      29553.72  vpx/crypto/tcrypt/2s-505-509
  3423353.12  +1184.9%   43985148.08  TOTAL lock_stat.vmap_area_lock.holdtime-total

The format of report is

   parent commit          commit
--------------------  --------------------
      number   %change        number  testbox/testcase/test-params
         ...
         ...
         ...
total number   %change  total number  TOTAL perf-metric-name being compared

Thanks,
Fengguang


Re: [RESEND PATCH v2 3/9] x86, dma: Support allocate memory from bottom upwards in dma_contiguous_reserve().

2013-09-12 Thread Tang Chen

Hi Toshi,

On 09/13/2013 03:22 AM, Toshi Kani wrote:
..
> > +	if (memblock_direction_bottom_up()) {
> > +		addr = memblock_alloc_bottom_up(
> > +				MEMBLOCK_ALLOC_ACCESSIBLE,
> > +				limit, size, alignment);
> > +		if (addr)
> > +			goto success;
> > +	}
>
> I am afraid that this version went in the wrong direction.  Allocating
> from the bottom up needs to be internal logic within the memblock
> allocator.  It should not require the callers to be aware of the
> direction and make a special request.

I think my v1 patch-set was trying to do so. Was it too complicated?

So should I just move this logic into memblock_find_in_range_node()? Is that OK?

Thanks.



Re: [PATCH] bfin_mac: remove deprecated IRQF_DISABLED

2013-09-12 Thread David Miller
From: Michael Opdenacker 
Date: Thu, 12 Sep 2013 05:35:43 +0200

> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/adi/bfin_mac.c.
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker 

Applied.


Re: [PATCH] net: tulip: remove deprecated IRQF_DISABLED

2013-09-12 Thread David Miller
From: Michael Opdenacker 
Date: Thu, 12 Sep 2013 06:20:24 +0200

> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/dec/tulip/de4x5.c
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker 

Applied.


Re: [PATCH] ethernet: amd: remove deprecated IRQF_DISABLED

2013-09-12 Thread David Miller
From: Michael Opdenacker 
Date: Thu, 12 Sep 2013 05:52:50 +0200

> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/amd/sun3lance.c
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker 

Applied.


Re: [PATCH] ehea: remove deprecated IRQF_DISABLED

2013-09-12 Thread David Miller
From: Michael Opdenacker 
Date: Thu, 12 Sep 2013 05:46:11 +0200

> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/ibm/ehea/ehea_main.c
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker 

Applied.


Re: increased vmap_area_lock contentions on "n_tty: Move buffers into n_tty_data"

2013-09-12 Thread Greg KH
On Fri, Sep 13, 2013 at 08:51:33AM +0800, Fengguang Wu wrote:
> Hi Peter,
> 
> FYI, we noticed much increased vmap_area_lock contentions since this
> commit:

What does that mean?  What is happening, are we allocating/removing more
memory now?

What type of load were you running that showed this problem?

thanks,

greg k-h


Re: [PATCHv3 linux-next] hrtimer: Add notifier when clock_was_set was called

2013-09-12 Thread Fan Du

Hi Dave/Thomas

On 2013-09-13 01:32, David Miller wrote:
> From: Thomas Gleixner
> Date: Thu, 12 Sep 2013 16:43:37 +0200 (CEST)
>
> > So what about going back to timer_list timers and simply utilize
> > register_pm_notifier(), which will tell you that the system resumed?
>
> The thing to understand is that there are two timeouts for an IPSEC
> rule, a soft and a hard timeout.
>
> There is a gap between these two exactly so that we can negotiate a
> new encapsulation with the IPSEC gateway before communication ceases
> to be possible over the IPSEC protected path.
>
> So the idea is that the soft timeout triggers the re-negotiation,
> and after a hard timeout the IPSEC path is no longer usable and
> all communication will fail.
>
> Simply triggering a re-negotiation after every suspend/resume makes
> no sense at all.  Spurious re-negotiations are undesirable.
  ^^ (*a*)

How does this differ from re-negotiating after every system wall clock
change by using a clock_was_set notifier?

> On 2013-08-02 06:35, David Miller wrote:
>
> I suspect the thing to do is to have system time changes generate a
> notifier when clock_was_set() is called.
>
> The XFRM code would walk the rules and pretend that we hit the soft
> timeout for every rule that we haven't hit the soft timeout yet
> already.
>
> If a rule hit the soft timeout, force a hard timeout.
>
> When forcing a soft timeout, adjust the hard timeout to be
> (hard_timeout - soft_timeout) into the future.

> What we want are real timers.  We want that rather than a "we
> suspended so just assume all timers expired" event which is not very
> useful for this kind of application.

Here we are facing two problems :)

(1) What kind of timer should xfrm_state employ? There are two requirements.
First, the KEY lifetime should include suspend/resume time. Second, system
wall clock changes (backward/forward) should *not* affect the *timer*
timeout event (as opposed to the soft/hard IPsec events fired to user space!).

net-next commit 99565a6c471cbb66caa68347c195133017559943 ("xfrm: Make
xfrm_state timer monotonic") has solved this problem by using
*CLOCK_BOOTTIME*.

(2) What I have been bugging you about for this long is really the second
problem. I'm sorry I didn't make it clear to you and others. It is this:

Why use wall clock time to calculate the soft/hard IPsec events in the
timeout handler when the xfrm_state timer fires? Even if xfrm_state uses
CLOCK_BOOTTIME, a system wall clock change will surely disturb the
soft/hard IPsec events, which is the concern you raised in (*a*).

The initial approach
(http://marc.info/?l=linux-netdev&m=137534280429187&w=2) tried to solve
this second problem by eliminating the dependence on the system wall clock
in the xfrm_state timer timeout handler.

I think this time I have made the situation crystal clear.

--
Rising and falling with the waves, remembering only today's laughter

--fan


Re: [PATCH] xen: Do not enable spinlocks before jump_label_init() has executed

2013-09-12 Thread Steven Rostedt
On Thu, 12 Sep 2013 22:29:44 -0400
Boris Ostrovsky wrote:

> From: Konrad Rzeszutek Wilk 
> 
> xen_init_spinlocks() currently calls static_key_slow_inc() before
> jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
> the case) the effect of this static_key_slow_inc() is deferred until after
> jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set,
> in which case the key is set immediately. Thus, depending on the value of
> config option, we may observe different behavior.
> 
> In addition, when we come to __jump_label_transform() from jump_label_init(),
> the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
> ideal_nop is not the same as default_nop this will cause a BUG() since it is
> expected that before a key is enabled the latter is replaced by the former
> during initialization.
> 
> To address this problem we need to move
> static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
> after jump_label_init(). We also need to make sure that this is done before
> other cpus start to boot. early_initcall appears to be a good place to do so.
> (Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
> need to be set before alternative_instructions() runs.)
> 

Reviewed-by: Steven Rostedt 

Peter,

This fixes a regression in 3.12 against xen. Please pull and push to
Linus sometime soon.

Thanks,

-- Steve

> Signed-off-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Boris Ostrovsky 
> ---
>  arch/x86/xen/spinlock.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 253f63f..d90628d 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -267,11 +267,18 @@ void __init xen_init_spinlocks(void)
>  		return;
>  	}
>  
> -	static_key_slow_inc(&paravirt_ticketlocks_enabled);
> -
>  	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
>  	pv_lock_ops.unlock_kick = xen_unlock_kick;
>  }
> +static __init int xen_init_spinlocks_jump(void)
> +{
> +	if (!xen_pvspin)
> +		return 0;
> +
> +	static_key_slow_inc(&paravirt_ticketlocks_enabled);
> +	return 0;
> +}
> +early_initcall(xen_init_spinlocks_jump);
>  
>  static __init int xen_parse_nopvspin(char *arg)
>  {



[PATCH] xen: Do not enable spinlocks before jump_label_init() has executed

2013-09-12 Thread Boris Ostrovsky
From: Konrad Rzeszutek Wilk 

xen_init_spinlocks() currently calls static_key_slow_inc() before
jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
the case) the effect of this static_key_slow_inc() is deferred until after
jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
which case the key is set immediately. Thus, depending on the value of config
option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(),
the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
ideal_nop is not the same as default_nop this will cause a BUG() since it is
expected that before a key is enabled the latter is replaced by the former
during initialization.

To address this problem we need to move
static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
after jump_label_init(). We also need to make sure that this is done before
other cpus start to boot. early_initcall appears to be a good place to do so.
(Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
need to be set before alternative_instructions() runs.)

Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Boris Ostrovsky 
---
 arch/x86/xen/spinlock.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 253f63f..d90628d 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -267,11 +267,18 @@ void __init xen_init_spinlocks(void)
 		return;
 	}
 
-	static_key_slow_inc(&paravirt_ticketlocks_enabled);
-
 	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
+static __init int xen_init_spinlocks_jump(void)
+{
+	if (!xen_pvspin)
+		return 0;
+
+	static_key_slow_inc(&paravirt_ticketlocks_enabled);
+	return 0;
+}
+early_initcall(xen_init_spinlocks_jump);
 
 static __init int xen_parse_nopvspin(char *arg)
 {
-- 
1.8.1.4



Re: TPMs and random numbers

2013-09-12 Thread Theodore Ts'o
On Thu, Sep 12, 2013 at 06:23:09PM -0400, Jörn Engel wrote:
> It is worse in three ways:
> - it costs performance,
> - it may create a false sense of safety and
> - it actively does harm if we credit it as entropy.
> 
> How much weight you assign to each of those is up to you.  So long as
> we don't credit any of it as entropy, I am not too averse to mixing
> it in.  But I can equally see benefit in burning the bridges.

Well, mixing it in and using /dev/[u]random is certainly better than
blindly using the output from the RNG from the TPM directly as a
key.

I'm not sure what you mean by "burning the bridges"; what is the
alternative that you are suggesting?

- Ted


[PATCH V2] mfd: timberdale: remove unnecessary pci_set_drvdata()

2013-09-12 Thread Jingoo Han
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, there is no need to manually clear the
device driver data to NULL. This patch also removes the now-unnecessary
'err_request' label.

Signed-off-by: Jingoo Han 
---
Changes since v1:
- Removed unnecessary label such as 'err_request'.

 drivers/mfd/timberdale.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/mfd/timberdale.c b/drivers/mfd/timberdale.c
index a6755ec..dbb34f9 100644
--- a/drivers/mfd/timberdale.c
+++ b/drivers/mfd/timberdale.c
@@ -678,7 +678,7 @@ static int timb_probe(struct pci_dev *dev,
 	priv->ctl_mapbase = mapbase + CHIPCTLOFFSET;
 	if (!request_mem_region(priv->ctl_mapbase, CHIPCTLSIZE, "timb-ctl")) {
 		dev_err(&dev->dev, "Failed to request ctl mem\n");
-		goto err_request;
+		goto err_start;
 	}
 
 	priv->ctl_membase = ioremap(priv->ctl_mapbase, CHIPCTLSIZE);
@@ -828,13 +828,10 @@ err_config:
 	iounmap(priv->ctl_membase);
 err_ioremap:
 	release_mem_region(priv->ctl_mapbase, CHIPCTLSIZE);
-err_request:
-	pci_set_drvdata(dev, NULL);
 err_start:
 	pci_disable_device(dev);
 err_enable:
 	kfree(priv);
-	pci_set_drvdata(dev, NULL);
 	return -ENODEV;
 }
 
@@ -851,7 +848,6 @@ static void timb_remove(struct pci_dev *dev)
 
 	pci_disable_msix(dev);
 	pci_disable_device(dev);
-	pci_set_drvdata(dev, NULL);
 	kfree(priv);
 }
 
-- 
1.7.10.4




Re: [RFC PATCH 1/1] module: Make wait module's refcount to zero procedure as async

2013-09-12 Thread Lucas De Marchi
On Thu, Sep 12, 2013 at 9:30 PM, Rusty Russell  wrote:
> Peter Chen  writes:
>> Currently, if module's refcount is not zero during the unload,
>> it waits there until the user decreases that refcount.
>
> Hi Peter,
>
> In practice userspace uses O_NONBLOCK.  In fact, I've been
> thinking of removing the blocking case altogether, since it's not really
> what people want.
>
> That would solve your problem and make the code simpler.  Thoughts?

I'm all in favor of this. It has been deprecated in kmod for almost a
year, and if anyone tries to use it we force a 10s delay on module
removal. So far nobody has complained.

Lucas De Marchi
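For reference, the non-blocking unload discussed above can be sketched from
userspace as follows. This is only a minimal illustration, assuming Linux and
the delete_module(2) system call invoked through syscall(2);
try_unload_nonblock() is a hypothetical helper name, not a kmod function.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Ask the kernel to unload a module without sleeping on its refcount.
 * Returns 0 on success or a negative errno value.
 * (Hypothetical helper, for illustration only.)
 */
static int try_unload_nonblock(const char *name)
{
	assert(name != NULL);

	/* O_NONBLOCK: fail immediately instead of waiting for refcount == 0 */
	if (syscall(SYS_delete_module, name, O_NONBLOCK) == 0)
		return 0;
	return -errno;
}
```

With O_NONBLOCK the call fails with EWOULDBLOCK instead of sleeping when the
module's reference count is nonzero; a caller without CAP_SYS_MODULE gets
EPERM.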


Re: [PATCH] ARM: msm: Move msm devicetrees under a Qualcomm dir

2013-09-12 Thread Kumar Gala

On Sep 12, 2013, at 5:47 PM, David Brown wrote:

> On Thu, Sep 12, 2013 at 12:55:36PM -0500, Kumar Gala wrote:
>> 
>> On Sep 12, 2013, at 12:06 PM, Olof Johansson wrote:
> 
>>> My original request to please use a common prefix for your product
>>> families stands. Please prefix with msm-*, or if you have to, qcom-*
>>> instead, since you guys can't seem to make your mind up on standard
>>> prefixes (msm, apq, etc).
>> 
>> This is silly, I don't see the reason to go with
>> qcom-apq-.dts and then in the future drop qcom- when we
>> most likely shift to a dir structure.  As engineers we are all too
>> aware of the lack of sanity in marketing names, but it's what we have
>> so we have to live with it.
> 
> At least what we'd decided a year or two ago was to call _everything_
> with an msm* prefix.  If marketing comes up with cute prefixes for
> things, we would basically ignore them.  So, under that, it should be
> an msm8074-dragonboard.  Admittedly, it might be a little confusing
> with the name of the product having the apq in it, but as others have
> pointed out, I think that is less confusing than not having a common
> prefix on our MSM products.
> 
> At least so far, there are no chips where apq vs msm actually
> distinguishes anything.  In fact, a simple "decoder ring" would point
> out that the 'apq' usually corresponds with the second digit being a
> zero.  It doesn't help that we've added an 'mpq' prefix as well.
> 
> I don't really see how to satisfy all of this other than qcom-apq*, or
> just continue to use msm*.


I think msm has run out of steam, especially as more SoCs come out of Qualcomm 
that aren't just targeting phones & tablets.

- k

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by 
The Linux Foundation



[PATCH V2 4/5] trace-cmd: Add virt-server mode for a virtualization environment

2013-09-12 Thread Yoshihiro YUNOMAE
Add the virt-server mode for virtualization environments, based on the listen
mode for networking. This mode works like the client/server mode over TCP/UDP,
but it uses a virtio-serial channel instead of an IP network. Collecting trace
data of guests over the network generally has high overhead because of the
processing done in the network stack.

We use virtio-serial for collecting trace data of guests. virtio-serial is a
simple communication path between the guest and the host. Moreover,
since virtio-serial and ftrace can use splice(2), no memory copying
occurs on the guests. Therefore, the total overhead of collecting trace data
of the guests is reduced. The implementation of the clients is shown
in another patch.

virt-server uses two kinds of virtio-serial I/Fs:
(1) agent-ctl-path(UNIX domain socket)
=> control path of the agent trace-cmd in each guest
(2) trace-path-cpuX(named pipe)
=> trace data path for each vcpu

Those I/Fs must be defined at the following paths:
(1) /tmp/trace-cmd/virt/agent-ctl-path
(2) /tmp/trace-cmd/virt//trace-path-cpuX

When virt-server runs, the agent-ctl-path I/F is created automatically because
virt-server operates as the server side of a UNIX domain socket. However,
trace-path-cpuX is not created automatically because trace data must be kept
separate for each guest.

When the client uses virtio-serial, the client must notify the server of the
connection. This is because a virtio-serial I/F on the guest is just a character
device. In other words, the server cannot tell whether the client exists
or not, even if the client opens the I/F. So, the server using virtio-serial
waits for the connection message MSG_TCONNECT from the client.
The server and the client operate as follows:

  
 Server                                  Client
 wait for MSG_TCONNECT
                                         open virtio-serial I/F
                                         send MSG_TCONNECT
 receive MSG_TCONNECT <------------------+
 send MSG_RCONNECT
 +-------------------------------------> receive MSG_RCONNECT
                                         check "tracecmd-V2"
                                         send cpus,pagesize,option(MSG_TINIT)
 receive MSG_TINIT <---------------------+
 print "cpus=XXX"
 print "pagesize=XXX"
 understand option
 send port_array
 +--MSG_RINIT--------------------------> receive MSG_RINIT
                                         understand port_array
                                         send meta data(MSG_SENDMETA)
 receive MSG_SENDMETA <------------------+
 record meta data
                      (snip)
                                         send a message to finish sending meta data
                                         |   (MSG_FINMETA)
 receive MSG_FINMETA <-------------------+
 read block
  --- start sending trace data on child processes ---

  --- When client finishes sending trace data ---
                                         send MSG_CLOSE
 receive MSG_CLOSE <---------------------+
 close(socket fd)                        close(socket fd)


1. Run virt-server on a host before booting guests
   # trace-cmd virt-server

2. Make guest domain directory
   # mkdir -p /tmp/trace-cmd/virt/
   # chmod 710 /tmp/trace-cmd/virt/
   # chgrp qemu /tmp/trace-cmd/virt/

3. Make FIFO on the host
   # mkfifo /tmp/trace-cmd/virt//trace-path-cpu{0,1,...,X}.{in,out}

4. Set up of virtio-serial pipe of a guest on the host
   Add the following tags to domain XML files.
   # virsh edit 
   
  
  
   
   
  
  
   
   ... (cpu1, cpu2, ...)

5. Boot the guest
   # virsh start 

6. Check I/F of virtio-serial on the guest
   # ls /dev/virtio-ports
 ...
 agent-ctl-path
 ...
 trace-path-cpu0
 ...

Next, the user will run trace-cmd with record --virt options or other options
for virtualization on the guest.

This patch adds only minimum features of virt-server as follows:

 - virt-server subcommand
 - Create I/F directory(/tmp/trace-cmd/virt/)
 - Use named pipe I/Fs of virtio-serial for trace data paths
 - Use UNIX domain socket for connecting clients on guests
 - Use splice(2) for collecting trace data of guests


 - Use libvirt when we boot guests

Signed-off-by: Yoshihiro YUNOMAE 
---
 Documentation/trace-cmd-virt-server.1.txt |   89 ++
 trace-cmd.c   |3 
 trace-cmd.h   |2 
 trace-listen.c|  434 -
 trace-msg.c   |  105 +++
 trace-recorder.c  |   54 +++-
 trace-usage.c |   10 +
 7 files changed, 602 insertions(+), 95 deletions(-)
 create mode 100644 Documentation/trace-cmd-virt-server.1.txt

diff --git a/Documentation/trace-cmd-virt-server.1.txt 
b/Documentation/trace-cmd-virt-server.1.txt
new file mode 100644
index 000..4168a04
--- /dev/null
+++ b/Documentation/trace-cmd-virt-server.1.txt
@@ -0,0 +1,89 @@
+TRACE-CMD-VIRT-SERVER(1)
+
+
+NAME
+
+trace-cmd-virt-server - listen for incoming connection to record tracing of
+

[PATCH V2 3/5] trace-cmd: Use poll(2) to wait for a message

2013-09-12 Thread Yoshihiro YUNOMAE
Use poll(2) to wait for a message. If a client/server cannot send a message for
any reason, the peer server/client will remain in a blocking read operation.
So, use poll(2) with a timeout to avoid staying in the blocked state.

Signed-off-by: Yoshihiro YUNOMAE 
---
 trace-msg.c |   42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/trace-msg.c b/trace-msg.c
index cf82ff6..61bde54 100644
--- a/trace-msg.c
+++ b/trace-msg.c
@@ -396,6 +396,27 @@ error:
 	return -ENOMSG;
 }
 
+#define MSG_WAIT_MSEC	5000
+
+/*
+ * A return value of 0 indicates time-out
+ */
+static int tracecmd_msg_recv_wait(int fd, char *buf, struct tracecmd_msg **msg)
+{
+	struct pollfd pfd;
+	int ret;
+
+	pfd.fd = fd;
+	pfd.events = POLLIN;
+	ret = poll(&pfd, 1, MSG_WAIT_MSEC);
+	if (ret < 0) {
+		return -errno;
+	} else if (ret == 0)
+		return -ETIMEDOUT;
+
+	return tracecmd_msg_recv(fd, buf);
+}
+
 static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
 {
 	return (void *)msg + offset;
@@ -407,9 +428,12 @@ static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
 	u32 cmd;
 	int ret;
 
-	ret = tracecmd_msg_recv(fd, msg_tmp);
-	if (ret < 0)
+	ret = tracecmd_msg_recv_wait(fd, msg_tmp, msg);
+	if (ret < 0) {
+		if (ret == -ETIMEDOUT)
+			warning("Connection timed out\n");
 		return ret;
+	}
 
 	*msg = (struct tracecmd_msg *)msg_tmp;
 	cmd = ntohl((*msg)->cmd);
@@ -487,9 +511,12 @@ int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize)
 	u32 size = 0;
 	u32 cmd;
 
-	ret = tracecmd_msg_recv(fd, buf);
-	if (ret < 0)
+	ret = tracecmd_msg_recv_wait(fd, buf, &msg);
+	if (ret < 0) {
+		if (ret == -ETIMEDOUT)
+			warning("Connection timed out\n");
 		return ret;
+	}
 
 	msg = (struct tracecmd_msg *)buf;
 	cmd = ntohl(msg->cmd);
@@ -625,9 +652,12 @@ int tracecmd_msg_collect_metadata(int ifd, int ofd)
 	int ret;
 
 	do {
-		ret = tracecmd_msg_recv(ifd, buf);
+		ret = tracecmd_msg_recv_wait(ifd, buf, &msg);
 		if (ret < 0) {
-			warning("reading client");
+			if (ret == -ETIMEDOUT)
+				warning("Connection timed out\n");
+			else
+				warning("reading client");
 			return ret;
 		}
 

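The timeout pattern this patch introduces can be sketched outside trace-cmd as
follows. This is a minimal illustration rather than the trace-cmd code:
recv_with_timeout() is a hypothetical name, and it reads with plain read(2)
instead of tracecmd_msg_recv().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the number of bytes read, -ETIMEDOUT if nothing arrived in time,
 * or another negative errno value on failure.
 */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t n;
	int ret;

	assert(buf != NULL);

	/* Retry when a signal interrupts poll(); the timeout restarts. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;	/* poll() returned 0: nothing to read */

	n = read(fd, buf, len);
	return n < 0 ? -errno : n;
}
```

Reporting the timeout as -ETIMEDOUT, as the patch does, lets the caller tell a
silent peer apart from a read error.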


[PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

2013-09-12 Thread Yoshihiro YUNOMAE
Apply trace-msg protocol for communication between a server and clients.

Currently, trace-listen(server) and trace-record -N(client) operate as follows:

  
 Server                                  Client
 listen to socket fd
                                         connect to socket fd
 accept the client
 send "tracecmd"
 +-------------------------------------> receive "tracecmd"
                                         check "tracecmd"
                                         send cpus
 receive cpus <--------------------------+
 print "cpus=XXX"
                                         send pagesize
                                         |
 receive pagesize <----------------------+
 print "pagesize=XXX"
                                         send option
                                         |
 receive option <------------------------+
 understand option
 send port_array
 +-------------------------------------> receive port_array
                                         understand port_array
                                         send meta data
 receive meta data <---------------------+
 record meta data
                      (snip)
 read block
  --- start sending trace data on child processes ---

  --- When client finishes sending trace data ---
                                         close(socket fd)
 read size = 0
 close(socket fd)

All messages are unstructured character strings, so a server (or client) using
the protocol must parse the unstructured messages. Since it is hard to
add complex contents to such a protocol, the structured binary message format
trace-msg is introduced as the communication protocol.

By applying this patch, server and client operate as follows:

  
 Server                                  Client
 listen to socket fd
                                         connect to socket fd
 accept the client
 send "tracecmd"
 +-------------------------------------> receive "tracecmd"
                                         check "tracecmd"
                                         send "V2\0\00" as the v2 protocol
 receive "V2" <--------------------------+
 check "V2"
 read "\00"
 send "V2"
 +-------------------------------------> receive "V2"
                                         check "V2"
                                         send cpus,pagesize,option(MSG_TINIT)
 receive MSG_TINIT <---------------------+
 print "cpus=XXX"
 print "pagesize=XXX"
 understand option
 send port_array
 +--MSG_RINIT--------------------------> receive MSG_RINIT
                                         understand port_array
                                         send meta data(MSG_SENDMETA)
 receive MSG_SENDMETA <------------------+
 record meta data
                      (snip)
                                         send a message to finish sending meta data
                                         |   (MSG_FINMETA)
 receive MSG_FINMETA <-------------------+
 read block
  --- start sending trace data on child processes ---

  --- When client finishes sending trace data ---
                                         send MSG_CLOSE
 receive MSG_CLOSE <---------------------+
 close(socket fd)                        close(socket fd)

With the v2 protocol, after the client checks "tracecmd", the client
sends "V2\0\00\0". This compound message is used when a
new client tries to connect to an old server. The new client wants to check
whether the reply message from the server is "V2" or not. However, the old
server does not respond to the client before receiving cpu numbers, page size,
and options. Each message is separated with "\0" in the old server, so the
client sends "V2" as cpu numbers, "" as page size, and "0" as
no options. The old server, in turn, will understand the messages
as cpus=0, pagesize=, and options=0, and then the server will
send the message "\0" as port numbers. The message which the client
receives is then not "V2" but "\0", so the client will reconnect to the old
server using the v1 protocol.
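
The client-side decision described above can be sketched as follows. This is an
illustration only: pick_protocol() and the proto_version enum are hypothetical
names, and the sketch looks only at the first reply from the server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum proto_version { PROTO_V1 = 1, PROTO_V2 = 2 };

/*
 * Decide which protocol to speak from the server's first reply.
 * A new server answers "V2". An old server parses the probe as
 * cpus/pagesize/options and answers with its port list, which is "\0"
 * because it understood cpus=0.
 */
static enum proto_version pick_protocol(const char *reply, size_t len)
{
	assert(reply != NULL);

	if (len >= 2 && memcmp(reply, "V2", 2) == 0)
		return PROTO_V2;
	return PROTO_V1;	/* anything else: fall back and reconnect as v1 */
}
```

When this returns PROTO_V1, the client would close the socket and reconnect,
since the old server has already consumed the probe as its initial settings.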

Changes in V2: Add legacy protocol support in order to keep backward compatibility

Signed-off-by: Yoshihiro YUNOMAE 
---
 Makefile   |2 
 trace-cmd.h|   11 +
 trace-listen.c |  133 +++
 trace-msg.c|  683 
 trace-msg.h|   27 ++
 trace-output.c |4 
 trace-record.c |   86 ++-
 7 files changed, 880 insertions(+), 66 deletions(-)
 create mode 100644 trace-msg.c
 create mode 100644 trace-msg.h

diff --git a/Makefile b/Makefile
index 1964949..054f53d 100644
--- a/Makefile
+++ b/Makefile
@@ -314,7 +314,7 @@ KERNEL_SHARK_OBJS = $(TRACE_VIEW_OBJS) $(TRACE_GRAPH_OBJS) 
$(TRACE_GUI_OBJS) \
 PEVENT_LIB_OBJS = event-parse.o trace-seq.o parse-filter.o parse-utils.o
 TCMD_LIB_OBJS = $(PEVENT_LIB_OBJS) trace-util.o trace-input.o trace-ftrace.o \
trace-output.o trace-recorder.o trace-restore.o 
trace-usage.o \
-   trace-blk-hack.o kbuffer-parse.o
+   trace-blk-hack.o kbuffer-parse.o trace-msg.o
 
 PLUGIN_OBJS = plugin_hrtimer.o plugin_kmem.o plugin_sched_switch.o \
plugin_mac80211.o plugin_jbd2.o plugin_function.o plugin_kvm.o \
diff --git a/trace-cmd.h b/trace-cmd.h
index cbbc6ed..a2958ac 

[PATCH V2 5/5] trace-cmd: Add --virt option for record mode

2013-09-12 Thread Yoshihiro YUNOMAE
Add the --virt option to record mode for virtualization environments.
When this option is used on a guest, trace data can be sent with low overhead.
This is because guests can send trace data to a host without copying the data
by using splice(2).
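
The zero-copy transfer mentioned above relies on splice(2). A minimal sketch,
assuming Linux: forward_bytes() is a hypothetical helper, not a trace-cmd
function, and in practice the two ends would be the ftrace and virtio-serial
descriptors rather than whatever the caller passes here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Move up to len bytes from in_fd to out_fd inside the kernel, without
 * copying them through a userspace buffer. splice(2) requires at least
 * one of the two descriptors to be a pipe.
 * Returns bytes moved, 0 on end of input, or a negative errno value.
 */
static ssize_t forward_bytes(int in_fd, int out_fd, size_t len)
{
	ssize_t n;

	assert(len > 0);

	n = splice(in_fd, NULL, out_fd, NULL, len, SPLICE_F_MOVE);
	return n < 0 ? -errno : n;
}
```

A recorder would typically call such a helper in a loop until it returns 0 or
an error.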

The format is:

   trace-cmd record --virt -e sched*


The client using virtio-serial does not wait for the connection message
"tracecmd" from the server. The client sends the connection message
MSG_TCONNECT first.


This feature can be used from kernel-3.6 onward, which supports splice_read for
ftrace and splice_write for virtio-serial.

Signed-off-by: Yoshihiro YUNOMAE 
---
 Documentation/trace-cmd-record.1.txt |   11 -
 trace-cmd.h  |3 +
 trace-msg.c  |   80 --
 trace-msg.h  |4 ++
 trace-record.c   |   70 --
 5 files changed, 156 insertions(+), 12 deletions(-)

diff --git a/Documentation/trace-cmd-record.1.txt 
b/Documentation/trace-cmd-record.1.txt
index 832a257..7eb8ac9 100644
--- a/Documentation/trace-cmd-record.1.txt
+++ b/Documentation/trace-cmd-record.1.txt
@@ -240,6 +240,15 @@ OPTIONS
 timestamp to gettimeofday which will allow wall time output from the
 timestamps reading the created 'trace.dat' file.
 
+*--virt*::
+This option is used on a guest in a virtualization environment. If a host
+is running "trace-cmd virt-server", this option is used to have the data
+sent to the host with virtio-serial like *-N* option. (see also
+trace-cmd-virt-server(1))
+
+Note: This option is not supported with latency tracer plugins:
+  wakeup, wakeup_rt, irqsoff, preemptoff and preemptirqsoff
+
 EXAMPLES
 
 
@@ -302,7 +311,7 @@ SEE ALSO
 
 trace-cmd(1), trace-cmd-report(1), trace-cmd-start(1), trace-cmd-stop(1),
 trace-cmd-extract(1), trace-cmd-reset(1), trace-cmd-split(1),
-trace-cmd-list(1), trace-cmd-listen(1)
+trace-cmd-list(1), trace-cmd-listen(1), trace-cmd-virt-server(1)
 
 AUTHOR
 --
diff --git a/trace-cmd.h b/trace-cmd.h
index ce3df2c..d69ea2e 100644
--- a/trace-cmd.h
+++ b/trace-cmd.h
@@ -250,7 +250,8 @@ void tracecmd_stat_cpu(struct trace_seq *s, int cpu);
 long tracecmd_flush_recording(struct tracecmd_recorder *recorder);
 
 /* for clients */
-int tracecmd_msg_send_init_data(int fd);
+int tracecmd_msg_connect_to_server(int fd);
+int tracecmd_msg_send_init_data_nw(int fd);
 int tracecmd_msg_metadata_send(int fd, char *buf, int size);
 int tracecmd_msg_finish_sending_metadata(int fd);
 void tracecmd_msg_send_close_msg();
diff --git a/trace-msg.c b/trace-msg.c
index 0b3b356..4de1cf3 100644
--- a/trace-msg.c
+++ b/trace-msg.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -72,6 +73,7 @@ int cpu_count;
 static int psfd;
 unsigned int page_size;
 int *client_ports;
+int *virt_sfds;
 bool send_metadata;
 
 /* for server */
@@ -268,12 +270,20 @@ static int make_rinit(struct tracecmd_msg *msg)
return 0;
 }
 
+static int make_error_msg(u32 len, struct tracecmd_msg *msg)
+{
+   bufcpy(msg, TRACECMD_MSG_HDR_LEN, errmsg, len);
+   return 0;
+}
+
 static u32 tracecmd_msg_get_body_length(u32 cmd)
 {
struct tracecmd_msg *msg;
u32 len = 0;
 
switch (cmd) {
+   case MSG_ERROR:
+   return ntohl(errmsg->size);
case MSG_RCONNECT:
return sizeof(msg->data.rconnect.str.size) + CONNECTION_MSGSIZE;
case MSG_TINIT:
@@ -302,6 +312,7 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
   + sizeof(msg->data.rinit.port_array);
case MSG_SENDMETA:
return TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN;
+   case MSG_TCONNECT:
case MSG_CLOSE:
case MSG_FINMETA:
break;
@@ -313,12 +324,15 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
 static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
 {
switch (cmd) {
+   case MSG_ERROR:
+   return make_error_msg(len, msg);
case MSG_RCONNECT:
return make_rconnect(CONNECTION_MSG, CONNECTION_MSGSIZE, msg);
case MSG_TINIT:
return make_tinit(len, msg);
case MSG_RINIT:
return make_rinit(msg);
+   case MSG_TCONNECT:
case MSG_CLOSE:
case MSG_SENDMETA: /* meta data is not stored here. */
case MSG_FINMETA:
@@ -374,6 +388,12 @@ static int tracecmd_msg_send(int fd, u32 cmd)
return 0;
 }
 
+static void tracecmd_msg_send_error(int fd, struct tracecmd_msg *msg)
+{
+   errmsg = msg;
+   tracecmd_msg_send(fd, MSG_ERROR);
+}
+
 static int tracecmd_msg_read_extra(int fd, char *buf, u32 size, int *n)
 {
int r = 0;
@@ -502,20 +522,36 @@ static int tracecmd_msg_send_and_wait_for_msg(int fd, u32 
cmd, struct tracecmd_m
return 0;
 }
 
-int tracecmd_msg_send_init_data(int fd)
+static int 

[PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

2013-09-12 Thread Yoshihiro YUNOMAE
Hi Steven,

This is the v2 patch set implementing a part of the "Integrated trace" feature,
which is a trace merging system for virtualization environments. Currently,
trace-cmd does not have the following features yet:

a) Server and client for a virtualization environment
b) Structured message platform between guests and host
c) Agent feature of a client
d) Merge feature of trace data of multiple guests and host in chronological
   order

This patch set implements features a) and b) above.



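        +------------+        +------------+
Guest   |   a), c)   |        |   a), c)   |     client/agent
  ^     +------------+        +------------+
  |         ^   ^                 ^   ^
============|===|=================|===|===========
  |         v b)v                 v b)v
  v     +----------------------------------+
Host    |                a)                |     server
        +----------------------------------+
             ||output      ||          ||
             \/            \/          \/
        /--------+    /--------+    /--------+
        | 010101 |    | 101010 |    | 100101 |   binary data
        | 010100 |    | 010100 |    | 110011 |
        +--------+    +--------+    +--------+
             \                          /
              \------------------------/
                           || d)
                           \/
        /-----------------------------------+
        | (guest1) 123456: sched_switch...  |    text data
        | (guest2) 123458: kmem_free...     |
        | (host)   123500: kvm_exit (guest1)|
        | (host)   123510: kvm_entry(guest1)|
        | (guest1) 123550: sched_switch...  |
        +-----------------------------------+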

a) Server and client for a virtualization environment
 trace-cmd has a listen mode for networking, but using the network is a
high-cost operation because it induces a lot of memory copying. Since
kernel-3.6, the virtio-console driver supports splice_write and ftrace supports
"steal" for fops. So, guest clients of trace-cmd can send trace data without
copying memory by using splice(2). If guest clients use virtio-serial, the
server also needs to support the virtio-serial I/F.

b) Structured message platform between guests and a host
 Currently, a server (or client) sends unstructured character strings to
clients (or the server), so the receiver must parse the unstructured messages.
Since it is hard to add complex contents to such a protocol, the structured
binary message format trace-msg is introduced as the communication protocol.

c) Agent feature of a client
 The current trace-cmd client can operate only in "record" mode, so the client
sends trace data to the server immediately. However, when a user tries to
collect trace data of multiple guests on a host, the user must log in to
each guest. I think this is hard to use. So, the trace-cmd client should
support an agent mode which receives messages from the server.

d) Merge feature of trace data of multiple guests and a host in chronological
   order
 Current trace-cmd has a merge feature for multiple machines whose clocks are
synchronized by NTP. To use the feature, we execute "trace-cmd record"
with the --date option on each machine, and then run "trace-cmd report" with
the -i option for each file.
 However, there are cases where the clocks of those machines cannot be
synchronized. For example, although multiple users can run guests in
virtualization environments (e.g. multi-tenant cloud hosting), there is no
guarantee that they use the same NTP server. Moreover, even if the clocks are
synchronized, trace data cannot be merged exactly because the granularity of
NTP-synchronized time may not be fine enough for sorting guest-host switching
events.
 So, I'm considering having trace data use x86-tsc as the timestamp in order to
merge trace data. By using x86-tsc, we can merge trace data even if the clocks
of those machines are not synchronized, provided the CPU has the invariant TSC
feature or the constant TSC feature. The precision will also be enough for
understanding operations of guests and host. However, TSC values on a guest are
not equal to the values on the host because
 TSC_guest = TSC_host + TSC_offset.
 This series does not actually support the TSC offset, but I'd like to add that
feature to fix the host/guest clock difference in another series. TSC offset
values can be obtained from the write_tsc_offset trace event since kernel-3.11.
(see https://lkml.org/lkml/2013/6/12/72)
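
As a small worked example of the relation above, a guest timestamp can be
placed on the host's TSC timeline by subtracting the offset. This is only a
sketch of the arithmetic; guest_tsc_to_host() is a hypothetical name and is not
part of this series.

```c
#include <stdint.h>

/*
 * Convert a guest TSC timestamp to the host's TSC timeline.
 * From TSC_guest = TSC_host + TSC_offset it follows that
 * TSC_host = TSC_guest - TSC_offset.
 * Unsigned arithmetic wraps modulo 2^64, so a negative offset stored as a
 * large unsigned value still yields the right result.
 */
static uint64_t guest_tsc_to_host(uint64_t tsc_guest, uint64_t tsc_offset)
{
	return tsc_guest - tsc_offset;
}
```

For instance, a guest timestamp of 1500 with an offset of 500 corresponds to
host TSC 1000; once converted, guest and host events can be sorted on a single
timeline.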

For a), this patch introduces "virt-server" and "record --virt" modes for
achieving low-overhead communication of trace data of guests. "virt-server" is a
server mode for collecting trace data of guests. On the other hand,
"record --virt" mode is a guest client for sending trace data of the guest.
Although these functions are similar to the "listen" and "record -N" modes
respectively, they use virtio-serial instead of the network for low-overhead
communication.

For b), this patch series introduces a specific message protocol in order to
handle communication messages with 8 commands. When we extend any messages,
using structured messages will be easier 

[PATCH V2 1/5] [CLEANUP] trace-cmd: Split out binding a port and fork reader from open_udp()

2013-09-12 Thread Yoshihiro YUNOMAE
Split port binding and reader forking out of open_udp() to avoid duplicating
code between listen mode and virt-server mode.

Changes in V2: Add a comment in open_udp()

Signed-off-by: Yoshihiro YUNOMAE 
---
 trace-listen.c |   38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/trace-listen.c b/trace-listen.c
index 8b8f02c..bf187c9 100644
--- a/trace-listen.c
+++ b/trace-listen.c
@@ -228,13 +228,12 @@ static void process_udp_child(int sfd, const char *host, const char *port,
 #define START_PORT_SEARCH 1500
 #define MAX_PORT_SEARCH 6000
 
-static int open_udp(const char *node, const char *port, int *pid,
-		    int cpu, int pagesize, int start_port)
+static int udp_bind_a_port(int start_port, int *sfd)
 {
 	struct addrinfo hints;
 	struct addrinfo *result, *rp;
-	int sfd, s;
 	char buf[BUFSIZ];
+	int s;
 	int num_port = start_port;
 
  again:
@@ -250,15 +249,15 @@ static int open_udp(const char *node, const char *port, int *pid,
 		pdie("getaddrinfo: error opening udp socket");
 
 	for (rp = result; rp != NULL; rp = rp->ai_next) {
-		sfd = socket(rp->ai_family, rp->ai_socktype,
-			     rp->ai_protocol);
-		if (sfd < 0)
+		*sfd = socket(rp->ai_family, rp->ai_socktype,
+			      rp->ai_protocol);
+		if (*sfd < 0)
 			continue;
 
-		if (bind(sfd, rp->ai_addr, rp->ai_addrlen) == 0)
+		if (bind(*sfd, rp->ai_addr, rp->ai_addrlen) == 0)
 			break;
 
-		close(sfd);
+		close(*sfd);
 	}
 
 	if (rp == NULL) {
@@ -270,6 +269,12 @@ static int open_udp(const char *node, const char *port, int *pid,
 
 	freeaddrinfo(result);
 
+	return num_port;
+}
+
+static void fork_udp_reader(int sfd, const char *node, const char *port,
+			    int *pid, int cpu, int pagesize)
+{
 	*pid = fork();
 
 	if (*pid < 0)
@@ -279,6 +284,23 @@ static int open_udp(const char *node, const char *port, int *pid,
 		process_udp_child(sfd, node, port, cpu, pagesize);
 
 	close(sfd);
+}
+
+static int open_udp(const char *node, const char *port, int *pid,
+		    int cpu, int pagesize, int start_port)
+{
+	int sfd;
+	int num_port;
+
+	/*
+	 * udp_bind_a_port() currently does not return an error, but if that
+	 * changes in the future, we have a check for it now.
+	 */
+	num_port = udp_bind_a_port(start_port, &sfd);
+	if (num_port < 0)
+		return num_port;
+
+	fork_udp_reader(sfd, node, port, pid, cpu, pagesize);
 
 	return num_port;
 }



Re: [RFC PATCH 1/1] module: Make wait module's refcount to zero procedure as async

2013-09-12 Thread Peter Chen
On Fri, Sep 13, 2013 at 10:00:33AM +0930, Rusty Russell wrote:
> Peter Chen  writes:
> > Currently, if module's refcount is not zero during the unload,
> > it waits there until the user decreases that refcount.
> 
> Hi Peter,
> 
> In practice userspace uses O_NONBLOCK.  In fact, I've been
> thinking of removing the blocking case altogether, since it's not really
> what people want.
> 
> That would solve your problem and make the code simpler.  Thoughts?
> 

So, it will behave like the "force unload" case, right?
If that is the case, it would be better to print a warning message indicating
that some users are still using the module, since there may be a NULL pointer
dereference after the module has been unloaded, and the end user
can then understand that it may be triggered by a wrong module unload sequence.

Thanks.

-- 

Best Regards,
Peter Chen



Re: [PATCH] kernel/futex.c: notice the return value after rt_mutex_finish_proxy_lock() fails

2013-09-12 Thread Chen Gang


Firstly, I am glad to see that you did not redirect all my mails to
"/dev/null".  ;-)


On 09/13/2013 07:36 AM, Thomas Gleixner wrote:
> On Thu, 12 Sep 2013, Darren Hart wrote:
>> On Thu, 2013-09-12 at 16:32 +0200, Thomas Gleixner wrote:
>>> On Tue, 20 Aug 2013, Chen Gang wrote:
>>>
>>>> rt_mutex_finish_proxy_lock() can return failure code (e.g. -EINTR,
>>>> -ETIMEDOUT).
>>>>
>>>> Original implementation has already noticed about it, but not check it
>>>> before next work.
>>>>
>>>> Also let coments within 80 columns to pass "./scripts/checkpatch.pl".
>>>>
>>>>
>>>> Signed-off-by: Chen Gang 
>>>> ---
>>>>  kernel/futex.c |   30 --
>>>>  1 files changed, 16 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>>> index c3a1a55..1a94e7d 100644
>>>> --- a/kernel/futex.c
>>>> +++ b/kernel/futex.c
>>>> @@ -2373,21 +2373,23 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
>>>>  		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, _waiter, 1);
>>>>  		debug_rt_mutex_free_waiter(_waiter);
>>>>  
>>>> -		spin_lock(q.lock_ptr);
>>>> -		/*
>>>> -		 * Fixup the pi_state owner and possibly acquire the lock if we
>>>> -		 * haven't already.
>>>> -		 */
>>>> -		res = fixup_owner(uaddr2, , !ret);
>>>> -		/*
>>>> -		 * If fixup_owner() returned an error, proprogate that.  If it
>>>> -		 * acquired the lock, clear -ETIMEDOUT or -EINTR.
>>>> -		 */
>>>> -		if (res)
>>>> -			ret = (res < 0) ? res : 0;
>>>> +		if (!ret) {
>>>
>>> Again. This is completely wrong!  
>>>

Yeah, really it is.


>>> We MUST call fixup_owner even if finish_proxy_lock() returned with an
>>> error code. Simply because finish_proxy_lock() is called outside of
>>> the spin_lock(q.lock_ptr) region and another thread might have
>>> modified the futex state. So we need to handle the corner cases
>>> otherwise we might leave the futex in some undefined state.
>>>
>>> You're reintroducing a hard to decode bug, which got analyzed and
>>> fixed in futex_lock_pi() years ago. See the history for the
>>> explanation.
>>>

Thank you for your detailed explanation.


>>> Sigh.
>>>
>>> tglx
>>
>> Chen, perhaps you can let us know what the failure scenario is that you
>> are trying to address with this patch.
> 
> No failure scenario at all.
> 
> Chen is on a self defined agenda to fix random kernel bugs in random
> kernel subdirectories on a given rate by all means. (Google yourself
> for the details.)
> 

Hmm... what you said is partly correct -- it is part of my goal (at
least, I feel it is valuable to the kernel).

Other goals, which you did not mention but which are still related to
the kernel:

  1. LTP (Linux Test Project), which I will start in Q4 of 2013, and which
will let me provide more tests for the kernel (and find more kernel issues).

  2. gcc/binutils: which can find more issues in both the kernel and
gcc/binutils (I am also communicating with the gcc folks).

  3. Documents (or trivial patches): which I am trying, but it seems I have
not done them quite well.


> That crusade does not involve any failure analysis or test cases. It's
> just driven by mechanically checking the code for inconsistencies. Now
> he tripped over a non obvious return value chain in the futex code. So
> instead of figuring out why it is coded this way, he just mechanically
> decided that there is a missing check. Though:
> 
> The return value is checked and it needs deep understanding of the way
> how futexes work to grok why it's necessary to invoke fixup_owner()
> independent of the rt_mutex_finish_proxy_lock() return value.
> 
> The code in question is:
> 
> 	ret = rt_mutex_finish_proxy_lock(pi_mutex, to, _waiter, 1);
> 
> 	spin_lock(q.lock_ptr);
> 	/*
> 	 * Fixup the pi_state owner and possibly acquire the lock if we
> 	 * haven't already.
> 	 */
> 	res = fixup_owner(uaddr2, , !ret);
> 	/*
> 	 * If fixup_owner() returned an error, proprogate that.  If it
> 	 * acquired the lock, clear -ETIMEDOUT or -EINTR.
> 	 */
> 	if (res)
> 		ret = (res < 0) ? res : 0;
> 
> If you can understand the comments in the code and you are able to
> follow the implementation of fixup_owner() and the usage of "!ret" as
> an argument you really should be able to figure out, why this is
> correct.
> 
> I'm well aware, as you are, that this code is hard to grok. BUT:
> 
> If this code in futex_wait_requeue_pi() is wrong why did Chen's
> correctness checker not trigger on the following code in
> futex_lock_pi()?:
> 
> 	if (!trylock)
> 		ret = rt_mutex_timed_lock(_state->pi_mutex, to, 1);
> 	else {
> 		ret = rt_mutex_trylock(_state->pi_mutex);
> 		/* Fixup the trylock return value: */
> 		ret = ret ? 0 : -EWOULDBLOCK;
>

Re: [PATCH] Inet-hashtable: Change the range of sk->hash lock to avoid the race condition.

2013-09-12 Thread Jun Chen
On Thu, 2013-09-12 at 05:00 -0700, Eric Dumazet wrote:
> On Thu, 2013-09-12 at 12:32 -0400, Jun Chen wrote:
> > When try to add node to list in __inet_hash_nolisten function, first get
> > the list and then to lock for using, but in extremeness case, others can
> > del this node before locking it, then the node should be null. So this
> > patch try to lock firstly and then get the list for using to avoid this
> > race condition.
> 
> I suspect another bug. This should not happen.
> 
> Care to describe the problem you got ?
> 
> Thanks
> 
> 

OK, I only have this call stack and no further information; please help
take a look at it. Thanks!

<1>[ 88.548263] BUG: unable to handle kernel NULL pointer dereference at 0004
<1>[ 88.548490] IP: [] __inet_hash_nolisten+0xc1/0x140
<4>[ 88.548617] *pde = 
<4>[ 88.549927] EIP is at __inet_hash_nolisten+0xc1/0x140
<4>[ 88.550008] EAX:  EBX: e08c ECX: edf846e0 EDX: e08c0020
<4>[ 88.550055] ESI: c20213c0 EDI: edc12dc0 EBP: ce4bfdfc ESP: ce4bfde8
<4>[ 88.550137] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
<4>[ 88.550184] CR0: 80050033 CR2: 0004 CR3: 2b4ff000 CR4: 001007d0
<4>[ 88.550266] DR0:  DR1:  DR2:  DR3: 
<4>[ 88.550346] DR6: 0ff0 DR7: 0400
<0>[ 88.550392] Process WebViewCoreThre (pid: 2137, ti=ce4be000 task=eb193c80 task.ti=ce4be000)
<0>[ 88.551746] Call Trace:
<4>[ 88.551797] [] __inet_hash_connect+0x295/0x2d0
<4>[ 88.551883] [] inet_hash_connect+0x40/0x50
<4>[ 88.551932] [] ? inet_unhash+0x90/0x90
<4>[ 88.551981] [] ? __inet_lookup_listener+0x1b0/0x1b0
<4>[ 88.552067] [] tcp_v4_connect+0x247/0x4a0
<4>[ 88.552117] [] ? lock_sock_nested+0x3e/0x50
<4>[ 88.552205] [] inet_stream_connect+0xe2/0x290
<4>[ 88.552254] [] ? _copy_from_user+0x35/0x50
<4>[ 88.552342] [] sys_connect+0xb2/0xd0
<4>[ 88.552393] [] ? alloc_file+0x20/0xa0
<4>[ 88.552441] [] ? tcp_setsockopt+0x50/0x60
<4>[ 88.552525] [] ? fget_light+0x44/0xe0
<4>[ 88.552574] [] ? sock_common_setsockopt+0x27/0x40
<4>[ 88.552659] [] ? _copy_from_user+0x35/0x50
<4>[ 88.552708] [] sys_socketcall+0xab/0x2b0
<4>[ 88.552790] [] ? trace_hardirqs_on_thunk+0xc/0x10
<4>[ 88.552840] [] syscall_call+0x7/0xb
<4>[ 88.552923] [] ? mutex_trylock+0x30/0x140



RE: [PATCH V2 1/1] X86: Hyper-V: Get the local APIC timer frequency from the hypervisor

2013-09-12 Thread KY Srinivasan


> -----Original Message-----
> From: H. Peter Anvin [mailto:h...@zytor.com]
> Sent: Thursday, September 12, 2013 5:28 PM
> To: KY Srinivasan
> Cc: x...@kernel.org; gre...@linuxfoundation.org; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; t...@linutronix.de; jbeul...@suse.com; b...@alien8.de
> Subject: Re: [PATCH V2 1/1] X86: Hyper-V: Get the local APIC timer frequency
> from the hypervisor
> 
> On 09/12/2013 05:06 PM, KY Srinivasan wrote:
> >
> > Peter,
> >
> > Let me know if you want me to address any additional issues in this patch.
> >
> 
> Please address Jan and Gleb's feedback.

Gleb's feedback was a question, which I answered, as I did Jan's feedback.
Gleb, Jan, please let me know if there is something else you want addressed
here.

Regards,

K. Y



Re: [RFC PATCH 1/4] DRIVERS: IRQCHIP: Add crossbar irqchip driver

2013-09-12 Thread Santosh Shilimkar
On Thursday 12 September 2013 08:26 PM, Thomas Gleixner wrote:
> On Thu, 12 Sep 2013, Santosh Shilimkar wrote:
>> On Thursday 12 September 2013 06:22 PM, Thomas Gleixner wrote:
>>> Now the real question is, how that expansion mechanism is supposed to
>>> work. There are two possible scenarios:
>>>
>>> 1) Expand the number of handled interrupts beyond the GIC capacity:
>>>
>>>That requires a mechanism in CROSSBAR to map several CROSSBAR
>>>interrupts to a particular GIC interrupt and provide a demux
>>>mechanism to invoke the shared handlers.
>>>
>> This is not possible in hardware and not supported. Hardware has
>> no notion of muxing multiple IRQ's to generate 1 IRQ or ack etc
>> functionality. Its a simple MUX to tie knots between input and output
>> wires.
> 
> It's not a MUX. It's a ROUTING mechanism. That's similar to the
> mechanisms which are used by MSI[X]. We assign arbitrary interrupt
> numbers to a device and route them to some underlying limited hardware
> interrupt controller.
> 
>>> 2) Provide a mapping mechanism between possibly 250 interrupt numbers
>>>and a limitation of a total 160 active interrupts by the underlying
>>>GIC.
>>>
>> This is the need and problem we are trying to solve.
> 
> Let me summarize:
> 
>- GIC supports up to 160 interrupts
> 
>- CROSSBAR supports up to 250 interrupts 
> 
>- CROSSBAR routes up to 160 out of 250 interrupts to the GIC ones
> 
>- Drivers request a CROSSBAR interrupt number which must be mapped
>  to some arbitrary available GIC irq number
> 
Correct.
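
To make the summary above concrete, here is a minimal userspace sketch of the routing problem: up to 250 crossbar inputs competing for 160 GIC lines, with a driver asking for a crossbar input and getting back whichever GIC line happens to be free. This is only an illustration, not driver code; the names xbar_sketch, xbar_map() and xbar_unmap() are made up for this example, and a first-fit search is an assumption rather than anything the hardware or the patch set prescribes.

```c
#include <assert.h>

#define GIC_LINES	160	/* interrupts the GIC can take, per the summary */
#define XBAR_INPUTS	250	/* interrupt sources feeding the crossbar */
#define UNMAPPED	(-1)

/* Hypothetical bookkeeping for the routing table, kept in both directions. */
struct xbar_sketch {
	int gic_to_xbar[GIC_LINES];
	int xbar_to_gic[XBAR_INPUTS];
};

static void xbar_init(struct xbar_sketch *x)
{
	for (int g = 0; g < GIC_LINES; g++)
		x->gic_to_xbar[g] = UNMAPPED;
	for (int i = 0; i < XBAR_INPUTS; i++)
		x->xbar_to_gic[i] = UNMAPPED;
}

/*
 * Route a crossbar input to some free GIC line (first fit).
 * Returns the GIC line, or UNMAPPED if the input is invalid or all
 * 160 lines are already in use.
 */
static int xbar_map(struct xbar_sketch *x, int input)
{
	if (input < 0 || input >= XBAR_INPUTS)
		return UNMAPPED;
	if (x->xbar_to_gic[input] != UNMAPPED)
		return x->xbar_to_gic[input];	/* already routed */

	for (int g = 0; g < GIC_LINES; g++) {
		if (x->gic_to_xbar[g] == UNMAPPED) {
			x->gic_to_xbar[g] = input;
			x->xbar_to_gic[input] = g;
			return g;
		}
	}
	return UNMAPPED;
}

/* Release the GIC line used by a crossbar input, if any. */
static void xbar_unmap(struct xbar_sketch *x, int input)
{
	if (input < 0 || input >= XBAR_INPUTS)
		return;
	int g = x->xbar_to_gic[input];
	if (g == UNMAPPED)
		return;
	x->gic_to_xbar[g] = UNMAPPED;
	x->xbar_to_gic[input] = UNMAPPED;
}
```

The point of the sketch is only that the 161st simultaneous request has to fail, which is the limitation any real infrastructure would have to report back to the requesting driver.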

> So basically the CROSSBAR mechanism is pretty much the same as MSI[X]
> just in a different flavour and with a different set of semantics and
> limitations, i.e. poor mans MSI[X] with a new level of bogosity.
> 
> So if CROSSBAR is going to be the new fangled SoC MSI[X] long term
> equivalent then you better provide some infrastructure for that and
> make the drivers ready to use it. Maybe check with the PCI/MSI folks
> to share some of the interfaces.
>
> If that whole thing is another onetime HW designers wet dream, then
> please go back to the limited but completely functional (Who is going
> to use more than 160 peripheral interrupts) device tree model. I
> really have no interest to support hardware designer brain farts.
> 
Thanks for the clear NAK on the irqchip approach. I should have looped you
into the discussion where I was also arguing against the irqchip
approach. We will try to look at the MSI stuff, but if it gets too
complicated I am going to fall back to the initial probe-based
approach to achieve the functionality.

Thanks again for the clear direction and useful discussion.

Regards,
Santosh


Re: [RFC PATCH] ANB(Automatic NUMA Balancing): erase mm footprint of migrated page

2013-09-12 Thread Rik van Riel
On 09/12/2013 08:45 PM, Hillf Danton wrote:
> If a page monitored by ANB is migrated, its footprint should be erased from
> numa-hint-fault account, because it is no longer used. Or two pages, the
> migrated page and its target page, are used in the view of task placement.
> 
> 
> Signed-off-by: Hillf Danton 

NAK

The numa faults buffer contains the number of pages on each
node that the task recently faulted on.

If the page got migrated, it is only counted on the new node,
not on the old one. That means there is no need to subtract
it on the old node.
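
As a rough illustration of that accounting, here is a small userspace sketch (not the kernel's data structures). The names numa_faults, record_fault() and preferred_node() are hypothetical, and the assumption is simply that a fault is charged to the node the page lives on at the time it is accounted, which is the destination node after a migration.

```c
#include <assert.h>

#define MAX_NODES 4	/* hypothetical node count for the sketch */

/* Hypothetical stand-in for a task's NUMA hinting-fault buffer. */
struct numa_faults {
	unsigned long faults[MAX_NODES];
};

/*
 * Record hinting faults for 'pages' pages.  'node' is the node the page
 * lives on when the fault is accounted, i.e. the destination node if the
 * page was just migrated.  Nothing is ever charged to the old node, so
 * there is nothing to subtract from it later.
 */
static void record_fault(struct numa_faults *nf, int node, unsigned long pages)
{
	assert(node >= 0 && node < MAX_NODES);
	nf->faults[node] += pages;
}

/* Node with the most recorded faults: a crude placement preference. */
static int preferred_node(const struct numa_faults *nf)
{
	int best = 0;

	for (int n = 1; n < MAX_NODES; n++)
		if (nf->faults[n] > nf->faults[best])
			best = n;
	return best;
}
```

With this model a page migrated from node 0 to node 1 shows up once, on node 1, so the task placement view never sees two pages.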

-- 
All rights reversed


Re: Linus Git (3.11.0+) - security/apparmor/include/policy.h:363 suspicious rcu_dereference_check() usage!

2013-09-12 Thread John Johansen
On 09/12/2013 10:34 AM, Paul E. McKenney wrote:
> On Wed, Sep 11, 2013 at 11:39:31PM -0400, Miles Lane wrote:
>> [   29.804534] [ INFO: suspicious RCU usage. ]
>> [   29.804539] 3.11.0+ #5 Not tainted
>> [   29.804541] ---
>> [   29.804545] security/apparmor/include/policy.h:363 suspicious
>> rcu_dereference_check() usage!
> 
> Adding the AppArmor maintainer and list on CC, looks like one for them.
> 
Yep, I'm looking into it. Thanks for the heads-up.

>   Thanx, Paul
> 
>> [   29.804548]
>> [   29.804548] other info that might help us debug this:
>> [   29.804548]
>> [   29.804553]
>> [   29.804553] rcu_scheduler_active = 1, debug_locks = 1
>> [   29.804558] 2 locks held by apparmor_parser/1268:
>> [   29.804560]  #0:  (sb_writers#9){.+.+.+}, at: []
>> file_start_write+0x27/0x29
>> [   29.804576]  #1:  (>lock){+.+.+.}, at: []
>> aa_replace_profiles+0x166/0x57c
>> [   29.804589]
>> [   29.804589] stack backtrace:
>> [   29.804595] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
>> [   29.804599] Hardware name: ASUSTeK Computer Inc. UL50VT
>>  /UL50VT, BIOS 217 03/01/2010
>> [   29.804602]   8800b95a1d90 8144eb9b
>> 8800b94db540
>> [   29.804611]  8800b95a1dc0 81087439 880138cc3a18
>> 880138cc3a18
>> [   29.804619]  8800b9464a90 880138cc3a38 8800b95a1df0
>> 811f5084
>> [   29.804628] Call Trace:
>> [   29.804636]  [] dump_stack+0x4e/0x82
>> [   29.804642]  [] lockdep_rcu_suspicious+0xfc/0x105
>> [   29.804649]  [] __aa_update_replacedby+0x53/0x7f
>> [   29.804655]  [] __replace_profile+0x11f/0x1ed
>> [   29.804661]  [] aa_replace_profiles+0x410/0x57c
>> [   29.804668]  [] profile_replace+0x35/0x4c
>> [   29.804674]  [] vfs_write+0xad/0x113
>> [   29.804680]  [] SyS_write+0x44/0x7a
>> [   29.804687]  [] system_call_fastpath+0x16/0x1b
>> [   29.804691]
>> [   29.804694] ===
>> [   29.804697] [ INFO: suspicious RCU usage. ]
>> [   29.804700] 3.11.0+ #5 Not tainted
>> [   29.804703] ---
>> [   29.804706] security/apparmor/policy.c:566 suspicious
>> rcu_dereference_check() usage!
>> [   29.804709]
>> [   29.804709] other info that might help us debug this:
>> [   29.804709]
>> [   29.804714]
>> [   29.804714] rcu_scheduler_active = 1, debug_locks = 1
>> [   29.804718] 2 locks held by apparmor_parser/1268:
>> [   29.804721]  #0:  (sb_writers#9){.+.+.+}, at: []
>> file_start_write+0x27/0x29
>> [   29.804733]  #1:  (>lock){+.+.+.}, at: []
>> aa_replace_profiles+0x166/0x57c
>> [   29.804744]
>> [   29.804744] stack backtrace:
>> [   29.804750] CPU: 0 PID: 1268 Comm: apparmor_parser Not tainted 3.11.0+ #5
>> [   29.804753] Hardware name: ASUSTeK Computer Inc. UL50VT
>>  /UL50VT, BIOS 217 03/01/2010
>> [   29.804756]   8800b95a1d80 8144eb9b
>> 8800b94db540
>> [   29.804764]  8800b95a1db0 81087439 8800b95b02b0
>> 
>> [   29.804772]  8800b9efba08 880138cc3a38 8800b95a1dd0
>> 811f4f94
>> [   29.804779] Call Trace:
>> [   29.804786]  [] dump_stack+0x4e/0x82
>> [   29.804791]  [] lockdep_rcu_suspicious+0xfc/0x105
>> [   29.804798]  [] aa_free_replacedby_kref+0x4d/0x62
>> [   29.804804]  [] ? aa_put_namespace+0x17/0x17
>> [   29.804810]  [] kref_put+0x36/0x40
>> [   29.804816]  [] __replace_profile+0x13a/0x1ed
>> [   29.804822]  [] aa_replace_profiles+0x410/0x57c
>> [   29.804829]  [] profile_replace+0x35/0x4c
>> [   29.804835]  [] vfs_write+0xad/0x113
>> [   29.804840]  [] SyS_write+0x44/0x7a
>> [   29.804847]  [] system_call_fastpath+0x16/0x1b
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 



Re: linux-next: manual merge of the akpm tree with Linus' tree

2013-09-12 Thread Al Viro
On Thu, Sep 12, 2013 at 06:12:24PM -0700, Linus Torvalds wrote:
> On Thu, Sep 12, 2013 at 5:56 PM, Linus Torvalds
>  wrote:
> >
> > I'll walk through the code, it looked suspicious. Maybe there's
> > something subtle that makes it work, but I don't see it.
> 
> Btw, it's not just the DCACHE_LRU_LIST bit. The games with
> "nr_dentry_unused" look totally broken too. It's decremented in
> dentry_lru_isolate_shrink() for each dentry we remove, and then it is
> decremented *again* in shrink_dcache_sb() by the number of dentries we
> removed.
> 
> Maybe I'm confused, but the code sure looks more confused than I feel.
> 
> I would suggest keeping the same semantics for 'nr_dentry_unused'.
> Dentries are unused whether they are on the "real" LRU list or have
> been tagged with DCACHE_SHRINK_LIST. So moving from one list to the
> other does nothing. It's the "list_del_init()" that should trigger
> both 'nr_dentry_unused' and DCACHE_LRU_LIST bit-clearing.
> 
> In fact, maybe a helper function for _actually_ removing the thing
> from all lists, and adding them back. Right now there are
> "list_del_init()" and "list_add[_tail]()" calls sprinkled around in
> random places, mixed up with the new "list_lru_add()".
> 
> Damn, the code is too confused. I have to go to a highschool parent
> back-to-school meeting, so I won't get to this until maybe on a plane
> tomorrow. Al, can you please give this a look?

Will do...


Re: [PATCH] /dev/random: Insufficient of entropy on many architectures

2013-09-12 Thread Jörn Engel
On Thu, 12 September 2013 19:35:36 -0400, Jörn Engel wrote:
> 
> I think the existing code is doing just fine for low interrupt loads.
> It makes sense to spend a bit more work to squeeze the last bit of
> randomness out.  But when you get lots of interrupts, you can be
> sloppy and just xor things into the pool.

Btw, if we make collection cheap enough, we can start collecting
entropy from the scheduler.  Computers are fairly deterministic, but
not that much.  The exact time when calling schedule(), the kernel
stack pointer and the userspace stack pointer all contain a bit of
entropy.  Particularly on machines that lack input and disk
randomness I would expect some benefits from this.

Jörn

--
Audacity augments courage; hesitation, fear.
-- Publilius Syrus


Re: [PATCH] kernel/futex.c: notice the return value after rt_mutex_finish_proxy_lock() fails

2013-09-12 Thread Chen Gang
On 09/13/2013 06:37 AM, Darren Hart wrote:
> On Thu, 2013-09-12 at 16:32 +0200, Thomas Gleixner wrote:
>> On Tue, 20 Aug 2013, Chen Gang wrote:
>>
>>> rt_mutex_finish_proxy_lock() can return failure code (e.g. -EINTR,
>>> -ETIMEDOUT).
>>>
>>> Original implementation has already noticed about it, but not check it
>>> before next work.
>>>
>>> Also let coments within 80 columns to pass "./scripts/checkpatch.pl".
>>>
>>>
>>> Signed-off-by: Chen Gang 
>>> ---
>>>  kernel/futex.c |   30 --
>>>  1 files changed, 16 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>> index c3a1a55..1a94e7d 100644
>>> --- a/kernel/futex.c
>>> +++ b/kernel/futex.c
>>> @@ -2373,21 +2373,23 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
>>>  		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, _waiter, 1);
>>>  		debug_rt_mutex_free_waiter(_waiter);
>>>  
>>> -		spin_lock(q.lock_ptr);
>>> -		/*
>>> -		 * Fixup the pi_state owner and possibly acquire the lock if we
>>> -		 * haven't already.
>>> -		 */
>>> -		res = fixup_owner(uaddr2, , !ret);
>>> -		/*
>>> -		 * If fixup_owner() returned an error, proprogate that.  If it
>>> -		 * acquired the lock, clear -ETIMEDOUT or -EINTR.
>>> -		 */
>>> -		if (res)
>>> -			ret = (res < 0) ? res : 0;
>>> +		if (!ret) {
>>
>> Again. This is completely wrong!  
>>
>> We MUST call fixup_owner even if finish_proxy_lock() returned with an
>> error code. Simply because finish_proxy_lock() is called outside of
>> the spin_lock(q.lock_ptr) region and another thread might have
>> modified the futex state. So we need to handle the corner cases
>> otherwise we might leave the futex in some undefined state.
>>
>> You're reintroducing a hard to decode bug, which got analyzed and
>> fixed in futex_lock_pi() years ago. See the history for the
>> explanation.
>>
>> Sigh.
>>
>> tglx
> 
> Chen, perhaps you can let us know what the failure scenario is that you
> are trying to address with this patch. I only replied the once as I
> pointed out the corner-case and expected you to follow up with that.
> This region of code is very fragile to modifications as it has become
> more corner-cases than core logic in some places :-)
> 

Oh, thanks, it is my fault:

  the 'ret' returned from rt_mutex_finish_proxy_lock() is used by
the following fixup_owner() call.
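
For reference, the way the two return values are combined can be written as a tiny standalone sketch. This is userspace code for illustration only; merge_fixup_result() is a made-up helper name, and it only mirrors the "if (res) ret = (res < 0) ? res : 0;" lines and their comment from the quoted code, under the assumption that a positive 'res' means the fixup step acquired the lock.

```c
#include <assert.h>
#include <errno.h>

/*
 * ret: result of the proxy-lock attempt (0, or a negative errno such as
 *      -EINTR or -ETIMEDOUT).
 * res: result of the fixup step: negative on error, 0 if nothing changed,
 *      positive if the fixup ended up acquiring the lock.
 *
 * A fixup error is propagated; a lock acquired during fixup clears an
 * earlier -EINTR/-ETIMEDOUT; otherwise the original result is kept.
 */
static int merge_fixup_result(int ret, int res)
{
	if (res)
		ret = (res < 0) ? res : 0;
	return ret;
}
```

So the fixup step has to run even when 'ret' is already an error, because its result may replace that error with success or with a different error.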

Hmm... excuse me, my English is not very good. It seems you already
knew about it, but did not say so directly?

Next time, when you find or feel that something is wrong, you can say so
directly. I can (and should) understand it (and I should thank you for
it, too), and that will be more efficient (it saves time for both of us).

:-)

> For starters, I'm not following your second sentence in the commit log.
> Can you elaborate on the following?
> 
> "Original implementation has already noticed about it, but not check it
> before next work."
> 
> Do you have a test-case that demonstrates a failure mode?
> 

No, I just 'found' it and gave a simple 'fix' for the related experts to
check (and now we know it was just spam).

Hmm... as for testing, it is really an 'important thing' to me (not an
'urgent thing'). I plan to start using LTP (Linux Test Project) in Q4 of
2013 (starting on 2013-10-01).


Thanks.
-- 
Chen Gang


Re: [PATCH] memory: add a basic OF-based memory driver

2013-09-12 Thread Emilio López

Hi Olof,

On 12/09/13 21:57, Olof Johansson wrote:
> On Thu, Sep 12, 2013 at 5:30 PM, Emilio López  wrote:
>> This driver's only job is to claim and ensure the necessary clock
>> for memory operation on a DT-powered machine remains enabled.
>>
>> Signed-off-by: Emilio López 
>> ---
>>
>> I believe this new patch should resolve all the concerns raised; as
>> always, all feedback is welcome :)
>
> I think you're going about this the wrong way.
>
> If you have a problem with a clock not staying on, shouldn't you just
> marking it appropriately in the clock table instead, making sure it's
> initialized with at least one reference to it?

If by "the clock table" you mean the tree as handled by the common clock
framework, there is no such flag available as of today; see Mike's reply
for more information.

Personally I feel that if the general case can solve our problems (in
this case, having a consumer who prepares and enables the clock), we
should avoid adding special cases to the framework.

> I believe that is how
> some of the other platforms handle this, and it's a lot cleaner than
> adding a fake binding and a fake driver just to grab a single clock.

The binding doesn't have to be fake; it is actually describing the 
memory controller hardware:


mc: mc@0123000 {
	compatible = "simple-memory-controller";
	reg = <0x0123000 0x400>;
	clocks = < 1>;
};

If one day we get docs and/or have any special features we may need from
the controller, we can use something like


mc: mc@0123000 {
	compatible = "vendor,awesome-mc", "simple-memory-controller";
	reg = <0x0123000 0x400>;
	clocks = < 1>;
};

Cheers,

Emilio



Re: linux-next: manual merge of the akpm tree with Linus' tree

2013-09-12 Thread Linus Torvalds
On Thu, Sep 12, 2013 at 5:56 PM, Linus Torvalds
 wrote:
>
> I'll walk through the code, it looked suspicious. Maybe there's
> something subtle that makes it work, but I don't see it.

Btw, it's not just the DCACHE_LRU_LIST bit. The games with
"nr_dentry_unused" look totally broken too. It's decremented in
dentry_lru_isolate_shrink() for each dentry we remove, and then it is
decremented *again* in shrink_dcache_sb() by the number of dentries we
removed.

Maybe I'm confused, but the code sure looks more confused than I feel.

I would suggest keeping the same semantics for 'nr_dentry_unused'.
Dentries are unused whether they are on the "real" LRU list or have
been tagged with DCACHE_SHRINK_LIST. So moving from one list to the
other does nothing. It's the "list_del_init()" that should trigger
both 'nr_dentry_unused' and DCACHE_LRU_LIST bit-clearing.

In fact, maybe a helper function for _actually_ removing the thing
from all lists, and adding them back. Right now there are
"list_del_init()" and "list_add[_tail]()" calls sprinkled around in
random places, mixed up with the new "list_lru_add()".
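
Something along these lines is what I mean, as a rough userspace sketch rather than a patch. Everything here is made up for illustration (dentry_sketch, the lst_* list helpers, the two flag values, d_lru_add/d_lru_to_shrink/d_lru_del); the only thing it is meant to show is that moving between the LRU and a shrink list leaves the counter and the LRU bit alone, and that exactly one helper does the real removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct list_head { struct list_head *next, *prev; };

static void lst_init(struct list_head *h) { h->next = h->prev = h; }
static bool lst_empty(const struct list_head *h) { return h->next == h; }

static void lst_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void lst_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	lst_init(e);
}

#define LRU_LIST_FLAG		0x1u	/* stand-in for DCACHE_LRU_LIST */
#define SHRINK_LIST_FLAG	0x2u	/* stand-in for DCACHE_SHRINK_LIST */

struct dentry_sketch {
	unsigned int flags;
	struct list_head lru;
};

static long nr_unused;	/* stand-in for nr_dentry_unused */

/* Entry becomes unused: goes on the LRU, counted exactly once. */
static void d_lru_add(struct dentry_sketch *d, struct list_head *lru)
{
	assert(lst_empty(&d->lru));
	lst_add(&d->lru, lru);
	d->flags |= LRU_LIST_FLAG;
	nr_unused++;
}

/* LRU -> shrink list: still unused, so the counter and LRU bit stay put. */
static void d_lru_to_shrink(struct dentry_sketch *d, struct list_head *shrink)
{
	lst_del_init(&d->lru);
	lst_add(&d->lru, shrink);
	d->flags |= SHRINK_LIST_FLAG;
}

/* The one place that really takes the entry off all lists. */
static void d_lru_del(struct dentry_sketch *d)
{
	if (lst_empty(&d->lru))
		return;		/* not on any list, nothing to undo */
	lst_del_init(&d->lru);
	d->flags &= ~(LRU_LIST_FLAG | SHRINK_LIST_FLAG);
	nr_unused--;
}
```

With the accounting in one helper, a double decrement like the one above cannot happen by construction.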

Damn, the code is too confused. I have to go to a highschool parent
back-to-school meeting, so I won't get to this until maybe on a plane
tomorrow. Al, can you please give this a look?

 Linus


Re: [PATCH v2 1/9] i2c: prepare runtime PM support for I2C client devices

2013-09-12 Thread Aaron Lu
On 09/13/2013 06:06 AM, Sylwester Nawrocki wrote:
> On 09/11/2013 05:32 PM, Mika Westerberg wrote:
>> From: Aaron Lu
>>
>> This patch adds runtime PM support for the I2C bus in a similar way that
>> has been done for PCI bus already. This means that the I2C bus core
>> prepares runtime PM for a client device just before a driver is about to be
>> bound to it. Devices that are not bound to any driver are not prepared for
>> runtime PM.
>>
>> In order to take advantage of this runtime PM support, the client device
>> driver needs drop the device runtime PM reference count by calling
>> pm_runtime_put() in its ->probe() callback and possibly implement rest of
>> the runtime PM callbacks.
>>
>> If the driver doesn't support runtime PM (like most of the existing I2C
>> client drivers), the device in question is regarded as being runtime PM
>> active and powered on.
>>
>> The patch adds also runtime PM support for the adapter device because it is
>> needed to be able to runtime power manage the I2C controller device. The
>> adapter device is handled along with the I2C controller device (it uses
>> pm_runtime_no_callbacks()).
>>
>> Signed-off-by: Aaron Lu
>> Signed-off-by: Mika Westerberg
>> ---
>>   drivers/i2c/i2c-core.c | 44 +++-
>>   1 file changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
>> index f32ca29..44374b4 100644
>> --- a/drivers/i2c/i2c-core.c
>> +++ b/drivers/i2c/i2c-core.c
>> @@ -248,11 +248,30 @@ static int i2c_device_probe(struct device *dev)
>>  client->flags&  I2C_CLIENT_WAKE);
>>  dev_dbg(dev, "probe\n");
>>
>> +/* Make sure the adapter is active */
>> +pm_runtime_get_sync(>adapter->dev);
> 
> So there is currently no way to avoid this behaviour, i.e. to have the 
> adapter
> not activated before any of its client devices is probed, but only later on,
> after explicit call to pm_runtime_get*(>dev) in the client driver ?

The pm_runtime_get_sync() call above makes sure that, when the client
I2C device is about to be probed, its host adapter device is turned on
(otherwise the probe will fail). It doesn't affect the adapter's status
before the I2C client device is probed.

Thanks,
Aaron


Re: [PATCH] /dev/random: Insufficient of entropy on many architectures

2013-09-12 Thread Jörn Engel
On Thu, 12 September 2013 19:31:55 -0400, Theodore Ts'o wrote:
> On Thu, Sep 12, 2013 at 05:07:17PM -0400, Jörn Engel wrote:
> > 
> > I happen to have a real-world system with >100k interrupts per second
> > and - surprise - add_interrupt_randomness() showed up prominently in
> > the profiles.  I was also told twice to just remove that call.  I
> > resisted both times and have done far more work to reduce overhead
> > while still collecting entropy.  Some others would have caved in.
> 
> Would it be possible for you to send me the perf numbers that you saw?

Eventually.  The idiot that was me half a year ago failed to attach
perf numbers to the patch description.

> What platform is this?  x86?   Some embedded processor?

x86.  I suspect NAPI significantly cuts down the number of interrupts
for ethernet cards.  My case is with FC interrupts.  Quite likely
doing something like NAPI would help far more for performance than
disabling add_interrupt_randomness().

> > One option is to add the "input_pool.entropy_count > trickle_thresh"
> > condition that all other entropy sources currently have.  But instead
> > I would rather rename fast_mix() to not_too_fast_mix() and implement a
> > real fast_mix().  Essentially just xor the collected numbers into a
> > pool and schedule something to shuffle the bits at a later point.
> 
> We can try some different things to make fast_mix() faster, but it
> would be good to get some hard numbers before we start deciding we
> need to do something more complicated.
> 
> One thing that comes to mind is that fast_mix() is only called in
> exactly one place, and we always pass in a long.  So there are
> certainly ways that we could optimize fast_mix even keeping the
> current mixing algorithm.

I think the existing code is doing just fine for low interrupt loads.
It makes sense to spend a bit more work to squeeze the last bit of
randomness out.  But when you get lots of interrupts, you can be
sloppy and just xor things into the pool.

My patch below goes too far by not even doing the xor.  I was
stupid and under time pressure.  But in my defence,
add_timer_randomness() makes the same mistake.
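
What I would rather have looks roughly like the following userspace sketch: do the expensive mix at most once per jiffy and just xor everything else into the pool. The struct and function names are invented for this example, and expensive_mix() is a trivial rotate-and-xor placeholder, not the real fast_mix().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cpu pool; only the fields the sketch needs. */
struct fast_pool_sketch {
	uint32_t pool[4];
	unsigned long last_jiffies;
	unsigned long full_mixes;	/* bookkeeping for the example only */
};

/* Placeholder for the costly mixing step (NOT the kernel's fast_mix()). */
static void expensive_mix(struct fast_pool_sketch *p, const uint32_t in[4])
{
	for (int i = 0; i < 4; i++) {
		uint32_t w = p->pool[i] ^ in[i];

		p->pool[i] = (w << 7) | (w >> 25);	/* rotate left by 7 */
	}
	p->full_mixes++;
}

/* Near-zero-cost path: xor the input into the pool, no shifting at all. */
static void cheap_mix(struct fast_pool_sketch *p, const uint32_t in[4])
{
	for (int i = 0; i < 4; i++)
		p->pool[i] ^= in[i];
}

/*
 * 'now' stands in for jiffies.  The first sample in a given jiffy gets
 * the expensive mix; every further sample in the same jiffy is only
 * xor-ed in, so the input is still not thrown away.
 */
static void add_sample(struct fast_pool_sketch *p, const uint32_t in[4],
		       unsigned long now)
{
	if (p->last_jiffies == now) {
		cheap_mix(p, in);
		return;
	}
	p->last_jiffies = now;
	expensive_mix(p, in);
}
```

Note that a pool whose last_jiffies starts out equal to the current tick takes the cheap path first; that is harmless for the sketch but would need a thought in real code.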

Jörn

--
Eighty percent of success is showing up.
-- Woody Allen

From ee197e39b9a6c905db870606f5bacab2a52a8da2 Mon Sep 17 00:00:00 2001
From: Joern Engel 
Date: Wed, 13 Feb 2013 10:34:26 -0800
Subject: [PATCH] random: limit overhead of add_interrupt_randomness

fast_mix is noticeably less fast than the name might imply.  Add
rate-limiting to it, so we only run it once per jiffie and cpu for the
painful case of a single interrupt hammering a cpu.  Instead we do the
dumbest possible mixing - we xor the input with the pool without any
shifting whatsoever.  Gathers some randomness at near-zero cost.

Signed-off-by: Joern Engel 
---
 drivers/char/random.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b86eae9..7b7f64e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -557,6 +557,7 @@ static void mix_pool_bytes(struct entropy_store *r, const void *in,
 struct fast_pool {
__u32   pool[4];
unsigned long   last;
+   unsigned long   last_jiffies;
unsigned short  count;
unsigned char   rotate;
unsigned char   last_timer_intr;
@@ -760,6 +761,17 @@ void add_interrupt_randomness(int irq, int irq_flags)
input[3] = ip >> 32;
}
 
+   /*
+* Even fast_mix is slow when dealing with 6-digit interrupt
+* rates.  Rate-limit this to once per jiffie.  If we get lots
+* of interrupts, this still generates 1.6 bits of entropy per
+* second and cpu.  If we get few interrupts, it shouldn't
+* substantially change the entropy collection.
+*/
+   if (fast_pool->last_jiffies == jiffies)
+   return;
+   fast_pool->last_jiffies = jiffies;
+
fast_mix(fast_pool, input, sizeof(input));
 
if ((fast_pool->count & 1023) &&
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Intel-gfx] [PATCH 1/2] drm/i915: kill set_need_resched

2013-09-12 Thread Rob Clark
hmm, looks like I cargo-cult'd the same into msm.

I guess in i915 (and ttm) case, the issue arises due to need for CPU
access to buffer via GTT?  In which case I should be safe to drop the
set_need_resched() as well? (Since CPU always has direct access to the
pages.)  Or am I missing something about the original issue that
necessitated set_need_resched()?

BR,
-R


On Thu, Sep 12, 2013 at 11:57 AM, Daniel Vetter  wrote:
> This is just a remnant from the old days when our reset handling was
> horribly racy, suffered from terrible locking issues and often happily
> live-locked. Those days are now gone so we can drop the hacks and just
> rip the reschedule-point out.
>
> Reported-by: Peter Zijlstra 
> Cc: Peter Zijlstra 
> Cc: Linux Kernel Mailing List 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d80f33d..3871060 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1389,14 +1389,11 @@ out:
> if (i915_terminally_wedged(_priv->gpu_error))
> return VM_FAULT_SIGBUS;
> case -EAGAIN:
> -   /* Give the error handler a chance to run and move the
> -* objects off the GPU active list. Next time we service the
> -* fault, we should be able to transition the page into the
> -* GTT without touching the GPU (and so avoid further
> -* EIO/EGAIN). If the GPU is wedged, then there is no issue
> -* with coherency, just lost writes.
> +   /*
> +* EAGAIN means the gpu is hung and we'll wait for the error
> +* handler to reset everything when re-faulting in
> +* i915_mutex_lock_interruptible.
>  */
> -   set_need_resched();
> case 0:
> case -ERESTARTSYS:
> case -EINTR:
> --
> 1.8.4.rc3
>
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [PATCH] memory: add a basic OF-based memory driver

2013-09-12 Thread Olof Johansson
On Thu, Sep 12, 2013 at 5:30 PM, Emilio López  wrote:
> This driver's only job is to claim and ensure the necessary clock
> for memory operation on a DT-powered machine remains enabled.
>
> Signed-off-by: Emilio López 
> ---
>
> I believe this new patch should resolve all the concerns raised; as
> always, all feedback is welcome :)

I think you're going about this the wrong way.

If you have a problem with a clock not staying on, shouldn't you just
mark it appropriately in the clock table instead, making sure it's
initialized with at least one reference to it? I believe that is how
some of the other platforms handle this, and it's a lot cleaner than
adding a fake binding and a fake driver just to grab a single clock.


-Olof


[PATCH] clk: si570: Add a driver for SI570 oscillators

2013-09-12 Thread Soren Brinkmann
Add a driver for SILabs 570, 571, 598, 599 programmable oscillators.
The devices generate low-jitter clock signals and are reprogrammable via
an I2C interface.

Cc: Guenter Roeck 
Signed-off-by: Soren Brinkmann 
---
 .../devicetree/bindings/clock/silabs,si570.txt |  31 ++
 drivers/clk/Kconfig|  10 +
 drivers/clk/Makefile   |   1 +
 drivers/clk/clk-si570.c| 546 +
 4 files changed, 588 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/silabs,si570.txt
 create mode 100644 drivers/clk/clk-si570.c

diff --git a/Documentation/devicetree/bindings/clock/silabs,si570.txt b/Documentation/devicetree/bindings/clock/silabs,si570.txt
new file mode 100644
index 000..099f0ee
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/silabs,si570.txt
@@ -0,0 +1,31 @@
+Binding for Silicon Labs 570, 571, 598 and 599  programmable
+I2C clock generators.
+
+Reference
+This binding uses the common clock binding[1]. Details about the devices can be
+found in the data sheets[2][3].
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
+[2] Si57x Data Sheet
+http://www.silabs.com/Support%20Documents/TechnicalDocs/si570.pdf
+[3] Si59x Data Sheet
+http://www.silabs.com/Support%20Documents/TechnicalDocs/si598-99.pdf
+
+Required properties:
+ - compatible: Shall be one of "silabs,si57x", "silabs,si59x".
+ - reg: I2C device address.
+ - #clock-cells: From common clock bindings: Shall be 0.
+ - factory-fout: Factory set default frequency
+
+Optional properties:
+ - initial-fout: Initial output frequency to set during probe
+ - temperature-stability-7ppm: Indicate a device with a temperature stability
+  of 7ppm
+
+Example:
+   si570: clock-generator@5d {
+   #clock-cells = <0>;
+   compatible = "silabs,si570";
+   reg = <0x5d>;
+   factory-fout = <15625>;
+   };
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 279407a..f5afabc 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -64,6 +64,16 @@ config COMMON_CLK_SI5351
  This driver supports Silicon Labs 5351A/B/C programmable clock
  generators.
 
+config COMMON_CLK_SI570
+   tristate "Clock driver for SiLabs 57x/59x"
+   depends on I2C
+   depends on OF
+   select REGMAP_I2C
+   help
+   ---help---
+ This driver supports Silicon Labs 570/571/598/599 programmable
+ clock generators.
+
 config COMMON_CLK_S2MPS11
tristate "Clock driver for S2MPS11 MFD"
depends on MFD_SEC_CORE
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index 7b11106..c0e94b3 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_COMMON_CLK_AXI_CLKGEN) += clk-axi-clkgen.o
 obj-$(CONFIG_COMMON_CLK_WM831X) += clk-wm831x.o
 obj-$(CONFIG_COMMON_CLK_MAX77686) += clk-max77686.o
 obj-$(CONFIG_COMMON_CLK_SI5351) += clk-si5351.o
+obj-$(CONFIG_COMMON_CLK_SI570) += clk-si570.o
 obj-$(CONFIG_COMMON_CLK_S2MPS11) += clk-s2mps11.o
 obj-$(CONFIG_CLK_TWL6040)  += clk-twl6040.o
 obj-$(CONFIG_CLK_PPC_CORENET)  += clk-ppc-corenet.o
diff --git a/drivers/clk/clk-si570.c b/drivers/clk/clk-si570.c
new file mode 100644
index 000..960d689
--- /dev/null
+++ b/drivers/clk/clk-si570.c
@@ -0,0 +1,546 @@
+/*
+ * Driver for Silicon Labs Si570/Si571 Programmable XO/VCXO
+ *
+ * Copyright (C) 2010, 2011 Ericsson AB.
+ * Copyright (C) 2011 Guenter Roeck.
+ * Copyright (C) 2011 - 2013 Xilinx Inc.
+ *
+ * Author: Guenter Roeck 
+ *Sören Brinkmann 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Si570 registers */
+#define SI570_REG_HS_N1        7
+#define SI570_REG_N1_RFREQ0    8
+#define SI570_REG_RFREQ1       9
+#define SI570_REG_RFREQ2       10
+#define SI570_REG_RFREQ3       11
+#define SI570_REG_RFREQ4       12
+#define SI570_REG_CONTROL      135
+#define SI570_REG_FREEZE_DCO   137
+#define SI570_DIV_OFFSET_7PPM  6
+
+#define HS_DIV_SHIFT           5
+#define HS_DIV_MASK            0xe0
+#define HS_DIV_OFFSET          4
+#define N1_6_2_MASK            0x1f
+#define N1_1_0_MASK            0xc0
+#define RFREQ_37_32_MASK       0x3f
+
+#define SI570_FOUT_FACTORY_DFLT 15625LL
+#define SI598_FOUT_FACTORY_DFLT 1000LL
+
+#define SI570_MIN_FREQ 1000L
+#define SI570_MAX_FREQ 
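
The masks defined above suggest how the divider fields are packed into registers 7 to 12. The following is a minimal sketch of the decode, not the driver code from this patch. It assumes the layout from the Si57x data sheet as I understand it: HS_DIV in bits 7:5 of register 7, N1 split across registers 7 and 8, and the 38-bit RFREQ value in the remaining bits. `struct si570_sketch` and `si570_sketch_decode()` are hypothetical names, and `reg[0]` corresponds to register 7.

```c
#include <stdint.h>

#define HS_DIV_SHIFT        5
#define HS_DIV_MASK         0xe0
#define HS_DIV_OFFSET       4
#define N1_6_2_MASK         0x1f
#define N1_1_0_MASK         0xc0
#define RFREQ_37_32_MASK    0x3f

/* Hypothetical container for the decoded fields. */
struct si570_sketch {
    unsigned int hs_div;    /* high-speed divider, stored value + 4 */
    unsigned int n1;        /* output divider, stored value + 1 */
    uint64_t rfreq;         /* 38-bit fixed-point reference multiplier */
};

/* reg[0..5] hold the raw contents of registers 7..12. */
static struct si570_sketch si570_sketch_decode(const uint8_t reg[6])
{
    struct si570_sketch d;

    d.hs_div = ((reg[0] & HS_DIV_MASK) >> HS_DIV_SHIFT) + HS_DIV_OFFSET;
    d.n1 = ((reg[0] & N1_6_2_MASK) << 2) +
           ((reg[1] & N1_1_0_MASK) >> 6) + 1;

    /* Upper six RFREQ bits come from register 8, the rest follow. */
    d.rfreq = reg[1] & RFREQ_37_32_MASK;
    for (int i = 2; i < 6; i++)
        d.rfreq = (d.rfreq << 8) | reg[i];

    return d;
}
```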

Re: linux-next: manual merge of the akpm tree with Linus' tree

2013-09-12 Thread Linus Torvalds
On Tue, Sep 10, 2013 at 4:37 PM, Linus Torvalds
 wrote:
>
> From a quick look, this looks pretty broken:
>
> if (list_lru_add(>d_sb->s_dentry_lru, >d_lru))
> this_cpu_inc(nr_dentry_unused);
> dentry->d_flags |= DCACHE_LRU_LIST;
>
> because if that list_lru_add() can fail, then we shouldn't set the
> DCACHE_LRU_LIST bit either.
>
> That said, I don't see how it can fail. We only do this with the
> dentry locked, and when it's not already on the LRU list. So I think
> the "if()" is just misleading and unnecessary - but the code works.

So I thought you'd clean this up. Looking again, it still seems really
confused, and I'm finding actual bugs.

You don't clear the DCACHE_LRU_LIST when you remove dentries from the
d_lru list. In other cases (like shrink_dentry_list), you clear just
the DCACHE_SHRINK_LIST.

As a result, the "if ()" isn't necessarily unnecessary, but there are
actual bugs. It looks like the dentry can be removed from the d_lru
lists without the bit ever getting cleared, and if that happens, it
will never be moved back.

The rule for DCACHE_LRU_LIST was - and should be - that the bit is set
IFF the d_lru list is not empty. So it gets set when a dentry is moved
to the LRU lists, but it _stays_ set if the dentry is moved to the
shrink_list. It then gets cleared when the dentry is removed from any
d_lru list (ie "list_del_init()").
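
The rule above can be sketched as a small user-space model in which the flag is only ever changed by the same helpers that link or unlink the node. This is an illustration, not fs/dcache.c: `struct sk_node`, the `SK_*` flags and all `sk_*()` helpers are made-up names, and locking is left out entirely.

```c
#include <stdbool.h>

#define SK_LRU_LIST     0x1u    /* node is on some d_lru-style list */
#define SK_SHRINK_LIST  0x2u    /* node is on a private shrink list */

struct sk_node {
    struct sk_node *prev, *next;    /* point at the node itself when unlinked */
    unsigned int flags;
};

static void sk_init(struct sk_node *n)
{
    n->prev = n->next = n;
    n->flags = 0;
}

static bool sk_linked(const struct sk_node *n)
{
    return n->next != n;
}

static void sk_unlink(struct sk_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static void sk_link_tail(struct sk_node *n, struct sk_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Put an unlinked node on the LRU: the flag is set together with the link. */
static void sk_lru_add(struct sk_node *n, struct sk_node *lru)
{
    sk_link_tail(n, lru);
    n->flags |= SK_LRU_LIST;
}

/* Move to a shrink list: the node is still linked, so SK_LRU_LIST stays. */
static void sk_move_to_shrink(struct sk_node *n, struct sk_node *shrink)
{
    sk_unlink(n);
    sk_link_tail(n, shrink);
    n->flags |= SK_SHRINK_LIST;
}

/* Remove from whichever list holds the node: both flags are cleared. */
static void sk_lru_del(struct sk_node *n)
{
    sk_unlink(n);
    n->flags &= ~(SK_LRU_LIST | SK_SHRINK_LIST);
}

/* The invariant: flag set if and only if the node is linked somewhere. */
static bool sk_invariant_holds(const struct sk_node *n)
{
    return sk_linked(n) == ((n->flags & SK_LRU_LIST) != 0);
}
```

A removal path that unlinks the node but clears only the shrink flag would leave `sk_invariant_holds()` false, which is the kind of bug described above.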

I'll walk through the code, it looked suspicious. Maybe there's
something subtle that makes it work, but I don't see it.

   Linus


[PATCH] SI570 clock driver

2013-09-12 Thread Soren Brinkmann
This is another try to submit a driver for SI570 programmable clock
generators. The heart of the driver is Günther's code from
https://github.com/groeck/si570, which is wrapped in the common
clock infrastructure.

Due to the lack of platforms with other devices, only the SI570 has
actually been tested.

Thanks,
Sören

Soren Brinkmann (1):
  clk: si570: Add a driver for SI570 oscillators

 .../devicetree/bindings/clock/silabs,si570.txt |  31 ++
 drivers/clk/Kconfig|  10 +
 drivers/clk/Makefile   |   1 +
 drivers/clk/clk-si570.c| 546 +
 4 files changed, 588 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/silabs,si570.txt
 create mode 100644 drivers/clk/clk-si570.c

-- 
1.8.4



increased vmap_area_lock contentions on "n_tty: Move buffers into n_tty_data"

2013-09-12 Thread Fengguang Wu
Hi Peter,

FYI, we noticed much increased vmap_area_lock contentions since this
commit:

commit 20bafb3d23d108bc0a896eb8b7c1501f4f649b77
Author: Peter Hurley 
Date:   Sat Jun 15 10:21:19 2013 -0400

n_tty: Move buffers into n_tty_data

Reduce pointer reloading and improve locality-of-reference;
allocate read_buf and echo_buf within struct n_tty_data.

Signed-off-by: Peter Hurley 
Signed-off-by: Greg Kroah-Hartman 

:04 04 96d92e4e242c4b2ff11b25c005bccd093865b350 2822d87b2425c3e7adc7b722a20d739c9d4a3046 M  drivers
bisect run success

# bad: [9cdfea961eeaf9b6b939fb9dc822a204a1d94d08] Merge remote-tracking branch 'ipvs-next/master' into devel-hourly-2013090112
# good: [d4e4ab86bcba5a72779c43dc1459f71fea3d89c8] Linux 3.11-rc5
git bisect start '9cdfea961eeaf9b6b939fb9dc822a204a1d94d08' 'd4e4ab86bcba5a72779c43dc1459f71fea3d89c8' '--'
# good: [903bc7e8c2a4f9ef8e40aed615ed48023f9460e3] Merge remote-tracking branch 'spi/for-next'
git bisect good 903bc7e8c2a4f9ef8e40aed615ed48023f9460e3
# bad: [d926318427c1fa8df182d3dcd7f802d7d0609225] Merge remote-tracking branch 'bcon/master'
git bisect bad d926318427c1fa8df182d3dcd7f802d7d0609225
# bad: [adcb2bcd0dd2bb88f21180c8ffb9ec2549f4d883] Merge remote-tracking branch 'usb/usb-next'
git bisect bad adcb2bcd0dd2bb88f21180c8ffb9ec2549f4d883
# good: [a83605266423437d8afe73cbd538df33cf8fb32b] Merge remote-tracking branch 'driver-core/driver-core-next'
git bisect good a83605266423437d8afe73cbd538df33cf8fb32b
# good: [224563b6ce034b82f8511969d9496113da34fb2c] Merge tag 'for-usb-next-2013-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci into usb-next
git bisect good 224563b6ce034b82f8511969d9496113da34fb2c
# bad: [5284eba7b37dfeffe75bfdf81a13e9efebe0480a] serial/arc-uart: Handle Rx Error Interrupts w/o any data
git bisect bad 5284eba7b37dfeffe75bfdf81a13e9efebe0480a
# bad: [7d88d637a3c75257b7611f9feee0c17fa187ecc9] n_tty: Factor standard per-char i/o into separate fn
git bisect bad 7d88d637a3c75257b7611f9feee0c17fa187ecc9
# good: [11b9faa44df76189b8346ff602a2c01c610c37eb] tty: Merge tty_buffer_find() into tty_buffer_alloc()
git bisect good 11b9faa44df76189b8346ff602a2c01c610c37eb
# good: [019ebdf9f26fd2e43b9e1af576835183e95dc82e] n_tty: Eliminate echo_commit memory barrier
git bisect good 019ebdf9f26fd2e43b9e1af576835183e95dc82e
# bad: [1bb9d562856279a217a4e581a803dee9bb48a708] n_tty: Rename process_char_map to char_map
git bisect bad 1bb9d562856279a217a4e581a803dee9bb48a708
# good: [9dfd16ddea9bdbc8343340e543732db0a467ae32] n_tty: Avoid false-sharing echo buffer indices
git bisect good 9dfd16ddea9bdbc8343340e543732db0a467ae32
# good: [8cb06c983822103da1cfe57b9901e60a00e61f67] n_tty: Remove alias ptrs in __receive_buf()
git bisect good 8cb06c983822103da1cfe57b9901e60a00e61f67
# bad: [20bafb3d23d108bc0a896eb8b7c1501f4f649b77] n_tty: Move buffers into n_tty_data
git bisect bad 20bafb3d23d108bc0a896eb8b7c1501f4f649b77
# first bad commit: [20bafb3d23d108bc0a896eb8b7c1501f4f649b77] n_tty: Move buffers into n_tty_data


Compare of all good/bad commits in this bisect:

lock_stat.vmap_area_lock.contentions

[ASCII plot omitted: the '*' series stays near 4000 contentions, the 'O'
series near 20000]


lock_stat.vmap_area_lock.contentions.get_vmalloc_info

[ASCII plot omitted: truncated in the archive]

Re: [RFC PATCH 1/1] module: Make wait module's refcount to zero procedure as async

2013-09-12 Thread Rusty Russell
Peter Chen  writes:
> Currently, if module's refcount is not zero during the unload,
> it waits there until the user decreases that refcount.

Hi Peter,

In practice userspace uses O_NONBLOCK.  In fact, I've been
thinking of removing the blocking case altogether, since it's not really
what people want.

That would solve your problem and make the code simpler.  Thoughts?

Cheers,
Rusty.

> Assume
> we have two modules (A & B), there are no symbol relationship
> between each other. module B is module A's user, if the end
> user tries to unload module A first wrongly, it will stop there
> until there is another user process to unload B, and this
> process can't be killed.
> One use case is: the QA engineers do error test, they unload
> module wrongly on purpose, after that, they find the console
> is stopped there, and they can't do any thing go on.
>
> Signed-off-by: Peter Chen 
> ---
>  include/linux/module.h |4 +-
>  kernel/module.c|   61 ++-
>  2 files changed, 36 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 05f2447..12edf07 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -367,8 +367,8 @@ struct module
>   /* What modules do I depend on? */
>   struct list_head target_list;
>  
> - /* Who is waiting for us to be unloaded */
> - struct task_struct *waiter;
> + /* The work for waiting refcount to zero */
> + struct work_struct wait_refcount_work;
>  
>   /* Destruction function. */
>   void (*exit)(void);
> diff --git a/kernel/module.c b/kernel/module.c
> index dc58274..94abc7e 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -61,6 +61,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "module-internal.h"
>  
> @@ -644,8 +645,6 @@ static int module_unload_init(struct module *mod)
>  
>   /* Hold reference count during initialization. */
>   __this_cpu_write(mod->refptr->incs, 1);
> - /* Backwards compatibility macros put refcount during init. */
> - mod->waiter = current;
>  
>   return 0;
>  }
> @@ -813,19 +812,38 @@ EXPORT_SYMBOL(module_refcount);
>  /* This exists whether we can unload or not */
>  static void free_module(struct module *mod);
>  
> +/* Final destruction now no one is using it. */
> +static void module_final_free(struct module *mod)
> +{
> + if (mod->exit != NULL)
> + mod->exit();
> + blocking_notifier_call_chain(_notify_list,
> +  MODULE_STATE_GOING, mod);
> + async_synchronize_full();
> +
> + /* Store the name of the last unloaded module for diagnostic purposes */
> + strlcpy(last_unloaded_module, mod->name, sizeof(last_unloaded_module));
> +
> + free_module(mod);
> +}
> +
>  static void wait_for_zero_refcount(struct module *mod)
>  {
> - /* Since we might sleep for some time, release the mutex first */
> - mutex_unlock(_mutex);
>   for (;;) {
>   pr_debug("Looking at refcount...\n");
> - set_current_state(TASK_UNINTERRUPTIBLE);
>   if (module_refcount(mod) == 0)
>   break;
> - schedule();
> + msleep(1000);
>   }
> - current->state = TASK_RUNNING;
> - mutex_lock(_mutex);
> + module_final_free(mod);
> +}
> +
> +static void wait_module_refcount(struct work_struct *work)
> +{
> + struct module *mod = container_of(work,
> + struct module, wait_refcount_work);
> +
> + wait_for_zero_refcount(mod);
>  }
>  
>  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
> @@ -859,8 +877,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>  
>   /* Doing init or already dying? */
>   if (mod->state != MODULE_STATE_LIVE) {
> - /* FIXME: if (force), slam module count and wake up
> -   waiter --RR */
> + /* FIXME: if (force), slam module count */
>   pr_debug("%s already dying\n", mod->name);
>   ret = -EBUSY;
>   goto out;
> @@ -876,30 +893,23 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>   }
>   }
>  
> - /* Set this up before setting mod->state */
> - mod->waiter = current;
> -
>   /* Stop the machine so refcounts can't move and disable module. */
>   ret = try_stop_module(mod, flags, );
>   if (ret != 0)
>   goto out;
>  
>   /* Never wait if forced. */
> - if (!forced && module_refcount(mod) != 0)
> - wait_for_zero_refcount(mod);
> + if (!forced && module_refcount(mod) != 0) {
> + INIT_WORK(>wait_refcount_work, wait_module_refcount);
> + schedule_work(>wait_refcount_work);
> + ret = -EBUSY;
> + goto out;
> + }
>  
>   mutex_unlock(_mutex);
> - /* Final destruction now no one is using it. */
> - if 

Re: [PATCH] modules: add support for soft module dependencies

2013-09-12 Thread Rusty Russell
Lucas De Marchi  writes:
> On Wed, Jul 24, 2013 at 11:03 PM, Herbert Xu
>  wrote:
>> On Thu, Jul 25, 2013 at 09:32:02AM +0930, Rusty Russell wrote:
>>> Herbert Xu  writes:
>>> > Hi Rusty:
>>> >
>>> > I don't know why this patch never went into the kernel, even
>>> > though the corresponding features have been added to modprobe
>>> > in most if not all distros.
>>>
>>> Because Andreas never sent me the patch?  This is the first I've *heard*
>>> of this feature.  Looks like it didn't hit lkml either.  And what was
>>> 2/2?
>>
>> 2/2 was the patch to actually use this in crc32c.
>>
>>> It's not how I would have done this: post-deps are more flexibly done at
>>> runtime, because the module may have to do work to figure out what to
>>> pull in.  But since it already exists, I'll apply this patch: it doesn't
>>> cost the kernel anything.
>
> But it did cause boot failures. The file modules.softdep file was
> supposed to be informational until now. That's why depmod put a
> comment saying to "copy on user's discretion to /etc/modules.d"
> instead of parsing it directly.

I'm happy to change this macro to create a modinfo line like
"softdep:"

ie. tools like mkinitrd could pick it up and try to find a matching
module, but depmod would ignore it.

It's really up to Lucas, since this affects him.

Cheers,
Rusty.


[PATCH] f2fs: optimize the victim searching loop slightly

2013-09-12 Thread Jin Xu
Since MAX_VICTIM_SEARCH has been enlarged from 20 to 4096, the victim
searching overhead is much higher than before, especially for SSR,
which searches for victims quite often.
This patch intends to reduce the overhead a little by:
- making get_gc_cost an inline routine to reduce function call
  overhead
- reducing multiplication and division operations
- reducing unnecessary comparison operations

Signed-off-by: Jin Xu 
---
 fs/f2fs/gc.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index a78b8e3..14187bf 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -221,8 +221,8 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
return UINT_MAX - ((100 * (100 - u) * age) / (100 + u));
 }
 
-static unsigned int get_gc_cost(struct f2fs_sb_info *sbi, unsigned int segno,
-   struct victim_sel_policy *p)
+static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
+   unsigned int segno, struct victim_sel_policy *p)
 {
if (p->alloc_mode == SSR)
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
@@ -278,7 +278,11 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
}
break;
}
-   p.offset = ((segno / p.ofs_unit) * p.ofs_unit) + p.ofs_unit;
+
+   p.offset = segno + p.ofs_unit;
+   if (p.ofs_unit > 1)
+   p.offset -= segno % p.ofs_unit;
+
secno = GET_SECNO(sbi, segno);
 
if (sec_usage_check(sbi, secno))
@@ -291,10 +295,9 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
if (p.min_cost > cost) {
p.min_segno = segno;
p.min_cost = cost;
-   }
-
-   if (cost == max_cost)
+   } else if (unlikely(cost == max_cost)) {
continue;
+   }
 
if (nsearched++ >= p.max_search) {
sbi->last_victim[p.gc_mode] = segno;
-- 
1.7.9.5
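
The p.offset change in the diff above replaces a divide-and-multiply with an add and a conditional modulo. A small stand-alone check that the two forms agree may make that easier to review. The function names here are made up for the sketch, and `ofs_unit` is assumed to be at least 1, as it would be for a segment or section count.

```c
/* Original form: one division and one multiplication. */
static unsigned int next_unit_old(unsigned int segno, unsigned int ofs_unit)
{
    return ((segno / ofs_unit) * ofs_unit) + ofs_unit;
}

/* Patched form: skips the modulo entirely when ofs_unit is 1. */
static unsigned int next_unit_new(unsigned int segno, unsigned int ofs_unit)
{
    unsigned int offset = segno + ofs_unit;

    if (ofs_unit > 1)
        offset -= segno % ofs_unit;
    return offset;
}

/* Returns 1 if both forms agree for every segno below limit. */
static int forms_agree(unsigned int ofs_unit, unsigned int limit)
{
    for (unsigned int segno = 0; segno < limit; segno++)
        if (next_unit_old(segno, ofs_unit) != next_unit_new(segno, ofs_unit))
            return 0;
    return 1;
}
```

Both forms round segno down to a multiple of ofs_unit and then step to the next unit, since (segno / u) * u equals segno - segno % u for unsigned integers.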



[PATCH] Fix coalescing of host bridge windows in arch/x86/pci/acpi.c

2013-09-12 Thread Alexey Neyman
Hi all,

I have a board with a BIOS bug that reports the following I/O port regions in 
_CRS on one of the host bridges:

0x-0x03af // #0
0x03e0-0x0cf7 // #1
0x03b0-0x03bb // #2
0x03c0-0x03df // #3
0x-0xdfff // #4
0xf000-0x // #5

Obviously, region #4 is erroneous, as it overlaps with regions #0..3. 
However, coalesce_windows() in arch/x86/pci/acpi.c attempts to recover 
from this kind of BIOS bug by merging the overlapping regions. The current 
code expands region #0 to 0x-0xd and marks region #4 as ignored. Because 
regions #1..3 were already compared against region #0 before it was 
expanded, their overlap with the expanded region #0 remains undetected. 
Regions #1..3 are then inserted into the resource tree as if they were 
consumers of the 0x-0xdfff region, and devices that have resources in one 
of these regions (e.g. 0x3f6 for legacy IDE) get a resource conflict: the 
kernel does not initialize them.

The attached patch makes the code in coalesce_windows() instead ignore res1 
(which is already dealt with), possibly expanding res2 instead. As res2 has 
not been reached in the outer loop in coalesce_windows(), the code will then 
check for overlaps of the just-expanded resource with the rest of the 
resources.
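
The difference between the two merge directions can be reproduced with a small user-space model of the loop. This is a sketch, not the code in arch/x86/pci/acpi.c: `struct win`, `coalesce()` and `live_overlaps()` are made-up names, the `ignored` field stands in for clearing the resource flags, and the window values used to exercise it are an assumption (the list above with the digits dropped by the archive restored as 0x0000 and 0xffff).

```c
#include <stdbool.h>

struct win {
    unsigned long start, end;
    bool ignored;               /* plays the role of flags == 0 */
};

static bool win_overlaps(const struct win *a, const struct win *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/*
 * Single-pass pairwise merge.  With keep_later set, the merged range is
 * stored in the later entry (res2), which the outer loop has not visited
 * yet; otherwise it is stored in the earlier entry (res1).
 */
static void coalesce(struct win *w, int n, bool keep_later)
{
    for (int i = 0; i < n; i++) {
        if (w[i].ignored)
            continue;
        for (int j = i + 1; j < n; j++) {
            struct win *keep = keep_later ? &w[j] : &w[i];
            struct win *drop = keep_later ? &w[i] : &w[j];
            unsigned long start, end;

            if (w[j].ignored || !win_overlaps(&w[i], &w[j]))
                continue;
            start = w[i].start < w[j].start ? w[i].start : w[j].start;
            end = w[i].end > w[j].end ? w[i].end : w[j].end;
            keep->start = start;
            keep->end = end;
            drop->ignored = true;
        }
    }
}

/* Number of pairs of non-ignored windows that still overlap. */
static int live_overlaps(const struct win *w, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (!w[i].ignored && !w[j].ignored &&
                win_overlaps(&w[i], &w[j]))
                count++;
    return count;
}
```

With the six windows from the list, `coalesce(w, 6, false)` leaves three overlapping pairs behind (expanded #0 against #1, #2 and #3), while `coalesce(w, 6, true)` folds #0..3 into #4 and leaves none.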

Regards,
Alexey.
--- arch/x86/pci/acpi.c.orig	2013-09-12 16:23:06.113813150 -0700
+++ arch/x86/pci/acpi.c	2013-09-12 16:23:56.605813117 -0700
@@ -357,12 +357,12 @@
 			 * the kernel resource tree doesn't allow overlaps.
 			 */
 			if (resource_overlaps(res1, res2)) {
-res1->start = min(res1->start, res2->start);
-res1->end = max(res1->end, res2->end);
+res2->start = min(res1->start, res2->start);
+res2->end = max(res1->end, res2->end);
 dev_info(>bridge->dev,
 	 "host bridge window expanded to %pR; %pR ignored\n",
-	 res1, res2);
-res2->flags = 0;
+	 res2, res1);
+res1->flags = 0;
 			}
 		}
 	}


Re: [RFC PATCH 1/4] DRIVERS: IRQCHIP: Add crossbar irqchip driver

2013-09-12 Thread Thomas Gleixner
On Thu, 12 Sep 2013, Santosh Shilimkar wrote:
> On Thursday 12 September 2013 06:22 PM, Thomas Gleixner wrote:
> > Now the real question is, how that expansion mechanism is supposed to
> > work. There are two possible scenarios:
> > 
> > 1) Expand the number of handled interrupts beyond the GIC capacity:
> > 
> >That requires a mechanism in CROSSBAR to map several CROSSBAR
> >interrupts to a particular GIC interrupt and provide a demux
> >mechanism to invoke the shared handlers.
> > 
> This is not possible in hardware and not supported. Hardware has
> no notion of muxing multiple IRQ's to generate 1 IRQ or ack etc
> functionality. Its a simple MUX to tie knots between input and output
> wires.

It's not a MUX. It's a ROUTING mechanism. That's similar to the
mechanisms which are used by MSI[X]. We assign arbitrary interrupt
numbers to a device and route them to some underlying limited hardware
interrupt controller.

> > 2) Provide a mapping mechanism between possibly 250 interrupt numbers
> >and a limitation of a total 160 active interrupts by the underlying
> >GIC.
> > 
> This is the need and problem we are trying to solve.

Let me summarize:

   - GIC supports up to 160 interrupts

   - CROSSBAR supports up to 250 interrupts 

   - CROSSBAR routes up to 160 out of 250 interrupts to the GIC ones

   - Drivers request a CROSSBAR interrupt number which must be mapped
 to some arbitrary available GIC irq number

So basically the CROSSBAR mechanism is pretty much the same as MSI[X]
just in a different flavour and with a different set of semantics and
limitations, i.e. poor mans MSI[X] with a new level of bogosity.
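
To make the summary concrete, here is a rough user-space sketch of the bookkeeping such a mapping needs: a driver asks for a CROSSBAR input and gets whatever GIC line is free. It is not a proposal for the actual irqchip code. `struct xbar_map` and the `xbar_*()` helpers are made-up names, the sizes are the 250/160 figures from the list above, and programming the hardware mux is left out.

```c
#define XBAR_INPUTS 250
#define GIC_LINES   160
#define UNMAPPED    (-1)

struct xbar_map {
    int gic_of_input[XBAR_INPUTS];  /* GIC line per input, or UNMAPPED */
    int input_of_gic[GIC_LINES];    /* reverse map, or UNMAPPED */
};

static void xbar_init(struct xbar_map *m)
{
    for (int i = 0; i < XBAR_INPUTS; i++)
        m->gic_of_input[i] = UNMAPPED;
    for (int i = 0; i < GIC_LINES; i++)
        m->input_of_gic[i] = UNMAPPED;
}

/*
 * Map a crossbar input to the first free GIC line.  Asking again for an
 * input that is already mapped returns the existing line.  Returns -1 if
 * the input is out of range or all GIC lines are in use.
 */
static int xbar_request(struct xbar_map *m, int input)
{
    if (input < 0 || input >= XBAR_INPUTS)
        return -1;
    if (m->gic_of_input[input] != UNMAPPED)
        return m->gic_of_input[input];

    for (int line = 0; line < GIC_LINES; line++) {
        if (m->input_of_gic[line] == UNMAPPED) {
            m->input_of_gic[line] = input;
            m->gic_of_input[input] = line;
            return line;
        }
    }
    return -1;
}

/* Drop the mapping for an input so its GIC line can be reused. */
static void xbar_release(struct xbar_map *m, int input)
{
    if (input < 0 || input >= XBAR_INPUTS)
        return;
    if (m->gic_of_input[input] == UNMAPPED)
        return;
    m->input_of_gic[m->gic_of_input[input]] = UNMAPPED;
    m->gic_of_input[input] = UNMAPPED;
}
```

The interesting case is the 161st request: it fails until some other driver releases its interrupt, which is the limitation the summary points at.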

So if CROSSBAR is going to be the new fangled SoC MSI[X] long term
equivalent then you better provide some infrastructure for that and
make the drivers ready to use it. Maybe check with the PCI/MSI folks
to share some of the interfaces.

If that whole thing is another onetime HW designers wet dream, then
please go back to the limited but completely functional (Who is going
to use more than 160 peripheral interrupts) device tree model. I
really have no interest to support hardware designer brain farts.

Thanks,

tglx




[PATCH] Staging: lustre: fix assignment in if condition in fsfilt.c

2013-09-12 Thread Jon Bernard
This is a patch to the fsfilt.c file that fixes three
assignment-in-if-condition errors found by the checkpatch.pl tool.

Signed-off-by: Jon Bernard 
---
 drivers/staging/lustre/lustre/lvfs/fsfilt.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lvfs/fsfilt.c b/drivers/staging/lustre/lustre/lvfs/fsfilt.c
index efc1e52..0d6ed69 100644
--- a/drivers/staging/lustre/lustre/lvfs/fsfilt.c
+++ b/drivers/staging/lustre/lustre/lvfs/fsfilt.c
@@ -61,7 +61,8 @@ int fsfilt_register_ops(struct fsfilt_operations *fs_ops)
struct fsfilt_operations *found;
 
/* lock fsfilt_types list */
-   if ((found = fsfilt_search_type(fs_ops->fs_type))) {
+   found = fsfilt_search_type(fs_ops->fs_type);
+   if (found) {
if (found != fs_ops) {
CERROR("different operations for type %s\n",
   fs_ops->fs_type);
@@ -102,14 +103,16 @@ struct fsfilt_operations *fsfilt_get_ops(const char *type)
struct fsfilt_operations *fs_ops;
 
/* lock fsfilt_types list */
-   if (!(fs_ops = fsfilt_search_type(type))) {
+   fs_ops = fsfilt_search_type(type);
+   if (!fs_ops) {
char name[32];
int rc;
 
snprintf(name, sizeof(name) - 1, "fsfilt_%s", type);
name[sizeof(name) - 1] = '\0';
 
-   if (!(rc = request_module("%s", name))) {
+   rc = request_module("%s", name);
+   if (!rc) {
fs_ops = fsfilt_search_type(type);
CDEBUG(D_INFO, "Loaded module '%s'\n", name);
if (!fs_ops)
-- 
1.8.4



Re: [ 00/16] 3.11.1-stable review

2013-09-12 Thread Guenter Roeck

On 09/12/2013 04:07 PM, Greg Kroah-Hartman wrote:
> On Thu, Sep 12, 2013 at 03:37:10PM -0700, Guenter Roeck wrote:
>> On 09/12/2013 11:14 AM, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 3.11.1 release.
>>> There are 16 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Sat Sep 14 18:10:15 UTC 2013.
>>> Anything received after that time might be too late.
>>>
>>> The whole patch series can be found in one patch at:
>>> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.11.1-rc1.gz
>>> and the diffstat can be found below.
>>
>> Test results:
>> total: 110 pass: 107 skipped: 2 fail: 1
>>
>> The failure is xtensa:allmodconfig. A patch fixing the problem has been
>> submitted, but is not yet available upstream.
>
> What was skipped?

Ah, someone noticed :).

Builds are listed as "skipped" if the configuration step fails. Usually
that means that the configuration does not exist, though in rare cases
it may fail for other reasons, which is why I still list it. For 3.11,
skipped builds are arm:ap4evb_defconfig and arm:bonito_defconfig,
both of which have been removed from 3.11.

For qemu tests, skipped means that the test is known to fail with
the affected kernel version, typically because qemu and that kernel
version don't like each other, so I don't bother trying.

Guenter



Re: [PATCH V2 1/1] X86: Hyper-V: Get the local APIC timer frequency from the hypervisor

2013-09-12 Thread H. Peter Anvin
On 09/12/2013 05:06 PM, KY Srinivasan wrote:
> 
> Peter,
> 
> Let me know if you want me to address any additional issues in this patch.
> 

Please address Jan and Gleb's feedback.

-hpa




Re: [PATCH 1/2][usbutils] lsusb: port to hwdb

2013-09-12 Thread Tom Gundersen
On Fri, Sep 13, 2013 at 12:47 AM, Greg Kroah-Hartman
 wrote:
> On Wed, Sep 04, 2013 at 02:47:07PM +0200, Tom Gundersen wrote:
>> Most of the information in usb.ids is now contained in udev's hwdb. Read the
>> information from the hwdb instead of usb.ids.
>>
>> This would allow distributions to no longer ship (most of) usb.ids by
>> default, but rather keep all the usb device information in the hwdb.
>>
>> This patch introduces a dependency on libusb >= 196.
>
> You mean 'libudev' here, right?  I don't think libusb has that version
> number quite yet :)

Hehe, precisely :-)

-t


Re: [PATCH] hpet, allow user controlled mmap for user processes

2013-09-12 Thread Prarit Bhargava


On 08/29/2013 02:01 AM, Matt Wilson wrote:
> On Fri, Mar 22, 2013 at 09:32:54AM -0400, Prarit Bhargava wrote:
>> The CONFIG_HPET_MMAP Kconfig option exposes the memory map of the HPET
>> registers to userspace.  The Kconfig help points out that in some cases this
>> can be a security risk as some systems may erroneously configure the map such
>> that additional data is exposed to userspace.
>>
>> This is a problem for distributions -- some users want the MMAP functionality
>> but it comes with a significant security risk.  In an effort to mitigate this
>> risk, and due to the low number of users of the MMAP functionality, I've
>> introduced a kernel parameter, hpet_mmap_enable, that is required in order
>> to actually have the HPET MMAP exposed.
>>
>> [v2]: Clemens suggested modifying the Kconfig help text and making the
>> default setting configurable.
>> [v3]: Fixed up Documentation and Kconfig entries, default now "Y"
>> [v4]: After testing, found that I need to modify CONFIG_HPET_MMAP_DEFAULT 
>> usage
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Clemens Ladisch 
>> ---
>>  Documentation/kernel-parameters.txt |4 
>>  drivers/char/Kconfig|9 +++--
>>  drivers/char/hpet.c |   25 +++--
>>  3 files changed, 34 insertions(+), 4 deletions(-)
> 
> It doesn't seem like this patch got picked up and seems like a good
> idea to me. Clemens, what do you think?
> 
> Acked-by: Matt Wilson 
> 

Clemens?  I didn't see a reply...

P.


[PATCH] memory: add a basic OF-based memory driver

2013-09-12 Thread Emilio López
This driver's only job is to claim and ensure the necessary clock
for memory operation on a DT-powered machine remains enabled.

Signed-off-by: Emilio López 
---

I believe this new patch should resolve all the concerns raised; as
always, all feedback is welcome :)

Changes from RFC:
- Move from drivers/of to drivers/memory
- Make a proper driver instead of using an initcall
- Binding document for the new "simple-memory-controller"

 .../simple-memory-controller.txt   | 19 
 drivers/memory/Kconfig | 11 +
 drivers/memory/Makefile|  1 +
 drivers/memory/simple-mc.c | 57 ++
 4 files changed, 88 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/simple-memory-controller.txt
 create mode 100644 drivers/memory/simple-mc.c

diff --git a/Documentation/devicetree/bindings/memory-controllers/simple-memory-controller.txt b/Documentation/devicetree/bindings/memory-controllers/simple-memory-controller.txt
new file mode 100644
index 000..d37683b
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/simple-memory-controller.txt
@@ -0,0 +1,19 @@
+Device Tree Clock binding for a simple memory controller.
+
+Required properties:
+- compatible : shall be "simple-memory-controller"
+
+Optional properties:
+- reg: may contain the register space for the controller. This
+   property is currently ignored by the driver
+- clocks : may contain a phandle to the clock that is currently being
+   used on the controller. This clock shall remain enabled
+   during system operation.
+
+Example:
+
+mc: mc@0123000 {
+   compatible = "simple-memory-controller";
+   reg = <0x0123000 0x400>;
+   clocks = < 1>;
+};
diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index 29a11db..4a6df65 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -50,4 +50,15 @@ config TEGRA30_MC
  analysis, especially for IOMMU/SMMU(System Memory Management
  Unit) module.
 
+config SIMPLE_MC
+   bool "Simple memory controller"
+   default y
+   depends on OF && COMMON_CLK
+   help
+ This driver is able to manage a simple memory controller whose
+ only needs consist of keeping one clock enabled. The
+ controller must be defined on the device tree as compatible
+ with "simple-memory-controller"; see the corresponding binding
+ document for more details.
+
 endif
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index 969d923..e0953e5 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_TI_EMIF)   += emif.o
 obj-$(CONFIG_MVEBU_DEVBUS) += mvebu-devbus.o
 obj-$(CONFIG_TEGRA20_MC)   += tegra20-mc.o
 obj-$(CONFIG_TEGRA30_MC)   += tegra30-mc.o
+obj-$(CONFIG_SIMPLE_MC)+= simple-mc.o
diff --git a/drivers/memory/simple-mc.c b/drivers/memory/simple-mc.c
new file mode 100644
index 000..e58371d
--- /dev/null
+++ b/drivers/memory/simple-mc.c
@@ -0,0 +1,57 @@
+/*
+ * Simple memory controller driver
+ *
+ * Copyright 2013 Emilio López 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static int simple_mc_probe(struct platform_device *pdev)
+{
+   struct device_node *np = pdev->dev.of_node;
+   struct clk *clk;
+
+   if (!np)
+   return -ENODEV;
+
+   clk = of_clk_get(np, 0);
+   if (!IS_ERR(clk)) {
+   clk_prepare_enable(clk);
+   clk_put(clk);
+   }
+
+   return 0;
+}
+
+static const struct of_device_id simple_mc_of_match[] = {
+   { .compatible = "simple-memory-controller", },
+   { /* sentinel */ },
+};
+
+static struct platform_driver simple_mc_driver = {
+   .probe = simple_mc_probe,
+   .driver = {
+   .name = "simple-mc",
+   .owner = THIS_MODULE,
+   .of_match_table = of_match_ptr(simple_mc_of_match),
+   },
+};
+
+module_platform_driver(simple_mc_driver);
+
+MODULE_AUTHOR("Emilio López ");
+MODULE_DESCRIPTION("Simple memory controller driver");
+MODULE_LICENSE("GPL");
-- 
1.8.4


Heads up about 3.12 merge window..

2013-09-12 Thread Linus Torvalds
So this is just a heads-up that while I haven't actually closed the
merge window yet, I was really close to deciding to just make this one
a short one. I'm going to be on the road for the next few days and
then at LinuxCon US later next week, and I've tried very hard to merge
everything I had pending in time for that (despite my machine
mishaps).

And I have indeed merged everything that's come in, as far as I can
tell from the current state of my mailbox.

Now, what that means is that if you are a maintainer that hasn't
already sent in your pull request for 3.12, you had better have a good
excuse for why you left it until the last few days. On the other hand,
if you _have_ sent me a pull request, and it hasn't shown up, ping me
separately, we've had those things caught in spam boxes etc, and I
might just have missed it.

Because if you don't have a good excuse and it's a big pull with lots
of development that you just left until very late in the merge window
to make it inconvenient for me, it's going to wait until 3.13.

Here's to hoping everybody just goes "Whew, good that I sent my pull
request early".

And in fact there really shouldn't be anything pending afaik. I've
already merged more commits than was in linux-next.

  Linus


[PATCH] Staging: lustre: fix a brace coding style issue in fsfilt.c

2013-09-12 Thread Jon Bernard
This is a patch to the fsfilt.c file that fixes up a brace warning found by the
checkpatch.pl tool.

Signed-off-by: Jon Bernard 
---
 drivers/staging/lustre/lustre/lvfs/fsfilt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lvfs/fsfilt.c b/drivers/staging/lustre/lustre/lvfs/fsfilt.c
index e86df73..efc1e52 100644
--- a/drivers/staging/lustre/lustre/lvfs/fsfilt.c
+++ b/drivers/staging/lustre/lustre/lvfs/fsfilt.c
@@ -50,9 +50,8 @@ static struct fsfilt_operations *fsfilt_search_type(const char *type)
 
list_for_each(p, _types) {
found = list_entry(p, struct fsfilt_operations, fs_list);
-   if (!strcmp(found->fs_type, type)) {
+   if (!strcmp(found->fs_type, type))
return found;
-   }
}
return NULL;
 }
-- 
1.8.4



Re: Unusually high system CPU usage with recent kernels

2013-09-12 Thread Paul E. McKenney
On Wed, Sep 11, 2013 at 08:46:04AM +0200, Tibor Billes wrote:
> > From: Paul E. McKenney Sent: 09/09/13 10:44 PM
> > On Mon, Sep 09, 2013 at 09:47:37PM +0200, Tibor Billes wrote:
> > > > From: Paul E. McKenney Sent: 09/08/13 08:43 PM
> > > > On Sun, Sep 08, 2013 at 07:22:45PM +0200, Tibor Billes wrote:
> > > > > > From: Paul E. McKenney Sent: 09/07/13 02:23 AM
> > > > > > On Tue, Sep 03, 2013 at 03:16:07PM -0700, Paul E. McKenney wrote:
> > > > > > > On Tue, Sep 03, 2013 at 11:11:01PM +0200, Tibor Billes wrote:
> > > > > > > > > From: Paul E. McKenney Sent: 08/30/13 03:24 AM
> > > > > > > > > On Tue, Aug 27, 2013 at 10:05:42PM +0200, Tibor Billes wrote:
> > > > > > > > > > From: Paul E. McKenney Sent: 08/26/13 06:28 AM
> > > > > > > > > > > > Here is a patch that is more likely to help. I am testing it in parallel,
> > > > > > > > > > > > but figured I should send you a sneak preview.
> > > > > > > > > > > 
> > > > > > > > > > > I tried it, but I don't see any difference in overall performance. The dstat
> > > > > > > > > > > also shows the same as before.
> > > > > > > > > > > 
> > > > > > > > > > > But I did notice something. Occasionally there is an increase in userspace
> > > > > > > > > > > CPU usage, interrupts and context switches are dropping, and it really gets
> > > > > > > > > > > more work done (scons printed commands more frequently).  I checked that
> > > > > > > > > > > this behaviour is present without your patch, I just didn't notice this
> > > > > > > > > > > before. Maybe you can make some sense out of it.
> > > > > > > > > > 
> > > > > > > > > > > system total-cpu-usage -dsk/total- -net/total- ---paging-- ---system-- swap--- --memory-usage- -virtual-memory
> > > > > > > > > > >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > > > > > > > 27-08 20:51:53| 23  62   5   0  11   0|   0     0 |   0     0 |   0     0 |1274  3102k|   0  7934M| 549M 56.0M  491M 6698M|   0    28   156   159
> > > > > > > > > > > 27-08 20:51:54| 24  64   1   0  11   0|   0     0 |   0     0 |   0     0 |1317  3165k|   0  7934M| 549M 56.0M  491M 6698M|   0    53   189   182
> > > > > > > > > > > 27-08 20:51:55| 33  50   6   2   9   0| 192k 1832k|   0     0 |   0     0 |1371  2442k|   0  7934M| 544M 56.0M  492M 6702M|   0    30k   17k   17k
> > > > > > > > > > > 27-08 20:51:56| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1313  3220k|   0  7934M| 544M 56.0M  492M 6701M|   0    21   272   232
> > > > > > > > > > > 27-08 20:51:57| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1319  3226k|   0  7934M| 544M 56.0M  492M 6701M|   0     8    96   112
> > > > > > > > > > > 27-08 20:51:58| 25  63   0   0  12   0|   0     0 |   0     0 |   0     0 |1317  3224k|   0  7934M| 544M 56.0M  492M 6701M|   0    12   145   141
> > > > > > > > > > > 27-08 20:51:59| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1317  3223k|   0  7934M| 544M 56.0M  492M 6701M|   0    54   193   191
> > > > > > > > > > > 27-08 20:52:00| 25  63   0   0  12   0|   0    24k|   0     0 |   0     0 |1336  3216k|   0  7934M| 544M 56.0M  492M 6701M|   0    36   161   172
> > > > > > > > > > > 27-08 20:52:01| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1313  3225k|   0  7934M| 544M 56.0M  492M 6701M|   0     9   107   107
> > > > > > > > > > > 27-08 20:52:02| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1327  3224k|   0  7934M| 545M 56.0M  492M 6701M|   0    13   193   200
> > > > > > > > > > > 27-08 20:52:03| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1311  3226k|   0  7934M| 545M 56.0M  492M 6701M|   0    13   114   114
> > > > > > > > > > > 27-08 20:52:04| 25  63   0   0  12   0|   0     0 |   0     0 |   0     0 |1331  3223k|   0  7934M| 544M 56.0M  492M 6701M|   0    41   190   178
> > > > > > > > > > > 27-08 20:52:05| 24  64   0   0  12   0|   0  8192B|   0     0 |   0     0 |1315  3222k|   0  7934M| 544M 56.0M  492M 6701M|   0    30   123   122
> > > > > > > > > > > 27-08 20:52:06| 24  64   0   0  12   0|   0     0 |   0     0 |   0     0 |1314  3223k|   0  7934M| 544M 56.0M  492M 6701M|

RE: [PATCH V2 1/1] X86: Hyper-V: Get the local APIC timer frequency from the hypervisor

2013-09-12 Thread KY Srinivasan


> -Original Message-
> From: K. Y. Srinivasan [mailto:k...@microsoft.com]
> Sent: Tuesday, September 03, 2013 11:30 AM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; t...@linutronix.de; h...@zytor.com;
> jbeul...@suse.com; b...@alien8.de
> Cc: KY Srinivasan
> Subject: [PATCH V2 1/1] X86: Hyper-V: Get the local APIC timer frequency from
> the hypervisor
> 
> Hyper-V supports a mechanism for retrieving the local APIC frequency. Use this
> and bypass the calibration code in the kernel. This would allow us to boot the
> Linux kernel as a "modern VM" on Hyper-V where many of the legacy devices (such
> as PIT) are not emulated.
> 
> I would like to thank Olaf Hering, Jan Beulich and
> H. Peter Anvin for their help in this effort.
> 
> In this version of the patch, I have addressed Jan's comments.
> 
> Signed-off-by: K. Y. Srinivasan 
> ---
>  arch/x86/include/uapi/asm/hyperv.h |   19 +++
>  arch/x86/kernel/cpu/mshyperv.c |   24 
>  2 files changed, 43 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/hyperv.h
> b/arch/x86/include/uapi/asm/hyperv.h
> index b80420b..b8f1c01 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -27,6 +27,19 @@
>  #define HV_X64_MSR_VP_RUNTIME_AVAILABLE  (1 << 0)
>  /* Partition Reference Counter (HV_X64_MSR_TIME_REF_COUNT) available*/
>  #define HV_X64_MSR_TIME_REF_COUNT_AVAILABLE  (1 << 1)
> +
> +/*
> + * There is a single feature flag that signifies the presence of the MSR
> + * that can be used to retrieve both the local APIC Timer frequency as
> + * well as the TSC frequency.
> + */
> +
> +/* Local APIC timer frequency MSR (HV_X64_MSR_APIC_FREQUENCY) is
> available */
> +#define HV_X64_MSR_APIC_FREQUENCY_AVAILABLE (1 << 11)
> +
> +/* TSC frequency MSR (HV_X64_MSR_TSC_FREQUENCY) is available */
> +#define HV_X64_MSR_TSC_FREQUENCY_AVAILABLE (1 << 11)
> +
>  /*
>   * Basic SynIC MSRs (HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM
>   * and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) available
> @@ -136,6 +149,12 @@
>  /* MSR used to read the per-partition time reference counter */
>  #define HV_X64_MSR_TIME_REF_COUNT0x4020
> 
> +/* MSR used to retrieve the TSC frequency */
> +#define HV_X64_MSR_TSC_FREQUENCY 0x4022
> +
> +/* MSR used to retrieve the local APIC timer frequency */
> +#define HV_X64_MSR_APIC_FREQUENCY0x4023
> +
>  /* Define the virtual APIC registers */
>  #define HV_X64_MSR_EOI   0x4070
>  #define HV_X64_MSR_ICR   0x4071
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 71a39f3..b3dc639 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -23,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
> @@ -67,6 +69,8 @@ static struct clocksource hyperv_cs = {
> 
>  static void __init ms_hyperv_init_platform(void)
>  {
> + u64 hv_lapic_frequency;
> +
>   /*
>* Extract the features and hints
>*/
> @@ -76,6 +80,26 @@ static void __init ms_hyperv_init_platform(void)
>   printk(KERN_INFO "HyperV: features 0x%x, hints 0x%x\n",
>  ms_hyperv.features, ms_hyperv.hints);
> 
> + if (ms_hyperv.features & HV_X64_MSR_APIC_FREQUENCY_AVAILABLE) {
> + /*
> +  * Get the APIC frequency.
> +  */
> + rdmsrl(HV_X64_MSR_APIC_FREQUENCY, hv_lapic_frequency);
> + hv_lapic_frequency /= HZ;
> + lapic_timer_frequency = hv_lapic_frequency;
> + printk(KERN_INFO "HyperV: LAPIC Timer Frequency: %#x\n",
> + lapic_timer_frequency);
> +
> + /*
> +  * On Hyper-V, when we are booting off an EFI firmware stack,
> +  * we do not have many legacy devices including PIC, PIT etc.
> +  */
> + if (efi_enabled(EFI_BOOT)) {
> + printk(KERN_INFO "HyperV: Using null_legacy_pic\n");
> + legacy_pic = _legacy_pic;
> + }
> + }
> +
>   if (ms_hyperv.features & HV_X64_MSR_TIME_REF_COUNT_AVAILABLE)
>   clocksource_register_hz(_cs, NSEC_PER_SEC/100);
>  }
> --
> 1.7.4.1

Peter,

Let me know if you want me to address any additional issues in this patch.


Regards,

K. Y
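
A note on the arithmetic in the quoted patch: the value read from HV_X64_MSR_APIC_FREQUENCY is divided by HZ before it is stored in lapic_timer_frequency, which suggests the kernel variable is expressed in timer ticks per jiffy rather than in Hz. The user-space sketch below only illustrates that conversion. The helper name and the sample frequency are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): convert a timer frequency
 * given in Hz into timer ticks per jiffy, for a kernel built with 'hz'
 * jiffies per second. Returns 0 for an invalid 'hz' of 0.
 */
static uint64_t ticks_per_jiffy(uint64_t freq_hz, unsigned int hz)
{
	if (hz == 0)
		return 0;
	return freq_hz / hz;
}
```

For example, with an assumed 200 MHz timer, ticks_per_jiffy(200000000ULL, 1000) gives 200000 ticks per jiffy on an HZ=1000 build, and 800000 on an HZ=250 build.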

RE: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages

2013-09-12 Thread Skidmore, Donald C
> -Original Message-
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Thursday, September 12, 2013 3:27 PM
> To: Skidmore, Donald C
> Cc: e1000-de...@lists.sourceforge.net; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Don Dutile
> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> to PF Nacked" messages
> 
> On Tue, Aug 27, 2013 at 5:01 PM, Bjorn Helgaas 
> wrote:
> 
> > I haven't been able to reproduce the problem with the 2.10.3 ixgbevf
> > driver from
> > http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> > ...
> > Sorry for wasting so much time on something that appears to be already
> fixed.
> 
> I just tried the brand-new v3.11, and the usual, trivial:
> 
>   # echo -n 8 > /sys/bus/pci/devices/:04:00.0/sriov_numvfs
> 
> was enough to blow up my box the same old boring way (infinite string of
> "ixgbevf :04:11.0: Last Request of type 03 to PF Nacked"
> messages).
> 
> I guess this is because v3.11 still includes the 2.7.12-k ixgbevf driver, not
> the apparently-fixed 2.10.3 version from your sourceforge page.
> 
> According to sourceforge, 2.7.12 was released almost a YEAR ago, on
> 2012-10-18, and 2.10.3 was released 2013-07-26.  Why isn't 2.10.3 in v3.11?
> 
> Don't you guys care that it is so easy to blow up your driver with the
> mainline kernel?  I'm quite frustrated by how much time I've wasted on this
> issue.
> 
> I do not think that defending yourself with "please try the latest driver from
> sourceforge" is a reasonable or friendly way to work in the Linux community.
> 
> Bjorn

You're right, I haven't been keeping the version strings up to date with our
latest upstream pushes.  I was hoping to reach a sync point where both drivers
(upstream and out of tree) were closer before I bumped the upstream version.
The relationship between the ixgbevf version number in the upstream kernel and
the one in our out-of-tree driver is not quite as straightforward as the
version strings would suggest.  We have pushed quite a few patches since the
last version bump a year ago, and in fact we attempt to push patches upstream
in parallel with any changes we make in the out-of-tree driver.  But depending
on quite a list of events (testing, release schedule, when net-next is open),
one driver can receive patches earlier or later than the other.  Also, ixgbevf
is currently going through a fair amount of refactoring to bring it more up to
date with ixgbe, so there are a fair number of patches currently in play.  The
reason I suggested you try the out-of-tree driver (SourceForge) is that I knew
it was currently a bit more up to date.

The good news is that if the latest out-of-tree driver corrects your problem,
the fix is most likely en route to upstream.  Likewise, I can send you some of
the patches that are in the out-of-tree driver but are waiting to be sent
upstream, if you would like to try them.  Some of them touch code around the
mbx messages; as I mentioned in an earlier email, the error message you're
seeing seems to imply something has gone wrong there.  But since we can't seem
to recreate your failure locally, I can't know for sure.

Thanks,
-Don Skidmore 




Re: TPMs and random numbers

2013-09-12 Thread Jörn Engel
On Thu, 12 September 2013 16:51:15 -0700, Andy Lutomirski wrote:
> 
> Supposedly, the Linux entropy pool has the property that mixing in
> even actively malicious data is no worse than not mixing in anything
> at all.

It is worse in three ways:
- it costs performance,
- it may create a false sense of safety and
- it actively does harm if we credit it as entropy.

How much weight you assign to each of those is up to you.  So long as
we don't credit any of it as entropy, I am not too averse to mixing
it in.  But I can equally see benefit in burning the bridges.

Jörn

--
There are two ways of constructing a software design: one way is to make
it so simple that there are obviously no deficiencies, and the other is
to make it so complicated that there are no obvious deficiencies.
-- C. A. R. Hoare


Re: TPMs and random numbers

2013-09-12 Thread Andy Lutomirski
On Thu, Sep 12, 2013 at 3:13 PM, Jörn Engel  wrote:
> On Thu, 12 September 2013 19:39:47 -0400, Jeff Garzik wrote:
>> On Thu, Sep 12, 2013 at 5:57 PM, Jörn Engel  wrote:
>> > On Wed, 11 September 2013 14:47:04 -0400, David Safford wrote:
>> >> But I also think that the existing (certified) TPMs are good enough
>> >> for direct use.
>>
>> > That is equivalent to trusting the TPM chip not to be malicious.  It
>>
>> Indeed.  While it need not be rngd or userland at all, it seems
>> reasonable to require any hardware RNG to have its data pushed through
>> AES mix steps (as kernel random does now IIUC).
>
> *shrug*
>
> The hardware RNG is either providing good entropy or entirely
> predictable data - without us being able to tell the difference.  So I
> am torn between two extremes.  Either we admit it to the entropy pool
> and mix it with all other sources - hoping that it actually is
> unpredictable to The Bad Guys(tm).  Or we disregard all of it.

Supposedly, the Linux entropy pool has the property that mixing in
even actively malicious data is no worse than not mixing in anything
at all.

(This is probably not true if the so-called entropy can depend on the
current (secret) state of the pool, but the TPM has no way to see
that.)

--Andy
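
The property described in the message above can be illustrated with a toy model. If the mixing step is invertible in the pool state for any fixed input, then an input chosen without knowledge of the pool maps distinct pool states to distinct pool states and cannot shrink the set of states an attacker has to guess. The sketch below uses an 8-bit rotate-and-XOR step purely for illustration. It is not the kernel's mixing function, and all names are hypothetical. The last function shows the caveat from the message: an input derived from the secret pool state can collapse every state to the same value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy mixing step: rotate the 8-bit pool left by 3, then XOR in the input. */
static uint8_t toy_mix(uint8_t pool, uint8_t input)
{
	uint8_t rotated = (uint8_t)((pool << 3) | (pool >> 5));

	return (uint8_t)(rotated ^ input);
}

/*
 * Returns true if, for this fixed input, toy_mix() maps the 256 possible
 * pool states onto 256 distinct states (i.e. it is a permutation).
 */
static bool toy_mix_is_permutation(uint8_t input)
{
	bool seen[256];
	int pool;

	memset(seen, 0, sizeof(seen));
	for (pool = 0; pool < 256; pool++) {
		uint8_t next = toy_mix((uint8_t)pool, input);

		if (seen[next])
			return false;
		seen[next] = true;
	}
	return true;
}

/*
 * Caveat: an input that depends on the pool state. Feeding back the
 * rotated pool cancels it out, so every pool state ends up as 0.
 */
static uint8_t toy_mix_with_leaked_state(uint8_t pool)
{
	uint8_t rotated = (uint8_t)((pool << 3) | (pool >> 5));

	return toy_mix(pool, rotated);
}
```

In this model toy_mix_is_permutation() holds for every fixed input byte, while toy_mix_with_leaked_state() returns 0 for every pool value.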


Re: [PATCH 1/2] tick: broadcast: Deny per-cpu clockevents from being broadcast sources

2013-09-12 Thread Sören Brinkmann
Hi Thomas,

On Thu, Sep 12, 2013 at 10:30:15PM +0200, Thomas Gleixner wrote:
> On Thu, 12 Sep 2013, Soren Brinkmann wrote:
> > From: Stephen Boyd 
> > 
> > On most ARM systems the per-cpu clockevents are truly per-cpu in
> > the sense that they can't be controlled on any other CPU besides
> > the CPU that they interrupt. If one of these clockevents were to
> > become a broadcast source we will run into a lot of trouble
> > because the broadcast source is enabled on the first CPU to go
> > into deep idle (if that CPU suffers from FEAT_C3_STOP) and that
> > could be a different CPU than what the clockevent is interrupting
> > (or even worse the CPU that the clockevent interrupts could be
> > offline).
> > 
> > Theoretically it's possible to support per-cpu clockevents as the
> > broadcast source but so far we haven't needed this and supporting
> > it is rather complicated. Let's just deny the possibility for now
> > until this becomes a reality (let's hope it never does!).
> 
> Well, we can't do it this way. There are globally accessible clock
> event devices which deliver only to cpu0. So the mask check might be
> causing failure here.
> 
> Just add a feature flag CLOCK_EVT_FEAT_PERCPU to the clock event
> device and check for it.

I gave it a shot. Is this what you imagine:
diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
index b66c1f3..c639b1a 100644
--- a/drivers/clocksource/arm_global_timer.c
+++ b/drivers/clocksource/arm_global_timer.c
@@ -169,7 +169,8 @@ static int gt_clockevents_init(struct clock_event_device *clk)
int cpu = smp_processor_id();
 
clk->name = "arm_global_timer";
-   clk->features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT;
+   clk->features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT |
+   CLOCK_EVT_FEAT_PERCPU;
clk->set_mode = gt_clockevent_set_mode;
clk->set_next_event = gt_clockevent_set_next_event;
clk->cpumask = cpumask_of(cpu);
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0857922..493aa02 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -60,6 +60,7 @@ enum clock_event_mode {
  * Core shall set the interrupt affinity dynamically in broadcast mode
  */
 #define CLOCK_EVT_FEAT_DYNIRQ  0x20
+#define CLOCK_EVT_FEAT_PERCPU  0x40
 
 /**
  * struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index d3539e5..de4c5d8 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -70,16 +70,14 @@ static bool tick_check_broadcast_device(struct clock_event_device *curdev,
struct clock_event_device *newdev)
 {
if ((newdev->features & CLOCK_EVT_FEAT_DUMMY) ||
-   (newdev->features & CLOCK_EVT_FEAT_C3STOP))
+   (newdev->features & CLOCK_EVT_FEAT_C3STOP) ||
+   (newdev->features & CLOCK_EVT_FEAT_PERCPU))
return false;
 
if (tick_broadcast_device.mode == TICKDEV_MODE_ONESHOT &&
!(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
return false;
 
-   if (cpumask_equal(newdev->cpumask, cpumask_of(smp_processor_id())))
-   return false;
-
return !curdev || newdev->rating > curdev->rating;
 }

If this is the way to go, I can prepare this in a v2.

Thanks,
Sören
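
The selection logic proposed in the diff above can be modeled in user space as follows. This is only a sketch of the decision, not kernel code. The struct and function names are hypothetical, the DYNIRQ and PERCPU values are the ones shown in the diff, and the remaining flag values are assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* DYNIRQ and PERCPU as in the diff; the other values are assumed. */
#define FEAT_PERIODIC	0x01
#define FEAT_ONESHOT	0x02
#define FEAT_C3STOP	0x08
#define FEAT_DUMMY	0x10
#define FEAT_DYNIRQ	0x20
#define FEAT_PERCPU	0x40

/* Hypothetical stand-in for struct clock_event_device. */
struct model_clock_event_device {
	unsigned int features;
	int rating;
};

/*
 * Model of the proposed check: a device is rejected as a broadcast source
 * if it is a dummy, stops in deep idle, or is flagged as per-cpu. In
 * oneshot mode it must also support oneshot operation. Otherwise it wins
 * if there is no current device or if it has a higher rating.
 */
static bool model_check_broadcast_device(
		const struct model_clock_event_device *curdev,
		const struct model_clock_event_device *newdev,
		bool oneshot_mode)
{
	if (newdev->features & (FEAT_DUMMY | FEAT_C3STOP | FEAT_PERCPU))
		return false;

	if (oneshot_mode && !(newdev->features & FEAT_ONESHOT))
		return false;

	return !curdev || newdev->rating > curdev->rating;
}
```

With this model, a device flagged FEAT_PERCPU is rejected regardless of its rating, while a globally accessible device that happens to deliver only to one CPU is still eligible, which is the distinction the cpumask comparison could not make.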




Re: TPMs and random numbers

2013-09-12 Thread Jörn Engel
On Thu, 12 September 2013 19:39:47 -0400, Jeff Garzik wrote:
> On Thu, Sep 12, 2013 at 5:57 PM, Jörn Engel  wrote:
> > On Wed, 11 September 2013 14:47:04 -0400, David Safford wrote:
> >> But I also think that the existing (certified) TPMs are good enough
> >> for direct use.
> 
> > That is equivalent to trusting the TPM chip not to be malicious.  It
> 
> Indeed.  While it need not be rngd or userland at all, it seems
> reasonable to require any hardware RNG to have its data pushed through
> AES mix steps (as kernel random does now IIUC).

*shrug*

The hardware RNG is either providing good entropy or entirely
predictable data - without us being able to tell the difference.  So I
am torn between two extremes.  Either we admit it to the entropy pool
and mix it with all other sources - hoping that it actually is
unpredictable to The Bad Guys(tm).  Or we disregard all of it.

If we disregard all of it, that forces us to collect good entropy from
other sources.  Having what looks like good entropy but may not be
will a) make us complacent and b) make it hard to notice when our
random numbers are compromised.  It is the same strategy as burning
down the bridge, forcing your army to win or die.  It also shares the
same drawbacks.

Jörn

--
"Error protection by error detection and correction."
-- from a university class


Re: TPMs and random numbers

2013-09-12 Thread Jeff Garzik
On Thu, Sep 12, 2013 at 5:57 PM, Jörn Engel  wrote:
> On Wed, 11 September 2013 14:47:04 -0400, David Safford wrote:
>> But I also think that the existing (certified) TPMs are good enough
>> for direct use.

> That is equivalent to trusting the TPM chip not to be malicious.  It

Indeed.  While it need not be rngd or userland at all, it seems
reasonable to require any hardware RNG to have its data pushed through
AES mix steps (as kernel random does now IIUC).

  Jeff


Re: TPMs and random numbers

2013-09-12 Thread Andy Lutomirski
On Thu, Sep 12, 2013 at 2:57 PM, Jörn Engel  wrote:
> On Wed, 11 September 2013 14:47:04 -0400, David Safford wrote:
>>
>> But I also think that the existing (certified) TPMs are good enough
>> for direct use.
>
> That is equivalent to trusting the TPM chip not to be malicious.  It
> requires trusting the chip designer, trusting every single employee of
> the chip designer, as some of them may be plants from a random
> country's spook organization, trusting the fab where the chip was
> manufactured, trusting your local dealer not to replace one chip with
> another in a similar packaging, trusting third-party components the
> designers may have incorporated, trusting intermediate steps between
> designer and fab or fab and local dealer, trusting your own employees,
> etc.

I would argue that any TPM-using kernel code should go even further
than just not trusting its RNG.  We should be further wrapping the
TPM-generated keys so that even a complete leak of the SRK wouldn't
allow an adversary to unwrap the keys.

Of course, without blinded operations (which the TPM doesn't support),
we're always vulnerable to the TPM actively leaking private key bits,
but that seems less likely.

(I have a partial implementation of this called tpmkey -- some day I
hope to finish it.)

--Andy


Re: [PATCH] kernel/futex.c: notice the return value after rt_mutex_finish_proxy_lock() fails

2013-09-12 Thread Thomas Gleixner
On Thu, 12 Sep 2013, Darren Hart wrote:
> On Thu, 2013-09-12 at 16:32 +0200, Thomas Gleixner wrote:
> > On Tue, 20 Aug 2013, Chen Gang wrote:
> > 
> > > rt_mutex_finish_proxy_lock() can return failure code (e.g. -EINTR,
> > > -ETIMEDOUT).
> > > 
> > > Original implementation has already noticed about it, but not check it
> > > before next work.
> > > 
> > > > Also keep comments within 80 columns to pass "./scripts/checkpatch.pl".
> > > 
> > > 
> > > Signed-off-by: Chen Gang 
> > > ---
> > >  kernel/futex.c |   30 --
> > >  1 files changed, 16 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/kernel/futex.c b/kernel/futex.c
> > > index c3a1a55..1a94e7d 100644
> > > --- a/kernel/futex.c
> > > +++ b/kernel/futex.c
> > > @@ -2373,21 +2373,23 @@ static int futex_wait_requeue_pi(u32 __user 
> > > *uaddr, unsigned int flags,
> > >   ret = rt_mutex_finish_proxy_lock(pi_mutex, to, _waiter, 
> > > 1);
> > >   debug_rt_mutex_free_waiter(_waiter);
> > >  
> > > - spin_lock(q.lock_ptr);
> > > - /*
> > > -  * Fixup the pi_state owner and possibly acquire the lock 
> > > if we
> > > -  * haven't already.
> > > -  */
> > > - res = fixup_owner(uaddr2, , !ret);
> > > - /*
> > > -  * If fixup_owner() returned an error, proprogate that.  If 
> > > it
> > > -  * acquired the lock, clear -ETIMEDOUT or -EINTR.
> > > -  */
> > > - if (res)
> > > - ret = (res < 0) ? res : 0;
> > > + if (!ret) {
> > 
> > Again. This is completely wrong!  
> > 
> > We MUST call fixup_owner even if finish_proxy_lock() returned with an
> > error code. Simply because finish_proxy_lock() is called outside of
> > the spin_lock(q.lock_ptr) region and another thread might have
> > modified the futex state. So we need to handle the corner cases
> > otherwise we might leave the futex in some undefined state.
> > 
> > You're reintroducing a hard to decode bug, which got analyzed and
> > fixed in futex_lock_pi() years ago. See the history for the
> > explanation.
> > 
> > Sigh.
> > 
> > tglx
> 
> Chen, perhaps you can let us know what the failure scenario is that you
> are trying to address with this patch.

No failure scenario at all.

Chen is on a self-defined agenda to fix random kernel bugs in random
kernel subdirectories at a given rate, by all means. (Google it yourself
for the details.)

That crusade does not involve any failure analysis or test cases. It's
just driven by mechanically checking the code for inconsistencies. Now
he tripped over a non obvious return value chain in the futex code. So
instead of figuring out why it is coded this way, he just mechanically
decided that there is a missing check. Though:

The return value is checked, and it takes a deep understanding of how
futexes work to grok why it's necessary to invoke fixup_owner()
independent of the rt_mutex_finish_proxy_lock() return value.

The code in question is:

    ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);

    spin_lock(q.lock_ptr);
    /*
     * Fixup the pi_state owner and possibly acquire the lock if we
     * haven't already.
     */
    res = fixup_owner(uaddr2, &q, !ret);
    /*
     * If fixup_owner() returned an error, propagate that.  If it
     * acquired the lock, clear -ETIMEDOUT or -EINTR.
     */
    if (res)
        ret = (res < 0) ? res : 0;

If you can understand the comments in the code, and you are able to
follow the implementation of fixup_owner() and the usage of "!ret" as
an argument, you really should be able to figure out why this is
correct.
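
A minimal userspace sketch of how the two return values combine, based
only on the snippet and its comments above. merge_fixup_result() is a
hypothetical helper name for illustration, not a kernel function, and
the meaning of a positive res ("lock acquired") is taken from the code
comment:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not a kernel function): combine the proxy-lock
 * result `ret` with the fixup_owner() result `res`.
 *   res < 0  : fixup_owner() failed, propagate its error
 *   res > 0  : fixup_owner() acquired the lock, clear -ETIMEDOUT/-EINTR
 *   res == 0 : nothing to fix up, keep the original result
 */
static int merge_fixup_result(int ret, int res)
{
	assert(ret <= 0);	/* kernel convention: 0 or a negative errno */

	if (res)
		ret = (res < 0) ? res : 0;
	return ret;
}
```

The second case is the one that gets lost if fixup_owner() is only
called when ret == 0: a timed-out or interrupted waiter that ended up
owning the lock anyway would still report the error.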

I'm well aware, as you are, that this code is hard to grok. BUT:

If this code in futex_wait_requeue_pi() is wrong, why did Chen's
correctness checker not trigger on the following code in
futex_lock_pi()?

    if (!trylock)
        ret = rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1);
    else {
        ret = rt_mutex_trylock(&q.pi_state->pi_mutex);
        /* Fixup the trylock return value: */
        ret = ret ? 0 : -EWOULDBLOCK;
    }

    spin_lock(q.lock_ptr);
    /*
     * Fixup the pi_state owner and possibly acquire the lock if we
     * haven't already.
     */
    res = fixup_owner(uaddr, &q, !ret);
    /*
     * If fixup_owner() returned an error, propagate that.  If it acquired
     * the lock, clear our -ETIMEDOUT or -EINTR.
     */
    if (res)
        ret = (res < 0) ? res : 0;

It's the very same pattern and, according to Chen's logic, broken as
well.

Just as I recommended that Chen read the history of futex.c, I can only
recommend the same to you, to figure out why the heck this is the
correct way to handle it.

Hint: The relevant commit starts with: cdf

The code has changed quite a bit since then, but the issue which 

Re: [PATCH] /dev/random: Insufficient of entropy on many architectures

2013-09-12 Thread Theodore Ts'o
On Thu, Sep 12, 2013 at 05:07:17PM -0400, Jörn Engel wrote:
> 
> I happen to have a real-world system with >100k interrupts per second
> and - surprise - add_interrupt_randomness() showed up prominently in
> the profiles.  I was also told twice to just remove that call.  I
> resisted both times and have done far more work to reduce overhead
> while still collecting entropy.  Some others would have caved in.

Would it be possible for you to send me the perf numbers that you saw?

What platform is this?  x86?   Some embedded processor?

> One option is to add the "input_pool.entropy_count > trickle_thresh"
> condition that all other entropy sources currently have.  But instead
> I would rather rename fast_mix() to not_too_fast_mix() and implement a
> real fast_mix().  Essentially just xor the collected numbers into a
> pool and schedule something to shuffle the bits at a later point.

We can try some different things to make fast_mix() faster, but it
would be good to get some hard numbers before we start deciding we
need to do something more complicated.

One thing that comes to mind is that fast_mix() is only called in
exactly one place, and we always pass in a long.  So there are
certainly ways that we could optimize fast_mix even keeping the
current mixing algorithm.
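
For reference, the cheaper variant proposed in the quoted text (xor
the samples into a pool, shuffle later) could look roughly like the
sketch below. This is a userspace illustration only: the struct, the
function names and the four-word pool size are assumptions made for
the example, and nothing here is taken from drivers/char/random.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FAST_WORDS 4	/* assumed pool size, for illustration */

struct toy_fast_pool {
	uint32_t word[TOY_FAST_WORDS];
	unsigned int count;	/* samples collected since the last shuffle */
};

/* Hot path: one xor and one increment per sample, no mixing function. */
static void toy_fast_xor(struct toy_fast_pool *f, uint32_t sample)
{
	assert(f != NULL);

	f->word[f->count % TOY_FAST_WORDS] ^= sample;
	f->count++;
}

/* Cold path trigger: tells the caller when to schedule the real shuffle. */
static int toy_needs_shuffle(const struct toy_fast_pool *f,
			     unsigned int threshold)
{
	return f->count >= threshold;
}
```

Whether this actually beats the current fast_mix() is exactly what the
perf numbers would have to show.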

Cheers,

- Ted


Re: TPMs and random numbers

2013-09-12 Thread Jörn Engel
On Wed, 11 September 2013 14:47:04 -0400, David Safford wrote:
>
> But I also think that the existing (certified) TPMs are good enough 
> for direct use.

That is equivalent to trusting the TPM chip not to be malicious.  It
requires trusting the chip designer, trusting every single employee of
the chip designer, as some of them may be plants from a random
country's spook organization, trusting the fab where the chip was
manufactured, trusting your local dealer not to replace one chip with
another in a similar packaging, trusting third-party components the
designers may have incorporated, trusting intermediate steps between
designer and fab or fab and local dealer, trusting your own employees,
etc.

If you sum it all up, you quickly depend on hundreds of people in
multiple countries that have the ability to subvert your chip's RNG
without you being able to notice any difference.

Or rather, you would only be able to notice the difference if you were
the person that subverted the chip.  So the NSA may be able to tell
whether the Chinese have subverted a specific chip.  Honi soit...

Jörn

--
"Security vulnerabilities are here to stay."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001


Re: [ 00/19] 3.0.96-stable review

2013-09-12 Thread Greg Kroah-Hartman
On Thu, Sep 12, 2013 at 03:29:05PM -0700, Guenter Roeck wrote:
> On 09/12/2013 10:26 AM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 3.0.96 release.
> >There are 19 patches in this series, all will be posted as a response
> >to this one.  If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Sat Sep 14 17:24:38 UTC 2013.
> >Anything received after that time might be too late.
> >
> >The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.96-rc1.gz
> >and the diffstat can be found below.
> >
> 
> Build results:
>   total: 98 pass: 71 skipped: 16 fail: 11

Thanks for testing and letting me know.

greg k-h


Re: [ 00/46] 3.10.12-stable review

2013-09-12 Thread Greg Kroah-Hartman
On Thu, Sep 12, 2013 at 03:35:09PM -0700, Guenter Roeck wrote:
> On 09/12/2013 10:58 AM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 3.10.12 release.
> >There are 46 patches in this series, all will be posted as a response
> >to this one.  If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Sat Sep 14 17:56:26 UTC 2013.
> >Anything received after that time might be too late.
> >
> >The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.12-rc1.gz
> >and the diffstat can be found below.
> >
> 
> Build results:
>   total: 110 pass: 110 skipped: 0 fail: 0
> 
> Added cris32 and several arm targets since last time. Still no failures.

That's great to see, thanks for testing and letting me know.

greg k-h


Re: [ 00/16] 3.11.1-stable review

2013-09-12 Thread Greg Kroah-Hartman
On Thu, Sep 12, 2013 at 03:37:10PM -0700, Guenter Roeck wrote:
> On 09/12/2013 11:14 AM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 3.11.1 release.
> >There are 16 patches in this series, all will be posted as a response
> >to this one.  If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Sat Sep 14 18:10:15 UTC 2013.
> >Anything received after that time might be too late.
> >
> >The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.11.1-rc1.gz
> >and the diffstat can be found below.
> >
> 
> Test results:
>   total: 110 pass: 107 skipped: 2 fail: 1
> 
> The failure is xtensa:allmodconfig. A patch fixing the problem has been
> submitted, but is not yet available upstream.

What was skipped?

Anyway, thanks for testing and letting me know.

greg k-h


Re: [tip:sched/core] sched/fair: Fix group power_orig computation

2013-09-12 Thread Michael Neuling
tip-bot for Peter Zijlstra  wrote:

> Commit-ID:  863bffc80898b8df295ebac111af2335ec05f85d
> Gitweb: http://git.kernel.org/tip/863bffc80898b8df295ebac111af2335ec05f85d
> Author: Peter Zijlstra 
> AuthorDate: Wed, 28 Aug 2013 11:44:39 +0200
> Committer:  Ingo Molnar 
> CommitDate: Thu, 12 Sep 2013 19:14:43 +0200
> 
> sched/fair: Fix group power_orig computation
> 
> When looking at the code I noticed we don't actually compute
> sgp->power_orig correctly for groups, fix that.
> 
> Currently the only consumer of that value is fix_small_capacity()
> which is only used on POWER7+ and that code excludes this case by
> being limited to SD_SHARE_CPUPOWER which is only ever set on the SMT
> domain which must be the lowest domain and this has singleton groups.
> 
> So nothing should be affected by this change.
> 
> Cc: Michael Neuling 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/n/tip-db2pe0vxwunv37plc7onn...@git.kernel.org
> Signed-off-by: Ingo Molnar 

FWIW, this doesn't seem to break POWER7.  

Thanks!
Mikey

> ---
>  kernel/sched/fair.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f9f4385..baba313 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4450,7 +4450,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  {
>   struct sched_domain *child = sd->child;
>   struct sched_group *group, *sdg = sd->groups;
> - unsigned long power;
> + unsigned long power, power_orig;
>   unsigned long interval;
>  
>   interval = msecs_to_jiffies(sd->balance_interval);
> @@ -4462,7 +4462,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
>   return;
>   }
>  
> - power = 0;
> + power_orig = power = 0;
>  
>   if (child->flags & SD_OVERLAP) {
>   /*
> @@ -4470,8 +4470,12 @@ void update_group_power(struct sched_domain *sd, int cpu)
>* span the current group.
>*/
>  
> - for_each_cpu(cpu, sched_group_cpus(sdg))
> - power += power_of(cpu);
> + for_each_cpu(cpu, sched_group_cpus(sdg)) {
> + struct sched_group *sg = cpu_rq(cpu)->sd->groups;
> +
> + power_orig += sg->sgp->power_orig;
> + power += sg->sgp->power;
> + }
>   } else  {
>   /*
>* !SD_OVERLAP domains can assume that child groups
> @@ -4480,12 +4484,14 @@ void update_group_power(struct sched_domain *sd, int cpu)
>  
>   group = child->groups;
>   do {
> + power_orig += group->sgp->power_orig;
>   power += group->sgp->power;
>   group = group->next;
>   } while (group != child->groups);
>   }
>  
> - sdg->sgp->power_orig = sdg->sgp->power = power;
> + sdg->sgp->power_orig = power_orig;
> + sdg->sgp->power = power;
>  }
>  
>  /*
> 


Re: [PATCH] /dev/random: Insufficient of entropy on many architectures

2013-09-12 Thread Jörn Engel
On Tue, 10 September 2013 15:08:12 -0700, John Stultz wrote:
> Though
> I probably should be hesitant with my suggestions, as I'm not well
> versed in RNG theory.

The basic principle of Ted's RNG is very simple and quite sane:
- You collect as much data as possible, some of which is (hopefully)
  unpredictable.
- All the data gets dumped into a small buffer.
- When reading from the buffer, you create a crypto-hash of the entire
  buffer.  Even if most of the buffer is predictable, the few
  unpredictable bits will randomly flip every output bit.
- Half of the hash gets returned to the reader, the other half gets
  added back into the pool.

It doesn't matter if you collect predictable data - it neither helps
nor hurts.  But you should collect as much unpredictable data as
possible and do it as cheaply as possible.  If you want to improve the
RNG, you either collect more data, collect better (less predictable)
data or make the collection cheaper.
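
A toy userspace sketch of the four steps above, for illustration only.
All names and the 32-byte pool size are made up for the example, and
FNV-1a stands in for the crypto-hash: it is NOT cryptographic, so this
shows the data flow and nothing about the security of the real thing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_BYTES 32	/* assumed pool size, for illustration */

struct toy_pool {
	uint8_t data[TOY_POOL_BYTES];
	size_t pos;		/* next byte to mix into */
};

/* Steps 1+2: dump whatever was collected into the small buffer. */
static void toy_mix(struct toy_pool *p, const void *in, size_t len)
{
	const uint8_t *b = in;
	size_t i;

	assert(p != NULL && in != NULL);

	for (i = 0; i < len; i++) {
		p->data[p->pos] ^= b[i];
		p->pos = (p->pos + 1) % TOY_POOL_BYTES;
	}
}

/* Step 3: hash the entire buffer (FNV-1a, a non-crypto stand-in). */
static uint64_t toy_hash(const struct toy_pool *p)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV-1a 64-bit offset basis */
	size_t i;

	for (i = 0; i < TOY_POOL_BYTES; i++) {
		h ^= p->data[i];
		h *= 1099511628211ULL;		/* FNV-1a 64-bit prime */
	}
	return h;
}

/* Step 4: half of the hash goes to the reader, half back into the pool. */
static uint32_t toy_extract(struct toy_pool *p)
{
	uint64_t h = toy_hash(p);
	uint32_t out = (uint32_t)h;
	uint32_t feedback = (uint32_t)(h >> 32);

	toy_mix(p, &feedback, sizeof(feedback));
	return out;
}
```

Even in this toy, changing a single input byte changes the hash of the
whole pool, which is the property the third bullet relies on.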

Jörn

--
People really ought to be forced to read their code aloud over the phone.
That would rapidly improve the choice of identifiers.
-- Al Viro


Re: [PATCH 2/2][usbutils] drop dependency on usb.ids

2013-09-12 Thread Greg Kroah-Hartman
On Wed, Sep 04, 2013 at 02:47:08PM +0200, Tom Gundersen wrote:
> Also remove usb.ids from the repository. [Note that these were probably
> never used by distributions regardless, as most distros ship the usb.ids
> directly from upstream.]
> 
> Hardcode the usb-spec information that used to be in usb.ids,
> but which was not moved to hwdb.
> 
> Increase the size of the hash table from 16 to 512, though using a
> hash at all is probably overkill at this point.

Applied, thanks.

greg k-h


Re: [PATCH 1/2][usbutils] lsusb: port to hwdb

2013-09-12 Thread Greg Kroah-Hartman
On Wed, Sep 04, 2013 at 02:47:07PM +0200, Tom Gundersen wrote:
> Most of the information in usb.ids is now contained in udev's hwdb. Read the
> information from the hwdb instead of usb.ids.
> 
> This would allow distributions to no longer ship (most of) usb.ids by default,
> but rather keep all the usb device information in the hwdb.
> 
> This patch introduces a dependency on libusb >= 196.
> 
> v2: drop support from reading data from usb.ids which is already in hwdb
> 
> Cc: Greg Kroah-Hartman 
> ---
>  Makefile.am  |   5 +-
>  configure.ac |   2 +
>  names.c  | 361 
> +++
>  3 files changed, 45 insertions(+), 323 deletions(-)

Applied, thanks.

greg k-h


Re: [PATCH] /dev/random: Insufficient of entropy on many architectures

2013-09-12 Thread Jörn Engel
On Thu, 12 September 2013 14:15:35 +0200, Geert Uytterhoeven wrote:
> On Thu, Sep 12, 2013 at 2:08 PM, Stephan Mueller  wrote:
> >>BTW, I prefer a different name than "random_get_fast_cycles()", as it's
> >>better to have something that returns different and unpredictable
> >>numbers than an actual monotonic cycle counter.
> >
> > A monotonic counter is fully ok. Note, for /dev/random, the occurrence
> > of events delivers entropy. Thus, we have to be able to precisely
> > measure that occurrence. The timer itself does not need to deliver any
> > entropy as long as it is fast.
> 
> Well, in my specific case (m68k/Amiga) I can use:
>   - a 24-bit counter running at only ca. 15 or 31 kHz (actual
> frequency may vary),
>   - a 16-bit counter running at ca. 700 kHz.

Assuming the same cost, please use the 700kHz counter.  Or both.

Jiffies is a relatively poor choice, as it can be predicted with high
certainty.  Most of the time it will be identical to the last value
for jiffies, most of the remaining time it will be off by exactly one.
So on average you don't even get a single unpredictable bit from
jiffies.

A counter that is fast when compared to interrupt rate will give you
relatively many useful bits.  A slow counter - no matter how wide -
will have little useful randomness.  Ideal is a counter that cannot be
externally derived even with the most expensive measurement kit.  So
an unstable clock is actually a bonus.  Think high precision and low
reliability.

And if you have to drop bits from the counter, please drop the high
bits, as they are the easily predictable ones.
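
A small userspace sketch of the arithmetic, assuming idealized
free-running counters and, purely for the example, HZ=100 for jiffies
and two interrupts 1 ms apart. The function names are made up for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reading of an ideal free-running counter ticking at `hz`, taken
 * `t_ns` nanoseconds after it started. Fine for small demo inputs;
 * the multiplication can overflow for large ones.
 */
static uint64_t counter_at(uint64_t t_ns, uint64_t hz)
{
	return t_ns * hz / 1000000000ULL;
}

/* Keep only the `nbits` low bits, i.e. drop the predictable high bits. */
static uint64_t keep_low_bits(uint64_t value, unsigned int nbits)
{
	assert(nbits >= 1 && nbits <= 64);

	if (nbits == 64)
		return value;
	return value & ((1ULL << nbits) - 1);
}
```

With those assumptions, counter_at(1000000, 100) is 0 while
counter_at(1000000, 700000) is 700: between the two interrupts the
jiffies-like counter has not moved at all, whereas the 700kHz counter
has advanced by several hundred ticks, and it is the low bits of that
value which are hard to guess from outside.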

Jörn

--
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000

