[PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement"

2012-12-12 Thread Ingo Molnar

* David Ahern  wrote:

> On 12/12/12 8:34 PM, Linus Torvalds wrote:
> >On Wed, Dec 12, 2012 at 7:25 PM, David Ahern  wrote:
> >>
> >>Are you running an older perf binary on the 3.8 kernel?
> >
> >I am.. I don't tend to rebuild 'perf'..
> >
> >>Does this work: perf record -e cycles:ppH  ...
> >
> >Yes it does. What is 'H' and why should anybody care? Especially since
> >I'm not running virtualized.
> >
> > That whole "exclude_guest" test is insane when there isn't 
> > any virtualization going on. Very annoying.
> 
> you know what's worse? [...]

No, nothing can be worse than breaking 99% of our installed 
base...

I'm wondering where this broke - is it:

  20b279ddb38c perf: Require exclude_guest to use PEBS - kernel side enforcement

Linus, does the straight revert below fix everything for you - 
or do we need to do more?

( The VM problem needs a different fix: a new include_guest bit 
  should be introduced, which would naturally default to 'off' 
  on older binaries, and the old bit should be phased out. Then 
  new perf binaries can turn on that bit safely. Or PEBS should 
  be fixed for guests. Or something along these lines - but 
  it should *not* be fixed by regressing existing binaries ... )

Thanks,

Ingo

->
From 581ba4671bf1d1095e9ecf843be61904e4c97e91 Mon Sep 17 00:00:00 2001
From: Ingo Molnar 
Date: Thu, 13 Dec 2012 08:41:40 +0100
Subject: [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel
 side enforcement"

This reverts commit 20b279ddb38ca42f8863cec07b4d45ec24589f13.

Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/perf_event.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..6774c17 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -340,9 +340,6 @@ int x86_setup_perfctr(struct perf_event *event)
 		/* BTS is currently only allowed for user-mode. */
 		if (!attr->exclude_kernel)
 			return -EOPNOTSUPP;
-
-		if (!attr->exclude_guest)
-			return -EOPNOTSUPP;
 	}
 
 	hwc->config |= config;
@@ -385,9 +382,6 @@ int x86_pmu_hw_config(struct perf_event *event)
 	if (event->attr.precise_ip) {
 		int precise = 0;
 
-		if (!event->attr.exclude_guest)
-			return -EOPNOTSUPP;
-
 		/* Support for constant skid */
 		if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
 			precise++;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] avoid entropy starvation due to stack protection

2012-12-12 Thread Stephan Mueller
On 13.12.2012 01:43:21, +0100, Andrew Morton
 wrote:

Hi Andrew,
> On Tue, 11 Dec 2012 13:33:04 +0100
> Stephan Mueller  wrote:
>
>> Some time ago, I noticed the fact that for every newly
>> executed process, the function create_elf_tables requests 16 bytes of
>> randomness from get_random_bytes. This is easily visible when calling
>>
>> while [ 1 ]
>> do
>>  cat /proc/sys/kernel/random/entropy_avail
>>  sleep 1
>> done
> Please see
> http://ozlabs.org/~akpm/mmotm/broken-out/binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch
>
> That patch is about one week from a mainline merge, btw.

Initially I was also thinking about get_random_int. But stack protection
depends on non-predictable numbers to ensure it cannot be defeated. As
get_random_int depends on MD5 which is assumed to be broken now, I
discarded the idea of using get_random_int.

Moreover, please consider that get_cycles is an architecture-specific
function that on some architectures only returns 0 (For all
architectures where this is implemented, you have no guarantee that it
increments as a high-resolution timer). So, the quality of
get_random_int is questionable IMHO for the use as a stack protector.

Also note, that other in-kernel users of get_random_bytes may be
converted to using the proposed kernel pool to avoid more entropy drainage.

Please note that the suggested approach of fully seeding a deterministic
RNG never followed by a re-seeding is used elsewhere (e.g. the OpenSSL
RNG). Therefore, I think the suggested approach is viable.

Ciao
Stephan



Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-12 Thread Thierry Reding
On Thu, Dec 13, 2012 at 01:38:28PM +1100, NeilBrown wrote:
> On Wed, 12 Dec 2012 12:31:45 +0100 Thierry Reding
>  wrote:
> 
> > On Wed, Dec 12, 2012 at 07:24:30PM +1100, NeilBrown wrote:
[...]
> > > + struct omap_dm_timer *dm_timer;
> > > + unsigned int polarity;
> > 
> > The PWM subsystem already has enum pwm_polarity for this.
> > 
> 
> I'll use that then  and as there  is a pwm_set_polarity() interface, that
> probably means that I don't need to configure the polarity via the platform
> data?  That would be a lot cleaner.

I guess the answer to that question is: it depends. If the user can set
the polarity (via platform or other means), then yes, you don't have to
pass it in here. However there may be users that don't support setting
the polarity or there may even be situations where the PWM goes through
an additional inverter on the board and therefore doesn't need the
polarity inversed after all, even if the user driver requests it.

Generally though I think that it is up to the user drivers to take care
of this and call pwm_set_polarity() as appropriate, so yes, I don't
think you have to explicitly pass it via platform data at all.

> > > + if (omap->duty_ns == duty_ns &&
> > > + omap->period_ns == period_ns)
> > > + /* No change - don't cause any transients */
> > > + return 0;
> > 
> > Note to self: this might be a candidate to put in the core.
> 
> might be useful, though the core doesn't currently "know" the current values.

Yes, but that can be changed. PWM is still a very young subsystem and
I'm trying to be cautious not to add too much cruft to it unless it's
really worth it.

> > > + omap_dm_timer_set_pwm(omap->dm_timer,
> > > +   !omap->polarity,
> > > +   toggle,
> > > +   trigger);
> > 
> > This doesn't either. Also you should be explicit about the polarity
> > parameter, since enum pwm_polarity is an enum and therefore negating it
> > isn't very nice (it should work though).
> > 
> > You could solve this by doing something like:
> > 
> > if (omap->polarity == PWM_POLARITY_NORMAL)
> > polarity = 1;
> > else
> > polarity = 0;
> 
> (omap->polarity == PWM_POLARITY_NORMAL)
> 
> would have the same effect.

Yes, that should work as well. However I'm not a friend of using such
expressions in a function call. But since you'll probably be reworking
this anyway because of the pwm_set_polarity() comments from above you
might just want to stick the proper value into omap->polarity in your
.set_polarity() implementation and not need the extra negation here.
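
The explicit mapping suggested above can be sketched in standalone C. This is only an illustration: the enum below is a local stand-in that mirrors the kernel's enum pwm_polarity, and polarity_to_timer_flag() is a hypothetical helper name, not part of the driver under review.

```c
#include <assert.h>

/* Local stand-in mirroring the kernel's enum pwm_polarity (assumption). */
enum sketch_pwm_polarity {
	SKETCH_POLARITY_NORMAL,
	SKETCH_POLARITY_INVERSED
};

/*
 * Translate the polarity enum into the 0/1 flag the timer call expects,
 * instead of negating the enum value directly.
 */
int polarity_to_timer_flag(enum sketch_pwm_polarity polarity)
{
	assert(polarity == SKETCH_POLARITY_NORMAL ||
	       polarity == SKETCH_POLARITY_INVERSED);

	if (polarity == SKETCH_POLARITY_NORMAL)
		return 1;
	return 0;
}
```

Keeping the translation in one named helper means the call site passes a plain int and the enum never gets negated.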

> > > +static int __devinit omap_pwm_probe(struct platform_device *pdev)
> > 
> > No more __devinit, please.
> 
> If you say so (having no idea what it did :-)

This was used to mark functions depending on whether HOTPLUG was enabled
or not. For instance functions marked __devinit could be discarded after
the init stage if HOTPLUG was disabled because it would be guaranteed to
not be called after the init stage. Recently however HOTPLUG was changed
to be always enabled because the gains were very small and most people
would get them wrong anyway.

> > > +#if CONFIG_PM
> > > +static int omap_pwm_suspend(struct platform_device *pdev, pm_message_t state)
> > > +{
> > > + struct omap_chip *omap = platform_get_drvdata(pdev);
> > > + /* No one preserve these values during suspend so reset them
> > > +  * Otherwise driver leaves PWM unconfigured if same values
> > > +  * passed to pwm_config
> > > +  */
> > > + omap->period_ns = 0;
> > > + omap->duty_ns = 0;
> > > +
> > > + return 0;
> > > +}
> > > +#else
> > > +#define omap_pwm_suspend NULL
> > > +#endif
> > 
> > This doesn't look right. You should implement .resume() if you really
> > care, in which case the resume callback would have to reconfigure with
> > the cached values. In that case maybe you should switch to dev_pm_ops
> > and SIMPLE_DEV_PM_OPS() as well.
> > 
> > If you don't, just resetting these values will not make the PWM work
> > properly after resume either since it will have to be explicitly
> > reconfigured.
> 
> I just copied that from pwm-samsung.c
> 
> I think the point is to avoid the "no transients" short-circuit in
> omap_pwm_config if the config is unchanged.
> 
> The assumption is that pwm_disable() will be called before suspend and
> pwm_config()/pwm_enable() after resume.  So there is no point actually
> configuring anything in .resume() - it makes sense to wait until pwm_config()
> is called (if ever).  But we want to make sure that pwm_config actually does
> something.

Okay, that makes sense. User drivers should actually be better suited to
reset PWM devices to their proper state on resume.
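
The interaction between the "no transients" short-circuit and the suspend hook can be sketched as follows. This is a simplified userspace model under stated assumptions: struct sketch_pwm, sketch_config() and sketch_suspend() are hypothetical names, and the hw_writes counter stands in for actually programming the timer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached state; hw_writes counts simulated hardware updates. */
struct sketch_pwm {
	int duty_ns;
	int period_ns;
	unsigned int hw_writes;
};

/* Skip the hardware update when the requested values match the cache. */
int sketch_config(struct sketch_pwm *pwm, int duty_ns, int period_ns)
{
	assert(pwm != NULL);

	if (pwm->duty_ns == duty_ns && pwm->period_ns == period_ns)
		return 0;	/* no change, avoid transients */

	pwm->duty_ns = duty_ns;
	pwm->period_ns = period_ns;
	pwm->hw_writes++;	/* stands in for reprogramming the timer */
	return 0;
}

/* Invalidate the cache so the next config call reprograms the hardware. */
void sketch_suspend(struct sketch_pwm *pwm)
{
	assert(pwm != NULL);
	pwm->duty_ns = 0;
	pwm->period_ns = 0;
}
```

Without the reset in the suspend hook, a repeated config call with the old values after resume would hit the short-circuit and leave the hardware untouched.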

> > > +MODULE_AUTHOR("Grant Erickson ");
> > > +MODULE_AUTHOR("NeilBrown ");
> > 
> > Shouldn't this be "Neil Brown"? I noticed you use the concatenated form
> > in the email address as well, so maybe that's on purpose?
> 
> Yes, it is on purpose.  

Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Ingo Molnar

* David Ahern  wrote:

> > But doing it this way was wrong. Switch that "exclude_guest" 
> > attribute around, and admit that "H" was bogus, and that the 
> > right thing to do was to add a "V" flag that sets the 
> > "force_guest" flag instead.
> 
> I understand this is annoying. [...]

It's not annoying, it's outright broken - it's a regression that 
we'll fix.

> [...] Older binaries on newer kernels was the only case I 
> could not fix. [...]

The "only" case?? Old, working binaries are actually our _most_ 
important usecase: it's 99.9% of our current installed base ...

> [...] (I guess a message could be added kernel side to at 
> least give a hint.) But the alternative -- based on code that 
> has existed for some time -- is for older binaries to crash 
> VMs.

That should be fixed differently, by not breaking existing 
working functionality.

Thanks,

Ingo


Re: [PATCH] cpuidle - remove the power_specified field in the driver

2012-12-12 Thread Daniel Lezcano
On 12/12/2012 11:42 PM, Rafael J. Wysocki wrote:
> On Wednesday, December 12, 2012 09:00:53 PM Daniel Lezcano wrote:
>> On 12/12/2012 07:50 PM, Julius Werner wrote:
>>> Thanks again for making this happen, Daniel. I like this version,
>>> except for the small nitpick that I still think it would make sense to
>>> also turn the loop in menu.c around. How about something like this:
>>>
>>>    for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i--) {
>>>            struct cpuidle_state *s = &drv->states[i];
>>>            if (!s->disable && s->exit_latency <= latency_req &&
>>>                s->target_residency <= data->predicted_us &&
>>>                s->exit_latency * multiplier <= data->predicted_us) {
>>>                    data->last_state_idx = i;
>>>                    data->exit_us = s->exit_latency;
>>>                    break;
>>>            }
>>>    }
>>
>> Actually I was planning to do that in a separate patch.
> 
> Can you submit that second patch too, please, so that people don't have to
> wonder?

Sure.
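
For readers following along, the reversed scan proposed above can be sketched as standalone C. This is not the menu governor itself: struct sketch_idle_state and pick_deepest_state() are hypothetical stand-ins, and the sketch assumes states are ordered from shallowest to deepest. Note that the index has to count down for the loop to terminate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct cpuidle_state. */
struct sketch_idle_state {
	int disabled;			/* non-zero: state must not be used */
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

/*
 * Walk from the deepest state towards first_idx and return the index of
 * the first state that satisfies all constraints, or fallback if none does.
 */
int pick_deepest_state(const struct sketch_idle_state *states, int count,
		       int first_idx, unsigned int latency_req,
		       unsigned int predicted_us, unsigned int multiplier,
		       int fallback)
{
	int i;

	assert(states != NULL);

	for (i = count - 1; i >= first_idx; i--) {
		const struct sketch_idle_state *s = &states[i];

		if (!s->disabled && s->exit_latency <= latency_req &&
		    s->target_residency <= predicted_us &&
		    s->exit_latency * multiplier <= predicted_us)
			return i;
	}
	return fallback;
}
```

Scanning from the deep end lets the loop stop at the first match instead of visiting every state.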


-- 
  Linaro.org │ Open source software for ARM SoCs




Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Wed, Dec 12, 2012 at 8:31 PM, David Ahern  wrote:
> >
> >
> > See commit 26a4f3c0. But that was not enough.
> 
> Why? Make the people who run virtualization do the extra work. Things
> never worked for them anyway, so forcing *them* to set a flag to get a
> working thing is sane.
> 
> Forcing everybody else to set a flag is insane. See?

Yeah, that's 100% stupid, we'll revert this change.

Arnaldo, wanna do it or should I? This slipped through the 
testing cracks ...

Thanks,

Ingo


Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-12 Thread Stefani Seibold
Am Mittwoch, den 12.12.2012, 22:47 -0800 schrieb H. Peter Anvin:
> Should be a simple matter of sharing pages.  Look perhaps at the x32 vdso for 
> a hint.
> 


> >
> >Any idea or clean solution how i can map the 64 bit vgtod into the 32
> >bit address space? Thats the only problem i see.
> >

No, I see no special handling in the x32 vdso to do this. I am not sure whether
the x32 vdso can access the 64-bit address space of vsyscall_gtod_data. I
can't test this due to the lack of an x32 ABI system.




Re: [PATCH v2] applesmc: add sysfs file to report OSK

2012-12-12 Thread Henrik Rydberg
Hi Gabriel,

> I could try to hardcode the OSK inside QEMU, or I could try to include
> it as a default config file entry, but I'm quite certain the QEMU project
> would be uncomfortable "distributing" a string on which Apple claims
> copyright. Even if that happened, distros might then balk at shipping
> QEMU as part of their package repositories, for the exact same reason.

I understand this is frustrating, and I believe you have made a good
case for the rationale. However, the technical issues remain. Also,
there might still be other, simpler, solutions.

> The only viable (from a legal CYA standpoint) thing I can think of is
> to make it easy to acquire the OSK automatically, on demand, directly
> from the hardware. Right now, the logical place for that is applesmc.ko.
> It already controls access to the SMC, and already reports values for
> various keys.

How about encrypting the string with a key only found on an Apple
computer? There are strings available in both ACPI and EFI that could
serve such a purpose.

Regarding the patch, I agree with Guenter that putting more unrelated
things into the hwmon subsystem makes no sense. Most of the
information in applesmc should go into the hwmon, thermal, backlight
and input subsystems, but some strings should go somewhere else (maybe
/sys/firmware/smc/?). The reluctance you experience here is a
technical one; someone will need to make an effort to create a good
place for your string, and it does not help that the string is, in
fact, a constant. :-)

So, don't give up hope, but please do not expect an immediate solution.

Thanks.
Henrik


Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-12 Thread Thierry Reding
On Thu, Dec 13, 2012 at 02:06:35PM +1100, NeilBrown wrote:
> 
> [Thierry: question for you near the end - thanks]
> 
> On Wed, 12 Dec 2012 10:08:28 -0600 Jon Hunter  wrote:
> 
> > Hi Neil,
> > 
> > On 12/12/2012 02:24 AM, NeilBrown wrote:
[...]
> > > +{
> > > + struct omap_chip *omap = platform_get_drvdata(pdev);
> > > + int status = 0;
> > > +
> > > + status = pwmchip_remove(&omap->chip);
> > > + if (status < 0)
> > > + goto done;
> > > +
> > > + omap_dm_timer_free(omap->dm_timer);
> > 
> > Is it guaranteed that the timer will be disabled at this point?
> 
> Uhmm... it seems that pwm_put() doesn't call pwm_disable(), so I guess it
> might not be disabled.
> Thierry: should pwm_put do that, or do I need a 'free' function in my chip
> ops to do that?

To be honest, I haven't decided yet. =) There have been discussions that
resulted in a request to run pwm_disable() from pwmchip_remove() on all
PWM devices a chip provides.

This isn't implemented yet and I'm not sure about all the side-effects.
I think for now the best way would be to implement .free() within this
driver, or even do an explicit pwm_disable() in the driver's .remove()
function to do this. When I've come to a decision I'll refactor all of
that in one patch across the whole subsystem.

Thierry




ebeam PATCH (new eBeam input driver)

2012-12-12 Thread Yann Cantin
Hi,

Do I need to resubmit the patchset for the next merge window?

If so, do you guys prefer it against Linus' tree or the next branch?

For the record :

> New USB input driver for eBeam devices.
> 
> Currently supported (tested) :
> - Luidia eBeam classic projection and edge projection models
> - Nec "interactive solution" NP01Wi1 & NP01Wi2 accessories.
> 
> Patch 1 to blacklist the devices for hid generic-usb.
> 
> Patch 2 is the actual driver.

-- 
Yann Cantin
A4FEB47F


Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-12 Thread Yinghai Lu
On Wed, Dec 12, 2012 at 9:26 PM, H. Peter Anvin  wrote:
>>
>> The new page table setup in tip:x86/mm2 should make that easier to
>> achieve, however... I won't have time to test this out tonight, though.
>>
>>  -hpa
>
>
> Well, minus a simple brainfart now it actually gets into the page table
> setup.

of init_mem_mapping in setup_arch?


Re: kexec and struct boot_params

2012-12-12 Thread Yinghai Lu
On Wed, Dec 12, 2012 at 8:23 PM, H. Peter Anvin  wrote:
> I still want to do what I mentioned before, because we need to not rely on
> the initialized/16-bit portion so much:
>
> 1. add a field in the uninitialized portion, call it "sentinel";
> 2. make sure the byte position corresponding to the "sentinel" field is
>nonzero in the bzImage file;
> 3. if the kernel boots up and sentinel is nonzero, erase those fields
>that you identified as uninitialized;

Thanks for the instructions. Please check whether you are OK with the updated patch.

> 4. assign a proper boot loader ID to kexec, so we have a way of dealing
>with this kind of debacles in the future (that is what the
>bootloader ID is for: it gives us a way to work around
>bootloader-specific problems.)

ok, add one patch for kexec-tools

>
> We also need to formalize the 64-bit entry point properly, including all the
> entry conditions and so forth.  That needs to be documented.

Do you mean in some file, like bzImage_entry_64.txt:
1. The length of the kernel's 16-bit code is defined by setup_sects in
   setup_header.
2. After that comes the kernel code:
   a. the 32-bit entry is at 0,
   b. the 64-bit entry is at 0x200.
3. When using the 32-bit entry, the kernel should be under 1G, the initrd
   should be under 2G, and zero_page and command_line should be under 1G.
   When using the 64-bit entry, the kernel, initrd, zero_page and
   command_line could be above 4G.

Thanks

Yinghai
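
The sentinel scheme from steps 1 to 3 above can be sketched in standalone C. This is only a model under stated assumptions: struct sketch_boot_params is a hypothetical, heavily reduced stand-in for struct boot_params, and the field names other than "sentinel" are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct boot_params. */
struct sketch_boot_params {
	unsigned char sentinel;	/* nonzero at this offset in the image file */
	unsigned int ext_a;	/* invented: fields a loader should have zeroed */
	unsigned int ext_b;
	unsigned int setup_val;	/* invented: part of the initialized portion */
};

/*
 * A loader that zeroes the structure before filling it in leaves the
 * sentinel at 0. A nonzero sentinel means stale bytes were copied along,
 * so the fields that should have been uninitialized are erased.
 */
void sketch_sanitize(struct sketch_boot_params *bp)
{
	assert(bp != NULL);

	if (bp->sentinel) {
		bp->ext_a = 0;
		bp->ext_b = 0;
	}
}
```

The initialized portion is left alone in both cases; only the fields that cannot be trusted are cleared.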


ext_ramdisk_image.patch
Description: Binary data


linux-next: Tree for Dec 13

2012-12-12 Thread Stephen Rothwell
Hi all,

Changes since 20121211:

Lots of conflicts are migrating between trees.

The powerpc tree still had its build failure for which I applied a patch.

The l2-mtd tree gained a conflict against Linus' tree.

The virtio tree gained a conflict against Linus' tree.

The akpm tree lost lots of commits that turned up elsewhere and still had
its build failure for which I reverted a commit.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 214 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (9977d9b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal)
Merging fixes/master (caf4919 Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by list_head-style lists.)
Merging arm-current/fixes (810883f ARM: 7594/1: Add .smp entry for REALVIEW_EB)
Merging m68k-current/for-linus (34fa78b m68k: fix sigset_t accessor functions)
Merging powerpc-merge/merge (e716e01 powerpc/eeh: Do not invalidate PE properly)
Merging sparc/master (df2fc24 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux)
Merging net/master (2959440 Linux 3.7)
Merging sound-current/for-linus (d846b17 ALSA: hda - Fix build without CONFIG_PM)
Merging pci-current/for-linus (ff8e59b PCI/portdrv: Don't create hotplug slots unless port supports hotplug)
Merging wireless/master (6bdd253 mac80211: fix remain-on-channel (non-)cancelling)
Merging driver-core.current/driver-core-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging tty.current/tty-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging usb.current/usb-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging staging.current/staging-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging char-misc.current/char-misc-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging input-current/for-linus (e12b3ce Input: wacom - fix touch support for Bamboo Fun CTH-461)
Merging md-current/for-linus (874807a md/raid1{,0}: fix deadlock in bitmap_unplug.)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (03a0b4c solos-pci: fix double-free of TX skb in DMA mode)
CONFLICT (content): Merge conflict in arch/x86/Kconfig.cpu
CONFLICT (content): Merge conflict in arch/x86/Kconfig
CONFLICT (content): Merge conflict in arch/powerpc/Kconfig
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging irqdomain-current/irqdomain/merge (a0d271c Linux 3.6)
Merging devicetree-current/devicetree/merge (0e622d3 of/address: sparc: 

[PATCH] nfs: Remove unused list nfs4_clientid_list

2012-12-12 Thread ycnian
From: Yanchuan Nian 

This list was designed to store struct nfs4_client in the client side.
But nfs4_client was obsolete and has been removed from the source code. 
So remove the unused list.

Signed-off-by: Yanchuan Nian 
---
 fs/nfs/nfs4state.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index c351e6b..51c9b0b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -66,7 +66,6 @@
 
 const nfs4_stateid zero_stateid;
 static DEFINE_MUTEX(nfs_clid_init_mutex);
-static LIST_HEAD(nfs4_clientid_list);
 
 int nfs4_init_clientid(struct nfs_client *clp, struct rpc_cred *cred)
 {
-- 
1.7.4.4



Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-12 Thread H. Peter Anvin
Should be a simple matter of sharing pages.  Look perhaps at the x32 vdso for a 
hint.

Stefani Seibold  wrote:

>Am Mittwoch, den 12.12.2012, 22:14 -0800 schrieb H. Peter Anvin:
>> This is too late for 3.8 anyway, so there is time to make it work
>correctly before the 3.9 merge window anyway.  After this merge window
>is over I may pull this into a testing branch, but compat support is a
>precondition.
>> 
>> The vdso is only optional if you build in backwards compatibility
>anyway, and software has a right to expect a specific numeric kernel
>version to export a single ABI.
>> 
>
>Any idea or clean solution how i can map the 64 bit vgtod into the 32
>bit address space? Thats the only problem i see.
>
>- Stefani

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


[PATCH] nfs: Remove duplicate function declaration in internal.h

2012-12-12 Thread ycnian
From: Yanchuan Nian 

Remove duplicate function declaration in internal.h

Signed-off-by: Yanchuan Nian 
---
 fs/nfs/internal.h |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 59b133c..440674f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -409,9 +409,6 @@ extern struct nfs_write_header *nfs_writehdr_alloc(void);
 extern void nfs_writehdr_free(struct nfs_pgio_header *hdr);
 extern int nfs_generic_flush(struct nfs_pageio_descriptor *desc,
 struct nfs_pgio_header *hdr);
-extern void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
-   struct inode *inode, int ioflags,
-   const struct nfs_pgio_completion_ops *compl_ops);
 extern void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio);
 extern void nfs_writedata_release(struct nfs_write_data *wdata);
 extern void nfs_commit_free(struct nfs_commit_data *p);
-- 
1.7.4.4



[PATCH] nfs: Don't return referenced delegations

2012-12-12 Thread ycnian
From: Yanchuan Nian 

The client returns unreferenced delegations in state management. It scans all
delegations and tests the NFS_DELEGATION_REFERENCED flag. If this flag is not
set, NFS_DELEGATION_RETURN will be set and the delegation will be returned. 
But unfortunately NFS_DELEGATION_REFERENCED is cleared after the testing, so 
delegations which are still being used will be returned in the next state
management.

Signed-off-by: Yanchuan Nian 
---
 fs/nfs/delegation.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 81c5eec..485e8c0 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -506,7 +506,7 @@ static void nfs_mark_return_unreferenced_delegations(struct nfs_server *server)
 	struct nfs_delegation *delegation;
 
 	list_for_each_entry_rcu(delegation, &server->delegations, super_list) {
-		if (test_and_clear_bit(NFS_DELEGATION_REFERENCED, &delegation->flags))
+		if (test_bit(NFS_DELEGATION_REFERENCED, &delegation->flags))
 			continue;
 		nfs_mark_return_delegation(server, delegation);
 	}
-- 
1.7.4.4
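
The behavioural difference the patch description relies on can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: the flag helpers are plain, non-atomic stand-ins for the kernel's test_bit() and test_and_clear_bit(), and the SKETCH_* names are invented. It only illustrates what each variant does across two scans; whether clearing the flag is an intended aging mechanism is not settled in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_REFERENCED (1u << 0)
#define SKETCH_RETURN     (1u << 1)

/* Non-atomic stand-in for test_bit(). */
static bool sketch_test(const unsigned int *flags, unsigned int bit)
{
	return (*flags & bit) != 0;
}

/* Non-atomic stand-in for test_and_clear_bit(). */
static bool sketch_test_and_clear(unsigned int *flags, unsigned int bit)
{
	bool was_set = (*flags & bit) != 0;

	*flags &= ~bit;
	return was_set;
}

/* One scan in the old style: the referenced flag is consumed by the test. */
void sketch_scan_clearing(unsigned int *flags)
{
	assert(flags != NULL);
	if (sketch_test_and_clear(flags, SKETCH_REFERENCED))
		return;
	*flags |= SKETCH_RETURN;
}

/* One scan in the patched style: the referenced flag survives the test. */
void sketch_scan_testing(unsigned int *flags)
{
	assert(flags != NULL);
	if (sketch_test(flags, SKETCH_REFERENCED))
		return;
	*flags |= SKETCH_RETURN;
}
```

With the clearing variant, a delegation that was referenced once is skipped on the first scan and marked for return on the second unless something sets the flag again in between.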



[GIT PULL] linux-pstore.git

2012-12-12 Thread Anton Vorontsov
Hello Linus,

Here are just a few fixups for the pstore subsystem, nothing special this
time. Please pull.

Thank you!

The following changes since commit 70a6f46d7b0ec03653b9ab3f8063a9717a4a53ef:

  pstore: Fix NULL pointer dereference in console writes (2012-11-14 18:30:21 -0800)

are available in the git repository at:

  git://git.infradead.org/users/cbou/linux-pstore.git tags/for-v3.8

for you to fetch changes up to ebacfd1ece3bfa46296fc92c6f996cb5f7fc75e6:

  pstore/ftrace: Adjust for ftrace_ops->func prototype change (2012-12-12 19:50:04 -0800)


Anton Vorontsov (1):
  pstore/ftrace: Adjust for ftrace_ops->func prototype change

Arve Hjønnevåg (1):
  pstore/ram: Fix bounds checks for mem_size, record_size, console_size and ftrace_size

Hannes Reinecke (1):
  pstore/ram: Fixup section annotations

Maxime Bizon (1):
  pstore/ram: Fix undefined usage of rounddown_pow_of_two(0)

 fs/pstore/ftrace.c |  4 +++-
 fs/pstore/ram.c    | 42 +++---
 2 files changed, 34 insertions(+), 12 deletions(-)


[PATCH] iommu: moving initialization earlier

2012-12-12 Thread Alexey Kardashevskiy
The iommu_init() call initializes IOMMU internal structures and data
required for the API to function such as iommu_group_alloc().
It is registered as a subsys_initcall.

One of the IOMMU users is the PCI subsystem on POWER, which discovers new
IOMMU tables during the PCI scan, so the most logical place to call
iommu_group_alloc() is when a new group is just discovered. However, the
PCI scan is also done from a subsys_initcall hook, which makes using the
IOMMU API at that point impossible.

The patch moves IOMMU subsystem initialization one step earlier.

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/iommu/iommu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index de857bb..b0afd3d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -865,7 +865,7 @@ printk("%s %u\n", __func__, __LINE__);
 
return 0;
 }
-subsys_initcall(iommu_init);
+arch_initcall(iommu_init);
 
 int iommu_domain_get_attr(struct iommu_domain *domain,
  enum iommu_attr attr, void *data)
-- 
1.7.10.4
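
The ordering argument in the patch above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the level
numbers mirror the kernel's initcall levels (arch_initcall is level 3,
subsys_initcall is level 4, lower levels run first), while struct initcall,
run_initcalls(), simulate() and the two stub functions are hypothetical names
made up for the illustration and are not kernel code.

```c
#include <stddef.h>

/* Initcall levels as used by the kernel: lower levels run first. */
enum { LEVEL_ARCH = 3, LEVEL_SUBSYS = 4, LEVEL_MAX = 7 };

struct initcall {
	int level;
	int (*fn)(void);
};

static int iommu_ready;      /* set once the stub IOMMU core is initialized */
static int pci_scan_result;  /* 0 = group allocated, -1 = IOMMU API not ready */

/* Hypothetical stand-in for iommu_init(). */
static int iommu_init_stub(void)
{
	iommu_ready = 1;
	return 0;
}

/* Hypothetical stand-in for a PCI scan that wants iommu_group_alloc(). */
static int pci_scan_stub(void)
{
	pci_scan_result = iommu_ready ? 0 : -1;
	return pci_scan_result;
}

/* Run all calls level by level; within one level, keep table order. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	for (int level = 0; level <= LEVEL_MAX; level++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == level)
				calls[i].fn();
}

/* Returns the PCI scan result when iommu_init is registered at iommu_level. */
static int simulate(int iommu_level)
{
	/* The PCI scan is listed first to model an unfavourable link order. */
	const struct initcall calls[] = {
		{ LEVEL_SUBSYS, pci_scan_stub },
		{ iommu_level,  iommu_init_stub },
	};

	iommu_ready = 0;
	pci_scan_result = -1;
	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return pci_scan_result;
}
```

In this model, simulate(LEVEL_SUBSYS) leaves the scan without a usable IOMMU
API because both calls share a level and the scan happens to run first, while
simulate(LEVEL_ARCH) succeeds regardless of the order inside the table.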



Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-12-12 Thread Alexey Kardashevskiy

On 13/12/12 13:29, Benjamin Herrenschmidt wrote:
> On Wed, 2012-12-12 at 07:34 -0700, Alex Williamson wrote:
>>> But what would I put there?... The IOMMU ID is more than enough at the moment,
>>> and struct iommu_table does not have anything that would make sense to
>>> show in sysfs...
>>
>> I believe David mentioned that PEs had user visible names.  Perhaps they
>> match an enclosure location or something.  Group numbers are rather
>> arbitrary and really have no guarantee of persistence.  Thanks,
>
> I agree. Make up something, for example domain[PE] or something like
> that.

To be able to add a PE number, I need to call iommu_group_alloc() in the
correct place, where I know this number, OR I have to carry it in iommu_table
until iommu_group_alloc() is called (acceptable but not cool).

I will post a patch that would help as a response to this mail.


--
Alexey


Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-12 Thread Stefani Seibold
On Wednesday, 12.12.2012, at 22:14 -0800, H. Peter Anvin wrote:
> This is too late for 3.8 anyway, so there is time to make it work correctly 
> before the 3.9 merge window anyway.  After this merge window is over I may 
> pull this into a testing branch, but compat support is a precondition.
> 
> The vdso is only optional if you build in backwards compatibility anyway, and 
> software has a right to expect a specific numeric kernel version to export a 
> single ABI.
> 

Any idea for a clean way to map the 64 bit vgtod into the 32 bit
address space? That's the only problem I see.

- Stefani




Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-12 Thread H. Peter Anvin
This is too late for 3.8 anyway, so there is time to make it work correctly 
before the 3.9 merge window anyway.  After this merge window is over I may pull 
this into a testing branch, but compat support is a precondition.

The vdso is only optional if you build in backwards compatibility anyway, and 
software has a right to expect a specific numeric kernel version to export a 
single ABI.

Stefani Seibold  wrote:

>On Wednesday, 12.12.2012, at 15:34 -0800, H. Peter Anvin wrote:
>> On 12/12/2012 12:19 PM, stef...@seibold.net wrote:
>> > diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c
>b/arch/x86/vdso/vdso32/vclock_gettime.c
>> > new file mode 100644
>> > index 000..c9a1909
>> > --- /dev/null
>> > +++ b/arch/x86/vdso/vdso32/vclock_gettime.c
>> > @@ -0,0 +1,7 @@
>> > +/*
>> > + * since vgtod layout differs between X86_64 and x86_32, it is not
>possible to
>> > + * provide a 32 bit vclock with a 64 bit kernel
>> > + */
>> > +#ifdef CONFIG_X86_32
>> > +#include "../vclock_gettime.c"
>> > +#endif
>> 
>> This is where this goes fail.  Sorry, it is not acceptable to
>introduce 
>> an ABI on x86-32 without providing it also on x86-64 in compatibility
>mode.
>> 
>> I also don't believe it is not possible... it might require some more
>
>> cleverness; perhaps we need to do the 32-bit vgtod in such a way that
>it 
>> *is* compatible with 64 bits.
>> 
>
>The comment is ambiguous:
>
>Since vgtod layout differs between X86_64 and x86_32 AND the vgtod is
>not inside the accessible address space of a 32 bit program, it is
>CURRENTLY not possible to provide a 32 bit vclock with a 64 bit kernel
>
>As I understand the VDSO, it is an alternative path, so if there is no support
>for it, there must be a fallback to the system call interface in the
>program or lib that tries to use it.
>
>So there is no drawback for 32 bit programs running on a 64 bit kernel.
>
>
>I think this feature is not so important and can be implemented in a
>subsequent patch, because a 64 bit kernel system mostly runs 64 bit
>programs. Let us fix these things step by step.
>
>- Stefani

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-12 Thread H. Peter Anvin
No, let's not.  Why?  Because if we do that we may inadvertently create an ABI 
which is hard to support across the board.

Stefani Seibold  wrote:

>On Wednesday, 12.12.2012, at 15:34 -0800, H. Peter Anvin wrote:
>> On 12/12/2012 12:19 PM, stef...@seibold.net wrote:
>> > diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c
>b/arch/x86/vdso/vdso32/vclock_gettime.c
>> > new file mode 100644
>> > index 000..c9a1909
>> > --- /dev/null
>> > +++ b/arch/x86/vdso/vdso32/vclock_gettime.c
>> > @@ -0,0 +1,7 @@
>> > +/*
>> > + * since vgtod layout differs between X86_64 and x86_32, it is not
>possible to
>> > + * provide a 32 bit vclock with a 64 bit kernel
>> > + */
>> > +#ifdef CONFIG_X86_32
>> > +#include "../vclock_gettime.c"
>> > +#endif
>> 
>> This is where this goes fail.  Sorry, it is not acceptable to
>introduce 
>> an ABI on x86-32 without providing it also on x86-64 in compatibility
>mode.
>> 
>> I also don't believe it is not possible... it might require some more
>
>> cleverness; perhaps we need to do the 32-bit vgtod in such a way that
>it 
>> *is* compatible with 64 bits.
>> 
>
>The comment is ambiguous:
>
>Since vgtod layout differs between X86_64 and x86_32 AND the vgtod is
>not inside the accessible address space of a 32 bit program, it is
>CURRENTLY not possible to provide a 32 bit vclock with a 64 bit kernel
>
>As I understand the VDSO, it is an alternative path, so if there is no support
>for it, there must be a fallback to the system call interface in the
>program or lib that tries to use it.
>
>So there is no drawback for 32 bit programs running on a 64 bit kernel.
>
>
>I think this feature is not so important and can be implemented in a
>subsequent patch, because a 64 bit kernel system mostly runs 64 bit
>programs. Let us fix these things step by step.
>
>- Stefani

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [patch 2/8] mm: vmscan: disregard swappiness shortly before going OOM

2012-12-12 Thread Simon Jeons
On Wed, 2012-12-12 at 16:43 -0500, Johannes Weiner wrote:
> When a reclaim scanner is doing its final scan before giving up and
> there is swap space available, pay no attention to swappiness
> preference anymore.  Just swap.
> 

I'm confused. If this is the final scan and swap space is still available,
why is it nr[lru] = div64_u64(scan * fraction[file], denominator); instead of
nr[lru] = scan; ?

> Note that this change won't make too big of a difference for general
> reclaim: anonymous pages are already force-scanned when there is only
> very little file cache left, and there very likely isn't when the
> reclaimer enters this final cycle.
> 
> Signed-off-by: Johannes Weiner 
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3874dcb..6e53446 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1751,7 +1751,7 @@ static void get_scan_count(struct lruvec *lruvec, 
> struct scan_control *sc,
>   unsigned long scan;
>  
>   scan = get_lru_size(lruvec, lru);
> - if (sc->priority || noswap || !vmscan_swappiness(sc)) {
> + if (sc->priority || noswap) {
>   scan >>= sc->priority;
>   if (!scan && force_scan)
>   scan = SWAP_CLUSTER_MAX;
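
The question above is about how the per-LRU scan target is derived, so a
simplified user-space model of the patched condition may help. It is a sketch,
not the kernel function: scan_target() and its parameter list are hypothetical,
and the proportional division inside the branch is assumed from the surrounding
function at the time rather than visible in the quoted hunk.

```c
#include <stdint.h>

#define SWAP_CLUSTER_MAX 32UL

/*
 * Simplified model of the scan target for one LRU list after the patch.
 * lru_size:  pages currently on the list
 * priority:  reclaim priority, 0 being the final and most aggressive pass
 * noswap:    non-zero when anonymous pages cannot be swapped out
 * fraction / denominator: share of the list to scan in proportional mode
 */
static uint64_t scan_target(uint64_t lru_size, int priority, int noswap,
			    int force_scan, uint64_t fraction,
			    uint64_t denominator)
{
	uint64_t scan = lru_size;

	if (priority || noswap) {
		scan >>= priority;
		if (!scan && force_scan)
			scan = SWAP_CLUSTER_MAX;
		/* Assumed step: scale by the anon/file balance. */
		scan = scan * fraction / denominator;
	}
	/* With priority == 0 and swap available, the whole list is the target. */
	return scan;
}
```

Under these assumptions the final pass (priority 0, swap available) skips the
branch entirely and returns the full list size, which is the "nr[lru] = scan"
behaviour the question asks for.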




Re: [Patch v3 3/7] rtc: DA9055 RTC driver

2012-12-12 Thread Ashish Jangam
On Wed, 2012-12-12 at 13:11 -0800, Andrew Morton wrote:
> On Mon, 10 Dec 2012 11:12:29 +0530
> Ashish Jangam  wrote:
> 
> > On Wed, 2012-11-28 at 14:54 +0530, Ashish Jangam wrote:
> > > On Tue, 2012-11-27 at 14:23 -0800, Andrew Morton wrote:
> > > > On Fri, 23 Nov 2012 15:41:03 +0530
> > > > Ashish Jangam  wrote:
> > > > 
> > > > > On Tue, 2012-10-23 at 15:33 +0530, Ashish Jangam wrote:
> > > > > > Does this patch looks good?
> > > > > > On Thu, 2012-10-11 at 16:10 +0530, Ashish Jangam wrote:
> > > > > > > This is the RTC patch for the DA9055 PMIC. This patch has got 
> > > > > > > dependency on
> > > > > > > the DA9055 MFD core.
> > > > > > > 
> > > > > > > This patch is functionally tested on Samsung SMDKV6410.
> > > > 
> > > > "shubhro " made a comment, but it was 
> > > > ignored:
> > > > 
> > > > : On Thu, 11 Oct 2012 16:53:34 +0530
> > > > : shubhro  wrote:
> > > > : 
> > > > : > On Thursday 11 October 2012 04:10 PM, Ashish Jangam wrote:
> > > > : > > +#define da9055_rtc_freeze NULL
> > > > : > > +#endif
> > > > : > > +
> > > > : > > +static const struct dev_pm_ops da9055_rtc_pm_ops = {
> > > > : > > + .suspend = da9055_rtc_suspend,
> > > > : > > + .resume = da9055_rtc_resume,
> > > > : > > +
> > > > : > > + .freeze = da9055_rtc_freeze,
> > > > : > > + .thaw = da9055_rtc_resume,
> > > > : > > + .restore = da9055_rtc_resume,
> > > > : > > +
> > > > : > > + .poweroff = da9055_rtc_suspend,
> > > > : > > +};
> > > > : > You may want to use simple dev pm ops
> > > To support generic PM ops these additional ops were supported.
> > Are there any further comments on this patch.
> 
> Is the patch usable against current mainline?  "This patch has got
> dependency on the DA9055 MFD core".  What is the status of that?
The dependent DA9055 MFD core is merged.
> 
> If the patch is ready to go then please refresh it, retest and resend?
> 
Ok, I can test and resend it.




Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-12 Thread Stefani Seibold
On Wednesday, 12.12.2012, at 15:34 -0800, H. Peter Anvin wrote:
> On 12/12/2012 12:19 PM, stef...@seibold.net wrote:
> > diff --git a/arch/x86/vdso/vdso32/vclock_gettime.c 
> > b/arch/x86/vdso/vdso32/vclock_gettime.c
> > new file mode 100644
> > index 000..c9a1909
> > --- /dev/null
> > +++ b/arch/x86/vdso/vdso32/vclock_gettime.c
> > @@ -0,0 +1,7 @@
> > +/*
> > + * since vgtod layout differs between X86_64 and x86_32, it is not 
> > possible to
> > + * provide a 32 bit vclock with a 64 bit kernel
> > + */
> > +#ifdef CONFIG_X86_32
> > +#include "../vclock_gettime.c"
> > +#endif
> 
> This is where this goes fail.  Sorry, it is not acceptable to introduce 
> an ABI on x86-32 without providing it also on x86-64 in compatibility mode.
> 
> I also don't believe it is not possible... it might require some more 
> cleverness; perhaps we need to do the 32-bit vgtod in such a way that it 
> *is* compatible with 64 bits.
> 

The comment is ambiguous:

Since vgtod layout differs between X86_64 and x86_32 AND the vgtod is
not inside the accessible address space of a 32 bit program, it is
CURRENTLY not possible to provide a 32 bit vclock with a 64 bit kernel

As I understand the VDSO, it is an alternative path, so if there is no support
for it, there must be a fallback to the system call interface in the
program or lib that tries to use it.

So there is no drawback for 32 bit programs running on a 64 bit kernel. 

I think this feature is not so important and can be implemented in a
subsequent patch, because a 64 bit kernel system mostly runs 64 bit
programs. Let us fix these things step by step.

- Stefani
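
The fallback described above can be sketched from user space. The example
assumes Linux with a glibc-style syscall() wrapper; gettime_libc() and
gettime_syscall() are hypothetical helper names. Whether the first helper
really goes through the vDSO depends on the C library and kernel in use, so
treat this as an illustration of the two paths rather than a statement about
any particular system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Usual path: the C library may service this from the vDSO when one is available. */
static int gettime_libc(struct timespec *ts)
{
	return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Fallback path: always enters the kernel through the system call interface. */
static int gettime_syscall(struct timespec *ts)
{
	return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Both helpers read the same clock, so a caller sees the same results either way;
the difference is only the cost of entering the kernel.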




RE: linux-next: manual merge of the l2-mtd tree with Linus' tree

2012-12-12 Thread Kumar, Anil
On Thu, Dec 13, 2012 at 07:07:55, Stephen Rothwell wrote:
> Hi Artem,
> 
> Today's linux-next merge of the l2-mtd tree got a conflict in
> Documentation/devicetree/bindings/arm/davinci/nand.txt between commit
> fed16bba8726 ("mtd: nand: davinci: fix the binding documentation") from
> Linus' tree and commit 192afdbfbc5c ("mtd: davinci: add support for
> parition binding nodes") from the l2-mtd tree.
> 
> I fixed it up (maybe- see below) and can carry the fix as necessary (no
> action is required).
> 
> -- 
> Cheers,
> Stephen Rothwells...@canb.auug.org.au
> 
> diff --cc Documentation/devicetree/bindings/arm/davinci/nand.txt
> index 49fc7ad,4746452..000
> --- a/Documentation/devicetree/bindings/arm/davinci/nand.txt
> +++ b/Documentation/devicetree/bindings/arm/davinci/nand.txt
> @@@ -23,16 -23,37 +23,24 @@@ Recommended properties 
>   - ti,davinci-nand-buswidth: buswidth 8 or 16
>   - ti,davinci-nand-use-bbt: use flash based bad block table support.
>   
> + nand device bindings may contain additional sub-nodes describing
> + partitions of the address space. See partition.txt for more detail.
> + 
>  -Example (enbw_cmc board):
>  -aemif@6000 {
>  -compatible = "ti,davinci-aemif";
>  -#address-cells = <2>;
>  -#size-cells = <1>;
>  -reg = <0x6800 0x8>;
>  -ranges = <2 0 0x6000 0x0200
>  -  3 0 0x6200 0x0200
>  -  4 0 0x6400 0x0200
>  -  5 0 0x6600 0x0200
>  -  6 0 0x6800 0x0200>;
>  -nand@3,0 {
>  -compatible = "ti,davinci-nand";
>  -reg = <3 0x0 0x807ff
>  -6 0x0 0x8000>;
>  -#address-cells = <1>;
>  -#size-cells = <1>;
>  -ti,davinci-chipselect = <1>;
>  -ti,davinci-mask-ale = <0>;
>  -ti,davinci-mask-cle = <0>;
>  -ti,davinci-mask-chipsel = <0>;
>  -ti,davinci-ecc-mode = "hw";
>  -ti,davinci-ecc-bits = <4>;
>  -ti,davinci-nand-use-bbt;
>  +Example(da850 EVM ):
>  +nand_cs3@6200 {
>  +compatible = "ti,davinci-nand";
>  +reg = <0x6200 0x807ff
>  +0x6800 0x8000>;
>  +ti,davinci-chipselect = <1>;
>  +ti,davinci-mask-ale = <0>;
>  +ti,davinci-mask-cle = <0>;
>  +ti,davinci-mask-chipsel = <0>;
>  +ti,davinci-ecc-mode = "hw";
>  +ti,davinci-ecc-bits = <4>;
>  +ti,davinci-nand-use-bbt;
> + 
>  -partition@18 {
>  -label = "ubifs";
>  -reg = <0x18 0x7e8>;
>  -};
> ++partition@18 {
> ++label = "ubifs";
> ++reg = <0x18 0x7e8>;

partition@18 is a sub-node of nand_cs3@6200, so
nand_cs3@6200 needs the following properties:

#address-cells = <1>;
#size-cells = <1>;

Without these properties, the DT build will emit a reg format warning.

> + };
>   };
> 



Re: [patch 1/8] mm: memcg: only evict file pages when we have plenty

2012-12-12 Thread Simon Jeons
On Wed, 2012-12-12 at 16:53 -0500, Rik van Riel wrote:
> On 12/12/2012 04:43 PM, Johannes Weiner wrote:
> > dc0422c "mm: vmscan: only evict file pages when we have plenty" makes
> > a point of not going for anonymous memory while there is still enough
> > inactive cache around.
> >
> > The check was added only for global reclaim, but it is just as useful
> > for memory cgroup reclaim.
> >
> > Signed-off-by: Johannes Weiner 
> > ---
> >   mm/vmscan.c | 19 ++-
> >   1 file changed, 10 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 157bb11..3874dcb 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1671,6 +1671,16 @@ static void get_scan_count(struct lruvec *lruvec, 
> > struct scan_control *sc,
> > denominator = 1;
> > goto out;
> > }
> > +   /*
> > +* There is enough inactive page cache, do not reclaim
> > +* anything from the anonymous working set right now.
> > +*/
> > +   if (!inactive_file_is_low(lruvec)) {
> > +   fraction[0] = 0;
> > +   fraction[1] = 1;
> > +   denominator = 1;
> > +   goto out;
> > +   }
> >
> > anon  = get_lru_size(lruvec, LRU_ACTIVE_ANON) +
> > get_lru_size(lruvec, LRU_INACTIVE_ANON);
> > @@ -1688,15 +1698,6 @@ static void get_scan_count(struct lruvec *lruvec, 
> > struct scan_control *sc,
> > fraction[1] = 0;
> > denominator = 1;
> > goto out;
> > -   } else if (!inactive_file_is_low_global(zone)) {
> > -   /*
> > -* There is enough inactive page cache, do not
> > -* reclaim anything from the working set right now.
> > -*/
> > -   fraction[0] = 0;
> > -   fraction[1] = 1;
> > -   denominator = 1;
> > -   goto out;
> > }
> > }
> >
> >
> 
> I believe the if() block should be moved to AFTER
> the check where we make sure we actually have enough
> file pages.

Where is the check for enough file pages? Is it
if (unlikely(file + free <= high_wmark_pages(zone)))?

> 




Re: [patch 1/8] mm: memcg: only evict file pages when we have plenty

2012-12-12 Thread Simon Jeons
On Wed, 2012-12-12 at 16:43 -0500, Johannes Weiner wrote:
> dc0422c "mm: vmscan: only evict file pages when we have plenty" makes

I can't find commit dc0422c.

> a point of not going for anonymous memory while there is still enough
> inactive cache around.
> 
> The check was added only for global reclaim, but it is just as useful
> for memory cgroup reclaim.
> 
> Signed-off-by: Johannes Weiner 
> ---
>  mm/vmscan.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 157bb11..3874dcb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1671,6 +1671,16 @@ static void get_scan_count(struct lruvec *lruvec, 
> struct scan_control *sc,
>   denominator = 1;
>   goto out;
>   }
> + /*
> +  * There is enough inactive page cache, do not reclaim
> +  * anything from the anonymous working set right now.
> +  */
> + if (!inactive_file_is_low(lruvec)) {
> + fraction[0] = 0;
> + fraction[1] = 1;
> + denominator = 1;
> + goto out;
> + }
>  
>   anon  = get_lru_size(lruvec, LRU_ACTIVE_ANON) +
>   get_lru_size(lruvec, LRU_INACTIVE_ANON);
> @@ -1688,15 +1698,6 @@ static void get_scan_count(struct lruvec *lruvec, 
> struct scan_control *sc,
>   fraction[1] = 0;
>   denominator = 1;
>   goto out;
> - } else if (!inactive_file_is_low_global(zone)) {
> - /*
> -  * There is enough inactive page cache, do not
> -  * reclaim anything from the working set right now.
> -  */
> - fraction[0] = 0;
> - fraction[1] = 1;
> - denominator = 1;
> - goto out;
>   }
>   }
>  




[PATCH] powerpc: added DSCR support to ptrace

2012-12-12 Thread Alexey Kardashevskiy
The DSCR (Data Stream Control Register) is supported on some
server PowerPC chips and allows some control over the prefetching
of data streams.

The kernel already supports a per-thread DSCR value, but there is also
a need to be able to change it from an external process for
a specific pid.

The patch adds a new register index, PT_DSCR (index=44), which can be
set and read with:
  ptrace(PTRACE_POKEUSER, traced_process, PT_DSCR << 3, dscr);
  dscr = ptrace(PTRACE_PEEKUSER, traced_process, PT_DSCR << 3, NULL);

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/ptrace.h |1 +
 arch/powerpc/kernel/ptrace.c  |   17 +
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 9c21ed4..340fe36 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -276,6 +276,7 @@ static inline unsigned long 
regs_get_kernel_stack_nth(struct pt_regs *regs,
 #define PT_DAR 41
 #define PT_DSISR 42
 #define PT_RESULT 43
+#define PT_DSCR 44
 #define PT_REGS_COUNT 44
 
 #define PT_FPR048  /* each FP reg occupies 2 slots in this space */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index c10fc28..aa19389 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -179,6 +179,18 @@ static int set_user_msr(struct task_struct *task, unsigned 
long msr)
return 0;
 }
 
+static unsigned long get_user_dscr(struct task_struct *task)
+{
+   return task->thread.dscr;
+}
+
+static int set_user_dscr(struct task_struct *task, unsigned long dscr)
+{
+   task->thread.dscr = dscr;
+   task->thread.dscr_inherit = 1;
+   return 0;
+}
+
 /*
  * We prevent mucking around with the reserved area of trap
  * which are used internally by the kernel.
@@ -200,6 +212,9 @@ unsigned long ptrace_get_reg(struct task_struct *task, int 
regno)
if (regno == PT_MSR)
return get_user_msr(task);
 
+   if (regno == PT_DSCR)
+   return get_user_dscr(task);
+
if (regno < (sizeof(struct pt_regs) / sizeof(unsigned long)))
return ((unsigned long *)task->thread.regs)[regno];
 
@@ -218,6 +233,8 @@ int ptrace_put_reg(struct task_struct *task, int regno, 
unsigned long data)
return set_user_msr(task, data);
if (regno == PT_TRAP)
return set_user_trap(task, data);
+   if (regno == PT_DSCR)
+   return set_user_dscr(task, data);
 
if (regno <= PT_MAX_PUT_REG) {
((unsigned long *)task->thread.regs)[regno] = data;
-- 
1.7.10.4
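
The "PT_DSCR << 3" in the commit message above turns a register index into a
byte offset for a 64-bit tracee, where each register slot is 8 bytes wide. A
minimal sketch of that arithmetic follows; pt_reg_offset() is a hypothetical
helper, and the value 44 is simply the index proposed by the patch.

```c
#include <stddef.h>

#define PT_DSCR 44	/* register index proposed by the patch */

/* Byte offset of a register slot, given the tracee's word size in bytes. */
static size_t pt_reg_offset(unsigned int index, size_t word_size)
{
	return (size_t)index * word_size;
}
```

For a 64-bit tracee, pt_reg_offset(PT_DSCR, 8) is the same value as
PT_DSCR << 3, which is what the ptrace() calls in the commit message pass as
the address argument.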



Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-12 Thread H. Peter Anvin

On 12/12/2012 09:12 PM, H. Peter Anvin wrote:

Here is a version that compiles.  It doesn't *boot* yet, because the
switchover from dynamic mode to the real pagetables doesn't happen right
and so we end up on an uninitialized set of page tables.

The new page table setup in tip:x86/mm2 should make that easier to
achieve, however... I won't have time to test this out tonight, though.

 -hpa


Well, minus a simple brainfart now it actually gets into the page table 
setup.


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

diff --git a/arch/x86/include/asm/pgtable_64_types.h 
b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include 
+
 #ifndef __ASSEMBLY__
 #include 
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END  _AC(0xff00, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES  64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..9443c77 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include 
 #include 
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-   pgd_t *pgd = pgd_offset_k(0UL);
-   pgd_clear(pgd);
-   __flush_tlb_all();
+   unsigned long i;
+
+   for (i = 0; i < PTRS_PER_PGD-1; i++)
+   early_level4_pgt[i].pgd = 0;
+
+   next_early_pgt = 0;
+   early_pgt_resets++;
+
+   __native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+   unsigned long physaddr = address - __PAGE_OFFSET;
+   unsigned long i;
+   pgdval_t pgd, *pgd_p;
+   pudval_t *pud_p;
+   pmdval_t pmd, *pmd_p;
+
+   if (physaddr >= MAXMEM)
+   return -1;  /* Invalid address - puke */
+
+   i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+   pgd_p = &early_level4_pgt[i].pgd;
+   pgd = *pgd_p;
+
+   /*
+* The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+* critical -- __PAGE_OFFSET would point us back into the dynamic
+* range and we might end up looping forever...
+*/
+   if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+   pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+   } else {
+   if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+   reset_early_page_tables();
+
+   pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+   for (i = 0; i < PTRS_PER_PUD; i++)
+   pud_p[i] = 0;
+
+   *pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+   }
+   i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+   pud_p += i;
+
+   pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+   pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd_p[i] = pmd;
+   pmd += PMD_SIZE;
+   }
+
+   *pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+   return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not 
initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
(__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+   /* Kill off the identity-map trampoline */
+   reset_early_page_tables();
+
/* clear bss before set_intr_gate with early_idt_handler */
clear_bss();
 
-   /* Make NULL pointers segfault */
-   zap_identity_mappings();
-
+   /* XXX - this is wrong... we need to build page tables from scratch */
max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..d539692 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
.code64
.globl startup_64
 startup_64:
-
/*
 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 * and someone has loaded an identity mapped page table
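
The index arithmetic used by early_make_pgtable() in the patch above can be
tried out in user space. The sketch assumes x86-64 4-level paging with 512
entries per table; the *_index_of() helpers are hypothetical names, and only
the shift-and-mask expressions are taken from the patch.

```c
#include <stdint.h>

/* x86-64 4-level paging: each table level resolves 9 address bits. */
#define PMD_SHIFT       21
#define PUD_SHIFT       30
#define PGDIR_SHIFT     39
#define PTRS_PER_TABLE  512

/* Index into the top-level table (PGD) for a virtual address. */
static unsigned int pgd_index_of(uint64_t address)
{
	return (unsigned int)((address >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1));
}

/* Index into the second-level table (PUD). */
static unsigned int pud_index_of(uint64_t address)
{
	return (unsigned int)((address >> PUD_SHIFT) & (PTRS_PER_TABLE - 1));
}

/* Index into the third-level table (PMD); each entry maps 2 MiB here. */
static unsigned int pmd_index_of(uint64_t address)
{
	return (unsigned int)((address >> PMD_SHIFT) & (PTRS_PER_TABLE - 1));
}
```

For the kernel text mapping base 0xffffffff80000000 (__START_KERNEL_map) this
gives PGD slot 511 and PUD slot 510, which is why reset_early_page_tables()
can clear every PGD entry except the last one without losing the kernel
symbol map.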

Re: [PATCH 1/1] Drivers: hv: balloon: Fix a memory leak

2012-12-12 Thread Jason Wang
On 12/12/2012 03:07 AM, K. Y. Srinivasan wrote:
> The send buffer was being leaked; fix it.
>
> Signed-off-by: K. Y. Srinivasan 
> Reviewed-by: Haiyang Zhang 
> Reported-by: Jason Wang 
> ---
>  drivers/hv/hv_balloon.c |   23 +--
>  1 files changed, 13 insertions(+), 10 deletions(-)

Thanks.

Acked-by: Jason Wang 
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 00c8447..c02a33e 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -882,7 +882,7 @@ static int balloon_probe(struct hv_device *dev,
>   balloon_onchannelcallback, dev);
>  
>   if (ret)
> - return ret;
> + goto probe_error0;
>  
>   dm_device.dev = dev;
>   dm_device.state = DM_INITIALIZING;
> @@ -894,7 +894,7 @@ static int balloon_probe(struct hv_device *dev,
>kthread_run(dm_thread_func, &dm_device, "hv_balloon");
>   if (IS_ERR(dm_device.thread)) {
>   ret = PTR_ERR(dm_device.thread);
> - goto probe_error0;
> + goto probe_error1;
>   }
>  
>   hv_set_drvdata(dev, &dm_device);
> @@ -917,12 +917,12 @@ static int balloon_probe(struct hv_device *dev,
>   VM_PKT_DATA_INBAND,
>   VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
>   if (ret)
> - goto probe_error1;
> + goto probe_error2;
>  
>   t = wait_for_completion_timeout(&dm_device.host_event, 5*HZ);
>   if (t == 0) {
>   ret = -ETIMEDOUT;
> - goto probe_error1;
> + goto probe_error2;
>   }
>  
>   /*
> @@ -931,7 +931,7 @@ static int balloon_probe(struct hv_device *dev,
>*/
>   if (dm_device.state == DM_INIT_ERROR) {
>   ret = -ETIMEDOUT;
> - goto probe_error1;
> + goto probe_error2;
>   }
>   /*
>* Now submit our capabilities to the host.
> @@ -964,12 +964,12 @@ static int balloon_probe(struct hv_device *dev,
>   VM_PKT_DATA_INBAND,
>   VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
>   if (ret)
> - goto probe_error1;
> + goto probe_error2;
>  
>   t = wait_for_completion_timeout(&dm_device.host_event, 5*HZ);
>   if (t == 0) {
>   ret = -ETIMEDOUT;
> - goto probe_error1;
> + goto probe_error2;
>   }
>  
>   /*
> @@ -978,18 +978,20 @@ static int balloon_probe(struct hv_device *dev,
>*/
>   if (dm_device.state == DM_INIT_ERROR) {
>   ret = -ETIMEDOUT;
> - goto probe_error1;
> + goto probe_error2;
>   }
>  
>   dm_device.state = DM_INITIALIZED;
>  
>   return 0;
>  
> -probe_error1:
> +probe_error2:
>   kthread_stop(dm_device.thread);
>  
> -probe_error0:
> +probe_error1:
>   vmbus_close(dev->channel);
> +probe_error0:
> + kfree(send_buffer);
>   return ret;
>  }
>  
> @@ -1002,6 +1004,7 @@ static int balloon_remove(struct hv_device *dev)
>  
>   vmbus_close(dev->channel);
>   kthread_stop(dm->thread);
> + kfree(send_buffer);
>  
>   return 0;
>  }
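
The fix above follows the usual goto-unwind idiom: each error label releases
one resource and falls through to the labels for everything acquired earlier.
A user-space sketch of the same shape is shown below. All names (acquire(),
release(), probe_sketch(), remove_sketch(), dev_state) are hypothetical, and
malloc() merely stands in for the send buffer, the channel and the thread.

```c
#include <errno.h>
#include <stdlib.h>

static int live_resources;	/* acquired but not yet released */

static struct {
	void *buffer, *channel, *thread;
} dev_state;

static void *acquire(void)
{
	void *p = malloc(16);

	if (p)
		live_resources++;
	return p;
}

static void release(void *p)
{
	if (p) {
		free(p);
		live_resources--;
	}
}

/* fail_at: 0 = succeed, 1..3 = simulate a failure at that step. */
static int probe_sketch(int fail_at)
{
	int ret;

	dev_state.buffer = acquire();		/* models the send buffer */
	if (!dev_state.buffer)
		return -ENOMEM;

	dev_state.channel = (fail_at == 1) ? NULL : acquire();
	if (!dev_state.channel) {
		ret = -EIO;
		goto error0;			/* only the buffer exists */
	}

	dev_state.thread = (fail_at == 2) ? NULL : acquire();
	if (!dev_state.thread) {
		ret = -ENOMEM;
		goto error1;			/* buffer and channel exist */
	}

	if (fail_at == 3) {			/* models a host timeout */
		ret = -ETIMEDOUT;
		goto error2;			/* everything exists */
	}
	return 0;

error2:
	release(dev_state.thread);
error1:
	release(dev_state.channel);
error0:
	release(dev_state.buffer);
	return ret;
}

/* Counterpart of the remove path: release everything probe acquired. */
static void remove_sketch(void)
{
	release(dev_state.thread);
	release(dev_state.channel);
	release(dev_state.buffer);
}
```

Whatever step fails, live_resources returns to zero, which is the property the
original code lacked for the send buffer.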



Re: [PATCH] NTP: Add a CONFIG_RTC_SYSTOHC configuration

2012-12-12 Thread Jason Gunthorpe
On Wed, Dec 12, 2012 at 04:18:31PM -0800, John Stultz wrote:

> I do, although again, in the case where the arch specific
> implementation is "better", we end up losing granularity (s390 is
> the specific example I'm thinking of), since this prefers the RTC
> implementation over the arch specific one.  So maybe I'd suggest
> switching it so we prefer the arch specific one, and then remove the
> arch specific implementations where they are inferior to the RTC
> one.

Unfortunately I put rtc_update_persistent_clock first because it can
have sensible error reporting. update_persistent_clock will return 0
if there is no RTC device available, or if the RTC was successfully
updated.

I can make rtc_update_persistent_clock return -ENODEV.

> As long as we have a clear iterative path forward, with a solution
> for the cases where the arch method is actually preferred, I think
> it sounds like a reasonable approach.

I think it is fine to leave it as a configure option, archs can
default it to yes when it is appropriate for them.

A quick scan through the 3.7 tree for update_persistent_clock:
 alpha - single non class RTC clock scheme
 cris - printk's and does nothing
 mips - weak function rtc_mips_set_time, all implementations are
non class rtc
 mn10300 - single non class RTC clock scheme
 powerpc - indirects through ppc_md.set_rtc_time, all implementations
   do not use class RTC
 sh - indirects through rtc_sh_set_time, two implementations, neither
  use class RTC
 sparc - Seems to be class rtc converted. Already implements this
 patch's functionality in an arch specific file
 x86 - All the functions under the set_wallclock indirection seem
   to be non class RTC cases

No other arches seem to have update_persistent_clock in them.

I think the s390 functionality you are referring to is in its
read_persistent_clock, which looks like it has ns resolution.

That is also fine because s390 does not use class rtc so there is no
duplicate path to the 'tod' code either through
rtc_update_persistent_clock or through rtc_hctosys.

Basically, as far as I can tell, if rtc_update_persistent_clock
succeeds then update_persistent_clock is a nop. I can't find any case
where *both* could succeed. Thus trying rtc_update_persistent_clock
first is OK.
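
The ordering argued for above can be sketched in plain C. This is only a
userspace model, not kernel code: struct clock_backends, model_rtc_update(),
model_arch_update() and sync_persistent_clock() are hypothetical names made
up for illustration. It assumes the RTC class path reports -ENODEV when no
device exists, while the arch hook returns 0 in both cases, as described.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two update paths; not a kernel structure. */
struct clock_backends {
	bool have_class_rtc;   /* an RTC class device is registered */
	bool have_arch_clock;  /* the arch has its own persistent clock */
	int rtc_writes;        /* times the RTC class path wrote the clock */
	int arch_writes;       /* times the arch path wrote the clock */
};

/* Models the RTC class path: it can report a missing device. */
static int model_rtc_update(struct clock_backends *b)
{
	if (!b->have_class_rtc)
		return -ENODEV;
	b->rtc_writes++;
	return 0;
}

/* Models the arch hook: returns 0 whether or not a device was updated. */
static int model_arch_update(struct clock_backends *b)
{
	if (b->have_arch_clock)
		b->arch_writes++;
	return 0;
}

/* Try the RTC class path first; fall back only when no device exists. */
static int sync_persistent_clock(struct clock_backends *b)
{
	int ret;

	assert(b != NULL);
	ret = model_rtc_update(b);
	if (ret == -ENODEV)
		ret = model_arch_update(b);
	return ret;
}
```

Because only the first path can say "no device", it has to be tried first
for the fallback decision to be possible at all.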

Jason


Re: [PATCH] regulator: core: if voltage scaling fails, restore original

2012-12-12 Thread Mark Brown
On Wed, Dec 12, 2012 at 12:45:52PM +0100, Paolo Pisati wrote:
> And after a second look it's clear what's going on:

After a second look at what?  You've not provided any context, I've no
idea what you're talking about here.


Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

2012-12-12 Thread H. Peter Anvin
Here is a version that compiles.  It doesn't *boot* yet, because the 
switchover from dynamic mode to the real pagetables doesn't happen right 
and so we end up on an uninitialized set of page tables.


The new page table setup in tip:x86/mm2 should make that easier to 
achieve, however... I won't have time to test this out tonight, though.


-hpa

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 766ea16..2d88344 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_PGTABLE_64_DEFS_H
 #define _ASM_X86_PGTABLE_64_DEFS_H
 
+#include 
+
 #ifndef __ASSEMBLY__
 #include 
 
@@ -60,4 +62,6 @@ typedef struct { pteval_t pte; } pte_t;
 #define MODULES_END  _AC(0xff00, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
+#define EARLY_DYNAMIC_PAGE_TABLES  64
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..9443c77 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,11 +26,73 @@
 #include 
 #include 
 
-static void __init zap_identity_mappings(void)
+/*
+ * Manage page tables very early on.
+ */
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+static unsigned int __initdata next_early_pgt = 2, early_pgt_resets = 0;
+
+/* Wipe all early page tables except for the kernel symbol map */
+static void __init reset_early_page_tables(void)
 {
-	pgd_t *pgd = pgd_offset_k(0UL);
-	pgd_clear(pgd);
-	__flush_tlb_all();
+	unsigned long i;
+
+	for (i = 0; i < PTRS_PER_PGD-1; i++)
+		early_level4_pgt[i].pgd = 0;
+
+	next_early_pgt = 0;
+	early_pgt_resets++;
+
+	__native_flush_tlb();
+}
+
+/* Create a new PMD entry */
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	unsigned long i;
+	pgdval_t pgd, *pgd_p;
+	pudval_t *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	if (physaddr >= MAXMEM)
+		return -1;	/* Invalid address - puke */
+
+	i = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
+	pgd_p = &early_level4_pgt[i].pgd;
+	pgd = *pgd_p;
+
+	/*
+	 * The use of __START_KERNEL_map rather than __PAGE_OFFSET here is
+	 * critical -- __PAGE_OFFSET would point us back into the dynamic
+	 * range and we might end up looping forever...
+	 */
+	if (pgd && next_early_pgt < EARLY_DYNAMIC_PAGE_TABLES) {
+		pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map);
+	} else {
+		if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES-1)
+			reset_early_page_tables();
+
+		pud_p = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			pud_p[i] = 0;
+
+		*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + _KERNPG_TABLE;
+	}
+	i = (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+	pud_p += i;
+
+	pmd_p = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (physaddr & PUD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_p[i] = pmd;
+		pmd += PMD_SIZE;
+	}
+
+	*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + _KERNPG_TABLE;
+
+	return 0;
 }
 
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
@@ -70,12 +132,13 @@ void __init x86_64_start_kernel(char * real_mode_data)
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	/* Kill off the identity-map trampoline */
+	reset_early_page_tables();
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
-	/* Make NULL pointers segfault */
-	zap_identity_mappings();
-
+	/* XXX - this is wrong... we need to build page tables from scratch */
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..0e040b3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -47,14 +47,13 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
 	.code64
 	.globl startup_64
 startup_64:
-
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 1,
 	 * and someone has loaded an identity mapped page table
 	 * for us.  These identity mapped page tables map all of the
 	 * kernel pages and possibly all of memory.
 	 *
-	 * %esi holds a physical pointer to real_mode_data.
+	 * %rsi holds a physical pointer to real_mode_data.

Re: [PATCH] mm/hugetlb: create hugetlb cgroup file in hugetlb_init

2012-12-12 Thread Simon Jeons
On Wed, 2012-12-12 at 12:23 +0100, Michal Hocko wrote:
> On Wed 12-12-12 18:44:13, Xishi Qiu wrote:
> > On 2012/12/12 18:19, Michal Hocko wrote:
> > 
> > > On Wed 12-12-12 16:25:59, Jianguo Wu wrote:
> > >> Build kernel with CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y
> > >> and CONFIG_CGROUP_HUGETLB=y, then specify the hugepagesz=xx boot option:
> > >> the system will fail to boot.
> > >>
> > >> This failure is caused by the following code path:
> > >> setup_hugepagesz
> > >>   hugetlb_add_hstate
> > >>     hugetlb_cgroup_file_init
> > >>       cgroup_add_cftypes
> > >>         kzalloc <-- slab is *not available* yet
> > >>
> > >> On this path slab is not available yet, so the memory allocation
> > >> fails and triggers the WARN_ON() in hugetlb_cgroup_file_init().
> > >>
> > >> So move hugetlb_cgroup_file_init() into hugetlb_init().
> > > 
> > > I do not think this is a good idea. hugetlb_init is in __init section as
> > > well so what guarantees that the slab is initialized by then? Isn't this
> > > just a good ordering that makes this working?
> > 
> > Hi Michal,
> > 
> > __initcall functions will be called in
> > start_kernel()
> >   rest_init()  // -> slab is already
> >     kernel_init()
> >       kernel_init_freeable()
> >         do_basic_setup()
> >           do_initcalls()
> > 
> > and setup_hugepagesz() will be called in
> > start_kernel()
> >   parse_early_param()  // -> before mm_init() -> kmem_cache_init()
> > 
> > Is this right?
> 
> Yes this is right. I just noticed that kmem_cache_init_late is an __init
> function as well and didn't realize it is called directly. Sorry about
> the confusion.
> Anyway I still think it would be a better idea to move the call into the
> hugetlb_cgroup_create callback where it is more logical IMO but now that
> I'm looking at other controllers (blk and kmem.tcp) they all do this from
> init calls as well. So it doesn't make sense to have hugetlb behave
> differently.

Which callback in cgroup_subsys should hugetlb_cgroup_create be associated
with? Currently, there is no such callback.

> 
> So
> Acked-by: Michal Hocko 
> 
> Thanks!




Re: [PATCH] regulator: core: if voltage scaling fails, restore original

2012-12-12 Thread Paul Walmsley
On Thu, 13 Dec 2012, Paul Walmsley wrote:

> Seems to me, naïvely, that in the above code, regulator->min_uV and 
> regulator->max_uV should be set only after _regulator_do_set_voltage() 
> succeeds?

Eh, never mind.  Looks like you took a similar strategy in the subsequent 
patch you sent..


- Paul

Re: [PATCH] regulator: core: if voltage scaling fails, restore original

2012-12-12 Thread Paul Walmsley
On Wed, 12 Dec 2012, Paolo Pisati wrote:

> but inside regulator_set_voltage(), we save the new regulator voltage before
> actually ramping up:
> 
> core.c::regulator_set_voltage():
> 	...
> 	regulator->min_uV = min_uV;
> 	regulator->max_uV = max_uV;
> 
> 	ret = regulator_check_consumers(rdev, &min_uV, &max_uV);
> 	if (ret < 0)
> 		goto out2;
> 
> 	ret = _regulator_do_set_voltage(rdev, min_uV, max_uV);  <-- ERROR!!!
> 	if (ret < 0)
> 		goto out2;
> 	...

I'm not too familiar with this code.  But isn't this where the bug is, 
rather than in that optimization commit you mentioned?  Seems to me, 
naïvely, that in the above code, regulator->min_uV and regulator->max_uV 
should be set only after _regulator_do_set_voltage() succeeds?
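
The "commit only on success" ordering suggested above can be sketched as a
small userspace model. Everything here is hypothetical (struct
model_regulator, model_do_set_voltage(), the hw_limit_uV parameter); it
ignores the consumer check that the real function performs in between, so it
shows the idea rather than a drop-in fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached per-consumer voltage range. */
struct model_regulator {
	int min_uV;
	int max_uV;
};

/* Hypothetical hardware hook: rejects ranges the supply cannot reach. */
static int model_do_set_voltage(int min_uV, int max_uV, int hw_limit_uV)
{
	if (min_uV > max_uV || max_uV > hw_limit_uV)
		return -EINVAL;
	return 0;
}

/* Update the cached range only after the hardware accepted the request. */
static int model_set_voltage(struct model_regulator *reg,
			     int min_uV, int max_uV, int hw_limit_uV)
{
	int ret;

	assert(reg != NULL);
	ret = model_do_set_voltage(min_uV, max_uV, hw_limit_uV);
	if (ret < 0)
		return ret;	/* cached range is left untouched */

	reg->min_uV = min_uV;
	reg->max_uV = max_uV;
	return 0;
}
```

An alternative with the same visible effect is to save the old range, write
the new one early, and restore the saved values on failure.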


- Paul

Re: [TRIVIAL PATCH 15/26] unicore32: Convert print_symbol to %pSR

2012-12-12 Thread guanxuetao
> Use the new vsprintf extension to avoid any possible
> message interleaving.
>
> Signed-off-by: Joe Perches 

Sorry for my last email, which should not have been sent as a reply-all.

Acked-by: Guan Xuetao 

> ---
>  arch/unicore32/kernel/process.c |5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
> index 62bad9f..d17a893 100644
> --- a/arch/unicore32/kernel/process.c
> +++ b/arch/unicore32/kernel/process.c
> @@ -169,8 +169,9 @@ void __show_regs(struct pt_regs *regs)
>   init_utsname()->release,
>   (int)strcspn(init_utsname()->version, " "),
>   init_utsname()->version);
> - print_symbol("PC is at %s\n", instruction_pointer(regs));
> - print_symbol("LR is at %s\n", regs->UCreg_lr);
> + printk(KERN_DEFAULT "PC is at %pSR\n",
> +(void *)instruction_pointer(regs));
> + printk(KERN_DEFAULT "LR is at %pSR\n", (void *)regs->UCreg_lr);
>   printk(KERN_DEFAULT "pc : [<%08lx>]lr : [<%08lx>]psr: %08lx\n"
>  "sp : %08lx  ip : %08lx  fp : %08lx\n",
>   regs->UCreg_pc, regs->UCreg_lr, regs->UCreg_asr,
> --
> 1.7.8.112.g3fd21
>



[GIT PULL] (xen) stable/for-linus-3.8-rc0-tag

2012-12-12 Thread Konrad Rzeszutek Wilk
Hey Linus,

Please git pull the following tag:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-linus-3.8-rc0-tag

or branch (this is included in this email b/c I keep on getting:
"fatal: Couldn't find remote ref stable/for-linus-3.8-rc0-tag"
even though I _know_ that the tag is there?! I can see it on the
website and when I do git fetch/clone). Anyhow if you get the same
message, please git pull the branch:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-linus-3.8

which has 
Features:
 - Add necessary infrastructure to make balloon driver work under ARM.
 - Add /dev/xen/privcmd interfaces to work with ARM and PVH.
 - Improve Xen PCIBack wild-card parsing.
 - Add Xen ACPI PAD (Processor Aggregator) support - so can offline/online
   sockets depending on the power consumption.
 - PVHVM + kexec = use an E820_RESV region for the shared region so we don't
   overwrite said region during kexec reboot.
 - Cleanups, compile fixes.


This merge will result in conflicts - there were some ARM patches in
v3.7-rcX that this branch does not contain. The recipe is that the
balloon driver (drivers/xen/balloon.c) can now compile under ARM, so
the temporary functions added in arch/arm/xen/enlighten.c can be
ripped out (so these functions are axed: alloc_xenballooned_pages and
free_xenballooned_pages). Consequently the drivers/xen/Makefile
can now have the balloon.o on the generic build target.

To illustrate this I've created a branch called 'for-linus-3.8.resolved'
(http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commitdiff;h=32423604549179b47d4d31c880c5fc45b0903543)
and also included as a big diff in this email.

Please pull!

 arch/arm/include/asm/xen/interface.h |    1 +
 arch/arm/xen/enlighten.c             |  123 +++
 arch/x86/include/asm/xen/interface.h |    1 +
 arch/x86/xen/Kconfig                 |    1 +
 arch/x86/xen/enlighten.c             |  102 ++-
 arch/x86/xen/mmu.c                   |   17 +++-
 arch/x86/xen/suspend.c               |    2 +-
 arch/x86/xen/xen-ops.h               |    2 +-
 drivers/xen/Kconfig                  |    3 +
 drivers/xen/Makefile                 |    7 +-
 drivers/xen/balloon.c                |    5 +-
 drivers/xen/privcmd.c                |   72 +-
 drivers/xen/xen-acpi-pad.c           |  182 ++
 drivers/xen/xen-pciback/pci_stub.c   |  118 --
 drivers/xen/xenbus/xenbus_xs.c       |    1 -
 include/xen/interface/memory.h       |   44 -
 include/xen/interface/platform.h     |   17 +++
 include/xen/xen-ops.h                |    9 ++-
 18 files changed, 591 insertions(+), 116 deletions(-)

Ian Campbell (5):
  xen: add pages parameter to xen_remap_domain_mfn_range
  xen: balloon: allow PVMMU interfaces to be compiled out
  xen: arm: enable balloon driver
  xen: correctly use xen_pfn_t in remap_domain_mfn_range.
  xen: arm: implement remap interfaces needed for privcmd mappings.

Jan Beulich (2):
  xen-pciback: simplify and tighten parsing of device IDs
  xen-pciback: reject out of range inputs

Konrad Rzeszutek Wilk (2):
  xen/acpi: Fix compile error by missing decleration for xen_domain.
  xen/acpi: Move the xen_running_on_version_or_later function.

Liu, Jinsong (2):
  xen/acpi: ACPI PAD driver
  xen/acpi: revert pad config check in xen_check_mwait

Mukesh Rathor (1):
  xen: privcmd: support autotranslated physmap guests.

Olaf Hering (2):
  xen PVonHVM: use E820_Reserved area for shared_info
  xen/PVonHVM: fix compile warning in init_hvm_pv_info

Sachin Kamat (1):
  xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h

diff --git a/arch/arm/include/asm/xen/interface.h b/arch/arm/include/asm/xen/interface.h
index 5000397..1151188 100644
--- a/arch/arm/include/asm/xen/interface.h
+++ b/arch/arm/include/asm/xen/interface.h
@@ -49,6 +49,7 @@ DEFINE_GUEST_HANDLE(void);
 DEFINE_GUEST_HANDLE(uint64_t);
 DEFINE_GUEST_HANDLE(uint32_t);
 DEFINE_GUEST_HANDLE(xen_pfn_t);
+DEFINE_GUEST_HANDLE(xen_ulong_t);
 
 /* Maximum number of virtual CPUs in multi-processor guests. */
 #define MAX_VIRT_CPUS 1
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index f576092..7a32976 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -17,6 +19,8 @@
 #include 
 #include 
 
+#include 
+
 struct start_info _xen_start_info;
 struct start_info *xen_start_info = &_xen_start_info;
 EXPORT_SYMBOL_GPL(xen_start_info);
@@ -29,6 +33,10 @@ struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info;
 
 DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
 
+/* These are unused until we support booting "pre-ballooned" */
+unsigned long xen_released_pages;
+struct xen_memory_region 

Re: [PATCH] mm: add node physical memory range to sysfs

2012-12-12 Thread Dave Hansen
On 12/12/2012 06:03 PM, Davidlohr Bueso wrote:
> On Wed, 2012-12-12 at 17:48 -0800, Dave Hansen wrote:
>> But if we went and did it per-DIMM (showing which physical addresses and
>> NUMA nodes a DIMM maps to), wouldn't that be redundant with this
>> proposed interface?
> 
> If DIMMs overlap between nodes, then we wouldn't have an exact range for
> a node in question. Having both approaches would complement each other.

How is that possible?  If NUMA nodes are defined by distances from CPUs
to memory, how could a DIMM have more than a single distance to any
given CPU?

>> How do you plan to use this in practice, btw?
> 
> It started because I needed to recognize the address of a node to remove
> it from the e820 mappings and have the system "ignore" the node's
> memory.

Actually, now that I think about it, can you check in the
/sys/devices/system/ directories for memory and nodes?  We have linkages
there for each memory section to every NUMA node, and you can also
derive the physical address from the phys_index in each section.  That
should allow you to work out physical addresses for a given node.
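
As a rough sketch of that derivation: the helper below multiplies a memory
block's phys_index by the block size. It assumes, without having checked it
against a particular kernel version, that both
/sys/devices/system/memory/memoryN/phys_index and
/sys/devices/system/memory/block_size_bytes are hexadecimal strings and that
the block start is their product. The function names are made up for
illustration; reading the files themselves is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal sysfs value; a trailing newline is ignored. */
static uint64_t parse_sysfs_hex(const char *text)
{
	assert(text != NULL);
	return strtoull(text, NULL, 16);
}

/*
 * Assumed relation: start address = phys_index * block_size_bytes.
 * Both arguments are the raw strings read from sysfs.
 */
static uint64_t memory_block_start(const char *phys_index,
				   const char *block_size_bytes)
{
	return parse_sysfs_hex(phys_index) * parse_sysfs_hex(block_size_bytes);
}
```

With 128 MiB blocks (block_size_bytes of 8000000), a phys_index of 00000008
would map to a start address of 0x40000000 under this assumption. Grouping
the blocks linked under each node directory then gives that node's ranges.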



Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Linus Torvalds
On Wed, Dec 12, 2012 at 8:31 PM, David Ahern  wrote:
>
>
> See commit 26a4f3c0. But that was not enough.

Why? Make the people who run virtualization do the extra work. Things
never worked for them anyway, so forcing *them* to set a flag to get a
working thing is sane.

Forcing everybody else to set a flag is insane. See?

Your "that was not enough" is insane. It's purely about which *default
convention* you choose. The "if (!event->attr.exclude_guest)" test is
the wrong default convention, and it *should* have been "if
(event->attr.include_guest)" with the virtualization people forced to
use "cycles:ppV".
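
The difference between the two default conventions can be modelled in a few
lines of userspace C. This is a sketch only: struct model_attr and both
check functions are hypothetical, include_guest is the proposed bit rather
than an existing one, and the error code is arbitrary. The point is what
happens to an old binary that leaves every newer bit at zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the event attributes relevant to the argument. */
struct model_attr {
	bool precise;        /* PEBS requested, e.g. via ":pp" */
	bool exclude_guest;  /* existing bit; old binaries leave it 0 */
	bool include_guest;  /* proposed bit; old binaries leave it 0 */
};

/* Convention being criticised: PEBS requires exclude_guest to be set. */
static int check_exclude_convention(const struct model_attr *a)
{
	assert(a != NULL);
	if (a->precise && !a->exclude_guest)
		return -EINVAL;
	return 0;
}

/* Convention argued for: reject only an explicit guest request. */
static int check_include_convention(const struct model_attr *a)
{
	assert(a != NULL);
	if (a->precise && a->include_guest)
		return -EINVAL;
	return 0;
}
```

An old binary requesting precise events passes the second check and fails
the first, although it asked for exactly the same thing in both cases.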

Claiming that there is some hardware overrun is silly, since that's
totally *independent* of the choice of which way the flag works!

> Requiring exclude_guest was
> another required piece. If you want to see the discussion:
> https://lkml.org/lkml/2012/7/9/264

The only thing that discussion shows is that people were *AWARE* that
this was a stupid change. I see Peter pointing out that this breaks
people's existing working setups.

You broke the WORKING case for old binaries in order to give an error
return in a case that NEVER EVEN WORKED with those binaries. Don't you
see how insane that is?

The 'H' flag is totally the wrong way around.  Exactly because it only
"fixes" a case that was already working, and makes a case that never
worked anyway now return an error value. That's not sane. Since the
old broken case never worked, nobody can have depended on it. See why
I'm saying that it's the people who use virtualization who should be
forced to use the new flag, not the other way around?

   Linus


[tip:x86/urgent] x86, doc: Add a formal bootloader ID for kexec-tools

2012-12-12 Thread tip-bot for H. Peter Anvin
Commit-ID:  4bfe24ac1d9d9f06b30b2909f53ea462c72cfe1d
Gitweb: http://git.kernel.org/tip/4bfe24ac1d9d9f06b30b2909f53ea462c72cfe1d
Author: H. Peter Anvin 
AuthorDate: Wed, 12 Dec 2012 20:24:12 -0800
Committer:  H. Peter Anvin 
CommitDate: Wed, 12 Dec 2012 20:25:43 -0800

x86, doc: Add a formal bootloader ID for kexec-tools

Add bootloader id D for kexec-tools (it is kexec-tools, not the kexec
system call, that builds the data structures, therefore it is
kexec-tools which is the "bootloader" for this purpose.)

Signed-off-by: H. Peter Anvin 
Cc: Eric W. Biederman 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/50c95832.5030...@zytor.com
---
 Documentation/x86/boot.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 9efceff..c25a53e 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -381,6 +381,7 @@ Protocol:   2.00+
A  Gujin
B  Qemu
C  Arcturus Networks uCbootloader
+   D  kexec-tools
E  Extended (see ext_loader_type)
F  Special  (0xFF = undefined)
10  Reserved


Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-12 Thread NeilBrown
On Thu, 13 Dec 2012 14:06:35 +1100 NeilBrown  wrote:

> > > + omap_dm_timer_enable(omap->dm_timer);
> > 
> > Do you need to call omap_dm_timer_enable here? _set_load and _set_match
> > will enable the timer. So this should not be necessary.
> 
> True.  That is what you get for copying someone else's code and not
> understanding it fully.

However  omap_dm_timer_write_counter *doesn't* enable the timer, and
explicitly checks that it is already runtime-enabled.

Does that mean I don't need to call omap_dm_timer_write_counter here?  Or
does it mean that I do need the enable/disable pair?

> 
> > 
> > > + omap_dm_timer_set_load(omap->dm_timer, autoreload, load_value);
> > > + omap_dm_timer_set_match(omap->dm_timer, enable, match_value);
> > > +
> > > + dev_dbg(chip->dev,
> > > + "load value: %#08x (%d), "
> > > + "match value: %#08x (%d)\n",
> > > + load_value, load_value,
> > > + match_value, match_value);
> > > +
> > > + omap_dm_timer_set_pwm(omap->dm_timer,
> > > +   !omap->polarity,
> > > +   toggle,
> > > +   trigger);
> > > +
> > > + /* Set the counter to generate an overflow event immediately. */
> > > +
> > > + omap_dm_timer_write_counter(omap->dm_timer, DM_TIMER_LOAD_MIN);
> > > +
> > > + /* Now that we're done configuring the dual-mode timer, disable it
> > > +  * again. We'll enable and start it later, when requested.
> > > +  */
> > > +
> > > + omap_dm_timer_disable(omap->dm_timer);
> > 
> > Similarly the disable should not be needed here either.
> > 


Thanks,
NeilBrown




linux-next: reminder 2

2012-12-12 Thread Stephen Rothwell
Hi all,

If you have a tree that is included in linux-next, please clean it up
after it has been merged into your upstream (i.e. Linus' tree in most
cases).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread David Ahern

On 12/12/12 8:51 PM, Linus Torvalds wrote:
> SO WHY DON'T YOU JUST DO THAT THEN?
>
> Disable PEBS on Vt-x enter and re-enable it on exit. End of story.
> Exactly like you say.

See commit 26a4f3c0. But that was not enough. Requiring exclude_guest 
was another required piece. If you want to see the discussion: 
https://lkml.org/lkml/2012/7/9/264

> But doing it this way was wrong. Switch that "exclude_guest" attribute
> around, and admit that "H" was bogus, and that the right thing to do
> was to add a "V" flag that sets the "force_guest" flag instead.


I understand this is annoying. Older binaries on newer kernels was the 
only case I could not fix. (I guess a message could be added kernel side 
to at least give a hint.) But the alternative -- based on code that has 
existed for some time -- is for older binaries to crash VMs.


David



Re: zram: fix invalid memory references during disk write

2012-12-12 Thread Nitin Gupta
On 12/11/2012 10:27 AM, Greg KH wrote:
> On Thu, Nov 29, 2012 at 10:45:09PM -0800, Nitin Gupta wrote:
>> Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
>> of incompressible pages") which caused invalid memory references
>> during disk write. Invalid references could occur in two cases:
>>  - Incoming data expands on compression: In this case, reference was
>> made to kunmap()'ed bio page.
>>  - Partial (non PAGE_SIZE) write with incompressible data: In this
>> case, reference was made to a kfree()'ed buffer.
>>
>> Fixes bug 50081:
>> https://bugzilla.kernel.org/show_bug.cgi?id=50081
>>
>> Upstream commit ID: c8f2f0d: zram: Fix handling of incompressible pages
>> Apply to versions: 3.6.5, 3.6.6, 3.6.7, 3.6.8
>>
>> Cc:  # staging-next: 37b51fd: zram: factor-out
>>  # zram_decompress_page() function
>> Signed-off-by: Nitin Gupta 
>> Reported-by: Mihail Kasadjikov 
>> Reported-by: Tomas M 
>> Reviewed-by: Minchan Kim 
>> ---
>>  drivers/staging/zram/zram_drv.c |   39 
>> ---
>>  1 file changed, 24 insertions(+), 15 deletions(-)
> 
> This patch fails to apply to 3.6.10:
> 
> patching file drivers/staging/zram/zram_drv.c
> Hunk #1 succeeded at 282 (offset 17 lines).
> Hunk #2 FAILED at 286.
> Hunk #3 succeeded at 319 (offset 17 lines).
> Hunk #4 succeeded at 340 (offset 17 lines).
> Hunk #5 succeeded at 353 (offset 17 lines).
> Hunk #6 succeeded at 368 (offset 17 lines).
> Hunk #7 succeeded at 385 (offset 17 lines).
> 1 out of 7 hunks FAILED -- saving rejects to file drivers/staging/zram/zram_drv.c.rej
> 
> Please redo this patch and resend it if you wish for it to be applied to
> the 3.6-stable tree.
>

Please apply 37b51fd ("zram: factor-out zram_decompress_page()
function") before applying this patch. I also mentioned this in the
changelog description above in the format as specified in
stable_kernel_rules.txt but looks like I misunderstood that doc.

Thanks,
Nitin




Re: kexec and struct boot_params

2012-12-12 Thread H. Peter Anvin

On 12/12/2012 06:49 PM, Yinghai Lu wrote:
> Hi, Peter,
>
> What's your decision about this?
>
> Do you mean have one boot_params mask in initdata and AND that with
> boot_params from bootloader
> to clean not used bytes?
>
> So later will not need to check
>  if (boot_params.hdr.xloadflags & USE_EXT_BOOT_PARAMS)
> ?
>
> I worked out other patches that remove kdump 896M limitation.
> would like to post those patches to get more testing.
> those are needed for bigger system with lots of pcie devices.
>
> ping!



I still want to do what I mentioned before, because we need to not rely 
on the initialized/16-bit portion so much:


1. add a field in the uninitialized portion, call it "sentinel";
2. make sure the byte position corresponding to the "sentinel" field is
   nonzero in the bzImage file;
3. if the kernel boots up and sentinel is nonzero, erase those fields
   that you identified as uninitialized;
4. assign a proper boot loader ID to kexec, so we have a way of dealing
   with this kind of debacle in the future (that is what the
   bootloader ID is for: it gives us a way to work around
   bootloader-specific problems.)
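
Steps 1 to 3 can be sketched as a userspace model. The structure layout and
names below (struct model_boot_params, ext_fields, sanitize_boot_params())
are hypothetical and do not match the real boot_params layout. The
underlying assumption is that a loader which zeroes the structure properly
leaves the sentinel at 0, while one that copies the image's bytes blindly
carries the nonzero sentinel along.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct boot_params. */
struct model_boot_params {
	uint8_t sentinel;        /* step 1: lives in the uninitialized portion */
	uint8_t ext_fields[16];  /* fields the loader should have zeroed */
	uint8_t loader_type;     /* an initialized header field, kept as-is */
};

/*
 * Step 3: a nonzero sentinel means the loader did not clear the
 * uninitialized portion, so clear it here. Returns 1 if a cleanup
 * was needed, 0 otherwise.
 */
static int sanitize_boot_params(struct model_boot_params *bp)
{
	assert(bp != NULL);
	if (bp->sentinel == 0)
		return 0;

	memset(bp->ext_fields, 0, sizeof(bp->ext_fields));
	bp->sentinel = 0;
	return 1;
}
```

The check replaces per-field trust with a single byte that tells the kernel
whether the rest of the uninitialized portion can be believed.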

We also need to formalize the 64-bit entry point properly, including all 
the entry conditions and so forth.  That needs to be documented.


Eric, any thoughts or additional opinions?

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [RFC PATCH 02/11] drivers/base: Add hotplug framework code

2012-12-12 Thread Greg KH
On Wed, Dec 12, 2012 at 09:02:45PM -0700, Toshi Kani wrote:
> On Wed, 2012-12-12 at 15:54 -0800, Greg KH wrote:
> > On Wed, Dec 12, 2012 at 04:17:14PM -0700, Toshi Kani wrote:
> > > Added hotplug.c, which is the hotplug framework code.
> > 
> > Again, better naming please.
> 
> Yes, I will change it to be more specific, something like
> "sys_hotplug.c".

Ugh, what's wrong with just a simple "system_bus.c" or something like
that, and then put all of the needed system bus logic in there and tie
the cpus and other sysdev code into that?

thanks,

greg k-h


Re: [RFC PATCH 00/11] Hot-plug and Online/Offline framework

2012-12-12 Thread Greg KH
On Wed, Dec 12, 2012 at 08:37:44PM -0700, Toshi Kani wrote:
> On Wed, 2012-12-12 at 16:55 -0800, Greg KH wrote:
> > On Wed, Dec 12, 2012 at 05:39:36PM -0700, Toshi Kani wrote:
> > > On Wed, 2012-12-12 at 15:56 -0800, Greg KH wrote:
> > > > On Wed, Dec 12, 2012 at 04:17:12PM -0700, Toshi Kani wrote:
> > > > > This patchset is an initial prototype of proposed hot-plug framework
> > > > > for design review.  The hot-plug framework is designed to provide 
> > > > > the common framework for hot-plugging and online/offline operations
> > > > > of system devices, such as CPU, Memory and Node.  While this patchset
> > > > > only supports ACPI-based hot-plug operations, the framework itself is
> > > > > designed to be platform-neutral and can support other FW architectures
> > > > > as necessary.
> > > > > 
> > > > > The patchset has not been fully tested yet, esp. for memory hot-plug.
> > > > > Any help for testing will be very appreciated since my test setup
> > > > > is limited.
> > > > > 
> > > > > The patchset is based on the linux-next branch of linux-pm.git tree.
> > > > > 
> > > > > Overview of the Framework
> > > > > =
> > > > 
> > > > 
> > > > 
> > > > Why all the new framework, doesn't the existing bus infrastructure
> > > > provide everything you need here?  Shouldn't you just be putting your
> > > > cpus and memory sticks on a bus and handle stuff that way?  What makes
> > > > these types of devices so unique from all other devices that Linux has
> > > > been handling in a dynamic manner (i.e. hotplugging them) for many many
> > > > years?
> > > > 
> > > > Why are you reinventing the wheel?
> > > 
> > > Good question.  Yes, USB and PCI hotplug operate based on their bus
> > > structures.  USB and PCI cards only work under USB and PCI bus
> > > controllers.  So, their framework can be composed within the bus
> > > structures as you pointed out.
> > > 
> > > However, system devices such as CPU and memory do not have their standard
> > > bus.  ACPI allows these system devices to be enumerated, but it does not
> > > make ACPI as the HW bus hierarchy for CPU and memory, unlike PCI and
> > > USB.  Therefore, CPU and memory modules manage CPU and memory outside of
> > > ACPI.  This makes sense because CPU and memory can be used without ACPI.
> > > 
> > > This leads us to an issue when we try to manage system device hotplug
> > > within ACPI, because ACPI does not control everything.  This patchset
> > > provides a common hotplug framework for system devices, which both ACPI
> > > and non-ACPI modules (i.e. CPU and memory modules) can participate and
> > > are coordinated for their hotplug operations.  This is analogous to the
> > > boot-up sequence, which ACPI and non-ACPI modules can participate to
> > > enable CPU and memory.
> > 
> > Then create a "virtual" bus and put the devices you wish to control on
> > that.  That is what the "system bus" devices were supposed to be, it's
> > about time someone took that code and got it all working properly in
> > this way, that is why it was created oh so long ago.
> 
> That may be the ideal, but it will take great effort to get there from
> where we are now.  It is going to be a long road.  I believe the first
> step is to make the boot-up flow and hot-plug flow consistent for
> system devices.  This is what this patchset is trying to do.

If you use the system "bus" for this, the "flow" will be identical, that
is what the driver core provides for you.  I don't see why you need to
implement something that sits next to it and not just use what we
already have here.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 02/11] drivers/base: Add hotplug framework code

2012-12-12 Thread Toshi Kani
On Wed, 2012-12-12 at 15:54 -0800, Greg KH wrote:
> On Wed, Dec 12, 2012 at 04:17:14PM -0700, Toshi Kani wrote:
> > Added hotplug.c, which is the hotplug framework code.
> 
> Again, better naming please.

Yes, I will change it to be more specific, something like
"sys_hotplug.c".

Thanks,
-Toshi



Re: [RFC PATCH 02/11] drivers/base: Add hotplug framework code

2012-12-12 Thread Toshi Kani
On Wed, 2012-12-12 at 15:55 -0800, Greg KH wrote:
> On Wed, Dec 12, 2012 at 04:17:14PM -0700, Toshi Kani wrote:
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -21,6 +21,7 @@ endif
> >  obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
> >  obj-$(CONFIG_REGMAP)   += regmap/
> >  obj-$(CONFIG_SOC_BUS) += soc.o
> > +obj-$(CONFIG_HOTPLUG)  += hotplug.o
> 
> CONFIG_HOTPLUG just got always enabled in the kernel, and I'm about to
> delete it around the 3.8-rc2 timeframe, so please don't add new usages
> of it to the kernel.

Sounds good.  I will simply change it to obj-y then.

Thanks,
-Toshi



Re: [RFC PATCH 01/11] Add hotplug.h for hotplug framework

2012-12-12 Thread Toshi Kani
On Wed, 2012-12-12 at 15:53 -0800, Greg KH wrote:
> On Wed, Dec 12, 2012 at 04:17:13PM -0700, Toshi Kani wrote:
> > Added include/linux/hotplug.h, which defines the hotplug framework
> > interfaces used by the framework itself and handlers.
> 
> No, please name this properly, _everything_ is hotpluggable these days,
> and unless you want the whole kernel and all busses and devices to use
> this, then it needs to be named much better than this, sorry.
> 
> We went through this same issue over 10 years ago, please, let's learn
> from our mistakes and not do it again.

Agreed.  I will come up with a better name to avoid the confusion.

> > +/* Add Validate order values */
> > +#define HP_ACPI_BUS_ADD_VALIDATE_ORDER 0   /* must be first */
> 
> This is really ACPI specific, so why not just put it under include/acpi/
> instead?

Yes, this needs to be revisited.  For now, it is defined in the same
file because it is easier to manage the ordering when all values are
defined in the same place.  We may need the ordering values defined per
arch when this framework is used by multiple architectures.

> And note, PPC and other arches probably do this already (s390?) so to
> exclude them from the beginning would not be a good idea.

Thanks for the suggestion.  I will check other architectures and bring
them to the discussions. 

-Toshi




Re: [PATCH] pstore/ram: Fix bounds checks for mem_size, record_size, console_size and ftrace_size

2012-12-12 Thread Anton Vorontsov
On Tue, Dec 11, 2012 at 05:49:24PM -0800, Arve Hjønnevåg wrote:
> The bounds check in ramoops_init_prz was incorrect and ramoops_init_przs
> had no check. Additionally, ramoops_init_przs allows record_size to be 0,
> but ramoops_pstore_write_buf would always crash in this case.
> 
> Signed-off-by: Arve Hjønnevåg 
> ---

Applied, thanks a lot!


Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Linus Torvalds
On Wed, Dec 12, 2012 at 7:43 PM, David Ahern  wrote:
>
> you know what's worse? All of your VMs blowing up because anyone runs perf
> with precise attribute. Virtualization and performance monitoring
> collide. From the log message for commit 1342798.
>
> "Intel PEBS in VT-x context uses the DS address as a guest linear address,
> even though its programmed by the host as a host linear address. This either
> results in guest memory corruption and or the hardware faulting and
> 'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
> enter and re-enable on VT-x exit, enforcing a strict exclude_guest."

Right.

SO WHY DON'T YOU JUST DO THAT THEN?

Disable PEBS on VT-x enter and re-enable it on exit. End of story.
Exactly like you say.

But don't in the process screw up people WHO DON'T EVEN DO VIRTUALIZATION!

So please, just remove that idiotic "if (!event->attr.exclude_guest)"
test. It's wrong. It cannot possibly do the right thing.  It is
totally misdesigned, exactly because you don't even know beforehand if
somebody uses virtualization or not.

Now, if the feature had been done the sane way around, and you'd have
an explicit flag that says "force this even on entry to virtualized
guests", then you could have said "Dave, I can't do that combination
of precise and virtualized guests". At that point you have - at perf
record time - a valid reason to say EOPNOTSUPP.

But doing it this way was wrong. Switch that "exclude_guest" attribute
around, and admit that "H" was bogus, and that the right thing to do
was to add a "V" flag that sets the "force_guest" flag instead.

Problem solved, without screwing people who have no reason to ever care.

   Linus
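
The polarity argument above can be sketched in plain userspace C. This
is an illustration only, not the kernel's code: struct demo_attr and the
two check functions are hypothetical names, and force_guest is the flag
proposed in this message, not an existing attribute bit. The point is
that an old binary leaves every bit it does not know about at zero, so
zero has to mean the harmless default.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the relevant perf_event_attr bits. */
struct demo_attr {
	unsigned int precise_ip;	/* non-zero: precise (PEBS) sampling requested */
	unsigned int exclude_guest;	/* existing polarity: 1 = do not sample guests */
	unsigned int force_guest;	/* proposed polarity: 1 = sample guests too */
};

/* An old binary knows nothing about guest bits and leaves them zeroed. */
static struct demo_attr old_binary_attr(void)
{
	struct demo_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.precise_ip = 2;	/* what "cycles:pp" asks for */
	return attr;
}

/* The check under discussion: zero means "include guests", so old binaries fail. */
static int check_exclude_polarity(const struct demo_attr *attr)
{
	if (attr->precise_ip && !attr->exclude_guest)
		return -EOPNOTSUPP;
	return 0;
}

/* The suggested polarity: only an explicit request for guest sampling fails. */
static int check_force_polarity(const struct demo_attr *attr)
{
	if (attr->precise_ip && attr->force_guest)
		return -EOPNOTSUPP;
	return 0;
}

/* The same old-binary request, evaluated under both polarities. */
void demo_old_binary(void)
{
	struct demo_attr attr = old_binary_attr();

	assert(check_exclude_polarity(&attr) == -EOPNOTSUPP);
	assert(check_force_polarity(&attr) == 0);
}
```

With the second polarity, EOPNOTSUPP is only returned to a caller that
explicitly set the new bit, which an old binary cannot do.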


[PATCH] powerpc: added DSCR support to ptrace

2012-12-12 Thread Alexey Kardashevskiy
The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allows some control over the prefetch
of data streams.

The kernel already supports a per-thread DSCR value, but there is also
a need to be able to change it from an external process for
a specific pid.

The patch adds new register index PT_DSCR (index=44) which can be
set/get by:
  ptrace(PTRACE_POKEUSER, traced_process, PT_DSCR << 3, dscr);
  dscr = ptrace(PTRACE_PEEKUSER, traced_process, PT_DSCR << 3, NULL);
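
As a rough illustration of the addressing used in the calls above:
PEEKUSER/POKEUSER take a byte offset, so the register index is scaled by
the slot size. The sketch below only computes that offset and does not
call ptrace(). pt_reg_offset() is a hypothetical helper, and the 8-byte
slot size is an assumption that matches the "<< 3" above.

```c
#include <assert.h>
#include <stddef.h>

#define PT_DSCR 44	/* register index added by this patch */

/* Hypothetical helper: byte offset of register slot 'regno', 8 bytes per slot. */
static size_t pt_reg_offset(unsigned int regno)
{
	assert(regno < 128);	/* sanity bound for this sketch only */
	return (size_t)regno << 3;
}
```

With such a helper, the address argument in the calls above would be
pt_reg_offset(PT_DSCR), i.e. 352.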

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/ptrace.h |    1 +
 arch/powerpc/kernel/ptrace.c      |   16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 9c21ed4..340fe36 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -276,6 +276,7 @@ static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
 #define PT_DAR 41
 #define PT_DSISR 42
 #define PT_RESULT 43
+#define PT_DSCR 44
 #define PT_REGS_COUNT 44
 
 #define PT_FPR0	48	/* each FP reg occupies 2 slots in this space */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index c10fc28..d3ba67b 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -179,6 +179,17 @@ static int set_user_msr(struct task_struct *task, unsigned long msr)
 	return 0;
 }
 
+static unsigned long get_user_dscr(struct task_struct *task)
+{
+	return task->thread.dscr;
+}
+
+static int set_user_dscr(struct task_struct *task, unsigned long dscr)
+{
+	task->thread.dscr = dscr;
+	return 0;
+}
+
 /*
  * We prevent mucking around with the reserved area of trap
  * which are used internally by the kernel.
@@ -200,6 +211,9 @@ unsigned long ptrace_get_reg(struct task_struct *task, int regno)
 	if (regno == PT_MSR)
 		return get_user_msr(task);
 
+	if (regno == PT_DSCR)
+		return get_user_dscr(task);
+
 	if (regno < (sizeof(struct pt_regs) / sizeof(unsigned long)))
 		return ((unsigned long *)task->thread.regs)[regno];
 
@@ -218,6 +232,8 @@ int ptrace_put_reg(struct task_struct *task, int regno, unsigned long data)
 		return set_user_msr(task, data);
 	if (regno == PT_TRAP)
 		return set_user_trap(task, data);
+	if (regno == PT_DSCR)
+		return set_user_dscr(task, data);
 
 	if (regno <= PT_MAX_PUT_REG) {
 		((unsigned long *)task->thread.regs)[regno] = data;
-- 
1.7.10.4



[PATCH] clk: debug clock tree

2012-12-12 Thread Prashant Gaikwad
Adds a debug file "clock_tree" in the /sys/kernel/debug/clk directory.
It shows all registered clocks in tree format.

For example:
   clock                        enable_cnt  prepare_cnt  rate
---------------------------------------------------------------------
 i2s0_sync                      0           0            2400
 spdif_in_sync                  0           0            2400
    spdif_mux                   0           0            2400
       spdif                    0           0            2400
          spdif_doubler         0           0            4800
             spdif_div          0           0            4800
                spdif_2x        0           0            4800
 clk_32k                        2           2            32768
    blink_override              1           1            32768
       blink                    1           1            32768
 clk_m                          2           2            1200
    clk_out_3_mux               0           0            1200
       clk_out_3                0           0            1200
    pll_ref                     3           3            1200
       pll_e_mux                0           0            1200
          pll_e                 0           0            1
             cml0               0           0            1
             cml1               0           0            1
             pciex              0           0            1
       pll_d2                   0           0            100
          pll_d2_out0           0           0            50
       pll_d                    0           0            100
          pll_d_out0            0           0            50
             dsib_mux           0           0            50
                dsib            0           0            50
             dsia               0           0            50
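
The column layout above comes from the "%*s%-*s" pair in the format
string used by the patch: the indent grows by three columns per tree
level while the name field shrinks by the same amount, so the counters
stay in fixed columns. A userspace sketch of that technique follows;
struct demo_clk and the demo_* functions are hypothetical stand-ins and
use snprintf()/fputs() where the patch uses seq_printf().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct clk fields that the dump prints. */
struct demo_clk {
	const char *name;
	int enable_count;
	int prepare_count;
	unsigned long rate;
	const struct demo_clk *child;	/* first child, or NULL */
	const struct demo_clk *sibling;	/* next clock with the same parent, or NULL */
};

/*
 * Format one row.  Indent plus name field always add up to 31 columns,
 * whatever the level.  Returns the snprintf() result.
 */
static int demo_format_row(char *buf, size_t len,
			   const struct demo_clk *c, int level)
{
	assert(level >= 0 && level < 10);
	return snprintf(buf, len, "%*s%-*s %-11d %-12d %-10lu\n",
			level * 3 + 1, "",
			30 - level * 3, c->name,
			c->enable_count, c->prepare_count, c->rate);
}

/* Print a clock, then its children one level deeper, then its siblings. */
static void demo_show_subtree(FILE *out, const struct demo_clk *c, int level)
{
	char row[128];

	for (; c; c = c->sibling) {
		demo_format_row(row, sizeof(row), c, level);
		fputs(row, out);
		demo_show_subtree(out, c->child, level + 1);
	}
}
```

Calling demo_show_subtree(stdout, root, 0) on a small hand-built tree
prints rows shaped like the example output above.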

Signed-off-by: Prashant Gaikwad 
---
 drivers/clk/clk.c |   59 +
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 56e4495e..7daf201 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -34,6 +34,59 @@ static struct dentry *rootdir;
 static struct dentry *orphandir;
 static int inited = 0;
 
+static void clk_tree_show_one(struct seq_file *s, struct clk *c, int level)
+{
+	if (!c)
+		return;
+
+	seq_printf(s, "%*s%-*s %-11d %-12d %-10lu",
+		   level * 3 + 1, "",
+		   30 - level * 3, c->name,
+		   c->enable_count, c->prepare_count, c->rate);
+	seq_printf(s, "\n");
+}
+
+static void clk_tree_show_subtree(struct seq_file *s, struct clk *c, int level)
+{
+	struct clk *child;
+	struct hlist_node *tmp;
+
+	if (!c)
+		return;
+
+	clk_tree_show_one(s, c, level);
+
+	hlist_for_each_entry(child, tmp, &c->children, child_node)
+		clk_tree_show_subtree(s, child, level + 1);
+}
+
+static int clk_tree_show(struct seq_file *s, void *data)
+{
+	struct clk *c;
+	struct hlist_node *tmp;
+
+	seq_printf(s, "   clock                        enable_cnt  prepare_cnt  rate\n");
+	seq_printf(s, "---------------------------------------------------------------------\n");
+
+	hlist_for_each_entry(c, tmp, &clk_root_list, child_node)
+		clk_tree_show_subtree(s, c, 0);
+
+	return 0;
+}
+
+
+static int clk_tree_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, clk_tree_show, inode->i_private);
+}
+
+static const struct file_operations clk_tree_fops = {
+	.open		= clk_tree_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 /* caller must hold prepare_lock */
 static int clk_debug_create_one(struct clk *clk, struct dentry *pdentry)
 {
@@ -167,12 +220,18 @@ static int __init clk_debug_init(void)
 {
 	struct clk *clk;
 	struct hlist_node *tmp;
+	struct dentry *d;
 
 	rootdir = debugfs_create_dir("clk", NULL);
 
 	if (!rootdir)
 		return -ENOMEM;
 
+	d = debugfs_create_file("clock_tree", S_IRUGO, rootdir, NULL,
+				&clk_tree_fops);
+	if (!d)
+		return -ENOMEM;
+
 	orphandir = debugfs_create_dir("orphans", rootdir);
 
 	if (!orphandir)
-- 
1.7.4.1



Re: [RFC PATCH 00/11] Hot-plug and Online/Offline framework

2012-12-12 Thread Toshi Kani
On Wed, 2012-12-12 at 16:55 -0800, Greg KH wrote:
> On Wed, Dec 12, 2012 at 05:39:36PM -0700, Toshi Kani wrote:
> > On Wed, 2012-12-12 at 15:56 -0800, Greg KH wrote:
> > > On Wed, Dec 12, 2012 at 04:17:12PM -0700, Toshi Kani wrote:
> > > > This patchset is an initial prototype of proposed hot-plug framework
> > > > for design review.  The hot-plug framework is designed to provide 
> > > > the common framework for hot-plugging and online/offline operations
> > > > of system devices, such as CPU, Memory and Node.  While this patchset
> > > > only supports ACPI-based hot-plug operations, the framework itself is
> > > > designed to be platform-neutral and can support other FW architectures
> > > > as necessary.
> > > > 
> > > > The patchset has not been fully tested yet, esp. for memory hot-plug.
> > > > Any help for testing will be very appreciated since my test setup
> > > > is limited.
> > > > 
> > > > The patchset is based on the linux-next branch of linux-pm.git tree.
> > > > 
> > > > Overview of the Framework
> > > > =========================
> > > 
> > > 
> > > 
> > > Why all the new framework, doesn't the existing bus infrastructure
> > > provide everything you need here?  Shouldn't you just be putting your
> > > cpus and memory sticks on a bus and handle stuff that way?  What makes
> > > these types of devices so unique from all other devices that Linux has
> > > been handling in a dynamic manner (i.e. hotplugging them) for many many
> > > years?
> > > 
> > > Why are you reinventing the wheel?
> > 
> > Good question.  Yes, USB and PCI hotplug operate based on their bus
> > structures.  USB and PCI cards only work under USB and PCI bus
> > controllers.  So, their framework can be composed within the bus
> > structures as you pointed out.
> > 
> > However, system devices such as CPU and memory do not have their standard
> > bus.  ACPI allows these system devices to be enumerated, but it does not
> > make ACPI as the HW bus hierarchy for CPU and memory, unlike PCI and
> > USB.  Therefore, CPU and memory modules manage CPU and memory outside of
> > ACPI.  This makes sense because CPU and memory can be used without ACPI.
> > 
> > This leads us to an issue when we try to manage system device hotplug
> > within ACPI, because ACPI does not control everything.  This patchset
> > provides a common hotplug framework for system devices, in which both
> > ACPI and non-ACPI modules (i.e. CPU and memory modules) can participate
> > and have their hotplug operations coordinated.  This is analogous to
> > the boot-up sequence, in which ACPI and non-ACPI modules both
> > participate to enable CPU and memory.
> 
> Then create a "virtual" bus and put the devices you wish to control on
> that.  That is what the "system bus" devices were supposed to be, it's
> about time someone took that code and got it all working properly in
> this way, that is why it was created oh so long ago.

That may be the ideal, but it will take great effort to get there from
where we are now.  It is going to be a long road.  I believe the first
step is to make the boot-up flow and hot-plug flow consistent for
system devices.  This is what this patchset is trying to do.

Thanks,
-Toshi




Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread David Ahern

On 12/12/12 8:34 PM, Linus Torvalds wrote:
> On Wed, Dec 12, 2012 at 7:25 PM, David Ahern  wrote:
>>
>> Are you running an older perf binary on the 3.8 kernel?
>
> I am.. I don't tend to rebuild 'perf'..
>
>> Does this work: perf record -e cycles:ppH  ...
>
> Yes it does. What is 'H' and why should anybody care? Especially since
> I'm not running virtualized.
>
> That whole "exclude_guest" test is insane when there isn't any
> virtualization going on. Very annoying.

you know what's worse? All of your VMs blowing up because anyone runs
perf with precise attribute. Virtualization and performance
monitoring collide. From the log message for commit 1342798.

"Intel PEBS in VT-x context uses the DS address as a guest linear
address, even though its programmed by the host as a host linear
address. This either results in guest memory corruption and or the
hardware faulting and 'crashing' the virtual machine.  Therefore we have
to disable PEBS on VT-x enter and re-enable on VT-x exit, enforcing a
strict exclude_guest."


David



[PATCH v2 UPDATE] mm/hugetlb: create hugetlb cgroup file in hugetlb_init

2012-12-12 Thread Jianguo Wu
Build the kernel with CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y
and CONFIG_CGROUP_HUGETLB=y, then specify the hugepagesz=xx boot option:
the system will fail to boot.

This failure is caused by the following code path:
setup_hugepagesz
	hugetlb_add_hstate
		hugetlb_cgroup_file_init
			cgroup_add_cftypes
				kzalloc <--slab is *not available* yet

On this path slab is not available yet, so the memory allocation
fails and triggers the WARN_ON() in hugetlb_cgroup_file_init().

So I move hugetlb_cgroup_file_init() into hugetlb_init().
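
The move follows a common pattern: early boot-option parsing only
records what it saw, and anything that needs the allocator runs from a
later init step. A minimal userspace sketch of that pattern, under
stated assumptions: all demo_* names are hypothetical, malloc() stands
in for kzalloc(), and the allocator_ready flag models "slab is
available".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_MAX_HSTATE 4

static bool allocator_ready;		/* models "slab is available" */
static unsigned int demo_orders[DEMO_MAX_HSTATE];
static void *demo_files[DEMO_MAX_HSTATE];
static int demo_nr_hstates;

/* Early option parsing: record the page order only, allocate nothing. */
static int demo_add_hstate(unsigned int order)
{
	if (demo_nr_hstates >= DEMO_MAX_HSTATE)
		return -1;
	demo_orders[demo_nr_hstates++] = order;
	return 0;
}

/* Late init: the allocator is up, so per-hstate files can be created now. */
static int demo_file_init(unsigned int min_order)
{
	int i, created = 0;

	assert(allocator_ready);
	for (i = 0; i < demo_nr_hstates; i++) {
		if (demo_orders[i] < min_order)
			continue;	/* too small to carry cgroup details */
		demo_files[i] = malloc(64);
		if (demo_files[i])
			created++;
	}
	return created;
}
```

The early step can then run at any point during boot, while the step
that allocates is only ever called once the allocator exists.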

Changelog:
  refactor code as suggested by Aneesh
  add Reviewed-by and Acked-by 

Signed-off-by: Jianguo Wu 
Signed-off-by: Jiang Liu 
Reviewed-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
---
 include/linux/hugetlb_cgroup.h |    7 ++-----
 mm/hugetlb.c                   |   11 +----------
 mm/hugetlb_cgroup.c            |   18 ++++++++++++++++--
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index d73878c..5bb9c28 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -62,7 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 					 struct page *page);
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
-extern int hugetlb_cgroup_file_init(int idx) __init;
+extern void hugetlb_cgroup_file_init(void) __init;
 extern void hugetlb_cgroup_migrate(struct page *oldhpage,
 				   struct page *newhpage);
 
@@ -111,10 +111,7 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
-static inline int __init hugetlb_cgroup_file_init(int idx)
-{
-	return 0;
-}
+static inline void __init hugetlb_cgroup_file_init() {}
 
 static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
 					  struct page *newhpage)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1ef2cd4..a30da48 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1906,14 +1906,12 @@ static int __init hugetlb_init(void)
 	default_hstate.max_huge_pages = default_hstate_max_huge_pages;
 
 	hugetlb_init_hstates();
-
 	gather_bootmem_prealloc();
-
 	report_hugepages();
 
 	hugetlb_sysfs_init();
-
 	hugetlb_register_all_nodes();
+	hugetlb_cgroup_file_init();
 
 	return 0;
 }
@@ -1943,13 +1941,6 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
-	/*
-	 * Add cgroup control files only if the huge page consists
-	 * of more than two normal pages. This is because we use
-	 * page[2].lru.next for storing cgoup details.
-	 */
-	if (order >= HUGETLB_CGROUP_MIN_ORDER)
-		hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
 
 	parsed_hstate = h;
 }
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index a3f358f..7af7459 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -340,7 +340,7 @@ static char *mem_fmt(char *buf, int size, unsigned long hsize)
 	return buf;
 }
 
-int __init hugetlb_cgroup_file_init(int idx)
+static void __init __hugetlb_cgroup_file_init(int idx)
 {
 	char buf[32];
 	struct cftype *cft;
@@ -382,7 +382,21 @@ int __init hugetlb_cgroup_file_init(int idx)
 
 	WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
 
-	return 0;
+	return;
+}
+
+void __init hugetlb_cgroup_file_init()
+{
+	struct hstate *h;
+
+	for_each_hstate(h)
+		/*
+		 * Add cgroup control files only if the huge page consists
+		 * of more than two normal pages. This is because we use
+		 * page[2].lru.next for storing cgroup details.
+		 */
+		if (huge_page_order(h) >= HUGETLB_CGROUP_MIN_ORDER)
+			__hugetlb_cgroup_file_init(hstate_index(h));
 }
 
 /*
-- 
1.7.1




Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Linus Torvalds
On Wed, Dec 12, 2012 at 7:25 PM, David Ahern  wrote:
>
> Are you running an older perf binary on the 3.8 kernel?

I am.. I don't tend to rebuild 'perf'..

> Does this work: perf record -e cycles:ppH  ...

Yes it does. What is 'H' and why should anybody care? Especially since
I'm not running virtualized.

That whole "exclude_guest" test is insane when there isn't any
virtualization going on. Very annoying.

Linus


Re: [PATCH 4/4] fs: remove obsolete simple_strto

2012-12-12 Thread Dave Chinner
On Fri, Dec 07, 2012 at 05:25:19PM +0530, Abhijit Pawar wrote:
> This patch replaces the obsolete simple_strto with kstrto

The XFS changes look fine. Consider those:

Acked-by: Dave Chinner 

-- 
Dave Chinner
da...@fromorbit.com


Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread David Ahern

On 12/12/12 7:53 PM, Linus Torvalds wrote:
> Hmm. This may be entirely unrelated to this particular pull request, but
>
>     perf record -e cycles:pp
>
> no longer works on my westmere machine (Operation not supported). It
> used to work, but I haven't tried to bisect it, since I hope somebody
> will just go "oh, I know what's up".

One last "I may know what's up" question. I wonder if you are tripping
on this:

        if (event->attr.precise_ip) {
                int precise = 0;

                if (!event->attr.exclude_guest)
                        return -EOPNOTSUPP;

Are you running an older perf binary on the 3.8 kernel?

Does this work: perf record -e cycles:ppH  ...

David


Re: [GIT] Networking

2012-12-12 Thread David Miller
From: Linus Torvalds 
Date: Wed, 12 Dec 2012 18:37:08 -0800

> On Wed, Dec 12, 2012 at 6:27 PM, David Miller  wrote:
>>
>> There are two SCTP HMAC cookie algorithms, MD5 and SHA1.
>>
>> What used to happen is that you had to choose one at build
>> time, and then you were stuck with that decision and it was
>> all that you could use.
>>
>> Now, it's selectable at run time.
>>
>> If there's anything you find particularly anti-social about
>> this, I'm sure we can adjust it.
> 
> So I'd suggest doing the same thing that the new thermal throttling
> Kconfig does: start off by asking for the default algorithm, then ask
> about the others.
> 
> The "choice" part selects the one that is default (so it never gets
> asked about and is obviously compiled in), and the rest default to no
> like we should.
> 
> See drivers/thermal/Kconfig for an example of this. I think we do it
> in other places too, but that one happens to be new so I picked it as
> an example.
> 
> The rule should be that we *never* default anything to 'yes', unless
> it's old functionality that we always compiled in before too, and now
> it got made conditional. So if you see a "default y" on new options,
> you should basically consider it broken.
> 
> We're already bloating too much, we should not encourage people to
> make things more bloated than necessary.
> 
> Btw, that Kconfig option has basically no useful help text either.
> What's the point of repeating the question as a "help" message?
> 
> If people can't explain why anybody should enable it, it sure as hell
> shouldn't default to 'y'. Maybe it shouldn't exist at all?

Neil and Vlad, please take care of this.

Thanks.


Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-12-12 Thread Alex Williamson
On Thu, 2012-12-13 at 13:57 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2012-12-12 at 16:30 -0700, Alex Williamson wrote:
> > Locked page accounting in this version is very, very broken.  How do
> > powerpc folks feel about seemingly generic kernel iommu interfaces
> > messing with the current task mm?  Besides that, more problems
> > below...
> 
> After a second look & thought...
> 
> This whole accounting business is fucked. First, we simply can't just
> randomly return errors from H_PUT_TCE because the process reached some
> rlimit. This is not a proper failure mode. That means that the guest
> will probably panic() ... possibly right in the middle of some disk
> writeback or god knows what. Not good.
> 
> Also the overhead of doing all that crap on every TCE map/unmap is
> ridiculous.
> 
> Finally, it's just not going to work for real mode which we really want,
> since we can't take the mmap-sem in real mode anyway, so unless we
> convert that counter to an atomic, we can't do it.
> 
> I'd suggest just not bothering, or if you want to bother, check once
> when creating a TCE table that the rlimit is enough to bolt as many
> pages as can be populated in that table and fail to create *that*. The
> failure mode is much better, ie, qemu failing to create a PCI bus due to
> insufficient rlimits.

I agree, we don't seem to be headed in the right direction.  x86 needs
to track rlimits or else a user can exploit the interface to pin all the
memory in the system.  On power, only the iova window can be pinned, so
it's a fixed amount.  I could see it this way: granting access to a group
implicitly grants access to pin the iova window.  We can still make
it more explicit by handling the rlimit accounting upfront.  Thanks,

Alex
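
A minimal userspace sketch of the "account once, upfront" idea, assuming
the size of the window is known when the table is created. The demo_*
helpers are hypothetical; getrlimit() and RLIMIT_MEMLOCK are the
standard Linux interfaces, but nothing here is the vfio or powerpc iommu
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: would pinning a whole window of 'window_bytes'
 * stay within 'limit'?  Meant to be checked once, at table creation,
 * instead of on every map/unmap.
 */
static bool demo_window_fits(uint64_t window_bytes, rlim_t limit)
{
	assert(window_bytes > 0);	/* an empty window is a caller bug */
	if (limit == RLIM_INFINITY)
		return true;
	return window_bytes <= (uint64_t)limit;
}

/* The same check against the calling process's current RLIMIT_MEMLOCK. */
static int demo_check_memlock(uint64_t window_bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	return demo_window_fits(window_bytes, rl.rlim_cur) ? 0 : -1;
}
```

A failure here would surface when the table is set up, which is the
failure mode suggested in the quoted text, rather than in the middle of
guest I/O.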



[PATCH v2] mm/hugetlb: create hugetlb cgroup file in hugetlb_init

2012-12-12 Thread Jianguo Wu
Build the kernel with CONFIG_HUGETLBFS=y, CONFIG_HUGETLB_PAGE=y
and CONFIG_CGROUP_HUGETLB=y, then specify the hugepagesz=xx boot option:
the system will fail to boot.

This failure is caused by the following code path:
setup_hugepagesz
	hugetlb_add_hstate
		hugetlb_cgroup_file_init
			cgroup_add_cftypes
				kzalloc <--slab is *not available* yet

On this path slab is not available yet, so the memory allocation
fails and triggers the WARN_ON() in hugetlb_cgroup_file_init().

So I move hugetlb_cgroup_file_init() into hugetlb_init().

Changelog:
  use huge_page_order(h) instead of h->order as suggested by Aneesh
  add Reviewed-by and Acked-by 

Signed-off-by: Jianguo Wu 
Signed-off-by: Jiang Liu 
Reviewed-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
---
 include/linux/hugetlb_cgroup.h |    7 ++-----
 mm/hugetlb.c                   |   11 +----------
 mm/hugetlb_cgroup.c            |   23 +++++++++++++++++++++--
 3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index d73878c..5bb9c28 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -62,7 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 					 struct page *page);
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 					   struct hugetlb_cgroup *h_cg);
-extern int hugetlb_cgroup_file_init(int idx) __init;
+extern void hugetlb_cgroup_file_init(void) __init;
 extern void hugetlb_cgroup_migrate(struct page *oldhpage,
 				   struct page *newhpage);
 
@@ -111,10 +111,7 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
 	return;
 }
 
-static inline int __init hugetlb_cgroup_file_init(int idx)
-{
-	return 0;
-}
+static inline void __init hugetlb_cgroup_file_init() {}
 
 static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
 					  struct page *newhpage)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1ef2cd4..a30da48 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1906,14 +1906,12 @@ static int __init hugetlb_init(void)
 	default_hstate.max_huge_pages = default_hstate_max_huge_pages;
 
 	hugetlb_init_hstates();
-
 	gather_bootmem_prealloc();
-
 	report_hugepages();
 
 	hugetlb_sysfs_init();
-
 	hugetlb_register_all_nodes();
+	hugetlb_cgroup_file_init();
 
 	return 0;
 }
@@ -1943,13 +1941,6 @@ void __init hugetlb_add_hstate(unsigned order)
 	h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
-	/*
-	 * Add cgroup control files only if the huge page consists
-	 * of more than two normal pages. This is because we use
-	 * page[2].lru.next for storing cgoup details.
-	 */
-	if (order >= HUGETLB_CGROUP_MIN_ORDER)
-		hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
 
 	parsed_hstate = h;
 }
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index a3f358f..0aa8ae1 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -340,7 +340,7 @@ static char *mem_fmt(char *buf, int size, unsigned long hsize)
 	return buf;
 }
 
-int __init hugetlb_cgroup_file_init(int idx)
+static void __init __hugetlb_cgroup_file_init(int idx)
 {
 	char buf[32];
 	struct cftype *cft;
@@ -382,7 +382,26 @@ int __init hugetlb_cgroup_file_init(int idx)
 
 	WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
 
-	return 0;
+	return;
+}
+
+void __init hugetlb_cgroup_file_init()
+{
+	struct hstate *h;
+	int idx;
+
+	idx = 0;
+	for_each_hstate(h) {
+		/*
+		 * Add cgroup control files only if the huge page consists
+		 * of more than two normal pages. This is because we use
+		 * page[2].lru.next for storing cgroup details.
+		 */
+		if (huge_page_order(h) >= HUGETLB_CGROUP_MIN_ORDER)
+			__hugetlb_cgroup_file_init(idx);
+
+		idx++;
+	}
 }
 
 /*
-- 
1.7.1




Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread David Ahern

On 12/12/12 8:09 PM, Linus Torvalds wrote:
> On Wed, Dec 12, 2012 at 7:02 PM, David Ahern  wrote:
>>
>> Can you add -v and see if it spits out more info?
>
> No more info.


I'm surprised you are not seeing this as well:

	} else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
		ui__error("\'precise\' request may not be supported. "
			  "Try removing 'p' modifier\n");
		rc = -err;
		goto out;
	}

I made changes in this area relatively recently; I'll take a look.

David




Re: [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling

2012-12-12 Thread Alex Shi

>> now, on the other hand, if you have two threads of a process that
>> share a bunch of data structures, and you'd spread these over 2
>> sockets, you end up bouncing data between the two sockets a lot,
>> running inefficient --> bad for power.
> 
> Yeah, that should be addressed by the NUMA patches people are working on
> right now.


Yes, as for the balance/powersaving policy, we can pack tasks tightly first,
then NUMA balancing will make memory follow us.

BTW, NUMA balancing is more related to pages in memory, not the LLC.
> 
>> having said all this, if you have two tasks that don't have such
>> cache effects, the most efficient way of running things will be on 2
>> hyperthreading halves... it's very hard to beat the power efficiency
>> of that. But this assumes the tasks don't compete with resources much
>> on the HT level, and achieve good scaling. and this still has to
>> compete with "race to halt", because if you're done quicker, you can
>> put the memory in self refresh quicker.
> 
> Right, how are we addressing the breakeven in that case? AFAIK, we
> do schedule them now on two different cores (not HT threads, i.e. no
> resource sharing besides L2) so that we get done faster, i.e. race to

That's what the balance policy is for. :)
> idle in the performance case. And in the powersavings' case we leave
> them as tightly packed as possible.
> 



Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Linus Torvalds
On Wed, Dec 12, 2012 at 7:02 PM, David Ahern  wrote:
>
> Can you add -v and see if it spits out more info?

No more info.

Sure, it does the usual "do you have an APIC" message (it does that
without "-v" too), which isn't useful:

  Error: sys_perf_event_open() syscall returned with 95 (Operation not
supported) for event cycles:pp. /bin/dmesg may provide additional
information.

  No hardware sampling interrupt available. No APIC? If so then you
can boot the kernel with the "lapic" boot parameter to force-enable
it.

And yes, I have a local apic. Every single modern CPU does.

The error message is garbage and actively misleading. Lack of an APIC
is just about the *least* likely possible reason for the EOPNOTSUPP
error return.

  Linus


Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes

2012-12-12 Thread Simon Jeons
On Thu, 2012-12-13 at 09:48 +0800, Tang Chen wrote:
> On 12/13/2012 08:28 AM, Simon Jeons wrote:
> > On Wed, 2012-12-12 at 18:32 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> On 12/12/2012 05:29 PM, Simon Jeons wrote:
> >>>
> >>> Thanks for your clarify.
> >>>
> >>> Enable PAE on x86 32bit kernel, 8G memory, movablecore=6.5G
> >>
> >> Could you please provide more info ?
> >>
> >> Such as the whole kernel commondline. And did this happen after
> >> you applied these patches ? What is the output without these
> >> patches ?
> >
> > This result is without the patches, I didn't add more kernel
> > commandline, just movablecore=6.5G, but output as you see is strange, so
> > what happened?
> 
> Hi Simon,
> 
> For now, I'm not quite sure what happened. Could you please provide the
> output without the movablecore=6.5G option ?
> 
> Seeing from your output, your totalpages=2051391, which is about 8G. But
> the memory mapped for your node 0 is obviously not enough.
> 
> When we have high memory, ZONE_MOVABLE is taken from ZONE_HIGH. So the
> first line, 8304MB HIGHMEM available is also strange.
> 
> So I think we need more info to find out the problem. :)
> 

[0.00] 8304MB HIGHMEM available.
[0.00] 885MB LOWMEM available.
[0.00]   mapped low ram: 0 - 375fe000
[0.00]   low ram: 0 - 375fe000
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x0001-0x00ff]
[0.00]   Normal   [mem 0x0100-0x375fdfff]
[0.00]   HighMem  [mem 0x375fe000-0x3e5f]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x0001-0x0009cfff]
[0.00]   node   0: [mem 0x0010-0x1fff]
[0.00]   node   0: [mem 0x2020-0x3fff]
[0.00]   node   0: [mem 0x4020-0xb69cbfff]
[0.00]   node   0: [mem 0xb6a46000-0xb6a47fff]
[0.00]   node   0: [mem 0xb6b1c000-0xb6cf]
[0.00]   node   0: [mem 0x-0x3e5f]
[0.00] On node 0 totalpages: 2051391
[0.00] free_area_init_node: node 0, pgdat c0c2cc00, node_mem_map
f19c2200
[0.00]   DMA zone: 32 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 3949 pages, LIFO batch:0
[0.00]   Normal zone: 1740 pages used for memmap
[0.00]   Normal zone: 220466 pages, LIFO batch:31
[0.00]   HighMem zone: 16609 pages used for memmap
[0.00]   HighMem zone: 1808595 pages, LIFO batch:31


menuentry 'Fedora (3.7.0+)' --class fedora --class gnu-linux --class gnu
--class os $menuentry_id_option
'gnulinux-simple-7ed9528d-006f-4d9e-93d9-f68b0967ca99' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1
--hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'
eba9dfce-d7f1-4b5c-9199-f2abf80e5dc6
else
  search --no-floppy --fs-uuid --set=root
eba9dfce-d7f1-4b5c-9199-f2abf80e5dc6
fi
echo 'Loading Fedora (3.7.0+)'
linux   /vmlinuz-3.7.0+ root=/dev/mapper/vg_kernel-lv_root ro
rd.md=0 rd.dm=0 rd.lvm.lv=vg_kernel/lv_root SYSFONT=True  KEYTABLE=us
rd.luks=0 LANG=en_US.UTF-8 rd.lvm.lv=vg_kernel/lv_swap rhgb quiet
echo 'Loading initial ramdisk ...'
initrd /initramfs-3.7.0+.img
}


> Thank. :)
> 
> >
> >>
> >> Thanks. :)
> >>
> 
>  [0.00] 8304MB HIGHMEM available.
>  [0.00] 885MB LOWMEM available.
>  [0.00]   mapped low ram: 0 - 375fe000
>  [0.00]   low ram: 0 - 375fe000
>  [0.00] Zone ranges:
>  [0.00]   DMA  [mem 0x0001-0x00ff]
>  [0.00]   Normal   [mem 0x0100-0x375fdfff]
>  [0.00]   HighMem  [mem 0x375fe000-0x3e5f]
>  [0.00] Movable zone start for each node
>  [0.00] Early memory node ranges
>  [0.00]   node   0: [mem 0x0001-0x0009cfff]
>  [0.00]   node   0: [mem 0x0010-0x1fff]
>  [0.00]   node   0: [mem 0x2020-0x3fff]
>  [0.00]   node   0: [mem 0x4020-0xb69cbfff]
>  [0.00]   node   0: [mem 0xb6a46000-0xb6a47fff]
>  [0.00]   node   0: [mem 0xb6b1c000-0xb6cf]
>  [0.00]   node   0: [mem 0x-0x3e5f]
>  [0.00] On node 0 totalpages: 2051391
>  [0.00] free_area_init_node: node 0, pgdat c0c26a80,
>  node_mem_map
>  f19de200
>  [0.00]   DMA zone: 32 pages used for memmap
>  [0.00]   DMA zone: 0 pages reserved
>  [0.00]   DMA zone: 3949 pages, LIFO batch:0
>  [0.00]   Normal zone: 1740 pages used for memmap
>  [0.00]   Normal zone: 220466 pages, LIFO batch:31
>  [0.00]   HighMem zone: 16609 pages 

[PATCH] Documentation/java.txt: add Java 7 support

2012-12-12 Thread Jonathan Callen
The sample wrapper currently fails on some Java 7 .class files.  This
updates the wrapper to properly handle those files.

Signed-off-by: Jonathan Callen 
---
 Documentation/java.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/java.txt b/Documentation/java.txt
index e6a7232..4180205 100644
--- a/Documentation/java.txt
+++ b/Documentation/java.txt
@@ -188,6 +188,9 @@ shift
 #define CP_METHODREF 10
 #define CP_INTERFACEMETHODREF 11
 #define CP_NAMEANDTYPE 12
+#define CP_METHODHANDLE 15
+#define CP_METHODTYPE 16
+#define CP_INVOKEDYNAMIC 18
 
 /* Define some commonly used error messages */
 
@@ -242,14 +245,19 @@ void skip_constant(FILE *classfile, u_int16_t *cur)
break;
case CP_CLASS:
case CP_STRING:
+   case CP_METHODTYPE:
seekerr = fseek(classfile, 2, SEEK_CUR);
break;
+   case CP_METHODHANDLE:
+   seekerr = fseek(classfile, 3, SEEK_CUR);
+   break;
case CP_INTEGER:
case CP_FLOAT:
case CP_FIELDREF:
case CP_METHODREF:
case CP_INTERFACEMETHODREF:
case CP_NAMEANDTYPE:
+   case CP_INVOKEDYNAMIC:
seekerr = fseek(classfile, 4, SEEK_CUR);
break;
case CP_LONG:
-- 
1.7.12
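
As a rough illustration of what the patch changes, the fixed-size constant
pool entries can be summarized in one lookup. This is a minimal sketch, not
the wrapper from Documentation/java.txt: cp_fixed_payload_size() is a
hypothetical helper, and the tag values that do not appear in the hunk above
(3 through 9) are taken from the JVM class file format. CP_UTF8 is left out
because its length is variable.

```c
#include <stdint.h>

/* Tags 10-18 appear in the patch; 3-9 are assumed from the class file format. */
#define CP_INTEGER		3
#define CP_FLOAT		4
#define CP_LONG			5
#define CP_DOUBLE		6
#define CP_CLASS		7
#define CP_STRING		8
#define CP_FIELDREF		9
#define CP_METHODREF		10
#define CP_INTERFACEMETHODREF	11
#define CP_NAMEANDTYPE		12
#define CP_METHODHANDLE		15
#define CP_METHODTYPE		16
#define CP_INVOKEDYNAMIC	18

/*
 * Hypothetical helper: number of bytes that follow the one-byte tag for
 * fixed-size entries, or -1 for variable-size or unknown tags.
 */
static int cp_fixed_payload_size(uint8_t tag)
{
	switch (tag) {
	case CP_CLASS:
	case CP_STRING:
	case CP_METHODTYPE:		/* new in Java 7: one u2 index */
		return 2;
	case CP_METHODHANDLE:		/* new in Java 7: u1 kind + u2 index */
		return 3;
	case CP_INTEGER:
	case CP_FLOAT:
	case CP_FIELDREF:
	case CP_METHODREF:
	case CP_INTERFACEMETHODREF:
	case CP_NAMEANDTYPE:
	case CP_INVOKEDYNAMIC:		/* new in Java 7: two u2 indexes */
		return 4;
	case CP_LONG:
	case CP_DOUBLE:			/* these also occupy two pool slots */
		return 8;
	default:
		return -1;
	}
}
```

The three Java 7 tags slot into the existing 2-, 3- and 4-byte groups, which
is why the patch only needs extra case labels plus one new fseek() of 3.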



Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-12 Thread NeilBrown

[Thierry: question for you near the end - thanks]

On Wed, 12 Dec 2012 10:08:28 -0600 Jon Hunter  wrote:

> Hi Neil,
> 
> On 12/12/2012 02:24 AM, NeilBrown wrote:
> > 
> > 
> > This patch is based on an earlier patch by Grant Erickson
> > which provided pwm devices using the 'legacy' interface.
> > 
> > This driver instead uses the new framework interface.
> > 
> > Cc: Grant Erickson 
> > Signed-off-by: NeilBrown 
> > 
> > diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> > index ed81720..7df573a 100644
> > --- a/drivers/pwm/Kconfig
> > +++ b/drivers/pwm/Kconfig
> > @@ -85,6 +85,15 @@ config PWM_MXS
> >   To compile this driver as a module, choose M here: the module
> >   will be called pwm-mxs.
> >  
> > +config PWM_OMAP
> > +   tristate "OMAP pwm support"
> > +   depends on ARCH_OMAP
> 
> We should probably have depends on or selects OMAP_DM_TIMER here too.

Sounds sensible - fixed.

> 
> > +   help
> > + Generic PWM framework driver for OMAP
> > +
> > + To compile this driver as a module, choose M here: the module
> > + will be called pwm-omap
> > +
> >  config PWM_PUV3
> > tristate "PKUnity NetBook-0916 PWM support"
> > depends on ARCH_PUV3
> > diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
> > index acfe482..f5d200d 100644
> > --- a/drivers/pwm/Makefile
> > +++ b/drivers/pwm/Makefile
> > @@ -5,6 +5,7 @@ obj-$(CONFIG_PWM_IMX)   += pwm-imx.o
> >  obj-$(CONFIG_PWM_JZ4740)   += pwm-jz4740.o
> >  obj-$(CONFIG_PWM_LPC32XX)  += pwm-lpc32xx.o
> >  obj-$(CONFIG_PWM_MXS)  += pwm-mxs.o
> > +obj-$(CONFIG_PWM_OMAP) += pwm-omap.o
> >  obj-$(CONFIG_PWM_PUV3) += pwm-puv3.o
> >  obj-$(CONFIG_PWM_PXA)  += pwm-pxa.o
> >  obj-$(CONFIG_PWM_SAMSUNG)  += pwm-samsung.o
> > diff --git a/drivers/pwm/pwm-omap.c b/drivers/pwm/pwm-omap.c
> > new file mode 100644
> > index 000..e3dbce3
> > --- /dev/null
> > +++ b/drivers/pwm/pwm-omap.c
> > @@ -0,0 +1,318 @@
> > +/*
> > + *Copyright (c) 2012 NeilBrown 
> > + *Heavily based on earlier code which is:
> > + *Copyright (c) 2010 Grant Erickson 
> > + *
> > + *Also based on pwm-samsung.c
> > + *
> > + *This program is free software; you can redistribute it and/or
> > + *modify it under the terms of the GNU General Public License
> > + *version 2 as published by the Free Software Foundation.
> > + *
> > + *Description:
> > + *  This file is the core OMAP2/3 support for the generic, Linux
> 
> I would drop the OMAP2/3 and just say OMAP here. Potentially this should
> work for OMAP1-5.
> 

Done.


> > + *  PWM driver / controller, using the OMAP's dual-mode timers.
> > + *
> > + *The 'id' number for the device encodes the number of the dm timer
> > + *to use, and the polarity of the output.
> > + *lsb is '1' of active-high, and '0' for active low
> > + *remaining bit a timer number and need to be shifted down before use.
> > + */
> > +
> > +#define pr_fmt(fmt) "pwm-omap: " fmt
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> 
> This is going to be a problem for the single zImage work, because we
> cannot include any plat headers in driver code any more. Therefore,
> although it is not ideal, one way to handle this is pass function
> pointers to the various dmtimer APIs that are needed via the platform
> data. Painful I know ...

But that doesn't work with devicetree does it?

Can't we move the dmtimer.h file to include/linux/omap-dmtimer.h or something?

It only included other things from include/linux, so it should be safe.

> 
> > +#define DM_TIMER_LOAD_MIN  0xFFFE
> > +
> > +struct omap_chip {
> > +   struct platform_device  *pdev;
> > +
> > +   struct omap_dm_timer*dm_timer;
> > +   unsigned intpolarity;
> > +   const char  *label;
> > +
> > +   unsigned intduty_ns, period_ns;
> > +   struct pwm_chip chip;
> > +};
> > +
> > +#define to_omap_chip(chip) container_of(chip, struct omap_chip, chip)
> > +
> > +#definepwm_dbg(_pwm, msg...) dev_dbg(&(_pwm)->pdev->dev, msg)
> > +
> > +/**
> > + * pwm_calc_value - determines the counter value for a clock rate and 
> > period.
> > + * @clk_rate: The clock rate, in Hz, of the PWM's clock source to compute 
> > the
> > + *counter value for.
> > + * @ns: The period, in nanoseconds, to compute the counter value for.
> > + *
> > + * Returns the PWM counter value for the specified clock rate and period.
> > + */
> > +static inline int pwm_calc_value(unsigned long clk_rate, int ns)
> > +{
> > +   const unsigned long nanoseconds_per_second = 1000000000;
> > +   int cycles;
> > +   __u64 c;
> > +
> > +   c = (__u64)clk_rate * ns;
> > +   do_div(c, nanoseconds_per_second);
> > +   cycles = c;
> > +
> > +   return DM_TIMER_LOAD_MIN - cycles;
> > +}
> > +
> > +static int omap_pwm_enable(struct pwm_chip 
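
The arithmetic in pwm_calc_value() quoted above can be tried in userspace
with a small sketch. This is an illustration under assumptions, not the
driver code: pwm_cycles() and pwm_load_value() are hypothetical names, and
the minimum load value is passed in as a parameter because the
DM_TIMER_LOAD_MIN constant may not have survived this archive intact.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Number of timer ticks that fit in 'ns' nanoseconds at 'clk_rate_hz'. */
static uint32_t pwm_cycles(unsigned long clk_rate_hz, unsigned int ns)
{
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint32_t)(((uint64_t)clk_rate_hz * ns) / NSEC_PER_SEC);
}

/*
 * Value to load into an up-counting timer so that it overflows after
 * 'ns' nanoseconds. 'load_min' is an assumed parameter standing in for
 * the driver's DM_TIMER_LOAD_MIN.
 */
static uint32_t pwm_load_value(uint32_t load_min, unsigned long clk_rate_hz,
			       unsigned int ns)
{
	return load_min - pwm_cycles(clk_rate_hz, ns);
}
```

For a 32768 Hz clock and a 1 ms period this gives 32 cycles (the fractional
0.768 of a tick is truncated by the integer division).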

Re: [TRIVIAL PATCH 08/26] m32r: Convert print_symbol to %pSR

2012-12-12 Thread Hirokazu Takata
Acked-by: Hirokazu Takata 

Thank you.

From: Joe Perches 
Subject: [TRIVIAL PATCH 08/26] m32r: Convert print_symbol to %pSR
Date: Wed, 12 Dec 2012 10:18:57 -0800
> Use the new vsprintf extension to avoid any possible
> message interleaving.
> 
> Signed-off-by: Joe Perches 
> ---
>  arch/m32r/kernel/traps.c |6 ++
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/m32r/kernel/traps.c b/arch/m32r/kernel/traps.c
> index 3bcb207..9fe3467 100644
> --- a/arch/m32r/kernel/traps.c
> +++ b/arch/m32r/kernel/traps.c
> @@ -132,10 +132,8 @@ static void show_trace(struct task_struct *task, 
> unsigned long *stack)
>   printk("Call Trace: ");
>   while (!kstack_end(stack)) {
>   addr = *stack++;
> - if (__kernel_text_address(addr)) {
> - printk("[<%08lx>] ", addr);
> - print_symbol("%s\n", addr);
> - }
> + if (__kernel_text_address(addr))
> + printk("[<%08lx>] %pSR\n", addr, (void *)addr);
>   }
>   printk("\n");
>  }
> -- 
> 1.7.8.112.g3fd21
> 


Re: [PATCH 4/5] ktest: Fix breakage from change of oldnoconfig to olddefconfig

2012-12-12 Thread Adam Lee
On Wed, Dec 12, 2012 at 05:14:41PM -0500, Steven Rostedt wrote:
> From: Steven Rostedt 
> 
> Commit fb16d891 "kconfig: replace 'oldnoconfig' with 'olddefconfig', and
> keep the old name", changed ktest's default config update from
> oldnoconfig to olddefconfig without adding oldnoconfig as a backup.
> The make oldnoconfig works much better than its backup of:
>yes '' | make oldconfig
> 
> But due to this change, and the fact that ktest is used to build lots of
> older kernels (and for bisects), it forgoes the oldnoconfig completely.
> 
> Cc: Adam Lee 
> Cc: Michal Marek 
> Signed-off-by: Steven Rostedt 

This makes sense, we should provide a backup for older kernels. Sorry
for not noticing the breakage.

Acked-by: Adam Lee 

-- 
Regards,
Adam Lee
http://adam8157.info


Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread David Ahern

On 12/12/12 7:53 PM, Linus Torvalds wrote:
> Hmm. This may be entirely unrelated to this particular pull request, but
>
>    perf record -e cycles:pp
>
> no longer works on my westmere machine (Operation not supported). It
> used to work, but I haven't tried to bisect it, since I hope somebody
> will just go "oh, I know what's up".

Can you add -v and see if it spits out more info?

David


Re: [PATCH] mm/hugetlb: create hugetlb cgroup file in hugetlb_init

2012-12-12 Thread Jianguo Wu
On 2012/12/13 1:05, Aneesh Kumar K.V wrote:

> Jianguo Wu  writes:
> 
>> Build kernel with CONFIG_HUGETLBFS=y,CONFIG_HUGETLB_PAGE=y
>> and CONFIG_CGROUP_HUGETLB=y, then specify hugepagesz=xx boot option,
>> system will boot fail.
>>
>> This failure is caused by following code path:
>> setup_hugepagesz
>>  hugetlb_add_hstate
>>  hugetlb_cgroup_file_init
>>  cgroup_add_cftypes
>>  kzalloc <--slab is *not available* yet
>>
>> For this path, slab is not available yet, so memory allocated will be
>> failed, and cause WARN_ON() in hugetlb_cgroup_file_init().
>>
>> So I move hugetlb_cgroup_file_init() into hugetlb_init().
>>
>> Signed-off-by: Jianguo Wu 
>> Signed-off-by: Jiang Liu 
>> ---
>>  include/linux/hugetlb_cgroup.h |7 ++-
>>  mm/hugetlb.c   |   11 +--
>>  mm/hugetlb_cgroup.c|   23 +--
>>  3 files changed, 24 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
>> index d73878c..5bb9c28 100644
>> --- a/include/linux/hugetlb_cgroup.h
>> +++ b/include/linux/hugetlb_cgroup.h
>> @@ -62,7 +62,7 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned 
>> long nr_pages,
>>   struct page *page);
>>  extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
>> struct hugetlb_cgroup *h_cg);
>> -extern int hugetlb_cgroup_file_init(int idx) __init;
>> +extern void hugetlb_cgroup_file_init(void) __init;
>>  extern void hugetlb_cgroup_migrate(struct page *oldhpage,
>> struct page *newhpage);
>>
>> @@ -111,10 +111,7 @@ hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long 
>> nr_pages,
>>  return;
>>  }
>>
>> -static inline int __init hugetlb_cgroup_file_init(int idx)
>> -{
>> -return 0;
>> -}
>> +static inline void __init hugetlb_cgroup_file_init() {}
>>
>>  static inline void hugetlb_cgroup_migrate(struct page *oldhpage,
>>struct page *newhpage)
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 1ef2cd4..a30da48 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1906,14 +1906,12 @@ static int __init hugetlb_init(void)
>>  default_hstate.max_huge_pages = default_hstate_max_huge_pages;
>>
>>  hugetlb_init_hstates();
>> -
>>  gather_bootmem_prealloc();
>> -
>>  report_hugepages();
>>
>>  hugetlb_sysfs_init();
>> -
>>  hugetlb_register_all_nodes();
>> +hugetlb_cgroup_file_init();
>>
>>  return 0;
>>  }
>> @@ -1943,13 +1941,6 @@ void __init hugetlb_add_hstate(unsigned order)
>>  h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
>>  snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>>  huge_page_size(h)/1024);
>> -/*
>> - * Add cgroup control files only if the huge page consists
>> - * of more than two normal pages. This is because we use
>> - * page[2].lru.next for storing cgoup details.
>> - */
>> -if (order >= HUGETLB_CGROUP_MIN_ORDER)
>> -hugetlb_cgroup_file_init(hugetlb_max_hstate - 1);
>>
>>  parsed_hstate = h;
>>  }
>> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
>> index a3f358f..284cb68 100644
>> --- a/mm/hugetlb_cgroup.c
>> +++ b/mm/hugetlb_cgroup.c
>> @@ -340,7 +340,7 @@ static char *mem_fmt(char *buf, int size, unsigned long 
>> hsize)
>>  return buf;
>>  }
>>
>> -int __init hugetlb_cgroup_file_init(int idx)
>> +static void __init __hugetlb_cgroup_file_init(int idx)
>>  {
>>  char buf[32];
>>  struct cftype *cft;
>> @@ -382,7 +382,26 @@ int __init hugetlb_cgroup_file_init(int idx)
>>
>>  WARN_ON(cgroup_add_cftypes(&hugetlb_subsys, h->cgroup_files));
>>
>> -return 0;
>> +return;
>> +}
>> +
>> +void __init hugetlb_cgroup_file_init()
>> +{
>> +struct hstate *h;
>> +int idx;
>> +
>> +idx = 0;
>> +for_each_hstate(h) {
>> +/*
>> + * Add cgroup control files only if the huge page consists
>> + * of more than two normal pages. This is because we use
>> + * page[2].lru.next for storing cgoup details.
>> + */
>> +if (h->order >= HUGETLB_CGROUP_MIN_ORDER)
>> +__hugetlb_cgroup_file_init(idx);
> 
> Is it better to say ?
> 
>  if (huge_page_order(h) >= HUGETLB_CGROUP_MIN_ORDER)
>   __hugetlb_cgroup_file_init(hstate_index(h));

Hi Aneesh,

Thanks for your review and suggestion, this is better.

Thanks,
Jianguo Wu

> 
> It should be ok both case.
> 
> Reviewed-by: Aneesh Kumar K.V 
> 
>> +
>> +idx++;
>> +}
>>  }
>>
>>  /*
>> -- 1.7.1
> 
> -anesh
> 
> 
> .
> 




Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-12-12 Thread Benjamin Herrenschmidt
On Wed, 2012-12-12 at 16:30 -0700, Alex Williamson wrote:
> Locked page accounting in this version is very, very broken.  How do
> powerpc folks feel about seemingly generic kernel iommu interfaces
> messing with the current task mm?  Besides that, more problems
> below...

After a second look & thought...

This whole accounting business is fucked. First, we simply can't just
randomly return errors from H_PUT_TCE because the process reached some
rlimit. This is not a proper failure mode. That means that the guest
will probably panic() ... possibly right in the middle of some disk
writeback or god knows what. Not good.

Also the overhead of doing all that crap on every TCE map/unmap is
ridiculous.

Finally, it's just not going to work for real mode which we really want,
since we can't take the mmap-sem in real mode anyway, so unless we
convert that counter to an atomic, we can't do it.

I'd suggest just not bothering, or if you want to bother, check once
when creating a TCE table that the rlimit is enough to bolt as many
pages as can be populated in that table and fail to create *that*. The
failure mode is much better, ie, qemu failing to create a PCI bus due to
insufficient rlimits.

Cheers,
Ben.
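
The "check once at table creation" idea above could look roughly like the
following. This is only a sketch under stated assumptions:
tce_table_fits_memlock() and all of its parameters are hypothetical, and it
models the arithmetic only, not the kernel's rlimit or mm accounting
interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, done once when a TCE table is created: can every
 * entry of the table be backed by a locked page without exceeding the
 * locked-memory limit? All quantities are in bytes except 'entries'.
 */
static bool tce_table_fits_memlock(uint64_t entries, uint64_t page_size,
				   uint64_t already_locked,
				   uint64_t memlock_limit)
{
	uint64_t need;

	/* Reject sizes whose product would overflow 64 bits. */
	if (page_size != 0 && entries > UINT64_MAX / page_size)
		return false;
	need = entries * page_size;

	if (already_locked > memlock_limit)
		return false;

	return need <= memlock_limit - already_locked;
}
```

With a check like this the failure surfaces when the table is created,
rather than as an error on an individual map call in the guest.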




Re: [GIT PULL] perf changes for v3.8

2012-12-12 Thread Linus Torvalds
Hmm. This may be entirely unrelated to this particular pull request, but

   perf record -e cycles:pp

no longer works on my westmere machine (Operation not supported). It
used to work, but I haven't tried to bisect it, since I hope somebody
will just go "oh, I know what's up".

dmesg says:

  Performance Events: PEBS fmt1+, 16-deep LBR, Westmere events, Intel
PMU driver.
  perf_event_intel: CPUID marked event: 'bus cycles' unavailable
  ... version:3
  ... bit width:  48
  ... generic registers:  4
  ... value mask: 
  ... max period: 7fff
  ... fixed-purpose events:   3
  ... event mask: 0007000f

Any ideas?

  Linus


Re: [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling

2012-12-12 Thread Alex Shi
On 12/12/2012 10:21 PM, Vincent Guittot wrote:
>>> >> If Linux is to continue to work efficiently on heterogeneous
>>> >> multi-processing platforms, it needs to provide scheduling mechanisms
>>> >> that can be exploited as per the demands of the HW architecture.
>> >
>> > Linus definitely disagree such ideas. :) So, need to summaries the
>> > logical beyond all hardware.
>> >
>>> >> example is the "small task packing (and spreading)" for which Vincent
>>> >> Guittot has posted a patchset[1] earlier and so has Alex now.
>> >
>> > Sure. I just thought my patchset should handled the 'small task
>> > packing' scenario. Could you guy like to have a try?
> Hi Alex,
> 
> Yes, I will do a try with your patchset when i will have some spare time

Thanks, Vincent! The balance and powersaving policies should have an effect.


Re: [PATCH] mm/hugetlb: create hugetlb cgroup file in hugetlb_init

2012-12-12 Thread Jianguo Wu
On 2012/12/12 19:23, Michal Hocko wrote:

> On Wed 12-12-12 18:44:13, Xishi Qiu wrote:
>> On 2012/12/12 18:19, Michal Hocko wrote:
>>
>>> On Wed 12-12-12 16:25:59, Jianguo Wu wrote:
>>>> Build kernel with CONFIG_HUGETLBFS=y,CONFIG_HUGETLB_PAGE=y
>>>> and CONFIG_CGROUP_HUGETLB=y, then specify hugepagesz=xx boot option,
>>>> system will boot fail.
>>>>
>>>> This failure is caused by following code path:
>>>> setup_hugepagesz
>>>>   hugetlb_add_hstate
>>>>     hugetlb_cgroup_file_init
>>>>       cgroup_add_cftypes
>>>>         kzalloc <--slab is *not available* yet
>>>>
>>>> For this path, slab is not available yet, so memory allocated will be
>>>> failed, and cause WARN_ON() in hugetlb_cgroup_file_init().
>>>>
>>>> So I move hugetlb_cgroup_file_init() into hugetlb_init().
>>>
>>> I do not think this is a good idea. hugetlb_init is in __init section as
>>> well so what guarantees that the slab is initialized by then? Isn't this
>>> just a good ordering that makes this working?
>>
>> Hi Michal,
>>
>> __initcall functions will be called in
>> start_kernel()
>>  rest_init()  // -> slab is already
>>  kernel_init()
>>  kernel_init_freeable()
>>  do_basic_setup()
>>  do_initcalls()
>>
>> and setup_hugepagesz() will be called in
>> start_kernel()
>>  parse_early_param()  // -> before mm_init() -> kmem_cache_init()
>>
>> Is this right?
> 
> Yes this is right. I just noticed that kmem_cache_init_late is an __init
> function as well and didn't realize it is called directly. Sorry about
> the confusion.
> Anyway I still think it would be a better idea to move the call into the
> hugetlb_cgroup_create callback where it is more logical IMO but now that

Hi Michal,

Thanks for your review and comments. :)
hugetlb_cgroup_create is a callback of hugetlb_subsys,
and hugetlb_cgroup_file_init adds h->cgroup_files to hugetlb_subsys,
so we can't move hugetlb_cgroup_file_init into hugetlb_cgroup_create, right?

Thanks,
Jianguo wu

> I'm looking at other controllers (blk and kmem.tcp) they all do this from
> init calls as well. So it doesn't make sense to have hugetlb behave
> differently.
> 
> So
> Acked-by: Michal Hocko 
> 
> Thanks!
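
The ordering discussed in this thread can be modelled in a few lines of
userspace C. This is a minimal sketch and not kernel code: every name in it
is hypothetical, allocator_init() merely stands in for the point where slab
becomes usable, and the minimum order of 2 is an assumption mirroring
HUGETLB_CGROUP_MIN_ORDER.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HSTATE		4	/* hypothetical table size */
#define CGROUP_MIN_ORDER	2	/* assumed, see lead-in */

struct hstate_sim {
	unsigned int order;
	bool cgroup_files;	/* true once control files are registered */
};

static struct hstate_sim hstates[MAX_HSTATE];
static size_t nr_hstates;
static bool allocator_ready;

/* Early-param time: only record the page size, allocate nothing. */
static bool early_add_hstate(unsigned int order)
{
	if (nr_hstates >= MAX_HSTATE)
		return false;
	hstates[nr_hstates].order = order;
	hstates[nr_hstates].cgroup_files = false;
	nr_hstates++;
	return true;
}

/* Stand-in for the allocator coming up later in boot. */
static void allocator_init(void)
{
	allocator_ready = true;
}

/*
 * Initcall time: walk all recorded hstates and register files for the
 * ones that are large enough. Returns the number registered, or -1 if
 * called before the allocator is available.
 */
static int late_cgroup_file_init(void)
{
	int registered = 0;
	size_t i;

	if (!allocator_ready)
		return -1;

	for (i = 0; i < nr_hstates; i++) {
		if (hstates[i].order >= CGROUP_MIN_ORDER) {
			hstates[i].cgroup_files = true;
			registered++;
		}
	}
	return registered;
}
```

Deferring the registration to a single late pass is the same shape as the
patch: the early hook stays allocation-free, and the loop runs once all
hstates are known.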





Re: kexec and struct boot_params

2012-12-12 Thread Yinghai Lu
On Thu, Dec 6, 2012 at 10:57 PM, Yinghai Lu  wrote:
> On Wed, Dec 5, 2012 at 7:12 PM, Yinghai Lu  wrote:
>> On Wed, Dec 5, 2012 at 5:57 PM, H. Peter Anvin  wrote:
>>> Hi Eric,
>>>
>>> Could you give a list of struct boot_params field which are initialized
>>> properly by kexec?  I think we need to do the sentinel hack, and want to
>>> know what to whitelist.
>>
>> the one with *
>> ? means set only when lfb_depth > 8
>> also
>>0xa2 to 0x1df is not set
>>
>> struct x86_linux_param_header {
>> uint8_t  orig_x;/* 0x00 */  *
>> uint8_t  orig_y;/* 0x01 */  *
>> uint16_t ext_mem_k; /* 0x02 -- EXT_MEM_K sits 
>> here */   *
>> uint16_t orig_video_page;   /* 0x04 */  *
>> uint8_t  orig_video_mode;   /* 0x06 */  *
>> uint8_t  orig_video_cols;   /* 0x07 */  *
>> uint16_t unused2;   /* 0x08 */
>> uint16_t orig_video_ega_bx; /* 0x0a */  *
>> uint16_t unused3;   /* 0x0c */
>> uint8_t  orig_video_lines;  /* 0x0e */  *
>> uint8_t  orig_video_isVGA;  /* 0x0f */   *
>> uint16_t orig_video_points; /* 0x10 */   *
>>
>> /* VESA graphic mode -- linear frame buffer */
>> uint16_t lfb_width; /* 0x12 */   *
>> uint16_t lfb_height;/* 0x14 */   *
>> uint16_t lfb_depth; /* 0x16 */   *
>> uint32_t lfb_base;  /* 0x18 */   *
>> uint32_t lfb_size;  /* 0x1c */   *
>> uint16_t cl_magic;  /* 0x20 */   *
>> #define CL_MAGIC_VALUE 0xA33F
>> uint16_t cl_offset; /* 0x22 */   *
>> uint16_t lfb_linelength;/* 0x24 */   *
>> uint8_t  red_size;  /* 0x26 */   ?
>> uint8_t  red_pos;   /* 0x27 */   ?
>> uint8_t  green_size;/* 0x28 */   ?
>> uint8_t  green_pos; /* 0x29 */   ?
>> uint8_t  blue_size; /* 0x2a */   ?
>> uint8_t  blue_pos;  /* 0x2b */   ?
>> uint8_t  rsvd_size; /* 0x2c */   ?
>> uint8_t  rsvd_pos;  /* 0x2d */   ?
>> uint16_t vesapm_seg;/* 0x2e */   *
>> uint16_t vesapm_off;/* 0x30 */
>> uint16_t pages; /* 0x32 */   *
>> uint8_t  reserved4[12]; /* 0x34 -- 0x3f reserved for 
>> future expansion */
>>
>> struct apm_bios_info apm_bios_info; /* 0x40 */   *
>> struct drive_info_struct drive_info;/* 0x80 */   *
>> struct sys_desc_table sys_desc_table;   /* 0xa0 */   * only .length =
>> 0, aka 0xa2 to 0x1df is not set
>> uint32_t alt_mem_k; /* 0x1e0 */  *
>> uint8_t  reserved5[4];  /* 0x1e4 */
>> uint8_t  e820_map_nr;   /* 0x1e8 */  *
>> uint8_t  eddbuf_entries;/* 0x1e9 */  *
>> uint8_t  edd_mbr_sig_buf_entries;   /* 0x1ea */  *
>> uint8_t  reserved6[6];  /* 0x1eb */
>> HEADER. copied and or 
>> meset 0 and set.
>> uint8_t  reserved16[0x290 - 0x248]; /* 0x248 */
>> uint32_t edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];   /* 0x290 */  *
>> #endif
>> struct  e820entry e820_map[E820MAX];/* 0x2d0 */   *
>> uint8_t _pad8[48];  /* 0xcd0 */
>> struct  edd_info eddbuf[EDDMAXNR];  /* 0xd00 */   *
>> /* 0xeec */
>> #define COMMAND_LINE_SIZE 2048
>> };
>>
>>
>>>
>>> Also, does kexec set a bootloader ID?
>>
>> no, 0xff
>
> Hi, Peter,
>
> What's your decision about this?
>
> Do you mean have one boot_params mask in initdata and AND that with
> boot_params from bootloader
> to clean not used bytes?
>
> So later will not need to check
> if (boot_params.hdr.xloadflags & USE_EXT_BOOT_PARAMS)
> ?
>
> I worked out other patches that remove kdump 896M limitation.
> would like to post those patches to get more testing.
> those are needed for bigger system with lots of pcie devices.


ping!
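
If the masking approach asked about above were used, the core operation
would be a byte-wise AND of the incoming parameter block with a whitelist
mask. The sketch below only illustrates that operation: apply_param_mask()
is a hypothetical name, and nothing here reflects how the kernel actually
sanitizes struct boot_params.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: keep only the bytes whitelisted by 'mask'
 * (0xff = trusted, 0x00 = cleared) in a parameter block of 'len' bytes.
 */
static void apply_param_mask(uint8_t *params, const uint8_t *mask, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		params[i] &= mask[i];
}
```

Fields a boot loader is known to initialize would get 0xff in the mask;
everything else is zeroed regardless of what was left in memory.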


Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-12 Thread Alex Shi
On 12/13/2012 10:17 AM, Alex Shi wrote:
> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
>> During the creation of sched_domain, we define a pack buddy CPU for each CPU
>> when one is available. We want to pack at all levels where a group of CPU can
>> be power gated independently from others.
>> On a system that can't power gate a group of CPUs independently, the flag is
>> set at all sched_domain level and the buddy is set to -1. This is the default
>> behavior.
>> On a dual clusters / dual cores system which can power gate each core and
>> cluster independently, the buddy configuration will be :
>>
>>   | Cluster 0   | Cluster 1   |
>>   | CPU0 | CPU1 | CPU2 | CPU3 |
>> ---
>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
>>
>> Small tasks tend to slip out of the periodic load balance so the best place
>> to choose to migrate them is during their wake up. The decision is in O(1) as
>> we only check again one buddy CPU
> 
> Just have a little worry about the scalability on a big machine, like on
> a 4 sockets NUMA machine * 8 cores * HT machine, the buddy cpu in whole
> system need care 64 LCPUs. and in your case cpu0 just care 4 LCPU. That
> is different on task distribution decision.

In the big-machine example above, a single buddy CPU per level is not
sufficient. At the 4-socket level, for example, the tasks may only fill
2 sockets; using just those 2 sockets would be more performance/power
efficient. But with a single buddy CPU, the tasks have to be spread
across all 4 sockets.




Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-12 Thread NeilBrown
On Wed, 12 Dec 2012 10:20:34 -0600 Jon Hunter  wrote:

> 
> On 12/12/2012 05:31 AM, Thierry Reding wrote:
> > On Wed, Dec 12, 2012 at 07:24:30PM +1100, NeilBrown wrote:
> 
> [snip]
> 
> >> +static int omap_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
> >> +{
> >> +  struct omap_chip *omap = to_omap_chip(chip);
> >> +  int status = 0;
> >> +
> >> +  /* Enable the counter--always--before attempting to write its
> >> +   * registers and then set the timer to its minimum load value to
> >> +   * ensure we get an overflow event right away once we start it.
> >> +   */
> > 
> > Block comments should be in the following format:
> > 
> > /*
> >  * foo...
> >  * bar...
> >  */
> > 
> >> +
> >> +  omap_dm_timer_enable(omap->dm_timer);
> >> +  omap_dm_timer_write_counter(omap->dm_timer, DM_TIMER_LOAD_MIN);
> >> +  omap_dm_timer_start(omap->dm_timer);
> >> +  omap_dm_timer_disable(omap->dm_timer);
> > 
> > So omap_dm_timer_disable() doesn't actually stop the timer? It just
> > disables the access to the registers?
> 
> I thought this looked odd too ;-)
> 
> So what is going on here is that omap_dm_timer_start() calls
> omap_dm_timer_enable() but does not call omap_dm_timer_disable(). So the
> last disable really just complements the first enable (ie. decrements
> the use count), but the timer will not actually be disabled, because the
> start has called an extra enable.
> 
> These four function calls can be replaced by one call to
> omap_dm_timer_set_load_start() and I think that will be much clearer and
> concise.

So it now reads:


/*
 * Set the timer to its minimum load value to ensure we get an
 * overflow event right away once we start it.
 */

omap_dm_timer_set_load_start(omap->dm_timer, true, DM_TIMER_LOAD_MIN);


Certainly more concise - thanks.
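
The use-count behaviour described in the quoted explanation (start takes its own enable reference, so the trailing disable only balances the first enable) can be modelled with a toy counter. This is a sketch only: the toy_* names are hypothetical and this is not the omap_dm_timer implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a use-counted timer; all names here are hypothetical. */
struct toy_timer {
	int use_count;	/* outstanding enable references */
	bool running;	/* set once the counter has been started */
};

void toy_enable(struct toy_timer *t)
{
	t->use_count++;
}

void toy_disable(struct toy_timer *t)
{
	if (t->use_count > 0)
		t->use_count--;
}

/* Like the start call described above: it takes its own reference. */
void toy_start(struct toy_timer *t)
{
	toy_enable(t);
	t->running = true;
}

/* The original four-call sequence, minus the register write. */
int toy_four_call_sequence(struct toy_timer *t)
{
	toy_enable(t);	/* use_count: 0 -> 1 */
	toy_start(t);	/* use_count: 1 -> 2 */
	toy_disable(t);	/* use_count: 2 -> 1, timer stays enabled */
	return t->use_count;
}
```

In this model the sequence leaves one reference held and the timer running, which is why folding the calls into a single load-and-start call loses nothing.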


> 
> In general, it should not be necessary to call these
> omap_dm_timer_enable/disable APIs directly. I am not sure what the
> history is or if there is a use-case that really requires this. So in
> the future may be I should make them static so they cannot be used
> directly :-)

I've removed the other instance of these calls - in omap_pwm_config.


Thanks,
NeilBrown




Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-12-12 Thread Benjamin Herrenschmidt
On Wed, 2012-12-12 at 16:30 -0700, Alex Williamson wrote:

> Locked page accounting in this version is very, very broken.  How do
> powerpc folks feel about seemingly generic kernel iommu interfaces
> messing with the current task mm?  Besides that, more problems below...

Not good at all :-)

I don't understand tho ... H_PUT_TCE calls should be in the qemu context
(or the guest) as current at the point of the call, so everything should
be accounted fine on the *current* task when those calls occur, what's
the point of the work queue Alexey ?

This code looks horribly complicated ... where does it come from ?

> > +/*
> > + * iommu_reset_table is called when it started/stopped being used.
> > + *
> > + * restore==true says to bring the iommu_table into the state as it was
> > + * before being used by VFIO.
> > + */
> > +void iommu_reset_table(struct iommu_table *tbl, bool restore)
> > +{
> > +   /* Page#0 is marked as used in iommu_init_table, so we clear it... */
> > +   if (!restore && (tbl->it_offset == 0))
> > +   clear_bit(0, tbl->it_map);
> > +
> > +   iommu_clear_tces(tbl, tbl->it_offset, tbl->it_size);
> 
> This does locked page accounting and unpins pages, even on startup when
> the pages aren't necessarily pinned or accounted against the current
> process.

Not sure what you mean, Alex, and I'm not sure either what Alexey's
implementation actually does, but indeed, pages inside an iommu table
that was used by the host don't have their refcount elevated by the fact
that they are there.

So when taking ownership of an iommu for vfio, you probably need to FAIL
if any page is already mapped. Only once you know the iommu is clear for
use, then you can start populating it and account for anything you put
in it (and de-account anything you remove from it when cleaning things
up).
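
The "fail if anything is already mapped" rule suggested above could look roughly like the following userspace model. It is a sketch under assumptions: struct toy_table and both functions are hypothetical stand-ins, and it ignores the reserved page 0 that the quoted patch handles separately.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for an IOMMU table's allocation bitmap. */
struct toy_table {
	unsigned long *map;	/* one bit per table entry */
	size_t nbits;		/* number of entries */
	bool owned;		/* true once handed to the external user */
};

/* Returns true if no entry in the table is currently in use. */
bool toy_table_is_empty(const struct toy_table *tbl)
{
	size_t i;

	for (i = 0; i < tbl->nbits; i++) {
		unsigned long word = tbl->map[i / TOY_BITS_PER_LONG];

		if (word & (1UL << (i % TOY_BITS_PER_LONG)))
			return false;
	}
	return true;
}

/* Take ownership only if nothing is mapped; 0 on success, -1 if busy. */
int toy_take_ownership(struct toy_table *tbl)
{
	if (tbl->owned || !toy_table_is_empty(tbl))
		return -1;
	tbl->owned = true;
	return 0;
}
```

Because ownership is refused while any entry is in use, everything that later appears in the table was put there by the new owner, so accounting on insert and de-accounting on removal stay balanced.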

> > +
> > +   /* ... or restore  */
> > +   if (restore && (tbl->it_offset == 0))
> > +   set_bit(0, tbl->it_map);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_reset_table);
> > +
> > +/*
> > + * Returns the number of used IOMMU pages (4K) within
> > + * the same system page (4K or 64K).
> > + *
> > + * syspage_weight_zero is optimized for expected case == 0
> > + * syspage_weight_one is optimized for expected case > 1
> > + * Other case are not used in this file.
> > + */
> > +#if PAGE_SIZE == IOMMU_PAGE_SIZE
> > +
> > +#define syspage_weight_zero(map, offset)   test_bit((map), (offset))
> > +#define syspage_weight_one(map, offset)    test_bit((map), (offset))
> > +
> > +#elif PAGE_SIZE/IOMMU_PAGE_SIZE == 16
> > +
> > +static int syspage_weight_zero(unsigned long *map, unsigned long offset)
> > +{
> > +   offset &= PAGE_MASK >> IOMMU_PAGE_SHIFT;
> > +   return 0xffffUL & (map[BIT_WORD(offset)] >>
> > +   (offset & (BITS_PER_LONG-1)));
> > +}
> 
> I would have expected these to be bools and return true if the weight
> matches the value.

What is that business anyway ? It's very obscure.

> If you replaced 0xffff above w/ this, would you need the #error below?
> 
> (1UL << (PAGE_SIZE/IOMMU_PAGE_SIZE)) - 1)
> 
> > +
> > +static int syspage_weight_one(unsigned long *map, unsigned long offset)
> > +{
> > +   int ret = 0, nbits = PAGE_SIZE/IOMMU_PAGE_SIZE;
> > +
> > +   /* Aligns TCE entry number to system page boundary */
> > +   offset &= PAGE_MASK >> IOMMU_PAGE_SHIFT;
> > +
> > +   /* Count used 4K pages */
> > +   while (nbits && (ret < 2)) {
> 
> Don't you have a ffs()?  Could also be used for _zero.  Surely there are
> some bitops helpers that could help here even on big endian.  hweight
> really doesn't work?
> 
> > +   if (test_bit(offset, map))
> > +   ++ret;
> > +
> > +   --nbits;
> > +   ++offset;
> > +   }
> > +
> > +   return ret;
> > +}
> > +#else
> > +#error TODO: support other page size
> > +#endif

What combinations do you support ?
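
The two review suggestions quoted above (a mask computed from the page-size ratio instead of a hard-coded constant, and a population count instead of a bit-by-bit loop) can be combined in a userspace sketch. Assumptions: the toy_* names are hypothetical, 'ratio' stands for PAGE_SIZE/IOMMU_PAGE_SIZE, and it must be a power of two smaller than the width of unsigned long.

```c
#include <assert.h>
#include <limits.h>

#define TOY_BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Portable population count (number of set bits). */
static unsigned int toy_popcount(unsigned long v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Count the used entries in the group of 'ratio' bitmap bits that
 * contains 'offset'. Assumes 'ratio' is a power of two smaller than
 * TOY_BITS_PER_LONG, so a group never straddles two words.
 */
unsigned int toy_group_weight(const unsigned long *map, unsigned long offset,
			      unsigned long ratio)
{
	unsigned long mask = (1UL << ratio) - 1;
	unsigned long word;

	offset &= ~(ratio - 1);	/* align down to the start of the group */
	word = map[offset / TOY_BITS_PER_LONG];
	return toy_popcount((word >> (offset % TOY_BITS_PER_LONG)) & mask);
}
```

A single function like this covers both the ratio 1 and ratio 16 cases, so the per-ratio #if branches and the #error fallback would not be needed for any power-of-two ratio that fits in a word.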

> > +static void tce_flush(struct iommu_table *tbl)
> > +{
> > +   /* Flush/invalidate TLB caches if necessary */
> > +   if (ppc_md.tce_flush)
> > +   ppc_md.tce_flush(tbl);
> > +
> > +   /* Make sure updates are seen by hardware */
> > +   mb();
> > +}
> > +
> > +/*
> > + * iommu_clear_tces clears tces and returned the number of system pages
> > + * which it called put_page() on
> > + */
> > +static long clear_tces_nolock(struct iommu_table *tbl, unsigned long entry,
> > +   unsigned long pages)
> > +{
> > +   int i, retpages = 0, clr;
> > +   unsigned long oldtce, oldweight;
> > +   struct page *page;
> > +
> > +   for (i = 0; i < pages; ++i, ++entry) {
> > +   if (!test_bit(entry - tbl->it_offset, tbl->it_map))
> > +   continue;
> > +
> > +   oldtce = ppc_md.tce_get(tbl, entry);
> > +   ppc_md.tce_free(tbl, entry, 1);
> > +
> > +   oldweight = syspage_weight_one(tbl->it_map,
> > +   entry - tbl->it_offset);
> > +   clr = __test_and_clear_bit(entry - tbl->it_offset,
> > +   tbl->it_map);
> > +
> > 

Re: linux-next: unusual update of the security tree

2012-12-12 Thread Stephen Rothwell
Hi James,

On Fri, 7 Dec 2012 10:21:31 +1100 (EST) James Morris  wrote:
>
> On Thu, 6 Dec 2012, Linus Torvalds wrote:
> 
> > Have people pulled that thing into anything else? Because quite
> > frankly, I think it's unsalvageable except with a rebase.
> 
> AFAIK, only developers such as Casey will have pulled it for development 
> purposes.
> 
> And sorry, I should be checking the trees I pull from more carefully.

Are you going to fix this before asking Linus to pull?
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-12-12 Thread Alexey Kardashevskiy

On 13/12/12 10:30, Alex Williamson wrote:

On Wed, 2012-12-12 at 23:34 +1100, Alexey Kardashevskiy wrote:

This patch initializes IOMMU groups based on the IOMMU
configuration discovered during the PCI scan on POWERNV
(POWER non virtualized) platform. The IOMMU groups are
to be used later by VFIO driver (PCI pass through).

It also implements an API for mapping/unmapping pages for
guest PCI drivers and providing DMA window properties.
This API is going to be used later by QEMU-VFIO to handle
h_put_tce hypercalls from the KVM guest.

Although this driver has been tested only on the POWERNV
platform, it should work on any platform which supports
TCE tables.

To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
option and configure VFIO as required.

Cc: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
  arch/powerpc/include/asm/iommu.h |   10 ++
  arch/powerpc/kernel/iommu.c  |  329 ++
  arch/powerpc/platforms/powernv/pci.c |  134 ++
  drivers/iommu/Kconfig|8 +
  4 files changed, 481 insertions(+)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index cbfe678..3c861ae 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -76,6 +76,9 @@ struct iommu_table {
struct iommu_pool large_pool;
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
+#ifdef CONFIG_IOMMU_API
+   struct iommu_group *it_group;
+#endif
  };

  struct scatterlist;
@@ -147,5 +150,12 @@ static inline void iommu_restore(void)
  }
  #endif

+extern void iommu_reset_table(struct iommu_table *tbl, bool restore);
+extern long iommu_clear_tces(struct iommu_table *tbl, unsigned long ioba,
+   unsigned long size);
+extern long iommu_put_tces(struct iommu_table *tbl, unsigned long ioba,
+   uint64_t tce, enum dma_data_direction direction,
+   unsigned long size);
+
  #endif /* __KERNEL__ */
  #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ff5a6ce..f3bb2e7 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -44,6 +45,7 @@
  #include 
  #include 
  #include 
+#include 

  #define DBG(...)

@@ -856,3 +858,330 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
free_pages((unsigned long)vaddr, get_order(size));
}
  }
+
+#ifdef CONFIG_IOMMU_API
+/*
+ * SPAPR TCE API
+ */
+
+struct vwork {
+   struct mm_struct*mm;
+   longnpage;
+   struct work_struct  work;
+};
+
+/* delayed decrement/increment for locked_vm */
+static void lock_acct_bg(struct work_struct *work)
+{
+   struct vwork *vwork = container_of(work, struct vwork, work);
+   struct mm_struct *mm;
+
+   mm = vwork->mm;
+   down_write(&mm->mmap_sem);
+   mm->locked_vm += vwork->npage;
+   up_write(&mm->mmap_sem);
+   mmput(mm);
+   kfree(vwork);
+}
+
+static void lock_acct(long npage)
+{
+   struct vwork *vwork;
+   struct mm_struct *mm;
+
+   if (!current->mm)
+   return; /* process exited */
+
+   if (down_write_trylock(&current->mm->mmap_sem)) {
+   current->mm->locked_vm += npage;
+   up_write(&current->mm->mmap_sem);
+   return;
+   }
+
+   /*
+* Couldn't get mmap_sem lock, so must setup to update
+* mm->locked_vm later. If locked_vm were atomic, we
+* wouldn't need this silliness
+*/
+   vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
+   if (!vwork)
+   return;
+   mm = get_task_mm(current);
+   if (!mm) {
+   kfree(vwork);
+   return;
+   }
+   INIT_WORK(&vwork->work, lock_acct_bg);
+   vwork->mm = mm;
+   vwork->npage = npage;
+   schedule_work(&vwork->work);
+}


Locked page accounting in this version is very, very broken.  How do
powerpc folks feel about seemingly generic kernel iommu interfaces
messing with the current task mm?  Besides that, more problems below...


+
+/*
+ * iommu_reset_table is called when it started/stopped being used.
+ *
+ * restore==true says to bring the iommu_table into the state as it was
+ * before being used by VFIO.
+ */
+void iommu_reset_table(struct iommu_table *tbl, bool restore)
+{
+   /* Page#0 is marked as used in iommu_init_table, so we clear it... */
+   if (!restore && (tbl->it_offset == 0))
+   clear_bit(0, tbl->it_map);
+
+   iommu_clear_tces(tbl, tbl->it_offset, tbl->it_size);


This does locked page accounting and unpins pages, even on startup when
the pages aren't necessarily pinned or accounted against the current
process.

>

+
+   /* ... or restore  */
+   if (restore && (tbl->it_offset == 0))
+   

Re: [PATCH] OMAP: add pwm driver using dmtimers.

2012-12-12 Thread NeilBrown
On Wed, 12 Dec 2012 12:31:45 +0100 Thierry Reding
 wrote:

> On Wed, Dec 12, 2012 at 07:24:30PM +1100, NeilBrown wrote:
> > 
> > 
> > This patch is based on an earlier patch by Grant Erickson
> > which provided pwm devices using the 'legacy' interface.
> > 
> > This driver instead uses the new framework interface.
> 
> I'd prefer some kind of description about the driver here.

I'm not really sure what more there is to say.  There was a bit of text in a
comment at the top of the file which I've copied to the commit comment.


>Also the
> subject should be something like:
> 
>   pwm: Add OMAP support using dual-mode timers
> 
> or
> 
>   pwm: omap: Add PWM support using dual-mode timers

Done - I chose the second.

> 
> I take this description to mean that OMAP doesn't have dedicated PWM
> hardware? Otherwise it might be bad to call this pwm-omap.

Correct.  The timers can be used for a number of things which explicitly
includes PWM.

> 
> Also please use all-caps when referring to PWM devices in prose. A few
> other comments inline below.

OK.

> 
> > Cc: Grant Erickson 
> > Signed-off-by: NeilBrown 
> > 
> > diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> > index ed81720..7df573a 100644
> > --- a/drivers/pwm/Kconfig
> > +++ b/drivers/pwm/Kconfig
> > @@ -85,6 +85,15 @@ config PWM_MXS
> >   To compile this driver as a module, choose M here: the module
> >   will be called pwm-mxs.
> >  
> > +config PWM_OMAP
> > +   tristate "OMAP pwm support"
> 
> "OMAP PWM support"

Fixed.

> 
> > diff --git a/drivers/pwm/pwm-omap.c b/drivers/pwm/pwm-omap.c
> [...]
> > + *The 'id' number for the device encodes the number of the dm timer
> > + *to use, and the polarity of the output.
> > + *lsb is '1' of active-high, and '0' for active low
> > + *remaining bit a timer number and need to be shifted down before use.
> 
> I don't know if this is such a good idea. Usually you number platform
> devices sequentially, while this would leave gaps in the numbering. I
> know that adding platform data may sound a bit like overkill, but I
> really think the added clarity and consistency is worth it.

I guess so.  No other PWM driver seems to use platform data, and I needed so
little...
I'll see what I can do.


> 
> > +#define pr_fmt(fmt) "pwm-omap: " fmt
> 
> You don't seem to be using any of the pr_*() logging functions, so this
> isn't needed.

Gone now, thanks.


> 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#define DM_TIMER_LOAD_MIN  0xFFFE
> > +
> > +struct omap_chip {
> > +   struct platform_device  *pdev;
>
> I don't see this field being used anywhere.

No.  Gone.

> 
> > +   struct omap_dm_timer*dm_timer;
> > +   unsigned intpolarity;
> 
> The PWM subsystem already has enum pwm_polarity for this.
> 

I'll use that then, and as there is a pwm_set_polarity() interface, that
probably means that I don't need to configure the polarity via the platform
data?  That would be a lot cleaner.


> > +   const char  *label;
> 
> This isn't used anywhere either.

Gone.

> 
> > +
> > +   unsigned intduty_ns, period_ns;
> > +   struct pwm_chip chip;
> > +};
> > +
> > +#define to_omap_chip(chip) container_of(chip, struct omap_chip, chip)
> > +
> > +#definepwm_dbg(_pwm, msg...) dev_dbg(&(_pwm)->pdev->dev, msg)
> 
> This is never used.

:-)  There is a theme here.


> 
> > +
> > +/**
> > + * pwm_calc_value - determines the counter value for a clock rate and period.
> 
> Nit: You should either start the sentence with a capital or not
> terminate it with a full stop.

In this case the sentence really includes the function name which is
case-sensitive so cannot be capitalised ;-)
I'll rephrase a bit and find something to capitalise.

> 
> > + * @clk_rate: The clock rate, in Hz, of the PWM's clock source to compute the
> > + *counter value for.
> > + * @ns: The period, in nanoseconds, to computer the counter value for.
> 
> "compute"

Yep.

> 
> > + *
> > + * Returns the PWM counter value for the specified clock rate and period.
> > + */
> > +static inline int pwm_calc_value(unsigned long clk_rate, int ns)
> > +{
> > +   const unsigned long nanoseconds_per_second = 1000000000;
> 
> Maybe use NSEC_PER_SEC instead?

Good idea, thanks.

> 
> > +   int cycles;
> > +   __u64 c;
> 
> I think for in-kernel use, the custom is to stick with simply u64.

It is, yes.


> 
> > +
> > +   c = (__u64)clk_rate * ns;
> > +   do_div(c, nanoseconds_per_second);
> > +   cycles = c;
> > +
> > +   return DM_TIMER_LOAD_MIN - cycles;
> 
> Can't you just do "DM_TIMER_LOAD_MIN - c" and get rid of the cycles
> variable altogether?

Yep.
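
With the review comments above applied (a nanoseconds-per-second constant, a plain 64-bit type, no intermediate 'cycles' variable), the calculation might look like the userspace sketch below. Assumptions: toy_calc_value and TOY_NSEC_PER_SEC are hypothetical names, the timer's minimum load value is passed in as a parameter rather than taken from the patch, and ordinary 64-bit division stands in for the kernel's do_div().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/*
 * Convert a period in nanoseconds to timer cycles at 'clk_rate' Hz and
 * subtract the result from 'load_min'. The multiplication is done in
 * 64 bits so that clk_rate * ns cannot overflow on 32-bit targets.
 */
uint32_t toy_calc_value(uint32_t load_min, unsigned long clk_rate,
			unsigned int ns)
{
	uint64_t c = (uint64_t)clk_rate * ns;

	c /= TOY_NSEC_PER_SEC;
	return load_min - (uint32_t)c;
}
```

For instance, a 32768 Hz clock and a one-second period give 32768 cycles, so the returned value sits 32768 counts below the supplied minimum load value.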

> 
> > +static int omap_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
> > +{
> > +   struct omap_chip *omap = 

Re: [PATCH 2/6] sched: add a new SD SHARE_POWERLINE flag for sched_domain

2012-12-12 Thread Alex Shi
On 12/12/2012 09:31 PM, Vincent Guittot wrote:
> This new flag SD_SHARE_POWERDOMAIN is used to reflect whether groups of CPU in
> a sched_domain level can or not reach a different power state. If clusters can
> be power gated independently, as an example, the flag should be cleared at CPU
> level. This information is used to decide if it's worth packing some tasks in
> a group of CPUs in order to power gated the other groups instead of spreading
> the tasks. The default behavior of the scheduler is to spread tasks so the
> flag is set into all sched_domains
> 
> Signed-off-by: Vincent Guittot 
> ---
>  arch/ia64/include/asm/topology.h |1 +
>  arch/tile/include/asm/topology.h |1 +
>  include/linux/sched.h|1 +
>  include/linux/topology.h |4 
>  kernel/sched/core.c  |6 ++
>  5 files changed, 13 insertions(+)
> 
> diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
> index a2496e4..6d0b617 100644
> --- a/arch/ia64/include/asm/topology.h
> +++ b/arch/ia64/include/asm/topology.h
> @@ -65,6 +65,7 @@ void build_cpu_to_node_map(void);
>   | SD_BALANCE_EXEC   \
>   | SD_BALANCE_FORK   \
>   | SD_WAKE_AFFINE,   \
> + | arch_sd_local_flags(0)\
>   .last_balance   = jiffies,  \
>   .balance_interval   = 1,\
>   .nr_balance_failed  = 0,\
> diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
> index d5e86c9..adc8710 100644
> --- a/arch/tile/include/asm/topology.h
> +++ b/arch/tile/include/asm/topology.h
> @@ -71,6 +71,7 @@ static inline const struct cpumask *cpumask_of_node(int node)
>   | 0*SD_WAKE_AFFINE  \
>   | 0*SD_SHARE_CPUPOWER   \
>   | 0*SD_SHARE_PKG_RESOURCES  \
> + | arch_sd_local_flags(0)\
>   | 0*SD_SERIALIZE\
>   ,   \
>   .last_balance   = jiffies,  \
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 046e39a..3287be1 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -844,6 +844,7 @@ enum cpu_idle_type {
>  #define SD_BALANCE_WAKE  0x0010  /* Balance on wakeup */
>  #define SD_WAKE_AFFINE   0x0020  /* Wake task to waking CPU */
>  #define SD_SHARE_CPUPOWER0x0080  /* Domain members share cpu power */
> +#define SD_SHARE_POWERDOMAIN 0x0100  /* Domain members share power domain */
>  #define SD_SHARE_PKG_RESOURCES   0x0200  /* Domain members share cpu pkg resources */
>  #define SD_SERIALIZE 0x0400  /* Only a single load balancing instance */
>  #define SD_ASYM_PACKING  0x0800  /* Place busy groups earlier in the domain */
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index d3cf0d6..3eab293 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -99,6 +99,8 @@ int arch_update_cpu_topology(void);
>   | 1*SD_WAKE_AFFINE  \
>   | 1*SD_SHARE_CPUPOWER   \
>   | 1*SD_SHARE_PKG_RESOURCES  \
> + | arch_sd_local_flags(SD_SHARE_CPUPOWER|\
> + SD_SHARE_PKG_RESOURCES) \
>   | 0*SD_SERIALIZE\
>   | 0*SD_PREFER_SIBLING   \
>   | arch_sd_sibling_asym_packing()\
> @@ -131,6 +133,7 @@ int arch_update_cpu_topology(void);
>   | 1*SD_WAKE_AFFINE  \
>   | 0*SD_SHARE_CPUPOWER   \
>   | 1*SD_SHARE_PKG_RESOURCES  \
> + | arch_sd_local_flags(SD_SHARE_PKG_RESOURCES)\
>   | 0*SD_SERIALIZE\
>   ,   \
>   .last_balance   = jiffies,  \
> @@ -161,6 +164,7 @@ int arch_update_cpu_topology(void);
>   | 1*SD_WAKE_AFFINE  \
>   | 0*SD_SHARE_CPUPOWER   \
>   | 0*SD_SHARE_PKG_RESOURCES  \
> + | arch_sd_local_flags(0)\

The general style seems to prefer using the SD_XXX flags directly.


Re: [PATCH] X86/acpi: remove redundant logic of acpi memory hotadd

2012-12-12 Thread Jiang Liu
On 2012-12-12 22:37, Liu, Jinsong wrote:
> Wen Congyang wrote:
>> At 12/08/2012 06:19 AM, Rafael J. Wysocki Wrote:
>>> On Tuesday, December 04, 2012 01:39:54 AM Liu, Jinsong wrote:
>>>> Resend it, add Rafael and linux-a...@vger.kernel.org
>>>
>>> I wonder what memory hotplug people think about that.
>>>
>>> Thanks,
>>> Rafael
>>>
>>>
>>>> ===
>>>> From 1d39279e45c54ce531691da5ffe261e7689dd92c Mon Sep 17 00:00:00 2001
>>>> From: Liu Jinsong
>>>> Date: Wed, 14 Nov 2012 18:52:06 +0800
>>>> Subject: [PATCH] X86/acpi: remove redundant logic of acpi memory hotadd
>>>>
>>>> When memory hotadd, acpi_memory_enable_device has already been done
>>>> at drv->ops.add (acpi_memory_device_add), no need to do it again
>>>> at notify callback.
>>>>
>>>> At acpi_memory_enable_device, acpi_memory_get_device_resources
>>>> is also a redundant action, since it has been done at drv->ops.add.
>>>>
>>>> Signed-off-by: Liu Jinsong
>>>> ---
>>>>  drivers/acpi/acpi_memhotplug.c |   17 -
>>>>  1 files changed, 0 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>>>> index 24c807f..a6489fd 100644
>>>> --- a/drivers/acpi/acpi_memhotplug.c
>>>> +++ b/drivers/acpi/acpi_memhotplug.c
>>>> @@ -220,15 +220,6 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>>>    struct acpi_memory_info *info;
>>>>    int node;
>>>>
>>>> -
>>>> -  /* Get the range from the _CRS */
>>>> -  result = acpi_memory_get_device_resources(mem_device);
>>>> -  if (result) {
>>>> -  printk(KERN_ERR PREFIX "get_device_resources failed\n");
>>>> -  mem_device->state = MEMORY_INVALID_STATE;
>>>> -  return result;
>>>> -  }
>>>> -
>>>>    node = acpi_get_node(mem_device->device->handle);
>>>>    /*
>>>>     * Tell the VM there is more memory here...
>>>> @@ -357,14 +348,6 @@ static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
>>>>    break;
>>>>    }
>>>>
>>>> -  if (acpi_memory_check_device(mem_device))
>>>> -  break;
>>
>> Hmm, if acpi_memory_check_device() fails, it means the memory device
>> disappears 
>> I don't know if a real hardware uses this way to remove memory device.
>>
>>>> -
>>>> -  if (acpi_memory_enable_device(mem_device)) {
>>>> -  printk(KERN_ERR PREFIX "Cannot enable memory device\n");
>>>> -  break;
>>>> -  }
>>
>> If acpi_memory_get_device() doesn't fail, it means that the device
>> has been managed by this driver, so I think we can do this cleanup.
>>
>> Thanks
>> Wen Congyang
>>
> 
> Thanks! any comments from Huawei side, Jiang?
Hi Jinsong,

We think it's ok.

acpi_memory_device_notify
    acpi_memory_get_device
        acpi_memory_device_add
            acpi_memory_get_device_resources
            acpi_memory_enable_device
                acpi_memory_get_device_resources (redundant)
    acpi_memory_check_device (redundant)
    acpi_memory_enable_device (redundant)



Re: [GIT] Networking

2012-12-12 Thread Linus Torvalds
On Wed, Dec 12, 2012 at 6:27 PM, David Miller  wrote:
>
> There are two SCTP HMAC cookie algorithms, MD5 and SHA1.
>
> What used to happen is that you had to choose one at build
> time, and then you were stuck with that decision and it was
> all that you could use.
>
> Now, it's selectable at run time.
>
> If there's anything you find particularly anti-social about
> this, I'm sure we can adjust it.

So I'd suggest doing the same thing that the new thermal throttling
Kconfig does: start off by asking for the default algorithm, then ask
about the others.

The "choice" part selects the one that is default (so it never gets
asked about and is obviously compiled in), and the rest default to no
like we should.

See drivers/thermal/Kconfig for an example of this. I think we do it
in other places too, but that one happens to be new so I picked it as
an example.

The rule should be that we *never* default anything to 'yes', unless
it's old functionality that we always compiled in before too, and now
it got made conditional. So if you see a "default y" on new options,
you should basically consider it broken.

We're already bloating too much, we should not encourage people to
make things more bloated than necessary.

Btw, that Kconfig option has basically no useful help text either.
What's the point of repeating the question as a "help" message?

If people can't explain why anybody should enable it, it sure as hell
shouldn't default to 'y'. Maybe it shouldn't exist at all?

Linus


Re: [PATCH] USB: adds support for Novatek wireless mouse to usbhid

2012-12-12 Thread draekko.dev
I followed what was listed here:
http://www.kernel.org/doc/Documentation/SubmittingPatches
I won't be resubmitting. I've already spent way too much time on a few
lines of code, and I've already been through this with the Ubuntu guys,
amongst other reasons. I can email you the file as an attachment, but
that's the extent of my involvement at this point.

Wishing y'all a nice evening,
Ben

PS. Resubmitting this email as it for some reason was only sent to Greg.

On Wed, Dec 12, 2012 at 8:50 PM, Greg KH  wrote:
> On Wed, Dec 12, 2012 at 08:43:32PM -0500, Benoit Touchette wrote:
>> This patch adds support for the Novatek wireless mouse (device id
>> 0603:1602) to usbhid and fixes bug
>> https://bugzilla.kernel.org/show_bug.cgi?id=47031
>>
>> Reported-by: Benoit Touchette 
>> Signed-off-by: Benoit Touchette 
>>
>> ---
>>
>> --- linux-3.6.10/drivers/hid/hid-ids.h.orig2012-12-10
>> 14:13:27.0 -0500
>> +++ linux-3.6.10/drivers/hid/hid-ids.h2012-12-12 20:06:41.0 -0500
>
> Please use the scripts/get_maintainer.pl tool to send this patch to the
> correct people and mailing list.
>
> Also, your patch is line-wrapped, and can't be applied, you need to look
> into fixing your email client to not do this if you wish to get patches
> accepted.
>
> greg k-h


Re: 答复: [TRIVIAL PATCH 15/26] unicore32: Convert print_symbol to %pSR

2012-12-12 Thread Joe Perches
On Thu, 2012-12-13 at 09:00 +0800, Guan Xuetao wrote:
> Confirmed: the patch below compiles cleanly, with no warning messages.

There is no unicore32 cross compiler available
on kernel.org.

ftp://ftp.kernel.org/pub/tools/crosstool/index.html

These are unsigned long cast to (void *)
What is the warning and how does it differ
from any existing warning?

> > diff --git a/arch/unicore32/kernel/process.c
[]
> > @@ -169,8 +169,9 @@ void __show_regs(struct pt_regs *regs)
> > init_utsname()->release,
> > (int)strcspn(init_utsname()->version, " "),
> > init_utsname()->version);
> > -   print_symbol("PC is at %s\n", instruction_pointer(regs));
> > -   print_symbol("LR is at %s\n", regs->UCreg_lr);
> > +   printk(KERN_DEFAULT "PC is at %pSR\n",
> > +  (void *)instruction_pointer(regs));
> > +   printk(KERN_DEFAULT "LR is at %pSR\n", (void *)regs->UCreg_lr);
> > printk(KERN_DEFAULT "pc : [<%08lx>]lr : [<%08lx>]psr: %08lx\n"
> >"sp : %08lx  ip : %08lx  fp : %08lx\n",
> > regs->UCreg_pc, regs->UCreg_lr, regs->UCreg_asr,




Re: [GIT] Networking

2012-12-12 Thread David Miller
From: Linus Torvalds 
Date: Wed, 12 Dec 2012 18:15:04 -0800

> On Wed, Dec 12, 2012 at 12:11 PM, David Miller  wrote:
>>
>> There is one merge conflict to resolve in net/sched/cls_cgroup.c,
>> one commit changes the name of some members to "css_*" (this came
>> from Tejun's tree) and another commit adds an "attach" method.
> 
> There's more than that. The ARM board mess is apparently now affecting
> the networking merges too.
> 
> I fixed it up. Hopefully correctly.
> 
> Also, why does the new SHA1 hmac cookie support default to 'y'?

There are two SCTP HMAC cookie algorithms, MD5 and SHA1.

What used to happen is that you had to choose one at build
time, and then you were stuck with that decision and it was
all that you could use.

Now, it's selectable at run time.

If there's anything you find particularly anti-social about
this, I'm sure we can adjust it.


[GIT PULL] regulator updates for v3.8

2012-12-12 Thread Mark Brown
The following changes since commit 9489e9dcae718d5fde988e4a684a0f55b5f94d17:

  Linux 3.7-rc7 (2012-11-25 17:59:19 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git tags/regulator-3.8

for you to fetch changes up to 4ffc45c3604dd8e283884ce006faf0e955cbd9e6:

  Merge remote-tracking branch 'regulator/topic/sm8767' into regulator-next (2012-12-11 21:44:55 +0900)



regulator: Updates for v3.8

A fairly quiet release again, a couple of relatively small new features
and a bunch of driver specific work including yet more code elimination
and fixes from Axel Lin.

- Addition of linear_min_sel for offsetting linear selectors in the
  helpers.
- Support for continuous voltage ranges for regulators with extremely
  high resolution.
- Drivers for AS3711, DA9055, MAX8973, TPS51632, TPS80031 and ARM vexpress.


Amit Daniel Kachhap (3):
  regulator: s5m8767: Fix to work when platform registers less regulators
  regulator: s5m8767: Fix to read the first DVS register.
  regulator: s5m8767: Fix to work even if no DVS gpio present

AnilKumar Ch (1):
  regulator: tps65910: fix BUG_ON() shown with vrtc regulator

Ashish Jangam (1):
  regulator: DA9055 regulator driver

Axel Lin (31):
  regulator: vexpress: Add terminating entry for vexpress_regulator_of_match table
  regulator: tps51632: Fix trivial typo for TPS51632 Kconfig help text
  regulator: da9055: Fix checking wrong value in da9055_gpio_init
  regulator: da9055: Properly handle voltage range that doesn't start with 0 offset
  regulator: max8973: Fix callback setting for max8973_dcdc_ops.enable
  regulator: tps80031: Trivial cleanups
  regulator: tps80031: Implement list_voltage and set n_voltages = 1 for fixed regulators
  regulator: tps65090: Add MODULE_ALIAS
  regulator: core: Allow specific minimal selector for starting linear mapping
  regulator: palmas: Use linear_min_sel and regulator_[map|list]_voltage_linear
  regulator: da9055: Use linear_min_sel and regulator_[map|list]_voltage_linear
  regulator: tps51632: Fix writing to wrong register when enable_pwm_dvfs is set
  regulator: da9055: Select maximum current in specific range for set_current_limit
  regulator: da9052: Ensure setting current limit within specific range
  regulator: as3711: Fix valid min_uV/max_UV checking in as3711_bound_check
  regulator: as3711: Fix the logic in as3711_sel_check
  regulator: pcf50633: Use linear_min_sel and regulator_[map|list]_voltage_linear
  regulator: tps51632: Use linear_min_sel and regulator_[map|list]_voltage_linear
  regulator: wm831x-dcdc: Ensure selected voltage falls within requested range
  regulator: wm831x-dcdc: Add MODULE_ALIAS for wm831x-boostp
  regulator: tps51632: Ensure [base|max]_voltage_uV pdata settings are valid
  regulator: palmas: Return raw register values as the selectors in [get|set]_voltage_sel
  regulator: palmas: Convert palmas_ops_smps to regulator_[get|set]_voltage_sel_regmap
  regulator: gpio-regulator: Add ifdef CONFIG_OF guard for regulator_gpio_of_match
  regulator: lp8788-ldo: Remove val array in lp8788_config_ldo_enable_mode
  regulator: tps80031: Convert tps80031_ldo_ops to linear_min_sel and list_voltage_linear
  regulator: lp8788-buck: Kill _gpio_request function
  regulator: max1586: Implement get_voltage_sel callback
  regulator: anatop: Use linear_min_sel with linear mapping
  regulator: anatop: Use regulator_[get|set]_voltage_sel_regmap
  regulator: core: Fix logic to determinate if regulator can change voltage

Bill Pemberton (3):
  regulator: remove use of __devexit_p
  regulator: remove use of __devinit
  regulator: remove use of __devexit

Frank Li (2):
  regulator: gpio-regulator: doc: Fix wrong key gpio-enable
  regulator: gpio-regulator: fix can't find regulator node in dt

Guennadi Liakhovetski (1):
  regulator: add a regulator driver for the AS3711 PMIC

Laxman Dewangan (11):
  regulator: tps51632: Add tps51632 regulator driver
  regulator: tps65090: Register all regulators in single probe call
  regulator: tps65090: rename driver name and regulator name
  regulator: tps65090: Add support for LDO regulators
  regulator: tps65090: add external control support for DCDC
  regulator: tps80031: add regulator driver for tps80031
  mfd: tps6586x: move regulator dt parsing to regulator driver
  regulator: tps80031: add prefix TPS80031 on common defines.
  regulator: max8973: add regulator driver support
  regulator: max8973: provide enable/disable if external control disabled
  regulator: tps6586x: fix build warning in debug build

Lee Jones (4):
  regulator: gpio-regulator: Allow use of GPIO controlled 

Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-12-12 Thread Benjamin Herrenschmidt
On Wed, 2012-12-12 at 07:34 -0700, Alex Williamson wrote:
> > But what would I put there?... IOMMU ID is more than enough at the moment 
> > and struct iommu_table does not have anything what would have made sense to 
> > show in the sysfs...
> 
> I believe David mentioned that PEs had user visible names.  Perhaps they
> match an enclosure location or something.  Group numbers are rather
> arbitrary and really have no guarantee of persistence.  Thanks, 

I agree. Make up something, for example domain[PE] or something like
that.

Cheers,
Ben.




Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-12 Thread Alex Shi
On 12/12/2012 09:31 PM, Vincent Guittot wrote:
> During the creation of sched_domain, we define a pack buddy CPU for each CPU
> when one is available. We want to pack at all levels where a group of CPU can
> be power gated independently from others.
> On a system that can't power gate a group of CPUs independently, the flag is
> set at all sched_domain level and the buddy is set to -1. This is the default
> behavior.
> On a dual clusters / dual cores system which can power gate each core and
> cluster independently, the buddy configuration will be :
> 
>   | Cluster 0   | Cluster 1   |
>   | CPU0 | CPU1 | CPU2 | CPU3 |
> ---
> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
> 
> Small tasks tend to slip out of the periodic load balance so the best place
> to choose to migrate them is during their wake up. The decision is in O(1) as
> we only check again one buddy CPU

I am a little worried about scalability on a big machine. On a 4-socket
NUMA machine with 8 cores per socket and HT, the buddy CPU for the whole
system has to take care of 64 logical CPUs, whereas in your example CPU0
only takes care of 4. That makes a difference to the task distribution
decision.
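
For reference, the group comparison that the quoted update_packing_domain()
performs can be sketched in user space as below. This is only a simplified
model: struct group, its fields and pick_buddy() are hypothetical stand-ins
for sched_group, sgp->power, group_weight and the selection loop, and the
walk over parent sched_domains is left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a sched_group. */
struct group {
	unsigned long power;  /* compute capacity of the whole group */
	unsigned int weight;  /* number of CPUs in the group */
	int first_cpu;        /* lowest CPU id in the group */
};

/*
 * Return the first CPU of the group with the lowest power per CPU.
 * groups[0] plays the role of the local group (n must be at least 1);
 * ties are broken in favour of the lowest CPU id. Cross-multiplication
 * is used so that no division is needed.
 */
static int pick_buddy(const struct group *groups, size_t n)
{
	const struct group *pack = &groups[0];
	size_t i;

	for (i = 1; i < n; i++) {
		const struct group *tmp = &groups[i];
		unsigned long lhs = tmp->power * pack->weight;
		unsigned long rhs = pack->power * tmp->weight;

		if (lhs > rhs)
			continue;	/* tmp has more power per CPU */
		if (lhs == rhs && tmp->first_cpu >= pack->first_cpu)
			continue;	/* equal, keep the lower CPU id */
		pack = tmp;
	}
	return pack->first_cpu;
}
```

The cost of one call grows with the number of groups at that domain level,
so the number of CPUs that end up sharing one buddy depends on the topology.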

> 
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/core.c  |1 +
>  kernel/sched/fair.c  |  110 ++
>  kernel/sched/sched.h |5 +++
>  3 files changed, 116 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4f36e9d..3436aad 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5693,6 +5693,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>   rcu_assign_pointer(rq->sd, sd);
>   destroy_sched_domains(tmp, cpu);
>  
> + update_packing_domain(cpu);
>   update_domain_cache(cpu);
>  }
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9916d41..fc93d96 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -163,6 +163,73 @@ void sched_init_granularity(void)
>   update_sysctl();
>  }
>  
> +
> +#ifdef CONFIG_SMP
> +/*
> + * Save the id of the optimal CPU that should be used to pack small tasks
> + * The value -1 is used when no buddy has been found
> + */
> +DEFINE_PER_CPU(int, sd_pack_buddy);
> +
> +/* Look for the best buddy CPU that can be used to pack small tasks
> + * We make the assumption that it isn't worth packing on CPUs that share the
> + * same powerline. We look for the 1st sched_domain without the
> + * SD_SHARE_POWERDOMAIN flag. Then we look for the sched_group with the lowest
> + * power per core based on the assumption that their power efficiency is
> + * better */
> +void update_packing_domain(int cpu)
> +{
> + struct sched_domain *sd;
> + int id = -1;
> +
> + sd = highest_flag_domain(cpu, SD_SHARE_POWERDOMAIN & SD_LOAD_BALANCE);
> + if (!sd)
> + sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
> + else
> + sd = sd->parent;
> +
> + while (sd && (sd->flags && SD_LOAD_BALANCE)) {
> + struct sched_group *sg = sd->groups;
> + struct sched_group *pack = sg;
> + struct sched_group *tmp;
> +
> + /*
> +  * The sched_domain of a CPU points on the local sched_group
> +  * and the 1st CPU of this local group is a good candidate
> +  */
> + id = cpumask_first(sched_group_cpus(pack));
> +
> + /* loop the sched groups to find the best one */
> + for (tmp = sg->next; tmp != sg; tmp = tmp->next) {
> + if (tmp->sgp->power * pack->group_weight >
> + pack->sgp->power * tmp->group_weight)
> + continue;
> +
> + if ((tmp->sgp->power * pack->group_weight ==
> + pack->sgp->power * tmp->group_weight)
> +  && (cpumask_first(sched_group_cpus(tmp)) >= id))
> + continue;
> +
> + /* we have found a better group */
> + pack = tmp;
> +
> + /* Take the 1st CPU of the new group */
> + id = cpumask_first(sched_group_cpus(pack));
> + }
> +
> + /* Look for another CPU than itself */
> + if (id != cpu)
> + break;
> +
> + sd = sd->parent;
> + }
> +
> + pr_debug("CPU%d packing on CPU%d\n", cpu, id);
> + per_cpu(sd_pack_buddy, cpu) = id;
> +}
> +
> +#endif /* CONFIG_SMP */
> +
>  #if BITS_PER_LONG == 32
>  # define WMULT_CONST (~0UL)
>  #else
> @@ -5083,6 +5150,46 @@ static bool numa_allow_migration(struct task_struct *p, int prev_cpu, int new_cp
>   return true;
>  }
>  
> +static bool is_buddy_busy(int cpu)
> +{
> + struct rq *rq = cpu_rq(cpu);
> +
> + /*
> +  * A busy buddy is a CPU with a high load or a small load with a lot of
> +  * running tasks.

Re: [PATCH v4 0/7] enable support for AMBA drivers under x86

2012-12-12 Thread H. Peter Anvin

On 12/12/2012 06:02 PM, H. Peter Anvin wrote:

Hmm... with this patchset on top of v3.7 I still get a bunch of warnings
and errors.

More details when the build completes (if it does.)  I might still push
out the branch to let Fengguang's test robot take it for a spin.



Build failure:

o.i386-allconfig/make.log:make[4]: *** [drivers/gpio/gpio-pl061.o] Error 1
o.i386-allconfig/make.log:make[3]: *** [drivers/gpio] Error 2

Warnings:

/home/hpa/kernel/tip.x86-amba/drivers/dma/amba-pl08x.c:848:3: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 4 has type ‘dma_addr_t’ [-Wformat]
/home/hpa/kernel/tip.x86-amba/drivers/dma/amba-pl08x.c:848:3: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 7 has type ‘dma_addr_t’ [-Wformat]
/home/hpa/kernel/tip.x86-amba/arch/x86/pci/sta2x11-fixup.c:186:2: warning: initialization from incompatible pointer type [enabled by default]
/home/hpa/kernel/tip.x86-amba/arch/x86/pci/sta2x11-fixup.c:186:2: warning: (near initialization for ‘sta2x11_dma_ops.free’) [enabled by default]
/home/hpa/kernel/tip.x86-amba/drivers/extcon/extcon-max77693.c:355:10: warning: ‘adclow’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/hpa/kernel/tip.x86-amba/drivers/extcon/extcon-max77693.c:355:10: warning: ‘adc1k’ may be used uninitialized in this function [-Wmaybe-uninitialized]
/home/hpa/kernel/tip.x86-amba/drivers/extcon/extcon-max77693.c:355:10: warning: ‘adc’ may be used uninitialized in this function [-Wmaybe-uninitialized]


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [GIT] Networking

2012-12-12 Thread Linus Torvalds
On Wed, Dec 12, 2012 at 12:11 PM, David Miller  wrote:
>
> There is one merge conflict to resolve in net/sched/cls_cgroup.c,
> one commit changes the name of some members to "css_*" (this came
> from Tejun's tree) and another commit adds an "attach" method.

There's more than that. The ARM board mess is apparently now affecting
the networking merges too.

I fixed it up. Hopefully correctly.

Also, why does the new SHA1 hmac cookie support default to 'y'?

  Linus


[patch resend 1/2] oprofile: move task handoff to oprofile

2012-12-12 Thread David Rientjes
In the discussion surrounding 83dbbdbb3866 ("android, lowmemorykiller:
remove task handoff notifier"), I promised to isolate the task handoff
notifier to oprofile.  The bug that was fixed in that commit occurred
because a task handoff notifier registered before oprofile was not
freeing the task_struct appropriately.  In a chain of multiple possible
notifiers, it's impossible to determine which one was called last to
actually do the freeing, which leads to a task_struct leak.

Oprofile is the only user of the task handoff notifier, so move it to the
oprofile code so nobody can use it accidentally again.  It will be the only
subsystem allowed to use task handoffs.
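
To make the ownership problem concrete, here is a small user-space model of
a notifier chain in which the caller only sees the return value of the last
callback. It is a sketch under stated assumptions: call_chain(),
claims_task(), ignores_task() and handoff() are hypothetical names, and the
early-stop handling of the real kernel notifier chains is left out.

```c
#include <stddef.h>

#define NOTIFY_DONE 0x0000 /* callback did not take the task */
#define NOTIFY_OK   0x0001 /* callback claims the task */

typedef int (*notifier_fn)(void *data);

/*
 * Simplified model of a notifier chain: every callback runs and only the
 * return value of the last one reaches the caller.
 */
static int call_chain(notifier_fn *chain, size_t n, void *data)
{
	int ret = NOTIFY_DONE;
	size_t i;

	for (i = 0; i < n; i++)
		ret = chain[i](data);
	return ret;
}

static int claims_task(void *data) { (void)data; return NOTIFY_OK; }
static int ignores_task(void *data) { (void)data; return NOTIFY_DONE; }

/* Mirrors profile_handoff_task(): 1 means "taken", 0 means "caller frees". */
static int handoff(notifier_fn *chain, size_t n, void *task)
{
	return call_chain(chain, n, task) == NOTIFY_OK ? 1 : 0;
}
```

With the chain { claims_task, ignores_task } the model reports 0 even though
a callback claimed the task, and with the order reversed it reports 1. The
single result cannot tell the caller which callback, if any, is responsible
for freeing the task once more than one is registered.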

Signed-off-by: David Rientjes 
---
 resend: no response to first posting on Nov 14

 drivers/oprofile/buffer_sync.c |   18 ++
 include/linux/profile.h|   35 +--
 kernel/profile.c   |   20 
 3 files changed, 31 insertions(+), 42 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -42,6 +42,24 @@ static LIST_HEAD(dead_tasks);
 static cpumask_var_t marked_cpus;
 static DEFINE_SPINLOCK(task_mortuary);
 static void process_task_mortuary(void);
+static ATOMIC_NOTIFIER_HEAD(task_free_notifier);
+
+static int task_handoff_register(struct notifier_block *n)
+{
+   return atomic_notifier_chain_register(&task_free_notifier, n);
+}
+
+static int task_handoff_unregister(struct notifier_block *n)
+{
+   return atomic_notifier_chain_unregister(&task_free_notifier, n);
+}
+
+int profile_handoff_task(struct task_struct *task)
+{
+   int ret;
+   ret = atomic_notifier_call_chain(&task_free_notifier, 0, task);
+   return (ret == NOTIFY_OK) ? 1 : 0;
+}
 
 /* Take ownership of the task struct and place it on the
  * list for processing. Only after two full buffer syncs
diff --git a/include/linux/profile.h b/include/linux/profile.h
--- a/include/linux/profile.h
+++ b/include/linux/profile.h
@@ -16,6 +16,7 @@
 struct proc_dir_entry;
 struct pt_regs;
 struct notifier_block;
+struct task_struct;
 
 #if defined(CONFIG_PROFILING) && defined(CONFIG_PROC_FS)
 void create_prof_cpu_mask(struct proc_dir_entry *de);
@@ -36,6 +37,18 @@ enum profile_type {
PROFILE_MUNMAP
 };
 
+#if defined(CONFIG_PROFILING) && defined(CONFIG_OPROFILE)
+/* task is dead, free task struct ? Returns 1 if
+ * the task was taken, 0 if the task should be freed.
+ */
+int profile_handoff_task(struct task_struct *task);
+#else
+static inline int profile_handoff_task(struct task_struct *task)
+{
+   return 0;
+}
+#endif /* CONFIG_PROFILING && CONFIG_OPROFILE */
+
 #ifdef CONFIG_PROFILING
 
 extern int prof_on __read_mostly;
@@ -62,23 +75,12 @@ static inline void profile_hit(int type, void *ip)
profile_hits(type, ip, 1);
 }
 
-struct task_struct;
-struct mm_struct;
-
 /* task is in do_exit() */
 void profile_task_exit(struct task_struct * task);
 
-/* task is dead, free task struct ? Returns 1 if
- * the task was taken, 0 if the task should be freed.
- */
-int profile_handoff_task(struct task_struct * task);
-
 /* sys_munmap */
 void profile_munmap(unsigned long addr);
 
-int task_handoff_register(struct notifier_block * n);
-int task_handoff_unregister(struct notifier_block * n);
-
 int profile_event_register(enum profile_type, struct notifier_block * n);
 int profile_event_unregister(enum profile_type, struct notifier_block * n);
 
@@ -111,16 +113,6 @@ static inline void profile_hit(int type, void *ip)
return;
 }
 
-static inline int task_handoff_register(struct notifier_block * n)
-{
-   return -ENOSYS;
-}
-
-static inline int task_handoff_unregister(struct notifier_block * n)
-{
-   return -ENOSYS;
-}
-
 static inline int profile_event_register(enum profile_type t, struct notifier_block * n)
 {
return -ENOSYS;
@@ -132,7 +124,6 @@ static inline int profile_event_unregister(enum profile_type t, struct notifier_
 }
 
 #define profile_task_exit(a) do { } while (0)
-#define profile_handoff_task(a) (0)
 #define profile_munmap(a) do { } while (0)
 
 static inline int register_timer_hook(int (*hook)(struct pt_regs *))
diff --git a/kernel/profile.c b/kernel/profile.c
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -137,7 +137,6 @@ int __ref profile_init(void)
 /* Profile event notifications */
 
 static BLOCKING_NOTIFIER_HEAD(task_exit_notifier);
-static ATOMIC_NOTIFIER_HEAD(task_free_notifier);
 static BLOCKING_NOTIFIER_HEAD(munmap_notifier);
 
 void profile_task_exit(struct task_struct *task)
@@ -145,30 +144,11 @@ void profile_task_exit(struct task_struct *task)
blocking_notifier_call_chain(&task_exit_notifier, 0, task);
 }
 
-int profile_handoff_task(struct task_struct *task)
-{
-   int ret;
-   ret = atomic_notifier_call_chain(&task_free_notifier, 0, task);
-   return (ret == NOTIFY_OK) ? 1 : 0;
-}
-
 void profile_munmap(unsigned long addr)
 

[patch resend 2/2] oprofile: simplify task handoff

2012-12-12 Thread David Rientjes
Since the task handoff notifier is isolated only to oprofile, it no
longer consists of a possible chain of notifiers.  Thus, it's easy to
replace it with a simple linked list when enabled, which is anytime that
the notifier would have been registered.
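
The idea can be sketched in user space as a flag plus an intrusive list.
This is a minimal single-threaded model, not the patch itself: struct task,
handoff_enabled, dead_list and handoff_task() are hypothetical names, and
the spinlock the kernel code takes around the list update is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct with an intrusive list link. */
struct task {
	int pid;
	struct task *next;
};

static bool handoff_enabled;   /* set while the profiler is syncing */
static struct task *dead_list; /* tasks parked for later freeing */

/*
 * Returns 1 if the task was taken (parked on dead_list), 0 if the
 * caller should free it. The kernel patch takes a spinlock around the
 * list update; this single-threaded sketch leaves locking out.
 */
static int handoff_task(struct task *t)
{
	if (!handoff_enabled)
		return 0;
	t->next = dead_list;
	dead_list = t;
	return 1;
}
```

Because there is exactly one consumer, the return value states
unambiguously who owns the task afterwards.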

Signed-off-by: David Rientjes 
---
 resend: no response to first posting on Nov 14

 drivers/oprofile/buffer_sync.c |   68 ++--
 1 file changed, 23 insertions(+), 45 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -42,24 +42,7 @@ static LIST_HEAD(dead_tasks);
 static cpumask_var_t marked_cpus;
 static DEFINE_SPINLOCK(task_mortuary);
 static void process_task_mortuary(void);
-static ATOMIC_NOTIFIER_HEAD(task_free_notifier);
-
-static int task_handoff_register(struct notifier_block *n)
-{
-   return atomic_notifier_chain_register(&task_free_notifier, n);
-}
-
-static int task_handoff_unregister(struct notifier_block *n)
-{
-   return atomic_notifier_chain_unregister(&task_free_notifier, n);
-}
-
-int profile_handoff_task(struct task_struct *task)
-{
-   int ret;
-   ret = atomic_notifier_call_chain(&task_free_notifier, 0, task);
-   return (ret == NOTIFY_OK) ? 1 : 0;
-}
+static bool task_handoff_enabled __read_mostly;
 
 /* Take ownership of the task struct and place it on the
  * list for processing. Only after two full buffer syncs
@@ -68,15 +51,17 @@ int profile_handoff_task(struct task_struct *task)
  * Can be invoked from softirq via RCU callback due to
  * call_rcu() of the task struct, hence the _irqsave.
  */
-static int
-task_free_notify(struct notifier_block *self, unsigned long val, void *data)
+int profile_handoff_task(struct task_struct *task)
 {
unsigned long flags;
-   struct task_struct *task = data;
-   spin_lock_irqsave(&task_mortuary, flags);
-   list_add(&task->tasks, &dead_tasks);
-   spin_unlock_irqrestore(&task_mortuary, flags);
-   return NOTIFY_OK;
+
+   if (task_handoff_enabled) {
+   spin_lock_irqsave(&task_mortuary, flags);
+   list_add(&task->tasks, &dead_tasks);
+   spin_unlock_irqrestore(&task_mortuary, flags);
+   return 1;
+   }
+   return 0;
 }
 
 
@@ -143,10 +128,6 @@ module_load_notify(struct notifier_block *self, unsigned long val, void *data)
 }
 
 
-static struct notifier_block task_free_nb = {
-   .notifier_call  = task_free_notify,
-};
-
 static struct notifier_block task_exit_nb = {
.notifier_call  = task_exit_notify,
 };
@@ -173,33 +154,30 @@ int sync_start(void)
if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
return -ENOMEM;
 
-   err = task_handoff_register(&task_free_nb);
-   if (err)
-   goto out1;
+   task_handoff_enabled = true;
+   barrier();
err = profile_event_register(PROFILE_TASK_EXIT, &task_exit_nb);
if (err)
-   goto out2;
+   goto out1;
err = profile_event_register(PROFILE_MUNMAP, &munmap_nb);
if (err)
-   goto out3;
+   goto out2;
err = register_module_notifier(&module_load_nb);
if (err)
-   goto out4;
+   goto out3;
 
start_cpu_work();
-
-out:
-   return err;
-out4:
-   profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
+   return 0;
 out3:
-   profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
+   profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
 out2:
-   task_handoff_unregister(&task_free_nb);
-   free_all_tasks();
+   profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
 out1:
+   task_handoff_enabled = false;
+   barrier();
+   free_all_tasks();
free_cpumask_var(marked_cpus);
-   goto out;
+   return err;
 }
 
 
@@ -209,7 +187,7 @@ void sync_stop(void)
unregister_module_notifier(&module_load_nb);
profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
-   task_handoff_unregister(&task_free_nb);
+   task_handoff_enabled = false;
barrier();  /* do all of the above first */
 
flush_cpu_work();

