Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Andrew Morton
On Fri, 30 Nov 2012 15:01:26 +0800 Lin Feng  wrote:

> 
> 
> On 11/30/2012 01:57 PM, Andrew Morton wrote:
> > On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng  wrote:
> > 
> >> hi Andrew,
> >>
> >> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> >>> Tricky.
> >>>
> >>> I expect the same problem would occur with pages which are under
> >>> O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
> >>> periods, but the durations could still be lengthy (seconds).
> >> the offline retry timeout duration is 2 minutes, so O_DIRECT pages
> >> do not seem to be a problem for the moment.
> >>>
> >>> Worse is a futex page, which could easily remain pinned indefinitely.
> >>>
> >>> The best I can think of is to make changes in or around
> >>> get_user_pages(), to steal the pages from userspace and replace them
> >>> with non-movable ones before pinning them.  The performance cost of
> >>> something like this would surely be unacceptable for direct-io, but
> >>> maybe OK for the aio ring and futexes.
> >> thanks for your advice.
> >> I want to keep the impact as small as possible. As mentioned above,
> >> direct-io does not seem to be a problem, so we needn't touch it. Maybe
> >> we can just change the use of get_user_pages() (in or around it) for
> >> cases such as the aio ring pages. I will try to find a way to do this.
> > 
> > What about futexes?
> hi Andrew,
> 
> Yes, better to find an approach to solve them all.
>  
> But I'm worried that if we just confine get_user_pages() to use
> non-movable pages, it will drain the non-movable pages soon, because
> there are many places using get_user_pages(), such as some drivers.

Obviously we shouldn't change get_user_pages() for all callers.

> IMHO in most cases get_user_pages() callers should release the pages soon,
> so pages allocated from the movable zone should be OK. But I'm not sure
> whether such a rule holds for get_user_pages().
> In the other cases we would tell get_user_pages() to allocate pages from
> the non-movable zone.
>
> So could we add a zone-alloc flag when we call get_user_pages()?

Well, that's a fairly low-level implementation detail.  A more typical
approach would be to add a new get_user_pages_non_movable() or such. 
That would probably have the same signature as get_user_pages(), with
one additional argument.  Then get_user_pages() becomes a one-line
wrapper which passes in a particular value of that argument.

But that means we'd also have to add get_user_pages_fast_non_movable()
and things might become a bit stupid.  A better approach might be to
add a new library function which callers can use before (or after?)
calling get_user_pages[_fast]().

Unsure.  It's the sort of thing where one has to dive in and try a few
things.
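
A minimal user-space sketch of the wrapper pattern described above, under
stated assumptions: struct mock_page, enum pin_placement,
pin_pages_placement() and pin_pages() are hypothetical names invented for
illustration and are not kernel APIs. It only shows how an extended variant
with one extra argument lets the original entry point shrink to a one-line
wrapper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock page with only the fields this sketch needs (not struct page). */
struct mock_page {
    bool movable;   /* true if the page sits in a movable zone */
    int pin_count;  /* number of outstanding pins */
};

/* Hypothetical extra argument: where may a pinned page live? */
enum pin_placement {
    PIN_ANY_ZONE,     /* pin the page where it is */
    PIN_NON_MOVABLE,  /* replace movable pages before pinning */
};

/*
 * Hypothetical extended variant: the plain call's arguments plus one
 * placement argument. Returns the number of pages pinned.
 */
static long pin_pages_placement(struct mock_page *pages, size_t nr_pages,
                                enum pin_placement placement)
{
    size_t i;

    for (i = 0; i < nr_pages; i++) {
        /* Stand-in for "steal the page and replace it". */
        if (placement == PIN_NON_MOVABLE && pages[i].movable)
            pages[i].movable = false;
        pages[i].pin_count++;
    }
    return (long)nr_pages;
}

/* The original entry point becomes a one-line wrapper. */
static long pin_pages(struct mock_page *pages, size_t nr_pages)
{
    return pin_pages_placement(pages, nr_pages, PIN_ANY_ZONE);
}
```

Existing callers keep calling pin_pages() unchanged; only long-term pinners
such as the aio ring would switch to the extended variant.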
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-29 Thread Lucas Stach
On Friday, 30.11.2012 at 09:44 +0200, Terje Bergström wrote:
> On 29.11.2012 14:14, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 10:09:13AM +0100, Lucas Stach wrote:
> >> This way you would also be able to construct different handles (like GEM
> >> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
> >> not sure how useful this would be, but it seems like a reasonable design
> >> to me being able to do so.
> > 
> > Wouldn't that be useful for sharing buffers between DRM and V4L2 using
> > dma-buf? I'm not very familiar with how exactly importing and exporting
> > work with dma-buf, so maybe I need to read up some more.
> 
> I would still preserve the dma-buf support, for exactly this purpose.
> 
dma-buf is useful and should be preserved, as some userspace like
gstreamer might rely on us being able to import/export dma-buf handles
at some time. At the very latest we'll need it if someone wants to run a
UDL device to scanout a buffer rendered to by the internal GPU.

What I'm saying is just that with a common allocator we could cut down a
lot on the usage of dma-buf, where not really necessary. Also you might
be able to do some optimisations based on the fact that a dma-buf handle
exported for some V4L2 buffer, which gets imported into DRM to construct
a GEM object, is the very same nvhost object in the end.

Regards,
Lucas



Re: [RFC v2 3/8] video: tegra: host: Add channel and client support

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 01:00:40PM +0200, Terje Bergström wrote:
> On 29.11.2012 12:04, Thierry Reding wrote:
> > Looking some more at how this is used, I'm starting to think that it
> > might be easier to export the various handlers and allow them to be
> > passed to the nvhost_intr_add_action() explicitly.
> 
> Oh, so you mean like "nvhost_intr_add_action(intr, id, threshold,
> nvhost_intr_action_submit_complete, channel, waiter, priv), and
> nvhost_intr_action_submit_complete is the function pointer?
> 
> There's one case to take care of: we merge the waits for the jobs into
> one waiter to save us from having too many irq calls. Perhaps that could
> be handled by a flag, or something like that.

Yes, something like ACTION_MERGE or something should work fine.
Alternatively you could handle it by providing two public functions, one
which adds to the list of jobs that can be merged, the other that adds
to the list that cannot be merged.

> >> +struct nvhost_job *nvhost_job_alloc(struct nvhost_channel *ch,
> >> +                int num_cmdbufs, int num_relocs, int num_waitchks)
> >> +{
> >> +        struct nvhost_job *job = NULL;
> >> +        size_t size = job_size(num_cmdbufs, num_relocs, num_waitchks);
> >> +
> >> +        if (!size)
> >> +                return NULL;
> >> +        job = vzalloc(size);
> > 
> > Why vzalloc()?
> 
> I guess it's basically moot, but we tried that when we had some memory
> fragmentation issues and it was left even though we did find out it's
> not needed.

I think kzalloc() would be a better choice here. Also, while at it you
may want to make the num_* parameters unsigned.

> >> +        }
> >> +
> >> +        /* get current syncpt values for waitchk */
> >> +        for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
> >> +                nvhost_syncpt_update_min(sp, i);
> > 
> > Or since you only use the mask here, why not move the
> > nvhost_syncpt_update_min() into the above loop?
> 
> I want to call nvhost_syncpt_update_min() only once per syncpt register.
> If the job has 100 sync point increments for 2D sync point, I'd read the
> value from hardware 100 times, which is expensive.

Right, hadn't thought about the fact that you can have multiple waits
for a single syncpoint in the job.
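
The point about reading each sync point only once can be sketched in user
space as follows. This is an illustration under assumptions, not the nvhost
code: NB_SYNCPTS, mock_syncpt_refresh() and refresh_waited_syncpts() are
hypothetical names, and the "hardware read" is just a counter.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_SYNCPTS 32  /* assumed count, for illustration only */

static unsigned int hw_reads;  /* counts simulated register reads */

/* Stand-in for the expensive read that refreshes a cached value. */
static void mock_syncpt_refresh(unsigned int id)
{
    (void)id;
    hw_reads++;
}

/* Refresh every sync point referenced by the wait checks exactly once. */
static void refresh_waited_syncpts(const unsigned int *wait_ids,
                                   size_t nr_waits)
{
    uint32_t mask = 0;
    unsigned int id;
    size_t i;

    /* Pass 1: record which sync points appear, however often. */
    for (i = 0; i < nr_waits; i++)
        mask |= UINT32_C(1) << (wait_ids[i] % NB_SYNCPTS);

    /* Pass 2: one simulated read per distinct sync point. */
    for (id = 0; id < NB_SYNCPTS; id++)
        if (mask & (UINT32_C(1) << id))
            mock_syncpt_refresh(id);
}
```

With six wait checks that reference only sync points 5 and 9, the second
pass performs two reads rather than six.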

Looking at the code again, I see that you use sizeof(waitchk_mask) as
the third parameter to the for_each_set_bit() macro. However the size
parameter is to be specified in bits, not bytes.
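
To illustrate the bits-versus-bytes problem, here is a small user-space
sketch. count_bits_below() is a hypothetical helper that, like the size
argument of for_each_set_bit(), takes its limit in bits; it is not the
kernel macro itself.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits among the first nbits bits of mask (nbits in BITS). */
static unsigned int count_bits_below(uint64_t mask, size_t nbits)
{
    unsigned int n = 0;
    size_t i;

    for (i = 0; i < nbits && i < 64; i++)
        if (mask & (UINT64_C(1) << i))
            n++;
    return n;
}
```

For a 64-bit mask with bits 3 and 40 set, passing sizeof(mask) scans only
the first 8 bits and finds one bit, while sizeof(mask) * CHAR_BIT scans all
64 and finds both. In kernel code the same fix would presumably multiply by
BITS_PER_BYTE.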

Also the name nvhost_syncpt_update_min() has had me confused. You say it
is used to update the value that you've cached in software from the real
value in the register. However I interpret update_min() as "update the
minimum value in the register". Maybe something like *_load_min() would
be clearer.

> Thanks. I've collected a massive amount of feedback already. v3 will
> take quite a while to appear after we've finished all the reviews of v2.

Yes, that should keep you busy for quite a while. =) But I also think
we've made good progress so far.

Thierry




[PATCH] [trivial] treewide: Fix typos in various Kconfig

2012-11-29 Thread Masanari Iida
Correct spelling typo within various Kconfig.

Signed-off-by: Masanari Iida 
---
 arch/arm/mach-tegra/Kconfig |  2 +-
 arch/openrisc/Kconfig       |  2 +-
 drivers/gpio/Kconfig        |  2 +-
 drivers/mmc/host/Kconfig    |  2 +-
 drivers/thermal/Kconfig     |  2 +-
 lib/Kconfig.debug           | 10 +-
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm/mach-tegra/Kconfig b/arch/arm/mach-tegra/Kconfig
index 9ff6f6e..dd1ae01 100644
--- a/arch/arm/mach-tegra/Kconfig
+++ b/arch/arm/mach-tegra/Kconfig
@@ -55,7 +55,7 @@ config TEGRA_AHB
help
  Adds AHB configuration functionality for NVIDIA Tegra SoCs,
  which controls AHB bus master arbitration and some
- perfomance parameters(priority, prefech size).
+ performance parameters(priority, prefech size).
 
 choice
 prompt "Default low-level debug console UART"
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index e7f1a29..ec37e18 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -146,7 +146,7 @@ config DEBUG_STACKOVERFLOW
help
  Make extra checks for space available on stack in some
   critical functions. This will cause kernel to run a bit slower,
- but will catch most of kernel stack overruns and exit gracefuly.
+ but will catch most of kernel stack overruns and exit gracefully.
 
  Say Y if you are unsure.
 
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 9e3fb34..14abfb0 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -485,7 +485,7 @@ config GPIO_ADNP
help
  This option enables support for N GPIOs found on Avionic Design
  I2C GPIO expanders. The register space will be extended by powers
- of two, so the controller will need to accomodate for that. For
+ of two, so the controller will need to accommodate for that. For
  example: if a controller provides 48 pins, 6 registers will be
  enough to represent all pins, but the driver will assume a
  register layout for 64 pins (8 registers).
diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 3c769ba..87f2ceb 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -537,7 +537,7 @@ config MMC_DW_PLTFM
  If unsure, say Y.
 
 config MMC_DW_EXYNOS
-   tristate "Exynos specific extentions for Synopsys DW Memory Card Interface"
+   tristate "Exynos specific extensions for Synopsys DW Memory Card Interface"
depends on MMC_DW
select MMC_DW_PLTFM
help
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 266c15e..3ba12f1 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -53,7 +53,7 @@ config EXYNOS_THERMAL
depends on (ARCH_EXYNOS4 || ARCH_EXYNOS5) && THERMAL
select CPU_FREQ_TABLE
help
- If you say yes here you get support for TMU (Thermal Managment
+ If you say yes here you get support for TMU (Thermal Management
  Unit) on SAMSUNG EXYNOS series of SoC.
 
 config FAIR_SHARE
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6a4b500..37684ee 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1115,7 +1115,7 @@ config NOTIFIER_ERROR_INJECTION
depends on DEBUG_KERNEL
select DEBUG_FS
help
- This option provides the ability to inject artifical errors to
+ This option provides the ability to inject artificial errors to
  specified notifier chain callbacks. It is useful to test the error
  handling of notifier call chain failures.
 
@@ -1126,7 +1126,7 @@ config CPU_NOTIFIER_ERROR_INJECT
depends on HOTPLUG_CPU && NOTIFIER_ERROR_INJECTION
help
  This option provides a kernel module that can be used to test
- the error handling of the cpu notifiers by injecting artifical
+ the error handling of the cpu notifiers by injecting artificial
  errors to CPU notifier chain callbacks.  It is controlled through
  debugfs interface under /sys/kernel/debug/notifier-error-inject/cpu
 
@@ -1150,7 +1150,7 @@ config PM_NOTIFIER_ERROR_INJECT
depends on PM && NOTIFIER_ERROR_INJECTION
default m if PM_DEBUG
help
- This option provides the ability to inject artifical errors to
+ This option provides the ability to inject artificial errors to
  PM notifier chain callbacks.  It is controlled through debugfs
  interface /sys/kernel/debug/notifier-error-inject/pm
 
@@ -1173,7 +1173,7 @@ config MEMORY_NOTIFIER_ERROR_INJECT
tristate "Memory hotplug notifier error injection module"
depends on MEMORY_HOTPLUG_SPARSE && NOTIFIER_ERROR_INJECTION
help
- This option provides the ability to inject artifical errors to
+ This option provides the ability to inject artificial errors to
  memory hotplug notifier chain callbacks.  It is controlled 

Re: [RFC v2 8/8] drm: tegra: Add gr2d device

2012-11-29 Thread Terje Bergström
On 29.11.2012 14:14, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 10:09:13AM +0100, Lucas Stach wrote:
>> This way you would also be able to construct different handles (like GEM
>> obj or V4L2 buffers) from the same backing nvhost object. Note that I'm
>> not sure how useful this would be, but it seems like a reasonable design
>> to me being able to do so.
> 
> Wouldn't that be useful for sharing buffers between DRM and V4L2 using
> dma-buf? I'm not very familiar with how exactly importing and exporting
> work with dma-buf, so maybe I need to read up some more.

I would still preserve the dma-buf support, for exactly this purpose.

Terje



Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
Just replying to part of your mail.

On 30.11.2012 09:22, Thierry Reding wrote:
> Actually for the display controller we want just a notification when the
> VBLANK happens. I'm not sure if we want to do that with syncpoints at
> all since it works quite well using regular interrupts.

VBLANK isn't actually a very good example of dc's use of sync points.
That can easily be done with regular interrupts, as you mention.

More important is when we have double buffering enabled. When you draw
something to a surface, and flip it to display, you want DC to notify
when the flip has been done and rendering can continue to the back buffer.

So, what you can do is return a fence from DC when initiating a flip,
and place that fence into 2D stream as a host wait so that 2D will
patiently wait for buffer to become free before it renders.
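
A rough user-space sketch of the flip fence idea, under assumptions: the
fence is modelled as a (sync point id, threshold) pair, and all mock_*
names are hypothetical. The wrap-safe comparison is a common technique for
32-bit counters and is assumed here, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fence: "sync point id has reached thresh". */
struct mock_fence {
    uint32_t id;
    uint32_t thresh;
};

/* Simulated sync point counters; real hardware would hold these. */
static uint32_t mock_syncpt[4];

/* DC side: starting a flip returns a fence for the increment it causes. */
static struct mock_fence mock_dc_flip(uint32_t id)
{
    struct mock_fence f = { id, mock_syncpt[id] + 1 };
    return f;
}

/* DC side: the flip has completed, so the sync point is incremented. */
static void mock_dc_flip_done(uint32_t id)
{
    mock_syncpt[id]++;
}

/*
 * 2D side: a host wait is satisfied once the counter reaches thresh.
 * The signed difference keeps the test correct across 32-bit wraparound
 * (assumes the usual two's complement conversion).
 */
static bool mock_fence_expired(struct mock_fence f)
{
    return (int32_t)(mock_syncpt[f.id] - f.thresh) >= 0;
}
```

The 2D stream would block on the fence returned by mock_dc_flip() and
resume rendering to the back buffer once mock_fence_expired() holds.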

> What I'm proposing is to leave it up to each host1x client how they want
> to handle this. For display controllers it may be enough to have their
> callback run in interrupt context but other clients may need to do more
> work so they can queue it themselves.

DC doesn't need to worry about host1x interrupts at all. It's all
internal to the host1x driver, so we're now just talking about the
internal implementation of host1x.

We have two scenarios for the syncpt interrupts. One is that a job got
finished and we need to clean up the queue and free up resources. This
must be done in threads. Other is releasing a thread that is blocked by
a syncpt wait.

It's simpler if both of these are handled with the same infrastructure,
and we've shown that latency is very good even if we handle all events
in a thread.

> I know that this looks like it might be more work, but if it turns out
> that many drivers need to do the exact same thing, that functionality
> can be factored out into a helper. But it may just as well turn out that
> the requirements for each module are slightly different that forcing a
> workqueue on them could result in ugly workarounds because it doesn't
> quite work for them.

This is just driver internal, so there's no need for other drivers to
access this part.

> If we move responsibility of managing the workqueue out of host1x as I
> proposed above, maybe a lot of this code can be removed. Maybe you can
> explain a bit what they are used for exactly in your write-up.

It's going to be a big bad boy. :-)

Terje



Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 11:41:50AM -0700, Stephen Warren wrote:
> On 11/29/2012 01:44 AM, Thierry Reding wrote:
> > On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
> 
> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
> >> b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +/* Spacing between sync registers */
> >> +#define REGISTER_STRIDE 4
> > 
> > Erm... no. The usual way you should be doing this is either make
> > the register definitions account for the stride or use accessors
> > that apply the stride. You should be doing the latter anyway to
> > make accesses. For example:
> > 
> > static inline void host1x_syncpt_writel(struct host1x *host1x,
> >                                         unsigned long value,
> >                                         unsigned long offset)
> > {
> >         writel(value, host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> >                                                 unsigned long offset)
> > {
> >         return readl(host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > Alternatively, if you want to pass the register index instead of
> > the offset, you can just multiply the offset in that function:
> > 
> > writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> > 
> > The same can also be done with the non-syncpt registers.
> 
> It seems like reasonable documentation to replace "<< 2" with "*
> REGISTER_STRIDE" here.

Given that it is a very common pattern, << 2 seems enough documentation
to me, but sure, if you prefer to be extra explicit that's fine with me.

Thierry




Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 12:39:23PM +0200, Terje Bergström wrote:
> On 29.11.2012 10:44, Thierry Reding wrote:
> >> diff --git a/drivers/video/tegra/host/dev.c 
> >> b/drivers/video/tegra/host/dev.c
> >> index 98c9c9f..025a820 100644
> >> --- a/drivers/video/tegra/host/dev.c
> >> +++ b/drivers/video/tegra/host/dev.c
> >> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
> >>  }
> >>  EXPORT_SYMBOL(host1x_syncpt_read);
> >>
> >> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
> > 
> > The choice of data types is odd here. id refers to a syncpt so a better
> > choice would have been unsigned int because the size of the variable
> > doesn't actually matter. But as I already said in my reply to patch 1,
> > these are resources and should therefore better be abstracted through an
> > opaque pointer anyway.
> > 
> > timeout is usually signed long, so this function should reflect that. As
> > for the value this is probably fine as it will effectively be set from a
> > register value. Though you also cache them in software using atomics.
> 
> 32-bits is an architectural limit for the sync point id, so that's why I
> used it here.

But given that there are only 32 syncpoints they look rather costly, so
I don't expect more than a few hundred to ever be used in hardware,
right?

> But you're right - it doesn't really matter and could be changed to
> unsigned long.

I'd still opt for unsigned int. For no other reason than that it is how
other types of resources are enumerated.

> thresh and *value reflects that sync point value is 32-bit, and I'd keep
> that as is.

Yes, that makes sense.

> Timeout should be unsigned long, yes.

It should actually be signed long to match the type used for timeouts in
the various wait_*() functions.

> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
> >> b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "nvhost_intr.h"
> >> +#include "host1x/host1x.h"
> >> +
> >> +/* Spacing between sync registers */
> >> +#define REGISTER_STRIDE 4
> > 
> > Erm... no. The usual way you should be doing this is either make the
> > register definitions account for the stride or use accessors that apply
> > the stride. You should be doing the latter anyway to make accesses. For
> > example:
> > 
> > static inline void host1x_syncpt_writel(struct host1x *host1x,
> >                                         unsigned long value,
> >                                         unsigned long offset)
> > {
> >         writel(value, host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> >                                                 unsigned long offset)
> > {
> >         return readl(host1x->regs + SYNCPT_BASE + offset);
> > }
> >
> > Alternatively, if you want to pass the register index instead of the
> > offset, you can just multiply the offset in that function:
> >
> > writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> >
> > The same can also be done with the non-syncpt registers.
> 
> The register number has a stride of 4 when doing writes, and 1 when
> adding to command streams. This is why I've kept the register
> definitions as is.

Yes, that's why it makes sense to use such helpers. It allows you to
reuse the register definitions for both direct and indirect access but
doesn't require you to repeat the stride multiplication every time.
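
A user-space sketch of that idea, with the register aperture simulated by a
byte array. All mock_* names and MOCK_SYNCPT_REG() are hypothetical; the
only point is that the register definitions stay word indices (usable as-is
in command streams) while the byte stride is applied in one place.

```c
#include <stdint.h>
#include <string.h>

#define MOCK_SYNC_WORDS 16

/* Simulated register aperture, addressed in bytes like MMIO. */
static uint8_t mock_aperture[MOCK_SYNC_WORDS * 4];

/* Register definitions are word indices (stride 1). */
#define MOCK_SYNCPT_REG(n) (n)

/* Direct access: the helper applies the 4-byte stride. */
static void mock_sync_writel(uint32_t value, unsigned int reg_index)
{
    memcpy(&mock_aperture[reg_index << 2], &value, sizeof(value));
}

static uint32_t mock_sync_readl(unsigned int reg_index)
{
    uint32_t value;

    memcpy(&value, &mock_aperture[reg_index << 2], sizeof(value));
    return value;
}
```

Callers pass MOCK_SYNCPT_REG(3) to both the helpers and a command stream
builder, and never write "<< 2" or "* REGISTER_STRIDE" themselves.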

> I could add helper functions. Just as a side note, the sync register
> space has other definitions than just the syncpt registers, so the
> naming should be changed a bit.

The TRM refers to them as SYNC registers, so SYNC_BASE should be fine.

> >> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> >> +{
> >> +        struct nvhost_master *dev = dev_id;
> >> +        void __iomem *sync_regs = dev->sync_aperture;
> >> +        struct nvhost_intr *intr = &dev->intr;
> >> +        unsigned long reg;
> >> +        int i, id;
> >> +
> >> +        for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
> >> +                reg = readl(sync_regs +
> >> +                            host1x_sync_syncpt_thresh_cpu0_int_status_r() +
> >> +                            i * REGISTER_STRIDE);
> >> +                for_each_set_bit(id, &reg, BITS_PER_LONG) {
> >> +                        struct nvhost_intr_syncpt *sp =
> >> +                                intr->syncpt + (i * BITS_PER_LONG + id);
> >> +                        host1x_intr_syncpt_thresh_isr(sp);
> >> +                        queue_work(intr->wq, &sp->work);
> >> +                }
> >> +        }
> >> +
> >> +        return IRQ_HANDLED;
> >> +}
> > 
> > Maybe it would be better to call the syncpt handlers in interrupt
> > context and let them schedule work if they want to. I'm thinking about
> > the display controllers which may want 

Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Kamezawa Hiroyuki

(2012/11/30 14:57), Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng  wrote:
>
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> the offline retry timeout duration is 2 minutes, so O_DIRECT pages
>> do not seem to be a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them.  The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> thanks for your advice.
>> I want to keep the impact as small as possible. As mentioned above,
>> direct-io does not seem to be a problem, so we needn't touch it. Maybe
>> we can just change the use of get_user_pages() (in or around it) for
>> cases such as the aio ring pages. I will try to find a way to do this.
>
> What about futexes?

IIUC, futex's key is now a pair of (mm,address) or (inode, pgoff).
Then, get_user_page() in futex.c will release the page by put_page().
'struct page' is just touched by get_futex_key() to obtain page->mapping info.
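
A heavily simplified user-space sketch of that behaviour, under
assumptions: the mock_* types and mock_get_futex_key() are hypothetical,
and the real get_futex_key() handles more cases than shown. It only
illustrates that the page reference is held transiently while the key is
derived, so the key itself does not keep the page pinned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock page: a reference count plus what page->mapping would lead to. */
struct mock_page {
    int refcount;
    const void *inode;    /* stand-in for the inode behind page->mapping */
    unsigned long pgoff;  /* page offset within that object */
};

/* Key is either (mm, address) or (inode, pgoff). */
struct mock_futex_key {
    bool shared;
    const void *owner;    /* mm for private keys, inode for shared keys */
    unsigned long where;  /* address for private, pgoff for shared */
};

static struct mock_futex_key mock_get_futex_key(struct mock_page *page,
                                                const void *mm,
                                                unsigned long address,
                                                bool shared)
{
    struct mock_futex_key key;

    key.shared = shared;
    if (!shared) {
        /* Private futex: the page is not needed at all. */
        key.owner = mm;
        key.where = address;
        return key;
    }

    page->refcount++;         /* transient pin while the page is inspected */
    key.owner = page->inode;
    key.where = page->pgoff;
    page->refcount--;         /* dropped again before returning */
    return key;
}
```

After the call the page's reference count is back where it started, which
is the property relevant to memory offlining.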

Thanks,
-Kame







Re: [Suggestion] drivers/tty: drivers/char/: for MAX_ASYNC_BUFFER_SIZE

2012-11-29 Thread Chen Gang
On 2012-11-30 11:27, Paul Fulghum wrote:
> 
> I’m the maintainer for these drivers. I only caught this message by
> chance and
> have not had a chance to review the entire thread and original patches.
> It’s late and I’m tired so I won’t be able to look at this until tomorrow.
> 
> I do not doubt there is a problem that needs cleaning up. I just need a
> day to
> review and make sure this does not cause any problems.

  If it is indeed an issue, would it be suitable to let Paul Fulghum
provide the relevant patch?
For synclink, he is more expert than I am.
For testing and test environments, he is also more expert than I am.

  thanks.

-- 
Chen Gang

Asianux Corporation


Re: [PATCH 2/2] ARM: tegra: moving stuff away from mach/clk.h

2012-11-29 Thread Prashant Gaikwad

On Thursday 29 November 2012 11:55 PM, Stephen Warren wrote:
> On 11/28/2012 11:12 PM, Prashant Gaikwad wrote:
>> On Tuesday 27 November 2012 12:29 PM, Sivaram Nair wrote:
>>> This patch moves some stuff away from mach/clk.h to other mach-tegra
>>> files. This is part of the efforts to get rid of mach/clk.h which in
>>> turn will help to enable single zImage.
>>>
>>> Signed-off-by: Sivaram Nair 
>>> ---
>>>   arch/arm/mach-tegra/clock.c            |    1 +
>>>   arch/arm/mach-tegra/clock.h            |   12 ++--
>>>   arch/arm/mach-tegra/include/mach/clk.h |   11 ---
>>>   3 files changed, 11 insertions(+), 13 deletions(-)
>>
>> This patch will not add any real value since I have removed clk_cfg_ex
>> functionality in clock code rework.
>> It's just that I will have to rebase my changes on this.
>>
>> Reviewed-by: Prashant Gaikwad 
>
> In that case, I may as well not apply patch 2/2 at all, and I'll just
> take your clock code cleanup instead. I assume that will be posted for
> inclusion into 3.9?


Yes, I am targeting 3.9.


--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [PATCH 1/4] rtc: sa1100: enable/disable rtc when probe/remove the device

2012-11-29 Thread Haojian Zhuang
On Thu, Nov 29, 2012 at 6:25 PM, Russell King - ARM Linux
 wrote:
> On Wed, Nov 28, 2012 at 09:21:07PM -0500, Chao Xie wrote:
>> The original sa1100_rtc_open/sa1100_rtc_release will be called
>> when the /dev/rtc0 is opened or closed.
>> In fact, these two functions will enable/disable the clock, and
>> register/unregister the irqs.
>> User application will use /dev/rtc0 to read the rtc time or set
>> the alarm. The rtc should still run independently of opening/closing
>> the rtc device.
>> So only enable clock and register the irqs when probe the device,
>> and disable clock and unregister the irqs when remove the device.
>
> NAK.  I don't think you properly understand what's going on here if you
> think moving the entire open and release functions into the probe and
> remove functions is the right thing to do.

Since PXA27x & PXA3xx support two rtc devices at the same time, the
user can choose either rtc at run time. That is why clk & irq are set up
in open().

Chao,
So you shouldn't move them into probe().


Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Lin Feng


On 11/30/2012 01:57 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng  wrote:
> 
>> hi Andrew,
>>
>> On 11/30/2012 07:39 AM, Andrew Morton wrote:
>>> Tricky.
>>>
>>> I expect the same problem would occur with pages which are under
>>> O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
>>> periods, but the durations could still be lengthy (seconds).
>> the offline retry timeout duration is 2 minutes, so O_DIRECT pages
>> do not seem to be a problem for the moment.
>>>
>>> Worse is a futex page, which could easily remain pinned indefinitely.
>>>
>>> The best I can think of is to make changes in or around
>>> get_user_pages(), to steal the pages from userspace and replace them
>>> with non-movable ones before pinning them.  The performance cost of
>>> something like this would surely be unacceptable for direct-io, but
>>> maybe OK for the aio ring and futexes.
>> thanks for your advice.
>> I want to keep the impact as small as possible. As mentioned above,
>> direct-io does not seem to be a problem, so we needn't touch it. Maybe
>> we can just change the use of get_user_pages() (in or around it) for
>> cases such as the aio ring pages. I will try to find a way to do this.
> 
> What about futexes?
hi Andrew,

Yes, better to find an approach to solve them all.
 
But I'm worried that if we just confine get_user_pages() to use
non-movable pages, it will drain the non-movable pages soon, because
there are many places using get_user_pages(), such as some drivers.

IMHO in most cases get_user_pages() callers should release the pages soon,
so pages allocated from the movable zone should be OK. But I'm not sure
whether such a rule holds for get_user_pages().
In the other cases we would tell get_user_pages() to allocate pages from
the non-movable zone.

So could we add a zone-alloc flag when we call get_user_pages()?

Thanks,
linfeng

> 



Re: [PATCH]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings

2012-11-29 Thread Francois Romieu
Wang YanQing  :
[...]
> After add some debug code, I found this NIC only accept ethernet
> broadcast package, it can't filter out the package send to its
> MAC address, but it works good for sending.So ifconfig show the
> RX/TX status means it can receive ARP package.(It don't its MAC
> address, so below)

Which kernel version is it ?

[...]
> I haven't see any code to set GigaMAC registers in kernel when boot,
> so I guess BIOS or NIC's circuit make it, but of course one miss

I'd appreciate to figure it out (and understand why I did not notice
it when testing).

> the extended GigaMAC registers  in this problem. The probe code can
> get MAC address right, so MAC{0,4} must have been set, but some
> guys forget the extended GigaMAC registers.
> 
> This patch fix it.

It is a good analysis job.

I'd rather see the GigaMAC registers written through a call to
rtl_rar_set when the mac address is read in rtl_init_one instead
of duplicating most of rtl_rar_set in a quite different place.
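
A user-space sketch of that suggestion, under assumptions: struct mock_nic,
mock_rar_set() and mock_init_one() are hypothetical stand-ins, and the
"extended" fields are placeholders that do not reflect the real GigaMAC
register layout. The only point is that one helper programs every copy of
the address, and the probe path calls it instead of duplicating it.

```c
#include <stdint.h>

/* Mock device: basic MAC registers plus placeholder extended ones. */
struct mock_nic {
    uint32_t mac0, mac4;      /* basic MAC address registers */
    uint32_t ext_lo, ext_hi;  /* placeholders for extended registers */
    int has_ext;              /* nonzero if the chip has them */
};

/* Single place that programs every copy of the MAC address. */
static void mock_rar_set(struct mock_nic *nic, const uint8_t addr[6])
{
    uint32_t low = (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
                   ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
    uint32_t high = (uint32_t)addr[4] | ((uint32_t)addr[5] << 8);

    nic->mac4 = high;
    nic->mac0 = low;
    if (nic->has_ext) {
        nic->ext_lo = low;
        nic->ext_hi = high;
    }
}

/* Probe path: after reading the address, program it via the helper. */
static void mock_init_one(struct mock_nic *nic, const uint8_t addr[6])
{
    mock_rar_set(nic, addr);
}
```

Whether calling the helper this early disturbs the register init sequence
is exactly the open question raised for Hayes below.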

Hayes, can you specify if it would work or if it may mess the registers
init sequence ordering ?

Thanks.

-- 
Ueimor


Re: [GIT PULL] ARM: final SoC fixes for 3.7

2012-11-29 Thread Olof Johansson
On Thu, Nov 29, 2012 at 12:07 PM, Russell King - ARM Linux
 wrote:
> On Thu, Nov 29, 2012 at 02:24:25PM +, Arnd Bergmann wrote:
>> Hi Linus,
>>
>> These should be the last bug fixes you get from Olof and me for 3.7.
>> This is based on the previous one you pulled and there is nothing
>> spectacular in here. I'll follow up with a second pull request
>> knowing that those are probably too late now, but I'd like to let
>> you know about the ones that didn't make it and give you the chance
>> to pull those anyway if you prefer.
>
> I don't see anything in here for the OMAP warning that I've been reporting
> for the last two weeks.  What's going on with getting fixes merged into
> arm-soc?

I've merged it in now (for 3.8) -- the pull request from Tony had been
pending for a couple of days. Should mirror out shortly.


-Olof


Re: zram: fix invalid memory references during disk write

2012-11-29 Thread Nitin Gupta
(somehow mail didn't go to the stable email alias and [PATCH] prefix was 
not added. CC'ed stable now)


On 11/29/2012 10:45 PM, Nitin Gupta wrote:

Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
  - Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
  - Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.

Fixes bug 50081:
https://bugzilla.kernel.org/show_bug.cgi?id=50081

Upstream commit ID: c8f2f0d: zram: Fix handling of incompressible pages
Apply to versions: 3.6.5, 3.6.6, 3.6.7, 3.6.8

Cc:  # staging-next: 37b51fd: zram: factor-out
# zram_decompress_page() function
Signed-off-by: Nitin Gupta 
Reported-by: Mihail Kasadjikov 
Reported-by: Tomas M 
Reviewed-by: Minchan Kim 
---
  drivers/staging/zram/zram_drv.c |   39 ---
  1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index fb4a7c9..f2a73bd 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -265,7 +265,7 @@ out_cleanup:
  static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
   int offset)
  {
-   int ret;
+   int ret = 0;
size_t clen;
unsigned long handle;
struct page *page;
@@ -286,10 +286,8 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
goto out;
}
ret = zram_decompress_page(zram, uncmem, index);
-   if (ret) {
-   kfree(uncmem);
+   if (ret)
goto out;
-   }
}

/*
@@ -302,16 +300,18 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,

user_mem = kmap_atomic(page);

-   if (is_partial_io(bvec))
+   if (is_partial_io(bvec)) {
memcpy(uncmem + offset, user_mem + bvec->bv_offset,
   bvec->bv_len);
-   else
+   kunmap_atomic(user_mem);
+   user_mem = NULL;
+   } else {
uncmem = user_mem;
+   }

if (page_zero_filled(uncmem)) {
-   kunmap_atomic(user_mem);
-   if (is_partial_io(bvec))
-   kfree(uncmem);
+   if (!is_partial_io(bvec))
+   kunmap_atomic(user_mem);
zram_stat_inc(&zram->stats.pages_zero);
zram_set_flag(zram, index, ZRAM_ZERO);
ret = 0;
@@ -321,9 +321,11 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
ret = lzo1x_1_compress(uncmem, PAGE_SIZE, src, &clen,
   zram->compress_workmem);

-   kunmap_atomic(user_mem);
-   if (is_partial_io(bvec))
-   kfree(uncmem);
+   if (!is_partial_io(bvec)) {
+   kunmap_atomic(user_mem);
+   user_mem = NULL;
+   uncmem = NULL;
+   }

if (unlikely(ret != LZO_E_OK)) {
pr_err("Compression failed! err=%d\n", ret);
@@ -332,8 +334,10 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,

if (unlikely(clen > max_zpage_size)) {
zram_stat_inc(&zram->stats.bad_compress);
-   src = uncmem;
clen = PAGE_SIZE;
+   src = NULL;
+   if (is_partial_io(bvec))
+   src = uncmem;
}

handle = zs_malloc(zram->mem_pool, clen);
@@ -345,7 +349,11 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
}
cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);

+   if ((clen == PAGE_SIZE) && !is_partial_io(bvec))
+   src = kmap_atomic(page);
memcpy(cmem, src, clen);
+   if ((clen == PAGE_SIZE) && !is_partial_io(bvec))
+   kunmap_atomic(src);

zs_unmap_object(zram->mem_pool, handle);

@@ -358,9 +366,10 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
if (clen <= PAGE_SIZE / 2)
zram_stat_inc(&zram->stats.good_compress);

-   return 0;
-
  out:
+   if (is_partial_io(bvec))
+   kfree(uncmem);
+
if (ret)
zram_stat64_inc(zram, &zram->stats.failed_writes);
return ret;
--
1.7.10.4





[PATCH v2 2/2] zram: reduce metadata overhead

2012-11-29 Thread Nitin Gupta
Changelog v2 vs v1:
 - Use is_zero_page() instead of direct handle comparison
 - Use 1 as invalid handle value instead of -1 since handle
is unsigned and thus -1 may refer to a valid object. While 1
is guaranteed to be invalid since  can never
refer to (end of) a valid object.
 - Remove references to 'table' in comments and messages since
we just have a plain array of handles now.

For every allocated object, zram maintains the handle, size,
flags and count fields. Of these, only the handle is required
since zsmalloc now provides the object size given the handle.
The flags field was needed only to mark a given page as zero-filled.
Instead of this field, we now use an invalid handle value (1) to mark
such pages. Lastly, the count field was unused, so was simply removed.
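
A minimal userspace sketch of the sentinel idea described above, assuming
a plain array of handles in which 0 means "nothing stored" and 1 is the
invalid value reserved for zero-filled pages (as the v2 changelog states).
The names ZERO_PAGE_HANDLE, slot_kind, classify_handle and
count_zero_pages are made up for illustration and are not taken from the
patch:

```c
#include <stddef.h>

/* Hypothetical sentinel: per the v2 changelog, 1 never names a valid object. */
#define ZERO_PAGE_HANDLE 1UL

enum slot_kind { SLOT_EMPTY, SLOT_ZERO_PAGE, SLOT_OBJECT };

/* Classify one entry of the handle array without a per-slot flags field. */
static enum slot_kind classify_handle(unsigned long handle)
{
	if (handle == 0)
		return SLOT_EMPTY;
	if (handle == ZERO_PAGE_HANDLE)
		return SLOT_ZERO_PAGE;
	return SLOT_OBJECT;
}

/* Count the zero-filled pages recorded in a handle array. */
static size_t count_zero_pages(const unsigned long *handles, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (classify_handle(handles[i]) == SLOT_ZERO_PAGE)
			count++;
	return count;
}
```

The point of the design is that the zero-page state lives in the handle
itself, so no separate flags storage is needed per page.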

Signed-off-by: Nitin Gupta 
Reviewed-by: Jerome Marchand 
---
 drivers/staging/zram/zram_drv.c |   97 ---
 drivers/staging/zram/zram_drv.h |   20 ++--
 2 files changed, 43 insertions(+), 74 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index f2a73bd..e6c9bec 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -71,24 +71,6 @@ static void zram_stat64_inc(struct zram *zram, u64 *v)
zram_stat64_add(zram, v, 1);
 }
 
-static int zram_test_flag(struct zram *zram, u32 index,
-   enum zram_pageflags flag)
-{
-   return zram->table[index].flags & BIT(flag);
-}
-
-static void zram_set_flag(struct zram *zram, u32 index,
-   enum zram_pageflags flag)
-{
-   zram->table[index].flags |= BIT(flag);
-}
-
-static void zram_clear_flag(struct zram *zram, u32 index,
-   enum zram_pageflags flag)
-{
-   zram->table[index].flags &= ~BIT(flag);
-}
-
 static int page_zero_filled(void *ptr)
 {
unsigned int pos;
@@ -104,6 +86,11 @@ static int page_zero_filled(void *ptr)
return 1;
 }
 
+static inline int is_zero_page(unsigned long handle)
+{
+   return handle == zero_page_handle;
+}
+
 static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
 {
if (!zram->disksize) {
@@ -135,21 +122,20 @@ static void zram_set_disksize(struct zram *zram, size_t 
totalram_bytes)
 
 static void zram_free_page(struct zram *zram, size_t index)
 {
-   unsigned long handle = zram->table[index].handle;
-   u16 size = zram->table[index].size;
+   unsigned long handle = zram->handle[index];
+   size_t size;
 
-   if (unlikely(!handle)) {
-   /*
-* No memory is allocated for zero filled pages.
-* Simply clear zero page flag.
-*/
-   if (zram_test_flag(zram, index, ZRAM_ZERO)) {
-   zram_clear_flag(zram, index, ZRAM_ZERO);
-   zram_stat_dec(&zram->stats.pages_zero);
-   }
+   if (unlikely(!handle))
+   return;
+
+   if (is_zero_page(handle)) {
+   /* No memory is allocated for zero filled pages */
+   zram->handle[index] = 0;
+   zram_stat_dec(&zram->stats.pages_zero);
return;
}
 
+   size = zs_get_object_size(zram->mem_pool, handle);
if (unlikely(size > max_zpage_size))
zram_stat_dec(&zram->stats.bad_compress);
 
@@ -158,12 +144,10 @@ static void zram_free_page(struct zram *zram, size_t 
index)
if (size <= PAGE_SIZE / 2)
zram_stat_dec(&zram->stats.good_compress);
 
-   zram_stat64_sub(zram, &zram->stats.compr_size,
-   zram->table[index].size);
+   zram_stat64_sub(zram, &zram->stats.compr_size, size);
zram_stat_dec(&zram->stats.pages_stored);
 
-   zram->table[index].handle = 0;
-   zram->table[index].size = 0;
+   zram->handle[index] = 0;
 }
 
 static void handle_zero_page(struct bio_vec *bvec)
@@ -188,19 +172,20 @@ static int zram_decompress_page(struct zram *zram, char 
*mem, u32 index)
int ret = LZO_E_OK;
size_t clen = PAGE_SIZE;
unsigned char *cmem;
-   unsigned long handle = zram->table[index].handle;
+   unsigned long handle = zram->handle[index];
+   size_t objsize;
 
-   if (!handle || zram_test_flag(zram, index, ZRAM_ZERO)) {
+   if (!handle || is_zero_page(handle)) {
memset(mem, 0, PAGE_SIZE);
return 0;
}
 
+   objsize = zs_get_object_size(zram->mem_pool, handle);
cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
-   if (zram->table[index].size == PAGE_SIZE)
+   if (objsize == PAGE_SIZE)
memcpy(mem, cmem, PAGE_SIZE);
else
-   ret = lzo1x_decompress_safe(cmem, zram->table[index].size,
-   mem, &clen);
+   ret = lzo1x_decompress_safe(cmem, objsize, mem, &clen);
zs_unmap_object(zram->mem_pool, handle);
 
/* Should NEVER happen. Return bio error if it does. */
@@ 

[PATCH v2 1/2] zsmalloc: add function to query object size

2012-11-29 Thread Nitin Gupta
Changelog v2 vs v1:
 - None

Adds zs_get_object_size(handle) which provides the size of
the given object. This is useful since users (zram etc.)
no longer have to maintain object sizes separately, saving
on some metadata size (4b per page).

The object handle encodes  pair which currently points
to the start of the object. Now, the handle implicitly stores the size
information by pointing to the object's end instead. Since zsmalloc is
a slab based allocator, the start of the object can be easily determined
and the difference between the end offset encoded in the handle and the
start gives us the object size.
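
To illustrate the arithmetic described above, here is a simplified
userspace sketch. It assumes a single page in which objects of one size
class are packed back to back from offset 0, and it ignores the PFN part
of the handle as well as objects that span pages. The helper names
(round_up_to, obj_end_offset, obj_start_offset, obj_size) are
hypothetical and do not appear in the patch:

```c
#include <stddef.h>

/* Round x up to the next multiple of step (step > 0). */
static size_t round_up_to(size_t x, size_t step)
{
	return ((x + step - 1) / step) * step;
}

/*
 * Offset that the handle would store: the END of the object.
 * Assumes 1 <= size <= class_size.
 */
static size_t obj_end_offset(size_t slot, size_t class_size, size_t size)
{
	return slot * class_size + size;
}

/* Slab layout lets us recover the start of the slot from the end offset. */
static size_t obj_start_offset(size_t end, size_t class_size)
{
	return round_up_to(end, class_size) - class_size;
}

/* Object size is the distance between the encoded end and the slot start. */
static size_t obj_size(size_t end, size_t class_size)
{
	return end - obj_start_offset(end, class_size);
}
```

For example, with a 64-byte class, a 40-byte object in slot 3 ends at
offset 232; rounding up to 256 and subtracting the class size gives the
start 192, and 232 - 192 recovers the size 40.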

Signed-off-by: Nitin Gupta 
---
 drivers/staging/zsmalloc/zsmalloc-main.c |  177 +-
 drivers/staging/zsmalloc/zsmalloc.h  |1 +
 2 files changed, 127 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c 
b/drivers/staging/zsmalloc/zsmalloc-main.c
index 09a9d35..65c9d3b 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -112,20 +112,20 @@
 #define MAX_PHYSMEM_BITS 36
 #else /* !CONFIG_HIGHMEM64G */
 /*
- * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
+ * If this definition of MAX_PHYSMEM_BITS is used, OFFSET_BITS will just
  * be PAGE_SHIFT
  */
 #define MAX_PHYSMEM_BITS BITS_PER_LONG
 #endif
 #endif
 #define _PFN_BITS  (MAX_PHYSMEM_BITS - PAGE_SHIFT)
-#define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS)
-#define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
+#define OFFSET_BITS(BITS_PER_LONG - _PFN_BITS)
+#define OFFSET_MASK((_AC(1, UL) << OFFSET_BITS) - 1)
 
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-   MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+   MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OFFSET_BITS))
 #define ZS_MAX_ALLOC_SIZE  PAGE_SIZE
 
 /*
@@ -256,6 +256,11 @@ static int is_last_page(struct page *page)
return PagePrivate2(page);
 }
 
+static unsigned long get_page_index(struct page *page)
+{
+   return is_first_page(page) ? 0 : page->index;
+}
+
 static void get_zspage_mapping(struct page *page, unsigned int *class_idx,
enum fullness_group *fullness)
 {
@@ -433,39 +438,86 @@ static struct page *get_next_page(struct page *page)
return next;
 }
 
-/* Encode <page, obj_idx> as a single handle value */
-static void *obj_location_to_handle(struct page *page, unsigned long obj_idx)
+static struct page *get_prev_page(struct page *page)
 {
-   unsigned long handle;
+   struct page *prev, *first_page;
 
-   if (!page) {
-   BUG_ON(obj_idx);
-   return NULL;
-   }
+   first_page = get_first_page(page);
+   if (page == first_page)
+   prev = NULL;
+   else if (page == (struct page *)first_page->private)
+   prev = first_page;
+   else
+   prev = list_entry(page->lru.prev, struct page, lru);
 
-   handle = page_to_pfn(page) << OBJ_INDEX_BITS;
-   handle |= (obj_idx & OBJ_INDEX_MASK);
+   return prev;
 
-   return (void *)handle;
 }
 
-/* Decode <page, obj_idx> pair from the given object handle */
-static void obj_handle_to_location(unsigned long handle, struct page **page,
-   unsigned long *obj_idx)
+static void *encode_ptr(struct page *page, unsigned long offset)
 {
-   *page = pfn_to_page(handle >> OBJ_INDEX_BITS);
-   *obj_idx = handle & OBJ_INDEX_MASK;
+   unsigned long ptr;
+   ptr = page_to_pfn(page) << OFFSET_BITS;
+   ptr |= offset & OFFSET_MASK;
+   return (void *)ptr;
+}
+
+static void decode_ptr(unsigned long ptr, struct page **page,
+   unsigned int *offset)
+{
+   *page = pfn_to_page(ptr >> OFFSET_BITS);
+   *offset = ptr & OFFSET_MASK;
+}
+
+static struct page *obj_handle_to_page(unsigned long handle)
+{
+   struct page *page;
+   unsigned int offset;
+
+   decode_ptr(handle, &page, &offset);
+   if (offset < get_page_index(page))
+   page = get_prev_page(page);
+
+   return page;
+}
+
+static unsigned int obj_handle_to_offset(unsigned long handle,
+   unsigned int class_size)
+{
+   struct page *page;
+   unsigned int offset;
+
+   decode_ptr(handle, &page, &offset);
+   if (offset < get_page_index(page))
+   offset = PAGE_SIZE - class_size + get_page_index(page);
+   else
+   offset = roundup(offset, class_size) - class_size;
+
+   return offset;
 }
 
-static unsigned long obj_idx_to_offset(struct page *page,
-   unsigned long obj_idx, int class_size)
+/* Encode  as a single handle value */
+static void *obj_location_to_handle(struct page *page, unsigned int offset,
+   unsigned int size, unsigned int class_size)
 {
-   

Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-11-29 Thread Thierry Reding
On Fri, Nov 30, 2012 at 08:54:32AM +0200, Terje Bergström wrote:
> On 29.11.2012 20:34, Stephen Warren wrote:
> > On 11/29/2012 03:21 AM, Terje Bergström wrote:
> >> True. I might also as well delete the general interrupt altogether, as
> >> we don't use it for any real purpose.
> > 
> > Do make sure the interrupts still are part of the DT binding though, so
> > that the binding fully describes the HW, and the interrupt is available
> > to retrieve if we ever do use it in the future.
> 
> Sure, I will just not use the generic irq in DT, but it won't require
> any changes in DT bindings.
> 
> > You can still create tables of clocks inside the driver and loop over
> > them. So, loop unrolling isn't related to my comments at least. It's
> > just that clk_get() shouldn't take its parameters from platform data.
> > 
> > But if these are clocks for (arbitrary) child modules (that may or may
> > not exist dynamically), why aren't the drivers for the child modules
> > managing them?
> 
> There are actually two things here that I mixed, and because of that I
> probably confused everybody else.
> 
> Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
> the clocks. That's why list of clocks and their frequency policies have
> been part of the device description in nvhost. ACM is being replaced
> with runtime PM in downstream kernel, but it still requires rigorous
> testing and analysis of power profile before we can move to it.
> 
> Then, the second thing is that nvhost_probe() has had its own loop to go
> through the clocks of host1x module. It's copy-paste of what ACM did,
> which is just bad design. That's easily replaceable with static code, as
> nvhost_probe() is just for host1x. I'll do that, and as I rip out the
> generic power management code, I'll also make 2D and host1x drivers
> enable the clocks at probe with static code.
> 
> So I think we have a solution that resonates with all proposals.

Yes, that sounds good to me.

Thierry




Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 11:38:11AM -0700, Stephen Warren wrote:
> On 11/29/2012 04:47 AM, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> >> On 28.11.2012 23:23, Thierry Reding wrote:
> >>> This could be problematic. Since drivers/video and
> >>> drivers/gpu/drm are separate trees, this would entail a
> >>> continuous burden on keeping both trees synchronized. While I
> >>> realize that eventually it might be better to put the host1x
> >>> driver in a separate place to accomodate for its use by other
> >>> subsystems, I'm not sure moving it here right away is the best 
> >>> approach.
> >> 
> >> I understand your point, but I hope also that we'd end up with
> >> something that can be used as basis for the downstream kernel to
> >> migrate to upstream stack.
> >> 
> >> The key point here is to make the API between nvhost and tegradrm
> >> as small and robust to changes as possible.
> > 
> > I agree. But I also fear that there will be changes eventually and 
> > having both go in via different tree requires those trees to be
> > merged in a specific order to avoid breakage should the API change.
> > This will be particularly ugly in linux-next.
> > 
> > That's why I explicitly proposed to take this into
> > drivers/gpu/drm/tegra for the time being, until we can be
> > reasonably sure that the API is fixed. Then I'm fine with moving it
> > wherever seems the best fit. Even then there might be the
> > occasional dependency, but they should get fewer and fewer as the
> > code matures.
> 
> It is acceptable for one maintainer to ack patches, and another
> maintainer to merge a series that touches both "their own" code and
> code owned by another tree. This should of course only be needed when
> inter-module APIs change; changes to code within a module shouldn't
> require this.

Yes, that's true. But it still makes things more complicated since each
of the maintainers will have to do extra work to test the changes.
Anyway we'll see how this plays out. The ideal case would of course be
to get the API right from the start. =)

Thierry




Re: [RFC v2 1/8] video: tegra: Add nvhost driver

2012-11-29 Thread Terje Bergström
On 29.11.2012 20:34, Stephen Warren wrote:
> On 11/29/2012 03:21 AM, Terje Bergström wrote:
>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
> 
> Do make sure the interrupts still are part of the DT binding though, so
> that the binding fully describes the HW, and the interrupt is available
> to retrieve if we ever do use it in the future.

Sure, I will just not use the generic irq in DT, but it won't require
any changes in DT bindings.

> You can still create tables of clocks inside the driver and loop over
> them. So, loop unrolling isn't related to my comments at least. It's
> just that clk_get() shouldn't take its parameters from platform data.
> 
> But if these are clocks for (arbitrary) child modules (that may or may
> not exist dynamically), why aren't the drivers for the child modules
> managing them?

There are actually two things here that I mixed, and because of that I
probably confused everybody else.

Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
the clocks. That's why list of clocks and their frequency policies have
been part of the device description in nvhost. ACM is being replaced
with runtime PM in downstream kernel, but it still requires rigorous
testing and analysis of power profile before we can move to it.

Then, the second thing is that nvhost_probe() has had its own loop to go
through the clocks of host1x module. It's copy-paste of what ACM did,
which is just bad design. That's easily replaceable with static code, as
nvhost_probe() is just for host1x. I'll do that, and as I rip out the
generic power management code, I'll also make 2D and host1x drivers
enable the clocks at probe with static code.

So I think we have a solution that resonates with all proposals.

Best regards,
Terje


Re: [PATCH] gpio: New driver for GPO emulation using PWM generators

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 04:10:24PM +0000, Grant Likely wrote:
> On Wed, 28 Nov 2012 09:54:57 +0100, Peter Ujfalusi  
> wrote:
> > Hi Grant, Lars, Thierry,
> > 
> > On 11/26/2012 04:46 PM, Grant Likely wrote:
> > > You're effectively asking the pwm layer to behave like a gpio (which
> > > is completely reasonable). Having a completely separate translation node
> > > really doesn't make sense because it is entirely a software construct.
> > > In fact, the way your using it is *entirely* to make the Linux driver
> > > model instantiate the translation code. It has *nothing* to do with the
> > > structure of the hardware. It makes complete sense that if a PWM is
> > > going to be used as a GPIO, then the PWM node should conform to the GPIO
> > > binding.
> > 
> > I understand your point around this. I might say I agree with it as well...
> > I spent yesterday with prototyping and I'm not really convinced that it is a
> > good approach from C code point of view. I got it working, yes.
> > In essence this is what I have on top of the slightly modified gpio-pwm.c
> > driver I have submitted:
> > 
> > DTS files:
> > twl_pwm: pwm {
> > /* provides two PWMs (id 0, 1 for PWM1 and PWM2) */
> > compatible = "ti,twl6030-pwm";
> > #pwm-cells = <2>;
> > 
> > > /* Enable GPIO use of the PWMs */
> > gpio-controller = <1>;
> 
> This line should be simply (the property shouldn't have any data):
>   gpio-controller;
> 
> > #gpio-cells = <2>;
> > pwm,period_ns = <7812500>;
> 
> Nit: property names should use '-' instead of '_'.
> 
> > };
> > 
> > leds {
> > compatible = "gpio-leds";
> > backlight {
> > label = "omap4::backlight";
> > > gpios = <&twl_pwm 1 0>; /* PWM1 of twl6030 */
> > };
> > 
> > keypad {
> > label = "omap4::keypad";
> > > gpios = <&twl_pwm 0 0>; /* PWM0 of twl6030 */
> > };
> > };
> > 
> > The bulk of the code in drivers/pwm/core.c to create the pwm-gpo device when
> > it is requested going to look something like this. I have removed the error
> > checks for now and I still don't have the code to clean up the allocated
> > memory for the created device on error, or in case the module is unloaded. 
> > We
> > should also prevent the pwm core from removal when the pwm-gpo driver is 
> > loaded.
> > We need to create the platform device for gpo-pwm, create the pdata 
> > structure
> > for it and fill it in. We also need to hand craft the pwm_lookup table so we
> > can use pwm_get() to request the PWM. I have other minor changes around this
> > to get things working when we booted with DT.
> > So the function to do the heavy lifting is something like this:
> > static void of_pwmchip_as_gpio(struct pwm_chip *chip)
> > {
> > struct platform_device *pdev;
> > struct gpio_pwm *gpos;
> > struct gpio_pwm_pdata *pdata;
> > struct pwm_lookup *lookup;
> > char gpodev_name[15];
> > int i;
> > u32 gpio_mode = 0;
> > u32 period_ns = 0;
> > 
> > > of_property_read_u32(chip->dev->of_node, "gpio-controller",
> > >  &gpio_mode);
> > if (!gpio_mode)
> > return;
> > 
> > > of_property_read_u32(chip->dev->of_node, "pwm,period_ns", &period_ns);
> > if (!period_ns) {
> > dev_err(chip->dev,
> > "period_ns is not specified for GPIO use\n");
> > return;
> > }
> 
> This property name seems ambiguous. What do you need to encode here? It
> looks like there is a specific PWM period used for the 'on' state. What
> about the 'off' state? Would different pwm outputs have different
> frequencies required for GPIO usage.
> 
> Actually, I'm a bit surprised here that a period value is needed at all.
> I would expect if a PWM is used as a GPIO then the driver would already
> know how to set it up that way.

Just to make sure we're talking about the same thing here: if a PWM is
used as GPIO the assumption is that it would be set to 0% duty-cycle
when the GPIO value is set to 0 and 100% duty-cycle when set to 1.
The period will still need to be set here, otherwise how would the PWM
core know what the hardware even supports?

Unless you're proposing to not include that in the PWM core but rather
in individual drivers. Then I suppose the driver could choose some
sensible default.

One other problem is that some PWM devices cannot be set up to achieve a
0% or 100% duty-cycle but instead will toggle for at least one period.
This would be another argument in favour of moving the functionality to
the individual drivers, perhaps with some functionality provided by the
core to do the gpio_chip registration (a period could be passed to that
function at registration time), which will likely be the same for all
hardware that can and wants to support this feature.
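
As a rough userspace sketch of the mapping I have in mind (this is not
the kernel PWM API; struct pwm_state_sketch and pwm_gpo_set() are
made-up names, and the period is simply handed in by whoever registers
the GPIO functionality):

```c
/* Hypothetical model of one PWM channel; not the kernel's struct pwm_device. */
struct pwm_state_sketch {
	unsigned int period_ns;
	unsigned int duty_ns;
};

/*
 * Drive the PWM like a general-purpose output:
 * value 0 maps to a 0% duty cycle, any other value to 100%.
 */
static void pwm_gpo_set(struct pwm_state_sketch *pwm,
			unsigned int period_ns, int value)
{
	pwm->period_ns = period_ns;
	pwm->duty_ns = value ? period_ns : 0;
}
```

Hardware that cannot reach exactly 0% or 100% is not modelled here; a
driver for such a device would need its own handling.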

Thierry




zram: fix invalid memory references during disk write

2012-11-29 Thread Nitin Gupta
Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
 - Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
 - Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.

Fixes bug 50081:
https://bugzilla.kernel.org/show_bug.cgi?id=50081

Upstream commit ID: c8f2f0d: zram: Fix handling of incompressible pages
Apply to versions: 3.6.5, 3.6.6, 3.6.7, 3.6.8

Cc:  # staging-next: 37b51fd: zram: factor-out
# zram_decompress_page() function
Signed-off-by: Nitin Gupta 
Reported-by: Mihail Kasadjikov 
Reported-by: Tomas M 
Reviewed-by: Minchan Kim 
---
 drivers/staging/zram/zram_drv.c |   39 ---
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index fb4a7c9..f2a73bd 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -265,7 +265,7 @@ out_cleanup:
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
   int offset)
 {
-   int ret;
+   int ret = 0;
size_t clen;
unsigned long handle;
struct page *page;
@@ -286,10 +286,8 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
goto out;
}
ret = zram_decompress_page(zram, uncmem, index);
-   if (ret) {
-   kfree(uncmem);
+   if (ret)
goto out;
-   }
}

/*
@@ -302,16 +300,18 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,

user_mem = kmap_atomic(page);

-   if (is_partial_io(bvec))
+   if (is_partial_io(bvec)) {
memcpy(uncmem + offset, user_mem + bvec->bv_offset,
   bvec->bv_len);
-   else
+   kunmap_atomic(user_mem);
+   user_mem = NULL;
+   } else {
uncmem = user_mem;
+   }

if (page_zero_filled(uncmem)) {
-   kunmap_atomic(user_mem);
-   if (is_partial_io(bvec))
-   kfree(uncmem);
+   if (!is_partial_io(bvec))
+   kunmap_atomic(user_mem);
zram_stat_inc(&zram->stats.pages_zero);
zram_set_flag(zram, index, ZRAM_ZERO);
ret = 0;
@@ -321,9 +321,11 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
ret = lzo1x_1_compress(uncmem, PAGE_SIZE, src, &clen,
   zram->compress_workmem);

-   kunmap_atomic(user_mem);
-   if (is_partial_io(bvec))
-   kfree(uncmem);
+   if (!is_partial_io(bvec)) {
+   kunmap_atomic(user_mem);
+   user_mem = NULL;
+   uncmem = NULL;
+   }

if (unlikely(ret != LZO_E_OK)) {
pr_err("Compression failed! err=%d\n", ret);
@@ -332,8 +334,10 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,

if (unlikely(clen > max_zpage_size)) {
zram_stat_inc(&zram->stats.bad_compress);
-   src = uncmem;
clen = PAGE_SIZE;
+   src = NULL;
+   if (is_partial_io(bvec))
+   src = uncmem;
}

handle = zs_malloc(zram->mem_pool, clen);
@@ -345,7 +349,11 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
}
cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);

+   if ((clen == PAGE_SIZE) && !is_partial_io(bvec))
+   src = kmap_atomic(page);
memcpy(cmem, src, clen);
+   if ((clen == PAGE_SIZE) && !is_partial_io(bvec))
+   kunmap_atomic(src);

zs_unmap_object(zram->mem_pool, handle);

@@ -358,9 +366,10 @@ static int zram_bvec_write(struct zram *zram, struct 
bio_vec *bvec, u32 index,
if (clen <= PAGE_SIZE / 2)
zram_stat_inc(&zram->stats.good_compress);

-   return 0;
-
 out:
+   if (is_partial_io(bvec))
+   kfree(uncmem);
+
if (ret)
zram_stat64_inc(zram, &zram->stats.failed_writes);
return ret;
--
1.7.10.4



Re: [Patch v4 00/12] memory-hotplug: hot-remove physical memory

2012-11-29 Thread Tang Chen

Hi Andrew,

On 11/28/2012 03:27 AM, Andrew Morton wrote:


>> - acpi framework
>>   https://lkml.org/lkml/2012/10/26/175
>
> What's happening with the acpi framework?  has it received any feedback
> from the ACPI developers?


About ACPI framework, we are trying to do the following.

A memory device can be removed in two ways:
1. send an eject request via SCI
2. echo 1 >/sys/bus/acpi/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called.
In the 2nd case, acpi_memory_device_remove() will be called.
acpi_memory_device_remove() will also be called when we unbind the
memory device from the driver acpi_memhotplug or a driver
initialization fails.

acpi_memory_disable_device() already implements the code that
offlines memory and releases the acpi_memory_info struct, but
acpi_memory_device_remove() does not implement it yet.

So the patch prepares the framework for hot removing memory and
adds the framework into acpi_memory_device_remove().

All the ACPI related patches have been put into the linux-next branch
of the linux-pm.git tree as v3.8 material. Please refer to the following
URL:
https://lkml.org/lkml/2012/11/2/160

So for now, with this patch set, we can do memory hot-remove on x86_64
linux.

I do hope you would merge them before 3.8-rc1, so that we can use this
functionality in 3.8.

As we are still testing all memory hotplug related functionalities, I
hope we can do the bug fix during 3.8 rc.

Thanks. :)






Re: [PATCH v4 7/9] trace: use this_cpu_ptr per-cpu helper

2012-11-29 Thread Shan Wei
Shan Wei said, at 2012/11/16 16:34:
> Shan Wei said, at 2012/11/13 9:53:
>> From: Shan Wei 
>>
>> typeof(&buffer) is a pointer to array of 1024 char, or char (*)[1024].
>> But, typeof(&buffer[0]) is a pointer to char which match the return type of
>> get_trace_buf().
>> As well-known, the value of &buffer is equal to &buffer[0].
>> so return this_cpu_ptr(&percpu_buffer->buffer[0]) can avoid type cast.
>>
>> Signed-off-by: Shan Wei 
> 
> Steven Rostedt,  would you like to pick it up to your tree?

ping..
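
For anyone skimming the quoted description, the type difference can be
shown with a small userspace sketch. It has nothing per-cpu in it; struct
trace_buf_sketch and the two helpers are made-up names, and only the
1024-byte array mirrors the description:

```c
/* Hypothetical stand-in for the per-cpu trace buffer structure. */
struct trace_buf_sketch {
	char buffer[1024];
};

/* &b->buffer has type char (*)[1024]; returning it as char * needs a cast. */
static char *buf_via_array_ptr(struct trace_buf_sketch *b)
{
	char (*whole)[1024] = &b->buffer;

	return (char *)whole;
}

/* &b->buffer[0] already has type char *, so no cast is needed. */
static char *buf_via_first_elem(struct trace_buf_sketch *b)
{
	return &b->buffer[0];
}
```

Both helpers return the same address; only the pointer type seen by the
compiler differs, which is why taking the address of the first element
avoids the cast.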



Re: [PATCH] trace: use __this_cpu_inc/dec operation instead of __get_cpu_var

2012-11-29 Thread Shan Wei
ping ..


Shan Wei said, at 2012/11/19 13:21:
> From: Shan Wei 
> 
> __this_cpu_inc_return() or __this_cpu_dec generates a single instruction,
> which is faster than __get_cpu_var operation.
> 
> Signed-off-by: Shan Wei 
> ---
>  kernel/trace/trace.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 18c0aa8..3795694 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1313,7 +1313,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
>*/
>   preempt_disable_notrace();
>  
> - use_stack = ++__get_cpu_var(ftrace_stack_reserve);
> + use_stack = __this_cpu_inc_return(ftrace_stack_reserve);
>   /*
>* We don't need any atomic variables, just a barrier.
>* If an interrupt comes in, we don't care, because it would
> @@ -1367,7 +1367,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
>   out:
>   /* Again, don't let gcc optimize things here */
>   barrier();
> - __get_cpu_var(ftrace_stack_reserve)--;
> + __this_cpu_dec(ftrace_stack_reserve);
>   preempt_enable_notrace();
>  
>  }
> 



Re: [Suggestion] drivers/tty: drivers/char/: for MAX_ASYNC_BUFFER_SIZE

2012-11-29 Thread Chen Gang
On 2012-11-30 11:27, Paul Fulghum wrote:
> 
> I’m the maintainer for these drivers. I only caught this message by
> chance and

  It seems you are not listed in the MAINTAINERS file.
  Would it be appropriate to add your name to it?
(If so, could you please help add it? I am not very familiar with the process.)

  thanks.

-- 
Chen Gang

Asianux Corporation


linux-next: manual merge of the akpm tree with Linus' tree

2012-11-29 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in
include/linux/percpu-rwsem.h between commit 4b05a1c74d1c ("percpu-rwsem:
use synchronize_sched_expedited") from Linus' tree and commit
"percpu_rw_semaphore: reimplement to not block the readers unnecessarily"
from the akpm tree.

I fixed it up (using the version from the akpm tree) and can carry the
fix as necessary (more action may be required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: A vague, murky topic of "Buffer I/O error on device sdb6, logical block NNNNNNNNN" and a ext4/VFS oops

2012-11-29 Thread Robert Hancock

On 11/29/2012 01:27 PM, Artem S. Tashkinov wrote:

Hello,

When I was copying a lot of information (tens of gigabytes) from my primary HDD
to a secondary HDD I got gazillions of errors like these ones:

[19568.964762] EXT4-fs warning (device sdb6): ext4_end_bio:250: I/O error 
writing to inode 6029369 (offset 8036352 size 524288 starting block 51946549)
[19568.964767] sd 2:0:0:0: [sdb]
[19568.964768] Result: hostbyte=0x00 driverbyte=0x08
[19568.964770] sd 2:0:0:0: [sdb]
[19568.964771] Sense Key : 0xb [current] [descriptor]
[19568.964774] Descriptor sense data with sense descriptors (in hex):
[19568.964775] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[19568.964784] 00 00 00 00
[19568.964788] sd 2:0:0:0: [sdb]
[19568.964789] ASC=0x0 ASCQ=0x0
[19568.964791] sd 2:0:0:0: [sdb] CDB:
[19568.964792] cdb[0]=0x2a: 2a 00 18 c5 25 a8 00 00 70 00
[19568.964804] Buffer I/O error on device sdb6, logical block 13727786
[19568.964806] Buffer I/O error on device sdb6, logical block 13727787
[19568.964808] Buffer I/O error on device sdb6, logical block 13727788
[19568.964810] Buffer I/O error on device sdb6, logical block 13727789
[19568.964812] Buffer I/O error on device sdb6, logical block 13727790

along with:

[19568.964832] EXT4-fs warning (device sdb6): ext4_end_bio:250: I/O error 
writing to inode 6029369 (offset 8560640 size 57344 starting block 51946677)
[19568.964843] ata3: EH complete
[19624.635176] ata3.00: exception Emask 0x0 SAct 0x3fff SErr 0x4 action 0x6 
frozen
[19624.635181] ata3: SError: { CommWake }


This is likely the real problem - the controller saw a CommWake during 
operation, which likely means the SATA link bounced for some reason. 
Could be a bad cable, a power issue, or some other hardware problem. The 
rest is likely all fallout from that (except for those _GTF errors, 
which are likely due to a somewhat broken BIOS).



[19624.635184] ata3.00: failed command: WRITE FPDMA QUEUED
[19624.635190] ata3.00: cmd 61/00:00:48:ee:cb/04:00:18:00:00/40 tag 0 ncq 
524288 out
[19624.635190]  res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[19624.635193] ata3.00: status: { DRDY }
[19624.635196] ata3.00: failed command: WRITE FPDMA QUEUED
[19624.635201] ata3.00: cmd 61/08:08:f0:65:bd/00:00:1d:00:00/40 tag 1 ncq 4096 
out
[19624.635201]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[19624.635203] ata3.00: status: { DRDY }
[19624.635206] ata3.00: failed command: WRITE FPDMA QUEUED
[19624.635211] ata3.00: cmd 61/00:10:48:f2:cb/04:00:18:00:00/40 tag 2 ncq 
524288 out
[19624.635211]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[19624.635213] ata3.00: status: { DRDY }
[19624.635215] ata3.00: failed command: WRITE FPDMA QUEUED
[19624.635220] ata3.00: cmd 61/00:18:48:f6:cb/04:00:18:00:00/40 tag 3 ncq 
524288 out
[19624.635220]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 
(timeout)
[19624.635223] ata3.00: status: { DRDY }
[19624.635225] ata3.00: failed command: WRITE FPDMA QUEUED

along with:

[19624.635320] ata3: hard resetting link
[19624.954880] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[19624.956101] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND 
(20120711/psargs-359)
[19624.956109] ACPI Error: Method parse/execution failed 
[\_SB_.PCI0.SAT0.SPT2._GTF] (Node ef0307b0), AE_NOT_FOUND (20120711/psparse-536)
[19624.958006] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND 
(20120711/psargs-359)
[19624.958011] ACPI Error: Method parse/execution failed 
[\_SB_.PCI0.SAT0.SPT2._GTF] (Node ef0307b0), AE_NOT_FOUND (20120711/psparse-536)
[19624.958366] ata3.00: configured for UDMA/133
[19624.960763] ata3.00: device reported invalid CHS sector 0
[19624.960765] ata3.00: device reported invalid CHS sector 0
[19624.960767] ata3.00: device reported invalid CHS sector 0
[19624.960769] ata3.00: device reported invalid CHS sector 0
[19624.960771] ata3.00: device reported invalid CHS sector 0
[19624.960773] ata3.00: device reported invalid CHS sector 0
[19624.960775] ata3.00: device reported invalid CHS sector 0
[19624.960777] ata3.00: device reported invalid CHS sector 0
[19624.960779] ata3.00: device reported invalid CHS sector 0
[19624.960781] ata3.00: device reported invalid CHS sector 0
[19624.960782] ata3.00: device reported invalid CHS sector 0
[19624.960784] ata3.00: device reported invalid CHS sector 0
[19624.960786] ata3.00: device reported invalid CHS sector 0
[19624.960788] ata3.00: device reported invalid CHS sector 0

and also this:

[19624.961128] Buffer I/O error on device sdb6, logical block 13783485
[19624.961132] EXT4-fs warning (device sdb6): ext4_end_bio:250: I/O error 
writing to inode 6029369 (offset 236183552 size 524288 starting block 52002249)
[19624.961142] sd 2:0:0:0: [sdb]
[19624.961144] Result: hostbyte=0x00 driverbyte=0x08
[19624.961146] sd 2:0:0:0: [sdb]
[19624.961147] Sense Key : 0xb [current] [descriptor]
[19624.961149] Descriptor sense data with sense 

[PATCH rcu] Remove unused code originally used for context tracking

2012-11-29 Thread Li Zhong
Now that the new context tracking subsystem has been added, ignore_user_qs
and in_user in struct rcu_dynticks no longer seem to be needed, so remove
them.

Signed-off-by: Li Zhong 
---
 kernel/rcutree.c | 3 ---
 kernel/rcutree.h | 4 
 2 files changed, 7 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e441b77..b8fae5d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2719,9 +2719,6 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
-#ifdef CONFIG_RCU_USER_QS
-   WARN_ON_ONCE(rdp->dynticks->in_user);
-#endif
rdp->cpu = cpu;
rdp->rsp = rsp;
rcu_boot_init_nocb_percpu_data(rdp);
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 4b69291..6f21f2e 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -102,10 +102,6 @@ struct rcu_dynticks {
/* idle-period nonlazy_posted snapshot. */
int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
-#ifdef CONFIG_RCU_USER_QS
-   bool ignore_user_qs;/* Treat userspace as extended QS or not */
-   bool in_user;   /* Is the CPU in userland from RCU POV? */
-#endif
 };
 
 /* RCU's kthread states for tracing. */
-- 
1.7.11.4




Re: [PATCH 1/2] Btrfs: fix permissions of empty files not affected by umask

2012-11-29 Thread Liu Bo
On Thu, Nov 29, 2012 at 07:40:08PM -0800, Filipe Brandenburger wrote:
> When a new file is created with btrfs_create(), the inode will initially be
> created with permissions 0666 and later on in btrfs_init_acl() it will be
> adapted to mask out the umask bits. The problem is that this change won't make
> it into the btrfs_inode unless there's another change to the inode (e.g. 
> writing
> content changing the size or touching the file changing the mtime.)
> 
> This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure 
> that
> the change will not get lost if the in-memory inode is flushed before other
> changes are made to the file.
> 

Looks good to me.

Reviewed-by: Liu Bo 

thanks,
liubo
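
The failure mode described in the quoted commit message can be modelled in a few lines. This is a toy user-space sketch, not btrfs code: struct inode_stub and its helpers are hypothetical names, with mem_mode standing in for the in-memory inode and disk_mode for the on-disk btrfs_inode. It only shows why an explicit write-back step is needed after the umask has been applied.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: one mode in memory, one copy that would reach the disk. */
struct inode_stub {
	unsigned int mem_mode;
	unsigned int disk_mode;
};

/* Mask out the umask bits from a requested creation mode. */
static unsigned int apply_umask(unsigned int mode, unsigned int mask)
{
	return mode & ~mask & 07777;
}

/* Creation writes the initial mode to both copies. */
static void stub_create(struct inode_stub *inode, unsigned int mode)
{
	assert(inode != NULL);
	inode->mem_mode = mode;
	inode->disk_mode = mode;
}

/* The ACL/umask step only adjusts the in-memory copy. */
static void stub_init_acl(struct inode_stub *inode, unsigned int mask)
{
	assert(inode != NULL);
	inode->mem_mode = apply_umask(inode->mem_mode, mask);
}

/* Explicit write-back, the step the fix adds to the create path. */
static void stub_update_inode(struct inode_stub *inode)
{
	assert(inode != NULL);
	inode->disk_mode = inode->mem_mode;
}
```

With a umask of 022, the in-memory mode becomes 0644 after stub_init_acl(), while the disk copy stays at 0666 until stub_update_inode() runs.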

> Signed-off-by: Filipe Brandenburger 
> ---
>  fs/btrfs/inode.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 95542a1..caf9d76 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4996,6 +4996,12 @@ static int btrfs_create(struct inode *dir, struct dentry *dentry,
>   goto out_unlock;
>   }
>  
> + err = btrfs_update_inode(trans, root, inode);
> + if (err) {
> + drop_inode = 1;
> + goto out_unlock;
> + }
> +
>   /*
>   * If the active LSM wants to access the inode during
>   * d_instantiate it needs these. Smack checks to see
> -- 
> 1.7.11.7
> 


Re: [PATCH] vfio powerpc: enabled on powernv platform

2012-11-29 Thread Alexey Kardashevskiy

On 29/11/12 15:20, Alex Williamson wrote:


+   /* Put tces to the table */
+   for (i = 0; (i < pages) && !ret; ++i, tce += IOMMU_PAGE_SIZE) {
+   ret = put_tce(tbl, entry + i, tce, direction);
+   /*
+* As IOMMU page size is always 4K, the system page size
+* can be 64K and there is no special tracking for IOMMU pages,
+* we only do rlimit check/update for the very first
+* 4K IOMMUpage within 64K system page.
+*/
+   if (!(tce & ~PAGE_MASK))
+   ++retpages;


Ah, here's the comment I was looking for, though I'm still not sure
about the read/write bits.

Isn't there an exploit here that a user can lock pages beyond their
limits if they just skip mapping the first 4k of each page?  Thanks,



Heh. True. Posted another patch with 4K pages per system page usage tracking.
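
The accounting hole Alex points out, and one way to close it, can be sketched in user space. This is an illustration only, not the follow-up patch: the bitmap helpers, the fixed 64K system page size, and map_entry() are assumptions made for the example. The idea is to charge the locked-memory limit when the first 4K IOMMU page of a system page gets mapped, whichever 4K slot that is, instead of only when slot 0 is mapped.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define IOMMU_PAGE_SHIFT	12	/* 4K IOMMU pages */
#define SYS_PAGE_SHIFT		16	/* assumed 64K system pages */
#define PAGES_PER_SYSPAGE	(1UL << (SYS_PAGE_SHIFT - IOMMU_PAGE_SHIFT))
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if the 4K entry is marked as mapped in the bitmap. */
static int test_entry(const unsigned long *map, unsigned long entry)
{
	return (int)((map[entry / BITS_PER_WORD] >> (entry % BITS_PER_WORD)) & 1UL);
}

/* Mark the 4K entry as mapped. */
static void set_entry(unsigned long *map, unsigned long entry)
{
	map[entry / BITS_PER_WORD] |= 1UL << (entry % BITS_PER_WORD);
}

/* Count mapped 4K entries in the system page that contains 'entry'. */
static unsigned int syspage_weight(const unsigned long *map, unsigned long entry)
{
	unsigned long first = entry & ~(PAGES_PER_SYSPAGE - 1);
	unsigned int i, weight = 0;

	assert(map != NULL);
	for (i = 0; i < PAGES_PER_SYSPAGE; i++)
		weight += (unsigned int)test_entry(map, first + i);

	return weight;
}

/*
 * Map one 4K entry. Returns 1 if this is the first mapped entry in its
 * system page (the caller should charge one page against the limit),
 * 0 if the system page was already accounted for.
 */
static int map_entry(unsigned long *map, unsigned long entry)
{
	int newly_pinned = (syspage_weight(map, entry) == 0);

	set_entry(map, entry);
	return newly_pinned;
}
```

With this scheme, mapping entry 17 while leaving entry 16 (the first 4K of that system page) unmapped is still charged once.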



--
Alexey


Re: [PATCH 2/2] i2c-s3c2410: Add bus arbitration implementation

2012-11-29 Thread Olof Johansson
On Thu, Nov 29, 2012 at 6:13 PM, Simon Glass  wrote:
> +Olof
>
> On Thu, Nov 29, 2012 at 8:34 AM, Mark Brown
>  wrote:
>> On Thu, Nov 29, 2012 at 10:35:35AM +0530, Naveen Krishna Chatradhi wrote:
>>
>>> The arbitrator is a general purpose function which uses two GPIOs to
>>> communicate with another device to claim/release a bus. We use it to
>>> arbitrate an i2c port between the AP and the EC.
>>
>> Should this not be layered on top of the I2C controller rather than part
>> of the controller driver?  It doesn't seem terribly controller specific.
>
> It was originally done separately but I think it was felt that this
> was overly complex. Olof can you please comment on this?

It is indeed not controller specific per se, but we are unaware of any
other platform/driver using it. So, it seemed reasonable to implement
it in the driver as long as we have only one user; if another one
comes along it's of course better to move it to the common i2c code.

At least that was my opinion at the time. I could be convinced
otherwise if someone else has strong opinions on the matter.


-Olof


[PATCH] vfio powerpc: enabled on powernv platform

2012-11-29 Thread Alexey Kardashevskiy
This patch initializes IOMMU groups based on the IOMMU
configuration discovered during the PCI scan on POWERNV
(POWER non virtualized) platform. The IOMMU groups are
to be used later by VFIO driver (PCI pass through).

It also implements an API for mapping/unmapping pages for
guest PCI drivers and providing DMA window properties.
This API is going to be used later by QEMU-VFIO to handle
h_put_tce hypercalls from the KVM guest.

Although this driver has been tested only on the POWERNV
platform, it should work on any platform which supports
TCE tables.

To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
option and configure VFIO as required.

Cc: David Gibson 
Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h |9 ++
 arch/powerpc/kernel/iommu.c  |  186 ++
 arch/powerpc/platforms/powernv/pci.c |  135 
 drivers/iommu/Kconfig|8 ++
 4 files changed, 338 insertions(+)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index cbfe678..5c7087a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -76,6 +76,9 @@ struct iommu_table {
struct iommu_pool large_pool;
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
+#ifdef CONFIG_IOMMU_API
+   struct iommu_group *it_group;
+#endif
 };
 
 struct scatterlist;
@@ -147,5 +150,11 @@ static inline void iommu_restore(void)
 }
 #endif
 
+extern long iommu_clear_tces(struct iommu_table *tbl, unsigned long entry,
+   unsigned long pages);
+extern long iommu_put_tces(struct iommu_table *tbl, unsigned long entry,
+   uint64_t tce, enum dma_data_direction direction,
+   unsigned long pages);
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ff5a6ce..0646c50 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DBG(...)
 
@@ -856,3 +857,188 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
free_pages((unsigned long)vaddr, get_order(size));
}
 }
+
+#ifdef CONFIG_IOMMU_API
+/*
+ * SPAPR TCE API
+ */
+
+/*
+ * Returns the number of used IOMMU pages (4K) within
+ * the same system page (4K or 64K).
+ * bitmap_weight is not used as it does not support bigendian maps.
+ */
+static int syspage_weight(unsigned long *map, unsigned long entry)
+{
+   int ret = 0, nbits = PAGE_SIZE/IOMMU_PAGE_SIZE;
+
+   /* Aligns TCE entry number to system page boundary */
+   entry &= PAGE_MASK >> IOMMU_PAGE_SHIFT;
+
+   /* Count used 4K pages */
+   while (nbits--)
+   ret += (test_bit(entry++, map) == 0) ? 0 : 1;
+
+   return ret;
+}
+
+static void tce_flush(struct iommu_table *tbl)
+{
+   /* Flush/invalidate TLB caches if necessary */
+   if (ppc_md.tce_flush)
+   ppc_md.tce_flush(tbl);
+
+   /* Make sure updates are seen by hardware */
+   mb();
+}
+
+/*
+ * clear_tces_nolock clears tces and returns the number of system pages
+ * which it called put_page() on
+ */
+static long clear_tces_nolock(struct iommu_table *tbl, unsigned long entry,
+   unsigned long pages)
+{
+   int i, retpages = 0;
+   unsigned long oldtce, oldweight;
+   struct page *page;
+
+   for (i = 0; i < pages; ++i) {
+   oldtce = ppc_md.tce_get(tbl, entry + i);
+   ppc_md.tce_free(tbl, entry + i, 1);
+
+   oldweight = syspage_weight(tbl->it_map, entry + i);
+   __clear_bit(entry + i, tbl->it_map);
+
+   if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
+   continue;
+
+   page = pfn_to_page(oldtce >> PAGE_SHIFT);
+
+   WARN_ON(!page);
+   if (!page)
+   continue;
+
+   if (oldtce & TCE_PCI_WRITE)
+   SetPageDirty(page);
+
+   put_page(page);
+
+   /* That was the last IOMMU page within the system page */
+   if ((oldweight == 1) && !syspage_weight(tbl->it_map, entry + i))
+   ++retpages;
+   }
+
+   return retpages;
+}
+
+/*
+ * iommu_clear_tces clears tces and returns the number
+ * of released system pages
+ */
+long iommu_clear_tces(struct iommu_table *tbl, unsigned long entry,
+   unsigned long pages)
+{
+   int ret;
+   struct iommu_pool *pool = get_pool(tbl, entry);
+
+   spin_lock(&(pool->lock));
+   ret = clear_tces_nolock(tbl, entry, pages);
+   tce_flush(tbl);
+   spin_unlock(&(pool->lock));
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_clear_tces);
+
+static int put_tce(struct iommu_table *tbl, unsigned long entry,
+   uint64_t tce, enum 

Re: [RFC,v2,3/8] video: tegra: host: Add channel and client support

2012-11-29 Thread Mark Zhang
On 11/29/2012 06:46 PM, Terje Bergström wrote:
> On 29.11.2012 12:01, Mark Zhang wrote:
>>
>> Just for curious, why "pb->mapped + 1K" is the end of a 4K pushbuffer?
> 
> pb->mapped is u32 *, so compiler will take care of multiplying by
> sizeof(u32).
> 

Ah, yes. Sorry, I must be insane at that time. :)
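
Terje's point about pointer scaling can be verified with a short user-space sketch. The u32 typedef and the PUSH_BUFFER_WORDS constant below are assumptions for the example; they are not taken from the nvhost sources. Adding 1024 to a u32 pointer advances it by 1024 * sizeof(u32) bytes, i.e. to the end of a 4K buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical constant: a 4 KiB push buffer holds 1024 32-bit words. */
#define PUSH_BUFFER_WORDS 1024

/* Byte distance between base and base + nwords for a u32 pointer. */
static size_t byte_span(const u32 *base, size_t nwords)
{
	const u32 *end;

	assert(base != NULL);
	end = base + nwords;	/* the compiler scales by sizeof(u32) */

	return (size_t)((const char *)end - (const char *)base);
}
```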

>>> +unsigned int nvhost_cdma_wait_locked(struct nvhost_cdma *cdma,
>>> +   enum cdma_event event)
>>> +{
>>> +   for (;;) {
>>> +   unsigned int space = cdma_status_locked(cdma, event);
>>> +   if (space)
>>> +   return space;
>>> +
>>> +   /* If somebody has managed to already start waiting, yield */
>>> +   if (cdma->event != CDMA_EVENT_NONE) {
>>> +   mutex_unlock(&cdma->lock);
>>> +   schedule();
>>> +   mutex_lock(&cdma->lock);
>>> +   continue;
>>> +   }
>>> +   cdma->event = event;
>>> +
>>> +   mutex_unlock(&cdma->lock);
>>> +   down(&cdma->sem);
>>> +   mutex_lock(&cdma->lock);
>>
>> I'm new to nvhost, but the locking and unlocking of the cdma->lock mutex
>> here looks very tricky to me. Does this function require the mutex to be
>> held by the caller? And do we need to unlock it before the "return space;"
>> above? IMHO this is not a good design; can we find a better solution?
> 
> Yeah, it's not perfect and good solutions are welcome.
> cdma_status_locked() must be called with a mutex. But, what we generally
> wait for is for space in push buffer. The cleanup code cannot run if we
> keep cdma->lock, so I release it.
> 
> The two ways to loop are because there was a race between two processes
> waiting for space. One of them set cdma->event to indicate what it's
> waiting for and can go to sleep, but the other has to keep spinning.
> 

Alright. These mutex operations still feel complicated and error-prone to
me, but I only have the big picture of nvhost so far and don't know many of
the details yet. I'll let you know if I find a better solution.

> Terje
> 


Re: [RESEND PATCH] fs/super.c set_anon_super calling optimization

2012-11-29 Thread Al Viro
On Fri, Nov 30, 2012 at 11:10:02AM +0530, Abhijit Pawar wrote:

> > Because we want it to be a valid sget() callback.  I doubt that this
> > optimization is worth doing, though - might even micro-pessimize the things
> > on architectures where all arguments are passed in registers.
> > 
> Al,
> Yes. it will be helpful in registers case.

How so?  Consider something like
static int btrfs_set_super(struct super_block *s, void *data)
{
int err = set_anon_super(s, data);
if (!err)
s->s_fs_info = data;
return err;
}
Compile it e.g. for alpha.  Or powerpc.  Or amd64, for that matter.
With and without your change.  And compare the resulting assembler.

Hell, if the arguments are passed in register, without your patch
we have the args for set_anon_super() all set just as we enter
btrfs_set_super().  With your patch the second one needs to be zeroed
out...

In any case, that's microoptimization in the best case and on quite a few
architectures it's a pessimization (granted, an equally minor one).
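
For anyone who wants to try the comparison, here is a self-contained sketch of the two wrapper shapes being discussed. All names are hypothetical stand-ins (struct sb_stub, set_anon_super_stub() and the two wrappers); this is not the fs/super.c code. To see a difference in the generated assembler the callee must not be inlined, for example by building with something like "gcc -O2 -fno-inline -S".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block. */
struct sb_stub {
	void *s_fs_info;
	int s_dev;
};

/*
 * Stand-in for set_anon_super(): it takes a second argument so that it
 * matches the callback signature, but never looks at it.
 */
static int set_anon_super_stub(struct sb_stub *s, void *data)
{
	(void)data;
	assert(s != NULL);
	s->s_dev = 1;
	return 0;
}

/*
 * Forwarding wrapper: on register-argument ABIs, s and data are already
 * in the first two argument registers when this function is entered, so
 * the call needs no extra setup for them.
 */
static int set_super_forward(struct sb_stub *s, void *data)
{
	int err = set_anon_super_stub(s, data);

	if (!err)
		s->s_fs_info = data;
	return err;
}

/*
 * NULL-passing wrapper: same result, but the second argument register
 * has to be zeroed before the call.
 */
static int set_super_null(struct sb_stub *s, void *data)
{
	int err = set_anon_super_stub(s, NULL);

	if (!err)
		s->s_fs_info = data;
	return err;
}
```

Both wrappers behave identically; any difference is in the instructions emitted around the call.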


Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Andrew Morton
On Fri, 30 Nov 2012 11:42:05 +0800 Lin Feng  wrote:

> hi Andrew,
> 
> On 11/30/2012 07:39 AM, Andrew Morton wrote:
> > Tricky.
> > 
> > I expect the same problem would occur with pages which are under
> > O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
> > periods, but the durations could still be lengthy (seconds).
> the offline retry timeout duration is 2 minutes, so to O_DIRECT pages 
> seem maybe not a problem for the moment.
> > 
> > Worse is a futex page, which could easily remain pinned indefinitely.
> > 
> > The best I can think of is to make changes in or around
> > get_user_pages(), to steal the pages from userspace and replace them
> > with non-movable ones before pinning them.  The performance cost of
> > something like this would surely be unacceptable for direct-io, but
> > maybe OK for the aio ring and futexes.
> thanks for your advice.
> I want to limit the impact as little as possible, as mentioned above,
> direct-io seems not a problem, we needn't touch them. Maybe we can 
> just change the use of get_user_pages()(in or around) such as aio 
> ring pages. I will try to find a way to do this.

What about futexes?


linux-next: manual merge of the signal tree with the ftrace tree

2012-11-29 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/microblaze/include/asm/Kbuild between commit 8cbd9cc62540
("tracing,x86: Add a TSC trace_clock") from the ftrace tree and commit
24465a40ba45 ("take sys_fork/sys_vfork/sys_clone prototypes to
linux/syscalls.h") from the signal tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/microblaze/include/asm/Kbuild
index c5d7670,88a758a..0000000
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@@ -1,4 -1,6 +1,5 @@@
 -include include/asm-generic/Kbuild.asm
  
 -header-y  += elf.h
  generic-y += clkdev.h
  generic-y += exec.h
 +generic-y += trace_clock.h
+ generic-y += syscalls.h




Re: [PATCH v2 -tip 3/4] tracing: make a snapshot feature available from userspace

2012-11-29 Thread Hiraku Toyooka

Hi, Steven,

Thank you for your review.

(2012/11/16 10:46), Steven Rostedt wrote:
[snip]
> I was thinking about this some more, and I don't like the
> snapshot_allocate part. Also, I think the snapshot should not be
> allocated by default, and not used until the user explicitly asks for
> it.
>
> Have this:
>
> ---
>  # cat snapshot
> Snapshot is not allocated. To allocate it write the ASCII "1" into
> this file:
>
>   echo 1 > snapshot
>
> This will allocate the buffer for you. To free the snapshot echo "0"
> into this file.
>
>   echo "0" > snapshot
>
> Anything else will reset the snapshot if it is allocated, or return
> EINVAL if it is not allocated.
> ---
>

Your idea for "snapshot" matches the following table, doesn't it?

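 status\input  |     0      |     1      |    else    |
---------------+------------+------------+------------+
 not allocated |   EINVAL   | alloc+swap |   EINVAL   |
---------------+------------+------------+------------+
   allocated   |    free    | clear+swap |   clear    |
---------------+------------+------------+------------+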

I think it is almost OK, but there is a problem.
When we echo "1" to the allocated snapshot, the clear operation adds
some delay because the time cost of tracing_reset_online_cpus() is in
proportion to the number of CPUs.
(It takes 72ms in my 8 CPU environment.)

So, when the snapshot is already cleared by echoing "else" values, we
can avoid the delay on echoing "1" by keeping "cleared" status
internally. For example, we can add the "cleared" flag to struct tracer.
What do you think about it?
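
The table and the proposed "cleared" flag can be written down as a small state machine. This is a user-space sketch under my own naming (enum snapshot_action, struct snapshot_state, snapshot_write()); it does not touch struct tracer or any ftrace code, and did_reset merely reports whether the expensive reset step would have to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum snapshot_action {
	SNAP_EINVAL,		/* reject the write */
	SNAP_ALLOC_SWAP,	/* allocate the buffer, then swap */
	SNAP_FREE,		/* free the buffer */
	SNAP_CLEAR_SWAP,	/* clear the snapshot, then swap */
	SNAP_CLEAR,		/* clear the snapshot only */
};

struct snapshot_state {
	bool allocated;		/* snapshot buffer exists */
	bool cleared;		/* snapshot buffer is known to be empty */
};

/*
 * Decide what a write of 'input' to the snapshot file does and update
 * the state. *did_reset is set when the clear step really has to run,
 * i.e. when the buffer is not already known to be empty.
 */
static enum snapshot_action snapshot_write(struct snapshot_state *st,
					   long input, bool *did_reset)
{
	assert(st != NULL && did_reset != NULL);
	*did_reset = false;

	if (!st->allocated) {
		if (input != 1)
			return SNAP_EINVAL;
		st->allocated = true;
		st->cleared = false;	/* after the swap it holds trace data */
		return SNAP_ALLOC_SWAP;
	}

	if (input == 0) {
		st->allocated = false;
		st->cleared = false;
		return SNAP_FREE;
	}

	/* Both remaining cases clear first, unless already clean. */
	if (!st->cleared)
		*did_reset = true;

	if (input == 1) {
		st->cleared = false;	/* the swap fills the snapshot again */
		return SNAP_CLEAR_SWAP;
	}

	st->cleared = true;
	return SNAP_CLEAR;
}
```

Writing an "else" value first and "1" afterwards then skips the reset on the second write, which is the delay the flag is meant to avoid.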

>
> Also we can add a "trace_snapshot" to the kernel parameters to have it
> allocated on boot. But I can add that if you update these patches.
>

OK, I'll update my patches.

[snip]
>> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
>> index 4cea4f4..73d56d5 100644
>> --- a/kernel/trace/Kconfig
>> +++ b/kernel/trace/Kconfig
>> @@ -102,6 +102,17 @@ config RING_BUFFER_ALLOW_SWAP
>>   Allow the use of ring_buffer_swap_cpu.
>>   Adds a very slight overhead to tracing when enabled.
>>
>
> Move this config down below FTRACE_SYSCALLS and give it a prompt. As
> well as do not make it default y.
>

I'll modify it.

>
>> +config TRACER_SNAPSHOT
>> +bool
>
> bool "Create a snapshot trace buffer"
>

I'll fix it.


>
>> +default y
>> +select TRACER_MAX_TRACE
>> +help
>> +  Allow tracing users to take snapshot of the current buffer using the
>> +  ftrace interface, e.g.:
>> +
>> +  echo 1 > /sys/kernel/debug/tracing/snapshot
>> +  cat snapshot
>> +
[snip]
>>  static struct trace_iterator *
>> -__tracing_open(struct inode *inode, struct file *file)
>> +__tracing_open(struct inode *inode, struct file *file, int snapshot)
>
> bool snapshot
>

I'll fix it.


>>  {
>>  long cpu_file = (long) inode->i_private;
>>  struct trace_iterator *iter;
>> @@ -2408,10 +2410,11 @@ __tracing_open(struct inode *inode, struct file *file)
>>  if (!zalloc_cpumask_var(&iter->started, GFP_KERNEL))
>>  goto fail;
>>
>> -if (current_trace && current_trace->print_max)
>> +if ((current_trace && current_trace->print_max) || snapshot)
>>  iter->tr = &max_tr;
>>  else
>>  iter->tr = &global_trace;
>> +iter->snapshot = !!snapshot;
>
> Get rid of the !!
>

I'll fix it.

[snip]
>> @@ -2517,7 +2522,7 @@ static int tracing_open(struct inode *inode, struct file *file)
>>  }
>>
>>  if (file->f_mode & FMODE_READ) {
>> -iter = __tracing_open(inode, file);
>> +iter = __tracing_open(inode, file, 0);
>
> , false)
>

I'll fix it.

>>  if (IS_ERR(iter))
>>  ret = PTR_ERR(iter);
>>  else if (trace_flags & TRACE_ITER_LATENCY_FMT)
>> @@ -3186,7 +3191,8 @@ static int tracing_set_tracer(const char *buf)
>>  trace_branch_disable();
>>  if (current_trace && current_trace->reset)
>>  current_trace->reset(tr);
>> -if (current_trace && current_trace->use_max_tr) {
>> +if (current_trace && current_trace->allocated_snapshot) {
>> +tracing_reset_online_cpus(&max_tr);
>
> max_tr->buffer could be NULL.
>
> Either test here, or better yet, put the test into
> tracing_reset_online_cpus().
>
> if (!buffer)
> return;
>

I see. I'll add the test to tracing_reset_online_cpus(). Should I make it a
separate patch?
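
The guard Steven suggests amounts to an early return inside the reset helper, so every caller is covered. A minimal user-space sketch with hypothetical stub types (struct ring_buffer_stub, struct trace_array_stub); the real function returns void, and the int return value here exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct ring_buffer_stub {
	int entries;
};

struct trace_array_stub {
	struct ring_buffer_stub *buffer;	/* NULL until allocated */
};

/*
 * Reset helper with the NULL test inside it, so callers need no check
 * of their own. Returns 1 if a reset was done, 0 if there was no buffer.
 */
static int reset_online_cpus_stub(struct trace_array_stub *tr)
{
	assert(tr != NULL);

	if (!tr->buffer)
		return 0;	/* snapshot not allocated: nothing to reset */

	tr->buffer->entries = 0;
	return 1;
}
```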

[snip]
>> +static ssize_t tracing_snapshot_read(struct file *filp, char __user *ubuf,
>> + size_t cnt, loff_t *ppos)
>> +{
>> +ssize_t ret = 0;
>> +
>> +mutex_lock(&trace_types_lock);
>> +if (current_trace && current_trace->use_max_tr)
>> +ret = -EBUSY;
>> +mutex_unlock(&trace_types_lock);
>
> I don't like this, as it is racy. The current tracer could change after
> the unlock, and you're back to the problem.
>

You're right...
This is racy.

> Now what we may be able to do, but it would take a little checking for
> lock ordering with trace_access_lock() and trace_event_read_lock(), but
> we could add the mutex 

[PATCH] ARM: dt: tegra: cardhu: Add drm components

2012-11-29 Thread Mark Zhang
This patch adds the RGB & HDMI nodes to the Tegra30 Cardhu dts.
This gets LVDS & HDMI working.

Signed-off-by: Mark Zhang 
---
Changes:
- This patch deprecates the patch:
  "[PATCH V2] ARM: dt: tegra: cardhu: Add drm components"
  Because the issue that LCD and HDMI can't work at the same time
  has been fixed by Thierry's patch:
  "drm: tegra: Use framebuffer pitch as line stride"

 arch/arm/boot/dts/tegra30-cardhu.dtsi |   23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/tegra30-cardhu.dtsi b/arch/arm/boot/dts/tegra30-cardhu.dtsi
index bdb2a66..9b33bfb 100644
--- a/arch/arm/boot/dts/tegra30-cardhu.dtsi
+++ b/arch/arm/boot/dts/tegra30-cardhu.dtsi
@@ -27,6 +27,25 @@
model = "NVIDIA Tegra30 Cardhu evaluation board";
compatible = "nvidia,cardhu", "nvidia,tegra30";
 
+   host1x {
+   dc@54200000 {
+   rgb {
+   status = "okay";
+   nvidia,ddc-i2c-bus = <&rgb_ddc>;
+   };
+   };
+
+   hdmi {
+   status = "okay";
+
+   vdd-supply = <&sys_3v3_reg>;
+   pll-supply = <&vio_reg>;
+
+   nvidia,hpd-gpio = <&gpio 111 0>; /* PN7 */
+   nvidia,ddc-i2c-bus = <&hdmi_ddc>;
+   };
+   };
+
memory {
reg = <0x8000 0x4000>;
};
@@ -114,7 +133,7 @@
clock-frequency = <40800>;
};
 
-   i2c@7000c000 {
+   rgb_ddc: i2c@7000c000 {
status = "okay";
clock-frequency = <10>;
};
@@ -137,7 +156,7 @@
};
};
 
-   i2c@7000c700 {
+   hdmi_ddc: i2c@7000c700 {
status = "okay";
clock-frequency = <10>;
};
-- 
1.7.9.5



Re: [RESEND PATCH] fs/super.c set_anon_super calling optimization

2012-11-29 Thread Abhijit Pawar
On 11/30/2012 09:35 AM, Al Viro wrote:
> On Fri, Oct 26, 2012 at 11:14:41AM -0200, Carlos Maiolino wrote:
>> Hi,
>>
>> On Thu, Oct 25, 2012 at 05:08:19PM +0530, Abhijit Pawar wrote:
>>> Hi,
>>> set_anon_super is called by many filesystems. Some call directly and
>>> some call through the wrapper. Many of them in the wrapper's call to
>>> this function are passing the second argument to this function which
>>> is not used anywhere.
>>>
>>> This patch replaces the second variable with NULL.
>>>
>>
>> If the variable isn't used anymore, why don't just get rid of it, instead of
>> call the function passing a NULL pointer on it?
> 
>   Because we want it to be a valid sget() callback.  I doubt that this
> optimization is worth doing, though - might even micro-pessimize the things
> on architectures where all arguments are passed in registers.
> 
Al,
Yes, it will be helpful in the registers case.

-- 
-
Abhijit


Re: [PATCH 6/6] fs:sysfs pass NULL as second parameter for set_anon_super

2012-11-29 Thread Abhijit Pawar
On 11/29/2012 09:06 PM, Greg Kroah-Hartman wrote:
> On Thu, Nov 29, 2012 at 12:05:45PM +0530, Abhijit Pawar wrote:
>> set_anon_super does not use the second parameter in its implementation.
>> So there is no need to pass on the second parameter.
> 
> Why not just remove the second parameter from the call then?
> 
> thanks,
> 
> greg k-h
> 

This is used as a callback function, so changing the signature would
affect the many other filesystems that use the default function rather
than an overridden one.


-- 
-
Abhijit
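
The callback constraint discussed in this thread can be illustrated with
a minimal, self-contained sketch. Everything below is hypothetical
stand-in code (the *_sketch names are not kernel symbols); it only shows
why a function used as a callback keeps its unused second parameter: the
caller invokes every callback through one function-pointer type, so all
implementations must share that signature.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct super_block. */
struct super_block_sketch {
	int s_dev;
};

/* One callback type: the caller always passes (sb, data). */
typedef int (*set_super_fn)(struct super_block_sketch *sb, void *data);

/* Default callback: ignores data, but must still accept it. */
static int set_anon_sketch(struct super_block_sketch *sb, void *data)
{
	(void)data;		/* unused, required by the callback type */
	sb->s_dev = 1;
	return 0;
}

/* Another callback that really does use its data argument. */
static int set_with_data_sketch(struct super_block_sketch *sb, void *data)
{
	sb->s_dev = *(int *)data;
	return 0;
}

/* Stand-in for the generic caller: it cannot know which callback it got. */
static int sget_sketch(struct super_block_sketch *sb, set_super_fn set,
		       void *data)
{
	return set(sb, data);
}
```

Callers that go through the default callback can pass NULL for data,
which is what the patch series does; dropping the parameter from the
default alone would make it incompatible with the shared pointer type.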


Re: [PATCH v2 RESEND] Add NumaChip remote PCI support

2012-11-29 Thread Daniel J Blueman

Hi Bjorn,

On 29/11/2012 07:08, Bjorn Helgaas wrote:

On Wed, Nov 21, 2012 at 1:39 AM, Daniel J Blueman
 wrote:

Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
preventing access to AMD Northbridges which shouldn't respond.

v2: Use PCI_DEVFN in precomputed constant limit; drop unneeded includes

Signed-off-by: Daniel J Blueman 
---
  arch/x86/include/asm/numachip/numachip.h |   20 +
  arch/x86/kernel/apic/apic_numachip.c |2 +
  arch/x86/pci/Makefile|1 +
  arch/x86/pci/numachip.c  |  134 ++
  4 files changed, 157 insertions(+)
  create mode 100644 arch/x86/include/asm/numachip/numachip.h
  create mode 100644 arch/x86/pci/numachip.c

diff --git a/arch/x86/include/asm/numachip/numachip.h 
b/arch/x86/include/asm/numachip/numachip.h
new file mode 100644
index 000..d35e71a
--- /dev/null
+++ b/arch/x86/include/asm/numachip/numachip.h
@@ -0,0 +1,20 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Numascale NumaConnect-specific header file
+ *
+ * Copyright (C) 2012 Numascale AS. All rights reserved.
+ *
+ * Send feedback to 
+ *
+ */
+
+#ifndef _ASM_X86_NUMACHIP_NUMACHIP_H
+#define _ASM_X86_NUMACHIP_NUMACHIP_H
+
+extern int __init pci_numachip_init(void);
+
+#endif /* _ASM_X86_NUMACHIP_NUMACHIP_H */
+
diff --git a/arch/x86/kernel/apic/apic_numachip.c 
b/arch/x86/kernel/apic/apic_numachip.c
index a65829a..9c2aa89 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -22,6 +22,7 @@
  #include 
  #include 

+#include 
  #include 
  #include 
  #include 
@@ -179,6 +180,7 @@ static int __init numachip_system_init(void)
 return 0;

 x86_cpuinit.fixup_cpu_id = fixup_cpu_id;
+   x86_init.pci.arch_init = pci_numachip_init;

 map_csrs();

diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
index 3af5a1e..ee0af58 100644
--- a/arch/x86/pci/Makefile
+++ b/arch/x86/pci/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_STA2X11)   += sta2x11-fixup.o
  obj-$(CONFIG_X86_VISWS)+= visws.o

  obj-$(CONFIG_X86_NUMAQ)+= numaq_32.o
+obj-$(CONFIG_X86_NUMACHIP) += numachip.o


It looks like this depends on CONFIG_PCI_MMCONFIG for
pci_mmconfig_lookup().  Are there config constraints that force
CONFIG_PCI_MMCONFIG=y when CONFIG_X86_NUMACHIP=y?


I'll revise the patch with this constraint after we work out the best 
approach for below.



  obj-$(CONFIG_X86_INTEL_MID)+= mrst.o

diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
new file mode 100644
index 000..3773e05
--- /dev/null
+++ b/arch/x86/pci/numachip.c
@@ -0,0 +1,129 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Numascale NumaConnect-specific PCI code
+ *
+ * Copyright (C) 2012 Numascale AS. All rights reserved.
+ *
+ * Send feedback to 
+ *
+ * PCI accessor functions derived from mmconfig_64.c
+ *
+ */
+
+#include 
+#include 
+
+static u8 limit __read_mostly;
+
+static inline char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, 
unsigned int devfn)
+{
+   struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);
+
+   if (cfg && cfg->virt)
+   return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
+   return NULL;
+}


Most of this file is copied directly from mmconfig_64.c (as you
mentioned above).  I wonder if we could avoid the code duplication by
making the pci_dev_base() implementation in mmconfig_64.c a weak
definition.  Then you could just supply a non-weak pci_dev_base() here
that would override that default version.  Your version would look
something like:

   char __iomem *pci_dev_base(unsigned int seg, unsigned int bus,
unsigned int devfn)
   {
   struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);

   if (cfg && cfg->virt && devfn < limit)
   return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
   return NULL;
   }

That would be different from what you have in this patch because reads
& writes to devices above "limit" would return -EINVAL rather than 0
as you do here.  Would that be a problem?


That would work nicely (pointer lookup and inlining etc aside) if there 
was the runtime ability to override pci_dev_base only if the NumaChip 
signature was detected.


We could expose pci_dev_base via struct x86_init_pci; the extra 
complexity and performance tradeoff may not be worth it for a single 
case perhaps?


Thanks,
  Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale Asia
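
The runtime-override idea raised above (selecting a different
pci_dev_base only when the platform is detected) could look roughly like
the sketch below. It is a user-space illustration under stated
assumptions: pci_hooks_sketch, platform_detect_sketch and the fake
config space are hypothetical names, not the x86_init_pci layout, and
the 4 KiB-per-devfn offset simply mirrors the (devfn << 12) term in the
quoted code.

```c
#include <stddef.h>

#define DEVFN_COUNT 256u

/* Hypothetical stand-in for one bus worth of mapped config space. */
static char fake_cfg_space[DEVFN_COUNT * 4096u];

typedef char *(*dev_base_fn)(unsigned int bus, unsigned int devfn);

/* Set at detection time; devfns at or above it are hidden. */
static unsigned int devfn_limit = DEVFN_COUNT;

/* Default lookup: every devfn on the bus is reachable. */
static char *default_dev_base(unsigned int bus, unsigned int devfn)
{
	(void)bus;
	if (devfn >= DEVFN_COUNT)
		return NULL;
	return fake_cfg_space + ((size_t)devfn << 12);
}

/* Platform-specific lookup: refuse devices at or above the limit. */
static char *limited_dev_base(unsigned int bus, unsigned int devfn)
{
	if (devfn >= devfn_limit)
		return NULL;
	return default_dev_base(bus, devfn);
}

/* Hook table, initialised to the default implementation. */
static struct pci_hooks_sketch {
	dev_base_fn dev_base;
} pci_hooks = { .dev_base = default_dev_base };

/* Called once at init; overrides the hook only on the special platform. */
static void platform_detect_sketch(int is_special, unsigned int limit)
{
	if (is_special) {
		devfn_limit = limit;
		pci_hooks.dev_base = limited_dev_base;
	}
}
```

The cost mentioned in the reply is visible here: every config access
goes through an indirect call instead of an inlinable function.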

Re: [PATCH] staging/serqt_usb2: Refactor qt_status_change_check() in serqt_usb2.c

2012-11-29 Thread YAMANE Toshiaki
On Fri, Nov 30, 2012 at 11:10 AM, Greg Kroah-Hartman
 wrote:
> On Thu, Nov 29, 2012 at 01:57:56PM +0900, YAMANE Toshiaki wrote:
>> Improved position to increment variable i,
>> And typo fixes.
>>
>> Signed-off-by: YAMANE Toshiaki 
>> ---
>>  drivers/staging/serqt_usb2/serqt_usb2.c |   12 ++--
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/staging/serqt_usb2/serqt_usb2.c 
>> b/drivers/staging/serqt_usb2/serqt_usb2.c
>> index 1b3e995..095d6f2 100644
>> --- a/drivers/staging/serqt_usb2/serqt_usb2.c
>> +++ b/drivers/staging/serqt_usb2/serqt_usb2.c
>> @@ -309,26 +309,26 @@ static void qt_status_change_check(struct tty_struct 
>> *tty,
>>   case 0x00:
>>   if (i > (RxCount - 4)) {
>>   dev_dbg(>dev,
>> - "Illegal escape seuences in 
>> received data\n");
>> + "Illegal escape sequence in 
>> received data\n");
>
> This is a different type of fix from:
>
>>   break;
>>   }
>>
>> - ProcessLineStatus(qt_port, data[i + 3]);
>> -
>>   i += 3;
>> + ProcessLineStatus(qt_port, data[i]);
>
> I think you just changed the logic in this function, didn't you?
>
>> +
>>   flag = 1;
>>   break;
>>
>>   case 0x01:
>>   if (i > (RxCount - 4)) {
>>   dev_dbg(>dev,
>> - "Illegal escape seuences in 
>> received data\n");
>> + "Illegal escape sequence in 
>> received data\n");
>>   break;
>>   }
>>
>> - ProcessModemStatus(qt_port, data[i + 3]);
>> -
>>   i += 3;
>> + ProcessModemStatus(qt_port, data[i]);
>
> Same here, what happens to i after this?
>
> Please break into two patches, and verify that you didn't break anything
> here.

Greg-san,

I am sorry for the confusion.
I sent the patch twice since the following patch was applied (gregkh/staging-next)

commit 9d36976fad3008fcc4209789566f7f3e7763f212
Modify qt_status_change_check() and delete qt_status_change().

-Incorporate comment of Mr.Joe Perches (sent Nov.17)
-I sent yesterday

Please discard the patches.

Thanks,


YAMANE Toshiaki


[PATCH] regulator: wm831x-dcdc: Ensure selected voltage falls within requested range

2012-11-29 Thread Axel Lin
Use DIV_ROUND_UP to ensure the selected voltage won't be less than min_uV
due to integer truncation.

Signed-off-by: Axel Lin 
---
 drivers/regulator/wm831x-dcdc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/wm831x-dcdc.c b/drivers/regulator/wm831x-dcdc.c
index bce2565..96450f1 100644
--- a/drivers/regulator/wm831x-dcdc.c
+++ b/drivers/regulator/wm831x-dcdc.c
@@ -223,7 +223,7 @@ static int wm831x_buckv_map_voltage(struct regulator_dev 
*rdev,
if (min_uV < 60)
vsel = 0;
else if (min_uV <= 180)
-   vsel = ((min_uV - 60) / 12500) + 8;
+   vsel = DIV_ROUND_UP(min_uV - 60, 12500) + 8;
else
return -EINVAL;
 
-- 
1.7.9.5
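
As a worked illustration of the truncation issue this patch addresses,
here is a small stand-alone sketch. The 12500 uV step and the selector
offset of 8 are taken from the hunk above; BASE_UV and the 610000 uV
request are assumptions chosen only for the arithmetic, and the helper
names are hypothetical. DIV_ROUND_UP is defined locally in the same way
the kernel macro is commonly written.

```c
/* Round-up integer division for positive operands. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define BASE_UV		600000	/* assumed bottom of the linear range */
#define STEP_UV		12500	/* step size from the hunk */
#define BASE_SEL	8	/* selector offset from the hunk */

/* Old behaviour: plain division truncates toward zero. */
static int vsel_truncating(int min_uV)
{
	return (min_uV - BASE_UV) / STEP_UV + BASE_SEL;
}

/* New behaviour: round up so the result never falls below min_uV. */
static int vsel_round_up(int min_uV)
{
	return DIV_ROUND_UP(min_uV - BASE_UV, STEP_UV) + BASE_SEL;
}

/* Voltage that a given selector actually produces. */
static int vsel_to_uV(int vsel)
{
	return BASE_UV + (vsel - BASE_SEL) * STEP_UV;
}
```

With a request of 610000 uV, the truncating version picks selector 8
(600000 uV, below the request), while the rounded-up version picks
selector 9 (612500 uV, within the requested range).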





linux-next: manual merge of the arm-soc tree with the omap_dss2 tree

2012-11-29 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
arch/arm/mach-davinci/devices-da8xx.c between commit 3b43ad201dea
("da8xx-fb: cleanup LCDC configurations") from the omap_dss2 tree and
commit 8e0d72d2c7a6 ("ARM: davinci: da8xx: add DA850 PRUSS support") from
the arm-soc tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/arm/mach-davinci/devices-da8xx.c
index fcb30d3,46c9a0c..000
--- a/arch/arm/mach-davinci/devices-da8xx.c
+++ b/arch/arm/mach-davinci/devices-da8xx.c
@@@ -518,9 -520,98 +520,78 @@@ void __init da8xx_register_mcasp(int id
}
  }
  
+ static struct resource da8xx_pruss_resources[] = {
+   {
+   .start  = DA8XX_PRUSS_MEM_BASE,
+   .end= DA8XX_PRUSS_MEM_BASE + 0x,
+   .flags  = IORESOURCE_MEM,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT0,
+   .end= IRQ_DA8XX_EVTOUT0,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT1,
+   .end= IRQ_DA8XX_EVTOUT1,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT2,
+   .end= IRQ_DA8XX_EVTOUT2,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT3,
+   .end= IRQ_DA8XX_EVTOUT3,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT4,
+   .end= IRQ_DA8XX_EVTOUT4,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT5,
+   .end= IRQ_DA8XX_EVTOUT5,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT6,
+   .end= IRQ_DA8XX_EVTOUT6,
+   .flags  = IORESOURCE_IRQ,
+   },
+   {
+   .start  = IRQ_DA8XX_EVTOUT7,
+   .end= IRQ_DA8XX_EVTOUT7,
+   .flags  = IORESOURCE_IRQ,
+   },
+ };
+ 
+ static struct uio_pruss_pdata da8xx_uio_pruss_pdata = {
+   .pintc_base = 0x4000,
+ };
+ 
+ static struct platform_device da8xx_uio_pruss_dev = {
+   .name   = "pruss_uio",
+   .id = -1,
+   .num_resources  = ARRAY_SIZE(da8xx_pruss_resources),
+   .resource   = da8xx_pruss_resources,
+   .dev= {
+   .coherent_dma_mask  = DMA_BIT_MASK(32),
+   .platform_data  = _uio_pruss_pdata,
+   }
+ };
+ 
+ int __init da8xx_register_uio_pruss(void)
+ {
+   da8xx_uio_pruss_pdata.sram_pool = sram_get_gen_pool();
+   return platform_device_register(_uio_pruss_dev);
+ }
+ 
 -static const struct display_panel disp_panel = {
 -  QVGA,
 -  16,
 -  16,
 -  COLOR_ACTIVE,
 -};
 -
  static struct lcd_ctrl_config lcd_cfg = {
 -  _panel,
 -  .ac_bias= 255,
 -  .ac_bias_intrpt = 0,
 -  .dma_burst_sz   = 16,
 +  .panel_shade= COLOR_ACTIVE,
.bpp= 16,
 -  .fdd= 255,
 -  .tft_alt_mode   = 0,
 -  .stn_565_mode   = 0,
 -  .mono_8bit_mode = 0,
 -  .invert_line_clock  = 1,
 -  .invert_frm_clock   = 1,
 -  .sync_edge  = 0,
 -  .sync_ctrl  = 1,
 -  .raster_order   = 0,
 -  .fifo_th= 6,
  };
  
  struct da8xx_lcdc_platform_data sharp_lcd035q3dg01_pdata = {




[PATCH] [trivial] treewide: Fix typos in various drivers

2012-11-29 Thread Masanari Iida
Fix typos in printk within various drivers.

Signed-off-by: Masanari Iida 
---
 arch/arm/kernel/kprobes-test.c | 2 +-
 arch/arm/mach-netx/xc.c| 2 +-
 arch/blackfin/mach-bf609/Kconfig   | 2 +-
 arch/blackfin/mm/sram-alloc.c  | 2 +-
 drivers/media/dvb-frontends/drxk_hard.c| 2 +-
 drivers/media/platform/mx2_emmaprp.c   | 2 +-
 drivers/remoteproc/remoteproc_elf_loader.c | 4 ++--
 tools/perf/Documentation/perf-record.txt   | 2 +-
 tools/perf/util/parse-events-test.c| 2 +-
 9 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/kprobes-test.c b/arch/arm/kernel/kprobes-test.c
index 1862d8f..0cd63d0 100644
--- a/arch/arm/kernel/kprobes-test.c
+++ b/arch/arm/kernel/kprobes-test.c
@@ -1598,7 +1598,7 @@ static int __init run_all_tests(void)
 {
int ret = 0;
 
-   pr_info("Begining kprobe tests...\n");
+   pr_info("Beginning kprobe tests...\n");
 
 #ifndef CONFIG_THUMB2_KERNEL
 
diff --git a/arch/arm/mach-netx/xc.c b/arch/arm/mach-netx/xc.c
index e4cfb7e..f1c972d 100644
--- a/arch/arm/mach-netx/xc.c
+++ b/arch/arm/mach-netx/xc.c
@@ -136,7 +136,7 @@ int xc_request_firmware(struct xc *x)
if (head->magic != 0x4e657458) {
if (head->magic == 0x5874654e) {
dev_err(x->dev,
-   "firmware magic is 'XteN'. Endianess problems?\n");
+   "firmware magic is 'XteN'. Endianness problems?\n");
ret = -ENODEV;
goto exit_release_firmware;
}
diff --git a/arch/blackfin/mach-bf609/Kconfig b/arch/blackfin/mach-bf609/Kconfig
index 101b33e..95a4f1b 100644
--- a/arch/blackfin/mach-bf609/Kconfig
+++ b/arch/blackfin/mach-bf609/Kconfig
@@ -56,7 +56,7 @@ config SEC_IRQ_PRIORITY_LEVELS
default 7
range 0 7
help
- Devide the total number of interrupt priority levels into sub-levels.
+ Divide the total number of interrupt priority levels into sub-levels.
  There is 2 ^ (SEC_IRQ_PRIORITY_LEVELS + 1) different levels.
 
 endmenu
diff --git a/arch/blackfin/mm/sram-alloc.c b/arch/blackfin/mm/sram-alloc.c
index 342e378..1f3b3ef 100644
--- a/arch/blackfin/mm/sram-alloc.c
+++ b/arch/blackfin/mm/sram-alloc.c
@@ -191,7 +191,7 @@ static irqreturn_t l2_ecc_err(int irq, void *dev_id)
 {
int status;
 
-   printk(KERN_ERR "L2 ecc error happend\n");
+   printk(KERN_ERR "L2 ecc error happened\n");
status = bfin_read32(L2CTL0_STAT);
if (status & 0x1)
printk(KERN_ERR "Core channel error type:0x%x, addr:0x%x\n",
diff --git a/drivers/media/dvb-frontends/drxk_hard.c 
b/drivers/media/dvb-frontends/drxk_hard.c
index 8b4c6d5..df9abe8 100644
--- a/drivers/media/dvb-frontends/drxk_hard.c
+++ b/drivers/media/dvb-frontends/drxk_hard.c
@@ -948,7 +948,7 @@ static int GetDeviceCapabilities(struct drxk_state *state)
state->m_oscClockFreq = 20250;
break;
default:
-   printk(KERN_ERR "drxk: Clock Frequency is unkonwn\n");
+   printk(KERN_ERR "drxk: Clock Frequency is unknown\n");
return -EINVAL;
}
/*
diff --git a/drivers/media/platform/mx2_emmaprp.c 
b/drivers/media/platform/mx2_emmaprp.c
index 8f22ce5..bfa6507 100644
--- a/drivers/media/platform/mx2_emmaprp.c
+++ b/drivers/media/platform/mx2_emmaprp.c
@@ -371,7 +371,7 @@ static irqreturn_t emmaprp_irq(int irq_emma, void *data)
if (!curr_ctx->aborting) {
if ((irqst & PRP_INTR_ST_RDERR) ||
(irqst & PRP_INTR_ST_CH2WERR)) {
-   pr_err("PrP bus error ocurred, this transfer is 
probably corrupted\n");
+   pr_err("PrP bus error occurred, this transfer is 
probably corrupted\n");
writel(PRP_CNTL_SWRST, pcdev->base_emma + PRP_CNTL);
} else if (irqst & PRP_INTR_ST_CH2B1CI) { /* buffer ready */
src_vb = v4l2_m2m_src_buf_remove(curr_ctx->m2m_ctx);
diff --git a/drivers/remoteproc/remoteproc_elf_loader.c 
b/drivers/remoteproc/remoteproc_elf_loader.c
index e1f89d6..0d36f94 100644
--- a/drivers/remoteproc/remoteproc_elf_loader.c
+++ b/drivers/remoteproc/remoteproc_elf_loader.c
@@ -66,13 +66,13 @@ rproc_elf_sanity_check(struct rproc *rproc, const struct 
firmware *fw)
return -EINVAL;
}
 
-   /* We assume the firmware has the same endianess as the host */
+   /* We assume the firmware has the same endianness as the host */
 # ifdef __LITTLE_ENDIAN
if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB) {
 # else /* BIG ENDIAN */
if (ehdr->e_ident[EI_DATA] != ELFDATA2MSB) {
 # endif
-   dev_err(dev, "Unsupported firmware endianess\n");
+   dev_err(dev, "Unsupported firmware endianness\n");
return -EINVAL;
}
 
diff --git a/tools/perf/Documentation/perf-record.txt 

[PATCH v3 2/4] ARM: dts: Add disable-wp for sd card slot on smdk5250

2012-11-29 Thread Doug Anderson
The next change will remove the code from the dw_mmc-exynos that added
the DW_MCI_QUIRK_NO_WRITE_PROTECT.  Keep existing functionality of
having no write protect pin on smdk5250 by adding the disable-wp
property.

Signed-off-by: Doug Anderson 
---
Changes in v3:
- New for this version of the patch series.

 arch/arm/boot/dts/exynos5250-smdk5250.dts |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5250-smdk5250.dts 
b/arch/arm/boot/dts/exynos5250-smdk5250.dts
index f30ca18..5538b13 100644
--- a/arch/arm/boot/dts/exynos5250-smdk5250.dts
+++ b/arch/arm/boot/dts/exynos5250-smdk5250.dts
@@ -146,6 +146,7 @@
reg = <0>;
bus-width = <4>;
samsung,cd-pinmux-gpio = < 2 2 3 3>;
+   disable-wp;
gpios = < 0 2 0 3>, < 1 2 0 3>,
< 3 2 3 3>, < 4 2 3 3>,
< 5 2 3 3>, < 6 2 3 3>,
-- 
1.7.7.3



[PATCH v3 4/4] mmc: dw_mmc: Handle wp-gpios from device tree

2012-11-29 Thread Doug Anderson
On some SoCs (like exynos5250) you need to use an external GPIO for
write protect.  Add support for wp-gpios to the core dw_mmc driver
since it could be useful across multiple SoCs.

With this change I am able to make use of the write protect for the
external SD slot on exynos5250-snow.

Signed-off-by: Doug Anderson 
---
Changes in v3: None
Changes in v2:
- Fixed return type from u32 to int
- Return -EINVAL instead of -1

 drivers/mmc/host/dw_mmc.c |   34 ++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index b47b1e9..8f8bac5 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dw_mmc.h"
 
@@ -75,6 +76,7 @@ struct idmac_desc {
  * @mmc: The mmc_host representing this slot.
  * @host: The MMC controller this slot is using.
  * @quirks: Slot-level quirks (DW_MCI_SLOT_QUIRK_XXX)
+ * @wp_gpio: If gpio_is_valid() we'll use this to read write protect.
  * @ctype: Card type for this slot.
  * @mrq: mmc_request currently being processed or waiting to be
  * processed, or NULL when the slot is idle.
@@ -90,6 +92,7 @@ struct dw_mci_slot {
struct dw_mci   *host;
 
int quirks;
+   int wp_gpio;
 
u32 ctype;
 
@@ -836,6 +839,8 @@ static int dw_mci_get_ro(struct mmc_host *mmc)
read_only = 0;
else if (brd->get_ro)
read_only = brd->get_ro(slot->id);
+   else if (gpio_is_valid(slot->wp_gpio))
+   read_only = gpio_get_value(slot->wp_gpio);
else
read_only =
mci_readl(slot->host, WRTPRT) & (1 << slot->id) ? 1 : 0;
@@ -1830,6 +1835,29 @@ static u32 dw_mci_of_get_bus_wd(struct device *dev, u8 
slot)
   " as 1\n");
return bus_wd;
 }
+
+/* find the write protect gpio for a given slot; or -1 if none specified */
+static int dw_mci_of_get_wp_gpio(struct device *dev, u8 slot)
+{
+   struct device_node *np = dw_mci_of_find_slot_node(dev, slot);
+   int gpio;
+
+   if (!np)
+   return -EINVAL;
+
+   gpio = of_get_named_gpio(np, "wp-gpios", 0);
+
+   /* Having a missing entry is valid; return silently */
+   if (!gpio_is_valid(gpio))
+   return -EINVAL;
+
+   if (devm_gpio_request(dev, gpio, "dw-mci-wp")) {
+   dev_warn(dev, "gpio [%d] request failed\n", gpio);
+   return -EINVAL;
+   }
+
+   return gpio;
+}
 #else /* CONFIG_OF */
 static int dw_mci_of_get_slot_quirks(struct device *dev, u8 slot)
 {
@@ -1843,6 +1871,10 @@ static struct device_node 
*dw_mci_of_find_slot_node(struct device *dev, u8 slot)
 {
return NULL;
 }
+static int dw_mci_of_get_wp_gpio(struct device *dev, u8 slot)
+{
+   return -EINVAL;
+}
 #endif /* CONFIG_OF */
 
 static int dw_mci_init_slot(struct dw_mci *host, unsigned int id)
@@ -1960,6 +1992,8 @@ static int dw_mci_init_slot(struct dw_mci *host, unsigned 
int id)
else
clear_bit(DW_MMC_CARD_PRESENT, >flags);
 
+   slot->wp_gpio = dw_mci_of_get_wp_gpio(host->dev, slot->id);
+
mmc_add_host(mmc);
 
 #if defined(CONFIG_DEBUG_FS)
-- 
1.7.7.3



[PATCH v3 3/4] mmc: dw_mmc: exynos: Remove code for wp-gpios

2012-11-29 Thread Doug Anderson
The exynos code claimed the write protect with devm_gpio_request() but
never did anything with it.  That meant that anyone using a write
protect GPIO would effectively be write protected all the time.

The handling for wp-gpios belongs in the main dw_mmc driver and has
been moved there.

Signed-off-by: Doug Anderson 
---
Changes in v3:
- Totally removed wp-gpios handling from exynos code.

Changes in v2: None

 drivers/mmc/host/dw_mmc-exynos.c |   10 --
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
index 4d50da6..72fd0f2 100644
--- a/drivers/mmc/host/dw_mmc-exynos.c
+++ b/drivers/mmc/host/dw_mmc-exynos.c
@@ -175,16 +175,6 @@ static int dw_mci_exynos_setup_bus(struct dw_mci *host,
}
}
 
-   gpio = of_get_named_gpio(slot_np, "wp-gpios", 0);
-   if (gpio_is_valid(gpio)) {
-   if (devm_gpio_request(host->dev, gpio, "dw-mci-wp"))
-   dev_info(host->dev, "gpio [%d] request failed\n",
-   gpio);
-   } else {
-   dev_info(host->dev, "wp gpio not available");
-   host->pdata->quirks |= DW_MCI_QUIRK_NO_WRITE_PROTECT;
-   }
-
if (host->pdata->quirks & DW_MCI_QUIRK_BROKEN_CARD_DETECTION)
return 0;
 
-- 
1.7.7.3



[PATCH v3 1/4] mmc: dw_mmc: Add "disable-wp" device tree property

2012-11-29 Thread Doug Anderson
The "disable-wp" property is used to specify that a given SD card slot
doesn't have a concept of write protect.  This eliminates the need for
special case code for SD slots that should never be write protected
(like a micro SD slot or a dev board).

The dw_mmc driver is special in needing to specify "disable-wp"
because the lack of a "wp-gpios" property means to use the special
purpose write protect line.  On some other mmc devices the lack of
"wp-gpios" means that write protect should be disabled.

Signed-off-by: Doug Anderson 
---
Changes in v3:
- New for this version of the patch series.  Chose "disable-wp" rather
  than the discussed "broken-internal-wp" since it mapped more cleanly
  to an existing quirk (and the only reason to specify that the
  internal wp is broken is if you're disabling the write protect
  anyway).

 .../devicetree/bindings/mmc/synopsis-dw-mshc.txt   |   12 +-
 drivers/mmc/host/dw_mmc.c  |   36 +++-
 include/linux/mmc/dw_mmc.h |4 ++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt 
b/Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt
index 06cd32d08..726fd21 100644
--- a/Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt
+++ b/Documentation/devicetree/bindings/mmc/synopsis-dw-mshc.txt
@@ -26,8 +26,16 @@ Required Properties:
* bus-width: as documented in mmc core bindings.
 
* wp-gpios: specifies the write protect gpio line. The format of the
- gpio specifier depends on the gpio controller. If the write-protect
- line is not available, this property is optional.
+ gpio specifier depends on the gpio controller. If a GPIO is not used
+ for write-protect, this property is optional.
+
+   * disable-wp: If the wp-gpios property isn't present then (by default)
+ we'd assume that the write protect is hooked up directly to the
+ controller's special purpose write protect line (accessible via
+ the WRTPRT register).  However, it's possible that we simply don't
+ want write protect.  In that case specify 'disable-wp'.
+ NOTE: This property is not required for slots known to always
+ connect to eMMC or SDIO cards.
 
 Optional properties:
 
diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 7342029..b47b1e9 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -74,6 +74,7 @@ struct idmac_desc {
  * struct dw_mci_slot - MMC slot state
  * @mmc: The mmc_host representing this slot.
  * @host: The MMC controller this slot is using.
+ * @quirks: Slot-level quirks (DW_MCI_SLOT_QUIRK_XXX)
  * @ctype: Card type for this slot.
  * @mrq: mmc_request currently being processed or waiting to be
  * processed, or NULL when the slot is idle.
@@ -88,6 +89,8 @@ struct dw_mci_slot {
struct mmc_host *mmc;
struct dw_mci   *host;
 
+   int quirks;
+
u32 ctype;
 
struct mmc_request  *mrq;
@@ -828,7 +831,8 @@ static int dw_mci_get_ro(struct mmc_host *mmc)
struct dw_mci_board *brd = slot->host->pdata;
 
/* Use platform get_ro function, else try on board write protect */
-   if (brd->quirks & DW_MCI_QUIRK_NO_WRITE_PROTECT)
+   if ((brd->quirks & DW_MCI_QUIRK_NO_WRITE_PROTECT) ||
+   (slot->quirks & DW_MCI_SLOT_QUIRK_NO_WRITE_PROTECT))
read_only = 0;
else if (brd->get_ro)
read_only = brd->get_ro(slot->id);
@@ -1788,6 +1792,30 @@ static struct device_node 
*dw_mci_of_find_slot_node(struct device *dev, u8 slot)
return NULL;
 }
 
+static struct dw_mci_of_slot_quirks {
+   char *quirk;
+   int id;
+} of_slot_quirks[] = {
+   {
+   .quirk  = "disable-wp",
+   .id = DW_MCI_SLOT_QUIRK_NO_WRITE_PROTECT,
+   },
+};
+
+static int dw_mci_of_get_slot_quirks(struct device *dev, u8 slot)
+{
+   struct device_node *np = dw_mci_of_find_slot_node(dev, slot);
+   int quirks = 0;
+   int idx;
+
+   /* get quirks */
+   for (idx = 0; idx < ARRAY_SIZE(of_slot_quirks); idx++)
+   if (of_get_property(np, of_slot_quirks[idx].quirk, NULL))
+   quirks |= of_slot_quirks[idx].id;
+
+   return quirks;
+}
+
 /* find out bus-width for a given slot */
 static u32 dw_mci_of_get_bus_wd(struct device *dev, u8 slot)
 {
@@ -1803,6 +1831,10 @@ static u32 dw_mci_of_get_bus_wd(struct device *dev, u8 
slot)
return bus_wd;
 }
 #else /* CONFIG_OF */
+static int dw_mci_of_get_slot_quirks(struct device *dev, u8 slot)
+{
+   return 0;
+}
 static u32 dw_mci_of_get_bus_wd(struct device *dev, u8 slot)
 {
return 1;
@@ -1831,6 +1863,8 @@ static int dw_mci_init_slot(struct dw_mci *host, unsigned 
int id)
slot->host = host;
host->slot[id] = slot;
 
+   

Re: [PATCH v2 1/2] mmc: dw_mmc: exynos: Stop claiming wp-gpio

2012-11-29 Thread Doug Anderson
Seungwon,


On Wed, Nov 28, 2012 at 11:46 PM, Seungwon Jeon  wrote:
> Hi Doug,
>
> On Thursday, November 29, 2012, Doug Anderson wrote:
>> Seungwon,
>>
>> Thanks for the review.  See below for comments.  If you'd like me to
>> respin then please let me know.  Otherwise I look forward to your ack.
>>
>> On Wed, Nov 28, 2012 at 1:29 AM, Seungwon Jeon  wrote:
>> > Yes. pin of write protection is common property.
>> > This change is good. I have some suggestion below.
>> > Could you check it?
>> >
>> > On Friday, November 23, 2012, Doug Anderson wrote:
>> >> The exynos code claimed wp-gpio with devm_gpio_request() but never did
>> >> anything with it.  That meant that anyone using a write protect GPIO
>> >> would effectively be write protected all the time.
>> >>
>> >> A future change will move the wp-gpio support to the core dw_mmc.c
>> >> file.  Now the exynos-specific code won't claim the GPIO but will
>> >> just set the DW_MCI_QUIRK_NO_WRITE_PROTECT quirk if write protect
>> >> won't be used.
>> >>
>> >> Signed-off-by: Doug Anderson 
>> >>
>> >> ---
>> >> Changes in v2:
>> >> - Nothing new in this patch
>> >>
>> >>  drivers/mmc/host/dw_mmc-exynos.c |   12 ++--
>> >>  1 files changed, 6 insertions(+), 6 deletions(-)
>> >>
>> >> diff --git a/drivers/mmc/host/dw_mmc-exynos.c 
>> >> b/drivers/mmc/host/dw_mmc-exynos.c
>> >> index 4d50da6..58cc03e 100644
>> >> --- a/drivers/mmc/host/dw_mmc-exynos.c
>> >> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>> >> @@ -175,12 +175,12 @@ static int dw_mci_exynos_setup_bus(struct dw_mci 
>> >> *host,
>> >>   }
>> >>   }
>> >>
>> >> - gpio = of_get_named_gpio(slot_np, "wp-gpios", 0);
>> >> - if (gpio_is_valid(gpio)) {
>> >> - if (devm_gpio_request(host->dev, gpio, "dw-mci-wp"))
>> >> - dev_info(host->dev, "gpio [%d] request failed\n",
>> >> - gpio);
>> >> - } else {
>> >> + /*
>> >> +  * If there are no write-protect GPIOs present then we assume no 
>> >> write
>> >> +  * protect.  The mci_readl() in dw_mmc.c won't work since it's not
>> >> +  * hooked up on exynos.
>> >> +  */
>> >> + if (!of_find_property(slot_np, "wp-gpios", NULL)) {
>> >>   dev_info(host->dev, "wp gpio not available");
>> >>   host->pdata->quirks |= DW_MCI_QUIRK_NO_WRITE_PROTECT;
>> >>   }
>> > All card types need this quirk in case wp-gpio property is empty?
>> > I think wp-pin is valid for SD card, not eMMC/SDIO.
>>
>> Right.  It is only checked right now by the SD code (mmc/core/sd.c).
>> It doesn't particularly hurt to set it the quirk in other cases though
>> and it seems nice not to add special cases.  I could imagine someone
>> extending the MMC code at some point to support write protect (via
>> GPIO) for eMMC, so there's even a slight justification for avoiding
>> the special case.
>>
>>
>> > Of course, I know origin code did it.
>> > How about removing whole checking routine?
>> > Instead, new definition for this quirk can be added into 
>> > 'dw_mci_of_quirks'(dw_mmc.c) and dts file.
>>
>> On _exynos_ all SD cards need this quirk if there is no wp-gpio
>> property.  However this is not generally true for all users of dw_mmc.
>>  The DesignWare IP Block actually has a write protect input that can
>> be read with "mci_readl(slot->host, WRTPRT)" but on exynos the
>> DesignWare write protect line isn't exposed on any physical pins.
>> That means that the only possible way to do write protect on exynos is
>> using a GPIO.
>>
>> The above means that on exynos if the GPIO isn't defined we will
>> assume no write protect.  On other platforms if the GPIO isn't defined
>> we'll assume that the "mci_readl" will work and we'll use that.
>>
>> If people would prefer it I can code up an alternate solution that
>> doesn't touch any exynos code but that would introduce a new device
>> tree binding.  We could accomplish what's needed for exynos using a
>> property like "broken-internal-wp".
>>
>> Please let me know if you'd like me to submit a new patch with this
>> solution or if you like the existing solution.
>>
> Write protect is additional interface related with SD socket.
> WP switch appears in SD standard size card.
> In case EMMC/SDIO spec, there is no mentions about this WP pin.
> As you mentioned above, that's why 'get_ro' is called only in sd 
> path(mmc/core/sd.c).
> So, I meant that we don't need to consider WP pin status about non-SD type.

Ah, I understand now.  This is a good point.  I have updated the
documentation in the latest patch to mention this.  Thanks!


>
> Such as exynos5250, there is no exposed interface from host controller for 
> write protection pin.
> In that case, if general gpio pin is connected like your board environment, 
> we can define wp-gpio.
> Otherwise, 'broken-internal-wp' property will be good solution.

Latest patch (just about to send out) adds a per-slot "disable-wp"
property for dw_mmc.  See the patch for 

[PATCH]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings

2012-11-29 Thread Wang YanQing
I have a board with an 8168e-vl (10ec:8168 with RTL_GIGA_MAC_VER_34).
Everything looks fine at first: I can use ifconfig to set the IP, netmask,
etc., and the rx/tx statistics shown by ifconfig look good when I
ping another host or ping it from another host. But it doesn't work:
I can't get an ICMP reply from either side, although the RX/TX statistics
seem good.

After adding some debug code, I found that this NIC only accepts ethernet
broadcast packets. It doesn't accept the packets sent to its own
MAC address, but it works fine for sending. So the RX/TX statistics that
ifconfig shows only mean it can receive ARP packets. (It doesn't know its MAC
address; see below.)

I have tried the driver provided on Realtek's website; it has the
same problem the first time. BUT IT WORKS AFTER I REBOOT with
CTRL-ALT-DEL. The reason is that Realtek's driver calls rtl8168_rar_set
in the .shutdown function registered with pci_register_driver. Yes,
the real reason it works is rtl8168_rar_set: this function
sets the extended GigaMAC registers, so after a reboot without losing power
the NIC keeps the state it had before the reboot.

I haven't seen any code in the kernel that sets the GigaMAC registers at boot,
so I guess the BIOS or the NIC's circuitry does it, but evidently something
misses the extended GigaMAC registers in this case. The probe code
gets the MAC address right, so MAC{0,4} must have been set, but someone
forgot the extended GigaMAC registers.

This patch fixes it.
[ I don't know whether other Realtek NICs with extended GigaMAC
registers have the same problem. I hit it on the 8168e-vl with
RTL_GIGA_MAC_VER_34, so I made this patch just for that chip. ]

Signed-off-by: Wang YanQing 
---
 drivers/net/ethernet/realtek/r8169.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 927aa33..e49c08d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3095,7 +3095,30 @@ static void rtl8168e_1_hw_phy_config(struct 
rtl8169_private *tp)
rtl_writephy(tp, 0x0e, 0x);
rtl_writephy(tp, 0x0d, 0x);
 }
+static void rtl8168e_2_workaround(struct rtl8169_private *tp, u8 *addr)
+{
+   void __iomem *ioaddr = tp->mmio_addr;
+   u32 high;
+   u32 low;
+
+   low  = addr[0] | (addr[1] << 8) | (addr[2] << 16) | (addr[3] << 24);
+   high = addr[4] | (addr[5] << 8);
+
+
+   RTL_W8(Cfg9346, Cfg9346_Unlock);
+   if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+   const struct exgmac_reg e[] = {
+   { .addr = 0xe0, ERIAR_MASK_, .val = low },
+   { .addr = 0xe4, ERIAR_MASK_, .val = high },
+   { .addr = 0xf0, ERIAR_MASK_, .val = low << 16 },
+   { .addr = 0xf4, ERIAR_MASK_, .val = high << 16 |
+   low  >> 16 },
+   };
 
+   rtl_write_exgmac_batch(tp, e, ARRAY_SIZE(e));
+   }
+   RTL_W8(Cfg9346, Cfg9346_Lock);
+}
 static void rtl8168e_2_hw_phy_config(struct rtl8169_private *tp)
 {
static const struct phy_reg phy_reg_init[] = {
@@ -3178,6 +3201,7 @@ static void rtl8168e_2_hw_phy_config(struct 
rtl8169_private *tp)
rtl_w1w0_phy(tp, 0x19, 0x, 0x0001);
rtl_w1w0_phy(tp, 0x10, 0x, 0x0400);
rtl_writephy(tp, 0x1f, 0x);
+   rtl8168e_2_workaround(tp, tp->dev->dev_addr);
 }
 
 static void rtl8168f_hw_phy_config(struct rtl8169_private *tp)
-- 
1.7.11.1.116.g8228a23
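
For illustration, the four values that rtl8168e_2_workaround() derives from
the MAC address can be reproduced in plain C. This is only a sketch under
stated assumptions: struct exgmac_mac_words and pack_mac() are hypothetical
names that do not exist in r8169.c, and the explicit uint32_t casts are an
addition so that every shift is done in unsigned 32-bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical holder for the values written to the extended GigaMAC
 * registers at offsets 0xe0, 0xe4, 0xf0 and 0xf4 in the patch above. */
struct exgmac_mac_words {
	uint32_t e0;	/* low */
	uint32_t e4;	/* high */
	uint32_t f0;	/* low << 16 */
	uint32_t f4;	/* high << 16 | low >> 16 */
};

/* Pack a 6-byte MAC address using the same arithmetic as the patch.
 * The casts avoid shifting a promoted (signed) int into the sign bit. */
static struct exgmac_mac_words pack_mac(const uint8_t addr[6])
{
	uint32_t low  = (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
			((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
	uint32_t high = (uint32_t)addr[4] | ((uint32_t)addr[5] << 8);
	struct exgmac_mac_words w = {
		.e0 = low,
		.e4 = high,
		.f0 = low << 16,
		.f4 = (high << 16) | (low >> 16),
	};

	return w;
}
```

For the address 00:11:22:33:44:55 this gives low = 0x33221100 and
high = 0x5544, so the 0xf0/0xf4 pair holds the same six bytes shifted up by
16 bits.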
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 10/14] perf tool: Centralize default columns init in perf_hpp__init

2012-11-29 Thread Namhyung Kim
Hi Jiri,

On Thu, 29 Nov 2012 13:13:19 +0100, Jiri Olsa wrote:
> On Thu, Nov 29, 2012 at 08:55:47PM +0900, Namhyung Kim wrote:
>> On Wed, 28 Nov 2012 14:52:45 +0100, Jiri Olsa wrote:
>> > Now when diff command is separated from other standard outputs,
>> > we can use perf_hpp__init to initialize all standard columns.
>> >
>> > Moving PERF_HPP__OVERHEAD column init back to perf_hpp__init,
>> > and removing extra enable calls.
>> 
>> Why was this needed in the first place?  AFAIK it's already there and
>> didn't used only for perf diff.
>
> hm, I think PERF_HPP__OVERHEAD wasn't part of perf_hpp__init and every
> report except for diff command is using it.. so I think it makes sense
> to move it to perf_hpp__init.. maybe I'm missing something.

You're right.  The _OVERHEAD column was enabled by default but wasn't
part of the _init function - sorry for the confusion.  But what I was trying
to say is that it can be folded into patch 1.

Thanks,
Namhyung


Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for potential undercommit case

2012-11-29 Thread Raghavendra K T

On 11/29/2012 05:46 PM, Gleb Natapov wrote:

On Wed, Nov 28, 2012 at 10:40:56AM +0530, Raghavendra K T wrote:

On 11/28/2012 06:42 AM, Marcelo Tosatti wrote:


Don't understand the reasoning behind why 3 is a good choice.


Here is where I came from. (explaining from scratch for
completeness, forgive me :))
In moderate overcommits, we can falsely exit from the PLE handler even when
a preempted task of the same VM is waiting on other cpus. To reduce this
problem, we try a few times before exiting.
The problem boils down to:
what is the probability that we exit the PLE handler even when we have more
than 1 task on other cpus. The theoretical worst case should be around 1.5x
overcommit (as also pointed out by Andrew Theurer). [But the practical
worst case may be around 2x,3x overcommits, as indicated by the results
for the patch series]

So if p is the probability of finding rq length one on a particular cpu,
and if we do n tries, then probability of exiting ple handler is:

  p^(n+1) [ because we would have come across one source with rq length
1 and n target cpu rqs  with length 1 ]

so
num tries    probability of aborting ple handler (1.5x overcommit)
  1          1/4
  2          1/8
  3          1/16

We can reduce this probability further with more tries, but the problem is
the overhead.
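
The numbers in the table follow from p^(n+1) with p = 1/2, which is the value
implied by the 1/4 entry for a single try (an inference from the table, not a
figure stated elsewhere in the thread). A small sketch in C, where
ple_abort_probability() is a hypothetical helper name used only here:

```c
/* Probability of falsely leaving the PLE handler after `tries` attempts,
 * where p is the probability that a given cpu's run queue has length one.
 * One factor of p is for the source cpu, one more for each target tried. */
static double ple_abort_probability(double p, unsigned int tries)
{
	double prob = p;		/* source rq has length 1 */
	unsigned int i;

	for (i = 0; i < tries; i++)
		prob *= p;		/* each tried target rq has length 1 */

	return prob;
}
```

Each extra try halves the abort probability under this assumption, which is
the trade-off against the added overhead per try.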

IIRC Avi (again) had an idea to track vcpu preemption. When vcpu thread
is preempted we do kvm->preempted_vcpus++, when it runs again we do
kvm->preempted_vcpus--. PLE handler can try harder if kvm->preempted_vcpus
is big or do not try at all if it is zero.


Thanks for the reply Gleb.

Yes.. It was on my next TODO as you know, and it makes sense to weigh all 
these approaches (undercommit patches/throttled yield/preempt
notifier/pvspinlock and their combinations) to a good extent before going 
further. I am happy if these patches are now in a 'good shape to compare'
state. (That is the same reason I had posted the dynamic PLE approach too.)






Re: [PATCH 1/1] ARM: tegra: bus_notifier registers IOMMU devices(was: How to specify IOMMU'able devices in DT)

2012-11-29 Thread Mark Zhang
On 11/28/2012 09:48 PM, Hiroshi Doyu wrote:
> Hiroshi Doyu  wrote @ Mon, 24 Sep 2012 14:50:14 +0300 
> (EEST):
> ...
> On Mon, 2012-09-24 at 12:04 +0300, Hiroshi Doyu wrote:
>> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
>> index a1a7225..9eae3be 100644
>> --- a/drivers/base/platform.c
>> +++ b/drivers/base/platform.c
>> @@ -21,6 +21,8 @@
>>  #include 
>>  #include 
>>
>> +#include 
>> +
>>  #include "base.h"
>>
>>  #define to_platform_driver(drv)(container_of((drv), struct
>> platform_driver, \
>> @@ -305,8 +307,19 @@ int platform_device_add(struct platform_device
>> *pdev)
>>  dev_name(>dev), dev_name(pdev->dev.parent));
>>
>> ret = device_add(>dev);
>> -   if (ret == 0)
>> -   return ret;
>> +   if (ret)
>> +   goto failed;
>> +
>> +#ifdef CONFIG_PLATFORM_ENABLE_IOMMU
>> +   if (platform_bus_type.map && !pdev->dev.archdata.mapping) {
>> +   ret = arm_iommu_attach_device(>dev,
>> + platform_bus_type.map);
>> +   if (ret)
>> +   goto failed;
>
> This is horrible ... you're adding an architecture specific callback
> into our generic code; that's really a no-no.  If the concept of
> CONFIG_PLATFORM_ENABLE_IOMMU is useful to more than just arm, then this
> could become a generic callback.

 As mentioned in the original, this is a hack to explain what is
 needed. I am looking for some generic solution for how to specify
 IOMMU info for each platform device. I'm guessing that some other SoCs
 may have similar requirements. As you mentioned, this
 solution should be generic, not arch specific.
>>>
>>> Please read more about bus notifiers. IMHO a good example is provided in 
>>> the following thread:
>>> http://www.mail-archive.com/linux-samsung-soc@vger.kernel.org/msg12238.html
>>
>> This bus notifier seems flexible enough to accommodate the variation in
>> IOMMU map info, like the Tegra ASID, which could be platform-specific, while
>> other parts could be common. There's already iommu_bus_notifier
>> too. I'll try to implement something based on this.
> 
> Experimentally implemented as below. With the following patch, each
> device could specify its own map in DT, and automatically the device
> would be attached to the map.
> 
> There is a case where some devices share a map. This patch doesn't
> support such a case yet.
> 
> From 8cb75bb6f3a8535a077e0e85265f87c1f1289bfd Mon Sep 17 00:00:00 2001
> From: Hiroshi Doyu 
> Date: Wed, 28 Nov 2012 14:47:04 +0200
> Subject: [PATCH 1/1] ARM: tegra: bus_notifier registers IOMMU devices
> 
> platform_bus notifier registers IOMMU devices if dma-window is
> specified.
> 
> Its format is:
>   dma-window = <"start" "size">;
> ex)
>   dma-window = <0x12345000 0x8000>;
> 
> Signed-off-by: Hiroshi Doyu 
> ---
>  arch/arm/mach-tegra/board-dt-tegra30.c |   40 
> 
>  1 file changed, 40 insertions(+)
> 
> diff --git a/arch/arm/mach-tegra/board-dt-tegra30.c 
> b/arch/arm/mach-tegra/board-dt-tegra30.c
> index a2b6cf1..570d718 100644
> --- a/arch/arm/mach-tegra/board-dt-tegra30.c
> +++ b/arch/arm/mach-tegra/board-dt-tegra30.c
> @@ -30,9 +30,11 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> +#include 
>  
>  #include "board.h"
>  #include "clock.h"
> @@ -86,10 +88,48 @@ static __initdata struct tegra_clk_init_table 
> tegra_dt_clk_init_table[] = {
>   { NULL, NULL,   0,  0},
>  };
>  
> +#ifdef CONFIG_ARM_DMA_USE_IOMMU
> +static int tegra_iommu_device_notifier(struct notifier_block *nb,
> +unsigned long event, void *_dev)
> +{
> + struct dma_iommu_mapping *map = NULL;
> + struct device *dev = _dev;
> + dma_addr_t base;
> + size_t size;
> + int err;
> +
> + switch (event) {
> + case BUS_NOTIFY_ADD_DEVICE:
> + err = of_get_dma_window(dev->of_node, NULL, 0, NULL, ,
> + );
> + if (!err)
> + map = arm_iommu_create_mapping(_bus_type,
> +base, size, 0);
> + if (IS_ERR_OR_NULL(map))
> + break;
> + if (arm_iommu_attach_device(dev, map))

Add "arm_iommu_release_mapping" here.

And finally we see this patch, that's great. :)

> + dev_err(dev, "Failed to attach %s\n", dev_name(dev));
> + dev_dbg(dev, "Attached %s to map %p\n", dev_name(dev), map);
> + break;
> + }
> + return NOTIFY_DONE;
> +}
> +#else
> +#define tegra_iommu_device_notifier NULL
> +#endif
> +
> +static struct notifier_block tegra_iommu_device_nb = {
> + .notifier_call = tegra_iommu_device_notifier,

[PATCH] OMAPDSS: Add terminating entry for picodlp_i2c_id table

2012-11-29 Thread Axel Lin
The i2c_device_id table is supposed to be zero-terminated.

Signed-off-by: Axel Lin 
---
 drivers/video/omap2/displays/panel-picodlp.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/video/omap2/displays/panel-picodlp.c 
b/drivers/video/omap2/displays/panel-picodlp.c
index e3a6c19..1b94018 100644
--- a/drivers/video/omap2/displays/panel-picodlp.c
+++ b/drivers/video/omap2/displays/panel-picodlp.c
@@ -50,6 +50,7 @@ struct picodlp_i2c_data {
 
 static struct i2c_device_id picodlp_i2c_id[] = {
{ "picodlp_i2c_driver", 0 },
+   { }
 };
 
 struct picodlp_i2c_command {
-- 
1.7.9.5
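
Why the terminating entry matters can be shown with a simplified stand-in for
an ID table and its lookup loop. This is a sketch, not the kernel's code:
struct demo_device_id, demo_match_id() and demo_ids are hypothetical names,
and the loop is only modelled loosely on how ID-table matchers walk a table
until they reach an entry with an empty name.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an ID table entry. */
struct demo_device_id {
	char name[20];
	unsigned long driver_data;
};

/* Walk the table until the empty sentinel entry. Without that entry the
 * loop has no stop condition and would read past the end of the array. */
static const struct demo_device_id *
demo_match_id(const struct demo_device_id *id, const char *name)
{
	for (; id->name[0] != '\0'; id++) {
		if (strcmp(id->name, name) == 0)
			return id;
	}
	return NULL;
}

static const struct demo_device_id demo_ids[] = {
	{ "picodlp_i2c_driver", 0 },
	{ "", 0 }	/* terminating entry */
};
```

A lookup for a name that is not in the table returns NULL only because the
loop reaches the empty entry.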





linux-next: manual merge of the kvm tree with the tip tree

2012-11-29 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in
kernel/sched/core.c between commit 0a74bef8bed1 ("sched: Add an rq
migration call-back to sched_class") from the tip tree and commit
582b336ec2c0 ("sched: add notifier for cross-cpu migrations") from the
kvm tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc kernel/sched/core.c
index 20fb760,c86b8b6..000
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@@ -953,10 -959,16 +960,18 @@@ void set_task_cpu(struct task_struct *p
trace_sched_migrate_task(p, new_cpu);
  
if (task_cpu(p) != new_cpu) {
+   struct task_migration_notifier tmn;
+ 
 +  if (p->sched_class->migrate_task_rq)
 +  p->sched_class->migrate_task_rq(p, new_cpu);
p->se.nr_migrations++;
perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, NULL, 0);
+ 
+   tmn.task = p;
+   tmn.from_cpu = task_cpu(p);
+   tmn.to_cpu = new_cpu;
+ 
+   atomic_notifier_call_chain(_migration_notifier, 0, );
}
  
__set_task_cpu(p, new_cpu);




[PATCH 2/2] mac802154: use kfree_skb() instead of dev_kfree_skb()

2012-11-29 Thread Alan Ott
kfree_skb() indicates failure, which is where this is being used.

Signed-off-by: Alan Ott 
---
 net/mac802154/tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index db63914..4e09d07 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -99,7 +99,7 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct 
sk_buff *skb,
}
 
if (skb_cow_head(skb, priv->hw.extra_tx_headroom)) {
-   dev_kfree_skb(skb);
+   kfree_skb(skb);
return NETDEV_TX_OK;
}
 
-- 
1.7.11.2



[PATCH 1/2] mac802154: fix memory leaks

2012-11-29 Thread Alan Ott
kfree_skb() was not getting called in the case of some failures.
This was pointed out by Eric Dumazet.

Signed-off-by: Alan Ott 
---
 net/mac802154/tx.c   | 5 -
 net/mac802154/wpan.c | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 1a4df39..db63914 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -85,6 +85,7 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct 
sk_buff *skb,
 
if (!(priv->phy->channels_supported[page] & (1 << chan))) {
WARN_ON(1);
+   kfree_skb(skb);
return NETDEV_TX_OK;
}
 
@@ -103,8 +104,10 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, 
struct sk_buff *skb,
}
 
work = kzalloc(sizeof(struct xmit_work), GFP_ATOMIC);
-   if (!work)
+   if (!work) {
+   kfree_skb(skb);
return NETDEV_TX_BUSY;
+   }
 
INIT_WORK(>work, mac802154_xmit_worker);
work->skb = skb;
diff --git a/net/mac802154/wpan.c b/net/mac802154/wpan.c
index f30f6d4..1191039 100644
--- a/net/mac802154/wpan.c
+++ b/net/mac802154/wpan.c
@@ -327,8 +327,10 @@ mac802154_wpan_xmit(struct sk_buff *skb, struct net_device 
*dev)
 
if (chan == MAC802154_CHAN_NONE ||
page >= WPAN_NUM_PAGES ||
-   chan >= WPAN_NUM_CHANNELS)
+   chan >= WPAN_NUM_CHANNELS) {
+   kfree_skb(skb);
return NETDEV_TX_OK;
+   }
 
skb->skb_iif = dev->ifindex;
dev->stats.tx_packets++;
-- 
1.7.11.2



Re: [patch v2 6/6] cgroup: remove css_get_next

2012-11-29 Thread Kamezawa Hiroyuki
(2012/11/27 3:47), Michal Hocko wrote:
> Now that we have generic and well ordered cgroup tree walkers there is
> no need to keep css_get_next in the place.
> 
> Signed-off-by: Michal Hocko 

Hm, then, the next thing will be css_is_ancestor() etc..

Acked-by: KAMEZAWA Hiroyuki 




Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Al Viro
On Thu, Nov 29, 2012 at 06:33:53PM -0800, Patrick McLean wrote:

> Excellent, thanks. Is there any chance this will make it to 3.7? Also we 
> might want to cc stable@ on this as well since it is a regression in 3.6.

Definitely.  I've dropped that into vfs.git#for-linus and vfs.git#for-next
and tomorrow to Linus it goes...


Re: [patch v2 5/6] memcg: further simplify mem_cgroup_iter

2012-11-29 Thread Kamezawa Hiroyuki
(2012/11/27 3:47), Michal Hocko wrote:
> mem_cgroup_iter basically does two things currently. It takes care of
> the house keeping (reference counting, reclaim cookie) and it iterates
> through a hierarchy tree (by using cgroup generic tree walk).
> The code would be much easier to follow if we move the iteration
> outside of the function (to __mem_cgroup_iter_next) so the distinction
> is more clear.
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko 

Very nice look !

Acked-by: KAMEZAWA Hiroyuki 




Re: [patch v2 4/6] memcg: simplify mem_cgroup_iter

2012-11-29 Thread Kamezawa Hiroyuki
(2012/11/27 3:47), Michal Hocko wrote:
> Current implementation of mem_cgroup_iter has to consider both css and
> memcg to find out whether no group has been found (css==NULL - aka the
> loop is completed) and that no memcg is associated with the found node
> (!memcg - aka css_tryget failed because the group is no longer alive).
> This leads to awkward tweaks like tests for css && !memcg to skip the
> current node.
> 
> It will be much easier if we got rid of css variable altogether and
> only rely on memcg. In order to do that the iteration part has to skip
> dead nodes. This sounds natural to me and as a nice side effect we will
> get a simple invariant that memcg is always alive when non-NULL and all
> nodes have been visited otherwise.
> 
> We could get rid of the surrounding while loop but keep it in for now to
> make review easier. It will go away in the following patch.
> 
> Signed-off-by: Michal Hocko 

Acked-by: KAMEZAWA Hiroyuki 




Re: [patch v2 3/6] memcg: rework mem_cgroup_iter to use cgroup iterators

2012-11-29 Thread Kamezawa Hiroyuki
(2012/11/27 3:47), Michal Hocko wrote:
> mem_cgroup_iter currently relies on css->id when walking down a group
> hierarchy tree. This is really awkward because the tree walk depends on
> the groups creation ordering. The only guarantee is that a parent node
> is visited before its children.
> Example
>   1) mkdir -p a a/d a/b/c
>   2) mkdir -p a/b/c a/d
> Will create the same trees but the tree walks will be different:
>   1) a, d, b, c
>   2) a, b, c, d
> 
> 574bd9f7 (cgroup: implement generic child / descendant walk macros) has
> introduced generic cgroup tree walkers which provide either pre-order
> or post-order tree walk. This patch converts css->id based iteration
> to pre-order tree walk to keep the semantic with the original iterator
> where parent is always visited before its subtree.
> 
> cgroup_for_each_descendant_pre suggests using post_create and
> pre_destroy for proper synchronization with groups addition resp.
> removal. This implementation doesn't use those because a new memory
> cgroup is fully initialized in mem_cgroup_create and css reference
> counting enforces that the group is alive for both the last seen cgroup
> and the found one resp. it signals that the group is dead and it should
> be skipped.
> 
> If the reclaim cookie is used we need to store the last visited group
> into the iterator so we have to be careful that it doesn't disappear in
> the mean time. Elevated reference count on the css keeps it alive even
> though the group has been removed (parked waiting for the last dput so
> that it can be freed).
> 
> V2
> - use css_{get,put} for iter->last_visited rather than
>mem_cgroup_{get,put} because it is stronger wrt. cgroup life cycle
> - cgroup_next_descendant_pre expects NULL pos for the first iteration
>otherwise it might loop endlessly for intermediate node without any
>children.
> 
> Signed-off-by: Michal Hocko 

Acked-by: KAMEZAWA Hiroyuki 




Re: [RESEND PATCH] fs/super.c set_anon_super calling optimization

2012-11-29 Thread Al Viro
On Fri, Oct 26, 2012 at 11:14:41AM -0200, Carlos Maiolino wrote:
> Hi,
> 
> On Thu, Oct 25, 2012 at 05:08:19PM +0530, Abhijit Pawar wrote:
> > Hi,
> > set_anon_super is called by many filesystems. Some call directly and
> > some call through the wrapper. Many of them in the wrapper's call to
> > this function are passing the second argument to this function which
> > is not used anywhere.
> > 
> > This patch replaces the second variable with NULL.
> > 
> 
> If the variable isn't used anymore, why don't just get rid of it, instead of
> call the function passing a NULL pointer on it?

Because we want it to be a valid sget() callback.  I doubt that this
optimization is worth doing, though - might even micro-pessimize the things
on architectures where all arguments are passed in registers.


Re: Bisected regression: iterate_fd() selinux change affects flash plugin

2012-11-29 Thread Al Viro
On Fri, Nov 30, 2012 at 03:40:34AM +, Al Viro wrote:

> The bug is real, but Pavel's patch is all wrong.  The problem is in the
> argument; we should pass descriptor number, not descriptor + 1.  And fixing
> that (in iterate_fd() itself) makes all callbacks work as they ought to.
> 
> PS: Pavel, the life is painful enough as it is, no need to involve BZ into
> it.  Next time you need to post a patch, please do just that, especially
> when it's so short, OK?

See tonight's vfs.git#for-linus; the head is at commit a77cfcb.


Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Lin Feng
hi Andrew,

On 11/30/2012 07:39 AM, Andrew Morton wrote:
> Tricky.
> 
> I expect the same problem would occur with pages which are under
> O_DIRECT I/O.  Obviously O_DIRECT pages won't be pinned for such long
> periods, but the durations could still be lengthy (seconds).
The offline retry timeout is 2 minutes, so O_DIRECT pages 
are probably not a problem for the moment.
> 
> Worse is a futex page, which could easily remain pinned indefinitely.
> 
> The best I can think of is to make changes in or around
> get_user_pages(), to steal the pages from userspace and replace them
> with non-movable ones before pinning them.  The performance cost of
> something like this would surely be unacceptable for direct-io, but
> maybe OK for the aio ring and futexes.
Thanks for your advice.
I want to keep the impact as small as possible. As mentioned above,
direct-io doesn't seem to be a problem, so we needn't touch it. Maybe we can 
just change the use of get_user_pages() (in or around it) for cases such as 
the aio ring pages. I will try to find a way to do this.

Thanks,
linfeng


Re: Bisected regression: iterate_fd() selinux change affects flash plugin

2012-11-29 Thread Al Viro
On Fri, Nov 16, 2012 at 02:58:46PM -0500, Eric Paris wrote:
> On Mon, Nov 12, 2012 at 11:57 AM, Pavel Roskin  wrote:
> > Quoting Eric Paris :
> >
> >> OMG this +1 -1 stuff is nuts...
> 
> Ping, Al.
> 
> int iterate_fd(struct files_struct *files, unsigned n,
> [snip]
> while (!res && n < fdt->max_fds) {
> file = rcu_dereference_check_fdtable(files, fdt->fd[n++]);
> if (file)
> res = f(p, file, n);
> }
> spin_unlock(>file_lock);
> return res;
> 
> So we increment n (the file descriptor number) in the dereference,
> then pass that (wrong) number to f().
> 
> Every single f() (including SELinux, the cause of this bug) returns
> fd+1 (so now we are up by 2).  Then all of the users of iterate fd
> actually use fd-1 (which is wrong)
> 
> Why not have iterate_fd return -ENOENT on no entries and stop all of
> the stupid games?  We fix the real bug (the above function should do
> the n++ after the f() call, and the interface is sane to design
> against...

Because we might bloody well want to have "run some test on all opened
files, return the first error".  And -ENOENT is quite possible one.
Moreover, -ENOENT for "everything's OK, keep going" would be really
weird.

The bug is real, but Pavel's patch is all wrong.  The problem is in the
argument; we should pass descriptor number, not descriptor + 1.  And fixing
that (in iterate_fd() itself) makes all callbacks work as they ought to.
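
The off-by-one discussed in this thread can be reproduced outside the kernel.
The following is a userspace sketch under stated assumptions: the descriptor
table is a plain array of strings, MAX_FDS, fd_callback, iterate_fd_buggy(),
iterate_fd_fixed() and match_name() are hypothetical names, and the fixed
loop is just one way of passing the real descriptor number to the callback.
It keeps the convention described above that a callback returns fd + 1 on a
hit so that 0 can mean "keep going".

```c
#include <stddef.h>
#include <string.h>

#define MAX_FDS 8

/* Callback modelled on the quoted snippet: opaque pointer, the "file",
 * and a descriptor number. Non-zero return stops the iteration. */
typedef int (*fd_callback)(const void *p, const char *file, unsigned fd);

/* Shape of the quoted loop: n is incremented while indexing, so the
 * callback is handed fd + 1 instead of fd. */
static int iterate_fd_buggy(const char *const table[], unsigned n,
			    fd_callback f, const void *p)
{
	int res = 0;

	while (!res && n < MAX_FDS) {
		const char *file = table[n++];

		if (file)
			res = f(p, file, n);
	}
	return res;
}

/* Same walk, but the callback sees the number of the slot just read. */
static int iterate_fd_fixed(const char *const table[], unsigned n,
			    fd_callback f, const void *p)
{
	int res = 0;

	for (; !res && n < MAX_FDS; n++) {
		const char *file = table[n];

		if (file)
			res = f(p, file, n);
	}
	return res;
}

/* Example callback: report the first file whose name equals p. */
static int match_name(const void *p, const char *file, unsigned fd)
{
	return strcmp(file, (const char *)p) == 0 ? (int)fd + 1 : 0;
}
```

With a table whose slot 5 holds "b", a caller that subtracts 1 from the
result gets 6 from the buggy variant and 5 from the fixed one.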

PS: Pavel, the life is painful enough as it is, no need to involve BZ into
it.  Next time you need to post a patch, please do just that, especially
when it's so short, OK?


[PATCH 2/2] Btrfs: refactor error handling to drop inode in btrfs_create()

2012-11-29 Thread Filipe Brandenburger
Refactor it by checking whether the inode has been created and needs to be
dropped (drop_inode_on_err) and also if the err variable is set. That way the
variable doesn't need to be set on each and every error handling block.

Signed-off-by: Filipe Brandenburger 
---
 fs/btrfs/inode.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index caf9d76..1d66c9e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4963,7 +4963,7 @@ static int btrfs_create(struct inode *dir, struct dentry 
*dentry,
struct btrfs_trans_handle *trans;
struct btrfs_root *root = BTRFS_I(dir)->root;
struct inode *inode = NULL;
-   int drop_inode = 0;
+   int drop_inode_on_err = 0;
int err;
unsigned long nr = 0;
u64 objectid;
@@ -4989,18 +4989,15 @@ static int btrfs_create(struct inode *dir, struct 
dentry *dentry,
err = PTR_ERR(inode);
goto out_unlock;
}
+   drop_inode_on_err = 1;
 
err = btrfs_init_inode_security(trans, inode, dir, >d_name);
-   if (err) {
-   drop_inode = 1;
+   if (err)
goto out_unlock;
-   }
 
err = btrfs_update_inode(trans, root, inode);
-   if (err) {
-   drop_inode = 1;
+   if (err)
goto out_unlock;
-   }
 
/*
* If the active LSM wants to access the inode during
@@ -5013,17 +5010,17 @@ static int btrfs_create(struct inode *dir, struct 
dentry *dentry,
 
err = btrfs_add_nondir(trans, dir, dentry, inode, 0, index);
if (err)
-   drop_inode = 1;
-   else {
-   inode->i_mapping->a_ops = _aops;
-   inode->i_mapping->backing_dev_info = >fs_info->bdi;
-   BTRFS_I(inode)->io_tree.ops = _extent_io_ops;
-   d_instantiate(dentry, inode);
-   }
+   goto out_unlock;
+
+   inode->i_mapping->a_ops = _aops;
+   inode->i_mapping->backing_dev_info = >fs_info->bdi;
+   BTRFS_I(inode)->io_tree.ops = _extent_io_ops;
+   d_instantiate(dentry, inode);
+
 out_unlock:
nr = trans->blocks_used;
btrfs_end_transaction(trans, root);
-   if (drop_inode) {
+   if (err && drop_inode_on_err) {
inode_dec_link_count(inode);
iput(inode);
}
-- 
1.7.11.7



[PATCH 1/2] Btrfs: fix permissions of empty files not affected by umask

2012-11-29 Thread Filipe Brandenburger
When a new file is created with btrfs_create(), the inode will initially be
created with permissions 0666 and later on in btrfs_init_acl() it will be
adapted to mask out the umask bits. The problem is that this change won't make
it into the btrfs_inode unless there's another change to the inode (e.g. writing
content changing the size or touching the file changing the mtime.)

This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure that
the change will not get lost if the in-memory inode is flushed before other
changes are made to the file.

Signed-off-by: Filipe Brandenburger 
---
 fs/btrfs/inode.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95542a1..caf9d76 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4996,6 +4996,12 @@ static int btrfs_create(struct inode *dir, struct dentry 
*dentry,
goto out_unlock;
}
 
+   err = btrfs_update_inode(trans, root, inode);
+   if (err) {
+   drop_inode = 1;
+   goto out_unlock;
+   }
+
/*
* If the active LSM wants to access the inode during
* d_instantiate it needs these. Smack checks to see
-- 
1.7.11.7



[PATCH 0/2] Btrfs: fix mode umasking on empty files

2012-11-29 Thread Filipe Brandenburger
Hi,

This set of patches fixes bug #50861:
"Btrfs sometimes ignore umask and create world writable files"
https://bugzilla.kernel.org/show_bug.cgi?id=50861

It turns out that btrfs_create() will create an inode with permissions 0666 and
these will be changed to match the umask inside btrfs_init_acl() (and only in
cases where the ACL doesn't mandate which permissions the file should have.) The
changes are made to the "struct inode" but it's not marked as dirty, so these
changes will only propagate to the "struct btrfs_inode" if other changes are
made to the inode (e.g. by writing content and changing the size or by using
"touch" and changing the mtime, both of which will mark the inode as dirty.)

I fixed this issue by adding a call to btrfs_update_inode() in btrfs_create().
I believe this might be an acceptable solution since the same is already done on
most other system calls such as "mkdir" or "symlink".
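
To illustrate the mechanism described above, here is a minimal user-space
sketch in C. It is a toy model, not btrfs code: the struct and function names
(demo_vfs_inode, demo_fs_inode, demo_update_inode, demo_writeback,
demo_create) are hypothetical, and the model assumes that write-back only
copies inodes that have been marked dirty.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model: not the btrfs data structures. */
struct demo_vfs_inode {
	unsigned int mode;	/* in-memory permission bits */
	bool dirty;		/* set when the inode must be written back */
};

struct demo_fs_inode {
	unsigned int mode;	/* copy that eventually reaches disk */
};

/* Copy the in-memory mode into the filesystem item and clear the dirty flag
 * (stands in for the explicit update call that the fix adds). */
static void demo_update_inode(struct demo_fs_inode *item,
			      struct demo_vfs_inode *inode)
{
	item->mode = inode->mode;
	inode->dirty = false;
}

/* Write-back only propagates inodes that were marked dirty. */
static void demo_writeback(struct demo_fs_inode *item,
			   struct demo_vfs_inode *inode)
{
	if (inode->dirty)
		demo_update_inode(item, inode);
}

/* Create with mode 0666, then apply the mask to the in-memory inode only.
 * If explicit_update is false, nothing marks the inode dirty, so the later
 * write-back leaves the stale 0666 in the filesystem item. */
static unsigned int demo_create(unsigned int mask, bool explicit_update)
{
	struct demo_vfs_inode inode = { .mode = 0666, .dirty = false };
	struct demo_fs_inode item = { .mode = 0666 };

	inode.mode &= ~mask;		/* the umask step, in memory only */
	if (explicit_update)
		demo_update_inode(&item, &inode);
	demo_writeback(&item, &inode);	/* flush before any other change */
	return item.mode;
}
```

In this model, demo_create(022, false) leaves 0666 in the stored item, while
demo_create(022, true) stores 0644.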

An alternative might be applying the umask earlier, before the call to
btrfs_new_inode(), that way the inode would be created with the right permission
bits from the beginning, but that might either involve checking the ACLs before
creating the inode (which might need a rework of btrfs_init_acl()) or umasking
the bits unconditionally, but I guess there's a reason to apply that logic...

The first patch fixes the issue, the second patch refactors the code to avoid
the repetition of setting the flag variable on every error handling block.

Cheers,
Filipe


Filipe Brandenburger (2):
  Btrfs: fix permissions of empty files not affected by umask
  Btrfs: refactor error handling to drop inode in btrfs_create()

 fs/btrfs/inode.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
1.7.11.7



Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Lin Feng
Hi Zach,

Thanks for your advice. Agreed; I will look into making aio
use non-movable pages.


Thanks,
linfeng
On 11/30/2012 08:04 AM, Zach Brown wrote:
>> The best I can think of is to make changes in or around
>> get_user_pages(), to steal the pages from userspace and replace them
>> with non-movable ones before pinning them.  The performance cost of
>> something like this would surely be unacceptable for direct-io, but
>> maybe OK for the aio ring and futexes.
> 
> In the aio case it seems like it could be taught to populate the mapping
> with non-movable pages to begin with.  It's calling get_user_pages() a
> few lines after instantiating the mapping itself with do_mmap_pgoff().
> 
> - z
> 

-- 
--
Lin Feng
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road,
Nanjing, 210012, China
PHONE:+86-25-86630566-8557 
COINS:7998-8557 
FAX:+86-25-83317685
MAIL:linf...@cn.fujitsu.com
--


linux-3.6.7-rt18 smoke test on ARM

2012-11-29 Thread Frank Rowand
3.6.7-rt18 builds and boots on some ARM boards (PandaBoard, Realview) for:

  - SMP, PREEMPT_RT_FULL
  - SMP, PREEMPT_NONE
  - UP, PREEMPT_RT_FULL
  - UP, PREEMPT_NONE


A patch is required to resolve a BUG_ON() from preempt_schedule_irq().
The patch is available at https://lkml.org/lkml/2012/11/29/432

The PandaBoard frequently fails to boot with an eth0 error.  I have
only seen this for SMP, PREEMPT_NONE.  This same error also occurs
in 3.6.7 without the RT_PREEMPT patches applied, so this does not
appear to be an RT_PREEMPT issue.  The boot error starts with the
time out on ep0out:

[3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
[3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
[3.275543] smsc95xx v1.0.4
[8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
[8.091003] hub 1-1:1.0: state 7 ports 5 chg  evt 0002
[   13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
[   13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x0108
[   13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
[   13.529998] IP-Config: Failed to open eth0


-Frank



Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-29 Thread Bjorn Helgaas
[+cc Jeff, linux-ide, David, Joerg, iommu]

On Thu, Nov 29, 2012 at 7:39 PM, Robert Hancock  wrote:
> On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas  wrote:
>> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz  
>> wrote:
>>>
>>>
>>> -Original Message-
>>> From: Robert Hancock [mailto:hancock...@gmail.com]
>>> Sent: Wednesday, November 28, 2012 7:55 PM
>>> To: Justin Piszcz
>>> Cc: Bjorn Helgaas; Bruno Prémont; supp...@supermicro.com;
>>> linux-kernel@vger.kernel.org; Dan Williams
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>> bug question
>>>
>>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz 
>>> wrote:


 -Original Message-
 From: Robert Hancock [mailto:hancock...@gmail.com]
 Sent: Wednesday, November 28, 2012 7:35 PM
 To: Justin Piszcz
 Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; supp...@supermicro.com;
 linux-kernel@vger.kernel.org; 'Dan Williams'
 Subject: Re: Supermicro X9SRL-F - channel enumeration error &
>>> ACPI/firmware
 bug question


 What does lspci -vv show on that controller? Not sure what actual
 chipset that controller is, but there's a known issue with some Marvell
 6Gbps SATA controllers with DMAR enabled - it seems the device issues
 memory read/write requests from the wrong PCI function ID and the IOMMU
 rightly denies access as the function listed in the requests doesn't
 have any mapping to that memory. I don't think there's presently a
 workaround other than disabling DMAR. We could (and likely should) be
 detecting that device and adding some kind of quirk for it.

 That sounds likely...
 It is shown below:

 Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
 Adapter

 lspci -vv output:

 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
 controller
>>>
>>> Yeah, that's one of those controllers I think. But I can't tell from
>>> the bit of the dmesg you posted exactly what's going on. Can you post
>>> a full boot log from having the card installed and some drive attached
>>> (by putting the boot drive on another controller for example)?
>>>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [0.632170]  pci:ff: ACPI _OSC support notification failed,
 disabling
> PCIe ASPM
> [0.632239]  pci:ff: Unable to request _OSC control (_OSC support
> mask: 0x08)

 What's the full dmesg from this machine (or is it already posted
>>> somewhere)?

 It is now available here:
 http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>>
 Is that the same boot log? It doesn't have this error in it.
>>>
>>> Yes, the error is here: (its towards the bottom)
>>>
>>>  [7.973015] ata14.00: qc timeout (cmd 0xa1)
>>> [8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   51.824682] dmar: DRHD: handling fault status reg 502
>>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>>
>> You have these devices:
>>
>> pci :04:00.0: [10de:01d3] type 00 class 0x03 nVidia G72
>> pci :84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 
>> SATA
>> pci :84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>>
>> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
>> and if you get rid of that driver, they'll probably go away.
>>
>> But this 84:00.1 DMAR error:
>>
>> dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff0
>> DMAR:[fault reason 02] Present bit in context entry is clear
>>
>> looks like the probable cause of the Marvell issue.  It looks similar
>> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
>> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
>> DMAR rejects DMA that appears to be from bb:dd.1.
>>
>> Another report that's even more similar is
>> https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
>> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
>> is exactly like what you're seeing.
>>
>> So you're not alone, but 

Re: [Suggestion] drivers/tty: drivers/char/: for MAX_ASYNC_BUFFER_SIZE

2012-11-29 Thread Chen Gang
On 2012-11-30 10:27, Chen Gang wrote:
> On 2012-11-29 21:41, Alan Cox wrote:
>> On Thu, 29 Nov 2012 13:07:28 +0800
>> Chen Gang  wrote:
>>
>>> Hello Greg Kroah-Hartman:
>>>
>>> for MAX_ASYNC_BUFFER_SIZE:
>>>   it is defined as 4096;
>>>   but for the max buffer size which it processes, is 65535.
>>>   so suggest to #define MAX_ASYNC_BUFFER_SIZE 0x1  (better than 0x)
>>
>> I don't see the need to change this. Possibly some of the old synclink
>> drivers need to check more carefully for overflows if configured for very
>> large frame sizes ?
>>
> 

  Sorry, I forgot to reply to "I don't see the need to change this".

  I think what Alan Cox meant is:
    if it is necessary (i.e. testing confirms the overflow):
      do not touch MAX_ASYNC_BUFFER_SIZE;
      instead, check whether the buffer is larger than MAX_ASYNC_BUFFER_SIZE,
      and if it is larger, skip it.

  I think we also have another 4 ways (if testing confirms the overflow):
    I prefer:
      use flag_buf[HDLC_MAX_FRAME_SIZE] instead of flag_buf[MAX_ASYNC_BUFFER_SIZE].
      It is the simplest and clearest way.
      It will consume a little more memory, but that seems a minor negative
      effect overall.
    2nd way:
      dynamically allocate the relevant buffer to fit the current max frame
      size (4096..65535).
      It is not complex, and can save a little memory.
    3rd way:
      receive one frame in a loop.
      This is complex and needs the current source code restructured (and
      more testing).
    4th way:
      #define MAX_ASYNC_BUFFER_SIZE  0x1
      This was my original suggestion, but it does not seem quite suitable.


  You are welcome to give your choice (or suggest a new one), thanks.

  thanks.

gchen.
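
As a rough illustration of the "check the length and skip" option mentioned
above, here is a minimal user-space sketch in C. It is not taken from the
synclink drivers: demo_port and demo_store_flags are hypothetical names, and
only the 4096 value of MAX_ASYNC_BUFFER_SIZE comes from this thread.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ASYNC_BUFFER_SIZE 4096	/* value stated earlier in the thread */

/* Hypothetical receiver with a fixed-size flag buffer, loosely modelled on
 * the flag_buf[MAX_ASYNC_BUFFER_SIZE] members discussed in this thread. */
struct demo_port {
	char flag_buf[MAX_ASYNC_BUFFER_SIZE];
	size_t dropped;		/* frames skipped because they were too large */
};

/* Returns the number of flag bytes stored, or 0 if the frame was skipped. */
static size_t demo_store_flags(struct demo_port *port, size_t framesize)
{
	if (framesize > sizeof(port->flag_buf)) {
		port->dropped++;	/* too large: skip rather than overflow */
		return 0;
	}
	memset(port->flag_buf, 0, framesize);
	return framesize;
}
```

With this check a 65535-byte frame is counted and skipped instead of being
written past the end of flag_buf; the preferred option in the message above
would instead enlarge the buffer so that no frame has to be dropped.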
> So far I have only done code review (so this is only a suggestion); I will
> try to test it.
> Other members are also welcome to help with testing.
> 
> This issue affects 4 synclink drivers (most of their source code is the
> same):
>   drivers/char/pcmcia/synclink_cs.c:213:  char flag_buf[MAX_ASYNC_BUFFER_SIZE];
>   drivers/tty/synclink_gt.c:320:  char flag_buf[MAX_ASYNC_BUFFER_SIZE];
>   drivers/tty/synclink.c:294: char flag_buf[MAX_ASYNC_BUFFER_SIZE];
>   drivers/tty/synclinkmp.c:265:   char flag_buf[MAX_ASYNC_BUFFER_SIZE];
> 
> char_buf is already unused (it can be removed):
>   drivers/tty/synclink_gt.c:321:  char char_buf[MAX_ASYNC_BUFFER_SIZE];
>   drivers/tty/synclink.c:295: char char_buf[MAX_ASYNC_BUFFER_SIZE];   
>   drivers/tty/synclinkmp.c:266:   char char_buf[MAX_ASYNC_BUFFER_SIZE];
> 
> 
>>>
>>> -
>>> Step 3:
>>>
>>> one sample in drivers/tty/n_gsm.c  (same for another implementation)
>>>
>>>   receive_buf is a function pointer which may be gsmld_receive_buf at line 2819.
>>>   It does not check whether count is larger than MAX_ASYNC_BUFFER_SIZE.
>>>   If count is larger than MAX_ASYNC_BUFFER_SIZE, it will cause an issue.
>>
>> Why should it - MAX_ASYNC_BUFFER_SIZE is an internal detail of the
>> synclink drivers. 
>>
>> Alan
>>
>>
> 
>   No, it does not need to.  (Excuse me, my English is not very good; maybe
>   you misunderstood what I said.)
> 
>   At least, currently:
>     the caller should make sure that the buffer length is sufficient (it
>     seems it does not; I need to test it).
>     the internals have no duty to check it.
> 
> 


-- 
Chen Gang

Asianux Corporation


RE: [PATCH v2 0/5] Add movablecore_map boot option

2012-11-29 Thread H. Peter Anvin
Disk I/O is still a big consumer of lowmem.

"Luck, Tony"  wrote:

>> If any significant percentage of memory is in ZONE_MOVABLE then the memory
>> hotplug people will have to deal with all the lowmem/highmem problems
>> that used to be faced by 32-bit x86 with PAE enabled.
>
>While these problems may still exist on large systems - I think it becomes
>harder to construct workloads that run into problems.  In those bad old days
>a significant fraction of lowmem was consumed by the kernel ... so it was
>pretty easy to find meta-data intensive workloads that would push it over
>a cliff.  Here we are talking about systems with say 128GB per node divided
>into 64GB moveable and 64GB non-moveable (and I'd regard this as a rather
>low-end machine).  Unless the workload consists of zillions of tiny processes
>all mapping shared memory blocks, the percentage of memory allocated to
>the kernel is going to be tiny compared with the old 4GB days.
>
>-Tony

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [ANNOUNCE] 3.0.53-rt77

2012-11-29 Thread Frank Rowand
On 11/27/12 18:44, Steven Rostedt wrote:
> 
> Dear RT Folks,
> 
> I'm pleased to announce the 3.0.53-rt77 stable release.

3.0.53-rt77 builds and boots on some ARM boards (PandaBoard, Realview) for:

  - SMP, PREEMPT_RT_FULL
  - SMP, PREEMPT_NONE
  - UP, PREEMPT_RT_FULL
  - UP, PREEMPT_NONE


-Frank



Re: [PATCHSET cgroup/for-3.8] cpuset: decouple cpuset locking from cgroup core

2012-11-29 Thread Kamezawa Hiroyuki
(2012/11/29 6:34), Tejun Heo wrote:
> Hello, guys.
> 
> Depending on cgroup core locking - cgroup_mutex - is messy and makes
> cgroup prone to locking dependency problems.  The current code already
> has lock dependency loop - memcg nests get_online_cpus() inside
> cgroup_mutex.  cpuset the other way around.
> 
> Regardless of the locking details, whatever is protecting cgroup has
> inherently to be something outer to most other locking constructs.
> cgroup calls into a lot of major subsystems which in turn have to
> perform subsystem-specific locking.  Trying to nest cgroup
> synchronization inside other locks isn't something which can work
> well.
> 
> cgroup now has enough API to allow subsystems to implement their own
> locking and cgroup_mutex is scheduled to be made private to cgroup
> core.  This patchset makes cpuset implement its own locking instead of
> relying on cgroup_mutex.
> 
> cpuset is rather nasty in this respect.  Some of it seems to have come
> from the implementation history - cgroup core grew out of cpuset - but
> big part stems from cpuset's need to migrate tasks to an ancestor
> cgroup when an hotunplug event makes a cpuset empty (w/o any cpu or
> memory).
> 
> This patchset decouples cpuset locking from cgroup_mutex.  After the
> patchset, cpuset uses cpuset-specific cpuset_mutex instead of
> cgroup_mutex.  This also removes the lockdep warning triggered during
> cpu offlining (see 0009).
> 
> Note that this leaves memcg as the only external user of cgroup_mutex.
> Michal, Kame, can you guys please convert memcg to use its own locking
> too?
> 
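
The dependency loop described in the quoted text (one path nests
get_online_cpus() inside cgroup_mutex, the other nests them the opposite way)
can be sketched as a tiny lock-order recorder in C. This is a user-space toy,
not lockdep: the enum values and demo_record_nesting() are hypothetical names
chosen for the illustration.

```c
#include <stdbool.h>

enum demo_lock { DEMO_CGROUP_MUTEX, DEMO_CPU_HOTPLUG, DEMO_NR_LOCKS };

/* demo_seen[a][b] is true once some path took lock b while holding lock a. */
static bool demo_seen[DEMO_NR_LOCKS][DEMO_NR_LOCKS];

/* Record "inner taken while holding outer". Returns false if the reverse
 * order was recorded earlier, i.e. an AB-BA inversion that can deadlock. */
static bool demo_record_nesting(enum demo_lock outer, enum demo_lock inner)
{
	demo_seen[outer][inner] = true;
	return !demo_seen[inner][outer];
}
```

Recording the memcg-style order first and the cpuset-style order second makes
the second call report the inversion; once every path uses the same order,
no call does.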

Hmm, let me see... at a quick glance, cgroup_lock() is used for
  hierarchy policy change
  kmem_limit
  migration policy change
  swappiness change
  oom control

Because all of the above have to take care of changes in the hierarchy,
having a new memcg mutex in ->create() may be a way.

Ah, hm, Costa mentioned task-attach. Is the task-attach problem in memcg?

Thanks,
-Kame











> This patchset contains the following thirteen patches.
> 
>   0001-cpuset-remove-unused-cpuset_unlock.patch
>   0002-cpuset-remove-fast-exit-path-from-remove_tasks_in_em.patch
>   0003-cpuset-introduce-css_on-offline.patch
>   0004-cpuset-introduce-CS_ONLINE.patch
>   0005-cpuset-introduce-cpuset_for_each_child.patch
>   0006-cpuset-cleanup-cpuset-_can-_attach.patch
>   0007-cpuset-drop-async_rebuild_sched_domains.patch
>   0008-cpuset-reorganize-CPU-memory-hotplug-handling.patch
>   0009-cpuset-don-t-nest-cgroup_mutex-inside-get_online_cpu.patch
>   0010-cpuset-make-CPU-memory-hotplug-propagation-asynchron.patch
>   0011-cpuset-pin-down-cpus-and-mems-while-a-task-is-being-.patch
>   0012-cpuset-schedule-hotplug-propagation-from-cpuset_atta.patch
>   0013-cpuset-replace-cgroup_mutex-locking-with-cpuset-inte.patch
> 
> 0001-0006 are prep patches.
> 
> 0007-0009 make cpuset nest get_online_cpus() inside cgroup_mutex, not
> the other way around.
> 
> 0010-0012 plug holes which would be exposed by switching to
> cpuset-specific locking.
> 
> 0013 replaces cgroup_mutex with cpuset_mutex.
> 
> This patchset is on top of cgroup/for-3.8 (fddfb02ad0) and also
> available in the following git branch.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 
> review-cpuset-locking
> 
> diffstat follows.
> 
>   kernel/cpuset.c |  750 
> +++-
>   1 file changed, 423 insertions(+), 327 deletions(-)
> 
> Thanks.
> 
> --
> tejun
> 
> 




Re: [PATCH v2 0/5] Add movablecore_map boot option

2012-11-29 Thread Yasuaki Ishimatsu

Hi Jiang,

2012/11/30 11:56, Jiang Liu wrote:

Hi Mel,
Thanks for your great comments!

On 2012-11-29 19:00, Mel Gorman wrote:

On Wed, Nov 28, 2012 at 01:38:47PM -0800, H. Peter Anvin wrote:

On 11/28/2012 01:34 PM, Luck, Tony wrote:


2. use boot option
   This is our proposal. New boot option can specify memory range to use
   as movable memory.


Isn't this just moving the work to the user? To pick good values for the
movable areas, they need to know how the memory lines up across
node boundaries ... because they need to make sure to allow some
non-movable memory allocations on each node so that the kernel can
take advantage of node locality.

So the user would have to read at least the SRAT table, and perhaps
more, to figure out what to provide as arguments.

Since this is going to be used on a dynamic system where nodes might
be added and removed - the right values for these arguments might
change from one boot to the next. So even if the user gets them right
on day 1, a month later when a new node has been added, or a broken
node removed the values would be stale.



I gave this feedback in person at LCE: I consider the kernel
configuration option to be useless for anything other than debugging.
Trying to promote it as an actual solution, to be used by end users in
the field, is ridiculous at best.



I've not been paying a whole pile of attention to this because it's not an
area I'm active in but I agree that configuring ZONE_MOVABLE like
this at boot-time is going to be problematic. As awkward as it is, it
would probably work out better to only boot with one node by default and
then hot-add the nodes at runtime using either an online sysfs file or
an online-reserved file that hot-adds the memory to ZONE_MOVABLE. Still
clumsy but better than specifying addresses on the command line.

That said, I also find using ZONE_MOVABLE to be a problem in itself that
will cause problems down the road. Maybe this was discussed already but
just in case I'll describe the problems I see.

If any significant percentage of memory is in ZONE_MOVABLE then the memory
hotplug people will have to deal with all the lowmem/highmem problems
that used to be faced by 32-bit x86 with PAE enabled. As a simple example,
metadata intensive workloads will not be able to use all of memory because
the kernel allocations will be confined to a subset of memory. A more
complex example is that page table page allocations are also restricted
meaning it's possible that a process will not even be able to mmap() a high
percentage of memory simply because it cannot allocate the page tables to
store the mappings. ZONE_MOVABLE works up to a *point*, but it's a hack. It
was a hack when it was introduced but at least then the expectation was
that ZONE_MOVABLE was going to be used for huge pages and there was at least
an expectation that it would not be available for normal usage.

Fundamentally the reason one would want to use ZONE_MOVABLE is because
we cannot migrate a lot of kernel memory -- slab pages, page table pages,
device-allocated buffers etc.  My understanding is that other OS's get around
this by requiring that subsystems and drivers have callbacks that allow the
core VM to force certain memory to be released but that may be impractical
for Linux. I don't know for sure though, this is just what I heard.

As far as I know, one other OS confines immovable pages to the low end of
memory, and the limit increases on demand. But the drawback of this solution
is a serious performance drop (about 10% on average) because it essentially
disables NUMA optimization for kernel/DMA memory allocations.


For Linux, the hotplug people need to start thinking about how to get
around this migration problem. The first problem faced is the memory model
and how it maps virt->phys addresses. We have a 1:1 mapping because it's
fast but not because it's a fundamental requirement. Start considering
what happens if the memory model is changed to allow some sections to have
fast lookup for virt_to_phys and other sections to have slow lookups. On
hotplug, try and empty all the sections. If the section cannot be emptied
because of kernel pages then the section gets marked as "offline-migrated"
or something. Stop the whole machine (yes, I mean stop_machine), copy
those unmovable pages to another location, update the kernel virt->phys
mapping for the section being offlined so the virt addresses point to the
new physical addresses and resume.  Virt->phys lookups are going to be
a lot slower because a full section lookup will be necessary every time
effectively breaking SPARSE_VMEMMAP and there will be a performance penalty
but it should work. This will cover some slab pages where the data is only
accessed via the virtual address -- inode caches, dcache etc.
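
The per-section lookup described in the paragraph above could look roughly
like the following user-space sketch in C. Everything here is an assumption
made for illustration: the section size, the base of the linear mapping, and
the names demo_section, demo_virt_to_phys and demo_remap_section do not come
from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names and sizes below are hypothetical; this is a user-space model. */
#define DEMO_SECTION_SHIFT 27				/* 128 MiB sections, assumed */
#define DEMO_NR_SECTIONS   8
#define DEMO_VIRT_BASE     UINT64_C(0x100000000)	/* start of linear map, assumed */

struct demo_section {
	bool remapped;		/* true once the section's pages were copied away */
	uint64_t phys_base;	/* new physical base, valid only if remapped */
};

static struct demo_section demo_sections[DEMO_NR_SECTIONS];

static uint64_t demo_virt_to_phys(uint64_t virt)
{
	uint64_t off = virt - DEMO_VIRT_BASE;
	uint64_t idx = off >> DEMO_SECTION_SHIFT;
	const struct demo_section *s = &demo_sections[idx];

	if (!s->remapped)
		return off;	/* untouched section: plain 1:1 offset */

	/* remapped section: per-section base plus offset inside the section */
	return s->phys_base + (off & ((UINT64_C(1) << DEMO_SECTION_SHIFT) - 1));
}

/* Meant to be called with the machine stopped, after the pages were copied. */
static void demo_remap_section(unsigned int idx, uint64_t new_phys_base)
{
	demo_sections[idx].phys_base = new_phys_base;
	demo_sections[idx].remapped = true;
}
```

Note that even the unremapped case has to consult the section table first,
which is the per-lookup cost the paragraph above warns about.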

It will not work where the physical address is used. The obvious example
is page table pages. For page tables, during stop machine you will have to
walk all processes page tables looking for references to the page you're
trying to 

Re: [PATCH][GIT PULL] ftrace: Clear bits properly in reset_iter_read()

2012-11-29 Thread Steven Rostedt
On Fri, 2012-11-16 at 08:22 -0500, Steven Rostedt wrote:
> Ingo,

Ping?

-- Steve

> 
> Please pull the latest tip/perf/urgent tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> tip/perf/urgent
> 
> Head SHA1: 70f77b3f7ec010ff9624c1f2e39a81babc9e2429
> 
> 
> Dan Carpenter (1):
>   ftrace: Clear bits properly in reset_iter_read()
> 
> 
>  kernel/trace/ftrace.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> ---
> commit 70f77b3f7ec010ff9624c1f2e39a81babc9e2429
> Author: Dan Carpenter 
> Date:   Sat Jun 9 19:10:27 2012 +0300
> 
> ftrace: Clear bits properly in reset_iter_read()
> 
> There is a typo here where '&' is used instead of '|' and it turns the
> statement into a noop.  The original code is equivalent to:
> 
>   iter->flags &= ~((1 << 2) & (1 << 4));
> 
> Link: http://lkml.kernel.org/r/20120609161027.GD6488@elgon.mountain
> 
> Cc: sta...@vger.kernel.org # all of them
> Signed-off-by: Dan Carpenter 
> Signed-off-by: Steven Rostedt 
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 9dcf15d..51b7159 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2437,7 +2437,7 @@ static void reset_iter_read(struct ftrace_iterator *iter)
>  {
>   iter->pos = 0;
>   iter->func_pos = 0;
> - iter->flags &= ~(FTRACE_ITER_PRINTALL & FTRACE_ITER_HASH);
> + iter->flags &= ~(FTRACE_ITER_PRINTALL | FTRACE_ITER_HASH);
>  }
>  
>  static void *t_start(struct seq_file *m, loff_t *pos)
> 
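
For readers who want to see why the original statement was a no-op, here is a
minimal user-space sketch in C. The macro and function names are hypothetical
stand-ins rather than the kernel definitions; only the bit positions (1 << 2)
and (1 << 4) are taken from the commit message quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two flags; the bit positions are the ones
 * quoted in the commit message. */
#define DEMO_ITER_PRINTALL (1u << 2)
#define DEMO_ITER_HASH     (1u << 4)

/* Original expression: the two flags share no bits, so (A & B) == 0,
 * ~0 is all ones, and the &= leaves flags unchanged. */
static uint32_t reset_flags_buggy(uint32_t flags)
{
	flags &= ~(DEMO_ITER_PRINTALL & DEMO_ITER_HASH);
	return flags;
}

/* Fixed expression: (A | B) selects both bits, so both are cleared. */
static uint32_t reset_flags_fixed(uint32_t flags)
{
	flags &= ~(DEMO_ITER_PRINTALL | DEMO_ITER_HASH);
	return flags;
}
```

Starting from a value with both flags and one unrelated bit set, the buggy
variant returns the value unchanged, while the fixed variant leaves only the
unrelated bit.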




RE: [PATCH V1 1/2] Xen acpi memory hotplug driver

2012-11-29 Thread Liu, Jinsong
Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 21, 2012 at 11:45:04AM +, Liu, Jinsong wrote:
>>> From 630c65690c878255ce71e7c1172338ed08709273 Mon Sep 17 00:00:00
>>> 2001 
>> From: Liu Jinsong 
>> Date: Tue, 20 Nov 2012 21:14:37 +0800
>> Subject: [PATCH 1/2] Xen acpi memory hotplug driver
>> 
>> Xen acpi memory hotplug consists of 2 logic components:
>> Xen acpi memory hotplug driver and Xen hypercall.
>> 
>> This patch implement Xen acpi memory hotplug driver. When running
>> under xen platform, Xen driver will early occupy (so native driver
> 
> How will it 'early occupy'? Can you spell it out here please?

Sure, I will add something like:
'When running on the Xen platform, the Xen memory hotplug driver registers
early at boot via subsys_initcall (earlier than the native driver's
module_init), so the Xen driver takes effect and the native driver is blocked'.
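
The ordering argument can be sketched with a small user-space model in C.
In the kernel, built-in initcalls run in level order, with subsys_initcall at
level 4 and module_init (for built-in code) mapping to device_initcall at
level 6. The rest is an assumption made for illustration: the names
demo_initcall, demo_claim and demo_boot are hypothetical, and "first driver
to register wins" is a simplification of how the second driver ends up
blocked.

```c
#include <stddef.h>

/* User-space model only; the level numbers mirror the kernel's initcall
 * levels for subsys_initcall (4) and device_initcall (6). */
enum demo_level { DEMO_SUBSYS = 4, DEMO_DEVICE = 6 };

struct demo_initcall {
	enum demo_level level;
	const char *name;
};

static const char *demo_owner;	/* driver that currently owns the device ID */

/* A driver claims the device only if nobody has claimed it yet. */
static void demo_claim(const char *name)
{
	if (demo_owner == NULL)
		demo_owner = name;
}

/* Run every initcall, lowest level first, regardless of array order. */
static const char *demo_boot(const struct demo_initcall *calls, size_t n)
{
	demo_owner = NULL;
	for (int level = 0; level <= 7; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				demo_claim(calls[i].name);
	return demo_owner;
}
```

Even if the native driver is listed first, the level-4 entry runs before the
level-6 entry, so the Xen entry ends up owning the device in this model.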

> 
>> will be blocked). When acpi memory notify OSPM, xen driver will take
>> effect, adding related memory device and parsing memory information.
>> 
>> Signed-off-by: Liu Jinsong 
>> ---
>>  drivers/xen/Kconfig               |   11 +
>>  drivers/xen/Makefile              |    1 +
>>  drivers/xen/xen-acpi-memhotplug.c |  383 +
>>  3 files changed, 395 insertions(+), 0 deletions(-)
>>  create mode 100644 drivers/xen/xen-acpi-memhotplug.c
>> 
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index 126d8ce..abd0396 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -206,4 +206,15 @@ config XEN_MCE_LOG
>>Allow kernel fetching MCE error from Xen platform and
>>converting it into Linux mcelog format for mcelog tools
>> 
>> +config XEN_ACPI_MEMORY_HOTPLUG
>> +bool "Xen ACPI memory hotplug"
> 
> There should be a way to make this a module.

I have some concerns about making it a module:
1. The Xen and native memhotplug drivers would both be modules, while we need
the Xen driver to load early.
2. A Xen stub driver might solve the load-order issue, but it may introduce
other issues:
  * if the Xen driver is loaded and then unloaded, the native driver may get a
chance to load successfully;
  * if the Xen driver is loaded --> unloaded --> loaded again, it will lose
hotplug notifications during the unloaded period;
  * if the Xen driver is loaded --> unloaded --> loaded again, it will re-add
all memory devices, but the handling of 'boot memory devices' and 'hotplug
memory devices' differs, and we have no way to distinguish these 2 kinds of
devices.

IMHO making the Xen hotplug logic a module may lead to unexpected results. Are
there any obvious advantages to doing so? After all, we already provide a
config choice to the user. Thoughts?

> 
> 
>> +depends on XEN_DOM0 && X86_64 && ACPI
>> +default n
>> +help
>> +  This is Xen acpi memory hotplug.
>    -> ACPI
> 
>> +
>> +  Currently Xen only support acpi memory hot-add. If you want
>  -> ACPI
> 
>> +  to hot-add memory at runtime (the hot-added memory cannot be
>> +  removed until machine stop), select Y here, otherwise select N.
>> +
>>  endmenu
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index 7435470..c339eb4 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -30,6 +30,7 @@ obj-$(CONFIG_XEN_MCE_LOG)  += mcelog.o
>>  obj-$(CONFIG_XEN_PCIDEV_BACKEND)+= xen-pciback/
>>  obj-$(CONFIG_XEN_PRIVCMD)   += xen-privcmd.o
>>  obj-$(CONFIG_XEN_ACPI_PROCESSOR)+= xen-acpi-processor.o
>> +obj-$(CONFIG_XEN_ACPI_MEMORY_HOTPLUG)   += xen-acpi-memhotplug.o
>>  xen-evtchn-y:= evtchn.o
>>  xen-gntdev-y:= gntdev.o
>>  xen-gntalloc-y  := gntalloc.o
>> diff --git a/drivers/xen/xen-acpi-memhotplug.c b/drivers/xen/xen-acpi-memhotplug.c
>> new file mode 100644
>> index 000..f0c7990
>> --- /dev/null
>> +++ b/drivers/xen/xen-acpi-memhotplug.c
>> @@ -0,0 +1,383 @@
>> +/*
>> + * Copyright (C) 2012 Intel Corporation
>> + *    Author: Liu Jinsong 
>> + *    Author: Jiang Yunhong 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or (at
>> + * your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
>> + * NON INFRINGEMENT.  See the GNU General Public License for more
>> + * details.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define ACPI_MEMORY_DEVICE_CLASS"memory"
>> +#define ACPI_MEMORY_DEVICE_HID  "PNP0C80"
>> +#define ACPI_MEMORY_DEVICE_NAME "Hotplug Mem Device"
> 
> Weird tabs?
> 

It ported from native and 

Re: [PATCH 1/2] percpu-rwsem: use synchronize_sched_expedited

2012-11-29 Thread Mikulas Patocka


On Thu, 29 Nov 2012, Andrew Morton wrote:

> On Tue, 27 Nov 2012 22:59:52 -0500 (EST)
> Mikulas Patocka  wrote:
> 
> > percpu-rwsem: use synchronize_sched_expedited
> > 
> > Use synchronize_sched_expedited() instead of synchronize_sched()
> > to improve mount speed.
> > 
> > This patch improves mount time from 0.500s to 0.013s.
> > 
> > Note: if realtime people complain about the use
> > synchronize_sched_expedited() and synchronize_rcu_expedited(), I suggest
> > that they introduce an option CONFIG_REALTIME or
> > /proc/sys/kernel/realtime and turn off these *_expedited functions if
> > the option is enabled (i.e. turn synchronize_sched_expedited into
> > synchronize_sched and synchronize_rcu_expedited into synchronize_rcu).
> > 
> > Signed-off-by: Mikulas Patocka 
> 
> So I read through this thread but I really didn't see a clear
> description of why mount() got slower.  The changelog for 4b05a1c74d1
> is spectacularly awful :(
> 
> 
> Apparently the slowdown occurred because a blockdev mount patch
> 62ac665ff9fc07497ca524 ("blockdev: turn a rw semaphore into a percpu rw
> semaphore") newly uses percpu rwsems, and percpu rwsems are slow on the
> down_write() path.
> 
> And using synchronize_sched_expedited() rather than synchronize_sched()
> makes percpu_down_write() somewhat less slow.  Correct?

Yes.

> Why is it OK to use synchronize_sched_expedited() here?  If it's
> faster, why can't we use synchronize_sched_expedited() everywhere and
> zap synchronize_sched()?

Because synchronize_sched_expedited sends interrupts to all processors and 
it is bad for realtime workloads.

Peter Zijlstra once complained when I used synchronize_rcu_expedited in 
bdi_remove_from_list (but he left it there).

I suggest that if it really hurts real-time response for someone, they can 
introduce a switch to turn it into a non-expedited call.

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 0/5] Add movablecore_map boot option

2012-11-29 Thread Jiang Liu
Hi Mel,
Thanks for your great comments!

On 2012-11-29 19:00, Mel Gorman wrote:
> On Wed, Nov 28, 2012 at 01:38:47PM -0800, H. Peter Anvin wrote:
>> On 11/28/2012 01:34 PM, Luck, Tony wrote:

>>>> 2. use boot option
>>>>   This is our proposal. New boot option can specify memory range to use
>>>>   as movable memory.
>>>
>>> Isn't this just moving the work to the user? To pick good values for the
>>> movable areas, they need to know how the memory lines up across
>>> node boundaries ... because they need to make sure to allow some
>>> non-movable memory allocations on each node so that the kernel can
>>> take advantage of node locality.
>>>
>>> So the user would have to read at least the SRAT table, and perhaps
>>> more, to figure out what to provide as arguments.
>>>
>>> Since this is going to be used on a dynamic system where nodes might
>>> be added and removed - the right values for these arguments might
>>> change from one boot to the next. So even if the user gets them right
>>> on day 1, a month later when a new node has been added, or a broken
>>> node removed the values would be stale.
>>>
>>
>> I gave this feedback in person at LCE: I consider the kernel
>> configuration option to be useless for anything other than debugging.
>> Trying to promote it as an actual solution, to be used by end users in
>> the field, is ridiculous at best.
>>
> 
> I've not been paying a whole pile of attention to this because it's not an
> area I'm active in but I agree that configuring ZONE_MOVABLE like
> this at boot-time is going to be problematic. As awkward as it is, it
> would probably work out better to only boot with one node by default and
> then hot-add the nodes at runtime using either an online sysfs file or
> an online-reserved file that hot-adds the memory to ZONE_MOVABLE. Still
> clumsy but better than specifying addresses on the command line.
> 
> That said, I also find using ZONE_MOVABLE to be a problem in itself that
> will cause problems down the road. Maybe this was discussed already but
> just in case I'll describe the problems I see.
> 
> If any significant percentage of memory is in ZONE_MOVABLE then the memory
> hotplug people will have to deal with all the lowmem/highmem problems
> that used to be faced by 32-bit x86 with PAE enabled. As a simple example,
> metadata intensive workloads will not be able to use all of memory because
> the kernel allocations will be confined to a subset of memory. A more
> complex example is that page table page allocations are also restricted
> meaning it's possible that a process will not even be able to mmap() a high
> percentage of memory simply because it cannot allocate the page tables to
> store the mappings. ZONE_MOVABLE works up to a *point*, but it's a hack. It
> was a hack when it was introduced but at least then the expectation was
> that ZONE_MOVABLE was going to be used for huge pages and there was at least
> an expectation that it would not be available for normal usage.
> 
> Fundamentally the reason one would want to use ZONE_MOVABLE is because
> we cannot migrate a lot of kernel memory -- slab pages, page table pages,
> device-allocated buffers etc.  My understanding is that other OS's get around
> this by requiring that subsystems and drivers have callbacks that allow the
> core VM to force certain memory to be released but that may be impractical
> for Linux. I don't know for sure though, this is just what I heard.
As far as I know, one other OS confines immovable pages to the low end of
memory and raises the limit on demand. The drawback of that approach is a
serious performance drop (about 10% on average), because it essentially
disables NUMA optimization for kernel/DMA memory allocations.

> For Linux, the hotplug people need to start thinking about how to get
> around this migration problem. The first problem faced is the memory model
> and how it maps virt->phys addresses. We have a 1:1 mapping because it's
> fast but not because it's a fundamental requirement. Start considering
> what happens if the memory model is changed to allow some sections to have
> fast lookup for virt_to_phys and other sections to have slow lookups. On
> hotplug, try and empty all the sections. If the section cannot be emptied
> because of kernel pages then the section gets marked as "offline-migrated"
> or something. Stop the whole machine (yes, I mean stop_machine), copy
> those unmovable pages to another location, update the kernel virt->phys
> mapping for the section being offlined so the virt addresses point to the
> new physical addresses and resume.  Virt->phys lookups are going to be
> a lot slower because a full section lookup will be necessary every time
> effectively breaking SPARSE_VMEMMAP and there will be a performance penalty
> but it should work. This will cover some slab pages where the data is only
> accessed via the virtual address -- inode caches, dcache etc.
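
The fast/slow lookup idea in the quoted paragraph can be sketched as a toy
userspace model. Everything here is an assumption for illustration: the
section size, the base of the linear mapping, and the names
model_virt_to_phys() and model_migrate_section() are hypothetical, and the
real kernel memory model is far more involved.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 27                 /* assumed 128 MiB sections */
#define NR_SECTIONS   8                  /* toy address space: 1 GiB */
#define LINEAR_BASE   0x100000000ULL     /* assumed start of the 1:1 mapping */

struct section_map {
	int      migrated;   /* 0: fast 1:1 path, 1: slow per-section path */
	uint64_t phys_base;  /* new physical base, valid only if migrated */
};

static struct section_map sections[NR_SECTIONS];

/* Translate a toy "kernel virtual" address to a physical address. */
uint64_t model_virt_to_phys(uint64_t virt)
{
	uint64_t off = virt - LINEAR_BASE;
	uint64_t nr = off >> SECTION_SHIFT;

	assert(virt >= LINEAR_BASE && nr < NR_SECTIONS);
	if (!sections[nr].migrated)
		return off;  /* fast path: fixed offset from the linear base */
	/* slow path: the section was copied elsewhere, keep the in-section offset */
	return sections[nr].phys_base + (off & ((1ULL << SECTION_SHIFT) - 1));
}

/* Record that section nr now lives at new_phys_base (after the copy). */
void model_migrate_section(unsigned int nr, uint64_t new_phys_base)
{
	assert(nr < NR_SECTIONS);
	sections[nr].migrated = 1;
	sections[nr].phys_base = new_phys_base;
}
```

Virtual addresses stay the same across the move; only the per-section entry
changes, which is why users of the physical address are not covered.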
> 
> It will not work where the physical address is used. The obvious example
> is page table 

RE: [PATCH v2 0/5] Add movablecore_map boot option

2012-11-29 Thread Luck, Tony
> If any significant percentage of memory is in ZONE_MOVABLE then the memory
> hotplug people will have to deal with all the lowmem/highmem problems
> that used to be faced by 32-bit x86 with PAE enabled. 

While these problems may still exist on large systems - I think it becomes
harder to construct workloads that run into problems.  In those bad old days
a significant fraction of lowmem was consumed by the kernel ... so it was
pretty easy to find meta-data intensive workloads that would push it over
a cliff.  Here we  are talking about systems with say 128GB per node divided
into 64GB moveable and 64GB non-moveable (and I'd regard this as a rather
low-end machine).  Unless the workload consists of zillions of tiny processes
all mapping shared memory blocks, the percentage of memory allocated to
the kernel is going to be tiny compared with the old 4GB days.
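
To put a rough number on the page-table side of this, here is a small
worked calculation. It assumes 4 KiB pages, 512 entries per table and four
table levels (the usual x86_64 layout), and a single contiguous, aligned
mapping; page_table_pages() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_B 4096ULL  /* assumed 4 KiB base pages */
#define ENTRIES     512ULL   /* assumed 512 entries per table */

/*
 * Number of page-table pages, summed over four levels, needed to map
 * `bytes` of contiguous, aligned virtual memory with 4 KiB pages.
 */
uint64_t page_table_pages(uint64_t bytes)
{
	uint64_t n;
	uint64_t total = 0;
	int level;

	assert(bytes > 0);
	n = (bytes + PAGE_SIZE_B - 1) / PAGE_SIZE_B;  /* pages to map */
	for (level = 0; level < 4; level++) {
		n = (n + ENTRIES - 1) / ENTRIES;      /* tables at this level */
		total += n;
	}
	return total;
}
```

Under these assumptions, one process mapping 64 GiB needs 32834 table pages,
about 128 MiB of non-movable memory, so it takes many such processes before
a 64GB non-movable half of a node is under pressure from page tables alone.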

-Tony



Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-29 Thread Yasuaki Ishimatsu

Hi Jianguo,

2012/11/30 11:47, Jianguo Wu wrote:

Hi Congyang,

Thanks for your review and comments.

On 2012/11/30 9:45, Wen Congyang wrote:


At 11/28/2012 05:40 PM, Jianguo Wu Wrote:

Hi Congyang,

I think vmemmap's pgtable pages should be freed after all entries are cleared, 
I have a patch to do this.
The code logic is the same as [Patch v4 09/12] memory-hotplug: remove page 
table of x86_64 architecture.

How do you think about this?

Signed-off-by: Jianguo Wu 
Signed-off-by: Jiang Liu 
---
  include/linux/mm.h  |1 +
  mm/sparse-vmemmap.c |  214 +++
  mm/sparse.c |5 +-
  3 files changed, 218 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5657670..1f26af5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
long pages, int node);
  void vmemmap_populate_print_last(void);
  void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
  unsigned long size);
+void vmemmap_free(struct page *memmap, unsigned long nr_pages);

  enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 1b7e22a..242cb28 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -29,6 +29,10 @@
  #include 
  #include 

+#ifdef CONFIG_MEMORY_HOTREMOVE
+#include 
+#endif
+
  /*
   * Allocate a block of memory to be used to back the virtual memory map
   * or to back the page tables that are used to create the mapping.
@@ -224,3 +228,213 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
vmemmap_buf_end = NULL;
}
  }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+static void vmemmap_free_pages(struct page *page, int order)
+{
+   struct zone *zone;
+   unsigned long magic;
+
+   magic = (unsigned long) page->lru.next;
+   if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+   put_page_bootmem(page);
+
+   zone = page_zone(page);
+   zone_span_writelock(zone);
+   zone->present_pages++;
+   zone_span_writeunlock(zone);
+   totalram_pages++;
+   } else {
+   if (is_vmalloc_addr(page_address(page)))
+   vfree(page_address(page));


Hmm, vmemmap doesn't use vmalloc() to allocate memory.



yes, this can be removed.


+   else
+   free_pages((unsigned long)page_address(page), order);
+   }
+}
+
+static void free_pte_table(pmd_t *pmd)
+{
+   pte_t *pte, *pte_start;
+   int i;
+
+   pte_start = (pte_t *)pmd_page_vaddr(*pmd);
+   for (i = 0; i < PTRS_PER_PTE; i++) {
+   pte = pte_start + i;
+   if (pte_val(*pte))
+   return;
+   }
+
+   /* free a pte table */
+   vmemmap_free_pages(pmd_page(*pmd), 0);
+   spin_lock(&init_mm.page_table_lock);
+   pmd_clear(pmd);
+   spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pmd_table(pud_t *pud)
+{
+   pmd_t *pmd, *pmd_start;
+   int i;
+
+   pmd_start = (pmd_t *)pud_page_vaddr(*pud);
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   pmd = pmd_start + i;
+   if (pmd_val(*pmd))
+   return;
+   }
+
+   /* free a pmd table */
+   vmemmap_free_pages(pud_page(*pud), 0);
+   spin_lock(&init_mm.page_table_lock);
+   pud_clear(pud);
+   spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pud_table(pgd_t *pgd)
+{
+   pud_t *pud, *pud_start;
+   int i;
+
+   pud_start = (pud_t *)pgd_page_vaddr(*pgd);
+   for (i = 0; i < PTRS_PER_PUD; i++) {
+   pud = pud_start + i;
+   if (pud_val(*pud))
+   return;
+   }
+
+   /* free a pud table */
+   vmemmap_free_pages(pgd_page(*pgd), 0);
+   spin_lock(&init_mm.page_table_lock);
+   pgd_clear(pgd);
+   spin_unlock(&init_mm.page_table_lock);
+}
+
+static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
+{
+   struct page *page = pmd_page(*(pmd_t *)kpte);
+   int i = 0;
+   unsigned long magic;
+   unsigned long section_nr;
+
+   __split_large_page(kpte, address, pbase);
+   __flush_tlb_all();
+
+   magic = (unsigned long) page->lru.next;
+   if (magic == SECTION_INFO) {
+   section_nr = pfn_to_section_nr(page_to_pfn(page));
+   while (i < PTRS_PER_PMD) {
+   page++;
+   i++;
+   get_page_bootmem(section_nr, page, SECTION_INFO);
+   }
+   }
+
+   return 0;
+}
+
+static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long 
end)
+{
+   pte_t *pte;
+   unsigned long next;
+
+   pte = pte_offset_kernel(pmd, addr);
+   for (; addr < end; pte++, addr += PAGE_SIZE) {
+   

Re: [ 02/38] PCI/PM: Fix deadlock when unbinding device if parent in D3cold

2012-11-29 Thread Huang Ying
On Thu, 2012-11-29 at 18:01 -0800, Greg Kroah-Hartman wrote:
> On Fri, Nov 23, 2012 at 03:47:42PM +0800, Huang Ying wrote:
> > On Fri, 2012-11-23 at 11:09 +0800, Huang Ying wrote:
> > > On Fri, 2012-11-23 at 02:35 +, Ben Hutchings wrote:
> > > > On Wed, 2012-11-21 at 16:39 -0800, Greg Kroah-Hartman wrote:
> > > > > 3.0-stable review patch.  If anyone has any objections, please let me 
> > > > > know.
> > > > > 
> > > > > --
> > > > > 
> > > > > From: Huang Ying 
> > > > > 
> > > > > commit 90b5c1d7c45eeb622302680ff96ed30c1a2b6f0e upstream.
> > > > > 
> > > > > If a PCI device and its parents are put into D3cold, unbinding the
> > > > > device will trigger deadlock as follow:
> > > > > 
> > > > > - driver_unbind
> > > > >   - device_release_driver
> > > > > > - device_lock(dev)  <--- previous lock here
> > > > > - __device_release_driver
> > > > >   - pm_runtime_get_sync
> > > > > ...
> > > > >   - rpm_resume(dev)
> > > > > - rpm_resume(dev->parent)
> > > > >   ...
> > > > > - pci_pm_runtime_resume
> > > > >   ...
> > > > >   - pci_set_power_state
> > > > > - __pci_start_power_transition
> > > > >   - pci_wakeup_bus(dev->parent->subordinate)
> > > > > - pci_walk_bus
> > > > >   - device_lock(dev)  <--- deadlock here
> > > > > 
> > > > > 
> > > > > If we do not do device_lock in pci_walk_bus, we can avoid deadlock.
> > > > > Device_lock in pci_walk_bus is introduced in commit:
> > > > > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
> > > > > is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
> > > > > said device_lock is added to pci_walk_bus because:
> > > > > 
> > > > >   Some error handling functions call pci_walk_bus. For example, PCIe
> > > > >   aer. Here we lock the device, so the driver wouldn't detach from the
> > > > >   device, as the cb might call driver's callback function.
> > > > > 
> > > > > So I fixed the deadlock as follows:
> > > > > 
> > > > > - remove device_lock from pci_walk_bus
> > > > > - add device_lock into callback if callback will call driver's 
> > > > > callback
> > > > > 
> > > > > I checked pci_walk_bus users one by one, and found only PCIe aer needs
> > > > > device lock.
> > > > [...]
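
The deadlock and the fix described in the quoted changelog can be modelled
in a few lines of userspace C. This is only a toy: struct toy_lock stands
in for the device lock, and all function names are hypothetical, not the
PCI core API. A real mutex would block forever where the model returns -1.

```c
#include <assert.h>

/* Toy non-recursive lock: records only whether it is held. */
struct toy_lock { int held; };

/* Returns 0 on success, -1 if already held (a real mutex would hang here). */
int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -1;
	l->held = 1;
	return 0;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Old behaviour: the bus walk locks each device before calling cb. */
int walk_locking(struct toy_lock *dev, int (*cb)(struct toy_lock *))
{
	int ret;

	if (toy_lock_acquire(dev))
		return -1;  /* caller already holds the lock: deadlock */
	ret = cb(dev);
	toy_lock_release(dev);
	return ret;
}

/* New behaviour: the walk takes no lock; callbacks lock only if they must. */
int walk_unlocked(struct toy_lock *dev, int (*cb)(struct toy_lock *))
{
	return cb(dev);
}

/* A wakeup-style callback that calls no driver callback, so needs no lock. */
int wakeup_cb(struct toy_lock *dev)
{
	(void)dev;
	return 0;
}
```

The unbind path holds the device lock before the walk starts, so only the
unlocked walk can complete; callbacks that do call into the driver (PCIe
AER in the changelog) would take the lock themselves.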
> > > > 
> > > > What about eeh_report_error() in
> > > > arch/powerpc/platforms/pseries/eeh_driver.c?
> > > 
> > > Hmm...  The pci_walk_bus() invocation there is removed in 3.7, so this
> > > patch is only valid for 3.7.  We need another version for 3.6.
> > 
> > Here is the patch for 3.6.  I have no powerpc machine, so build test
> > only.
> > 
> > Subject: [BUGFIX] PCI/PM: Fix deadlock when unbind device if its parent in 
> > D3cold
> > 
> > If a PCI device and its parents are put into D3cold, unbinding the
> > device will trigger deadlock as follow:
> > 
> > - driver_unbind
> >   - device_release_driver
> > - device_lock(dev)  <--- previous lock here
> > - __device_release_driver
> >   - pm_runtime_get_sync
> > ...
> >   - rpm_resume(dev)
> > - rpm_resume(dev->parent)
> >   ...
> > - pci_pm_runtime_resume
> >   ...
> >   - pci_set_power_state
> > - __pci_start_power_transition
> >   - pci_wakeup_bus(dev->parent->subordinate)
> > - pci_walk_bus
> >   - device_lock(dev)<--- dead lock here
> > 
> > 
> > If we do not do device_lock in pci_walk_bus, we can avoid dead lock.
> > Device_lock in pci_walk_bus is introduced in commit:
> > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
> > is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
> > said device_lock is added to pci_walk_bus because:
> > 
> >   Some error handling functions call pci_walk_bus. For example, PCIe
> >   aer. Here we lock the device, so the driver wouldn't detach from the
> >   device, as the cb might call driver's callback function.
> > 
> > So I fixed the dead lock as follow:
> > 
> > - remove device_lock from pci_walk_bus
> > - add device_lock into callback if callback will call driver's callback
> > 
> > I checked pci_walk_bus users one by one, and found only PCIe aer needs
> > device lock.
> > 
> > Signed-off-by: Huang Ying 
> > Cc: Zhang Yanmin 
> > ---
> >  arch/powerpc/platforms/pseries/eeh_driver.c |   51 
> > ++--
> 
> Due to me applying a power pci patch,
> feadf7c0a1a7c08c74bebb4a13b755f8c40e3bbc in Linus's tree to 3.6-stable,
> this patch doesn't apply here anymore.
> 
> Because that patch is in the tree, is it now just safe to take your
> original, unmodified, version of this patch for 3.6-stable?

No.  My original version does not work.  I need to 

Re: [Suggestion] drivers/tty: drivers/char/: for MAX_ASYNC_BUFFER_SIZE

2012-11-29 Thread Chen Gang
On 2012-11-30 02:32, Greg KH wrote:
> On Thu, Nov 29, 2012 at 01:57:59PM +0800, Chen Gang wrote:
>>> And, I really don't understand here, why do you want to change this?
>>> What is it going to change?  And why?
>>>
>>
>> Why:
>>   For the context MGSLPC_INFO *info in drivers/char/pcmcia/synclink_cs.c:
>> info->max_frame_size can be any value in 4096 .. 65535 (it can be
>> set by a module parameter), while
>> info->flag_buf has length 4096 (MAX_ASYNC_BUFFER_SIZE).
>>   In the function rx_get_frame,
>> framesize is limited by info->max_frame_size, but may still be
>> larger than 4096.
>> When ldisc_receive_buf is called, info->flag_buf is 4096 bytes long,
>> but framesize can be more than 4096, which will cause a memory overflow.
> 
> Do you use that pcmcia driver for anything?  Are those cards still
> around?

I do not use them.

I found this through code review only (so it is just a suggestion).

This issue affects 4 synclink drivers.
I checked their source code; all of them have the same issue:
  drivers/char/pcmcia/synclink_cs.c:213:char flag_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclink_gt.c:320:char flag_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclink.c:294:   char flag_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclinkmp.c:265: char flag_buf[MAX_ASYNC_BUFFER_SIZE];
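
For illustration only, the length mismatch can be shown with a small bounds
check. Note this is a different approach from the one suggested in this
thread (enlarging MAX_ASYNC_BUFFER_SIZE): it clamps the length instead.
safe_receive_len() is a hypothetical helper, not code from any of the
drivers, and whether truncating is acceptable there is untested.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_BUF_SIZE 4096  /* value of MAX_ASYNC_BUFFER_SIZE per this thread */

/*
 * How many bytes of a received frame can be passed on together with a
 * flag buffer of flag_buf_size bytes without reading past its end.
 */
size_t safe_receive_len(size_t framesize, size_t flag_buf_size)
{
	assert(flag_buf_size > 0);
	return framesize < flag_buf_size ? framesize : flag_buf_size;
}
```

A frame of 65535 bytes would be cut to 4096 here, while frames that already
fit pass through unchanged.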

By the way, char_buf is already unused (and can be removed):
  drivers/tty/synclink_gt.c:321:char char_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclink.c:295:   char char_buf[MAX_ASYNC_BUFFER_SIZE];   
  drivers/tty/synclinkmp.c:266: char char_buf[MAX_ASYNC_BUFFER_SIZE];



> 
>> What:
>>   #define MAX_ASYNC_BUFFER_SIZE  0x1 (instead of 4096, originally).
>>   let it match the max frame size.
>>
>> At last:
>>   My suggestion may be incorrect; it needs a relevant member (an expert on
>> this code) to help check it.
> 
> That driver might be incorrect, yes, care to make up a patch for it and
> test it to verify it fixes the problem?
> 

Alan Cox now has his own opinion on this;
  at the least, I think it is worth continuing the discussion.

If Alan Cox agrees with it (though it seems he does not), I will make a patch
and try to test it.
Other members are also welcome to help with testing.



> thanks,
> 
> greg k-h
> 
> 


-- 
Chen Gang

Asianux Corporation


Re: [PATCH v2] Do a proper locking for mmap and block size change

2012-11-29 Thread Dave Chinner
On Thu, Nov 29, 2012 at 02:16:50PM -0800, Linus Torvalds wrote:
> On Thu, Nov 29, 2012 at 1:29 PM, Chris Mason  wrote:
> >
> > Just reading the new blkdev_get_blocks, it looks like we're mixing
> > shifts.  In direct-io.c map_bh->b_size is how much we'd like to map, and
> > it has no relation at all to the actual block size of the device.  The
> > interface is abusing b_size to ask for as large a mapping as possible.
> 
> Ugh. That's a big violation of how buffer-heads are supposed to work:
> the block number is very much defined to be in multiples of b_size
> (see for example "submit_bh()" that turns it into a sector number).
> 
> But you're right. The direct-IO code really *is* violating that, and
> knows that get_block() ends up being defined in i_blkbits regardless
> of b_size.

Same with mpage_readpages(), so it's not just direct IO that has
this problem
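
A small sketch of the unit mismatch being discussed, assuming 512-byte
sectors and the "block number counts b_size units" rule Linus describes for
submit_bh(). block_to_sector() is a hypothetical stand-alone helper, not
the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sector number for a block when the block number is counted in units
 * of b_size bytes and sectors are 512 bytes. b_size must be a power of
 * two and at least 512.
 */
uint64_t block_to_sector(uint64_t blocknr, uint32_t b_size)
{
	assert(b_size >= 512 && (b_size & (b_size - 1)) == 0);
	return blocknr * (b_size >> 9);
}
```

Block 10 is sector 80 with a 4096-byte b_size, but sector 20480 if b_size
has been set to a 1 MiB "map as much as this" request, so a block number
computed in i_blkbits units cannot be interpreted through such a b_size.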

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-29 Thread Jianguo Wu
Hi Congyang,

Thanks for your review and comments.

On 2012/11/30 9:45, Wen Congyang wrote:

> At 11/28/2012 05:40 PM, Jianguo Wu Wrote:
>> Hi Congyang,
>>
>> I think vmemmap's pgtable pages should be freed after all entries are 
>> cleared, I have a patch to do this.
>> The code logic is the same as [Patch v4 09/12] memory-hotplug: remove page 
>> table of x86_64 architecture.
>>
>> How do you think about this?
>>
>> Signed-off-by: Jianguo Wu 
>> Signed-off-by: Jiang Liu 
>> ---
>>  include/linux/mm.h  |1 +
>>  mm/sparse-vmemmap.c |  214 
>> +++
>>  mm/sparse.c |5 +-
>>  3 files changed, 218 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5657670..1f26af5 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
>> long pages, int node);
>>  void vmemmap_populate_print_last(void);
>>  void register_page_bootmem_memmap(unsigned long section_nr, struct page 
>> *map,
>>unsigned long size);
>> +void vmemmap_free(struct page *memmap, unsigned long nr_pages);
>>  
>>  enum mf_flags {
>>  MF_COUNT_INCREASED = 1 << 0,
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 1b7e22a..242cb28 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -29,6 +29,10 @@
>>  #include 
>>  #include 
>>  
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +#include 
>> +#endif
>> +
>>  /*
>>   * Allocate a block of memory to be used to back the virtual memory map
>>   * or to back the page tables that are used to create the mapping.
>> @@ -224,3 +228,213 @@ void __init sparse_mem_maps_populate_node(struct page 
>> **map_map,
>>  vmemmap_buf_end = NULL;
>>  }
>>  }
>> +
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +static void vmemmap_free_pages(struct page *page, int order)
>> +{
>> +struct zone *zone;
>> +unsigned long magic;
>> +
>> +magic = (unsigned long) page->lru.next;
>> +if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
>> +put_page_bootmem(page);
>> +
>> +zone = page_zone(page);
>> +zone_span_writelock(zone);
>> +zone->present_pages++;
>> +zone_span_writeunlock(zone);
>> +totalram_pages++;
>> +} else {
>> +if (is_vmalloc_addr(page_address(page)))
>> +vfree(page_address(page));
> 
> Hmm, vmemmap doesn't use vmalloc() to allocate memory.
> 

yes, this can be removed.

>> +else
>> +free_pages((unsigned long)page_address(page), order);
>> +}
>> +}
>> +
>> +static void free_pte_table(pmd_t *pmd)
>> +{
>> +pte_t *pte, *pte_start;
>> +int i;
>> +
>> +pte_start = (pte_t *)pmd_page_vaddr(*pmd);
>> +for (i = 0; i < PTRS_PER_PTE; i++) {
>> +pte = pte_start + i;
>> +if (pte_val(*pte))
>> +return;
>> +}
>> +
>> +/* free a pte table */
>> +vmemmap_free_pages(pmd_page(*pmd), 0);
>> +spin_lock(&init_mm.page_table_lock);
>> +pmd_clear(pmd);
>> +spin_unlock(&init_mm.page_table_lock);
>> +}
>> +
>> +static void free_pmd_table(pud_t *pud)
>> +{
>> +pmd_t *pmd, *pmd_start;
>> +int i;
>> +
>> +pmd_start = (pmd_t *)pud_page_vaddr(*pud);
>> +for (i = 0; i < PTRS_PER_PMD; i++) {
>> +pmd = pmd_start + i;
>> +if (pmd_val(*pmd))
>> +return;
>> +}
>> +
>> +/* free a pmd table */
>> +vmemmap_free_pages(pud_page(*pud), 0);
>> +spin_lock(&init_mm.page_table_lock);
>> +pud_clear(pud);
>> +spin_unlock(&init_mm.page_table_lock);
>> +}
>> +
>> +static void free_pud_table(pgd_t *pgd)
>> +{
>> +pud_t *pud, *pud_start;
>> +int i;
>> +
>> +pud_start = (pud_t *)pgd_page_vaddr(*pgd);
>> +for (i = 0; i < PTRS_PER_PUD; i++) {
>> +pud = pud_start + i;
>> +if (pud_val(*pud))
>> +return;
>> +}
>> +
>> +/* free a pud table */
>> +vmemmap_free_pages(pgd_page(*pgd), 0);
>> +spin_lock(&init_mm.page_table_lock);
>> +pgd_clear(pgd);
>> +spin_unlock(&init_mm.page_table_lock);
>> +}
>> +
>> +static int split_large_page(pte_t *kpte, unsigned long address, pte_t 
>> *pbase)
>> +{
>> +struct page *page = pmd_page(*(pmd_t *)kpte);
>> +int i = 0;
>> +unsigned long magic;
>> +unsigned long section_nr;
>> +
>> +__split_large_page(kpte, address, pbase);
>> +__flush_tlb_all();
>> +
>> +magic = (unsigned long) page->lru.next;
>> +if (magic == SECTION_INFO) {
>> +section_nr = pfn_to_section_nr(page_to_pfn(page));
>> +while (i < PTRS_PER_PMD) {
>> +page++;
>> +i++;
>> +get_page_bootmem(section_nr, page, SECTION_INFO);
>> +}
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static void vmemmap_pte_remove(pmd_t *pmd, 

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-29 Thread Robert Hancock
On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas  wrote:
> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz  
> wrote:
>>
>>
>> -Original Message-
>> From: Robert Hancock [mailto:hancock...@gmail.com]
>> Sent: Wednesday, November 28, 2012 7:55 PM
>> To: Justin Piszcz
>> Cc: Bjorn Helgaas; Bruno Prémont; supp...@supermicro.com;
>> linux-kernel@vger.kernel.org; Dan Williams
>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>> bug question
>>
>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz 
>> wrote:
>>>
>>>
>>> -Original Message-
>>> From: Robert Hancock [mailto:hancock...@gmail.com]
>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>> To: Justin Piszcz
>>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; supp...@supermicro.com;
>>> linux-kernel@vger.kernel.org; 'Dan Williams'
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error &
>> ACPI/firmware
>>> bug question
>>>
>>>
>>> What does lspci -vv show on that controller? Not sure what actual
>>> chipset that controller is, but there's a known issue with some Marvell
>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>> rightly denies access as the function listed in the requests doesn't
>>> have any mapping to that memory. I don't think there's presently a
>>> workaround other than disabling DMAR. We could (and likely should) be
>>> detecting that device and adding some kind of quirk for it.
>>>
>>> That sounds likely...
>>> It is shown below:
>>>
>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>>> Adapter
>>>
>>> lspci -vv output:
>>>
>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>>> controller
>>
>> Yeah, that's one of those controllers I think. But I can't tell from
>> the bit of the dmesg you posted exactly what's going on. Can you post
>> a full boot log from having the card installed and some drive attached
>> (by putting the boot drive on another controller for example)?
>>
 ==> Further issues with the X9SRL-F -- does this board support ASPM or is
 this a Linux/ASPM implementation issue?
 [0.632170]  pci:ff: ACPI _OSC support notification failed,
>>> disabling
 PCIe ASPM
 [0.632239]  pci:ff: Unable to request _OSC control (_OSC support
 mask: 0x08)
>>>
>>> What's the full dmesg from this machine (or is it already posted
>> somewhere)?
>>>
>>> It is now available here:
>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>
>>> Is that the same boot log? It doesn't have this error in it.
>>
>> Yes, the error is here: (it's towards the bottom)
>>
>>  [7.973015] ata14.00: qc timeout (cmd 0xa1)
>> [8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [   51.824682] dmar: DRHD: handling fault status reg 502
>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>
> You have these devices:
>
> pci :04:00.0: [10de:01d3] type 00 class 0x03 nVidia G72
> pci :84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
> pci :84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>
> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
> and if you get rid of that driver, they'll probably go away.
>
> But this 84:00.1 DMAR error:
>
> dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff0
> DMAR:[fault reason 02] Present bit in context entry is clear
>
> looks like the probable cause of the Marvell issue.  It looks similar
> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
> DMAR rejects DMA that appears to be from bb:dd.1.
>
> Another report that's even more similar is
> https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
> is exactly like what you're seeing.
>
> So you're not alone, but unfortunately, nobody seems to be working on
> either bug report.  I took the liberty to add you to the cc: list of
> both.
>
> I don't really know what else to do at this point.  Maybe a SATA
> expert with some Marvell 

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean
On 29/11/12 06:00 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
>>> Very interesting.  Do you have anything mounted on the corresponding
>>> directories on server?  The picture looks like you are getting empty
>>> fhandles in readdir+ response for exactly the same directories that happen
>>> to be mountpoints on client.  In any case, we shouldn't do that blind
>>> d_drop() - empty fhandles can happen.  The only remaining question is
>>> why do they happen on that set of entries.  From my reading of
>>> encode_entryplus_baggage() it looks like we have compose_entry_fh()
>>> failing for those entries and those entries alone.  One possible cause
>>> would be d_mountpoint(dchild) being true on server.  If it is true, we
>>> can declare the case closed; if not, I really wonder what's going on.
>>
>> Those directories do have the server's own copies of the said directories 
>> bind mounted at the moment in a separate mount namespace.
>>
>> Unmounting those directories on the server does appear to stop the WARN_ON 
>> from triggering.
> 
> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct 
> nfs_entry *entry)
>   nfs_refresh_inode(dentry->d_inode, entry->fattr);
>   goto out;
>   } else {
> - d_drop(dentry);
> + if (d_invalidate(dentry) != 0)
> + goto out;
>   dput(dentry);
>   }
>   }

Excellent, thanks. Is there any chance this will make it to 3.7? Also we might 
want to cc stable@ on this as well since it is a regression in 3.6.


Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache_locked

2012-11-29 Thread azurIt
>Here we go with the patch for 3.2.34. Could you test with this one,
>please?


Michal, unfortunately I had to boot into another kernel because the one with
this patch keeps killing my MySQL server :( It was probably doing so on OOM
in any cgroup - it looks like the OOM killer was not choosing processes only
from the cgroup that is out of memory. Here is the log from syslog:
http://www.watchdog.sk/lkml/oom_mysqld

Maybe I should mention that the MySQL server has its own cgroup (called
'mysql') but with no limits on any resources.

azurIt


Re: [PATCH 1/2] zsmalloc: add function to query object size

2012-11-29 Thread Nitin Gupta
On Thu, Nov 29, 2012 at 5:58 PM, Minchan Kim  wrote:
> On Thu, Nov 29, 2012 at 04:53:15PM -0800, Nitin Gupta wrote:
>> On Thu, Nov 29, 2012 at 4:03 PM, Minchan Kim  wrote:
>> > On Thu, Nov 29, 2012 at 01:09:56AM -0800, Nitin Gupta wrote:
>> >> On 11/28/2012 11:45 PM, Minchan Kim wrote:
>> >> >On Mon, Nov 26, 2012 at 11:26:40PM -0800, Nitin Gupta wrote:
>> >> >>Adds zs_get_object_size(handle) which provides the size of
>> >> >>the given object. This is useful since the user (zram etc.)
>> >> >>now do not have to maintain object sizes separately, saving
>> >> >>on some metadata size (4b per page).
>> >> >>
>> >> >>The object handle encodes  pair which currently points
>> >> >
>> >> >Nitpick.
>> >> >
>> >> > in description would be proper instead of
>> >> >. You are going to replace  with .
>> >> >
>> >>
>> >> I think 'offset' conveys the meaning more clearly; 'index' is after
>> >> all just a chopped-off version of offset :)
>> >
>> > In my perception, an offset is a location measured in bytes from some point,
>> > while an index is something we have to multiply by sizeof(object) to get
>> > that location. So you used an index before the patch, but now you are
>> > using an offset instead of an index.
>> >
>> > Anyway, it's minor nitpick. Never mind if you don't agree. :)
>> >
>> >>
>> >>
>> >> >>to the start of the object. Now, the handle implicitly stores the size
>> >> >>information by pointing to the object's end instead. Since zsmalloc is
>> >> >>a slab based allocator, the start of the object can be easily determined
>> >> >>and the difference between the end offset encoded in the handle and the
>> >> >>start gives us the object size.
>> >> >
>> >> >It's a good idea. Look at just minor comment below.
>> >> >
>> >> >Let's talk about another concern. This patch surely helps the memory usage
>> >> >of current users, who would otherwise have to add 4 bytes for accounting
>> >> >the statistics, while zsmalloc could slow down.
>> >> >Is it really valuable?
>> >> >
>> >> >Yes. zram/zcache is tightly coupled with zsmalloc, so it has already
>> >> >lost modularity. :( From this POV, this patch makes sense.
>> >> >But what if someone is willing to remove the statistics for performance?
>> >> >Even if they remove them, zsmalloc is still slowed down.
>> >> >
>> >> >Statistics for a user of zsmalloc should be the user's own cost, not
>> >> >zsmalloc's, and this deepens the dependency on the user, so it makes
>> >> >changing the allocator hard in the future. We already had such an
>> >> >experience (xvmalloc->zsmalloc). Of course, it's not good to worry about
>> >> >future things too early without any plan.
>> >> >So I'm not strongly against you. If no reviewer raises an eyebrow,
>> >> >I will rely on your decision.
>> >> >
>> >>
>> >> Looking over the changes I do not expect any difference in
>> >> performance -- just a bit more arithmetic, however the use of
>> >> get_prev_page() which may dereference a few extra pointers might not
>> >> be really free. Also, my iozone test[1] shows very little difference
>> >> in performance:
>> >
>> > Iozone test isn't enough to prove the minor slow down because it would have
>> > many noise about I/O path and different compression ratio per I/O.
>> >
>> >>
>> >> With just the fix for crash (patch 1/2):
>> >> 9.28user 1159.84system 21:46.54elapsed 89%CPU (0avgtext+0avgdata
>> >> 50200maxresident)k
>> >> 212988544inputs+190660816outputs (1major+16706minor)pagefaults 0swaps
>> >>
>> >> With the new get_object_size() code (patch 1/2 + patch 2/2):
>> >> 9.20user 1112.63system 21:03.61elapsed 88%CPU (0avgtext+0avgdata
>> >> 50200maxresident)k
>> >> 194636568inputs+190500424outputs (1major+16707minor)pagefaults 0swaps
>> >>
>> >> I really cannot explain this ~40s speedup but anyways, I think
>> >> optimizing zsmalloc/zram should be taken up separately, at least
>> >> when this new code does not seem to have any significant effects.
>> >>
>> >>
>> >> [1] iozone test:
>> >>  - Create zram of size 1200m
>> >>  - Create ext4 fs
>> >>  - iozone -a -g 1G
>> >>  - In parallel: watch zram_stress
>> >>
>> >> # zram_stress
>> >> sync
>> >> echo 1 | sudo tee /proc/sys/vm/drop_caches
>> >>
>> >>
>> >> Surely, not the most accurate of the tests but gives an idea if
>> >> anything is making a significant difference.
>> >
>> > For more accurate test, it would be better to use zsmapbench by Seth.
>> > https://github.com/spartacus06/zsmapbench
>>
>> Thanks for the pointer. I also found fio which can generate I/O with different
>> levels of compressibility. These would help with future evaluations.
>>
>> > Frankly speaking, I don't expect it has a significant regression as you said.
>> > More concern to me is we are going to make tight coupling with zram/zcache +
>> > zsmalloc. It makes changing from zsmalloc to smarter allocator hard.
>>
>> I could not understand how this patch increases zram + zsmalloc coupling.
>> All we need from any potential allocator replacement is an interface to
>> provide the object size given its handle.  If the new allocation cannot
>
> 
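
Editor's sketch of the size-from-handle idea discussed in this thread. It is a simplified model and not the zsmalloc code: it assumes fixed-size slots of class_size bytes packed back to back from offset 0, and an exclusive end offset stored in the handle. The names obj_start, obj_size and index_to_offset are hypothetical. It also shows the index/offset distinction raised in the review (offset = index * slot size).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not zsmalloc itself): objects live in
 * fixed-size slots of class_size bytes, packed from offset 0.  The handle
 * stores the exclusive END offset of the object instead of its start.
 */

/* Start of the slot that contains the byte just before end_off. */
static size_t obj_start(size_t end_off, size_t class_size)
{
    assert(end_off > 0 && class_size > 0);
    return ((end_off - 1) / class_size) * class_size;
}

/* Object size is the distance from the slot start to the stored end. */
static size_t obj_size(size_t end_off, size_t class_size)
{
    return end_off - obj_start(end_off, class_size);
}

/* An index counts slots; an offset counts bytes. */
static size_t index_to_offset(size_t idx, size_t class_size)
{
    return idx * class_size;
}
```

With 64-byte slots, an object in slot 3 that ends at byte 242 starts at byte 192, so its size can be recovered from the end offset alone, with no separate per-object size field.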

Re: [PATCH v2] Do a proper locking for mmap and block size change

2012-11-29 Thread Chris Mason
On Thu, Nov 29, 2012 at 07:13:02PM -0700, Linus Torvalds wrote:
> On Thu, Nov 29, 2012 at 5:16 PM, Chris Mason  wrote:
> >
> > I searched through filemap.c for the magic i_size check that would let
> > us get away with ignoring i_blkbits in get_blocks, but its just not
> > there.  The whole fallback-to-buffered scheme seems to rely on
> > get_blocks checking for i_size.  I really hope I'm just missing
> > something.
> 
> So generic_write_checks() limits the size to i_size for writes (and
> for "isblk").

Great, that's what I was missing.

> 
> Sure, then it will do the buffered part after that, but that should
> all be fine anyway, since by then we use the normal page cache.
> 
> For reads, generic_file_aio_read() will check pos < size, but doesn't
> seem to actually limit the size of the iovec.

I couldn't explain that either.

> 
> I'm not sure why it doesn't just do "iov_shorten()".
> 
> Anyway, having looked at actually passing in the block size to
> get_block(), I can say that is a horrible idea. There are tons of
> get_block functions (for various filesystems), and *none* of them
> really want the block size, because they tend to work on block
> indexes. And if they do want the block size, they'll just get it from
> the inode or sb, since they are filesystems and it's all stable.
> 
> So the *only* place that would want the block size is
> fs/block_dev.c. And the callers really already seem to do the i_size
> check, although they sometimes do it badly. And since there are fewer
> callers than there are get_block() implementations, I think we should
> just fix the callers and be done with it.
> 
> Those generic_file_aio_read/write() functions in fs/direct-io.c really
> just seem to be badly written. The fact that they may depend on the
> i_size check in get_blocks() is sad, but I think we should fix it and
> just remove the check for block devices. That's going to simplify so
> much..
> 
> I updated the 'block-dev' branch to have that simpler fs/block_dev.c
> model instead. I'll look at the iovec shortening later. It's a
> non-fast-forward thing, look out!
> 
> (I actually think we should just add the max-offset check to
> rw_copy_check_uvector(). That one already does the MAX_RW_COUNT thing,
> and we could make it do a max_offset check as well).

This is definitely easier, and I can't see any reason not to do it.  I'm
used to get_block being expensive and so it didn't even cross my mind.

We can benchmark things just to make sure.

-chris



Re: [Suggestion] drivers/tty: drivers/char/: for MAX_ASYNC_BUFFER_SIZE

2012-11-29 Thread Chen Gang
On 2012-11-29 21:41, Alan Cox wrote:
> On Thu, 29 Nov 2012 13:07:28 +0800
> Chen Gang  wrote:
> 
>> Hello Greg Kroah-Hartman:
>>
>> for MAX_ASYNC_BUFFER_SIZE:
>>   it is defined as 4096;
>>   but for the max buffer size which it processes, is 65535.
>>   so suggest to #define MAX_ASYNC_BUFFER_SIZE 0x1  (better than 0x)
> 
> I don't see the need to change this. Possibly some of the old synclink
> drivers need to check more carefully for overflows if configured for very
> large frame sizes ?
> 

This comes only from code review (so it is just a suggestion); I will try to 
test it. Other members are also welcome to help with testing.

This issue affects 4 synclink drivers (most of their source code is the same):
  drivers/char/pcmcia/synclink_cs.c:213: char flag_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclink_gt.c:320:         char flag_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclink.c:294:            char flag_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclinkmp.c:265:          char flag_buf[MAX_ASYNC_BUFFER_SIZE];

char_buf is already unused (it can be removed):
  drivers/tty/synclink_gt.c:321:         char char_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclink.c:295:            char char_buf[MAX_ASYNC_BUFFER_SIZE];
  drivers/tty/synclinkmp.c:266:          char char_buf[MAX_ASYNC_BUFFER_SIZE];


>>
>> -
>> Step 3:
>>
>> one sample in drivers/tty/n_gsm.c  (same for another implementation)
>>
>>   receive_buf is a function pointer which may be gsmld_receive_buf at line 2819.
>>   It does not check whether count is larger than MAX_ASYNC_BUFFER_SIZE.
>>   If count is larger than MAX_ASYNC_BUFFER_SIZE, it will cause an issue.
> 
> Why should it - MAX_ASYNC_BUFFER_SIZE is an internal detail of the
> synclink drivers. 
> 
> Alan
> 
> 

  No, it does not need to.  (Excuse me, my English is not very good; maybe you 
misunderstood what I said.)

  At least, currently:
the caller should make sure that the buffer length is sufficient (it seems it 
does not; I need to test it).
the internal code has no duty to check it.
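
Editor's sketch of the kind of length guard being discussed. It is not taken from the synclink drivers: struct rx_scratch and fill_flags are hypothetical names, and only the 4096 value of MAX_ASYNC_BUFFER_SIZE comes from the thread. It shows one way a caller could clamp a count to the scratch buffer size before filling it, assuming truncation is acceptable.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ASYNC_BUFFER_SIZE 4096	/* value quoted in the thread */

/* Hypothetical stand-in for a driver's per-port scratch buffer. */
struct rx_scratch {
    char flag_buf[MAX_ASYNC_BUFFER_SIZE];
};

/*
 * Fill flag_buf with one flag value for 'count' received bytes.
 * The count is clamped so that a frame larger than the buffer
 * (for example 65535 bytes) cannot overflow it.
 * Returns the number of flags actually written.
 */
static size_t fill_flags(struct rx_scratch *s, char flag, size_t count)
{
    if (count > sizeof(s->flag_buf))
        count = sizeof(s->flag_buf);
    memset(s->flag_buf, flag, count);
    return count;
}
```

Whether clamping, splitting the frame into several passes, or enlarging the buffer is the right fix depends on the driver; the sketch only makes the overflow condition explicit.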


-- 
Chen Gang

Asianux Corporation


Re: [PATCH v3] zram: fix invalid memory references during disk write

2012-11-29 Thread Nitin Gupta
On Thu, Nov 29, 2012 at 6:13 PM, Greg KH  wrote:
> On Fri, Nov 30, 2012 at 09:36:09AM +0900, Minchan Kim wrote:
>> Hi Greg,
>>
>> I would like to tidy zram_bvec_write up but it needs more churn
>> than needed to fix this bug and makes review hard. So I want you to pick up
>> this patch asap because it's a candidate of stable.
>
> Ok, I have way too many different, and competing, zram patches in my
> inbox right now.
>
> So I'm just going to delete them all, and ask Nitin to resend them all
> so I know what is going on here.
>
> Nitin, is that ok?
>

Sure. I will also include affected stable version in the changelog (or
whatever stable_kernel_rules says).

Thanks,
Nitin


Re: [PATCH 036/270] kbuild: Do not package /boot and /lib in make tar-pkg

2012-11-29 Thread Ben Hutchings
On Thu, 2012-11-29 at 17:38 -0800, Greg Kroah-Hartman wrote:
> On Tue, Nov 27, 2012 at 02:26:27AM +, Ben Hutchings wrote:
> > On Mon, 2012-11-26 at 14:55 -0200, Herton Ronaldo Krzesinski wrote:
> > > 3.5.7u1 -stable review patch.  If anyone has any objections, please let me know.
> > > 
> > > --
> > > 
> > > From: Michal Marek 
> > > 
> > > commit fe04ddf7c2910362f3817c8156e41cbd6c0ee35d upstream.
> > > 
> > > There were reports of users destroying their Fedora installs by a kernel
> > > tarball that replaces the /lib -> /usr/lib symlink. Let's remove the
> > > toplevel directories from the tarball to prevent this from happening.
> > > 
> > > Reported-by: Andi Kleen 
> > > Suggested-by: Ben Hutchings 
> > > Signed-off-by: Michal Marek 
> > > [ herton: dropped unrelated changes to arch/x86/Makefile and
> > >   scripts/Makefile.fwinst, which don't apply anyway on 3.5, see commit
> > >   3ce9e53e71da0d5f3912f80e0dd6b501f304 upstream ]
> > > Signed-off-by: Herton Ronaldo Krzesinski 
> > 
> > This is missing from 3.4.
> 
> I don't think it is needed, as 3ce9e53e71da0d5f3912f80e0dd6b501f304
> didn't go into 3.4, so all should be good for now.

No, 3ce9e53e71da0d5f3912f80e0dd6b501f304 was later and reverted
unintended changes in fe04ddf7c2910362f3817c8156e41cbd6c0ee35d.  You
should probably combine the two.

See these stable commits:

3.2: 0767530 kbuild: Do not package /boot and /lib in make tar-pkg
3.6: 4bb50fa kbuild: Do not package /boot and /lib in make tar-pkg
3.6: 0a7f602 kbuild: Fix accidental revert in commit fe04ddf

Ben.

-- 
Ben Hutchings
Never attribute to conspiracy what can adequately be explained by stupidity.




Re: [PATCH v3] zram: fix invalid memory references during disk write

2012-11-29 Thread Greg KH
On Fri, Nov 30, 2012 at 09:36:09AM +0900, Minchan Kim wrote:
> Hi Greg,
> 
> I would like to tidy zram_bvec_write up but it needs more churn
> than needed to fix this bug and makes review hard. So I want you to pick up
> this patch asap because it's a candidate of stable.

Ok, I have way too many different, and competing, zram patches in my
inbox right now.

So I'm just going to delete them all, and ask Nitin to resend them all
so I know what is going on here.

Nitin, is that ok?

thanks,

greg k-h


Re: [PATCH v2] Do a proper locking for mmap and block size change

2012-11-29 Thread Linus Torvalds
On Thu, Nov 29, 2012 at 5:16 PM, Chris Mason  wrote:
>
> I searched through filemap.c for the magic i_size check that would let
> us get away with ignoring i_blkbits in get_blocks, but its just not
> there.  The whole fallback-to-buffered scheme seems to rely on
> get_blocks checking for i_size.  I really hope I'm just missing
> something.

So generic_write_checks() limits the size to i_size for writes (and
for "isblk").

Sure, then it will do the buffered part after that, but that should
all be fine anyway, since by then we use the normal page cache.

For reads, generic_file_aio_read() will check pos < size, but doesn't
seem to actually limit the size of the iovec.

I'm not sure why it doesn't just do "iov_shorten()".

Anyway, having looked at actually passing in the block size to
get_block(), I can say that is a horrible idea. There are tons of
get_block functions (for various filesystems), and *none* of them
really want the block size, because they tend to work on block
indexes. And if they do want the block size, they'll just get it from
the inode or sb, since they are filesystems and it's all stable.

So the *only* place that would want the block size is
fs/block_dev.c. And the callers really already seem to do the i_size
check, although they sometimes do it badly. And since there are fewer
callers than there are get_block() implementations, I think we should
just fix the callers and be done with it.

Those generic_file_aio_read/write() functions in fs/direct-io.c really
just seem to be badly written. The fact that they may depend on the
i_size check in get_blocks() is sad, but I think we should fix it and
just remove the check for block devices. That's going to simplify so
much..

I updated the 'block-dev' branch to have that simpler fs/block_dev.c
model instead. I'll look at the iovec shortening later. It's a
non-fast-forward thing, look out!

(I actually think we should just add the max-offset check to
rw_copy_check_uvector(). That one already does the MAX_RW_COUNT thing,
and we could make it do a max_offset check as well).

  Linus
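
Editor's sketch of the "limit the request in the caller" idea from this message. It is a user-space analogue and not kernel code: shorten_iov and bytes_readable are hypothetical names, and shorten_iov only imitates what an iovec-shortening helper might do. It assumes the caller knows the device size and the starting position.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: how many bytes may be transferred starting at
 * 'pos' without running past 'size' (the i_size-style limit).
 */
static size_t bytes_readable(long long pos, long long size, size_t want)
{
    unsigned long long left;

    if (pos < 0 || pos >= size)
        return 0;
    left = (unsigned long long)(size - pos);
    return want < left ? want : (size_t)left;
}

/*
 * Hypothetical user-space analogue of shortening an iovec: trim the
 * segments so that their total length does not exceed 'limit'.
 * Returns the number of segments that still carry data.
 */
static size_t shorten_iov(struct iovec *iov, size_t nr_segs, size_t limit)
{
    size_t seg;

    for (seg = 0; seg < nr_segs && limit > 0; seg++) {
        if (iov[seg].iov_len > limit)
            iov[seg].iov_len = limit;
        limit -= iov[seg].iov_len;
    }
    return seg;
}
```

A caller would first compute bytes_readable() for the request and then pass the result as the limit to shorten_iov(), so the lower layers never see a request that extends past the end of the device.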

