On 12.4.2013 2:22, Tejun Heo wrote:
> On Thu, Apr 11, 2013 at 08:06:10PM -0400, Mikulas Patocka wrote:
>> All that I can tell you is that adding an empty atomic operation
>> "cmpxchg(&bio->bi_css->refcnt, bio->bi_css->refcnt, bio->bi_css->refcnt);"
>> to bio_clone_context and bio_disassociate_task
exynos4x12_clkdiv_dmc1 contains { G2DACP, DIVC2C, DIVC2C_ACLK }, thus
set the size to 3 rather than 6.
Signed-off-by: Axel Lin
---
drivers/devfreq/exynos4_bus.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c
These tables are never modified, make them const.
Signed-off-by: Axel Lin
---
drivers/devfreq/exynos4_bus.c | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c
index 3f37f3b..45d00d1 100644
---
We need to call mutex_unlock() in the error path.
Signed-off-by: Axel Lin
---
drivers/devfreq/exynos4_bus.c |3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c
index 1deee09..54b9615 100644
---
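For readers following along, the pattern this kind of fix restores can be sketched in user space. This is only an illustration, assuming a POSIX mutex in place of the kernel mutex; struct bus_state and set_freq() are made-up names and are not taken from exynos4_bus.c.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a driver's per-bus state. */
struct bus_state {
	pthread_mutex_t lock;
	int freq;
};

/*
 * Every exit taken after the lock is acquired goes through the "out"
 * label, so the error path cannot return with the mutex still held.
 */
int set_freq(struct bus_state *bus, int freq)
{
	int err = 0;

	pthread_mutex_lock(&bus->lock);

	if (freq <= 0) {
		err = -EINVAL;	/* error path: still reaches the unlock below */
		goto out;
	}

	bus->freq = freq;
out:
	pthread_mutex_unlock(&bus->lock);
	return err;
}
```

A single unlock at one exit label tends to be easier to audit than an unlock repeated before each early return.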
(2013/04/11 4:39), Yinghai Lu wrote:
> Index: linux-2.6/kernel/kexec.c
> ===
> --- linux-2.6.orig/kernel/kexec.c
> +++ linux-2.6/kernel/kexec.c
> @@ -1368,35 +1368,108 @@ static int __init parse_crashkernel_simp
> return 0;
>
Ping Rik, I also want to know the answer. ;-)
On 04/11/2013 01:58 PM, Will Huck wrote:
Hi Rik,
On 03/22/2013 11:52 AM, Rik van Riel wrote:
On 03/21/2013 08:05 PM, Will Huck wrote:
One offline question, how to understand this in function balance_pgdat:
/*
* Do some background aging of the
* Tejun Heo wrote:
> On Mon, Apr 08, 2013 at 08:31:07AM -0700, Tejun Heo wrote:
> > Andrew, ping?
>
> Ping #2. Workqueue conversion of writeback in the block tree needs
> these patches to avoid losing debug information over the conversion,
> so it'd be great if this can be scheduled for 3.10.
* Robin Holt wrote:
> For the v3.9 release, can we consider my awful patch?
How about trying what I suggested, to make reboot affine to the boot CPU
explicitly, not by shutting down all the other CPUs, but by set_cpus_allowed()
or so?
That should solve the regression, without the ugly
I used to have one of these but gave it away when cleaning out my study... no
space.
Ingo Molnar wrote:
>
>* Borislav Petkov wrote:
>
>> On Thu, Apr 11, 2013 at 12:26:09PM -0700, H. Peter Anvin wrote:
>> > What host is this?
>>
>> Judging by the DMI string in the oops:
>>
>> > [
Hello,
Saw these oopses while fuzzing with trinity.
I have some local modifications to trinity that might explain why Dave
and others have not hit this before.
Tommi
[91911.171328] warning: process `trinity-child7' used the deprecated
sysctl system call with 1029078728.32609.1029078728.32609.
* Borislav Petkov wrote:
> On Thu, Apr 11, 2013 at 12:26:09PM -0700, H. Peter Anvin wrote:
> > What host is this?
>
> Judging by the DMI string in the oops:
>
> > [ 15.921486] Pid: 73, comm: hwclock Tainted: GW3.9.0-rc6+
> > #222032 System manufacturer System Product Name/A8N-E
* Andrea Arcangeli wrote:
> Hi,
>
> On Thu, Apr 11, 2013 at 02:29:18PM +0200, Ingo Molnar wrote:
> >
> >
> > * tip-bot for Andrea Arcangeli wrote:
> >
> > > Commit-ID: f76cfa3c2496c462b5bc01bd0c9340c2715b73ca
> > > Gitweb:
> > >
It is better to set krule->watch = NULL.
It may not be a real issue, but it makes the code clearer,
which helps readers analyse other issues.
Signed-off-by: Chen Gang
---
kernel/audit_watch.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/audit_watch.c
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote:
> On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> > I think it might be more enlightening if Mel traced which process in
> > which function is holding the buffer lock. I suspect we'll find out that
> > the flusher thread
I think "CUI2" should be changed to "CIU2", because CIU means Central Interrupt
Unit.
Signed-off-by: EunBong Song
---
arch/mips/cavium-octeon/octeon-irq.c |2 +-
arch/mips/include/asm/mach-cavium-octeon/irq.h |2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git
Since "normally audit_add_tree_rule() will free it on failure",
it needs to be freed completely when a failure occurs.
An additional put_tree() is needed before returning, since get_tree() was called.
The error handling path must always be taken, so that list_del_init() runs.
Signed-off-by: Chen Gang
---
Hi Eric,
Sorry reply on top.
From the source code in the linux-next.git tree, lines 55~64:
#include
#include ***
#include
#ifdef CONFIG_SECURITY
#include
#endif
#include ***
#include
#include
#include
net/netlink.h is included twice, and linux/netlink.h is
2013/4/12, Wei Yongjun :
> From: Wei Yongjun
>
> Fix to return a negative error code from the error handling
> case instead of 0, as returned elsewhere in this function.
> Introduced by commit c0d39e(f2fs: fix return values from validate
> superblock)
>
> Signed-off-by: Wei Yongjun
Acked-by:
On 04/10/2013 04:51 PM, Peter Zijlstra wrote:
> On Wed, 2013-04-10 at 11:30 +0800, Michael Wang wrote:
>> | 15 GB | 32 | 35918 | | 37632 | +4.77% | 47923 | +33.42% |
>> 52241 | +45.45%
>
> So I don't get this... is wake_affine() once every millisecond _that_
> expensive?
>
> Seeing we
From: Wei Yongjun
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun
---
kernel/events/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c
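The class of bug described here can be shown with a small user-space sketch. It is not the code from kernel/events/core.c; alloc_counters() is a hypothetical helper, and the point is only that each error branch has to set the return value before jumping to the common exit.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocates n zeroed counters and hands them back
 * through *out. Returns 0 on success or a negative errno value.
 */
int alloc_counters(size_t n, int **out)
{
	int err = 0;
	int *buf;

	if (n == 0) {
		err = -EINVAL;
		goto out;
	}

	buf = calloc(n, sizeof(*buf));
	if (!buf) {
		/*
		 * Easy to forget: err is still 0 from the checks above, so
		 * a bare "goto out" here would report success to the caller.
		 */
		err = -ENOMEM;
		goto out;
	}

	*out = buf;
out:
	return err;
}
```

A caller that only tests the return value would otherwise carry on with an uninitialized pointer.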
On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> I think it might be more enlightening if Mel traced which process in
> which function is holding the buffer lock. I suspect we'll find out that
> the flusher thread has submitted the buffer for IO as an async write and
> thus it takes a
On 04/09/2013 07:07 AM, Mel Gorman wrote:
balance_pgdat() is very long and some of the logic can and should
be internal to kswapd_shrink_zone(). Move it so the flow of
balance_pgdat() is marginally easier to follow.
Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
--
All rights reversed
On 04/09/2013 07:07 AM, Mel Gorman wrote:
Historically, kswapd used to congestion_wait() at higher priorities if it
was not making forward progress. This made no sense as the failure to make
progress could be completely independent of IO. It was later replaced by
wait_iff_congested() and removed
On 04/12/2013 01:04 AM, Sebastian Andrzej Siewior wrote:
> * Lai Jiangshan | 2013-04-09 09:09:56 [+0800]:
>
>> If the percpu array can be defined in __SRCU_STRUCT_INIT(),
>> I'm happy to expose it. but it is not currently.
>
> I have no idea how to achieve this.
>
>> Why crypto can't use boot
On 04/09/2013 07:07 AM, Mel Gorman wrote:
Currently kswapd queues dirty pages for writeback if scanning at an elevated
priority but the priority kswapd scans at is not related to the number
of unqueued dirty encountered. Since commit "mm: vmscan: Flatten kswapd
priority loop", the priority is
On 04/09/2013 07:06 AM, Mel Gorman wrote:
In the past, kswapd makes a decision on whether to compact memory after the
pgdat was considered balanced. This more or less worked but it is late to
make such a decision and does not fit well now that kswapd makes a decision
whether to exit the zone
On 04/09/2013 07:06 AM, Mel Gorman wrote:
kswapd stops raising the scanning priority when at least SWAP_CLUSTER_MAX
pages have been reclaimed or the pgdat is considered balanced. It then
rechecks if it needs to restart at DEF_PRIORITY and whether high-order
reclaim needs to be reset. This is not
Prepare to put page table on local nodes.
Move calling of init_mem_mapping to early_initmem_init.
Rework alloc_low_pages to alloc page table in following order:
BRK, local node, low range
update: remove two lines in changelog about xen.
Signed-off-by: Yinghai Lu
Cc: Pekka Enberg
Cc:
Hi Linus,
Please pull the following two patches to receive the fixes for slave-dmaengine.
The first one fixes an issue in pl330 to check for DT compatible, and the second one
fixes omap-dma to start without delay.
The following changes since commit 07961ac7c0ee8b546658717034fe692fd12eefa9:
are available
From: Wei Yongjun
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Introduced by commit c0d39e(f2fs: fix return values from validate superblock)
Signed-off-by: Wei Yongjun
---
fs/f2fs/super.c | 3 ++-
1 file changed, 2
Please send a replacement patch.
Yinghai Lu wrote:
>On Thu, Apr 11, 2013 at 6:05 PM, Konrad Wilk
>wrote:
>>
>> - ying...@kernel.org wrote:
>>
>>> Prepare to put page table on local nodes.
>>>
>>> Move calling of init_mem_mapping to early_initmem_init.
>>>
>>> Rework alloc_low_pages to
On Thu, Apr 11, 2013 at 07:14:31PM +0200, Sedat Dilek wrote:
> On Thu, Apr 11, 2013 at 10:30 AM, Stephen Rothwell
> wrote:
> > Hi all,
> >
> > Changes since 20130410:
> >
> > The tip tree gained conflicts against the net-next and pm trees.
> >
> > The driver-core tree gained a conflict against
On Thu, Apr 11, 2013 at 6:05 PM, Konrad Wilk wrote:
>
> - ying...@kernel.org wrote:
>
>> Prepare to put page table on local nodes.
>>
>> Move calling of init_mem_mapping to early_initmem_init.
>>
>> Rework alloc_low_pages to alloc page table in following order:
>> BRK, local node, low
On Tue, 2013-03-26 at 21:29 +0800, Yuxuan Shui wrote:
> max_state may change at runtime, for example, when loading/unloading
> cpufreq policy.
>
this seems to be a problem that we have not covered yet.
when loading/unloading the cpufreq policy, the cpufreq_frequency_table
will be changed as
Add incremental accessory counters that are going to be used for
debug fs entries.
Acked-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
drivers/staging/zcache/ramster/debug.h | 67 ++
drivers/staging/zcache/ramster/ramster.c | 32 +++---
2 files
Add RAMSTER_DEBUG Kconfig entry.
Acked-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
drivers/staging/zcache/Kconfig |8
drivers/staging/zcache/Makefile|2 +-
drivers/staging/zcache/ramster/debug.h |2 +-
3 files changed, 10 insertions(+), 2 deletions(-)
Note that at this point there is no CONFIG_RAMSTER_DEBUG
option in the Kconfig. So in effect all of the counters
are nop until that option gets re-introduced in:
zcache/ramster/debug: Add RAMSTER_DEBUG Kconfig entry
Acked-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
Use an array to initialize/use debugfs attributes, it makes them
neater as zcache/debug.c does.
Acked-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
drivers/staging/zcache/ramster/debug.c | 68 +++-
1 file changed, 32 insertions(+), 36 deletions(-)
diff --git
commit 9a5c59687ad ("staging: ramster: Provide accessory functions for
counter decrease") forgot to decrease foreign pers pages; this patch fixes
it.
Acked-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
drivers/staging/zcache/ramster/ramster.c |1 +
1 file changed, 1 insertion(+)
diff
Add how-to for ramster.
Acked-by: Dan Magenheimer
Signed-off-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
drivers/staging/zcache/ramster/HOWTO.txt | 257 ++
1 file changed, 257 insertions(+)
create mode 100644 drivers/staging/zcache/ramster/HOWTO.txt
diff
Changelog:
v1 -> v2:
* fix bisect issue
* fix issue in patch staging: ramster: Provide accessory functions for counter
decrease
* drop patch staging: zcache: remove zcache_freeze
* Add Dan Acked-by
Fix bugs in zcache and rip the debug counters out of ramster.c and
stick them in
Fix coding style issue: ERROR: space prohibited before that '++' (ctx:WxO)
and line beyond 8 characters.
Acked-by: Dan Magenheimer
Signed-off-by: Wanpeng Li
---
drivers/staging/zcache/debug.h | 95
1 file changed, 76 insertions(+), 19 deletions(-)
These patches allow the NUMA memory layout (meaning which node each physical
page belongs to, the mapping from physical pages to NUMA nodes) to be changed
at runtime in place (without hotplugging).
Depends on "mm: avoid duplication of setup_nr_node_ids()",
Signed-off-by: Cody P Schafer
---
include/linux/rbtree.h | 8
1 file changed, 8 insertions(+)
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 2879e96..1b239ca 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -85,4 +85,12 @@ static inline void
Use the *_is_empty() helpers to be more clear about what we're actually
checking for.
Signed-off-by: Cody P Schafer
---
mm/memory_hotplug.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index df04c36..deea8c2 100644
---
Create a new function grow_pgdat_and_zone() which handles locking +
growth of a zone & the pgdat which it is associated with.
Signed-off-by: Cody P Schafer
---
include/linux/memory_hotplug.h | 3 +++
mm/memory_hotplug.c| 17 +++--
2 files changed, 14 insertions(+), 6
Export ensure_zone_is_initialized() so that it can be used to initialize
new zones within the dynamic numa code.
Signed-off-by: Cody P Schafer
---
mm/internal.h | 8
mm/memory_hotplug.c | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/internal.h
Add postorder iteration functions for rbtree. These are useful for
safely freeing an entire rbtree without modifying the tree at all.
Signed-off-by: Cody P Schafer
---
include/linux/rbtree.h | 4
lib/rbtree.c | 40
2 files changed, 44
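To illustrate why post-order iteration suits teardown, here is a user-space sketch on a plain binary tree with parent pointers. It does not use the kernel rbtree API; struct node, insert(), first_postorder(), next_postorder() and free_tree() are names invented for this example. Children are always visited before their parent, so a node can be freed as soon as its successor has been looked up, and no rebalancing or unlinking is needed.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	int key;
	struct node *parent, *left, *right;
};

/* Descend to a leaf, preferring left children. */
static struct node *left_deepest(struct node *n)
{
	for (;;) {
		if (n->left)
			n = n->left;
		else if (n->right)
			n = n->right;
		else
			return n;
	}
}

/* First node in post-order, or NULL for an empty tree. */
struct node *first_postorder(struct node *root)
{
	return root ? left_deepest(root) : NULL;
}

/* Reads only n, its parent, and the not-yet-visited right sibling. */
struct node *next_postorder(struct node *n)
{
	struct node *parent = n->parent;

	if (parent && parent->right && n != parent->right)
		return left_deepest(parent->right);
	return parent;	/* NULL once the root has been visited */
}

/* Plain BST insert without rebalancing, just to have a tree to walk. */
struct node *insert(struct node *root, int key)
{
	struct node *n = calloc(1, sizeof(*n));
	struct node **link = &root, *parent = NULL;

	if (!n)
		return root;
	n->key = key;
	while (*link) {
		parent = *link;
		link = key < parent->key ? &parent->left : &parent->right;
	}
	n->parent = parent;
	*link = n;
	return root;
}

/* Frees every node; the successor is fetched before the node is freed. */
size_t free_tree(struct node *root)
{
	size_t freed = 0;
	struct node *n = first_postorder(root);

	while (n) {
		struct node *next = next_postorder(n);

		free(n);
		freed++;
		n = next;
	}
	return freed;
}
```

For keys inserted in the order 2, 1, 3 the walk visits 1, 3, 2, so the root is released last.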
Add return_pages_to_zone(), which uses return_page_to_zone().
It is a minimized version of __free_pages_ok() which handles adding
pages which have been removed from another zone into a new zone.
Signed-off-by: Cody P Schafer
---
mm/internal.h | 5 -
mm/page_alloc.c | 17 +
Add a pageflag called "lookup_node"/ PG_lookup_node / Page*LookupNode().
Used by dynamic numa to indicate when a page has a new node assignment
waiting for it.
FIXME: This also exempts PG_lookup_node from PAGE_FLAGS_CHECK_AT_PREP
due to the asynchronous usage of PG_lookup_node, which needs to be
Add nid_zone(), which returns the zone corresponding to a given nid & zonenum.
Signed-off-by: Cody P Schafer
---
include/linux/mm.h | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9ddae00..1b6abae 100644
---
When dynamic numa is enabled, the last or first page in a pageblock may
have been transplanted to a new zone (or may not yet be transplanted to
a new zone).
Disable a BUG_ON() which checks that the start_page and end_page are in
the same zone, if they are not in the proper zone they will simply
free_hot_cold_page() is used for order == 0 pages, and is where the
page's zone is decided.
In the normal case, these pages are freed to the per-cpu lists. When a
page needs transplanting (ie: the actual node it belongs to has changed,
and it needs to be moved to another zone), the pcp lists are
Add a debugfs interface to dnuma/memlayout. It keeps track of a
variable backlog of memory layouts, provides some statistics on dnuma
moved pages & cache performance, and allows the setting of a new global
memlayout.
TODO: split out statistics, backlog, & write interfaces from each other.
On x86, we have numa_info specifically to track the numa layout, which
is precisely the data memlayout needs, so use it to create an initial
memlayout.
Signed-off-by: Cody P Schafer
---
arch/x86/mm/numa.c | 28
1 file changed, 28 insertions(+)
diff --git
Signed-off-by: Cody P Schafer
---
mm/memlayout.c | 16
1 file changed, 16 insertions(+)
diff --git a/mm/memlayout.c b/mm/memlayout.c
index 45e7df6..4dc6706 100644
--- a/mm/memlayout.c
+++ b/mm/memlayout.c
@@ -247,3 +247,19 @@ void memlayout_global_init(void)
In free_pcppages_bulk(), check if a page needs to be moved to a new
node/zone & then perform the transplant (in a slightly deferred manner).
Signed-off-by: Cody P Schafer
---
mm/page_alloc.c | 36 +++-
1 file changed, 35 insertions(+), 1 deletion(-)
diff --git
__free_pages_ok() handles higher order (order != 0) pages. Transplant
hook is added here as this is where the struct zone to free to is
decided.
Signed-off-by: Cody P Schafer
---
mm/page_alloc.c | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c
When a memlayout is tracked (ie: CONFIG_DYNAMIC_NUMA is enabled), rather
than iterate over numa_meminfo, a lookup can be done using memlayout.
Signed-off-by: Cody P Schafer
---
arch/x86/mm/numa.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/numa.c
Signed-off-by: Cody P Schafer
---
mm/page_alloc.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a54baa9..20304cb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -253,8 +253,11 @@ static int
Signed-off-by: Cody P Schafer
---
mm/memory_hotplug.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f5ea9b7..5fcd29e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1063,6 +1063,8 @@ int __mem_online_node(int nid)
Signed-off-by: Cody P Schafer
---
mm/page_alloc.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 20304cb..686d8f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3488,8 +3488,8 @@ static int default_zonelist_order(void)
memlayout_global_init() initializes the first memlayout, which is
assumed to match the initial page-flag nid settings.
This is done in start_kernel() as the initdata used to populate the
memlayout is purged from memory early in the boot process (XXX: When?).
Signed-off-by: Cody P Schafer
---
For dynamic numa, sometimes the hypervisor we're running under will want
to split a single NUMA node into multiple NUMA nodes. If the number of
numa nodes is limited to the number available when the system booted (as
it is on x86), we may not be able to fully adopt the new memory layout
provided
On some systems, the hypervisor can (and will) relocate physical
addresses as seen in a VM between real NUMA nodes. For example, IBM
Power systems which are using particular revisions of PHYP (IBM's
proprietary hypervisor)
This change set introduces the infrastructure for tracking & dynamically
With some code that expands the zone boundaries, VM_BUG_ON(bad_range()) was
being triggered.
Previously, page_outside_zone_boundaries() decided that once it detected
a page outside the boundaries, it was certainly outside even if the
seqlock indicated the data was invalid & needed to be reread.
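The retry rule described above can be sketched in user space with C11 atomics. This is an assumption-laden illustration, not the kernel's seqlock API or the patched function: struct zone_span, zone_span_init(), zone_span_set() and pfn_outside_span() are hypothetical names, a single writer is assumed, and default (sequentially consistent) atomics are used for simplicity. The point is that the result is recomputed on every pass, so an "outside" verdict from an invalidated read is never kept.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified zone span guarded by a sequence counter. */
struct zone_span {
	atomic_uint seq;		/* odd while a writer is mid-update */
	atomic_ulong start_pfn;
	atomic_ulong spanned_pages;
};

void zone_span_init(struct zone_span *z, unsigned long start,
		    unsigned long pages)
{
	atomic_init(&z->seq, 0);
	atomic_init(&z->start_pfn, start);
	atomic_init(&z->spanned_pages, pages);
}

/* Writer: bump to odd, update, bump back to even. Single writer assumed. */
void zone_span_set(struct zone_span *z, unsigned long start,
		   unsigned long pages)
{
	atomic_fetch_add(&z->seq, 1);
	atomic_store(&z->start_pfn, start);
	atomic_store(&z->spanned_pages, pages);
	atomic_fetch_add(&z->seq, 1);
}

/* Reader: retry until a whole pass saw one stable, even sequence value. */
bool pfn_outside_span(struct zone_span *z, unsigned long pfn)
{
	unsigned int seq;
	bool outside;

	do {
		unsigned long start, pages;

		seq = atomic_load(&z->seq);
		start = atomic_load(&z->start_pfn);
		pages = atomic_load(&z->spanned_pages);
		/* Recomputed each pass; a stale "outside" is discarded. */
		outside = pfn < start || pfn >= start + pages;
	} while ((seq & 1) || seq != atomic_load(&z->seq));

	return outside;
}
```

With a span of [100, 200), pfn 250 is reported as outside; after the writer grows the span to 200 pages, the same pfn is reported as inside.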
In dynamic numa, when onlining nodes, lock_memory_hotplug() is already
held when mem_online_node()'s functionality is needed.
Factor out the locking and create a new function __mem_online_node() to
allow reuse.
Signed-off-by: Cody P Schafer
---
include/linux/memory_hotplug.h | 1 +
With dynamic numa, pages are going to be gradually moved from one node to
another, causing the page ranges that move_freepages() examines to
contain pages that actually belong to another node.
When dynamic numa is enabled, we skip these pages instead of VM_BUGing
out on them.
This additionally
On Thu, 2013-04-04 at 16:24 -0400, Eduardo Valentin wrote:
> On 29-03-2013 10:26, Zhang Rui wrote:
> > this is the preparation work to build all the thermal core framework
> > source file, like governors, cpu cooling, etc, into one module.
> >
> > No functional change in this patch.
> >
> >
---
mm/vmstat.c | 4
1 file changed, 4 insertions(+)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index e1d8ed1..2b93877 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -495,6 +495,10 @@ void refresh_cpu_vm_stats(int cpu)
atomic_long_add(global_diff[i], &vm_stat[i]);
}
+/*
+
On 04/11/2013 11:10 PM, Yinghai Lu wrote:
On Thu, Apr 11, 2013 at 12:41 AM, Tang Chen wrote:
3. If we add flag to memblock, we can mark different memory. And I remember
you mentioned before that we can use memblock to reserve local node data
for node-life-cycle data, like vmemmap,
- ying...@kernel.org wrote:
> Prepare to put page table on local nodes.
>
> Move calling of init_mem_mapping to early_initmem_init.
>
> Rework alloc_low_pages to alloc page table in following order:
> BRK, local node, low range
>
> Still only load_cr3 one time, otherwise we would
On 2013/4/12 1:29, Bjorn Helgaas wrote:
> On Wed, Apr 10, 2013 at 7:50 PM, Yijing Wang wrote:
Hi Bjorn,
Thanks for review.
> My goal is that a user should never have to specify a kernel boot
> parameter or edit a modules.conf file, but the user did previously
>
On Wed, Apr 10, 2013 at 4:44 PM, Dave Jiang wrote:
> These are the updated patches from first submission series and rebased against
> vinod's slave-dma git tree for-linus branch.
>
> Patches 1 & 4 have been updated after discussion with Dan.
Patches 1-4 acked.
> Patch 5 was acked by
> Dan but
For the separation, we need to set memblock nid later, as it
could change memblock array, and possibly double the memblock.memory
array, which will need to allocate a buffer.
We do not need to use nid in memblock to find out absent pages.
So we can move that numa_meminfo_cover_memory() early.
Also could
Move node_map_pfn_alignment() to arch/x86/mm as no other user for it.
Will update it to use numa_meminfo instead of memblock.
Signed-off-by: Yinghai Lu
---
arch/x86/mm/numa.c | 50 ++
include/linux/mm.h | 1 -
mm/page_alloc.c| 50
One commit that tried to parse SRAT early got reverted before v3.9-rc1.
| commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
| Author: Tang Chen
| Date: Fri Feb 22 16:33:44 2013 -0800
|
|acpi, memory-hotplug: parse SRAT before memblock is ready
It broke several things, like acpi override and
get_ramdisk_image() needs to be used by the early microcode updating code in another file,
so make it global.
Also make it take a boot_params pointer, as head_32.S needs to access it via
phys address during 32bit flat mode.
Signed-off-by: Yinghai Lu
Acked-by: Tejun Heo
Tested-by: Thomas Renninger
---
Use common get_ramdisk_image() to get ramdisk start phys address.
We need this to get correct ramdisk address for 64bit bzImage that
initrd can be loaded above 4G by kexec-tools.
-v2: fix one typo that is found by Tang Chen
Signed-off-by: Yinghai Lu
Cc: Fenghua Yu
Acked-by: Tejun Heo
We need to have numa info ready before init_mem_mapping, so we
can call init_mem_mapping per nodes also can trim node mem range to
big alignment.
Current numa parsing need to allocate some buffer and need to be
called after init_mem_mapping.
So try to split parsing numa info to two stages, and
If node with ram is hotpluggable, local node mem for page table and vmemmap
should be on that node ram.
This patch is some kind of refreshment of
| commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327
| Date: Mon Dec 27 16:48:17 2010 -0800
|
|x86-64, numa: Put pgtable to local node memory
That
Now we have arch_pfn_mapped array, and max_low_pfn_mapped should not
be used anymore.
User should use arch_pfn_mapped or just 1UL<<(32-PAGE_SHIFT) instead.
Only user is ACPI_INITRD_TABLE_OVERRIDE, and it should not use that,
as later accessing is using early_ioremap(). We could change to use
early_initmem_init() call early_x86_numa_init() to parse numa info early.
Later will call init_mem_mapping for nodes in it.
Signed-off-by: Yinghai Lu
Cc: Pekka Enberg
Cc: Jacob Shin
---
arch/x86/include/asm/page_types.h | 1 +
arch/x86/kernel/setup.c | 1 +
arch/x86/mm/init.c
Prepare to put page table on local nodes.
Move calling of init_mem_mapping to early_initmem_init.
Rework alloc_low_pages to alloc page table in following order:
BRK, local node, low range
Still only load_cr3 one time, otherwise we would break xen 64bit again.
Signed-off-by: Yinghai Lu
We need to handle slit later, as it need to allocate buffer for distance
matrix. Also we do not need SLIT info before init_mem_mapping.
So move SLIT parsing later.
x86_acpi_numa_init become x86_acpi_numa_init_srat/x86_acpi_numa_init_slit.
It should not break ia64 by replacing acpi_numa_init
We could use numa_meminfo directly instead of memblock nid.
So we could move down set memblock nid and only do it one time
for successful path.
-v2: according to tj, separate moving to another patch.
Signed-off-by: Yinghai Lu
---
arch/x86/mm/numa.c | 30 +++---
1 file
As request by hpa, add comments for why we choose 5 for
step size shift.
Signed-off-by: Yinghai Lu
---
arch/x86/mm/init.c | 21 ++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 28b294f..2754e45 100644
---
For the separation, we need to set memblock nid later, as it
could change memblock array, and possibly double the memblock.memory
array, which will need to allocate a buffer.
Only set memblock nid one time for successful path.
Also rename numa_register_memblks to numa_check_memblks()
after move out code
head64.c could use #PF handler set page table to access initrd before
init mem mapping and initrd relocating.
head_32.S could use 32bit flat mode to access initrd before init mem
mapping initrd relocating.
That make 32bit and 64 bit more consistent.
-v2: use inline function in header file
Parsing numa info has been separated to two functions now.
early_initmem_info() only parse info in numa_meminfo and
nodes_parsed. still keep numaq, acpi_numa, amd_numa, dummy
fall back sequence working.
SLIT and numa emulation handling are still left in initmem_init().
Call early_initmem_init
For finding with 32bit, it would be easy to access initrd in 32bit
flat mode, as we don't need to set page table.
That is from head_32.S, and microcode updating already use this trick.
Need to change acpi_initrd_override_find to use phys to access global
variables.
Pass is_phys in the function,
Current acpi tables in initrd is limited to 10, that is too small.
64 should be good enough as we have 35 sigs and could have several
SSDT.
Two problems in current code prevent us from increasing limit:
1. that cpio file info array is put in stack, as every element is 32
bytes, could run out
In 32bit we will find table with phys address during 32bit flat mode
in head_32.S, because at that time we don't need set page table to
access initrd.
For copying we could use early_ioremap() with phys directly before mem mapping
is set.
To keep 32bit and 64bit consistent, use phys_addr for all.
To parse srat early, we need to move acpi table probing early.
acpi_initrd_table_override is before acpi table probing. So we need to
move it early too.
Current code acpi_initrd_table_override is after init_mem_mapping and
relocate_initrd(), so it can scan initrd and copy acpi tables with kernel
It needs to allocate buffer for new numa_meminfo and distance matrix,
so move it down.
Also we change the behavior:
before this patch, if user input wrong data in command line, it
will fall back to next numa probing or disabling numa.
after this patch, if user input wrong data in command line, it
Move node_possible_map handling out of numa_check_memblks to avoid side
changing in numa_check_memblks().
Only set once for successful path instead of resetting in numa_init()
every time.
Suggested-by: Tejun Heo
Signed-off-by: Yinghai Lu
---
arch/x86/mm/numa.c | 11 +++
1 file
Now we only search buffer for override acpi table under 4G.
In some case, like user use memmap to exclude all low ram,
we may not find range for it under 4G.
Do second try to search above 4G.
Signed-off-by: Yinghai Lu
Cc: "Rafael J. Wysocki"
Cc: linux-a...@vger.kernel.org
Tested-by: Thomas
On Thu, Apr 11, 2013 at 08:06:10PM -0400, Mikulas Patocka wrote:
> All that I can tell you is that adding an empty atomic operation
> "cmpxchg(&bio->bi_css->refcnt, bio->bi_css->refcnt, bio->bi_css->refcnt);"
> to bio_clone_context and bio_disassociate_task increases the time to run a
> benchmark
Am 11.04.2013 20:29, schrieb Felipe Balbi:
> and who said OMAP USB depends on CONFIG_USB_PHY ? Some platforms need to
> control a PHY and some don't.
I've read that so.
> Go check out kernel 2.6.39 (maybe even 3.1 and 3.2) and you'll see that
> we're much better off today where we can actually
On Thu, Apr 11, 2013 at 10:48:27PM +0200, Gregory CLEMENT wrote:
> Hi Jason,
>
> On 04/11/2013 08:12 PM, Jason Cooper wrote:
> > On Tue, Apr 09, 2013 at 12:52:13AM +0200, Gregory CLEMENT wrote:
> >> From: Lior Amsalem
> >>
> >> In order to be able to use more than 4GB address-cells and