Re: [PATCH 1/2] perf_event: remove unused DEBUG_PERF_USE_VMALLOC
On Tue, Aug 30, 2016 at 02:08:34PM -0500, Kim Phillips wrote:
> This 'DEBUG'-prefixed version of PERF_USE_VMALLOC is not used anywhere.
> It appears to be leftovers from commit 906010b "perf_event: Provide
> vmalloc() based mmap() backing" that introduced it.
>
> Not sure what commit cb30711 "perf_event: Don't allow vmalloc() backed
> perf on powerpc" was trying to do with it either.
>
> Signed-off-by: Kim Phillips
> Cc: Peter Zijlstra
> Cc: Michael Ellerman
> ---
>  init/Kconfig | 13 -
>  1 file changed, 13 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index cac3f09..934a61f 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1707,19 +1707,6 @@ config PERF_EVENTS
>
>  	  Say Y if unsure.
>
> -config DEBUG_PERF_USE_VMALLOC
> -	default n
> -	bool "Debug: use vmalloc to back perf mmap() buffers"
> -	depends on PERF_EVENTS && DEBUG_KERNEL && !PPC
> -	select PERF_USE_VMALLOC

^ It forces the use of vmalloc backed pages for the ring-buffer so that
we can test that code on x86, which otherwise doesn't use it.
Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes
On Wed, Aug 31, 2016 at 11:39:59AM +0530, Srikar Dronamraju wrote:
> This indeed fixes the problem.
> Please add my
> Tested-by: Srikar Dronamraju
>

Ok, thanks. Unfortunately we cannot do a wide conversion like this
because some users of populated_zone() really meant to check for
present_pages. In all cases, the expectation was that reserved pages
would be tiny but fadump messes that up. Can you verify this also works
please?

---8<---
mm, vmscan: Only allocate and reclaim from zones with pages managed by the buddy allocator

Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts
of memory when booting a secondary kernel. Srikar Dronamraju reported that
multiple nodes may have no memory managed by the buddy allocator but still
return true for populated_zone().

Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
was reported to cause kswapd to spin at 100% CPU usage when fadump was
enabled. The old code happened to deal with the situation of a populated
node with zero free pages by coincidence but the current code tries to
reclaim populated zones without realising that is impossible.

We cannot just convert populated_zone() as many existing users really
need to check for present_pages. This patch introduces a managed_zone()
helper and uses it in the few cases where it is critical that the check
is made for managed pages -- zonelist construction and page reclaim.
Signed-off-by: Mel Gorman
---
 include/linux/mmzone.h | 11 +--
 mm/page_alloc.c        |  4 ++--
 mm/vmscan.c            | 22 +++---
 3 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d572b78b65e1..69f886b79656 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -828,9 +828,16 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
  */
 #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)

-static inline int populated_zone(struct zone *zone)
+/* Returns true if a zone has pages managed by the buddy allocator */
+static inline bool managed_zone(struct zone *zone)
 {
-	return (!!zone->present_pages);
+	return zone->managed_pages;
+}
+
+/* Returns true if a zone has memory */
+static inline bool populated_zone(struct zone *zone)
+{
+	return zone->present_pages;
 }

 extern int movable_zone;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1c09d9f7f692..ea7558149ee5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4405,7 +4405,7 @@ static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
-		if (populated_zone(zone)) {
+		if (managed_zone(zone)) {
 			zoneref_set_zone(zone,
 				&zonelist->_zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
@@ -4643,7 +4643,7 @@ static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes)
 		for (j = 0; j < nr_nodes; j++) {
 			node = node_order[j];
 			z = &NODE_DATA(node)->node_zones[zone_type];
-			if (populated_zone(z)) {
+			if (managed_zone(z)) {
 				zoneref_set_zone(z,
 					&zonelist->_zonerefs[pos++]);
 				check_highest_zone(zone_type);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 98774f45b04a..55943a284082 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1665,7 +1665,7 @@ static bool inactive_reclaimable_pages(struct lruvec *lruvec,

 	for (zid = sc->reclaim_idx; zid >= 0; zid--) {
 		zone = &pgdat->node_zones[zid];
-		if (!populated_zone(zone))
+		if (!managed_zone(zone))
 			continue;

 		if (zone_page_state_snapshot(zone, NR_ZONE_LRU_BASE +
@@ -2036,7 +2036,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 		struct zone *zone = &pgdat->node_zones[zid];
 		unsigned long inactive_zone, active_zone;

-		if (!populated_zone(zone))
+		if (!managed_zone(zone))
 			continue;

 		inactive_zone = zone_page_state(zone,
@@ -2171,7 +2171,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,

 	for (z = 0; z < MAX_NR_ZONES; z++) {
 		struct zone *zone = &pgdat->node_zones[z];
-		if (!populated_zone(zone))
+		if (!managed_zone(zone))
 			continue;

 		total_high_wmark += high_wmark_pages(zone);
@@ -2508,7 +2508,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	/* If compaction would go ahead or the allocation woul
Re: Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
On Tue, Aug 30, 2016 at 09:21:01PM -0700, Nicolin Chen wrote:
>
> No, the problem is not at the rate but the source -- Although the
> MLB clock exists in the clock tree as a better rate provider, it
> might not be correctly enabled or running at the rate it claims.
>
>
> There are five MLB clocks sharing the same clock gate according
> to CCM chapter in the Reference Manual of imx6q. But five clocks
> come from three different parent clocks, and I am wondering if
> the MLB clock that's connected to the S/PDIF module is really
> derived from this AXI.
>
> Hope Fabio might be able to help on the clock tree issue here:)
>

I hope so too, it's a little over my head, to put it euphemistically.

>
> Another solution for you could be to change the rates of two of
> those existing clocks to the perfect rates for 44.1KHz and 48KHz
> respectively, 22579200Hz and 24576000Hz for example. (If you
> only need one sample rate support, changing rxtx1 SPDIF clock
> only then.)

Thank you very much. I'm not sure what practical problem that would
solve for me, audio sounds quite right to my ears with the workaround
(disabling MLB). I've looked at page 121 of
http://cache.freescale.com/files/32bit/doc/data_sheet/IMX6DQIEC.pdf
and it seems like the margin for the SPDIF clock would be 16 ns
and I'm like 10 times out of spec. But I can't hear the problem. I
may try it one day to hear how it sounds.

I'll try to remember it if I ever come across some problem with my audio.
For now what I'd like is to stay as close to linux-libre mainline
as possible, so the quick workaround is enough for me.

Now for the general case, I'm not sure what the solution should be.
Page 4 of the pdf above says MLB is not present in industrial "parts",
only automotive, or consumer "parts". There are several versions of
IMX6Q in the market. What version must I have? I guess consumer
(with MLB) but I'm not sure... According to the wandboard-quad-rev-b1
manual it's consumer, MCIMX6Q5EYM10AC, so I should have MLB, I guess.

$ cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10
[...]

I can't tell what CPU part : 0xc09 means.

In the reference manual pg 796 I see the same gate seems to affect Media
Local Bus (MLB) clock and Digital Transmission Content Protection
(DTCP). I don't use DTCP but I haven't done anything to disable it.

http://www.nxp.com/files/32bit/doc/ref_manual/IMX6DQRM.pdf?fasp=1&WT_TYPE=Reference%20Manuals&WT_VENDOR=FREESCALE&WT_FILE_FORMAT=pdf&WT_ASSET=Documentation&fileExt=.pdf

Thanks again, you've been very helpful.
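The two "perfect" rates quoted in this message are each exactly 512 times their sample rate (22579200 = 512 x 44100 and 24576000 = 512 x 48000), and neither is an integer multiple of the other sample rate, which is consistent with the suggestion to dedicate one clock to each rate. A minimal sketch of that arithmetic follows; the helper name spdif_exact_multiple() is made up for illustration and is not part of any kernel driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): if clk_hz is an exact integer
 * multiple of sample_rate_hz, return that multiple; otherwise return 0.
 */
static uint32_t spdif_exact_multiple(uint32_t clk_hz, uint32_t sample_rate_hz)
{
	assert(sample_rate_hz != 0);

	/* A non-zero remainder means the rate cannot be hit exactly. */
	if (clk_hz % sample_rate_hz != 0)
		return 0;

	return clk_hz / sample_rate_hz;
}
```

Calling spdif_exact_multiple(22579200, 44100) gives 512, while spdif_exact_multiple(24576000, 44100) gives 0, so a single clock at one of these rates covers only one of the two sample rates exactly.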
Re: [PATCH v3 0/3] Account reserved memory when allocating system hash
On Mon 29-08-16 18:36:47, Srikar Dronamraju wrote:
> Fadump kernel reserves large chunks of memory even before the pages are
> initialised. This could mean memory that corresponds to several nodes might
> fall in memblock reserved regions.
>
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. However such a kernel when booting a
> secondary kernel will not be able to allocate the required amount of
> memory to suffice for the dentry and inode caches. This results in
> crashes like the below on large systems such as 32 TB systems.
>
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c108fb10] [c07fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c108fb50] [c0235264] warn_alloc_failed+0x114/0x160
> [c108fbf0] [c0281484] __vmalloc_node_range+0x304/0x340
> [c108fca0] [c028152c] __vmalloc+0x6c/0x90
> [c108fd40] [c0aecfb0] alloc_large_system_hash+0x1b8/0x2c0
> [c108fe00] [c0af7240] inode_init+0x94/0xe4
> [c108fe80] [c0af6fec] vfs_caches_init+0x8c/0x13c
> [c108ff00] [c0ac4014] start_kernel+0x50c/0x578
> [c108ff90] [c0008c6c] start_here_common+0x20/0xa8
>
> This patchset solves this problem by accounting the size of reserved memory
> when calculating the size of large system hashes.

So I think that this is just a fallout from how fadump is hackish and
tricky. Reserving a large portion/majority of memory from the kernel just
sounds like a minefield. This patchset is dealing with one particular
problem. Fair enough, it seems like the easiest way to go and something
that would be stable backport safe as well so
Acked-by: Michal Hocko
to the whole series but I cannot say I would be happy about the whole
fadump thing...

> While this patchset applies on v4.8-rc3, it cannot be tested on v4.8-rc3
> because of http://lkml.kernel.org/r/20160829093844.ga2...@linux.vnet.ibm.com
> However it has been tested on v4.7/v4.6 and v4.4

Another supporting argument for the above. 15 out of 16 nodes without
any memory... Sigh

> v2: http://lkml.kernel.org/r/1470330729-6273-1-git-send-email-sri...@linux.vnet.ibm.com
>
> Cc: linux...@kvack.org
> Cc: Mel Gorman
> Cc: Vlastimil Babka
> Cc: Michal Hocko
> Cc: Andrew Morton
> Cc: Michael Ellerman
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Mahesh Salgaonkar
> Cc: Hari Bathini
> Cc: Dave Hansen
> Cc: Balbir Singh
> Cc: Srikar Dronamraju
>
> Srikar Dronamraju (3):
>   mm: Introduce arch_reserved_kernel_pages()
>   mm/memblock: Expose total reserved memory
>   powerpc: Implement arch_reserved_kernel_pages
>
>  arch/powerpc/include/asm/mmzone.h | 3 +++
>  arch/powerpc/kernel/fadump.c      | 5 +
>  include/linux/memblock.h          | 1 +
>  include/linux/mm.h                | 3 +++
>  mm/memblock.c                     | 5 +
>  mm/page_alloc.c                   | 12
>  6 files changed, 29 insertions(+)
>
> --
> 1.8.5.6

--
Michal Hocko
SUSE Labs
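The idea of the series, as described in the quoted cover letter, is to subtract reserved memory before sizing the large system hashes. The sketch below illustrates only that idea and is not the kernel's alloc_large_system_hash() logic: hash_entries_sketch(), rounddown_pow2() and the scale_shift parameter are all hypothetical names chosen for this example.

```c
#include <assert.h>

/* Round v down to a power of two; returns 0 for v == 0. */
static unsigned long rounddown_pow2(unsigned long v)
{
	unsigned long p = 1;

	if (v == 0)
		return 0;
	while (p <= v / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical sizing helper: derive a hash entry count from the pages
 * the kernel can really use, i.e. total pages minus reserved pages,
 * scaled down by 2^scale_shift and rounded down to a power of two.
 */
static unsigned long hash_entries_sketch(unsigned long nr_kernel_pages,
					 unsigned long reserved_pages,
					 unsigned int scale_shift)
{
	unsigned long usable;

	assert(scale_shift < sizeof(unsigned long) * 8);

	/* Never let a large reservation underflow the page count. */
	usable = nr_kernel_pages > reserved_pages ?
		 nr_kernel_pages - reserved_pages : 0;

	return rounddown_pow2(usable >> scale_shift);
}
```

With 1048576 pages and nothing reserved, a shift of 4 yields 65536 entries; reserving 15/16 of those pages (983040) drops the result to 4096, which is the kind of reduction needed when most nodes have no usable memory.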
Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes
On Wed 31-08-16 09:49:42, Mel Gorman wrote:
> On Wed, Aug 31, 2016 at 11:39:59AM +0530, Srikar Dronamraju wrote:
> > This indeed fixes the problem.
> > Please add my
> > Tested-by: Srikar Dronamraju
> >
>
> Ok, thanks. Unfortunately we cannot do a wide conversion like this
> because some users of populated_zone() really meant to check for
> present_pages. In all cases, the expectation was that reserved pages
> would be tiny but fadump messes that up. Can you verify this also works
> please?
>
> ---8<---
> mm, vmscan: Only allocate and reclaim from zones with pages managed by the
> buddy allocator
>
> Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts
> of memory when booting a secondary kernel. Srikar Dronamraju reported that
> multiple nodes may have no memory managed by the buddy allocator but still
> return true for populated_zone().
>
> Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
> was reported to cause kswapd to spin at 100% CPU usage when fadump was
> enabled. The old code happened to deal with the situation of a populated
> node with zero free pages by coincidence but the current code tries to
> reclaim populated zones without realising that is impossible.
>
> We cannot just convert populated_zone() as many existing users really
> need to check for present_pages. This patch introduces a managed_zone()
> helper and uses it in the few cases where it is critical that the check
> is made for managed pages -- zonelist construction and page reclaim.

OK, the patch makes sense to me. I am not happy about two very similar
functions, to be honest though. managed vs. present checks will be quite
subtle and it is not entirely clear when to use which one. I agree that
the reclaim path is the most critical one so the patch seems OK to me.
At least from a quick glance it should help with the reported issue so
feel free to add
Acked-by: Michal Hocko

I expect we might want to convert other places as well but they are far
from critical. I would appreciate some lead there and would stick in a
clarifying comment

[...]
> -static inline int populated_zone(struct zone *zone)
> +/* Returns true if a zone has pages managed by the buddy allocator */

/*
 * Returns true if a zone has pages managed by the buddy allocator.
 * All the reclaim decisions have to use this function rather than
 * populated_zone(). If the whole zone is reserved then we can easily
 * end up with populated_zone() && !managed_zone().
 */

What do you think?

> +static inline bool managed_zone(struct zone *zone)
>  {
> -	return (!!zone->present_pages);
> +	return zone->managed_pages;
> +}
> +
> +/* Returns true if a zone has memory */
> +static inline bool populated_zone(struct zone *zone)
> +{
> +	return zone->present_pages;
>  }

--
Michal Hocko
SUSE Labs
Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format
On Mon, Jun 20, 2016 at 09:02:30PM -0700, Sukadev Bhattiprolu wrote:
> CPUs support a large number of performance monitoring events (PMU events)
> and often these events are very specific to an architecture/model of the
> CPU. To use most of these PMU events with perf, we currently have to identify
> them by their raw codes:
>
>	perf stat -e r100f2 sleep 1
>
> This patchset allows architectures to specify these PMU events in JSON
> files located in 'tools/perf/pmu-events/arch/' of the mainline tree.
> The events from the JSON files for the architecture are then built into
> the perf binary.
>
> At run time, perf identifies the specific set of events for the CPU and
> creates "event aliases". These aliases allow users to specify events by
> "name" as:
>
>	perf stat -e pm_1plus_ppc_cmpl sleep 1
>
> The file, 'tools/perf/pmu-events/README' in [PATCH 16/16] gives more
> details.
>
> Note:
>	- All known events tables for the architecture are included in the
>	  perf binary.
>
>	- For architectures that don't have any JSON files, an empty mapping
>	  table is created and they should continue to build.
>
> Thanks to input from Andi Kleen, Jiri Olsa, Namhyung Kim and Ingo Molnar.
>
> These patches are available from:
>
>	https://github.com/sukadev/linux.git
>
>	Branch			Description
>	--
>	json-code-v20		Source Code only
>	json-data-v20		x86 and Powerpc datafiles only
>	json-code+data-v20	Both code and data (for build/test)
>
> NOTE: Only "source code" patches (i.e those in json-code-v20) are being
>	emailed. Please pull the "data files" from the json-data-v20 branch.
>
> Changelog[v20]
>	- Rebase to recent perf/core
>	- Add Patch 20/20 to allow perf-stat to work with the period= field

hi,
I had a discussion with Ingo about the state of this patchset
and there's one more requirement from his side - to split
event files into per topic files

I made some initial changes over latest Sukadev's branch
and came up with something like this:

  $ find pmu-events/arch/x86/
  pmu-events/arch/x86/
  pmu-events/arch/x86/NehalemEX_core
  pmu-events/arch/x86/NehalemEX_core/Memory.json
  pmu-events/arch/x86/NehalemEX_core/Virtual-Memory.json
  pmu-events/arch/x86/NehalemEX_core/Cache.json
  pmu-events/arch/x86/NehalemEX_core/Pipeline.json
  pmu-events/arch/x86/NehalemEX_core/Floating-point.json
  pmu-events/arch/x86/NehalemEX_core/Other.json
  pmu-events/arch/x86/mapfile.csv
  pmu-events/arch/x86/Broadwell_core
  pmu-events/arch/x86/Broadwell_core/Memory.json
  pmu-events/arch/x86/Broadwell_core/Virtual-Memory.json
  pmu-events/arch/x86/Broadwell_core/Cache.json
  pmu-events/arch/x86/Broadwell_core/Pipeline.json
  pmu-events/arch/x86/Broadwell_core/Floating-point.json
  pmu-events/arch/x86/Broadwell_core/Other.json
  pmu-events/arch/x86/Broadwell_core/Frontend.json

so let's have a discussion if this is acceptable for you guys

I've already made some changes in pmu-events/* to support
this hierarchy to see how bad the change would be.. and
it's not that bad ;-)

you can check the following patches (only 2 Intel files transformed):

  1d5ffa8bb969 perf, tools: Change jevents
  65919f8901e3 perf, tools: Split Broadwell_core.json
  7cd309a85465 perf, tools: Add Broadwell V14 event file
  e316aff2dd4e perf, tools: Split NehalemEX_core.json
  e19e8de49408 perf, tools: Add NehalemEX V1 event file

It's available in:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git perf/json

thanks,
jirka
Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes
On Wed, Aug 31, 2016 at 01:09:33PM +0200, Michal Hocko wrote:
> > We cannot just convert populated_zone() as many existing users really
> > need to check for present_pages. This patch introduces a managed_zone()
> > helper and uses it in the few cases where it is critical that the check
> > is made for managed pages -- zonelist construction and page reclaim.
>
> OK, the patch makes sense to me. I am not happy about two very similar
> functions, to be honest though. managed vs. present checks will be quite
> subtle and it is not entirely clear when to use which one.

In the vast majority of cases, the distinction is irrelevant. The patch
only updates the places where it really matters to minimise any
confusion.

> Acked-by: Michal Hocko

Thanks.

> /*
>  * Returns true if a zone has pages managed by the buddy allocator.
>  * All the reclaim decisions have to use this function rather than
>  * populated_zone(). If the whole zone is reserved then we can easily
>  * end up with populated_zone() && !managed_zone().
>  */
>
> What do you think?
>

This makes a lot of sense. I've updated the patch and will await a test
from Srikar before reposting.

--
Mel Gorman
SUSE Labs
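The case discussed in this thread, populated_zone() && !managed_zone(), can be reproduced with a cut-down model. The sketch below is a userspace illustration only: struct zone here is a hypothetical two-field stand-in for the kernel structure, and count_reclaimable_zones() is an invented helper that mirrors the "skip unmanaged zones" loops in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct zone (two fields only). */
struct zone {
	unsigned long present_pages;	/* pages physically present */
	unsigned long managed_pages;	/* pages given to the buddy allocator */
};

/* Returns true if a zone has pages managed by the buddy allocator. */
static inline bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/* Returns true if a zone has memory, reserved or not. */
static inline bool populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/* Invented helper: count the zones that reclaim could make progress on. */
static int count_reclaimable_zones(const struct zone *zones, int nr)
{
	int i, n = 0;

	assert(zones != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		/* A fully reserved zone is populated but has nothing to reclaim. */
		if (!managed_zone(&zones[i]))
			continue;
		n++;
	}
	return n;
}
```

A zone with present_pages = 1000 and managed_pages = 0 models a node whose memory is entirely reserved by fadump: it passes populated_zone() but is skipped by the loop above.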
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
Hi Xavi/Nicolin,

On Wed, Aug 31, 2016 at 6:10 AM, Xavi Drudis Ferran wrote:
> On Tue, Aug 30, 2016 at 09:21:01PM -0700, Nicolin Chen wrote:
>>
>> No, the problem is not at the rate but the source -- Although the
>> MLB clock exists in the clock tree as a better rate provider, it
>> might not be correctly enabled or running at the rate it claims.
>>
>
>>
>> There are five MLB clocks sharing the same clock gate according
>> to CCM chapter in the Reference Manual of imx6q. But five clocks
>> come from three different parent clocks, and I am wondering if
>> the MLB clock that's connected to the S/PDIF module is really
>> derived from this AXI.
>>
>> Hope Fabio might be able to help on the clock tree issue here:)
>>
>
> I hope so too, it's a little over my head, to put it euphemistically.
>
>>
>> Another solution for you could be to change the rates of two of
>> those existing clocks to the perfect rates for 44.1KHz and 48KHz
>> respectively, 22579200Hz and 24576000Hz for example. (If you
>> only need one sample rate support, changing rxtx1 SPDIF clock
>> only then.)
>
> Thank you very much. I'm not sure what practical problem that would
> solve for me, audio sounds quite right to my ears with the workaround
> (disabling MLB). I've looked at page 121 of
> http://cache.freescale.com/files/32bit/doc/data_sheet/IMX6DQIEC.pdf
> and it seems like the margin for the SPDIF clock would be 16 ns
> and I'm like 10 times out of spec. But I can't hear the problem. I
> may try it one day to hear how it sounds.
>
> I'll try to remember it if I ever come across some problem with my audio.
> For now what I'd like is to stay as close to linux-libre mainline
> as possible, so the quick workaround is enough for me.
>
> Now for the general case, I'm not sure what the solution should be.
> Page 4 of the pdf above says MLB is not present in industrial "parts",
> only automotive, or consumer "parts". There are several versions of
> IMX6Q in the market. What version must I have? I guess consumer
> (with MLB) but I'm not sure... According to the wandboard-quad-rev-b1
> manual it's consumer, MCIMX6Q5EYM10AC, so I should have MLB, I guess.
>
> $ cat /proc/cpuinfo
> processor	: 0
> model name	: ARMv7 Processor rev 10 (v7l)
> BogoMIPS	: 7.54
> Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
> CPU implementer	: 0x41
> CPU architecture: 7
> CPU variant	: 0x2
> CPU part	: 0xc09
> CPU revision	: 10
> [...]
>
> I can't tell what CPU part : 0xc09 means.
>
> In the reference manual pg 796 I see the same gate seems to affect Media
> Local Bus (MLB) clock and Digital Transmission Content Protection
> (DTCP). I don't use DTCP but I haven't done anything to disable it.
>
> http://www.nxp.com/files/32bit/doc/ref_manual/IMX6DQRM.pdf?fasp=1&WT_TYPE=Reference%20Manuals&WT_VENDOR=FREESCALE&WT_FILE_FORMAT=pdf&WT_ASSET=Documentation&fileExt=.pdf

Sorry for the delay.

As far as I can see, there are two current issues:

1. Regression caused by:
833f2cbf7091099bae ("ARM: dts: imx6: change the core clock of spdif").

It looks like this commit did much more than just change the core
clock of spdif.

It does not mention why the MLB clock has been added. Looking at the
MX6Q RM I do not see the connection between MLB and SPDIF.

So I agree with Xavi's suggestion of using the dummy_clk instead of the
mlb clock.

Xavi,

Care to send a formal patch with your change?

2. SPDIF clock rate not accurate. Probably using PLL4 as SPDIF source
would help to get more accurate SPDIF clock rates.

Could you please try the untested change below?

--- a/drivers/clk/imx/clk-imx6q.c
+++ b/drivers/clk/imx/clk-imx6q.c
@@ -623,7 +623,7 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node)
 		pr_warn("failed to set up CLKO: %d\n", ret);

 	/* Audio-related clocks configuration */
-	clk_set_parent(clk[IMX6QDL_CLK_SPDIF_SEL], clk[IMX6QDL_CLK_PLL3_PFD3_454M]);
+	clk_set_parent(clk[IMX6QDL_CLK_SPDIF_SEL], clk[IMX6QDL_CLK_PLL4_AUDIO_DIV]);

 	/* All existing boards with PCIe use LVDS1 */
 	if (IS_ENABLED(CONFIG_PCI_IMX6))

Regards,

Fabio Estevam
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
Xavi,

On Wed, Aug 31, 2016 at 10:11 AM, Fabio Estevam wrote:

> Xavi,
>
> Care to send a formal patch with your change?

If you prefer, I can send this change to the ARM kernel mailing list.

Please let me know what you prefer.

Thanks
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
On Wed, Aug 31, 2016 at 10:30:25AM -0300, Fabio Estevam wrote:
> Xavi,
>
> On Wed, Aug 31, 2016 at 10:11 AM, Fabio Estevam wrote:
>
> > Xavi,
> >
> > Care to send a formal patch with your change?
>
> If you prefer, I can send this change to the ARM kernel mailing list.
>

Whatever is easier for you. I'll have to look up the formalities for
sending the patch myself (format, copyright, where to send, etc.) since
I've never sent a patch for linux. I don't believe such a simple change
can be copyrightable, but the original isn't mine, it's from that URL I
gave,
https://community.nxp.com/thread/387131
so originally from amb...@iwavesystems.com

> Please let me know what you prefer.
>

If it's easy for you to send it yourself, I would prefer that and I'd be
grateful. If not, it'll be an exercise for me, no problem.
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
On Wed, Aug 31, 2016 at 10:11:13AM -0300, Fabio Estevam wrote:
> 2. SPDIF clock rate not accurate. Probably using PLL4 as SPDIF source
> would help to get more accurate SPDIF clock rates.
>
> Could you please try the untested change?
>
> --- a/drivers/clk/imx/clk-imx6q.c
> +++ b/drivers/clk/imx/clk-imx6q.c
> @@ -623,7 +623,7 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node)
>  		pr_warn("failed to set up CLKO: %d\n", ret);
>
>  	/* Audio-related clocks configuration */
> -	clk_set_parent(clk[IMX6QDL_CLK_SPDIF_SEL], clk[IMX6QDL_CLK_PLL3_PFD3_454M]);
> +	clk_set_parent(clk[IMX6QDL_CLK_SPDIF_SEL], clk[IMX6QDL_CLK_PLL4_AUDIO_DIV]);
>
>  	/* All existing boards with PCIe use LVDS1 */
>  	if (IS_ENABLED(CONFIG_PCI_IMX6))
>

I'm going to try. It'll take a while. I'll report the result later.

Thank you very much.
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
Hi Xavi,

On Wed, Aug 31, 2016 at 10:47 AM, Xavi Drudis Ferran wrote:

> If it's easy for you to send it yourself, I would prefer so and I'm
> grateful. If not, it'll be an exercise for me, no problem.

I have just submitted the patch with you on Cc.

If you could reply to it with your Tested-by tag, that would be great.

Thanks
[ RFC PATCH 1/3] powerpc/pasemi: Add Nemo motherboard config option.
Add config option for the Nemo motherboard used in the Amigaone X1000.
This is a custom PASemi board with an AMD SB600 southbridge, and needs
some patches to its device tree. This option will be used to build these
into the kernel.

Signed-off-by: Darren Stevens
---
diff --git a/arch/powerpc/platforms/pasemi/Kconfig b/arch/powerpc/platforms/pasemi/Kconfig
index 00d4b28..c7f1dbe 100644
--- a/arch/powerpc/platforms/pasemi/Kconfig
+++ b/arch/powerpc/platforms/pasemi/Kconfig
@@ -14,6 +14,16 @@ config PPC_PASEMI
 menu "PA Semi PWRficient options"
 	depends on PPC_PASEMI

+config PPC_PASEMI_NEMO
+	bool "Nemo motherboard Support"
+	depends on PPC_PASEMI
+	select PPC_I8259
+	help
+	  This option enables support for the 'Nemo' motherboard
+	  used in A-Eons's Amigaone X1000. This consists of some
+	  device tree patches and workarounds for the SB600 South
+	  Bridge that provides SATA/USB/Audio.
+
 config PPC_PASEMI_IOMMU
 	bool "PA Semi IOMMU support"
 	depends on PPC_PASEMI
[ RFC PATCH 0/3] powerpc/pasemi: initial Nemo motherboard support.
The following series of 3 patches brings initial device tree patches for
A-Eon's Nemo motherboard, as used in the Amigaone X1000.

The dtb passed by the CFE firmware has a number of issues, which up till
now have been fixed by use of patches applied to the mainline kernel.
This occasionally causes problems with changes made to mainline.

Patching the firmware to correct the dtb is not an option for the
following reasons:

It was modified by a 3rd party, and we don't have a copy of the source.

All versions of CFE used on the X1000 export the same dtb.

At least one machine suffered damage during a firmware upgrade attempt,
so many people will be unwilling to reflash their system if an upgrade is
produced.

I've changed the config option 'CONFIG_PPC_PASEMI_SB600' used in our
current patch to 'CONFIG_PPC_PASEMI_NEMO', which I think better describes
its function.

Kind regards
Darren
[ RFC PATCH 3/3] powerpc:pasemi: Fix device_type of Nemo SB600 node.
The of_node for the SB600 (io-bridge) has its device_type set to
'io-bridge'. Set it to 'isa' so that it can be found by
isa_bridge_find_early() instead of using patches in the kernel.

Signed-off-by: Darren Stevens
---
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 8269093..d75937a 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2697,6 +2697,24 @@ static void __init fixup_device_tree_pasemi(void)
 			}
 		}
 	}
+
+	/*
+	 * The io-bridge has device_type set to 'io-bridge'
+	 * change it to 'isa' so that generic isa-bridge code can add the SB600 and
+	 * its on-board peripherals.
+	 */
+
+	name = "/pxp@0,e000/io-bridge@0";
+	iob = call_prom("finddevice", 1, 1, ADDR(name));
+	if (!PHANDLE_VALID(iob))
+		return;
+
+	/* device_type is already set, just change it. */
+
+	prom_printf("Changing device_type of SB600 node...\n");
+
+	prom_setprop(iob, name, "device_type", "isa", sizeof("isa"));
+
 #endif //CONFIG_PPC_PASEMI_NEMO
 }
 #else
[ RFC PATCH 2/3] powerpc/pasemi: Fix Nemo SB600 i8259 interrupts.
The device tree on the Nemo passes all of the i8259 interrupts with
numbers between 212 and 222, and points their interrupt-parent property
to the pasemi-opic, requiring custom patches to the kernel. Fix the
values so that they can be controlled by the generic ppc i8259 code.

Signed-off-by: Darren Stevens
---
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 4e74fc5..8269093 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2639,6 +2639,69 @@ static void __init fixup_device_tree_efika(void)
 #else
 #define fixup_device_tree_efika()
 #endif
+#ifdef CONFIG_PPC_PASEMI
+static void __init fixup_device_tree_pasemi(void)
+{
+#ifdef CONFIG_PPC_PASEMI_NEMO
+/*
+ * CFE supplied on Nemo is broken in several ways, biggest
+ * problem is that it reassigns ISA interrupts to unused mpic ints.
+ * Add an interrupt-controller property for the io-bridge to use
+ * and correct the ints so we can attach them to an irq_domain
+ */
+	phandle iob, node;
+	u32 interrupts[2];
+	u32 parent;
+	u32 val = 0, rval;
+	char * name, * pci_name;
+
+	/* Find the root pci node */
+	name = "/pxp@0,e000";
+	iob = call_prom("finddevice", 1, 1, ADDR(name));
+	if (!PHANDLE_VALID(iob))
+		return;
+
+	/* check if interrupt-controller node set yet */
+	if (prom_getproplen(iob, "interrupt-controller") !=PROM_ERROR)
+		return;
+
+	prom_printf("adding interrupt-controller property for SB600...\n");
+
+	prom_setprop(iob, name, "interrupt-controller", &val, 0);
+
+	pci_name = "/pxp@0,e000/pci@11";
+	node = call_prom("finddevice", 1, 1, ADDR(pci_name));
+	parent = ADDR(iob);
+	for( ; prom_next_node(&node); ) {
+		/* scan each node for one with an interrupt */
+		if (PHANDLE_VALID(node)) {
+			rval = prom_getproplen(node, "interrupts");
+			if (rval != 0 && rval != PROM_ERROR) {
+				prom_getprop(node, "interrupts", &interrupts, sizeof(interrupts));
+				if ((interrupts[0] > 211) && (interrupts[0] < 223)) {
+					/* found a node, update both interrupts and interrupt-parent */
+					if ((interrupts[0] > 211) && (interrupts[0] < 216))
+						interrupts[0] -= 203;
+					if ((interrupts[0] > 215) && (interrupts[0] < 221))
+						interrupts[0] -= 213;
+					if (interrupts[0] == 221)
+						interrupts[0] = 14;
+					if (interrupts[0] == 222)
+						interrupts[0] = 8;
+
+					prom_setprop(node, pci_name, "interrupts", interrupts,
+							sizeof(interrupts));
+					prom_setprop(node, pci_name, "interrupt-parent", &parent,
+							sizeof(parent));
+				}
+			}
+		}
+	}
+#endif //CONFIG_PPC_PASEMI_NEMO
+}
+#else
+#define fixup_device_tree_pasemi()
+#endif

 static void __init fixup_device_tree(void)
 {
@@ -2647,6 +2710,7 @@ static void __init fixup_device_tree(void)
 	fixup_device_tree_chrp();
 	fixup_device_tree_pmac();
 	fixup_device_tree_efika();
+	fixup_device_tree_pasemi();
 }

 static void __init prom_find_boot_cpu(void)
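The interrupt renumbering in this patch can be read as a pure mapping from the CFE-assigned numbers back to legacy ISA IRQs. The sketch below restates the same four cases as a standalone function so the mapping is easy to check; nemo_remap_irq() is a name invented for this illustration and does not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative restatement of the patch's remapping (hypothetical helper):
 *   212..215 -> 9..12   (subtract 203)
 *   216..220 -> 3..7    (subtract 213)
 *   221      -> 14
 *   222      -> 8
 * Any other value is returned unchanged.
 */
static uint32_t nemo_remap_irq(uint32_t irq)
{
	uint32_t fixed;

	if (irq >= 212 && irq <= 215)
		fixed = irq - 203;
	else if (irq >= 216 && irq <= 220)
		fixed = irq - 213;
	else if (irq == 221)
		fixed = 14;
	else if (irq == 222)
		fixed = 8;
	else
		return irq;	/* not one of the reassigned ISA interrupts */

	/* Legacy ISA IRQ numbers on a cascaded i8259 pair are 0..15. */
	assert(fixed < 16);
	return fixed;
}
```

Using else-if makes it explicit that each input matches at most one case; the patch's sequence of independent if statements gives the same results because no remapped value falls back into the 212..222 range.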
Re: hwrng: pasemi_rng.c: Migrate to managed API
Hello PrasannaKumar

On 30/08/2016, PrasannaKumar Muralidharan wrote:
> Hi Darren,
>> On mine (Amigaone X1000) that is correct, we boot linux with a vmlinux
>> file, and the bootloader (CFE) passes a fixed dtb. I think it is
>> possible to dump the tree from inside CFE, if it would help I can
>> investigate?
>
> I don't know if it is possible to get dts from dtb even if you manage
> to extract devicetree blob from your system.

I didn't explain well. There is a CFE command 'show devtree'; here are
the relevant bits (I hope):

[CFE ]CFE> show devtree
[/]
 | #interrupt-cells   val  0x0002
 | #address-cells     val  0x0002
 | #size-cells        val  0x0002

...[snip]...

[sdc@fc00]
 | name               str  'sdc'
 | device_type        str  'sdc'
 | #address-cells     val  0x0001
 | #size-cells        val  0x0001
 | compatible         str  '1682m-sdc' 'pasemi,pwrficient-sdc' 'pasemi,sdc'
 | reg                cell FC00 0080

...[snip]...

[rng@fc105000]
 | name               str  'rng'
 | device_type        str  'rng'
 | compatible         str  '1682m-rng' 'pasemi,pwrficient-rng' 'pasemi,rng'
 | reg                cell FC105000 1000

Regards
Re: [PATCH] ps3: Remove deprecated create_singlethread_workqueue
On Tue, Aug 30, 2016 at 10:44:51PM +0530, Bhaktipriya Shridhar wrote:
> The workqueue "ps3av->wq" queues a single work item &ps3av->work and hence
> doesn't require ordering. It is involved in waking up ps3avd to do the
> video mode setting and hence it's not being used on a memory reclaim
> path. Hence, it has been converted to use system_wq.
>
> System workqueues have been able to handle high level of concurrency
> for a long time now and hence it's not required to have a singlethreaded
> workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
> created with create_singlethread_workqueue(), system_wq allows multiple
> work items to overlap executions even on the same CPU; however, a
> per-cpu workqueue doesn't have any CPU locality or global ordering
> guarantee unless the target CPU is explicitly specified and thus the
> increase of local concurrency shouldn't make any difference.
>
> The work item has been flushed in ps3av_remove to ensure that
> there are no pending tasks while disconnecting the driver.
>
> Signed-off-by: Bhaktipriya Shridhar

Acked-by: Tejun Heo

Thanks.

--
tejun
Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format
> hi,
> I had discussion with Ingo about the state of this patchset
> and there's one more requirement from his side - to split
> event files into per topic files

Thanks Jiri.

>
> I made some initial changes over latest Sukadev's branch
> and came up with something like this:

Did you just split it by the "Topic" fields?

>
> $ find pmu-events/arch/x86/
> pmu-events/arch/x86/
> pmu-events/arch/x86/NehalemEX_core
> pmu-events/arch/x86/NehalemEX_core/Memory.json
> pmu-events/arch/x86/NehalemEX_core/Virtual-Memory.json
> pmu-events/arch/x86/NehalemEX_core/Cache.json
> pmu-events/arch/x86/NehalemEX_core/Pipeline.json
> pmu-events/arch/x86/NehalemEX_core/Floating-point.json
> pmu-events/arch/x86/NehalemEX_core/Other.json
> pmu-events/arch/x86/mapfile.csv
> pmu-events/arch/x86/Broadwell_core
> pmu-events/arch/x86/Broadwell_core/Memory.json
> pmu-events/arch/x86/Broadwell_core/Virtual-Memory.json
> pmu-events/arch/x86/Broadwell_core/Cache.json
> pmu-events/arch/x86/Broadwell_core/Pipeline.json
> pmu-events/arch/x86/Broadwell_core/Floating-point.json
> pmu-events/arch/x86/Broadwell_core/Other.json
> pmu-events/arch/x86/Broadwell_core/Frontend.json
>
> so let's have a discussion if this is acceptable for you guys

Splitting is fine for me, as long as it's scriptable.

I already have some scripts to generate the perf json files,
can update them to split.

>
> I've already made some changes in pmu-events/* to support
> this hierarchy to see how bad the change would be.. and
> it's not that bad ;-)

Everything has to be automated, please no manual changes.

-Andi
Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format
On Wed, Aug 31, 2016 at 07:42:47AM -0700, Andi Kleen wrote:
> > hi,
> > I had discussion with Ingo about the state of this patchset
> > and there's one more requirement from his side - to split
> > event files into per topic files
>
> Thanks Jiri.
>
> >
> > I made some initial changes over latest Sukadev's branch
> > and came up with something like this:
>
> Did you just split it by the "Topic" fields?

yep

>
> >
> > $ find pmu-events/arch/x86/
> > pmu-events/arch/x86/
> > pmu-events/arch/x86/NehalemEX_core
> > pmu-events/arch/x86/NehalemEX_core/Memory.json
> > pmu-events/arch/x86/NehalemEX_core/Virtual-Memory.json
> > pmu-events/arch/x86/NehalemEX_core/Cache.json
> > pmu-events/arch/x86/NehalemEX_core/Pipeline.json
> > pmu-events/arch/x86/NehalemEX_core/Floating-point.json
> > pmu-events/arch/x86/NehalemEX_core/Other.json
> > pmu-events/arch/x86/mapfile.csv
> > pmu-events/arch/x86/Broadwell_core
> > pmu-events/arch/x86/Broadwell_core/Memory.json
> > pmu-events/arch/x86/Broadwell_core/Virtual-Memory.json
> > pmu-events/arch/x86/Broadwell_core/Cache.json
> > pmu-events/arch/x86/Broadwell_core/Pipeline.json
> > pmu-events/arch/x86/Broadwell_core/Floating-point.json
> > pmu-events/arch/x86/Broadwell_core/Other.json
> > pmu-events/arch/x86/Broadwell_core/Frontend.json
> >
> > so let's have a discussion if this is acceptable for you guys
>
> Splitting is fine for me, as long as it's scriptable.
>
> I already have some scripts to generate the perf json files,
> can update them to split.

yep, there's split-json.py script earlier in the perf/json branch

>
> >
> > I've already made some changes in pmu-events/* to support
> > this hierarchy to see how bad the change would be.. and
> > it's not that bad ;-)
>
> Everything has to be automated, please no manual changes.

sure

so, if you're ok with the layout, how do you want to proceed further?

thanks,
jirka
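To make the proposed layout concrete, here is a rough sketch of how a per-topic file name could be derived from an event's "Topic" value. It is not the split-json.py script mentioned above (which this thread does not show); topic_json_path() is a hypothetical name, and the rule that spaces in the topic become '-' is an assumption inferred from names such as Virtual-Memory.json in the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/<Topic>.json" in out, replacing spaces
 * in the topic with '-' (assumed rule). Returns 0 on success, -1 if the
 * result does not fit in len bytes.
 */
static int topic_json_path(char *out, size_t len,
			   const char *dir, const char *topic)
{
	int n = snprintf(out, len, "%s/%s.json", dir, topic);
	size_t start, end, i;

	if (n < 0 || (size_t)n >= len)
		return -1;

	/* Only rewrite the topic part, never the directory. */
	start = strlen(dir) + 1;
	end = start + strlen(topic);
	for (i = start; i < end; i++)
		if (out[i] == ' ')
			out[i] = '-';
	return 0;
}

/* Exercises one path from the listing and one overflow case. */
static void topic_json_path_selfcheck(void)
{
	char buf[128];
	char tiny[8];

	assert(topic_json_path(buf, sizeof(buf),
			       "pmu-events/arch/x86/Broadwell_core",
			       "Virtual Memory") == 0);
	assert(strcmp(buf,
		      "pmu-events/arch/x86/Broadwell_core/Virtual-Memory.json") == 0);
	assert(topic_json_path(tiny, sizeof(tiny), "a", "Cache") == -1);
}
```

Keeping the name derivation in one function is what makes the split scriptable: regenerating the event files cannot produce a different layout from the previous run.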
Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format
> > >
> > > I've already made some changes in pmu-events/* to support
> > > this hierarchy to see how bad the change would be.. and
> > > it's not that bad ;-)
> >
> > Everything has to be automated, please no manual changes.
>
> sure
>
> so, if you're ok with the layout, how do you want to proceed further?

If the split version is acceptable it's fine for me to merge it.

I'll add split-json to my scripting, so the next update would
be split too.

-Andi
Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes
> mm, vmscan: Only allocate and reclaim from zones with pages managed by the
> buddy allocator
>
> Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts
> of memory when booting a secondary kernel. Srikar Dronamraju reported that
> multiple nodes may have no memory managed by the buddy allocator but still
> return true for populated_zone().
>
> Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
> was reported to cause kswapd to spin at 100% CPU usage when fadump was
> enabled. The old code happened to deal with the situation of a populated
> node with zero free pages by co-incidence but the current code tries to
> reclaim populated zones without realising that is impossible.
>
> We cannot just convert populated_zone() as many existing users really
> need to check for present_pages. This patch introduces a managed_zone()
> helper and uses it in the few cases where it is critical that the check
> is made for managed pages -- zonelist constuction and page reclaim.

One nit: s/constuction/construction/

Verified that it works fine.

--
Thanks and Regards
Srikar Dronamraju
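The distinction the changelog draws can be illustrated with a small user-space model. This is a sketch under assumptions, not the kernel code: struct zone_model and the *_model() helpers are made-up names, and the only thing taken from the changelog is that populated_zone() looks at present pages while the new managed_zone() looks at pages managed by the buddy allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct zone; only two counters. */
struct zone_model {
	unsigned long present_pages;	/* pages physically present */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Model of the existing check: true as soon as any page is present. */
static bool populated_zone_model(const struct zone_model *z)
{
	return z->present_pages != 0;
}

/* Model of the new helper: true only if the allocator manages pages. */
static bool managed_zone_model(const struct zone_model *z)
{
	return z->managed_pages != 0;
}

/* A zone whose memory is fully reserved, as described for fadump. */
static void zone_model_selfcheck(void)
{
	struct zone_model reserved = { .present_pages = 4096, .managed_pages = 0 };
	struct zone_model normal = { .present_pages = 4096, .managed_pages = 4000 };

	assert(populated_zone_model(&reserved));	/* looks usable ... */
	assert(!managed_zone_model(&reserved));		/* ... but nothing to reclaim */
	assert(populated_zone_model(&normal));
	assert(managed_zone_model(&normal));
}
```

A reclaim loop that filters on the first predicate keeps revisiting the reserved zone even though there is nothing in it to free, which matches the reported kswapd spin; filtering on the second skips it.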
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
On Wed, Aug 31, 2016 at 03:49:25PM +0200, Xavi Drudis Ferran wrote:
> On Wed, Aug 31, 2016 at 10:11:13AM -0300, Fabio Estevam wrote:
> > 2. SPDIF clock rate not accurate. Probably using PLL4 as SPDIF source
> > would help to get more accurate SPDIF clock rates.
> >
> > Could you please try the untested change?
> >
> > --- a/drivers/clk/imx/clk-imx6q.c
> > +++ b/drivers/clk/imx/clk-imx6q.c
> > @@ -623,7 +623,7 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node)
> >  		pr_warn("failed to set up CLKO: %d\n", ret);
> >
> >  	/* Audio-related clocks configuration */
> > -	clk_set_parent(clk[IMX6QDL_CLK_SPDIF_SEL], clk[IMX6QDL_CLK_PLL3_PFD3_454M]);
> > +	clk_set_parent(clk[IMX6QDL_CLK_SPDIF_SEL], clk[IMX6QDL_CLK_PLL4_AUDIO_DIV]);
> >
> >  	/* All existing boards with PCIe use LVDS1 */
> >  	if (IS_ENABLED(CONFIG_PCI_IMX6))
> >
>
> I'm going to try. I'll take a while. I'll report the result later.
>
> Thank you very much.

I just tried. Spdif output still works. I can't hear any difference.
I've summarised the tests in a table:

Nominal
  Hz                32000   44100   48000   96000  192000
  ns                31250   22676   20833   10417    5208

Linux-libre-4.7 (unchanged) (no spdif output)
  Hz                32226   43882   47965   95930  196428
  ns                31031   22788   20849   10424    5091
  deviation (ns)      219    -113     -15      -8     117

only core (SPDIF_GCLK), rxtx0 (CLK_OSC), rxtx1 (SPDIF) & spba (spdif output)
  Hz                31719   43859   47368   94736  187500
  ns                31527   22800   21111   10556    5333
  deviation (ns)     -277    -125    -278    -139    -125

without MLB (the rest unchanged) (spdif output)
  Hz                32226   43859   47368   94736  187500
  ns                31031   22800   21111   10556    5333
  deviation (ns)      219    -125    -278    -139    -125

without MLB, and PLL4 instead of PLL3 for SPDIF (spdif output)
  Hz                32226   44836   49107   93750  187500
  ns                31031   22304   20364   10667    5333
  deviation (ns)      219     372     470    -250    -125

I saw page 121 of
http://cache.freescale.com/files/32bit/doc/data_sheet/IMX6DQIEC.pdf
and it seems that the margin for the SPDIF clock would be 16 ns, so I've
just inverted the frequencies to compare, but I'm not convinced it's
relevant.

Here's the extract from dmesg with your patch (and .dtsi like mainline
except MLB replaced with DUMMY).

[...]
[7.517394] etnaviv-gpu 13.gpu: model: GC2000, revision: 5108
[7.578089] imx_thermal 200.aips-bus:tempmon: Extended Commercial CPU temperature grade - max:105C critical:100C passive:95C
[7.594443] etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
[7.594454] etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
[7.594459] etnaviv-gpu 2204000.gpu: hw init failed: -6
[7.653041] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe
[7.656319] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[7.704159] fsl-spdif-dai 2004000.spdif: use rxtx5 as tx clock source for 32000Hz sample rate
[7.711719] fsl-spdif-dai 2004000.spdif: use txclk df 16 for 32000Hz sample rate
[7.718328] fsl-spdif-dai 2004000.spdif: use sysclk df 2 for 32000Hz sample rate
[7.724762] fsl-spdif-dai 2004000.spdif: the best rate for 32000Hz sample rate is 32226Hz
[7.732679] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[7.760087] fsl-spdif-dai 2004000.spdif: use rxtx5 as tx clock source for 44100Hz sample rate
[7.766536] fsl-spdif-dai 2004000.spdif: use txclk df 1 for 44100Hz sample rate
[7.772986] fsl-spdif-dai 2004000.spdif: use sysclk df 23 for 44100Hz sample rate
[7.780120] fsl-spdif-dai 2004000.spdif: the best rate for 44100Hz sample rate is 44836Hz
[7.788952] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[7.825238] fsl-spdif-dai 2004000.spdif: use rxtx5 as tx clock source for 48000Hz sample rate
[7.831515] fsl-spdif-dai 2004000.spdif: use txclk df 7 for 48000Hz sample rate
[7.837583] fsl-spdif-dai 2004000.spdif: use sysclk df 3 for 48000Hz sample rate
[7.843718] fsl-spdif-dai 2004000.spdif: the best rate for 48000Hz sample rate is 49107Hz
[7.849672] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[7.878632] fsl-spdif-dai 2004000.spdif: use rxtx0 as tx clock source for 96000Hz sample rate
[7.884550] fsl-spdif-dai 2004000.spdif: use txclk df 4 for 96000Hz sample rate
[7.890390] fsl-spdif-dai 2004000.spdif: the best rate for 96000Hz sample rate is 93750Hz
[7.896253] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[7.921228] fsl-spdif-dai 2004000.spdif
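The ns and deviation rows of the table above can be recomputed from the "best rate" values the driver prints. The sketch below assumes the deviation is the nominal period minus the actual period, taken from the unrounded values and then rounded to the nearest nanosecond; that assumption reproduces the table entries checked in the self-check. The function names are made up for illustration.

```c
#include <assert.h>

/* Round to the nearest integer without needing libm. */
static long round_to_long(double x)
{
	return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Period of one sample at rate_hz, in nanoseconds. */
static long period_ns(double rate_hz)
{
	return round_to_long(1e9 / rate_hz);
}

/* Nominal period minus actual period; positive means the clock runs fast. */
static long deviation_ns(double nominal_hz, double actual_hz)
{
	return round_to_long(1e9 / nominal_hz - 1e9 / actual_hz);
}

/* Values taken from the table and the dmesg extract in this mail. */
static void spdif_rate_selfcheck(void)
{
	assert(period_ns(32000) == 31250);
	assert(period_ns(32226) == 31031);
	assert(period_ns(47368) == 21111);
	assert(deviation_ns(32000, 32226) == 219);
	assert(deviation_ns(44100, 43882) == -113);
	assert(deviation_ns(44100, 44836) == 372);
	assert(deviation_ns(48000, 49107) == 470);
	assert(deviation_ns(96000, 93750) == -250);
}
```

Computing the difference before rounding matters for the small entries: subtracting the already rounded periods for 44100 Hz and 43882 Hz gives -112 rather than the -113 shown in the table.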
Re: [alsa-devel] Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1
On Wed, Aug 31, 2016 at 2:49 PM, Xavi Drudis Ferran wrote:
> Thank you and feel free to suggest more tests, but it is good enough
> as it is for me.

Ok, thanks for trying. So let's keep the SPDIF parent clock as is.
Re: [PATHC v2 0/9] ima: carry the measurement list across kexec
On Tue, 30 Aug 2016 18:40:02 -0400 Mimi Zohar wrote:
> The TPM PCRs are only reset on a hard reboot. In order to validate a
> TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
> of the running kernel must be saved and then restored on the subsequent
> boot, possibly of a different architecture.
>
> The existing securityfs binary_runtime_measurements file conveniently
> provides a serialized format of the IMA measurement list. This patch
> set serializes the measurement list in this format and restores it.
>
> Up to now, the binary_runtime_measurements was defined as architecture
> native format. The assumption being that userspace could and would
> handle any architecture conversions. With the ability of carrying the
> measurement list across kexec, possibly from one architecture to a
> different one, the per boot architecture information is lost and with it
> the ability of recalculating the template digest hash. To resolve this
> problem, without breaking the existing ABI, this patch set introduces
> the boot command line option "ima_canonical_fmt", which is arbitrarily
> defined as little endian.
>
> The need for this boot command line option will be limited to the
> existing version 1 format of the binary_runtime_measurements.
> Subsequent formats will be defined as canonical format (eg. TPM 2.0
> support for larger digests).
>
> This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer
> hand-over for the next kernel" patch set.
>
> These patches can also be found in the next-kexec-restore branch of:
> git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git

I'll merge these into -mm to get some linux-next exposure. I don't
know what your upstream merge plans will be?
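As background for the "canonical format" point in the cover letter: a little-endian canonical encoding means every multi-byte field is written least significant byte first, whatever the byte order of the machine doing the writing. The sketch below shows the idea for a 32-bit field; put_le32() and get_le32() are illustrative names only and are not taken from the IMA patches, which this mail does not quote.

```c
#include <assert.h>
#include <stdint.h>

/* Store v least significant byte first, independent of host byte order. */
static void put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read back a value stored by put_le32(). */
static uint32_t get_le32(const uint8_t *src)
{
	return (uint32_t)src[0] |
	       ((uint32_t)src[1] << 8) |
	       ((uint32_t)src[2] << 16) |
	       ((uint32_t)src[3] << 24);
}

/* The byte layout is fixed, so both checks hold on any host. */
static void le32_selfcheck(void)
{
	uint8_t buf[4];

	put_le32(buf, 0x11223344u);
	assert(buf[0] == 0x44 && buf[1] == 0x33);
	assert(buf[2] == 0x22 && buf[3] == 0x11);
	assert(get_le32(buf) == 0x11223344u);
}
```

Because the encoding is built from shifts rather than a memcpy of the native value, a list written by a big-endian kernel and read by a little-endian one (or the reverse) yields the same field values, which is what recalculating the template digest after kexec depends on.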
Re: [PATHC v2 0/9] ima: carry the measurement list across kexec
On Wed, 2016-08-31 at 13:50 -0700, Andrew Morton wrote:
> On Tue, 30 Aug 2016 18:40:02 -0400 Mimi Zohar wrote:
>
> > The TPM PCRs are only reset on a hard reboot. In order to validate a
> > TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
> > of the running kernel must be saved and then restored on the subsequent
> > boot, possibly of a different architecture.
> >
> > The existing securityfs binary_runtime_measurements file conveniently
> > provides a serialized format of the IMA measurement list. This patch
> > set serializes the measurement list in this format and restores it.
> >
> > Up to now, the binary_runtime_measurements was defined as architecture
> > native format. The assumption being that userspace could and would
> > handle any architecture conversions. With the ability of carrying the
> > measurement list across kexec, possibly from one architecture to a
> > different one, the per boot architecture information is lost and with it
> > the ability of recalculating the template digest hash. To resolve this
> > problem, without breaking the existing ABI, this patch set introduces
> > the boot command line option "ima_canonical_fmt", which is arbitrarily
> > defined as little endian.
> >
> > The need for this boot command line option will be limited to the
> > existing version 1 format of the binary_runtime_measurements.
> > Subsequent formats will be defined as canonical format (eg. TPM 2.0
> > support for larger digests).
> >
> > This patch set pre-req's Thiago Bauermann's "kexec_file: Add buffer
> > hand-over for the next kernel" patch set.
> >
> > These patches can also be found in the next-kexec-restore branch of:
> > git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
>
> I'll merge these into -mm to get some linux-next exposure. I don't
> know what your upstream merge plans will be?

Sounds good. I'm hoping to get some review/comments on this patch set as
well. At the moment, I'm chasing down a kernel test robot report from
this afternoon.

Mimi
Re: [PATHC v2 5/9] ima: on soft reboot, save the measurement list
Hi, Mimi On 08/30/16 at 06:40pm, Mimi Zohar wrote: > From: Thiago Jung Bauermann > > This patch uses the kexec buffer passing mechanism to pass the > serialized IMA binary_runtime_measurements to the next kernel. > > Changelog v2: > - Fix build issue by defining a stub ima_add_kexec_buffer and stub > struct kimage when CONFIG_IMA=n and CONFIG_IMA_KEXEC=n. (Fenguang Wu) > - removed kexec_add_handover_buffer() checksum argument. > - added skip_checksum member to kexec_buf > - only register reboot notifier once > > Changelog v1: > - updated to call IMA functions (Mimi) > - move code from ima_template.c to ima_kexec.c (Mimi) > > Signed-off-by: Thiago Jung Bauermann > Signed-off-by: Mimi Zohar > --- > include/linux/ima.h| 12 ++ > kernel/kexec_file.c| 4 ++ > security/integrity/ima/ima_kexec.c | 88 > ++ > 3 files changed, 104 insertions(+) > > diff --git a/include/linux/ima.h b/include/linux/ima.h > index 0eb7c2e..7f6952f 100644 > --- a/include/linux/ima.h > +++ b/include/linux/ima.h > @@ -11,6 +11,7 @@ > #define _LINUX_IMA_H > > #include > +#include > struct linux_binprm; > > #ifdef CONFIG_IMA > @@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void > *buf, loff_t size, > enum kernel_read_file_id id); > extern void ima_post_path_mknod(struct dentry *dentry); > > +#ifdef CONFIG_IMA_KEXEC > +extern void ima_add_kexec_buffer(struct kimage *image); > +#endif > + > #else > static inline int ima_bprm_check(struct linux_binprm *bprm) > { > @@ -62,6 +67,13 @@ static inline void ima_post_path_mknod(struct dentry > *dentry) > > #endif /* CONFIG_IMA */ > > +#ifndef CONFIG_IMA_KEXEC > +struct kimage; > + > +static inline void ima_add_kexec_buffer(struct kimage *image) > +{} > +#endif > + > #ifdef CONFIG_IMA_APPRAISE > extern void ima_inode_post_setattr(struct dentry *dentry); > extern int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name, > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index 0e90d14..9585861 100644 > --- 
a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -200,6 +201,9 @@ kimage_file_prepare_segments(struct kimage *image, int > kernel_fd, int initrd_fd, > return ret; > image->kernel_buf_len = size; > > + /* IMA needs to pass the measurement list to the next kernel. */ > + ima_add_kexec_buffer(image); > + > /* Call arch image probe handlers */ > ret = arch_kexec_kernel_image_probe(image, image->kernel_buf, > image->kernel_buf_len); > diff --git a/security/integrity/ima/ima_kexec.c > b/security/integrity/ima/ima_kexec.c > index e77ca9d..0e4d0db 100644 > --- a/security/integrity/ima/ima_kexec.c > +++ b/security/integrity/ima/ima_kexec.c > @@ -23,6 +23,11 @@ > > #include "ima.h" > > +#ifdef CONFIG_IMA_KEXEC > +/* Physical address of the measurement buffer in the next kernel. */ > +static unsigned long kexec_buffer_load_addr; > +static size_t kexec_segment_size; > + > static int ima_dump_measurement_list(unsigned long *buffer_size, void > **buffer, >unsigned long segment_size) > { > @@ -75,6 +80,89 @@ out: > } > > /* > + * Called during kexec execute so that IMA can save the measurement list. 
> + */ > +static int ima_update_kexec_buffer(struct notifier_block *self, > +unsigned long action, void *data) > +{ > + void *kexec_buffer = NULL; > + size_t kexec_buffer_size; > + int ret; > + > + if (!kexec_in_progress) > + return NOTIFY_OK; > + > + kexec_buffer_size = ima_get_binary_runtime_size(); > + if (kexec_buffer_size > > + (kexec_segment_size - sizeof(struct ima_kexec_hdr))) { > + pr_err("Binary measurement list grew too large.\n"); > + goto out; > + } > + > + ima_dump_measurement_list(&kexec_buffer_size, &kexec_buffer, > + kexec_segment_size); > + if (!kexec_buffer) { > + pr_err("Not enough memory for the kexec measurement buffer.\n"); > + goto out; > + } > + ret = kexec_update_segment(kexec_buffer, kexec_buffer_size, > +kexec_buffer_load_addr, kexec_segment_size); > + if (ret) > + pr_err("Error updating kexec buffer: %d\n", ret); > +out: > + return NOTIFY_OK; > +} > + > +struct notifier_block update_buffer_nb = { > + .notifier_call = ima_update_kexec_buffer, > +}; > + > +/* > + * Called during kexec_file_load so that IMA can add a segment to the kexec > + * image for the measurement list for the next kernel. > + */ > +void ima_add_kexec_buffer(struct kimage *image) > +{ > + static int registered = 0; > + struct kexec_buf kbuf = { .image = image, .buf_
Re: [PATCH 00/13] Add support for perf_arch_regs
On Tuesday 30 August 2016 09:31 PM, Nilay Vaish wrote:
> On 28 August 2016 at 16:00, Madhavan Srinivasan wrote:
>> Patchset to extend PERF_SAMPLE_REGS_INTR to include
>> platform specific PMU registers.
>>
>> Patchset applies cleanly on tip:perf/core branch
>>
>> It's a perennial request from hardware folks to be able to
>> see the raw values of the pmu registers. Partly it's so that
>> they can verify perf is doing what they want, and some
>> of it is that they're interested in some of the more obscure
>> info that isn't plumbed out through other perf interfaces.
>>
>> Over the years internally we have used various hacks to get
>> the requested data out but this is an attempt to use a
>> somewhat standard mechanism (using PERF_SAMPLE_REGS_INTR).
>>
>> This would also be helpful for those of us working on the perf
>> hardware backends, to be able to verify that we're programming
>> things correctly, without resorting to debug printks etc.
>>
>> Mechanism proposed:
>>
>> 1) perf_regs structure is extended with a perf_arch_regs structure
>>    which each arch/ can populate with their specific platform
>>    registers to sample on each perf interrupt and an arch_regs_mask
>>    variable, which is for perf tool to know about the perf_arch_regs
>>    that are supported.
>>
>> 2) perf/core func perf_sample_regs_intr() extended to update
>>    the perf_arch_regs structure and the perf_arch_reg_mask. Set of new
>>    support functions added perf_get_arch_regs_mask() and
>>    perf_get_arch_reg() to aid the updates from arch/ side.
>>
>> 3) perf/core funcs perf_prepare_sample() and perf_output_sample()
>>    are extended to support the update for the perf_arch_regs_mask and
>>    perf_arch_regs in the sample
>>
>> 4) perf/core func perf_output_sample_regs() extended to dump
>>    the arch_regs to the output sample.
>>
>> 5) Finally, perf tool side is updated to include a new element
>>    "arch_regs_mask" in the "struct regs_dump", event sample funcs
>>    and print functions are updated to support perf_arch_regs.
>
> I read the patch series and I have one suggestion to make. I think we
> should not use 'arch regs' to refer to these pmu registers. I think

Reason is that they are arch specific pmu regs. But I guess we can go with
pmu_regs also. And having a "pregs" as option to list in -I? will be fine?
(patch 13 in the patch series)

Maddy

> architectural registers typically refer to the ones that hold the
> state of the process. Can we replace arch_regs by pmu_regs, or some
> other choice?
>
> Thanks
> Nilay
Re: [PATCH 04/13] perf/core: Extend perf_output_sample_regs() to include perf_arch_regs
On Tuesday 30 August 2016 09:41 PM, Nilay Vaish wrote:
> On 28 August 2016 at 16:00, Madhavan Srinivasan wrote:
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 274288819829..e16bf4d057d1 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -5371,16 +5371,24 @@ u64 __attribute__((weak)) perf_arch_reg_value(struct perf_arch_regs *regs,
>>
>>  static void
>>  perf_output_sample_regs(struct perf_output_handle *handle,
>> -			struct pt_regs *regs, u64 mask)
>> +			struct perf_regs *regs, u64 mask)
>>  {
>>  	int bit;
>>  	DECLARE_BITMAP(_mask, 64);
>> +	u64 arch_regs_mask = regs->arch_regs_mask;
>>
>>  	bitmap_from_u64(_mask, mask);
>>  	for_each_set_bit(bit, _mask, sizeof(mask) * BITS_PER_BYTE) {
>>  		u64 val;
>>
>> -		val = perf_reg_value(regs, bit);
>> +		val = perf_reg_value(regs->regs, bit);
>> +		perf_output_put(handle, val);
>> +	}
>> +
>> +	bitmap_from_u64(_mask, arch_regs_mask);
>> +	for_each_set_bit(bit, _mask, sizeof(mask) * BITS_PER_BYTE) {
>> +		u64 val;
>> +		val = perf_arch_reg_value(regs->arch_regs, bit);
>>  		perf_output_put(handle, val);
>>  	}
>>  }
>> @@ -5792,7 +5800,7 @@ void perf_output_sample(struct perf_output_handle *handle,
>>  		if (abi) {
>>  			u64 mask = event->attr.sample_regs_user;
>>  			perf_output_sample_regs(handle,
>> -						data->regs_user.regs,
>> +						&data->regs_user,
>>  						mask);
>>  		}
>>  	}
>> @@ -5827,7 +5835,7 @@ void perf_output_sample(struct perf_output_handle *handle,
>>  			u64 mask = event->attr.sample_regs_intr;
>>
>>  			perf_output_sample_regs(handle,
>> -						data->regs_intr.regs,
>> +						&data->regs_intr,
>>  						mask);
>>  		}
>>  	}
>> --
>> 2.7.4
>
> I would like to suggest a slightly different version. Would it make
> more sense to have something like following:

I agree we are outputting two different structures, but since we use the
INTR_REG infrastructure to dump the arch pmu registers, I preferred to
extend perf_output_sample_regs. But I guess I can break it up.

Maddy

> @@ -5792,7 +5800,7 @@ void perf_output_sample(struct perf_output_handle *handle,
>  		if (abi) {
>  			u64 mask = event->attr.sample_regs_user;
>  			perf_output_sample_regs(handle,
>  						data->regs_user.regs,
>  						mask);
>  		}
> +
> +		if (arch_regs_mask) {
> +			perf_output_pmu_regs(handle,
> +					data->regs_users.arch_regs, arch_regs_mask);
> +		}
>  	}
>
> Somehow I don't like outputting the two sets of registers through the
> same function call.
>
> --
> Nilay
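The two shapes being discussed differ only in who owns the second loop. A user-space model of the "one helper per register set" variant might look like the sketch below. Everything in it is hypothetical: the array-backed read_array() stands in for perf_reg_value() and perf_arch_reg_value(), and the output buffer stands in for perf_output_put(); none of it is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a per-register accessor such as perf_reg_value(). */
typedef uint64_t (*reg_read_fn)(const void *regs, int bit);

/* Accessor for the model: registers are just an array indexed by bit. */
static uint64_t read_array(const void *regs, int bit)
{
	return ((const uint64_t *)regs)[bit];
}

/*
 * Append one value per set bit of mask, lowest bit first.
 * Returns the number of values written, never more than cap.
 */
static size_t output_masked_regs(uint64_t *out, size_t cap, const void *regs,
				 uint64_t mask, reg_read_fn read_reg)
{
	size_t n = 0;
	int bit;

	for (bit = 0; bit < 64 && n < cap; bit++)
		if (mask & (UINT64_C(1) << bit))
			out[n++] = read_reg(regs, bit);
	return n;
}

/* Caller emits the general registers, then the PMU registers. */
static size_t output_sample(uint64_t *out, size_t cap,
			    const uint64_t *gprs, uint64_t mask,
			    const uint64_t *pmu, uint64_t pmu_mask)
{
	size_t n = output_masked_regs(out, cap, gprs, mask, read_array);

	if (pmu_mask)
		n += output_masked_regs(out + n, cap - n, pmu, pmu_mask,
					read_array);
	return n;
}

/* Two general registers selected, then one PMU register. */
static void output_sample_selfcheck(void)
{
	const uint64_t gprs[4] = { 10, 11, 12, 13 };
	const uint64_t pmu[2] = { 100, 101 };
	uint64_t out[8];

	assert(output_sample(out, 8, gprs, 0x5, pmu, 0x2) == 3);
	assert(out[0] == 10 && out[1] == 12 && out[2] == 101);
	assert(output_sample(out, 8, gprs, 0x5, pmu, 0) == 2);
}
```

Either shape produces the same sample layout (general registers followed by PMU registers), so the choice is about readability of the call sites rather than the output format.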
Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format
On Wed, Aug 31, 2016 at 09:15:30AM -0700, Andi Kleen wrote:
> > > >
> > > > I've already made some changes in pmu-events/* to support
> > > > this hierarchy to see how bad the change would be.. and
> > > > it's not that bad ;-)
> > >
> > > Everything has to be automated, please no manual changes.
> >
> > sure
> >
> > so, if you're ok with the layout, how do you want to proceed further?
>
> If the split version is acceptable it's fine for me to merge it.
>
> I'll add split-json to my scripting, so the next update would
> be split too.

ook, I'll wait for patches then

thanks,
jirka